VC audio quality
Timon
06-01-2004, 09:47 AM
Hi there,
Well, I'm new here... I hope to help others and get some help too.
I have a question:
What audio quality should you expect from a room system (and you all know that the sky is the limit)?
How can you tell whether the quality in a conferencing room is poor, OK, or the best?
Can you specify this quality with numbers such as dynamic range or THD?
Thanks :-)
vtjoe
06-01-2004, 12:00 PM
Hey Timon,
I'm not an audiophile, but from my limited knowledge:
The G.711 and G.728 audio standards provide roughly 3 kHz of audio bandwidth (sampled at 8 kHz), which is telephone quality. Not good.
For "wideband" audio, G.722 and G.722.1 provide 7 kHz of audio bandwidth. This is sufficient for the human voice (speech) and is a good fit for most videoconferencing applications.
These standards compress to different bit rates (16 kbps to 64 kbps) and differ in computational complexity. Most videoconferencing endpoints will try to pick the best audio encoder available, usually G.722 or G.722.1 depending on bandwidth.
Some manufacturers use audio codecs outside of the ITU standards. Polycom (Siren14) and Sony (MPEG) both offer 14 kHz audio. For music and related applications, this is the highest quality available in a videoconference and you will hear a significant difference. However, as stated before, the extra bandwidth is inconsequential for human speech. All that is needed is 7 kHz.
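To make the bandwidth comparison above easier to scan, here is a small sketch in Python. The sample rates and audio bandwidths are nominal figures, not measurements from any endpoint; the dictionary and function names are made up for illustration; and the Siren14 entry assumes it matches G.722.1 Annex C.

```python
# Nominal figures only (assumption: Siren14 behaves like G.722.1 Annex C).
# These are not measurements taken from any real endpoint.
CODECS = {
    "G.711":   {"sample_rate_hz": 8000,  "audio_bandwidth_hz": 3400},
    "G.728":   {"sample_rate_hz": 8000,  "audio_bandwidth_hz": 3400},
    "G.722":   {"sample_rate_hz": 16000, "audio_bandwidth_hz": 7000},
    "G.722.1": {"sample_rate_hz": 16000, "audio_bandwidth_hz": 7000},
    "Siren14": {"sample_rate_hz": 32000, "audio_bandwidth_hz": 14000},
}


def nyquist_limit_hz(sample_rate_hz):
    """Highest frequency a given sample rate can represent (half the rate)."""
    return sample_rate_hz / 2


def codecs_covering(required_bandwidth_hz):
    """Names of the codecs whose audio bandwidth reaches the required value."""
    return [
        name
        for name, spec in CODECS.items()
        if spec["audio_bandwidth_hz"] >= required_bandwidth_hz
    ]


if __name__ == "__main__":
    for name, spec in CODECS.items():
        print(name, spec["audio_bandwidth_hz"], "Hz of audio,",
              "Nyquist limit", nyquist_limit_hz(spec["sample_rate_hz"]), "Hz")
    print("Good enough for 7 kHz speech:", codecs_covering(7000))
```

The point of the Nyquist helper is just that the usable audio bandwidth is always somewhat below half the sample rate, which is why "7 kHz audio" goes with 16 kHz sampling.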
Polycom has some good sound bite samples of music:
http://www.polycom.com/partners/1,1438,pw-1234,00.html#Compare
Now, that doesn't really answer the question, because we are ignoring the most important part: echo cancellation.
If you ever hear your own voice back during a conference, the far site most likely has a problem with echo cancellation. When testing audio quality, you must always do it in a call. Remember, audio quality must be tested on both sides.
1) Look at microphone placement relative to the audio speakers. Microphones should not be placed next to speakers. Directional microphones should point away from the speakers.
2) All people in a room should hear the remote site at approximately 70 dB. Simply put, the far end volume should be equivalent to somebody speaking at a normal voice in the room.
3) The design of the room needs to be examined. A room with two glass walls and a marble table will most likely cause echo problems.
Most videoconferencing codecs have a way of handling echo cancellation. However, not all manufacturers will handle it the same way – this will play a significant role for audio quality.
If you need speech reinforcement in a room, you will need to rely on an external device to provide the echo cancellation, like the ClearOne (Gentner) XAP400 or XAP800. At this point, you will want an audio consultant or integrator who has experience with videoconferencing.
Timon
06-01-2004, 02:33 PM
WOW!! Thanks for your time and your knowledge!!!
What I still need is a way to measure whether the combination of all these parameters and standards, including the endpoints, can be tagged as a "high quality audio system" (including microphone and speakers).
Let's say that I have two endpoints (same systems) in two "not so perfect" rooms (we need some echo for our tests), and the bandwidth is limited by the endpoint only (let's say G.722).
In one room I put some kind of speaker (let's say 2 meters from the mic) playing a 1 kHz signal, and in the other room I have a calibrated microphone, also 2 meters from the speaker...
It's easy to measure the amplitude (as you said, it will be 70 dB) and the bandwidth (by sweeping the audio source and finding the -3 dB points).
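The -3 dB search mentioned above could be sketched roughly like this in Python. It assumes you already have a list of sweep frequencies and the level measured at the far end for each one; the function name and the sample numbers are invented for illustration. Interpolating linearly in Hz between sweep points is a simplification (interpolating on a log-frequency axis is also common).

```python
def minus_3db_points(freqs_hz, levels_db):
    """Return (low_hz, high_hz): where the response drops 3 dB below its peak.

    freqs_hz must be ascending and levels_db holds the level measured at each
    frequency. Either value is None if the sweep never drops 3 dB on that side.
    """
    peak = max(levels_db)
    threshold = peak - 3.0
    peak_idx = levels_db.index(peak)

    def crossing(i_in, i_out):
        # Linear interpolation between a point at/above the threshold (i_in)
        # and its neighbour below the threshold (i_out).
        f1, l1 = freqs_hz[i_in], levels_db[i_in]
        f2, l2 = freqs_hz[i_out], levels_db[i_out]
        return f1 + (threshold - l1) * (f2 - f1) / (l2 - l1)

    low = None
    for i in range(peak_idx, 0, -1):          # walk down in frequency
        if levels_db[i - 1] < threshold:
            low = crossing(i, i - 1)
            break

    high = None
    for i in range(peak_idx, len(freqs_hz) - 1):  # walk up in frequency
        if levels_db[i + 1] < threshold:
            high = crossing(i, i + 1)
            break

    return low, high


if __name__ == "__main__":
    # Made-up sweep data, only to show the call.
    freqs = [50, 100, 1000, 7000, 8000]
    levels = [60, 70, 70, 70, 64]
    print(minus_3db_points(freqs, levels))
```

With an echo canceller in the path the measured levels may drift during the sweep, so the result is only as good as the stability of the readings you feed in.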
Now what? What other parameters can I measure, and which results should I expect?
I'm looking for a method that will not rely on my ears and will give a real tool for comparing V.C systems from the audio point of "view".
Thanks again to vtjoe and any other V.C expert who can answer these questions.
Gary Miyakawa
06-01-2004, 02:53 PM
I do want to follow up just a touch on the Polycom audio. When going from Polycom to Polycom, they support an audio algorithm called Siren 14. This provides 14 kHz of audio, using anywhere from 32 kbps (I think) to 56 kbps. I believe that most everyone agrees that this audio is the best in the videoconferencing industry (I've never seen anyone disagree but I suppose there is someone out there... B) )
You might consider using their VSX or iPower products (both support the 14 kHz audio) for your design. You will certainly need an audio system that can faithfully reproduce those frequencies (not just TV speakers... that's why the VSX7000 has a subwoofer). You will also most likely need some microphone mixing. I've had good success with the Vortex product line from Polycom (extremely flexible with the software interface).
Polycom has a number of white papers on room design that would be worth looking through. These cover lighting, room color, room audio and room video. It would be worth taking a look at them, since the needs of videoconferencing tend to be a little more demanding.
Best of luck!
Gary Miyakawa
P.S. Where are you located ?? I might be able to put you in touch with someone local to you with some "real" expert knowledge (unlike mine!!!)
George
06-01-2004, 03:48 PM
Originally posted by Timon@Jun 1 2004, 12:33 PM
Now what? What other parameters can I measure, and which results should I expect?
I'm looking for a method that will not rely on my ears and will give a real tool for comparing V.C systems from the audio point of "view".
Ok lemme take a stab as well...
It sounds like what might help in your case is a pink noise generator and a way of measuring the audio at the other end, but that kinda depends on what you have sitting at the other end and a few other variables in between...
Test the audio throughput by playing pink noise from your participant's chair through the system, with a device at the other end to view the incoming dB reading.
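If no hardware generator is at hand, a pink noise test file can be approximated in software. Below is a rough Python sketch using the Voss-McCartney method; the function names, the 48 kHz rate and the file name are arbitrary choices for illustration, and the output is only approximately pink, so it should not be treated as a calibrated source.

```python
import random
import struct
import wave


def pink_noise(n_samples, n_rows=16, seed=None):
    """Approximate pink (1/f) noise, Voss-McCartney style.

    Several random "rows" are summed; row k is refreshed once every
    2**(k+1) samples, and a fresh white value is added on every sample.
    Returns floats in roughly [-1, 1].
    """
    rng = random.Random(seed)
    rows = [rng.uniform(-1.0, 1.0) for _ in range(n_rows)]
    total = sum(rows)
    out = []
    for i in range(1, n_samples + 1):
        # Index of the lowest set bit of the counter selects the row to refresh.
        k = (i & -i).bit_length() - 1
        if k < n_rows:
            total -= rows[k]
            rows[k] = rng.uniform(-1.0, 1.0)
            total += rows[k]
        white = rng.uniform(-1.0, 1.0)
        out.append((total + white) / (n_rows + 1))
    return out


def write_wav(path, samples, sample_rate_hz=48000):
    """Write mono 16-bit PCM so the noise can be played through a speaker."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(sample_rate_hz)
        frames = b"".join(
            struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767))
            for s in samples
        )
        wf.writeframes(frames)


if __name__ == "__main__":
    # Five seconds of noise at 48 kHz; the file name is just an example.
    write_wav("pink_noise_test.wav", pink_noise(5 * 48000, seed=1))
```

Play the file at the participant's position and read the level at the far end, as described above.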
With that test setup in mind, I think the parameters that determine the ideal setup have already been mentioned in this thread.
It would appear that a room that tests at 14 kHz / 70 dB both incoming and at the far end, without changing settings between tests, and with properly configured echo cancellation (0 echo), would be pretty much the ideal setup using the technology currently on the market. The optimal configuration, if you will.
Even echo cancellers have their limits, so the room should be as free of ambient noise as possible. To avoid operational confusion I'd also define a single echo cancellation point and work the settings from there, making sure any other echo cancellers in the path are off.
Ok that's my stab for now. Maybe more later.
tom9933
06-02-2004, 10:31 AM
Working with a bunch of broadcast engineers, this type of topic has come up more than once. What I can tell you is that without disabling the echo canceller it’s very difficult to “calibrate” a room. For example, if you send a tone into a codec with the internal echo canceller turned on, you will probably find that it goes up and down in volume at the far site. Now, I have not tried pink noise (yet, maybe a project for today) but I assume it would produce similar results.
What puzzles me about this concept is the idea of a rating scale. Typically things like THD and dynamic range are measured in very controlled environments, so I could see that a codec could be measured in this way, but measuring the entire system seems rather difficult (too many variables). The only thing I can think of at this point would be some sort of flatness measurement. Meaning that, in theory, if you put pink noise in at the other site you should see a relatively flat trace on the RTA (real-time analyzer). I’m wondering if you could apply a formula to the data from the RTA and then produce a rating number. Now, the other big variables here would be the quality of the pink noise generator and the resolution of the RTA. I suppose the best bet would be to compare a direct reading (i.e. generator hardwired directly to the RTA) to a reading post codecs. Overall it sounds like a good discussion topic.
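One way the "formula on the RTA data" idea could look, purely as a sketch: take the per-band levels from the direct reading and from the reading after the codecs, subtract them, remove the overall gain difference, and report how far the remaining curve strays from flat. The function name, the choice of RMS and peak-to-peak as the rating numbers, and the sample band levels are all assumptions made for illustration, not an established metric.

```python
import math


def flatness_rating(direct_db, through_codec_db):
    """Compare two RTA readings taken with the same bands, in dB.

    direct_db:        generator wired straight into the RTA
    through_codec_db: the same signal measured after the codecs
    Returns the average gain offset plus two flatness figures (0 = flat).
    """
    if not direct_db or len(direct_db) != len(through_codec_db):
        raise ValueError("both readings need the same, non-empty set of bands")

    # Per-band difference removes the generator's own unevenness.
    diffs = [t - d for d, t in zip(direct_db, through_codec_db)]
    gain_offset = sum(diffs) / len(diffs)

    # What is left after removing the overall level change is the shape error.
    deviations = [x - gain_offset for x in diffs]
    rms = math.sqrt(sum(x * x for x in deviations) / len(deviations))
    peak_to_peak = max(deviations) - min(deviations)

    return {
        "gain_offset_db": gain_offset,
        "rms_deviation_db": rms,
        "peak_to_peak_db": peak_to_peak,
    }


if __name__ == "__main__":
    # Made-up band levels, only to show the call.
    direct = [70.0, 70.0, 70.0, 70.0]
    post_codec = [64.0, 66.0, 66.0, 64.0]
    print(flatness_rating(direct, post_codec))
```

Subtracting the direct reading first means the quality of the noise generator matters less, since its own unevenness appears in both readings; the resolution of the RTA still limits what the numbers can show.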
shogun2
06-07-2004, 12:45 AM
I'll add another 2 cents' worth.
In our designs we come at things from another view: maximising intelligibility. There are various formulae for measuring the intelligibility of a room and the sound system within it. I would suggest you do a bit of research on the subject of STI (Speech Transmission Index) and RASTI.
Maximising intelligibility not only means being able to hear a wide frequency response; it also includes the "sound of the room". This is why it is important to get the physical room attributes right as well as the electronic signal chain. Excessive reverberation destroys intelligibility to the point where a perfect signal from the far end will be unintelligible in the room.
Room noise from mechanical factors, such as air handling units also can mask the far end speech.
There is also the issue of acoustic comfort.
Using ceiling speakers placed above the audience dislocates the sound source from the image. Speakers are better located at the vision source. This may require some very special electronics and speakers to achieve the spatial effect desired. Ceiling speakers are notoriously poor when it comes to reproducing high quality speech.
I would also suggest that it is not just about measuring one end to the other, but back again as well.
High acoustic gain before feedback is essential to maintain system stability, and is even more essential when there is local sound reinforcement.
All our tests are done with tools such as MLSSA and others.
We currently design our systems utilising the Biamp Audia DSP based audio mixers with inbuilt AEC cards.
Rod