Good day, WebRTC enjoyers!
I observe sound artifacts when several people interrupt one another during a conversation. For example, if Bob is already speaking and Alice says a few words, this artifact occurs. It sounds like a short click, similar to the noise made when someone touches a microphone. This artifact appears even when both speakers’ microphones have a sampling rate of 48,000 Hz.
Are the clients browsers? If not, can you test with browsers?
What if you use different sampling rates in the audiobridge?
Just to rule out unknown/unexpected bugs, could you try commenting out default_expectedloss and disabling audiolevel events?
Why are you using an untagged libopus commit? Try with v1.5.2 and v1.4.
Thanks for the reply.
We have updated libopus to version 1.5.2 (we didn’t try 1.4 yet) and reset all settings to default, but the issue still persists.
The clients are using Chrome 136 on Windows.
During testing, I noticed more specific conditions for reproducing the problem:
There must be three or more people in the room. We could not reproduce it in one-on-one calls (nor in VideoRoom with several people).
The artifact appears when someone starts speaking (interrupting another) after a long silence. When people have been talking for a while, the artifacts disappear.
The closer the microphone is to the mouth, the higher the probability of the artifact occurring. Folks with AirPods-like microphones (or those using their laptop's built-in mic) reproduce the artifact much less often than owners of gaming-style headsets, where you talk right into the mic.
It seems to me that the amplitudes of two or more audio tracks add up, causing an overflow. This is especially noticeable at the start of a phrase, while the automatic gain control is still adapting.
By the way, the official audiobridge demo also produces artifacts. Just plug in a gaming headset with the mic close to your mouth and you'll notice it (3+ people required).
Sure, buy me some gaming headphones and I’ll check
Jokes aside, if it’s the browser doing sudden gain adjustments, I’m not sure there’s much we can do. Mixing is a sum of signals, and clipping is part of that if the summed signal gets too loud. We don’t have any form of compression or anything like that; it would be way too complex to add to the code.
On a related note, compression is something you can add yourself on the client side via WebAudio, in order to try and avoid spikes at the source. We have a dumb example (not sure how effective) in the demos.
Thanks!
We have tried compression; it definitely removes the spikes, but the sound becomes a bit unnatural and not really suitable for production.
On the other hand, it’s almost impossible to get such artifacts in VideoRoom, where the browser somehow mixes the tracks without them. I haven’t examined the Chromium codebase yet, but I think the answer must be there (or even a method we could copy).
Maybe there is a basic, simple way to remove the spikes? For example, if two packets sum to a volume noticeably louder than the average of the last second, we could scale both down before mixing.
P.S. I’m ready to send you the headphones, just let me know where (not a joke).