lorenzo
(Lorenzo Miniero)
March 17, 2023, 3:38pm
1
Hi all,
as the subject says, I just pushed a new experimental pull request that leverages RNNoise to perform noise reduction in AudioBridge rooms, by denoising incoming audio packets before they’re mixed. The feature is optional and configurable. Please refer to the PR text for a more comprehensive description of the feature, and for some notes on what works and what still doesn’t:
meetecho:master
← meetecho:rnnoise-audiobridge
opened 03:32PM - 17 Mar 23 UTC
This PR is an attempt to revive an effort initially contributed by @mirkobrankovic in #2260, by adding optional and configurable support for RNNoise to the AudioBridge plugin: the idea is basically to add a mechanism to perform noise reduction in audio rooms with the help of the RNNoise library.
This patch is quite different from Mirko's original contribution, though:
* First of all, the original patch performed noise reduction on the resulting mix: my patch, instead, has separate denoising contexts for each participant, meaning that each participant may or may not be denoised (it can be changed dynamically). The main reason for that is to improve the end result, since if many speakers are noisy, adding their contributions to the mix sums noise too: denoising participants at the "source", instead, should help clean up the signal before it's added to the mix, at the expense of course of some more CPU usage due to the multiple denoisers.
* Besides, Mirko's patch had a peculiar "packet skipping" feature, where you could tell the code to only denoise N packets out of M: not sure why it was done that way (the comments suggested the audio would be too robotic otherwise), but I didn't find any note related to that in the search I made for common practices when using RNNoise, so I didn't do that: in my patch, if denoising is enabled, you denoise all packets, so it's an on/off kind of thing.
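To make the difference between the two approaches concrete, here is a minimal Python sketch of "denoise each participant, then mix". It is not code from the PR (which is C inside the AudioBridge plugin): `toy_gate` is a hypothetical stand-in for a real denoiser such as RNNoise, and `mix` and its arguments are names made up for this illustration.

```python
from typing import Callable, Dict, List, Optional

Frame = List[int]                     # 16-bit PCM samples of one packet
Denoiser = Callable[[Frame], Frame]   # per-participant denoising function


def toy_gate(frame: Frame, threshold: int = 100) -> Frame:
    """Placeholder denoiser (NOT RNNoise): zero out low-amplitude samples."""
    return [s if abs(s) >= threshold else 0 for s in frame]


def mix(frames: Dict[str, Frame],
        denoisers: Dict[str, Optional[Denoiser]]) -> Frame:
    """Denoise each participant that has a denoiser, then sum into a mix."""
    if not frames:
        return []
    length = len(next(iter(frames.values())))
    mixed = [0] * length
    for participant, frame in frames.items():
        denoiser = denoisers.get(participant)
        if denoiser is not None:
            # Clean up the signal at the "source", before it reaches the mix
            frame = denoiser(frame)
        for i, sample in enumerate(frame):
            mixed[i] += sample
    # Clamp to the 16-bit range, as a mixer would have to do
    return [max(-32768, min(32767, s)) for s in mixed]


frames = {"alice": [1000, 20, -30, -2000], "bob": [10, 500, 40, -5]}
denoisers = {"alice": toy_gate, "bob": None}   # only alice is denoised
result = mix(frames, denoisers)
```

Since each participant has their own entry in `denoisers`, toggling denoising for one of them is just a matter of setting or clearing that entry, which mirrors the on/off behaviour described above.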
Apart from that, I added ways to selectively enable/disable the feature. First of all, you can configure a room to enable denoising by default, by setting the `denoise` property to `true`: in that case, any participant that joins the room results in a denoiser instance being created for them, unless they explicitly provide a `denoise: false` property when joining. A participant that joins a room where denoising is not enabled by default can instead create a denoiser by joining with a `denoise: true` property. Denoising can also be enabled/disabled dynamically: participants can do that for themselves via a `configure` request, while room owners can use the synchronous `denoise_enable` and `denoise_disable` requests instead, by specifying the room and the specific participant to impact.
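As a rough illustration of the requests described above, the following Python sketch builds the message bodies a client might send. Only `denoise`, `configure`, `denoise_enable` and `denoise_disable` come from the PR text: the `request`, `room` and `id` keys are assumptions based on the usual shape of AudioBridge requests, the helper names are hypothetical, and any authentication fields a room may require are left out.

```python
import json
from typing import Any, Dict, Optional


def join_request(room: int, denoise: Optional[bool] = None) -> Dict[str, Any]:
    """Join a room; pass denoise=True/False to override the room default."""
    body: Dict[str, Any] = {"request": "join", "room": room}  # assumed keys
    if denoise is not None:
        body["denoise"] = denoise
    return body


def configure_request(denoise: bool) -> Dict[str, Any]:
    """Participant toggles denoising for themselves after joining."""
    return {"request": "configure", "denoise": denoise}


def owner_denoise_request(room: int, participant_id: int,
                          enable: bool) -> Dict[str, Any]:
    """Room owner toggles denoising for a specific participant."""
    return {
        "request": "denoise_enable" if enable else "denoise_disable",
        "room": room,            # assumed key name
        "id": participant_id,    # assumed key name for the participant
    }


# Example: opt out of denoising in a room where it is on by default
payload = json.dumps(join_request(1234, denoise=False))
```

Leaving `denoise` out of the join request entirely is what lets the room-level default apply, which is why the helper only adds the property when it is given explicitly.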
Coming to the actual implementation, it only partly works, due to some constraints in the RNNoise library that I haven't overcome entirely. Specifically, the `rnnoise_process_frame` function expects a buffer of exactly 480 samples to denoise: it can't be more and it can't be less. Depending on the sampling rate in use in the room, a single audio packet of 20ms received via WebRTC can contain a different number of samples: when using 16000 as a sampling rate, for instance, it will be 320 samples; 480 for 24000; 960 for 48000. This means that, at the moment, denoising works fine when you use a sampling rate of 24000 or 48000 (since we can do one or two rounds of denoising with samples that are multiples of 480), while you get audio artifacts when using lower sampling rates instead. At the moment, I'm also getting artifacts when using stereo rooms (spatial audio): I do know that Opus uses interleaving when stereo is used, but even taking that into account (and so performing multiple rounds of denoising on different subsets of 480 samples) the artifacts are still there.
For these reasons, I decided to submit this as a draft pull request: in fact, it kinda works already, but it also definitely needs improvements, so I'm hoping that some fresh pairs of eyes looking at the code and/or people testing and providing feedback may help address what's currently not 100% working.
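The sample arithmetic above, and one possible way of dealing with packets that are not a multiple of 480 samples, can be sketched in Python as follows. This is only an illustration under stated assumptions, not what the PR does: `samples_per_packet`, `FrameAccumulator` and `deinterleave` are hypothetical helpers, and buffering leftovers across packets adds some latency that would have to be handled on the mixing side.

```python
from typing import List, Tuple

RNNOISE_FRAME = 480  # samples expected by rnnoise_process_frame, per the text


def samples_per_packet(sampling_rate: int, packet_ms: int = 20,
                       channels: int = 1) -> int:
    """Number of samples carried by one packet of the given duration."""
    return sampling_rate * packet_ms // 1000 * channels


class FrameAccumulator:
    """Collect incoming samples and hand them out in fixed-size frames."""

    def __init__(self, frame_size: int = RNNOISE_FRAME) -> None:
        self.frame_size = frame_size
        self.pending: List[int] = []

    def push(self, samples: List[int]) -> List[List[int]]:
        """Add one packet; return every complete frame now available."""
        self.pending.extend(samples)
        frames = []
        while len(self.pending) >= self.frame_size:
            frames.append(self.pending[:self.frame_size])
            del self.pending[:self.frame_size]
        return frames  # leftovers stay in self.pending for the next packet


def deinterleave(samples: List[int]) -> Tuple[List[int], List[int]]:
    """Split interleaved stereo (L, R, L, R, ...) into two mono channels."""
    return samples[0::2], samples[1::2]


# At 16000 Hz each 20ms packet has 320 samples, so frames only become
# available every second and third packet: 320, 640 (one frame), 480 (one).
acc = FrameAccumulator()
frames_per_push = [len(acc.push([0] * samples_per_packet(16000)))
                   for _ in range(3)]
```

With this kind of buffering, three 320-sample packets yield exactly two 480-sample frames with nothing left over. For stereo, each channel returned by `deinterleave` would presumably need its own accumulator (and its own denoiser state), since the two channels are independent signals.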
Tagging as `multistream` since I implemented this in `1.x`, but this will be trivial to port to `0.x` as well should this be merged eventually.
If you can review the code or test it in some of your rooms, that would be of great help!
Thanks,
Lorenzo
lorenzo
(Lorenzo Miniero)
November 22, 2023, 12:08pm
2
This PR has gone through a bit of a refactoring, and is getting close to being merged. If you’re interested in the feature, please test.