Consider rnnoise-nu #11
That is so awesome! Of course I will prefer to use yours. I guess it's time to code and get the release going. Do you think it would be possible to add support for receiving floats instead of shorts? That would be cleaner too.
Since rnnoise does receive floats, I assume you mean floats in the usual -1 to 1 range instead of floats from -32768 to 32767 :) The models are trained on floating-point values, of course, but for whatever reason the original author opted to train them on values from -32768 to 32767 (i.e., 16-bit signed integer values). It would be possible to retrain new models for the standard float range, but I'd shy away from the idea of having a separate set of models for different formats, as that could create some very confusing results. Multiplying input values by 32768 and then dividing the output values by 32768 again should be a very fast operation, since it's just a change to the exponent, and should be lossless on all unclipped audio data.
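The scaling shim described above fits in a couple of lines. The helper names and the `RNNOISE_SCALE` constant below are made up for this sketch, not part of the rnnoise API; the point is that scaling by a power of two only touches the float's exponent, so the round trip is bit-exact for unclipped samples:

```c
/* Illustrative shim for feeding standard [-1, 1] floats to models trained
 * on 16-bit-style values. These helper names are invented for this sketch;
 * they are not part of the rnnoise API. */

#define RNNOISE_SCALE 32768.0f

/* Scale a standard-range sample up before processing. */
static inline float to_rnnoise_range(float x)   { return x * RNNOISE_SCALE; }

/* Scale a processed sample back down afterwards. Because 32768 is a power
 * of two, multiply-then-divide only changes the exponent field, so the
 * round trip is lossless for unclipped audio. */
static inline float from_rnnoise_range(float x) { return x / RNNOISE_SCALE; }
```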
Ah yes, just saw speech-denoiser/src/sdenoise.c, line 212 (ab14ff4).
(Different sample rates, on the other hand, should be very doable. The FFT code is generic, and virtually everything relies only on the calculated bands or the FFT, so it's all doable. I've yet to think of a way to do it without slowing down the code, though. Of course, some of the bands are above the Nyquist frequency of 12kHz audio, but then, if you're denoising 12kHz audio, something has gone dreadfully wrong in your life.)
Right! Training the model for different types sounds like a lot of work. And I agree with your comment about sample rate. I will get back to you with something ASAP.
It's not the amount of work that concerns me. It'd take a week or so, but oh well, c'est la vie. The problem is that because they would be different models, you'd get different results if you fed in identical data with different datatypes, which is just asking for confusion :)
Parameterizable sample rate is done. Doesn't affect the frame size due to some limitations in the FFT used. In doing so I also eliminated some silly global state, so it is now correct and harmless to do multi-channel (or even multi-track) audio through rnnoise-nu by simply creating a DenoiseState per channel per track. I doubt that it would be faster to make rnnoise-nu itself multi-channel, as it would just have to mix them (or probably their FFTs) to do the analysis and then apply the results independently to each track. |
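Since hosts typically hand plugins interleaved buffers, the per-channel pattern described above needs a deinterleave step so each channel's frame can be fed to its own DenoiseState. A minimal sketch of that step (the helper name is invented for this example and is not part of rnnoise-nu):

```c
#include <stddef.h>

/* Split an interleaved buffer into one contiguous frame per channel,
 * ready to feed each channel's own DenoiseState. This helper is
 * illustrative only, not part of rnnoise-nu. */
static void deinterleave(const float *in, float **per_channel,
                         size_t channels, size_t frames)
{
    for (size_t ch = 0; ch < channels; ch++)
        for (size_t i = 0; i < frames; i++)
            per_channel[ch][i] = in[i * channels + ch];
}
```

Each `per_channel[ch]` buffer would then be passed to the processing call together with that channel's own state.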
Well it's done. Thank you very much for your suggestion. Let me know if you find something wrong. |
A couple of minor suggestions:

speech-denoiser/src/sdenoise.c, line 41 (57ed881): It shouldn't be necessary to define these parameters, as they're defined by the header.

speech-denoiser/src/sdenoise.c, line 221 (57ed881): Just a clarification: is this parameter being set every frame because it can be changed at any time?

speech-denoiser/src/sdenoise.c, lines 112-113 (57ed881): It's not necessary to do both.

It would be nice to have some way of parameterizing the different models, or even loading a model from a file (if LV2 makes that possible/easy). For what it's worth, the different models are well worth using, and work great on suitable input!
First comment: Oops!
FYI, I'm suddenly in communication with the original author of RNNoise, so expect... I dunno, something.
Nice! Go Gregor!
Are there any chances of a merge? Did you guys speak of that possibility?
Yes. Needs some cleanup, but something's moving upstream. What exactly is TBD :)
I'd like to help if I can :) |
Hi,
I've been fussing with RNNoise and made a slightly-incompatible fork that:
(1) Supports multiple neural network models, several of which I've trained (and I'm still training more),
(2) Supports a simple ASCII file format for future models, and
(3) Parameterizes the maximum attenuation to perform.
I've been scratching my head over whether to learn a whole plugin infrastructure in order to get it working slightly more tastefully, and since you have mixing wet and dry as a todo item (with the goal presumably the same as maximum attenuation, but frankly doing it in the library is a bit cleaner), maybe we can scratch each other's itches.
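For what it's worth, the wet/dry mix mentioned as a todo reduces to a one-line per-sample blend; the helper below is only a sketch of that idea, not code from either project:

```c
/* Per-sample wet/dry blend: mix = 0 passes the input through untouched,
 * mix = 1 is fully denoised. Capping mix below 1 bounds how much the
 * denoiser can attenuate, which is what a library-side maximum-attenuation
 * parameter achieves more directly. Illustrative only. */
static inline float wet_dry_mix(float dry, float wet, float mix)
{
    return mix * wet + (1.0f - mix) * dry;
}
```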
My rnnoise: https://github.com/GregorR/rnnoise-nu
My rnnoise models (informational, not needed to use the library): https://github.com/GregorR/rnnoise-models
In terms of the library, the changes are small. `rnnoise_create` takes an `RNNModel *` as an argument, or NULL to use the default (this was what necessitated breaking compatibility). `rnnoise_models` returns a NULL-terminated list of models (meaningless string names), and `rnnoise_get_model` gets a model by name. Alternatively, `rnnoise_model_from_file` can load a model from a file, to be freed by `rnnoise_model_free`. Finally, `rnnoise_set_param` lets you set the (solitary) configurable parameter.

(An aside: my knowledge of LV2 is limited to... well, nothing, but I was surprised to find while poking at your library that there doesn't seem to be any mention of number of channels. Does LV2 just send each channel through a different instance of the plugin, or is this plugin single-channel-specific?)