Improvements from my branch #39

GregorR · 2018-09-21T02:13:32Z

Pursuant to our email exchange a week ago, I've sieved out the important parts for merge here:

The NN model is modular, and a different model can either be selected from a built-in set or loaded from an external file.
Many new models are included, trained in a variety of situations.
Maximum attenuation is parameterizable. How useful this feature is is questionable :)
Sample rate is parameterizable. This only affects the mapping of FFT bins to bands, so it's quite general, though it does not affect the input rate, so samples are still expected 480 at a time.
A bug in the !SMOOTH_BANDS mode is fixed

Not included are my broken "fix" to interpolation, the rename, or my various changes to the auto* scripts.

jmvalin · 2018-10-02T07:07:52Z

OK, so I still haven't had time to do a full review of your patches, but I guess I can at least give you some feedback:

Regarding the ability to have multiple models, I think the patches have a few issues. I think it can be a bit confusing for the user to have to pick a model. Also, from previous experience, having a global object like "const struct RNNModel model_orig" exposed in the API itself can cause all kinds of problems. I can't remember the details, only that I've been there before and decided it was best to avoid it. At least for now, my preference would be for a compile-time model selection. I think ultimately, it may be possible to have one model that works for everything. That being said, I like your modularization work as it makes it easier to experiment.
I don't think the sample rate patch works, or at least it's not the right way to do it. The problem is that even if you change the band boundaries to match the new sampling rate, the time difference between windows would still change if you keep the window size the same and change the sampling rate. That would cause a mismatch with the training conditions of the model. The only way to support different rates would be to change the frame size but not the bands. For example, for 32 kHz, the frame size would have to be 320 (640-sample FFT). This would at least make it possible to support the important rates like 8, 16, and 32 kHz. I guess rates like 44.1 kHz could always be achieved by just pretending the audio is 48 kHz since a 10% difference should be small enough.
As far as I can tell, the !SMOOTH_BANDS fix is wrong. Right now, the code won't work for !SMOOTH_BANDS because it would need a model trained for that (with fewer bands), but trying to get the code to run for !SMOOTH_BANDS will just give wrong results.

I'll try to have a closer look at the details of the other patches, but at least you have some comments already.

jmvalin · 2018-10-03T16:35:41Z

OK, I've thought again about the sampling rate change and even the solution I was proposing isn't quite perfect (though it may be good enough, I don't know) because of the pitch estimation/filtering. Of course, an easy solution is to just resample to 48 kHz at the input and then resample back to the original rate on output.

GregorR · 2019-04-09T18:51:07Z

Apologies for the absurdly slow response here, I have other commitments (as of course do you!)

(1) I think the fundamental problem is that regardless of whether it's possible to create a "perfect" model, it will never be all-covering, because people disagree on what is noise. The reason I found the default one to be unsuitable is that what I was recording was myself playing an instrument in my apartment, and the things I consider to be noise in that environment are very different than the things I'd consider to be noise in a crowded coffeeshop. It's not like every possible element in-between is needed, but it's unlikely that a model could be trained in such a way that it worked for both, not because of any limitations in the technique, but because they are actually different problems, albeit related ones. Perhaps a compromise would be to have only the options of the default, or to load from a file, and to include—or not—other models only as files?

(2) Yeah, with your description, I can see that it's not right. We're too far out of my expertise. I agree that just demanding the right sample rate in the first place would be a better option.

(3) The current version doesn't compile :). Literally all this change was was to make it compile. Perhaps it should just be nixed if it's neither working nor going to be fixed.

jmvalin · 2019-04-12T21:32:18Z

(1) I agree that we may indeed need multiple models. I still think a default build (./configure; make) should only include one model. From there, there's many possible ways of adding models. It could be a compile option --with-model=XYZ, it could be loading a file, or both.

(2) Yeah, I think attempting to handle other rates is going to be complicated. If it's really a useful feature, then the best would be to just integrate a resampler and so it transparently.

(3) Yeah, the best is to just remove that option. I'm wasn't planning on bringing it back.

In an attempt to avoid namespace collision, rename model_orig as rnnoise_model_orig. (Note that the previous implementation exported things like vad_data, so this at least only as one, prefixed, unnecessary public variable)

Extending the neural network dumper to dump to a simple text file format, and adding reader functions to read a neural network description from a FILE *.

GregorR · 2019-04-25T00:17:24Z

In an attempt to keep history clean, I recreated the branch with the irrelevant revisions stripped out, so apologies for the appearance of throwing a bunch of commits on at the same time. They're the same, with a couple bugs fixed, and the following changes:

(1) Sample rate adjustment is nixed.

(2) The "fix" to !SMOOTH_BANDS has been replaced with the elimination of that non-feature to clean up the code.

(3) Multiple models are supported exclusively through reading model files.

(4) The builtin model is used only by specifying NULL as the model; the "model_orig" variable is not public. (Note that this option was always available)

Thus, this PR now consists of the following changes:

(1) The RNN model is made modularizable and loadable from a file, with the existing model being loaded if none is otherwise provided.

(2) Warnings are fixed, headers made more modular, the broken !SMOOTH_BANDS implementation is removed, and other such minor cleanups.

(3) Maximum attenuation is made parameterizable. I retain that this feature itself is of arguable utility, but it more importantly brings with it a general configuration API that should scale to future changes.

FrnchFrgg · 2019-05-22T08:34:58Z

Note that in my case where the noise is very low (just cleaning up a voice over recording in a not so soundproof environment), I had to add a path without RRNoise and automate the mix because RRNoise was far too eager on my "fois" which means "times" in French. Roughly one over two sounded "...ois". It seems that the "fffff" sound is close enough to RRNoise's representation of noise that it gets removed.
So having the possibility to use other models might be useful here.

While doing that, I discovered that the phase behaviour of RRNoise is funky, so it cannot be mixed 50% wet. Using a maximum attenuation parameter that can be changed between every processes would enable having an automatable "dry/wet" control (with suitable smoothing of the control to avoid zipping of course, but that responsibility would rely on the wrapper DAW plugin like https://github.com/lucianodato/speech-denoiser/ that I use).

jmvalin · 2019-05-30T04:29:56Z

OK, I reviewed the entire stack and after some rebasing (reordering+fusing some patches), I merged all but three changes. The only remaining issue is related to the max attenuation. I'm OK with the general idea, but have two issues with the implementation:

The code that handles it is really convoluted for apparently no good reason. As far as I can tell, you could get almost the same behaviour by just doing a gain = max(gain, attenuation_floor).
Expressing the attenuation as a gain is really counter-intuitive. I think the attenuation should be specified as a (positive) value in dB. Ideally an int value would probably be best in case there's ever a fixed-point implementation of this lib.

So the three patches I didn't merge are

Added parameterizable maximum attenuation.
Let's scale our maximum attenuation correctly...
Added input of model file to the demo.
(the last one because I can't apply it without the first one)

petterreinholdtsen · 2021-01-18T10:07:39Z

So, should this be reworked and rebased to only include the remaining fixes or be closed?

gnif mentioned this pull request Mar 13, 2019

Add resample support via ssrc (libresample) werman/noise-suppression-for-voice#9

Closed

GregorR added 15 commits April 24, 2019 16:35

Modularizing the RNN model

95038d1

Avoid namespace collision

edf049d

In an attempt to avoid namespace collision, rename model_orig as rnnoise_model_orig. (Note that the previous implementation exported things like vad_data, so this at least only as one, prefixed, unnecessary public variable)

Fix some warnings

6b4d22b

Made dump_rnn output in nu format.

d2071d9

Neural network model files

6d49c2d

Extending the neural network dumper to dump to a simple text file format, and adding reader functions to read a neural network description from a FILE *.

Some fixes to make model files work.

61f587a

Corrected a buggy lack of allocation.

a2f4659

Added parameterizable maximum attenuation.

df246fd

Updated README

d4ad247

Adding instructions on training.

04d3008

Adding my own copyright lines would be wise...

42d8260

Let's scale our maximum attenuation correctly...

1eb410f

Removing the long since broken non-SMOOTH_BANDS mode.

42e1ffb

Fix to RNN file reader to set _size correctly.

3fe3790

Added input of model file to the demo.

2ab6253

GregorR force-pushed the merge branch from 6168a49 to 2ab6253 Compare April 25, 2019 00:05

Zadagu mentioned this pull request Mar 9, 2020

what does max attenuation mean? #113

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improvements from my branch #39

Improvements from my branch #39

GregorR commented Sep 21, 2018

jmvalin commented Oct 2, 2018

jmvalin commented Oct 3, 2018

GregorR commented Apr 9, 2019 •

edited

Loading

jmvalin commented Apr 12, 2019

GregorR commented Apr 25, 2019 •

edited

Loading

FrnchFrgg commented May 22, 2019

jmvalin commented May 30, 2019

petterreinholdtsen commented Jan 18, 2021

Improvements from my branch #39

Are you sure you want to change the base?

Improvements from my branch #39

Conversation

GregorR commented Sep 21, 2018

jmvalin commented Oct 2, 2018

jmvalin commented Oct 3, 2018

GregorR commented Apr 9, 2019 • edited Loading

jmvalin commented Apr 12, 2019

GregorR commented Apr 25, 2019 • edited Loading

FrnchFrgg commented May 22, 2019

jmvalin commented May 30, 2019

petterreinholdtsen commented Jan 18, 2021

GregorR commented Apr 9, 2019 •

edited

Loading

GregorR commented Apr 25, 2019 •

edited

Loading