-
Notifications
You must be signed in to change notification settings - Fork 228
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Automatic Gain Control #621
base: master
Are you sure you want to change the base?
Add Automatic Gain Control #621
Conversation
Since I do not feel fully qualified to review the algorithmic side of this, I am going to ask the wider rodio community for help in reviewing. I will also read up on AGC and see if I can find any problem with the implementation.
|
src/source/agc.rs
Outdated
self.current_gain = self.current_gain.clamp(0.1, self.absolute_max_gain); | ||
|
||
// Uncomment for debugging: | ||
println!("Current gain: {}", self.current_gain); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
think we can do without these now :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ill leave them in for now so myself and others can test until everything is set in stone, then remove them before the final merge if thats ok?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very readable thanks! I have some minor things here and there. The algorithm makes sense to me though ill need to read up on AGC and look closer tomorrow. Did you come up with it yourself or did you find it somewhere? In either case we should attribute it in a comment somewhere.
Regarding the to_32
. That seems like a fine solution to me. I am wondering though would the algorithm still work if you had only i16? We could implement subtract, add multiply_by_f32, div_by_usize and modulo for Sample. There are trait in the std for all those I think.
That might save some conversions to f32, and in doing so speed up the implementation.
Ill look into it, but probably would need other people's input on what would be best to do there. It could be a premature optimization, but if it works, then that's a bonus. Thanks for the suggestion :)
I put this together myself, but I definitely used some algorithms and ideas I found online. So while I wrote the code, I built on what others have done before. By algorithms i mean mathematical ones
Where? |
- Implement asymmetric attack/release - Introduce MIN_ATTACK_TIME limit to prevent AGC instability - Clamp attack_time to prevent instability - Faster decrease, slower increase for smoother sound - Safeguard against extreme gain fluctuations
Integrated it with my GUI earlier, and it's working really well—Tested on a -10db audio file. Here’s a video showing the results: (Left is using sink without AGC) Video: 2024-09-27.20-00-04.mp4Let me know what you think of the new comment style and whether it is okay and descriptive enough. |
you got a good point there. It might very well be, its a real bummer rodio has no benchmarks/profile runs setup. We should not optimize without benchmark or profile. |
maybe the top of the source file? If you can list your inspiration and put your name on it, you made an algorithm for AGC you deserve some credit for that. Something along the likes of:
Its entirely optional if you do not feel like it you can leave this out. |
Great to hear! unfortunately the video is not playing for me (neither in browser or downloaded in VLC). Edit: video now works 🎉 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly, this is the most well documented code I have seen in a long while ❤️. I am gonna show this to my friends. You have set a high bar for all the other PR's now :)
The only thing that could be improved is the --debug-gain
flag, I think checking an argument as a library is sup-optimal.
Your comment there mentions its for both debugging and monitoring. Do you mean its useful for the end user to be able to see the number? For example so they can tweak the parameters better?
If that's true then maybe a callback would be a good idea. Automatic gain control could get an extra argument (maybe an option?).
It would look like this then:
fn automatic_gain_control(
self,
target_level: f32,
attack_time: f32,
absolute_max_gain: f32,
gain_monitor: Option<Box<dyn Fn(f32)>>,
) -> AutomaticGainControl<Self>
The current monitoring code in fn next
could then become:
if let Some(callback) = self.gain_monitor {
callback(self.current_gain);
}
I have used Box<dyn Fn>
to prevent AGC getting a generic argument. That will incur a performance cost when using monitoring, its a tradeoff. What we could do is add a second source: MonitoredAutomaticGainControl
which is generic over the monitor callback. Then implement the current AutomaticGainControl
by just calling MonitoredAutomaticGainControl
with as callback a function that does nothing. The optimizer will optimize that function out for us.
Regarding logging for debugging purposes. Its probably a good idea to add either tracing or log to Rodio. We can enable them using a compile time feature so they are zero cost for any user that does not need them. They also integrate nicely in whatever application someone is writing.
I should really add some benchmarking to Rodio. AGC is now running for every sample, might be an idea to make it run only once every n samples to lower the perf hit of AGC. But that is definitely not for now but for a future PR/issue once we have benchmarking integrated into Rodio's CI. |
Running every n samples could work, but we risk issues with the audio not being able to adjust when needed. For example, a bass drop or a loud kick would possibly have a gain that is too high, making a crackling sound. Instead, we can make a threshold that logs the previous values and detects a large enough change. This is my testing while developing (AGC responsiveness threshold)
Although it most of the time indicates a 2.8% cpu usage, it's still such a small difference that I don't know if it's worth it. Plus, this is all using one core. By that, I mean 100% equals one core, and 200% equals two cores fully maxed out. In this case, we are using 1/36th of a core on average. So, maybe it's not worth it; the more delay we have, the more potential audio issues we will encounter, as it might not be able to react in time to lower it enough reasonably. In any case here is the code for that if we want to explore it in the future /// Checks if the peak level change is significant enough to trigger a gain update
///
/// This method calculates the absolute difference between the current peak level
/// and the last significant peak level. It considers a change to be significant
/// if this difference is 0.01 or greater, which is equivalent to a 1% change
/// in the signal level.
#[inline]
fn is_significant_change(&self) -> bool {
// Calculate the absolute difference directly using f32
let difference = (self.peak_level - self.last_significant_peak).abs();
// Consider a change significant if it's 0.01 or more (1% change)
difference >= 0.01
} |
definitely a good idea
Thank you so much! That really means a lot to me
Originally, it was just so I could view the changes while developing. However, now that I've refined the algorithm, it's become quite handy to read the values visually. This way, you don't have to guess using your ears. Realizing this, I'm planning to add a toggle for it, so we don't need to parse and loop over all arguments.
I don’t mind. If you're asking about the original inspiration for why I decided to work on this, it's because I am creating a graphical music player entirely in Rust, with the exception of two C backup dependencies. I wanted the audio to be automatically leveled and clear. |
e6c264e
to
97636d1
Compare
Force pushed so i don't add redundant commits to the history as i just merged the commits together |
This is a good idea but i wonder since our users will only do this once probably in debug mode why not just do // Output current gain value for debugging purposes
#[cfg(debug_assertions)]
println!("Current gain: {}", self.current_gain); The upside is that Rust's compiler's dead code elimination typically prevents unused functions and types from being included in the final binary of the application using this library. So, if people don't call |
I completely agree, thats not worth the effort and increased code complexity 👍 |
If anyone needs access to the value via a callback they can always open an issue, and then its easy enough to add. Always printing in debug mode seems like a bad idea, imagine having a tui app and in debug mode the view gets corrupted 😅. But the idea of determining compile time whether to print it is good. I see two options there (maybe there are more):
It seems best to go for the second option, though that is out of the scope for this PR, ill see if I can quickly add tracing to rodio. |
Yea that seems best also look into env_logger it's integrated directly with log and allows control over logging through environment variables Like this: Clarification I've never used any of these crates before, but I've heard of env_logger and the others. |
tracing builds upon log and env_logger, its a bit heavier to compile but I think that is okay since you have to opt into it. These crates are always split into two parts: the core that only gives you the I have added tracing on main now, see 95a466e on how to use it. And to see those traces in your tests/application you will want to do something like this: https://github.com/tokio-rs/tracing/blob/master/examples/examples/fmt.rs. For logging in AGC I would suggest |
(I have not looked at the new changes yet) I am working on adding benchmarks, and I notice that some sources accept any Sample type while others want f32. I would like to unify that in the future. As long as AGC has to cast samples to f32 it might be best to just demand that the Sample is f32 in the bounds. (see high_pass source as an example of this). There should probably be a contributors guide that mentions stuff like this... ill add it to my todo :) |
Okay I really do not know whats best now, I just noticed amplify using |
Yea a decent contributors guide can go a long way
Yea probably costs nothing. |
Thats with a release build? Kinda suprises me tbh. Yeah adding such a not is worth it even if its only on debug builds. |
Nope that's on a debug build on release the init it is ok |
Very nice, ill have a look :) Its a pitty I did not find time to do the benchmarks earlier. They are done now though 👍. You can merge them in from main and add your own in You run the benchmarks with From what I can see the conversions to f32 are, as expected, basically free:
|
next review will have to wait until after the weekend. Have a great sunday btw 👍 |
Ok check u then ill be here refining it more 👋 Have a great sunday aswell |
I can't think of any other improvements I could make to the code; so far, the current state is near final. Do note that some very edge case songs can have 1 or 2 samples of slight distortion only when the sound goes from almost zero to full blast Video below is the agc version song_waveform.mp4Here is the source video for reference song_waveform2.mp4I don't know if you can hear the little clip when the beat drops, but this kind of thing is common with agc implementations and is an edge case for almost all agc. You can mitigate it with lookahead buffers, but that adds more latency and probably won't fully stop it. Remember, it has to be from almost absolute zero. 99% of songs will not be affected this is part of that 1% that will. Edit: if the videos do not load refresh the page Ironically, I kind of like it; it makes the bass a little punchier at the boost from near absolute zero to max, but it is something to be acknowledged as a downside. Edit2: The video is using the default: |
- Increase RMS_WINDOW_SIZE for more stable measurements of very low frequencies - Replace MIN_ATTACK_COEFF with release_coeff for improved customizability These changes provide better handling of low-frequency content and allow developers more control over AGC behavior, particularly in attack/release time adjustments.
Implement get_agc_control() to allow dynamic enabling/disabling of AGC during audio playback.
- Add references to get_agc_control method in automatic_gain_control docs - Include a quick start example demonstrating usage of get_agc_control
ohh nice idea, love it! 👍 |
I asked a friend to listen in, we both did not hear it. So its not an issue for us, I would say keep it as is. If someone has a problem with it they are of course free to add that look-ahead buffer as a variation on the current AGC (maybe with some generics). |
impl CircularBuffer { | ||
/// Creates a new CircularBuffer with a fixed size determined at compile time. | ||
/// | ||
/// The `_size` parameter is ignored as the buffer size is set by `RMS_WINDOW_SIZE`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then why is it there? :)
/// which is crucial for real-time audio processing. | ||
#[derive(Clone, Debug)] | ||
struct CircularBuffer { | ||
buffer: [f32; RMS_WINDOW_SIZE], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
32Kb is a rather large allocation on the stack. I am kinda surprised that clippy is not complaining here. Might be better to heap allocate this (vec).
You could also try replacing buffer+index in CircularBuffer
with VecDeque
. But that might have some negligible overhead. Since you added a benchmark we can measure that now 😝.
|
||
/// Calculates the mean of all values in the buffer. | ||
/// | ||
/// This operation is O(1) due to the maintained running sum. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎉 🥳
/// * `release_time` - Time constant for gain decrease | ||
/// * `absolute_max_gain` - Maximum allowable gain | ||
#[inline] | ||
pub fn automatic_gain_control<I>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be pub(crate) I am pretty sure this is wrong in most rodio sources, see: #622 for more info.
min_attack_coeff: release_time, | ||
peak_level: 0.0, | ||
rms_window: CircularBuffer::new(RMS_WINDOW_SIZE), | ||
is_enabled: Arc::new(AtomicBool::new(true)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see you added 'dynamic' controls in there. The current way we do controls in rodio is through the sink. @PetrGlad had the idea to try and do this with Arcs + Atomics just like you do here. I have no idea what the performance impact of this is, its something I have been discussing in #619 and which will move to its own issue in the future.
For now its a bit strange that enabled
can be changed via a handle (gotten through get_agc_control
) but the other variables can not. @PetrGlad AGC might be a good candidate for benchmarking dynamic controls (though atomics+arcs) since it has so many variables.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your ahead of us regarding the AGC control. Then there is a stack size issue we might wanna adress. Finally we should add AGC to the sink so everyone can easily enable it provided that AGC has hardly any preformance impact once disabled (I think you are already benchmarking that). Lets do that in a second PR though! Or if you have no time I can do that.
Implemented automatic gain control (AGC)
I implemented a to_f32 conversion for the sample as it was missing. I'm not entirely certain if this is optimal, so I welcome feedback on the implementation.
This pull request is in relation to. Closes #620
Here is my
main.rs
andCargo.toml
i used to test while developing:Id suggest an attack_time of around 0.5 to 2.5 that seems to be a sweet spot for songs I tested.
Also, target_level I adjusted to be 0.9 just to make sure it does not clip the audio; it was clipping for me at 1.0.Don't know where to write documentation or if I should be the one writing it. Idk, let me know; thanks :)