clanSound mixing issue when playing multiple sounds #130

Open
rombust opened this issue Sep 18, 2024 · 4 comments
rombust commented Sep 18, 2024

I believe clanSound has a bug where if you play multiple sounds at the same time, they sound distorted (crackling).

The issue is when clanSound mixes sounds, it adds the output to the mix buffer, but it never divides the mix buffer by the number of sounds.

Thus the sounds are louder, and they distort when they exceed the -1.0 to 1.0 range.

I am surprised that it hasn't been reported before.
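A minimal illustration of the problem (hypothetical values, not clanSound code): summing two sessions' samples can exceed the valid range, and whatever then truncates the out-of-range part is what produces the crackling.

```cpp
#include <algorithm>
#include <cassert>

// What the mixer effectively does today: sum the sessions' output.
float mix_additive(float a, float b)
{
    return a + b;
}

// What happens downstream to an out-of-range sample: it gets clipped
// to the -1.0..1.0 range, audibly distorting the waveform.
float clip(float s)
{
    return std::clamp(s, -1.0f, 1.0f);
}
```

Two sounds each at 0.8 sum to 1.6, well beyond the representable range, so the mixed result is clipped back to 1.0.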

The fix is:


void SoundOutput_Impl::apply_master_volume_on_mix_buffers()
{
	// *** Add these 3 lines ***
	std::unique_lock<std::recursive_mutex> mutex_lock(mutex);
	if (sessions.empty())
		return;
	// ***

	// Calculate volume on left and right channel:
	float left_pan = 1 - pan;
	float right_pan = 1 + pan;
	if (left_pan < 0.0f) left_pan = 0.0f;
	if (left_pan > 1.0f) left_pan = 1.0f;
	if (right_pan < 0.0f) right_pan = 0.0f;
	if (right_pan > 1.0f) right_pan = 1.0f;
	if (volume < 0.0f) volume = 0.0f;
	if (volume > 1.0f) volume = 1.0f;

	float left_volume = volume * left_pan;
	float right_volume = volume * right_pan;


	//SoundSSE::multiply_float(mix_buffers[0], mix_buffer_size, left_volume);
	//SoundSSE::multiply_float(mix_buffers[1], mix_buffer_size, right_volume);
	// *** Changed these lines ***
	SoundSSE::multiply_float(mix_buffers[0], mix_buffer_size, left_volume / static_cast<float>(sessions.size()));
	SoundSSE::multiply_float(mix_buffers[1], mix_buffer_size, right_volume / static_cast<float>(sessions.size()));
}

dpjudas commented Sep 18, 2024

The issue here is complicated: if you just divide by the maximum number of sounds mixed, you end up with far too low a volume for what most people expect.

The better way of dealing with this is to lower the master volume on sounds to, say, 75% so they don't immediately use the entire dynamic range. Since it can still happen, there should be a gain limiter on the mixed result so it will never go beyond 100%.

Note that the sound is actually not distorted today, as the Windows mixer itself applies a gain limiter on the final output sent to the sound card. That is required because you can't know whether something else is also playing sound that will push it beyond the limit - e.g. your game + Spotify playing sound at the same time.
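A sketch of that approach (names are hypothetical, not clanSound API): scale the mix by a ~75% master gain for headroom, then limit the fragment so no sample leaves the -1.0..1.0 range.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>

// Hypothetical sketch: apply headroom, then a simple per-fragment
// gain limiter. 'mix' is one fragment of mixed mono samples.
void apply_headroom_and_limit(float* mix, std::size_t count,
                              float master_gain = 0.75f)
{
    float peak = 0.0f;
    for (std::size_t i = 0; i < count; ++i)
    {
        mix[i] *= master_gain;
        peak = std::max(peak, std::fabs(mix[i]));
    }
    // Gain limiter: if the fragment still peaks above 1.0,
    // scale the whole fragment down so the peak lands at 1.0.
    if (peak > 1.0f)
    {
        const float scale = 1.0f / peak;
        for (std::size_t i = 0; i < count; ++i)
            mix[i] *= scale;
    }
}
```

A per-fragment rescale like this is the crudest possible limiter (it can pump between fragments); a real implementation would smooth the gain over time.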

rombust commented Sep 20, 2024

Thanks, I had not thought of that. The solution isn't simple for my use case.

Asking ChatGPT, the alternative to averaging the sound (with its perceived volume reduction) is "RMS Normalization with Scaling". As always with ChatGPT it sounds convincing, but if it's right, that's what I would need to research.

#include <cmath>
#include <algorithm> // for std::min and std::max

float mixFourSamples(float sample1, float sample2, float sample3, float sample4) {
    // Step 1: Mix by averaging the four samples
    float mixed = (sample1 + sample2 + sample3 + sample4) / 4.0f;
    
    // Step 2: RMS Normalization (to preserve perceived loudness)
    float rms1 = sample1 * sample1;
    float rms2 = sample2 * sample2;
    float rms3 = sample3 * sample3;
    float rms4 = sample4 * sample4;
    
    // Calculate RMS of the mixed signal
    float rmsMixed = std::sqrt((rms1 + rms2 + rms3 + rms4) / 4.0f);
    
    // Avoid division by zero, and scale the mixed signal by its RMS
    if (rmsMixed > 0.0f) {
        mixed = mixed / rmsMixed;
    }

    // Step 3: Prevent clipping by limiting the mixed sample within the valid range (-1.0 to 1.0)
    mixed = std::min(1.0f, std::max(-1.0f, mixed));
    
    return mixed;
}

Oh, it also suggested "Dynamic Range Compression (Optional)"


float softLimit(float sample) {
    const float threshold = 0.9f;  // Define a threshold close to 1.0 to apply limiting
    if (fabs(sample) > threshold) {
        sample = threshold + (1.0f - threshold) * tanh(sample - threshold);
    }
    return sample;
}

float mixFourSamplesWithLimit(float sample1, float sample2, float sample3, float sample4) {
    // Step 1: Basic averaging mix
    float mixed = (sample1 + sample2 + sample3 + sample4) / 4.0f;
    
    // Step 2: Apply a soft limiter to avoid clipping
    mixed = softLimit(mixed);
    
    return mixed;
}
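One caveat with the soft limiter as quoted: for a negative sample beyond the threshold, `threshold + (1.0f - threshold) * tanh(...)` returns a positive value, so the sign is lost. A sign-symmetric variant (a sketch, not a drop-in for any particular mixer) limits the magnitude and restores the sign:

```cpp
#include <cmath>

// Sign-symmetric soft limiter: samples with |x| <= threshold pass
// through unchanged; larger magnitudes are squashed into the
// (threshold, 1.0) band via tanh, and the original sign is kept.
float softLimitSym(float sample)
{
    const float threshold = 0.9f;
    const float mag = std::fabs(sample);
    if (mag > threshold)
    {
        // tanh(x) < 1, so 'limited' stays strictly inside the band.
        const float limited = threshold
            + (1.0f - threshold) * std::tanh((mag - threshold) / (1.0f - threshold));
        return std::copysign(limited, sample);
    }
    return sample;
}
```

Scaling the tanh argument by `1.0f / (1.0f - threshold)` also makes the transfer curve continuous in slope at the threshold.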

dpjudas commented Sep 20, 2024

I'm afraid I don't know enough about the subject to really say if what ChatGPT is talking about makes any sense or not.

I thought that OpenAL used a simple limiter that looked for the maximum sample value encountered within a certain time period (the 1024 samples we are mixing in one fragment) and divided by that value if it was beyond 1.0. On closer inspection, though, it seems it is using this algorithm:

General topology and basic automation was based on the following paper:

D. Giannoulis, M. Massberg and J. D. Reiss,
"Parameter Automation in a Dynamic Range Compressor,"
Journal of the Audio Engineering Society, v61 (10), Oct. 2013

Available (along with supplemental reading) at:

http://c4dm.eecs.qmul.ac.uk/audioengineering/compressors/

Their implementation can be found in https://github.com/kcat/openal-soft/blob/master/core/mastering.h where the process function is what applies the limiter to a span of mixed samples.

I can't really say what the quality difference is between the simple limiter I described vs that paper vs whatever ChatGPT ranted about. I'm not that good at audio. :)

rombust commented Sep 20, 2024

Thanks for the info. I should re-learn the theory behind audio; it has been decades since I understood it. For example, if I play two sine waves, one with sin(theta) and the other sin(theta+pi), would they (or should they) cancel each other out? Or, if one wave is 2sin(theta) and the other is sin(theta), is the combined sound louder? Even looking back at how the Commodore Amiga handled its 4 channels in hardware, I don't know.
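As a quick numeric check of the sine-wave question (plain C++, not clanSound code): in a digital additive mix, sin(theta) and sin(theta+pi) cancel to silence, while 2sin(theta) + sin(theta) is simply 3sin(theta), i.e. the same wave at three times the amplitude.

```cpp
#include <cmath>

const double kPi = 3.14159265358979323846;

// A wave mixed with its phase-inverted copy:
// sin(theta) + sin(theta + pi) == 0 for all theta.
double mix_inverted(double theta)
{
    return std::sin(theta) + std::sin(theta + kPi);
}

// Two in-phase waves of different amplitude simply add:
// 2sin(theta) + sin(theta) == 3sin(theta).
double mix_in_phase(double theta)
{
    return 2.0 * std::sin(theta) + std::sin(theta);
}
```

This holds exactly for the mixed sample values; what listeners perceive as "louder" is a separate (roughly logarithmic) question, and acoustic cancellation in a room depends on speaker and listener positions.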
