[FEATURE] Offset estimation #491

mjcourte · 2024-10-12T02:23:50Z


Is your feature request related to a problem? Please describe.

Pursuant to conversations in #391, I'm sharing my offset estimation that seems to work in some cases where null/auto-estimate offset fails.

Describe the solution you'd like

I'm just sharing a very rudimentary demonstration of offset calculation. It's not SOTA, it's not ML/AI. It's 0.1 CPU seconds of goofy scipy.signal and numpy.

@sdatkinson you mention you have some tricky examples, please point me at them. I have no guarantee that this is better than what you are doing but I'm willing to spend a few GPU-minutes on testing.

Describe alternatives you've considered

N/A for now(?).

Additional context

MWE:

The final offset, in samples, that I use for training is the negative of the delay printed in the title of ax[1]. In this example I would punch in -856, which gave ESR = 0.01433 for a high-gain, no-cab model.

import matplotlib.pyplot as plt
import numpy as np
from scipy.io import wavfile
from scipy.signal import find_peaks
from pathlib import Path

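def peak_region_search(signan, peak_index, window):
    """Return the index of the first non-NaN sample in the `window` samples before `peak_index`."""
    # Assumes 0 < window <= peak_index; otherwise the slice below is empty or wraps around.
    region = np.logical_not(np.isnan(signan[peak_index-window:peak_index]))
    vals = np.flatnonzero(region)
    # vals[0] is relative to the start of the slice, so shift it back to an absolute index
    retval = vals[0] + peak_index - window
    return retval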

ref_wav_path = Path.cwd().joinpath('wavs', 'v1_1_1.wav')
ref_wav = wavfile.read(ref_wav_path)
ref_signal = ref_wav[-1]
ref_peaks, _ = find_peaks(ref_signal[:int(1e5)])

wav_path = Path.cwd().joinpath('wavs', 'Dokken_BFTA_Plexi.wav')
wav = wavfile.read(wav_path)
signal = wav[-1]
peaks, _ = find_peaks(signal[:int(1e5)],
                      height=0.1,
                      distance=22000)

delay = np.mean(peaks[:2] - ref_peaks[:2], dtype=int)
# P.S.: it appears this delay value, but negative, is possibly all that's needed

# P.S.: signan
# is a copy of the reamp signal, but with small values set to NaN
# to reduce plotting density
signan = signal.copy().astype(float)
signan_ext = np.max(np.abs(signan))
signan[np.abs(signan) < 1e-4 * signan_ext] = np.nan

estimated_left_peak = peak_region_search(signan, peaks[0], delay)
estimated_right_peak = peak_region_search(signan, peaks[1], delay)

est_corr_delay = np.mean(np.array([estimated_left_peak, estimated_right_peak]) - ref_peaks[:2], dtype=int)


fig, ax = plt.subplots(2,1, sharex=True, figsize=[8,8])

ax[0].plot(ref_signal, label='Reference v1_1_1.wav')
ax[0].scatter(ref_peaks[:2], ref_signal[ref_peaks[:2]], color='red')
ax[0].text(ref_peaks[0], ref_signal[ref_peaks[0]], f"{ref_peaks[0]}")
ax[0].text(ref_peaks[1], ref_signal[ref_peaks[1]], f"{ref_peaks[1]}")
ax[0].set_xlabel('Samples')
ax[0].grid()
ax[0].set_title('Reference v1_1_1.wav')


# green markers: first non-NaN sample of signan in the window before each blip peak

ax[1].plot(signan, label=wav_path.parts[-1])
ax[1].scatter(peaks[:2], signal[peaks[:2]], color='red', label='blip peaks')
ax[1].text(peaks[0], signal[peaks[0]], f"{peaks[0]}")
ax[1].text(peaks[1], signal[peaks[1]], f"{peaks[1]}")
ax[1].scatter([estimated_left_peak, estimated_right_peak], 
              signal[[estimated_left_peak, estimated_right_peak]], 
              color='green', 
              label='leading non-zero'
              )
#ax[1].scatter(estimated_right_peak, signal[estimated_right_peak], color='green')
ax[1].text(estimated_left_peak, 0, f"{estimated_left_peak}", ha='right', va='top')
ax[1].text(estimated_right_peak, 0, f"{estimated_right_peak}", ha='right', va='top')
ax[1].set_xlabel('Samples')
ax[1].grid()
ax[1].legend()
#EDIT
#ax[1].set_ylim(-0.5*signan_ext, 0.5*signan_ext)
ax[1].set_ylim(ax[0].get_ylim())
#END EDIT
ax[1].set_xlim(1e4, 4e4)
ax[1].set_title('Input ' + wav_path.parts[-1] + f', delay: {delay} ({est_corr_delay}) spls' )
fig.tight_layout()
fig.show()
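To sanity-check the peak-versus-leading-edge idea without the wav files, here is a minimal sketch on synthetic data. Everything in it is made up for illustration: the blip positions, the pulse shape, the 50-sample delay, and the helper name `first_onset` (a thresholded variant of `peak_region_search`). It also locates peaks with a plain `np.argmax` per half of the signal instead of `find_peaks`, purely to keep it numpy-only.

```python
import numpy as np

def first_onset(signal, peak_index, window, rel_threshold=1e-4):
    """First index in the `window` samples up to `peak_index` whose magnitude
    reaches rel_threshold * max|signal| (hypothetical helper, not from NAM)."""
    start = max(peak_index - window, 0)
    floor = rel_threshold * np.max(np.abs(signal))
    above = np.flatnonzero(np.abs(signal[start:peak_index + 1]) >= floor)
    return start + int(above[0])

# Synthetic stand-ins: two reference blips and a delayed, smeared reamp.
ref_blips = np.array([1000, 3000])                 # assumed blip positions
true_delay = 50                                    # assumed latency in samples
pulse = np.array([0.2, 0.6, 1.0, 0.5, 0.1])        # rises for 2 samples before peaking
reamp = np.zeros(5000)
for p in ref_blips:
    reamp[p + true_delay : p + true_delay + pulse.size] = pulse

# One peak per half of the signal (simplification replacing find_peaks).
half = reamp.size // 2
peaks = np.array([np.argmax(reamp[:half]), half + np.argmax(reamp[half:])])

peak_delay = int(np.mean(peaks - ref_blips))
onsets = np.array([first_onset(reamp, p, window=200) for p in peaks])
onset_delay = int(np.mean(onsets - ref_blips))
offset = -onset_delay                              # value to enter for training

print(peak_delay, onset_delay, offset)
```

In this toy case the peak-based delay lands two samples later than the leading-edge delay, because the pulse takes two samples to reach its maximum. Real reamps will smear far more than that.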

sdatkinson · 2024-10-18T22:38:10Z

Here's some cases you can test your method on: Latency calibration challenge cases | Drive.

Let me know how it goes!

mjcourte · 2024-10-20T02:31:29Z

A quick first stab yields the following:

All runs used the standard model, 100 epochs.

Reamp File	Offset	ESR
1	-221	0.01740
2	0	0.00686
3	-28	0.01096
4	0	0.004833
5	-47	0.5142

output_2 and output_4 had < 0.01 ESR, so let's call them done
output_5 was a total failure here. I mistakenly allowed acausal search that first yielded +77 samples lol.
output_5 trips your data quality checker, which aborts before even checking for an offset value or estimating one. Therefore all output_5 training for ESR was done with the quality check disabled.
output_1 and output_3 have potential room for improvement, but we'll see.
1-4 were all totally suitable sounding to me in NAM.
What does the cab modelling switch even do? I ran with it on before checking the models were amp-only, but the models were still acceptable.
What I did find is that for many-epoch training, having overestimated the offset (too large negative number), within reason, didn't seem to adversely affect the ESR. Looking at the traces below for output 1, should one map peak-to-peak? Well, everything on the left shoulder of the peak is then acausal, so peak finding is too late, in a sense. But it still worked and sounded appropriate.
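For anyone reading the ESR columns: a minimal sketch of the metric, assuming the usual error-to-signal ratio definition (squared error energy divided by target energy). NAM's own implementation may differ in details such as averaging, so treat `esr` below as an illustrative helper, not the trainer's code.

```python
import numpy as np

def esr(target, prediction):
    """Error-to-signal ratio: sum((y - y_hat)^2) / sum(y^2). Assumed definition."""
    target = np.asarray(target, dtype=float)
    prediction = np.asarray(prediction, dtype=float)
    return float(np.sum((target - prediction) ** 2) / np.sum(target ** 2))

# Toy signal: 0.1 s of a 440 Hz sine at an assumed 48 kHz sample rate.
t = np.arange(4800) / 48_000
target = np.sin(2 * np.pi * 440 * t)

perfect = esr(target, target)                  # identical signals
silent = esr(target, np.zeros_like(target))    # predicting silence
shifted = esr(target[5:], target[:-5])         # same signal, 5 samples misaligned

print(perfect, silent, shifted)
```

A perfect prediction scores 0 and predicting silence scores 1, which gives some scale for the numbers in the tables. The `shifted` case only shows that an uncompensated misalignment on raw signals costs ESR; it says nothing about what a trained model can absorb.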

Using a brute-ish force search for ground truth

Reamp File	Offset	ESR
1	-225	0.01858
5	-85	0.07148

output_1, really not worth the time spent, would be better off training more epochs on the detected offset.
didn't bother with output_3 as no improvement in 1

output_5

I went pretty hard here, searching (in a manner not suitable for a pre-train operation)
- -200 to +100 in increments of 25 for 20 epochs
- -110 to -50 in increments of 10 for 40 epochs
- -90 to -70 in increments of 4 for 50 epochs
I'm vaguely settled on -85 being a reasonable "ground truth" for this signal
No further linalg or DSP trickery could lead me to that answer from only v3_0_0 and output_5 wavs and two beers worth of code.
Inspecting the file, both signal and SNR are low, and there looks to be blocking distortion, so this must be something like a fuzz pedal, possibly with a noise gate or expander. There's also some weird acausal stuff going on, and I'm not sure how that would happen in analog audio unless someone has digitally munged this wav.
This comports with my own trouble cases of cranked (read: dimed Plexi) Marshall-type sounds with lots of power amp distortion and noise with some blocking/saturation happening, and you get the post-signal-saturation noise recovery hiss ramp. Even with a reasonable offset, WaveNet doesn't do a good job of sorting out that mess.
My tl;dr here is that unless you have some serious reason to want to be able to model this output_5, I'm happy to agree with your data quality check and say that it's not modellable.
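The staged search for output_5 can be sketched as a fixed coarse-to-fine grid. This is only an outline under stated assumptions: `coarse_to_fine` and `toy_score` are hypothetical names, and `toy_score` is a made-up bowl with its minimum at -85 standing in for "train for N epochs at this offset and return the ESR". No real training happens here.

```python
def coarse_to_fine(score, stages):
    """Evaluate score(offset, epochs) over each (lo, hi, step, epochs) stage
    and return the (offset, score) pair with the lowest score seen."""
    best = None
    for lo, hi, step, epochs in stages:
        for offset in range(lo, hi + 1, step):
            value = score(offset, epochs)
            if best is None or value < best[1]:
                best = (offset, value)
    return best

def toy_score(offset, epochs):
    # Made-up stand-in for a training run; ignores `epochs` entirely.
    return 0.07 + 1e-4 * (offset + 85) ** 2

# The three ranges from the list above: (lo, hi, step, epochs).
stages = [(-200, 100, 25, 20), (-110, -50, 10, 40), (-90, -70, 4, 50)]
best_offset, best_esr = coarse_to_fine(toy_score, stages)
print(best_offset, best_esr)
```

With a real trainer, ESR values from stages with different epoch counts are not directly comparable, so it may be fairer to pick the best offset within the final stage only. Note also that the last grid steps by 4 and never lands on -85 itself, which is consistent with that value being only a rough "ground truth".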

rossbalch · 2024-10-30T02:08:59Z

I've added a short 100 Hz sine to my recording that doesn't get exported, but which I use in case of tricky alignment. Would that kind of thing be of any help in this case?
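One way a marker tone like this could be used is cross-correlation against the known burst. The sketch below is an assumption-heavy illustration, not something tested on real reamps: the 48 kHz sample rate, the 50 ms burst length, the tone position, and the 137-sample delay are all invented, and the "recording" is just a scaled, delayed copy with no distortion or noise.

```python
import numpy as np

sample_rate = 48_000                       # assumed
tone_hz = 100.0
n_tone = int(0.05 * sample_rate)           # 50 ms burst = 5 full cycles
t = np.arange(n_tone) / sample_rate
tone = np.sin(2 * np.pi * tone_hz * t)

tone_start = 1000                          # where the burst sits in the input (assumed)
true_delay = 137                           # made-up latency in samples
recorded = np.zeros(6000)
begin = tone_start + true_delay
recorded[begin : begin + n_tone] = 0.5 * tone   # attenuated, delayed copy

def locate(template, recording):
    """Index in `recording` where `template` aligns best (cross-correlation peak)."""
    corr = np.correlate(recording, template, mode="full")
    return int(np.argmax(corr)) - (template.size - 1)

estimated_delay = locate(tone, recorded) - tone_start
print(estimated_delay)
```

A finite burst is used rather than a continuous sine so that the correlation has a single highest peak instead of one per period. A 100 Hz tone changes very slowly from sample to sample, so that peak is quite flat, and heavy distortion or noise could plausibly move it by a few samples.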

mjcourte added the enhancement, priority:low, and unread labels on Oct 12, 2024

mjcourte mentioned this issue on Oct 12, 2024 in [DOCUMENTATION] How to set the latency compensation manually #391 (Open)

sdatkinson removed the unread label on Oct 18, 2024
