Identifying pattern in Audio File #44

aqibmumtaz · 2021-04-13T08:11:33Z

aqibmumtaz
Apr 13, 2021

How to find patterns in an audio file:

There are two approaches:

1. Convert the audio file directly into an ndarray of sample values and apply the 1-D algorithms we have already created (line_patterns and line_PPs).
Each sample is stored as a number (two bytes).
Range of available combinations:
– 16 bits: 2^16 = 65,536
– We want both positive and negative values (to indicate compressions and rarefactions)
– We use one bit to indicate positive (0) or negative (1)
– That leaves us with 15 bits: 2^15 = 32,768
– One of those combinations stands for zero (we use a "positive" one, so there is one less pattern for positives and the range is from -32,768 to 32,767)
A sound has many values in it:
– Numbers that represent the sound at that time in the sample
• We can get an array of SoundSample objects
– SoundSample[] sampleArray = sound1.getSamples();
Once we have the array, we can run the line_patterns and line_PPs algorithms on it (what do you say?).

2. Compute the DFT of the audio signal with an FFT and store the result in an array; we can proceed on that array too.
OR
Going further from the DFT, apply windows and plot a spectrogram. (In a spectrogram, one axis represents time, the second axis represents frequency, and the colors represent the magnitude (amplitude) of the observed frequency at a particular time.) This converts the audio into an image like the ones we work with now, so we can then find patterns or lines of patterns as we do in 1-D line pattern and 2-D frame pattern identification (your thoughts?).
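To make the two-byte sample range above concrete, here is a minimal sketch of decoding raw 16-bit PCM bytes into signed integers. It assumes little-endian signed samples, as in a typical WAV file; `decode_pcm16` is a hypothetical helper name, not something from the repository. With a real file, the bytes could come from `readframes` in Python's standard `wave` module.

```python
import struct

def decode_pcm16(raw):
    """Interpret little-endian signed 16-bit PCM bytes as a list of integers."""
    count = len(raw) // 2  # two bytes per sample; a trailing odd byte is ignored
    return list(struct.unpack(f"<{count}h", raw[: count * 2]))

# Four hand-written samples (assumed data, not taken from any real recording):
# zero, one, the largest positive value and the most negative value.
raw = bytes([0x00, 0x00, 0x01, 0x00, 0xFF, 0x7F, 0x00, 0x80])
samples = decode_pcm16(raw)
print(samples)  # [0, 1, 32767, -32768]
```

The resulting list is the 1-D array that approach 1 would feed into the existing line algorithms.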

kwcckw · 2021-04-13T08:41:55Z

kwcckw
Apr 13, 2021

Interesting, so what would a pattern in an audio file look like?

For images, we may have low-level features such as lines or edges, then blobs and shapes, and with those we may achieve higher-level applications such as recognition or detection.

In this case, for audio, would the applications be things like background noise removal or auto-tuning? For visualization, what other options are there to 'see' the frequency response besides a spectrogram?

And would you mind sharing the code as well?

1 reply

aqibmumtaz Apr 16, 2021
Author

Images are 2-D matrices with pixels at coordinates such as (1,1) and (1,2). Audio signals are 1-D: at a given sampling rate, every time instant has an amplitude value, and that value is the low-level feature.

As far as I know, FFT (Fast Fourier Transform) is the method used to plot the frequency response of an audio signal. It is an algorithm for computing the DFT (Discrete Fourier Transform).

boris-kz · 2021-04-13T11:50:30Z

boris-kz
Apr 13, 2021
Maintainer

Not sure DFT is directly applicable here, let's do without it first. Maybe the alg will rediscover it.
As I mentioned before, time is the only dimension here, intensity and frequency are two parameters of input, not dimensions.
Those parameters are all positive, sign should only be in differences and deviations computed by cross-comp.
So, line_patterns will have to cross-compare two parameters in parallel, and the initial pattern-defining sign will be a deviation of intensity match + frequency match? But in this case, intensity does represent physical force applied to generate sound, so the match should be defined directly, as min(i1,i2)? This is not the case with frequency though, its match should be defined indirectly, as ave - abs(diff).
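The two match definitions mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the function names, the sample values and both thresholds (`ave`, `ave_m`) are hypothetical, and summing the two matches into one pattern-defining sign is just one reading of the question raised in the comment.

```python
def direct_match(i1, i2):
    """Direct match: the smaller of two intensities, min(i1, i2)."""
    return min(i1, i2)

def indirect_match(f1, f2, ave):
    """Indirect match: ave - abs(diff), positive only when the difference is below ave."""
    return ave - abs(f1 - f2)

# Assumed example inputs: two intensities and two frequencies of adjacent inputs.
intensity_match = direct_match(40, 55)               # 40
frequency_match = indirect_match(440, 452, ave=20)   # 20 - 12 = 8

# Hypothetical combined threshold for the pattern-defining sign.
ave_m = 30
sign = (intensity_match + frequency_match) > ave_m
print(intensity_match, frequency_match, sign)        # 40 8 True
```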

29 replies

aqibmumtaz Apr 16, 2021
Author

Yes, but that time-frequency overlap is with Ps, not primary inputs.
I guess we need direct FFT on primary input, then reverse FFT on each resulting frequency separately.
Then line_patterns will process in time domain on each frequency channel, forming Ps.
Then we align resulting Ps between channels in time, and compare time-overlapping Ps between frequencies, forming PPs.
But in that comp_P, I don't know how important proximity in the frequency domain is: do we compare with incremental difference between frequencies, or between all frequencies at once?

Alright
Ps will be found within each frequency channel in the time domain, and PPs will then be formed from overlapping Ps. In the frequency domain I think the comparison will be between all frequencies at once, because there we can't easily define a wavelength the way you define it in the time domain.

boris-kz Apr 16, 2021
Maintainer

Proximity in frequency domain still matters: the mechanism for generating sounds (such as vocal cords) is far more likely to drift or switch into neighboring frequencies. Also, voices of different people differ by a relatively small shift in frequency.
So, while comparison range will be greater in frequency domain, we still have to account for the distance.

boris-kz Apr 16, 2021
Maintainer

Yes, L + neg_L should be the T of the recurring frequency, so it is one of the most important parameters, with more weight.

Regarding the match: mostly we will form new match-related parameters from the parameters of both Ps. But what would be the reason to use indirect match instead of direct match? Is indirect match for derived parameters?

Indirect match is used for input variables where there is low correlation between magnitude and predictive value (inertia).
That's not the case with sound, amplitude is proportional to the effort needed to generate it.
But min is basically a co-detection, it doesn't depend on the difference. So, it's good for merging above-average patterns, but doesn't look like a similarity measure on its own.
We may need to use projected match: min - abs diff / 2. Need to think about it.
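A small sketch of why min alone is not sensitive to the difference, and how the projected match changes that. It assumes the formula reads as min - (abs(diff) / 2); the other possible reading, (min - abs(diff)) / 2, would give different numbers. The function name and inputs are hypothetical.

```python
def projected_match(i1, i2):
    """Projected match, read here as min(i1, i2) - abs(i1 - i2) / 2."""
    return min(i1, i2) - abs(i1 - i2) / 2

# Two pairs with the same min but very different differences.
same = (50, 50)
apart = (50, 90)

print(min(*same), min(*apart))                             # 50 50
print(projected_match(*same), projected_match(*apart))     # 50.0 30.0
```

Plain min scores both pairs equally, while the projected version lowers the score of the pair that differs more.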

aqibmumtaz Apr 17, 2021
Author

Proximity in frequency domain still matters: the mechanism for generating sounds (such as vocal cords) is far more likely to drift or switch into neighboring frequencies. Also, voices of different people differ by a relatively small shift in frequency.
So, while comparison range will be greater in frequency domain, we still have to account for the distance.

And how do we calculate the comparison range in the frequency domain? Just like range_comp in line_patterns?

boris-kz Apr 17, 2021
Maintainer

It's more like comp_slice_ in 2D alg, blob slice is just a more intuitive term for P here.
But at this point it only compares vertically adjacent Ps. So yes, like in range_comp we can use incremental range, but it will probably be incremental in the number of consecutive cross-compared frequencies, not in the difference in frequency. This would be selective per PP, defined by prior-range cross-comp.

boris-kz · 2021-04-13T21:00:11Z

boris-kz
Apr 13, 2021
Maintainer

I just checked for the first time, uncompressed audio is PCM, which doesn't represent frequency, only amplitude.
To find locally salient frequencies, we need either something like Fourier transform as a shortcut, or algorithm itself should discover them, which is more consistent. That can probably be done by a version of range_comp, except it will be storing individual matches, vs. summing them in a kernel. I was thinking about this version before, it makes sense for inputs that only match at a specific range. Then we compare these individual matches of the same input at different distances, to find the distance at which they frequently match. That distance is a wavelength. Not even sure this process is much different from FT?
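A rough sketch of the idea above: keep individual matches per comparison distance instead of summing them, then look for the distance at which inputs match most often. This is not the actual range_comp. The function names, the test signal and the threshold `ave` are assumptions, and the match is taken as ave - abs(diff) purely for illustration.

```python
import math

def matches_by_distance(signal, max_distance, ave):
    """Store individual matches for each comparison distance, without summing them."""
    stored = {}
    for d in range(1, max_distance + 1):
        stored[d] = [ave - abs(signal[i] - signal[i + d])
                     for i in range(len(signal) - d)]
    return stored

def most_frequent_match_distance(stored):
    """Return the shortest distance with the highest fraction of positive matches."""
    def positive_fraction(d):
        matches = stored[d]
        return sum(m > 0 for m in matches) / len(matches)
    return max(sorted(stored), key=positive_fraction)

# Assumed input: an all-positive wave that repeats every 8 samples.
signal = [100 + 50 * math.sin(2 * math.pi * i / 8) for i in range(64)]
stored = matches_by_distance(signal, max_distance=12, ave=10)
wavelength = most_frequent_match_distance(stored)
print(wavelength)  # 8
```

On this clean periodic input every comparison at distance 8 matches, so that distance comes out as the wavelength. Real audio would need a tolerance for drift and noise.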

3 replies

kwcckw Apr 14, 2021

I just checked for the first time, uncompressed audio is PCM, which doesn't represent frequency, only amplitude.
To find locally salient frequencies, we need either something like Fourier transform as a shortcut, or algorithm itself should discover them, which is more consistent. That can probably be done by a version of range_comp, except it will be storing individual matches, vs. summing them in a kernel. I was thinking about this version before, it makes sense for inputs that only match at a specific range. Then we compare these individual matches of the same input at different distances, to find the distance at which they frequently match. That distance is a wavelength. Not even sure this process is much different from FT?

I'm not really familiar with the Fourier transform, but this looks partially similar, since it would change the way we represent the data for the purpose of finding patterns within it.

boris-kz Apr 14, 2021
Maintainer

Actually, that frequency discovery won't be by range_comp in line_patterns, it will be done in line_PPs. That's because it's much easier to justify incremental-distance cross-comp of wave crests, vs. individual pixels. Match in sound is defined directly: as min intensity, thus positive patterns are spans of above-average intensity, which will be around those wave crests. This cross-comp P will also be done on negative Ps (spans between crests) and complemented +P,-P. That complemented span is a wavelength, the span over which it cross-matches is a PP: the span of recurring frequency we are looking for.
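As a simplified sketch of the spans described above: positive Ps are runs of above-average intensity, negative Ps are the runs between them, and a complemented +P,-P pair has span L + neg_L. The names below mimic the thread's terms but the code is hypothetical and much simpler than line_patterns; the signal and `ave` are made up, and intensity is thresholded directly instead of going through cross-comp.

```python
from itertools import groupby

def form_P_(intensities, ave):
    """Split a 1-D input into Ps: (sign, L) runs of above- or below-average intensity."""
    return [(sign, len(list(run)))
            for sign, run in groupby(intensities, key=lambda i: i > ave)]

def complemented_spans(P_):
    """Pair each positive P with the negative P that follows it; return L + neg_L."""
    spans = []
    for (sign, L), (next_sign, neg_L) in zip(P_, P_[1:]):
        if sign and not next_sign:
            spans.append(L + neg_L)
    return spans

# Assumed input: crests 3 samples long, separated by 5 low samples, repeated 3 times.
signal = [9, 9, 9, 1, 1, 1, 1, 1] * 3
P_ = form_P_(signal, ave=5)
spans = complemented_spans(P_)
print(P_)     # [(True, 3), (False, 5), (True, 3), (False, 5), (True, 3), (False, 5)]
print(spans)  # [8, 8, 8]
```

Each complemented span here is 8 samples, which is the wavelength of the repeated input.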

kwcckw Apr 15, 2021

Actually, that frequency discovery won't be by range_comp in line_patterns, it will be done in line_PPs. That's because it's much easier to justify incremental-distance cross-comp of wave crests, vs. individual pixels.

Yes, this is more reasonable since it's similar to images: we don't perform recognition/detection on pixels but on features instead. So in this case we are looking for a similar recurring pattern within the frequency using detected patterns, rather than single data points. So incremental-distance cross-comp of wave crests should work better than cross-comp of individual data points.

Match in sound is defined directly: as min intensity, thus positive patterns are spans of above-average intensity, which will be around those wave crests. This cross-comp P will also be done on negative Ps (spans between crests) and complemented +P,-P. That complemented span is a wavelength, the span over which it cross-matches is a PP: the span of recurring frequency we are looking for.

From my understanding this should be the case, but I'm not so sure about the PP. Is that correct, or should a PP consist of all +P and -P pairs?

boris-kz · 2021-04-15T02:02:26Z

boris-kz
Apr 15, 2021
Maintainer

PP is a span of matching Ps, as P is a span of matching pixels. That match is computed for each P parameter, including L and neg_L.
Matching L + neg_L means PP represents a span of recurring frequency.
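A minimal sketch of grouping matching spans into PPs, looking only at the L + neg_L parameter. It is not the real line_PPs: `form_PP_` here is a hypothetical stand-in, the span values and `ave` are invented, and the match between consecutive spans is assumed to be the indirect one, ave - abs(diff).

```python
def form_PP_(spans, ave):
    """Group consecutive spans whose match (ave - abs(diff)) is positive into PPs."""
    if not spans:
        return []
    PP_, current = [], [spans[0]]
    for prev, nxt in zip(spans, spans[1:]):
        match = ave - abs(nxt - prev)
        if match > 0:
            current.append(nxt)       # the span continues the current PP
        else:
            if len(current) > 1:      # keep only groups of at least two matching spans
                PP_.append(current)
            current = [nxt]
    if len(current) > 1:
        PP_.append(current)
    return PP_

# Assumed L + neg_L values: a steady wavelength of 8, a disturbance, then 8 again.
spans = [8, 8, 8, 8, 3, 5, 8, 8]
PP_ = form_PP_(spans, ave=1)
print(PP_)  # [[8, 8, 8, 8], [8, 8]]
```

Each resulting group covers a stretch where the wavelength recurs.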

4 replies

kwcckw Apr 15, 2021

Ah, I see. And in the current case, other than L, what would the other potential matching parameters be? Amplitude (mA)? Time (mT)? And right now all of these apply to the time domain only, right?

boris-kz Apr 15, 2021
Maintainer

Basically, this should be line_PPs as we have it now, I see no reason to modify it yet. Again, I want to keep the same terms across images and audio as much as possible, so it's still intensity, match, and L as we have in code. There is no need for mT, match of coordinate is just inverse distance, which is expressed by L + neg_L. But we need to add this distance to comp_P, and cross-comp of distance (to find recurring wavelength) will probably be an incremental-derivation version of comp_P. It will be similar to deriv__comp in line_patterns, but for distances vs. differences, and inside positive PPm only.

boris-kz Apr 15, 2021
Maintainer

So, the process in FFT is very different, it's based on integration vs. comparison. There should be salient frequency selection (>ave) after integration too, but I haven't seen it yet? This process is specialized vs. our general one, so it's obviously a lot faster. We need to explain how line_PPs can find salient frequencies, but use FFT to actually do that. Then run time-domain line_patterns within each frequency. Next is probably a version of comp_slice_ (without segment by direction), comparing time-overlapping Ps across adjacent frequencies, so it will be 2D. What do you think @aqibmumtaz?
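One way to picture the ">ave" selection after integration is the sketch below. It uses a plain O(n^2) DFT from the standard library as a stand-in for a real FFT, and takes "ave" to be the mean magnitude over all bins, which is only an assumption. The function names and the two-tone test signal are hypothetical.

```python
import cmath
import math

def dft_magnitudes(signal):
    """Magnitudes of DFT bins 0..n/2 of a real signal (naive DFT, not a fast FFT)."""
    n = len(signal)
    return [abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(signal)))
            for k in range(n // 2 + 1)]

def salient_bins(magnitudes):
    """Select the bins whose magnitude is above the average magnitude."""
    ave = sum(magnitudes) / len(magnitudes)
    return [k for k, m in enumerate(magnitudes) if m > ave]

# Assumed input: two tones, 3 and 7 cycles per 32-sample frame.
n = 32
signal = [math.sin(2 * math.pi * 3 * t / n) + 0.5 * math.sin(2 * math.pi * 7 * t / n)
          for t in range(n)]
salient = salient_bins(dft_magnitudes(signal))
print(salient)  # [3, 7]
```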

aqibmumtaz Apr 19, 2021
Author

What I have understood is that we have to go stepwise, like this:

Apply FFT to the primary input (that is, directly to the audio signal), then apply IFFT to each frequency separately, and then calculate Ps for each frequency channel in the time domain with the 1-D line_patterns algorithm.
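The steps above could be sketched like this, with the usual caveats: a naive DFT and inverse DFT from the standard library stand in for FFT and IFFT, `frequency_channel` is a hypothetical helper, and the two-tone input is invented. Each returned channel is a time-domain signal that the 1-D algorithm would then process separately.

```python
import cmath
import math

def dft(signal):
    """Naive forward DFT of a real or complex sequence."""
    n = len(signal)
    return [sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(signal))
            for k in range(n)]

def idft(spectrum):
    """Naive inverse DFT, returning the real part of each time sample."""
    n = len(spectrum)
    return [sum(c * cmath.exp(2j * cmath.pi * k * t / n)
                for k, c in enumerate(spectrum)).real / n
            for t in range(n)]

def frequency_channel(spectrum, k):
    """Keep only bin k and its mirror bin n - k, then go back to the time domain."""
    n = len(spectrum)
    kept = [c if i in (k, (n - k) % n) else 0 for i, c in enumerate(spectrum)]
    return idft(kept)

# Assumed input: a mix of two tones, 3 and 7 cycles per 32-sample frame.
n = 32
low = [math.sin(2 * math.pi * 3 * t / n) for t in range(n)]
high = [0.5 * math.sin(2 * math.pi * 7 * t / n) for t in range(n)]
mixed = [a + b for a, b in zip(low, high)]

spectrum = dft(mixed)
channel_3 = frequency_channel(spectrum, 3)  # time-domain signal of the low tone
channel_7 = frequency_channel(spectrum, 7)  # time-domain signal of the high tone
```

Keeping the mirror bin as well is what makes each channel come back as a real-valued signal.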

aqibmumtaz · 2021-04-16T07:37:48Z

aqibmumtaz
Apr 16, 2021
Author

And I am assuming all of our discussion is about mono audio files. Am I right, @boris-kz?
For mono, the extracted data array would be 1-D, and for stereo we would have a 2-D data array?
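For reference, a tiny sketch of the mono vs. stereo point. It assumes the common PCM layout where stereo samples are interleaved (left, right, left, right, ...); `split_channels` and the sample values are hypothetical.

```python
def split_channels(frames, channel_count):
    """De-interleave PCM samples into one 1-D list per channel."""
    return [frames[c::channel_count] for c in range(channel_count)]

# Assumed interleaved stereo data: left samples positive, right samples negative.
interleaved = [1, -1, 2, -2, 3, -3]
left, right = split_channels(interleaved, 2)
print(left, right)  # [1, 2, 3] [-1, -2, -3]
```

Mono data is already a single 1-D list, while stereo yields two such lists, one per channel.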

3 replies

boris-kz Apr 16, 2021
Maintainer

Yes, mono. The 2D I mentioned is frequency.

aqibmumtaz Apr 16, 2021
Author

Is frequency itself 2-D, or is it the time-frequency overlapped signal that will be 2-D?
I didn't get the point about 2-D frequency.

boris-kz Apr 16, 2021
Maintainer

Sorry, I meant that frequency is the 2nd dimension, time is the 1st one.
