Waveform attributes #745

ahiguera-mx · 2023-08-04T19:10:52Z

What is the problem / what does the code in this PR do
this function computes quantiles and can downsample waveforms, either for records, peaklets, peak
Can you briefly describe how it works?
Compute waveform attribures
Quantiles: represent the amount of time elapsed for a given fraction of the total waveform area to be observed in n_samples i.e. n_samples = 10, then quantiles are equivalent deciles
Can you give a minimal working example (or illustrate with a figure)?

Please include the following if applicable:

Update the docstring(s)
Update the documentation
Tests to check the (new) code is working as desired.
Does it solve one of the open issues on github?

Please make sure that all automated tests have passed before asking for a review (you can save the PR as a draft otherwise).

…eform attributes like quantiles and downsampled waveform

github-actions · 2023-08-04T19:19:56Z

coverage: 91.571% (+0.03%) from 91.546% when pulling 85ab06d on waveform_attributes into 9b508f7 on master.

WenzDaniel · 2023-08-07T09:26:16Z

strax/processing/peak_properties.py

@@ -100,3 +100,52 @@ def compute_widths(peaks, select_peaks_indices=None):
    i = len(desired_fr) // 2
    peaks['width'][select_peaks_indices] = fr_times[:, i:] - fr_times[:, ::-1][:, i:]
    peaks['area_decile_from_midpoint'][select_peaks_indices] = fr_times[:, ::2] - fr_times[:, i].reshape(-1,1)
+
+@export
+#@numba.njit(nopython=True, cache=True)


Does this work or why is it commented out? In addition, is not this the same as this function is already doing:

strax/strax/processing/peak_properties.py

Line 28 in 9b508f7

def compute_index_of_fraction(peak, fractions_desired, result):

Hi Daniel
Thanks for checking at this, the functions calculate different things
compute_index_of_fraction()return the index at which certain area has been achieved if the fraction is 1 it will return peak['length'] so this is in units of samples.

compute_wf_attributes() for a given set of quantiles (fractions) it return the the amount of time elapsed for a given fraction of the total waveform area to be observed

So in other words, compute_index_of_fraction() is in units of samples and compute_wf_attributes() is in units of ns

also compute_wf_attributes() can handle an array of peaks whereas compute_index_of_fraction() can do only peak by peak
Do you think it would better to merge into one function?

Looking at these codes, compute_index_of_fraction can also return floats. I think it is better to utilize compute_index_of_fraction or modify it.

strax/strax/processing/peak_properties.py

Line 48 in 9b508f7

result[current_fraction_index] = i + area_needed / x

LuisSanchez25 · 2023-09-20T20:04:18Z

@jmosbacher This is the strax pull request we need for the other pluggin, if you could help us with this that would be great, I will work on updating the private_repository and the straxen pull request in the mean time.

jmosbacher · 2023-09-20T20:30:57Z

@LuisSanchez25 it doesnt look like you have addressed the comments from @dachengx and @WenzDaniel which seem valid.
From glancing at the code, to me it also looks like the following holds

compute_wf_attributes(...) == compute_index_of_fraction(...) * dt

So I think you need to demonstrate that this equality does not hold or give some other justification for adding this extra method before this can be merged.

LuisSanchez25 · 2023-09-20T20:41:59Z

Yes, I have not addressed this yet, Aaron and I discussed this a bit the other day so I was hoping we could talk to you about it at some point maybe tomorrow? I need a bit of time to check a few things here since I was not the main person working on this PR but wanted to make you aware of which PR we were talking about.

jmosbacher · 2023-09-20T21:06:05Z

strax/processing/peak_properties.py

+        cumsum_steps[-1] = sample_number_div_dt[-1] * dt
+        quantiles[i] = cumsum_steps[1:] - cumsum_steps[:-1]
+
+        if downsample_wf:


this whole downsampling makes no sense to me.

If i pass downsample_wf=False to this function it will return an array of zeros as waveforms. in what case would that be useful?

Why wouldnt you infer downsampling==True if num_samples > n_samples?

What if (num_samples / n_samples) is not an interger? wouldnt the last bin sum over a different number of samples? so why is waveforms[i] /= (step_size * dt) correct?

This is because the bayes pluggin needs the waveform but the SOM plugin does not. So if we dont need it we can save computational time by not calculating this, but if you know of a better way to achive this goal I am more than happy to change this.

As to your last point I do not really understand what you are saying, can you elaborate a bit more?

Lets take a simple example.

num_samples = 10 n_samples = 6 step_size = int(num_samples / n_samples) steps = np.arange(0, num_samples + 1, step_size) for j in range(n_samples): print(steps[j], ":", steps[j + 1]) 0 : 1 1 : 2 2 : 3 3 : 4 4 : 5 5 : 6

Does this look like correct downsampling to you?

Ok I think I know what you are saying now. So this is dealing with waveforms so num_samples will always be 200, but there are no safeguards against non integer divisions of the data. We can make an assert statement to make sure num_samples is divisible by n_samples. I will make that change.

The waveforms[i] /= (step_size * dt) I am now a bit confused about, the objective is to get rid of the dt dependance so we can compare all wavefroms to each other, but I did some quick dimensional analysis and I think we should be multiplying by dt instead of dividing since waveform amplitude is in units of PE/sample = PE/(dt) so we should be multiplying the numerator by dt. @ahiguera-mx since you worked on this more with the Bayes classification is there something we are missing?

Like I explained when you originally asked me where to place this function. Strax is an open source framework built to be used by experiments other than XENON. Function in strax cannot assume the data is exactly as it is in xenon.

The required bar in terms of code quality to have a PR merged is also higher.

Allocating a massive array of zeros just to be used sometimes is not a good approach. You should downsample ina different function. There are many standard techniques for downsampling, such as windowed reductions. I would stick to one of those.

Ok we dont need the downsampling for now so we can just remove that funcitonality, and make an assert statement to make sure data quantiles make sense, but the big question still is, given the difference between the functions, do we still want to try to merge it into one or should I leave them as separate things?

I would try to either:

use the existing function in the plugin or adapt it to be able to produce what you need

Provide some actual concrete example that demonstrates the need for the extra function. I find the hand wavy explanations very difficult to follow.

LuisSanchez25 · 2023-09-20T21:32:11Z

@LuisSanchez25 it doesnt look like you have addressed the comments from @dachengx and @WenzDaniel which seem valid. From glancing at the code, to me it also looks like the following holds
compute_wf_attributes(...) == compute_index_of_fraction(...) * dt
So I think you need to demonstrate that this equality does not hold or give some other justification for adding this extra method before this can be merged.

So these two functions are not the same, compute_index_of_fraction will give back an index for between 0 and 10% or 0 and 30% of the waveforms, the compute_wf_attributes function computes the index between deciles, (so 0-10%, 10-20%, ect...) so multiplying it by the dt wont cut it as we need to subtract subsequent values. We can still use the compute_index_of_fraction but the other difference is that the compute_index_of_fraction computes 1 waveform attribute at a time while the compute_wf_attributes does everything in one go. This is a functionality we could move to straxen when the plugin is called but we think it makes more sense to leave it here as this would be used by 2 plugins, the Bayes plugin and the SOM plugin

LuisSanchez25 · 2023-09-21T16:12:29Z

On a completely separate note, is there a reason there are no tests for the function compute_index_of_fraction in test_peak_properties?

WenzDaniel

I am sorry, I cannot reproduce what the function is supposed to do. At least my result is different from the expectation I get from the doc-string. Can you provide a full example with a full description of what I should expect to get by the function?

WenzDaniel · 2023-11-23T09:15:39Z

Okay after some trial and error I think I figured out how to use the function, but the results looks off to me. If I run:

res, _ = strax.compute_wf_attributes(peaklets[:2]['data'], peaklets[:2]['dt'], 10)

I get the following result:

where the blue lines are the cumulative of your returned deciles. The last one is pretty far off and even goes beyond the true peak.

I can achieve the very same with:

res2 = strax.index_of_fraction(peaklets[:2], np.arange(10.1)/10)
res2 = (res2.T * peaklets[:2]['dt']).T
res2_compute_diff = np.diff(res2, axis=1)

which is 3 lines of code. The blue lines are the deciles 0 -> 0.1, 0 -> 0.2 etc. as computed by index_of_fraction. The red lines are the cumulative of the difference of the deciles, so the same as your function returns.

The result is also a bit off compared to the deciles themselves, but much closer than your function. So I honestly cannot follow when you are saying that we cannot use this already implemented function.

In addition, the function is tested once here:

strax/tests/test_peak_properties.py

Line 46 in 21cc96e

def test_index_of_fraction(peak_length, data_length):

and once indirectly here:

strax/tests/test_peak_properties.py

Line 66 in 21cc96e

def test_compute_widths(peak_length, data_length, n_widths):

WenzDaniel · 2023-12-12T08:53:12Z

We should make a new PR with a revised waveform attributes function.

ahiguera-mx added 3 commits August 3, 2023 18:31

Adding compute_wf_attributes funciton, this funciton will compute wav…

cc962f2

…eform attributes like quantiles and downsampled waveform

working on test

cea7435

fixing testing

094dac8

dachengx self-requested a review August 4, 2023 19:13

ahiguera-mx added 2 commits August 4, 2023 14:42

solve codefactor issue

3f87c4c

fix issue with assert statment

85ab06d

WenzDaniel reviewed Aug 10, 2023

View reviewed changes

uncomment numba decorator

c7a265d

jmosbacher reviewed Sep 20, 2023

View reviewed changes

WenzDaniel requested changes Nov 23, 2023

View reviewed changes

WenzDaniel closed this Dec 12, 2023

WenzDaniel deleted the waveform_attributes branch December 12, 2023 14:10

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Waveform attributes #745

Waveform attributes #745

ahiguera-mx commented Aug 4, 2023

github-actions bot commented Aug 4, 2023 •

edited

Loading

WenzDaniel Aug 7, 2023

ahiguera-mx Aug 10, 2023 •

edited

Loading

ahiguera-mx Aug 10, 2023

dachengx Aug 28, 2023

LuisSanchez25 commented Sep 20, 2023

jmosbacher commented Sep 20, 2023

LuisSanchez25 commented Sep 20, 2023

jmosbacher Sep 20, 2023

LuisSanchez25 Sep 20, 2023

jmosbacher Sep 20, 2023

LuisSanchez25 Sep 21, 2023

jmosbacher Sep 21, 2023

LuisSanchez25 Sep 21, 2023

jmosbacher Sep 21, 2023

LuisSanchez25 commented Sep 20, 2023 •

edited

Loading

LuisSanchez25 commented Sep 21, 2023

WenzDaniel left a comment

WenzDaniel commented Nov 23, 2023 •

edited

Loading

WenzDaniel commented Dec 12, 2023

Waveform attributes #745

Waveform attributes #745

Conversation

ahiguera-mx commented Aug 4, 2023

github-actions bot commented Aug 4, 2023 • edited Loading

Choose a reason for hiding this comment

ahiguera-mx Aug 10, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

LuisSanchez25 commented Sep 20, 2023

jmosbacher commented Sep 20, 2023

LuisSanchez25 commented Sep 20, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

LuisSanchez25 commented Sep 20, 2023 • edited Loading

LuisSanchez25 commented Sep 21, 2023

WenzDaniel left a comment

Choose a reason for hiding this comment

WenzDaniel commented Nov 23, 2023 • edited Loading

WenzDaniel commented Dec 12, 2023

github-actions bot commented Aug 4, 2023 •

edited

Loading

ahiguera-mx Aug 10, 2023 •

edited

Loading

LuisSanchez25 commented Sep 20, 2023 •

edited

Loading

WenzDaniel commented Nov 23, 2023 •

edited

Loading