Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Waveform attributes #745

Closed
wants to merge 6 commits into from
Closed

Waveform attributes #745

wants to merge 6 commits into from

Conversation

ahiguera-mx
Copy link
Contributor

What is the problem / what does the code in this PR do
this function computes quantiles and can downsample waveforms, either for records, peaklets, peak
Can you briefly describe how it works?
Compute waveform attribures
Quantiles: represent the amount of time elapsed for a given fraction of the total waveform area to be observed in n_samples i.e. n_samples = 10, then quantiles are equivalent deciles
Can you give a minimal working example (or illustrate with a figure)?

Please include the following if applicable:

  • Update the docstring(s)
  • Update the documentation
  • Tests to check the (new) code is working as desired.
  • Does it solve one of the open issues on github?

Please make sure that all automated tests have passed before asking for a review (you can save the PR as a draft otherwise).

ahiguera-mx added 3 commits August 3, 2023 18:31
@dachengx dachengx self-requested a review August 4, 2023 19:13
@github-actions
Copy link

github-actions bot commented Aug 4, 2023

Coverage Status

coverage: 91.571% (+0.03%) from 91.546% when pulling 85ab06d on waveform_attributes into 9b508f7 on master.

@@ -100,3 +100,52 @@ def compute_widths(peaks, select_peaks_indices=None):
i = len(desired_fr) // 2
peaks['width'][select_peaks_indices] = fr_times[:, i:] - fr_times[:, ::-1][:, i:]
peaks['area_decile_from_midpoint'][select_peaks_indices] = fr_times[:, ::2] - fr_times[:, i].reshape(-1,1)

@export
#@numba.njit(nopython=True, cache=True)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this work or why is it commented out? In addition, is not this the same as this function is already doing:

def compute_index_of_fraction(peak, fractions_desired, result):

Copy link
Contributor Author

@ahiguera-mx ahiguera-mx Aug 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Daniel
Thanks for checking at this, the functions calculate different things
compute_index_of_fraction()return the index at which certain area has been achieved if the fraction is 1 it will return peak['length'] so this is in units of samples.

compute_wf_attributes() for a given set of quantiles (fractions) it return the the amount of time elapsed for a given fraction of the total waveform area to be observed

So in other words, compute_index_of_fraction() is in units of samples and compute_wf_attributes() is in units of ns

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also compute_wf_attributes() can handle an array of peaks whereas compute_index_of_fraction() can do only peak by peak
Do you think it would better to merge into one function?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at these codes, compute_index_of_fraction can also return floats. I think it is better to utilize compute_index_of_fraction or modify it.

result[current_fraction_index] = i + area_needed / x

@LuisSanchez25
Copy link
Collaborator

@jmosbacher This is the strax pull request we need for the other pluggin, if you could help us with this that would be great, I will work on updating the private_repository and the straxen pull request in the mean time.

@jmosbacher
Copy link
Contributor

@LuisSanchez25 it doesnt look like you have addressed the comments from @dachengx and @WenzDaniel which seem valid.
From glancing at the code, to me it also looks like the following holds

compute_wf_attributes(...) == compute_index_of_fraction(...) * dt

So I think you need to demonstrate that this equality does not hold or give some other justification for adding this extra method before this can be merged.

@LuisSanchez25
Copy link
Collaborator

Yes, I have not addressed this yet, Aaron and I discussed this a bit the other day so I was hoping we could talk to you about it at some point maybe tomorrow? I need a bit of time to check a few things here since I was not the main person working on this PR but wanted to make you aware of which PR we were talking about.

cumsum_steps[-1] = sample_number_div_dt[-1] * dt
quantiles[i] = cumsum_steps[1:] - cumsum_steps[:-1]

if downsample_wf:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this whole downsampling makes no sense to me.

  • If i pass downsample_wf=False to this function it will return an array of zeros as waveforms. in what case would that be useful?
  • Why wouldnt you infer downsampling==True if num_samples > n_samples?
  • What if (num_samples / n_samples) is not an interger? wouldnt the last bin sum over a different number of samples? so why is waveforms[i] /= (step_size * dt) correct?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is because the bayes pluggin needs the waveform but the SOM plugin does not. So if we dont need it we can save computational time by not calculating this, but if you know of a better way to achive this goal I am more than happy to change this.

As to your last point I do not really understand what you are saying, can you elaborate a bit more?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets take a simple example.

num_samples = 10
n_samples = 6
step_size = int(num_samples / n_samples)
steps = np.arange(0, num_samples + 1, step_size)

for j in range(n_samples):
    print(steps[j], ":", steps[j + 1])

0 : 1
1 : 2
2 : 3
3 : 4
4 : 5
5 : 6

Does this look like correct downsampling to you?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I think I know what you are saying now. So this is dealing with waveforms so num_samples will always be 200, but there are no safeguards against non integer divisions of the data. We can make an assert statement to make sure num_samples is divisible by n_samples. I will make that change.

The waveforms[i] /= (step_size * dt) I am now a bit confused about, the objective is to get rid of the dt dependance so we can compare all wavefroms to each other, but I did some quick dimensional analysis and I think we should be multiplying by dt instead of dividing since waveform amplitude is in units of PE/sample = PE/(dt) so we should be multiplying the numerator by dt. @ahiguera-mx since you worked on this more with the Bayes classification is there something we are missing?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Like I explained when you originally asked me where to place this function. Strax is an open source framework built to be used by experiments other than XENON. Function in strax cannot assume the data is exactly as it is in xenon.
  • The required bar in terms of code quality to have a PR merged is also higher.
  • Allocating a massive array of zeros just to be used sometimes is not a good approach. You should downsample ina different function. There are many standard techniques for downsampling, such as windowed reductions. I would stick to one of those.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok we dont need the downsampling for now so we can just remove that funcitonality, and make an assert statement to make sure data quantiles make sense, but the big question still is, given the difference between the functions, do we still want to try to merge it into one or should I leave them as separate things?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would try to either:

  • use the existing function in the plugin or adapt it to be able to produce what you need
  • Provide some actual concrete example that demonstrates the need for the extra function. I find the hand wavy explanations very difficult to follow.

@LuisSanchez25
Copy link
Collaborator

LuisSanchez25 commented Sep 20, 2023

@LuisSanchez25 it doesnt look like you have addressed the comments from @dachengx and @WenzDaniel which seem valid. From glancing at the code, to me it also looks like the following holds

compute_wf_attributes(...) == compute_index_of_fraction(...) * dt

So I think you need to demonstrate that this equality does not hold or give some other justification for adding this extra method before this can be merged.

So these two functions are not the same, compute_index_of_fraction will give back an index for between 0 and 10% or 0 and 30% of the waveforms, the compute_wf_attributes function computes the index between deciles, (so 0-10%, 10-20%, ect...) so multiplying it by the dt wont cut it as we need to subtract subsequent values. We can still use the compute_index_of_fraction but the other difference is that the compute_index_of_fraction computes 1 waveform attribute at a time while the compute_wf_attributes does everything in one go. This is a functionality we could move to straxen when the plugin is called but we think it makes more sense to leave it here as this would be used by 2 plugins, the Bayes plugin and the SOM plugin

@LuisSanchez25
Copy link
Collaborator

On a completely separate note, is there a reason there are no tests for the function compute_index_of_fraction in test_peak_properties?

Copy link
Collaborator

@WenzDaniel WenzDaniel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am sorry, I cannot reproduce what the function is supposed to do. At least my result is different from the expectation I get from the doc-string. Can you provide a full example with a full description of what I should expect to get by the function?

@WenzDaniel
Copy link
Collaborator

WenzDaniel commented Nov 23, 2023

Okay after some trial and error I think I figured out how to use the function, but the results looks off to me. If I run:

res, _ = strax.compute_wf_attributes(peaklets[:2]['data'], peaklets[:2]['dt'], 10)

I get the following result:

example

where the blue lines are the cumulative of your returned deciles. The last one is pretty far off and even goes beyond the true peak.

I can achieve the very same with:

res2 = strax.index_of_fraction(peaklets[:2], np.arange(10.1)/10)
res2 = (res2.T * peaklets[:2]['dt']).T
res2_compute_diff = np.diff(res2, axis=1)

which is 3 lines of code. The blue lines are the deciles 0 -> 0.1, 0 -> 0.2 etc. as computed by index_of_fraction. The red lines are the cumulative of the difference of the deciles, so the same as your function returns.

example2

The result is also a bit off compared to the deciles themselves, but much closer than your function. So I honestly cannot follow when you are saying that we cannot use this already implemented function.

In addition, the function is tested once here:

def test_index_of_fraction(peak_length, data_length):

and once indirectly here:
def test_compute_widths(peak_length, data_length, n_widths):

@WenzDaniel
Copy link
Collaborator

We should make a new PR with a revised waveform attributes function.

@WenzDaniel WenzDaniel closed this Dec 12, 2023
@WenzDaniel WenzDaniel deleted the waveform_attributes branch December 12, 2023 14:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants