-
Notifications
You must be signed in to change notification settings - Fork 38
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Waveform attributes #745
Waveform attributes #745
Conversation
…eform attributes like quantiles and downsampled waveform
strax/processing/peak_properties.py
Outdated
@@ -100,3 +100,52 @@ def compute_widths(peaks, select_peaks_indices=None): | |||
i = len(desired_fr) // 2 | |||
peaks['width'][select_peaks_indices] = fr_times[:, i:] - fr_times[:, ::-1][:, i:] | |||
peaks['area_decile_from_midpoint'][select_peaks_indices] = fr_times[:, ::2] - fr_times[:, i].reshape(-1,1) | |||
|
|||
@export | |||
#@numba.njit(nopython=True, cache=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this work or why is it commented out? In addition, is not this the same as this function is already doing:
strax/strax/processing/peak_properties.py
Line 28 in 9b508f7
def compute_index_of_fraction(peak, fractions_desired, result): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Daniel
Thanks for checking at this, the functions calculate different things
compute_index_of_fraction()
return the index at which certain area has been achieved if the fraction is 1 it will return peak['length'] so this is in units of samples.
compute_wf_attributes()
for a given set of quantiles (fractions) it return the the amount of time elapsed for a given fraction of the total waveform area to be observed
So in other words, compute_index_of_fraction()
is in units of samples and compute_wf_attributes()
is in units of ns
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also compute_wf_attributes()
can handle an array of peaks whereas compute_index_of_fraction()
can do only peak by peak
Do you think it would better to merge into one function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at these codes, compute_index_of_fraction
can also return floats. I think it is better to utilize compute_index_of_fraction
or modify it.
strax/strax/processing/peak_properties.py
Line 48 in 9b508f7
result[current_fraction_index] = i + area_needed / x |
@jmosbacher This is the strax pull request we need for the other pluggin, if you could help us with this that would be great, I will work on updating the private_repository and the straxen pull request in the mean time. |
@LuisSanchez25 it doesnt look like you have addressed the comments from @dachengx and @WenzDaniel which seem valid. compute_wf_attributes(...) == compute_index_of_fraction(...) * dt So I think you need to demonstrate that this equality does not hold or give some other justification for adding this extra method before this can be merged. |
Yes, I have not addressed this yet, Aaron and I discussed this a bit the other day so I was hoping we could talk to you about it at some point maybe tomorrow? I need a bit of time to check a few things here since I was not the main person working on this PR but wanted to make you aware of which PR we were talking about. |
cumsum_steps[-1] = sample_number_div_dt[-1] * dt | ||
quantiles[i] = cumsum_steps[1:] - cumsum_steps[:-1] | ||
|
||
if downsample_wf: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this whole downsampling makes no sense to me.
- If i pass
downsample_wf=False
to this function it will return an array of zeros as waveforms. in what case would that be useful? - Why wouldnt you infer downsampling==True if num_samples > n_samples?
- What if (num_samples / n_samples) is not an interger? wouldnt the last bin sum over a different number of samples? so why is
waveforms[i] /= (step_size * dt)
correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is because the bayes pluggin needs the waveform but the SOM plugin does not. So if we dont need it we can save computational time by not calculating this, but if you know of a better way to achive this goal I am more than happy to change this.
As to your last point I do not really understand what you are saying, can you elaborate a bit more?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets take a simple example.
num_samples = 10
n_samples = 6
step_size = int(num_samples / n_samples)
steps = np.arange(0, num_samples + 1, step_size)
for j in range(n_samples):
print(steps[j], ":", steps[j + 1])
0 : 1
1 : 2
2 : 3
3 : 4
4 : 5
5 : 6
Does this look like correct downsampling to you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I think I know what you are saying now. So this is dealing with waveforms so num_samples will always be 200, but there are no safeguards against non integer divisions of the data. We can make an assert statement to make sure num_samples is divisible by n_samples. I will make that change.
The waveforms[i] /= (step_size * dt) I am now a bit confused about, the objective is to get rid of the dt dependance so we can compare all wavefroms to each other, but I did some quick dimensional analysis and I think we should be multiplying by dt instead of dividing since waveform amplitude is in units of PE/sample = PE/(dt) so we should be multiplying the numerator by dt. @ahiguera-mx since you worked on this more with the Bayes classification is there something we are missing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Like I explained when you originally asked me where to place this function. Strax is an open source framework built to be used by experiments other than XENON. Function in strax cannot assume the data is exactly as it is in xenon.
- The required bar in terms of code quality to have a PR merged is also higher.
- Allocating a massive array of zeros just to be used sometimes is not a good approach. You should downsample ina different function. There are many standard techniques for downsampling, such as windowed reductions. I would stick to one of those.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok we dont need the downsampling for now so we can just remove that funcitonality, and make an assert statement to make sure data quantiles make sense, but the big question still is, given the difference between the functions, do we still want to try to merge it into one or should I leave them as separate things?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would try to either:
- use the existing function in the plugin or adapt it to be able to produce what you need
- Provide some actual concrete example that demonstrates the need for the extra function. I find the hand wavy explanations very difficult to follow.
So these two functions are not the same, compute_index_of_fraction will give back an index for between 0 and 10% or 0 and 30% of the waveforms, the compute_wf_attributes function computes the index between deciles, (so 0-10%, 10-20%, ect...) so multiplying it by the dt wont cut it as we need to subtract subsequent values. We can still use the compute_index_of_fraction but the other difference is that the compute_index_of_fraction computes 1 waveform attribute at a time while the compute_wf_attributes does everything in one go. This is a functionality we could move to straxen when the plugin is called but we think it makes more sense to leave it here as this would be used by 2 plugins, the Bayes plugin and the SOM plugin |
On a completely separate note, is there a reason there are no tests for the function compute_index_of_fraction in test_peak_properties? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am sorry, I cannot reproduce what the function is supposed to do. At least my result is different from the expectation I get from the doc-string. Can you provide a full example with a full description of what I should expect to get by the function?
Okay after some trial and error I think I figured out how to use the function, but the results looks off to me. If I run: res, _ = strax.compute_wf_attributes(peaklets[:2]['data'], peaklets[:2]['dt'], 10) I get the following result: where the blue lines are the cumulative of your returned deciles. The last one is pretty far off and even goes beyond the true peak. I can achieve the very same with: res2 = strax.index_of_fraction(peaklets[:2], np.arange(10.1)/10)
res2 = (res2.T * peaklets[:2]['dt']).T
res2_compute_diff = np.diff(res2, axis=1) which is 3 lines of code. The blue lines are the deciles 0 -> 0.1, 0 -> 0.2 etc. as computed by The result is also a bit off compared to the deciles themselves, but much closer than your function. So I honestly cannot follow when you are saying that we cannot use this already implemented function. In addition, the function is tested once here: strax/tests/test_peak_properties.py Line 46 in 21cc96e
and once indirectly here: strax/tests/test_peak_properties.py Line 66 in 21cc96e
|
We should make a new PR with a revised waveform attributes function. |
What is the problem / what does the code in this PR do
this function computes quantiles and can downsample waveforms, either for records, peaklets, peak
Can you briefly describe how it works?
Compute waveform attribures
Quantiles: represent the amount of time elapsed for a given fraction of the total waveform area to be observed in n_samples i.e. n_samples = 10, then quantiles are equivalent deciles
Can you give a minimal working example (or illustrate with a figure)?
Please include the following if applicable:
Please make sure that all automated tests have passed before asking for a review (you can save the PR as a draft otherwise).