Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add hot fix for copy_to_buffer #785

Merged
merged 13 commits into from
Dec 18, 2023
Merged

Add hot fix for copy_to_buffer #785

merged 13 commits into from
Dec 18, 2023

Conversation

WenzDaniel
Copy link
Collaborator

What is the problem / what does the code in this PR do
In this PR we add a hotfix for the issue described in #781. Currently, we raise an error in copy_to_buffer if the dtypes of input and buffer changes, but the same function name is used. This also leads to failing tests in XENONnT/straxen#1299 which is needed for XENONnT/straxen#1300.

Can you briefly describe how it works?
The PR is only a hotfix as it only adds a random string to the copy function name if the typing error is raised. However, this also means that the copy function is not cached after a dtype change unless the cache is cleared.

For the future:

A better approach would be to evaluate the cached function first and define the copy to buffer function name based on the dtype difference.

@JoranAngevaare
Copy link
Contributor

JoranAngevaare commented Dec 13, 2023

Can you not just change the name of the cache-key when you call the function? strax.copy_to_buffer(peaks, result, "bla") -> strax.copy_to_buffer(peaks, result, f"bla_{self.some_option").

Probably still good to have this check in place of course that the dtype matches 😉.

@WenzDaniel
Copy link
Collaborator Author

Hej @JoranAngevaare nice to hear from you. Yes I have the same concern, and I also thought about adding to every plugin where we are calling this function an extra option, but this is a bit too cumbersome in my opinion. Right now I have no idea for a good solution. I do not know how we can get the dtype in the chached function easily. I am open for suggestions.

@WenzDaniel
Copy link
Collaborator Author

Looks like that this hotfix does not work anyhow. For me locally it was working though.

@JoranAngevaare
Copy link
Contributor

JoranAngevaare commented Dec 13, 2023

Hi Daniel, likewise! Hope all is well. I just stumbled on this PR as I'm still subscribed to strax.

Something simple
The caching system is not very sophisticated, so you can just dump whatever string. I doubt that people are ever looking into the actual cached names.

Something blunt, that would also erase the need of the func_name is just adding the repr of the dtype:

>>> a=np.ones([1,2], dtype=[('a', np.int64), ('b', np.int64)])
>>> repr(a.dtype)
"dtype([('a', '<i8'), ('b', '<i8')])"

You can just add that to the func_name, or even just make the func_name optional and just cache the repr of the dtype. You can make it more fancy buy throwing the repr in the strax.deterministic_hash.

Something a bit more advanced
Some other things worth considering would be making the cache explicit, e.g. following the URLConfig _CACHE. I don't remember why I added it to globals at the time rather than some global dictionary.

Another, additional option might be using an LRU cache to make sure that the number of cached functions doesn't blow up, maybe something like straxen.CacheDict, a minimal implementation is also used in CMT1.

Both are probably overkill, I don't really believe that you can end up in a case where somehow the dtype is changed every time someone calls this function. But we have to keep that poor PhD in mind that will one day face the memory leak that my limited imagination had discarded 😉.

The minimal amount of work is probably adding a _COPY_CACHE global dictionary instead of globals() and raising an error if more than a 1e6 keys are cached.

@WenzDaniel
Copy link
Collaborator Author

The caching system is not very sophisticated, so you can just dump whatever string. I doubt that people are ever looking into the actual cached names

Yes, I also decided now to make it very simple. I just cache in addition the dtype when calling the function. In this way the number of cached functions should be well less than 10.

@WenzDaniel
Copy link
Collaborator Author

I do not get why it is failing here... locally it works...

@JoranAngevaare
Copy link
Contributor

JoranAngevaare commented Dec 13, 2023

It simply fails because coveralls is testing without numba, but why would you want to use a try-except block when you can just always add the dtype fields by default to the cached function. Also much better performance wise?

@WenzDaniel
Copy link
Collaborator Author

WenzDaniel commented Dec 13, 2023

Haha thanks @JoranAngevaare for pointing it out. I also noticed now that it is a different raise condition. I thought just the same about making it simpler :D. You should come back to us you are missed :-) I only use the first two letters of each field in the dtype to let the name not explode.

Being ACs comes with very little time for these things.... this sucks a bit...

@coveralls
Copy link

coveralls commented Dec 13, 2023

Coverage Status

coverage: 91.276% (-0.3%) from 91.581%
when pulling 61d0e75 on fix_copy_to_buffer
into b0ca3cb on master.

JoranAngevaare
JoranAngevaare previously approved these changes Dec 15, 2023
Copy link
Contributor

@JoranAngevaare JoranAngevaare left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Daniel, I'm sure there is still loads of fun to be had in XENON 😉.

Changes look good! Seems simple and effective. Only one suggestion that would also cache the dtype of the field might be something as below, the field_names_hash will be anyway unreadable so you might consider including the np.int64 np.int16 etc. as well.

Additionally, you can now make the func_name arguement optional, I don't really think there is a use case where you'd use the same buffer fields but for some reason need a new func_name.

Good to have the test in place 👍 !

strax/dtypes.py Outdated Show resolved Hide resolved
@WenzDaniel WenzDaniel merged commit 70f69c8 into master Dec 18, 2023
11 checks passed
@WenzDaniel
Copy link
Collaborator Author

I merge since Joran basically approved.

@WenzDaniel WenzDaniel deleted the fix_copy_to_buffer branch December 18, 2023 07:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants