Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-41562: Rewrite much of DatasetRef.from_simple to fix cache problem. #902

Merged
merged 1 commit into from
Nov 8, 2023

Conversation

TallJimbo
Copy link
Member

@TallJimbo TallJimbo commented Nov 3, 2023

The previous caching did not respect storage class overrides; all DatasetRefs with the same UUID and component would end up using the first storage class deserialized.

The fix here is to always call overrideStorageClass when using the cache, rather than add the storage class to the cache key. That's because we expect the value of the cache to mostly be in avoiding reconstruction of the data ID, and overrideStorageClass doesn't touch that.

By the same token, the dataset type name has been removed from the cache key as well, with only refs for parent dataset types cached. This should shrink the cache a bit and improve performance in the usual (no-component) case.

Checklist

  • ran Jenkins
  • added a release note for user-visible changes to doc/changes

@TallJimbo TallJimbo force-pushed the tickets/DM-41562 branch 3 times, most recently from 0100cc3 to 31044ad Compare November 3, 2023 17:33
Copy link

codecov bot commented Nov 3, 2023

Codecov Report

Attention: 27 lines in your changes are missing coverage. Please review.

Comparison is base (60026e2) 87.87% compared to head (4130889) 87.86%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #902      +/-   ##
==========================================
- Coverage   87.87%   87.86%   -0.02%     
==========================================
  Files         281      281              
  Lines       36435    36440       +5     
  Branches     7601     7604       +3     
==========================================
- Hits        32019    32017       -2     
- Misses       3263     3273      +10     
+ Partials     1153     1150       -3     
Files Coverage Δ
python/lsst/daf/butler/persistence_context.py 59.25% <100.00%> (ø)
python/lsst/daf/butler/_dataset_ref.py 80.99% <25.00%> (-2.56%) ⬇️

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@TallJimbo TallJimbo force-pushed the tickets/DM-41562 branch 2 times, most recently from 6a0d1ea to 6469dc0 Compare November 3, 2023 17:47
Copy link
Contributor

@natelust natelust left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a few places where I think there are various branches that are essentially the same, and I think could be smithed to be a less complicated branch structure, but I am not convinced that it is worth the time investment to do so.

I still feel there is subtle inconsistency here I can't quite put my finger on, but it could just be the mess that is transforming types giving me the impression, and it certainly seems better than what was here

@TallJimbo
Copy link
Member Author

My hope is that someday we'll be able to get rid of almost all of the Serialized* types anyway, in favor of normalized, directly-serializable collections of things. We'll be adding those for client server, and then it'll be a matter of getting Quantum and QuantumGraph to stop using them.

The previous caching did not respect storage class overrides; all
DatasetRefs with the same UUID and component would end up using the
first storage class deserialized.

The fix here is to always call overrideStorageClass when using the
cache, rather than add the storage class to the cache key.  That's
because we expect the value of the cache to mostly be in avoiding
reconstruction of the data ID, and overrideStorageClass doesn't touch
that.

By the same token, the dataset type name has been removed from the
cache key as well, with only refs for parent dataset types cached.
This should shrink the cache a bit and improve performance in the
usual (no-component) case.
@TallJimbo TallJimbo merged commit faf02a0 into main Nov 8, 2023
15 of 17 checks passed
@TallJimbo TallJimbo deleted the tickets/DM-41562 branch November 8, 2023 15:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants