DM-41562: Rewrite much of DatasetRef.from_simple to fix cache problem. #902

TallJimbo · 2023-11-03T17:29:51Z

The previous caching did not respect storage class overrides; all DatasetRefs with the same UUID and component would end up using the first storage class deserialized.

The fix here is to always call overrideStorageClass when using the cache, rather than add the storage class to the cache key. That's because we expect the value of the cache to mostly be in avoiding reconstruction of the data ID, and overrideStorageClass doesn't touch that.

By the same token, the dataset type name has been removed from the cache key as well, with only refs for parent dataset types cached. This should shrink the cache a bit and improve performance in the usual (no-component) case.

Checklist

ran Jenkins
added a release note for user-visible changes to doc/changes

codecov · 2023-11-03T17:39:09Z

Codecov Report

Attention: 27 lines in your changes are missing coverage. Please review.

Comparison is base (60026e2) 87.87% compared to head (4130889) 87.86%.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #902      +/-   ##
==========================================
- Coverage   87.87%   87.86%   -0.02%     
==========================================
  Files         281      281              
  Lines       36435    36440       +5     
  Branches     7601     7604       +3     
==========================================
- Hits        32019    32017       -2     
- Misses       3263     3273      +10     
+ Partials     1153     1150       -3

Files	Coverage Δ
python/lsst/daf/butler/persistence_context.py	`59.25% <100.00%> (ø)`
python/lsst/daf/butler/_dataset_ref.py	`80.99% <25.00%> (-2.56%)`	⬇️

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

natelust

There are a few places where I think there are various branches that are essentially the same, and I think could be smithed to be a less complicated branch structure, but I am not convinced that it is worth the time investment to do so.

I still feel there is subtle inconsistency here I can't quite put my finger on, but it could just be the mess that is transforming types giving me the impression, and it certainly seems better than what was here

TallJimbo · 2023-11-08T14:55:56Z

My hope is that someday we'll be able to get rid of almost all of the Serialized* types anyway, in favor of normalized, directly-serializable collections of things. We'll be adding those for client server, and then it'll be a matter of getting Quantum and QuantumGraph to stop using them.

The previous caching did not respect storage class overrides; all DatasetRefs with the same UUID and component would end up using the first storage class deserialized. The fix here is to always call overrideStorageClass when using the cache, rather than add the storage class to the cache key. That's because we expect the value of the cache to mostly be in avoiding reconstruction of the data ID, and overrideStorageClass doesn't touch that. By the same token, the dataset type name has been removed from the cache key as well, with only refs for parent dataset types cached. This should shrink the cache a bit and improve performance in the usual (no-component) case.

TallJimbo force-pushed the tickets/DM-41562 branch 3 times, most recently from 0100cc3 to 31044ad Compare November 3, 2023 17:33

TallJimbo force-pushed the tickets/DM-41562 branch 2 times, most recently from 6a0d1ea to 6469dc0 Compare November 3, 2023 17:47

natelust approved these changes Nov 8, 2023

View reviewed changes

TallJimbo force-pushed the tickets/DM-41562 branch from 6469dc0 to 4130889 Compare November 8, 2023 14:57

TallJimbo merged commit faf02a0 into main Nov 8, 2023
15 of 17 checks passed

TallJimbo deleted the tickets/DM-41562 branch November 8, 2023 15:24

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

DM-41562: Rewrite much of DatasetRef.from_simple to fix cache problem. #902

DM-41562: Rewrite much of DatasetRef.from_simple to fix cache problem. #902

TallJimbo commented Nov 3, 2023 •

edited

Loading

codecov bot commented Nov 3, 2023 •

edited

Loading

natelust left a comment

TallJimbo commented Nov 8, 2023

DM-41562: Rewrite much of DatasetRef.from_simple to fix cache problem. #902

DM-41562: Rewrite much of DatasetRef.from_simple to fix cache problem. #902

Conversation

TallJimbo commented Nov 3, 2023 • edited Loading

Checklist

codecov bot commented Nov 3, 2023 • edited Loading

Codecov Report

natelust left a comment

Choose a reason for hiding this comment

TallJimbo commented Nov 8, 2023

TallJimbo commented Nov 3, 2023 •

edited

Loading

codecov bot commented Nov 3, 2023 •

edited

Loading