-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-30439: Refactor DataCoordinate #769
base: main
Are you sure you want to change the base?
Conversation
I know this needs work to deal with conflicts, created as a draft PR to make it easier to examine the changes. |
acdf262
to
fe6fe36
Compare
👍 Note that I definitely want to revisit some of the code here before merging it - my thinking on the custom containers has evolved quite a bit since I wrote those here. RFC-834 is in play for the |
ac6121c
to
ac82d70
Compare
In all of the PEP-544 examples, Protocols appear last, and the original order confuses pylance.
This starts a new _containers subpackage that will be extended to include other butler primitives, like DimensionRecords and DatasetRefs.
This renames some existing classes, so it is a breaking API change, but one whose effects should be minimal to nonexistent outside daf_butler. Overall it aligns the hierarchy better with the built-in typing hierarchy, injecting some interfaces that had implementations but no ABCs, and it cleans up the handling of mutability by separating frozenset from set views.
We've historically treated DimensionRecord lookup failures as a non-error (we just don't return records we didn't find), at least mostly because we allow implied dimension dependencies to be None (e.g. an exposure need not have a physical_filter). But that's actually broken because of DM-21840, and I now think the right solution there is to just make those foreign keys NOT NULL after all. That's a long way of saying that I think we could _probably_ start treating DimensionRecord lookup failures as errors, and simplify a lot of code in the process (though a lot of test would break). I'm not confident enough to try that now and I don't want to deal with fixing the tests, but I am confident enough to say that lookup failures are so rare that we *can* simplify by not caching them, and that's what this commit does. This is relevant because I'm about to reimplement the caching entirely in terms of the new DimensionRecord containers, and I don't want those to have to worry about caching failed lookups. But I wanted to make the (slight) behavioral change a separate commit (this one) from the refactoring one.
This is now being used as a callback in the new cache object, and for consistent error-handling there we need these to behave more consistently.
This was a test for some cleverness that avoids a copy, but I'm not sure all of that cleverness is really worth its weight, and I don't think we should be testing something the user is not supposed to care about.
This adds a new method to Registry and SqlRegistry without a RemoteRegistry implementation, and that's why it isn't decorated with abstractmethod yet. I'll fix that after I make the DimensionRecord containers serializable. This also adds a check to DataCoordinate.standardize that any passed keys are actually dimension names; this addresses a long-standing problem where users would get a keyword argument to some higher-level API wrong, and it would be forwarded as **kwargs down to DataCoordinate.standardize and then silently ignored. And it turns out we were doing that even in our own test utility code!
This reimplements some of the special handling of non-standard keys from Butler._findDatasetRef in the hopes of being able to move it all down to Registry (and thus work on many more interfaces). But it's just a start at that; I realized while trying to make Butler._findDatasetRef use the new code that we really need to make queryDatasets work on CALIBRATION collections first. But I think what I've done so far will still be useful eventually, so I'm keeping it.
ac82d70
to
4f55339
Compare
Rebasing to w.2021.46 was relatively easy (although I'm sure tests don't pass). The first major problem comes with #601 since that is when @andy-slac reorganizes how dimension record query results are returned such that order_by/limit can be supported and that provides a completely different return type with different parameters to the new, on this ticket, Not sure where to take this now. |
Checklist
doc/changes