-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-47325: Add API for parsing butler dataset URIs (butler and ivo) #1113
Conversation
9dd5098
to
57d6ec3
Compare
Codecov ReportAttention: Patch coverage is
✅ All tests successful. No failed tests found.
Additional details and impacted files@@ Coverage Diff @@
## main #1113 +/- ##
==========================================
+ Coverage 89.45% 89.46% +0.01%
==========================================
Files 366 366
Lines 48684 48773 +89
Branches 5897 5907 +10
==========================================
+ Hits 43548 43636 +88
Misses 3721 3721
- Partials 1415 1416 +1 ☔ View full report in Codecov by Sentry. |
57d6ec3
to
19d4c15
Compare
1a3d455
to
957f200
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Responded to request for feedback on ivo: URIs.
00b1254
to
a8d44d4
Compare
index_file.flush() | ||
with mock_env({"DAF_BUTLER_REPOSITORY_INDEX": index_file.name}): | ||
butler_factory = LabeledButlerFactory() | ||
factory = butler_factory.bind(access_token=None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rra does this approach work for you in the cutout service?
factory = butler_factory.bind(access_token=token)
...
ref = Butler.get_dataset_from_uri(dataset_uri, factory=factory)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The cutout service passes a Butler instance into the backend, so the pattern looks like:
butler_factory = LabeledButlerFactory()
def _get_backend(label: str, token: str) -> ImageCutoutBackend:
# Called for each cutout
factory = butler_factory.bind(access_token=token)
butler = factory.create_butler(label=label)
# ...
return ImageCutoutBackend(butler, projection_finder, output, tmpdir)
Is that what you had in mind? Basically move the access token parameter to create_butler
to an intermediate step to create a factory with a bound token? If so, that would be fine here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah. No. It's probably the wrong API for you. Somewhere you are parsing the butler://
URI and I'm trying to provide code here that will hide the URI structure from you (so that we can also support the new ivo://
URIs). This PR creates two APIs: one that parses the URI and returns the butler repo label and the UUID, and another API (that is the one I talk about here) that lets you retrieve the DatasetRef directly from the URI and a butler factory. Maybe get_dataset_from_uri
should return the Butler
instance along with the DatasetRef
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I could pass the return value of bind
into the backend along with the URIs as-is without parsing them and then the backend can do whatever it needs to do? That would be even more convenient. In other words, have the constructor of ImageCutoutBackend
take a Butler factor instead of a Butler instance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think passing the bound factory and URI into whatever is wanting to know the DatasetRef is what we want here.
17248d9
to
8be2fc4
Compare
8be2fc4
to
f339fa3
Compare
85a0f0a
to
2064fab
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good to me, I think the new methods make sense and are useful. Only a couple thoughts/suggestions here
if parsed.scheme == "ivo": | ||
# Do not validate the netloc or the path values. | ||
qs = urllib.parse.parse_qs(parsed.query) | ||
if "repo" not in qs or "id" not in qs: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what is expected of query param keys in terms of case sensitivity. But I think no harm in treating those as case-insensitive as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In theory we are generating these IVO IDs so I don't really want the added complication of converting the dict to a dict with case insensitive keys (I have a recollection of using such a special dict at some point but I'm not sure where it was).
Otherwise there is no easy way for the caller to get the dataset: butler, ref = Butler.get_dataset_from_uri(uri) thing = butler.get(ref)
The URI scheme and netloc are meant to be case insensitive. Since the butler URI scheme relies on butler labels which are not case-insensitive, the netloc is not downcased in that situation.
2064fab
to
3cd5774
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with this now, though we are still discussion what will go into the authority
field in the IVOID versus what will go in the resource key
field (hostname and path fields, colloquially speaking).
I am assured that that's not going to require additional work on the code affected by this PR.
Checklist
doc/changes
configs/old_dimensions