DM-47325: Add API for parsing butler dataset URIs (butler and ivo) #1113

Merged
timj merged 10 commits into main from tickets/DM-47325 on Dec 6, 2024

Member

@timj timj commented Nov 1, 2024

Checklist

  • ran Jenkins
  • added a release note for user-visible changes to doc/changes
  • (if changing dimensions.yaml) make a copy of dimensions.yaml in configs/old_dimensions

@timj timj force-pushed the tickets/DM-47325 branch from 9dd5098 to 57d6ec3 Compare November 1, 2024 21:46

codecov bot commented Nov 1, 2024

Codecov Report

Attention: Patch coverage is 98.90110% with 1 line in your changes missing coverage. Please review.

Project coverage is 89.46%. Comparing base (d632886) to head (2cbb8f0).
Report is 11 commits behind head on main.

✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
python/lsst/daf/butler/_labeled_butler_factory.py 87.50% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1113      +/-   ##
==========================================
+ Coverage   89.45%   89.46%   +0.01%     
==========================================
  Files         366      366              
  Lines       48684    48773      +89     
  Branches     5897     5907      +10     
==========================================
+ Hits        43548    43636      +88     
  Misses       3721     3721              
- Partials     1415     1416       +1     


@timj timj force-pushed the tickets/DM-47325 branch from 57d6ec3 to 19d4c15 Compare November 1, 2024 22:03
@timj timj force-pushed the tickets/DM-47325 branch 3 times, most recently from 1a3d455 to 957f200 Compare November 7, 2024 23:07
Contributor

@gpdf gpdf left a comment

Responded to request for feedback on ivo: URIs.

index_file.flush()
with mock_env({"DAF_BUTLER_REPOSITORY_INDEX": index_file.name}):
    butler_factory = LabeledButlerFactory()
    factory = butler_factory.bind(access_token=None)
Member Author

@timj timj Nov 23, 2024

@rra does this approach work for you in the cutout service?

factory = butler_factory.bind(access_token=token)
...
ref = Butler.get_dataset_from_uri(dataset_uri, factory=factory)

?

Member

The cutout service passes a Butler instance into the backend, so the pattern looks like:

butler_factory = LabeledButlerFactory()

def _get_backend(label: str, token: str) -> ImageCutoutBackend:
    # Called for each cutout
    factory = butler_factory.bind(access_token=token)
    butler = factory.create_butler(label=label)
    # ...
    return ImageCutoutBackend(butler, projection_finder, output, tmpdir)

Is that what you had in mind? Basically, move the access token parameter out of create_butler and into an intermediate step that creates a factory with a bound token? If so, that would be fine here.

Member Author

Ah. No. It's probably the wrong API for you. Somewhere you are parsing the butler:// URI and I'm trying to provide code here that will hide the URI structure from you (so that we can also support the new ivo:// URIs). This PR creates two APIs: one that parses the URI and returns the butler repo label and the UUID, and another API (that is the one I talk about here) that lets you retrieve the DatasetRef directly from the URI and a butler factory. Maybe get_dataset_from_uri should return the Butler instance along with the DatasetRef?
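
A minimal sketch of the first of the two APIs described above (URI in, repository label and UUID out), using only the standard library. The names `parse_dataset_uri_sketch` and `ParsedURI`, and the assumed `butler://<label>/<uuid>` layout, are illustrative assumptions and not the code merged in this PR. The `repo` and `id` query keys for `ivo` URIs come from the diff context quoted later in this conversation.

```python
import urllib.parse
import uuid
from typing import NamedTuple


class ParsedURI(NamedTuple):
    """Hypothetical result type: repository label plus dataset UUID."""

    label: str
    dataset_id: uuid.UUID


def parse_dataset_uri_sketch(uri: str) -> ParsedURI:
    """Extract (label, UUID) from a butler:// or ivo:// style URI.

    Illustration only; the layouts handled here are assumptions.
    """
    parsed = urllib.parse.urlparse(uri)
    if parsed.scheme == "butler":
        # Assumed layout: butler://<label>/<uuid>
        label = parsed.netloc
        raw_id = parsed.path.lstrip("/")
        if not label or not raw_id:
            raise ValueError(f"Missing label or dataset ID in {uri!r}")
    elif parsed.scheme == "ivo":
        # The netloc and path are deliberately not validated.
        qs = urllib.parse.parse_qs(parsed.query)
        if "repo" not in qs or "id" not in qs:
            raise ValueError(f"Missing 'repo' or 'id' query parameter in {uri!r}")
        label = qs["repo"][0]
        raw_id = qs["id"][0]
    else:
        raise ValueError(f"Unrecognized URI scheme in {uri!r}")
    # uuid.UUID raises ValueError if the ID is not a valid UUID string.
    return ParsedURI(label, uuid.UUID(raw_id))
```

A caller that only needs the label and UUID never has to look at the URI structure, which is what allows a second scheme to be added without touching client code.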

Member

Maybe I could pass the return value of bind into the backend along with the URIs as-is, without parsing them, and then the backend can do whatever it needs to do? That would be even more convenient. In other words, have the constructor of ImageCutoutBackend take a Butler factory instead of a Butler instance.

Member Author

I think passing the bound factory and the URI into whatever needs the DatasetRef is what we want here.

@timj timj force-pushed the tickets/DM-47325 branch from 8be2fc4 to f339fa3 Compare December 3, 2024 22:18
@timj timj requested a review from gpdf December 3, 2024 22:27
@timj timj force-pushed the tickets/DM-47325 branch 4 times, most recently from 85a0f0a to 2064fab Compare December 4, 2024 04:34

@stvoutsin stvoutsin left a comment

Overall looks good to me, I think the new methods make sense and are useful. Only a couple thoughts/suggestions here

if parsed.scheme == "ivo":
    # Do not validate the netloc or the path values.
    qs = urllib.parse.parse_qs(parsed.query)
    if "repo" not in qs or "id" not in qs:

I'm not sure what is expected of query param keys in terms of case sensitivity. But I think no harm in treating those as case-insensitive as well?

Member Author

In theory we are generating these IVO IDs so I don't really want the added complication of converting the dict to a dict with case insensitive keys (I have a recollection of using such a special dict at some point but I'm not sure where it was).
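
For reference, folding query keys to lower case does not strictly require a special dict type. The sketch below uses only the standard library; `fold_query_keys` is a hypothetical helper and not part of daf_butler, and the PR as discussed here keeps the keys case-sensitive.

```python
import urllib.parse


def fold_query_keys(query: str) -> dict[str, list[str]]:
    """Parse a query string, lower-casing the keys but not the values."""
    folded: dict[str, list[str]] = {}
    # parse_qsl yields (key, value) pairs in order, so repeated keys
    # accumulate in the same list, much like parse_qs.
    for key, value in urllib.parse.parse_qsl(query):
        folded.setdefault(key.lower(), []).append(value)
    return folded
```

The trade-off is that `Repo=a&repo=b` would then be merged under a single key, which may or may not be the desired behavior.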

timj added 7 commits December 5, 2024 13:48
Otherwise there is no easy way for the caller to get the
dataset:

butler, ref = Butler.get_dataset_from_uri(uri)
thing = butler.get(ref)

The URI scheme and netloc are meant to be case insensitive.
Since the butler URI scheme relies on butler labels which
are not case-insensitive, the netloc is not downcased in
that situation.
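
The case handling described in this commit message can be sketched as follows. `normalized_scheme_and_netloc` is a hypothetical helper written for illustration, not the function added by the PR. It relies on the fact that `urllib.parse.urlparse` already lower-cases the scheme but leaves the netloc untouched.

```python
import urllib.parse


def normalized_scheme_and_netloc(uri: str) -> tuple[str, str]:
    """Return (scheme, netloc) with case folded where that is safe."""
    parsed = urllib.parse.urlparse(uri)
    scheme = parsed.scheme  # urlparse has already lower-cased this
    netloc = parsed.netloc
    if scheme != "butler":
        # For butler:// URIs the netloc is a repository label, and labels
        # are case-sensitive, so it is only folded for other schemes.
        netloc = netloc.lower()
    return scheme, netloc
```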
@timj timj force-pushed the tickets/DM-47325 branch from 2064fab to 3cd5774 Compare December 5, 2024 20:51
Contributor

@gpdf gpdf left a comment

I'm fine with this now, though we are still discussing what will go into the authority field in the IVOID versus what will go in the resource key field (hostname and path fields, colloquially speaking).

I am assured that that's not going to require additional work on the code affected by this PR.

@timj timj merged commit d66c8bb into main Dec 6, 2024
19 checks passed
@timj timj deleted the tickets/DM-47325 branch December 6, 2024 03:49
5 participants