More frictionless S3 direct access #431
Comments
Also wondering if #424 relates to this; I'll need to dig into that a bit more to understand if it will help. |
@abarciauskas-bgse Currently with earthaccess/earthaccess/store.py Lines 198 to 262 in dd61f23
earthaccess which allows initializing this s3fs instance with a profile or IAM metadata option. This is unrelated to #424 as the same region requirement is only enforced for temporary access tokens generated by a DAAC's Cumulus s3_credentials endpoint. Let's circle up next week and we can kick off a PR for the IAM "escape hatch" since this will be pretty clutch functionality for improving the VEDA JupyterHub user experience 👍
|
thanks @sharkinsspatial that all makes sense to me. |
@luzpaz - @sharkinsspatial and I discussed a proposal for how to implement S3 access using the IAM role instead of S3 credentials, so bypassing all the earthdata login methods. The API we imagine is:
import earthaccess
earthaccess.login(strategy="iam")
An option This
if self.auth.use_iam:
    return s3fs.S3FileSystem(anon=False)
Let us know what you think. |
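To make the proposal above concrete, here is a minimal sketch of how the filesystem arguments could be chosen. It is not the earthaccess implementation: `AuthSketch`, `s3fs_kwargs`, and `get_s3_filesystem` are hypothetical names, and the credential keys (`accessKeyId`, `secretAccessKey`, `sessionToken`) assume the usual shape of a Cumulus s3_credentials response.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class AuthSketch:
    """Hypothetical stand-in for the auth object; not the earthaccess class."""

    use_iam: bool = False
    temp_credentials: Optional[Dict[str, str]] = None


def s3fs_kwargs(auth: AuthSketch) -> Dict[str, Any]:
    """Return keyword arguments for s3fs.S3FileSystem."""
    if auth.use_iam:
        # Let the AWS credential chain (for example the hub's role) supply credentials.
        return {"anon": False}
    creds = auth.temp_credentials
    if creds is None:
        raise ValueError("no IAM role requested and no temporary credentials given")
    # Assumed key names for the temporary credentials returned by a DAAC endpoint.
    return {
        "key": creds["accessKeyId"],
        "secret": creds["secretAccessKey"],
        "token": creds["sessionToken"],
    }


def get_s3_filesystem(auth: AuthSketch):
    """Build the filesystem; s3fs is imported lazily so the helper above has no dependencies."""
    import s3fs

    return s3fs.S3FileSystem(**s3fs_kwargs(auth))
```

Keeping the argument selection separate from the `s3fs` call means the IAM branch can be checked without AWS access; whether earthaccess would structure it this way is an open design question.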
Hi @abarciauskas-bgse I like the
first_result = earthaccess.search_data(
short_name='MUR-JPL-L4-GLOB-v4.1',
cloud_hosted=True,
count=1
)
# Granules found: 7899
fileset = earthaccess.open(first_result)
fileset
[<File-like object S3FileSystem, podaac-ops-cumulus-protected/MUR-JPL-L4-GLOB-v4.1/20020601090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc>]
If we use the links directly, we need to tell earthaccess which provider it should use so it can grab the credentials from a dictionary. This should be more dynamic; in the near future earthaccess should infer which credential endpoint it needs to use.
results = earthaccess.search_data(
short_name='MUR-JPL-L4-GLOB-v4.1',
cloud_hosted=True,
count=3
)
# if this collection had more than one file per granule we'll have to flatten the list instead of grabbing the first link
links = [g.data_links(access="direct")[0] for g in results]
fileset = earthaccess.open(links, provider="POCLOUD")
[<File-like object S3FileSystem, podaac-ops-cumulus-protected/MUR-JPL-L4-GLOB-v4.1/20020601090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc>,
<File-like object S3FileSystem, podaac-ops-cumulus-protected/MUR-JPL-L4-GLOB-v4.1/20020602090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc>,
<File-like object S3FileSystem, podaac-ops-cumulus-protected/MUR-JPL-L4-GLOB-v4.1/20020603090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc>]
I'm curious if you get the same results. The Openscapes hub also has an assumed role in the environment, but it may be different from what VEDA is doing. In any case, I think the |
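The code comment above mentions flattening the list when a granule has more than one file. A minimal sketch of that step follows; `flatten_links` and `GranuleSketch` are hypothetical names used only for illustration, and the only thing assumed about a real granule is the `data_links(access=...)` method shown in this thread.

```python
from itertools import chain
from typing import Iterable, List


class GranuleSketch:
    """Hypothetical stand-in for a search result that has several files."""

    def __init__(self, links: List[str]):
        self._links = links

    def data_links(self, access: str = "direct") -> List[str]:
        # Same method name as in the thread; the access argument is ignored here.
        return list(self._links)


def flatten_links(granules: Iterable, access: str = "direct") -> List[str]:
    """Collect every link from every granule into one flat list."""
    return list(chain.from_iterable(g.data_links(access=access) for g in granules))
```

With real search results this would be used as `earthaccess.open(flatten_links(results), provider="POCLOUD")`, in place of taking the first link of each granule.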
Thank you @betolink! I'll put the IAM strategy implementation on our todo list, but happy if someone on your team gets to it first 😄 |
I'm not sure if #444 solves the error I reported above but I'll take another look when I get a chance 👍🏽 |
Just noting that an earthaccess upgrade (to v0.8.2) in the VEDA hub resolves the issue of |
earthaccess allows filtering datasets by cloud_hosted, discovering the S3 links using data_links(access="direct"), and even downloading. But I'm not able to use earthaccess to open the data directly from S3 using the VEDA JupyterHub. Could this be because the VEDA JupyterHub is associated with a role for Earthdata cloud access?
Right now this is how the code is executing:
earthaccess responds it can't open the dataset, even though this code was run in-region. I'm using the VEDA hub with direct access so I can resort to using xarray + s3fs to open the link, but having earthaccess.open work for direct access would be good to add for in-region users who are not using a NASA-managed hub like VEDA.
Ideally, this search and open would be like:
This is very much the example from the README (minus the access="direct" parameter), but, at least in the VEDA JupyterHub, results and .open are using an HTTPFileSystem, not S3.
Perhaps the issue is that it's not recognizing that the code is being run in-region?
Apologies if I missed something about how the library is supposed to work!
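One way to narrow down the in-region question is to look at what the environment itself reports. The sketch below is only a diagnostic aid, not how earthaccess decides anything: `detected_region` and `looks_in_region` are hypothetical helpers, the `AWS_REGION` and `AWS_DEFAULT_REGION` variables may not be set on every hub, and `us-west-2` is assumed to be the region hosting the Earthdata cloud buckets.

```python
import os
from typing import Mapping, Optional

# Assumption: the Earthdata cloud buckets discussed here live in us-west-2.
EARTHDATA_CLOUD_REGION = "us-west-2"


def detected_region(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the region named by the standard AWS environment variables, if any."""
    return env.get("AWS_REGION") or env.get("AWS_DEFAULT_REGION")


def looks_in_region(env: Mapping[str, str] = os.environ) -> bool:
    """True only when the environment explicitly reports the assumed region."""
    return detected_region(env) == EARTHDATA_CLOUD_REGION
```

A result of `False` does not prove the code is out of region, since the variables may simply be unset; it only means the environment gives no positive signal.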