Opening virtual datasets (dmr-adapter) #606
earthaccess/virtualizarr.py
Outdated
Exception
If the DMR++ file is not found or if there is an error parsing the DMR++
"""
from virtualizarr.readers.dmrpp import DMRParser
It's interesting that you're not actually using the `filetype='dmr++'` option to `virtualizarr.open_virtual_dataset` here. It seems to me that one alternative option would be for everything in zarr-developers/VirtualiZarr#113 to also live in this library, as it already pretty much entirely uses public virtualizarr API... But I guess that depends on whether you think the dmr++ option to `virtualizarr.open_virtual_dataset` is likely to be useful outside of the context of the `earthaccess` library.
The only reason is because the parser has an additional kwarg `data_filepath` which is required in cases where the dmr path cannot be simply derived by just adding `.dmrpp`. If there is a way to pass engine-specific args to `virtualizarr.open_dataset`, then I can switch to that.
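To make the two cases concrete, here is a minimal sketch of the path logic being described. The helper `resolve_paths` and the example URLs are hypothetical and are not part of virtualizarr or earthaccess; the sketch only assumes the convention mentioned above, where the DMR++ file sits next to the data file with `.dmrpp` appended.

```python
from typing import Optional, Tuple

DMRPP_SUFFIX = ".dmrpp"


def resolve_paths(dmrpp_path: str, data_filepath: Optional[str] = None) -> Tuple[str, str]:
    """Return (dmrpp_path, data_path) for a DMR++ file.

    Hypothetical helper: if data_filepath is given it wins; otherwise the
    data path is derived by stripping the ".dmrpp" suffix.
    """
    if data_filepath is not None:
        # The DMR++ file and the data file live in unrelated locations.
        return dmrpp_path, data_filepath
    if not dmrpp_path.endswith(DMRPP_SUFFIX):
        raise ValueError(f"cannot derive a data path from {dmrpp_path!r}")
    # Co-located case: "<data path>.dmrpp" -> "<data path>"
    return dmrpp_path, dmrpp_path[: -len(DMRPP_SUFFIX)]


# Co-located: the data path can be derived from the DMR++ path.
colocated = resolve_paths("s3://bucket/air.nc.dmrpp")

# Independent locations: the data path has to be supplied explicitly.
separate = resolve_paths("s3://air.dmrpp", data_filepath="s3://datafiles/air.nc")
```

In the second call there is no way to recover `s3://datafiles/air.nc` from the DMR++ path alone, which is the situation the extra kwarg is meant to cover.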
> The only reason is because the parser has an additional kwarg `data_filepath` which is required in cases where the dmr path cannot be simply derived by just adding `.dmrpp`.

I'm not sure I understand - why is the main filepath you pass to `virtualizarr.open_virtual_dataset` not sufficient?
For this case: `virtualizarr.open_dataset(filepath="s3://air.dmrpp", data_filepath="s3://datafiles/air.nc", engine="dmr++")`, where the dmr path is independent of the data path. The chunk manifest needs to store the `data_filepath` instead of the dmr filepath.
Huh - does the dmr++ data not contain the path to the original data??
No, the dmr file only contains the file name and not the full path. This is an example of the name that a dmr file contains: `name="20210715090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc"`
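As a rough illustration of that point, the sketch below reads the `name` attribute from a heavily simplified, hand-written DMR++ root element using only the standard library. The XML snippet is an assumption for illustration (a real DMR++ file carries many more attributes and child elements); only the `name` value is taken from the comment above.

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for the root element of a DMR++ file (assumption:
# real files contain far more content than this).
DMR_SNIPPET = """
<Dataset xmlns="http://xml.opendap.org/ns/DAP/4.0#"
         name="20210715090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc">
</Dataset>
"""

root = ET.fromstring(DMR_SNIPPET.strip())

# The attribute itself is not namespaced, so a plain lookup works.
dataset_name = root.get("name")

# A bare file name has no scheme and no directory component, so it cannot
# be used on its own to locate the data file.
is_full_path = "://" in dataset_name or "/" in dataset_name
```

Here `is_full_path` comes out false, so the location of the data file has to come from somewhere else.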
I noticed that path renaming was added to virtualizarr, which can solve this issue. I will just call `vz.open_dataset` and then rename the data paths using `earthaccess` results. Then I can switch to the public `virtualizarr` API now and remove the `_parse_dmr` function.
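The idea of renaming after opening can be sketched with plain dictionaries. This is not VirtualiZarr's path-renaming feature itself: `rename_manifest_paths` is a hypothetical helper, the manifest is a plain dict loosely modeled on chunk entries with `path`, `offset`, and `length`, and `url_by_name` stands in for whatever full URLs the `earthaccess` search results provide.

```python
from posixpath import basename
from typing import Dict


def rename_manifest_paths(
    manifest: Dict[str, dict], url_by_name: Dict[str, str]
) -> Dict[str, dict]:
    """Return a copy of the manifest with bare file names replaced by full URLs.

    Entries whose file name is not in url_by_name keep their original path.
    """
    renamed = {}
    for chunk_key, entry in manifest.items():
        file_name = basename(entry["path"])
        new_path = url_by_name.get(file_name, entry["path"])
        # Copy the entry so the input manifest is left untouched.
        renamed[chunk_key] = {**entry, "path": new_path}
    return renamed


FILE_NAME = "20210715090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc"

# Manifest as it might look right after parsing: only the file name is known.
manifest = {
    "0.0": {"path": FILE_NAME, "offset": 0, "length": 100},
    "0.1": {"path": FILE_NAME, "offset": 100, "length": 100},
}

# Hypothetical mapping built from search results (the bucket is made up).
url_by_name = {FILE_NAME: "s3://example-bucket/mur/" + FILE_NAME}

renamed = rename_manifest_paths(manifest, url_by_name)
```

Only the `path` field changes; byte offsets and lengths are carried over as-is, which is why the rename can happen after the DMR++ file has been parsed.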
So what does that imply for my original question:
> It seems to me that one alternative option would be for everything in zarr-developers/VirtualiZarr#113 to also live in this library, as it already pretty much entirely uses public virtualizarr API...
@betolink How about if I move the parser code into `earthaccess`? Now that I'm thinking about it, it makes more sense for it to be in the NASA-related repository. It would also make unit tests easier, since `earthaccess` can easily access NASA dmrpp files. Then this PR will add `dmrpp.py` and `virtualizarr.py`.
This PR looks good to me (we need to fix some minor formatting issues with Ruff). Maybe the only missing thing would be a notebook demonstrating how to use this feature? @ayushnag
https://gist.github.com/ayushnag/bcf946a71122f5e7a54bc72b581bd31b
Better viewing experience: https://nbviewer.org/gist/ayushnag/bcf946a71122f5e7a54bc72b581bd31b
If there's more you want me to add, or if any step is unclear, I can update the notebook.
virtualizarr version (with numpy 2.0 manifest)
📚 Documentation preview 📚: https://earthaccess--606.org.readthedocs.build/en/606/