Opening virtual datasets (dmr-adapter) #606

ayushnag · 2024-06-18T23:33:09Z

Closes Opening virtual datasets with NASA dmrpp files #605
Fix scale_factor, add_offset bug
Find precise permissions URL (not DAAC general)
Add docs
Unit tests. Update current test to test specific portions of the virtual dataset and also test the numpy/dask loaded dataset
Check indirect access support
Update earthaccess documentation
Use updated virtualizarr version (with numpy 2.0 manifest)

📚 Documentation preview 📚: https://earthaccess--606.org.readthedocs.build/en/606/

TomNicholas · 2024-06-19T03:09:45Z

earthaccess/virtualizarr.py

+    Exception
+    If the DMR++ file is not found or if there is an error parsing the DMR++
+    """
+    from virtualizarr.readers.dmrpp import DMRParser


It's interesting that you're not actually using the filetype='dmr++' option to virtualizarr.open_virtual_dataset here. It seems to me that one alternative option would be for everything in zarr-developers/VirtualiZarr#113 to also live in this library, as it already pretty much entirely uses public virtualizarr API... But I guess that depends whether you think the dmr++ option to virtualizarr.open_virtual_dataset is likely to be useful outside of the context of the earthaccess library.

ayushnag · Jun 19, 2024

The only reason is that the parser takes an additional kwarg, data_filepath, which is required when the dmrpp path cannot be derived from the data path by simply appending .dmrpp. If there is a way to pass engine-specific args to virtualizarr.open_dataset, I can switch to that.

TomNicholas · Jun 19, 2024

> The only reason is because the parser has an additional kwarg data_filepath which is required in cases where the dmr path cannot be simply derived by just adding .dmrpp.

I'm not sure I understand - why is the main filepath you pass to virtualizarr.open_virtual_dataset not sufficient?

ayushnag · Jun 19, 2024

For this case: virtualizarr.open_dataset(filepath="s3://air.dmrpp", data_filepath="s3://datafiles/air.nc", engine="dmr++"), where the dmr path is independent of the data path. The chunk manifest needs to store the data_filepath instead of the dmr filepath.
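
To make the two cases concrete, here is a minimal sketch (not code from this PR). The helper name `resolve_data_path` is hypothetical, and the bucket names are placeholders. It shows when stripping `.dmrpp` is enough and when an explicit data path has to be supplied:

```python
from typing import Optional


def resolve_data_path(dmrpp_path: str, data_filepath: Optional[str] = None) -> str:
    """Return the path that chunk references should point at.

    Hypothetical helper for illustration only; it is not part of
    earthaccess or virtualizarr.
    """
    # An explicit data path always wins: the dmrpp file may live elsewhere.
    if data_filepath is not None:
        return data_filepath
    suffix = ".dmrpp"
    if not dmrpp_path.endswith(suffix):
        raise ValueError(f"expected a path ending in {suffix!r}: {dmrpp_path}")
    # Co-located case: the data file is the dmrpp path minus the suffix.
    return dmrpp_path[: -len(suffix)]


# Co-located: the data path can be derived from the dmrpp path.
colocated = resolve_data_path("s3://bucket/air.nc.dmrpp")

# Independent: the dmrpp path says nothing about where the data lives.
independent = resolve_data_path(
    "s3://air.dmrpp", data_filepath="s3://datafiles/air.nc"
)
```

In the second call, deriving the path from `s3://air.dmrpp` alone would not point at the data file, which is why the manifest needs the separately supplied path.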

TomNicholas · Jun 19, 2024

Huh - does the dmr++ data not contain the path to the original data??

ayushnag · Jun 19, 2024

No, the dmr file only contains the file name, not the full path. This is an example of the name that a dmr file contains: name="20210715090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc"
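
As a rough illustration of the point above, the sketch below reads the `name` attribute from a hand-written, heavily reduced stand-in for a DMR++ document. The XML snippet and its namespace URI are assumptions made for the example; a real DMR++ file contains far more structure.

```python
import xml.etree.ElementTree as ET

# Hand-written stand-in for a DMR++ document (assumed layout, for illustration).
DMRPP_SNIPPET = """
<Dataset xmlns="http://xml.opendap.org/ns/DAP/4.0#"
         name="20210715090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc">
</Dataset>
"""

root = ET.fromstring(DMRPP_SNIPPET.strip())

# Unprefixed attributes are not namespace-qualified, so the key is just "name".
file_name = root.attrib["name"]

# The value is a bare file name: no scheme, bucket, or directory component.
has_location = "/" in file_name or "://" in file_name
```

Because `has_location` is false here, the full data URL has to come from somewhere else, for example the search results that located the granule.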

ayushnag · Jun 27, 2024

I noticed that path renaming was added to virtualizarr, which can solve this issue. I will just call vz.open_dataset and then rename the data paths using the earthaccess results. That lets me switch to the public virtualizarr API now and remove the _parse_dmr function.
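
The renaming idea can be sketched without any library. The example below is not virtualizarr's API. It uses a plain dict as an assumed stand-in for a chunk manifest (chunk key mapped to path, offset, and length), and the function name `rename_manifest_paths` is hypothetical:

```python
from posixpath import basename
from typing import Dict


def rename_manifest_paths(
    manifest: Dict[str, dict], name_to_url: Dict[str, str]
) -> Dict[str, dict]:
    """Return a copy of the manifest with bare file names replaced by full URLs.

    Hypothetical helper for illustration; entries whose file name is not in
    name_to_url keep their original path.
    """
    renamed = {}
    for chunk_key, entry in manifest.items():
        file_name = basename(entry["path"])
        new_path = name_to_url.get(file_name, entry["path"])
        # Copy the entry so the input manifest is left untouched.
        renamed[chunk_key] = {**entry, "path": new_path}
    return renamed


# Manifest as it might look right after parsing: only the file name is known.
manifest = {
    "0.0": {"path": "air.nc", "offset": 100, "length": 50},
    "0.1": {"path": "air.nc", "offset": 150, "length": 50},
}

# Full URLs would come from the search results; this mapping is a placeholder.
name_to_url = {"air.nc": "s3://datafiles/air.nc"}

renamed = rename_manifest_paths(manifest, name_to_url)
```

Offsets and lengths are carried over unchanged; only the `path` field is rewritten.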

TomNicholas · Aug 6, 2024

So what does that imply for my original question:

> It seems to me that one alternative option would be for everything in zarr-developers/VirtualiZarr#113 to also live in this library, as it already pretty much entirely uses public virtualizarr API...

ayushnag · Aug 7, 2024

@betolink How about if I move the parser code into earthaccess? Now that I think about it, it makes more sense for it to live in the NASA-related repository. It would also make unit tests easier, since earthaccess can easily access NASA dmrpp files. This PR would then add dmrpp.py and virtualizarr.py.

betolink · 2024-06-20T14:43:33Z

This PR looks good to me (we need to fix some minor formatting issues with Ruff). Maybe the only missing thing would be a notebook demonstrating how to use this feature? @ayushnag

ayushnag · 2024-06-20T17:48:22Z

https://gist.github.com/ayushnag/bcf946a71122f5e7a54bc72b581bd31b

Better viewing experience: https://nbviewer.org/gist/ayushnag/bcf946a71122f5e7a54bc72b581bd31b

If there's more you want me to add or if any step is unclear I can update the notebook

base features

06592f1

ayushnag marked this pull request as draft June 18, 2024 23:33

ayushnag mentioned this pull request Jun 18, 2024

Opening virtual datasets with NASA dmrpp files #605

Open

TomNicholas reviewed Jun 19, 2024

betolink self-assigned this Jun 25, 2024

ayushnag added 4 commits August 6, 2024 22:28

Merge branch 'nsidc:main' into dmr-adapter

c3eafb1

added load param

90f296a

Merge branch 'nsidc:main' into dmr-adapter

dd006a3

add dmrpp parser and test

b626a41
