Add fsspec functionality to viirs_sdr reader #2534
Looks good to me, but I haven't been a big part of these types of changes in Satpy. I'm hoping @pnuu or @mraspaud can take a look at this before it gets merged. We may want to have some tests added for this, but I'm not sure if we required it for the changes to the NetCDF helper file handler.
Thanks so much for starting the work on this though. This will be great to have.
Hm looks like these changes are breaking the
I don't see any mocking in the tests so this looks like it is purely these changes that are breaking things. I could be wrong too...
Hello! Since I was mentioned here related to access to HDF5 files in the cloud, here's the link to my recent presentation about cloud-optimized HDF5 files (applies to netCDF-4 files, too). I hope you find it useful.
@martin-rdz did you check why your changes broke the epic tests?
Not yet. Sorry, I was occupied with other projects.
That would be great!
c6fb728 to ed9948f
I tried to dig into the issue. From my understanding, I reckon, this issue would also occur if a In my test-environment In case that did not fix the issue, I can have an even deeper look. |
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2534      +/-   ##
==========================================
+ Coverage   95.39%   95.40%   +0.01%
==========================================
  Files         371      371
  Lines       52690    52796     +106
==========================================
+ Hits        50263    50370     +107
+ Misses       2427     2426       -1
pre-commit.ci autofix
Pull Request Test Coverage Report for Build 7469856199
Warning: This coverage report may be inaccurate.
We've detected an issue with your CI configuration that might affect the accuracy of this pull request's coverage report.
💛 - Coveralls
31104fb to e9e5fcb
This looks reasonable to me. I'm a little concerned that the new if statement will break something, but as far as I can tell it should only fix things that we didn't know were wrong. I guess the last question for me is how hard would it be to add a test for this change? I'm not sure we need to test data actually coming from S3, but something local wrapped in an fsspec or FSFile object would be good. We already have helpers used by the Scene to do this conversion (s3:// URL to FSFile):
But this could be done with a local file too.
A test to make sure that a Path object isn't opened when passed to the HDF5 file handler would be good too.
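A minimal sketch of the "local file wrapped in fsspec" idea, assuming fsspec and h5py are installed. The file name and the dataset name "radiance" are made up for illustration, and this is not the test that ended up in the PR. It writes a tiny HDF5 file, opens it through fsspec, and hands the resulting file object to h5py.File:

```python
import pathlib
import tempfile

import fsspec
import h5py


def write_tiny_hdf5(path):
    """Write a one-dataset HDF5 file (dataset name is illustrative only)."""
    with h5py.File(str(path), "w") as h5:
        h5.create_dataset("radiance", data=[1, 2, 3])


def read_via_fsspec(path):
    """Open a local file through fsspec and pass the file object to h5py."""
    open_file = fsspec.open(str(path))  # fsspec OpenFile, default mode "rb"
    file_obj = open_file.open()         # binary file-like object
    try:
        with h5py.File(file_obj, "r") as h5:
            return h5["radiance"][...].tolist()
    finally:
        file_obj.close()


with tempfile.TemporaryDirectory() as tmp_dir:
    hdf5_path = pathlib.Path(tmp_dir) / "tiny.h5"
    write_tiny_hdf5(hdf5_path)
    values = read_via_fsspec(hdf5_path)
```

Nothing here touches S3; the same file object type could in principle come from a remote filesystem, which is why a local stand-in may be enough for a unit test.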
For sure I can have a look at the tests as well. Though I am not very experienced in how to inject the dummy files into the test environment.
It will be difficult to test with VIIRS SDR files because they are so complex. It may be better (and I'm OK with this) to create a fake file handler based on the HDF5 handler and in your test setup (you could create a pytest fixture as well) create a very very basic HDF5 file to give to it. I don't remember but you might be able to get away with creating the HDF5 handler itself and not a subclass, but there may be some abstract base class stuff that will be mad that some methods aren't implemented. You should be able to then make Mock versions of some filename inputs (a Path-like object, a FSFile, etc) and in your test assert that things are called in a certain order. For example, make sure Let me know if I can provide more guidance. I won't have much time to help actually write the tests, but I'll try to help where I can. |
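The mock-based approach described above could look roughly like the following. Both open_file_or_filename_sketch and FakeHDF5Handler are hypothetical stand-ins written for this example; they only imitate the kind of dispatch discussed in this PR and are not Satpy's actual code. The sketch uses only the standard library:

```python
import pathlib
from unittest import mock


def open_file_or_filename_sketch(unknown_file_thing):
    """Hypothetical helper: leave Path objects alone, open anything that can be opened."""
    if isinstance(unknown_file_thing, pathlib.Path):
        return unknown_file_thing
    try:
        return unknown_file_thing.open()
    except AttributeError:
        # e.g. a plain string filename has no .open() method
        return unknown_file_thing


class FakeHDF5Handler:
    """Hypothetical handler that only records what it would give to h5py.File."""

    def __init__(self, filename):
        self.file_arg = open_file_or_filename_sketch(filename)


def test_path_is_not_opened():
    path = pathlib.Path("fake_file.h5")
    with mock.patch.object(pathlib.Path, "open") as path_open:
        handler = FakeHDF5Handler(path)
    path_open.assert_not_called()
    assert handler.file_arg == path


def test_fs_file_like_is_opened_once():
    fs_file = mock.Mock(name="FSFile-like")
    handler = FakeHDF5Handler(fs_file)
    fs_file.open.assert_called_once_with()
    assert handler.file_arg is fs_file.open.return_value
```

Under pytest these functions would be collected automatically; a real test would additionally point the handler at a small HDF5 file created in a fixture.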
I'm running into the main issue you ran into here with passing a
Apologies for the long delay, but I was occupied with other issues.
@martin-rdz Thanks for coming back to this. As for the location, I would put it in the test_viirs_sdr.py file.
@mraspaud Is this true?
@martin-rdz we may have to check this specifically, but it may be that the h5netcdf engine of xarray (
OK, what are our cases? Readers are given:
I tested the same with h5py.File and all was about as expected.
Testing
In [1]: import fsspec
In [2]: import xarray as xr
In [3]: fn = "/data/satellite/abi/20181127/OR_ABI-L1b-RadC-M3C01_G16_s20183291257184_e20183291259557_c20183291300002.nc"
In [4]: fs_file = fsspec.open(fn)
In [5]: fs_file
Out[5]: <OpenFile '/data/satellite/abi/20181127/OR_ABI-L1b-RadC-M3C01_G16_s20183291257184_e20183291259557_c20183291300002.nc'>
In [6]: import pathlib
In [7]: isinstance(fs_file, pathlib.Path)
Out[7]: False
In [9]: open_fs_file = fs_file.open()
In [10]: open_fs_file
Out[10]: <fsspec.implementations.local.LocalFileOpener at 0x7bcf34137910>
In [11]: open_ds = xr.open_dataset(open_fs_file)
In [12]: open_ds
Out[12]:
<xarray.Dataset>
Dimensions: (y: 3000, x: 5000,
...
In [13]: open_ds = xr.open_dataset(open_fs_file, engine="netcdf4")
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
...
File ~/miniconda3/envs/satpy_py312/lib/python3.12/site-packages/xarray/backends/netCDF4_.py:373, in NetCDF4DataStore.open(cls, filename, mode, format, group, clobber, diskless, persist, lock, lock_maker, autoclose)
370 filename = os.fspath(filename)
372 if not isinstance(filename, str):
--> 373 raise ValueError(
374 "can only read bytes or file-like objects "
375 "with engine='scipy' or 'h5netcdf'"
376 )
378 if format is None:
379 format = "NETCDF4"
ValueError: can only read bytes or file-like objects with engine='scipy' or 'h5netcdf'
In [14]: open_ds = xr.open_dataset(open_fs_file, engine="h5netcdf")
In [15]: p = pathlib.Path(fn)
In [16]: path_ds = xr.open_dataset(p, engine="netcdf4")
In [17]: path_ds.dims
Out[17]: FrozenMappingWarningOnValuesAccess({'y': 3000, 'x': 5000, 'number_of_time_bounds': 2, 'band': 1, 'number_of_image_bounds': 2, 'num_star_looks': 24})
In [27]: open_path = p.open()
In [28]: path_ds = xr.open_dataset(open_path, engine="h5netcdf")
---------------------------------------------------------------------------
UnicodeDecodeError Traceback (most recent call last)
...
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position 0: invalid start byte
In [29]: path_ds = xr.open_dataset(open_path, engine="h5netcdf")
Other comments
@martin-rdz I think this is generally true. Most readers are rather naive and don't assume or implement any "stream" functionality like reading in byte arrays/strings.
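A likely reading of the UnicodeDecodeError in the session above: pathlib.Path.open() defaults to text mode, while an HDF5 file (without a user block) begins with the byte 0x89, which is not a valid UTF-8 start byte. The standard-library sketch below reproduces that effect on a file containing only the 8-byte HDF5 signature. The file name is made up, and the encoding is pinned to UTF-8 so the result does not depend on the locale:

```python
import pathlib
import tempfile

# The 8-byte signature at the start of an HDF5 file (netCDF-4 files are HDF5 files).
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def read_signature_as_text(path):
    """Read in text mode; return the decode error message, or None if there was none."""
    try:
        with path.open(encoding="utf-8") as text_file:
            text_file.read(8)
    except UnicodeDecodeError as err:
        return str(err)
    return None


def read_signature_as_bytes(path):
    """Read the first 8 bytes in binary mode."""
    with path.open("rb") as binary_file:
        return binary_file.read(8)


with tempfile.TemporaryDirectory() as tmp_dir:
    fake_path = pathlib.Path(tmp_dir) / "signature_only.h5"
    fake_path.write_bytes(HDF5_SIGNATURE)
    text_error = read_signature_as_text(fake_path)   # a decode error message
    binary_head = read_signature_as_bytes(fake_path)  # the raw signature bytes
```

This suggests that a Path opened with p.open("rb") would behave more like the fsspec file object, although that variant was not tried in the session.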
Sorry for all the comments. I see @mraspaud suggested adding tests to the reader for this logic, but I disagree with this. If the reader(s) were ever extracted out of Satpy (or just this VIIRS SDR reader) then we'd lose out on all of this important testing. I'll see if I can add some testing for this.
following suggestion by @djhoese
e9e5fcb to d3fe3fe
Ok I added some tests and refactored others. Turns out I didn't need to refactor the FSFile tests to write mine, but...well...that's done now. All of the
LGTM, thanks for the fix and tests @martin-rdz @djhoese !
I would just skip the file deletions when pytest's tmp_path or tmp_path_factory are used.
Assuming tests pass, does anyone else have more comments? Otherwise I can merge this tomorrow (or someone else can).
Add the option to read AWS S3 files in the viirs_sdr reader, similarly to the abi_l1b version, as described in the documentation. h5py is also fine with the return value of satpy.readers.open_file_or_filename. (Thanks to @ajelenak for the inspiration in his Jupyter notebook.)
I am not sure, though, where to properly document the added functionality.
Cheers,
martin
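For orientation, remote reading with this change might be invoked along the lines of the abi_l1b example in the Satpy documentation. The sketch below only assembles the keyword arguments. The bucket, the object path, and the helper name build_remote_scene_kwargs are all hypothetical, and the commented-out Scene call assumes Satpy and s3fs are installed and that the URL points at real VIIRS SDR files:

```python
def build_remote_scene_kwargs(urls, anonymous=True):
    """Hypothetical helper: collect Scene keyword arguments for remote viirs_sdr files."""
    return {
        "reader": "viirs_sdr",
        "filenames": list(urls),
        # storage_options are passed through to fsspec; "anon" requests anonymous S3 access
        "reader_kwargs": {"storage_options": {"anon": anonymous}},
    }


# Placeholder URL, not a real object
example_urls = ["s3://example-bucket/path/to/SVM01_example.h5"]
scene_kwargs = build_remote_scene_kwargs(example_urls)

# With real URLs, something like the following should then be possible:
# from satpy import Scene
# scn = Scene(**scene_kwargs)
```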