Add fsspec functionality to `viirs_sdr` reader #2534

martin-rdz · 2023-07-23T16:00:10Z

Adds to Add support for s3 buckets in OLCI and ABI l1 readers #1439
Tests added
Fully documented

Add the option to read AWS S3 files in the viirs_sdr reader, similarly to the abi_l1b version, as described in the documentation.
h5py also is fine with the return of satpy.readers.open_file_or_filename.
(Thanks to @ajelenak for the inspiration in his jupyter notebook).
Though I am not sure where to properly document the added functionality.

Cheers,
martin

djhoese

Looks good to me, but I haven't been a big part of these types of changes in Satpy. I'm hoping @pnuu or @mraspaud can take a look at this before it gets merged. We may want to have some tests added for this, but I'm not sure if we required it for the changes to the NetCDF helper file handler.

Thanks so much for starting the work on this though. This will be great to have.

djhoese · 2023-07-24T15:10:47Z

Hm looks like these changes are breaking the test_epic_l1b_h5.py tests:

FAILED satpy/tests/reader_tests/test_epic_l1b_h5.py::TestEPICL1bReader::test_times - UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position 0: invalid start byte
FAILED satpy/tests/reader_tests/test_epic_l1b_h5.py::TestEPICL1bReader::test_counts_calibration - UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position 0: invalid start byte
FAILED satpy/tests/reader_tests/test_epic_l1b_h5.py::TestEPICL1bReader::test_refl_calibration - UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position 0: invalid start byte
FAILED satpy/tests/reader_tests/test_epic_l1b_h5.py::TestEPICL1bReader::test_bad_calibration - UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position 0: invalid start byte
FAILED satpy/tests/reader_tests/test_epic_l1b_h5.py::TestEPICL1bReader::test_load_ancillary - UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position 0: invalid start byte

I don't see any mocking in the tests so this looks like it is purely these changes that are breaking things. I could be wrong too...

ajelenak · 2023-09-04T16:55:25Z

Hello!

Since I was mentioned here related to access to HDF5 files in the cloud, here's the link to my recent presentation about cloud-optimized HDF5 files (applies to netCDF-4 files, too). I hope you find it useful.

mraspaud · 2023-09-07T14:21:59Z

@martin-rdz did you check why your changes broke the epic tests?

martin-rdz · 2023-09-08T10:15:26Z

Not yet. Sorry was occupied with other projects.
But for sure I can have a look.

mraspaud · 2023-09-08T11:40:14Z

That would be great!

martin-rdz · 2023-09-12T12:10:59Z

I tried to dig into the issue. From my understanding, satpy/tests/reader_tests/test_epic_l1b_h5.py passed the filename (....pytest-10/test_times0/epic_1b_20150613120251_03.h5) as a pathlib.PosixPath.
pathlib posses a .open method, hence open_file_or_filename succeeded in opening the file as a io.TextIOWrapper, which caused trouble downstream.

I reckon, this issue would also occur if a pathlib.Path is provided by the user and decided to fix it within open_file_or_filename with an explicit if statement.

In my test-environment test_epic_l1b_h5.py and test_abi_l1b.py succeeded, but I could not check for all the reader tests as I am lacking some libraries.

In case that did not fix the issue, I can have an even deeper look.

codecov · 2023-09-12T12:18:33Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (442325d) 95.39% compared to head (529de8f) 95.40%.
Report is 13 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2534      +/-   ##
==========================================
+ Coverage   95.39%   95.40%   +0.01%     
==========================================
  Files         371      371              
  Lines       52690    52796     +106     
==========================================
+ Hits        50263    50370     +107     
+ Misses       2427     2426       -1

Flag	Coverage Δ
behaviourtests	`4.15% <0.71%> (-0.01%)`	⬇️
unittests	`96.01% <100.00%> (+0.01%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

djhoese · 2023-09-12T15:16:18Z

pre-commit.ci autofix

satpy/readers/__init__.py

coveralls · 2023-09-12T15:48:46Z

Pull Request Test Coverage Report for Build 7469856199

Warning: This coverage report may be inaccurate.

We've detected an issue with your CI configuration that might affect the accuracy of this pull request's coverage report.
To ensure accuracy in future PRs, please see these guidelines.
A quick fix for this PR: rebase it; your next report should be accurate.

0 of 0 changed or added relevant lines in 0 files are covered.
No unchanged relevant lines lost coverage.
Overall coverage increased (+0.01%) to 95.965%

Totals
Change from base Build 7379871277:	0.01%
Covered Lines:	50496
Relevant Lines:	52619

💛 - Coveralls

djhoese

This looks reasonable to me. I'm a little concerned that the new if statement will break something, but as far as I can tell it should only fix things that we didn't know were wrong. I guess the last question for me is how hard would it be to add a test for this change? I'm not sure we need to test data actually coming from S3, but something local wrapped in an fsspec or FSFile object would be good. We already have helpers used by the Scene to do this conversion (s3:// URL to FSFile):

https://github.com/martin-rdz/satpy/blob/e9e5fcbf1fcaede5e97d4407086a12376fd64944/satpy/utils.py#L660-L675

But this could be done with a local file too.

A test to make sure that a Path object isn't opened when passed to the HDF5 file handler would be good too.

@mraspaud @pnuu any other requests?

martin-rdz · 2023-10-03T21:49:27Z

For sure I can have a look at the tests as well. Tough I am not very experienced on how to inject the dummy files into the test environment.
Might take a while until I get those running.

djhoese · 2023-10-04T14:01:46Z

It will be difficult to test with VIIRS SDR files because they are so complex. It may be better (and I'm OK with this) to create a fake file handler based on the HDF5 handler and in your test setup (you could create a pytest fixture as well) create a very very basic HDF5 file to give to it. I don't remember but you might be able to get away with creating the HDF5 handler itself and not a subclass, but there may be some abstract base class stuff that will be mad that some methods aren't implemented. You should be able to then make Mock versions of some filename inputs (a Path-like object, a FSFile, etc) and in your test assert that things are called in a certain order. For example, make sure .open() is never called on the Path-like object or at least not until it is done by the HDF5/h5py library. Passing an FSFile for a local file (your test file) should also work...I think.

Let me know if I can provide more guidance. I won't have much time to help actually write the tests, but I'll try to help where I can.

djhoese · 2023-10-29T02:31:28Z

I'm running into the main issue you ran into here with passing a Path object and having Satpy try to give xarray.open_dataset a TextIOWrapper rather than the original filename. We should see what we can do to get this PR updated and/or merged.

martin-rdz · 2023-12-02T16:50:45Z

Apologies, for the long delay, but I was occupied with other issues.
I am still trying to figure out what we should actually test for and would appreciate some guidiance.
So the open_file_or_filename should never return something that cannot be handled by xarray.open_dataset, especially no io.TextIOWrapper? Is this generally true for all readers?
That test would then ideally be located in satpy/tests/test_readers.py?

mraspaud · 2023-12-04T09:54:19Z

@martin-rdz Thanks for coming back to this.
In my opinion, the test to add would be to just pass an FSFile instance with eg a local fsspec file in it and make sure that file is actually opened. As @djhoese mentioned earlier, it's probably easier with just a dummy file. The rest of the filehandler is already tested, so that would be enough I think.

As for the location, I would put is in the test_viirs_sdr.py file.

djhoese · 2023-12-04T16:25:38Z

So the open_file_or_filename should never return something that cannot be handled by xarray.open_dataset, especially no io.TextIOWrapper? Is this generally true for all readers?

@mraspaud Is this true?

djhoese · 2024-01-04T17:18:44Z

@martin-rdz we may have to check this specifically, but it may be that the h5netcdf engine of xarray (xr.open_dataset(..., engine="h5netcdf")) can read TextIOWrapper objects or file-like objects. I believe it reads it as a series of bytes. The other engines (ex. netcdf4) can't handle open file-like objects like this last time I checked.

djhoese · 2024-01-04T17:50:38Z

Ok what are our cases, Readers are given:

fsspec OpenFile objects. These are not pathlib.Path subclasses and they have an .open() method which returns a LocalFileOpener (for local files of course). These Opener objects can be passed to xr.open_dataset as long as h5netcdf if available. Xarray will switch the default engine to h5netcdf if it detects a file-like object (see below).
Path objects. These are obviously a Path subclass and have a .open() method which returns a TextIOWrapper. The Path itself is passable to h5netcdf and netcdf4 engines for xr.open_dataset. The open path gives the expected error when passed to the netcdf4 engine. It also gives an error when passed to h5netcdf for some reason. I'm not sure what's going on here. ...oh I need to .open(mode="rb") but that syntax isn't supported by fsspec.
String local filenames. Not Path subclasses and no .open() methods. They work.

I tested the same with h5py.File and all was about as expected. Path works, fsspec Opener works, Path.open(mode="rb") also worked.

Testing

In [1]: import fsspec

In [2]: import xarray as xr

In [3]: fn = "/data/satellite/abi/20181127/OR_ABI-L1b-RadC-M3C01_G16_s20183291257184_e20183291259557_c20183291300002.nc"

In [4]: fs_file = fsspec.open(fn)

In [5]: fs_file
Out[5]: <OpenFile '/data/satellite/abi/20181127/OR_ABI-L1b-RadC-M3C01_G16_s20183291257184_e20183291259557_c20183291300002.nc'>

In [6]: import pathlib

In [7]: isinstance(fs_file, pathlib.Path)
Out[7]: False

In [9]: open_fs_file = fs_file.open()

In [10]: open_fs_file
Out[10]: <fsspec.implementations.local.LocalFileOpener at 0x7bcf34137910>

In [11]: open_ds = xr.open_dataset(open_fs_file)

In [12]: open_ds
Out[12]:
<xarray.Dataset>
Dimensions:                                 (y: 3000, x: 5000,
...

In [13]: open_ds = xr.open_dataset(open_fs_file, engine="netcdf4")
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
...

File ~/miniconda3/envs/satpy_py312/lib/python3.12/site-packages/xarray/backends/netCDF4_.py:373, in NetCDF4DataStore.open(cls, filename, mode, format, group, clobber, diskless, persist, lock, lock_maker, autoclose)
    370     filename = os.fspath(filename)
    372 if not isinstance(filename, str):
--> 373     raise ValueError(
    374         "can only read bytes or file-like objects "
    375         "with engine='scipy' or 'h5netcdf'"
    376     )
    378 if format is None:
    379     format = "NETCDF4"

ValueError: can only read bytes or file-like objects with engine='scipy' or 'h5netcdf'


In [14]: open_ds = xr.open_dataset(open_fs_file, engine="h5netcdf")

In [15]: p = pathlib.Path(fn)

In [16]: path_ds = xr.open_dataset(p, engine="netcdf4")

In [17]: path_ds.dims
Out[17]: FrozenMappingWarningOnValuesAccess({'y': 3000, 'x': 5000, 'number_of_time_bounds': 2, 'band': 1, 'number_of_image_bounds': 2, 'num_star_looks': 24})

In [27]: open_path = p.open()

In [28]: path_ds = xr.open_dataset(open_path, engine="h5netcdf")
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
...

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position 0: invalid start byte

In [29]: path_ds = xr.open_dataset(open_path, engine="h5netcdf")

Other comments

So the open_file_or_filename should never return something that cannot be handled by xarray.open_dataset, especially no io.TextIOWrapper? Is this generally true for all readers?

@martin-rdz I think this is generally true. Most readers are rather naive and don't assume or implement any "stream" functionality like reading in byte arrays/strings.

djhoese · 2024-01-04T18:53:01Z

Sorry for all the comments. I see @mraspaud suggested adding tests to the reader for this logic, but I disagree with this. If the reader(s) were ever extracted out of Satpy (or just this VIIRS SDR reader) then we'd lose out on all of this important testing. I'll see if I can add some testing for this.

@djhoese

following suggestion by @djhoese

djhoese · 2024-01-04T20:35:14Z

Ok I added some tests and refactored others. Turns out I didn't need to refactor the FSFile tests to write mine, but...well...that's done now.

All of the Path cases of my tests failed without the changes from this PR. Otherwise the only thing that surprised me was that I originally wrote the FSFile based opening tests with FSFile(local_filename) and they fail for the same reason I described in my big comment above: the .open call opens them in text mode instead of as bytes. So it fails do decode the binary data as UTF-8. An expected failure, but also something that should probably work. @mraspaud thoughts?

mraspaud

LGTM, thanks for the fix and tests @martin-rdz @djhoese !
I would just skip the file deletions when pytest's tmp_path or tmp_path_factory are used.

satpy/tests/test_readers.py

djhoese · 2024-01-10T02:15:40Z

Assuming tests pass, does anyone else have more comments? Otherwise I can merge this tomorrow (or someone else can).

martin-rdz requested review from djhoese and mraspaud as code owners July 23, 2023 16:00

djhoese approved these changes Jul 24, 2023

View reviewed changes

djhoese added enhancement code enhancements, features, improvements component:readers labels Jul 24, 2023

martin-rdz force-pushed the feature-viirs-fsspec branch from c6fb728 to ed9948f Compare September 12, 2023 11:57

djhoese reviewed Sep 12, 2023

View reviewed changes

satpy/readers/__init__.py Outdated Show resolved Hide resolved

martin-rdz force-pushed the feature-viirs-fsspec branch from 31104fb to e9e5fcb Compare October 3, 2023 09:53

djhoese requested changes Oct 3, 2023

View reviewed changes

djhoese mentioned this pull request Dec 11, 2023

MODIS_L2 reader not working with FSFile #2629

Open

martin-rdz added 2 commits January 4, 2024 13:02

adapt viirs_sdr reader for aws s3

9beba3c

fix issue where open_file_or_filename tries to open pathlib path

130361b

replaced PosixPath and WindowsPath with pathlib.Path

d3fe3fe

following suggestion by @djhoese

djhoese force-pushed the feature-viirs-fsspec branch from e9e5fcb to d3fe3fe Compare January 4, 2024 19:02

djhoese added 3 commits January 4, 2024 14:01

Switch FSFile tests to pytest and fixtures

1c65aaf

Add more details to open_file_or_filename docstring

362c2e9

Add initial open_file_or_filename tests

c5d4f6b

Allow Windows temporary files to fail to be removed

180a0b8

mraspaud approved these changes Jan 9, 2024

View reviewed changes

satpy/tests/test_readers.py Outdated Show resolved Hide resolved

satpy/tests/test_readers.py Outdated Show resolved Hide resolved

satpy/tests/test_readers.py Outdated Show resolved Hide resolved

Remove unnecessary test file removal

64eedbc

Fix contextmanager definition in test_readers.py

529de8f

mraspaud assigned djhoese and martin-rdz Jan 10, 2024

mraspaud merged commit bca22b4 into pytroll:main Jan 10, 2024
17 of 19 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add fsspec functionality to `viirs_sdr` reader #2534

Add fsspec functionality to `viirs_sdr` reader #2534

martin-rdz commented Jul 23, 2023

djhoese left a comment

djhoese commented Jul 24, 2023

ajelenak commented Sep 4, 2023

mraspaud commented Sep 7, 2023

martin-rdz commented Sep 8, 2023

mraspaud commented Sep 8, 2023

martin-rdz commented Sep 12, 2023

codecov bot commented Sep 12, 2023 •

edited

Loading

djhoese commented Sep 12, 2023

coveralls commented Sep 12, 2023 •

edited

Loading

djhoese left a comment

martin-rdz commented Oct 3, 2023

djhoese commented Oct 4, 2023

djhoese commented Oct 29, 2023

martin-rdz commented Dec 2, 2023

mraspaud commented Dec 4, 2023

djhoese commented Dec 4, 2023

djhoese commented Jan 4, 2024

djhoese commented Jan 4, 2024

djhoese commented Jan 4, 2024

djhoese commented Jan 4, 2024

mraspaud left a comment

djhoese commented Jan 10, 2024

Add fsspec functionality to viirs_sdr reader #2534

Add fsspec functionality to viirs_sdr reader #2534

Conversation

martin-rdz commented Jul 23, 2023

djhoese left a comment

Choose a reason for hiding this comment

djhoese commented Jul 24, 2023

ajelenak commented Sep 4, 2023

mraspaud commented Sep 7, 2023

martin-rdz commented Sep 8, 2023

mraspaud commented Sep 8, 2023

martin-rdz commented Sep 12, 2023

codecov bot commented Sep 12, 2023 • edited Loading

Codecov Report

djhoese commented Sep 12, 2023

coveralls commented Sep 12, 2023 • edited Loading

Pull Request Test Coverage Report for Build 7469856199

Warning: This coverage report may be inaccurate.

💛 - Coveralls

djhoese left a comment

Choose a reason for hiding this comment

martin-rdz commented Oct 3, 2023

djhoese commented Oct 4, 2023

djhoese commented Oct 29, 2023

martin-rdz commented Dec 2, 2023

mraspaud commented Dec 4, 2023

djhoese commented Dec 4, 2023

djhoese commented Jan 4, 2024

djhoese commented Jan 4, 2024

Testing

Other comments

djhoese commented Jan 4, 2024

djhoese commented Jan 4, 2024

mraspaud left a comment

Choose a reason for hiding this comment

djhoese commented Jan 10, 2024

Add fsspec functionality to `viirs_sdr` reader #2534

Add fsspec functionality to `viirs_sdr` reader #2534

codecov bot commented Sep 12, 2023 •

edited

Loading

coveralls commented Sep 12, 2023 •

edited

Loading