Make NetCDF file cache handling compatible with dask distributed #2822

Open
wants to merge 25 commits into
base: main

Commits on Jun 14, 2024

  1. 7f6a8d4
  2. 6d31c20
  3. Start utility function for distributed friendly

    Start work on a utility function to get a dask array from a dataset
    variable in a way that is friendly to dask.distributed.
    gerritholl committed Jun 14, 2024
    1e26d1a
  4. Parameterise test and simplify implementation

    For the distributed-friendly dask array helper, parameterise the test
    to cover more cases.  Simplify the implementation.
    gerritholl committed Jun 14, 2024
    be40c5b
  5. Force shape and dtype. First working prototype.

    We need to force the shape and the dtype when getting the
    dask-distributed-friendly xarray-dataarray.  Seems to have a first
    working prototype now.
    gerritholl committed Jun 14, 2024
    cbd00f0
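
The "force shape and dtype" step in commit cbd00f0 can be illustrated with a minimal sketch. It is not the code from this pull request: `SOURCE` and `read_variable` are hypothetical stand-ins for a NetCDF variable and for the function that would open the file inside the task. The sketch assumes that `chunks`, `dtype` and `meta` are passed explicitly to `dask.array.map_blocks`, so that dask knows the output shape and type without reading any data.

```python
import dask.array as da
import numpy as np

# Hypothetical stand-in for a variable stored in a NetCDF file.
SOURCE = np.arange(12, dtype="float32").reshape(3, 4)


def read_variable():
    """Pretend to open the file and read the whole variable inside the task."""
    return SOURCE.copy()


# With no input arrays, map_blocks cannot infer the output layout itself,
# so the shape (via chunks) and the dtype are stated explicitly.
arr = da.map_blocks(
    read_variable,
    chunks=((3,), (4,)),                    # one block covering the full 3x4 variable
    dtype=SOURCE.dtype,
    meta=np.array((), dtype=SOURCE.dtype),  # empty array carrying only the type
)

result = arr.compute()
```

Because the reading function is only called when the graph is computed, the resulting array holds no open file object of its own, which is the property a distributed scheduler needs when it serializes tasks.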

Commits on Jun 20, 2024

  1. Add group support and speed up tests

    Add group support for getting a dask distributed friendly dask array.
    Speed up the related tests by sharing the dask distributed client setup
    and breakdown.
    gerritholl committed Jun 20, 2024
    af4ee66
  2. Add partial backward-compatibility for file handle

    Add partial backward compatibility for accessing the file handle
    attribute when using caching with a NetCDF4FileHandler base class.
    Backward compatibility is not 100%.  Deleting the FileHandler closes
    the manager and therefore the ``file_handle`` property.  However, when
    accessing the ``file_handle`` property after deleting the
    ``FileHandler``, it is reopened.  Therefore, calling ``__del__()``
    manually and then accessing ``fh.file_handle`` will now return an open file
    (was a closed file).  This should not happen in any sane use scenario.
    gerritholl committed Jun 20, 2024
    dad3b14
  3. Respect auto_maskandscale with new caching

    With the new dask-distributed-friendly caching, make sure we are
    respecting auto_maskandscale and are not applying scale factors twice.
    gerritholl committed Jun 20, 2024
    fc58ca4
  4. Remove needless except block

    Remove a dead code except block that should never be reached.
    gerritholl committed Jun 20, 2024
    09c821a
  5. Test refactoring

    Migrate TestNetCDF4FileHandler from unittest.TestCase to a regular
    class.  Use a pytest fixture for the temporary NetCDF file.
    gerritholl committed Jun 20, 2024
    4f9c5ed
  6. Broaden test match string for test_filenotfound

    Broaden the string that is matched against in
    TestNetCDF4FileHandler.test_filenotfound.  On Linux and macOS the
    expected failure gives "No such file or directory".  On Windows it gives
    "Invalid file format".
    gerritholl committed Jun 20, 2024
    ec76fa6
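
The file-handle behaviour described in commit dad3b14 (a handle that is reopened when accessed after the handler was deleted) can be sketched with the standard library alone. Every name below (`ReopeningManager`, `Handler`) is hypothetical and only mimics the described behaviour; it is not the implementation in this pull request and does not use NetCDF.

```python
import os
import tempfile


class ReopeningManager:
    """Hypothetical stand-in for a caching file manager."""

    def __init__(self, path):
        self._path = path
        self._file = None

    def acquire(self):
        # Open lazily, and open again if the cached handle was closed.
        if self._file is None or self._file.closed:
            self._file = open(self._path, "rb")
        return self._file

    def close(self):
        if self._file is not None:
            self._file.close()


class Handler:
    """Hypothetical file handler exposing a ``file_handle`` property."""

    def __init__(self, path):
        self._manager = ReopeningManager(path)

    @property
    def file_handle(self):
        return self._manager.acquire()

    def __del__(self):
        self._manager.close()


with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"data")

fh = Handler(tmp.name)
first = fh.file_handle
fh.__del__()                    # manual call, as in the commit message
first_closed = first.closed     # the handle obtained earlier is now closed
reopened = fh.file_handle       # accessing the property opens the file again
reopened_is_open = not reopened.closed
fh.__del__()                    # close again so the file can be removed
os.remove(tmp.name)
```

In this sketch the handle obtained before `__del__()` stays closed, while a later access to `file_handle` yields a fresh, open handle. That is the difference from the earlier behaviour that the commit message points out.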

Commits on Jul 24, 2024

  1. fix docstring example spelling

    Fix the spelling in the docstring example using netCDF4.
    
    Co-authored-by: David Hoese
    gerritholl and djhoese committed Jul 24, 2024
    06d8811
  2. Prevent unexpected type promotion in unit test

    Add a workaround to prevent an unexpected type promotion in the unit
    test for dask distributed friendly dask arrays.
    gerritholl committed Jul 24, 2024
    aaf91b9
  3. Use block info getting a dd-friendly da

    When getting a dask-distributed friendly dask array from a NetCDF file
    using the CachingFileManager, use the information provided in block_info
    on the array location in case we are not reading the entire variable.
    gerritholl committed Jul 24, 2024
    a2ad42f
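
The use of block information in commit a2ad42f can be sketched as follows. This is an illustration under assumptions, not the pull request's code: `SOURCE` is a hypothetical in-memory stand-in for the NetCDF variable, and `read_chunk` stands in for the function that would acquire the file from the manager. The sketch relies on the documented `block_info` keyword of `dask.array.map_blocks`, whose `None` entry describes the output block and includes an `"array-location"` list of `(start, stop)` pairs.

```python
import dask.array as da
import numpy as np

# Hypothetical stand-in for a variable stored in a NetCDF file.
SOURCE = np.arange(24, dtype="int16").reshape(4, 6)


def read_chunk(block_info=None):
    """Read only the part of the variable that belongs to this block."""
    # One (start, stop) pair per dimension of the output array.
    location = block_info[None]["array-location"]
    return SOURCE[tuple(slice(start, stop) for start, stop in location)]


# Four blocks of shape (2, 3); each task reads just its own slice.
arr = da.map_blocks(
    read_chunk,
    chunks=((2, 2), (3, 3)),
    dtype=SOURCE.dtype,
    meta=np.array((), dtype=SOURCE.dtype),
)

result = arr.compute()
```

Without the location lookup, every block would return the full variable and the assembled array would have the wrong contents whenever the variable is split into more than one chunk.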

Commits on Jul 25, 2024

  1. Rename to serialisable and remove group argument

    Rename get_distributed_friendly_dask_array to
    get_serialisable_dask_array and remove the group argument, moving the
    responsibility for handling groups to the caller.
    gerritholl committed Jul 25, 2024
    9126bbe
  2. 5e576f9
  3. GB -> US spelling

    Pytroll uses US spelling.  Rename serialisable to serializable.
    
    Remove removed keyword argument from call.
    gerritholl committed Jul 25, 2024
    63e7507
  4. Ensure meta dtype

    Ensure that the meta we pass to map_blocks also has the right dtype.
    Not sure if this is necessary when map_blocks already has the right
    dtype, but it can't hurt.
    gerritholl committed Jul 25, 2024
    ea04595
  5. Merge branch 'main' into bugfix-2815

    Fixing three merge conflicts.
    gerritholl committed Jul 25, 2024
    523671a
  6. Fix spelling in test

    gerritholl committed Jul 25, 2024
    fde3896

Commits on Jul 26, 2024

  1. Clarify docstring

    gerritholl committed Jul 26, 2024
    5b137e8
  2. Use cache already in scene creation

    When caching, make sure we use the CachingFileManager already upon scene
    creation and not only by the time we are loading.
    gerritholl committed Jul 26, 2024
    c2b1533
  3. Use helper function rather than subclass

    Don't subclass netCDF4.Dataset, rather just return an instance from a
    helper function.  Seems good enough and gets rid of the weird error
    messages upon exit.
    gerritholl committed Jul 26, 2024
    9fce5a7
  4. restore non-cached group retrieval

    Some readers read entire groups; this needs xarray kwargs to be set even
    if caching is used.
    gerritholl committed Jul 26, 2024
    4993b65

Commits on Aug 23, 2024

  1. 7c173e7