Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Block-mapped resample with the help of flox #1848

Open
wants to merge 28 commits into
base: main
Choose a base branch
from
Open

Conversation

aulemahal
Copy link
Collaborator

@aulemahal aulemahal commented Jul 19, 2024

Pull Request Checklist:

  • This PR addresses an already opened issue (for bug fixes / features)
    • This PR fixes #xyz
  • Tests for the changes have been added (for bug fixes / features)
    • (If applicable) Documentation has been added / updated (for bug fixes / features)
  • CHANGELOG.rst has been updated (with summary of main changes)
    • Link to issue (:issue:number) and pull request (:pull:number) has been added

What kind of change does this PR introduce?

Implements resample_map. This function is meant for all da.resample(...).map(...) calls. These, flox cannot improve automatically so we use some flox logic to help. The idea is to map the resample-map construct on each block in parallel. This is possible by first rechunking the array so that chunks boundary fit with resampling period boundaries (this is a flox function).

The main improvement should come from the fact that map_blocks hides much of the complexity to dask, so the resulting graph is much lighter. I still have to better test the performance of this. My goal would be to have some short text in xclim's doc that highlights when the option is useful and when it is not. The option is activated through set_options.

The current function works only when the input object is of the same type as the output one. So some functions couldn't be wrapped with this yet. The most important untouched code for the moment is the missing checks where I think this could help a lot.

Does this PR introduce a breaking change?

It should not. This is completely optional.

Other information:

In progress, I still need to prove the performance boost.

This depends on #1845 because I need all improvements for PC.

@github-actions github-actions bot added the indicators Climate indices and indicators label Jul 19, 2024
@aulemahal aulemahal changed the title Resample map Block-mapped resample-map with the help of flox Jul 19, 2024
pyproject.toml Outdated Show resolved Hide resolved
Base automatically changed from generic-season to main August 1, 2024 20:53
@aulemahal aulemahal changed the title Block-mapped resample-map with the help of flox Block-mapped resample with the help of flox Sep 6, 2024
@aulemahal aulemahal marked this pull request as ready for review September 6, 2024 18:09
Copy link
Collaborator

@Zeitsperre Zeitsperre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm far from an expert here, but the implementation looks good!


Returns
-------
clean_obj :
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any?

xclim/indices/_threshold.py Outdated Show resolved Hide resolved
xclim/indices/helpers.py Outdated Show resolved Hide resolved
xclim/indices/helpers.py Outdated Show resolved Hide resolved
xclim/indices/helpers.py Outdated Show resolved Hide resolved
Comment on lines +32 to +35
try:
import flox
except ImportError:
flox = None
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be better if we were adding flox to the tox configuration, then parametrizing the tests to use flox if it's there? I believe we have one build that tests the extras recipe already.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but my current "testing" only consist in hijacking an existing test and enabling the new option in a certain case. A importorskip('flox') would be much better, but I'll do that on a test specific to resample_map. To come in the next working days...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, that sounds fine to me! Just keep it in mind.

@github-actions github-actions bot added the approved Approved for additional tests label Sep 6, 2024
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have an alternative suggestion here (and would save me from opening a PR):

  1. Add flox (with a baseline pin version) to the extras recipe.
  2. Add flox and pot >=0.9.4 to the environment.yml.
  3. Add fastnanquantile >=0.0.2 to the environment.yml section for pip dependencies.
  4. Remove flox from the DEP001 list.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Approved for additional tests indicators Climate indices and indicators
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants