Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Odd behaviour when chunks are used along quantiles and time grouper dimensions in SDBA EQM. #1481

Closed
1 task done
isaachcamp opened this issue Sep 26, 2023 · 2 comments · Fixed by #1482
Closed
1 task done

Comments

@isaachcamp
Copy link

isaachcamp commented Sep 26, 2023

Generic Issue

  • xclim version: 0.45.0
  • xarray version: 2023.8.0
  • Python version: 3.11.5
  • Operating System: CentOS Linux release 7.3.1611

Description

I'm attempting to bias correct precipitation data using the EmpiricalQuantileMapping class from the sdba module. The code runs with no problems, however, I noticed there was some strange results. When plotting the long-term monthly accumulated mean precipitation of adjusted data over a few grid cells, the corrected series looks considerably different to the reference data still -- in some cases worse than the raw data.

The transfer function was applied directly back on to the same years that were used in the training stage, so I would expect the adjusted data to look almost identical to the reference data (with some a smoothing from the monthly grouper and 3-month window).

I believe the cause stems from loading the QM.ds from a ZARR file and leaving the chunks along the quantiles and month dimensions unchanged. If I remove all chunks, or leave chunks along only latitude and longitude dimensions, I get the expected behaviour.

I've attached an image of the long-term monthly accumulated mean precipitation for a few grid cells. The yellow and purple lines (chunks along lat-lon and no chunks respectively) perfectly match, and they closely follow the reference values (VCSN, red line). The blue line shows the adjusted values with chunks along quantiles and month dimensions, and the green line shows the raw values.

xclim_issue_with_chunks

I can recreate this issue by taking two time series as ref and hist, with the following lines:

group = sdba.Grouper("time.month", window=3)
QM = sdba.EmpiricalQuantileMapping.train(
            ref,
            hist,
            nquantiles=50,
            group=group,
            kind="*"
        )

adj_nochunks = QM.adjust(hist])

QM.ds = QM.ds.chunk({"month": 3, "quantiles": 10})
adj_chunks = QM.adjust(hist)

ref.groupby('time.month').mean().plot(label="ref")
adj_nochunks.groupby('time.month').mean().plot(label="nochunks")
adj_chunks.groupby('time.month').mean().compute().plot(label="chunks")
import matplotlib.pyplot as plt; plt.legend()

I get this plot:

xclim_one_cell_chunks

Code of Conduct

  • I agree to follow this project's Code of Conduct
@aulemahal
Copy link
Collaborator

Hi @isaachcamp !

Thank you for raising this issue, this slipped through the testing... Sadly, the way sdba is implemented, chunking along month and quantiles is not supported. The bug here is that it didn't raise an error. The adjust method will interpolate along those dimensions, so they need to be complete in the call. As we use xarray's map_blocks to parallelize the method on chunks, those chunks must not split the "reduced" or "interpolated" dimensions.

map_blocks is used in order to minimize the complexity of the dask graph by uniting all the steps in a single task. While this might affect performance negatively for small datasets, the gain for large ones is significant. (At least it was 2 years ago). However, it does limit the possibilities for chunking. The time coordinate must be complete for train and so do the distribution dimensions (quantiles and the grouping dim ('month', 'dayofyear', etc)).

I'll make PR that adds an error with a meaningful message.

@isaachcamp
Copy link
Author

That's great, thanks @aulemahal!

aulemahal added a commit that referenced this issue Sep 28, 2023
<!--Please ensure the PR fulfills the following requirements! -->
<!-- If this is your first PR, make sure to add your details to the
AUTHORS.rst! -->
### Pull Request Checklist:
- [x] This PR addresses an already opened issue (for bug fixes /
features)
    - This PR fixes #1481
- [x] Tests for the changes have been added (for bug fixes / features)
- [ ] (If applicable) Documentation has been added / updated (for bug
fixes / features)
- [x] CHANGES.rst has been updated (with summary of main changes)
- [x] Link to issue (:issue:`number`) and pull request (:pull:`number`)
has been added

### What kind of change does this PR introduce?
If a `map_blocks`-wrapped function receives input chunked along the
dimensions marked as "reduced", it raises an error.

This means one cannot chunk the training dataset (`QM.ds`, for example)
along the distribution ('quantiles') or group ('month', 'dayofyear')
dimensions.

Previously, this didn't raise an error, but the adjustment would be done
separately for each chunk, yielding incorrect results, silently.

### Does this PR introduce a breaking change?
No.

### Other information:
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants