Improve MBCn (Npdf_transform and related functions) #1551

Open
6 of 14 tasks
coxipi opened this issue Dec 7, 2023 · 0 comments
Labels
enhancement New feature or request

Addressing a Problem?

Currently, MBCn is not fast enough to be realistically used on our servers for

  • an ensemble of ~96 simulations
  • three variables pr, tasmin, tasmax
  • ~40000 spatial points over 30 years grouped as dayofyear-31.

This may be a performance issue that could be addressed with a better algorithm, or a slowdown because dask is overwhelmed, in which case it is more about how we organize computations and tasks.

Potential Solution

This is a list of ideas stemming from the xclim hackathon on MBCn:

  • Better algorithms

    • Faster interp method: "linear" interpolation can be 5x faster than "nearest" in some test cases. It would be good to confirm that we also get improvements in the case of interest.
    • Use implementations like xclim.core.utils.nan_calc_percentiles? Add numba to the mix.
  • Better organization, less redundancy in operations: reduce the need to recompute the same information many times

    • Simulation, moving window, horizons: Currently, Npdf_transform only accepts ref, hist, sim, which are 30-year datasets with the same dimensions. If we want to compute another horizon, say sim2, we need to re-run everything. One workaround would be to let Npdf_transform know that sim has a dimension time, which spans say 30 years, and another dimension, say horizon, which indexes different future 30-year periods, e.g. horizons = ["1981-2010", "2011-2040", ...]. This kind of dataset can be obtained with construct_moving_yearly_window.
    • Separate training and adjustment in Npdf_transform: Another way to approach this is to make Npdf_transform more modular and separate the training from the adjusting. We would need to keep track of the rotations that are used and of the adjustment factors in each rotation.
    • Pre-compute adjustment factors on a given rank to perform interpolation only once: This could go even further. Currently, interpolation over quantiles and adjustment factors, {q, af_q}, yields a scipy function f(q) which is applied to the ranks of the simulation to perform the QDM adjustment, f(r). But we redo the interpolation to find f each time we want to adjust a given simulation. In reality, this interpolation could be computed once, then reused every time. Since we want to correct simulations with the same number of years (more exactly, with the same number of time points as the reference dataset), the set of ranks on which f must be applied would be known in advance, so we could pre-determine the values of f(r). UPDATE: Unfortunately, because of how ties in ranks are treated (if, say, the 3 smallest values are equal, their ranks are 2,2,2, and not 1,2,3), this idea does not work.
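To make the tie problem from the UPDATE concrete, here is a minimal numpy sketch. `average_ranks` is a hypothetical helper written for this illustration (it is not an xclim function); it assumes the convention described above, where tied values share the mean of the ranks they span.

```python
import numpy as np


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span.

    Hypothetical helper for illustration only, not part of xclim.
    """
    _, inverse, counts = np.unique(values, return_inverse=True, return_counts=True)
    last = np.cumsum(counts)  # highest ordinal rank inside each tie group
    return (last - (counts - 1) / 2.0)[inverse]


# Ranks we would pre-compute f(r) on for a series of 5 time points.
precomputed_grid = np.arange(1, 6)

no_ties = np.array([0.1, 0.4, 0.2, 0.9, 0.5])
three_way_tie = np.array([0.0, 0.0, 0.0, 0.9, 0.5])  # e.g. three dry days
two_way_tie = np.array([0.0, 0.0, 0.2, 0.9, 0.5])

ranks_no_ties = average_ranks(no_ties)      # a permutation of the grid
ranks_three = average_ranks(three_way_tie)  # ranks 1 and 3 never occur
ranks_two = average_ranks(two_way_tie)      # rank 1.5 is not on the grid at all
```

Without ties, the ranks are always a permutation of 1..N, so f(r) could be tabulated once. With ties, which ranks actually occur depends on the data, so the table cannot be fixed in advance.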

    Mostly done, but not as clean as possible. I would like some validation first. I would say this achieves what we wanted. There is a fast_npdf_train and a fast_npdf_adjust in npdf_np_modular_interp. The training gives adjustment factors that need to be reused in the adjusting part. Also, the adjusting part can receive a simulation dataset with movingwin to treat multiple dimensions at the same time. This could get its own clean class if we judge it's worth it. A lot of the heavy lifting is done in pure numpy, so it's messy. Ideally, we would emulate the performance with a more xclim-y approach, so maybe we don't want to promote the ugly numpy hacks to real xclim implementations.
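For discussion, the train/adjust split could look roughly like the sketch below. This is not the code of fast_npdf_train / fast_npdf_adjust: the names `train_npdf`, `adjust_npdf` and `apply_step` are hypothetical, the rotations are random orthogonal matrices from a QR decomposition, and the adjustment is a simplified additive, rank-based stand-in for QDM. The point is only that training stores one (rotation, adjustment factors) pair per iteration, and that adjusting replays them on any horizon without retraining.

```python
import numpy as np


def apply_step(data, rot, q, af):
    """Rotate, add rank-interpolated adjustment factors, rotate back."""
    rotated = rot @ data  # shape (n_vars, n_time)
    n_time = rotated.shape[1]
    # Empirical rank of each value within its own series, scaled into (0, 1).
    ranks = (np.argsort(np.argsort(rotated, axis=1), axis=1) + 0.5) / n_time
    adjusted = np.stack(
        [row + np.interp(r, q, a) for row, r, a in zip(rotated, ranks, af)]
    )
    return rot.T @ adjusted


def train_npdf(ref, hist, n_iter=10, n_quantiles=50, seed=0):
    """Return the quantile grid and one (rotation, af) pair per iteration."""
    rng = np.random.default_rng(seed)
    q = (np.arange(n_quantiles) + 0.5) / n_quantiles
    n_vars = ref.shape[0]
    steps = []
    for _ in range(n_iter):
        # Orthogonal matrix from the QR decomposition of a Gaussian matrix.
        rot, _r = np.linalg.qr(rng.standard_normal((n_vars, n_vars)))
        ref_q = np.quantile(rot @ ref, q, axis=1).T  # (n_vars, n_quantiles)
        hist_q = np.quantile(rot @ hist, q, axis=1).T
        af = ref_q - hist_q  # additive adjustment factors per quantile
        steps.append((rot, af))
        hist = apply_step(hist, rot, q, af)  # hist must follow the iterations
    return q, steps


def adjust_npdf(sim, q, steps):
    """Replay the stored rotations and adjustment factors on a simulation."""
    for rot, af in steps:
        sim = apply_step(sim, rot, q, af)
    return sim


# Synthetic data: 3 variables, 2000 time points (made-up values).
rng = np.random.default_rng(42)
ref = rng.normal(loc=np.array([[0.0], [5.0], [10.0]]), scale=1.0, size=(3, 2000))
hist = rng.normal(loc=np.array([[3.0], [4.0], [12.0]]), scale=2.0, size=(3, 2000))

q, steps = train_npdf(ref, hist)  # trained once
horizons = {"2011-2040": hist + 1.0, "2041-2070": hist + 2.0}
adjusted = {name: adjust_npdf(sim, q, steps) for name, sim in horizons.items()}
```

Here the horizons are looped over in Python; with a real horizon (or movingwin) dimension, the same stored steps could be applied along that dimension in one vectorized call.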

  • Rotations

    • Continue to explore the choice of optimal rotations and the convergence of the Npdf-transform
    • Use PCA instead?
  • Understanding better what's going on

    • Study dask graphs with and without Npdf_transform (in the case without, just replace with a dummy_Npdf_transform that does nothing)
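For the "faster interp method" idea listed under better algorithms, a small timing harness like the one below could be used to check the speedup on inputs shaped like the case of interest. The two interpolators are plain numpy stand-ins written for this sketch (`interp_linear`, `interp_nearest` and `best_time` are hypothetical names), not the scipy interpolation currently used, so the ratio measured here says nothing definitive about xclim itself. The 930-point sample size assumes a 31-day window over 30 years.

```python
import time

import numpy as np


def interp_linear(x, xp, fp):
    """Linear interpolation, clamped to the end values outside xp."""
    return np.interp(x, xp, fp)


def interp_nearest(x, xp, fp):
    """Nearest-neighbour lookup on a sorted xp (ties go to the left node)."""
    idx = np.clip(np.searchsorted(xp, x), 1, len(xp) - 1)
    pick_left = (x - xp[idx - 1]) <= (xp[idx] - x)
    return fp[np.where(pick_left, idx - 1, idx)]


def best_time(func, *args, repeat=5):
    """Best wall-clock time over a few repetitions, in seconds."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        func(*args)
        best = min(best, time.perf_counter() - start)
    return best


rng = np.random.default_rng(0)
q = np.linspace(0.01, 0.99, 50)            # quantiles
af = rng.normal(size=q.size)               # made-up adjustment factors
ranks = rng.uniform(0.0, 1.0, size=930)    # 31 days x 30 years (assumed)

timings = {
    "linear": best_time(interp_linear, ranks, q, af),
    "nearest": best_time(interp_nearest, ranks, q, af),
}
```

Swapping the two stand-ins for the actual interpolation calls, and `ranks` for real ranks of one group, would give the comparison on the case of interest.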

This is less on the side of xclim, but:
Using month instead of dayofyear could be a "reduce expectations" solution.
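A back-of-the-envelope count of how much work the grouping choice implies, using the numbers from the problem statement. It assumes one independent transform per simulation, per spatial point and per group, a 365-day calendar, and that "dayofyear-31" means a 31-day window around each day; all three are assumptions of this sketch.

```python
n_sims = 96
n_points = 40_000
n_years = 30

# Number of groups per year for each grouping (365-day calendar assumed).
groups = {"dayofyear": 365, "month": 12}

# One transform per simulation, per spatial point, per group (assumed).
transforms = {name: n_sims * n_points * n for name, n in groups.items()}

# Time points entering each group.
samples_per_group = {
    "dayofyear": 31 * n_years,                # 31-day window (assumed)
    "month": round(365 / 12 * n_years),       # average month length
}

reduction = transforms["dayofyear"] / transforms["month"]
```

Under these assumptions each group sees a similar number of time points either way, but monthly grouping needs roughly 30 times fewer transforms.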

Additional context

No response

Contribution

  • I would be willing/able to open a Pull Request to contribute this feature.

Code of Conduct

  • I agree to follow this project's Code of Conduct
@coxipi coxipi added the enhancement New feature or request label Dec 7, 2023
@coxipi coxipi self-assigned this Dec 7, 2023