
Usage on bias correcting extremes #1512

Closed · 1 task done
mubali101 opened this issue Oct 26, 2023 · 10 comments

Labels: sdba (Issues concerning the sdba submodule), support (Questions and help for users/developers)

Comments

@mubali101

mubali101 commented Oct 26, 2023

Setup Information

  • Xclim version: 0.45.0
  • Python version: 3.10.9
  • Operating System: Linux

Context

Edit: I'm trying to understand the usage of xclim.sdba.adjustment.ExtremeValues using the tasmax variable from the tutorial dataset.

  1. Does cluster_thresh mean that the q_thresh percentile is calculated on the values exceeding cluster_thresh? And how can I set a different cluster_thresh for different locations?
  2. How should I modify the usage if I want to calibrate the coldest temperatures instead?
  3. I also saw that you have referenced Julia ClimateTools. Is the xclim implementation also automatically combining empirical quantile mapping and GPD-based corrections, similar to Julia ClimateTools?
import xarray as xr
import xclim
from xclim import sdba
from xclim.core.units import convert_units_to
from xclim.testing import open_dataset

# open tutorial temperature data
dref = open_dataset(
    "sdba/ahccd_1950-2013.nc", chunks={"location": 1}, 
    drop_variables=["lat", "lon"]).sel(time=slice("1980", "2010"))

dref = dref.drop_vars('pr')
dref = dref.assign(
    tasmax=convert_units_to(dref.tasmax, "K"),
)
dsim = open_dataset(
    "sdba/CanESM2_1950-2100.nc", chunks={"location": 1}, drop_variables=["lat", "lon", "pr"]
)

dhist = dsim.sel(time=slice("1981", "2010"))

# train GPD-based correction
ev = sdba.adjustment.ExtremeValues.train(ref=dref.tasmax, hist=dhist.tasmax, cluster_thresh='295 K', q_thresh=0.95)

# adjust
hist_bc = ev.adjust(scen=dref.tasmax, sim=dhist.tasmax)

Thank you!

@mubali101 mubali101 added the support Questions and help for users/developers label Oct 26, 2023
@aulemahal
Copy link
Collaborator

Hi @mubali101 !

cluster_thresh

Yes, only values above cluster_thresh are used in the quantile computation with q_thresh.
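For reference, this is roughly what that selection amounts to; a conceptual sketch, not the actual xclim internals, with cluster_thresh and q_thresh written out as plain values:

# Keep only values at or above cluster_thresh, then take the q_thresh quantile
# over time, i.e. one threshold per location (assuming tasmax is in K).
per_location_thresh = (
    dref.tasmax
    .where(dref.tasmax >= 295.0)   # cluster_thresh = "295 K"
    .quantile(0.95, dim="time")    # q_thresh = 0.95
)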

Sadly, no, there is no way to specify a spatially-varying cluster_thresh. I think this would be possible to implement in xclim; it looks like a task of moderate complexity. We would be happy to guide and help you if you'd like to try it.

The alternative for the moment would be to loop over your points...
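If you do go the looping route, a rough and untested sketch could look like this (the per-location threshold and first_order_bc, the first-order adjusted data discussed below, are placeholders you would provide yourself):

# Train one ExtremeValues instance per location, each with its own cluster_thresh.
scens = []
for loc in dref.location.values:
    thresh = "295 K"  # choose a per-location threshold here
    ev_loc = sdba.adjustment.ExtremeValues.train(
        ref=dref.tasmax.sel(location=loc),
        hist=dhist.tasmax.sel(location=loc),
        cluster_thresh=thresh,
        q_thresh=0.95,
    )
    scens.append(
        ev_loc.adjust(
            scen=first_order_bc.sel(location=loc),  # first-order adjusted data (placeholder)
            sim=dhist.tasmax.sel(location=loc),
        )
    )
scen_ev = xr.concat(scens, dim="location")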

Lower extremes

Sounds dumb, but the ">=" is hardcoded everywhere. So again, there's no elegant way to calibrate low extremes.

Here's a funky alternative for your example: negate the data before the correction and negate the result.

# train GPD-based correction on the negated data, so the cold tail becomes the upper tail
ev = sdba.adjustment.ExtremeValues.train(ref=-dref.tasmax, hist=-dhist.tasmax, cluster_thresh='-265 K', q_thresh=0.95)

# adjust, then negate the result back
hist_bc = -ev.adjust(scen=-dhist.tasmax, sim=-dhist.tasmax)

xclim doesn't care if it receives negative Kelvins. But make sure all your inputs (and cluster_thresh) have the same units (K) before doing that!

Combination

Yes, but you have to perform the EQM yourself beforehand. Here's a modified version of your code:

import xarray as xr
import xclim
from xclim import sdba
from xclim.core.units import convert_units_to
from xclim.testing import open_dataset

# open tutorial temperature data
dref = open_dataset(
    "sdba/ahccd_1950-2013.nc", chunks={"location": 1}, 
    drop_variables=["lat", "lon", "pr"]
).sel(time=slice("1981", "2010"))

dref = dref.assign(
    tasmax=convert_units_to(dref.tasmax, "K"),
)
dsim = open_dataset(
    "sdba/CanESM2_1950-2100.nc", chunks={"location": 1}, drop_variables=["lat", "lon", "pr"]
)

dhist = dsim.sel(time=slice("1981", "2010"))

# Initial EQM
EQM = sdba.adjustment.EmpiricalQuantileMapping.train(
    dref.tasmax, dhist.tasmax, nquantiles=50, kind='+', group='time'
)
hist_bc1 = EQM.adjust(dsim.tasmax)

# train GPD-based correction
ev = sdba.adjustment.ExtremeValues.train(
    ref=dref.tasmax, hist=dhist.tasmax, cluster_thresh='295 K', q_thresh=0.95
)

# adjust
hist_bc2 = ev.adjust(scen=hist_bc1, sim=dhist.tasmax) 

In your example, you were passing dref.tasmax to the scen argument of ev.adjust. This is incorrect: scen is the first-order bias-adjusted data and should have the exact same shape as sim. In this example, I perform a simple EQM as the first-order adjustment.

The power and frac arguments of adjust control the way the two timeseries (first-order and second-order) are merged, in a similar way to the Julia code.
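For example, something like this (a sketch only; the values are illustrative, check the adjust docstring for the exact defaults and meaning):

# frac and power control how the EQM scenario and the GPD-based correction
# are blended above the threshold (the values below are illustrative only).
hist_bc2 = ev.adjust(scen=hist_bc1, sim=dhist.tasmax, frac=0.25, power=1.0)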

Hope that helps!

@RondeauG
Contributor

RondeauG commented Oct 27, 2023

Hi! I'm one of the co-authors of this method. @aulemahal already did a good job of answering your questions. As he mentioned, the version of the code here in xclim only performs the correction of extreme values, so you'll need to call another bias adjustment method such as DQM or EQM prior to calling ExtremeValues.

A few relevant details:

  • This method was primarily designed for precipitation or other similar variables that have a strong distribution tail. We never tested it for temperatures, so you'll likely have to do additional verification on your side to make sure that the statistical fit is adequate. This is also why >= was hard-coded, although the trick mentioned by @aulemahal should do the job.
  • Linked to the point above, the role of cluster_thresh was to eliminate drizzle before computing the 95th percentile (q_thresh), using cluster_thresh='1 mm d-1', so in that instance no spatial variation was warranted (see the sketch below). I'm not sure what cluster_thresh should be for temperature... Maybe just 0 K, in order to include all data? Here too, you'll have to explore to find the best value for your variables.
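For precipitation, the call this method was designed around would look something like this (a sketch; pr_ref and pr_hist are placeholder DataArrays):

# Drizzle (< 1 mm/d) is excluded before the 95th-percentile threshold is computed.
ev_pr = sdba.adjustment.ExtremeValues.train(
    ref=pr_ref, hist=pr_hist, cluster_thresh="1 mm d-1", q_thresh=0.95
)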

EDIT: Another potential issue is that a multiplicative adjustment has been hard-coded, since it was designed for precipitation. This is probably not adequate for temperatures (we usually use additive correction factors for those). Here too, you'll probably want to look at what the function is doing to make sure the results are realistic.

@mubali101
Author

mubali101 commented Oct 27, 2023

Hi @RondeauG @aulemahal, thanks for your explanations, I really appreciate it.

  • Flipping the data was also what I had in mind, good to have confirmation of that.
  • The cluster_thresh name confused me, since I associated it with the peak-over-threshold approach used with the GPD to identify clusters. Maybe additional context could be added to the function's help.
  • Thanks for highlighting the point on multiplicative adjustment. I can try to dig into the source code once more, but I struggled last time because there are a lot of nested function calls, which makes it complex to follow for someone unfamiliar with the project. Do you have a separate channel for specific questions about the source code?

@aulemahal
Collaborator

Indeed, after many optimization attempts, the code is now a bit complex and hard to follow...

In theory we have a Gitter channel: https://app.gitter.im/#/room/#Ouranosinc_xclim:gitter.im. It's not very active, but I think the core team will receive notifications.

@RondeauG
Contributor

RondeauG commented Oct 27, 2023

Diving into the code a bit more, cluster_thresh as we have written it actually plays a double role that only really works for precipitation.

It is used once to subset the data that is greater than the value (i.e., remove drizzle with 1 mm d-1), but you are right that it is also reused later when determining the clusters for the peak-over-threshold method, which is where the variable gets its name. This is fine for precipitation, since it drops back to 0 after an event, but it will not easily work for temperature. For it to work correctly for your use case, we'd need to separate it into two parameters.
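To illustrate that double role with a toy example (purely conceptual, not xclim's actual code):

import numpy as np

precip = np.array([0.0, 0.2, 5.0, 12.0, 3.0, 0.0, 0.0, 8.0, 20.0, 0.5, 0.0])
thresh = 1.0  # "1 mm d-1"

# Role 1: remove drizzle before computing the q_thresh quantile.
wet = precip[precip >= thresh]
q95 = np.quantile(wet, 0.95)

# Role 2: the same threshold separates the peak-over-threshold clusters,
# i.e. consecutive runs of values >= thresh. Precipitation falls back below
# the threshold (to 0) between events, so [5, 12, 3] and [8, 20] form distinct
# clusters; daily temperature rarely dips below a single fixed threshold
# between events, which is why the two roles would need to be split.
above = precip >= thresh
cluster_starts = np.flatnonzero(above & ~np.r_[False, above[:-1]])  # -> [2, 7]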

@mubali101
Author

mubali101 commented Nov 9, 2023

@RondeauG @aulemahal an update on this:

  • I tried with negative temperatures, and I started getting some error messages about NaNs.
  • I switched to pyextremes for my use case. It is quite handy for EV analysis, especially for checking the sensitivity of the shape and scale parameters, etc., and it can fit a GPD or GEV to values either exceeding or below a threshold (see the sketch after this list).
  • However, it only works with pandas Series. Maybe xclim could make use of it to extend GPD-based bias calibration.
  • The adjustment factors that you have implemented following Roy et al. 2023 seem quite clever. Is there any literature that discusses adjusting temperature in a similar fashion? I found papers mostly on adjusting precipitation.
  • I still need to test whether the exact adjustment you have used could work for my case, but that will need some more digging into the source code on my side.
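For reference, the pyextremes workflow described above looks roughly like this (a sketch from memory, untested; check the pyextremes documentation for the exact API; tasmin_series is a placeholder pandas Series with a DatetimeIndex):

from pyextremes import EVA

model = EVA(tasmin_series)
# Peaks-over-threshold on the lower tail (cold extremes).
model.get_extremes(method="POT", extremes_type="low", threshold=250.0)
model.fit_model()
summary = model.get_summary(return_period=[10, 50, 100])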

@RondeauG
Contributor

Hi! I can't really comment on pyextremes or the feasibility of bringing it into xclim, but P. Roy tried to use GPD-based bias adjustment on temperature in an earlier version of our method. From what I can recall, he ended up putting it aside since the GPDs would sometimes create unphysical results (such as temperatures above 40 °C in places where we know such values can't happen). This could have been because of bugs or other issues that have since been resolved, but temperatures tend to follow normal distributions anyway and are much easier to bias correct using quantile mapping. I'd guess that this last part is why most papers that use extreme value analysis for bias correction also mainly work with precipitation.

@aulemahal
Collaborator

I didn't know about pyextremes! It seems well coded, and even though it is pandas-based, wrapping it in sdba could be feasible. We would maybe lose some performance (mainly from the wrapping overhead, and in comparison with our use of numba), but we might gain more options. This would however require some work, and the core team might not have time for it. We would be happy to help with a contributed PR!

I think it would mainly consist of wrapping the numpy functions of pyextremes with xr.apply_ufunc calls, all from within a map_groups/map_blocks of xclim.sdba.
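A very rough sketch of that pattern (untested; _fit_gpd_1d is a hypothetical per-series function that would call pyextremes or scipy, and da stands for any DataArray with a "time" dimension):

import numpy as np
import xarray as xr

def _fit_gpd_1d(arr, thresh):
    # Placeholder: a real implementation would fit a GPD (e.g. via pyextremes
    # or scipy.stats.genpareto) to the exceedances above `thresh` and return
    # its parameters; dummy values are returned here just to show the shapes.
    exceedances = arr[arr >= thresh] - thresh
    if exceedances.size == 0:
        return np.full(3, np.nan)
    return np.array([0.1, thresh, exceedances.std()])

params = xr.apply_ufunc(
    _fit_gpd_1d,
    da,
    kwargs={"thresh": 295.0},
    input_core_dims=[["time"]],
    output_core_dims=[["dparams"]],
    vectorize=True,
    dask="parallelized",
    output_dtypes=[float],
    dask_gufunc_kwargs={"output_sizes": {"dparams": 3}},
)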

@mubali101
Author

mubali101 commented Nov 22, 2023

@aulemahal @RondeauG many thanks for your replies.

I tested pyextremes a bit more. Unfortunately, a big drawback is that it only works with pandas datetimes, and one can't convert from cftime to pandas datetime when the number of years exceeds what pandas currently supports.

On EQ mapping for temperature calibration: in my use case of calibrating colder extremes, calibration of the tail data is quite important. EQ mapping seems to work fine for the tail using the following code:

# using 10K quantiles
QM = sdba.EmpiricalQuantileMapping.train(
    ref=era5_data, hist=model_data, nquantiles=10000, group="time", kind="+"
)

# interp="cubic" doesn't differ for the tails
qm_model_adjusted = QM.adjust(model_data, extrapolation="constant", interp="linear")

But it is not clear to me how exactly it calibrates the values that are more extreme than the reference data. Probably similarly to this issue. Does extrapolation="constant" control the tail extrapolation? The documentation points to xclim.sdba.utils.extrapolate_qm(), but I can't find it in the utils.py file.

I have added an exceedance probability plot of the calibrated data for different nquantiles values (100, 10K, 500K), with more extreme values at the lower left; the orange curve is the reference data (ERA5). It shows that the tail calibration is quite decent as long as nquantiles is not too high, otherwise the extremes start clipping.

[figure: exceedance probability of the calibrated data for nquantiles = 100, 10K and 500K, with the ERA5 reference in orange]

@aulemahal
Collaborator

aulemahal commented Nov 22, 2023

Do you have access to pandas >= 2? With that version, you may be able to have datetimes exceeding the usual 1677-2262 limitations. I don't recall the actual conversion method, but it has to do with the "unit" property of the Timestamp object. When set to 's', the time is stored as an int64 of seconds elapsed since 1970-01-01, which allows years after 2262. Xarray might not support having a datetime axis of that type, but a DataFrame should be usable. It still doesn't support non-standard calendars, though.
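Something along these lines might do it (untested, and the exact constructor behaviour may depend on the pandas version):

import numpy as np
import pandas as pd

# Build a second-resolution index; with pandas >= 2 the "s" unit should be kept
# instead of being cast to nanoseconds, so years past 2262 stay representable.
times = np.arange("2200-01-01", "2400-01-01", dtype="datetime64[D]").astype("datetime64[s]")
idx = pd.DatetimeIndex(times)
print(idx.dtype)  # expected: datetime64[s] under pandas >= 2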

Indeed, the documentation needs an update. The extrapolation was moved to the factor interpolation function itself. In your case (no special grouping), that's here:

xclim/xclim/sdba/utils.py

Lines 323 to 327 in 473a3e0

if extrap == "constant":
    fill_value = (
        oldy[~np.isnan(oldy)][0],
        oldy[~np.isnan(oldy)][-1],
    )

With "constant", it will simply apply the nearest (last or first) adjustment factor.

With this in mind, I am not surprised that you get stranger results as you increase the number of quantiles. When there are "too many" quantiles, the factors are no longer a smooth-ish function; they become very noisy, and the last factor can then be quite unstable.

@aulemahal aulemahal added the sdba Issues concerning the sdba submodule. label Jan 10, 2024
@Ouranosinc Ouranosinc locked and limited conversation to collaborators Jan 10, 2024
@aulemahal aulemahal converted this issue into discussion #1586 Jan 10, 2024
