Quantile static #32

Merged
merged 11 commits into main on Mar 26, 2024
Conversation

WillyChap
Collaborator

This pull request provides access to the quantile transform scaler (via the bridgescaler package). It returns the exact same tensor structure as our current scaling, so it should cause no pipeline issues. I am optimistic it will help training a lot.

Here is the usage:

Dataset = ERA5Dataset(
    filenames=[FNS[nunu]],
    history_len=history_len,
    forecast_len=forecast_len,
    skip_periods=1,
    transform=transforms.Compose([
        NormalizeState_Quantile(scaler_file=conf["data"]["quant_path"]),
        ToTensor(history_len=history_len, forecast_len=forecast_len, static_variables=conf["data"]["static_vars"]),
    ]),
)
print(FNS[nunu])
BB_trancs_quant = Dataset.__getitem__(8784)

Dataset = ERA5Dataset(
    filenames=[FNS[nunu]],
    history_len=history_len,
    forecast_len=forecast_len,
    skip_periods=1,
    transform=transforms.Compose([
        NormalizeState(conf["data"]["mean_path"], conf["data"]["std_path"]),
        ToTensor(history_len=history_len, forecast_len=forecast_len, static_variables=conf["data"]["static_vars"]),
    ]),
)
print(FNS[nunu])
BB_trancs_std = Dataset.__getitem__(8784)
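As a quick sanity check that the two transforms really do produce identically structured samples (this assumes the sample is a dict of tensors keyed by field name, which is my assumption, not a confirmed detail):

for key in BB_trancs_std:
    assert BB_trancs_quant[key].shape == BB_trancs_std[key].shape, key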

I am requesting that two variables be added to the data section of the config file, though I don't think the layout is set in stone (I am happy to adjust once I know what needs to be added). They are currently integrated into crossformer.yml.

The effect of the quantile scaler is apparent. Here is the difference in Q at upper levels between standard scaling (bottom) and quantile scaling (top).

[image: Q at upper levels, quantile scaling (top) vs standard scaling (bottom)]

Additionally, I have added a 'static_variables' option to the ToTensor transform. It now returns a field 'static' that provides the land-sea mask (scaled 0-1) and the topography (scaled 0-2). See below:

[image: the 'static' field, land-sea mask and topography]
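A minimal sketch of how the new field might be inspected (the 'static' key comes from the description above; the shape is an assumption):

sample = Dataset.__getitem__(8784)
static = sample["static"]          # land-sea mask + topography stacked on the variable axis
print(static.shape)                # assumed (2, H, W)
print(static.min(), static.max())  # mask in [0, 1], topography in [0, 2]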

This addresses #24

@djgagne
Collaborator

djgagne commented Mar 22, 2024

I am impressed with the difference in resolution provided by the quantile transformer. There appear to be some missing imports in transforms.py. Please add

import pandas as pd
from bridgescaler import read_scaler

at the top of transforms.py.

./build/lib/credit/transforms.py:108:26: F821 undefined name 'pd'
        self.scaler_df = pd.read_parquet(scaler_file)
                         ^
./build/lib/credit/transforms.py:109:61: F821 undefined name 'read_scaler'
        self.scaler_3ds = self.scaler_df["scaler_3d"].apply(read_scaler)
                                                            ^
./build/lib/credit/transforms.py:110:68: F821 undefined name 'read_scaler'
        self.scaler_surfs = self.scaler_df["scaler_surface"].apply(read_scaler)
                                                                   ^
./build/lib/credit/transforms.py:160:49: F821 undefined name 'pd'
                    e3d = xr.concat(var_slices, pd.Index(var_levels, name="variable")
                                                ^
./build/lib/credit/transforms.py:169:99: F821 undefined name 'pd'
                    e_surf = xr.concat([value[v].sel(time=time) for v in self.surface_variables], pd.Index(self.surface_variables, name="variable")
                                                                                                  ^
./credit/transforms.py:108:26: F821 undefined name 'pd'
        self.scaler_df = pd.read_parquet(scaler_file)
                         ^
./credit/transforms.py:109:61: F821 undefined name 'read_scaler'
        self.scaler_3ds = self.scaler_df["scaler_3d"].apply(read_scaler)
                                                            ^
./credit/transforms.py:110:68: F821 undefined name 'read_scaler'
        self.scaler_surfs = self.scaler_df["scaler_surface"].apply(read_scaler)
                                                                   ^
./credit/transforms.py:160:49: F821 undefined name 'pd'
                    e3d = xr.concat(var_slices, pd.Index(var_levels, name="variable")
                                                ^
./credit/transforms.py:169:99: F821 undefined name 'pd'
                    e_surf = xr.concat([value[v].sel(time=time) for v in self.surface_variables], pd.Index(self.surface_variables, name="variable")
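For clarity, here is a sketch of the fixed scaler loading in transforms.py, reconstructed from the flake8 output above (the surrounding class code is assumed):

import pandas as pd
from bridgescaler import read_scaler

class NormalizeState_Quantile:
    def __init__(self, scaler_file):
        # parquet table holding the serialized bridgescaler objects
        self.scaler_df = pd.read_parquet(scaler_file)
        self.scaler_3ds = self.scaler_df["scaler_3d"].apply(read_scaler)
        self.scaler_surfs = self.scaler_df["scaler_surface"].apply(read_scaler)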

@djgagne left a comment

I think some big speedups could be made if you use the channels_last=False transform functionality that I just added to bridgescaler this week. You shouldn't have to retrain the current scalers just yet.


def inverse_transform(self, x: torch.Tensor) -> torch.Tensor:
    device = x.device
    tensor = x[:, :(len(self.variables) * self.levels), :, :]  # B, Var, H, W

I would store len(self.variables) as an attribute in __init__ somewhere. That should save you some time, especially given how many times it is called here.
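A minimal sketch of that suggestion (the class is stripped down and the attribute name is illustrative, not the actual transform):

class NormalizeState_Quantile:
    def __init__(self, variables, levels):
        self.variables = variables
        self.levels = levels
        # cache once instead of recomputing len() on every call
        self.num_upper_air = len(self.variables) * self.levels

    def inverse_transform(self, x):
        # slice out the upper-air channels: B, Var, H, W
        return x[:, :self.num_upper_air, :, :]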

transformed_surface_tensor = surface_tensor.clone()
# 3d vars
rscal_3d = np.transpose(torch.Tensor.numpy(x[:, :(len(self.variables) * self.levels), :, :].values), (0, 2, 3, 1))
self.scaler_3d.inverse_transform(rscal_3d)

There's a channels_last flag in the bridgescaler distributed transform and inverse_transform methods that can be set to False to do the transform in channels_first order even if the scaler was trained on channels_last data. You shouldn't have to reshape to channels_last anymore.
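A sketch of what the call could then look like (channels_last is the bridgescaler flag described above; the rest, including the cached num_upper_air from the earlier sketch, is an assumption):

# inverse transform in channels_first order, no np.transpose needed
rscal_3d = x[:, :self.num_upper_air, :, :].cpu().numpy()  # B, C, H, W
inv_3d = self.scaler_3d.inverse_transform(rscal_3d, channels_last=False)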

@dkimpara
Collaborator

trainer.py and predict.py need to be updated to incorporate the new quantile scaler. I think another config flag needs to be added to facilitate this too - should we do this in this PR or another one?
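One possible shape for that wiring, as a hedged sketch ("scaler_type" is a hypothetical flag name; the class names come from this PR):

# hypothetical selection logic in trainer.py / predict.py
if conf["data"].get("scaler_type", "std") == "quantile":
    state_transform = NormalizeState_Quantile(scaler_file=conf["data"]["quant_path"])
else:
    state_transform = NormalizeState(conf["data"]["mean_path"], conf["data"]["std_path"])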

@dkimpara
Collaborator

dkimpara commented Mar 22, 2024

Will, as discussed, please:

  • remove hardcoded conf variables
  • have classes take in just conf as an arg
  • add logging to say which scaler is used and whether static vars are in use (see the sketch below)
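A minimal sketch of the last two items, assuming the conf keys shown earlier in this thread:

import logging

logger = logging.getLogger(__name__)

class NormalizeState_Quantile:
    def __init__(self, conf):
        # everything comes from conf; no hardcoded paths
        self.scaler_file = conf["data"]["quant_path"]
        self.static_variables = conf["data"].get("static_vars")
        logger.info("Scaler: quantile (%s)", self.scaler_file)
        logger.info("Static variables: %s", self.static_variables or "none")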

@dkimpara
Collaborator

static variables not yet in 1deg file: "/glade/u/home/wchapman/MLWPS/DataLoader/static_variables_ERA5_zhght_onedeg.nc"

@dkimpara left a comment

Quantile + static vars are integrated and tested in train.py, pending speedups to the quantile scaler in bridgescaler.

@yingkaisha please see the new transforms and integrate them into predict.py; let me know if you'd like any help. Not sure if the inverse transform for the quantile scaler exists yet; we might need to do that in another PR.

@WillyChap
Collaborator Author

static variables not yet in 1deg file: "/glade/u/home/wchapman/MLWPS/DataLoader/static_variables_ERA5_zhght_onedeg.nc"

See path: '/glade/u/home/wchapman/MLWPS/DataLoader/LSM_static_variables_ERA5_zhght_onedeg.nc'

@jsschreck merged commit 60429e2 into main Mar 26, 2024
1 check passed