Quantile static #32
Conversation
I am impressed with the difference in resolution provided by the quantile transformer. There appear to be some missing imports in transforms.py. Please add "import pandas as pd" and "from bridgescaler import read_scaler" at the top of transforms.py:
./build/lib/credit/transforms.py:108:26: F821 undefined name 'pd'
self.scaler_df = pd.read_parquet(scaler_file)
^
./build/lib/credit/transforms.py:109:61: F821 undefined name 'read_scaler'
self.scaler_3ds = self.scaler_df["scaler_3d"].apply(read_scaler)
^
./build/lib/credit/transforms.py:110:68: F821 undefined name 'read_scaler'
self.scaler_surfs = self.scaler_df["scaler_surface"].apply(read_scaler)
^
./build/lib/credit/transforms.py:160:49: F821 undefined name 'pd'
e3d = xr.concat(var_slices, pd.Index(var_levels, name="variable")
^
./build/lib/credit/transforms.py:169:99: F821 undefined name 'pd'
e_surf = xr.concat([value[v].sel(time=time) for v in self.surface_variables], pd.Index(self.surface_variables, name="variable")
^
./credit/transforms.py:108:26: F821 undefined name 'pd'
self.scaler_df = pd.read_parquet(scaler_file)
^
./credit/transforms.py:109:61: F821 undefined name 'read_scaler'
self.scaler_3ds = self.scaler_df["scaler_3d"].apply(read_scaler)
^
./credit/transforms.py:110:68: F821 undefined name 'read_scaler'
self.scaler_surfs = self.scaler_df["scaler_surface"].apply(read_scaler)
^
./credit/transforms.py:160:49: F821 undefined name 'pd'
e3d = xr.concat(var_slices, pd.Index(var_levels, name="variable")
^
./credit/transforms.py:169:99: F821 undefined name 'pd'
e_surf = xr.concat([value[v].sel(time=time) for v in self.surface_variables], pd.Index(self.surface_variables, name="variable")
I think some big speedups could be made if you use the channels_last=False transform functionality that I just added to bridgescaler this week. You shouldn't have to retrain the current scalers just yet.
def inverse_transform(self, x: torch.Tensor) -> torch.Tensor:
    device = x.device
    tensor = x[:, :(len(self.variables) * self.levels), :, :]  # B, Var, H, W
I would store len(self.variables) as an attribute in __init__ somewhere. That should save you some time, especially given how many times it is called here.
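The suggestion above can be sketched as follows; the class and attribute names here are hypothetical illustrations, not the PR's actual code:

```python
import numpy as np

class ChannelSlicer:
    """Sketch: cache the 3D channel count once instead of recomputing
    len(self.variables) * self.levels on every inverse_transform call."""

    def __init__(self, variables, levels):
        self.variables = variables
        self.levels = levels
        self.num_3d_channels = len(variables) * levels  # computed once

    def split_channels(self, x):
        # x: (batch, channels, height, width); the first num_3d_channels
        # channels are the stacked 3D variables, the rest are surface vars.
        return x[:, :self.num_3d_channels], x[:, self.num_3d_channels:]
```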
transformed_surface_tensor = surface_tensor.clone()
# 3d vars
rscal_3d = np.transpose(torch.Tensor.numpy(x[:, :(len(self.variables) * self.levels), :, :].values), (0, 2, 3, 1))
self.scaler_3d.inverse_transform(rscal_3d)
There's a channels_last flag in the bridgescaler distributed transform and inverse_transform methods that can be set to False to do the transform in channels_first order even if the scaler was trained on channels_last data. You shouldn't have to reshape to channels_last anymore.
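For context, the reshaping the current code performs (and which the channels_last=False flag would make unnecessary) is a transpose round-trip between channels-first (B, C, H, W) and channels-last (B, H, W, C) layouts. This numpy sketch just demonstrates that the two transposes are exact inverses, so dropping them loses nothing:

```python
import numpy as np

# Channels-first batch: (batch, channels, height, width)
x = np.arange(2 * 3 * 4 * 5, dtype=float).reshape(2, 3, 4, 5)

# What the code currently does before calling the scaler:
channels_last = np.transpose(x, (0, 2, 3, 1))   # -> (B, H, W, C)

# ...and the inverse transpose to get back to channels-first:
back = np.transpose(channels_last, (0, 3, 1, 2))  # -> (B, C, H, W)

assert np.array_equal(x, back)  # round-trip is lossless, but costs copies
```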
trainer.py and predict.py need to be updated to incorporate the new quantile scaler. I think another config flag needs to be added to facilitate this too - should we do this in this PR or another one?
Will, as discussed, please:
static variables not yet in 1deg file:
Quantile + static vars integrated and tested in train.py; pending speedups to the quantile scaler in bridgescaler.
@yingkaisha please see the new transforms and integrate them into predict.py. Let me know if you'd like any help. Not sure if the inverse transform for quantile exists yet; we might need to do that in another PR.
See path: '/glade/u/home/wchapman/MLWPS/DataLoader/LSM_static_variables_ERA5_zhght_onedeg.nc'
This pull request provides access to the quantile transform scaler (via the bridgescaler package). It returns the exact same tensor structure as our current scaling, so it should cause no pipeline issues. I am optimistic it will help training a lot.
Here is the usage:
I am requesting that two variables be added to the data section of the config file, though I don't think it is static (I am happy to adjust once I know what needs to be added). They are currently integrated into crossformer.yml.
The adjustment made by the quantile scaler is apparent. Here is the difference in Q at upper levels between a standard scaling (bottom) and a quantile scaling (top).
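For intuition, a quantile transform maps each value to its position on the empirical CDF, which is why it resolves a heavy-tailed field like upper-level Q so much better than standard scaling. This is a minimal 1-D rank-based sketch of the idea, not bridgescaler's implementation:

```python
import numpy as np

def quantile_transform_1d(x):
    """Map each value in x to its empirical CDF position in [0, 1].
    Sketch only: assumes distinct values, no out-of-sample handling."""
    ranks = x.argsort().argsort()   # rank of each element (0 .. n-1)
    return ranks / (len(x) - 1)     # empirical CDF positions

rng = np.random.default_rng(0)
sample = rng.lognormal(size=101)    # heavy-tailed stand-in for upper-level Q
uniform = quantile_transform_1d(sample)
```

The transformed values fill [0, 1] uniformly while preserving the ordering of the original data, so extreme tail values no longer dominate the dynamic range the way they do under standard (mean/std) scaling.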
Additionally, I have added a 'static variables' option to the to_tensor transform. It now returns a field 'static' containing the land-sea mask (scaled 0-1) and the topography (scaled 0-2). See below:
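A minimal sketch of the kind of min-max scaling described above; the helper name is hypothetical, not the PR's actual function:

```python
import numpy as np

def scale_to_range(field, lo, hi):
    """Min-max scale a field into [lo, hi], e.g. land-sea mask into
    [0, 1] or topography into [0, 2]. Assumes field is not constant."""
    fmin, fmax = field.min(), field.max()
    return lo + (hi - lo) * (field - fmin) / (fmax - fmin)

# Toy topography field in metres, scaled into [0, 2]:
topo = np.array([[0.0, 500.0], [1000.0, 2000.0]])
scaled = scale_to_range(topo, 0.0, 2.0)
```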
This addresses #24