Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix!: variable scaling, pressure level scalings only applied in specific circumstances #52

Open
wants to merge 33 commits into
base: develop
Choose a base branch
from

Conversation

sahahner
Copy link
Member

@sahahner sahahner commented Dec 27, 2024

Solve the problem explained in issue #7 by refactoring the variable scalings into a general variable scaling and a pressure level scaling.
@mc4117 , @pinnstorm and me came up with a new structure. This PR implements this.

This is first draft. Feedback very welcome!

  • allow several variable level scaling (i.e. pressure level and model level)
  • implement/update tests
  • decide: do we want to allow scaling by variable_ref and variable_name, i.e. scale q_50 by q and q_50?
  • get variable level and name from dataset metadata

@sahahner sahahner linked an issue Dec 27, 2024 that may be closed by this pull request
2 tasks
@b8raoult
Copy link
Collaborator

Please consider using the knowledge about variables that come from the dataset metadata. See https://github.com/ecmwf/anemoi-transform/blob/7cbf5f3d4baa37453022a5a97e17cc71a5b8ceeb/src/anemoi/transform/variables/__init__.py#L47

@sahahner sahahner linked an issue Dec 30, 2024 that may be closed by this pull request
@sahahner
Copy link
Member Author

Please consider using the knowledge about variables that come from the dataset metadata. See https://github.com/ecmwf/anemoi-transform/blob/7cbf5f3d4baa37453022a5a97e17cc71a5b8ceeb/src/anemoi/transform/variables/__init__.py#L47

We have given this some thought, and after wanting to use the information from the dataset in the beginning, I have opted for allowing the definition of our own groups here to use different scaling for self-defined groups.
Also, I was also told that it is possible to build datasets without information about the variable types and therefore not to rely on that metadata.
If you have strong opinions on this I am happy to discuss it again.

@sahahner sahahner changed the title pressure level scalings only applied in specific circumstances refactor variable scaling, pressure level scalings only applied in specific circumstances Jan 2, 2025
@FussyDuck
Copy link

FussyDuck commented Jan 2, 2025

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
2 out of 3 committers have signed the CLA.

✅ sahahner
✅ mc4117
❌ pinnstorm
You have signed the CLA already but the status is still pending? Let us recheck it.

@HCookie HCookie self-requested a review January 6, 2025 14:36
@JPXKQX
Copy link
Member

JPXKQX commented Jan 8, 2025

Hi, I would like to know what you think about making all scalers explicit in the config file. Something similar to the additional_scalers: field, but including not only the scalers per variable, but also the node_loss_weight,... The positive aspect I see is that there would be more homogeneity in the scalers defined in the metrics/loss fields.

@mc4117
Copy link
Member

mc4117 commented Jan 9, 2025

Hi, I would like to know what you think about making all scalers explicit in the config file. Something similar to the additional_scalers: field, but including not only the scalers per variable, but also the node_loss_weight,... The positive aspect I see is that there would be more homogeneity in the scalers defined in the metrics/loss fields.

Seems like a good idea! Would you like to add this in this PR?

@JPXKQX
Copy link
Member

JPXKQX commented Jan 9, 2025

Hi, I would like to know what you think about making all scalers explicit in the config file. Something similar to the additional_scalers: field, but including not only the scalers per variable, but also the node_loss_weight,... The positive aspect I see is that there would be more homogeneity in the scalers defined in the metrics/loss fields.

Seems like a good idea! Would you like to add this in this PR?

I’m not sure what the best approach is. On the one hand, adding more work to this PR would increase its complexity, which might make it more logical to address this refactor in a future PR. On the other hand, this PR already introduces some changes to the configs, and the future PRs would also involve changes to the configs. From this, it might be better to have 1 PR and communicate all the changes to users at once. What do you think?

@pinnstorm
Copy link
Member

Hi, I would like to know what you think about making all scalers explicit in the config file. Something similar to the additional_scalers: field, but including not only the scalers per variable, but also the node_loss_weight,... The positive aspect I see is that there would be more homogeneity in the scalers defined in the metrics/loss fields.

Seems like a good idea! Would you like to add this in this PR?

I’m not sure what the best approach is. On the one hand, adding more work to this PR would increase its complexity, which might make it more logical to address this refactor in a future PR. On the other hand, this PR already introduces some changes to the configs, and the future PRs would also involve changes to the configs. From this, it might be better to have 1 PR and communicate all the changes to users at once. What do you think?

I'm happy for it to be included in this PR! Not sure if @sahahner or @mc4117 have other views?

@mc4117 mc4117 marked this pull request as ready for review January 20, 2025 14:44
@mc4117
Copy link
Member

mc4117 commented Jan 21, 2025

@jakob-schloer @HCookie could you review this please?

@HCookie HCookie changed the title refactor variable scaling, pressure level scalings only applied in specific circumstances fix!: variable scaling, pressure level scalings only applied in specific circumstances Jan 22, 2025
Comment on lines +111 to +125
if isinstance(config_container, list):
scalar = [
(
instantiate(
scalar_config,
scaling_config=config.training.variable_loss_scaling,
data_indices=data_indices,
statistics=statistics,
statistics_tendencies=statistics_tendencies,
)
if scalar_config["name"] == "tendency"
else instantiate(
scalar_config,
scaling_config=config.training.variable_loss_scaling,
data_indices=data_indices,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that the tendency scalar is a subclass of the same generic Scalar base class this overly specific if should be removed. My suggestion would be to add to the statistics, & tendencies to the base class and leave the usage up to the implementation.

Comment on lines +18 to +20
if TYPE_CHECKING:
from omegaconf import DictConfig
from anemoi.models.data_indices.collection import IndexCollection
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will raise an issue once the CI is back, this block should be after any other imports.

Comment on lines +29 to +38
class BaseVariableLossScaler(ABC):
"""Configurable method converting variable to loss scaling."""

def __init__(
self,
scaling_config: DictConfig,
data_indices: IndexCollection,
metadata_variables: dict | None = None,
) -> None:
"""Initialise Scaler.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could it make sense to further 'genericsize' this class? BaseLossScalar?

Comment on lines +113 to +115
variable_groups:
default: sfc
pl: [q, t, u, v, w, z]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A comment on the use of this would be useful here

self.default_group = self.scaling_config.variable_groups.default
self.metadata_variables = metadata_variables

self.ExtractVariableGroupAndLevel = ExtractVariableGroupAndLevel(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The variable should not be camel case, and instead snake_case

)

@abstractmethod
def get_variable_scaling(self) -> np.ndarray: ...
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A quick documentation here on what this method is expected to return would be good

Comment on lines +95 to +96
class GeneralVariableLossScaler(BaseVariableLossScaler):
"""General scaling of variables to loss scaling."""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This documentation is hard to read, maybe I'm just missing the point.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: Under Review
Development

Successfully merging this pull request may close these issues.

Pressure Level Scalings only applied in specific circumstances Loss scalings
8 participants