OBS datasets use underscores in the version number #3051

Open
bouweandela opened this issue Feb 23, 2023 · 8 comments · May be fixed by #3840

Comments

@bouweandela
Member

bouweandela commented Feb 23, 2023

There are several datasets from the OBS project, as produced by the esmvaltool data command, that use a version number with underscores in it. The following list was found in recipes:

CLARA-AVHRR version V002_01
CowtanWay version ghcn_short_krig_v2
CowtanWay version ghcn_short_uah_v2
CowtanWay version had4_krig_v1
CowtanWay version had4_krig_v2
CowtanWay version had4_short_krig_v2
CowtanWay version had4_short_uah_v2
CowtanWay version had4sst4_krig_v2
CowtanWay version had4_uah_v1
GPCC version v2018_025
GPCC version v2018_025-numgauge1
GPCC version v2018_05
GPCC version v2018_05-numgauge1
GPCC version v2018_10
GPCC version v2018_10-numgauge1
GPCC version v2018_25
GPCC version v2018_25-numgauge1
LAI3g version 1_regridded
MOBO-DIC_MPIM (underscore in dataset name)

This makes it impossible to extract the facet values from the filename because an underscore is also used as a separator between facets. It would be good to enable reading facets from the filename, but then we first need to fix the version numbers of the OBS datasets.
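To illustrate the ambiguity, here is a minimal sketch of underscore-based facet extraction. The facet order in `FACETS`, the `parse_facets` helper, and the example filenames (including the `reanaly` type and the time range) are assumptions made up for this illustration, not the actual ESMValCore implementation.

```python
# Assumed facet order for an OBS filename; hypothetical, for illustration only.
FACETS = ("project", "dataset", "type", "version", "mip", "short_name", "timerange")


def parse_facets(filename: str) -> dict[str, str]:
    """Split a filename on underscores and map the parts onto FACETS."""
    stem = filename.removesuffix(".nc")
    parts = stem.split("_")
    if len(parts) != len(FACETS):
        raise ValueError(f"expected {len(FACETS)} parts, got {len(parts)}: {parts}")
    return dict(zip(FACETS, parts))


# A dash in the version parses cleanly.
print(parse_facets("OBS_GPCC_reanaly_v2018-025_Amon_pr_195001-201612.nc"))

# An underscore in the version yields an extra part, so parsing fails.
try:
    parse_facets("OBS_GPCC_reanaly_v2018_025_Amon_pr_195001-201612.nc")
except ValueError as err:
    print(err)
```

With an underscore inside the version there is no way to tell from the filename alone whether `025` belongs to the version or is a facet of its own.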

@valeriupredoi
Contributor

So anything but underscores, right? I.e. this monster of a version, version: v20.0e-0.25, would still be fine?

@bouweandela
Member Author

bouweandela commented Feb 27, 2023

Unless the data ends up on ESGF at some point. There, a . character in a facet value is problematic because dots are used as separators in the dataset_id.
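A small check along these lines could flag version strings that clash with either separator. The `unsafe_separators` helper is hypothetical and only covers the two characters discussed in this thread: `_` (separator in filenames) and `.` (separator in the ESGF dataset_id).

```python
def unsafe_separators(value: str) -> list[str]:
    """Return the separator characters ('_' or '.') that occur in a facet value."""
    return [char for char in ("_", ".") if char in value]


for version in ("v2018_025", "v20.0e-0.25", "v2018-025"):
    print(version, unsafe_separators(version))
```

By this check `v20.0e-0.25` is fine for filename parsing but would still be flagged because of its dots.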

@valeriupredoi
Contributor

we should all use £££ instead - don't get too many of those in real life, might be nice to see them virtually 😆

@LisaBock
Contributor

This is already solved for the ESACCI-WATERVAPOUR dataset (see PR #3282). While updating the dataset we removed all underscores from the version tag. So you can remove 'ESACCI-WATERVAPOUR' from the list above.

@bouweandela
Member Author

bouweandela commented Mar 28, 2024

> you can remove 'ESACCI-WATERVAPOUR' from the list above.

Done

@schlunma
Contributor

schlunma commented Dec 9, 2024

@axel-lauer, @hb326 and myself just had a discussion about this. @axel-lauer will rename the existing datasets on Levante (_ -> -) and I will open a PR to use the different names in the recipes/CMORizers. Hopefully we can then merge ESMValGroup/ESMValCore#1943 very soon 🤞

@axel-lauer
Contributor

I just renamed all of the datasets mentioned above by replacing the affected underscores _ with dashes -.
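As a rough sketch of that replacement, applied to a few of the version strings from the list at the top of this issue: the `dashed` helper is hypothetical, and the names actually used on disk and in the recipes are whatever the renaming and the linked PR settle on.

```python
def dashed(version: str) -> str:
    """Replace every underscore in a version string with a dash."""
    return version.replace("_", "-")


for old in ("V002_01", "had4_krig_v2", "v2018_025-numgauge1", "1_regridded"):
    print(f"{old} -> {dashed(old)}")
```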

@schlunma
Contributor

schlunma commented Dec 9, 2024

PR open here: #3840

On which other machines do we need to perform this renaming? Jasmin (@valeriupredoi)? Others?


5 participants