Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

new conversions #200

Closed
wants to merge 2 commits into from
Closed

new conversions #200

wants to merge 2 commits into from

Conversation

juliettelavoie
Copy link
Contributor

@juliettelavoie juliettelavoie commented May 12, 2023

Pull Request Checklist:

  • This PR addresses an already opened issue (for bug fixes / features)
    • This PR fixes #xyz
  • (If applicable) Documentation has been added / updated (for bug fixes / features)
  • This PR does not seem to break the templates.
  • HISTORY.rst has been updated (with summary of main changes)
    • Link to issue (:issue:number) and pull request (:pull:number) has been added

What kind of change does this PR introduce?

  • Add conversion to get tasmax and tasmin from dtr and tas.

Does this PR introduce a breaking change?

I don't think it does, but I changed the name of the conversion tasmax/min_from_dtr to make more a difference between the 4 functions.

Other information:

This is necessary because the EMDNA dataset only gives dtr and tas.

@RondeauG
Copy link
Collaborator

Have you tested it with both use cases? Can you get tasmax with both tasmin/DTR and tas/DTR?

I have the vague memory that the same output variable cannot be there twice in conversions.yml. As in, I had to make a choice between relative_humidity_from_dewpoint and relative_humidity, because both wouldn't work at the same time.

@juliettelavoie
Copy link
Contributor Author

Oh! I did not think about this!
So, if you want tasmin and a dataset has both tas, dtr and tasmin. search_data_catalogue gives you 2 lines with tas and dtr as it is listed last in the yml. (It gives tasmax if we switch the order). Which one should we prioritize ? I think tasmax as it doesn't assume a symmetrical distribution.

@RondeauG
Copy link
Collaborator

I think that DTR & tas would be the most common combination, although I see that this isn't what we had previously.

My suggestion would be to remove tasmax_from_dtr_and_tasmin and tasmax_from_dtr_and_tasmax from conversions.yml, but keep everything in conversions.py. A user that has 'tasmin' and 'dtr' would have to provide his own yaml, but could call upon the functions in conversions.py.

@juliettelavoie
Copy link
Contributor Author

juliettelavoie commented May 16, 2023

tasmin_from_dtr_and_tasmax is what we use for ESPO and info-crue-cmip6 to get tasmin back after adjusting dtr, so it is more common for now. I think this is probably our new norm, so it would be a bit annoying that xscen couldn't handle it without an extra file...

Just to be clear, it never fails (whether you have tas, tasmax or both). And if you have both, it would give the same answer regardless of the function used, so we don't need to get rid of one.

EDIT: They don't necesseraly give the same thing all the time, if tas != 0.5 *(tasmax+tasmin) then tasmin_from_dtr_and_tas doesn't work and in that case we should use tasmin_from_dtr_and_tasmax (the default now). But in general, we accept the assumption tas = 0.5 *(tasmax+tasmin), especially if tas is the only data available.

@juliettelavoie
Copy link
Contributor Author

Update: Gabriel a raison. It doesn't work. the first function declared can't be used.

@RondeauG
Copy link
Collaborator

For reference, see the conversation in #88 .

@juliettelavoie
Copy link
Contributor Author

So, derived variables are the keys to the Registry, which means we can't have 2 definitions of 1 variable.

Conclusion: workflow that need tasmin_from_tasmax in clean_up will need to call their own yaml. It shouldn't be needed in "normal" use of search_data_catalogs.

À moins que @aulemahal ait une idée de génie ?

@juliettelavoie juliettelavoie mentioned this pull request May 18, 2023
5 tasks
@aulemahal
Copy link
Collaborator

Malheureusement, je n'ai pas d'idée de génie. Ça demanderait une PR majeure dans intake-esm pour changer le registre de variables dérivées en autre chose qu'un simple dictionnaire.

@juliettelavoie
Copy link
Contributor Author

Le problème va être régler en créant tasmax et tasmin dans miranda. Cette PR deviendra inutile et on pourra garder seulement tasmax_from_dtr_and_tasmin dans xscen.

@RondeauG
Copy link
Collaborator

RondeauG commented May 23, 2023

Ça reste potentiellement un enjeu, car ça serait bien si on pouvait couvrir quelques combinaisons possibles au lieu de juste 1 seule. Je pourrais voir une PR compliquée où on modifie search_data_catalogs pour faire:

  1. Recherche avec les DerivedVariable par défaut (i.e. hurs_from_dewpoint, tasmin_from_dtr_and_tasmax)
  2. Recherches 2 à X avec les combinaisons supplémentaires (hurs, tasmin_from_dtr_and_tas) --> Répéter autant de fois qu'il existe de combinaisons possibles.
  3. Je ne crois pas qu'on puisse faire un pd.concat() des résultats, car il faut que les DerivedVariable soient gardés en mémoire dans le catalogue intake. À voir aussi comment on gèrerait une variable qui a ce qu'il faut pour être calculée de plusieurs manières.
  4. extract_dataset devrait aussi probablement être modifié.

@RondeauG
Copy link
Collaborator

Bref plutôt que de fermer ce PR et oublier la discussion, j'ouvrirais peut-être un Issue pour qu'on regarde nos options... un jour.

@juliettelavoie
Copy link
Contributor Author

Discussion transférée dans le issue #204 pour plus tard.

@juliettelavoie juliettelavoie deleted the conversion_for_emdna branch September 11, 2023 14:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants