Implement data frequency range specification in POD settings #270

Open
tsjackson-noaa opened this issue Aug 24, 2021 · 0 comments
Labels: diagnostic (Issue pertains to a contributed diagnostic), framework (Issue pertains to the framework code)


tsjackson-noaa commented Aug 24, 2021

What problem will this feature solve?
The upcoming GFDL model run to generate POD sample data (diag table info was added for documentation purposes in #269) will output data no more often than every 6 hours, to reduce data volume. The two current high-frequency PODs, convective_transition_diag and precip_diurnal_cycle, can in principle carry out a valid analysis on data at this frequency, but they are currently set to request data at 1hr and 3hr frequencies, respectively.

To run these PODs on the sample model data being generated, the POD settings file format needs to be extended so that PODs can request data within a range of acceptable frequencies, and the data query logic needs to be extended to execute such a query.

Describe the solution you'd like
The user-facing changes have been described in the docs for some time, but the feature has not been implemented in the framework's data query logic. Each varlist entry in the POD settings file can have optional min_frequency and max_frequency attributes specifying a range of acceptable data frequencies, as an alternative or supplement to the currently recognized frequency attribute.
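As a minimal sketch of how these range attributes might be interpreted and checked, the Python below converts frequency labels to sampling intervals and verifies min_frequency <= frequency <= max_frequency for one varlist entry. It assumes a varlist entry can be treated as a plain dict, that labels look like "1hr", "3hr", "6hr" or "day", and that frequencies compare as rates (so 1hr is a higher frequency than 6hr). parse_frequency and validate_frequency_range are hypothetical names, not the framework's existing API.

```python
import re
from datetime import timedelta

# Hypothetical label units; the framework's own frequency parsing may accept more.
_UNIT_SECONDS = {"min": 60, "hr": 3600, "day": 86400}


def parse_frequency(label: str) -> timedelta:
    """Convert a label such as '1hr', '6hr' or 'day' to its sampling interval."""
    match = re.fullmatch(r"(\d*)(min|hr|day)", label.strip())
    if match is None:
        raise ValueError(f"unrecognized frequency label: {label!r}")
    count = int(match.group(1)) if match.group(1) else 1
    return timedelta(seconds=count * _UNIT_SECONDS[match.group(2)])


def validate_frequency_range(entry: dict) -> None:
    """Raise ValueError unless min_frequency <= frequency <= max_frequency.

    Frequencies are compared as rates: a shorter interval means a higher
    frequency, so each inequality flips when written in terms of intervals.
    Attributes that are absent from the entry are simply not checked.
    """
    iv = {key: parse_frequency(entry[key])
          for key in ("frequency", "min_frequency", "max_frequency")
          if key in entry}
    lo = iv.get("min_frequency")  # longest acceptable interval
    hi = iv.get("max_frequency")  # shortest acceptable interval
    freq = iv.get("frequency")
    if lo is not None and hi is not None and lo < hi:
        raise ValueError("min_frequency is higher than max_frequency")
    if freq is not None and lo is not None and freq > lo:
        raise ValueError("frequency is lower than min_frequency")
    if freq is not None and hi is not None and freq < hi:
        raise ValueError("frequency is higher than max_frequency")
```

For example, a hypothetical entry {"frequency": "1hr", "min_frequency": "6hr", "max_frequency": "1hr"} passes, while changing frequency to "day" raises ValueError because daily data fall below the 6hr minimum.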

  • Input parsing: I believe the code to parse these settings from the JSON file is already functional.
  • Input validation: verify min_frequency <= frequency <= max_frequency for each varlist entry.
  • Query rewriting: PODs should be able to use frequency to state a preferred data frequency, with the min_frequency-max_frequency range serving as a fallback if no data at that frequency are available. The general mechanism for fallbacks is to specify alternate VarlistEntries via the preprocessor's edit_request() method. For a VarlistEntry with both frequency and a min_frequency-max_frequency range specified, the framework would need to insert an alternate requesting the min_frequency-max_frequency range after every alternate in the linked list of alternates. Because this step is independent of the preprocessor, it would run after edit_request() is called.
  • Query logic: querying on the min_frequency-max_frequency range has been implemented but not tested.
  • Query tiebreaker logic: this is needed for the case in which the query finds multiple variables with frequencies in the min_frequency-max_frequency range. It should be done by defining a base class for the ExperimentSelectionMixin classes in data_sources.py with a resolve_var_expt() method that acts on the DataFrame of data catalog entries and selects the row with the desired frequency (presumably the highest available within the range).
  • POD compatibility: The code for convective_transition_diag and precip_diurnal_cycle should be checked to verify that these PODs correctly handle data at different frequencies; the claim above is based only on the PODs' documentation and hasn't been verified.
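The query-rewriting step could look roughly like the sketch below, which walks a singly linked chain of alternates and splices a range-based fallback in after each node that has both an exact frequency and a range. VarRequest and insert_range_alternates are hypothetical: VarRequest is a minimal stand-in for VarlistEntry with a single alternate pointer (the real class may store alternates differently), and the helper is assumed to run after edit_request() has built the chain.

```python
import dataclasses
from dataclasses import dataclass
from typing import Optional


@dataclass
class VarRequest:
    """Hypothetical stand-in for a VarlistEntry, linked to its next alternate."""
    name: str
    frequency: Optional[str] = None
    min_frequency: Optional[str] = None
    max_frequency: Optional[str] = None
    alternate: Optional["VarRequest"] = None  # next fallback to try, if any


def insert_range_alternates(head: VarRequest) -> VarRequest:
    """After every node with an exact frequency and a range, splice in a copy
    that requests the range instead of the exact frequency."""
    node = head
    while node is not None:
        next_node = node.alternate
        has_range = node.min_frequency is not None and node.max_frequency is not None
        if node.frequency is not None and has_range:
            # Copy every field, drop the exact frequency, and point the copy
            # at the node that originally followed this one.
            fallback = dataclasses.replace(node, frequency=None, alternate=next_node)
            node.alternate = fallback
        node = next_node  # skip the new fallback; it has no exact frequency
    return head
```

For a chain pr(1hr) -> pr_alt(1hr), both carrying a 6hr-1hr range, the result is pr(1hr) -> pr(range) -> pr_alt(1hr) -> pr_alt(range), so every exact-frequency request is tried before its range fallback.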
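For the tiebreaker, a minimal sketch of the selection a resolve_var_expt() override might perform: keep only catalog rows whose frequency lies in the min_frequency-max_frequency range, then take the row with the shortest sampling interval, i.e. the highest frequency. To stay dependency-free it uses a list of dicts in place of the DataFrame of catalog entries; INTERVALS and pick_highest_frequency are hypothetical, and rows with unrecognized frequency labels are skipped.

```python
from datetime import timedelta

# Hypothetical label -> sampling interval table; the framework's own
# frequency handling may recognize more labels.
INTERVALS = {
    "1hr": timedelta(hours=1),
    "3hr": timedelta(hours=3),
    "6hr": timedelta(hours=6),
    "day": timedelta(days=1),
}


def pick_highest_frequency(rows, min_frequency, max_frequency):
    """Return the row with the highest frequency inside the range, or None.

    rows: list of dicts standing in for DataFrame rows, each with a
    'frequency' key. A higher frequency is a shorter interval, so
    min_frequency bounds the interval from above and max_frequency from below.
    """
    longest = INTERVALS[min_frequency]
    shortest = INTERVALS[max_frequency]
    candidates = [
        row for row in rows
        if row.get("frequency") in INTERVALS
        and shortest <= INTERVALS[row["frequency"]] <= longest
    ]
    if not candidates:
        return None
    # Tiebreaker: the shortest sampling interval, i.e. the highest frequency.
    return min(candidates, key=lambda row: INTERVALS[row["frequency"]])
```

With pandas, the same idea would amount to mapping the frequency column to intervals, filtering to the range, and taking the row at idxmin() of the interval column.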

Describe alternatives you've considered
N/A

Additional context

wrongkindofdoctor self-assigned this Oct 1, 2021
wrongkindofdoctor added the diagnostic and framework labels Oct 1, 2021