Implement data frequency range specification in POD settings #270
Labels: diagnostic (issue pertains to a contributed diagnostic), framework (issue pertains to the framework code)
What problem will this feature solve?
The upcoming GFDL model run to generate POD sample data (diag table info was added for documentation purposes in #269) will only output data at 6-hourly or coarser frequency, to limit data volume. The two current high-frequency PODs, convective_transition_diag and precip_diurnal_cycle, can in principle do a valid analysis on data of this frequency, but are currently set to request data at 1hr and 3hr frequencies, respectively.
In order to run these PODs on the sample model data being generated, the POD settings file format needs to be extended to allow PODs to request data in a range of acceptable frequencies, and the data query logic needs to be extended to execute that query.
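As a rough illustration of what a range request would need to accept, the sketch below checks whether data at an available frequency satisfies a varlist entry, using the `min_frequency` and `max_frequency` attribute names proposed in this issue. The frequency strings (`1hr`, `6hr`, `day`), the unit table, and the helper names `interval_hours` and `accepts` are assumptions made for the example, not the framework's actual parsing or API.

```python
import re

# Hours per sampling interval for each unit suffix (hypothetical table;
# the framework's own frequency parsing may differ).
HOURS_PER_UNIT = {"hr": 1, "day": 24}


def interval_hours(freq):
    """Convert a frequency string such as '6hr' or 'day' to hours per sample."""
    match = re.fullmatch(r"(\d*)(hr|day)", freq)
    if match is None:
        raise ValueError(f"unrecognized frequency string: {freq!r}")
    count = int(match.group(1) or 1)  # a bare unit such as 'day' means 1 day
    return count * HOURS_PER_UNIT[match.group(2)]


def accepts(entry, available):
    """Return True if data at frequency `available` satisfies `entry`."""
    hours = interval_hours(available)
    if "frequency" in entry and interval_hours(entry["frequency"]) == hours:
        return True
    if "min_frequency" in entry and "max_frequency" in entry:
        # A higher frequency means a shorter interval, so the bounds flip
        # when they are compared in hours.
        longest = interval_hours(entry["min_frequency"])
        shortest = interval_hours(entry["max_frequency"])
        return shortest <= hours <= longest
    return False


# Hypothetical varlist entries: one with an exact frequency only,
# one that also allows a fallback range.
exact_only = {"frequency": "1hr"}
with_range = {"frequency": "1hr", "min_frequency": "6hr", "max_frequency": "1hr"}
```

Under these assumptions, `accepts(exact_only, "6hr")` is false while `accepts(with_range, "6hr")` is true, which is the behavior needed to run the high-frequency PODs on 6-hourly sample data.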
Describe the solution you'd like
The user-facing changes have been described in the docs for some time, but the feature hasn't been implemented in the framework's data query logic. Each `varlist` entry in the POD settings file can have optional `min_frequency` and `max_frequency` attributes to specify a range of acceptable data frequencies, as an alternative to the currently recognized `frequency` attribute.
- `min_frequency` <= `frequency` <= `max_frequency` for each varlist entry.
- `frequency` to identify a preferred frequency for data, with the `min_frequency`-`max_frequency` range defining a fallback option if data at `frequency` is not available. The general mechanism for doing so is specifying alternate VarlistEntries, via the edit_request() method on the preprocessor. For VarlistEntries with both `frequency` and `min_frequency`-`max_frequency` specified, this would need to insert an alternate with the `min_frequency`-`max_frequency` range after every alternate in the linked list of alternates. This would happen after edit_request() is called, since it's preprocessor-independent.
- `min_frequency`-`max_frequency` range has been implemented but not tested.
- `min_frequency`-`max_frequency` range. This should be done by defining a base class for the ExperimentSelectionMixin classes in data_sources.py, and defining a resolve_var_expt() method acting on the DataFrame of data catalog entries to select the row with the desired frequency (presumably the highest available within the range).
Describe alternatives you've considered
N/A
Additional context
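The fallback mechanism described in the proposed solution can be sketched as follows. This is a minimal illustration under stated assumptions: the `Request` dataclass is a hypothetical stand-in for a VarlistEntry, the chain of alternates is modeled as a plain Python list rather than the framework's linked list, and `with_range_fallbacks` is an invented name. Each request that specifies both an exact frequency and a range is split into an exact-frequency request followed immediately by a range-only request.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Request:
    """Hypothetical stand-in for a VarlistEntry; not the framework's class."""
    name: str
    frequency: Optional[str] = None
    min_frequency: Optional[str] = None
    max_frequency: Optional[str] = None


def with_range_fallbacks(chain):
    """Follow each exact-frequency request in `chain` with a range-only copy."""
    expanded = []
    for req in chain:
        has_range = req.min_frequency is not None and req.max_frequency is not None
        if req.frequency is not None and has_range:
            # Try the preferred frequency first ...
            expanded.append(replace(req, min_frequency=None, max_frequency=None))
            # ... then fall back to anything inside the acceptable range.
            expanded.append(replace(req, frequency=None))
        else:
            expanded.append(req)
    return expanded


# Hypothetical chain: a variable and one alternate, as might exist
# after edit_request() has run.
chain = [
    Request("pr", frequency="1hr", min_frequency="6hr", max_frequency="1hr"),
    Request("prc", frequency="1hr", min_frequency="6hr", max_frequency="1hr"),
]
expanded = with_range_fallbacks(chain)
```

Because the expansion only looks at the frequency attributes, it does not depend on which preprocessor produced the chain, which matches the point that this step is preprocessor-independent.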
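The selection step attributed to resolve_var_expt() could look roughly like the sketch below. It is not the framework's implementation: catalog rows are represented as plain dictionaries instead of a DataFrame so the example has no dependencies, the frequency strings and unit table are assumed, and `select_highest_frequency` is a hypothetical name. It enforces `min_frequency` <= `max_frequency` and returns the in-range row with the highest frequency, following the "presumably the highest available" suggestion.

```python
import re

# Hours per sampling interval for each unit suffix (assumed for this sketch).
HOURS_PER_UNIT = {"hr": 1, "day": 24}


def interval_hours(freq):
    """Convert a frequency string such as '6hr' or 'day' to hours per sample."""
    match = re.fullmatch(r"(\d*)(hr|day)", freq)
    if match is None:
        raise ValueError(f"unrecognized frequency string: {freq!r}")
    return int(match.group(1) or 1) * HOURS_PER_UNIT[match.group(2)]


def select_highest_frequency(rows, min_frequency, max_frequency):
    """Return the row with the highest frequency inside the range, or None."""
    longest = interval_hours(min_frequency)    # lowest frequency, longest interval
    shortest = interval_hours(max_frequency)   # highest frequency, shortest interval
    if shortest > longest:
        raise ValueError("min_frequency must not exceed max_frequency")
    in_range = [
        row for row in rows
        if shortest <= interval_hours(row["frequency"]) <= longest
    ]
    if not in_range:
        return None
    # Highest frequency corresponds to the shortest sampling interval.
    return min(in_range, key=lambda row: interval_hours(row["frequency"]))


# Hypothetical catalog rows; the paths are placeholders.
rows = [
    {"path": "pr_day.nc", "frequency": "day"},
    {"path": "pr_6hr.nc", "frequency": "6hr"},
    {"path": "pr_1hr.nc", "frequency": "1hr"},
]
best = select_highest_frequency(rows, min_frequency="day", max_frequency="3hr")
```

With a DataFrame of catalog entries, the same logic would amount to filtering on a derived interval column and taking the row with the smallest interval.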