MDTF-diagnostics preprocessor update (#509)
* fix logging classes and procedure for datasourcebase class; add routine to check the conda env specified by the POD for the required packages
* Finalize the POD python package check in setup_pod
* Change ref from util.Singleton to metaclass=util.Singleton, since this is the correct way to inherit from the modified Singleton definition (see the Singleton sketch appended below)
* modify the pod_setup conda package check to only verify the packages in PODs that require the python environment, since packages for NCL are not explicitly imported in the driver scripts
* fix the ref to the coordinate table in the CMIP fieldlist
* add calls to instantiate and populate the translator object to the driver script
* modify the VariableTranslator read_convention method and dependencies to handle objects in the Fieldlist files
* clean up formatting; add calls to register methods for abstract atts back to data_model.py
* rework the register_coords method to accommodate multiple entries for a single standard name; still working on the call to Fieldlist.from_struct
* add RegexDict function to basic module and util init module
* change HorizontalCoord classes back to separate X and Y classes; add try/except block to catch duplicates in the _DMDimensionsMixin build_axes method
* update data model horizontal coord class refs in translation module; move VariableTranslator class and modifier check from data_model to the translation.FieldlistEntry __post_init__ method to avoid a circular import
* remove atmos_realm and ocean_realm from modifiers table and fieldlists; change modeling_realm to realm in fieldlists
* make realm a non-mandatory string attribute that defaults to an empty string in DMDependentVariable, because this class is used to define both coordinates, which do not have realm atts, and variables, which do
* Refine _process_var and _process_coord methods to properly handle modifier and realm entries when defining fieldlist lookup tables; remove mandatory modifier and ndim units from fieldlist class atts since they are not needed there
* change HorizontalCoordinate refs to X and Y Coordinate refs to match the change in class names in varlist_util.py
* add type hints and placeholder methods to DataSourceBase class
* remove varlist_util.VarlistCoordinateMixin class and move the need_bounds attribute to the data_model._DMCoordinateShared class, which is inherited by the same child classes in varlist_util
* add units defs to lat and lon dims in pod settings files
* add axis defs to the lat and lon dimensions in the pod settings files
* pass the pod object to get_varlist instead of just the pod_vars in pod_setup; define pod_dims using the _pod_dims_from_struct function in the Varlist from_struct call
* add varlist_util with changes from prior commit
* clean up dataclass.py formatting
* replace var_dict parameter with parent in get_varlist method definition
* minor cleanup of fieldlist and core modules
* remove commented-out line from varlist_util.py; comment out convention match check for varlist setup development in pod_setup
* check for time and set axis to T in data_model.coordinate_from_struct
* add type hints to translation methods; add date_range attribute to datasourcebase and a method to set the date range; add call to set_date_range to pod_setup; change varlist_util.setup_var to accept date_range and convention parameters to pass to called methods
* fix dest_path definition in varlist_util; add dummy attributes required by abstract methods in data_model; clean up formatting; add realm to from_CF method in translation
* remove unnecessary pod_convention parameter from translation calls; change _NO_TRANSLATION_FIELDLIST definition from 'None' to 'no_translation'; refine logic for choosing whether to translate based on convention match in pod_setup
* start re-working preprocessor module to make the previous multirun classes the default
* rename my_scripts to user_pp_scripts in template files
* fix pod_setup status checks; add user_pp_scripts att and a routine to set the att on the pod object
* add call to add user-defined pp scripts to the workflow in mdtf_framework.py
* clean up pp module some more and add placeholder class for user-defined pp
* add calls to instantiate MODEL_WORK_DIR and preprocessor objects to the driver script; refactor path_utils to divide MODEL_WORK_DIR and POD_WORK_DIR into separate objects, since the model work dir is used by all PODs in a run and does not need to be copied to each pod directory; anticipatory cleanup of preprocessor module; add todos to query_fetch_preprocess module
* add translate_data option to runtime config files; refactor pod_setup to configure paths using podpathmanager and run translation based on the new translate_data cli flag
* refine path definitions in reconfigured path_utils setup; update path object dependencies
* move main log to framework run subdirectory
* change PodObject._children to return case list values instead of None
* edit preprocessor init methods and make DaskMultiFilePreprocessor the default pp class; reorder the preprocessor init and pod setup calls in the driver script; edit the runtime config template to point to the oar.gfdl.mdtf conda installation
* start defining preprocessor.query_catalog function and call (see the intake-esm query sketch appended below)
* refine preprocessor catalog search criteria
* move deactivate routine from core to util/basic; move objectstatus from log module to basic module; update src init module to reflect util mods
* update deactivate calls to reflect routine changes in pod_setup and varlist_util
* fix logic in translation.translate_coord and add assertion error message
* define standard_name for esm_catalog_CMIP_synthetic_r1i1p1f1_gr1.csv entries
* fix standard_name defs in esm_catalog_CMIP_synthetic; fix catalog path def in runtime_config.jsonc; keep working on catalog query in preprocessor
* rearrange calls in driver script
* fix data path regex pattern and create aggregate catalog to return in preprocessor.query_catalog
* rework catalog query and function parameters; refactor edit_request calls in preprocessing routines
* fix preprocessor edit_request interfaces
* update edit_request and execute calls for each pp func; remove refs to edit_request_wrapper (will probably delete it, since it is a PITA and can be replaced with something less confusing to handle alternates); begin figuring out the whole dataset open-read procedure
* add arguments for catalog_subset to preprocessor read_file functions; remove extraneous classes from xr_parser and preprocessor; add type hints to translation functions; update runtime_config_template for local testing
* notebook added
* supporting configs added that were used for a test
* 2 cases compared, with catalog from Ciheim generated by CatalogBuilder
* fix formatting in output_manager and processes
* add micromamba_exe parm to runtime_config templates
* update demo: remove old demo and update new demo with 2 cases
* remove call to conda check from mdtf_framework; add support for micromamba to pod_setup module; add micromamba_exe parms to config templates; add temporary comments to config jsonc template
* fix typos in util modules; add routine to append a row to a pandas dataframe to util.basic (sketched below)
* add procedure to create dataframe from preliminary intake catalog query; work on modifying check_group_range to create a DateRange object from catalog start_time and end_time and append it to the existing dataframe
* update _parse_input_string in datelabel module to accept a colon delimiter, and extend the accepted date string format description
* add check_date_format routine to cli.py with additional accepted date string formats for startdate and enddate input data (see the date-format sketch appended below)
* fix catalog dataframe update procedure in query_catalog; work on passing the xarray dataset from the catalog query to the preprocessing functions
* update init method for xr_parser DefaultDataParser and calls; refine preprocessing routine; start updating xr_parser methods; remove unused preprocessor load and read methods
* add handling for microseconds to datelabel DatePrecision
* reorganize xr_parser parse method to handle the catalog xarray dataset; comment out calls to methods that may no longer be needed; add logging to DefaultDataParser class
* remove unused SingleVarFilePreprocessor class and read_dataset methods; refactor preprocessor parse method to only call xr_parser.parse; increase precision of datestring returned by CropDateRangeFunction execute logger; comment out AssociatedVariablesFunction since it is not yet implemented and lacks a use case at this time
* fix cmip6.py formatting
* set output_to_ncl preprocessor attribute; reorganize pp routines and remove unused methods; start refactoring write routines
* add pod runtime settings attribute to pod object
* update mdtf_framework function calls
* update config parameter defs and calls in output_manager; fix calls to preprocess method; refactor assocVariablesFunction; replace args with kwargs parm in preprocessor execute methods and pass keyword args to function calls
* add preliminary calls to environment and runtime managers to mdtf driver; add failed and active properties to pod_setup
* update environment and output manager modules to use multirun config base classes; add preliminary calls and routines to handle multirun html template generation to output manager
* refactor example_multicase html templates into separate header and plot files
* Remove unused modules
* move tempdirmanager to util/filesystem.py
* update calls to tempdirmanager methods with config parameter
* clean up preprocessor and data_sources modules; add assoc_files attribute to varlist_util.Varlist class
* refactor environment_manager subprocessruntimemanager methods; update calls in mdtf_framework.py
* update modules in toc rst doc files; start updating fmwk_cli.rst; update dev_start.rst
* comment out calls to attributes that are not set in logs module; add case_dict parm to data_sources init in pod setup; set new_work_dir to True in paths init in pod setup; add iter_vars_only method to data_model.py; define iter_vars_only attribute in data_sources DataSourceBase class; update environment setup and subprocess spawn calls in environment_manager; start refactoring output_manager module; update method calls in mdtf_framework.py
* rename example_multicase_header.html to example_multicase.html
* remove unused code from pod_setup; continue refactoring output_manager; remove dry_run parameter from subprocess methods and calls; update output_manager calls in mdtf_framework
* add type hints to cli.read_config_files and make parms lowercase
* change WK_DIR to WORK_DIR in environment_manager
* continue refactoring output_manager to work with ctx.config information
* refactor tempdir class to work with ctx.config information
* add TEMP_DIR_ROOT, unit_test, and _configs attributes to ctx.config; move backup_config method and ConfigTuple defs from core module to mdtf_framework
* remove config parm from tempdir_cleanup_handler and calls; add keep_temp attribute to TempDirManager; define attributes in TempDirManager; clean up logs.py formatting
* fix formatting in filesystem, path_utils, and environment_manager; fix pod data output dir definition in path_utils so that it doesn't doubly append date and case directories
* add placeholder method for pp catalog creation to preprocessor.py
* add catalog module to src/util with methods for postprocessed data catalog creation
* clean up datelabel module
* modify find_json method to accept a full filepath and do a simple search for a file in a directory; consolidate read_config_file and read_config_files methods to parse a json using the MDTF root directory, subdirectory tree, and file name
* refine output file catalog attributes and assets definitions
* clean up path_utils and date_label modules
* add methods to parse output file directories and split file name parts to define attributes in catalog.py; refine calls to catalog methods in preprocessor
* set new_workdir option to False in pod_setup pathutils initialization
* remove unused imports from cli.py
* work on defining regex to isolate time_range in file name in catalog module
* refine write_pp_catalog method; move define_pp_catalog to catalog module
* update columns in pp catalog
* fix order of imports in util init file to avoid circular import error
* refine catalog assets setup; add output file path to catalog asset definition method
* add calls to update ctx.config WORK_DIR and OUTPUT_DIR with values defined by model_paths atts
* fix csv file name in os.path.join call in catalog.define_pp_catalog_assets
* add logic to PathManager to check for an existing MDTF_output subdirectory before appending it to a directory attribute
* change find mindepth to 1; remove duplicate entries from filelist before returning it in catalog.get_file_list
* add calls to validate catalog to preprocessor
* add hacked version of save function to catalog.py to try to work around the output file name issue
* add call to new save method and debug
* move case object from pod to its own dictionary in the main program and update usage in data_sources and pod_setup; finalize catalog save method and update comments
* add logging and error handling to preprocessor write_pp_catalog method
* refactor environment manager routines to use separate case dictionary; add CATALOG_FILE environment variable to case_info.yaml
* update intake-esm versions in base and python3 base envs
* replace call to custom catalog save util with the intake-esm serialize method in preprocessor, since the version update fixed the fsspec issue (see the serialize sketch appended below)
* remove save_catalog from catalog.py since it is not needed with the intake-esm version update
* fix data_sources formatting
* get rid of VarlistEntryMixin class and consolidate with VarlistEntry class; make env_vars a class attribute instead of a property to avoid collision with the attribute defined in the data_sources parent class; create new set_env_vars method to define VarlistEntry env_vars and add the call to Varlist.setup_var method
* resolve merge conflicts
* fix the CASE_LIST key in the case_info.yaml creation; fix the micromamba_exe parameter call in the subprocess command definition
* delete cli_plugins and template jsons
* remove member_id from groupby_attrs def in catalog module
* remove debugging lines and fix file close and memory cleanup in example_multicase POD
* replace case_info.yml with updated example_case_info_output.yml in example_multicase directory
* change file name from case_info.yaml to case_info.yml in environment_manager
* update the information in the example_multicase html template
* update env vars in albedofb.py
* update env vars and clean up formatting in blocking_neale.py
* update env vars and fix formatting in convective_transition_diag scripts
* update env vars in enso mse and enso rws drivers
* update env variables in eulerian_storm_track scripts, and clean up formatting in POD files
* update env vars and fix formatting in example POD files
* add realm to varlist env_vars
* update mixed_layer_depth env vars and clean up formatting in driver and html files
* clean up formatting in MJO_prop_amp driver and html files
* update env_vars and fix formatting in MJO_prop_amp NCL scripts
* update env_vars and clean up ENSO_RWS scripts
* update blocking_neale scripts
* update and clean up ENSO_MSE scripts
* update MJO suite env vars
* update and clean up MJO_teleconnection scripts
* update ocn_surf_flux_diag env vars and clean up formatting
* update precip_diurnal_cycle env_vars and fix file formatting
* update seaice_suite env_vars and clean up file formatting
* update SM_ET_coupling env dirs and fix formatting in files
* update stc_annular_modes env vars and fix file formatting
* update stc_eddy_heat_fluxes env vars and clean up file formatting
* clean up and update env vars in stc_eddy_heat_fluxes scripts
* clean up and update env vars in stc_spv_extremes scripts
* clean up and update env vars in stc_vert_wave_coupling scripts
* clean up and update env vars in TC_MSE scripts
* clean up and update env vars in TC_Rain scripts
* clean up and update env vars in temp_extremes_distshape scripts
* clean up and update env vars in top_heaviness_metric scripts
* clean up and update env vars in tropical_pacific_sea_level scripts
* clean up and update env vars in Wheeler-Kiladis scripts
* update env_vars and clean up formatting in stc_qbo_enso scripts
* change WK_DIR to WORK_DIR in forcing_feedback python files
* bump intake_esm to v2024.2.6 in base and python3_base conda env files
* update precip_buoy_diag env vars and clean up formatting in html and python files
* replace _query_error_handler calls with generic error logging; add check for empty dataframe returned by initial catalog search; add comments with more advanced regex queries to test whether intake-esm fixes issues in _search method
* remove core.py
* add POD env vars to subprocess env for single-case PODs; make the case_name env var CASENAME for consistency with the existing definition
* fix type checks in write_data_log_file and fix file cleanup logic in output_manager
* bump netcdf4, h5py, matplotlib, pip, dask, and xarray versions in base and python3_base env files
* fix output path for netcdf file and add checks to example_diag.py
* close case_info.yml file after writing
* add print_summary routine to mdtf_framework.py
* clean up verify_links formatting
* replace PodObject.cases with PodObject.multi_case_dict attribute to match the att that contains the necessary case info in environment_manager
* remove redundant checks in output_manager
* add log argument and pass main log file to print_summary in driver script
* add calls to close varlist loggers to driver script
* add env vars from pod settings to PodObject pod_env_vars object in pod_setup module
* comment out log string with file path since it is causing issues in the io stream
* formatting cleanup in environment_manager
* add routine to append case list atts to html template dict for single-case config, and a placeholder for multicase config, to output_manager
* clean up formatting in logs and exceptions modules
* add comments to mdtf_framework.py
* clean up lines that were too long in environment_manager
* Create Test_Notebook.ipynb
* add block to read in the case_info yaml generated by the framework to the test notebook
* update figure name convention in example_multicase html and driver scripts
* add _DoubleBraceTemplate to util/__init__.py and clean up filesystem.py formatting
* refine generate_html_file_case_loop procedure; add file name parm to make_html; add error message to assertion statement in output_manager
* update comments in example_multicase.html template
* update multirun_config_template.jsonc
* add flag to append html code for 1 figure per case to runtime_config.yml
* add missing parenthesis to multirun_config_template.jsonc
* update runtime_config.jsonc
* update multirun_config_template.jsonc
* move case information template generation to a separate method; add boolean attribute to determine whether to run the case loop template generator; clean up formatting in output_manager
* clean up units.py formatting
* change file arg to the io stream of the open handler, and open the output file in append mode before writing the case info to the html output file in output_manager
* remove old travis tests
* remove unused test script
* clean up pod_setup.py formatting
* clean up and remove unused class from dataclass.py
* remove unused class from and clean up basic.py
* clean up util test scripts
* rename src/tests/test_core.py and start refactoring
* start refactoring unit tests
* delete old tests and refactor data_manager, diagnostic, and units tests
* move user_pp_scripts attribute from pod_setup to preprocessor; add module loader procedure to import custom preprocessing python scripts in the DaskMultiFilePreprocessor init method (see the module-loader sketch appended below)
* create an example custom preprocessing script
* update comments in example pp script and refine the main routine loop
* move json routines from filesystem to json_utils module
* remove unused MDTFEnumInt class and fix _MDTFMixin init method
* update util __init__.py to reflect module modifications
* remove unused MDTFIntEnum test
* remove unused VarlistEntryStage attribute and calls from varlist_util; remove unused deactivate_data_key method from VarlistEntry
* add __init__.py to user_scripts
* add json_utils methods to read in config file to example_pp_script
* change example_pp_script.py to work on daily data and finish debugging main routine
* set progressbar to false in to_dataset_dict call in example pp script
* set progressbar to false in to_dataset_dict call in preprocessor; debug the custom module load method
* add check for unit_test attribute in config param to filesystem tempdirmanager
* stop tracking example_multicase catalog; update example_multicase config and environment yamls
* rearrange user_pp_scripts call; still need to debug calling custom module on xr ds and variable
* add realm parameter to from_CF_name calls in translation and test/test_translation modules
* add init module to main repo directory
* remove unused add_row function from basic module
* add test routine to example_pp_script.py
* finalize custom script import procedure in preprocessor; change user_pp_scripts preprocessor attribute to just be a list of script names and not the full paths
* fix edit_request and execute functions and calls so that they return varlistentries or xarray datasets whether they perform operations or act as dummy functions
* update example_multirun_demo notebook
---------

Co-authored-by: Aparna Radhakrishnan <[email protected]>
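The sketches below illustrate a few of the changes referenced in the list above; they are minimal, hedged examples, not the framework's actual code. First, the `metaclass=util.Singleton` change: with a metaclass-based singleton (assumed here to be a conventional metaclass singleton), subclasses get one-instance-per-class behavior by declaring the metaclass rather than inheriting from a singleton base class.

```python
class Singleton(type):
    """Metaclass giving each class that uses it a single shared instance."""
    _instances = {}

    def __call__(cls, *args, **kwargs):
        # Create the instance on the first call; return the cached one afterwards.
        if cls not in cls._instances:
            cls._instances[cls] = super().__call__(*args, **kwargs)
        return cls._instances[cls]


class VariableTranslator(metaclass=Singleton):
    """Declared with `metaclass=Singleton` rather than `class VariableTranslator(Singleton)`."""
    def __init__(self):
        self.conventions = {}


assert VariableTranslator() is VariableTranslator()  # same object every time
```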
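A sketch of the intake-esm query pattern behind `preprocessor.query_catalog` and the `progressbar=False` change; the catalog path and the search columns (`variable_id`, `frequency`) are assumptions, not the framework's actual query.

```python
import intake

# Open an ESM intake catalog (path is illustrative) and subset it by the
# attributes the preprocessor needs before loading anything into memory.
cat = intake.open_esm_datastore("esm_catalog_CMIP_synthetic_r1i1p1f1_gr1.json")
subset = cat.search(variable_id="tas", frequency="day")

# Guard against an empty search result before trying to open datasets.
if subset.df.empty:
    raise ValueError("catalog search returned no assets for the requested variable")

# Load the matching assets; progressbar=False suppresses the dask progress bar
# that would otherwise clutter the framework's log stream.
dataset_dict = subset.to_dataset_dict(progressbar=False)
```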
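The row-append helper added to util.basic (and later removed, per the list above) presumably looked something like this; the name `append_row` and the column names are placeholders.

```python
import pandas as pd

def append_row(df: pd.DataFrame, row: dict) -> pd.DataFrame:
    """Append one row (a dict of column -> value) to a DataFrame.

    Recent pandas versions removed DataFrame.append, so build a one-row
    frame and concatenate instead.
    """
    return pd.concat([df, pd.DataFrame([row])], ignore_index=True)

# usage: grow a catalog DataFrame one asset at a time
catalog_df = pd.DataFrame(columns=["path", "start_time", "end_time"])
catalog_df = append_row(
    catalog_df, {"path": "tas_day_1980-1989.nc", "start_time": "1980", "end_time": "1989"}
)
```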
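A hedged sketch of the kind of date-format check described for cli.py: try each accepted startdate/enddate format in turn, including a colon-delimited time component; the exact format list used by the framework may differ.

```python
from datetime import datetime

# Accepted formats are illustrative; the colon-delimited entry reflects the
# datelabel change that allows a 'YYYYMMDD:HHMMSS'-style input string.
ACCEPTED_DATE_FORMATS = ["%Y", "%Y%m", "%Y%m%d", "%Y-%m-%d", "%Y%m%d:%H%M%S"]

def check_date_format(date_str: str) -> datetime:
    """Return a datetime if date_str matches an accepted format, else raise."""
    for fmt in ACCEPTED_DATE_FORMATS:
        try:
            return datetime.strptime(date_str, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognized startdate/enddate format: {date_str!r}")
```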
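The write_pp_catalog change swaps a hand-rolled save routine for intake-esm's own serialize(). Roughly, assuming the post-processed output is described by an esm_datastore (the catalog name and output directory below are placeholders):

```python
import intake

# pp_cat is an intake_esm.esm_datastore describing the post-processed files;
# here it is re-opened from an existing catalog purely for illustration.
pp_cat = intake.open_esm_datastore("MDTF_postprocessed_data.json")

# serialize() writes the catalog back out; catalog_type="file" stores the
# asset table in a separate csv next to the json descriptor.
pp_cat.serialize(
    name="MDTF_postprocessed_data",
    directory="MDTF_output",          # hypothetical output directory
    catalog_type="file",
)
```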
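A sketch of the user_pp_scripts module-loader idea: given only a list of script names, import each file from the user_scripts directory with importlib. The function name, directory layout, and `process` entry point are assumptions.

```python
import importlib.util
from pathlib import Path

def load_user_pp_script(script_name: str, scripts_dir: str = "user_scripts"):
    """Import a user-supplied preprocessing script by name (no path, no .py suffix)."""
    script_path = Path(scripts_dir) / f"{script_name}.py"
    spec = importlib.util.spec_from_file_location(script_name, script_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module

# usage: user_pp_scripts is just a list of names, as noted in the list above
for name in ["example_pp_script"]:
    user_module = load_user_pp_script(name)
    # the preprocessor would then call an agreed-upon entry point, e.g.:
    # dataset = user_module.process(dataset, variable)
```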