
Analyzing Model Output

Andrew Ross edited this page Oct 22, 2024 · 11 revisions

This page gives an introduction to the COBALT-MOM6 model outputs and provides guidance on using GFDL's PPAN cluster, along with its pre-installed packages, to analyze and diagnose model results. (WIP)

Model Output and PPAN

PPAN

Post-processing and Analysis, or PPAN for short, is GFDL's onsite High Performance Computing (HPC) cluster for analyzing model output data. This is where you will most likely run analysis scripts and generate plots from model data. PPAN is composed of several file systems, but the ones most relevant to analyzing model output are:

  • /home - Your home directory, where you will first land when you access PPAN. Backed up, but by default only has a 10 GB storage quota, so be wary of storing large files here, like conda environments.
  • /work - A larger, non-backed up scratch space that is likely your best option for writing and running scripts.
  • /archive - A tape storage system meant to act as a permanent home for large files. Since archive is long term tape storage, do not run analysis scripts on archive, and make sure to stage any necessary data off of archive before using it.

More information about all of these file systems and PPAN can be found in the Data Storage and Archive section of the wiki.

Accessing model output

After a successful model run, fre will initially stage the output from the model in the <archive> directory defined in your platforms.xml file, underneath the platform within which you ran the experiment. The <archive> directory will contain a folder named history that holds model output in the form of tarred netcdf files. The files are named by the start of the time period for which they contain data, and the time span of each file is determined by the settings in your diag tables.
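To see what a given history tarball contains, you can list it with tar before extracting. The sketch below builds a small mock history file so it is self-contained; the file names (19930101.nc.tar, 19930101.ocean_month.nc) follow the naming convention described above but are otherwise hypothetical stand-ins for real output.

```shell
# Build a mock history tarball for demonstration
# (real files live under <archive>/history on Gaea or /archive on PPAN).
mkdir -p /tmp/history_demo && cd /tmp/history_demo
touch 19930101.ocean_month.nc 19930101.ocean_annual.nc   # stand-ins for real netcdf files
tar -cf 19930101.nc.tar 19930101.ocean_month.nc 19930101.ocean_annual.nc

# List the contents of a history tarball without extracting it:
tar -tf 19930101.nc.tar

# Extract into the current directory when you are ready to work with the files:
tar -xf 19930101.nc.tar
```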

The <archive> directory defined in your platforms.xml is only a temporary staging site on Gaea. Ideally, fre will move all of the directories within the archive directory to the /archive/$USER/<archive> directory on PPAN. fre should also create another directory in this location, named pp, containing model output that has been post-processed according to the instructions laid out in the xml. If either of these steps fails, you can manually stage the files to archive and run post-processing from there using the frepp command.

Post Processing and Analyzing Model Output

Post processing

Post-processing refers to untarring the files in the history directory, reorganizing them into multiyear chunks, and optionally performing other operations like calculating averages or running scripts you wrote to analyze the data. While this is usually done as part of frerun, it may not run if frerun fails to complete. In that scenario, you can run the frepp command on PPAN to carry out post-processing manually.
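A manual invocation might look like the sketch below. The xml path, platform, target, time, component, and experiment name are all placeholders, and the flags should be verified against `frepp --help` on PPAN; the command is echoed rather than executed here so the sketch stays self-contained.

```shell
# Hypothetical frepp invocation; every value below is a placeholder.
XML=CEFI_NWA12.xml          # your experiment xml
PLATFORM=gfdl.ncrc5-intel   # platform the run used
TARGET=prod                 # compile target
TIME=19930101               # start of the history chunk to post-process
COMPONENT=ocean_month       # post-processing component to generate
EXPT=my_experiment          # experiment name from the xml

# Echoed, not executed, so this sketch is runnable anywhere;
# drop the echo on PPAN (and check the flags with frepp --help first).
echo frepp -x "$XML" -P "$PLATFORM" -T "$TARGET" -t "$TIME" -c "$COMPONENT" -s "$EXPT"
```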

The output from post-processing is defined in the <postProcessing> section of the xml. This section is split into several "components" defining the netcdf files that will be created during post-processing. You have the option of defining the name (i.e., type) of these components, but the source of each component must be the name of a history file component defined in the diag table. frepp uses the source to find the history files to work with after untarring the files in your history folder.
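As a sketch of what this structure can look like (component and attribute names here are illustrative; check exact tag names for your Bronx version against the MSD fre documentation):

```xml
<!-- Illustrative sketch only; verify tag and attribute names in the fre docs. -->
<postProcessing>
  <!-- "type" is a name you choose; "source" must match a history file
       component defined in the diag table -->
  <component type="ocean_monthly" source="ocean_month">
    ...
  </component>
</postProcessing>
```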

Within each component you can pass several other tags, but the most relevant will most likely be <timeSeries> and <timeAverage>. The <timeSeries> tag tells fre the time chunks of model output it should group together, and the frequency at which data should be available within those chunks. For example, if you set chunkLength = 5yr and freq = monthly, fre will group together 5 years of history data beginning in the year defined by the start tag for the component and proceed forward in 5-year steps until it can no longer find 5-year chunks of data to group together. Any remaining data representing less than 5 years of output is not included in post-processing. Within each five-year chunk of model output will be data from the start of every month during that time period. Note that in order to generate a time series with monthly frequency, you need to ensure that your diag table records snapshots for this component at least monthly.

By default, the <timeSeries> and <timeAverage> tags will include all available history variables in the final netcdf files. If you would only like a few variables, you can select them using the <variables> tag within either section.
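Putting the pieces together, a hypothetical component producing monthly-frequency time series in 5-year chunks, restricted to two variables, might look like the fragment below (the component, attribute, and variable names are illustrative; confirm them against your diag table and the MSD fre documentation):

```xml
<component type="ocean_monthly" source="ocean_month" start="1993">
  <timeSeries freq="monthly" chunkLength="5yr">
    <!-- Only these history variables are written to the output netcdf files -->
    <variables>tos sos</variables>
  </timeSeries>
</component>
```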

As with other xml tags, experiments inherit post processing instructions from their parent experiments, so running frepp for a child experiment will generate the post processed components described in both the child and parent experiment.

If you have a NOAA account, more information about post process tags and commands can be found in MSD's fre documentation. Note that these pages are several years old and therefore may not contain the most up to date information. However, it is still a good resource if you wish to dive further into available post processing capabilities with fre/bronx.

refineDiag and Analysis scripts

In addition to producing time series and time averages, fre also gives you the option to run scripts on the model output either before or after post-processing takes place. Scripts that run before post-processing are known as refineDiag scripts, while scripts that run afterwards are known as analysis scripts. To run either type of script, add the <refineDiag></refineDiag> or <analysis></analysis> tags to the time series or time average you want the script to run on within the xml. You can then pass a path to the script you would like run, along with any additional attributes that the analysis and refineDiag tags accept. For example, the switch attribute is a boolean you can use to turn the analysis option on or off between runs, while the cumulative attribute determines whether to run analysis scripts on all years for which data is available or just the range set by the command line flags -Y and -Z.
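A hypothetical analysis tag attached to a time series might look like the fragment below (the attribute names follow the description above, but the exact syntax and script path are placeholders to confirm against the MSD fre documentation):

```xml
<timeSeries freq="monthly" chunkLength="5yr">
  <!-- switch toggles the script on or off between runs; cumulative controls
       whether all available years are used or just the -Y/-Z range -->
  <analysis switch="on" cumulative="no" script="/home/$USER/scripts/my_analysis.csh"/>
</timeSeries>
```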

To help run analyses, fre can optionally set the value of several variables in your script for you to use in calculations. These variables are declared by typing set <variable_name> at the top of your script, with no value assigned. For example, if you need the path to the pp files that your script is being run on, you can type set in_data_dir at the top of your script. When fre reads the analysis script, it will fill in the value of this and any other variable name it recognizes so that they can be used during execution. You can find a full list of the variables fre sets on the MSD website.
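The top of a (typically C shell) analysis script using this mechanism might look like the sketch below. Only in_data_dir comes from the description above; the year-range variables are hypothetical names added for illustration, so check the MSD list for the names fre actually recognizes.

```csh
#!/bin/csh -f
# Variables declared with no value are filled in by fre when it stages the script.
set in_data_dir    # path to the pp files this script is run on
set yr1            # hypothetical: first year of the analysis range (verify name)
set yr2            # hypothetical: last year of the analysis range (verify name)

echo "Reading pp files from $in_data_dir for years $yr1-$yr2"
```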

Diagnostics

Existing scripts and config.yaml

MED maintains a collection of scripts for producing diagnostic plots from model output. They are all available in the diagnostics folder of this repo, and are organized into subfolders based on the variables each diagnostic script works with.

Most of these scripts were written for the NWA domain, but an effort is underway to make them applicable to all other CEFI domains of interest. Each subfolder will eventually contain a config.yaml file listing script variables that may vary between domains. For now, only the physics folder has a working config file for its scripts. You can run any of the scripts in this folder by first editing the config file to suit your domain of interest and then running python <script> -p <path_to_pp_files_without_the_pp> -c <path_to_config_yaml>. If you need access to a python environment with the dependencies for these scripts pre-installed, MED maintains one in our role account that you can activate with the instructions here.
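As a concrete sketch of that command, with every name and path below a placeholder (the script name, experiment path, and config location are hypothetical; the command is echoed rather than executed so the example is self-contained):

```shell
# Hypothetical invocation of a physics diagnostic script; substitute your own
# script name and the path to the directory that contains (but excludes) pp/.
echo python surface_temperature.py \
    -p /archive/$USER/my_experiment/gfdl.ncrc5-intel-prod \
    -c physics/config.yaml
```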