Procedure to contribute model output to the portal
Model outputs added to the portal are usually also public data releases. The figure below shows our model output release framework. The steps on this page cover two parts of that framework: adding data to a THREDDS server and serving it on the USGS CMGP Portal. For other parts, such as creating the accompanying metadata, see the Ocean ISO Metadata Wiki.
These steps include:
- Move your files to the usgs/Projects folder (where the published model outputs reside)
- Add datasets to an existing THREDDS Data Server (such as Sand and Poseidon servers)
- Add your simulation to the catalog of simulations (which is served by the public-facing Geoport server)
Locate the usgs/Projects folder on the server and create a folder for your model output release. Move the model output you want to publish into that folder. See this example for the types of files required for reproducibility:
- input
- his/avg folder
- forcings folders
- initial
- grid
- ncml file (see next step)
/sand/usgs/Projects/
/vortexfs1/share/usgs-share/Projects
For each simulation that appears on the portal, you need:
- NetCDF output files in a directory,
- An NcML file that starts with 00_dir (e.g. 00_dir_roms.ncml) to aggregate, standardize and describe the dataset.
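A quick pre-flight check of the two requirements above can be scripted. The following is a minimal sketch and not part of the portal tooling: the function name `check_simulation_dir` is hypothetical, and it assumes that the NetCDF output files use the `.nc` extension and sit in the same directory as the NcML file.

```python
from pathlib import Path


def check_simulation_dir(directory):
    """Return (netcdf_files, ncml_files) found directly in `directory`.

    Hypothetical helper. Assumes NetCDF output uses the .nc extension and
    that the NcML file lives alongside the output files.
    """
    root = Path(directory)
    # NetCDF output files (assumed extension: .nc).
    netcdf_files = sorted(p.name for p in root.glob("*.nc"))
    # NcML files whose names start with 00_dir and end with .ncml.
    ncml_files = sorted(p.name for p in root.glob("00_dir*.ncml"))
    return netcdf_files, ncml_files
```

If either returned list is empty, the directory is missing one of the two items listed above.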
For ROMS datasets, you can use a Python script to generate the NcML from a YAML text file. Editing the YAML is simpler than editing the NcML directly because you don't have to edit attributes for multiple variables. The YAML file includes the following sections:
- id
- title
- summary
- project
- contacts
- variables (to display on the portal)
and more, which together provide the required metadata for the model output. Here's the link to the yaml2ncml page, and an example YAML file used to create a ROMS NcML file. You can use this YAML file as a template for your own. Mind the following:
- Use spaces instead of tabs to indent the fields in your YAML file.
- To be picked up by the crawler, the name of the resulting NcML file has to start with 00_dir and end with .ncml, like 00_dir_roms.ncml.
- Make sure the variables you want to show up in the Portal do not have a display attribute set to False.
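The first point above is easy to get wrong because tabs and spaces look alike in most editors. As a minimal sketch (the helper name `find_tab_indented_lines` is hypothetical and is not part of yaml2ncml), the following reports which lines of a YAML text are indented with a tab:

```python
def find_tab_indented_lines(yaml_text):
    """Return 1-based line numbers whose leading whitespace contains a tab."""
    bad = []
    for number, line in enumerate(yaml_text.splitlines(), start=1):
        # Isolate the leading whitespace of the line.
        indent = line[: len(line) - len(line.lstrip())]
        if "\t" in indent:
            bad.append(number)
    return bad
```

To use it, read your YAML file into a string and pass it in. An empty list means no line is indented with a tab.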
You can install yaml2ncml locally following the instructions on the yaml2ncml page. Alternatively, you can do this remotely following the steps here.
yaml2ncml is already installed on geoport and sand as part of the ioos Python environment. To make this your default environment when you log in to sand and/or geoport, add this line to your .bashrc:
export PATH=/home/usgs/miniconda/bin:$PATH && source activate IOOS
Then try logging out and back in, and you should see something like this:
discarding /home/usgs/miniconda/bin from PATH
prepending /home/usgs/miniconda/envs/IOOS/bin to PATH
(ioos)rsignell@sand:~$
which means that you are using the ioos environment, and you can run the yaml2ncml command.
Run the following to use the already built binaries for yaml2ncml. Then you can run the yaml2ncml command from anywhere.
export PATH=/vortexfs1/share/usgs-share/Projects/yaml2ncml/bin:$PATH
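Whichever of the setups above you use, you can confirm that the yaml2ncml command is reachable before trying to generate an NcML file. This is a minimal sketch using only the Python standard library. The function name `locate_yaml2ncml` is hypothetical, and it assumes the executable is named `yaml2ncml`, as in the text above.

```python
import shutil


def locate_yaml2ncml():
    """Return the full path of the yaml2ncml executable, or None if absent."""
    # shutil.which searches the directories listed in PATH, like the shell does.
    return shutil.which("yaml2ncml")


if __name__ == "__main__":
    path = locate_yaml2ncml()
    if path is None:
        print("yaml2ncml not found on PATH; check your .bashrc or PATH export")
    else:
        print(f"yaml2ncml found at {path}")
```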
This is an extra step to follow if for some reason you cannot place your files under the usgs/Projects folder.
Add the catalog that contains your NcML file to the list of THREDDS catalogs harvested by the catalog database. To do this, click the pen icon to edit, make your changes, and then click the button at the bottom of the page to submit a pull request.
Once your pull request is accepted by the administrator, the Python script on geoport will be updated:
ssh gamone.whoi.edu
sudo su - rsignell
cd /opt/docker/harvest/usgs-cmg-portal
git pull
The Python script currently runs each hour at 5 minutes past the hour. If everything is successful, new ISO metadata records should appear in /opt/docker/pycsw/force/iso_records, and these records should be harvested and become available in the pycsw database via the CSW service by 15 minutes past the hour.