NCTools documentation update for 2023.02 #264

Merged: 11 commits, Dec 15, 2023
31 changes: 18 additions & 13 deletions README.md
@@ -58,25 +58,30 @@ The tools available in FRE-NCtools are:


### Other Tools
The [Ocean Model Grid Generator](https://github.com/NOAA-GFDL/ocean_model_grid_generator)
can be copied or cloned from its GFDL homepage.

Several of the tools have parallel versions that can overcome the memory and CPU
constraints of their serial counterparts. For example, fregrid_parallel reproduces
the functionality of fregrid and, among other things, is commonly used to generate
the remapping weights for high-resolution grids (for further information, see the
"extreme fregrid" document).
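
To make that division of labor concrete, here is a minimal sketch of the two-step
pattern described in the "extreme fregrid" document. The mosaic, grid, and file names
are taken from examples elsewhere in this repository; the srun launch and the field
name `temp` are illustrative assumptions:

```sh
# Step 1: compute the remapping weights in parallel (the resource-intensive part).
srun --ntasks=640 fregrid_parallel --input_mosaic C3072_mosaic.nc \
    --nlon 11520 --nlat 5760 --remap_file lg_remap_C3072_11520x5760.nc \
    --interp_method conserve_order1

# Step 2: re-use the saved weights with the serial fregrid (fast).
fregrid --input_mosaic C3072_mosaic.nc --nlon 11520 --nlat 5760 \
    --remap_file lg_remap_C3072_11520x5760.nc --interp_method conserve_order1 \
    --input_file ocean_temp_salt.res.nc --scalar_field temp
```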


### User Documentation
Documentation on using individual tools may be obtained by running the tool without
arguments or with the `-h` or `--help` options. Usually this prints a list of the
legal command-line arguments, definitions of those arguments, a summary of the tool,
and examples.
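
For example, either of the following should print a usage message (the exact output
varies from tool to tool):

```sh
fregrid --help    # usage, argument definitions, and examples
make_hgrid -h     # the short form works for most of the tools
```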

Many of the tools are commonly used in conjunction with other tools or as part of a
workflow. The directory FRE-NCtools/t has numerous test scripts that exercise
some possible workflows and can provide context for using the tools; the docs
directory contains a summary catalog of them. As an example, consider the script
for CI test #3 (file Test03-grid_coupled_model.sh): through a detailed example,
this script shows the order in which make_coupler_mosaic, make_solo_mosaic,
make_hgrid, make_vgrid, and make_topog are used to create the grids and mosaics
for a coupled model.
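
A condensed sketch of that kind of sequence is below. Every flag and value shown is
an illustrative assumption; the authoritative versions are in
Test03-grid_coupled_model.sh and each tool's `--help` output:

```sh
# Illustrative assumptions only -- real invocations carry more options.
make_hgrid          --grid_type gnomonic_ed --nlon 192 --grid_name C96_grid
make_vgrid          --nbnds 2 --bnds 0,1000 --nz 10 --grid_name ocean_vgrid
make_solo_mosaic    --num_tiles 6 --dir . --mosaic_name C96_mosaic
make_topog          --mosaic ocean_mosaic.nc --topog_type realistic
make_coupler_mosaic --atmos_mosaic C96_mosaic.nc --ocean_mosaic ocean_mosaic.nc \
                    --ocean_topog topog.nc --mosaic_name grid_spec
```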

Additional documentation can be found in the documentation directory
(`FRE-NCtools/docs`) and the
[FRE-NCTools wiki](https://github.com/NOAA-GFDL/FRE-NCtools/wiki/).

Binary file added docs/additional_nctools_testing.pdf
77 changes: 42 additions & 35 deletions docs/extreme_fregrid_sample_runscript.txt
@@ -1,56 +1,63 @@
# The NCTools app fregrid_parallel is the parallel version of fregrid, and
# it is particularly useful for processing large grids. fregrid will
# generate this message when the remapping weights file is not available and
# grids are too large for it to process:
#
# FATAL Error: The xgrid size is too large for resources.
# nxgrid is greater than MAXXGRID/nthreads; increase MAXXGRID,
# decrease nthreads, or increase number of MPI ranks.
#
# fregrid_parallel should be used instead if such an error is encountered. Like the
# serial version, it can be used to perform the remapping, but commonly it is used
# to perform only the compute-intensive operation: the generation of the remapping
# weights file. The weights file is saved and then used (and re-used) as an input
# file to the serial fregrid to quickly perform the remapping. However, even
# fregrid_parallel can generate the same error if run with insufficient
# computational resources. Below (after the dashed line) is an example runscript
# configured for running fregrid_parallel with a sufficiently large number of ranks
# and cores to avoid the fatal error for a common "extreme fregrid" case. This
# configuration runs in about 43 minutes on Gaea C5. The tail end of the run's
# output follows:
#
#> NOTE: done calculating index and weight for conservative interpolation
#> Memuse(MB) at After setup interp, min=4814.17, max=4881.2, avg=4834.05
#> Running time for get_input_grid, min=2.35723, max=4.71106, avg=4.4872
#> Running time for get_output_grid, min=0.000376, max=0.000725, avg=0.000484754
#> Running time for setup_interp, min=2517.68, max=2571.2, avg=2558.28
#> NOTE: Successfully running fregrid and the following files which
#> store weight information are generated.
#> ****lg_remap_C3072_11520x5760.nc
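#
# To try this example, save everything below the dashed line to a script
# (the file name extreme_fregrid.csh is just an illustration) and submit it
# with SLURM's sbatch:
#
#   sbatch extreme_fregrid.csh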

--------------------------------------------------------------------------------
#!/bin/csh -f
#SBATCH -J Run_Script
#SBATCH --nodes=41
#SBATCH --time 4:00:00
#SBATCH --cluster=c5
#SBATCH --partition=batch
#SBATCH --qos=normal
#SBATCH --account=gfdl_f

source $MODULESHOME/init/tcsh
module load fre/bronx-20

set echo=on

# Break up the run so the first MPI rank is on a node by itself, to eventually allow
# coalescing of the exchange-grid-based remap file onto the first rank for output.
# The remaining MPI ranks can share nodes, but may need to run on a reduced set of
# cores per node to allow for memory pressure among the worker nodes.

set nt1=1
set cpt1=64
set nt2=640
set cpt2=4
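
# Note (added for context): the srun invocation below is a single SLURM
# heterogeneous job step. The component before the ":" separator launches the
# lone rank-0 task with $cpt1 cpus; the component after it launches the $nt2
# worker ranks with $cpt2 cpus each.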

srun --ntasks=$nt1 --cpus-per-task=$cpt1 \
fregrid_parallel --input_mosaic C3072_mosaic.nc --nlon 11520 --nlat 5760 \
--remap_file lg_remap_C3072_11520x5760.nc --interp_method conserve_order1 --debug \
: \
--ntasks $nt2 --cpus-per-task=$cpt2 \
fregrid_parallel --input_mosaic C3072_mosaic.nc --nlon 11520 --nlat 5760 \
--remap_file lg_remap_C3072_11520x5760.nc --interp_method conserve_order1 --debug
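
# Once the weights file exists, applying it to data is comparatively cheap and
# can be done with the serial fregrid, e.g. (the input file and field name here
# are illustrative assumptions, not part of this runscript):
#
#   fregrid --input_mosaic C3072_mosaic.nc --nlon 11520 --nlat 5760 \
#     --remap_file lg_remap_C3072_11520x5760.nc --interp_method conserve_order1 \
#     --input_file ocean_temp_salt.res.nc --scalar_field temp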

Binary file added docs/tools_context_from_workflow_tests.pdf
4 changes: 3 additions & 1 deletion tools/libfrencutils/create_xgrid.c
@@ -807,7 +807,9 @@ nxgrid = 0;
if( xarea/min_area > AREA_RATIO_THRESH ) {
  pnxgrid[m]++;
  if(pnxgrid[m] >= MAXXGRID/nthreads)
    error_handler("The xgrid size is too large for resources.\n"
                  " nxgrid is greater than MAXXGRID/nthreads; increase MAXXGRID,\n"
                  " decrease nthreads, or increase number of MPI ranks.");
  nn = pstart[m] + pnxgrid[m]-1;

  pxgrid_area[nn] = xarea;
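
When this error is raised at run time, the practical remedies follow the message
itself: rerun with more MPI ranks or fewer OpenMP threads, or rebuild with a larger
MAXXGRID as a last resort. A minimal sketch of the first remedy, assuming the
SLURM/csh environment of the sample runscript and that nthreads is controlled by
OpenMP (the task count 1280 is an assumed value):

    # Spread the exchange grid across more ranks so that each rank's share
    # of nxgrid stays below MAXXGRID/nthreads.
    setenv OMP_NUM_THREADS 1
    srun --ntasks=1280 fregrid_parallel --input_mosaic C3072_mosaic.nc \
        --nlon 11520 --nlat 5760 --remap_file lg_remap_C3072_11520x5760.nc \
        --interp_method conserve_order1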