diff --git a/README.md b/README.md index cf26a607..727772b1 100644 --- a/README.md +++ b/README.md @@ -58,25 +58,30 @@ The tools available in FRE-NCtools are: ### Other Tools -The [Ocean Model Grid Generator](https://github.com/NOAA-GFDL/ocean_model_grid_generator) can be copied or cloned from its GFDL homepage. +Several tools have parallel versions that can overcome the memory and CPU constraints of their serial +counterparts. For example, fregrid_parallel reproduces the functionality of fregrid and, among other things, is +commonly used to generate the remapping weights for high-resolution grids (for further information, see the +"extreme fregrid" document). +The [Ocean Model Grid Generator](https://github.com/NOAA-GFDL/ocean_model_grid_generator) can be copied or +cloned from its GFDL homepage. ### User Documentation -Documentation on using individual tools may be obtained by running -the tool without arguments or with the `-h` or `--help` options. Generally -this provides a list of the legal command line arguments, and frequently with -examples and explanations. +Documentation on using individual tools may be obtained by running the tool without +arguments or with the `-h` or `--help` options. Usually this provides a list of the +legal command line arguments, definitions of the arguments, a summary of the tool, and +examples. Many of the tools are commonly used in conjunction with other tools or as part of a workflow. The directory FRE-NCtools/t has numerous test scripts that exercise -some possible workflows. As an example, consider the script for CI test #3 -(file Test03-grid_coupled_model.sh) for creating grids and mosaics for a coupled model. -As a second example, script Test20-fregrid.sh creates a target mosaic -(file latlon_grid.nc) and then uses fregrid to remap an existing file -(--input_file ocean_temp_salt.res.nc) of a known mosaic (CM2.1_mosaic.nc) -to the target mosaic. 
- -Additional documentation may be found in the documentation directory +some possible workflows and can provide context for using the tools, and +the docs directory contains a summary catalog of them. As an example, +consider the script for CI test #3 (file Test03-grid_coupled_model.sh): through a detailed +example, this script shows the order in which make_coupler_mosaic, make_solo_mosaic, +make_hgrid, make_vgrid and make_topog are used to create grids and mosaics for a coupled +model. + +Additional documentation can be found in the documentation directory ( FRE-NCtools/docs ) and the [FRE-NCTools wiki](https://github.com/NOAA-GFDL/FRE-NCtools/wiki/) diff --git a/docs/additional_nctools_testing.pdf b/docs/additional_nctools_testing.pdf new file mode 100644 index 00000000..5ec80439 Binary files /dev/null and b/docs/additional_nctools_testing.pdf differ diff --git a/docs/extreme_fregrid_sample_runscript.txt b/docs/extreme_fregrid_sample_runscript.txt index 3d6d04cd..a9c5b177 100644 --- a/docs/extreme_fregrid_sample_runscript.txt +++ b/docs/extreme_fregrid_sample_runscript.txt @@ -1,15 +1,45 @@ -Runscript +# The NCTools app fregrid_parallel is the parallel version of fregrid, and +# it is particularly useful for processing large grids. fregrid will +# generate this message when the remapping weights file is not available and +# the grids are too large for it to process: +# +# FATAL Error: The xgrid size is too large for resources. +# nxgrid is greater than MAXXGRID/nthreads; increase MAXXGRID, +# decrease nthreads, or increase number of MPI ranks. +# +# fregrid_parallel should be used instead if such an error is encountered. Like the +# serial version, it can perform the remapping itself, but commonly it is used +# to perform only the compute-intensive operation: generating the remapping +# weights file. The weights file is saved and then used (and reused) +# as an input file to the serial fregrid to quickly perform the remapping. 
However, +# even fregrid_parallel can generate the same error if run with insufficient +# computational resources. Below (after the dashed line) is an example runscript +# configured for running fregrid_parallel with a sufficiently large number of ranks +# and cores to avoid the fatal error for a common "extreme fregrid" case. This +# configuration runs in about 43 minutes on Gaea C5. The tail end of the run's output +# follows: +# +#> NOTE: done calculating index and weight for conservative interpolation +#> Memuse(MB) at After setup interp, min=4814.17, max=4881.2, avg=4834.05 +#> Running time for get_input_grid, min=2.35723, max=4.71106, avg=4.4872 +#> Running time for get_output_grid, min=0.000376, max=0.000725, avg=0.000484754 +#> Running time for setup_interp, min=2517.68, max=2571.2, avg=2558.28 +#> NOTE: Successfully running fregrid and the following files which +#> store weight information are generated. +#> ****lg_remap_C3072_11520x5760.nc + +-------------------------------------------------------------------------------- #!/bin/csh -f #SBATCH -J Run_Script -#SBATCH --nodes=61 -#SBATCH --time 8:00:00 -#SBATCH --cluster=c4 +#SBATCH --nodes=41 +#SBATCH --time 4:00:00 +#SBATCH --cluster=c5 #SBATCH --partition=batch #SBATCH --qos=normal #SBATCH --account=gfdl_f -source /opt/modules/default/init/tcsh -module load fre/bronx-19 +source $MODULESHOME/init/tcsh +module load fre/bronx-20 set echo=on @@ -17,40 +47,17 @@ set echo=on # coalescing of the exchange grid-based remap file to the first rank for output # The remaining mpi-ranks can share nodes and may need to be run on a reduced set # to allow for memory pressure amongst the worker nodes + set nt1=1 -set cpt1=36 -set nt2=540 +set cpt1=64 +set nt2=640 set cpt2=4 -srun-multi --ntasks=$nt1 --cpus-per-task=$cpt1 \ - fregrid_parallel --input_mosaic C3072_mosaic.nc --nlon 11520 --nlat 5760 \' +srun --ntasks=$nt1 --cpus-per-task=$cpt1 \ + fregrid_parallel --input_mosaic C3072_mosaic.nc --nlon 11520 --nlat 5760 \ 
--remap_file lg_remap_C3072_11520x5760.nc --interp_method conserve_order1 --debug \ : \ --ntasks $nt2 --cpus-per-task=$cpt2 \ fregrid_parallel --input_mosaic C3072_mosaic.nc --nlon 11520 --nlat 5760 \ - --remap_file lg_remap_C3072_11520x5760.nc --interp_method  conserve_order1 --debug - -Script output from gaea: -set nt1=1 -set cpt1=36 -set nt2=540 -set cpt2=4 -srun-multi --ntasks=1 --cpus-per-task=36 fregrid_parallel --input_mosaic C3072_mosaic.nc --nlon 11520 --nlat 5760 --remap_file lg_remap_C3072_11520x5760.nc --interp_method conserve_order1 --debug : --ntasks 540 --cpus-per-task=4 fregrid_parallel --input_mosaic C3072_mosaic.nc --nlon 11520 --nlat 5760 --remap_file lg_remap_C3072_11520x5760.nc --interp_method conserve_order1 --debug -****fregrid: first order conservative scheme will be used for regridding. -NOTE: No input file specified in this run, no data file will be regridded and only weight information is calculated. -Memuse(MB) at Before calling get_input_grid, min=17.9961, max=20.2305, avg=18.6771 -Memuse(MB) at After calling get_input_grid, min=1461.04, max=1462.97, avg=1461.62 -Memuse(MB) at After calling get_output_grid, min=1461.04, max=1462.97, avg=1461.62 -Memuse(MB) at After get_input_output_cell_area, min=1463.56, max=1465.75, avg=1464.28 -NOTE: done calculating index and weight for conservative interpolation -Memuse(MB) at After setup interp, min=4677.54, max=4714.79, avg=4688.48   -Running time for get_input_grid, min=3.75677, max=4.31201, avg=4.07478 -Running time for get_output_grid, min=0.000969, max=0.001346, avg=0.00117746 -Running time for setup_interp, min=13238.7, max=13244.7, avg=13243 -NOTE: Successfully running fregrid and the following files which store weight information are generated. -****lg_remap_C3072_11520x5760.nc - -Key things to note -fregrid_parallel indicates a memory use of 4.7GB and requires a runtime of 3.7 hours.  
This suggests we could change cpt2 from 4 to 3 and cut overall node usage from 61 to 46 with the same elapsed time.  Regardless, this can be used as a guideline for guesstimating resource requirements as we continue to get these extreme remapping requests.  We have to see how much memory actually applying the remap file to data will consume.  - + --remap_file lg_remap_C3072_11520x5760.nc --interp_method conserve_order1 --debug diff --git a/docs/remapping_algorithm_cell_methods_measures.pdf b/docs/remapping_algorithm_cell_methods_measures.pdf new file mode 100644 index 00000000..dac9c978 Binary files /dev/null and b/docs/remapping_algorithm_cell_methods_measures.pdf differ diff --git a/docs/tools_context_from_workflow_tests.pdf b/docs/tools_context_from_workflow_tests.pdf new file mode 100644 index 00000000..5a55717b Binary files /dev/null and b/docs/tools_context_from_workflow_tests.pdf differ diff --git a/tools/libfrencutils/create_xgrid.c b/tools/libfrencutils/create_xgrid.c index 69b6acfe..6fe24453 100644 --- a/tools/libfrencutils/create_xgrid.c +++ b/tools/libfrencutils/create_xgrid.c @@ -807,7 +807,9 @@ nxgrid = 0; if( xarea/min_area > AREA_RATIO_THRESH ) { pnxgrid[m]++; if(pnxgrid[m]>= MAXXGRID/nthreads) - error_handler("nxgrid is greater than MAXXGRID/nthreads, increase MAXXGRID, decrease nthreads, or increase number of MPI ranks"); + error_handler("The xgrid size is too large for resources.\n" + " nxgrid is greater than MAXXGRID/nthreads; increase MAXXGRID,\n" + " decrease nthreads, or increase number of MPI ranks."); nn = pstart[m] + pnxgrid[m]-1; pxgrid_area[nn] = xarea;
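
The `--nodes` values in the runscript hunk appear to follow from simple arithmetic on the two srun components: total CPUs requested (`nt1*cpt1 + nt2*cpt2`) divided by the CPUs available per node, rounded up. The sketch below is illustrative only. `nodes_needed` is a hypothetical helper, not part of FRE-NCtools, and the per-node CPU counts it is meant to be called with (36 for the old c4 settings, 64 for the new c5 settings) are assumptions inferred from the numbers in the script rather than taken from system documentation.

```c
#include <assert.h>

/* Hypothetical helper (not part of FRE-NCtools): estimate the number of
 * whole nodes needed for a two-component srun launch.
 *   nt1, cpt1     - tasks and CPUs per task of the first component
 *   nt2, cpt2     - tasks and CPUs per task of the second component
 *   cpus_per_node - CPUs available per node (an assumption supplied by
 *                   the caller; check the target system before relying on it) */
int nodes_needed(int nt1, int cpt1, int nt2, int cpt2, int cpus_per_node)
{
    assert(cpus_per_node > 0);
    int total_cpus = nt1 * cpt1 + nt2 * cpt2;
    /* Ceiling division: a partially filled node still counts as a node. */
    return (total_cpus + cpus_per_node - 1) / cpus_per_node;
}
```

Under those assumptions, `nodes_needed(1, 36, 540, 4, 36)` gives the 61 nodes of the removed c4 configuration, `nodes_needed(1, 36, 540, 3, 36)` gives the 46 nodes mentioned in the removed notes for `cpt2=3`, and `nodes_needed(1, 64, 640, 4, 64)` gives the 41 nodes of the new c5 configuration.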