Add support for forecast-only runs on AWS #2711

Merged
merged 71 commits into from
Aug 13, 2024
Changes from 8 commits
Commits
103f2c4
compiled OK now
weihuang-jedi Jun 18, 2024
916ff6c
Merge branch 'NOAA-EMC:develop' into aws-forecast-only
weihuang-jedi Jun 19, 2024
b0ac406
re-test on aws with fewer changes
weihuang-jedi Jun 19, 2024
3de972f
make change in tasks.py to avoid error finding libiomp5.so problem
weihuang-jedi Jun 21, 2024
8308375
Merge branch 'NOAA-EMC:develop' into aws-forecast-only
weihuang-jedi Jun 21, 2024
bc4c4a8
add comments so the reviewers know that these changes are for AWS, an…
weihuang-jedi Jun 22, 2024
924aede
Merge branch 'aws-forecast-only' of ssh://github.com/NOAA-EPIC/global…
weihuang-jedi Jun 22, 2024
b724937
add comments so the reviewers know that these changes are for AWS, an…
weihuang-jedi Jun 22, 2024
12ab29f
reverse config.resource changes, and memory restriction on AWS
weihuang-jedi Jun 25, 2024
adff250
sync with emc repo
weihuang-jedi Jun 25, 2024
2290ea2
move common data to a shared place
weihuang-jedi Jun 26, 2024
cd2c8e7
use ICs from s3-bucket
weihuang-jedi Jun 26, 2024
4e144e5
Merge branch 'develop' into aws-forecast-only
weihuang-jedi Jun 26, 2024
46e3ef5
change as suggested by reviewer
weihuang-jedi Jul 2, 2024
32f13eb
sync with develop
weihuang-jedi Jul 2, 2024
a34a4c8
sync sorc/ufs_model.fd
weihuang-jedi Jul 4, 2024
44011a3
remove mpmd_opt from APRUN_UFS
weihuang-jedi Jul 4, 2024
965ec80
mpmd_opt and switch off tracker/genesis default for AWS
weihuang-jedi Jul 5, 2024
3ce268e
add TODO
weihuang-jedi Jul 5, 2024
f03ac78
remove ncl version on AWS
weihuang-jedi Jul 6, 2024
007a56b
Merge remote-tracking branch 'origin/develop' into aws-forecast-only
weihuang-jedi Jul 6, 2024
2f6ec6e
sync ufs_model
weihuang-jedi Jul 6, 2024
dba83a7
Merge branch 'NOAA-EMC:develop' into aws-forecast-only
weihuang-jedi Jul 10, 2024
24fe804
Merge branch 'NOAA-EMC:develop' into aws-forecast-only
weihuang-jedi Jul 12, 2024
e8a2e0f
sync and remove gempak from noaacloud
weihuang-jedi Jul 12, 2024
4013eb1
Merge branch 'NOAA-EMC:develop' into aws-forecast-only
weihuang-jedi Jul 15, 2024
a548c7f
update modules hash
weihuang-jedi Jul 15, 2024
d37e646
update module hash
weihuang-jedi Jul 15, 2024
2a80162
use bucket
weihuang-jedi Jul 17, 2024
fa44862
remove /scratch1, but kept TODO
weihuang-jedi Jul 17, 2024
55c7e7e
Merge branch 'NOAA-EMC:develop' into aws-forecast-only
weihuang-jedi Jul 17, 2024
07851dc
re-sync
weihuang-jedi Jul 19, 2024
492808d
sync
weihuang-jedi Jul 19, 2024
d7a262e
add is_exclusive to resource.AWSPW
weihuang-jedi Jul 23, 2024
af573af
sync hash with EMC repo
weihuang-jedi Jul 23, 2024
0929180
remove --export=ALL from native, when is_exclusive set true
weihuang-jedi Jul 23, 2024
06fecca
sync
weihuang-jedi Jul 23, 2024
d8783ab
Merge branch 'NOAA-EMC:develop' into aws-forecast-only
weihuang-jedi Jul 25, 2024
d22bc6d
Merge remote-tracking branch 'origin/develop' into aws-forecast-only
weihuang-jedi Jul 25, 2024
a5c441f
Merge branch 'aws-forecast-only' of ssh://github.com/NOAA-EPIC/global…
weihuang-jedi Jul 25, 2024
77e8233
Make AWS works similar to on-prem machine
weihuang-jedi Jul 25, 2024
96f73ba
remove --export=ALL from 'native'
weihuang-jedi Jul 25, 2024
a33a3be
remove --export=ALL from 'native'
weihuang-jedi Jul 25, 2024
80b294b
Merge branch 'NOAA-EMC:develop' into aws-forecast-only
weihuang-jedi Jul 25, 2024
01a8928
add py-f90nml to noaacloud modulefile
weihuang-jedi Jul 25, 2024
b035947
remove un-necessary added lines
weihuang-jedi Jul 25, 2024
bf3b460
remove un-necessary added lines
weihuang-jedi Jul 25, 2024
47627ff
remove added lines which was originally for AWS, but should be define…
weihuang-jedi Jul 26, 2024
7bf8900
restore as develop
weihuang-jedi Jul 26, 2024
0685a8f
try to fix pynorms error
weihuang-jedi Jul 29, 2024
381403d
Merge branch 'NOAA-EMC:develop' into aws-forecast-only
weihuang-jedi Jul 29, 2024
0e71f7d
Merge branch 'aws-forecast-only' of ssh://github.com/NOAA-EPIC/global…
weihuang-jedi Jul 29, 2024
2024835
Merge branch 'NOAA-EMC:develop' into aws-forecast-only
weihuang-jedi Jul 30, 2024
2c52016
sync with EMC repo
weihuang-jedi Jul 30, 2024
cd6c541
sync Gaea link with EMC repo, and only include blocks/packs that run …
weihuang-jedi Jul 30, 2024
1f60ed0
Merge branch 'aws-forecast-only' of github.com:NOAA-EPIC/global-workf…
weihuang-jedi Jul 30, 2024
e1a57b4
merge fro develop
weihuang-jedi Jul 30, 2024
fe9a457
Remove ACCOUNT_SERVICE
weihuang-jedi Jul 31, 2024
5c6e052
Merge branch 'NOAA-EMC:develop' into aws-forecast-only
weihuang-jedi Aug 1, 2024
93b1e66
Merge branch 'NOAA-EMC:develop' into aws-forecast-only
weihuang-jedi Aug 2, 2024
f900893
correct pynorms error
weihuang-jedi Aug 2, 2024
f599cd7
Merge branch 'NOAA-EMC:develop' into aws-forecast-only
weihuang-jedi Aug 5, 2024
1ee5492
Update workflow/rocoto/workflow_xml.py
weihuang-jedi Aug 6, 2024
6d6231a
fix pynorms issues
weihuang-jedi Aug 6, 2024
eb262be
fix pynorms issues
weihuang-jedi Aug 6, 2024
0db930d
only one pycodestyle error left now
weihuang-jedi Aug 6, 2024
06093af
pycodestype passed without any error
weihuang-jedi Aug 6, 2024
6fff724
Merge branch 'NOAA-EMC:develop' into aws-forecast-only
weihuang-jedi Aug 6, 2024
f23d2d0
Merge branch 'NOAA-EMC:develop' into aws-forecast-only
weihuang-jedi Aug 7, 2024
bd1c954
Merge branch 'NOAA-EMC:develop' into aws-forecast-only
weihuang-jedi Aug 8, 2024
d12d9e9
Merge branch 'NOAA-EMC:develop' into aws-forecast-only
weihuang-jedi Aug 12, 2024
6 changes: 3 additions & 3 deletions env/AWSPW.env
@@ -14,8 +14,8 @@ fi

step=$1

export launcher="mpiexec.hydra"
export mpmd_opt=""
export launcher="srun --mpi=pmi2 -l"
export mpmd_opt="--distribution=block:block --hint=nomultithread --cpus-per-task=1"

# Configure MPI environment
export OMP_STACKSIZE=2048000
@@ -36,7 +36,7 @@ if [[ "${step}" = "fcst" ]] || [[ "${step}" = "efcs" ]]; then
(( nnodes = (${!nprocs}+${!ppn}-1)/${!ppn} ))
(( ntasks = nnodes*${!ppn} ))
# With ESMF threading, the model wants to use the full node
export APRUN_UFS="${launcher} -n ${ntasks}"
export APRUN_UFS="${launcher} -n ${ntasks} ${mpmd_opt}"
unset nprocs ppn nnodes ntasks

elif [[ "${step}" = "post" ]]; then
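The node arithmetic in the `fcst`/`efcs` branch rounds the requested task count up to whole nodes before building `APRUN_UFS`. A minimal Python sketch of the same calculation follows; the function names `full_node_tasks` and `aprun_ufs` are hypothetical and exist only for illustration, they are not part of the workflow.

```python
from typing import Tuple


def full_node_tasks(nprocs: int, ppn: int) -> Tuple[int, int]:
    """Round a task count up to whole nodes (hypothetical helper).

    Mirrors the shell arithmetic:
        nnodes = (nprocs + ppn - 1) / ppn
        ntasks = nnodes * ppn
    """
    if nprocs <= 0 or ppn <= 0:
        raise ValueError("nprocs and ppn must be positive")
    nnodes = (nprocs + ppn - 1) // ppn  # integer ceiling division
    ntasks = nnodes * ppn               # fill every node completely
    return nnodes, ntasks


def aprun_ufs(launcher: str, nprocs: int, ppn: int, mpmd_opt: str = "") -> str:
    """Assemble the launch command the way the updated APRUN_UFS line does."""
    _, ntasks = full_node_tasks(nprocs, ppn)
    parts = (launcher, f"-n {ntasks}", mpmd_opt)
    # Skip empty pieces so an unset mpmd_opt leaves no trailing space.
    return " ".join(part for part in parts if part)
```

With the new `npe_node_max=36`, a request for 100 tasks would occupy 3 nodes and launch 108 tasks, which is consistent with the comment that the model wants to use the full node under ESMF threading.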
51 changes: 51 additions & 0 deletions modulefiles/module_base.noaacloud.lua
@@ -0,0 +1,51 @@
help([[
Load environment to run GFS on noaacloud
]])

local spack_mod_path=(os.getenv("spack_mod_path") or "None")
prepend_path("MODULEPATH", spack_mod_path)

load(pathJoin("stack-intel", (os.getenv("stack_intel_ver") or "None")))
load(pathJoin("stack-intel-oneapi-mpi", (os.getenv("stack_impi_ver") or "None")))
load(pathJoin("python", (os.getenv("python_ver") or "None")))

--load(pathJoin("hpss", (os.getenv("hpss_ver") or "None")))
load(pathJoin("gempak", (os.getenv("gempak_ver") or "None")))
load(pathJoin("ncl", (os.getenv("ncl_ver") or "None")))
load(pathJoin("jasper", (os.getenv("jasper_ver") or "None")))
load(pathJoin("libpng", (os.getenv("libpng_ver") or "None")))
load(pathJoin("cdo", (os.getenv("cdo_ver") or "None")))
--load(pathJoin("R", (os.getenv("R_ver") or "None")))

load(pathJoin("hdf5", (os.getenv("hdf5_ver") or "None")))
load(pathJoin("netcdf-c", (os.getenv("netcdf_c_ver") or "None")))
load(pathJoin("netcdf-fortran", (os.getenv("netcdf_fortran_ver") or "None")))

load(pathJoin("nco", (os.getenv("nco_ver") or "None")))
load(pathJoin("prod_util", (os.getenv("prod_util_ver") or "None")))
load(pathJoin("grib-util", (os.getenv("grib_util_ver") or "None")))
load(pathJoin("g2tmpl", (os.getenv("g2tmpl_ver") or "None")))
load(pathJoin("gsi-ncdiag", (os.getenv("gsi_ncdiag_ver") or "None")))
load(pathJoin("crtm", (os.getenv("crtm_ver") or "None")))
load(pathJoin("bufr", (os.getenv("bufr_ver") or "None")))
load(pathJoin("wgrib2", (os.getenv("wgrib2_ver") or "None")))
load(pathJoin("py-netcdf4", (os.getenv("py_netcdf4_ver") or "None")))
load(pathJoin("py-pyyaml", (os.getenv("py_pyyaml_ver") or "None")))
load(pathJoin("py-jinja2", (os.getenv("py_jinja2_ver") or "None")))
load(pathJoin("py-pandas", (os.getenv("py_pandas_ver") or "None")))
load(pathJoin("py-python-dateutil", (os.getenv("py_python_dateutil_ver") or "None")))
--load(pathJoin("met", (os.getenv("met_ver") or "None")))
--load(pathJoin("metplus", (os.getenv("metplus_ver") or "None")))
load(pathJoin("py-xarray", (os.getenv("py_xarray_ver") or "None")))

setenv("WGRIB2","wgrib2")
setenv("UTILROOT",(os.getenv("prod_util_ROOT") or "None"))

--prepend_path("MODULEPATH", pathJoin("/scratch1/NCEPDEV/global/glopara/git/prepobs/v" .. (os.getenv("prepobs_run_ver") or "None"), "modulefiles"))
--prepend_path("MODULEPATH", pathJoin("/scratch1/NCEPDEV/global/glopara/git/prepobs/feature-GFSv17_com_reorg_log_update/modulefiles"))
--load(pathJoin("prepobs", (os.getenv("prepobs_run_ver") or "None")))

--prepend_path("MODULEPATH", pathJoin("/scratch1/NCEPDEV/global/glopara/git/Fit2Obs/v" .. (os.getenv("fit2obs_ver") or "None"), "modulefiles"))
--load(pathJoin("fit2obs", (os.getenv("fit2obs_ver") or "None")))

whatis("Description: GFS run environment")
15 changes: 15 additions & 0 deletions modulefiles/module_gwci.noaacloud.lua
@@ -0,0 +1,15 @@
help([[
Load environment to run GFS workflow setup scripts on noaacloud
]])

prepend_path("MODULEPATH", "/contrib/spack-stack/spack-stack-1.6.0/envs/unified-env/install/modulefiles/Core")

load(pathJoin("stack-intel", "2021.3.0"))
load(pathJoin("stack-intel-oneapi-mpi", "2021.3.0"))

load(pathJoin("netcdf-c", "4.9.2"))
load(pathJoin("netcdf-fortran", "4.6.1"))
load(pathJoin("nccmp","1.9.0.1"))
load(pathJoin("wgrib2", "2.0.8"))

whatis("Description: GFS run setup CI environment")
20 changes: 20 additions & 0 deletions modulefiles/module_gwsetup.noaacloud.lua
@@ -0,0 +1,20 @@
help([[
Load environment to run GFS workflow setup scripts on noaacloud
]])

load(pathJoin("rocoto"))

prepend_path("MODULEPATH", "/contrib/spack-stack/spack-stack-1.6.0/envs/unified-env/install/modulefiles/Core")

local stack_intel_ver=os.getenv("stack_intel_ver") or "2021.3.0"
local python_ver=os.getenv("python_ver") or "3.10.3"

load(pathJoin("stack-intel", stack_intel_ver))
load(pathJoin("python", python_ver))
load("py-jinja2")
load("py-pyyaml")
load("py-numpy")
local git_ver=os.getenv("git_ver") or "1.8.3.1"
load(pathJoin("git", git_ver))

whatis("Description: GFS run setup environment")
4 changes: 2 additions & 2 deletions parm/config/gfs/config.resources
@@ -59,7 +59,7 @@ case ${machine} in
;;
"AWSPW")
export PARTITION_BATCH="compute"
npe_node_max=40
npe_node_max=36
;;
"CONTAINER")
npe_node_max=1
@@ -812,7 +812,7 @@ case ${step} in
;;

"atmos_products")
export wtime_atmos_products="00:15:00"
export wtime_atmos_products="00:45:00"
export npe_atmos_products=24
export nth_atmos_products=1
export npe_node_atmos_products="${npe_atmos_products}"
2 changes: 1 addition & 1 deletion sorc/build_all.sh
@@ -145,7 +145,7 @@ build_opts["ww3prepost"]="${_wave_opt} ${_verbose_opt} ${_build_ufs_opt} ${_buil

# Optional DA builds
if [[ "${_build_ufsda}" == "YES" ]]; then
if [[ "${MACHINE_ID}" != "orion" && "${MACHINE_ID}" != "hera" && "${MACHINE_ID}" != "hercules" && "${MACHINE_ID}" != "wcoss2" ]]; then
if [[ "${MACHINE_ID}" != "orion" && "${MACHINE_ID}" != "hera" && "${MACHINE_ID}" != "hercules" && "${MACHINE_ID}" != "wcoss2" && "${MACHINE_ID}" != "noaacloud" ]]; then
echo "NOTE: The GDAS App is not supported on ${MACHINE_ID}. Disabling build."
else
build_jobs["gdas"]=8
29 changes: 4 additions & 25 deletions sorc/build_ufs.sh
@@ -41,30 +41,9 @@ COMPILE_NR=0
CLEAN_BEFORE=YES
CLEAN_AFTER=NO

if [[ "${MACHINE_ID}" != "noaacloud" ]]; then
BUILD_JOBS=${BUILD_JOBS:-8} ./tests/compile.sh "${MACHINE_ID}" "${MAKE_OPT}" "${COMPILE_NR}" "intel" "${CLEAN_BEFORE}" "${CLEAN_AFTER}"
mv "./tests/fv3_${COMPILE_NR}.exe" ./tests/ufs_model.x
mv "./tests/modules.fv3_${COMPILE_NR}.lua" ./tests/modules.ufs_model.lua
cp "./modulefiles/ufs_common.lua" ./tests/ufs_common.lua
else

if [[ "${PW_CSP:-}" == "aws" ]]; then
set +x
# TODO: This will need to be addressed further when the EPIC stacks are available/supported.
module use /contrib/spack-stack/envs/ufswm/install/modulefiles/Core
module load stack-intel
module load stack-intel-oneapi-mpi
module load ufs-weather-model-env/1.0.0
# TODO: It is still uncertain why this is the only module that is
# missing; check the spack build as this needed to be added manually.
module load w3emc/2.9.2 # TODO: This has similar issues for the EPIC stack.
module list
set -x
fi

export CMAKE_FLAGS="${MAKE_OPT}"
BUILD_JOBS=${BUILD_JOBS:-8} ./build.sh
mv "${cwd}/ufs_model.fd/build/ufs_model" "${cwd}/ufs_model.fd/tests/ufs_model.x"
fi
BUILD_JOBS=${BUILD_JOBS:-8} ./tests/compile.sh "${MACHINE_ID}" "${MAKE_OPT}" "${COMPILE_NR}" "intel" "${CLEAN_BEFORE}" "${CLEAN_AFTER}"
mv "./tests/fv3_${COMPILE_NR}.exe" ./tests/ufs_model.x
mv "./tests/modules.fv3_${COMPILE_NR}.lua" ./tests/modules.ufs_model.lua
cp "./modulefiles/ufs_common.lua" ./tests/ufs_common.lua

exit 0
1 change: 1 addition & 0 deletions sorc/link_workflow.sh
@@ -76,6 +76,7 @@ case "${machine}" in
"jet") FIX_DIR="/lfs4/HFIP/hfv3gfs/glopara/git/fv3gfs/fix" ;;
"s4") FIX_DIR="/data/prod/glopara/fix" ;;
"gaea") FIX_DIR="/gpfs/f5/epic/proj-shared/global/glopara/data/fix" ;;
"noaacloud") FIX_DIR="/contrib/Wei.Huang/data/hack-orion/fix" ;;
*)
echo "FATAL: Unknown target machine ${machine}, couldn't set FIX_DIR"
exit 1
26 changes: 15 additions & 11 deletions ush/forecast_postdet.sh
@@ -268,20 +268,24 @@
fi
fi

# Get list of FV3 restart files
local file_list fv3_file
file_list=$(FV3_restarts)
### Check that there are restart files to copy
if [[ ${#restart_dates[@]} -gt 0 ]]; then
# Get list of FV3 restart files
local file_list fv3_file
file_list=$(FV3_restarts)

# Copy restarts for the dates collected above to COM
for restart_date in "${restart_dates[@]}"; do
echo "Copying FV3 restarts for 'RUN=${RUN}' at ${restart_date}"
for fv3_file in ${file_list}; do
${NCP} "${DATArestart}/FV3_RESTART/${restart_date}.${fv3_file}" \
"${COMOUT_ATMOS_RESTART}/${restart_date}.${fv3_file}"
# Copy restarts for the dates collected above to COM
for restart_date in "${restart_dates[@]}"; do
echo "Copying FV3 restarts for 'RUN=${RUN}' at ${restart_date}"
for fv3_file in ${file_list}; do
${NCP} "${DATArestart}/FV3_RESTART/${restart_date}.${fv3_file}" \
"${COMOUT_ATMOS_RESTART}/${restart_date}.${fv3_file}"
done
done
done

echo "SUB ${FUNCNAME[0]}: Output data for FV3 copied"
echo "SUB ${FUNCNAME[0]}: Output data for FV3 copied"
fi
}

# Disable variable not used warnings
2 changes: 1 addition & 1 deletion ush/load_fv3gfs_modules.sh
@@ -20,7 +20,7 @@ source "${HOMEgfs}/versions/run.ver"
module use "${HOMEgfs}/modulefiles"

case "${MACHINE_ID}" in
"wcoss2" | "hera" | "orion" | "hercules" | "gaea" | "jet" | "s4")
"wcoss2" | "hera" | "orion" | "hercules" | "gaea" | "jet" | "s4" | "noaacloud")
module load "module_base.${MACHINE_ID}"
;;
*)
6 changes: 2 additions & 4 deletions ush/module-setup.sh
@@ -92,10 +92,8 @@ elif [[ ${MACHINE_ID} = discover* ]]; then
# TODO: This can likely be made more general once other cloud
# platforms come online.
elif [[ ${MACHINE_ID} = "noaacloud" ]]; then

export SPACK_ROOT=/contrib/global-workflow/spack-stack/spack
export PATH=${PATH}:${SPACK_ROOT}/bin
. "${SPACK_ROOT}"/share/spack/setup-env.sh
# We are on NOAA Cloud
module purge

else
echo WARNING: UNKNOWN PLATFORM 1>&2
5 changes: 5 additions & 0 deletions versions/build.noaacloud.ver
@@ -0,0 +1,5 @@
export stack_intel_ver=2021.3.0
export stack_impi_ver=2021.3.0
export spack_env=gsi-addon-env
source "${HOMEgfs:-}/versions/build.spack.ver"
export spack_mod_path="/contrib/spack-stack/spack-stack-${spack_stack_ver}/envs/gsi-addon-env/install/modulefiles/Core"
11 changes: 11 additions & 0 deletions versions/run.noaacloud.ver
@@ -0,0 +1,11 @@
export stack_intel_ver=2021.3.0
export stack_impi_ver=2021.3.0
export spack_env=gsi-addon-env

export gempak_ver=7.4.2

source "${HOMEgfs:-}/versions/run.spack.ver"
export spack_mod_path="/contrib/spack-stack/spack-stack-${spack_stack_ver}/envs/gsi-addon-env/install/modulefiles/Core"

export ncl_ver=6.6.2
export cdo_ver=2.2.0
2 changes: 1 addition & 1 deletion workflow/hosts.py
@@ -54,7 +54,7 @@ def detect(cls):
elif container is not None:
machine = 'CONTAINER'
elif pw_csp is not None:
if pw_csp.lower() not in ['azure', 'aws', 'gcp']:
if pw_csp.lower() not in ['azure', 'aws', 'google']:
raise ValueError(
f'NOAA cloud service provider "{pw_csp}" is not supported.')
machine = f"{pw_csp.upper()}PW"
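The `detect` hunk above validates `PW_CSP` against the supported providers and derives the machine name from it. The same mapping can be sketched as a standalone function; `machine_from_csp` and `SUPPORTED_CSPS` are hypothetical names used only for this illustration and do not appear in `workflow/hosts.py`.

```python
from typing import Optional

# Provider names accepted after this change ('google' replaces 'gcp').
SUPPORTED_CSPS = ("azure", "aws", "google")


def machine_from_csp(pw_csp: Optional[str]) -> Optional[str]:
    """Map a PW_CSP value to a machine name such as 'AWSPW' (illustrative only)."""
    if pw_csp is None:
        # Not running on a cloud service provider.
        return None
    if pw_csp.lower() not in SUPPORTED_CSPS:
        raise ValueError(
            f'NOAA cloud service provider "{pw_csp}" is not supported.')
    return f"{pw_csp.upper()}PW"
```

Under this sketch, `machine_from_csp("aws")` gives `"AWSPW"`, while the old spelling `"gcp"` is rejected with a `ValueError`.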
11 changes: 6 additions & 5 deletions workflow/hosts/awspw.yaml
@@ -3,11 +3,11 @@ DMPDIR: '/scratch1/NCEPDEV/global/glopara/dump' # TODO: This does not yet exist.
PACKAGEROOT: '/scratch1/NCEPDEV/global/glopara/nwpara' #TODO: This does not yet exist.
COMINsyn: '/scratch1/NCEPDEV/global/glopara/com/gfs/prod/syndat' #TODO: This does not yet exist.
HOMEDIR: '/contrib/${USER}'
STMP: '/lustre/${USER}/stmp2/'
PTMP: '/lustre/${USER}/stmp4/'
STMP: '/lustre/${USER}/stmp/'
PTMP: '/lustre/${USER}/ptmp/'
NOSCRUB: ${HOMEDIR}
ACCOUNT: hwufscpldcld
ACCOUNT_SERVICE: hwufscpldcld
ACCOUNT: ${USER}
ACCOUNT_SERVICE: ${USER}
SCHEDULER: slurm
QUEUE: batch
QUEUE_SERVICE: batch
@@ -16,8 +16,9 @@ PARTITION_SERVICE: compute
RESERVATION: ''
CHGRP_RSTPROD: 'YES'
CHGRP_CMD: 'chgrp rstprod' # TODO: This is not yet supported.
HPSSARCH: 'YES'
HPSSARCH: 'NO'
HPSS_PROJECT: emc-global #TODO: See `ATARDIR` below.
BASE_CPLIC: '/contrib/Wei.Huang/data/ICDIRS/prototype_ICs'
LOCALARCH: 'NO'
ATARDIR: '/NCEPDEV/${HPSS_PROJECT}/1year/${USER}/${machine}/scratch/${PSLOT}' # TODO: This will not yet work from AWS.
MAKE_NSSTBUFR: 'NO'
6 changes: 5 additions & 1 deletion workflow/rocoto/gefs_tasks.py
@@ -2,7 +2,7 @@
from rocoto.tasks import Tasks
import rocoto.rocoto as rocoto
from datetime import datetime, timedelta

import os

class GEFSTasks(Tasks):

@@ -11,6 +11,10 @@ def __init__(self, app_config: AppConfig, cdump: str) -> None:

def stage_ic(self):
cpl_ic = self._configs['stage_ic']
# The block below was added for AWS.
# Once 'BASE_CPLIC' is defined properly in the configuration, this block can be removed.
if 'BASE_CPLIC' not in cpl_ic:
cpl_ic['BASE_CPLIC'] = os.environ.get('BASE_CPLIC', '/contrib/Wei.Huang/data/ICDIRS/prototype_ICs')
deps = []
dtg_prefix = "@Y@m@d.@H0000"
offset = str(self._configs['base']['OFFSET_START_HOUR']).zfill(2) + ":00:00"
6 changes: 5 additions & 1 deletion workflow/rocoto/gfs_tasks.py
@@ -3,7 +3,7 @@
from wxflow import timedelta_to_HMS
import rocoto.rocoto as rocoto
import numpy as np

import os

class GFSTasks(Tasks):

@@ -24,6 +24,10 @@ def stage_ic(self):

# Atm ICs
if self.app_config.do_atm:
# The block below was added for AWS.
# Once 'BASE_CPLIC' is defined properly in the configuration, this block can be removed.
if 'BASE_CPLIC' not in cpl_ic:
cpl_ic['BASE_CPLIC'] = os.environ.get('BASE_CPLIC', '/contrib/Wei.Huang/data/ICDIRS/prototype_ICs')
prefix = f"{cpl_ic['BASE_CPLIC']}/{cpl_ic['CPL_ATMIC']}/@Y@m@d@H/atmos"
for file in ['gfs_ctrl.nc'] + \
[f'{datatype}_data.tile{tile}.nc'
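Both `stage_ic` hunks resolve `BASE_CPLIC` in the same order: the value already in the `stage_ic` config wins, then the `BASE_CPLIC` environment variable, then the hard-coded prototype path. Since the logic is duplicated in `gefs_tasks.py` and `gfs_tasks.py`, one option would be a shared helper along these lines. This is a sketch only; `resolve_base_cplic` and its `environ` parameter are hypothetical and not part of the PR.

```python
import os
from typing import Mapping, MutableMapping, Optional

# Fallback path taken from the diff above.
DEFAULT_BASE_CPLIC = "/contrib/Wei.Huang/data/ICDIRS/prototype_ICs"


def resolve_base_cplic(cpl_ic: MutableMapping[str, str],
                       environ: Optional[Mapping[str, str]] = None) -> str:
    """Fill in cpl_ic['BASE_CPLIC'] if it is missing and return it.

    Precedence: existing config value, then the BASE_CPLIC environment
    variable, then DEFAULT_BASE_CPLIC. `environ` is injectable for testing.
    """
    env = os.environ if environ is None else environ
    if "BASE_CPLIC" not in cpl_ic:
        cpl_ic["BASE_CPLIC"] = env.get("BASE_CPLIC", DEFAULT_BASE_CPLIC)
    return cpl_ic["BASE_CPLIC"]
```

Passing the environment as a parameter keeps the helper easy to test without touching the real process environment.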
11 changes: 10 additions & 1 deletion workflow/rocoto/tasks.py
@@ -1,5 +1,6 @@
#!/usr/bin/env python3

import os
import numpy as np
from applications.applications import AppConfig
import rocoto.rocoto as rocoto
@@ -214,7 +215,15 @@ def get_resource(self, task_name):
else:
native += ':shared'
elif scheduler in ['slurm']:
native = '--export=NONE'
# PW_CSP is set on the cloud service providers (CSPs). On a CSP, 'native' must be
# defined as below; otherwise the job fails with:
# "ufs_model.x: error while loading shared libraries: libiomp5.so: cannot open shared object file: No such file or directory"
# This happens even when the library path is in LD_LIBRARY_PATH and the modules
# loaded at run time are exactly those used to build ufs_model.x.
pw_csp = os.environ.get('PW_CSP', 'unknown')
if pw_csp in ['aws', 'azure', 'google']:
native = '--export=ALL --exclusive'
else:
native = '--export=NONE'
if task_config['RESERVATION'] != "":
native += '' if task_name in Tasks.SERVICE_TASKS else ' --reservation=' + task_config['RESERVATION']

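The Slurm branch of `get_resource` now picks the `native` flags from `PW_CSP` and then appends a reservation for non-service tasks. A minimal sketch of that decision as a pure function is shown here; `slurm_native` and its parameters are hypothetical names, and the comparison is an exact, case-sensitive match as in the diff.

```python
from typing import Optional

CLOUD_CSPS = ("aws", "azure", "google")


def slurm_native(pw_csp: Optional[str],
                 reservation: str = "",
                 is_service_task: bool = False) -> str:
    """Build the Slurm 'native' string (illustrative, not the PR's code)."""
    if pw_csp in CLOUD_CSPS:
        # On the CSPs the job needs the submitting environment and a whole node.
        native = "--export=ALL --exclusive"
    else:
        native = "--export=NONE"
    # Service tasks never get the reservation flag.
    if reservation and not is_service_task:
        native += f" --reservation={reservation}"
    return native
```

Keeping the decision in one function would make it straightforward to unit-test the on-prem and cloud cases without setting `PW_CSP` in the real environment.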
22 changes: 17 additions & 5 deletions workflow/rocoto/workflow_xml.py
@@ -8,6 +8,7 @@
from applications.applications import AppConfig
from rocoto.workflow_tasks import get_wf_tasks
import rocoto.rocoto as rocoto
import numpy as np
from abc import ABC, abstractmethod


@@ -156,11 +157,22 @@ def _write_crontab(self, crontab_file: str = None, cronint: int = 5) -> None:
replyto = ''

strings = ['',
f'#################### {pslot} ####################',
f'MAILTO="{replyto}"',
f'{cronintstr} {rocotorunstr}',
'#################################################################',
'']
f'#################### {pslot} ####################',
f'MAILTO="{replyto}"'
]
# AWS needs 'SHELL' and 'BASH_ENV' defined, or the crontab job won't start.
pw_csp = os.environ.get('PW_CSP')
if pw_csp in ['aws', 'azure', 'google']:
strings = np.append(strings,
[
'SHELL="/bin/bash"',
'BASH_ENV="/etc/bashrc"'
])
strings = np.append(strings,
[
f'{cronintstr} {rocotorunstr}',
'#################################################################',
''])

if crontab_file is None:
crontab_file = f"{expdir}/{pslot}.crontab"
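The crontab change builds the entry in three pieces: a header, optional `SHELL`/`BASH_ENV` lines on the CSPs, and the cron command with a footer. The diff does this with `np.append`; the sketch below shows the same assembly with plain list operations, which would avoid the extra `numpy` import. It is an alternative under that assumption, not the code in this PR, and `crontab_lines` is a hypothetical name.

```python
from typing import List, Optional


def crontab_lines(pslot: str, replyto: str, cron_entry: str,
                  pw_csp: Optional[str] = None) -> List[str]:
    """Assemble crontab lines with a plain list (illustrative alternative)."""
    lines = ['',
             f'#################### {pslot} ####################',
             f'MAILTO="{replyto}"']
    # On the CSPs, cron needs SHELL and BASH_ENV or the job will not start.
    if pw_csp in ('aws', 'azure', 'google'):
        lines.extend(['SHELL="/bin/bash"',
                      'BASH_ENV="/etc/bashrc"'])
    lines.extend([cron_entry,
                  '#################################################################',
                  ''])
    return lines
```

The result can be written out with `'\n'.join(...)`, so the rest of a writer function would not need to change.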