Reenable Orion Cycling Support #2877

DavidHuber-NOAA
Contributor

@DavidHuber-NOAA DavidHuber-NOAA commented Aug 29, 2024

Description

This updates the model hash to include the UPP update needed to be able to run the post processor on Orion, thus reenabling support on that system.

A note on the UPP: it is using a newer version of g2tmpl that requires a separate spack-stack 1.6.0 installation. This version of g2tmpl will be standard in spack-stack 1.8.0, but for now requires loading separate modules for the UPP.

A note on running analyses on Orion: due to a yet-unknown issue causing the BUFR library to run much slower on Orion when compared with Rocky 8, the GSI and GDASApp are expected to run significantly slower than on any other platform (on the order of an hour longer).

Lastly, I made adjustments to the build_all.sh script to send more cores to compiling the UFS and GDASApp. Under this configuration, the GSI, UPP, UFS_Utils, and WW3 pre/post executables finish compiling before the UFS when run with 20 cores.
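As a rough illustration of the idea only: the sketch below splits a fixed core budget across concurrent builds in proportion to a weight per component. The function name `allocate_cores` and every weight value are hypothetical and are not taken from build_all.sh; only the 20-core total and the component names come from the description above.

```python
def allocate_cores(total_cores, weights):
    """Split total_cores across builds in proportion to their weights.

    Each build gets at least one core, so the sum can exceed total_cores
    when many small builds round up to one core.
    """
    total_weight = sum(weights.values())
    return {
        name: max(1, total_cores * weight // total_weight)
        for name, weight in weights.items()
    }


# Hypothetical weights: heavier builds (UFS, GDASApp) get larger shares.
weights = {"ufs": 8, "gdas": 6, "gsi": 2, "upp": 2, "ufs_utils": 1, "ww3prepost": 1}
shares = allocate_cores(20, weights)
print(shares)
```

The weights would need tuning against real build times on each platform; the point is only that the slowest builds receive the largest share.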

Resolves #2694
Resolves #2851

Type of change

  • Maintenance (code refactor, clean-up, new CI test, etc.)

Change characteristics

  • Is this a breaking change (a change in existing functionality)? NO
  • Does this change require a documentation update? NO
  • Does this change require an update to any of the following submodules? YES (If YES, please add a link to any PRs that are pending.)
    • EMC verif-global
    • GDAS
    • GFS-utils
    • GSI
    • GSI-monitor
    • GSI-utils
    • UFS-utils
    • UFS-weather-model
    • wxflow

How has this been tested?

  • Build on Orion
  • C48_ATM on Orion
  • C48_S2SW on Orion
  • C96C48_hybatmDA on Orion

Checklist

  • Any dependent changes have been merged and published
  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • My changes generate no new warnings
  • New and existing tests pass with my changes

@DavidHuber-NOAA
Contributor Author

@WenMeng-NOAA

The spack-stack environment used by the UPP (upp-addon) does not include xarray and uses older versions of numpy (v1.22.3) and jinja2 (3.0.3) than the gsi-addon environment. To make this work, I had to add a kludge to ush/python/pygfs/__init__.py so that the marine tasks' import of xarray does not cause the UPP to fail. This can be removed once we get to spack-stack v1.8.0.
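For readers unfamiliar with this kind of workaround, here is a minimal sketch of a guarded import, assuming the kludge amounts to tolerating a missing module at package import time. The helper `try_import` is hypothetical and is not the code added in this PR.

```python
import importlib


def try_import(name):
    """Return the imported module, or None when it is not installed."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


# In an environment without xarray (such as upp-addon, per the comment above),
# this evaluates to None instead of raising at import time.
xarray = try_import("xarray")
marine_tasks_available = xarray is not None
print(f"marine tasks available: {marine_tasks_available}")
```

Tasks that need xarray would then check the flag (or fail later, when actually used) rather than breaking every job that imports the package.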

With these changes, the gdasatmanlupp and gfsatmanlupp jobs ran to completion. Wen, would you be able to look at the output files and verify they are OK? You can find a pair of examples here:

/work/noaa/global/dhuber/para/orion/COMROOT/c96_4denvar/gdas.20211221/00/model/atmos/master/gdas.t00z.master.grb2anl
/work/noaa/global/dhuber/para/orion/COMROOT/c96_4denvar/gfs.20211221/00/model/atmos/master/gfs.t00z.master.grb2anl

@WenMeng-NOAA
Contributor

@DavidHuber-NOAA Can you change the access permission of /work/noaa/global/dhuber/para/orion/COMROOT/c96_4denvar/gdas.20211221/00/model/atmos/master/?

@DavidHuber-NOAA
Contributor Author

@WenMeng-NOAA Yes, done.

@WenMeng-NOAA
Contributor

@DavidHuber-NOAA Somehow, the 6 aerosol fields (ATOK) are missing in both gfs and gdas master files. To output them from UPP, itag should be set as:

&nampgb
  kpo = 57,
  po = 1000.0, 975.0, 950.0, 925.0, 900.0, 875.0, 850.0, 825.0, 800.0, 775.0, 750.0, 725.0, 700.0, 675.0, 650.0, 625.0, 600.0, 575.0, 550.0, 525.0, 500.0, 475.0, 450.0, 425.0, 400.0, 375.0, 350.0, 325.0, 300.0, 275.0, 250.0, 225.0, 200.0, 175.0, 150.0, 125.0, 100.0, 70.0, 50.0, 40.0, 30.0, 20.0, 15.0, 10.0, 7.0, 5.0, 3.0, 2.0, 1.0, 0.7, 0.4, 0.2, 0.1, 0.07, 0.04, 0.02, 0.01,
  rdaod = .true.

@DavidHuber-NOAA
Contributor Author

@WenMeng-NOAA Here are the contents of the nampgb namelist used to generate the master file:

&nampgb
  kpo = 57,
  po = 1000.0, 975.0, 950.0, 925.0, 900.0, 875.0, 850.0, 825.0, 800.0, 775.0, 750.0, 725.0, 700.0, 675.0, 650.0, 625.0, 600.0, 575.0, 550.0, 525.0, 500.0, 475.0, 450.0, 425.0, 400.0, 375.0, 350.0, 325.0, 300.0, 275.0, 250.0, 225.0, 200.0, 175.0, 150.0, 125.0, 100.0, 70.0, 50.0, 40.0, 30.0, 20.0, 15.0, 10.0, 7.0, 5.0, 3.0, 2.0, 1.0, 0.7, 0.4, 0.2, 0.1, 0.07, 0.04, 0.02, 0.01,
  rdaod = .true.

These appear to be the same. The experiment that I was running was ATM-only, so that may be why the aerosol fields were not present. I will try running the C96C48_hybatmaerosnowDA test case and let you know when I have an analysis master grib2 file ready.
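A quick way to check a namelist like the one above is to parse it and confirm that kpo matches the number of po levels. This is a minimal sketch with a hand-rolled parser (`parse_namelist_group` is a hypothetical helper, not part of the workflow) that only handles simple `key = v1, v2, ...` lines.

```python
NAMPGB = """&nampgb
  kpo = 57,
  po = 1000.0, 975.0, 950.0, 925.0, 900.0, 875.0, 850.0, 825.0, 800.0, 775.0, 750.0, 725.0, 700.0, 675.0, 650.0, 625.0, 600.0, 575.0, 550.0, 525.0, 500.0, 475.0, 450.0, 425.0, 400.0, 375.0, 350.0, 325.0, 300.0, 275.0, 250.0, 225.0, 200.0, 175.0, 150.0, 125.0, 100.0, 70.0, 50.0, 40.0, 30.0, 20.0, 15.0, 10.0, 7.0, 5.0, 3.0, 2.0, 1.0, 0.7, 0.4, 0.2, 0.1, 0.07, 0.04, 0.02, 0.01,
  rdaod = .true.
"""


def parse_namelist_group(text):
    """Parse `key = v1, v2, ...` lines of one namelist group into string lists."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, the group header (&name) and the terminator (/).
        if not line or line.startswith("&") or line == "/":
            continue
        key, _, raw = line.partition("=")
        values[key.strip()] = [item.strip() for item in raw.split(",") if item.strip()]
    return values


def levels_consistent(group):
    """True when kpo equals the number of pressure levels listed in po."""
    return int(group["kpo"][0]) == len(group["po"])


group = parse_namelist_group(NAMPGB)
print(len(group["po"]), levels_consistent(group), group["rdaod"])
```

Comparing two parsed groups with `==` gives the same "these appear to be the same" check without reading the level list by eye.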

@DavidHuber-NOAA
Contributor Author

@WenMeng-NOAA I have finished running the C96C48_hybatmaerosnowDA test case. The analysis grib2 file can be found here: /work/noaa/global/dhuber/para/orion/COMROOT/hybaerosnow/gdas.20211221/00/model/atmos/master/gdas.t00z.master.grb2anl. Running wgrib2 | grep ATOK returned no entries. Is this a feature that requires a new g2 or g2tmpl module? Or is there perhaps a change in another namelist elsewhere?
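The same check can be scripted. The sketch below filters a wgrib2 inventory for a variable name; the helper names and the sample inventory are hypothetical (the sample is illustrative and is not output from the files in this thread). Unlike `grep ATOK`, it matches the variable field exactly rather than any substring.

```python
import subprocess


def read_inventory(grib_path):
    """Return the default wgrib2 inventory for a file (requires wgrib2 on PATH)."""
    result = subprocess.run(
        ["wgrib2", grib_path], capture_output=True, text=True, check=True
    )
    return result.stdout


def find_records(inventory_text, variable):
    """Return inventory lines whose variable field (4th, colon-separated) matches."""
    matches = []
    for line in inventory_text.splitlines():
        fields = line.split(":")
        if len(fields) > 3 and fields[3] == variable:
            matches.append(line)
    return matches


# Hypothetical inventory in the default wgrib2 layout, for illustration only.
SAMPLE_INVENTORY = """1:0:d=2021122100:UGRD:1000 mb:anl:
2:5000:d=2021122100:VGRD:1000 mb:anl:
3:10000:d=2021122100:TMP:2 m above ground:anl:
"""

print(find_records(SAMPLE_INVENTORY, "ATOK"))
```

On a real file, `find_records(read_inventory(path), "ATOK")` returning an empty list corresponds to the "no entries" result reported above.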

@emcbot emcbot added CI-Wcoss2-Failed **Bot use only** CI testing on WCOSS for this PR has failed and removed CI-Wcoss2-Running **Bot use only** CI testing on WCOSS for this PR is in-progress labels Sep 5, 2024
@emcbot

emcbot commented Sep 5, 2024

Experiment C96_atm3DVar_extended_c8710fe9 FAIL on Wcoss2 at 09/05/24 10:36:32 PM

Error logs:

/lfs/h2/emc/global/noscrub/globalworkflow.ci/GFS_CI_ROOT/PR/2877/RUNTESTS/COMROOT/C96_atm3DVar_extended_c8710fe9/logs/2021122100/gfsgoesupp_f000.log
/lfs/h2/emc/global/noscrub/globalworkflow.ci/GFS_CI_ROOT/PR/2877/RUNTESTS/COMROOT/C96_atm3DVar_extended_c8710fe9/logs/2021122100/gfsgoesupp_f001.log
/lfs/h2/emc/global/noscrub/globalworkflow.ci/GFS_CI_ROOT/PR/2877/RUNTESTS/COMROOT/C96_atm3DVar_extended_c8710fe9/logs/2021122100/gfsgoesupp_f002.log
/lfs/h2/emc/global/noscrub/globalworkflow.ci/GFS_CI_ROOT/PR/2877/RUNTESTS/COMROOT/C96_atm3DVar_extended_c8710fe9/logs/2021122100/gfsgoesupp_f003.log
/lfs/h2/emc/global/noscrub/globalworkflow.ci/GFS_CI_ROOT/PR/2877/RUNTESTS/COMROOT/C96_atm3DVar_extended_c8710fe9/logs/2021122100/gfsgoesupp_f004.log
/lfs/h2/emc/global/noscrub/globalworkflow.ci/GFS_CI_ROOT/PR/2877/RUNTESTS/COMROOT/C96_atm3DVar_extended_c8710fe9/logs/2021122100/gfsgoesupp_f005.log
/lfs/h2/emc/global/noscrub/globalworkflow.ci/GFS_CI_ROOT/PR/2877/RUNTESTS/COMROOT/C96_atm3DVar_extended_c8710fe9/logs/2021122100/gfsgoesupp_f006.log
/lfs/h2/emc/global/noscrub/globalworkflow.ci/GFS_CI_ROOT/PR/2877/RUNTESTS/COMROOT/C96_atm3DVar_extended_c8710fe9/logs/2021122100/gfsgoesupp_f007.log

Follow link here to view the contents of the above file(s): (link)

@WalterKolczynski-NOAA
Contributor

The index file isn't being created in the GOES UPP job. I checked $DATA and it isn't there (GFSGOES.GrbF00 is there).

2024-09-05 22:30:49,031 - INFO     - upp         : Copy 'goes' processed data to COM/ directory
2024-09-05 22:30:49,034 - INFO     - file_utils  : Copied /lfs/h2/emc/stmp/terry.mcguinness/RUNDIRS/C96_atm3DVar_extended_c8710fe9/gfs.2021122100/upp.161988/GFSGOES.GrbF00 to /lfs/h2/emc/global/noscrub/globalworkflow.ci/GFS_CI_ROOT/PR/2877/RUNTESTS/COMROOT/C96_atm3DVar_extended_c8710fe9/gfs.20211221/00//model/atmos/master/gfs.t00z.special.grb2f000
Traceback (most recent call last):
  File "/lfs/h2/emc/global/noscrub/globalworkflow.ci/GFS_CI_ROOT/PR/2877/global-workflow/ush/python/wxflow/fsutils.py", line 85, in cp
    shutil.copy2(source, target)
  File "/apps/spack/python/3.8.6/intel/19.1.3.304/pjn2nzkjvqgmjw4hmyz43v5x4jbxjzpk/lib/python3.8/shutil.py", line 432, in copy2
    copyfile(src, dst, follow_symlinks=follow_symlinks)
  File "/apps/spack/python/3.8.6/intel/19.1.3.304/pjn2nzkjvqgmjw4hmyz43v5x4jbxjzpk/lib/python3.8/shutil.py", line 261, in copyfile
    with open(src, 'rb') as fsrc, open(dst, 'wb') as fdst:
FileNotFoundError: [Errno 2] No such file or directory: '/lfs/h2/emc/stmp/terry.mcguinness/RUNDIRS/C96_atm3DVar_extended_c8710fe9/gfs.2021122100/upp.161988/GFSGOES.GrbF00.idx'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/lfs/h2/emc/global/noscrub/globalworkflow.ci/GFS_CI_ROOT/PR/2877/global-workflow/scripts/exglobal_atmos_upp.py", line 48, in <module>
    main()
  File "/lfs/h2/emc/global/noscrub/globalworkflow.ci/GFS_CI_ROOT/PR/2877/global-workflow/ush/python/wxflow/logger.py", line 266, in wrapper
    retval = func(*args, **kwargs)
  File "/lfs/h2/emc/global/noscrub/globalworkflow.ci/GFS_CI_ROOT/PR/2877/global-workflow/scripts/exglobal_atmos_upp.py", line 44, in main
    upp.finalize(upp_dict.upp_run, upp_yaml)
  File "/lfs/h2/emc/global/noscrub/globalworkflow.ci/GFS_CI_ROOT/PR/2877/global-workflow/ush/python/wxflow/logger.py", line 266, in wrapper
    retval = func(*args, **kwargs)
  File "/lfs/h2/emc/global/noscrub/globalworkflow.ci/GFS_CI_ROOT/PR/2877/global-workflow/ush/python/pygfs/task/upp.py", line 264, in finalize
    FileHandler(upp_yaml[upp_run].data_out).sync()
  File "/lfs/h2/emc/global/noscrub/globalworkflow.ci/GFS_CI_ROOT/PR/2877/global-workflow/ush/python/wxflow/file_utils.py", line 43, in sync
    sync_factory[action](files)
  File "/lfs/h2/emc/global/noscrub/globalworkflow.ci/GFS_CI_ROOT/PR/2877/global-workflow/ush/python/wxflow/file_utils.py", line 63, in _copy_files
    cp(src, dest)
  File "/lfs/h2/emc/global/noscrub/globalworkflow.ci/GFS_CI_ROOT/PR/2877/global-workflow/ush/python/wxflow/fsutils.py", line 87, in cp
    raise OSError(f"unable to copy {source} to {target}")
OSError: unable to copy /lfs/h2/emc/stmp/terry.mcguinness/RUNDIRS/C96_atm3DVar_extended_c8710fe9/gfs.2021122100/upp.161988/GFSGOES.GrbF00.idx to 
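The two-part traceback above is what Python prints when an exception is raised inside an `except` block. The sketch below reproduces that pattern in miniature: `cp` is reconstructed from the two traceback lines shown for fsutils.py and is an assumption about its shape, not the wxflow source, and `missing_sources` is a hypothetical pre-check.

```python
import os
import shutil
import tempfile


def cp(source, target):
    """Copy a file; re-raise any OS-level failure with a short message."""
    try:
        shutil.copy2(source, target)
    except OSError:
        # Raising here (without `from`) chains the original error as __context__,
        # which prints "During handling of the above exception, ...".
        raise OSError(f"unable to copy {source} to {target}")


def missing_sources(pairs):
    """Return the source paths from (source, target) pairs that do not exist."""
    return [source for source, _ in pairs if not os.path.exists(source)]


with tempfile.TemporaryDirectory() as workdir:
    grib = os.path.join(workdir, "GFSGOES.GrbF00")
    open(grib, "wb").close()  # the GRIB2 file exists, its index does not
    pairs = [
        (grib, os.path.join(workdir, "out.grb2")),
        (grib + ".idx", os.path.join(workdir, "out.grb2.idx")),
    ]
    missing = [os.path.basename(path) for path in missing_sources(pairs)]
    try:
        cp(*pairs[1])
    except OSError as err:
        chained = type(err.__context__).__name__

print(missing, chained)
```

A pre-check like `missing_sources` would report the absent `.idx` file by name before any copy is attempted.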

@emcbot emcbot added CI-Orion-Passed **Bot use only** CI testing on Orion for this PR has completed successfully and removed CI-Orion-Running **Bot use only** CI testing on Orion for this PR is in-progress labels Sep 6, 2024
@emcbot

emcbot commented Sep 6, 2024

CI Passed on Orion in Build# 5
Built and ran in directory /work2/noaa/stmp/CI/ORION/2877


Experiment C48_ATM_c8710fe9 Completed 1 Cycles: *SUCCESS* at Thu Sep  5 05:32:18 PM CDT 2024
Experiment C96C48_hybatmDA_c8710fe9 Completed 3 Cycles: *SUCCESS* at Thu Sep  5 06:52:35 PM CDT 2024
Experiment C96_atm3DVar_c8710fe9 Completed 3 Cycles: *SUCCESS* at Thu Sep  5 06:52:43 PM CDT 2024
Experiment C48_S2SWA_gefs_c8710fe9 Completed 1 Cycles: *SUCCESS* at Thu Sep  5 06:59:03 PM CDT 2024
Experiment C48_S2SW_c8710fe9 Completed 1 Cycles: *SUCCESS* at Thu Sep  5 07:10:15 PM CDT 2024

@aerorahul
Contributor

Let's disable the GOES product generation and merge this PR.
Something has changed unintentionally in there that needs to be further investigated.
HR4 needs to commence immediately.

@aerorahul
Contributor

NOAA-EMC/UPP@97ea655
here GFSGOES was renamed in the post_gfs_goes control file.

https://github.com/NOAA-EMC/global-workflow/blob/develop/ush/python/pygfs/task/upp.py#L203-L207
The index file is created from GFSPRS and GFSFLX files. The above change from GFSPRS to GFSGOES broke this part of the workflow.

@aerorahul
Contributor

@DavidHuber-NOAA

for ftype in ['PRS', 'FLX']:

Can you add GOES to the list here?
['PRS', 'FLX', 'GOES']
This should be done properly, but this will do in a pinch.
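To make the effect of the suggested one-line change concrete, here is a minimal sketch that lists the GRIB2 and index file names the loop would cover. The function `index_targets` is hypothetical, and the file name pattern is inferred from the `GFSGOES.GrbF00` and `GFSGOES.GrbF00.idx` names in the log above; the real loop in upp.py may be structured differently.

```python
def index_targets(forecast_hour, ftypes=("PRS", "FLX", "GOES")):
    """Return (grib_file, index_file) name pairs for each UPP output type."""
    targets = []
    for ftype in ftypes:
        grib_file = f"GFS{ftype}.GrbF{forecast_hour:02d}"
        targets.append((grib_file, f"{grib_file}.idx"))
    return targets


# With only ('PRS', 'FLX'), no GFSGOES index is ever requested.
print(index_targets(0, ftypes=("PRS", "FLX")))
# Adding 'GOES' covers the file the GOES job actually writes.
print(index_targets(0))
```

A fuller fix would derive the list of file types from the UPP configuration rather than hard-coding it, which is presumably what "done properly" refers to.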

@DavidHuber-NOAA DavidHuber-NOAA added CI-Wcoss2-Passed **Bot use only** CI testing on WCOSS for this PR has completed successfully CI-Wcoss2-Ready **CM use only** PR is ready for CI testing on WCOSS and removed CI-Wcoss2-Failed **Bot use only** CI testing on WCOSS for this PR has failed CI-Wcoss2-Passed **Bot use only** CI testing on WCOSS for this PR has completed successfully labels Sep 6, 2024
@emcbot emcbot added CI-Wcoss2-Building **Bot use only** CI testing is cloning/building on WCOSS and removed CI-Wcoss2-Ready **CM use only** PR is ready for CI testing on WCOSS labels Sep 6, 2024
@emcbot

emcbot commented Sep 6, 2024

CI Update on Wcoss2 at 09/06/24 12:29:07 PM
============================================
Cloning and Building global-workflow PR: 2877
with PID: 216532 on host: dlogin03

@emcbot emcbot added CI-Wcoss2-Running **Bot use only** CI testing on WCOSS for this PR is in-progress and removed CI-Wcoss2-Building **Bot use only** CI testing is cloning/building on WCOSS labels Sep 6, 2024
@emcbot

emcbot commented Sep 6, 2024

Automated global-workflow Testing Results:

Machine: Wcoss2
Start: Fri Sep  6 12:32:39 UTC 2024 on dlogin03
---------------------------------------------------
Build: Completed at 09/06/24 01:13:16 PM
Case setup: Completed for experiment C48_ATM_8f7bbc44
Case setup: Skipped for experiment C48mx500_3DVarAOWCDA_8f7bbc44
Case setup: Skipped for experiment C48_S2SWA_gefs_8f7bbc44
Case setup: Completed for experiment C48_S2SW_8f7bbc44
Case setup: Completed for experiment C96_atm3DVar_extended_8f7bbc44
Case setup: Skipped for experiment C96_atm3DVar_8f7bbc44
Case setup: Completed for experiment C96C48_hybatmaerosnowDA_8f7bbc44
Case setup: Completed for experiment C96C48_hybatmDA_8f7bbc44
Case setup: Completed for experiment C96C48_ufs_hybatmDA_8f7bbc44

@aerorahul aerorahul mentioned this pull request Sep 6, 2024
7 tasks
@emcbot emcbot added CI-Wcoss2-Passed **Bot use only** CI testing on WCOSS for this PR has completed successfully and removed CI-Wcoss2-Running **Bot use only** CI testing on WCOSS for this PR is in-progress labels Sep 7, 2024
@emcbot

emcbot commented Sep 7, 2024

All CI Test Cases Passed on Wcoss2:

Experiment C48_ATM_8f7bbc44 *** SUCCESS *** at 09/06/24 02:36:12 PM
Experiment C48_S2SW_8f7bbc44 *** SUCCESS *** at 09/06/24 02:51:12 PM
Experiment C96C48_hybatmDA_8f7bbc44 *** SUCCESS *** at 09/06/24 03:48:26 PM
Experiment C96C48_hybatmaerosnowDA_8f7bbc44 *** SUCCESS *** at 09/06/24 04:48:41 PM
Experiment C96C48_ufs_hybatmDA_8f7bbc44 *** SUCCESS *** at 09/06/24 05:42:35 PM
Experiment C96_atm3DVar_extended_8f7bbc44 *** SUCCESS *** at 09/07/24 03:12:49 AM

@WalterKolczynski-NOAA WalterKolczynski-NOAA merged commit ac93a9b into NOAA-EMC:develop Sep 7, 2024
5 checks passed
DavidHuber-NOAA added a commit to DavidHuber-NOAA/global-workflow that referenced this pull request Sep 9, 2024
* origin/develop:
  Create JEDI class (NOAA-EMC#2805)
  Restructure the bufr sounding job (NOAA-EMC#2853)
  Add an archive task to GEFS system to archive files locally (NOAA-EMC#2816)
  Reenable Orion Cycling Support (NOAA-EMC#2877)
  Eliminate race conditions and remove DATAROOT last in cleanup (NOAA-EMC#2893)
  Update aerosol climatology to 2013-2024 mean (NOAA-EMC#2888)
  Add ability to run CI test C96_atm3DVar.yaml to Gaea-C5 (NOAA-EMC#2885)
  Support global-workflow GEFS C48 on Google Cloud (NOAA-EMC#2861)
  Add 3 and 9 hr increment files to IC staging (NOAA-EMC#2876)
  Add diffusion/diag B for aerosol DA and some other needed changes (NOAA-EMC#2738)
  Correct ocean `MOM.res_#` stage copy (NOAA-EMC#2868)
  Support coupling on AWS (NOAA-EMC#2859)
  Add JEDI ATM lgetkf observer and solver jobs (NOAA-EMC#2833)
  Fix gdas build on Gaea and add Gaea to available CI list (NOAA-EMC#2857)
  Support ATM forecast only on Google (NOAA-EMC#2832)
  Add GEFS C48 support on AWS (NOAA-EMC#2818)
  Update omega calculation (NOAA-EMC#2751)
  Add snow DA update and recentering for the EnKF forecasts (NOAA-EMC#2690)
  support ATM forecast only on Azure (NOAA-EMC#2827)
  Convert staging job to python and yaml (NOAA-EMC#2651)
  Fixed test on UNAVAILBLE in python Rocoto check (NOAA-EMC#2842)
@DavidHuber-NOAA DavidHuber-NOAA deleted the feature/orion_upp_update branch November 4, 2024 16:35
Labels
CI-Hera-Passed **Bot use only** CI testing on Hera for this PR has completed successfully
CI-Hercules-Passed **Bot use only** CI testing on Hercules for this PR has completed successfully
CI-Orion-Passed **Bot use only** CI testing on Orion for this PR has completed successfully
CI-Wcoss2-Passed **Bot use only** CI testing on WCOSS for this PR has completed successfully
Development

Successfully merging this pull request may close these issues.

Update ufs_model.fd with new commit from ufs-weather-model
Orion: Migration to Rocky9 OS
6 participants