Unify obs and test configurations between FV3-JEDI and MPAS-JEDI #255

SamuelDegelia-NOAA · 2024-12-18T14:49:58Z

Description

The goal of this PR is to unify the obs and configuration between FV3-JEDI and MPAS-JEDI. New obs are generated by running the offline domain check with the FV3-JEDI grid (the smaller of the two grids). These new obs are staged in $RDAS_DATA/fix/expr_data/obs_2024052700 and linked into the fix directories for both test cases.

I also removed the older obs_ctest directory since it was confusing to have separate directories for the full obs and the obs used for the ctest. A lot of those files actually overlapped anyways. Now there is just one obs directory that should be the same for both test cases.

A lot of other little things are included in this PR to better unify the configuration between the two test cases and GSI:

- ATMS obs are added to the FV3-JEDI tests to match the MPAS-JEDI obs list. There was previously an issue with the field names for soil temperature and moisture needed for CRTM; it has been resolved with updates to the staged file Data/fieldmetadata/tlei-gfs-restart.yaml.
- The time window was increased for the GETKF tests to allow ATMS obs to pass QC for the smaller domain.
- niter and gradient norm reduction are now consistent between both cases for the Ens3Dvar test.
- The &analysisDate variable is now set correctly for FV3-JEDI to better match wind obs counts compared to MPAS-JEDI (important for the temporal thinning filter).
- The same localization radii are now used for FV3-JEDI, MPAS-JEDI, and GSI. Bump was rerun for both FV3-JEDI and MPAS-JEDI. See here for a summary of the new localization radii and their units.
- The same number of mpi tasks (160) is now used for both sets of ctests, and the bumploc files are updated for MPAS-JEDI for this purpose.
- Since the GSI validation is an important part of RDASApp at the moment, I added a few of the important fix files into the repo for better tracking. These are under RDASApp/rrfs-test/gsi_fix.

Lastly, a small utility is also added called rrfs-test/ush/print_ctest_runtime.py that can be used for printing the actual runtime of the ctests (instead of the runtime + wait time included in the normal ctest output).

Issue(s) addressed

#246

Dependencies (if applicable)

List the other PRs that this PR is dependent on:
None

Checklist

I have performed a self-review of my own code.
I have run rrfs tests before creating the PR (if applicable).
I have staged the relevant data on all supported machines.

SamuelDegelia-NOAA · 2024-12-18T14:51:51Z

Ctest results from Hera:

[Samuel.Degelia@hfe06 rrfs-test]$ ctest -j8
Test project /scratch1/BMC/zrtrr/Samuel.Degelia/RDASApp_unify_fresh2/RDASApp/build/rrfs-test
    Start 2: rrfs_fv3jedi_2024052700_getkf_observer
    Start 5: rrfs_mpasjedi_2024052700_getkf_observer
    Start 1: rrfs_fv3jedi_2024052700_Ens3Dvar
    Start 4: rrfs_mpasjedi_2024052700_Ens3Dvar
    Start 7: rrfs_mpasjedi_2024052700_bumploc
    Start 8: rrfs_bufr2ioda_msonet
1/8 Test #8: rrfs_bufr2ioda_msonet .....................   Passed   32.57 sec
2/8 Test #7: rrfs_mpasjedi_2024052700_bumploc ..........   Passed  210.26 sec
3/8 Test #5: rrfs_mpasjedi_2024052700_getkf_observer ...   Passed  259.36 sec
    Start 6: rrfs_mpasjedi_2024052700_getkf_solver
4/8 Test #2: rrfs_fv3jedi_2024052700_getkf_observer ....   Passed  262.80 sec
    Start 3: rrfs_fv3jedi_2024052700_getkf_solver
5/8 Test #1: rrfs_fv3jedi_2024052700_Ens3Dvar ..........   Passed  286.87 sec
6/8 Test #4: rrfs_mpasjedi_2024052700_Ens3Dvar .........   Passed  383.32 sec
7/8 Test #3: rrfs_fv3jedi_2024052700_getkf_solver ......   Passed  534.06 sec
8/8 Test #6: rrfs_mpasjedi_2024052700_getkf_solver .....   Passed  863.33 sec

100% tests passed, 0 tests failed out of 8

Label Time Summary:
mpi            = 2832.56 sec*proc (8 tests)
rdas-bundle    = 2832.56 sec*proc (8 tests)
script         = 2832.56 sec*proc (8 tests)

Total Test time (real) = 1123.28 sec
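
For anyone who wants to compare per-test times across runs, the summary lines above can be parsed with a few lines of Python. This is only a sketch that reads the standard ctest summary format; it is not the print_ctest_runtime.py utility added in this PR, and the times it extracts still include any wait time. The names `parse_ctest_times` and `CTEST_OUTPUT` are made up for illustration.

```python
import re

# Per-test summary lines copied from the Hera ctest output above.
CTEST_OUTPUT = """\
1/8 Test #8: rrfs_bufr2ioda_msonet .....................   Passed   32.57 sec
2/8 Test #7: rrfs_mpasjedi_2024052700_bumploc ..........   Passed  210.26 sec
3/8 Test #5: rrfs_mpasjedi_2024052700_getkf_observer ...   Passed  259.36 sec
4/8 Test #2: rrfs_fv3jedi_2024052700_getkf_observer ....   Passed  262.80 sec
5/8 Test #1: rrfs_fv3jedi_2024052700_Ens3Dvar ..........   Passed  286.87 sec
6/8 Test #4: rrfs_mpasjedi_2024052700_Ens3Dvar .........   Passed  383.32 sec
7/8 Test #3: rrfs_fv3jedi_2024052700_getkf_solver ......   Passed  534.06 sec
8/8 Test #6: rrfs_mpasjedi_2024052700_getkf_solver .....   Passed  863.33 sec
"""

# Matches "Test #N: <name> .... <status> <seconds> sec".
# Assumption: the status is a single word such as "Passed".
LINE_RE = re.compile(r"Test\s+#\d+:\s+(\S+)\s+\.+\s+(\w+)\s+([\d.]+)\s+sec")


def parse_ctest_times(text):
    """Return {test name: (status, seconds)} for each ctest summary line."""
    results = {}
    for line in text.splitlines():
        match = LINE_RE.search(line)
        if match:
            name, status, seconds = match.groups()
            results[name] = (status, float(seconds))
    return results


times = parse_ctest_times(CTEST_OUTPUT)
total = sum(seconds for _, seconds in times.values())
slowest = max(times, key=lambda name: times[name][1])
print(f"{len(times)} tests, {total:.2f} sec summed, slowest: {slowest}")
```

The summed value agrees with the label time summary above to within rounding, and the slowest test is the MPAS-JEDI GETKF solver.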

SamuelDegelia-NOAA · 2024-12-18T14:53:20Z

Here is a comparison of the analyses from FV3-JEDI and MPAS-JEDI:

FV3-JEDI:

MPAS-JEDI:

The structure of the increments is generally consistent. There are some regions where increments are about 1-2 K larger in MPAS-JEDI compared to FV3-JEDI.

delippi

This PR is quite large. It's generally better to keep PRs smaller when possible to make reviews easier. One of my main concerns is with the large yaml files under testinput. Are those files truly necessary? It seems like you already have templates in place, which could significantly reduce the amount of committed code. In a sense, committing unnecessarily long files is similar to committing binaries--every update to these files creates a new snapshot that github must store. Over time, this can increase repository size and impact the time required for cloning and pulling.

delippi · 2024-12-19T14:20:21Z

rrfs-test/testinput/rrfs_bufr2ioda_msonet.yaml

Is this file really needed? This is just a duplicate of rrfs-test/IODA/yaml/prepbufr_msonet.yaml except that the reference time is already filled in. Could we just use that one and fill in the placeholder? That might be better than trying to maintain two.

This is related to my response below. Since we decided to commit the larger ctest yamls, I also kept this filled in version of the yaml used for the bufr2ioda ctest.

delippi · 2024-12-19T14:39:30Z

rrfs-test/testinput/rrfs_fv3jedi_2024052700_Ens3Dvar.yaml

+     - obs space:
+         name: atms_npp
+         distribution:
+           name: "RoundRobin"
+           halo size: 100e3
+         obsdatain:
+           engine:
+             type: H5File
+             obsfile: "Data/obs/atms_npp_obs_2024052700_dc.nc"


I just realized that each of these files is a large yaml and not templated. Can you update each of these such that the obs space section is a placeholder and then the validated yamls get inserted at run time? The atms section alone is almost 600 lines. The entire file length is over 3500 lines. There are 8 such files. That is a lot of lines we don't need.

The other argument for this is that you don't want to have to maintain these files when someone updates the corresponding "official" yamls.

Edit: Well, I see down below that you might be using the templating. So now I'm confused why we need these yamls since I think they might be generated at run time.
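
To make the placeholder idea concrete, here is a rough sketch of how a base config with a placeholder could be combined with per-obtype snippets at run time. It is not how gen_yaml_fv3jedi_ctest.sh works. The placeholder string `@OBSERVATIONS@`, the base template contents, the function name, and the reuse of the ATMS snippet structure for the aircar entry are all assumptions for illustration; only the obs space names and obsfile paths are taken from this PR.

```python
import textwrap

# Hypothetical base config; the real basic_config templates are much longer.
BASE_TEMPLATE = """\
observations:
  observers:
@OBSERVATIONS@
"""

# Shortened stand-ins for the files under templates/obtype_config.
OBTYPE_SNIPPETS = [
    """\
- obs space:
    name: atms_npp
    obsdatain:
      engine:
        type: H5File
        obsfile: "Data/obs/atms_npp_obs_2024052700_dc.nc"
""",
    """\
- obs space:
    name: aircar_airTemperature_133
    obsdatain:
      engine:
        type: H5File
        obsfile: "Data/obs/ioda_aircar_dc.nc"
""",
]


def build_super_yaml(base, snippets, placeholder="@OBSERVATIONS@", indent=4):
    """Indent each obtype snippet and splice them in place of the placeholder line."""
    block = "".join(textwrap.indent(snippet, " " * indent) for snippet in snippets)
    return base.replace(placeholder + "\n", block)


super_yaml = build_super_yaml(BASE_TEMPLATE, OBTYPE_SNIPPETS)
print(super_yaml)
```

With this kind of approach only the short base config and the individual obtype files would need to be committed, at the cost of the ctest inputs changing whenever a template changes.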

@delippi for the ctest case, I think we can use the large yamls. All yamls should be treated as fix files so that the developer can run ctest to ensure that there are no errors in RDASApp. But we should also include instructions on how to generate these yamls from JCB(?) so that they may start their development work. We can discuss this further.

delippi · 2024-12-19T14:44:02Z

rrfs-test/validated_yamls/templates/basic_config/fv3jedi_en3dvar.yaml

+        filename_sfcw: 20240527.000000.fv_srf_wnd.res.tile1.nc
+        filename_cplr: 20240527.000000.coupler.res
+      pattern: "%mem%"
+      nmembers: 20


Thanks for catching this - will fix it.

delippi · 2024-12-19T14:46:11Z

rrfs-test/validated_yamls/gen_yaml_fv3jedi_ctest.sh

+# Define the aircar observation type configs as an array
+aircar_obtype_configs=(
+    "aircar_airTemperature_133.yaml"
+    "aircar_uv_233.yaml"


this should be aircar_winds_233.yaml

Will fix this - the same problem is in gen_yaml_mpasjedi_ctest.sh too. Thanks!

Actually the template file is still called aircar_uv_233.yaml for now until #251 is merged. If that PR is merged before this one (which it probably will be), I will change this then.

Oh yeah haha

delippi · 2024-12-19T14:47:00Z

rrfs-test/validated_yamls/gen_yaml_fv3jedi_ctest.sh

+
+  # Concatenate all obtypes into the super yaml
+  process_obtypes "${ctest_names[$iconfig]}" "aircar_obtype_configs[@]" "Data/obs/ioda_aircar_dc.nc"             "$temp_yaml"
+  process_obtypes "${ctest_names[$iconfig]}" "aircft_obtype_configs[@]" "Data/obs/ioda_aircft_dc.nc"             "$temp_yaml"


I'm not confident with these. These are only up to the point of Phase 1. Is it too much to just turn them off for now? (I'm talking about just "aircft" NOT "aircar")

Sure, I can turn those off!

SamuelDegelia-NOAA · 2024-12-19T15:49:48Z

> This PR is quite large. It's generally better to keep PRs smaller when possible to make reviews easier. One of my main concerns is with the large yaml files under testinput. Are those files truly necessary? It seems like you already have templates in place, which could significantly reduce the amount of committed code. In a sense, committing unnecessarily long files is similar to committing binaries--every update to these files creates a new snapshot that github must store. Over time, this can increase repository size and impact the time required for cloning and pulling.

I agree that this PR is very large - sorry about that! I just kept finding little things that needed to be changed to match the configuration between GSI/FV3-JEDI/MPAS-JEDI. I kept adding those changes to the PR but I probably should have stopped somewhere and made individual PRs. One thing I struggled with is that the test references have to be updated for each change in this PR, so if we made individual PRs then there could be conflicts. Updating the mpi tasks for each ctest also meant rerunning bump for each solver, so it made sense to include localization changes at the same time. But still, I agree that this became too large.

Would it still be helpful to go ahead and split this up?

I could have individual PRs for

Adding ATMS to FV3-JEDI
Updating a few yaml options (niter, gradient norm reduction, &analysisDate)
Updating localization radii + using 160 ntasks
Adding GSI fix files to repo

SamuelDegelia-NOAA · 2024-12-19T15:51:14Z

Also, in regards to committing the super yamls, this is something we discussed in #184 and #187. It was suggested to commit the super yamls into the repo (instead of creating them at runtime or during the build) so that other developers can use them as templates.

But I agree that as these yamls become larger, PRs like this will have lots of changes to them, which makes them very hard to review. These changes would also show up in both the templates and the super yamls. I am not fully sure of the solution here, but maybe we should revisit the choice to commit the super yamls. Tagging @guoqing-noaa and @ShunLiu-NOAA here for potential thoughts.

guoqing-noaa · 2024-12-19T16:05:07Z

rrfs-test/testinput_expr/rrfs_mpasjedi_2024052700_Ens3Dvar.yaml

-            data directory: data/bumploc/conus12km-401km11levels
-            files prefix: bumploc_401km11levels
+            data directory: data/bumploc/conus12km-401km_1p095logp
+            files prefix: bumploc_401km_1p095logp


This is to be discussed.
I think in the deterministic part of RAP/HRRR/RRFS, we have been using "modellevel" for the vertical effective scale or vertical localization radius since the beginning. Do we want to change it to "logp" for MPASJEDI?

This change was mainly a result of my experimenting with BUMP for fv3-jedi. I originally thought that the options for vertical localization in fv3-jedi were sigma and logp (modellevel is definitely not an option). So since this change is primarily made in the interest of keeping a consistent configuration between GSI/FV3-JEDI/MPAS-JEDI during this early stage of testing, I decided to use logp for all test cases. Note that I am not saying that we should use logp for the eventual workflow, just that logp makes sense at this early stage of development.

But later I found out that logp also does not work for fv3-jedi and only sigma coordinates can be used for vertical localization (see discussion in #53). So then I changed fv3-jedi back to sigma, leaving mpasjedi and GSI with logp coordinates. But since both mpasjedi and GSI can use modellevel as was used in RRFS_A (at least for conventional variables, tracers use logp with VDL), it makes sense to change this back to modellevel. I will make that change.

~~Also, RRFS uses 3 modellevels in its retro configuration so this will still need to be changed for MPAS-JEDI no matter what.~~ Never mind, 11 levels is the conversion from e-folding to the cutoff. So I will keep 401km11levels for MPAS-JEDI but still rerun to create the bumploc files with 160 mpi tasks.

And sorry for the rambling response... basically I will change this back to using modellevel!

I changed the vertical localization for MPAS-JEDI back to 11 model levels and GSI back to 3 model levels. The only difference now for MPAS-JEDI is that the bumploc files are generated for 160 mpi tasks.

For FV3-JEDI, using sigma = 0.04 matches pretty closely with the GSI configuration. However, the MPAS-JEDI vertical localization of 11 model levels does not match the 3 model levels used in GSI. This is because the vertical levels near the surface in MPAS-JEDI are less dense. So the equation used to convert from GSI to JEDI is not valid here.

However, since the point of unifying these configurations is primarily to test runtime differences, I will leave the configuration as shown in the plots below. Changing the modellevels vloc in MPAS-JEDI to match GSI or FV3-JEDI is getting awfully close to tuning territory and I do not think we are interested in that at this stage. We can just be aware that there will be additional differences between MPAS-JEDI and GSI/FV3-JEDI due to the vertical grid spacing and localization.
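
For reference, the 3-level and 11-level values discussed in this thread are consistent with a commonly used approximation that relates a Gaussian e-folding length scale to the distance at which a Gaspari-Cohn function reaches zero. The thread does not state which equation was used, so the factor below is an assumption, and the function name is made up for illustration.

```python
import math

# Assumed conversion factor: Gaspari-Cohn cutoff = 2 * sqrt(10/3) * e-folding scale.
# This is a common approximation, not a formula confirmed in this PR.
GC_CUTOFF_FACTOR = 2.0 * math.sqrt(10.0 / 3.0)


def efolding_to_cutoff(efolding_scale):
    """Convert an e-folding length scale to an approximate Gaspari-Cohn cutoff."""
    return GC_CUTOFF_FACTOR * efolding_scale


gsi_vertical_levels = 3  # e-folding scale in model levels, from the thread
jedi_cutoff_levels = efolding_to_cutoff(gsi_vertical_levels)
print(f"{gsi_vertical_levels} levels (e-folding) -> {jedi_cutoff_levels:.2f} levels (cutoff)")
```

Under that assumption, 3 model levels rounds to a cutoff of 11 model levels. As noted above, the conversion does not give equivalent localization when the vertical level spacing differs between the two grids.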

@SamuelDegelia-NOAA Thanks for the great tests.

We will update the MPASJEDI case soon to use the recent 65 levels which are more consistent with the RRFSv1 setting. So we don't need to worry about the MPASJEDI part for now.

delippi · 2024-12-19T16:11:41Z

> Also, in regards to committing the super yamls, this is something we discussed in #184 and #187. It was suggested to commit the super yamls into the repo (instead of creating them at runtime or during the build) so that other developers can use them as templates.
>
> But I agree that as these yamls become larger, PRs like this will have lots of changes to them, which makes them very hard to review. These changes would also show up in both the templates and the super yamls. I am not fully sure of the solution here, but maybe we should revisit the choice to commit the super yamls. Tagging @guoqing-noaa and @ShunLiu-NOAA here for potential thoughts.

I’d like to revisit this discussion, please, as I may have missed it earlier. I don’t think we should commit any of the large "super" yamls to RDASApp or JCB. We should only maintain a single copy of each file to keep the repository manageable. Committing these large files makes the PR unnecessarily large, harder to maintain, and prone to error.

If the argument for including them is that they can serve as templates for other developers, that doesn’t make sense. Developers should reference the validated_yamls instead, as those are expected to be the most up-to-date versions and should be used as examples for developing their own work.

Additionally, relying on these super yamls in ctest introduces a risk of missing updates or using outdated versions, which can cause issues down the line. For example, if the super yamls continue using "DRIPCG," that outdated configuration could propagate to other PRs and test results, perpetuating the problem.

guoqing-noaa · 2024-12-20T00:28:02Z

I would like to make a clarification that GIT treats text files and binary files very differently.

For binary files, GIT cannot track delta changes. It can only save a complete copy for different commits. That will increase a repo size quickly.

For text files, Git uses a delta encoding mechanism to store only the differences (or "deltas") between the file versions. For example, take a 1 MB text file, which is very large for text (for reference, a 22000-line YAML file is about 572 KB). If we modify only 50 lines and then commit, Git stores only the changes (i.e., the modified 50 lines) and references the original file for the unchanged parts. Further, Git compresses text files very efficiently in its storage.

delippi · 2024-12-20T03:05:23Z

@guoqing-noaa thanks for adding that clarification. That makes sense that binaries are treated differently in that way. I think there are valid reasons for doing this either way and I hope we can all discuss and come to some consensus on how we should handle this.

I would say that my main concern is that we should just keep one copy of each yaml part vs having as many as 9 different copies of the same thing that you have to maintain.

delippi · 2024-12-20T20:07:24Z

rrfs-test/testinput/rrfs_bufr2ioda_msonet.yaml

I meant to have commented on this file before. My apologies if I already did.

Is this file really needed? It duplicates what is already in prepbufr_msonet.yaml.

Just pushed an update so that this file is no longer used for the bufr2ioda ctest. Now CMakeLists.txt copies rrfs-test/IODA/yaml/prepbufr_msonet.yaml into the ctest run directory and then runs sed to replace the @REFERENCETIME@ string.
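
The copy-and-substitute step can be sketched in Python as follows. This is an illustration of the idea only; the actual change uses CMakeLists.txt and sed. The template contents, the function name, and the reference time format used here are assumptions, and the demo writes to a temporary directory rather than the real ctest run directory.

```python
import tempfile
from pathlib import Path

PLACEHOLDER = "@REFERENCETIME@"


def fill_reference_time(template_path, output_path, reference_time):
    """Copy a yaml template to output_path with the placeholder replaced."""
    text = Path(template_path).read_text()
    if PLACEHOLDER not in text:
        raise ValueError(f"{PLACEHOLDER} not found in {template_path}")
    Path(output_path).write_text(text.replace(PLACEHOLDER, reference_time))
    return Path(output_path)


# Demo with a one-line stand-in for prepbufr_msonet.yaml (contents are hypothetical).
with tempfile.TemporaryDirectory() as run_dir:
    template = Path(run_dir) / "prepbufr_msonet.yaml"
    template.write_text('reference time: "@REFERENCETIME@"\n')
    filled_path = fill_reference_time(
        template,
        Path(run_dir) / "rrfs_bufr2ioda_msonet.yaml",
        "2024-05-27T00:00:00Z",  # assumed time format for the 2024052700 case
    )
    filled = filled_path.read_text()

print(filled)
```

Raising an error when the placeholder is missing helps catch the case where the template has been renamed or already filled in.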

delippi · 2024-12-20T20:08:49Z

rrfs-test/validated_yamls/gen_yaml_fv3jedi_ctest.sh

+# Define the aircar observation type configs as an array
+aircar_obtype_configs=(
+    "aircar_airTemperature_133.yaml"
+    "aircar_uv_233.yaml"


Oh yeah haha

SamuelDegelia-NOAA · 2024-12-20T21:27:01Z

I'll add a bit to the discussion about committing the super yamls. If we commit these files, it can make PRs like this one hard to review as we see lots of extra changes show up. For example, adding ATMS obs back to the ctests for FV3-JEDI caused lots of big changes to rrfs-test/testinput/rrfs_fv3jedi_2024052700_Ens3Dvar.yaml when really those additions just come from the existing file rrfs-test/validated_yamls/templates/obtype_config/atms_npp_qc_bc.yaml. Plus, updates like those in this PR show up in both the super yaml and the templates which makes the PR look larger than it is (even though this is still a large PR...)

However, I just realized that there is a downside to creating the super yamls during build.sh. Currently, our super yamls lag behind recent updates to the templates in obtype_config because the gen_yaml_${DYCORE}_ctest.sh scripts have not been run for a bit. This was done so that not everyone needed to learn how to edit the gen yaml scripts and update the ctest reference files if they made changes to the templates. But if we instead run this script during the build, then the ctests would be updated anytime that the templates in obtype_config change. And thus users would have to update the test reference files as part of their PR too.

At this point, I am thinking that it might be worth it to just leave the super yamls as they are in the repo and not dynamically update them for the ctests. We wanted to increase the number of obs for the ctests for more realistic testing, but at this point these yamls already have a large number of obs and are now useful for that purpose. So maybe we can just keep the ctests as is and only update them and their associated super yamls if there are any updates to the test cases themselves (and not the obs space).

SamuelDegelia-NOAA and others added 25 commits December 4, 2024 19:30

Add templates for fv3jedi yaml, add back atms obs, add ctest runtime util

a82ba6d

Fix offline domain check for fv3 domain and abi obs

b976d9e

Link in new staged obs that are the same for both cases

37ae174

Replace "obs_ctest" with only "obs" directory now

f5b35d1

Update name of atms obs for domain check

7d12e6a

Same as before, remove obs_ctest reference

917ce66

bug fixes

9386370

Merge branch 'NOAA-EMC:develop' into feature/unify_obs_fv3mpas

af038be

Update test reference and bufr2ioda ctest for updated bufr2ioda yamls

82e645e

Merge branch 'NOAA-EMC:develop' into feature/unify_obs_fv3mpas

8c56246

Unify var config for fv3 Ens3Dvar case

033e35b

Update update script

c2cd4d4

Update time window setting that led to different obs counts

3a59ea1

Update test reference

8f5ed11

Merge remote-tracking branch 'origin/develop' into feature/unify_obs_fv3mpas

5382073

Update GETKF to have larger time window to allow ATMS obs in on smaller domain

fad6b51

Use same localization and mpi tasks for MPAS and FV3, and update test reference

3835402

Update run scripts to use 160 mpi tasks for mpas-jedi

c126960

Merge branch 'NOAA-EMC:develop' into feature/unify_obs_fv3mpas

c6a5681

Update localization for all configs, add gsi fix to repo

5a0ab71

Remove old bumploc that was not used

11f5bd2

Fix basic_config with new bump settings

dc1f2bb

Fix link script for new bumploc directory

4ab024c

Update test reference

af98551

Update plotting scripts for fv3jedi to be consistent with mpasjedi

052821d

SamuelDegelia-NOAA requested review from guoqing-noaa, delippi, hu5970 and ShunLiu-NOAA and removed request for guoqing-noaa December 18, 2024 14:50

SamuelDegelia-NOAA requested review from guoqing-noaa and TingLei-NOAA December 18, 2024 14:50

SamuelDegelia-NOAA linked an issue Dec 18, 2024 that may be closed by this pull request

Unify obs used for FV3-JEDI and MPAS-JEDI tests #246

Open

delippi reviewed Dec 19, 2024

guoqing-noaa reviewed Dec 19, 2024

SamuelDegelia-NOAA and others added 3 commits December 19, 2024 14:01

Merge branch 'develop' into feature/unify_obs_fv3mpas

4ac84d7

Change localization back to 11 modellevels for GSI, MPAS-JEDI and then tune sigma for FV3-JEDI

5452252

Fix space

c3c353c

SamuelDegelia-NOAA added 2 commits December 20, 2024 17:46

Update fv3jedi templates to use 30 members

6f97dee

Update gen yaml to not use aircft

5c924e6

delippi reviewed Dec 20, 2024

Remove testinput/rrfs_bufr2ioda_msonet.yaml and instead use existing file in IODA/yaml/prepbufr_msonet.yaml

e39dafa
