Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Phase 2 validation: updates for msonet yamls #188

Open
wants to merge 24 commits into
base: develop
Choose a base branch
from

Conversation

delippi
Copy link
Collaborator

@delippi delippi commented Oct 10, 2024

Finished phase 2 testing with great results. The main additions were updates gross error check and better use of QualityMarker. For winds, I added all the providers and specific stations from each provider similarly to the GSI mesonetuselist and mesonet_stnuselist.

Results are documented in Issue #80

Here is the combined test for all msonet data.
fv3jedi_vs_gsi_increment_airTemperature_msonet_all

…updates gross error check and better use of QualityMarker. For winds, I added all the providers and specific stations from each provider similarly to the GSI mesonetuselist and mesonet_stnuselist.
@delippi delippi self-assigned this Oct 10, 2024
@@ -1,5 +1,5 @@
- obs space:
name: msonet
name: msonet_airTemperature_188
Copy link
Collaborator

@guoqing-noaa guoqing-noaa Oct 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is too verbose. msonet_t_188 will be enough to be descriptive and consistent with msonet_uv_188

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@guoqing-noaa Thanks for the comment. I was actually thinking about this a little bit when you suggested it in Sam's ctest PR. My thinking was to keep it consistent with the naming convention of the yaml file itself. I don't see why not? IMO it seems a little cleaner. And for the yaml files, I was trying to use the naming convention of the variable names used in JEDI. Where I was having more trouble was actually with my naming convention of msonet_uv_288 which I kind of do not like. I would personally like to change this to be more consistent with the others, but this contains both windEastward and windNorthward obtypes. I don't think they can be split into two obspaces, nor would that be a better idea... although maybe... Anyway, I didn't want it to be msonet_windEastward_windNorthward_288.yaml because that seems too long and it seems you would agree. What do you think about these options?

  1. msonet_winds_288.yaml
  2. msonet_windComponents_288.yaml
  3. msonet_windEastNorth_288.yaml
  4. msonet_windEastNorthward_288.yaml

How firm are you on using simplified variable names like T, q, uv, ps? I just like keeping it consistent with the JEDI variable naming conventions.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would think t, q, uv, ps is descriptive enough. GSI used it for more than 20 years and I did not hear any problems or complains with that. I don't like JEDI's style to make many things cumbersome and more difficult. With that said, any place which is mandated by JEDI (such as keywords), we have to follow that. But it is lucky that obs_space.name is not that case and we may use traditional good names. BTW, uv is not bad at all.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback. I see your point about the simplicity of using shorter names like t, q, and uv, especially since they've worked well in GSI for a long time. However, as we're moving away from GSI and fully transitioning to JEDI, I think it makes sense to stick with JEDI's more descriptive naming conventions for consistency across the project.

I agree that people will understand abbreviations like t for airTemperature, but using the full names across the board keeps everything aligned with the JEDI framework and can make things like searching through files easier. For example, searching for airTemperature is more straightforward than searching for t. While this specific instance might not be an issue since the whole string would be msonet_t_188, adopting msonet_airTemperature_188 makes everything more consistent.

Additionally, the yaml file itself is named msonet_airTemperature_188, so I'd really like to keep the naming inside the file consistent with the file name for clarity. What do you think?

Copy link
Collaborator

@guoqing-noaa guoqing-noaa Oct 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If succinct and simple is far enough, please don't make it lengthy and cumbersome.
For example, we use RDASApp, not RapidRefreshForecastSystemDataAssimilationSystemApplication;
we use ObsSfcPCorrected.h instead of ObservationSurfacePressureCorrected.h
we use atms_npp instead of AdvancedTechnologyMicrowaveSounder_NationalPolarorbitingPartnership

@@ -1,5 +1,5 @@
- obs space:
name: msonet
name: msonet_specificHumidity_188
Copy link
Collaborator

@guoqing-noaa guoqing-noaa Oct 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

similarly, suggest using msonet_q_188

@@ -1,5 +1,5 @@
- obs space:
name: msonet
name: msonet_stationPressure_188
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

similarly, suggest using msonet_ps_188

@ShunLiu-NOAA
Copy link

@guoqing-noaa and @delippi Thank you for the discussion. I thought we could use a simpler naming convention. However, as Donnie mentioned we may have to follow JEDI's style at moment. I was told that JEDI is using the naming convention that can be expanded to include model variables beyond atmospheric model.

@guoqing-noaa
Copy link
Collaborator

guoqing-noaa commented Oct 11, 2024

@guoqing-noaa and @delippi Thank you for the discussion. I thought we could use a simpler naming convention. However, as Donnie mentioned we may have to follow JEDI's style at moment. I was told that JEDI is using the naming convention that can be expanded to include model variables beyond atmospheric model.

@ShunLiu-NOAA Thanks for input. That's a different story where JEDI tries to follow the CCPP naming convention for model variables (i.e. keywords). Here obs_space.name is just a mark for us to distinguish different obs_space in the merged yaml file and actually the most important part is the prepbufr type (e.g. 181, 120, 133, etc). If we can tell what the obs space is at a glance, that's far enough.

@delippi delippi marked this pull request as draft October 11, 2024 14:36
@delippi
Copy link
Collaborator Author

delippi commented Oct 11, 2024

Aside from the discussion above, I found some other things that I missed before. Changed PR to draft mode.

@hu5970
Copy link
Contributor

hu5970 commented Oct 11, 2024

H All, I think we better to follow general JEDI naming convention. It is long but it is clear for all of the users.

hu5970
hu5970 previously approved these changes Oct 11, 2024
@delippi
Copy link
Collaborator Author

delippi commented Oct 11, 2024

  1. msonet_wind_288.yaml
  2. msonet_windComponents_288.yaml
  3. msonet_windEastNorth_288.yaml
  4. msonet_windEastNorthward_288.yaml

@hu5970 @ShunLiu-NOAA @guoqing-noaa Thank you all for the input. If we do try to follow the JEDI naming convention, what is the preferred name to replace uv in msonet_uv_288.yaml? The actual names are windEastward and windNorthward, but I thought msonet_windEastward_windNorthward.yaml was too long. Maybe that is the preferred?

@HuiLiu-NOAA
Copy link
Collaborator

HuiLiu-NOAA commented Oct 11, 2024 via email

@HuiLiu-NOAA
Copy link
Collaborator

Is there a reason to assimilate only "windEastNorth_288"?

@HuiLiu-NOAA
Copy link
Collaborator

What I meant is that "msonet_windEastNorth_288" is NOT clear.

@delippi
Copy link
Collaborator Author

delippi commented Oct 11, 2024

Thanks @HuiLiu-NOAA. msonet_winds_288.yaml sounds good to me if that is what is typically used in JEDI to refer to both. I wasn't sure if that was already established.

Is there a reason to assimilate only "windEastNorth_288"?
What I meant is that "msonet_windEastNorth_288" is NOT clear.

I'm not sure that I follow what you mean. I'm not assimilating only wind from 288. I have each obtype from each observing network in its own yaml. So for mesonet, I will have:

msonet_airTemperature_188.yaml
msonet_stationPressure_188.yaml
msonet_specificHumidity_188.yaml
msonet_winds_288.yaml

Is that is what you mean?

I think this is the easiest approach because it keeps each yaml as simple as possible and facilitates easier testing, debugging, and reusing validated yamls for new obtypes (since a lot of each content can be carried over to other types). I also know that when I'm editing msonet_uv_288.yaml I'm only making edits to that obtype. There are no unintended updates to other types. Plus I think it really simplifies the yamls. For validation/testing purposes, it is probably easiest to debug a single obtype from a single platform at a time rather than trying to do it all at once.

@guoqing-noaa
Copy link
Collaborator

If you really want to follow the JEDI naming convention, please check the latest JEDI naming effort and make sure everything is consistent with them.

For example, check this PR:
https://github.com/JCSDA-internal/oops/issues/2737

@HuiLiu-NOAA
Copy link
Collaborator

HuiLiu-NOAA commented Oct 11, 2024

@delippi : the new names look fine to me.

PS: I guess the yamls for each of the obs types are mainly for current initial testing only. For future operational, it may be better to have a consolidated and complete single yaml file with all of the obs types under the same "obs space"as @guoqing mentioned.

@guoqing-noaa
Copy link
Collaborator

I think they will make the following change:

specific_humidity ->
water_vapor_mixing_ratio_wrt_moist_air OR
water_vapor_mixing_ratio_wrt_moist_air_and_condensed_water```

@guoqing-noaa
Copy link
Collaborator

For surface temperature observations, please change as follows:

surface_temperature -> air_temperature_at_2m

@guoqing-noaa
Copy link
Collaborator

for surface humidity observations, please change as follows:

water_vapor_mixing_ratio_wrt_moist_air_at_2m OR
water_vapor_mixing_ratio_wrt_moist_air_and_condensed_water_at_2m

@delippi
Copy link
Collaborator Author

delippi commented Oct 11, 2024

@delippi : the new names look fine to me.

PS: I guess the yamls for each of the obs types are mainly for current initial testing only. For future operational, it may be better to have a consolidated and complete single yaml file with all of the obs types under the same "obs space"as @guoqing mentioned.

@HuiLiu-NOAA, maybe. I'm pretty sure the long term solution is JCB. But even without JCB, I would still argue it is easiest without consolidating them how I currently have them. I still don't understand the reason why we would want to consolidate all these types into a single yaml. It makes them way too complicated and very prone to error.

@delippi
Copy link
Collaborator Author

delippi commented Oct 11, 2024

I think they will make the following change:

specific_humidity ->
water_vapor_mixing_ratio_wrt_moist_air OR
water_vapor_mixing_ratio_wrt_moist_air_and_condensed_water```

These aren't exactly what we are talking about. In the JEDI yamls we use the following specificHumidity, airTemperature, windEastward, windNorthward, stationPressure. The only question is what is used to refer to BOTH windEastward and windNorthward which @HuiLiu-NOAA mentioned it should just be winds.

@delippi
Copy link
Collaborator Author

delippi commented Oct 24, 2024

I think this PR is ready to go. I didn't add any of the short/long name stuff just because I wanted to see how that was exactly going to be used for first. It can always be something added later. The results from with these yamls are in the following #80 (comment)

Actually, I need to run all these tests again for MPAS and make sure the yamls work for that as well.

Okay, I've rerun these tests for MPAS-JEDI using these yamls. I fixed a couple of things... one of the things was stationPressure was using surface_geometric_height where that isn't available for MPAS-JEDI. The only yaml that fails is still specificHumidity which was expected because for some reason MPAS-JEDI can't compute saturation_specific_humidity in the ObsErrorFactorPressureCheck which is required for scaling the RH oberrors. I need to investigate that more. So all the yamls work except the specificHumidity (the ObsErrorFactoPressureCheck could just be turned off in MPAS-JEDI tests until I figure that one out). The only result I wasn't happy with was stationPressure increments in MPAS-JEDI. I will dig more into that with "Phase 3" testing which will compare the FV3-JEDI result against MPAS-JEDI. We don't yet have that matching case. It functionally works though.

Copy link
Contributor

@SamuelDegelia-NOAA SamuelDegelia-NOAA left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for answering all of my questions @delippi!

@delippi
Copy link
Collaborator Author

delippi commented Oct 24, 2024

@SamuelDegelia-NOAA thanks for the quick review and great questions!

delippi and others added 2 commits October 30, 2024 11:21
@HuiLiu-NOAA
Copy link
Collaborator

@delippi : better to add a "maxvalue: 0.5" to the online regional domain check since the MPAS domain mask index can be up to 1-7 in the boundary zones.

@delippi
Copy link
Collaborator Author

delippi commented Oct 31, 2024

@delippi : better to add a "maxvalue: 0.5" to the online regional domain check since the MPAS domain mask index can be up to 1-7 in the boundary zones.

@HuiLiu-NOAA, I think I do have that. Here's an example of how I have it used:

         # Online domain check
         - filter: Bounds Check
           filter variables:
           - name: windEastward
           - name: windNorthward
           test variables:
           - name:  GeoVaLs/observable_domain_mask
           minvalue: 0.0
           maxvalue: 0.5

where:
- variable: ObsType/stationPressure
is_in: 188
threshold: 50.0
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the threshold should be 5 here, and 3.5 below?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@spanNOAA, good question. I actually really wasn't too sure about this. Let me just show some results to start some discussion. I think we need to look into it some more. Neither result looks particularly great, but the magnitudes of the increments looked a little closer at x10.

Result with threshold at x1 (5 & 3.5):
fv3jedi_vs_gsi_increment_airTemperature_msonet_stationPressure_188

Result with threshold at x10 (50 & 35):
fv3jedi_vs_gsi_increment_airTemperature_msonet_stationPressure_188

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, this is interesting. What’s behind this difference? Are there more observations passing the gross check and entering the minimization process? Have you looked into the observations assimilated by both systems in this case?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only difference is the cgross thresholds for the gross error checks. Increasing the cgross threshold values would be less restrictive and allow more obs through the gross check. I haven't looked at the observations themselves, but I have ob counts (nobs) in each figure.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@spanNOAA, I'm adding print statements in GSI:setupps.f90 and UFO:BackgroundCheck.cc to look into this more with a single ob test.

There are these lines in setupps.f90 which draw my attention

obserror = min(r10/max(ratio_errors*error,tiny_r_kind),huge_r_kind)
residual = abs(r10*ddiff)

My print in GSI:

281:    obserror = 0.541315926311159     
281:    obserrlm = 1.00000000000000     
281:    residual = 3.47478724557078     
281:    ratio = 3.47478724557078     
281:    ddiff = -0.347478724557078     
281:    error = 18.5563178156275     

My prints in UFO:

 0:   errorMultiplier[jobs] = 54.1881
 0:   hofx[jobs] - obs[jv][jobs] = 334.414

In GSI ratio > cgross is compared. In JEDI hofx[jobs] - obs[jv][jobs] > thr*errorMultiplier[jobs] is compared. where I have thr (threshold) set to x10 GSI's value of 5. Here is how I see the calculation going in each system:

In GSI:

obserror=0.5413
obserrlm=1
residual=10*0.3474=3.474
ratio=3.474/1

#Then check if ratio > cgross
3.474 > 5

In UFO:

thr=50 #I got 50 by taking cgross in convinfo and x10
errorMultiplier=54.18
hofx - obs = 334.414

#Then check if hofx - obs - bias) > thr * errorMultiplier
334.414/54.18 > 50

I'm thinking a couple of things. The thing I am leaning closer toward is that the errorMultiplier in UFO just needs to use the 100 Pa min error that I was trying to set in the yaml. That would then make the calculation such that I can compare against 5 instead of 50.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that errorMultiplier depends on thresholdWrtBGerror to determine whether to use hofxerr or obserr_[iv]. What I’m wondering is: what exactly is thresholdWrtBGerror, and how is it configured? In GSI, the ratio is calculated as abs(pob - pges) in hPa divided by obserrlm, which is set to errmin since obserr is smaller than errmin (which is 1). It appears that errorMultiplier is intended to function similarly to obserrlm in GSI, but its value seems to be equal to obserr in JEDI.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hofxerr is only used if you specify it in the yaml. We are not. You have to specify it as:

threshold wrt background error: true

So you can use the background error instead of the obserror.

I'm thinking that the min ob error of 100 Pa isn't getting into the Background Check. When I remove the "Create temporary ObsErrorData" part, it does use the 100 Pa. The only thing I can think of is the Background Check tries using that TempObsErrorData/stationPressure values instead of ObsErrorData/stationPressure. My understanding is that it should use ObsErrorData unless otherwise specified.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like adding defer to post: true after the Re-assign err ObsErrorData might work. I need to test more.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@spanNOAA, after trying a few things this was the best I could get it to be using the same thresholds as GSI. There could very well be something I'm missing too. The ob counts actually look a lot closer, but the spatial structure and magnitude of the increments still looks a little off. Perhaps it is slightly closer to GSI now than in the other two options I have shared in this thread? Other than now having the thresholds the same as GSI, I just added defer to post: true after each step starting with creating the TempObsErrorData, setting max/min oberrors for gross checks, the gross checks themselves, and re-assigning the TempObsErrorData back into ObsErrorData. I also put some debug print statements in the code to make sure that the min/max error values to be used in the gross check is being used. I was finding that without the defer to post: true it was just using the TempObsErrorData values in the gross error check--hence the reason I needed to have a larger cgross value.

fv3jedi_vs_gsi_increment_airTemperature_msonet_stationPressure_188

@ShunLiu-NOAA
Copy link

@delippi Is there anything else that needs to be changed with the PR? Or are you waiting for the reviewers' comments?

@delippi
Copy link
Collaborator Author

delippi commented Nov 13, 2024

@ShunLiu-NOAA, I think we should wait until I finalize some other PRs I have in ufo. There will be some updates needed to these yamls to invoke some of the options that I am implementing. Here are those ufo PRs I'm currently working on.

https://github.com/JCSDA-internal/ufo/pull/3512
https://github.com/JCSDA-internal/ufo/pull/3526

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants