Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix OOL Dependency Handling, RH Clipping, and Bufr Export Rounding #307

Merged
merged 6 commits into from
Oct 14, 2024

Conversation

ladsmund
Copy link
Contributor

@ladsmund ladsmund commented Oct 8, 2024

This PR introduces multiple fixes and improvements related to OOL dependency handling, relative humidity (RH) clipping, and rounding in Bufr export. Additionally, it includes code formatting and the addition of helper functions for wind speed calculations.

Changes:

  • OOL Dependency Handling:

    • Fixed variable name mismatch that prevented child variables from being set to NaN when parent variables were NaN.
    • Introduced a new DependencyGraph helper class to model and handle variable dependencies, including circular dependencies.
    • Updated clip_values to properly propagate NaN values to dependent variables.
    • Removed incorrect OOL relations in variables.csv and fixed formatting issues (e.g., removing double quotes and commas).
  • RH Corrected Clipping:

    • Fixed an issue where NaN inputs for RH correction were incorrectly assigned a value of 0.
    • Added unit tests to cover RH corrected clipping behavior.
  • BUFR Export Precision:

    • Adjusted airTemperature rounding to 1 digit precision, aligning it with the actual measurement accuracy, even though this violates the Bufr schema expecting 2 digits.
  • Helper Function for Wind Speed:

    • Created get_directional_wind_speed helper function to process wind speed calculations, with corresponding unit tests.
  • Code Formatting:

    • Applied black formatting to tx.py and test_station_config.py.

Unit Tests:

  • Added tests for RH corrected clipping.
  • Added unit tests for the DependencyGraph and clip_values handling.
  • Added unit tests for get_directional_wind_speed.

These changes improve the reliability of OOL handling, ensure correct data export, and enforce better code structure through unit tests and formatting.

@ladsmund ladsmund requested a review from PennyHow October 8, 2024 09:07
@ladsmund ladsmund changed the base branch from main to develop October 8, 2024 09:11
* Added unit test for processing wind speed
We only of 1 digits precision measurments from out instantaneous values. This violates the bufr schema which expects 2 digits precision.
OOL contains dependent variables (children)
* Added unit test functions for clip_values
* Add a helper class, DependencyGraph, for modelling the dependency relations to find the children closure and handle circular dependencies.
* Updated clip_values to set children variables to nan whenever a parent is nan.
* Fixed a bug in clipValues introduced in commit d1a6ac8, where child variables were not set to nan due to an erroneous change in variable names.

Fixed variables.csv
* Removed incorrect OOL relations from variables.csv. wspd_x_u, wspd_y_u... had wdir and wspd as children. The relation is the opposite.
* Removed double quotes around ool
* Removed comma between ool entries in wspeed
* The rh corrected clipping were asigning 0 to inpu nan
* Added unit test specifically for rh corr
Copy link
Member

@BaptisteVandecrux BaptisteVandecrux left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rehabilitating the OOL dependencies is a correct thing to do and it looks great!

Since it is equivalent to reactivating a lot of filters at once, every variable at every site should be investigated before and after this update to make sure good data is not being removed due to an wrong dependency or limit value.

Cases where uncorrected values are used in the corrected variables when the correction si not possible, should be identified.
For instance, I am suspecting that z_pt_cor contains the same values as z_pt when the correction is not possible, same thing for dsr_cor and usr_cor.

In those cases, enforcing the OOL dependencies will switch those _cor variables from "corrected values when possible" to "only corrected values", and potentially impact downstream calculations that are OK with "corrected values when possible".

Last, the "lo" "hi" and "OOL" names in the columns of variable.csv do not make sense. Might be the opportunity to change the to lower_limit, upper_limit, dependent_variables or alike.

I don't want to postpone any longer making the updated pypromice operational, so it would be totally fine if the dependency handling was left for a future PR and only the necessary updates went to develop and then main.

src/pypromice/process/value_clipping.py Outdated Show resolved Hide resolved
src/pypromice/process/value_clipping.py Show resolved Hide resolved
src/pypromice/resources/variables.csv Outdated Show resolved Hide resolved
@BaptisteVandecrux
Copy link
Member

And before it goes operational, it would be great to add the compression of the netcdf files.
Maybe, as a start, the lossless compression suggested here:
#301
it will reduce the footprint on the fileshare and the workload on thredds caused by the download of large files.
It can always be adjusted to a variable-specific handling later on.

@ladsmund
Copy link
Contributor Author

ladsmund commented Oct 8, 2024

The following table shows the current dependency closures between the variables

field OOL parents children
time nan
rec nan
p_u z_pt z_pt_cor dshf_u dlhf_u qh_u dlhf_u, z_pt_cor, qh_u, dshf_u, z_pt
p_l dshf_l dlhf_l qh_l dlhf_l, dshf_l, qh_l
t_u rh_u_cor cc dsr_cor usr_cor z_boom z_stake dshf_u dlhf_u qh_u dlhf_u, z_stake, dsr_cor, usr_cor, qh_u, dshf_u, rh_u_cor, z_boom, cc
t_l rh_l_cor z_boom_l dshf_l dlhf_l qh_l qh_l, rh_l_cor, z_boom_l, dshf_l, dlhf_l
rh_u rh_u_cor dlhf_u, qh_u, dshf_u, rh_u_cor
rh_u_cor dshf_u dlhf_u qh_u t_u, rh_u qh_u, dshf_u, dlhf_u
qh_u nan wspd_u, t_surf, rh_u, t_rad, ulr, p_u, rh_u_cor, dlr, t_u, z_boom_u
rh_l rh_l_cor qh_l, rh_l_cor, dshf_l, dlhf_l
rh_l_cor dshf_l dlhf_l qh_l t_l, rh_l dlhf_l, qh_l, dshf_l
qh_l nan rh_l, rh_l_cor, z_boom_l, t_l, p_l, wspd_l
wspd_u wdir_u wspd_x_u wspd_y_u dshf_u dlhf_u qh_u precip_u wdir_u, dlhf_u, precip_u, qh_u, dshf_u, wspd_y_u, precip_u_rate, wspd_x_u, precip_u_cor
wspd_l wdir_l wspd_x_l wspd_y_l dshf_l dlhf_l qh_l precip_l dshf_l, wspd_y_l, precip_l_cor, wdir_l, precip_l, dlhf_l, wspd_x_l, precip_l_rate, qh_l
wdir_u wspd_x_u wspd_y_u wspd_u wspd_y_u, wspd_x_u
wdir_std_u nan
wdir_l wspd_x_l wspd_y_l wspd_l wspd_y_l, wspd_x_l
wspd_x_u nan wspd_u, wdir_u
wspd_y_u nan wdir_u, wspd_u
wspd_x_l nan wspd_l, wdir_l
wspd_y_l nan wspd_l, wdir_l
dsr albedo dsr_cor usr_cor albedo, usr_cor, dsr_cor
dsr_cor nan dsr, tilt_y, t_rad, tilt_x, dlr, t_u, usr
usr albedo dsr_cor usr_cor albedo, usr_cor, dsr_cor
usr_cor nan dsr, tilt_y, t_u, t_rad, tilt_x, dlr, usr
albedo nan dsr, tilt_y, t_rad, tilt_x, dlr, usr
dlr albedo dsr_cor usr_cor cc t_surf t_rad albedo, t_surf, dlhf_u, dsr_cor, qh_u, dshf_u, usr_cor, cc
ulr t_surf t_rad t_surf, dlhf_u, qh_u, dshf_u
cc nan dlr, t_u, t_rad
t_surf dshf_u dlhf_u qh_u dlr, ulr, t_rad qh_u, dshf_u, dlhf_u
dlhf_u nan wspd_u, t_surf, rh_u, t_rad, ulr, p_u, rh_u_cor, dlr, t_u, z_boom_u
dlhf_l nan rh_l, rh_l_cor, z_boom_l, t_l, p_l, wspd_l
dshf_u nan wspd_u, t_surf, rh_u, t_rad, ulr, p_u, rh_u_cor, dlr, t_u, z_boom_u
dshf_l nan rh_l, rh_l_cor, z_boom_l, t_l, p_l, wspd_l
z_boom_u dshf_u dlhf_u qh_u qh_u, dshf_u, dlhf_u
z_boom_q_u nan
z_boom_l dshf_l dlhf_l qh_l t_l dlhf_l, dshf_l, qh_l
z_boom_q_l nan
z_stake nan t_u
z_stake_q nan
z_pt z_pt_cor p_u z_pt_cor
z_pt_cor nan p_u, z_pt
z_surf_combined nan
z_ice_surf nan
snow_height nan
precip_u precip_u_cor precip_u_rate freq_vw, wspd_u precip_u_rate, precip_u_cor
precip_u_cor nan wspd_u, freq_vw, precip_u
precip_u_rate nan wspd_u, freq_vw, precip_u
precip_l precip_l_cor precip_l_rate wspd_l precip_l_rate, precip_l_cor
precip_l_cor nan wspd_l, precip_l
precip_l_rate nan wspd_l, precip_l
t_i_1 nan
t_i_2 nan
t_i_3 nan
t_i_4 nan
t_i_5 nan
t_i_6 nan
t_i_7 nan
t_i_8 nan
t_i_9 nan
t_i_10 nan
t_i_11 nan
d_t_i_1 nan
d_t_i_2 nan
d_t_i_3 nan
d_t_i_4 nan
d_t_i_5 nan
d_t_i_6 nan
d_t_i_7 nan
d_t_i_8 nan
d_t_i_9 nan
d_t_i_10 nan
d_t_i_11 nan
t_i_10m nan
tilt_x dsr_cor usr_cor albedo albedo, usr_cor, dsr_cor
tilt_y dsr_cor usr_cor albedo albedo, usr_cor, dsr_cor
rot nan
gps_lat nan
gps_lon nan
gps_alt nan
gps_time nan
gps_geoid nan
gps_geounit nan
gps_hdop nan
gps_numsat nan
gps_q nan
lat nan
lon nan
alt nan
batt_v nan
batt_v_ini nan
batt_v_ss nan
fan_dc_u nan
fan_dc_l nan
freq_vw precip_u precip_u_rate, precip_u_cor, precip_u
t_log nan
t_rad t_surf dlr ulr albedo, t_surf, dlhf_u, dsr_cor, qh_u, dshf_u, ulr, dlr, usr_cor, cc
p_i nan
t_i nan
rh_i rh_i_cor rh_i_cor
rh_i_cor nan rh_i
wspd_i wdir_i wspd_x_i wspd_y_i wspd_x_i, wspd_y_i, wdir_i
wdir_i wspd_x_i wspd_y_i wspd_i wspd_x_i, wspd_y_i
wspd_x_i nan wspd_i, wdir_i
wspd_y_i nan wspd_i, wdir_i

@ladsmund
Copy link
Contributor Author

ladsmund commented Oct 9, 2024

@BaptisteVandecrux and I just talked about removing all the OOL relations and reintroduce them as we investigate the consequences. The reason why I initially looked into this was because the was an inconsistency in the wind direction and wind speed in the BUFR files.

The logic for handling `rh` values has been simplified to align with the general clipping rules. This change removes the exception cases previously applied to `rh`, ensuring consistent value clipping across all variables.
@ladsmund ladsmund merged commit a698e68 into develop Oct 14, 2024
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants