Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Distribution fixes #788

Open
wants to merge 10 commits into
base: main
Choose a base branch
from
27 changes: 22 additions & 5 deletions assimilation_code/modules/assimilation/normal_distribution_mod.f90
Original file line number Diff line number Diff line change
Expand Up @@ -61,6 +61,7 @@ subroutine test_normal
1e-5_r8, 1e-4_r8, 1e-3_r8, 1e-2_r8, 1e-1_r8, 1e-0_r8]

! Compare to matlab
write(*, *) 'Absolute value of differences should be less than 1e-15'
! Absolute value of differences should be less than 1e-15
do i = 1, 7
cdf_diff(i) = normal_cdf(mx(i), mmean(i), msd(i)) - mcdf(i)
Expand Down Expand Up @@ -157,6 +158,9 @@ function inv_weighted_normal_cdf(alpha, mean, sd, q) result(x)
real(r8) :: normalized_q

! VARIABLES THROUGHOUT NEED TO SWITCH TO DIGITS_12
! The comment above is not consistent with the performance of these routines
! as validated by test_normal. There is no evidence that this is still
! required.
Comment on lines +161 to +163
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not following this comment.
"'digits12' is reserved for real variables that MUST retain 64 bits of precision", when dart is built with r8=r4

Are you running test_normal with r8=r4? Should the distribution tests all be run with r8=r4

! real precision:
! TO RUN WITH REDUCED PRECISION REALS (and use correspondingly less memory)
! comment OUT the r8 definition below and use the second one:
integer, parameter :: r4 = SELECTED_REAL_KIND(6,30)
integer, parameter :: r8 = SELECTED_REAL_KIND(12) ! 8 byte reals
!integer, parameter :: r8 = r4 ! alias r8 to r4
! complex precision:
integer, parameter :: c4 = SELECTED_REAL_KIND(6,30)
integer, parameter :: c8 = SELECTED_REAL_KIND(12)
! 'digits12' is reserved for real variables that MUST retain 64 bits of
! precision. DO NOT CHANGE '12' to a smaller number. BAD BAD BAD things happen.
! This is a small subset of the variables. Changing this will ruin the ability
! to distinguish timesteps that are a few seconds apart, for instance.
integer, parameter :: digits12 = SELECTED_REAL_KIND(12)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[hkershaw:work](distribution-fixes) > ./test_normal_dist

 --------------------------------------
 Starting ... at YYYY MM DD HH MM SS = 
                 2025  1  6 10 43 42
 --------------------------------------

  set_nml_output Echo NML values to log file only
 Absolute value of differences should be less than 1e-15
 Matlab Comparison Tests: PASS    0.00000000    
 FAIL: Max inversion diff    17.0500488      > bound    1.00000001E-10 for quantiles <   0.899999976    
 FAIL: Max inversion diff    17.0500488      > bound    1.00000001E-10 for quantiles <   0.990000010    
 FAIL: Max inversion diff    17.0500488      > bound    1.00000001E-10 for quantiles <   0.999000013    
 FAIL: Max inversion diff    17.0500488      > bound    1.00000001E-10 for quantiles <   0.999899983    
 FAIL: Max inversion diff    17.0500488      > bound    1.00000001E-10 for quantiles <   0.999989986    
 FAIL: Max inversion diff    17.0500488      > bound    9.99999972E-10 for quantiles <   0.999998987    
 FAIL: Max inversion diff    17.0500488      > bound    9.99999994E-09 for quantiles <   0.999999881    
 FAIL: Max inversion diff    17.0500488      > bound    1.00000001E-07 for quantiles <    1.00000000    
 FAIL: Max inversion diff    17.0500488      > bound    1.00000001E-07 for quantiles <    1.00000000    
 FAIL: Max inversion diff    17.0500488      > bound    9.99999997E-07 for quantiles <    1.00000000    
 FAIL: Max inversion diff    17.0500488      > bound    9.99999975E-06 for quantiles <    1.00000000    
 FAIL: Max inversion diff    17.0500488      > bound    9.99999975E-05 for quantiles <    1.00000000    
 FAIL: Max inversion diff    17.0500488      > bound    1.00000005E-03 for quantiles <    1.00000000    
 FAIL: Max inversion diff    17.0500488      > bound    9.99999978E-03 for quantiles <    1.00000000    
 FAIL: Max inversion diff    17.0500488      > bound   0.100000001     for quantiles <    1.00000000    
 FAIL: Max inversion diff    17.0500488      > bound    1.00000000     for quantiles <    1.00000000    

 --------------------------------------
 Finished ... at YYYY MM DD HH MM SS = 
                 2025  1  6 10 43 43
 --------------------------------------
[hkershaw:work](distribution-fixes) > git diff          
diff --git a/assimilation_code/modules/utilities/types_mod.f90 b/assimilation_code/modules/utilities/types_mod.f90
index 8e4b376ee..ecf791cf7 100644
--- a/assimilation_code/modules/utilities/types_mod.f90
+++ b/assimilation_code/modules/utilities/types_mod.f90
@@ -81,8 +81,8 @@ integer, parameter :: i8 = SELECTED_INT_KIND(13)
 ! TO RUN WITH REDUCED PRECISION REALS (and use correspondingly less memory)
 ! comment OUT the r8 definition below and use the second one:
 integer, parameter :: r4 = SELECTED_REAL_KIND(6,30)
-integer, parameter :: r8 = SELECTED_REAL_KIND(12)   ! 8 byte reals
-!integer, parameter :: r8 = r4                      ! alias r8 to r4
+!integer, parameter :: r8 = SELECTED_REAL_KIND(12)   ! 8 byte reals
+integer, parameter :: r8 = r4                      ! alias r8 to r4


! Can search in a standard normal, then multiply by sd at end and add mean
! Divide q by alpha to get the right place for weighted normal
Expand Down Expand Up @@ -196,8 +200,6 @@ function approx_inv_normal_cdf(quantile_in) result(x)
real(r8), intent(in) :: quantile_in

! This is used to get a good first guess for the search in inv_std_normal_cdf
! The params argument is not needed here but is required for consistency &
! with other distributions

! normal inverse
! translate from http://home.online.no/~pjacklam/notes/invnorm
Expand Down Expand Up @@ -285,14 +287,17 @@ function inv_std_normal_cdf(quantile) result(x)
! Given a quantile q, finds the value of x for which the standard normal cdf
! has approximately this quantile

! Where should the stupid p type come from
type(distribution_params_type) :: p
real(r8) :: mean, sd

! Set the mean and sd to 0 and 1 for standard normal
mean = 0.0_r8; sd = 1.0_r8
p%params(1) = mean; p%params(2) = sd

! Normal is unbounded
p%bounded_below = .false.; p%bounded_above = .false.
p%lower_bound = missing_r8; p%upper_bound = missing_r8

hkershaw-brown marked this conversation as resolved.
Show resolved Hide resolved
x = inv_std_normal_cdf_params(quantile, p)

end function inv_std_normal_cdf
Expand All @@ -301,6 +306,10 @@ end function inv_std_normal_cdf

function inv_cdf(quantile_in, cdf, first_guess, p) result(x)

! This routine is used for many distributions, not just the normal distribution
! Could be moved to its own module for clarity or combined with Ian Groom's
! rootfinding module as an option.

interface
function cdf(x, p)
use types_mod, only : r8
Expand Down Expand Up @@ -338,6 +347,11 @@ function first_guess(quantile, p)
! Limit on number of times to halve the increment; No deep thought.
integer, parameter :: max_half_iterations = 25

! Largest delta for computing centered difference derivative
! Changing this can affect accuracy for specific applications like Ian Grooms KDE
! Changing to 1e-9 allows all of the KDE tests to PASS
real(r8), parameter :: max_delta = 1e-8_r8

real(r8) :: quantile
real(r8) :: reltol, dq_dx, delta
real(r8) :: x_guess, q_guess, x_new, q_new, del_x, del_q, del_q_old, q_old
Expand Down Expand Up @@ -387,7 +401,7 @@ function first_guess(quantile, p)
! Analytically, the PDF is derivative of CDF but this can be numerically inaccurate for extreme values
! Use numerical derivatives of the CDF to get more accurate inversion
! These values for the delta for the approximation work with Gfortran
delta = max(1e-8_r8, 1e-8_r8 * abs(x_guess))
delta = max(max_delta, max_delta * abs(x_guess))
dq_dx = (cdf(x_guess + delta, p) - cdf(x_guess - delta, p)) / (2.0_r8 * delta)
! Derivative of 0 means we're not going anywhere else
if(dq_dx <= 0.0_r8) then
Expand Down Expand Up @@ -482,14 +496,17 @@ subroutine set_normal_params_from_ens(ens, num, p)

integer, intent(in) :: num
real(r8), intent(in) :: ens(num)
type(distribution_params_type), intent(inout) :: p
type(distribution_params_type), intent(out) :: p

! Set up the description of the normal distribution defined by the ensemble
p%distribution_type = NORMAL_DISTRIBUTION

! The two meaningful params are the mean and standard deviation
call normal_mean_sd(ens, num, p%params(1), p%params(2))

! Normal is unbounded
p%bounded_below = .false.; p%bounded_above = .false.
p%lower_bound = missing_r8; p%upper_bound = missing_r8

end subroutine set_normal_params_from_ens

Expand Down