NaNs observed in tridiagonal eigensolver after "bulkerification" #953
With @RMeli we investigated a bit, and here I am going to summarise what we found out. We are not fully sure about it, but it might be relevant for fixing this issue.

Apparently, the precondition about sorted eigenvalues is not always verified. Indeed, we added checks on the sorting of `d0` at various steps of `mergeSubProblems()`:

```cpp
auto mergeSubProblems() {
  // d0 is in "initial" state with (potentially) 4 groups
  // non-deflated | deflated | non-deflated | deflated
  // each pair coming from one of the two subproblems

  assembleZVec(rho, e0);    // -> z0 = f(e0)

  // i1 is created and, from it, i2 is created for sorting all eigenvalues of both problems
  CHECK_SORTED(d0, i2);     // d0-post-sort [0,n)

  BACKUP(i2);
  BACKUP(d0);

  applyDeflation(tol, rho, d0, z0, ...);
  k = stablePartition(...); // -> i3

  CHECK_BACKUP(i2);         // CHANGE i2
  CHECK_BACKUP(d0);         // CHANGE d0

  CHECK_SORTED(d0, i2);     // d0-post-stable-full [0,n)
  CHECK_SORTED(d0, i3);     // d0-post-stable-nd-d|k [0,k)|[k, n)
                            // they are separated in non-deflated and deflated,
                            // but each group should be sorted

  sort(d0, i3);
  CHECK_SORTED(d0);         // d0-sorted|k [0,k)|[k, n)
}
```

What we found out is that, at some point, `d0` stops being sorted and some of these checks fail. This has been tested with the input provided by CP2K (i.e. the 5120 case), with different block sizes (128, 256). This sorting problem appears both on pre-bulk (b44340f, PR888 + patch for HDF5) and on master, but with differences (see the PRE-BULK vs MASTER part below).
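The `CHECK_SORTED` probe above only needs to verify non-decreasing order of `d0` through a permutation index. As a reference for what such a check does, here is a minimal standalone sketch (the name `firstUnsorted` and the exact indexing convention are my assumptions, not the actual DLA-Future code):

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch of a sortedness check: returns the first position i in
// [first+1, last) where d[perm[i-1]] > d[perm[i]], or -1 if the range is sorted.
// This mirrors the "NOT SORTED <range> <count>" lines in the logs.
long firstUnsorted(const std::vector<double>& d, const std::vector<int>& perm,
                   std::size_t first, std::size_t last) {
  for (std::size_t i = first + 1; i < last; ++i)
    if (d[perm[i - 1]] > d[perm[i]])
      return static_cast<long>(i);
  return -1;
}
```

For example, a check like `CHECK_SORTED(d0, i3)` on the two groups would amount to `firstUnsorted(d0, i3, 0, k)` and `firstUnsorted(d0, i3, k, n)`.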
I look forward to confirmation that the expectations about sorting are correct as we stated in this report (@rasolca, can you have a look, please?). If yes, from a naive point of view it looks like a numerical problem somewhere. I'm not sure it will solve this issue, though.

Details

In the output, you will also find some internal output from the deflation step, in this form:

```
tol = 1.95696e-05
c = -1 s = -0.000785526
i1s = 257 i2s = 0 @ 1|2
d1 = -0.949937 d2 = -0.932222
d1 = -0.949937 d2 = -0.932222
```

Now, let's start with an extract of the output of the run with block size 256. I took the very first part of the output and split it into parts, trying to make it easier to read.

[0]
```
tol = 1.95696e-05
c = -1 s = -0.000785526
i1s = 257 i2s = 0 @ 1|2
d1 = -0.949937 d2 = -0.932222
d1 = -0.949937 d2 = -0.932222

tol = 1.95696e-05
c = 0.819746 s = -0.572728
i1s = 3 i2s = 258 @ 5|6
d1 = -0.928868 d2 = -0.928827
d1 = -0.928854 d2 = -0.92884

tol = 1.95696e-05
c = -0.772187 s = -0.635396
i1s = 340 i2s = 74 @ 158|159
d1 = 1.71013 d2 = 1.71014
d1 = 1.71014 d2 = 1.71014

tol = 1.95696e-05
c = -0.00174197 s = 0.999999
i1s = 185 i2s = 186 @ 392|393
d1 = 12.7213 d2 = 12.7305
d1 = 12.7305 d2 = 12.7213

CHANGE d0 3 -0.928854 -0.928868
CHANGE d0 74 1.71014 1.71014
CHANGE d0 185 12.7305 12.7213
CHANGE d0 186 12.7213 12.7305
CHANGE d0 258 -0.92884 -0.928827
CHANGE d0 340 1.71014 1.71013

d0-post-stable-full NOT SORTED 0:512 1
393,
```

In the previous part, it can be seen that there is an unexpected out-of-order value at position 393 (the values should be fully sorted at this point), and indeed index 393 has been changed by this deflation step:

```
tol = 1.95696e-05
c = -0.00174197 s = 0.999999
i1s = 185 i2s = 186 @ 392|393
d1 = 12.7213 d2 = 12.7305
d1 = 12.7305 d2 = 12.7213
```

It looks like quite a big change, but I don't have deep knowledge of this algorithm. Anyway, this change does not really end up causing any issue once `d0` gets sorted again by the final `sort(d0, i3)`.

<continue from above>
```
tol = 6.6807e-06
c = 1 s = 1.74006e-05
i1s = 256 i2s = 1 @ 0|2
d1 = -1.15486 d2 = -0.921493
d1 = -1.15486 d2 = -0.921493

tol = 6.6807e-06
c = 0.801806 s = -0.597584
i1s = 193 i2s = 450 @ 387|388
d1 = 6.2362 d2 = 6.23621
d1 = 6.2362 d2 = 6.2362

CHANGE d0 193 6.2362 6.2362
CHANGE d0 450 6.2362 6.23621
```

At this step, in the output above, nothing happens: there are changes, but the sorting is not altered.

<continue from above>
```
tol = 6.56497e-06
c = -1 s = 1.51461e-05
i1s = 0 i2s = 3 @ 0|3
d1 = -0.966395 d2 = -0.911016
d1 = -0.966395 d2 = -0.911016

tol = 6.56497e-06
c = 1 s = -1.08164e-05
i1s = 0 i2s = 5 @ 0|5
d1 = -0.966395 d2 = -0.909847
d1 = -0.966395 d2 = -0.909847

tol = 6.56497e-06
c = 1 s = -9.92931e-05
i1s = 0 i2s = 6 @ 0|6
d1 = -0.966395 d2 = -0.909007
d1 = -0.966395 d2 = -0.909007

tol = 6.56497e-06
c = 1 s = 6.15887e-05
i1s = 0 i2s = 7 @ 0|7
d1 = -0.966395 d2 = -0.908289
d1 = -0.966395 d2 = -0.908289

tol = 6.56497e-06
c = 7.6434e-06 s = 1
i1s = 17 i2s = 257 @ 18|21
d1 = -0.901174 d2 = -0.634082
d1 = -0.634082 d2 = -0.901174

CHANGE d0 17 -0.634082 -0.901174
CHANGE d0 257 -0.901174 -0.634082

d0-post-stable-full NOT SORTED 0:512 2
19, 21,
d0-post-stable-nd-d|486 NOT SORTED 486:512 1
495,
d0-sorted|486 NOT SORTED 486:512 1
495,
```

We can see that this time the out-of-order value survives even the final sort: `d0-sorted|486` is still `NOT SORTED` at position 495, inside the deflated range `[486, 512)`.
PRE-BULK vs MASTER (with bulk)

This sorting problem exists in both versions but, apparently, just for the bulk version it manifests with `NaN`s.

As can be seen from the diff output, at some point deflation produces:

```
d0-post-stable-full NOT SORTED 0:2560 22
92, 149, 288, 495, 497, 542, 553, 657, 661, 759, 763, 803, 876, 1150, 1380, 1517, 1731, 1805, 1909, 1976, 2310, 2316,
d0-post-stable-nd-d|2044 NOT SORTED 0:2044 1
1791,
d0-post-stable-nd-d|2044 NOT SORTED 2044:2560 1
2136,
d0-sorted|2044 NOT SORTED 0:2044 1
1791,
d0-sorted|2044 NOT SORTED 2044:2560 1
2136,
```

and in particular

```
d0-sorted|2044 NOT SORTED 0:2044 1
1791,
```

says that even after the final sort the non-deflated range `[0, 2044)` still has an out-of-order value at index 1791.

At this point I'm still not sure why pre-bulk and post-bulk behave differently in terms of results, but for sure they share the same sorting problem (which, from my naive knowledge of the algorithm, looks like a real problem).
As reported in #960, it was a problem that had existed in our tridiagonal solver since the very beginning. @rasolca confirmed what @RMeli and I reported in the very last message, and proposed and implemented a fix. Just for the record, a note about the fix: see #960 (comment) for more details.

That PR might have fixed this issue, but we are still not sure. For sure, that PR fixes a problem in the tridiagonal solver.
The discrepancy between the results reported here and the ones reported in #960 comes down to:

```diff
- EPS_DEFAULT 1.0E-12
+ EPS_DEFAULT 1.0E-5
+ &FM
+   FORCE_BLOCK_SIZE .TRUE.
+   NCOL_BLOCKS 256
+   NROW_BLOCKS 256
+ &END
```

With these settings:

DLA-Future:

ScaLAPACK:
DLA-Future's tridiagonal eigensolver returns `NaN`s for some matrices obtained from CP2K. This appears to happen after the "bulkerification" of PRs #860 and #904.

Investigation

Observations in CP2K:

- `QS/H2O-32` and `QS/H2O-64` results (total energy) appear to match ScaLAPACK
- for `QS/H2O-128`, `nan`s are produced and CP2K crashes, for some block sizes:
  - `256` and `512` crash
  - `1024` works (same total energy as ScaLAPACK)
- whether `nan`s are produced appears to depend on the block size

DLA-Future investigation:

- `numpy.linalg.eigh` does not produce `nan`s
- `numpy.linalg.eigh` and `scipy.linalg.eigh_tridiagonal` do not produce `nan`s
- `miniapp_tridiag_solver` produces `nan`s
- reproducible with a single `pika` thread
- `nan`s start to appear after "TridiagSolver (distributed): 'bulkerify' rank1 problem solution" #904
- `nan`s start to appear after "TridiagSolver (local): 'bulkerify' rank1 problem solution" #860
- `std::sqrt(-w[i])` immediately jumped to the eyes as a possible cause of `nan`s (if `w[i] > 0`)

Further investigation is ongoing, but as discussed with @albestro and @msimberg it's good to keep track of this here.
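The `std::sqrt(-w[i])` suspicion is easy to confirm in isolation: for IEEE-754 doubles, `std::sqrt` of a negative argument returns `NaN`, so a single positive `w[i]` is enough to poison all downstream results. A minimal standalone check (not DLA-Future code):

```cpp
#include <cmath>

// std::sqrt of a negative value yields NaN for IEEE-754 doubles,
// so sqrt(-w) is only well-defined when w <= 0.
bool sqrtOfNegatedIsNaN(double w) {
  return std::isnan(std::sqrt(-w));
}
```

With the values from the debugging notes below (e.g. `w[50] = 0.000914332923`), `-w[i]` is negative, so the square root is `NaN`.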
Details
CP2K
H2O-32
ScaLAPACK:
DLA-Future:
Inspecting the eigenvalues shows that some are not ordered in ascending order (cyclic shift):
H2O-64
ScaLAPACK:
DLA-Future (time missing from my notes, not relevant here):
The behavior is different depending on the block size.
Block size of `64`: differences are small. Calculations in DLA-Future are performed in double precision, but the output is in single precision.
Block size of `128`: same cyclic shift as observed for `H2O-32`.

H2O-128
ScaLAPACK:
DLA-Future:
Increasing the block size from `256` to `1024` appears to work.

DLA-Future
Tridiagonal eigensolver miniapp
Experiments with block sizes
Block size of `128`:

Block size of `256`:

Block size of `1024`:

Debugging

- `nan` values stem from `std::sqrt(-w[i])` in `solveRank1Problem`
- some `w[i]` are positive
- reproduced with `--pika:threads=1`
See `DLA-Future/include/dlaf/eigensolver/tridiag_solver/merge.h`, line 596 (at 26cf7e9).
With `k = 2146`:

```
w[50] = 0.000914332923
w[51] = 4.66800714e-07
w[165] = 5.93085525e-10
w[166] = 0.00505260751
```

With `k = 2044`:

```
w[1790] = 1.14765066e-10
```

With `k = 1`:

```
w[0] = 1
```