LAPACK test failure with 3.28 on aarch64 #5050

Closed
opoplawski opened this issue Jan 5, 2025 · 26 comments · Fixed by #5057

Comments

@opoplawski

With the update from 0.3.26 to 0.3.28 in Fedora we're starting to see the following lapack test failure on aarch64 only:

corrupted size vs. prev_size
Program received signal SIGABRT: Process abort signal.
Backtrace for this error:
#0  0xffffb0f7d157 in ???
#1  0xffffb0f7c03f in ???
#2  0xffffb112883f in ???
#3  0xffffb0caef80 in ???
#4  0xffffb0c5b3bf in ???
#5  0xffffb0c45a57 in ???
#6  0xffffb0ca1863 in ???
#7  0xffffb0cba487 in ???
#8  0xffffb0cbaf27 in ???
#9  0xffffb0cbb17f in ???
#10  0xffffb0cbc7eb in ???
#11  0xffffb0cbc99b in ???
#12  0xffffb0cbf8bf in ???
#13  0x479d03 in csyl01_
	at /builddir/build/BUILD/openblas-0.3.28-build/openblas-0.3.28/openmp/lapack-netlib/TESTING/EIG/csyl01.f:308
#14  0x40f373 in cchkec_
	at /builddir/build/BUILD/openblas-0.3.28-build/openblas-0.3.28/openmp/lapack-netlib/TESTING/EIG/cchkec.f:129
#15  0x409f0b in cchkee
	at /builddir/build/BUILD/openblas-0.3.28-build/openblas-0.3.28/openmp/lapack-netlib/TESTING/EIG/cchkee.F:1271
#16  0x402327 in main
	at /builddir/build/BUILD/openblas-0.3.28-build/openblas-0.3.28/openmp/lapack-netlib/TESTING/EIG/cchkee.F:1036
Testing COMPLEX           Eigen-Condition-EIG/xeigtstc < cec.in > cec.out ---- TESTING /builddir/build/BUILD/openblas-0.3.28-build/openblas-0.3.28/openmp/lapack-netlib/TESTING/EIG/xeigtstc... FAILED(/builddir/build/BUILD/openblas-0.3.28-build/openblas-0.3.28/openmp/lapack-netlib/TESTING/EIG/xeigtstc < cec.in > cec.out did not work) !
@opoplawski
Author

Is there a way to get the lapack-test make target to actually fail with this test failure?

@martin-frbg
Collaborator

On which flavour of aarch64 / which build TARGET do you see this? (At first glance it looks like a malloc error, and it does not look familiar.)
Not sure how to get the build to stop there if make does not error out on its own after that abort.
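
One possible workaround, sketched here in Python rather than in the Makefile itself: run the test binary through a small wrapper that treats death by signal as a failure. On POSIX, `subprocess.run` reports a process killed by signal N as return code -N, so an abort like the one above would show up as a negative value. The function name `run_and_check` is hypothetical, and how such a wrapper would be hooked into the lapack-test target is an assumption, not something OpenBLAS provides.

```python
import signal
import subprocess


def run_and_check(cmd, stdin_path=None):
    """Run cmd and return (ok, reason); ok is False on any abnormal exit."""
    stdin = open(stdin_path, "rb") if stdin_path else None
    try:
        proc = subprocess.run(cmd, stdin=stdin,
                              stdout=subprocess.PIPE,
                              stderr=subprocess.STDOUT)
    finally:
        if stdin is not None:
            stdin.close()
    if proc.returncode < 0:
        # POSIX only: a negative return code is the number of the fatal signal.
        return False, "killed by " + signal.Signals(-proc.returncode).name
    if proc.returncode != 0:
        return False, "exit status %d" % proc.returncode
    return True, "ok"
```

A caller could then run something like `run_and_check(["./xeigtstc"], "cec.in")` (paths are illustrative) and exit with a non-zero status whenever `ok` is false, which would make the surrounding make target fail.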

@opoplawski
Author

This is the make command:

make -C openmp TARGET=ARMV8 DYNAMIC_ARCH=1 DYNAMIC_OLDER=1 USE_THREAD=1 USE_OPENMP=1 FC=gfortran CC=gcc 'COMMON_OPT=-O2  -fexceptions -g -grecord-gcc-switches -pipe -Wall -Wno-complain-wrong-lang -Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1  -mbranch-protection=standard -fasynchronous-unwind-tables -fstack-clash-protection -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -fPIC -fopenmp -pthread' 'FCOMMON_OPT=-O2  -fexceptions -g -grecord-gcc-switches -pipe -Wall -Wno-complain-wrong-lang -Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1  -mbranch-protection=standard -fasynchronous-unwind-tables -fstack-clash-protection -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -fPIC -fopenmp -pthread -frecursive' NUM_THREADS=128 LIBPREFIX=libopenblaso INTERFACE64=0 CPP_THREAD_SAFETY_TEST=1:1

What is doubly strange is that the error has only been reproduced on the Fedora koji builders, not on other aarch64 test machines.

@martin-frbg
Collaborator

Is koji running on actual hardware, or on virtual machines (which may be resource-limited or may not report hardware properties like cache sizes correctly)? The COMPLEX/COMPLEX16 parts of the LAPACK testsuite are a bit more memory hungry than the other tests.

@martin-frbg
Collaborator

Also, would it be possible to add OPENBLAS_VERBOSE=2 to the environment, to see which CPU gets autodetected during the test phase of this DYNAMIC_ARCH build? (Though I think it is at least as likely that the out-of-bounds access is a bug in the test code OpenBLAS imports from Reference-LAPACK; maybe it is the glibc version in your koji setup that catches this. I seem to recall there were a few test code fixes in Reference-LAPACK that I copied for 0.3.29 or still need to copy before its release.)

@opoplawski
Author

Just a note that 0.3.27 seems to be okay. koji is using VMs, should be fairly decent memory - I'll get a number at some point and try with the verbose.

@martin-frbg
Collaborator

Hmm. The only remotely relevant change in 0.3.28 that I can identify is the addition of vector registers to the clobber list of the cdot/zdot assembly kernel used primarily on AppleM, ThunderX2 and Graviton2 - if anything this should have improved that kernel, certainly not led to memory overruns. (CSYL01 tests CTRSYL which uses very few external functions, most notably CDOT).

@sharkcz
Contributor

sharkcz commented Jan 7, 2025

FWIW I have reproduced the issue on our bare-metal Ampere MtSnow system (80 cpus) doing a rawhide mock build for flexiblas.

@martin-frbg
Collaborator

Thanks - that would be NeoverseN1, which (at least in theory) should be rather well tested, though perhaps not with all your additional compiler options. I'll see if I can reproduce this in the GCC Compile Farm

@Enchufa2

Enchufa2 commented Jan 7, 2025

In valgrind @opoplawski got:

==44481== Invalid read of size 4
==44481==    at 0x6182DC4: cgemm_beta_NEOVERSEN1 (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481==    by 0x5E29A03: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481==    by 0x5E9A0FB: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481==    by 0x5E9A4D7: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481==    by 0x70AEF1F: ??? (in /usr/lib64/libgomp.so.1.0.0)
==44481==    by 0x4F1D187: start_thread (in /usr/lib64/libc.so.6)
==44481==    by 0x4F8729B: thread_start (in /usr/lib64/libc.so.6)
==44481==  Address 0x53cd9e0 is 0 bytes after a block of size 111,504 alloc'd
==44481==    at 0x48854F0: malloc (vg_replace_malloc.c:446)
==44481==    by 0x10C6CB: csyl01_ (csyl01.f:151)
==44481==    by 0x1104B3: cchkec_.constprop.0 (cchkec.f:129)
==44481==    by 0x119D4F: MAIN__ (cchkee.F:1271)
==44481==    by 0x10C327: main (cchkee.F:2553)

I cannot find the reference to cgemm_beta_NEOVERSEN1, but maybe it rings a bell for you.
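
As a side note on reading the report: the invalid access lands exactly at the end of a 111,504-byte block allocated in csyl01_. Assuming that block is a single-precision COMPLEX array (8 bytes per element, which is an assumption about what csyl01.f:151 allocates), the arithmetic below gives the element count and the candidate two-dimensional shapes. The helper `factor_pairs` is purely illustrative.

```python
BLOCK_BYTES = 111_504      # block size from the valgrind report
COMPLEX_BYTES = 8          # assumed: single-precision complex = 2 x 4-byte reals


def factor_pairs(n):
    """Return all (rows, cols) pairs with rows <= cols and rows * cols == n."""
    return [(a, n // a) for a in range(1, int(n ** 0.5) + 1) if n % a == 0]


elements, remainder = divmod(BLOCK_BYTES, COMPLEX_BYTES)
print(elements, remainder)      # element count and leftover bytes
print(factor_pairs(elements))   # candidate array shapes
```

The block divides evenly into 13,938 complex elements, and 101 x 138 is among the candidate shapes; which shape the test actually allocates is not stated in the thread. "0 bytes after" means the access starts at the first byte past the block, i.e. at element index 13,938, one past the last valid index.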

@martin-frbg
Collaborator

Thanks - my valgrind run has not reported anything interesting so far.
There is no individual CGEMM_BETA for the N1 (in KERNEL.NEOVERSEN1) so it should be using the generic C implementation as defined in the default KERNEL file for arm64. This is kernel/generic/zgemm_beta.c - a fairly trivial unrolled loop that has been unchanged for about 20 years.
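
To illustrate why even a trivial scaling loop can show up in such a report: a beta kernel of this kind trusts the m, n and ldc it is given and walks a column-major array accordingly. The sketch below is a Python model of that index arithmetic, not the OpenBLAS source; `last_offset` and `overruns` are hypothetical names, and the dimensions in the usage note are made up.

```python
def last_offset(m, n, ldc):
    """Highest element offset touched when scaling an m x n column-major
    matrix with leading dimension ldc (element (i, j) lives at i + j * ldc)."""
    if m <= 0 or n <= 0:
        return -1                      # nothing is touched
    return (m - 1) + (n - 1) * ldc


def overruns(m, n, ldc, allocated_elements):
    """True if the loop would touch an element outside the allocation."""
    return last_offset(m, n, ldc) >= allocated_elements
```

For an allocation of 4 x 5 = 20 elements with ldc = 4, `overruns(4, 5, 4, 20)` is false, while one column too many, `overruns(4, 6, 4, 20)`, is true. The loop is the same in both cases; only the arguments it was handed differ.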

@Enchufa2

Enchufa2 commented Jan 7, 2025

Could #4595 and/or #4626 have something to do with this?

@martin-frbg
Collaborator

Seems extremely unlikely to me; if anything, the older nrm2 kernels that these reverted to have had much more exposure. I also don't think we have NRM2 anywhere on the call graph of CSYL01 testing CTRSYL. (The hit in cgemm_beta makes it look as if the fault is coming from the test code rather than the function under test, but maybe that was a false positive from valgrind.)

@martin-frbg
Collaborator

I cannot reproduce the error with gcc 14.2 and all your build options except the special spec files (cfarm425 runs Debian, my other option would be "Rocky 9.5" but it looks like I'd need to build my own gcc there first to get anything recent). As an unwanted side effect, the installed valgrind 3.2 trips over something in the binary when I use your build options.

@sharkcz
Contributor

sharkcz commented Jan 7, 2025

FYI newer gcc versions are available via Developer Toolset (not sure if it's still called that) for RHEL-based distros.

@martin-frbg
Collaborator

Can I install them as a non-privileged user?
BTW, I just happened to get a lapack-test crash in zsyl01 rather than csyl01: an invalid pointer while trying to deallocate the "C" array after the test (line 308), with no useful information beyond that in the backtrace. Will see if this is reproducible.

@sharkcz
Contributor

sharkcz commented Jan 7, 2025

> can I install them as a non-privileged user ?

they are (usually) in a different repo, but still as rpms, so admin privilege is needed

@sharkcz
Contributor

sharkcz commented Jan 7, 2025

I can run more checks if I get some details on how, as I am not really familiar with openblas (or flexiblas) internals and build systems ...
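
One check that needs little knowledge of the internals, given that this is an OpenMP build: rerun the failing test binary under a range of OMP_NUM_THREADS values and note where it starts to die. A minimal sketch, assuming a POSIX system; the function name `first_failing_thread_count` is hypothetical, and the command and input file are whatever reproduces the failure locally.

```python
import os
import subprocess


def first_failing_thread_count(cmd, counts, stdin_path=None):
    """Run cmd once per thread count; return the first count whose run
    exits abnormally (non-zero status or killed by a signal), else None."""
    for count in counts:
        env = dict(os.environ, OMP_NUM_THREADS=str(count))
        stdin = open(stdin_path, "rb") if stdin_path else subprocess.DEVNULL
        try:
            proc = subprocess.run(cmd, env=env, stdin=stdin,
                                  stdout=subprocess.DEVNULL,
                                  stderr=subprocess.DEVNULL)
        finally:
            if stdin_path:
                stdin.close()
        if proc.returncode != 0:
            return count
    return None
```

A call might look like `first_failing_thread_count(["EIG/xeigtstc"], range(1, 33), "cec.in")`, with illustrative paths. Heap corruption does not always crash deterministically, so a clean run at a given count is a hint rather than proof.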

@Enchufa2

Enchufa2 commented Jan 7, 2025

12 seems to be the magical number here. I've reproduced the crash by setting OMP_NUM_THREADS=12 or more, but not with fewer. This is what valgrind tells me (but note that the test doesn't crash with valgrind):

cat test/lapack-3.12.0/cec.in | valgrind --leak-check=full --show-leak-kinds=all build/test/lapack-3.12.0/EIG/xeigtstc
==2309477== Memcheck, a memory error detector
==2309477== Copyright (C) 2002-2024, and GNU GPL'd, by Julian Seward et al.
==2309477== Using Valgrind-3.24.0 and LibVEX; rerun with -h for copyright info
==2309477== Command: build/test/lapack-3.12.0/EIG/xeigtstc
==2309477== 
 Tests of the Nonsymmetric eigenproblem condition estimation routines
 CTRSYL, CTREXC, CTRSNA, CTRSEN

 Relative machine precision (EPS) =     0.119209E-06
 Safe minimum (SFMIN)             =     0.117549E-37

 Routines pass computational tests if test ratio is less than   20.00


 CEC routines passed the tests of the error exits ( 41 tests done)

==2309477== Thread 10:
==2309477== Invalid read of size 4
==2309477==    at 0x6102DC4: cgemm_beta_NEOVERSEN1 (zgemm_beta.c:105)
==2309477==    by 0x5DA9A03: inner_thread (level3_thread.c:296)
==2309477==    by 0x5E1A0FB: exec_threads (blas_server_omp.c:382)
==2309477==    by 0x5E1A4D7: exec_blas._omp_fn.1 (blas_server_omp.c:451)
==2309477==    by 0x7034DDF: gomp_thread_start (team.c:129)
==2309477==    by 0x4F8C8E7: start_thread (pthread_create.c:448)
==2309477==    by 0x4FF851B: thread_start (clone.S:79)
==2309477==  Address 0x546f950 is 0 bytes after a block of size 111,504 alloc'd
==2309477==    at 0x4885550: malloc (vg_replace_malloc.c:446)
==2309477==    by 0x40465B: csyl01_ (in /home/fedora/iucar/flexiblas/flexiblas-3.4.4/build/test/lapack-3.12.0/EIG/xeigtstc)
==2309477==    by 0x4095C3: cchkec_.constprop.0 (in /home/fedora/iucar/flexiblas/flexiblas-3.4.4/build/test/lapack-3.12.0/EIG/xeigtstc)
==2309477==    by 0x414B77: MAIN__ (in /home/fedora/iucar/flexiblas/flexiblas-3.4.4/build/test/lapack-3.12.0/EIG/xeigtstc)
==2309477==    by 0x404263: main (in /home/fedora/iucar/flexiblas/flexiblas-3.4.4/build/test/lapack-3.12.0/EIG/xeigtstc)
==2309477== 
==2309477== Invalid read of size 4
==2309477==    at 0x6102DCC: cgemm_beta_NEOVERSEN1 (zgemm_beta.c:107)
==2309477==    by 0x5DA9A03: inner_thread (level3_thread.c:296)
==2309477==    by 0x5E1A0FB: exec_threads (blas_server_omp.c:382)
==2309477==    by 0x5E1A4D7: exec_blas._omp_fn.1 (blas_server_omp.c:451)
==2309477==    by 0x7034DDF: gomp_thread_start (team.c:129)
==2309477==    by 0x4F8C8E7: start_thread (pthread_create.c:448)
==2309477==    by 0x4FF851B: thread_start (clone.S:79)
==2309477==  Address 0x546f958 is 8 bytes after a block of size 111,504 alloc'd
==2309477==    at 0x4885550: malloc (vg_replace_malloc.c:446)
==2309477==    by 0x40465B: csyl01_ (in /home/fedora/iucar/flexiblas/flexiblas-3.4.4/build/test/lapack-3.12.0/EIG/xeigtstc)
==2309477==    by 0x4095C3: cchkec_.constprop.0 (in /home/fedora/iucar/flexiblas/flexiblas-3.4.4/build/test/lapack-3.12.0/EIG/xeigtstc)
==2309477==    by 0x414B77: MAIN__ (in /home/fedora/iucar/flexiblas/flexiblas-3.4.4/build/test/lapack-3.12.0/EIG/xeigtstc)
==2309477==    by 0x404263: main (in /home/fedora/iucar/flexiblas/flexiblas-3.4.4/build/test/lapack-3.12.0/EIG/xeigtstc)
==2309477== 
==2309477== Invalid write of size 4
==2309477==    at 0x6102DF4: cgemm_beta_NEOVERSEN1 (zgemm_beta.c:126)
==2309477==    by 0x5DA9A03: inner_thread (level3_thread.c:296)
==2309477==    by 0x5E1A0FB: exec_threads (blas_server_omp.c:382)
==2309477==    by 0x5E1A4D7: exec_blas._omp_fn.1 (blas_server_omp.c:451)
==2309477==    by 0x7034DDF: gomp_thread_start (team.c:129)
==2309477==    by 0x4F8C8E7: start_thread (pthread_create.c:448)
==2309477==    by 0x4FF851B: thread_start (clone.S:79)
==2309477==  Address 0x546f950 is 0 bytes after a block of size 111,504 alloc'd
==2309477==    at 0x4885550: malloc (vg_replace_malloc.c:446)
==2309477==    by 0x40465B: csyl01_ (in /home/fedora/iucar/flexiblas/flexiblas-3.4.4/build/test/lapack-3.12.0/EIG/xeigtstc)
==2309477==    by 0x4095C3: cchkec_.constprop.0 (in /home/fedora/iucar/flexiblas/flexiblas-3.4.4/build/test/lapack-3.12.0/EIG/xeigtstc)
==2309477==    by 0x414B77: MAIN__ (in /home/fedora/iucar/flexiblas/flexiblas-3.4.4/build/test/lapack-3.12.0/EIG/xeigtstc)
==2309477==    by 0x404263: main (in /home/fedora/iucar/flexiblas/flexiblas-3.4.4/build/test/lapack-3.12.0/EIG/xeigtstc)
==2309477== 
==2309477== Invalid write of size 4
==2309477==    at 0x6102DF8: cgemm_beta_NEOVERSEN1 (zgemm_beta.c:128)
==2309477==    by 0x5DA9A03: inner_thread (level3_thread.c:296)
==2309477==    by 0x5E1A0FB: exec_threads (blas_server_omp.c:382)
==2309477==    by 0x5E1A4D7: exec_blas._omp_fn.1 (blas_server_omp.c:451)
==2309477==    by 0x7034DDF: gomp_thread_start (team.c:129)
==2309477==    by 0x4F8C8E7: start_thread (pthread_create.c:448)
==2309477==    by 0x4FF851B: thread_start (clone.S:79)
==2309477==  Address 0x546f958 is 8 bytes after a block of size 111,504 alloc'd
==2309477==    at 0x4885550: malloc (vg_replace_malloc.c:446)
==2309477==    by 0x40465B: csyl01_ (in /home/fedora/iucar/flexiblas/flexiblas-3.4.4/build/test/lapack-3.12.0/EIG/xeigtstc)
==2309477==    by 0x4095C3: cchkec_.constprop.0 (in /home/fedora/iucar/flexiblas/flexiblas-3.4.4/build/test/lapack-3.12.0/EIG/xeigtstc)
==2309477==    by 0x414B77: MAIN__ (in /home/fedora/iucar/flexiblas/flexiblas-3.4.4/build/test/lapack-3.12.0/EIG/xeigtstc)
==2309477==    by 0x404263: main (in /home/fedora/iucar/flexiblas/flexiblas-3.4.4/build/test/lapack-3.12.0/EIG/xeigtstc)
==2309477== 
==2309477== Invalid read of size 4
==2309477==    at 0x6102DC4: cgemm_beta_NEOVERSEN1 (zgemm_beta.c:105)
==2309477==    by 0x5DB6BC7: inner_thread (level3_thread.c:296)
==2309477==    by 0x5E1A0FB: exec_threads (blas_server_omp.c:382)
==2309477==    by 0x5E1A4D7: exec_blas._omp_fn.1 (blas_server_omp.c:451)
==2309477==    by 0x7034DDF: gomp_thread_start (team.c:129)
==2309477==    by 0x4F8C8E7: start_thread (pthread_create.c:448)
==2309477==    by 0x4FF851B: thread_start (clone.S:79)
==2309477==  Address 0x546f950 is 0 bytes after a block of size 111,504 alloc'd
==2309477==    at 0x4885550: malloc (vg_replace_malloc.c:446)
==2309477==    by 0x40465B: csyl01_ (in /home/fedora/iucar/flexiblas/flexiblas-3.4.4/build/test/lapack-3.12.0/EIG/xeigtstc)
==2309477==    by 0x4095C3: cchkec_.constprop.0 (in /home/fedora/iucar/flexiblas/flexiblas-3.4.4/build/test/lapack-3.12.0/EIG/xeigtstc)
==2309477==    by 0x414B77: MAIN__ (in /home/fedora/iucar/flexiblas/flexiblas-3.4.4/build/test/lapack-3.12.0/EIG/xeigtstc)
==2309477==    by 0x404263: main (in /home/fedora/iucar/flexiblas/flexiblas-3.4.4/build/test/lapack-3.12.0/EIG/xeigtstc)
==2309477== 
==2309477== Invalid read of size 4
==2309477==    at 0x6102DCC: cgemm_beta_NEOVERSEN1 (zgemm_beta.c:107)
==2309477==    by 0x5DB6BC7: inner_thread (level3_thread.c:296)
==2309477==    by 0x5E1A0FB: exec_threads (blas_server_omp.c:382)
==2309477==    by 0x5E1A4D7: exec_blas._omp_fn.1 (blas_server_omp.c:451)
==2309477==    by 0x7034DDF: gomp_thread_start (team.c:129)
==2309477==    by 0x4F8C8E7: start_thread (pthread_create.c:448)
==2309477==    by 0x4FF851B: thread_start (clone.S:79)
==2309477==  Address 0x546f958 is 8 bytes after a block of size 111,504 alloc'd
==2309477==    at 0x4885550: malloc (vg_replace_malloc.c:446)
==2309477==    by 0x40465B: csyl01_ (in /home/fedora/iucar/flexiblas/flexiblas-3.4.4/build/test/lapack-3.12.0/EIG/xeigtstc)
==2309477==    by 0x4095C3: cchkec_.constprop.0 (in /home/fedora/iucar/flexiblas/flexiblas-3.4.4/build/test/lapack-3.12.0/EIG/xeigtstc)
==2309477==    by 0x414B77: MAIN__ (in /home/fedora/iucar/flexiblas/flexiblas-3.4.4/build/test/lapack-3.12.0/EIG/xeigtstc)
==2309477==    by 0x404263: main (in /home/fedora/iucar/flexiblas/flexiblas-3.4.4/build/test/lapack-3.12.0/EIG/xeigtstc)
==2309477== 
==2309477== Invalid write of size 4
==2309477==    at 0x6102DF4: cgemm_beta_NEOVERSEN1 (zgemm_beta.c:126)
==2309477==    by 0x5DB6BC7: inner_thread (level3_thread.c:296)
==2309477==    by 0x5E1A0FB: exec_threads (blas_server_omp.c:382)
==2309477==    by 0x5E1A4D7: exec_blas._omp_fn.1 (blas_server_omp.c:451)
==2309477==    by 0x7034DDF: gomp_thread_start (team.c:129)
==2309477==    by 0x4F8C8E7: start_thread (pthread_create.c:448)
==2309477==    by 0x4FF851B: thread_start (clone.S:79)
==2309477==  Address 0x546f950 is 0 bytes after a block of size 111,504 alloc'd
==2309477==    at 0x4885550: malloc (vg_replace_malloc.c:446)
==2309477==    by 0x40465B: csyl01_ (in /home/fedora/iucar/flexiblas/flexiblas-3.4.4/build/test/lapack-3.12.0/EIG/xeigtstc)
==2309477==    by 0x4095C3: cchkec_.constprop.0 (in /home/fedora/iucar/flexiblas/flexiblas-3.4.4/build/test/lapack-3.12.0/EIG/xeigtstc)
==2309477==    by 0x414B77: MAIN__ (in /home/fedora/iucar/flexiblas/flexiblas-3.4.4/build/test/lapack-3.12.0/EIG/xeigtstc)
==2309477==    by 0x404263: main (in /home/fedora/iucar/flexiblas/flexiblas-3.4.4/build/test/lapack-3.12.0/EIG/xeigtstc)
==2309477== 
==2309477== Invalid write of size 4
==2309477==    at 0x6102DF8: cgemm_beta_NEOVERSEN1 (zgemm_beta.c:128)
==2309477==    by 0x5DB6BC7: inner_thread (level3_thread.c:296)
==2309477==    by 0x5E1A0FB: exec_threads (blas_server_omp.c:382)
==2309477==    by 0x5E1A4D7: exec_blas._omp_fn.1 (blas_server_omp.c:451)
==2309477==    by 0x7034DDF: gomp_thread_start (team.c:129)
==2309477==    by 0x4F8C8E7: start_thread (pthread_create.c:448)
==2309477==    by 0x4FF851B: thread_start (clone.S:79)
==2309477==  Address 0x546f958 is 8 bytes after a block of size 111,504 alloc'd
==2309477==    at 0x4885550: malloc (vg_replace_malloc.c:446)
==2309477==    by 0x40465B: csyl01_ (in /home/fedora/iucar/flexiblas/flexiblas-3.4.4/build/test/lapack-3.12.0/EIG/xeigtstc)
==2309477==    by 0x4095C3: cchkec_.constprop.0 (in /home/fedora/iucar/flexiblas/flexiblas-3.4.4/build/test/lapack-3.12.0/EIG/xeigtstc)
==2309477==    by 0x414B77: MAIN__ (in /home/fedora/iucar/flexiblas/flexiblas-3.4.4/build/test/lapack-3.12.0/EIG/xeigtstc)
==2309477==    by 0x404263: main (in /home/fedora/iucar/flexiblas/flexiblas-3.4.4/build/test/lapack-3.12.0/EIG/xeigtstc)
==2309477== 
==2309477== Invalid write of size 8
==2309477==    at 0x6102D54: cgemm_beta_NEOVERSEN1 (zgemm_beta.c:69)
==2309477==    by 0x5DA9A03: inner_thread (level3_thread.c:296)
==2309477==    by 0x5E1A0FB: exec_threads (blas_server_omp.c:382)
==2309477==    by 0x5E1A4D7: exec_blas._omp_fn.1 (blas_server_omp.c:451)
==2309477==    by 0x7034DDF: gomp_thread_start (team.c:129)
==2309477==    by 0x4F8C8E7: start_thread (pthread_create.c:448)
==2309477==    by 0x4FF851B: thread_start (clone.S:79)
==2309477==  Address 0x546f950 is 0 bytes after a block of size 111,504 alloc'd
==2309477==    at 0x4885550: malloc (vg_replace_malloc.c:446)
==2309477==    by 0x40465B: csyl01_ (in /home/fedora/iucar/flexiblas/flexiblas-3.4.4/build/test/lapack-3.12.0/EIG/xeigtstc)
==2309477==    by 0x4095C3: cchkec_.constprop.0 (in /home/fedora/iucar/flexiblas/flexiblas-3.4.4/build/test/lapack-3.12.0/EIG/xeigtstc)
==2309477==    by 0x414B77: MAIN__ (in /home/fedora/iucar/flexiblas/flexiblas-3.4.4/build/test/lapack-3.12.0/EIG/xeigtstc)
==2309477==    by 0x404263: main (in /home/fedora/iucar/flexiblas/flexiblas-3.4.4/build/test/lapack-3.12.0/EIG/xeigtstc)
==2309477== 
==2309477== Invalid write of size 8
==2309477==    at 0x6102D54: cgemm_beta_NEOVERSEN1 (zgemm_beta.c:69)
==2309477==    by 0x5DB6BC7: inner_thread (level3_thread.c:296)
==2309477==    by 0x5E1A0FB: exec_threads (blas_server_omp.c:382)
==2309477==    by 0x5E1A4D7: exec_blas._omp_fn.1 (blas_server_omp.c:451)
==2309477==    by 0x7034DDF: gomp_thread_start (team.c:129)
==2309477==    by 0x4F8C8E7: start_thread (pthread_create.c:448)
==2309477==    by 0x4FF851B: thread_start (clone.S:79)
==2309477==  Address 0x546f950 is 0 bytes after a block of size 111,504 alloc'd
==2309477==    at 0x4885550: malloc (vg_replace_malloc.c:446)
==2309477==    by 0x40465B: csyl01_ (in /home/fedora/iucar/flexiblas/flexiblas-3.4.4/build/test/lapack-3.12.0/EIG/xeigtstc)
==2309477==    by 0x4095C3: cchkec_.constprop.0 (in /home/fedora/iucar/flexiblas/flexiblas-3.4.4/build/test/lapack-3.12.0/EIG/xeigtstc)
==2309477==    by 0x414B77: MAIN__ (in /home/fedora/iucar/flexiblas/flexiblas-3.4.4/build/test/lapack-3.12.0/EIG/xeigtstc)
==2309477==    by 0x404263: main (in /home/fedora/iucar/flexiblas/flexiblas-3.4.4/build/test/lapack-3.12.0/EIG/xeigtstc)
==2309477== 

 All tests for CEC routines passed the threshold (   5966 tests run)


 End of tests
 Total time used =     10335.21 seconds

==2309477== 
==2309477== HEAP SUMMARY:
==2309477==     in use at exit: 172,577 bytes in 51 blocks
==2309477==   total heap usage: 4,809 allocs, 4,758 frees, 2,096,297,782 bytes allocated
==2309477== 
==2309477== Thread 1:
==2309477== 8 bytes in 1 blocks are still reachable in loss record 1 of 18
==2309477==    at 0x4885550: malloc (vg_replace_malloc.c:446)
==2309477==    by 0x7022857: gomp_malloc (alloc.c:38)
==2309477==    by 0x703753B: gomp_init_num_threads (proc.c:91)
==2309477==    by 0x70212C3: initialize_env (env.c:2218)
==2309477==    by 0x4003E6B: call_init (dl-init.c:74)
==2309477==    by 0x4003E6B: call_init (dl-init.c:26)
==2309477==    by 0x4003F83: _dl_init (dl-init.c:121)
==2309477==    by 0x400137B: _dl_catch_exception (dl-catch.c:215)
==2309477==    by 0x400A0FB: dl_open_worker (dl-open.c:804)
==2309477==    by 0x4001303: _dl_catch_exception (dl-catch.c:241)
==2309477==    by 0x400A51B: _dl_open (dl-open.c:880)
==2309477==    by 0x4F87F87: dlopen_doit (dlopen.c:56)
==2309477==    by 0x4001303: _dl_catch_exception (dl-catch.c:241)
==2309477== 
==2309477== 16 bytes in 1 blocks are still reachable in loss record 2 of 18
==2309477==    at 0x4885550: malloc (vg_replace_malloc.c:446)
==2309477==    by 0x4FA58E3: strdup (strdup.c:42)
==2309477==    by 0x4967623: flexiblas_init (in /home/fedora/iucar/flexiblas/flexiblas-3.4.4/build/lib/libflexiblas.so.3.4)
==2309477==    by 0x4003E6B: call_init (dl-init.c:74)
==2309477==    by 0x4003E6B: call_init (dl-init.c:26)
==2309477==    by 0x4003F83: _dl_init (dl-init.c:121)
==2309477==    by 0x401AB77: (below main) (dl-start.S:46)
==2309477== 
==2309477== 32 bytes in 1 blocks are still reachable in loss record 3 of 18
==2309477==    at 0x4885550: malloc (vg_replace_malloc.c:446)
==2309477==    by 0x4012147: UnknownInlinedFun (rtld-malloc.h:56)
==2309477==    by 0x4012147: htab_create (inline-hashtab.h:47)
==2309477==    by 0x4012147: _dl_make_tlsdesc_dynamic (tlsdeschtab.h:94)
==2309477==    by 0x400D48F: elf_machine_rela (dl-machine.h:245)
==2309477==    by 0x400D48F: elf_dynamic_do_Rela (do-rel.h:147)
==2309477==    by 0x400D48F: _dl_relocate_object (dl-reloc.c:301)
==2309477==    by 0x400AB3B: _dl_open_relocate_one_object (dl-open.c:453)
==2309477==    by 0x400AB3B: dl_open_worker_begin (dl-open.c:698)
==2309477==    by 0x4001303: _dl_catch_exception (dl-catch.c:241)
==2309477==    by 0x400A08F: dl_open_worker (dl-open.c:778)
==2309477==    by 0x4001303: _dl_catch_exception (dl-catch.c:241)
==2309477==    by 0x400A51B: _dl_open (dl-open.c:880)
==2309477==    by 0x4F87F87: dlopen_doit (dlopen.c:56)
==2309477==    by 0x4001303: _dl_catch_exception (dl-catch.c:241)
==2309477==    by 0x400142F: _dl_catch_error (dl-catch.c:260)
==2309477==    by 0x4F8799F: _dlerror_run (dlerror.c:138)
==2309477== 
==2309477== 45 bytes in 2 blocks are still reachable in loss record 4 of 18
==2309477==    at 0x4885550: malloc (vg_replace_malloc.c:446)
==2309477==    by 0x401D62B: malloc (rtld-malloc.h:56)
==2309477==    by 0x401D62B: strdup (strdup.c:42)
==2309477==    by 0x4011A57: _dl_load_cache_lookup (dl-cache.c:499)
==2309477==    by 0x400732F: _dl_map_object (dl-load.c:2057)
==2309477==    by 0x40025BF: openaux (dl-deps.c:64)
==2309477==    by 0x4001303: _dl_catch_exception (dl-catch.c:241)
==2309477==    by 0x4002B33: _dl_map_object_deps (dl-deps.c:232)
==2309477==    by 0x400A9C3: dl_open_worker_begin (dl-open.c:613)
==2309477==    by 0x4001303: _dl_catch_exception (dl-catch.c:241)
==2309477==    by 0x400A08F: dl_open_worker (dl-open.c:778)
==2309477==    by 0x4001303: _dl_catch_exception (dl-catch.c:241)
==2309477==    by 0x400A51B: _dl_open (dl-open.c:880)
==2309477== 
==2309477== 45 bytes in 2 blocks are still reachable in loss record 5 of 18
==2309477==    at 0x4885550: malloc (vg_replace_malloc.c:446)
==2309477==    by 0x4009DD7: UnknownInlinedFun (rtld-malloc.h:56)
==2309477==    by 0x4009DD7: _dl_new_object (dl-object.c:199)
==2309477==    by 0x4005D73: _dl_map_object_from_fd (dl-load.c:1042)
==2309477==    by 0x40071A3: _dl_map_object (dl-load.c:2190)
==2309477==    by 0x40025BF: openaux (dl-deps.c:64)
==2309477==    by 0x4001303: _dl_catch_exception (dl-catch.c:241)
==2309477==    by 0x4002B33: _dl_map_object_deps (dl-deps.c:232)
==2309477==    by 0x400A9C3: dl_open_worker_begin (dl-open.c:613)
==2309477==    by 0x4001303: _dl_catch_exception (dl-catch.c:241)
==2309477==    by 0x400A08F: dl_open_worker (dl-open.c:778)
==2309477==    by 0x4001303: _dl_catch_exception (dl-catch.c:241)
==2309477==    by 0x400A51B: _dl_open (dl-open.c:880)
==2309477== 
==2309477== 96 bytes in 1 blocks are still reachable in loss record 6 of 18
==2309477==    at 0x488C88C: calloc (vg_replace_malloc.c:1675)
==2309477==    by 0x702289F: gomp_malloc_cleared (alloc.c:47)
==2309477==    by 0x702121B: add_initial_icv_to_list (env.c:2159)
==2309477==    by 0x70212E3: initialize_env (env.c:2224)
==2309477==    by 0x4003E6B: call_init (dl-init.c:74)
==2309477==    by 0x4003E6B: call_init (dl-init.c:26)
==2309477==    by 0x4003F83: _dl_init (dl-init.c:121)
==2309477==    by 0x400137B: _dl_catch_exception (dl-catch.c:215)
==2309477==    by 0x400A0FB: dl_open_worker (dl-open.c:804)
==2309477==    by 0x4001303: _dl_catch_exception (dl-catch.c:241)
==2309477==    by 0x400A51B: _dl_open (dl-open.c:880)
==2309477==    by 0x4F87F87: dlopen_doit (dlopen.c:56)
==2309477==    by 0x4001303: _dl_catch_exception (dl-catch.c:241)
==2309477== 
==2309477== 104 bytes in 1 blocks are still reachable in loss record 7 of 18
==2309477==    at 0x488C88C: calloc (vg_replace_malloc.c:1675)
==2309477==    by 0x4012097: UnknownInlinedFun (rtld-malloc.h:44)
==2309477==    by 0x4012097: htab_expand (inline-hashtab.h:139)
==2309477==    by 0x4012097: htab_find_slot (inline-hashtab.h:192)
==2309477==    by 0x4012097: _dl_make_tlsdesc_dynamic (tlsdeschtab.h:102)
==2309477==    by 0x400D48F: elf_machine_rela (dl-machine.h:245)
==2309477==    by 0x400D48F: elf_dynamic_do_Rela (do-rel.h:147)
==2309477==    by 0x400D48F: _dl_relocate_object (dl-reloc.c:301)
==2309477==    by 0x400AB3B: _dl_open_relocate_one_object (dl-open.c:453)
==2309477==    by 0x400AB3B: dl_open_worker_begin (dl-open.c:698)
==2309477==    by 0x4001303: _dl_catch_exception (dl-catch.c:241)
==2309477==    by 0x400A08F: dl_open_worker (dl-open.c:778)
==2309477==    by 0x4001303: _dl_catch_exception (dl-catch.c:241)
==2309477==    by 0x400A51B: _dl_open (dl-open.c:880)
==2309477==    by 0x4F87F87: dlopen_doit (dlopen.c:56)
==2309477==    by 0x4001303: _dl_catch_exception (dl-catch.c:241)
==2309477==    by 0x400142F: _dl_catch_error (dl-catch.c:260)
==2309477==    by 0x4F8799F: _dlerror_run (dlerror.c:138)
==2309477== 
==2309477== 112 bytes in 1 blocks are still reachable in loss record 8 of 18
==2309477==    at 0x488CABC: realloc (vg_replace_malloc.c:1801)
==2309477==    by 0x70228DB: gomp_realloc (alloc.c:56)
==2309477==    by 0x7035CA3: gomp_team_start (team.c:497)
==2309477==    by 0x702CA6F: GOMP_parallel (parallel.c:176)
==2309477==    by 0x5E1A793: exec_blas (blas_server_omp.c:444)
==2309477==    by 0x5DAA747: gemm_driver.isra.0 (level3_thread.c:770)
==2309477==    by 0x5DAA973: cgemm_thread_nn (level3_thread.c:858)
==2309477==    by 0x5C9F2DB: cgemm_ (gemm.c:624)
==2309477==    by 0x404C67: csyl01_ (in /home/fedora/iucar/flexiblas/flexiblas-3.4.4/build/test/lapack-3.12.0/EIG/xeigtstc)
==2309477==    by 0x4095C3: cchkec_.constprop.0 (in /home/fedora/iucar/flexiblas/flexiblas-3.4.4/build/test/lapack-3.12.0/EIG/xeigtstc)
==2309477==    by 0x414B77: MAIN__ (in /home/fedora/iucar/flexiblas/flexiblas-3.4.4/build/test/lapack-3.12.0/EIG/xeigtstc)
==2309477==    by 0x404263: main (in /home/fedora/iucar/flexiblas/flexiblas-3.4.4/build/test/lapack-3.12.0/EIG/xeigtstc)
==2309477== 
==2309477== 152 bytes in 3 blocks are still reachable in loss record 9 of 18
==2309477==    at 0x4885550: malloc (vg_replace_malloc.c:446)
==2309477==    by 0x401D62B: malloc (rtld-malloc.h:56)
==2309477==    by 0x401D62B: strdup (strdup.c:42)
==2309477==    by 0x400712F: _dl_map_object (dl-load.c:2123)
==2309477==    by 0x400A973: dl_open_worker_begin (dl-open.c:553)
==2309477==    by 0x4001303: _dl_catch_exception (dl-catch.c:241)
==2309477==    by 0x400A08F: dl_open_worker (dl-open.c:778)
==2309477==    by 0x4001303: _dl_catch_exception (dl-catch.c:241)
==2309477==    by 0x400A51B: _dl_open (dl-open.c:880)
==2309477==    by 0x4F87F87: dlopen_doit (dlopen.c:56)
==2309477==    by 0x4001303: _dl_catch_exception (dl-catch.c:241)
==2309477==    by 0x400142F: _dl_catch_error (dl-catch.c:260)
==2309477==    by 0x4F8799F: _dlerror_run (dlerror.c:138)
==2309477== 
==2309477== 152 bytes in 3 blocks are still reachable in loss record 10 of 18
==2309477==    at 0x4885550: malloc (vg_replace_malloc.c:446)
==2309477==    by 0x4009DD7: UnknownInlinedFun (rtld-malloc.h:56)
==2309477==    by 0x4009DD7: _dl_new_object (dl-object.c:199)
==2309477==    by 0x4005D73: _dl_map_object_from_fd (dl-load.c:1042)
==2309477==    by 0x40071A3: _dl_map_object (dl-load.c:2190)
==2309477==    by 0x400A973: dl_open_worker_begin (dl-open.c:553)
==2309477==    by 0x4001303: _dl_catch_exception (dl-catch.c:241)
==2309477==    by 0x400A08F: dl_open_worker (dl-open.c:778)
==2309477==    by 0x4001303: _dl_catch_exception (dl-catch.c:241)
==2309477==    by 0x400A51B: _dl_open (dl-open.c:880)
==2309477==    by 0x4F87F87: dlopen_doit (dlopen.c:56)
==2309477==    by 0x4001303: _dl_catch_exception (dl-catch.c:241)
==2309477==    by 0x400142F: _dl_catch_error (dl-catch.c:260)
==2309477== 
==2309477== 192 bytes in 1 blocks are still reachable in loss record 11 of 18
==2309477==    at 0x4885550: malloc (vg_replace_malloc.c:446)
==2309477==    by 0x7022857: gomp_malloc (alloc.c:38)
==2309477==    by 0x7034F37: gomp_get_thread_pool (pool.h:42)
==2309477==    by 0x7034F37: get_last_team (team.c:156)
==2309477==    by 0x7034F37: gomp_new_team (team.c:175)
==2309477==    by 0x702CA53: GOMP_parallel (parallel.c:176)
==2309477==    by 0x5E1A793: exec_blas (blas_server_omp.c:444)
==2309477==    by 0x5DAA747: gemm_driver.isra.0 (level3_thread.c:770)
==2309477==    by 0x5DAA973: cgemm_thread_nn (level3_thread.c:858)
==2309477==    by 0x5C9F2DB: cgemm_ (gemm.c:624)
==2309477==    by 0x404C67: csyl01_ (in /home/fedora/iucar/flexiblas/flexiblas-3.4.4/build/test/lapack-3.12.0/EIG/xeigtstc)
==2309477==    by 0x4095C3: cchkec_.constprop.0 (in /home/fedora/iucar/flexiblas/flexiblas-3.4.4/build/test/lapack-3.12.0/EIG/xeigtstc)
==2309477==    by 0x414B77: MAIN__ (in /home/fedora/iucar/flexiblas/flexiblas-3.4.4/build/test/lapack-3.12.0/EIG/xeigtstc)
==2309477==    by 0x404263: main (in /home/fedora/iucar/flexiblas/flexiblas-3.4.4/build/test/lapack-3.12.0/EIG/xeigtstc)
==2309477== 
==2309477== 240 bytes in 10 blocks are still reachable in loss record 12 of 18
==2309477==    at 0x4885550: malloc (vg_replace_malloc.c:446)
==2309477==    by 0x4011FE7: UnknownInlinedFun (rtld-malloc.h:56)
==2309477==    by 0x4011FE7: _dl_make_tlsdesc_dynamic (tlsdeschtab.h:112)
==2309477==    by 0x400D48F: elf_machine_rela (dl-machine.h:245)
==2309477==    by 0x400D48F: elf_dynamic_do_Rela (do-rel.h:147)
==2309477==    by 0x400D48F: _dl_relocate_object (dl-reloc.c:301)
==2309477==    by 0x400AB3B: _dl_open_relocate_one_object (dl-open.c:453)
==2309477==    by 0x400AB3B: dl_open_worker_begin (dl-open.c:698)
==2309477==    by 0x4001303: _dl_catch_exception (dl-catch.c:241)
==2309477==    by 0x400A08F: dl_open_worker (dl-open.c:778)
==2309477==    by 0x4001303: _dl_catch_exception (dl-catch.c:241)
==2309477==    by 0x400A51B: _dl_open (dl-open.c:880)
==2309477==    by 0x4F87F87: dlopen_doit (dlopen.c:56)
==2309477==    by 0x4001303: _dl_catch_exception (dl-catch.c:241)
==2309477==    by 0x400142F: _dl_catch_error (dl-catch.c:260)
==2309477==    by 0x4F8799F: _dlerror_run (dlerror.c:138)
==2309477== 
==2309477== 1,776 bytes in 5 blocks are still reachable in loss record 13 of 18
==2309477==    at 0x488C88C: calloc (vg_replace_malloc.c:1675)
==2309477==    by 0x4010EBF: UnknownInlinedFun (rtld-malloc.h:44)
==2309477==    by 0x4010EBF: _dl_check_map_versions (dl-version.c:280)
==2309477==    by 0x400A9FB: dl_open_worker_begin (dl-open.c:621)
==2309477==    by 0x4001303: _dl_catch_exception (dl-catch.c:241)
==2309477==    by 0x400A08F: dl_open_worker (dl-open.c:778)
==2309477==    by 0x4001303: _dl_catch_exception (dl-catch.c:241)
==2309477==    by 0x400A51B: _dl_open (dl-open.c:880)
==2309477==    by 0x4F87F87: dlopen_doit (dlopen.c:56)
==2309477==    by 0x4001303: _dl_catch_exception (dl-catch.c:241)
==2309477==    by 0x400142F: _dl_catch_error (dl-catch.c:260)
==2309477==    by 0x4F8799F: _dlerror_run (dlerror.c:138)
==2309477==    by 0x4F8807F: dlopen_implementation (dlopen.c:71)
==2309477==    by 0x4F8807F: dlopen@@GLIBC_2.34 (dlopen.c:81)
==2309477== 
==2309477== 2,527 bytes in 2 blocks are still reachable in loss record 14 of 18
==2309477==    at 0x488C88C: calloc (vg_replace_malloc.c:1675)
==2309477==    by 0x4009B7B: UnknownInlinedFun (rtld-malloc.h:44)
==2309477==    by 0x4009B7B: _dl_new_object (dl-object.c:92)
==2309477==    by 0x4005D73: _dl_map_object_from_fd (dl-load.c:1042)
==2309477==    by 0x40071A3: _dl_map_object (dl-load.c:2190)
==2309477==    by 0x40025BF: openaux (dl-deps.c:64)
==2309477==    by 0x4001303: _dl_catch_exception (dl-catch.c:241)
==2309477==    by 0x4002B33: _dl_map_object_deps (dl-deps.c:232)
==2309477==    by 0x400A9C3: dl_open_worker_begin (dl-open.c:613)
==2309477==    by 0x4001303: _dl_catch_exception (dl-catch.c:241)
==2309477==    by 0x400A08F: dl_open_worker (dl-open.c:778)
==2309477==    by 0x4001303: _dl_catch_exception (dl-catch.c:241)
==2309477==    by 0x400A51B: _dl_open (dl-open.c:880)
==2309477== 
==2309477== 3,896 bytes in 3 blocks are still reachable in loss record 15 of 18
==2309477==    at 0x488C88C: calloc (vg_replace_malloc.c:1675)
==2309477==    by 0x4009B7B: UnknownInlinedFun (rtld-malloc.h:44)
==2309477==    by 0x4009B7B: _dl_new_object (dl-object.c:92)
==2309477==    by 0x4005D73: _dl_map_object_from_fd (dl-load.c:1042)
==2309477==    by 0x40071A3: _dl_map_object (dl-load.c:2190)
==2309477==    by 0x400A973: dl_open_worker_begin (dl-open.c:553)
==2309477==    by 0x4001303: _dl_catch_exception (dl-catch.c:241)
==2309477==    by 0x400A08F: dl_open_worker (dl-open.c:778)
==2309477==    by 0x4001303: _dl_catch_exception (dl-catch.c:241)
==2309477==    by 0x400A51B: _dl_open (dl-open.c:880)
==2309477==    by 0x4F87F87: dlopen_doit (dlopen.c:56)
==2309477==    by 0x4001303: _dl_catch_exception (dl-catch.c:241)
==2309477==    by 0x400142F: _dl_catch_error (dl-catch.c:260)
==2309477== 
==2309477== 4,224 bytes in 12 blocks are possibly lost in loss record 16 of 18
==2309477==    at 0x488C88C: calloc (vg_replace_malloc.c:1675)
==2309477==    by 0x400EAB7: UnknownInlinedFun (rtld-malloc.h:44)
==2309477==    by 0x400EAB7: allocate_dtv (dl-tls.c:395)
==2309477==    by 0x400F5DB: _dl_allocate_tls (dl-tls.c:673)
==2309477==    by 0x4F8D2EF: allocate_stack (allocatestack.c:431)
==2309477==    by 0x4F8D2EF: pthread_create@@GLIBC_2.34 (pthread_create.c:660)
==2309477==    by 0x703525B: gomp_team_start (team.c:859)
==2309477==    by 0x702CA6F: GOMP_parallel (parallel.c:176)
==2309477==    by 0x5E1A793: exec_blas (blas_server_omp.c:444)
==2309477==    by 0x5DAA747: gemm_driver.isra.0 (level3_thread.c:770)
==2309477==    by 0x5DAA973: cgemm_thread_nn (level3_thread.c:858)
==2309477==    by 0x5C9F2DB: cgemm_ (gemm.c:624)
==2309477==    by 0x404C67: csyl01_ (in /home/fedora/iucar/flexiblas/flexiblas-3.4.4/build/test/lapack-3.12.0/EIG/xeigtstc)
==2309477==    by 0x4095C3: cchkec_.constprop.0 (in /home/fedora/iucar/flexiblas/flexiblas-3.4.4/build/test/lapack-3.12.0/EIG/xeigtstc)
==2309477== 
==2309477== 4,360 bytes in 1 blocks are still reachable in loss record 17 of 18
==2309477==    at 0x488D124: memalign (vg_replace_malloc.c:2020)
==2309477==    by 0x7022927: gomp_aligned_alloc (alloc.c:71)
==2309477==    by 0x7034E47: gomp_new_team (team.c:181)
==2309477==    by 0x702CA53: GOMP_parallel (parallel.c:176)
==2309477==    by 0x5E1A793: exec_blas (blas_server_omp.c:444)
==2309477==    by 0x5DAA747: gemm_driver.isra.0 (level3_thread.c:770)
==2309477==    by 0x5DAA973: cgemm_thread_nn (level3_thread.c:858)
==2309477==    by 0x5C9F2DB: cgemm_ (gemm.c:624)
==2309477==    by 0x404C67: csyl01_ (in /home/fedora/iucar/flexiblas/flexiblas-3.4.4/build/test/lapack-3.12.0/EIG/xeigtstc)
==2309477==    by 0x4095C3: cchkec_.constprop.0 (in /home/fedora/iucar/flexiblas/flexiblas-3.4.4/build/test/lapack-3.12.0/EIG/xeigtstc)
==2309477==    by 0x414B77: MAIN__ (in /home/fedora/iucar/flexiblas/flexiblas-3.4.4/build/test/lapack-3.12.0/EIG/xeigtstc)
==2309477==    by 0x404263: main (in /home/fedora/iucar/flexiblas/flexiblas-3.4.4/build/test/lapack-3.12.0/EIG/xeigtstc)
==2309477== 
==2309477== 154,600 bytes in 1 blocks are still reachable in loss record 18 of 18
==2309477==    at 0x488C88C: calloc (vg_replace_malloc.c:1675)
==2309477==    by 0x4967603: flexiblas_init (in /home/fedora/iucar/flexiblas/flexiblas-3.4.4/build/lib/libflexiblas.so.3.4)
==2309477==    by 0x4003E6B: call_init (dl-init.c:74)
==2309477==    by 0x4003E6B: call_init (dl-init.c:26)
==2309477==    by 0x4003F83: _dl_init (dl-init.c:121)
==2309477==    by 0x401AB77: (below main) (dl-start.S:46)
==2309477== 
==2309477== LEAK SUMMARY:
==2309477==    definitely lost: 0 bytes in 0 blocks
==2309477==    indirectly lost: 0 bytes in 0 blocks
==2309477==      possibly lost: 4,224 bytes in 12 blocks
==2309477==    still reachable: 168,353 bytes in 39 blocks
==2309477==         suppressed: 0 bytes in 0 blocks
==2309477== 
==2309477== For lists of detected and suppressed errors, rerun with: -s
==2309477== ERROR SUMMARY: 641 errors from 11 contexts (suppressed: 0 from 0)

@martin-frbg
Collaborator

During the bisect I got it down to a segfault at line 105 of kernel/generic/zgemm_beta.c. This suggests that the second set of values, in the part of the unrolled loop that processes them two at a time, already does not exist at that point.

@martin-frbg
Collaborator

Bisect puts it down to #4655, "Expanding the scope of 2D thread distribution to improve multithreaded DGEMM performance" (51ab190). I need more time to understand whether that PR is actually at fault here (and maybe some of its performance improvement can be salvaged by limiting it to non-complex cases or certain thread counts), or whether it only exposes a flaw in (gcc 14's optimization of) the generic C gemm beta code.

@martin-frbg
Collaborator

A pragma GCC optimize O0 in zgemm_beta.c does not help, so this is probably not a gcc 14 optimizer bug. I am now checking the thread redistribution produced by Yamazaki's PR to see whether it does anything interesting at the time of the crash.

@martin-frbg
Collaborator

Looks like the while loop in zgemm_beta.c can cause an additional roundtrip... still testing my "fix" though...

@martin-frbg
Collaborator

Can you please give #5057 a spin?

@opoplawski
Author

I'm still seeing the crash with that patch.

@Enchufa2

Enchufa2 commented Jan 9, 2025

> I'm still seeing the crash with that patch.

Yes, the zeroing loop in that function must be patched too. I did that here and the crash disappears.
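
For readers following along, here is a minimal sketch of the kind of hazard discussed above. It assumes that the "additional roundtrip" comes from a do/while loop that runs its body once before testing the counter, so a thread handed an empty block (m == 0 or n == 0) still touches memory. That is an assumption, not something the thread confirms. The function names, the signature, and the interleaved (re, im) float layout are hypothetical and are not the OpenBLAS source; the sketch only shows why both the zeroing path and the scaling path need the same guard.

```c
#include <stddef.h>

/* Hypothetical helper: multiply one complex value (p[0] + i*p[1]) by beta,
   or store zero when beta is exactly zero (the "zeroing" path). */
static void apply_beta(float *p, float beta_r, float beta_i)
{
    if (beta_r == 0.0f && beta_i == 0.0f) {
        p[0] = 0.0f;
        p[1] = 0.0f;
    } else {
        float re = beta_r * p[0] - beta_i * p[1];
        float im = beta_r * p[1] + beta_i * p[0];
        p[0] = re;
        p[1] = im;
    }
}

/* Hazardous shape (assumed, for illustration): each do/while body executes
   once before its counter is tested, so n == 0 or m == 0 still writes to c. */
void beta_unguarded(long m, long n, float beta_r, float beta_i,
                    float *c, long ldc)
{
    long j = n;
    do {
        float *p = c;      /* start of the current column */
        long i = m;
        c += 2 * ldc;      /* ldc counts complex elements, 2 floats each */
        do {
            apply_beta(p, beta_r, beta_i);
            p += 2;
            i--;
        } while (i > 0);
        j--;
    } while (j > 0);
}

/* Guarded shape: the counters are tested before the body runs, so an
   empty block is a no-op on both the zeroing and the scaling path. */
void beta_guarded(long m, long n, float beta_r, float beta_i,
                  float *c, long ldc)
{
    for (long j = 0; j < n; j++) {
        float *p = c + 2 * ldc * j;
        for (long i = 0; i < m; i++, p += 2) {
            apply_beta(p, beta_r, beta_i);
        }
    }
}
```

Calling beta_unguarded with n == 0 on a buffer that really is empty would write past its end, which is one way heap metadata could get corrupted. The guarded variant leaves the buffer untouched in that case.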
