Replace while loop in generic C/ZGEMM_BETA to avoid going out of bounds #5057
Subtle one. I'll give it a spin later. Did you figure out why this happens to cause problems only on aarch64 with 12 or more threads?
aarch64 is the only widely used architecture for which OpenBLAS currently has no optimized GEMM_BETA kernels at all, so |
Could #4917 be a consequence of this?
And BTW... the valgrind output I collected reported invalid writes in line |
Not sure about #4917, as that was reproducibly returning sloppy results for existing data (if only for select thread counts). And yes, the zeroing loop will need identical treatment if this stage of the PR is correct at all.
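For readers following along, here is a minimal sketch of the kind of change the PR title describes. It is not the OpenBLAS kernel and not the patch itself, which this thread does not show. The function name `zscale_sketch`, its signature, and the assumption that `ldc` counts complex elements are all hypothetical. The idea is simply that a counted `for` loop runs zero times when `m` or `n` is zero and derives each column pointer from an index that is known to be in range, whereas a `do { ... } while (--j > 0)` countdown always executes its body at least once. The zeroing branch and the scaling branch get the same bounded treatment.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration only, not the OpenBLAS generic ZGEMM_BETA kernel.
 * Scales an m x n column-major complex matrix C in place by
 * beta = beta_r + i*beta_i. Elements are stored as interleaved
 * (real, imaginary) doubles; ldc is assumed to be counted in complex
 * elements. */
void zscale_sketch(size_t m, size_t n, double beta_r, double beta_i,
                   double *c, size_t ldc)
{
    assert(ldc >= m); /* each column must fit inside its stride */

    for (size_t j = 0; j < n; j++) {
        /* The column pointer is formed only for j < n, so it never
         * advances past the last column. */
        double *col = c + 2 * ldc * j;

        if (beta_r == 0.0 && beta_i == 0.0) {
            /* Zeroing branch: store zeros instead of multiplying, so
             * NaN or Inf already present in C is not propagated. */
            for (size_t i = 0; i < 2 * m; i++)
                col[i] = 0.0;
        } else {
            /* Scaling branch: complex multiply of each element by beta. */
            for (size_t i = 0; i < m; i++) {
                double re = col[2 * i];
                double im = col[2 * i + 1];
                col[2 * i]     = beta_r * re - beta_i * im;
                col[2 * i + 1] = beta_r * im + beta_i * re;
            }
        }
    }
}
```

Calling `zscale_sketch(2, 2, 0.0, 1.0, c, 3)` on a 3-row buffer multiplies the top two rows of each column by i and leaves the padding row untouched. Calling it with `m` or `n` equal to zero touches no memory at all.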
Ok, I'm building in Copr. I've reproduced the crash by setting |
Looks good here, thanks.
Thanks for testing. This is still quite weird to me, as the code must have been like that for at least 15 years, if not 20. Granted, aarch64 has given it much more exposure lately.
Correct. I have built an updated openblas with this fix (in rawhide mock), installed the new rpms into a new rawhide buildroot, and successfully rebuilt flexiblas there.
With the most current commit in rawhide? Because yesterday I limited the number of threads to 10 in order to avoid the crash and be able to rebuild FlexiBLAS in rawhide, because it was affected by the retirement of ATLAS.
Ok, good news: the new build succeeded where the others failed. So I can confirm that patching the zeroing loop in |
openblas is from rawhide HEAD, but flexiblas is from https://src.fedoraproject.org/rpms/flexiblas/c/e10825622fc90f7405e4791062e6b433822a62c8?branch=rawhide (before your workaround).
fixes #5050