matplotlib plots crash due to conflicting BLAS versions in numpy #471

Closed
cboettig opened this issue May 31, 2022 · 9 comments · Fixed by #657
Labels
bug Something isn't working

Comments

@cboettig
Member

The pre-compiled binaries for numpy are built against a different BLAS library than the openblas library included on rocker images. This leads to unexpected errors unless numpy is installed from source, e.g.

pip3 install --no-binary="numpy" numpy --ignore-installed

I don't yet know the best workaround here. It's difficult because the error is relatively opaque for users to track down, and there are just oh so many ways users will install and re-install numpy from binary, in different virtualenvs, without realizing it.

A start, but by no means a complete solution, would be to at least build numpy from source in the python images. Open to any better solutions / ideas? (cc @yuvipanda ?)
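To tell whether a given environment ended up with a binary wheel or a source build, one option is to look for the OpenBLAS copy that binary wheels vendor. This is a sketch under assumptions: manylinux wheels repaired by auditwheel usually place that library in a `numpy.libs/` directory next to the `numpy` package (older wheels used `numpy/.libs/`), and the function name `numpy_has_vendored_openblas` is hypothetical.

```shell
# Hypothetical helper: succeed if the numpy installed under the given
# site-packages directory carries its own bundled OpenBLAS (binary wheel).
numpy_has_vendored_openblas() {
  site="$1"
  # Assumed layouts: recent auditwheel (numpy.libs/), older wheels (numpy/.libs/).
  for libdir in "$site/numpy.libs" "$site/numpy/.libs"; do
    for f in "$libdir"/libopenblas*; do
      # An unmatched glob stays literal, so check that the file really exists.
      [ -e "$f" ] && return 0
    done
  done
  return 1
}
```

For example, `site=$(python3 -c 'import numpy, os; print(os.path.dirname(os.path.dirname(numpy.__file__)))')` followed by `numpy_has_vendored_openblas "$site" && echo "numpy came from a binary wheel"`. `python3 -c 'import numpy; numpy.show_config()'` prints numpy's own view of its BLAS build. Because `pip install --ignore-installed` does not uninstall the previous copy first, leftover wheel files may make the check report a false positive after a source rebuild.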

@eitsupi added the bug label Jun 1, 2022
@cboettig
Member Author

Looks like this may impact scikit-learn as well, which binds lapack / blas independently. The simplest solution for now may be to opt out of openblas, e.g.

export ARCH=$(uname -m)
update-alternatives --set "libblas.so.3-${ARCH}-linux-gnu" "/usr/lib/${ARCH}-linux-gnu/blas/libblas.so.3"
update-alternatives --set "liblapack.so.3-${ARCH}-linux-gnu" "/usr/lib/${ARCH}-linux-gnu/lapack/liblapack.so.3"
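The two `update-alternatives --set` calls above can be wrapped so that switching back to OpenBLAS (for example on images where the crash does not occur) is the same one-liner. This is a sketch: `set_blas_backend` and the `UA` dry-run variable are hypothetical, and the `openblas-pthread` directory name assumes the `libopenblas0-pthread` package layout used on recent Debian/Ubuntu.

```shell
# Hypothetical helper: point the BLAS and LAPACK alternatives at the given
# directories under /usr/lib/<arch>-linux-gnu/. Set UA=echo for a dry run.
set_blas_backend() {
  blas_dir="$1"    # e.g. "blas" (reference) or "openblas-pthread" (assumed)
  lapack_dir="$2"  # e.g. "lapack" (reference) or "openblas-pthread" (assumed)
  arch=$(uname -m)
  ua="${UA:-update-alternatives}"
  $ua --set "libblas.so.3-${arch}-linux-gnu" \
    "/usr/lib/${arch}-linux-gnu/${blas_dir}/libblas.so.3" || return 1
  $ua --set "liblapack.so.3-${arch}-linux-gnu" \
    "/usr/lib/${arch}-linux-gnu/${lapack_dir}/liblapack.so.3"
}
```

`UA=echo set_blas_backend blas lapack` prints the commands without running them; as root, `set_blas_backend blas lapack` selects reference BLAS/LAPACK and `set_blas_backend openblas-pthread openblas-pthread` would go back to OpenBLAS. `update-alternatives --query "libblas.so.3-$(uname -m)-linux-gnu"` shows the current selection.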

@eddelbuettel
Member

I think you could also just rely on apt to only have reference blas (or atlas) rather than openblas installed (if the multithreading is the issue here).
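For the apt-only route, a minimal sketch, assuming the Debian/Ubuntu package names `libblas3`/`liblapack3` for reference BLAS/LAPACK and `libopenblas*` for OpenBLAS. The install commands are left as comments (check what apt proposes to remove before confirming); the hypothetical `blas_backend` function reports which implementation a library path resolves to, as a quick way to confirm what `libblas.so.3` points at afterwards.

```shell
# Assumed apt commands (run as root) for a reference-only setup:
#   apt-get install -y --no-install-recommends libblas3 liblapack3
#   apt-get purge 'libopenblas*'

# Hypothetical helper: name the BLAS implementation behind a library path.
blas_backend() {
  # Follow the /etc/alternatives symlink chain; fall back to the raw path.
  target=$(readlink -f "$1" 2>/dev/null || printf '%s' "$1")
  case "$target" in
    *openblas*)          echo openblas ;;
    *atlas*)             echo atlas ;;
    */blas/*|*/lapack/*) echo reference ;;
    *)                   echo unknown ;;
  esac
}
```

Usage: `blas_backend "/usr/lib/$(uname -m)-linux-gnu/libblas.so.3"` should print `reference` once only the reference packages are installed.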

@cboettig
Member Author

Thanks @eddelbuettel! We've had openblas enabled by default on the r-ver stack for a long time to give multithreaded behavior out-of-the-box, though we've had some issues occasionally as you know.

From what I understand, the segfault here is not due to multithreading per se but has something to do with how openblas uses LP64 suffixes on its symbols, and whether the numpy wheels from pypi are built using CFLAGS=-fvisibility=hidden, though what all that means is pretty much over my head. (See numpy/numpy#21643 for details and recent changes to numpy build recipes which may address this.)
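To look at the symbol-suffix question directly, one could compare the exported `dgemm` names of the system OpenBLAS with those of the copy bundled in a numpy wheel. A sketch, assuming GNU binutils `nm`; `dgemm_symbols` is a hypothetical helper that filters `nm -D --defined-only` output. As far as I know, numpy's recent 64-bit-integer OpenBLAS builds append a `64_` suffix (e.g. `dgemm_64_`), while a distro OpenBLAS exports plain names such as `dgemm_`.

```shell
# Hypothetical helper: read `nm -D --defined-only` output on stdin and
# print the distinct dgemm-family symbol names (Fortran and CBLAS).
dgemm_symbols() {
  # The symbol name is the last field of each nm line.
  awk '{ print $NF }' | grep -E '^(cblas_)?dgemm' | sort -u
}
```

For example, `nm -D --defined-only /usr/lib/$(uname -m)-linux-gnu/libopenblas.so.0 | dgemm_symbols` for the system library (path assumed), and the same pipeline on the `libopenblas*.so` file under a wheel's `numpy.libs/` directory.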

I don't really understand why the same python binary behaves differently with respect to BLAS libraries when it is called via reticulate vs being invoked in pure python, but it seems that is what happens. This issue is discussed over in rstudio/reticulate#1190. t-kalinowski mentions that this issue is caused when R is built "specifically against openblas" rather than "a generic BLAS interface"; I thought we were doing the latter, given that update-alternatives works, but maybe we aren't?
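On whether R here is linked against a generic BLAS interface or against OpenBLAS specifically, the dynamic dependencies of `libR.so` give a hint. A sketch, assuming R was built as a shared library (`--enable-R-shlib`) so that `$(R RHOME)/lib/libR.so` exists; `blas_link_kind` is a hypothetical helper that reads `ldd` output. A `libblas.so.3` entry means the generic soname that update-alternatives controls, a `libopenblas*` entry would mean a direct link, and `libRblas` means R's internal BLAS.

```shell
# Hypothetical helper: read `ldd` output on stdin and report how the
# binary gets its BLAS: openblas (direct), generic, internal, or none.
blas_link_kind() {
  awk '$1 ~ /^libopenblas/ { direct = 1 }
       $1 == "libblas.so.3" { generic = 1 }
       $1 ~ /^libRblas/     { internal = 1 }
       END {
         if (direct) print "openblas"
         else if (generic) print "generic"
         else if (internal) print "internal"
         else print "none"
       }'
}
```

Usage: `ldd "$(R RHOME)/lib/libR.so" | blas_link_kind`. `Rscript -e 'sessionInfo()'` also prints the BLAS and LAPACK libraries R actually loaded, which could be compared with what `numpy.show_config()` reports inside a reticulate session.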

@eddelbuettel
Member

eddelbuettel commented Jun 27, 2022

The problem really is that "all this is hard". In theory one could assume a well-managed and current distro, get R and Python/numpy from it, and have it all work. In practice, and for a million different reasons, you two decided to go with neither the distro R nor the distro Python (for reticulate). Now you own all the pieces. Such is life. There may not be a quick or easy fix other than grinding down bug by bug.

@eitsupi
Member

eitsupi commented Nov 1, 2022

I don't know why, but it seems that this bug does not occur with Ubuntu 22.04-based images (e.g. rocker/r-ver:4.2.2), as was confirmed in #531.
cc @cvanderaa

@cboettig
Member Author

cboettig commented Nov 1, 2022

@eitsupi do you think it would make sense to put openblas back on by default then in the ubuntu 22.04 images?

@eitsupi
Member

eitsupi commented Nov 2, 2022

@cboettig Yes, but I am wondering whether the bug could recur, since I don't know why it is not occurring.

@cboettig
Member Author

cboettig commented Nov 2, 2022

@eitsupi Good point. I believe the bug originates from openblas not using LP64 suffixes on its symbols, causing numpy to grab the wrong methods. Since we install openblas from the Ubuntu repos, it's probable that the bug was patched a while ago, and now that we are pulling from the jammy repos we get a newer version of openblas that no longer lacks the suffixes. But I haven't been able to dig deep enough into the openblas changelog to confirm that this is the case.

@eitsupi
Member

eitsupi commented Nov 2, 2022

Thank you for the explanation.
It seems like a good idea to update the documentation and scripts to clearly state that workarounds are required for focal-based images.

eitsupi added a commit that referenced this issue Jun 1, 2023
Close #471 (and related to #582 (comment))

This workaround seems to be sufficient if it is executed only on Ubuntu
20.04, since OpenBLAS on Ubuntu 22.04 does not seem to have the problem
of crashing numpy.
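The commit's idea of running the workaround only on Ubuntu 20.04 could be expressed as a guard. A sketch, assuming the standard `/etc/os-release` fields `ID` and `VERSION_CODENAME`; `needs_blas_workaround` is a hypothetical name, not something taken from the actual rocker scripts.

```shell
# Hypothetical helper: succeed only on Ubuntu 20.04 (focal), where the
# OpenBLAS/numpy crash was reported. Reads /etc/os-release by default.
needs_blas_workaround() {
  osrel="${1:-/etc/os-release}"
  [ -r "$osrel" ] || return 1
  # Source in a subshell so the os-release variables do not leak out.
  ( . "$osrel" && [ "${ID:-}" = ubuntu ] && [ "${VERSION_CODENAME:-}" = focal ] )
}
```

For example, `if needs_blas_workaround; then ...; fi` around the two `update-alternatives --set` commands from earlier in the thread would leave jammy-based images on OpenBLAS.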