[BUG] Conda install does not consistently work #886
Comments
Thank you for reporting this. Something is very broken with this installation ( Do you remember how you tried to install originally? Could you share the current contents of your environment ( Could you share the output of Could you try creating a new environment, requesting the latest available packages?
Conda would ideally be picking the latest version of the legate.core package, but it might be having trouble fulfilling that version's dependencies, and ends up using an earlier version (whose dependencies are incomplete). |
I originally tried using the Here's the anaconda info; this was a completely new install of anaconda, which I did not have before trying to install legate and cunumeric. Anaconda could not find that specific version of legate-core. I am not using mamba, but I don't think that matters? It looks like the version I have installed was 23.11, though.
Also some more info, not sure if it's relevant, but inside the |
I think what happened is that your original from-source installation (using I would suggest that you remove your entire anaconda installation, do a fresh anaconda install, then create a new child environment containing cunumeric.
Hopefully conda picks the latest version automatically (23.09), but if it doesn't you can try specifying it explicitly ( |
Sounds good, I will try that. Do you have any idea why my original clean installation with anaconda (which was in a separate env) did not pick up GPU support? When I ran legate --info it showed no CUDA support, but the docs say it should build with GPU support by default. This is the main reason I ended up in this messed-up state, as I need GPU and multi-node support. Do NCCL, UCX or other libraries need to be installed for conda to pick up full support? |
In that case the pre-built conda packages will not help (they only support single-node execution). You will need to do a from-source install unfortunately. I suggest following the "basic build" instructions from https://github.com/nv-legate/legate.core/blob/branch-24.01/BUILD.md#basic-build. The base environment that Based on your machine, I suggest creating a base environment as follows:
Assuming your machine already has a C++ compiler, and some MPI implementation. |
So this build script just gives an environment with all the dependencies, and I am supposed to install the legate anaconda library on top of this environment by running install.py? Or can I use the anaconda install command to put legate into this env? Also, the environment fully re-installs CUDA and UCX through anaconda even though I have them both installed. Is that fine? I don't see why it would be a huge problem, but I am also unfamiliar with the anaconda versions of these packages. |
The If the pre-built cuNumeric conda package were sufficient for you (in your case it's not, because you want to do multi-node runs), then it should be sufficient to create a new environment containing just the pre-built package (using the
That should be fine, as long as you use the same version of CUDA in the conda environment as you have on your system. We use conda to pull some CUDA libraries that don't come standard in the CUDA SDK (e.g. cuTensor), and that unfortunately requires that we pull in a lot of other (potentially superfluous) dependencies.
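One way to check the version-matching advice above is to compare the release reported by the system compiler with the CUDA version string of the conda environment. The following is a minimal sketch, not part of Legate: the function names are hypothetical, it assumes the system text comes from `nvcc --version` (which prints a line containing `release X.Y`), and it assumes that "same version" means matching major and minor numbers.

```python
import re
from typing import Optional, Tuple


def parse_nvcc_release(nvcc_output: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from the text printed by `nvcc --version`."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def matches_conda_cuda(system: Tuple[int, int], conda_version: str) -> bool:
    """Compare major.minor of the system CUDA with a conda version string.

    Assumption: a conda version such as "11.8.0" should agree with the
    system toolkit on its first two components.
    """
    parts = conda_version.split(".")
    if len(parts) < 2 or not (parts[0].isdigit() and parts[1].isdigit()):
        return False
    return (int(parts[0]), int(parts[1])) == system
```

The text passed to `parse_nvcc_release` could be captured with `subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout`, and the conda-side version string would have to be read from whichever CUDA package the environment actually contains.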
If you already have a version of UCX on your system that you want to reuse, then use an environment created with |
Thanks for all your help, I got
|
Never mind, it turns out that when you upgrade gcc on CentOS, the devtoolset does not come with libatomic. |
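A quick way to spot a missing shared library such as libatomic before starting a long build is to ask Python's standard `ctypes.util.find_library` whether it can be located. This is only a sketch with a hypothetical helper name; `find_library` consults the system's usual lookup mechanisms, so a library that lives only inside a devtoolset directory may still be reported as missing.

```python
import ctypes.util
from typing import Dict, Iterable, Optional


def locate_libraries(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each library name (without the 'lib' prefix) to what was found.

    A value of None means the library could not be located by
    ctypes.util.find_library on this machine.
    """
    return {name: ctypes.util.find_library(name) for name in names}


if __name__ == "__main__":
    for name, found in locate_libraries(["atomic"]).items():
        print(f"lib{name}: {found if found else 'NOT FOUND'}")
```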
I'm surprised tblis's
GitHub issues are currently the best place for build/run issues. If you'd like to discuss your overall use case in more detail, feel free to email [email protected]. |
Software versions
Python : 3.11.5 (main, Sep 11 2023, 13:54:46) [GCC 11.2.0]
Platform : Linux-3.10.0-1160.99.1.el7.x86_64-x86_64-with-glibc2.17
Legion : (failed to detect)
Traceback (most recent call last):
  File "/home/emeitz/software/anaconda3/bin/legate-issue", line 8, in
    sys.exit(main())
             ^^^^^^
  File "/home/emeitz/software/anaconda3/lib/python3.11/site-packages/legate/issue.py", line 79, in main
    print(f"Legate : {try_version('legate', '__version__')}")
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/emeitz/software/anaconda3/lib/python3.11/site-packages/legate/issue.py", line 32, in try_version
    return getattr(module, attr) if module else None
           ^^^^^^^^^^^^^^^^^^^^^
AttributeError: module 'legate' has no attribute '__version__'
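The traceback shows the version lookup failing because the imported module lacks the requested attribute. A defensive variant of such a lookup could fall back to a placeholder instead of raising. The sketch below is illustrative only: `safe_version` is a hypothetical name, not the function shipped in legate's issue.py, and the default attribute name is an assumption.

```python
import importlib


def safe_version(module_name: str, attr: str = "__version__") -> str:
    """Return a module attribute as text, or a placeholder if unavailable."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return "(failed to import)"
    # getattr with a default avoids an AttributeError when the attribute
    # is missing, e.g. for a partially installed package.
    value = getattr(module, attr, None)
    return str(value) if value is not None else "(failed to detect)"


if __name__ == "__main__":
    print(f"Legate : {safe_version('legate')}")
```

A module that imports but has no version attribute usually points to a broken or mixed installation, so the placeholder is a hint to inspect the environment, not a fix in itself.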
Jupyter notebook / Jupyter Lab version
N/A
Expected behavior
I am trying to use cunumeric, which requires legate.core to be installed first. I thought I had everything up and running, but after restarting my terminal the legate command was no longer on the path (conda env was active). I went digging through the anaconda packages and found this package: legate-core-23.03.00-cuda11_py311_g5de57a8_3, which has a bin with legate and legate-issue binaries inside. I'm not exactly sure what is wrong, so I will just list some things I found strange:
/usr/local/cuda but it does not appear to have done that
legate --info ) until I installed some other libraries. I do not know exactly which one fixed this, but I'm guessing it was NCCL.
Observed behavior
legate would not register as a valid command
Example code or instructions
Stack traceback or browser console output
No response
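The symptom reported in this issue (a legate executable present inside a package directory but not found as a command) can be checked with a few lines of standard-library Python. This is a minimal sketch with hypothetical helper names; the prefix to search is whatever directory the conda installation lives in, which the caller must supply.

```python
import shutil
from pathlib import Path
from typing import List, Optional


def find_command(name: str) -> Optional[str]:
    """Return the full path of a command if it is on PATH, else None."""
    return shutil.which(name)


def find_in_prefix(name: str, prefix: str) -> List[str]:
    """List files called `name` inside any bin directory under `prefix`."""
    root = Path(prefix)
    if not root.is_dir():
        return []
    return sorted(str(p) for p in root.rglob(f"bin/{name}") if p.is_file())


if __name__ == "__main__":
    import sys

    print("on PATH:", find_command("legate"))
    if len(sys.argv) > 1:
        # The prefix is passed explicitly, e.g. an anaconda install directory.
        for hit in find_in_prefix("legate", sys.argv[1]):
            print("found:", hit)
```

If the second search finds copies that the first does not, the active environment's bin directory is probably not the one that holds the executable.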