[Bug]: CHPL_ROCM_PATH is set to the wrong ROCm installation path #25952
Comments
I suspect the issue is mostly with how we detect the rocm sdk directory. We run The If you manually set I think this is just a case where we need to improve the auto-detection, because everything should work fine if you set |
Thanks for the bug report Guillaume and the assessment Jade! Personally, I find the difficulty of this path finding really surprising. Neither vendor seems to have a good way of asking the compiler about some crucial paths, leaving us guessing in our scripts. Most systems I am familiar with have rocm installations in So far what we have been doing (and seemingly will keep on doing) is to improve our scripts as we encounter more different installations and bug reports and that's unfortunate. For this particular case, I am wondering whether we can do One little challenge is that the compiler picks out some bitcode libraries from these installations. If we want to |
Thanks @jabraham17 and @e-kayrakli for your feedback. Handling every installation exhaustively is obviously not trivial, given how widely they vary. At least, I hope this report helps improve the existing path-finding strategy.
That's right, I just tested it. Setting manually |
Since For example, this bash one-liner should report the right installation, without needing our path heuristics
I tested this on 2 systems with various rocm versions and it seemed to report the right path each time. @Guillaume-Helbecque does this report the right path on your system? Here is a similar check for nvidia
The magic words you're grepping for are still a portability issue. If it helps with the current situation, I am not against it though. We'll be exchanging a problem for a smaller one (hopefully). Note that this could still be augmented with more checks using |
Yes, it does. In my case, |
This was also resolved by @jabraham17's #26072. Thanks for fixing it, Jade, and thanks again for reporting it @Guillaume-Helbecque ! |
Summary of Problem
Description:
I tried to build Chapel 2.1 on a system where ROCm 6.0.3 is the default and ROCm 5.4.6 is loaded. The reason for this is that ROCm 6.0.3 is not supported by Chapel 2.1. In this configuration, I got the error:
Error: command not found: /***/***/***/rocm/llvm/bin/llvm-config
The issue here is that Chapel sets CHPL_ROCM_PATH to the default ROCm installation path, which is not supported, instead of the one I loaded: /***/***/***/rocm/5.4.6/llvm/bin/llvm-config. Manually setting CHPL_ROCM_PATH=/***/***/***/rocm/5.4.6 fixes the issue.
[edit: After a quick discussion with the experts managing the system, it seems that the issue may not come from how Chapel detects the ROCm installation, but from how it tries to find llvm-config. The subdirectories in the path in the error message are those from the ROCm 5.4.6 module. However, for some reason, rather than looking in rocm/5.4.6/llvm, it drops the 5.4.6 component from the directory it searches. Exporting CHPL_LLVM_CONFIG=/***/***/***/rocm/5.4.6/llvm/bin/llvm-config also solves the issue.]
If I'm not wrong, the heuristic that searches for the ROCm installation path runs which hipcc and then derives the path from the result. However, executing which hipcc on the system gives me the correct path: /***/***/***/rocm/5.4.6/bin/hipcc. This suggests that an issue may occur when Chapel has to choose between two (or more) possible installation paths.
May be related to #23542.
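To make the expected behaviour concrete, here is a minimal sketch of a hipcc-based lookup that keeps the version component of the path. It is not Chapel's actual detection code: the function names are hypothetical, and it assumes the layout seen in this report, with hipcc at <root>/bin/hipcc and llvm-config at <root>/llvm/bin/llvm-config. It also resolves symlinks, which may or may not be what a given site wants.

```python
import os
import shutil
from typing import Optional


def rocm_root_from_hipcc(hipcc_path: str) -> str:
    """Map <root>/bin/hipcc to <root>, resolving symlinks first (assumed layout)."""
    real = os.path.realpath(hipcc_path)
    return os.path.dirname(os.path.dirname(real))


def llvm_config_for(rocm_root: str) -> str:
    """Path where llvm-config is expected under a ROCm root (assumed layout)."""
    return os.path.join(rocm_root, "llvm", "bin", "llvm-config")


def detect_rocm_root() -> Optional[str]:
    """Return the ROCm root for the hipcc found on PATH, or None if unusable."""
    hipcc = shutil.which("hipcc")
    if hipcc is None:
        return None
    root = rocm_root_from_hipcc(hipcc)
    # Reject a candidate root that does not actually ship llvm-config.
    if not os.path.isfile(llvm_config_for(root)):
        return None
    return root


if __name__ == "__main__":
    print(detect_rocm_root())
```

With the 5.4.6 module loaded, a lookup along these lines would be expected to return the versioned directory rather than its parent, since the version is part of the path that which hipcc reports.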
Is this issue currently blocking your progress?
No. Setting path(s) manually makes things work.
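For reference, the manual workaround can be written down as a small helper that prints the two overrides mentioned above. This is only an illustrative sketch: the helper names are hypothetical, and /path/to/rocm/5.4.6 is a placeholder for the real (redacted) installation path. According to the report, setting either variable on its own was enough.

```python
import os
import shlex


def manual_overrides(rocm_root: str) -> dict:
    """Build the two environment overrides described in this issue."""
    return {
        "CHPL_ROCM_PATH": rocm_root,
        "CHPL_LLVM_CONFIG": os.path.join(rocm_root, "llvm", "bin", "llvm-config"),
    }


def as_exports(overrides: dict) -> str:
    """Render the overrides as shell export lines, quoting values if needed."""
    return "\n".join(
        f"export {name}={shlex.quote(value)}" for name, value in overrides.items()
    )


if __name__ == "__main__":
    # Placeholder path: substitute the ROCm root of the loaded module.
    print(as_exports(manual_overrides("/path/to/rocm/5.4.6")))
```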