Segfault in array initialisation #305
Comments
Regarding artifacts on 0.4.3, can you post output of |
|
Ah, versions for artifacts are shown only when you are in the AMDGPU.jl project (you can
pxl-th@Yotun:~/.julia/dev/AMDGPU$ HSA_OVERRIDE_GFX_VERSION=10.3.0 julia -t8 --project=.
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.9.0-DEV.1437 (2022-09-26)
 _/ |\__'_|_|_|\__'_|  |  Commit 26304f763cf (8 days old master)
|__/                   |
(AMDGPU) pkg> st
Project AMDGPU v0.4.2
Status `~/.julia/dev/AMDGPU/Project.toml`
[621f4979] AbstractFFTs v1.2.1
[79e6a3ab] Adapt v3.4.0
[b99e7846] BinaryProvider v0.5.10
[fa961155] CEnum v0.4.2
[f68482b8] Cthulhu v2.7.3
[e2ba6199] ExprTools v0.1.8
[0c68f7d7] GPUArrays v8.5.0
[61eb1bfa] GPUCompiler v0.16.4
[929cbde3] LLVM v4.14.0
⌃ [1914dd2f] MacroTools v0.5.9
[21216c6a] Preferences v1.3.0
⌃ [efd6af41] ProfileCanvas v0.1.4
[ae029012] Requires v1.3.0
[efcf1570] Setfield v1.1.1
[276daf66] SpecialFunctions v2.1.7
[2696aab5] HIP_jll v5.2.3+1
[d55e3150] LLD_jll v14.0.6+0
[86de99a1] LLVM_jll v14.0.6+0
[873c0968] ROCmDeviceLibs_jll v5.2.3+0
[dd59ff1a] hsa_rocr_jll v5.2.3+0
[1ef8cab2] rocBLAS_jll v5.2.3+2 `~/.julia/dev/rocBLAS_jll`
[a6151927] rocRAND_jll v5.2.3+0
[8c6ce2ba] rocSPARSE_jll v5.2.3+0
[8f399da3] Libdl
[37e2e46d] LinearAlgebra
[44cfe95a] Pkg v1.8.0
[de0858da] Printf
[9a3f8284] Random
[10745b16] Statistics
As for why |
Getting
|
You have artifacts of version |
My bad, I did not instantiate the project. Here is what I get after instantiating and updating:
So it seems HSA is |
That's fine, it is still And do you still have issues with |
Yep, still same segfault as before. |
Now, if I manually set this to
|
Hm... try using version 4.5.2 of HIP_jll. If we are doing |
How can you achieve this? Also, was there a recent change to use HIP for stream sync instead of HSA? |
Use |
Moreover, why does one need to tweak artifact-usage parsing manually? Shouldn't this happen automatically? Or was there a design decision to rely only on artifacts from now on? |
Both HSA and HIP are still used, although I think that the HIP sync is not always needed. |
With adding
|
So what is unclear to me is:
|
If by artifact parsing you mean stuff that happens in
I think mixing system-wide installations with artifacts is not allowed anymore, as it may cause issues. But both cases are still supported. More and more artifacts are now available for ROCm-related libraries, such as rocBLAS, rocSPARSE and, most recently, MIOpen. |
Yes, exactly. But having |
It should automatically download the correct versions, but two things may prevent this. First, AMDGPU ships Manifest.toml files, which have hardcoded versions. Second (this was probably introduced by me 😅), artifacts of version 5.2.3 will sometimes be installed, because some of them have no compat bound restricting them to Julia 1.9. But I'm planning on fixing that.
I think that may be a bug you've bumped into in |
Thanks a lot for your insights and for your help 🙏!
|
No problem!
I think the easiest fix would be to get rid of |
Preferences is used so that artifact usage can be configured globally or per environment, without having to use an env. var. With AMDGPU 0.4.3, because we removed the build step, the env. var is now read during precompilation, which sometimes happens and sometimes does not. If the user forgets to set the env. var consistently, this can cause confusing behavior, and it cannot be switched on and off as easily as with
The ROCm dependency version issues are known, as @pxl-th points out. We might be able to use Pkg hooks to manually select the right set of packages, as in https://github.com/JuliaBinaryWrappers/LLD_jll.jl/blob/main/.pkg/select_artifacts.jl; someone just needs to wire this up. |
Thanks @jpsamaroo for the comments. Is the preference thing really needed, especially if it is not reliable? We could just replace it by
The ROCm dependency version issue would be nice to solve to allow smooth support for various still "recent" GPUs ;-) |
@jpsamaroo should one stick to using Preferences for this, or just parse the ENV var in the classical way as suggested above? |
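For reference, parsing the env var in the classical way could look roughly like the sketch below. The helper name artifacts_disabled and the set of accepted values are assumptions made for illustration, not AMDGPU.jl's actual code; only the variable name JULIA_AMDGPU_DISABLE_ARTIFACTS comes from this thread.

```julia
# Hypothetical helper (name and accepted values are assumptions, not AMDGPU.jl code).
# `env` defaults to the process environment; passing a Dict makes the logic easy to check.
function artifacts_disabled(env = ENV)
    # Treat a missing variable as "0", i.e. artifacts stay enabled.
    raw = lowercase(strip(get(env, "JULIA_AMDGPU_DISABLE_ARTIFACTS", "0")))
    return raw in ("1", "true", "yes")
end

# Example calls with explicit dictionaries instead of the real environment.
artifacts_disabled(Dict("JULIA_AMDGPU_DISABLE_ARTIFACTS" => "1"))
artifacts_disabled(Dict{String,String}())
```

If a check like this runs at runtime (for example from a package's __init__) rather than at top level, its result is not fixed when the package is precompiled.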
We now do runtime discovery of the deps. |
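As a rough illustration of what runtime discovery can mean, the sketch below searches for a shared library when it is called, instead of at build or precompile time. It is not AMDGPU.jl's implementation: the helper name find_rocm_library, the ROCM_PATH variable, and the /opt/rocm default are assumptions; Libdl.find_library is the standard-library call.

```julia
using Libdl

# Hypothetical helper (name, ROCM_PATH, and the /opt/rocm default are assumptions).
# Returns the located library as a String, or `nothing` if it cannot be found.
function find_rocm_library(name::String; extra_dirs::Vector{String} = String[])
    dirs = copy(extra_dirs)
    rocm_path = get(ENV, "ROCM_PATH", "/opt/rocm")
    push!(dirs, joinpath(rocm_path, "lib"))
    # Libdl.find_library returns an empty string when no candidate can be opened.
    path = Libdl.find_library([name], dirs)
    return isempty(path) ? nothing : path
end

hsa = find_rocm_library("libhsa-runtime64")
println(hsa === nothing ? "HSA runtime not found" : "HSA runtime: $hsa")
```

Because the lookup happens on each call, changing the environment between sessions takes effect without rebuilding or re-precompiling anything.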
Using AMDGPU 0.4.3 segfaults upon array initialisation with AMDGPU.ones(Float64,2,2) or ROCArray(ones(2,2)). Also, it is unclear to me why it now uses hipStreamSynchronize. (@jpsamaroo or @vchuravy do you have any insights on what's going on here?)
This occurs using Julia 1.8.2 and ROCm 4.3 on a system with Vega20 (gfx906) cards using artifacts.
It looks like JULIA_AMDGPU_DISABLE_ARTIFACTS no longer has any effect.
Testing with AMDGPU 0.4.2, everything works fine, and the env var to disable artifacts also works.
Here are the errors using AMDGPU.ones(Float64,2,2):
And here using ROCArray(ones(2,2)):