Improve GPU support #3111

Merged: 3 commits into main from improve-gpu-support, Oct 10, 2024

@nazar-pc (Member) commented Oct 9, 2024

Two key fixes here: Windows and linking.

Windows support is fixed (ignore whitespace changes when reviewing; the actual diff is very small). The solution is described in https://x.com/nazarpc/status/1844098745857667412

Linking became problematic after ROCm support was introduced because sppark seemed to always link amdhip64, even when only CUDA support was built (I don't really see why). dot-asm/sppark#2 fixed this along with some DX improvements, so I pulled it into our fork at https://github.com/autonomys/sppark/tree/subspace-v1, and now the linkage looks right:

nobody@0ee4c112cfe9:/$ ldd /subspace-farmer*
/subspace-farmer:
	linux-vdso.so.1 (0x000078597b6dd000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x0000785979b19000)
	libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x0000785979800000)
	libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x000078597b6b4000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x0000785979400000)
	/lib64/ld-linux-x86-64.so.2 (0x000078597b6df000)
/subspace-farmer-rocm:
	linux-vdso.so.1 (0x0000772350bf2000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x0000772350b02000)
	libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x000077234e400000)
	libamdhip64.so.6 => /opt/rocm/lib/libamdhip64.so.6 (0x000077234ca00000)
	libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x0000772350ae2000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x000077234c600000)
	/lib64/ld-linux-x86-64.so.2 (0x0000772350bf4000)
	librocprofiler-register.so.0 => /opt/rocm/lib/librocprofiler-register.so.0 (0x000077234e77e000)
	libamd_comgr.so.2 => /opt/rocm/lib/libamd_comgr.so.2 (0x0000772343800000)
	libhsa-runtime64.so.1 => /opt/rocm/lib/libhsa-runtime64.so.1 (0x0000772343400000)
	libnuma.so.1 => /lib/x86_64-linux-gnu/libnuma.so.1 (0x0000772350ad3000)
	libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x000077234e762000)
	libtinfo.so.6 => /lib/x86_64-linux-gnu/libtinfo.so.6 (0x000077234e730000)
	libelf.so.1 => /lib/x86_64-linux-gnu/libelf.so.1 (0x000077234e712000)
	libdrm.so.2 => /lib/x86_64-linux-gnu/libdrm.so.2 (0x000077234e6fc000)
	libdrm_amdgpu.so.1 => /lib/x86_64-linux-gnu/libdrm_amdgpu.so.1 (0x0000772350ac5000)
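
The check above was done by eye. As a minimal sketch (not part of this PR), the same `ldd` output could be checked automatically, for example in CI. The helper names `parse_ldd` and `links_hip` are hypothetical, and the sample text is an abridged copy of the output above.

```python
import re

# Matches dependency lines such as:
#   libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x0000785979b19000)
_LDD_LINE = re.compile(r"^\s*(\S+)\s+=>\s+(\S+)\s+\(0x[0-9a-fA-F]+\)\s*$")


def parse_ldd(output):
    """Map each binary in multi-file `ldd` output to {library name: resolved path}."""
    result = {}
    current = None
    for line in output.splitlines():
        # With several files, ldd prints an unindented "<path>:" header per binary.
        if line and not line[0].isspace() and line.rstrip().endswith(":"):
            current = line.rstrip()[:-1]
            result[current] = {}
            continue
        match = _LDD_LINE.match(line)
        if match and current is not None:
            result[current][match.group(1)] = match.group(2)
    return result


def links_hip(libs):
    """Return True if the AMD HIP runtime appears among the linked libraries."""
    return any(name.startswith("libamdhip64.so") for name in libs)


# Abridged sample taken from the output in this comment.
SAMPLE = """\
/subspace-farmer:
\tlinux-vdso.so.1 (0x000078597b6dd000)
\tlibm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x0000785979b19000)
\tlibc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x0000785979400000)
/subspace-farmer-rocm:
\tlibm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x0000772350b02000)
\tlibamdhip64.so.6 => /opt/rocm/lib/libamdhip64.so.6 (0x000077234ca00000)
"""

if __name__ == "__main__":
    for binary, libs in parse_ldd(SAMPLE).items():
        print(binary, "links HIP" if links_hip(libs) else "does not link HIP")
```

The expectation encoded here is simply the one stated above: the plain farmer binary should not depend on `libamdhip64`, while the ROCm build should.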

I think the next simplification for CI would be to build only the containers and then simply extract the executables from them, instead of building them on the host again.
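
A rough sketch of what that extraction step might look like, assuming Docker is the container runtime. The image name `example/farmer:latest`, the container name, and the destination directory are placeholders, not names used by this repository; only the `/subspace-farmer` path comes from the output above. The function just builds the command lines, so it can be inspected without Docker installed.

```python
import shlex


def extraction_commands(image, binary_path, dest_dir, container="extract-tmp"):
    """Build the Docker CLI commands that copy one file out of an image.

    A container is created (but never started), the file is copied out of
    its filesystem, and the temporary container is removed afterwards.
    """
    return [
        ["docker", "create", "--name", container, image],
        ["docker", "cp", f"{container}:{binary_path}", dest_dir],
        ["docker", "rm", container],
    ]


if __name__ == "__main__":
    # Hypothetical image name and destination; adjust to the real CI setup.
    for cmd in extraction_commands(
        "example/farmer:latest", "/subspace-farmer", "./artifacts"
    ):
        print(shlex.join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` in a CI script; using `docker create` rather than `docker run` avoids executing the farmer just to copy it out.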

I'm still thinking about whether it would be a good idea to simply package the AMD libraries into the zip archive, so users don't need to install the runtime manually through a custom repo. Any opinions?

@nazar-pc (Member, Author) commented Oct 9, 2024

@jim-counter @randywessels @EmilFattakhov FYI: on Windows, HIP SDK 6.1.2 from https://www.amd.com/en/developer/resources/rocm-hub/hip-sdk.html is needed.

Specifically, just the HIP RTC Runtime should be enough; it is found under "HIP Runtime Compiler -> HIP RTC Runtime 6.1.0", and everything else can be unchecked. The farmer starts with this, but I didn't check whether it can plot, so confirmation would be appreciated. "HIP Libraries Runtime 6.1.0" may hypothetically be needed too:

[Screenshot from 2024-10-10 00-53-42]

A full installation would take much more time and disk space for no reason.

UPD: Updated forum instructions at https://forum.autonomys.xyz/t/rocm-gpu-support-amd/4440?u=nazar-pc

@teor2345 (Contributor) commented Oct 9, 2024

> I'm still thinking about whether it would be a good idea to simply package the AMD libraries into the zip archive, so users don't need to install the runtime manually through a custom repo. Any opinions?

I think making it easier for users is a good idea, but only if the AMD library license allows us to do that. (Some licenses don’t allow redistribution, and require you to get the libraries from official sources.)

@teor2345 (Contributor) left a comment

Looks good!

@nazar-pc added this pull request to the merge queue Oct 9, 2024
Merged via the queue into main with commit 6c0c789 Oct 10, 2024
11 checks passed
@nazar-pc deleted the improve-gpu-support branch October 10, 2024 00:24