AMD GPU support #20

Open · haampie wants to merge 2 commits into develop
Conversation

@haampie commented Jan 21, 2021

Adds a hook for AMD GPUs, which currently just mounts /dev/dri and /dev/kfd as advocated by AMD.

The hook can be enabled through the following flag:

sarus run --amdgpu [container] [cmd]

It will just fail when /dev/dri or /dev/kfd does not exist or can't be mounted.

- Bind mount /dev/dri and /dev/kfd in the rootfs
- Add the amdgpu hook and install it by default
- Enable amdgpu when --amdgpu is passed
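For reference, a minimal sketch of the bind-mount step described above, calling mount(2) directly. The helper name, the error handling, and the way the container rootfs path is passed in are assumptions for illustration; this is not the actual hook code from this PR.

#include <sys/mount.h>   // mount, MS_BIND, MS_REC
#include <sys/stat.h>    // stat, mkdir
#include <fcntl.h>       // open, O_CREAT
#include <unistd.h>      // close
#include <stdexcept>
#include <string>

// Illustrative only: bind-mount a host device path into the container rootfs.
static void bindMountDevice(const std::string& hostPath, const std::string& rootfs, bool isDirectory) {
    const std::string target = rootfs + hostPath;
    struct stat st;
    if (stat(hostPath.c_str(), &st) != 0) {
        throw std::runtime_error(hostPath + " does not exist on the host");
    }
    if (isDirectory) {
        mkdir(target.c_str(), 0755);                              // create the mount point (EEXIST is fine)
    } else {
        int fd = open(target.c_str(), O_CREAT | O_WRONLY, 0644);  // a file bind mount needs an existing target
        if (fd >= 0) close(fd);
    }
    if (mount(hostPath.c_str(), target.c_str(), nullptr, MS_BIND | MS_REC, nullptr) != 0) {
        throw std::runtime_error("failed to bind-mount " + hostPath + " into the container");
    }
}

// Usage inside the hook would be along the lines of:
//   bindMountDevice("/dev/dri", rootfs, true);
//   bindMountDevice("/dev/kfd", rootfs, false);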
@haampie changed the base branch from master to develop on Jan 21, 2021
@Madeeks self-requested a review on Jan 22, 2021
@Madeeks added the "enhancement" (New feature or request) label on Jan 22, 2021
@Madeeks (Member) left a comment

Hi @haampie, thanks for opening this PR!

The baseline implementation looks good!
I have a few questions for you:

  • If I understand correctly, the code is for single-GPU systems. What would happen on a multi-GPU system?
  • The integration with the NVIDIA Container Toolkit does not require any additional CLI option. Is there a feature of the ROCm environment which could be leveraged to obtain a similar experience? If a user requests GPU hardware (and in this case, a specific GPU architecture) through the workload manager, ideally there should be no need to repeat the request to the container engine.

install(FILES templates/hooks.d/09-slurm-global-sync-hook.json.in DESTINATION ${CMAKE_INSTALL_PREFIX}/etc/hooks.d)
install(FILES templates/hooks.d/11-amdgpu-hook.json.in DESTINATION ${CMAKE_INSTALL_PREFIX}/etc/hooks.d)
Member:

I would not add this hook to a default installation, mainly because it is targeted at very specific hardware, and therefore it should be explicitly chosen by the system administrator (like the MPI and NVIDIA hooks).
Another reason would be that at present we have no way to test it as part of the automated tests.

#include "AmdGpuHook.hpp"

#include <vector>
#include <fstream>
Member:

Are all the headers here needed? For example, I don't think you need fstream, boost/regex, and possibly others.

#define sarus_hooks_amdgpu_AmdGpuHook_hpp

#include <vector>
#include <unordered_map>
Member:

As pointed out for the .cpp file above, could you check if all headers are effectively used?

@haampie (Author) commented Jan 25, 2021

Hi @Madeeks, I haven't tested this for multiple GPUs, but in principle it should work. Every GPU should be listed in /dev/dri/card{n} for n = 0, 1, ..., and this PR mounts /dev/dri entirely.

I'll think about autodetection like we have for NVIDIA GPUs, but I didn't immediately know what to check. AMD likes to install /opt/rocm/bin/hipconfig to check the version of the ROCm libs, but that doesn't imply there are actual GPUs available. Maybe the best approach is to check whether vendor data is available from /dev/dri/card* and/or /dev/kfd.
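One possible shape for such a vendor check (an assumption, not part of this PR): the DRM subsystem exposes the PCI vendor ID of each card under /sys/class/drm/card*/device/vendor, and AMD's vendor ID is 0x1002, so the hook could scan those files instead of probing the ROCm userspace.

#include <filesystem>
#include <fstream>
#include <string>

// Illustrative: does any /sys/class/drm/card* belong to an AMD GPU (PCI vendor 0x1002)?
static bool hostHasAmdGpu() {
    namespace fs = std::filesystem;
    const fs::path drm{"/sys/class/drm"};
    if (!fs::exists(drm)) return false;
    for (const auto& entry : fs::directory_iterator(drm)) {
        const std::string name = entry.path().filename().string();
        if (name.rfind("card", 0) != 0) continue;                // only card0, card1, ...
        std::ifstream vendorFile{entry.path() / "device" / "vendor"};
        std::string vendor;
        if (vendorFile >> vendor && vendor == "0x1002") {        // 0x1002 == AMD
            return true;
        }
    }
    return false;
}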

@haampie (Author) commented Jan 25, 2021

Ok, so the way rocm_agent_enumerator detects AMD GPUs is by calling hsa_iterate_agents, which is available from a Spack package (https://github.com/spack/spack/blob/develop/var/spack/repos/builtin/packages/hsa-rocr-dev/package.py), but it depends on AMD's fork of LLVM :D so it's not a great dependency to just add to Sarus.

Another idea is to check if rocminfo is in the PATH or if /opt/rocm/bin/rocminfo exists, and if so execute it and grep the output for some string. That's a bit ugly, but probably the easiest option.
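A rough sketch of that rocminfo idea (illustrative; the binary path and the matched string are assumptions based on the rocminfo output quoted later in this thread, which prints a "Device Type: GPU" line per GPU agent):

#include <array>
#include <cstdio>
#include <string>

// Illustrative: run rocminfo and look for at least one GPU agent in its output.
static bool rocminfoReportsGpu() {
    FILE* pipe = popen("/opt/rocm/bin/rocminfo 2>/dev/null", "r");
    if (!pipe) return false;
    std::array<char, 512> line{};
    bool found = false;
    while (fgets(line.data(), line.size(), pipe)) {
        const std::string s{line.data()};
        if (s.find("Device Type") != std::string::npos && s.find("GPU") != std::string::npos) {
            found = true;   // e.g. "Device Type:             GPU"
        }
    }
    return pclose(pipe) == 0 && found;
}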

@Madeeks (Member) commented Feb 5, 2021

Let me elaborate a bit more on my question about the hook interface and device selection.

The CUDA runtime uses the CUDA_VISIBLE_DEVICES environment variable to determine the GPU devices applications have access to. The NVIDIA Container Toolkit uses NVIDIA_VISIBLE_DEVICES to determine which GPUs to mount inside the container. By checking for the presence of such variables, Sarus does not need an explicit CLI option to know if the host process is requesting GPU devices (and which ones).

I was wondering if there were analogous variables in the ROCm environment.
A quick search brought me to the following issues: ROCm/ROCm#841, ROCm/ROCm#994
From what I understand there are 2 variables which cover similar roles: HIP_VISIBLE_DEVICES and ROCR_VISIBLE_DEVICES.
I don't have experience with ROCm, so in your opinion, can either of those be used to control hook activation? If so, which one is the most appropriate? How do the numerical IDs in those variables relate to the /dev/dri/* files?

As an additional reference, the GRES plugin of Slurm sets CUDA_VISIBLE_DEVICES to the GPUs allocated by the workload manager. What's the mechanism implemented by Slurm (or other workload managers) to signal allocation of AMD GPUs?
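For reference, a minimal sketch of what env-var-driven activation could look like if one of the ROCm variables turns out to be suitable; which variable is authoritative is exactly the open question here, so both names below are assumptions:

#include <cstdlib>
#include <initializer_list>

// Illustrative: activate the AMD GPU hook only if the host environment requests GPU devices,
// mirroring how NVIDIA_VISIBLE_DEVICES / CUDA_VISIBLE_DEVICES are used on the NVIDIA side.
static bool amdGpuRequested() {
    for (const char* var : {"ROCR_VISIBLE_DEVICES", "HIP_VISIBLE_DEVICES"}) {
        const char* value = std::getenv(var);
        if (value != nullptr && *value != '\0') {
            return true;
        }
    }
    return false;
}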

@haampie (Author) commented Feb 8, 2021

Ah, Ault is configured such that by default you get all GPUs.

$ srun -p amdvega /bin/bash -c 'echo "ROCM_VISIBLE_DEVICES: $ROCR_VISIBLE_DEVICES"; /opt/rocm/bin/rocm_agent_enumerator; ls /dev/dri/card*'
ROCM_VISIBLE_DEVICES: 
gfx000
gfx906
gfx906
gfx906
/dev/dri/card0
/dev/dri/card1
/dev/dri/card2
/dev/dri/card3

$ srun -p amdvega --gres=gpu:1 /bin/bash -c 'echo "ROCM_VISIBLE_DEVICES: $ROCR_VISIBLE_DEVICES"; /opt/rocm/bin/rocm_agent_enumerator; ls /dev/dri/card*'
ROCM_VISIBLE_DEVICES: 0
gfx000
gfx906
/dev/dri/card0
/dev/dri/card1
/dev/dri/card2
/dev/dri/card3

$ srun -p amdvega --gres=gpu:3 /bin/bash -c 'echo "ROCM_VISIBLE_DEVICES: $ROCR_VISIBLE_DEVICES"; /opt/rocm/bin/rocm_agent_enumerator; ls /dev/dri/card*'
ROCM_VISIBLE_DEVICES: 0,1,2
gfx000
gfx906
gfx906
gfx906
/dev/dri/card0
/dev/dri/card1
/dev/dri/card2
/dev/dri/card3

$ srun -p amdvega --gres=gpu:2 /bin/bash -c '/opt/rocm/bin/rocminfo | grep GPU'
  Uuid:                    GPU-3f50506172fc1a63               
  Device Type:             GPU                                
  Uuid:                    GPU-3f4478c172fc1a63               
  Device Type:             GPU                                

$ srun -p amdvega --gres=gpu:2 /bin/bash -c '/opt/rocm/opencl/bin/clinfo | grep Number'
Number of platforms:				 1
Number of devices:				 2

@haampie (Author) commented Feb 8, 2021

So, ROCR_VISIBLE_DEVICES is only set when --gres=gpu[:n] is provided. When it is set, I think it's handled at the software level by the ROCm stack, so we might not want to bother with the bookkeeping of mounting exactly those specific GPUs from /dev/dri, but rather leave that to ROCm. For instance:

$ ROCR_VISIBLE_DEVICES=1,2 sarus run -t --mount=type=bind,src=/dev/kfd,dst=/dev/kfd --mount=type=bind,src=/dev/dri,dst=/dev/dri stabbles/sirius-rocm /opt/spack/opt/spack/linux-ubuntu20.04-x86_64/gcc-9.3.0/rocminfo-4.0.0-lruzhymnjm4hez3jeuyf3kyhmjjloqyp/bin/rocm_agent_enumerator
gfx000
gfx906
gfx906

How about we just unconditionally mount /dev/kfd and /dev/dri when they exist?


Edit: in fact, I find it confusing to mount only a few specific GPUs, because ROCR_VISIBLE_DEVICES=1,2 should then be unset or relabeled to ROCR_VISIBLE_DEVICES=0,1 inside the container:

$ ls /dev/dri/
by-path  card0  card1  card2  card3  renderD128  renderD129  renderD130


$ ROCR_VISIBLE_DEVICES=1,2 sarus run \
  --mount=type=bind,src=/dev/kfd,dst=/dev/kfd \
  --mount=type=bind,src=/dev/dri/renderD129,dst=/dev/dri/renderD129 \
  --mount=type=bind,src=/dev/dri/renderD130,dst=/dev/dri/renderD130 \
  stabbles/sirius-rocm /bin/bash -c '/opt/spack/opt/spack/linux-ubuntu20.04-x86_64/gcc-9.3.0/rocminfo-4.0.0-lruzhymnjm4hez3jeuyf3kyhmjjloqyp/bin/rocminfo'
.. only shows 1 gpu because ROCR_VISIBLE_DEVICES is still 1,2 and the GPUs are labeled 0,1 now ...

$ ROCR_VISIBLE_DEVICES=1,2 sarus run \
  --mount=type=bind,src=/dev/kfd,dst=/dev/kfd \
  --mount=type=bind,src=/dev/dri/renderD129,dst=/dev/dri/renderD129 \
  --mount=type=bind,src=/dev/dri/renderD130,dst=/dev/dri/renderD130 \
  stabbles/sirius-rocm /bin/bash -c 'unset ROCR_VISIBLE_DEVICES && /opt/spack/opt/spack/linux-ubuntu20.04-x86_64/gcc-9.3.0/rocminfo-4.0.0-lruzhymnjm4hez3jeuyf3kyhmjjloqyp/bin/rocminfo'
... shows 2 gpus correctly ...
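A minimal sketch of the relabeling alternative mentioned in the edit above, assuming the hook mounts exactly the devices listed in the host's ROCR_VISIBLE_DEVICES: since the mounted devices are renumbered from zero inside the container, the variable would have to be rewritten to 0,1,...,n-1 (or unset, as in the working example above).

#include <sstream>
#include <string>
#include <vector>

// Illustrative: ROCR_VISIBLE_DEVICES=1,2 on the host becomes 0,1 inside the container
// when only those two devices are bind-mounted.
static std::string relabelVisibleDevices(const std::string& hostValue) {
    // Count the comma-separated device ids requested on the host...
    std::vector<std::string> ids;
    std::stringstream ss{hostValue};
    for (std::string id; std::getline(ss, id, ','); ) {
        if (!id.empty()) ids.push_back(id);
    }
    // ...and emit the container-local numbering 0,1,...,n-1.
    std::string result;
    for (std::size_t i = 0; i < ids.size(); ++i) {
        if (i > 0) result += ",";
        result += std::to_string(i);
    }
    return result;
}

// relabelVisibleDevices("1,2") == "0,1"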
