Port GPUMD to HIP? #404

Dankomaister · 2023-04-07T12:40:28Z

Many of the latest HPC systems, such as Frontier, LUMI, Adastra, and Dardel, are equipped with AMD GPUs. Therefore, it would be great if GPUMD could run on both NVIDIA and AMD GPUs. Thus, are there any plans or ongoing efforts to port GPUMD to AMD's Heterogeneous Interface for Portability (HIP)? It appears that only minimal changes are required to port a CUDA application to HIP without any loss in performance.

brucefan1983 · 2023-04-07T12:50:18Z

This is a good question. Currently I don't have access to any HIP system. Some callaborator has proposed to port GPUMD to AMD in LUMI and if that is granted, I may get access to HIP and try it out.

Dankomaister · 2023-04-08T13:24:00Z

Okay, sounds good!
We have access to systems with AMD GPUs (Dardel) if I have time I might do some tests to see how much work is need for a HIP port.

Dankomaister · 2023-06-30T02:33:36Z

Hi,

I made a first attempt to port the code from CUDA to HIP.

After cloning the GPUMD git repository (v3.8) and loading the appropriate ROCm module. Porting starts by first converting all CUDA calls to HIP calls. This is done automatically using hipconvertinplace-perl.sh according to the HIP Porting Guide

hipconvertinplace-perl.sh GPUMD/

While this does most of the work, some manual changes to the code are needed, which I will list below.

Since HIP currently does not support warp sync all the __syncwarp() statements must be changed. Not sure what is the best approach here, but replacing __syncwarp() with __syncthreads() seems to work, this may have a negative impact on performance?

Furthermore, some changes are need to the parts of the code which relies on prefix-sums from the thrust library. While this library as been ported to HIP there are some changes needed. Specifically thrust::cuda::par.on(stream) does not exist in the HIP version of the library. Again not sure what is the best approach but changing thrust::cuda::par.on(stream) to thrust::host seems to work.

Finally, the cusolver_wrapper.cu code had to be manually ported since for some reason the hipconvertinplace-perl.sh script could not handle this.

There were probably a few more minor changes that needed to be made to get the code to compile which I may have forgotten to mention. However, this diff file cu_to_hip.patch contains all the changes required to port the code to HIP.

To compile one simply have to change the make file to use hipcc instead of nvcc

###########################################################
# some flags
###########################################################
CC = hipcc
CFLAGS = -std=c++14 -O3
INC = -I./
LDFLAGS = 
LIBS = -lhipblas -lhipsolver

After the compiling I tested both training with nep and MD with gpumd which both worked fine without any problems. This is very promising for moving the GPUMD code from CUDA to HIP! It is also worth noting that one does not need access to an AMD GPU to develop HIP code since hipcc can also compile code for Nvidia GPUs!

brucefan1983 · 2023-06-30T16:38:13Z

This is a great news to hear about during my holiday.

I didn't know it can be so "easy" to port from CUDA to HIP for the whole GPUMD.

Replacing __syncwarp() with __syncthreads() does not affect the performance much, because the functions with this kind of syncrhonization are by no means a bottleneck. I can even remove all __syncwarp() to simiplify future work.

changing thrust::cuda::par.on(stream) to thrust::host might affect the performance noticeably, and tests are needed.

Perhaps it is better to change the CUDA part such that there is little or no manual tuning after the automatic conversion. Then we can always develop the CUDA version and make an HIP one for each realease in the future.

How to you think?

Dankomaister · 2023-08-22T03:16:47Z

I think it might be a good idea to have a separate HIP version of the GPUMD code to work on the CUDA part until porting it to HIP is seamless.
Then more testing can be done to ensure that the HIP version is not significantly slower than the CUDA version, after that maybe switching to HIP is a good idea?

I wanted to work a bit more with the HIP port and make some benchmarks to compare with the CUDA version but unfortunately I haven't had time to do this.

brucefan1983 · 2023-08-22T16:40:04Z

Thanks for your work on this direction. I also hope to have a look when I have the tools.

njzjz · 2023-09-29T20:30:32Z

It is a pain to maintain two similar codes, so I suggest GPUMD merge CUDA and HIP codes into the same files. I have done it for DeePMD-kit; see deepmodeling/deepmd-kit#2838.

brucefan1983 · 2023-09-29T21:03:15Z

Yes, this should be done after the hip version is working.

Knight-WP · 2024-03-18T02:50:38Z

Hi,

I made a first attempt to port the code from CUDA to HIP.

After cloning the GPUMD git repository (v3.8) and loading the appropriate ROCm module. Porting starts by first converting all CUDA calls to HIP calls. This is done automatically using hipconvertinplace-perl.sh according to the HIP Porting Guide
hipconvertinplace-perl.sh GPUMD/
While this does most of the work, some manual changes to the code are needed, which I will list below.

Since HIP currently does not support warp sync all the __syncwarp() statements must be changed. Not sure what is the best approach here, but replacing __syncwarp() with __syncthreads() seems to work, this may have a negative impact on performance?

Furthermore, some changes are need to the parts of the code which relies on prefix-sums from the thrust library. While this library as been ported to HIP there are some changes needed. Specifically thrust::cuda::par.on(stream) does not exist in the HIP version of the library. Again not sure what is the best approach but changing thrust::cuda::par.on(stream) to thrust::host seems to work.

Finally, the cusolver_wrapper.cu code had to be manually ported since for some reason the hipconvertinplace-perl.sh script could not handle this.

There were probably a few more minor changes that needed to be made to get the code to compile which I may have forgotten to mention. However, this diff file cu_to_hip.patch contains all the changes required to port the code to HIP.

To compile one simply have to change the make file to use hipcc instead of nvcc
###########################################################
# some flags
###########################################################
CC = hipcc
CFLAGS = -std=c++14 -O3
INC = -I./
LDFLAGS = 
LIBS = -lhipblas -lhipsolver
After the compiling I tested both training with nep and MD with gpumd which both worked fine without any problems. This is very promising for moving the GPUMD code from CUDA to HIP! It is also worth noting that one does not need access to an AMD GPU to develop HIP code since hipcc can also compile code for Nvidia GPUs!

Hello, have you tried running on multiple cards? It runs normally on a single AMD card for me, but multiple card runs are interrupted.

Dankomaister · 2024-08-03T08:07:22Z

Did some more work on porting GPUMD to HIP, here's an updated diff for GPUMD-v3.8:
v3.8-cu_to_hip.patch

The main difference is that __syncwarp() has been removed instead of being replaced with __syncthreads() and thrust::cuda::par.on(stream) has been replaced with thrust::hip::par.on(stream) instead of thrust::host which allows running on multiple GPUs.

I also ported GPUMD-v3.9.4 to HIP, here is the diff:
v3.9.4-cu_to_hip.patch Training works fine but MD runs fail with segmentation fault... can't figure out what the problem is here.

Since I have access to many different GPUs, I also did some benchmarking and compared the performance of the AMD Instinct MI250X running the HIP ported version to several different Nvidia GPUs running the CUDA version. Here are the results for a 1 million atom MD run with -DUSE_TABLE.

It's worth noting that the MI250X has two GCDs on each physical card with each GCD acting as a separate GPU much like a dual GPU card from Nvidia. The official performance figures given by AMD for the MI250X are 47.9 TFLOPs (FP32 and FP64) i.e. 23.95 TFLOPs per GCD. With that in mind, we'd expect the MI250X to land somewhere between the A100 and H100. However, as seen from the benchmark, the performance (1 GCD) is closer to A40 or V100, so there is likely a lot more speed left on the table for the HIP ported version.

I also tested the parallel scaling when running MD on multiple GPUs (GCDs).

For this system, parallel scaling is decent up to 4 GCD (2 cards) after which performance drops. As a bonus here is "parallel scaling" on a single H100 with the SMs partitioned with multi-instance GPU (MIG).

Besides MD, I also benchmarked training (full batch) with v3.9.4.

Here, the performance of the MI250X (1 GCD) is even worse compared to Nvidia GPUs, landing closer to the T4. The lower performance of the MI250X in training is also reflected in the power draw where MD shows a 402 W power draw, but training only 334 W.

Parallel scaling when training is excellent as shown below.

To summarize, here is the performance relative to the V100 for both MD and training.

For the HIP port, there are two things that need to be resolved before it is "production ready". One is the segmentation fault that occurs when running MD on v3.9.4, the other is the poor performance compared to the CUDA version.

brucefan1983 · 2024-08-03T14:36:47Z

Thanks for the detailed benchmark!

So HIP does not need synchronization within warp, so __syncwarp() is commented out?

brucefan1983 · 2024-08-14T19:24:53Z

It seems to be rather clear on what to do with the combination of CUDA and HIP.
Do you want to make a PR or should I do it? @Dankomaister

Dankomaister · 2024-09-08T23:53:05Z

I guess there are a couple of things to solve like getting MD working with the latest versions of the code and also multiple GPU works in v3.8 but the results are incorrect.

elindgren · 2024-10-31T12:52:18Z

@Dankomaister @brucefan1983 What packages/headers does one need to compile GPUMD with HIP locally? I'm trying to compile GPUMD in a docker container, and I get an error referring to hiprand_kernel.h not found. I use an official ROCm image, rocm/ubuntu-24.04-dev.

Am I right in using a ROCm image, or have I got things backwards?

These are the relevant lines from my Dockerfile:

#FROM nvcr.io/nvidia/cuda:12.2.2-devel-ubuntu22.04
FROM rocm/dev-ubuntu-24.04

# Install python 3.11
RUN apt-get update
RUN apt-get install -y python3-pip python3-dev git

# Install HIP headers
RUN apt-get install -y hip-dev rocm-llvm-dev

# Test HIP
RUN /opt/rocm/bin/hipconfig --full

# download GPUMD
WORKDIR /gpumd
RUN git clone https://github.com/brucefan1983/GPUMD.git
WORKDIR /gpumd/GPUMD/src
RUN make -f makefile.hip
# Make executables available globally
RUN cp nep /usr/bin/nep
RUN cp gpumd /usr/bin/gpumd

Dankomaister · 2024-11-08T04:57:40Z

Have not tested this but maybe you can install the hiprand and libhiprand-dev packages?

brucefan1983 added the new feature label Apr 8, 2023

elindgren mentioned this issue Oct 24, 2024

From CUDA only to CUDA+HIP #768

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Port GPUMD to HIP? #404

Port GPUMD to HIP? #404

Dankomaister commented Apr 7, 2023

brucefan1983 commented Apr 7, 2023

Dankomaister commented Apr 8, 2023

Dankomaister commented Jun 30, 2023

brucefan1983 commented Jun 30, 2023

Dankomaister commented Aug 22, 2023

brucefan1983 commented Aug 22, 2023

njzjz commented Sep 29, 2023

brucefan1983 commented Sep 29, 2023

Knight-WP commented Mar 18, 2024

Dankomaister commented Aug 3, 2024

brucefan1983 commented Aug 3, 2024

brucefan1983 commented Aug 14, 2024

Dankomaister commented Sep 8, 2024

elindgren commented Oct 31, 2024 •

edited

Loading

Dankomaister commented Nov 8, 2024

Port GPUMD to HIP? #404

Port GPUMD to HIP? #404

Comments

Dankomaister commented Apr 7, 2023

brucefan1983 commented Apr 7, 2023

Dankomaister commented Apr 8, 2023

Dankomaister commented Jun 30, 2023

brucefan1983 commented Jun 30, 2023

Dankomaister commented Aug 22, 2023

brucefan1983 commented Aug 22, 2023

njzjz commented Sep 29, 2023

brucefan1983 commented Sep 29, 2023

Knight-WP commented Mar 18, 2024

Dankomaister commented Aug 3, 2024

brucefan1983 commented Aug 3, 2024

brucefan1983 commented Aug 14, 2024

Dankomaister commented Sep 8, 2024

elindgren commented Oct 31, 2024 • edited Loading

Dankomaister commented Nov 8, 2024

elindgren commented Oct 31, 2024 •

edited

Loading