Add amdgpu target #823

Flakebi · 2024-12-26T01:14:18Z

Proposal

Add the amdgpu target to rustc that allows to generate code for AMD GPUs.

The LLVM backend has good support for this backend. The goal is to expose this to Rust, enabling Rust as another language on these GPUs. The main target is compute capabilities. This is in contrast to the rust-gpu project, which targets graphics capabilities through spir-v (the graphics variant). The base runtime to run compute programs on AMD GPUs is HSA (Heterogeneous System Architecture), which is implemented in ROCR-Runtime. Therefore, the Rust backend should target the amdhsa OS. To support the target in rustc, LLVM needs to be compiled with the amdgpu backend enabled.

There are two points in which the amdgpu target is different from other (mainstream/x86) targets.

Address spaces

Address spaces can be thought of as denoting different physical memory areas (this is a thought concept, it can be that way in hardware, but it does not need to be). In LLVM IR, each pointer has an address space, defaulting to addrspace(0) (this is implicit in textual IR, which is why you won’t see it there). Different address spaces have different properties, e.g. they can have a different pointer size, the nullptr can be different (e.g. 0 vs -1) and the machine instructions used to access them can be different.

The amdgpu LLVM backend makes heavy use of address spaces. This is also a problem for other targets that want to support Rust (though mostly more exotic ones). The use of address spaces leads to situations, where a pointer in one address space needs to be casted to a pointer in a different address space. In LLVM IR, bitcast is invalid for this case, addrspacecast needs to be used.

The changes to rustc code should be mostly about fixing problems in a rather contained way. (I don’t know for sure how contained it will be, but compiling core required surprisingly few changes as can be seen in rust-lang/rust#134740; disclaimer: I only tried running a very simple program with -Zbuild-std=core so far.) The changes will bring the rust llvm backend closer to how LLVM envisions address spaces, which should make it easier to support other future targets that use more address spaces (there was already some work for other targets, which in turn made it easier for amdgpu).

To get a feeling for what LLVM address spaces are used for, here is the list of the important amdgpu address spaces (from https://llvm.org/docs/AMDGPUUsage.html#address-spaces):

0 (generic/flat): This is a “catch all”. Loads and stores to addrspace(0) may go to any of the below address spaces. The hardware switches at runtime, depending on the pointer. This works for all pointers, but is the slowest.
1 (global): This is basically VRAM (i.e. the most “normal” memory).
3 (local/LDS): This is different memory in hardware, basically a software-defined cache. Loads and stores are a lot faster than to VRAM, but not globally visible to all other threads. (Known as groupshared in HLSL or shared in GLSL.)
4 (constant): Same memory region as global (1), but guaranteed to be constant throughout the program. This allows using different instructions that are faster by going through a different cache.
5 (private/scratch): This is used for the stack, all allocas need to be in addrspace(5). A thread can only access its own private memory.

Basic support for the amdgpu target means using address spaces 1 (incoming pointers), 5 (allocas) and 0 (if we don’t know which one it is). Support for groupshared memory in the language requires its own RFC (something similar to thread_local probably makes sense).

Many processors / target-cpus

Every generation of GPUs uses different machine code (to some extent). LLVM supports them as different “cpu”s or processors (-Ctarget-cpu= argument for rustc).

There are two challenges for this regarding Rust support

A single processor needs to be the default in the target description.
If Rust would distribute compiled libraries (e.g. core), it would need to be for all processors to be useful.

There is no obvious choice for the default processor. If some processor is the default and a user tries to use the amdgpu target without overwriting the target-cpu, it likely results in an unusable binary. We could use a non-existing “cpu” as the default, resulting in compiler errors, to make users aware of the need to set a target-cpu.
E.g "please specify -Ctarget-cpu" results in failing compilations and warnings:

'please specify -Ctarget-cpu' is not a recognized processor for this target (ignoring processor)

Alternatively, some choice can be made, like gfx900.

Regarding 2., there may be a generic backend for amdgpu in the future, relying on spir-v (the compute variant) which would solve both these issues, but that is a long way to go and unsure if it eventually happens. For the reason of binary size alone, it does not make sense for Rust to distribute pre-compiled code for the amdgpu backend. Users should instead specify their processor via -Ctarget-cpu= and compile core via -Zbuild-std=core or similar means.

The list of processors supported by LLVM is here: https://llvm.org/docs/AMDGPUUsage.html#processors

Related issues

The PR to add the amdgpu target: Add amdgpu target rust#134740
The original tracking issue for amdgpu (has been closed for a while): Tracking issue for targeting AMDGPU devices rust#51575

Mentors or Reviewers

None yet, this is my first Rust contribution :)

Process

The main points of the Major Change Process are as follows:

File an issue describing the proposal.
A compiler team member or contributor who is knowledgeable in the area can second by writing @rustbot second.
- Finding a "second" suffices for internal changes. If however, you are proposing a new public-facing feature, such as a -C flag, then full team check-off is required.
- Compiler team members can initiate a check-off via @rfcbot fcp merge on either the MCP or the PR.
Once an MCP is seconded, the Final Comment Period begins. If no objections are raised after 10 days, the MCP is considered approved.

You can read more about Major Change Proposals on forge.

Comments

This issue is not meant to be used for technical discussion. There is a Zulip stream for that. Use this issue to leave procedural comments, such as volunteering to review, indicating that you second the proposal (or third, etc), or raising a concern that you would like to be addressed.

The text was updated successfully, but these errors were encountered:

rustbot · 2024-12-26T01:14:20Z

This issue is not meant to be used for technical discussion. There is a Zulip stream for that. Use this issue to leave procedural comments, such as volunteering to review, indicating that you second the proposal (or third, etc), or raising a concern that you would like to be addressed.

Concerns or objections to the proposal should be discussed on Zulip and formally registered here by adding a comment with the following syntax:

@rfcbot concern reason-for-concern 
<description of the concern>

Concerns can be lifted with:

@rfcbot resolve reason-for-concern

See documentation at https://forge.rust-lang.org

cc @rust-lang/compiler

Flakebi added major-change A proposal to make a major change to rustc T-compiler Add this label so rfcbot knows to poll the compiler team labels Dec 26, 2024

rust-lang locked and limited conversation to collaborators Dec 26, 2024

rustbot added the to-announce Announce this issue on triage meeting label Dec 26, 2024

apiraino removed the to-announce Announce this issue on triage meeting label Dec 26, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add amdgpu target #823

Add amdgpu target #823

Flakebi commented Dec 26, 2024

rustbot commented Dec 26, 2024