-
Notifications
You must be signed in to change notification settings - Fork 163
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Relax GOT indirection into PC-relative addressing #397
Conversation
The following question is more about the implementation than the ABI, but does this also mean we can sometimes garbage-collect the corresponding GOT entry? That would also serve to slightly improve spatial locality in GOT accesses. |
@aswaterman Yes we can, and the following sentence in this proposal should cover that topic.
|
As a benchmark, lld contains about 100k |
Patch for GNU assembler: https://sourceware.org/pipermail/binutils/2023-September/129523.html |
Seems fine to me. The AArch64 scheme (I participated in the review) has the nice property that adrp+ldr costs just 2 relocations while ours costs 4. AArch64 did not want to introduce new relocation types for compatibility with existing toolchains. Technically, we can allow GOT to PC-relative optimization even in the absence of relaxation. With The AArch64 ABI has explicitly called out properties when an optimization can be made: non-preemptible & not-ifunc (https://maskray.me/blog/2021-01-18-gnu-indirect-function#non-preemptible-ifunc). We shall also disallow relaxation to an absolute symbol for -shared/-pie links. |
Just like AArch64, I didn't want to introduce a new relocation for backward compatibility. In hindsight, the RISC-V psABI could have defined
You made a good point. Since the second |
I made a change to the proposal so that it requires only one |
This is what I was suggesting a while back when specifying that undef weak symbols be forced to be via the GOT for medany, so thanks for doing this. This should allow us to remove the note that allows GCC to not do that. |
@jrtc27 This proposal doesn't really relax references to unresolved weak symbols for the medany code model, because we handle only PC-relative references in this proposal. Since unresolved weak symbols are handled as if they were absolute symbols at address 0, their addresses aren't PC-relative constants. But we could relax references to absolute symbols, too, so let me add that to this proposal. I intentionally omitted it at first because I thought that the use case is rare, but weak symbols are indeed a fairly important use case. |
f923a9a
to
45c50c0
Compare
Updated the proposal to allow to relax an absolute symbol at a very low address. Please take a look. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
riscv-elf.adoc
Outdated
ld t0, 0(t0) # R_RISCV_PCREL_LO12_I (label), R_RISCV_RELAX | ||
---- | ||
|
||
Relaxation result (absolute symbol whose address is within `0x0` ~ `0x1f` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whose address is within
0x0
~0x1f
Where does this come from?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, c.lui
was a typo of c.li
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the address 0x0 to 0x1f?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because c.li
can set only a signed 6 bit value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
more precisely range: "0x0 ~ 0x1f or 0xffff ffff ffff ff20 ~ 0xffffffffffffffff for RV64 and 0xffffff20 ~ 0xffffffff for RV32." since it's sign-extended value?
1f25a71
to
41dd3be
Compare
riscv-elf.adoc
Outdated
with `R_RISCV_RELAX`. | ||
|
||
- `R_RISCV_GOT_HI20` refers to the location 4 bytes before where | ||
`R_RISCV_PCREL_LO12_I` points. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why? Who cares where they are relative to each other.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to exclude the pattern that one GOT_HI20 is shared by multiple PCREL_LO_I because it'd complicates the specification a lot. This optimization focuses on optimizing la
pseudo instruction
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not complicated at all, just require that the GOT_HI20 and all referencing PCREL_LO12_I's are relaxable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
riscv-elf.adoc
Outdated
- The symbol pointed to by `R_RISCV_PCREL_LO12_I` is at the location to | ||
which `R_RISCV_GOT_HI20` refers. | ||
|
||
- The symbol's address is a link-time constant. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's not what you mean. What you mean is that it's bound at link time. Otherwise a PIE, or DSO with suitable binding, wouldn't be able to use it, as the object would have an unknown load base and thus the address wouldn't be a link-time constant, just its PC-relative offset.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant to say "The symbol's PC-relative address is a link-time constant." Does this make sense to you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's not true for absolute symbols. Just do it in terms of "the symbol is bound at link time to within the object / is defined in the object and not preemptible" or similar, because that's what the important part is. Well, with undef weak catered for...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The actual condition is a bit more complicated due to GNU ifunc. Let me rearrange this section to explain it more clearly.
symbol's address from a GOT entry with an `auipc` + `ld` instruction | ||
pair. | ||
|
||
Condition:: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The language in here needs tweaking to more obviously cover the case when a single GOT_HI20 is shared by multiple PCREL_LO12_I
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we want to handle such case. This optimization is to relax la
and not others. It looks like a GOT_HI20 is almost always paired with a PCREL_LO_I that immediately follows, so this should be sufficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LLVM does not model them as a pair, it models them as independent instructions. In practice you will put them together to benefit from macro-op fusion, but there's no requirement to do so in the ABI or in LLVM's internals, so the ABI should not start to introduce arbitrary restrictions that have no technical grounding other than being marginally more annoying to write a specification for. Please specify the actual necessary restrictions.
My reference to AArch64 is meant to highlight that GOT optimization is not necessarily linked with linker relaxation. I am concerned that using 2 The auipc+addi result is that is mostly relevant as the two other forms have stringent address requirement and are position-dependent only. If a toolchain decides to allow auipc+addi optimization using this scheme, relocatable object files will expense 2 . I think x86-64 GOTPCRELX provides no difference on slightly larger applications. It just optimizes some microbenchmarks. I'll be out of town since Friday for about 10 days. Sorry for being unable to respond quickly during that time. |
@MaskRay I chose to require But I might be overthinking. A value computed by Is there anyone who thinks we should require 'R_RISCV_RELAX' for %got_pcrel_hi20? If no one wants it, I can remove it from the proposal. |
I tend to mark with R_RISC_RELAX, one consideration is #393, also for consistency with other relaxation, of cause we can call it optimization rather than relax, but I am not sure does it means should we also disable those optimization when -mno-relax is given? That's kind of introducing new concept into psABI I think. |
@kito-cheng Yeah, that makes sense. We have never removed an instruction or replaced it with a compact one if there's no |
Another random idea come to my mind - we can call it optimization IF we don't perform any code shrinking, so we may also optimize this case with replacing auipc with nop like other architecture, also same idea can apply to other existing relaxation too. But it just throw an idea not review comments, so don't putting into this PR :p And that's may useful for those scenarios which disable linker relaxation. |
That we have too many RELAX relocations is indeed a problem, but that's not new and not specific to this proposal. I created an issue for the topic, so let's discuss that here: #401 |
riscv-elf.adoc
Outdated
- The symbol pointed to by `R_RISCV_PCREL_LO12_I` is at the location to | ||
which `R_RISCV_GOT_HI20` refers. | ||
|
||
- If the symbol is absolute, its address is within `0x0` ~ `0x7ff`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
more precisely range: "0x0
~ 0x7ff
or 0xfffffffffffff800
~ 0xffffffffffffffff
for RV64 and 0xfffff800
~ 0xffffffff
for RV32." since it's sign-extended value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is mainly for undefined weak symbols whose address is zero, but yeah, we can technically use it for the highest addresses. I'll update the proposal.
riscv-elf.adoc
Outdated
ld t0, 0(t0) # R_RISCV_PCREL_LO12_I (label), R_RISCV_RELAX | ||
---- | ||
|
||
Relaxation result (absolute symbol whose address is within `0x0` ~ `0x1f` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
more precisely range: "0x0 ~ 0x1f or 0xffff ffff ffff ff20 ~ 0xffffffffffffffff for RV64 and 0xffffff20 ~ 0xffffffff for RV32." since it's sign-extended value?
Please take another look |
Ping? |
Gentle ping |
Right. Converting to the following is linker optimization, which does not shrink the section
I think this conversion does not need This patch focuses on what we can do with
Frankly, I consider the first two forms not that useful. They require absolute addresses, which imply The last, |
riscv-elf.adoc
Outdated
---- | ||
|
||
Relaxation result (absolute symbol whose address cannot be represented | ||
as a 6-bit signed integer or if the RVC instruction is not permitted): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is 12-bit: [-2048, 2047]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed. Thanks!
Note that the use case of this relaxation pattern is to relax a GOT-indirect reference to an unresolved weak symbol in an environment that the C extension is not available.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This resulted in a wrong condition, so I rewrote the sentence.
The forms to relax absolute symbols are for unresolved weak symbols. With these forms, we don't need to store zeros in GOT for unresolved weak symbols. I think this use case is too important that we can't dismiss. |
I think it differently. Undefined weak symbols are actually relatively rarely used (and many can be converted weak definitions) and almost every use case is non-measurable for performance/code size. I even believe most use cases are checked just once in one program invocation. |
My original proposal didn't include the relaxation of small absolute symbols. I added it as a response to #397 (comment). So I guess you guys can discuss? @jrtc27 My opinion is it's a good relaxation because:
|
This was the reason given for why GCC wouldn't adopt the "use the GOT for possibly-undefined weak symbols + -mcmodel=medany" specification and it had to be weakened for the 1.0 psABI. So either that was wrong and we can just do it anyway in GCC or we need this for GCC to use the GOT in this case (and thus not require instruction rewriting with -mno-relax). Doing neither isn't a solution though. |
Some psABIs define the linker optimizations to relax a GOT load into a PC-relative address materialization. AArch64 [1] allows the linker to rewrite ADRP+ADD into ADR. x86-64 does the same thing with the `R_X86_64_GOTPCRELX` and `R_X86_64_REX_GOTPCRELX` relocations. In our case, we have lots of AUIPC+LD instruction pairs to load an address from the GOT in our RISC-V programs because `la` assembly pseudo instruction is expanded to that instruction pair. If the PC-relative address loaded by the instruction pair is a link-time constant, we can rewrite the instructions with AUIPC+ADDI to directly materialize the value into a register, which eliminates one memory load. [1] https://github.com/ARM-software/abi-aa/blob/844a79fd4c77252a11342709e3b27b2c9f590cf1/aaelf64/aaelf64.rst#relocation-optimization
Any more comments? |
Sorry, but I am still very confused. The description is rewritten in a tone that this is for optimization. I commented that the int6 Another topic is whether this scheme can help unresolved undefined weak symbols. For Do we know whether GNU ld relaxes R_RISCV_GOT_HI20 and if no, what the lone R_RISCV_RELAX is about? This proposal suggests that R_RISCV_GOT_HI20 needs a pairing R_RISCV_RELAX, i.e. R_RISCV_GOT_HI20/R_RISCV_RELAX + R_RISCV_PCREL_LO12_I/R_RISCV_RELAX. This can be argued as acceptable if we consider that GCC/Clang already emit R_RISCV_PCREL_HI20/R_RISCV_RELAX + R_RISCV_PCREL_LO12_I/R_RISCV_RELAX for PC-relative addressing, |
I'm not sure if I've understood your questions correctly, but as I see it:
|
For For linker optimization without R_RISCV_RELAX, I am also happy to see that, however I would like be come a separated PR to address that part, as I mention before it's would be new concept to RISC-V psABI, it's not necessary become blocker to this PR IMO. For GCC implementation: Yeah, it's unfortunately we don't fix that on GCC land yet...we should find some people to fix that :( |
Are there any further comments or discussions? I believe the proposal is in good shape and ready to be merged. |
Last ping, will merge this this Friday (2023/12/8) if no further comments. |
Some psABIs define a relaxation to turn a GOT load into a PC-relative address materialization. For example, the AArch64's psABI allows adrp+ldr to be rewritten to nop+adr to eliminate the memory load. This patch is part of the effort to make such optimization possible for RISC-V. For RISC-V, we use the la assembly pseudo instruction to load a symbol address from the GOT. The pseudo instruction is expanded to auipc+ld. If the address loaded by the instruction pair is actually a PC-relative link-time constant, we want the linker to rewrite the instruction pair with auipc+addi. We can't rewrite all existing auipc+ld pairs with auipc+addi in the linker because there might be code that jumps to the middle of the instruction pair. That should be extremely rare, if ever exists, but you can at least in theory write a program in assembly that jumps to the ld instruction of the instruction pair. We need a marker to identify that an auipc+ld can be safely relaxed (i.e. they are emitted for la). This patch is to annotate R_RISCV_GOT_HI20 with R_RISCV_RELAX only when the relocation is emitted for the la pseudo instruction. The linker will use it as a signal that the instruction pair can be safely relaxed. Proposal to the RISC-V psABI: riscv-non-isa/riscv-elf-psabi-doc#397 gas/ * config/tc-riscv.c (source_macro): New static int variable. The identifier of the assembler macro we are expanding, if any. (append_insn): Updated source_macro to tc_fix_data, to record which macro expands, if any. (macro): Record which macro expands into source_macro. Reset source_macro to -1 at the end. (md_apply_fix): Apply R_RISCV_RELAX if pcrel_got_hi is expanded from macro LA/LGA. * config/tc-riscv.h (struct riscv_fix, TC_FIX_TYPE, TC_INIT_FIX_DATA): Defined to record source_macro into fixups for riscv target. * testsuite/gas/riscv/la-variants.d: Updated.
Some psABIs define the linker optimizations to relax a GOT load into a PC-relative address materialization. AArch64 [1] allows the linker to rewrite ADRP+ADD into ADR. x86-64 does the same thing with the
R_X86_64_GOTPCRELX
andR_X86_64_REX_GOTPCRELX
relocations.In our case, we have lots of AUIPC+LD instruction pairs to load an address from the GOT in our RISC-V programs because
la
assembly pseudo instruction is expanded to that instruction pair. If the PC-relative address loaded by the instruction pair is a link-time constant, we can rewrite the instructions with AUIPC+ADDI to directly materialize the value into a register, which eliminates one memory load.We can't rewrite existing AUIPC+ADDI instruction pairs unconditionally because technically there may exist code that jumps to the middle of the instruction pair. That should be extremely rare, but we can't deny the possibility. Therefore,
R_RISCV_GOT_HI20
is required to be annotated withR_RISCV_RELAX
in this proposal. Currently, neither gas nor LLVM assembler emitR_RISCV_RELAX
forR_RISCV_GOT_HI20
. I have a WIP patch to emitR_RISCV_RELAX
only for thela
pseudo instruction.[1] https://github.com/ARM-software/abi-aa/blob/844a79fd4c77252a11342709e3b27b2c9f590cf1/aaelf64/aaelf64.rst#relocation-optimization