Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Relax GOT indirection into PC-relative addressing #397

Merged
merged 1 commit into from
Dec 10, 2023

Conversation

rui314
Copy link
Collaborator

@rui314 rui314 commented Sep 18, 2023

Some psABIs define the linker optimizations to relax a GOT load into a PC-relative address materialization. AArch64 [1] allows the linker to rewrite ADRP+ADD into ADR. x86-64 does the same thing with the R_X86_64_GOTPCRELX and R_X86_64_REX_GOTPCRELX relocations.

In our case, we have lots of AUIPC+LD instruction pairs to load an address from the GOT in our RISC-V programs because la assembly pseudo instruction is expanded to that instruction pair. If the PC-relative address loaded by the instruction pair is a link-time constant, we can rewrite the instructions with AUIPC+ADDI to directly materialize the value into a register, which eliminates one memory load.

We can't rewrite existing AUIPC+ADDI instruction pairs unconditionally because technically there may exist code that jumps to the middle of the instruction pair. That should be extremely rare, but we can't deny the possibility. Therefore, R_RISCV_GOT_HI20 is required to be annotated with R_RISCV_RELAX in this proposal. Currently, neither gas nor LLVM assembler emit R_RISCV_RELAX for R_RISCV_GOT_HI20. I have a WIP patch to emit R_RISCV_RELAX only for the la pseudo instruction.

[1] https://github.com/ARM-software/abi-aa/blob/844a79fd4c77252a11342709e3b27b2c9f590cf1/aaelf64/aaelf64.rst#relocation-optimization

@aswaterman
Copy link
Contributor

The following question is more about the implementation than the ABI, but does this also mean we can sometimes garbage-collect the corresponding GOT entry? That would also serve to slightly improve spatial locality in GOT accesses.

@rui314
Copy link
Collaborator Author

rui314 commented Sep 18, 2023

@aswaterman Yes we can, and the following sentence in this proposal should cover that topic.

If this relaxation eliminates all references to the symbol's GOT slot, the linker may opt not to create a GOT slot for that symbol.

@rui314
Copy link
Collaborator Author

rui314 commented Sep 18, 2023

As a benchmark, lld contains about 100k R_RISCV_GOT_HI20 relocations in total in its input object files. When I tried to build lld with this relaxation, about the half of the relocations were optimized and turned into PC-relative addressing. The remaining R_RISCV_GOT_HI20 relocations referred to symbols defined by external shared objects and couldn't be optimized by this relaxation.

@rui314
Copy link
Collaborator Author

rui314 commented Sep 19, 2023

Patch for GNU assembler: https://sourceware.org/pipermail/binutils/2023-September/129523.html
Patch for the mold linker: rui314/mold@a19783d

@MaskRay
Copy link
Collaborator

MaskRay commented Sep 19, 2023

Seems fine to me. The AArch64 scheme (I participated in the review) has the nice property that adrp+ldr costs just 2 relocations while ours costs 4. AArch64 did not want to introduce new relocation types for compatibility with existing toolchains.

Technically, we can allow GOT to PC-relative optimization even in the absence of relaxation. With R_RISCV_RELAX, we can allow removing one instruction. Also, I am not sure we need 2 R_RISCV_RELAX.

The AArch64 ABI has explicitly called out properties when an optimization can be made: non-preemptible & not-ifunc (https://maskray.me/blog/2021-01-18-gnu-indirect-function#non-preemptible-ifunc). We shall also disallow relaxation to an absolute symbol for -shared/-pie links.

@rui314
Copy link
Collaborator Author

rui314 commented Sep 19, 2023

Just like AArch64, I didn't want to introduce a new relocation for backward compatibility. In hindsight, the RISC-V psABI could have defined R_RISCV_GOT_LOAD for the auipc+ld instruction pair just like it did for auipc+jalr with R_RISCV_CALL. But it's too late to make such change to the ABI.

Technically, we can allow GOT to PC-relative optimization even in the absence of relaxation. With R_RISCV_RELAX, we can allow removing one instruction. Also, I am not sure we need 2 R_RISCV_RELAX.

You made a good point. Since the second ld instruction is annotated with R_RISCV_RELAX, it might be optimized out as a result of linker relaxation, so any code that directly jumps to the ld instruction is probably wrong -- that ld might not exist at runtime. So, maybe we don't need to make a change to the assembler and can just do this optimization to the existing object file?

@rui314
Copy link
Collaborator Author

rui314 commented Sep 19, 2023

I made a change to the proposal so that it requires only one R_RISCV_RELAX for each relaxable auipc+ld instruction pair.

@jrtc27
Copy link
Collaborator

jrtc27 commented Sep 19, 2023

This is what I was suggesting a while back when specifying that undef weak symbols be forced to be via the GOT for medany, so thanks for doing this. This should allow us to remove the note that allows GCC to not do that.

@rui314
Copy link
Collaborator Author

rui314 commented Sep 20, 2023

@jrtc27 This proposal doesn't really relax references to unresolved weak symbols for the medany code model, because we handle only PC-relative references in this proposal. Since unresolved weak symbols are handled as if they were absolute symbols at address 0, their addresses aren't PC-relative constants.

But we could relax references to absolute symbols, too, so let me add that to this proposal. I intentionally omitted it at first because I thought that the use case is rare, but weak symbols are indeed a fairly important use case.

@rui314
Copy link
Collaborator Author

rui314 commented Sep 20, 2023

Updated the proposal to allow to relax an absolute symbol at a very low address. Please take a look.

@rui314
Copy link
Collaborator Author

rui314 commented Sep 22, 2023

@MaskRay @jrtc27 Any comments? My change to GNU assembler was approved and can be merged once this proposal is ratified.

Copy link
Collaborator

@kito-cheng kito-cheng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just give my blessing here first, thanks @rui314 :)

I would like one more people either @MaskRay or @jrtc27 OK before merge.

riscv-elf.adoc Outdated
ld t0, 0(t0) # R_RISCV_PCREL_LO12_I (label), R_RISCV_RELAX
----

Relaxation result (absolute symbol whose address is within `0x0` ~ `0x1f`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whose address is within 0x0 ~ 0x1f

Where does this come from?

Copy link
Collaborator Author

@rui314 rui314 Sep 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, c.lui was a typo of c.li.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the address 0x0 to 0x1f?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because c.li can set only a signed 6 bit value.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

more precisely range: "0x0 ~ 0x1f or 0xffff ffff ffff ff20 ~ 0xffffffffffffffff for RV64 and 0xffffff20 ~ 0xffffffff for RV32." since it's sign-extended value?

@rui314 rui314 force-pushed the patch-1 branch 2 times, most recently from 1f25a71 to 41dd3be Compare September 28, 2023 04:31
riscv-elf.adoc Outdated Show resolved Hide resolved
riscv-elf.adoc Outdated Show resolved Hide resolved
riscv-elf.adoc Outdated
with `R_RISCV_RELAX`.

- `R_RISCV_GOT_HI20` refers to the location 4 bytes before where
`R_RISCV_PCREL_LO12_I` points.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? Who cares where they are relative to each other.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to exclude the pattern that one GOT_HI20 is shared by multiple PCREL_LO_I because it'd complicates the specification a lot. This optimization focuses on optimizing la pseudo instruction

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not complicated at all, just require that the GOT_HI20 and all referencing PCREL_LO12_I's are relaxable

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

riscv-elf.adoc Outdated
- The symbol pointed to by `R_RISCV_PCREL_LO12_I` is at the location to
which `R_RISCV_GOT_HI20` refers.

- The symbol's address is a link-time constant.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not what you mean. What you mean is that it's bound at link time. Otherwise a PIE, or DSO with suitable binding, wouldn't be able to use it, as the object would have an unknown load base and thus the address wouldn't be a link-time constant, just its PC-relative offset.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant to say "The symbol's PC-relative address is a link-time constant." Does this make sense to you?

Copy link
Collaborator

@jrtc27 jrtc27 Sep 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not true for absolute symbols. Just do it in terms of "the symbol is bound at link time to within the object / is defined in the object and not preemptible" or similar, because that's what the important part is. Well, with undef weak catered for...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The actual condition is a bit more complicated due to GNU ifunc. Let me rearrange this section to explain it more clearly.

riscv-elf.adoc Outdated Show resolved Hide resolved
riscv-elf.adoc Outdated Show resolved Hide resolved
symbol's address from a GOT entry with an `auipc` + `ld` instruction
pair.

Condition::
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The language in here needs tweaking to more obviously cover the case when a single GOT_HI20 is shared by multiple PCREL_LO12_I

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we want to handle such case. This optimization is to relax la and not others. It looks like a GOT_HI20 is almost always paired with a PCREL_LO_I that immediately follows, so this should be sufficient.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LLVM does not model them as a pair, it models them as independent instructions. In practice you will put them together to benefit from macro-op fusion, but there's no requirement to do so in the ABI or in LLVM's internals, so the ABI should not start to introduce arbitrary restrictions that have no technical grounding other than being marginally more annoying to write a specification for. Please specify the actual necessary restrictions.

@MaskRay
Copy link
Collaborator

MaskRay commented Sep 28, 2023

My reference to AArch64 is meant to highlight that GOT optimization is not necessarily linked with linker relaxation.
GOT optimization can be performed even in the absence of R_RISCV_RELAX relocations. My article at https://maskray.me/blog/2021-08-29-all-about-global-offset-table#got-optimization describes several architectures that provide linker optimization but not linker relaxation.

I am concerned that using 2 R_RISCV_RELAX relocations may not bring enough return on investment.

The auipc+addi result is that is mostly relevant as the two other forms have stringent address requirement and are position-dependent only.

If a toolchain decides to allow auipc+addi optimization using this scheme, relocatable object files will expense 2 Elf64_Rela relocations, which are 48 bytes. Embedded applications that use -fno-pic mostly avoid GOT anyway.

. I think x86-64 GOTPCRELX provides no difference on slightly larger applications. It just optimizes some microbenchmarks.

I'll be out of town since Friday for about 10 days. Sorry for being unable to respond quickly during that time.

@rui314
Copy link
Collaborator Author

rui314 commented Sep 28, 2023

@MaskRay I chose to require R_RISCV_RELAX there because the existing code could be using the value set by auipc t0, %got_pcrel_hi20(foo) in some obscure ways. If that's the case, we can't rewrite or remove the instruction.

But I might be overthinking. A value computed by %got_pcrel_hi20(foo) doesn't make sense at all if it is not combined with a corresponding %pcrel_lo(label). So I understand your point. I just chose to be on the safer side.

Is there anyone who thinks we should require 'R_RISCV_RELAX' for %got_pcrel_hi20? If no one wants it, I can remove it from the proposal.

@kito-cheng
Copy link
Collaborator

I tend to mark with R_RISC_RELAX, one consideration is #393, also for consistency with other relaxation, of cause we can call it optimization rather than relax, but I am not sure does it means should we also disable those optimization when -mno-relax is given? That's kind of introducing new concept into psABI I think.

@rui314
Copy link
Collaborator Author

rui314 commented Sep 28, 2023

@kito-cheng Yeah, that makes sense. We have never removed an instruction or replaced it with a compact one if there's no R_RISCV_RELAX pointing to that location. Removing an auipc pointed to by R_RISCV_GOT_HI20 without RELAX looks indeed inconsistent.

@kito-cheng
Copy link
Collaborator

Another random idea come to my mind - we can call it optimization IF we don't perform any code shrinking, so we may also optimize this case with replacing auipc with nop like other architecture, also same idea can apply to other existing relaxation too.

But it just throw an idea not review comments, so don't putting into this PR :p

And that's may useful for those scenarios which disable linker relaxation.

@rui314
Copy link
Collaborator Author

rui314 commented Sep 29, 2023

That we have too many RELAX relocations is indeed a problem, but that's not new and not specific to this proposal. I created an issue for the topic, so let's discuss that here: #401

riscv-elf.adoc Outdated
- The symbol pointed to by `R_RISCV_PCREL_LO12_I` is at the location to
which `R_RISCV_GOT_HI20` refers.

- If the symbol is absolute, its address is within `0x0` ~ `0x7ff`.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

more precisely range: "0x0 ~ 0x7ff or 0xfffffffffffff800 ~ 0xffffffffffffffff for RV64 and 0xfffff800 ~ 0xffffffff for RV32." since it's sign-extended value.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is mainly for undefined weak symbols whose address is zero, but yeah, we can technically use it for the highest addresses. I'll update the proposal.

riscv-elf.adoc Outdated
ld t0, 0(t0) # R_RISCV_PCREL_LO12_I (label), R_RISCV_RELAX
----

Relaxation result (absolute symbol whose address is within `0x0` ~ `0x1f`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

more precisely range: "0x0 ~ 0x1f or 0xffff ffff ffff ff20 ~ 0xffffffffffffffff for RV64 and 0xffffff20 ~ 0xffffffff for RV32." since it's sign-extended value?

@rui314
Copy link
Collaborator Author

rui314 commented Oct 3, 2023

Please take another look

@rui314
Copy link
Collaborator Author

rui314 commented Oct 13, 2023

Ping?

@rui314
Copy link
Collaborator Author

rui314 commented Oct 23, 2023

Gentle ping

@MaskRay
Copy link
Collaborator

MaskRay commented Oct 23, 2023

Another random idea come to my mind - we can call it optimization IF we don't perform any code shrinking, so we may also optimize this case with replacing auipc with nop like other architecture, also same idea can apply to other existing relaxation too.

But it just throw an idea not review comments, so don't putting into this PR :p

And that's may useful for those scenarios which disable linker relaxation.

Right. Converting to the following is linker optimization, which does not shrink the section

    auipc   tX, <hi>
    addi    tY, tX, <lo>

I think this conversion does not need R_RISCV_RELAX.

This patch focuses on what we can do with R_RISCV_RELAX. We have

  • int6 c.li tY, <symbol-value>
  • int12 addi tY, zero, <symbol-value>
  • other auipc+addi

Frankly, I consider the first two forms not that useful. They require absolute addresses, which imply -fno-pic code.
However, -fno-pic code typically doesn't generate GOT-generating relocations in the first place.
int12 is rather constrained even for embedded programming, so the utilization of the relaxation for the first two will be very low.
As a toolchain developer, I will not add complexity to assembler/linker for something I can control.

The last, auipc/addi, as mentioned, does not need R_RISCV_RELAX, and we have heard success story from AArch64 adrp+add.

riscv-elf.adoc Outdated
----

Relaxation result (absolute symbol whose address cannot be represented
as a 6-bit signed integer or if the RVC instruction is not permitted):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is 12-bit: [-2048, 2047]

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Thanks!

Note that the use case of this relaxation pattern is to relax a GOT-indirect reference to an unresolved weak symbol in an environment that the C extension is not available.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This resulted in a wrong condition, so I rewrote the sentence.

@rui314
Copy link
Collaborator Author

rui314 commented Oct 23, 2023

The forms to relax absolute symbols are for unresolved weak symbols. With these forms, we don't need to store zeros in GOT for unresolved weak symbols. I think this use case is too important that we can't dismiss.

@MaskRay
Copy link
Collaborator

MaskRay commented Oct 24, 2023

The forms to relax absolute symbols are for unresolved weak symbols. With these forms, we don't need to store zeros in GOT for unresolved weak symbols. I think this use case is too important that we can't dismiss.

I think it differently. Undefined weak symbols are actually relatively rarely used (and many can be converted weak definitions) and almost every use case is non-measurable for performance/code size. I even believe most use cases are checked just once in one program invocation.

@rui314
Copy link
Collaborator Author

rui314 commented Oct 25, 2023

My original proposal didn't include the relaxation of small absolute symbols. I added it as a response to #397 (comment). So I guess you guys can discuss? @jrtc27

My opinion is it's a good relaxation because:

  1. If we scan relocations to relax GOT-loading instructions for relative symbols anyway, there seems no reason to avoid the relaxation opportunity for small absolute symbols, and
  2. if you executable is position-dependent and has a GOT only for weak undefined symbols, this relaxation can eliminate the GOT section entirely.

@jrtc27
Copy link
Collaborator

jrtc27 commented Oct 25, 2023

  1. if you executable is position-dependent and has a GOT only for weak undefined symbols, this relaxation can eliminate the GOT section entirely.

This was the reason given for why GCC wouldn't adopt the "use the GOT for possibly-undefined weak symbols + -mcmodel=medany" specification and it had to be weakened for the 1.0 psABI. So either that was wrong and we can just do it anyway in GCC or we need this for GCC to use the GOT in this case (and thus not require instruction rewriting with -mno-relax). Doing neither isn't a solution though.

Some psABIs define the linker optimizations to relax a GOT load into a
PC-relative address materialization. AArch64 [1] allows the linker to
rewrite ADRP+ADD into ADR. x86-64 does the same thing with the
`R_X86_64_GOTPCRELX` and `R_X86_64_REX_GOTPCRELX` relocations.

In our case, we have lots of AUIPC+LD instruction pairs to load an
address from the GOT in our RISC-V programs because `la` assembly pseudo
instruction is expanded to that instruction pair. If the PC-relative
address loaded by the instruction pair is a link-time constant, we can
rewrite the instructions with AUIPC+ADDI to directly materialize the
value into a register, which eliminates one memory load.

[1] https://github.com/ARM-software/abi-aa/blob/844a79fd4c77252a11342709e3b27b2c9f590cf1/aaelf64/aaelf64.rst#relocation-optimization
@rui314
Copy link
Collaborator Author

rui314 commented Nov 2, 2023

Any more comments?

@MaskRay
Copy link
Collaborator

MaskRay commented Nov 2, 2023

Sorry, but I am still very confused.

The description is rewritten in a tone that this is for optimization.

I commented that the int6 c.li tY, <symbol-value> and int12 addi tY, zero, <symbol-value> schemes do not seem useful due to the small ranges, and therefore, very few eligible symbols.
Therefore, only the auipc+addi conversion seems useful in practice.
I then asked about performing GOT optimization regardless of R_RISCV_RELAX (like AArch64), which seems to not have been extensively discussed.


Another topic is whether this scheme can help unresolved undefined weak symbols.
riscv64-linux-gnu-gcc -O1 -mcmodel=medany -fno-pic generates an inappropriate PC-relative code sequence for auipc+lw __attribute__((weak)) extern int x; int foo() { return x; }.
(R_RISCV_PCREL_HI20/R_RISCV_RELAX + R_RISCV_PCREL_LO12_I/R_RISCV_RELAX)

For -mcmodel=medany -fno-pic, clang 16 emits the same code sequence.
clang 17 has changed to auipc+ld+lw (R_RISCV_GOT_HI20 + R_RISCV_PCREL_LO12_I/R_RISCV_RELAX), which fulfills the recommendation in #201.
Does GCC benefit from this proposal to move away from PC-relative to GOT? How?

Do we know whether GNU ld relaxes R_RISCV_GOT_HI20 and if no, what the lone R_RISCV_RELAX is about?

This proposal suggests that R_RISCV_GOT_HI20 needs a pairing R_RISCV_RELAX, i.e. R_RISCV_GOT_HI20/R_RISCV_RELAX + R_RISCV_PCREL_LO12_I/R_RISCV_RELAX.

This can be argued as acceptable if we consider that GCC/Clang already emit R_RISCV_PCREL_HI20/R_RISCV_RELAX + R_RISCV_PCREL_LO12_I/R_RISCV_RELAX for PC-relative addressing,
so doing the same for GOT addressing is ok.
However, this can also be argued as an unnecessary size regression for GOT addressing, which is used heavily for -fPIC.
In addition, my arguments that short ranges and whether we can perform GOT optimization regardless of R_RISCV_RELAX stand.

@rui314
Copy link
Collaborator Author

rui314 commented Nov 5, 2023

I'm not sure if I've understood your questions correctly, but as I see it:

  • The optimization for small absolute addresses aligns well with our recommendation to always use GOT for weak symbols. I believe this alignment should be convincing enough to adopt this approach, especially since the optimization is straightforward and could perhaps be implemented with fewer than 20 lines of code in the linker. If we don't want to do that, we probably need a strong reason to not do so.
  • For greater safety, we should mandate two RELAX relocs for GOT_HI20 and PCREL_LO12_I. Existing object files might contain machine code that jumps into the middle of the auipc+ld instruction sequence. While that's unlikely, it is not illegal, so we cannot guarantee that relaxing existing object files will be completely safe without a hint that the object file was generated by a compiler/assembler that is aware of this GOT optimization.

@kito-cheng
Copy link
Collaborator

For c.li and addi part: it's more like completeness of this proposal IMO, and it may only benefit only small set of embedded program, but it's simple enough so I am happy to include that.

For linker optimization without R_RISCV_RELAX, I am also happy to see that, however I would like be come a separated PR to address that part, as I mention before it's would be new concept to RISC-V psABI, it's not necessary become blocker to this PR IMO.

For GCC implementation: Yeah, it's unfortunately we don't fix that on GCC land yet...we should find some people to fix that :(

@rui314
Copy link
Collaborator Author

rui314 commented Dec 1, 2023

Are there any further comments or discussions? I believe the proposal is in good shape and ready to be merged.

@kito-cheng
Copy link
Collaborator

Last ping, will merge this this Friday (2023/12/8) if no further comments.

@kito-cheng kito-cheng merged commit 653ace8 into riscv-non-isa:master Dec 10, 2023
4 checks passed
@rui314 rui314 deleted the patch-1 branch December 11, 2023 00:45
saagarjha pushed a commit to ahjragaas/binutils-gdb that referenced this pull request Dec 12, 2023
Some psABIs define a relaxation to turn a GOT load into a PC-relative
address materialization.  For example, the AArch64's psABI allows
adrp+ldr to be rewritten to nop+adr to eliminate the memory load.
This patch is part of the effort to make such optimization possible
for RISC-V.

For RISC-V, we use the la assembly pseudo instruction to load a symbol
address from the GOT. The pseudo instruction is expanded to auipc+ld.
If the address loaded by the instruction pair is actually a PC-relative
link-time constant, we want the linker to rewrite the instruction pair
with auipc+addi.

We can't rewrite all existing auipc+ld pairs with auipc+addi in the
linker because there might be code that jumps to the middle of the
instruction pair.  That should be extremely rare, if ever exists, but
you can at least in theory write a program in assembly that jumps to
the ld instruction of the instruction pair.  We need a marker to
identify that an auipc+ld can be safely relaxed (i.e. they are emitted
for la).

This patch is to annotate R_RISCV_GOT_HI20 with R_RISCV_RELAX only
when the relocation is emitted for the la pseudo instruction.  The
linker will use it as a signal that the instruction pair can be safely
relaxed.

Proposal to the RISC-V psABI:
riscv-non-isa/riscv-elf-psabi-doc#397

gas/
	* config/tc-riscv.c (source_macro): New static int variable.
	The identifier of the assembler macro we are expanding, if any.
	(append_insn): Updated source_macro to tc_fix_data, to record
	which macro expands, if any.
	(macro): Record which macro expands into source_macro.  Reset
	source_macro to -1 at the end.
	(md_apply_fix): Apply R_RISCV_RELAX if pcrel_got_hi is expanded
	from macro LA/LGA.
	* config/tc-riscv.h (struct riscv_fix, TC_FIX_TYPE, TC_INIT_FIX_DATA):
	Defined to record source_macro into fixups for riscv target.
	* testsuite/gas/riscv/la-variants.d: Updated.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants