[LLD][RISCV] Fix incorrect call relaxation when mixing +c and -c objects #73977

preames · 2023-11-30T20:16:47Z

This fixes a mis-link when mixing compressed and non-compressed input to
LLD. When relaxing calls, we must respect the source file that the section
came from when deciding whether it's legal to use compressed instructions.
If the call in question comes from a non-rvc source, then it will not expect
2-byte alignments and cascading failures may result.

This fixes #63964. The symptom
seen there is that a latter RISCV_ALIGN can't be satisfied and we either
fail an assert or produce a totally bogus link result. (It can be easily
reproduced by putting .p2align 5 right before the nop in the reduced test
case and running check-lld on an assertions enabled build.) However,
it's important to note this is just one possible symptom of the problem.

If the resulting binary has a runtime switch between rvc and non-rvc routines
(via e.g. ifuncs), then even if we manage to link we may execute invalid
instructions on a machine which doesn't implement compressed instructions.

This fixes a mis-link when mixing compressed and non-compressed input to LLD. When relaxing calls, we must respect the source file that the section came from when deciding whether it's legal to use compressed instructions. If the call in question comes from a non-rvc source, then it will not expect 2-byte alignments and cascading failures may result. This fixes llvm#63964. The symptom seen there is that a latter RISCV_ALIGN can't be satisfied and we either fail an assert or produce a totally bogus link result. (It can be easily reproduced by putting .p2align 5 right before the nop in the reduced test case and running check-lld on an assertions enabled build.) However, it's important to note this is just one possible symptom of the problem. If the resulting binary has a runtime switch between rvc and non-rvc routines (via e.g. ifuncs), then even if we manage to link we may execute invalid instructions on a machine which doesn't implement compressed instructions.

llvmbot · 2023-11-30T20:17:22Z

@llvm/pr-subscribers-lld

Author: Philip Reames (preames)

Changes

This fixes a mis-link when mixing compressed and non-compressed input to
LLD. When relaxing calls, we must respect the source file that the section
came from when deciding whether it's legal to use compressed instructions.
If the call in question comes from a non-rvc source, then it will not expect
2-byte alignments and cascading failures may result.

This fixes #63964. The symptom
seen there is that a latter RISCV_ALIGN can't be satisfied and we either
fail an assert or produce a totally bogus link result. (It can be easily
reproduced by putting .p2align 5 right before the nop in the reduced test
case and running check-lld on an assertions enabled build.) However,
it's important to note this is just one possible symptom of the problem.

If the resulting binary has a runtime switch between rvc and non-rvc routines
(via e.g. ifuncs), then even if we manage to link we may execute invalid
instructions on a machine which doesn't implement compressed instructions.

Full diff: https://github.com/llvm/llvm-project/pull/73977.diff

2 Files Affected:

(modified) lld/ELF/Arch/RISCV.cpp (+1-1)
(added) lld/test/ELF/riscv-relax-call-mixed-rvc.s (+30)

diff --git a/lld/ELF/Arch/RISCV.cpp b/lld/ELF/Arch/RISCV.cpp
index a556d89c36400d3..898e3e45b9e7240 100644
--- a/lld/ELF/Arch/RISCV.cpp
+++ b/lld/ELF/Arch/RISCV.cpp
@@ -591,7 +591,7 @@ static void initSymbolAnchors() {
 // Relax R_RISCV_CALL/R_RISCV_CALL_PLT auipc+jalr to c.j, c.jal, or jal.
 static void relaxCall(const InputSection &sec, size_t i, uint64_t loc,
                       Relocation &r, uint32_t &remove) {
-  const bool rvc = config->eflags & EF_RISCV_RVC;
+  const bool rvc = getEFlags(sec.file) & EF_RISCV_RVC;
   const Symbol &sym = *r.sym;
   const uint64_t insnPair = read64le(sec.content().data() + r.offset);
   const uint32_t rd = extractBits(insnPair, 32 + 11, 32 + 7);
diff --git a/lld/test/ELF/riscv-relax-call-mixed-rvc.s b/lld/test/ELF/riscv-relax-call-mixed-rvc.s
new file mode 100644
index 000000000000000..88ee08572875088
--- /dev/null
+++ b/lld/test/ELF/riscv-relax-call-mixed-rvc.s
@@ -0,0 +1,30 @@
+# REQUIRES: riscv
+# RUN: rm -rf %t && split-file %s %t && cd %t
+# RUN: llvm-mc -filetype=obj -triple=riscv64 -mattr=+c,+relax a.s -o a.o
+# RUN: llvm-mc -filetype=obj -triple=riscv64 -mattr=-c,+relax b.s -o b.o
+
+# RUN: ld.lld a.o b.o --shared -o a
+# RUN: llvm-objdump -d --no-show-raw-insn -M no-aliases a | FileCheck %s
+
+## This needs to be a *uncompressed* jal instruction since it came from the
+## source file which does not enable C
+# CHECK-LABEL: <foo>:
+# CHECK-NEXT:    1260: jal zero, 0x1260 <foo>
+# CHECK-NEXT:    1264: addi zero, zero, 0
+
+# w/C
+#--- a.s
+	.text
+	.attribute	4, 16
+	.attribute	5, "rv64i2p1_m2p0_a2p1_f2p2_d2p2_c2p0_zicsr2p0_zifencei2p0"
+
+# w/o C
+#--- b.s
+	.text
+	.attribute	4, 16
+	.attribute	5, "rv64i2p1_m2p0_a2p1_f2p2_d2p2_zicsr2p0_zifencei2p0"
+	.p2align	5
+	.type	foo,@function
+foo:
+    tail    foo
+    nop

llvmbot · 2023-11-30T20:17:22Z

@llvm/pr-subscribers-lld-elf

Author: Philip Reames (preames)

Changes

This fixes a mis-link when mixing compressed and non-compressed input to
LLD. When relaxing calls, we must respect the source file that the section
came from when deciding whether it's legal to use compressed instructions.
If the call in question comes from a non-rvc source, then it will not expect
2-byte alignments and cascading failures may result.

This fixes #63964. The symptom
seen there is that a latter RISCV_ALIGN can't be satisfied and we either
fail an assert or produce a totally bogus link result. (It can be easily
reproduced by putting .p2align 5 right before the nop in the reduced test
case and running check-lld on an assertions enabled build.) However,
it's important to note this is just one possible symptom of the problem.

If the resulting binary has a runtime switch between rvc and non-rvc routines
(via e.g. ifuncs), then even if we manage to link we may execute invalid
instructions on a machine which doesn't implement compressed instructions.

Full diff: https://github.com/llvm/llvm-project/pull/73977.diff

2 Files Affected:

(modified) lld/ELF/Arch/RISCV.cpp (+1-1)
(added) lld/test/ELF/riscv-relax-call-mixed-rvc.s (+30)

diff --git a/lld/ELF/Arch/RISCV.cpp b/lld/ELF/Arch/RISCV.cpp
index a556d89c36400d3..898e3e45b9e7240 100644
--- a/lld/ELF/Arch/RISCV.cpp
+++ b/lld/ELF/Arch/RISCV.cpp
@@ -591,7 +591,7 @@ static void initSymbolAnchors() {
 // Relax R_RISCV_CALL/R_RISCV_CALL_PLT auipc+jalr to c.j, c.jal, or jal.
 static void relaxCall(const InputSection &sec, size_t i, uint64_t loc,
                       Relocation &r, uint32_t &remove) {
-  const bool rvc = config->eflags & EF_RISCV_RVC;
+  const bool rvc = getEFlags(sec.file) & EF_RISCV_RVC;
   const Symbol &sym = *r.sym;
   const uint64_t insnPair = read64le(sec.content().data() + r.offset);
   const uint32_t rd = extractBits(insnPair, 32 + 11, 32 + 7);
diff --git a/lld/test/ELF/riscv-relax-call-mixed-rvc.s b/lld/test/ELF/riscv-relax-call-mixed-rvc.s
new file mode 100644
index 000000000000000..88ee08572875088
--- /dev/null
+++ b/lld/test/ELF/riscv-relax-call-mixed-rvc.s
@@ -0,0 +1,30 @@
+# REQUIRES: riscv
+# RUN: rm -rf %t && split-file %s %t && cd %t
+# RUN: llvm-mc -filetype=obj -triple=riscv64 -mattr=+c,+relax a.s -o a.o
+# RUN: llvm-mc -filetype=obj -triple=riscv64 -mattr=-c,+relax b.s -o b.o
+
+# RUN: ld.lld a.o b.o --shared -o a
+# RUN: llvm-objdump -d --no-show-raw-insn -M no-aliases a | FileCheck %s
+
+## This needs to be a *uncompressed* jal instruction since it came from the
+## source file which does not enable C
+# CHECK-LABEL: <foo>:
+# CHECK-NEXT:    1260: jal zero, 0x1260 <foo>
+# CHECK-NEXT:    1264: addi zero, zero, 0
+
+# w/C
+#--- a.s
+	.text
+	.attribute	4, 16
+	.attribute	5, "rv64i2p1_m2p0_a2p1_f2p2_d2p2_c2p0_zicsr2p0_zifencei2p0"
+
+# w/o C
+#--- b.s
+	.text
+	.attribute	4, 16
+	.attribute	5, "rv64i2p1_m2p0_a2p1_f2p2_d2p2_zicsr2p0_zifencei2p0"
+	.p2align	5
+	.type	foo,@function
+foo:
+    tail    foo
+    nop

jrtc27

This makes sense to me, let's see what others think.

Of course, there's the bigger problem that we don't (currently) encode the arch string in R_RISCV_RELAX (which I've previously proposed by using the symbol to point at a mapping symbol or similar), but we do have this information so should honour it.

ilovepi · 2023-11-30T21:29:57Z

Thanks for running this down. When I started looking into #65090 I found #63964., but wasn't sure they were related, since we only saw this in LTO builds and we use the C extension everywhere.

I'm not sure if this patch will be able to resolve the LTO issues we saw, since I don't think the section flags will be relevant in the LTO case, but it's certainly a welcome improvement.

@jrtc27 I assume that riscv-non-isa/riscv-elf-psabi-doc#393 is the latest incarnation of the idea. Do you think it is likely to be accepted? It seems like a much more robust approach from reading the proposal and comments.

MaskRay · 2023-12-01T05:22:46Z

lld/test/ELF/riscv-relax-call-mixed-rvc.s

+## This needs to be a *uncompressed* jal instruction since it came from the
+## source file which does not enable C
+# CHECK-LABEL: <foo>:
+# CHECK-NEXT:    1260: jal zero, 0x1260 <foo>


Omit 1260: so that the test is less sensitive to address changes.

jal zero, {{.*}} <foo>

I need the addresses to make the instruction size difference obvious. Is there a way to force the address so it's stable instead?

Consider -Ttext=0x10000 as many riscv-relax-* files use.

MaskRay · 2023-12-01T05:24:15Z

lld/test/ELF/riscv-relax-call-mixed-rvc.s

+	.type	foo,@function
+foo:
+    tail    foo
+    nop


Relaxations may create NOPs as well. Consider a different instruction.

Do you have a specific suggestion? I don't really see why I care that a nop could be produced by relaxation since I'm testing one particular case here, but I'm happy to adjust to your preference.

This is quite nitpicking. R_RISCV_ALIGN handling adjusts NOPs, so a nop in the check prefix isn't immediately clear it is related to user input or R_RISCV_ALIGN related nops. Switching to another instruction makes the intention clearer.

MaskRay · 2023-12-01T05:24:35Z

lld/test/ELF/riscv-relax-call-mixed-rvc.s

+# CHECK-NEXT:    1260: jal zero, 0x1260 <foo>
+# CHECK-NEXT:    1264: addi zero, zero, 0
+
+# w/C


Missing space? w/ C

preames · 2023-12-01T15:50:33Z

This patch definitely does not fix all of the LTO issues. This patch fixes a non-LTO linker correctness bug. Can I ask we keep the discussion of mapping symbols etc elsewhere? I don't want to derail this functional fix.

ilovepi · 2023-12-01T15:52:55Z

Sure. Sorry. That wasn’t my intention

If we have a RISCV_ALIGN relocation which can't be satisified with the available space provided, report an error rather than silently continuing with a corrupt state. For context, llvm#73977 fixes an LLD bug which can cause this effect, but that's not the only source of such cases. Another is our hard-to-fix set of LTO problems. We can have a single function which was compiled without C in an otherwise entirely C module. Until we have all of the mapping symbols and related mechanisms implemented, this case can continue to arise. I think it's very important from a user interface perspective to have non-assertion builds report an error in this case. If we don't report an error here, we can crash the linker (due to the fatal error at the bottom of the function), or if we're less lucky silently produce a malformed binary. There's a couple of known defects with this patch. First, there's no test case. I don't know how to write a stable test case for this that doesn't involve hex editting an object file, or abusing the LTO bug that we hope to fix. Second, this will report an error on each relax iteration. I explored trying to report an error only once after relaxation, but ended up deciding I didn't have the context to implement it safely. I would be thrilled if someone more knowledgeable of this code wants to write a better version of this patch, but in the meantime, I believe we should land this to address the user experience problem described above.

If we have a RISCV_ALIGN relocation which can't be satisfied with the available space provided, report an error rather than silently continuing with a corrupt state. For context, #73977 fixes an LLD bug which can cause this effect, but that's not the only source of such cases. Another is our hard-to-fix set of LTO problems. We can have a single function which was compiled without C in an otherwise entirely C module. Until we have all of the mapping symbols and related mechanisms implemented, this case can continue to arise. I think it's very important from a user interface perspective to have non-assertion builds report an error in this case. If we don't report an error here, we can crash the linker (due to the fatal error at the bottom of the function), or if we're less lucky silently produce a malformed binary. There's a couple of known defects with this patch. First, there's no test case. I don't know how to write a stable test case for this that doesn't involve hex editing an object file, or abusing the LTO bug that we hope to fix. Second, this will report an error on each relax iteration. I explored trying to report an error only once after relaxation, but ended up deciding I didn't have the context to implement it safely. I would be thrilled if someone more knowledgeable of this code wants to write a better version of this patch, but in the meantime, I believe we should land this to address the user experience problem described above.

If we have a RISCV_ALIGN relocation which can't be satisfied with the available space provided, report an error rather than silently continuing with a corrupt state. For context, llvm#73977 fixes an LLD bug which can cause this effect, but that's not the only source of such cases. Another is our hard-to-fix set of LTO problems. We can have a single function which was compiled without C in an otherwise entirely C module. Until we have all of the mapping symbols and related mechanisms implemented, this case can continue to arise. I think it's very important from a user interface perspective to have non-assertion builds report an error in this case. If we don't report an error here, we can crash the linker (due to the fatal error at the bottom of the function), or if we're less lucky silently produce a malformed binary. There's a couple of known defects with this patch. First, there's no test case. I don't know how to write a stable test case for this that doesn't involve hex editing an object file, or abusing the LTO bug that we hope to fix. Second, this will report an error on each relax iteration. I explored trying to report an error only once after relaxation, but ended up deciding I didn't have the context to implement it safely. I would be thrilled if someone more knowledgeable of this code wants to write a better version of this patch, but in the meantime, I believe we should land this to address the user experience problem described above.

preames added 2 commits November 30, 2023 12:00

[LLD][RISCV] Add test case showing incorrect call relaxation

8de61cc

preames requested review from MaskRay, jrtc27 and topperc November 30, 2023 20:16

llvmbot added lld lld:ELF labels Nov 30, 2023

jrtc27 approved these changes Nov 30, 2023

View reviewed changes

preames mentioned this pull request Nov 30, 2023

Incorrect branch targets in RISC-V executables built with LTO #65090

Closed

MaskRay approved these changes Dec 1, 2023

View reviewed changes

preames mentioned this pull request Dec 1, 2023

[LLD][RISCV] Report error for unsatisfiable RISCV_ALIGN #74121

Merged

Address review comments on test

969b6b9

preames merged commit 62213be into llvm:main Dec 1, 2023
2 of 3 checks passed

preames deleted the pr-riscv-lld-call-relaxation-bug branch December 1, 2023 19:02

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[LLD][RISCV] Fix incorrect call relaxation when mixing +c and -c objects #73977

[LLD][RISCV] Fix incorrect call relaxation when mixing +c and -c objects #73977

preames commented Nov 30, 2023

llvmbot commented Nov 30, 2023

llvmbot commented Nov 30, 2023

jrtc27 left a comment

ilovepi commented Nov 30, 2023

MaskRay Dec 1, 2023

preames Dec 1, 2023

MaskRay Dec 1, 2023

MaskRay Dec 1, 2023

preames Dec 1, 2023

MaskRay Dec 1, 2023

MaskRay Dec 1, 2023

preames commented Dec 1, 2023

ilovepi commented Dec 1, 2023

[LLD][RISCV] Fix incorrect call relaxation when mixing +c and -c objects #73977

[LLD][RISCV] Fix incorrect call relaxation when mixing +c and -c objects #73977

Conversation

preames commented Nov 30, 2023

llvmbot commented Nov 30, 2023

llvmbot commented Nov 30, 2023

jrtc27 left a comment

Choose a reason for hiding this comment

ilovepi commented Nov 30, 2023

MaskRay Dec 1, 2023

Choose a reason for hiding this comment

preames Dec 1, 2023

Choose a reason for hiding this comment

MaskRay Dec 1, 2023

Choose a reason for hiding this comment

MaskRay Dec 1, 2023

Choose a reason for hiding this comment

preames Dec 1, 2023

Choose a reason for hiding this comment

MaskRay Dec 1, 2023

Choose a reason for hiding this comment

MaskRay Dec 1, 2023

Choose a reason for hiding this comment

preames commented Dec 1, 2023

ilovepi commented Dec 1, 2023