-
Notifications
You must be signed in to change notification settings - Fork 12k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[LLD][RISCV] Fix incorrect call relaxation when mixing +c and -c objects #73977
Conversation
This fixes a mis-link when mixing compressed and non-compressed input to LLD. When relaxing calls, we must respect the source file that the section came from when deciding whether it's legal to use compressed instructions. If the call in question comes from a non-rvc source, then it will not expect 2-byte alignments and cascading failures may result. This fixes llvm#63964. The symptom seen there is that a latter RISCV_ALIGN can't be satisfied and we either fail an assert or produce a totally bogus link result. (It can be easily reproduced by putting .p2align 5 right before the nop in the reduced test case and running check-lld on an assertions enabled build.) However, it's important to note this is just one possible symptom of the problem. If the resulting binary has a runtime switch between rvc and non-rvc routines (via e.g. ifuncs), then even if we manage to link we may execute invalid instructions on a machine which doesn't implement compressed instructions.
@llvm/pr-subscribers-lld Author: Philip Reames (preames) ChangesThis fixes a mis-link when mixing compressed and non-compressed input to This fixes #63964. The symptom If the resulting binary has a runtime switch between rvc and non-rvc routines Full diff: https://github.com/llvm/llvm-project/pull/73977.diff 2 Files Affected:
diff --git a/lld/ELF/Arch/RISCV.cpp b/lld/ELF/Arch/RISCV.cpp
index a556d89c36400d3..898e3e45b9e7240 100644
--- a/lld/ELF/Arch/RISCV.cpp
+++ b/lld/ELF/Arch/RISCV.cpp
@@ -591,7 +591,7 @@ static void initSymbolAnchors() {
// Relax R_RISCV_CALL/R_RISCV_CALL_PLT auipc+jalr to c.j, c.jal, or jal.
static void relaxCall(const InputSection &sec, size_t i, uint64_t loc,
Relocation &r, uint32_t &remove) {
- const bool rvc = config->eflags & EF_RISCV_RVC;
+ const bool rvc = getEFlags(sec.file) & EF_RISCV_RVC;
const Symbol &sym = *r.sym;
const uint64_t insnPair = read64le(sec.content().data() + r.offset);
const uint32_t rd = extractBits(insnPair, 32 + 11, 32 + 7);
diff --git a/lld/test/ELF/riscv-relax-call-mixed-rvc.s b/lld/test/ELF/riscv-relax-call-mixed-rvc.s
new file mode 100644
index 000000000000000..88ee08572875088
--- /dev/null
+++ b/lld/test/ELF/riscv-relax-call-mixed-rvc.s
@@ -0,0 +1,30 @@
+# REQUIRES: riscv
+# RUN: rm -rf %t && split-file %s %t && cd %t
+# RUN: llvm-mc -filetype=obj -triple=riscv64 -mattr=+c,+relax a.s -o a.o
+# RUN: llvm-mc -filetype=obj -triple=riscv64 -mattr=-c,+relax b.s -o b.o
+
+# RUN: ld.lld a.o b.o --shared -o a
+# RUN: llvm-objdump -d --no-show-raw-insn -M no-aliases a | FileCheck %s
+
+## This needs to be a *uncompressed* jal instruction since it came from the
+## source file which does not enable C
+# CHECK-LABEL: <foo>:
+# CHECK-NEXT: 1260: jal zero, 0x1260 <foo>
+# CHECK-NEXT: 1264: addi zero, zero, 0
+
+# w/C
+#--- a.s
+ .text
+ .attribute 4, 16
+ .attribute 5, "rv64i2p1_m2p0_a2p1_f2p2_d2p2_c2p0_zicsr2p0_zifencei2p0"
+
+# w/o C
+#--- b.s
+ .text
+ .attribute 4, 16
+ .attribute 5, "rv64i2p1_m2p0_a2p1_f2p2_d2p2_zicsr2p0_zifencei2p0"
+ .p2align 5
+ .type foo,@function
+foo:
+ tail foo
+ nop
|
@llvm/pr-subscribers-lld-elf Author: Philip Reames (preames) ChangesThis fixes a mis-link when mixing compressed and non-compressed input to This fixes #63964. The symptom If the resulting binary has a runtime switch between rvc and non-rvc routines Full diff: https://github.com/llvm/llvm-project/pull/73977.diff 2 Files Affected:
diff --git a/lld/ELF/Arch/RISCV.cpp b/lld/ELF/Arch/RISCV.cpp
index a556d89c36400d3..898e3e45b9e7240 100644
--- a/lld/ELF/Arch/RISCV.cpp
+++ b/lld/ELF/Arch/RISCV.cpp
@@ -591,7 +591,7 @@ static void initSymbolAnchors() {
// Relax R_RISCV_CALL/R_RISCV_CALL_PLT auipc+jalr to c.j, c.jal, or jal.
static void relaxCall(const InputSection &sec, size_t i, uint64_t loc,
Relocation &r, uint32_t &remove) {
- const bool rvc = config->eflags & EF_RISCV_RVC;
+ const bool rvc = getEFlags(sec.file) & EF_RISCV_RVC;
const Symbol &sym = *r.sym;
const uint64_t insnPair = read64le(sec.content().data() + r.offset);
const uint32_t rd = extractBits(insnPair, 32 + 11, 32 + 7);
diff --git a/lld/test/ELF/riscv-relax-call-mixed-rvc.s b/lld/test/ELF/riscv-relax-call-mixed-rvc.s
new file mode 100644
index 000000000000000..88ee08572875088
--- /dev/null
+++ b/lld/test/ELF/riscv-relax-call-mixed-rvc.s
@@ -0,0 +1,30 @@
+# REQUIRES: riscv
+# RUN: rm -rf %t && split-file %s %t && cd %t
+# RUN: llvm-mc -filetype=obj -triple=riscv64 -mattr=+c,+relax a.s -o a.o
+# RUN: llvm-mc -filetype=obj -triple=riscv64 -mattr=-c,+relax b.s -o b.o
+
+# RUN: ld.lld a.o b.o --shared -o a
+# RUN: llvm-objdump -d --no-show-raw-insn -M no-aliases a | FileCheck %s
+
+## This needs to be a *uncompressed* jal instruction since it came from the
+## source file which does not enable C
+# CHECK-LABEL: <foo>:
+# CHECK-NEXT: 1260: jal zero, 0x1260 <foo>
+# CHECK-NEXT: 1264: addi zero, zero, 0
+
+# w/C
+#--- a.s
+ .text
+ .attribute 4, 16
+ .attribute 5, "rv64i2p1_m2p0_a2p1_f2p2_d2p2_c2p0_zicsr2p0_zifencei2p0"
+
+# w/o C
+#--- b.s
+ .text
+ .attribute 4, 16
+ .attribute 5, "rv64i2p1_m2p0_a2p1_f2p2_d2p2_zicsr2p0_zifencei2p0"
+ .p2align 5
+ .type foo,@function
+foo:
+ tail foo
+ nop
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes sense to me, let's see what others think.
Of course, there's the bigger problem that we don't (currently) encode the arch string in R_RISCV_RELAX (which I've previously proposed by using the symbol to point at a mapping symbol or similar), but we do have this information so should honour it.
Thanks for running this down. When I started looking into #65090 I found #63964., but wasn't sure they were related, since we only saw this in LTO builds and we use the C extension everywhere. I'm not sure if this patch will be able to resolve the LTO issues we saw, since I don't think the section flags will be relevant in the LTO case, but it's certainly a welcome improvement. @jrtc27 I assume that riscv-non-isa/riscv-elf-psabi-doc#393 is the latest incarnation of the idea. Do you think it is likely to be accepted? It seems like a much more robust approach from reading the proposal and comments. |
## This needs to be a *uncompressed* jal instruction since it came from the | ||
## source file which does not enable C | ||
# CHECK-LABEL: <foo>: | ||
# CHECK-NEXT: 1260: jal zero, 0x1260 <foo> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Omit 1260:
so that the test is less sensitive to address changes.
jal zero, {{.*}} <foo>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need the addresses to make the instruction size difference obvious. Is there a way to force the address so it's stable instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider -Ttext=0x10000
as many riscv-relax-*
files use.
.type foo,@function | ||
foo: | ||
tail foo | ||
nop |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Relaxations may create NOPs as well. Consider a different instruction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have a specific suggestion? I don't really see why I care that a nop could be produced by relaxation since I'm testing one particular case here, but I'm happy to adjust to your preference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is quite nitpicking. R_RISCV_ALIGN handling adjusts NOPs, so a nop
in the check prefix isn't immediately clear it is related to user input or R_RISCV_ALIGN related nops. Switching to another instruction makes the intention clearer.
# CHECK-NEXT: 1260: jal zero, 0x1260 <foo> | ||
# CHECK-NEXT: 1264: addi zero, zero, 0 | ||
|
||
# w/C |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing space? w/ C
This patch definitely does not fix all of the LTO issues. This patch fixes a non-LTO linker correctness bug. Can I ask we keep the discussion of mapping symbols etc elsewhere? I don't want to derail this functional fix. |
Sure. Sorry. That wasn’t my intention |
If we have a RISCV_ALIGN relocation which can't be satisified with the available space provided, report an error rather than silently continuing with a corrupt state. For context, llvm#73977 fixes an LLD bug which can cause this effect, but that's not the only source of such cases. Another is our hard-to-fix set of LTO problems. We can have a single function which was compiled without C in an otherwise entirely C module. Until we have all of the mapping symbols and related mechanisms implemented, this case can continue to arise. I think it's very important from a user interface perspective to have non-assertion builds report an error in this case. If we don't report an error here, we can crash the linker (due to the fatal error at the bottom of the function), or if we're less lucky silently produce a malformed binary. There's a couple of known defects with this patch. First, there's no test case. I don't know how to write a stable test case for this that doesn't involve hex editting an object file, or abusing the LTO bug that we hope to fix. Second, this will report an error on each relax iteration. I explored trying to report an error only once after relaxation, but ended up deciding I didn't have the context to implement it safely. I would be thrilled if someone more knowledgeable of this code wants to write a better version of this patch, but in the meantime, I believe we should land this to address the user experience problem described above.
If we have a RISCV_ALIGN relocation which can't be satisfied with the available space provided, report an error rather than silently continuing with a corrupt state. For context, #73977 fixes an LLD bug which can cause this effect, but that's not the only source of such cases. Another is our hard-to-fix set of LTO problems. We can have a single function which was compiled without C in an otherwise entirely C module. Until we have all of the mapping symbols and related mechanisms implemented, this case can continue to arise. I think it's very important from a user interface perspective to have non-assertion builds report an error in this case. If we don't report an error here, we can crash the linker (due to the fatal error at the bottom of the function), or if we're less lucky silently produce a malformed binary. There's a couple of known defects with this patch. First, there's no test case. I don't know how to write a stable test case for this that doesn't involve hex editing an object file, or abusing the LTO bug that we hope to fix. Second, this will report an error on each relax iteration. I explored trying to report an error only once after relaxation, but ended up deciding I didn't have the context to implement it safely. I would be thrilled if someone more knowledgeable of this code wants to write a better version of this patch, but in the meantime, I believe we should land this to address the user experience problem described above.
If we have a RISCV_ALIGN relocation which can't be satisfied with the available space provided, report an error rather than silently continuing with a corrupt state. For context, llvm#73977 fixes an LLD bug which can cause this effect, but that's not the only source of such cases. Another is our hard-to-fix set of LTO problems. We can have a single function which was compiled without C in an otherwise entirely C module. Until we have all of the mapping symbols and related mechanisms implemented, this case can continue to arise. I think it's very important from a user interface perspective to have non-assertion builds report an error in this case. If we don't report an error here, we can crash the linker (due to the fatal error at the bottom of the function), or if we're less lucky silently produce a malformed binary. There's a couple of known defects with this patch. First, there's no test case. I don't know how to write a stable test case for this that doesn't involve hex editing an object file, or abusing the LTO bug that we hope to fix. Second, this will report an error on each relax iteration. I explored trying to report an error only once after relaxation, but ended up deciding I didn't have the context to implement it safely. I would be thrilled if someone more knowledgeable of this code wants to write a better version of this patch, but in the meantime, I believe we should land this to address the user experience problem described above.
This fixes a mis-link when mixing compressed and non-compressed input to
LLD. When relaxing calls, we must respect the source file that the section
came from when deciding whether it's legal to use compressed instructions.
If the call in question comes from a non-rvc source, then it will not expect
2-byte alignments and cascading failures may result.
This fixes #63964. The symptom
seen there is that a latter RISCV_ALIGN can't be satisfied and we either
fail an assert or produce a totally bogus link result. (It can be easily
reproduced by putting .p2align 5 right before the nop in the reduced test
case and running check-lld on an assertions enabled build.) However,
it's important to note this is just one possible symptom of the problem.
If the resulting binary has a runtime switch between rvc and non-rvc routines
(via e.g. ifuncs), then even if we manage to link we may execute invalid
instructions on a machine which doesn't implement compressed instructions.