Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[LLD][RISCV] Fix incorrect call relaxation when mixing +c and -c objects #73977

Merged
merged 3 commits into from
Dec 1, 2023

Conversation

preames
Copy link
Collaborator

@preames preames commented Nov 30, 2023

This fixes a mis-link when mixing compressed and non-compressed input to
LLD. When relaxing calls, we must respect the source file that the section
came from when deciding whether it's legal to use compressed instructions.
If the call in question comes from a non-rvc source, then it will not expect
2-byte alignments and cascading failures may result.

This fixes #63964. The symptom
seen there is that a latter RISCV_ALIGN can't be satisfied and we either
fail an assert or produce a totally bogus link result. (It can be easily
reproduced by putting .p2align 5 right before the nop in the reduced test
case and running check-lld on an assertions enabled build.) However,
it's important to note this is just one possible symptom of the problem.

If the resulting binary has a runtime switch between rvc and non-rvc routines
(via e.g. ifuncs), then even if we manage to link we may execute invalid
instructions on a machine which doesn't implement compressed instructions.

This fixes a mis-link when mixing compressed and non-compressed input to
LLD.  When relaxing calls, we must respect the source file that the section
came from when deciding whether it's legal to use compressed instructions.
If the call in question comes from a non-rvc source, then it will not expect
2-byte alignments and cascading failures may result.

This fixes llvm#63964.  The symptom
seen there is that a latter RISCV_ALIGN can't be satisfied and we either
fail an assert or produce a totally bogus link result.  (It can be easily
reproduced by putting .p2align 5 right before the nop in the reduced test
case and running check-lld on an assertions enabled build.)  However,
it's important to note this is just one possible symptom of the problem.

If the resulting binary has a runtime switch between rvc and non-rvc routines
(via e.g. ifuncs), then even if we manage to link we may execute invalid
instructions on a machine which doesn't implement compressed instructions.
@llvmbot
Copy link

llvmbot commented Nov 30, 2023

@llvm/pr-subscribers-lld

Author: Philip Reames (preames)

Changes

This fixes a mis-link when mixing compressed and non-compressed input to
LLD. When relaxing calls, we must respect the source file that the section
came from when deciding whether it's legal to use compressed instructions.
If the call in question comes from a non-rvc source, then it will not expect
2-byte alignments and cascading failures may result.

This fixes #63964. The symptom
seen there is that a latter RISCV_ALIGN can't be satisfied and we either
fail an assert or produce a totally bogus link result. (It can be easily
reproduced by putting .p2align 5 right before the nop in the reduced test
case and running check-lld on an assertions enabled build.) However,
it's important to note this is just one possible symptom of the problem.

If the resulting binary has a runtime switch between rvc and non-rvc routines
(via e.g. ifuncs), then even if we manage to link we may execute invalid
instructions on a machine which doesn't implement compressed instructions.


Full diff: https://github.com/llvm/llvm-project/pull/73977.diff

2 Files Affected:

  • (modified) lld/ELF/Arch/RISCV.cpp (+1-1)
  • (added) lld/test/ELF/riscv-relax-call-mixed-rvc.s (+30)
diff --git a/lld/ELF/Arch/RISCV.cpp b/lld/ELF/Arch/RISCV.cpp
index a556d89c36400d3..898e3e45b9e7240 100644
--- a/lld/ELF/Arch/RISCV.cpp
+++ b/lld/ELF/Arch/RISCV.cpp
@@ -591,7 +591,7 @@ static void initSymbolAnchors() {
 // Relax R_RISCV_CALL/R_RISCV_CALL_PLT auipc+jalr to c.j, c.jal, or jal.
 static void relaxCall(const InputSection &sec, size_t i, uint64_t loc,
                       Relocation &r, uint32_t &remove) {
-  const bool rvc = config->eflags & EF_RISCV_RVC;
+  const bool rvc = getEFlags(sec.file) & EF_RISCV_RVC;
   const Symbol &sym = *r.sym;
   const uint64_t insnPair = read64le(sec.content().data() + r.offset);
   const uint32_t rd = extractBits(insnPair, 32 + 11, 32 + 7);
diff --git a/lld/test/ELF/riscv-relax-call-mixed-rvc.s b/lld/test/ELF/riscv-relax-call-mixed-rvc.s
new file mode 100644
index 000000000000000..88ee08572875088
--- /dev/null
+++ b/lld/test/ELF/riscv-relax-call-mixed-rvc.s
@@ -0,0 +1,30 @@
+# REQUIRES: riscv
+# RUN: rm -rf %t && split-file %s %t && cd %t
+# RUN: llvm-mc -filetype=obj -triple=riscv64 -mattr=+c,+relax a.s -o a.o
+# RUN: llvm-mc -filetype=obj -triple=riscv64 -mattr=-c,+relax b.s -o b.o
+
+# RUN: ld.lld a.o b.o --shared -o a
+# RUN: llvm-objdump -d --no-show-raw-insn -M no-aliases a | FileCheck %s
+
+## This needs to be a *uncompressed* jal instruction since it came from the
+## source file which does not enable C
+# CHECK-LABEL: <foo>:
+# CHECK-NEXT:    1260: jal zero, 0x1260 <foo>
+# CHECK-NEXT:    1264: addi zero, zero, 0
+
+# w/C
+#--- a.s
+	.text
+	.attribute	4, 16
+	.attribute	5, "rv64i2p1_m2p0_a2p1_f2p2_d2p2_c2p0_zicsr2p0_zifencei2p0"
+
+# w/o C
+#--- b.s
+	.text
+	.attribute	4, 16
+	.attribute	5, "rv64i2p1_m2p0_a2p1_f2p2_d2p2_zicsr2p0_zifencei2p0"
+	.p2align	5
+	.type	foo,@function
+foo:
+    tail    foo
+    nop

@llvmbot
Copy link

llvmbot commented Nov 30, 2023

@llvm/pr-subscribers-lld-elf

Author: Philip Reames (preames)

Changes

This fixes a mis-link when mixing compressed and non-compressed input to
LLD. When relaxing calls, we must respect the source file that the section
came from when deciding whether it's legal to use compressed instructions.
If the call in question comes from a non-rvc source, then it will not expect
2-byte alignments and cascading failures may result.

This fixes #63964. The symptom
seen there is that a latter RISCV_ALIGN can't be satisfied and we either
fail an assert or produce a totally bogus link result. (It can be easily
reproduced by putting .p2align 5 right before the nop in the reduced test
case and running check-lld on an assertions enabled build.) However,
it's important to note this is just one possible symptom of the problem.

If the resulting binary has a runtime switch between rvc and non-rvc routines
(via e.g. ifuncs), then even if we manage to link we may execute invalid
instructions on a machine which doesn't implement compressed instructions.


Full diff: https://github.com/llvm/llvm-project/pull/73977.diff

2 Files Affected:

  • (modified) lld/ELF/Arch/RISCV.cpp (+1-1)
  • (added) lld/test/ELF/riscv-relax-call-mixed-rvc.s (+30)
diff --git a/lld/ELF/Arch/RISCV.cpp b/lld/ELF/Arch/RISCV.cpp
index a556d89c36400d3..898e3e45b9e7240 100644
--- a/lld/ELF/Arch/RISCV.cpp
+++ b/lld/ELF/Arch/RISCV.cpp
@@ -591,7 +591,7 @@ static void initSymbolAnchors() {
 // Relax R_RISCV_CALL/R_RISCV_CALL_PLT auipc+jalr to c.j, c.jal, or jal.
 static void relaxCall(const InputSection &sec, size_t i, uint64_t loc,
                       Relocation &r, uint32_t &remove) {
-  const bool rvc = config->eflags & EF_RISCV_RVC;
+  const bool rvc = getEFlags(sec.file) & EF_RISCV_RVC;
   const Symbol &sym = *r.sym;
   const uint64_t insnPair = read64le(sec.content().data() + r.offset);
   const uint32_t rd = extractBits(insnPair, 32 + 11, 32 + 7);
diff --git a/lld/test/ELF/riscv-relax-call-mixed-rvc.s b/lld/test/ELF/riscv-relax-call-mixed-rvc.s
new file mode 100644
index 000000000000000..88ee08572875088
--- /dev/null
+++ b/lld/test/ELF/riscv-relax-call-mixed-rvc.s
@@ -0,0 +1,30 @@
+# REQUIRES: riscv
+# RUN: rm -rf %t && split-file %s %t && cd %t
+# RUN: llvm-mc -filetype=obj -triple=riscv64 -mattr=+c,+relax a.s -o a.o
+# RUN: llvm-mc -filetype=obj -triple=riscv64 -mattr=-c,+relax b.s -o b.o
+
+# RUN: ld.lld a.o b.o --shared -o a
+# RUN: llvm-objdump -d --no-show-raw-insn -M no-aliases a | FileCheck %s
+
+## This needs to be a *uncompressed* jal instruction since it came from the
+## source file which does not enable C
+# CHECK-LABEL: <foo>:
+# CHECK-NEXT:    1260: jal zero, 0x1260 <foo>
+# CHECK-NEXT:    1264: addi zero, zero, 0
+
+# w/C
+#--- a.s
+	.text
+	.attribute	4, 16
+	.attribute	5, "rv64i2p1_m2p0_a2p1_f2p2_d2p2_c2p0_zicsr2p0_zifencei2p0"
+
+# w/o C
+#--- b.s
+	.text
+	.attribute	4, 16
+	.attribute	5, "rv64i2p1_m2p0_a2p1_f2p2_d2p2_zicsr2p0_zifencei2p0"
+	.p2align	5
+	.type	foo,@function
+foo:
+    tail    foo
+    nop

Copy link
Collaborator

@jrtc27 jrtc27 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense to me, let's see what others think.

Of course, there's the bigger problem that we don't (currently) encode the arch string in R_RISCV_RELAX (which I've previously proposed by using the symbol to point at a mapping symbol or similar), but we do have this information so should honour it.

@ilovepi
Copy link
Contributor

ilovepi commented Nov 30, 2023

Thanks for running this down. When I started looking into #65090 I found #63964., but wasn't sure they were related, since we only saw this in LTO builds and we use the C extension everywhere.

I'm not sure if this patch will be able to resolve the LTO issues we saw, since I don't think the section flags will be relevant in the LTO case, but it's certainly a welcome improvement.

@jrtc27 I assume that riscv-non-isa/riscv-elf-psabi-doc#393 is the latest incarnation of the idea. Do you think it is likely to be accepted? It seems like a much more robust approach from reading the proposal and comments.

## This needs to be a *uncompressed* jal instruction since it came from the
## source file which does not enable C
# CHECK-LABEL: <foo>:
# CHECK-NEXT: 1260: jal zero, 0x1260 <foo>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Omit 1260: so that the test is less sensitive to address changes.

jal zero, {{.*}} <foo>

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need the addresses to make the instruction size difference obvious. Is there a way to force the address so it's stable instead?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider -Ttext=0x10000 as many riscv-relax-* files use.

.type foo,@function
foo:
tail foo
nop
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Relaxations may create NOPs as well. Consider a different instruction.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have a specific suggestion? I don't really see why I care that a nop could be produced by relaxation since I'm testing one particular case here, but I'm happy to adjust to your preference.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is quite nitpicking. R_RISCV_ALIGN handling adjusts NOPs, so a nop in the check prefix isn't immediately clear it is related to user input or R_RISCV_ALIGN related nops. Switching to another instruction makes the intention clearer.

# CHECK-NEXT: 1260: jal zero, 0x1260 <foo>
# CHECK-NEXT: 1264: addi zero, zero, 0

# w/C
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing space? w/ C

@preames
Copy link
Collaborator Author

preames commented Dec 1, 2023

This patch definitely does not fix all of the LTO issues. This patch fixes a non-LTO linker correctness bug. Can I ask we keep the discussion of mapping symbols etc elsewhere? I don't want to derail this functional fix.

@ilovepi
Copy link
Contributor

ilovepi commented Dec 1, 2023

Sure. Sorry. That wasn’t my intention

preames added a commit to preames/llvm-project that referenced this pull request Dec 1, 2023
If we have a RISCV_ALIGN relocation which can't be satisified with the
available space provided, report an error rather than silently continuing
with a corrupt state.

For context, llvm#73977 fixes an LLD bug which can cause this effect, but that's not the only source of such cases.

Another is our hard-to-fix set of LTO problems.  We can have a single function
which was compiled without C in an otherwise entirely C module.  Until we have
all of the mapping symbols and related mechanisms implemented, this case can
continue to arise.

I think it's very important from a user interface perspective to have
non-assertion builds report an error in this case. If we don't report an
error here, we can crash the linker (due to the fatal error at the bottom
of the function), or if we're less lucky silently produce a malformed binary.

There's a couple of known defects with this patch.

First, there's no test case.  I don't know how to write a stable test
case for this that doesn't involve hex editting an object file, or abusing
the LTO bug that we hope to fix.

Second, this will report an error on each relax iteration.  I explored
trying to report an error only once after relaxation, but ended up deciding
I didn't have the context to implement it safely.

I would be thrilled if someone more knowledgeable of this code wants to
write a better version of this patch, but in the meantime, I believe we
should land this to address the user experience problem described above.
@preames preames merged commit 62213be into llvm:main Dec 1, 2023
2 of 3 checks passed
@preames preames deleted the pr-riscv-lld-call-relaxation-bug branch December 1, 2023 19:02
preames added a commit that referenced this pull request Jan 17, 2024
If we have a RISCV_ALIGN relocation which can't be satisfied with the
available space provided, report an error rather than silently
continuing with a corrupt state.

For context, #73977 fixes an
LLD bug which can cause this effect, but that's not the only source of
such cases.

Another is our hard-to-fix set of LTO problems. We can have a single
function which was compiled without C in an otherwise entirely C module.
Until we have all of the mapping symbols and related mechanisms
implemented, this case can continue to arise.

I think it's very important from a user interface perspective to have
non-assertion builds report an error in this case. If we don't report an
error here, we can crash the linker (due to the fatal error at the
bottom of the function), or if we're less lucky silently produce a
malformed binary.

There's a couple of known defects with this patch.

First, there's no test case. I don't know how to write a stable test
case for this that doesn't involve hex editing an object file, or
abusing the LTO bug that we hope to fix.

Second, this will report an error on each relax iteration. I explored
trying to report an error only once after relaxation, but ended up
deciding I didn't have the context to implement it safely.

I would be thrilled if someone more knowledgeable of this code wants to
write a better version of this patch, but in the meantime, I believe we
should land this to address the user experience problem described above.
ampandey-1995 pushed a commit to ampandey-1995/llvm-project that referenced this pull request Jan 19, 2024
If we have a RISCV_ALIGN relocation which can't be satisfied with the
available space provided, report an error rather than silently
continuing with a corrupt state.

For context, llvm#73977 fixes an
LLD bug which can cause this effect, but that's not the only source of
such cases.

Another is our hard-to-fix set of LTO problems. We can have a single
function which was compiled without C in an otherwise entirely C module.
Until we have all of the mapping symbols and related mechanisms
implemented, this case can continue to arise.

I think it's very important from a user interface perspective to have
non-assertion builds report an error in this case. If we don't report an
error here, we can crash the linker (due to the fatal error at the
bottom of the function), or if we're less lucky silently produce a
malformed binary.

There's a couple of known defects with this patch.

First, there's no test case. I don't know how to write a stable test
case for this that doesn't involve hex editing an object file, or
abusing the LTO bug that we hope to fix.

Second, this will report an error on each relax iteration. I explored
trying to report an error only once after relaxation, but ended up
deciding I didn't have the context to implement it safely.

I would be thrilled if someone more knowledgeable of this code wants to
write a better version of this patch, but in the meantime, I believe we
should land this to address the user experience problem described above.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[RISCV][LLD] Unable to resolve R_RISCV_ALIGN when mixing +C and -C object files
5 participants