-
Notifications
You must be signed in to change notification settings - Fork 12k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Incorrect branch targets in RISC-V executables built with LTO #65090
Comments
@llvm/issue-subscribers-backend-risc-v |
The guideline for those kind of bug in my mind is: could you try to disable linker relaxation at all to see if it work? I am not sure how to disable that correctly at LTO build, but you could try to add And if it not work...@MaskRay could help :P |
Probably something like -Wl,--plugin-opt=-mattr=-relax |
#59350 (comment) this may be relevant
|
Allow LTO to be selected for RISC-V, only when LLD >= 14, since there is an issue [1] in prior LLD versions that prevents LLD to generate proper machine code for RISC-V when writing `nop`s. To avoid boot failures in QEMU [2], '-mattr=+c' and '-mattr=+relax' need to be passed via '-mllvm' to ld.lld, as there appears to be an issue with LLVM's target-features and LTO [3], which can result in incorrect relocations to branch targets [4]. Once this is fixed in LLVM, it can be made conditional on affected ld.lld versions. Disable LTO for arch/riscv/kernel/pi, as llvm-objcopy expects an ELF object file when manipulating the files in that subfolder, rather than LLVM bitcode. [1] llvm/llvm-project#50505, resolved by LLVM commit e63455d5e0e5 ("[MC] Use local MCSubtargetInfo in writeNops") [2] ClangBuiltLinux#1942 [3] llvm/llvm-project#59350 [4] llvm/llvm-project#65090 Tested-by: Wende Tan <[email protected]> Signed-off-by: Wende Tan <[email protected]> Co-developed-by: Nathan Chancellor <[email protected]> Signed-off-by: Nathan Chancellor <[email protected]>
We have now also encountered this issue in ThinLTO builds. We no longer think it's LTO related, our current working theory is that it's a garbage-in-garbage-out scenario where the backend generates branches with invalid targets and LLD fails to detect and warn for this case. We believe it might be related to the size of module which is why it's manifesting primarily in LTO and ThinLTO builds, but we're trying to see if we could reproduce it in a regular build as well. While we're trying to narrow down the issue, I think it'd be helpful if LLD could diagnose this case and give a warning. |
Please check ClangBuiltLinux/linux#1942 (comment) to see if this is similar to your case :) |
So, we're able to work around this by passing While I'm going to close this particular issue, I'll probably file a separate issue an perhaps follow up on discourse about how to reconcile the underlying problem w/ the interaction between the linker, target-features, and LTO. |
That's unfortunate. Every other project looking to use LTO for riscv is going to hit the same issue and have to use the same (awful) workaround. (Needing |
@nickdesaulniers I'm going through the issues now, and compiling a list of all of them. If I don't find a canonical one, I'll file a new issue pointing out the underlying issue w/ links to all the rest and follow up on discourse. This is pretty broken ATM, and while its critical for RISC-V, I don't think it's a RISC-V specific issue any more than #67698 (which is remarkably similar). It just hits harder on RISC-V because target-features is load-bearing in the extreme. |
The failure description on this sounds a lot like the same issue as #73977. There may be other contributors as well, but ruling that one out would likely be very worthwhile. Edit: A bit more context on why this sounds related.
This was exactly the behavior I saw in the original build which triggered finding #73977. In my local example, we had a mix of object files compiled without C, and an archive that was compiled with C. Due to the link order, the linker was treating all the object files as if they had been compiled with C. This was incompatible with the code emitted by the compiler, and resulted in a case where a RISCV_ALIGN couldn't be satisfied. Adding C to all of the object files universally caused the problem to go away because the compiler left two extra bytes of padding for the RISCV_ALIGN. Note that while the RISCV_ALIGN failed asserts in assert builds, it simply computes an incorrect adjustment in release builds and thus true chaos can fall out. |
Thanks for chiming in @preames. I'll try to confirm if that mitigates some of the issues once this lands. |
Allow LTO to be selected for RISC-V, only when LLD >= 14, since there is an issue [1] in prior LLD versions that prevents LLD to generate proper machine code for RISC-V when writing `nop`s. To avoid boot failures in QEMU [2], '-mattr=+c' and '-mattr=+relax' need to be passed via '-mllvm' to ld.lld, as there appears to be an issue with LLVM's target-features and LTO [3], which can result in incorrect relocations to branch targets [4]. Once this is fixed in LLVM, it can be made conditional on affected ld.lld versions. Disable LTO for arch/riscv/kernel/pi, as llvm-objcopy expects an ELF object file when manipulating the files in that subfolder, rather than LLVM bitcode. [1] llvm/llvm-project#50505, resolved by LLVM commit e63455d5e0e5 ("[MC] Use local MCSubtargetInfo in writeNops") [2] ClangBuiltLinux#1942 [3] llvm/llvm-project#59350 [4] llvm/llvm-project#65090 Tested-by: Wende Tan <[email protected]> Signed-off-by: Wende Tan <[email protected]> Co-developed-by: Nathan Chancellor <[email protected]> Signed-off-by: Nathan Chancellor <[email protected]> Reviewed-by: Conor Dooley <[email protected]>
The underlying issue seems to be that the linker does not have target attributes set correctly in all cases, and LTO cannot diagnose this. This is tracked in upstream LLVM, and adding these flags to the linker invocation is the suggested work around (see llvm/llvm-project#65090, llvm/llvm-project#59350 (comment), llvm/llvm-project#67698, and https://discourse.llvm.org/t/myterious-soft-float-output-in-lto-cache/70753 for more details). Bug: 129493 Change-Id: Ib3b9f387d61c5a855ba57ede191881c567414de7 Reviewed-on: https://fuchsia-review.googlesource.com/c/fuchsia/+/949677 Reviewed-by: Petr Hosek <[email protected]> Fuchsia-Auto-Submit: Paul Kirth <[email protected]> Commit-Queue: Auto-Submit <[email protected]>
Allow LTO to be selected for RISC-V, only when LLD >= 14, since there is an issue [1] in prior LLD versions that prevents LLD to generate proper machine code for RISC-V when writing `nop`s. To avoid boot failures in QEMU [2], '-mattr=+c' and '-mattr=+relax' need to be passed via '-mllvm' to ld.lld, as there appears to be an issue with LLVM's target-features and LTO [3], which can result in incorrect relocations to branch targets [4]. Once this is fixed in LLVM, it can be made conditional on affected ld.lld versions. Disable LTO for arch/riscv/kernel/pi, as llvm-objcopy expects an ELF object file when manipulating the files in that subfolder, rather than LLVM bitcode. [1] llvm/llvm-project#50505, resolved by LLVM commit e63455d5e0e5 ("[MC] Use local MCSubtargetInfo in writeNops") [2] ClangBuiltLinux#1942 [3] llvm/llvm-project#59350 [4] llvm/llvm-project#65090 Tested-by: Wende Tan <[email protected]> Signed-off-by: Wende Tan <[email protected]> Co-developed-by: Nathan Chancellor <[email protected]> Signed-off-by: Nathan Chancellor <[email protected]> Reviewed-by: Conor Dooley <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Palmer Dabbelt <[email protected]>
Allow LTO to be selected for RISC-V, only when LLD >= 14, since there is an issue [1] in prior LLD versions that prevents LLD to generate proper machine code for RISC-V when writing `nop`s. To avoid boot failures in QEMU [2], '-mattr=+c' and '-mattr=+relax' need to be passed via '-mllvm' to ld.lld, as there appears to be an issue with LLVM's target-features and LTO [3], which can result in incorrect relocations to branch targets [4]. Once this is fixed in LLVM, it can be made conditional on affected ld.lld versions. Disable LTO for arch/riscv/kernel/pi, as llvm-objcopy expects an ELF object file when manipulating the files in that subfolder, rather than LLVM bitcode. [1] llvm/llvm-project#50505, resolved by LLVM commit e63455d5e0e5 ("[MC] Use local MCSubtargetInfo in writeNops") [2] ClangBuiltLinux#1942 [3] llvm/llvm-project#59350 [4] llvm/llvm-project#65090 Tested-by: Wende Tan <[email protected]> Signed-off-by: Wende Tan <[email protected]> Co-developed-by: Nathan Chancellor <[email protected]> Signed-off-by: Nathan Chancellor <[email protected]> Reviewed-by: Conor Dooley <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Palmer Dabbelt <[email protected]>
We're seeing several cases where branch targets are invalid, for example in the middle of another instruction, when building Fuchsia's zircon kernel with
flto=full
. This only seems to occur under-flto-full
, and both non-LTO and ThinLTO do not seem to have this issue.Most branches appear to be correct, or at least have valid targets, but in some cases the target isn't the location of a valid instruction. This sounds like it could be related to linker relaxation, but https://discourse.llvm.org/t/riscv-status-of-lto-for-risc-v/58518 isn't clear on the current status of LTO for RISC-V, so it may be related.
I'm working on getting something smaller than our kernel image as a reproducer, but in the meantime I'm providing a snippet from a relevant case. The target for the
bne
instruction is in the middle of another instruction sequencehttps://fxbug.dev/129493 has additional context.
CC: @topperc @MaskRay @asb
The text was updated successfully, but these errors were encountered: