Relocations, ELF Markers, and >32-bit Instructions #453

Open
lenary opened this issue Nov 1, 2024 · 8 comments

@lenary
Contributor

lenary commented Nov 1, 2024

I took a good look through the currently open issues, and none specifically
address any of these issues, but sorry if I missed something. Some of the
discussion in #393 touches on these issues but not in any great depth. I've also
been working with other architectures for a while, so am still getting back up
to speed with the RISC-V psABI details again.

This issue is going to touch on a series of related issues, around support for
longer-than-32-bit instructions. I'm sorry I didn't get this to you in time for
the most recent psABI meeting, but hopefully that gives you time to think about
the issues before the next one.

One of my central queries is about the meaning of EF_RISCV_RVC - this denotes
whether you can use 16-bit aligned instructions (rather than 32-bit), and
whether you can use C extension instructions during relaxation. I'll note that
LLVM has updated how it interprets this flag, to still mean the former, but for
the latter mean just Zca, not all of C (this should have been expected
when C was exploded into a lot of sub-extensions, some incompatible with each
other).

What about binaries containing 48-bit instructions? We have a set of public ISA
extensions, Xqci, which we want to add support for, which contains 48-bit
instructions (something the core ISA standard is yet to ratify encodings for,
but space has been reserved for >32-bit instructions). 48-bit instructions
require us to have 16-bit aligned instructions (or else we would have 64-bit
instructions). In our case, none of the sub-extensions which have 48-bit
instructions also have 16-bit instructions, nor do any
require/imply C/Zca.
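For readers less familiar with the reserved longer-instruction encodings: the length of a RISC-V instruction is determined by the low bits of its first 16-bit parcel. The following Python sketch decodes the lengths described by the unprivileged spec's expanded instruction-length encoding (16, 32, 48 and 64 bits only; longer forms are deliberately left out). The function name `insn_length` is illustrative.

```python
def insn_length(parcel: int) -> int:
    """Return the instruction length in bytes, given its lowest 16-bit parcel."""
    if (parcel & 0b11) != 0b11:
        return 2  # bits 1:0 != 11 -> 16-bit (compressed) instruction
    if (parcel & 0b11100) != 0b11100:
        return 4  # bits 1:0 == 11, bits 4:2 != 111 -> 32-bit instruction
    if (parcel & 0b111111) == 0b011111:
        return 6  # bits 5:0 == 011111 -> 48-bit instruction
    if (parcel & 0b1111111) == 0b0111111:
        return 8  # bits 6:0 == 0111111 -> 64-bit instruction
    raise ValueError("length encoding not handled by this sketch")
```

Because a 6-byte instruction that starts on a 4-byte boundary leaves the next instruction at an address that is 2 mod 4, any extension with 48-bit instructions effectively needs IALIGN=16.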

Both implications of the EF_RISCV_RVC flag are also redundant: sections already
have an alignment (with obvious semantics when two sections are merged together:
enforce the higher alignment), and we now have architecture build attributes
which we can query to work out which extensions we are allowed to use during
relaxation.
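As an illustration of querying build attributes instead of EF_RISCV_RVC, here is a minimal sketch that parses an arch string of the underscore-separated, versioned form that toolchains store in `Tag_RISCV_arch` (e.g. `rv32i2p1_m2p0_c2p0`) and decides whether relaxation may introduce Zca instructions. The policy in `may_relax_to_zca` (allow it when either C or Zca is listed) is an assumption rather than psABI text, shorthand such as `g` is not expanded, and the `xqcibi0p2` version used below is made up for the example.

```python
import re

# One underscore-separated component: a name, then an optional "<major>p<minor>" version.
_COMPONENT = re.compile(r"([a-z0-9]+?)(\d+p\d+)?")


def arch_extensions(arch: str) -> set[str]:
    """Return the extension names found in a Tag_RISCV_arch-style string (sketch)."""
    names: set[str] = set()
    for comp in arch.lower().split("_"):
        m = _COMPONENT.fullmatch(comp)
        if m is None:
            raise ValueError(f"unrecognised component: {comp!r}")
        name = m.group(1)
        if name.startswith(("rv32", "rv64")):
            # Base component such as "rv32i" or "rv64imac": one letter per extension.
            names.update(name[4:])
        else:
            names.add(name)
    return names


def may_relax_to_zca(arch: str) -> bool:
    """Assumed policy: relaxation may emit Zca instructions iff C or Zca is present."""
    exts = arch_extensions(arch)
    return "c" in exts or "zca" in exts


print(may_relax_to_zca("rv32i2p1_xqcibi0p2"))
```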

So, should we be setting the EF_RISCV_RVC flag for binaries containing 48-bit
instructions? Maybe it would be ok to keep EF_RISCV_RVC clear but still mark
any code sections as having 16-bit alignment, which the linker should be
honouring? Some guidance as to a reasonable direction to take here would be
helpful. We could allocate ourselves a non-standard extension elf flag to
represent "this object contains 16-bit aligned instructions, but not necessarily
C/Zca", but we would like support for these instructions to go upstream and
allocating a non-standard extension bit for this seems greedy and potentially
unnecessary.

I have a similar query relating to relocations on 48-bit instructions. The
Xqcibi sub-extension (described in the release, above) contains some 48-bit
branch immediate instructions (qc.e.b<cond>i) where the branch offset is
encoded into the exact same bits that would be used by a b<cond> instruction.
The ISA designers did this so they could use an R_RISCV_BRANCH relocation in
their prototype toolchain. My concern is that this is likely to have a knock-on
effect on relaxations and beyond - we have instruction types for a reason, and
we quite like to use them in the ABI (aside: the document/yaml for Xqci
doesn't mention instruction types, which is a drawback, but I think this is
shared by the riscv-unified-db upstream too). Right now, all instruction
relocations end up well-aligned with the start of the instruction they apply to.
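To make the reuse concrete: R_RISCV_BRANCH patches the B-type immediate fields of the 32-bit little-endian word at `r_offset`. The sketch below (Python, with hypothetical helper names) implements that fixup and applies it to a six-byte buffer standing in for a 48-bit instruction. It only works because the offset bits sit in the same positions as in b<cond>; nothing in the relocation itself tells the linker that the instruction is six bytes long.

```python
def patch_btype(word: int, offset: int) -> int:
    """Write a PC-relative branch offset into the B-type immediate fields of `word`."""
    if offset % 2 or not -4096 <= offset < 4096:
        raise ValueError("offset must be even and fit in a signed 13-bit field")
    imm = offset & 0x1FFF                  # 13-bit two's-complement immediate
    word &= ~0xFE000F80 & 0xFFFFFFFF       # clear bits 31:25 and 11:7
    word |= ((imm >> 12) & 0x1) << 31      # imm[12]   -> bit 31
    word |= ((imm >> 5) & 0x3F) << 25      # imm[10:5] -> bits 30:25
    word |= ((imm >> 1) & 0xF) << 8        # imm[4:1]  -> bits 11:8
    word |= ((imm >> 11) & 0x1) << 7       # imm[11]   -> bit 7
    return word


def apply_branch_reloc(section: bytearray, r_offset: int, offset: int) -> None:
    """R_RISCV_BRANCH-style fixup: patch the little-endian 32-bit word at r_offset."""
    word = int.from_bytes(section[r_offset:r_offset + 4], "little")
    section[r_offset:r_offset + 4] = patch_btype(word, offset).to_bytes(4, "little")


# Six bytes standing in for a 48-bit instruction: a beq-shaped first word plus two
# trailing bytes (0xAB, 0xCD are placeholders, not a real Xqcibi encoding).
insn = bytearray(b"\x63\x00\x00\x00\xab\xcd")
apply_branch_reloc(insn, 0, 8)
```

For example, `patch_btype(0x00000063, 8)` produces `0x00000463`, the encoding of `beq zero, zero, 8`; the trailing two bytes of the buffer are never touched.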

Broadly, my question is: do we want to reuse an existing relocation like this
(on longer instructions of a different type), or would we prefer that all
relocations are correct for the instruction type (and size)? My gut feeling is
that we do want new relocations for new instruction types, to keep relocations
obvious and aligned with instruction boundaries, but I'd be interested to hear
other opinions. I think keeping relocations aligned with instructions and only
applied to instructions with the correct type makes relaxations easier and less
brittle, but I'm not 100% sure on that. The specific implication here is that we
might end up needing quite a lot of new relocations as we get longer
instructions, but I think we'd reasonably quickly stop getting lots more
instructions for materializing addresses.

I think maybe @kito-cheng and @asb might be expecting some of these queries, but keen to hear from others too.

@jrtc27
Collaborator

jrtc27 commented Nov 1, 2024

For instruction alignment, just because your instructions individually can be 16-bit aligned doesn't mean the whole section only needs that. For example, xtvec requires 4-byte alignment on the address even with RVC due to using the low 2 bits as the mode, so any OS's text section will be at least 4-byte aligned (and the trap vector at a 4-byte aligned offset within that). So whilst align(.text) == 2 implies you can use 2-byte instructions, the converse is not true, and thus align(.text) == 4 does not imply that 2-byte instructions aren't in use.
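The xtvec constraint follows from how the CSR value is composed: MODE occupies bits 1:0, so BASE must be a multiple of 4 (per the privileged spec, MODE 0 is Direct, 1 is Vectored, and values of 2 and above are reserved). A small Python sketch with illustrative function names:

```python
def make_xtvec(base: int, mode: int) -> int:
    """Compose an mtvec/stvec value: BASE in bits XLEN-1:2, MODE in bits 1:0."""
    if base & 0b11:
        raise ValueError("trap vector base must be 4-byte aligned; bits 1:0 hold MODE")
    if mode not in (0, 1):  # 0 = Direct, 1 = Vectored; >= 2 reserved
        raise ValueError("reserved MODE value")
    return base | mode


def split_xtvec(value: int) -> tuple[int, int]:
    """Split an xtvec value back into (BASE, MODE)."""
    return value & ~0b11, value & 0b11
```

An address such as `0x80000102` is a perfectly valid place for an instruction when IALIGN=16, yet it cannot be installed as a trap vector, which is why `.text` alignment can be stricter than IALIGN.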

The relocation normally needs to imply the instruction size, yes. Even on X86 where the actual operand may be encoded in a uniform manner despite different instruction prefixes, the number of prefix bytes still gets encoded so you can do relaxation (albeit in a more limited manner there).
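One way to see why the relocation should imply the instruction size: a linker doing relaxation can find the end of the relocated instruction from the relocation type alone, without decoding bytes. The table below is a sketch; `R_RISCV_QC_E_BRANCH` is a hypothetical name for a dedicated 48-bit branch relocation, not something defined in the psABI. If qc.e.b<cond>i reused R_RISCV_BRANCH instead, this lookup would report 4 bytes for a 6-byte instruction.

```python
# Bytes covered by the instruction(s) each relocation applies to, starting at r_offset.
# R_RISCV_QC_E_BRANCH is a hypothetical name used only for illustration.
INSN_BYTES = {
    "R_RISCV_RVC_BRANCH": 2,   # c.beqz / c.bnez
    "R_RISCV_RVC_JUMP": 2,     # c.j / c.jal
    "R_RISCV_BRANCH": 4,       # b<cond>
    "R_RISCV_JAL": 4,          # jal
    "R_RISCV_CALL_PLT": 8,     # auipc + jalr pair
    "R_RISCV_QC_E_BRANCH": 6,  # hypothetical: qc.e.b<cond>i
}


def next_insn_offset(r_type: str, r_offset: int) -> int:
    """Offset where the following instruction starts, derived from the type alone."""
    return r_offset + INSN_BYTES[r_type]
```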

@jrtc27
Collaborator

jrtc27 commented Nov 1, 2024

Importantly, I don't think I can see how 48-bit instructions work without having a 16-bit NOP. Ditching the rest of C seems fine, but I think you need at least that one instruction from it.

@lenary
Contributor Author

lenary commented Nov 1, 2024

> For instruction alignment, just because your instructions individually can be 16-bit aligned doesn't mean the whole section only needs that. For example, xtvec requires 4-byte alignment on the address even with RVC due to using the low 2 bits as the mode, so any OS's text section will be at least 4-byte aligned (and the trap vector at a 4-byte aligned offset within that). So whilst align(.text) == 2 implies you can use 2-byte instructions, the converse is not true, and thus align(.text) == 4 does not imply that 2-byte instructions aren't in use.

Ah, ok I did miss this nuance, that executable section alignment isn't 1:1 with IALIGN (and I probably should have re-read the unprivileged spec to remind me how the ISA refers to this situation, before posting). I think my overall question still stands, that EF_RISCV_RVC implies two things: IALIGN=16 bits and "instructions from [some part of] the C extension are allowed to be introduced when relaxing". I intended to point out that Xqci contains sub-extensions which want IALIGN=16 bits, but don't necessarily want the changes to relaxations.

> The relocation normally needs to imply the instruction size, yes. Even on X86 where the actual operand may be encoded in a uniform manner despite different instruction prefixes, the number of prefix bytes still gets encoded so you can do relaxation (albeit in a more limited manner there).

I will go and read the x86 psABI to understand how it deals with relaxation and long instructions better, thanks for the tip. I think you're agreeing with my intended direction though, which sounds positive to me.

> Importantly, I don't think I can see how 48-bit instructions work without having a 16-bit NOP. Ditching the rest of C seems fine, but I think you need at least that one instruction from it.

If you have 48-bit instructions (or any odd multiple of 16 bits), those extensions not implying IALIGN=16 bits is a little pointless - you've actually defined a set of 64-bit instructions (respectively, the next even multiple of 16 bits) and wasted 16 bits of the encoding with the same bits that you put in c.nop. Surely the point in adding 48-bit instruction encodings is so you can directly follow them with another instruction of any length, rather than having to pair them with a 16-bit instruction. Note that the unprivileged spec says "IALIGN is 32 bits in the base ISA, but some ISA extensions, including the compressed ISA extension, relax IALIGN to 16 bits" so presumably "some ISA extensions" could also include vendor extensions, not just C and its standard sub-extensions.

@jrtc27
Collaborator

jrtc27 commented Nov 1, 2024

> Importantly, I don't think I can see how 48-bit instructions work without having a 16-bit NOP. Ditching the rest of C seems fine, but I think you need at least that one instruction from it.

> If you have 48-bit instructions (or any odd multiple of 16 bits), those extensions not implying IALIGN=16 bits is a little pointless - you've actually defined a set of 64-bit instructions (respectively, the next even multiple of 16 bits) and wasted 16 bits of the encoding with the same bits that you put in c.nop. Surely the point in adding 48-bit instruction encodings is so you can directly follow them with another instruction of any length, rather than having to pair them with a 16-bit instruction. Note that the unprivileged spec says "IALIGN is 32 bits in the base ISA, but some ISA extensions, including the compressed ISA extension, relax IALIGN to 16 bits" so presumably "some ISA extensions" could also include vendor extensions, not just C and its standard sub-extensions.

I don't mean that they have to be followed by a 16-bit instruction. But for, say, R_RISCV_ALIGN, we currently insert c.nop if the padding is 2 mod 4. This case can't arise with 32-bit-only instructions, but can with 16+32-bit, and can with 32+48-bit. How do you insert 2 bytes of padding without c.nop? (You can of course do it 2n+2 for n > 0 if you have a 48-bit NOP, but n = 0 is a special case)
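The padding problem can be sketched as a small planner that chooses NOP sizes to fill a gap like the one R_RISCV_ALIGN requests. It assumes a 4-byte `nop` (`addi x0, x0, 0`), a 2-byte `c.nop` only when Zca is available, and a 6-byte NOP that is purely hypothetical; the function name and flags are illustrative, and this is not how any particular linker implements R_RISCV_ALIGN handling.

```python
def plan_padding(pad: int, have_c_nop: bool, have_nop48: bool = False) -> list[int]:
    """Return NOP sizes in bytes that exactly fill `pad` bytes, or raise ValueError."""
    if pad < 0 or pad % 2:
        raise ValueError("padding must be a non-negative even number of bytes")
    nops: list[int] = []
    if pad % 4 == 2:
        if have_c_nop:                   # Zca: one c.nop absorbs the 2 mod 4 part
            nops.append(2)
            pad -= 2
        elif have_nop48 and pad >= 6:    # hypothetical 48-bit NOP: 6 = 4 + 2
            nops.append(6)
            pad -= 6
        else:
            raise ValueError(f"cannot fill {pad} bytes without c.nop")
    nops.extend([4] * (pad // 4))        # remainder is a multiple of 4: 32-bit nops
    return nops
```

With only 32-bit and 48-bit NOPs, every even padding of 4 bytes or more can be filled, but 2 bytes cannot, which is the n = 0 special case above.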

@lenary
Contributor Author

lenary commented Nov 4, 2024

Thanks for clarifying, I had momentarily forgotten the "align with nops" requirements. Given the smallest extension that contains c.nop is Zca, it now makes most sense for any extensions with instructions that are an odd multiple of 16 bits long to require Zca as well, which means I don't need to worry about the EF_RISCV_RVC flag.

I also read the x86-64 psABI and now understand a bit more of what's going on there: even though the relaxable relocations are midway through an instruction (at the start of an immediate field, which I think always comes last), the relocation types indicate where the instruction started. It's a lot less clean than just having instruction types, with relevant relocations for each instruction type.
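A simplified sketch of the x86-64 scheme described above: the relocation sits on the 32-bit displacement, and its type tells the linker how many opcode bytes precede it. It assumes the common encodings with no extra prefixes (`ff 15` for `call *foo@GOTPCREL(%rip)`, `48 8b 05` for `mov foo@GOTPCREL(%rip), %rax`); a real linker also inspects the actual bytes before relaxing. For RISC-V instruction relocations, by contrast, `r_offset` already is the instruction start.

```python
# Opcode/prefix bytes that precede the 32-bit displacement at r_offset, for the
# common forms of the two relaxable GOT relocations (assumes no other prefixes).
BYTES_BEFORE_DISP = {
    "R_X86_64_GOTPCRELX": 2,      # e.g. ff 15 <disp32>: call *foo@GOTPCREL(%rip)
    "R_X86_64_REX_GOTPCRELX": 3,  # e.g. 48 8b 05 <disp32>: mov foo@GOTPCREL(%rip), %rax
}


def insn_start(r_type: str, r_offset: int) -> int:
    """Recover the instruction's start offset from the relocation type and offset."""
    return r_offset - BYTES_BEFORE_DISP[r_type]
```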

@kito-cheng
Collaborator

I think the conclusion from the earlier discussion is that for instruction lengths greater than 32 bits, linker relaxation will require at least Zca, and I agree with this point. Other than that, here are some additional thoughts I have on the topic:

For EF_RISCV_RVC:

The definition of this ELF flag has become a bit ambiguous after the introduction of Zc* standards. This ambiguity also extends to the meaning of .option rvc/.option norvc, but since we’re discussing ABI here, we’ll set that aside for now.

The current definition is:

> This bit is set when the binary targets the C ABI, which allows instructions to be aligned to 16-bit boundaries (the base RV32 and RV64 ISAs only allow 32-bit instruction alignment). When linking objects that specify EF_RISCV_RVC, the linker is permitted to use RVC instructions such as C.JAL in the linker relaxation process.

However, after introducing Zc*, we might consider changing "permitted to use RVC instructions" to "permitted to use Zca instructions." But we also have an unresolved issue, #393, so we might want to consider removing the linker part in the latter half and let this flag simply represent IALIGN.
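For reference, EF_RISCV_RVC is a single bit in the ELF header's `e_flags`, alongside the float ABI, RVE and TSO bits, so the flag itself has no room to distinguish "IALIGN=16" from "relaxation may use Zca". A minimal Python sketch that reads and decodes the field; the `e_flags` offsets of 36 (ELF32) and 48 (ELF64) bytes come from the generic ELF header layout, and the function names are illustrative.

```python
import struct

EF_RISCV_RVC = 0x0001
EF_RISCV_FLOAT_ABI = 0x0006
EF_RISCV_RVE = 0x0008
EF_RISCV_TSO = 0x0010
_FLOAT_ABI = {0x0: "soft", 0x2: "single", 0x4: "double", 0x6: "quad"}


def read_e_flags(header: bytes) -> int:
    """Read e_flags from the start of an ELF file (ELF32 or ELF64, either byte order)."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    offset = {1: 36, 2: 48}[header[4]]   # EI_CLASS: 1 = ELFCLASS32, 2 = ELFCLASS64
    fmt = {1: "<I", 2: ">I"}[header[5]]  # EI_DATA: 1 = little-endian, 2 = big-endian
    return struct.unpack_from(fmt, header, offset)[0]


def describe_riscv_flags(flags: int) -> dict:
    """Split RISC-V e_flags into its individual fields."""
    return {
        "rvc": bool(flags & EF_RISCV_RVC),
        "float_abi": _FLOAT_ABI[flags & EF_RISCV_FLOAT_ABI],
        "rve": bool(flags & EF_RISCV_RVE),
        "tso": bool(flags & EF_RISCV_TSO),
    }
```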

For dedicated relocation types for longer instruction length:

I can see the possibility of reusing some relocations for longer instructions in the future—for example, using R_RISCV_32 to handle a 32-bit immediate. However, from a linker relaxation and implementation standpoint, I’d prefer using new relocations instead of reusing existing ones. This could simplify some parts of the linker relaxation process (avoiding instruction scanning) and improve output readability in objdump or readelf. For example, if we had an instruction that could take a 32-bit immediate, with the first 16 bits potentially being an opcode, then the relocation would show up in the middle of the instruction.

For #393:

I still haven’t seen a better solution for this issue…maybe we should push forward on this with Nelson's help.

@lenary
Contributor Author

lenary commented Nov 6, 2024

> For dedicated relocation types for longer instruction length
>
> […] for example, using R_RISCV_32 to handle a 32-bit immediate […]

I did think about this, as some of the 48-bit instructions in Xqci have 32-bit contiguous immediate fields - the reason I discounted it is because of big-endian. I don't think there are big endian implementations yet, but I also don't think we want to use data relocations (which have to be endianness-aware) on instructions (which are always little endian) or vice-versa. I don't think altering the interpretation of a relocation depending on whether a section is executable or based on the marker symbols (for two examples) is a viable route forwards.
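The endianness concern fits in a few lines: a data relocation such as R_RISCV_32 must write its value in the object's data byte order, while RISC-V instruction parcels are always little-endian, so the same 32-bit value produces different bytes on a big-endian target. A sketch with hypothetical helper names, assuming a contiguous, byte-aligned 32-bit immediate field:

```python
import struct


def write_data32(buf: bytearray, off: int, value: int, big_endian: bool) -> None:
    """Data relocation (R_RISCV_32-style): honour the object's data byte order."""
    struct.pack_into(">I" if big_endian else "<I", buf, off, value & 0xFFFFFFFF)


def write_insn_imm32(buf: bytearray, off: int, value: int) -> None:
    """Instruction immediate: RISC-V instruction parcels are always little-endian."""
    struct.pack_into("<I", buf, off, value & 0xFFFFFFFF)


data, text = bytearray(4), bytearray(4)
write_data32(data, 0, 0x12345678, big_endian=True)
write_insn_imm32(text, 0, 0x12345678)
```

On a big-endian object the two writers disagree, so a single relocation type could not serve both without the linker knowing which kind of location it is patching.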

EF_RISCV_RVC: Thanks for pointing out #393 - I will think about this issue a bit more. As you say, we've slightly struggled since C was split into sub-parts. I will comment on that proposal.

@kito-cheng
Collaborator

> I did think about this, as some of the 48-bit instructions in Xqci have 32-bit contiguous immediate fields - the reason I discounted it is because of big-endian. I don't think there are big endian implementations yet, but I also don't think we want to use data relocations (which have to be endianness-aware) on instructions (which are always little endian) or vice-versa. I don't think altering the interpretation of a relocation depending on whether a section is executable or based on the marker symbols (for two examples) is a viable route forwards.

Good point about endianness... I wasn't aware of that, but it's definitely a potential issue. BTW, we do have some non-standard big-endian software support, such as Spike and the GNU toolchain.
