Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support branch instructions that define their blockparams #186

Open
bjorn3 opened this issue Aug 30, 2024 · 10 comments
Open

Support branch instructions that define their blockparams #186

bjorn3 opened this issue Aug 30, 2024 · 10 comments

Comments

@bjorn3
Copy link
Contributor

bjorn3 commented Aug 30, 2024

This would allow an invoke instruction to define a virtual register representing the return value and then pass this as as blockparam to the regular return block. And similarly define a virtual register representing the landingpad argument and pass it as blockparam to the block for the landingpad.

@cfallin
Copy link
Member

cfallin commented Aug 30, 2024

@bjorn3 and I were discussing this just now; to add a little more background:

  • invoke encapsulates both the call itself, and the (virtual) branch that is implied by the potential unwinding control flow. Separating them is problematic because the regalloc is allowed to insert arbitrary moves between instructions and we can't have any between the call and the "branch": if an unwind happens, it will not execute these moves.
  • modeling the regular (non-exceptional) return values as blockparams for the non-exceptional continuation makes sense because semantically the results only exist/are defined in this case: on unwind, the function hasn't actually returned values.

A thought does occur to me though: if we relax the second point above, this may become a bit simpler. Working backward from what we want to happen in the "backend" of regalloc: we want to allocate new register(s) for the call results; we want their liverange to start with the underlying call instruction. This is a little different from blockparams, which logically start at the successor block, so we have a little "stub" of a liverange just over the invoke/branch instruction anyway. What if we take that further, relax the results-as-blockparams, and do something like:

v1 = invoke ..., regular block1(...), exceptional block2(...)

block1(...):  ; regular-return (fallthrough) case
  (use v1)

block2(...):  ; exceptional case, reached implicitly via unwind
  (don't use v1, it won't have been written)

basically, the idea is to construct the RA2 problem as a conservative superset of the required allocations: the invoke results are allocated at the call, and in-scope in all blocks they dominate as normal, including the normal return path and exceptional return path; the fact that the call will not have written the result on unwind is a fact at a higher semantic level that we respect (by not reading the register on that path) in the CLIF.

The remaining question is then: can branches def values? I can't think of a specific case or check that would break this, but there may be some subtle interaction. In particular I believe that edge-moves are not a problem as long as invoke has two targets: this will force edge-moves to be placed in the successor blocks' heads rather than just before the branch. (The edge-moves come logically after the def so it would be a problem otherwise.)

Others have any thoughts?

@bjorn3
Copy link
Contributor Author

bjorn3 commented Aug 30, 2024

In the unwind case there are also return values. These may be of different types than for the regular-return case, but generally do occupy the same registers. These need to be supported as they are often used to pass around the exception object itself.

@Amanieu
Copy link
Contributor

Amanieu commented Aug 30, 2024

This actually already works in regalloc2 without any changes needed. First, we must differentiate between 2 types of branches:

  • branch terminators can have multiple targets but cannot have blockparams.
  • jump terminators can only have 1 target and cannot have operands, but can have blockparams.

Notably, it's possible for branch terminators to have def operands. Any moves associated with these would be placed in the corresponding successor blocks.

You can build an invoke as a branch terminator which also defs 2 output values in fixed registers: the normal return value and the exception pointer. The normal return value is only used in the normal successor block while the exception pointer is only used in the landing pad.

The last commit in #170 extends the fuzzer to test this case.

@cfallin
Copy link
Member

cfallin commented Aug 30, 2024

Neat, that's I think essentially what I describe above except two defs (exception ptr as well). Hopefully a fairly straightforward change on the producer side!

@bjorn3
Copy link
Contributor Author

bjorn3 commented Aug 31, 2024

You can build an invoke as a branch terminator which also defs 2 output values in fixed registers: the normal return value and the exception pointer.

Both the normal and exception return cases can have a variable amount of return values. For exception returns it is up to two registers. I guess one way to handle this would be to look at the signature of the called function to know how many normal return values there are and treat the rest as exception return values.

You can build an invoke as a branch terminator

How can I lower the block calls on the clif ir side to something that avoids blockparams on the regalloc side?

@cfallin
Copy link
Member

cfallin commented Aug 31, 2024

I think a reasonable design might be:

  • An invoke of a function with N return types (T1, T2, ...) and M exceptional returns (Exc1, Exc2, ...) always has a result tuple of N+M values: (T1, T2, ..., Exc1, Exc2, ...). The values of the former are undefined on exception, and the latter are undefined on normal return. If we wanted to keep out undefined behavior in the CLIF semantics, we could add a verifier rule that the values are only used in blocks dominated by the appropriate edge.

    (if any pass that examines the invoke needs to know which are which, it can look at the signature of the function to know N, then get M from the number of total results)

  • The block-calls in CLIF are still lowered to blockparams provided by this "branch" as seen by regalloc; the only thing is that the function returns and exceptional values are not additional blockparams.

@cfallin
Copy link
Member

cfallin commented Aug 31, 2024

So to add an example, if we have

  fn0 = %f(i32, i32) -> (i64, i64)
block0(v1: i32, v2: i32):
  v3, v4, v5, v6 = invoke %f(v1, v2), block1(v1), block2(v2)  ;; v3, v4 are normal returns; v5, v6 are exceptional returns

block1(v7: 32):  ;; v7 is a normal blockparam
  ;; use v3, v4

block2(v8: i32):  ;; v8 is a normal blockparam
  ;; use v5, v6

then the regalloc metadata for the invoke is:

  • v1, v2 are early-uses (constrained per ABI)
  • v3, v4, v5, v6 are late-defs (constrained per ABI)
  • is_branch = true, two successors, blockparams of [[v1], [v2]]

@fitzgen
Copy link
Member

fitzgen commented Aug 31, 2024

Cross posting bytecodealliance/rfcs#36 here because I think both discussions should be aware of each other.

Random notes on my preferences for the clif design side of things, since it is relevant to the discussion here about how we would lower/translate this to regalloc2 input:

  • I know this is super nitpicky, but I pretty strongly prefer try_call to invoke as far as naming the clif instruction goes, since the latter really doesn't imply anything to do with fallibility other than it happens to be the same name that LLVM chose.

  • I think we want to annotate landing pad blocks in clif so that the verifier can easily check that normal control doesn't flow to them. This is the idea behind the catch prefix (similar to the cold prefix) for blocks that is described in the linked RFC. Not sure if this makes sense for regalloc input as well or not; not clear to me that regalloc would want or need to treat such blocks differently or not.

  • @cfallin

    An invoke of a function with N return types (T1, T2, ...) and M exceptional returns (Exc1, Exc2, ...) always has a result tuple of N+M values: (T1, T2, ..., Exc1, Exc2, ...).

    To be clear, this would only be done in the regalloc2 side of things, right? That is, the clif would still terminate a block with a try_call and then the successor blocks would have block params? I feel pretty strongly that we want to have that shape of clif, and not rely on the verifier to check that the undefined values are not used in successor blocks that don't match the exceptionality of the return. I don't have strong feelings about the shape of the regalloc input tho.

  • @bjorn3

    Both the normal and exception return cases can have a variable amount of return values. For exception returns it is up to two registers.

    Is there any reason we can't start by simply fixing the amount of exceptional return values to exactly one or exactly two (whatever is sufficient for both wasm exceptions and clif's panic unwinding)? I feel like variable exceptional returns is a complication that we should be able to ignore for now. Of course, if this doesn't actually make anything simpler on the regalloc side of things, no need to artificially build that constraint in.

@bjorn3
Copy link
Contributor Author

bjorn3 commented Aug 31, 2024

I think we want to annotate landing pad blocks in clif so that the verifier can easily check that normal control doesn't flow to them.

There is no fundamental reason why this is necessary. In fact for wasm lowering it may be beneficial if this restriction doesn't exist sich that throwing an exception can be done by directly jumping to the cleanup block while still sharing this block with the unwind target for calls. With this restriction in place wasm lowering would need to add an extra block whose sole purpose is to be marked as landingpad and jump to the real cleanup block.

Edit: It may be necessary for another block to be inserted in between when lowering to vcode for any register moves, but that has to be done anyway when two calls share a landingpad.

@cfallin
Copy link
Member

cfallin commented Sep 3, 2024

  • @cfallin

    An invoke of a function with N return types (T1, T2, ...) and M exceptional returns (Exc1, Exc2, ...) always has a result tuple of N+M values: (T1, T2, ..., Exc1, Exc2, ...).

    To be clear, this would only be done in the regalloc2 side of things, right? That is, the clif would still terminate a block with a try_call and then the successor blocks would have block params? I feel pretty strongly that we want to have that shape of clif, and not rely on the verifier to check that the undefined values are not used in successor blocks that don't match the exceptionality of the return. I don't have strong feelings about the shape of the regalloc input tho.

The context of this discussion is how to present the instruction to RA2, so yes, my thinking is mostly from that side. That said, I think this issue is a little more subtle than it may seem; will write up some comments over in the proposal (I hadn't had a chance to read it when posting the above as it had just appeared!).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants