Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disallow blockparams on branches with multiple successors #170

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

Amanieu
Copy link
Contributor

@Amanieu Amanieu commented Nov 24, 2023

Since the input CFG is required to not have critical edges, such blockparams are useless. The incoming blockparams can simply be replaced with the vreg from the (unique) predecessor.

It's better to let the client's lowering code handle this so regalloc2 doesn't need to.

Since the input CFG is required to not have critical edges, such
blockparams are useless. The incoming blockparams can simply be replaced
with the vreg from the (unique) predecessor.

It's better to let the client's lowering code handle this so regalloc2
doesn't need to.
@elliottt
Copy link
Member

elliottt commented Dec 5, 2023

I think that this is a reasonable change to make, but cranelift is still generating code that would violate this new invariant. I believe that it would be pretty straightforward to fix that, but we probably won't be able to get to that until next year. Here's the issue that I added describing where the fix would need to be made: bytecodealliance/wasmtime#7639

@cfallin
Copy link
Member

cfallin commented Dec 5, 2023

(Popping in to give my thoughts on this in a timely manner, though I'm on parental leave at the moment)

IMHO, there is some philosophical question here of where to place complexity: in the embedder (user of regalloc2) or in regalloc2 itself. Right now, regalloc2 accepts a well-defined IR: SSA, with blockparams, used in any way that SSA allows. Introducing further restrictions on that could make sense if we were developing the allocator from scratch, or if it otherwise were a substantial improvement in complexity or speed, but re: complexity we see a "+90 -92" diff, and re: speed there is no improvement cited here. (Perhaps some improvement does occur and I'd be curious if you've measured any!) Finally, existing consumers (notably Cranelift) generate code without keeping this extra restriction in mind, and it creates additional work (and potential for bugs) to adapt to the new requirement.

Such blockparams are indeed "useless" if one simplifies completely, but generating the code now requires additional care, and there is value in having components or stages of a compiler that accept a more general input without required canonicalization, especially when offering a library.

So: since the allocator already exists and supports this more general input, why remove that capability? (More succinctly: is there a motivation for this PR, beyond the "useless" value judgment above?)

@elliottt
Copy link
Member

elliottt commented Dec 5, 2023

My argument in favor would be that any simplifications we can make to regalloc2 lessen the maintenance burden, and adapting cranelift as it stands is not a huge effort. Certainly we can leave things as they are, but given how few users of regalloc2 there are, I think that simplification is a good goal.

@cfallin
Copy link
Member

cfallin commented Dec 6, 2023

Right, agreed with that; I guess my question is really whether this is a simplification, in a global sense; it doesn't seem to have removed any logic in RA2 here, and I don't think it simplifies the allocation problem in any appreciable way. (Please do correct me if I'm missing some aspect here though!) Additional constraints/requirements to keep in mind on the user side are also complexity, so from my perspective at the moment, this change pushes us into higher complexity overall.

(Disappearing back into my cave now, but happy to discuss more in February when I'm back; just wanted to inject this perspective in case this were to move forward otherwise)

@cfallin
Copy link
Member

cfallin commented Dec 6, 2023

To amend the above a bit: please don't let me block this if it is indeed a simplification to the allocation problem, for reasons I'm missing; I just wasn't sure what the motivation was, and wanted to ask. (The only changes to ion/ are in liveranges.rs where we change iteration over blockparams as appropriate; nothing in the solver core changes.) If there were additional simplifications that could follow on, or new algorithms that this enables, or performance optimizations we can now make, or something like that, that would be very good additional context to have!

@cfallin
Copy link
Member

cfallin commented Dec 6, 2023

Chatted a bit offline with @elliottt about this and we surmised that there may be indirect benefits from this: the changes to CL necessary to fit these constraints plausibly could improve allocation time and/or runtime. In any case, @Amanieu I'm still curious to hear more of your motivation or backstory for this -- were you working toward other simplifications or seeing some direct benefits or ...?

@Amanieu
Copy link
Contributor Author

Amanieu commented Dec 6, 2023

This isn't intended to improve performance, it is mainly about simplifying the logic a bit. The motivating code for this is actually on a branch that I'm working on, specifically this commit.

It isn't needed when there is only one successor.
@bjorn3
Copy link
Contributor

bjorn3 commented Dec 6, 2023

My branch for adding unwinding support to cranelift makes use of blockparams for terminators with multiple successors. The invoke terminator has multiple successors, each of which has block params. For the regular return case it uses the blockparams to pass the call return value and for the unwinding case to pass the exception data.

@elliottt
Copy link
Member

elliottt commented Dec 6, 2023

My branch for adding unwinding support to cranelift makes use of blockparams for terminators with multiple successors. The invoke terminator has multiple successors, each of which has block params. For the regular return case it uses the blockparams to pass the call return value and for the unwinding case to pass the exception data.

Does your branch require blockparams on branches after lowering?

@bjorn3
Copy link
Contributor

bjorn3 commented Dec 6, 2023

Currently it doesn't do that, but I think it will be necessary for regalloc to not insert moves between the call and the end of the block. The unwind path would skip those moves. I haven't seen any miscompilation from this yet though.

@Amanieu
Copy link
Contributor Author

Amanieu commented Dec 6, 2023

At the regalloc2 level you should be able to model the invoke as a block terminator with 2 successor blocks. AFAIK branch instructions are allowed to produce output values as long as blockparams are not used.

This extends the fuzzer and checker to verify that branches that produce
outputs are properly supported and no moves are inserted after a branch.

This also checks the one edge case where a branch instruction may have
both operands and blockparams: when there is only one successor and that
successor only has one predecessor.
@Amanieu
Copy link
Contributor Author

Amanieu commented Dec 12, 2023

I added a check in the fuzzer (and fixed the checker to handle it) that branch instructions that produce outputs work properly. This would allow you to model invoke as a branch instruction that produces outputs in registers. This allows subsequent blocks (normal return & unwind) to use these values.

jameysharp added a commit to jameysharp/wasmtime that referenced this pull request Aug 26, 2024
This lets us stop allocating temporary VRegs for critical edges that
have block parameters. That makes the register allocation problem a
little smaller, and also allows reusing lower_branch_blockparam_args for
all block parameters.

Fixes bytecodealliance#7639, and unblocks bytecodealliance/regalloc2#170
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants