Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT: Use linear block order for MinOpts in LSRA #108147

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

amanasifkhalid
Copy link
Member

Follow-up from #108086 (comment). Since LSRA shouldn't create cross-block live registers in MinOpts, LSRA's block ordering shouldn't matter, so just use the lexical order we have on-hand.

cc @dotnet/jit-contrib

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Sep 23, 2024
Copy link
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@AndyAyersMS
Copy link
Member

This is quite odd... probably worth figuring out what is going on in libraries pmi (likely very few minops methods, but still...)

image

@amanasifkhalid
Copy link
Member Author

This is quite odd... probably worth figuring out what is going on in libraries pmi (likely very few minops methods, but still...)

I looked at the JitDisasm summary, and there are only a handful of MinOpts methods in libraries.pmi. The standout method is Microsoft.FSharp.Linq.RuntimeHelpers.LeafExpressionConverter:ConvExprToLinqInContext, which has 2459 basic blocks. I'm not able to run pin.exe on my AMD-based DevBox to take a closer look at the TP cost for this method, but perhaps there's some locality issue with using lexical order for super-large methods? For example, if we need to grab the predecessor of a block, the pred is less likely to be cached if we didn't use an RPO traversal, but this only becomes a problem if we have lots of basic blocks.

@amanasifkhalid
Copy link
Member Author

Per discussion here, I'm going to try using the optimized layout all the time to see if we can get around some of the pitfalls of using a RPO sequence on its own.

@amanasifkhalid
Copy link
Member Author

This is quite odd... probably worth figuring out what is going on in libraries pmi (likely very few minopts methods, but still...)

Sorry for getting back to you so late on this, but I finally got pin working thanks to Jakob. The bulk of the TP regression for the above method is coming from LinearScan::processKills. I did some rudimentary profiling, and both traversals call processKills the same number of times, but when using the lexical order, processKills, ends up doing a lot more work for this method: The inner loop runs for about 3.5x more iterations in total. I wonder if processing register kills in some order that doesn't match control flow resulted in some duplicated work -- perhaps the logic for killing registers can be simplified quite a bit for MinOpts, since we don't have cross-block live registers? @kunalspathak would you expect the register killing behavior to change in MinOpts when the block order changes?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants