Fixes for partial compilation of methods #80635

Merged (13 commits) on Mar 6, 2023

Member

@AndyAyersMS AndyAyersMS commented Jan 13, 2023

[edited: was enable PC by default -- we will pursue that in a subsequent PR]

Fix some issues seen with partial compilation in wider testing:

  • PC jit helper must not block GC
  • proper flow cleanup when we put a PC patchpoint at a switch or degenerate cond

Enable partial compilation of Tier0 methods, using the simple strategy
that any stack-empty non-entry non-handler rarely run block is worth
deferring.

Since Tier0 currently does not access static profile data, this means
we mainly defer compilation of blocks with explicit throws.

This leverages patchpoints to trigger compilation of the missing parts
of methods in a manner similar to OSR. It is enabled for x64 and arm64.
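
As a rough illustration of the deferral strategy described above, the check can be thought of as a simple predicate over per-block facts. This is only a sketch: `BlockInfo`, its fields, and `worthDeferring` are hypothetical names invented for this example and are not the JIT's actual data structures.

```cpp
#include <cassert>

// Hypothetical, simplified view of a basic block. The real JIT's block
// representation carries far more state; these fields are assumptions
// made only for this illustration.
struct BlockInfo
{
    bool isEntry;         // first block of the method
    bool inHandler;       // lies inside a catch/filter/finally region
    bool rarelyRun;       // marked cold, e.g. because it ends in a throw
    int  entryStackDepth; // IL evaluation stack depth on entry to the block
};

// Returns true if the block is a candidate for deferred compilation:
// stack-empty, non-entry, non-handler, and rarely run.
inline bool worthDeferring(const BlockInfo& b)
{
    return (b.entryStackDepth == 0) && !b.isEntry && !b.inHandler && b.rarelyRun;
}
```

With no static profile data available at Tier0, `rarelyRun` would in practice be set mostly for blocks that end in an explicit throw.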
@ghost ghost assigned AndyAyersMS Jan 13, 2023
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jan 13, 2023
@ghost

ghost commented Jan 13, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak
See info in area-owners.md if you want to be subscribed.

Issue Details

Enable partial compilation of Tier0 methods, using the simple strategy that any stack-empty non-entry non-handler rarely run block is worth deferring.

Since Tier0 currently does not access static profile data, this means we mainly defer compilation of blocks with explicit throws.

This leverages patchpoints to trigger compilation of the missing parts of methods in a manner similar to OSR. It is enabled for x64 and arm64.

Author: AndyAyersMS
Assignees: AndyAyersMS
Labels:

area-CodeGen-coreclr

Milestone: -

@AndyAyersMS
Member Author

@EgorBo PTAL
cc @dotnet/jit-contrib

Possibly should mark this as "draft" as I'm not sure it is worth enabling just yet. At any rate this is a trivial change since the PC mechanism has been in the code (and run in various experimental CI legs) for quite a while now.

TP impact (per SPMI) will be both interesting and misleading. We hopefully should see a bit of movement in the ASP.NET startup data.

@EgorBo
Member

EgorBo commented Jan 13, 2023

@AndyAyersMS perhaps worth running some stress tests?

@AndyAyersMS
Member Author

AndyAyersMS commented Jan 13, 2023

> @AndyAyersMS perhaps worth running some stress tests?

Sure. Let's see if it makes it through normal CI first (guessing it will) -- looks like there may be a failure to investigate.

@AndyAyersMS
Member Author

Not sure if the failures are related or not -- locally I can't repro.

Diffs. TP impact is pretty small per SPMI, will be interesting to see on actual apps.

Member

@EgorBo EgorBo left a comment

Wow, this is quite the diffs 🙂
Let's also see if TE benchmarks will be affected, e.g. FullPGO ones where we disable R2R

@AndyAyersMS
Member Author

Still a bit puzzled by the failures. I can't repro them so far locally and can't get anything useful out of the core dumps.

@AndyAyersMS
Member Author

Looks like all the recent pred list stuff is causing PC to fail -- will work up a fix. Also there is one other failure; can't tell for sure if it is the same issue that I mentioned above since CI has purged the logs, but I think it probably is.

AndyAyersMS added a commit to AndyAyersMS/runtime that referenced this pull request Feb 3, 2023
We may have degenerate flow out of a partial compilation block, so make
sure to fully remove the block from all successor pred lists.

Fixes issue seen in dotnet#80635.
AndyAyersMS added a commit that referenced this pull request Feb 6, 2023
We may have degenerate flow out of a partial compilation block, so make
sure to fully remove the block from all successor pred lists.

Fixes issue seen in #80635.
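
To make the degenerate-flow case concrete, here is a minimal sketch of detaching a block from its successors when the same successor can appear more than once (for example, a conditional branch whose two targets are the same block). `FlowGraph` and its members are hypothetical and much simpler than the JIT's real pred lists; the point is only that every duplicate entry has to go.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical, simplified flow graph: blocks are identified by index and
// each block keeps one pred-list entry per incoming edge. This is an
// illustration only, not the JIT's representation.
struct FlowGraph
{
    std::vector<std::vector<int>> succs; // succs[b]: targets of b, may repeat
    std::vector<std::vector<int>> preds; // preds[b]: sources of edges into b

    explicit FlowGraph(int blockCount) : succs(blockCount), preds(blockCount) {}

    void addEdge(int from, int to)
    {
        succs[from].push_back(to);
        preds[to].push_back(from);
    }

    // Detach all outgoing flow from `block`, e.g. because the block now ends
    // in a call to a helper that never returns to it.
    void removeAllSuccessors(int block)
    {
        for (int s : succs[block])
        {
            std::vector<int>& p = preds[s];
            // Erase every occurrence so duplicate edges leave nothing behind.
            p.erase(std::remove(p.begin(), p.end(), block), p.end());
        }
        succs[block].clear();
    }
};
```

Removing only the first matching pred entry would leave a stale edge behind in the degenerate case, which is the kind of inconsistency the commit message describes.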
@AndyAyersMS
Member Author

Merged to pick up fix from #81605.

@AndyAyersMS
Member Author

@jkotas I was seeing sporadic failures in OSX/Linux libraries tests. Debugging locally, I could repro a GC suspension timeout -- and while I didn't catch the PartialCompilation helper on the stack, it seems likely that the looping/waiting we were doing in there was the culprit.

Can you look over how I modified this helper to enable preemptive GC when you get a chance?

@@ -5219,15 +5219,11 @@ void JIT_Patchpoint(int* counter, int ilOffset)
 // Unlike regular patchpoints, partial compilation patchpoints
 // must always transitio.
 //
-void JIT_PartialCompilationPatchpoint(int ilOffset)
+HCIMPL1(VOID, JIT_PartialCompilationPatchpoint, int ilOffset)
Member

Does JIT_Patchpoint need the same treatment?

Member Author

It doesn't loop, so (at least so far) I've never seen it cause problems.

If it does end up jitting a method, that thread switches to preemptive, and any other threads that hit the patchpoint don't need to wait, they just keep running the Tier0 code.
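
A minimal sketch of the "no waiting" behavior described above, assuming a per-patchpoint flag that lets exactly one thread claim the compilation while every other thread simply carries on. `PatchpointState`, `Action`, and `onPatchpointHit` are hypothetical names for this illustration; the runtime's real bookkeeping (counters, flags, and so on) is more involved.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical per-patchpoint state; not the runtime's actual structure.
struct PatchpointState
{
    std::atomic<bool>  compileClaimed{false}; // has some thread started jitting?
    std::atomic<void*> osrCode{nullptr};      // published once jitting finishes
};

enum class Action
{
    ContinueTier0,   // keep running the existing Tier0 code
    CompileOsr,      // this thread won the race and should jit
    TransitionToOsr  // OSR code is ready, so jump to it
};

inline Action onPatchpointHit(PatchpointState& pp)
{
    if (pp.osrCode.load(std::memory_order_acquire) != nullptr)
    {
        return Action::TransitionToOsr;
    }

    bool expected = false;
    if (pp.compileClaimed.compare_exchange_strong(expected, true))
    {
        // Only one thread ever sees this result for a given patchpoint.
        return Action::CompileOsr;
    }

    // Another thread is already jitting: do not loop or wait.
    return Action::ContinueTier0;
}
```

Because losing threads return immediately, none of them sit in a wait loop while GC suspension is pending. A partial compilation patchpoint has no Tier0 code to fall back to, so it cannot take the `ContinueTier0` path.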

Member

Does the HELPER_METHOD_FRAME_BEGIN_0 setup need to be moved in front of EECodeInfo codeInfo(ip); for the same reason you had to move here?

// (but consider: throw path in method with try/catch, OSR method will contain more than just the throw?)
//
LOG((LF_TIEREDCOMPILATION, LL_INFO10, "Jit_PartialCompilationPatchpoint: patchpoint [%d] (0x%p) TRIGGER\n", ppId, ip));
PCODE newMethodCode = JitPatchpointWorker(pMD, codeInfo, ilOffset);
Member

It would be nice to also add STANDARD_VM_CONTRACT; annotation to JitPatchpointWorker, and make callers switch to preemptive mode before calling it.

Member Author

Added.
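
For readers less familiar with the mode-switching idiom being requested, here is a toy model of "switch to preemptive mode before calling the worker" using a scope guard. All names (`ThreadState`, `PreemptiveScope`, `workerRequiringPreemptive`, `callWorker`) are hypothetical; the runtime has its own holders and contract macros for this, and this sketch only mirrors the general shape.

```cpp
#include <cassert>

enum class GcMode { Cooperative, Preemptive };

// Hypothetical per-thread state, standing in for the runtime's Thread object.
struct ThreadState
{
    GcMode mode = GcMode::Cooperative;
};

// Scope guard: switches the thread to preemptive mode for the lifetime of the
// guard and restores the previous mode on exit, even on early return.
class PreemptiveScope
{
public:
    explicit PreemptiveScope(ThreadState& thread)
        : m_thread(thread), m_saved(thread.mode)
    {
        m_thread.mode = GcMode::Preemptive;
    }

    ~PreemptiveScope() { m_thread.mode = m_saved; }

    PreemptiveScope(const PreemptiveScope&) = delete;
    PreemptiveScope& operator=(const PreemptiveScope&) = delete;

private:
    ThreadState& m_thread;
    GcMode       m_saved;
};

// Stand-in for a worker whose contract requires preemptive mode. It reports
// whether that contract held when it was called.
inline bool workerRequiringPreemptive(const ThreadState& thread)
{
    return thread.mode == GcMode::Preemptive;
}

// Caller-side pattern: establish the mode first, then call the worker.
inline bool callWorker(ThreadState& thread)
{
    PreemptiveScope scope(thread);
    return workerRequiringPreemptive(thread);
}
```

Putting the switch in the callers keeps the worker's contract simple: it can assume preemptive mode unconditionally rather than handling both modes.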

@AndyAyersMS
Member Author

Arm failure is perhaps another instance of #82252 (which is closed as fixed).

@AndyAyersMS
Member Author

With the latest commit, PC may be enabling GC at places we wouldn't have GC'd before, so it could be exposing some latent reporting issue.

@AndyAyersMS
Member Author

Hmm, seems like more widespread problems with contract violations.

::SetLastError(dwLastError);
ENDFORBIDGC();
Member

ENDFORBIDGC may need to be before SetLastError.

Member Author

Thanks -- will reorder these.
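
The ordering concern can be illustrated with a portable analogue. This sketch uses `errno` as a stand-in for the Win32 last-error value and assumes, hypothetically, that the cleanup step may overwrite it as a side effect; the function names are invented for the example.

```cpp
#include <cassert>
#include <cerrno>

// Stand-in for cleanup work (for example, ending a debug-only "forbid GC"
// region) that might overwrite the thread's error code as a side effect.
// That it does so here is an assumption made for illustration.
inline void cleanupThatMayClobber()
{
    errno = EINVAL;
}

// A helper that must leave the caller's error code untouched.
inline void helperPreservingError()
{
    const int saved = errno; // capture on entry

    errno = ENOENT;          // simulate work that changes the error code

    cleanupThatMayClobber(); // do all cleanup first...
    errno = saved;           // ...and restore the error code last
}
```

If the restore ran before the cleanup, the caller would observe whatever value the cleanup left behind, so the restore belongs at the very end.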

@AndyAyersMS
Member Author

Latest Diffs.

SPMI sees about a 1.5% TP improvement for minopts, and (I'm guessing) a similar decrease in code size.

@AndyAyersMS
Member Author

/azp run runtime-coreclr outerloop, runtime-coreclr jitstress, runtime-coreclr pgo, runtime-coreclr libraries-pgo

@azure-pipelines

Azure Pipelines successfully started running 4 pipeline(s).

@AndyAyersMS
Member Author

AndyAyersMS commented Feb 25, 2023

Basic CI tests are all good; extended tests have some issues:

  • libraries pgo failure looks like it might be unrelated?
  • not sure about the outerloop failure -- error is on an arm test, but PC is only enabled for x64/arm64.
  • A number of jitstress failures to sort through that seem clearly related. Will tackle those first.

@AndyAyersMS
Member Author

Jitstress issue was random PGO stress putting PC patchpoints in blocks with switches; we weren't using the more careful successor iterator and so were seeing duplicate successors.
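
As a sketch of what a "more careful" successor walk means for a switch, the helper below visits each distinct target once even when several cases branch to the same block. `uniqueSuccessors` is a hypothetical stand-in written for this example, not the JIT's iterator.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Given the raw target list of a switch (one entry per case, so targets may
// repeat), return each distinct successor once, in first-seen order.
// Hypothetical helper for illustration only.
inline std::vector<int> uniqueSuccessors(const std::vector<int>& targets)
{
    std::vector<int> result;
    for (int t : targets)
    {
        if (std::find(result.begin(), result.end(), t) == result.end())
        {
            result.push_back(t);
        }
    }
    return result;
}
```

Walking the raw target list instead would process a shared target once per case, which matches the duplicate-successor symptom seen under jitstress.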

@AndyAyersMS
Member Author

/azp run runtime-coreclr outerloop, runtime-coreclr jitstress, runtime-coreclr pgo, runtime-coreclr libraries-pgo

@azure-pipelines

Azure Pipelines successfully started running 4 pipeline(s).

@AndyAyersMS
Member Author

AndyAyersMS commented Feb 26, 2023

Libraries PGO failure:

Assert failure(PID 28 [0x0000001c], Thread: 37 [0x0025]): Assertion failed '(igInLoop->igLoopBackEdge == nullptr) || (igInLoop->igLoopBackEdge == igLoopHeader)' in 'System.Enum:TryParseByValueOrName[int](System.RuntimeType,System.ReadOnlySpan`1[ushort],bool,bool,byref):bool' during 'Generate code' (IL size 724; hash 0xef12aa33; Tier1)

    File: /__w/1/s/src/coreclr/jit/emit.cpp Line: 5713
    Image: /root/helix/work/correlation/dotnet

Looks like emitter accounting is off -- since this is Tier1 (and not OSR) seems like it is probably an existing issue. Looks like it popped up in runs against main done this weekend. @EgorBo let me know if you're going to track this one with a new test monitor issue.

CG2 failure is

Fatal error. Internal CLR error. (0x80131506)
at System.GC.AllocateNewArray(IntPtr, Int32, GC_ALLOC_FLAGS)
at System.Text.StringBuilder.ExpandByABlock(Int32)
at System.Text.StringBuilder.Append(Char, Int32)
at System.Text.StringBuilder.Append(Char)
at System.Diagnostics.StackTrace.ToString(TraceFormat, System.Text.StringBuilder)
at System.Diagnostics.StackTrace.ToString(TraceFormat)
at System.Exception.get_StackTrace()
at System.Exception.ToString()
at Internal.JitInterface.CorInfoImpl.AllocException(System.Exception)

I am seeing this same failure in recent outerloop runs on main too.

@EgorBo
Member

EgorBo commented Feb 27, 2023

> @EgorBo let me know if you're going to track this one with a new test monitor issue.

Sure, let me see if it pops up in different PRs, but presumably indeed worth filing an issue since it should not be connected with your PR directly

@AndyAyersMS
Member Author

> > @EgorBo let me know if you're going to track this one with a new test monitor issue.
>
> Sure, let me see if it pops up in different PRs, but presumably indeed worth filing an issue since it should not be connected with your PR directly

#82729

@BruceForstall
Member

Is it worthwhile to separately PR/merge the various fixes and the TC_PartialCompilation=1 enabling change?

@AndyAyersMS
Member Author

> Is it worthwhile to separately PR/merge the various fixes and the TC_PartialCompilation=1 enabling change?

Yeah, makes sense. I will hijack this PR to push the fixes.

@AndyAyersMS AndyAyersMS changed the title Enable partial compilation of methods Fixes for partial compilation of methods Mar 4, 2023
@AndyAyersMS AndyAyersMS merged commit c16c7ae into dotnet:main Mar 6, 2023
@ghost ghost locked as resolved and limited conversation to collaborators Apr 5, 2023
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
4 participants