Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move remaining HIR SIMDIntrinsics to SimdAsHWIntrinsic #79720

Merged
merged 22 commits into from
Dec 28, 2022

Conversation

tannergooding
Copy link
Member

This moves the remaining legacy SIMD intrinsics (GT_SIMD) that exist in HIR to be implemented using SimdAsHWIntrinsic instead.

As part of this, it means we are able to delete impSIMDIntrinsic.

There are still two legacy SIMD intrinsics which exist in LIR. In particular these are SIMDIntrinsicUpperSave and SIMDIntrinsicUpperRestore. After this PR goes in, I plan on putting up one last PR to move these to be NamedIntrinsic (they cannot be SimdAsHWIntrinsic because they exist even when FEATURE_HW_INTRINSIC is not defined in order to support the ABI). This will allow us to delete GT_SIMD and any remaining support that was specific to the legacy SIMD logic.

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Dec 15, 2022
@ghost ghost assigned tannergooding Dec 15, 2022
@ghost
Copy link

ghost commented Dec 15, 2022

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak
See info in area-owners.md if you want to be subscribed.

Issue Details

This moves the remaining legacy SIMD intrinsics (GT_SIMD) that exist in HIR to be implemented using SimdAsHWIntrinsic instead.

As part of this, it means we are able to delete impSIMDIntrinsic.

There are still two legacy SIMD intrinsics which exist in LIR. In particular these are SIMDIntrinsicUpperSave and SIMDIntrinsicUpperRestore. After this PR goes in, I plan on putting up one last PR to move these to be NamedIntrinsic (they cannot be SimdAsHWIntrinsic because they exist even when FEATURE_HW_INTRINSIC is not defined in order to support the ABI). This will allow us to delete GT_SIMD and any remaining support that was specific to the legacy SIMD logic.

Author: tannergooding
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: -

@tannergooding
Copy link
Member Author

tannergooding commented Dec 15, 2022

Worth noting this is not expected to be zero diff, just positive diffs.

In particular, SIMDIntrinsicInitN had some pretty horrid codegen involving repeated shifts + shuffles. Where-as Vector128_Create will appropriate use SSE41_Insert when it's available.

Some simple diffs from these changes are below.

We'll correctly contain this when no side effects interfere:

- movzx    rax, word  ptr [rsp+04H]
- vmovd    xmm0, eax
- vpbroadcastw ymm0, ymm0
+ vpbroadcastw ymm0, word  ptr [rsp+04H]

We'll emit insertps directly rather than movss + pslldq combinations:

- vxorps   xmm3, xmm3
- vmovss   xmm3, xmm3, xmm2
- vpslldq  xmm3, 4
- vmovss   xmm3, xmm3, xmm6
- vpslldq  xmm3, 4
- vmovss   xmm3, xmm3, xmm5
- vmovaps  xmm2, xmm3
+ vinsertps xmm5, xmm5, xmm6, 16
+ vinsertps xmm2, xmm5, xmm2, 40

We'll recognize and contain zero when its an input to insertps

- vxorps    xmm1, xmm1, xmm1
- vinsertps xmm0, xmm0, xmm1, 48
+ vinsertps xmm0, xmm0, xmm0, 56

We'll also fold chains of insertps, when there are no side effects, and zero is involved

  vmovss    xmm0, dword ptr [rsi+30H]
  vinsertps xmm0, xmm0, dword ptr [rsi+34H], 16
- vinsertps xmm0, xmm0, dword ptr [rsi+38H], 32
- vxorps    xmm1, xmm1, xmm1
- vinsertps xmm0, xmm0, xmm1, 48
+ vinsertps xmm0, xmm0, dword ptr [rsi+38H], 40

@tannergooding
Copy link
Member Author

Raised the aliasing bug on #72725 and asked the fix be reopened.

@tannergooding
Copy link
Member Author

tannergooding commented Dec 17, 2022

CC. @dotnet/jit-contrib. This removes the last bit of the legacy SIMD handling that existed in HIR. It deletes a good amount of dead code and sets us up to move the last two remaining SIMDIntrinsic cases (LIR only) and delete GT_SIMD in a follow up PR.

It provides a small throughput win and some decent size wins for x64, Arm64, and x86.

I'm not quite ure why SPMI is complaining. I noticed that locally jitdiff.exe --diff --pmi has been failing for quite some time on S.P.Corelib due to a bit in impSIMDIntrinsic that wasn't handling Vector<Vector<float>> correctly and that isn't happening anymore now that impSIMDIntrinsic is gone. Perhaps that is related? -- I don't see anything obvious from the log, just a lot of missing context errors. So if someone more knowledgeable here can confirm or point me in the direction of the issue that would be appreciated.

@BruceForstall
Copy link
Member

I'm not quite ure why SPMI is complaining.

Looks like you have a crash in the Linux/x64 cross-compiler replays, both in the superpmi-diffs and superpmi-replay pipelines. E.g., in the replay pipeline log, cases of:

ERROR: Exception thrown: DebugBreak or AV Exception 123

in the diffs log, e.g.:

ERROR: method 4466 is missing a compileResult, cannot do diffing

To repro, run something like this under the debugger (fix the paths first):

superpmi.exe -box -boe -boa -jitoption JitStressRegs=2 <core_root>\clrjit_unix_x64_x64.dll <spmi_root>\mch\fd13d4e1-9815-4336-8232-b27878b9ebd4.Linux.x64\libraries.pmi.Linux.x64.checked.mch

We do expect some MISSING cases as the JIT evolves. You might be introducing more if you're changing calls across the JIT/EE interface. In fact, the summary page shows that is the case.

@tannergooding
Copy link
Member Author

Thanks for the info @BruceForstall.

I did run jitdiff --diff --pmi again locally to double check as well and found two more cases (pre-existing) where we were asserting, so I fixed those (they were small fixes).

I'll try to see if I can repro the SPMI Linux/x64 failure as well.

@tannergooding
Copy link
Member Author

hmm, I ran:

D:\Users\tagoo\source\repos\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root\superpmi.exe -box -boe -boa -jitoption JitStressRegs=2 D:\Users\tagoo\source\repos\runtime\artifacts\tests\coreclr\windows.x64.Checked\Tests\Core_Root\clrjit_unix_x64_x64.dll D:\Users\tagoo\source\repos\runtime\artifacts\spmi\mch\fd13d4e1-9815-4336-8232-b27878b9ebd4.Linux.x64\libraries.pmi.Linux.x64.checked.mch

And am seeing the following counts of giving replay failures:

1354 - failed to replay: SuperPMI assertion failed (missing key "key" in map IsValueClass)
98   - failed to replay: SuperPMI assertion failed (missing key "key" in map GetFieldClass)
3    - failed to replay: SuperPMI assertion failed (missing map GetArgType)
4    - failed to replay: SuperPMI assertion failed (missing key "key" in map GetArgType)
25   - failed to replay: SuperPMI assertion failed (missing map GetFieldClass)
11   - failed to replay: SuperPMI assertion failed (missing key "key" in map GetClassSize)
1    - failed to replay: SuperPMI assertion failed (missing key "key" in map GetClassNumInstanceFields)

Nothing outside the missing context ones though. Perhaps I fixed it by resolving the other two issues I had seen...

@tannergooding
Copy link
Member Author

tannergooding commented Dec 17, 2022

Was able to get a local repro, it was libraries_tests.pmi. The log calls out this particular failure as [04:00:15] ERROR: Method 252910 of size 64 failed to load and compile correctly by JIT2 (C:\h\w\B49E0952\p\diff\checked\clrjit_unix_x64_x64.dll).

The actual assert was:

ISSUE: <ASSERT> #252910 D:\Users\tagoo\source\repos\runtime\src\coreclr\jit\lower.cpp (6592) - Assertion failed 'arg->OperIsPutArg()' in 'System.Numerics.Tests.Vector2Tests:Vector2LengthTest1():this' during 'Lowering nodeinfo' (IL size 64; hash 0x1063802a; FullOpts)

ERROR: SuperPMI: Assert Failure (PID 37748, Thread 13216/33a0)
Assertion failed 'arg->OperIsPutArg()' in 'System.Numerics.Tests.Vector2Tests:Vector2LengthTest1():this' during 'Lowering nodeinfo' (IL size 64; hash 0x1063802a; FullOpts)

Issue ended up being the lowering logic around insert zero, zero which removed node but didn't adjust the gtNext returned, so it terminated the lowering logic too early. Fixed that to correctly pass through gtNext and things were passing again.

@tannergooding
Copy link
Member Author

CC. @dotnet/jit-contrib ready for review now with the SPMI issue resolved.

Saving ~5.2k overall bytes on Arm64 and ~11.5k overall bytes on x64, with most of that being in MinOpts for each of them.

Likewise with a -0.02% overall improvement to throughput, its higher and up to -0.05% in MinOpts.

void genSIMDScalarMove(
var_types targetType, var_types type, regNumber target, regNumber src, SIMDScalarMoveType moveType);
void genSIMDZero(var_types targetType, var_types baseType, regNumber targetReg);
void genSIMDIntrinsicInitN(GenTreeSIMD* simdNode);
void genSIMDIntrinsicUpperSave(GenTreeSIMD* simdNode);
void genSIMDIntrinsicUpperRestore(GenTreeSIMD* simdNode);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two only exist in LIR and are created by LSRA

regNumber targetReg);
void genSIMDIntrinsic32BitConvert(GenTreeSIMD* simdNode);
void genSIMDIntrinsic64BitConvert(GenTreeSIMD* simdNode);
void genSIMDExtractUpperHalf(GenTreeSIMD* simdNode, regNumber srcReg, regNumber tgtReg);
void genSIMDIntrinsic(GenTreeSIMD* simdNode);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can go away once we handle UpperSave/UpperRestore in a follow up PR.

// TODO-CQ: We don't handle contiguous args for anything except TYP_FLOAT today

GenTree* prevArg = nullptr;
bool areArgsContiguous = (simdBaseType == TYP_FLOAT);
Copy link
Member Author

@tannergooding tannergooding Dec 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The existing handling in areArgumentsContiguous and elsewhere only checks and supports TYP_FLOAT today.

It shouldn't be difficult to extend to other regular types (TYP_INT, TYP_UINT, TYP_LONG, TYP_ULONG, and TYP_DOUBLE). It might require a bit more work to extend to small types (TYP_BYTE, TYP_UBYTE, TYP_SHORT, and TYP_USHORT).

However, I'm leaving it to a follow up PR to minimize the churn here and since it doesn't impact the System.Numerics types that were handled by impSimdIntrinsic.

Comment on lines +767 to +776
CORINFO_ARG_LIST_HANDLE arg1 = sig->args;
CORINFO_ARG_LIST_HANDLE arg2 = info.compCompHnd->getArgNext(arg1);
var_types argType = TYP_UNKNOWN;
CORINFO_CLASS_HANDLE argClass = NO_CLASS_HANDLE;

argType = JITtype2varType(strip(info.compCompHnd->getArgType(sig, arg2, &argClass)));
op2 = getArgForHWIntrinsic(argType, argClass);

argType = JITtype2varType(strip(info.compCompHnd->getArgType(sig, arg1, &argClass)));
op1 = getArgForHWIntrinsic(argType, argClass);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fixes a pre-existing assert that was being triggered by operator /(Vector128<T> vector, T scalar)

}
assert(intrinsicId == NI_SSE41_Insert);

// We have Sse41.Insert in which case we can specially handle
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The handling added here ensures we get great codegen for some of the patterns that had popped up for Vector2/3/4.

It will also help with the System.Runtime.Intrinsics handling when zero is involved.

@tannergooding
Copy link
Member Author

Merging in dotnet/main to resolve the unrelated CI failure.

This is still ready for review.

@build-analysis build-analysis bot mentioned this pull request Dec 22, 2022
Copy link
Member

@BruceForstall BruceForstall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Run outerloop? jitstress? ISAs jitstress?

@tannergooding
Copy link
Member Author

/azp run runtime-coreclr jitstress-isas-x86, runtime-coreclr jitstress-isas-arm, runtime-coreclr outerloop

@azure-pipelines
Copy link

Azure Pipelines successfully started running 3 pipeline(s).

@tannergooding
Copy link
Member Author

/azp run runtime-coreclr jitstress-isas-x86, runtime-coreclr jitstress-isas-arm, runtime-coreclr outerloop

@azure-pipelines
Copy link

Azure Pipelines successfully started running 3 pipeline(s).

@tannergooding
Copy link
Member Author

jitstress failures are #75244

@ghost ghost locked as resolved and limited conversation to collaborators Feb 4, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants