Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix some low-hanging HWIntrinsic issues #80626

Merged
merged 10 commits into from
Jan 15, 2023

Conversation

tannergooding
Copy link
Member

This fixes a few low-hanging issues:

  • It ensures WithElement for Vector64/128/256 isn't claiming to be a System.Numerics type
  • It ensures Vector128.AsVector2 and Vector128.AsVector3 are intrinsic
  • It ensures Vector2.AsVector128 and Vector3.Vector128 are intrinsic
  • It ensures AsNInt and AsNUInt for Vector64/128/256 are intrinsic
  • It ensures get_Count and IsSupported for Vector64/128/256 and Vector<T> are intrinsic

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jan 13, 2023
@ghost ghost assigned tannergooding Jan 13, 2023
@ghost
Copy link

ghost commented Jan 13, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak
See info in area-owners.md if you want to be subscribed.

Issue Details

This fixes a few low-hanging issues:

  • It ensures WithElement for Vector64/128/256 isn't claiming to be a System.Numerics type
  • It ensures Vector128.AsVector2 and Vector128.AsVector3 are intrinsic
  • It ensures Vector2.AsVector128 and Vector3.Vector128 are intrinsic
  • It ensures AsNInt and AsNUInt for Vector64/128/256 are intrinsic
  • It ensures get_Count and IsSupported for Vector64/128/256 and Vector<T> are intrinsic
Author: tannergooding
Assignees: tannergooding
Labels:

area-CodeGen-coreclr

Milestone: -

@tannergooding
Copy link
Member Author

For IsSupported in particular it allow some significant amount of dead code elimination in the case where the T was not supported. In one such method we get the following diff:

- ; Lcl frame size = 1112
+ ; Lcl frame size = 72

In the above code, the actual method body remains the same. We just no longer have a non-trivial amount of locals we couldn't determine were dead:

G_M3734_IG02:
       mov      ecx, 66
       call     [System.ThrowHelper:ThrowNotSupportedException(int)]
       int3

@tannergooding
Copy link
Member Author

Likewise for get_Count we see some better codegen for the software fallback cases and more loop unrolling occurring. It does cause an expected and slight increase in code size when T is supported but really only impacts a subset of the Vector128/256/512 software fallbacks.

@tannergooding
Copy link
Member Author

tannergooding commented Jan 14, 2023

Really great improvement to size and TP. There are few small size regressions due to additional inlining that occurs in a few places now. The TP and size wins may be a little misrepresented since SPMI tests Vector<float> as one of the generic types so this is really impacting all the Vector128<Vector<float>> paths that get tested by SPMI.

That being said, such paths do appear in practice almost anywhere we have Vector###<T> support. This is because many of those paths are generic and we only take the "accelerated path" when T is supported. This leaves the JIT with the job of doing dead code elimination and these being intrinsic make that much simpler/easier. -- Consider that a dev might do IndexOf on a Span<Guid>, in which case the accelerated path will do a if Vector128<Guid>.IsSupported check

On Windows x64

Overall (-257,636 bytes)
MinOpts (-916 bytes)
FullOpts (-256,720 bytes)

and

Overall (-0.12%)
FullOpts (-0.16%)

Will post Arm64 numbers once CI finishes.

@tannergooding tannergooding marked this pull request as ready for review January 14, 2023 14:26
@tannergooding
Copy link
Member Author

CC. @dotnet/jit-contrib. Fixing some low-hanging fruit giving a significant decrease to SPMI generated bytes and improvement to throughput for x64

Copy link
Contributor

@TIHan TIHan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The diffs look great! The dead code elimination really helped out.

@tannergooding
Copy link
Member Author

/azp run runtime-coreclr jitstress-isas-x86, runtime-coreclr jitstress-isas-arm, runtime-coreclr outerloop

@azure-pipelines
Copy link

Azure Pipelines successfully started running 3 pipeline(s).

@tannergooding
Copy link
Member Author

tannergooding commented Jan 15, 2023

Logged #80666 to track the jitstress/outerloop failures which are also occuring in main

@tannergooding tannergooding merged commit 968b4f8 into dotnet:main Jan 15, 2023
@tannergooding tannergooding deleted the better-simd branch January 15, 2023 06:13
@ghost ghost locked as resolved and limited conversation to collaborators Feb 14, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants