Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Return Matrix.Identity from static field instead of creating it every time #87723

Closed
wants to merge 1 commit into from

Conversation

BrzVlad
Copy link
Member

@BrzVlad BrzVlad commented Jun 17, 2023

This significantly improves the performance when using this field, even on CoreCLR it is about 33% improvement. Quite a bit of matrix benchmarks from our suite rely on this property.

This is how it was originally implemented but has since been changed to creating a matrix on the fly in f8218f9 leading to some reported regressions on mono.

… time

This significantly improves the performance when using this field, even on CoreCLR it is about 33% improvement.
@ghost
Copy link

ghost commented Jun 17, 2023

Tagging subscribers to this area: @dotnet/area-system-numerics
See info in area-owners.md if you want to be subscribed.

Issue Details

This significantly improves the performance when using this field, even on CoreCLR it is about 33% improvement. Quite a bit of matrix benchmarks from our suite rely on this property.

This is how it was originally implemented but has since been changed to creating a matrix on the fly in f8218f9 leading to some reported regressions on mono.

Author: BrzVlad
Assignees: -
Labels:

area-System.Numerics

Milestone: -

@@ -48,19 +48,11 @@ internal struct Impl : IEquatable<Impl>
Z = new Vector2(m31, m32);
}

private static readonly Impl _identity = new Impl () { X = Vector2.UnitX, Y = Vector2.UnitY, Z = Vector2.Zero };

public static Impl Identity
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be written as one-liner too:

Suggested change
public static Impl Identity
public static Impl Identity { get; } = new Impl() { X = Vector2.UnitX, Y = Vector2.UnitY, Z = Vector2.Zero };

@tannergooding
Copy link
Member

tannergooding commented Jun 17, 2023

Can you provide the benchmark numbers and diffs?

This was an intentional change to better align with how native libraries do this and to more readily allow optimizations that are applicable to real world usages of the value.

In real world cases, returning this the way it was (that is constructing a new object in the property) allows for promotion, constant folding, and other optimizations to light up.

@BrzVlad
Copy link
Member Author

BrzVlad commented Jun 19, 2023

There are a lot of benchmarks in https://github.com/dotnet/performance/blob/main/src/benchmarks/micro/libraries/System.Numerics.Vectors/Perf_Matrix4x4.cs that call Matrix4x4.Identity repeatedly. For example, manually testing locally an extracted microbenchmark:

Matrix4x4 res = Matrix4x4.Identity;
for (int i = 0; i < 1000000000; i++)
    res = Matrix4x4.Add (Matrix4x4.Identity, Matrix4x4.Identity);
Console.WriteLine (res);

Before this change it takes around 5.1s and after 3.3s, on coreclr amd64. On mono the gain is much higher.

I don't really understand why building the value in the property would provide any additional support for JIT optimizations. This PR makes it such that we access the value from a static readonly field, which the JIT would also see it as a constant once the class is initialized.

@SingleAccretion
Copy link
Contributor

I don't really understand why building the value in the property would provide any additional support for JIT optimizations. This PR makes it such that we access the value from a static readonly field, which the JIT would also see it as a constant once the class is initialized.

The idea is that => new(const...) is more amenable to RyuJit's optimizations because the constants are more readily visible, if you decompose the matrix into vector fields, as the code aspires to do. It's the same idea why static int Prop => 10 is better than static int Prop { get; } = 10.

Now, this is a bit theoretical with structs, because some things can go not as planned (as the benchmarks results, clearly, indicate), and somewhat recently we've gained the ability get constants out of static readonly structs (but at a late stage, in VN - historically RyuJit wasn't able to see through field access of static readonly structs at all); it would require a deeper investigation to determine what's up.

@EgorBo
Copy link
Member

EgorBo commented Jun 19, 2023

At the same time, RyuJIT should be good with static readonly structs in this case, shouldn't it?
e.g.

static readonly Matrix4x4 Identity = Matrix4x4.Identity;

[MethodImpl(MethodImplOptions.NoInlining)]
static float Test()
{
    return Identity.M22;
}

is correctly folded in return 1.0f;

@tannergooding
Copy link
Member

At the same time, RyuJIT should be good with static readonly structs in this case, shouldn't it?

Yes. It is correctly optimized in Tier 1 after rejit. However, it is worse for scenarios where rejit cannot happen. This can negatively impact AOT and can leave various artifacts left around due to the existence of the static constructor.

The existing code should likewise be optimizable. But it looks like the JIT gets tripped up a bit due to the Unsafe.As that happens and so it doesn't realize it can throw out the 3 unused fields and fold the constant.

There are a lot of benchmarks in https://github.com/dotnet/performance/blob/main/src/benchmarks/micro/libraries/System.Numerics.Vectors/Perf_Matrix4x4.cs that call Matrix4x4.Identity repeatedly

Right, and they aren't necessarily representative of "real world" code. They're setup in a way to allow tracking general changes to the code over time instead.

An actual usage typically only uses M4x4 as an initial value for the matrix before mutating it further -or- as as the input to a transform where the user want to no-op.

Before this change it takes around 5.1s and after 3.3s, on coreclr amd64. On mono the gain is much higher.

For Mono part of this is because the general Vector4 acceleration isn't all in yet. When it is, most of this should improve quite a bit.

For RyuJIT and Mono both, this is a bit of what was touched above that Unsafe.As is causing the normal struct optimizations to give up.

Matrix4x4 in general was designed exposing public fields and that leads to a few of these pessimizations. Ideally we'd be exposing 4x Vector4 properties instead and have all the fields completely hidden.

Likewise, we'd ideally be using the new Unsafe.BitCast (after the other optimizations around BitCast go in) with the compiler correctly recognizing that its not actual aliased data so we could get it treated as 4x Vector4 fields. That then allows cases like m4x4A + m4x4B to be constant foldable by extension, and they are for the backing Matrix4x4.Impl struct.

@jakobbotsch
Copy link
Member

I analyzed why we don't get expected (constant folded) codegen in these benchmarks here: #76928 (comment)

The codegen for this benchmark is terrible with and without physical promotion. The problem is around V09 that we end up address exposing -- the JIT is not able to see through the AsImpl() calls with full fidelity. If we change AsImpl to return by value instead of by ref then the problem is solved and the benchmark reduces to a vector constant. At the same time we can switch to Unsafe.BitCast. Does that seem reasonable @tannergooding ?

The idea would be to switch the AsImpl() calls that are directly dereferencing the returned pointer to an ImplValue() that is essentially Impl ImplValue() => Unsafe.As<Matrix4x4, Impl>(ref this); (or using Unsafe.BitCast, but the problem here is unrelated to the As vs BitCast debate).
(I also looked at fixing this in the JIT, but it didn't look straightforward.)

@BrzVlad
Copy link
Member Author

BrzVlad commented Jun 30, 2023

Closing this since relying on static constructors to run might be problematic in some scenarios.

@BrzVlad BrzVlad closed this Jun 30, 2023
@ghost ghost locked as resolved and limited conversation to collaborators Jul 30, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants