Return Matrix.Identity from static field instead of creating it every time #87723

BrzVlad · 2023-06-17T08:17:49Z

This significantly improves the performance when using this field, even on CoreCLR it is about 33% improvement. Quite a bit of matrix benchmarks from our suite rely on this property.

This is how it was originally implemented but has since been changed to creating a matrix on the fly in f8218f9 leading to some reported regressions on mono.

… time This significantly improves the performance when using this field, even on CoreCLR it is about 33% improvement.

ghost · 2023-06-17T08:17:59Z

Tagging subscribers to this area: @dotnet/area-system-numerics
See info in area-owners.md if you want to be subscribed.

Issue Details

This significantly improves the performance when using this field, even on CoreCLR it is about 33% improvement. Quite a bit of matrix benchmarks from our suite rely on this property.

This is how it was originally implemented but has since been changed to creating a matrix on the fly in f8218f9 leading to some reported regressions on mono.

Author:	BrzVlad
Assignees:	-
Labels:	`area-System.Numerics`
Milestone:	-

gfoidl · 2023-06-17T11:16:22Z

src/libraries/System.Private.CoreLib/src/System/Numerics/Matrix3x2.Impl.cs

@@ -48,19 +48,11 @@ internal struct Impl : IEquatable<Impl>
                Z = new Vector2(m31, m32);
            }

+            private static readonly Impl _identity = new Impl () { X = Vector2.UnitX, Y = Vector2.UnitY, Z = Vector2.Zero };
+
            public static Impl Identity


Could be written as one-liner too:

Suggested change

public static Impl Identity

public static Impl Identity { get; } = new Impl() { X = Vector2.UnitX, Y = Vector2.UnitY, Z = Vector2.Zero };

tannergooding · 2023-06-17T18:12:44Z

Can you provide the benchmark numbers and diffs?

This was an intentional change to better align with how native libraries do this and to more readily allow optimizations that are applicable to real world usages of the value.

In real world cases, returning this the way it was (that is constructing a new object in the property) allows for promotion, constant folding, and other optimizations to light up.

BrzVlad · 2023-06-19T08:03:48Z

There are a lot of benchmarks in https://github.com/dotnet/performance/blob/main/src/benchmarks/micro/libraries/System.Numerics.Vectors/Perf_Matrix4x4.cs that call Matrix4x4.Identity repeatedly. For example, manually testing locally an extracted microbenchmark:

Matrix4x4 res = Matrix4x4.Identity;
for (int i = 0; i < 1000000000; i++)
    res = Matrix4x4.Add (Matrix4x4.Identity, Matrix4x4.Identity);
Console.WriteLine (res);

Before this change it takes around 5.1s and after 3.3s, on coreclr amd64. On mono the gain is much higher.

I don't really understand why building the value in the property would provide any additional support for JIT optimizations. This PR makes it such that we access the value from a static readonly field, which the JIT would also see it as a constant once the class is initialized.

SingleAccretion · 2023-06-19T09:43:19Z

I don't really understand why building the value in the property would provide any additional support for JIT optimizations. This PR makes it such that we access the value from a static readonly field, which the JIT would also see it as a constant once the class is initialized.

The idea is that => new(const...) is more amenable to RyuJit's optimizations because the constants are more readily visible, if you decompose the matrix into vector fields, as the code aspires to do. It's the same idea why static int Prop => 10 is better than static int Prop { get; } = 10.

Now, this is a bit theoretical with structs, because some things can go not as planned (as the benchmarks results, clearly, indicate), and somewhat recently we've gained the ability get constants out of static readonly structs (but at a late stage, in VN - historically RyuJit wasn't able to see through field access of static readonly structs at all); it would require a deeper investigation to determine what's up.

EgorBo · 2023-06-19T14:43:22Z

At the same time, RyuJIT should be good with static readonly structs in this case, shouldn't it?
e.g.

static readonly Matrix4x4 Identity = Matrix4x4.Identity;

[MethodImpl(MethodImplOptions.NoInlining)]
static float Test()
{
    return Identity.M22;
}

is correctly folded in return 1.0f;

tannergooding · 2023-06-19T17:03:52Z

At the same time, RyuJIT should be good with static readonly structs in this case, shouldn't it?

Yes. It is correctly optimized in Tier 1 after rejit. However, it is worse for scenarios where rejit cannot happen. This can negatively impact AOT and can leave various artifacts left around due to the existence of the static constructor.

The existing code should likewise be optimizable. But it looks like the JIT gets tripped up a bit due to the Unsafe.As that happens and so it doesn't realize it can throw out the 3 unused fields and fold the constant.

There are a lot of benchmarks in https://github.com/dotnet/performance/blob/main/src/benchmarks/micro/libraries/System.Numerics.Vectors/Perf_Matrix4x4.cs that call Matrix4x4.Identity repeatedly

Right, and they aren't necessarily representative of "real world" code. They're setup in a way to allow tracking general changes to the code over time instead.

An actual usage typically only uses M4x4 as an initial value for the matrix before mutating it further -or- as as the input to a transform where the user want to no-op.

Before this change it takes around 5.1s and after 3.3s, on coreclr amd64. On mono the gain is much higher.

For Mono part of this is because the general Vector4 acceleration isn't all in yet. When it is, most of this should improve quite a bit.

For RyuJIT and Mono both, this is a bit of what was touched above that Unsafe.As is causing the normal struct optimizations to give up.

Matrix4x4 in general was designed exposing public fields and that leads to a few of these pessimizations. Ideally we'd be exposing 4x Vector4 properties instead and have all the fields completely hidden.

Likewise, we'd ideally be using the new Unsafe.BitCast (after the other optimizations around BitCast go in) with the compiler correctly recognizing that its not actual aliased data so we could get it treated as 4x Vector4 fields. That then allows cases like m4x4A + m4x4B to be constant foldable by extension, and they are for the backing Matrix4x4.Impl struct.

jakobbotsch · 2023-06-20T06:54:04Z

I analyzed why we don't get expected (constant folded) codegen in these benchmarks here: #76928 (comment)

The codegen for this benchmark is terrible with and without physical promotion. The problem is around V09 that we end up address exposing -- the JIT is not able to see through the AsImpl() calls with full fidelity. If we change AsImpl to return by value instead of by ref then the problem is solved and the benchmark reduces to a vector constant. At the same time we can switch to Unsafe.BitCast. Does that seem reasonable @tannergooding ?

The idea would be to switch the AsImpl() calls that are directly dereferencing the returned pointer to an ImplValue() that is essentially Impl ImplValue() => Unsafe.As<Matrix4x4, Impl>(ref this); (or using Unsafe.BitCast, but the problem here is unrelated to the As vs BitCast debate).
(I also looked at fixing this in the JIT, but it didn't look straightforward.)

BrzVlad · 2023-06-30T07:24:59Z

Closing this since relying on static constructors to run might be problematic in some scenarios.

Return Matrix.Identity from static field instead of building it every…

52f72f6

… time This significantly improves the performance when using this field, even on CoreCLR it is about 33% improvement.

BrzVlad added the area-System.Numerics label Jun 17, 2023

ghost assigned BrzVlad Jun 17, 2023

gfoidl reviewed Jun 17, 2023

View reviewed changes

BrzVlad closed this Jun 30, 2023

ghost locked as resolved and limited conversation to collaborators Jul 30, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Return Matrix.Identity from static field instead of creating it every time #87723

Return Matrix.Identity from static field instead of creating it every time #87723

BrzVlad commented Jun 17, 2023

ghost commented Jun 17, 2023

gfoidl Jun 17, 2023

tannergooding commented Jun 17, 2023 •

edited

Loading

BrzVlad commented Jun 19, 2023

SingleAccretion commented Jun 19, 2023

EgorBo commented Jun 19, 2023 •

edited

Loading

tannergooding commented Jun 19, 2023

jakobbotsch commented Jun 20, 2023

BrzVlad commented Jun 30, 2023

	public static Impl Identity
	public static Impl Identity { get; } = new Impl() { X = Vector2.UnitX, Y = Vector2.UnitY, Z = Vector2.Zero };

Return Matrix.Identity from static field instead of creating it every time #87723

Return Matrix.Identity from static field instead of creating it every time #87723

Conversation

BrzVlad commented Jun 17, 2023

ghost commented Jun 17, 2023

gfoidl Jun 17, 2023

Choose a reason for hiding this comment

tannergooding commented Jun 17, 2023 • edited Loading

BrzVlad commented Jun 19, 2023

SingleAccretion commented Jun 19, 2023

EgorBo commented Jun 19, 2023 • edited Loading

tannergooding commented Jun 19, 2023

jakobbotsch commented Jun 20, 2023

BrzVlad commented Jun 30, 2023

tannergooding commented Jun 17, 2023 •

edited

Loading

EgorBo commented Jun 19, 2023 •

edited

Loading