Performance of my Maze Generator went from 2.1 seconds to 3.8 after upgrading to .NET 7.0 #78110
Comments
I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.
@devedse Thanks for the report, that's not expected. Have you tried using a profiler to see what changed? That might help us route this.
@devedse can you add repro steps? I would like to look into this and want to make sure I'm looking at the right things.
https://github.com/devedse/DeveMazeGeneratorCore/blob/master/DeveMazeGeneratorCore/Generators/AlgorithmBacktrack2Deluxe2.cs#L47-L50 is invalid Unsafe code. A bool is 1 byte and an int is 4 bytes, so it is not OK to cast a managed pointer to bool to a managed pointer to int: the read can pick up random memory from the stack.
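A minimal sketch of the size rule above, assuming nothing about the actual code at those lines (BoolToInt, BoolAsByte and CountTrue are hypothetical helpers, not part of DeveMazeGeneratorCore): reinterpreting a bool through Unsafe.As is only sound when the target type is also 1 byte wide, such as byte, while getting an int should go through an ordinary conversion.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical helpers illustrating size-safe bool conversions.

// Safe: an ordinary value conversion, no reinterpretation involved.
static int BoolToInt(bool value) => value ? 1 : 0;

// Safe reinterpretation: bool and byte are both 1 byte wide.
static byte BoolAsByte(ref bool value) => Unsafe.As<bool, byte>(ref value);

// NOT safe (do not do this): Unsafe.As<bool, int>(ref value) reads 4 bytes
// starting at a 1-byte bool, so the upper 3 bytes come from whatever memory
// happens to follow it (for a local, other stack data).

// Count set cells in a bool buffer without any reinterpretation.
static int CountTrue(ReadOnlySpan<bool> cells)
{
    int count = 0;
    foreach (bool cell in cells)
    {
        count += BoolToInt(cell);
    }
    return count;
}

bool flag = true;
Console.WriteLine(BoolToInt(flag));                        // 1
Console.WriteLine(BoolAsByte(ref flag));                   // 1
Console.WriteLine(CountTrue(new[] { true, false, true })); // 2
```

Storing the maze as bytes from the start, as done in the reply below, avoids the question entirely, because a byte can be widened to int by a normal conversion.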
Hey all, thanks for the suggestions. @jkotas I did indeed run into that issue when running my program on Android. Previously, though, with C# on .NET 6.0 it seemed to work fine. I did write a "byte" version of the algorithm and have now switched to this maze generation algorithm: It does indeed seem there's a huge performance improvement. 😄
As with #78127, I think the remaining performance difference can be explained by OSR (on-stack replacement). You can disable it in your project file:

```xml
<PropertyGroup>
  <TieredCompilationQuickJitForLoops>false</TieredCompilationQuickJitForLoops>
</PropertyGroup>
```

The generation is dominated by one long-running method. The relevant code in .NET 6 and .NET 7:

```asm
;; .NET 6
G_M000_IG07: ;; offset=011AH
418D5424FE lea edx, [r12-02H]
488BCE mov rcx, rsi
458BC5 mov r8d, r13d
41FF5718 call [r15+18H]DeveMazeGeneratorCore.InnerMaps.InnerMap:get_Item(int,int):bool:this
;; .NET 7
G_M000_IG04: ;; offset=0099H
418D57FE lea edx, [r15-02H]
488BCE mov rcx, rsi
458BC4 mov r8d, r12d
488B06 mov rax, qword ptr [rsi]
4C8B6848 mov r13, qword ptr [rax+48H]
41FF5518 call [r13+18H]DeveMazeGeneratorCore.InnerMaps.InnerMap:get_Item(int,int):bool:this
```

In .NET 7 the two loads through [rsi] and [rax+48H] are repeated before every call, whereas the .NET 6 code calls straight through r15. I suspect in the .NET 7 OSR case there is no dominating deref of
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
Also @EgorBo FYI, PGO doesn't help here at all.
Yes, it is something that should be improved. If you set
The issue here is that in .NET 7 with OSR, we no longer see some of the code that executed before the main loop in
so in .NET 6 the dominating pre-loop case lets all the in-loop cases get CSE'd, while in .NET 7 and current .NET 8 we can only CSE within the loop, and that double indirection within the loop is apparently costly. Possible fixes include loop peeling and/or cloning to expose an invariant INDIR tree (which we don't do right now), gated by a null check of V02. I would expect PGO to help, as we should be able to clone based on type; let me look into that next.
With .NET 8, PGO does seem to help; I get considerably faster times, so that is another possible fix. We do indeed clone based on type, and the cloned loop is free of calls. In .NET 7 we seem to be unable to properly resolve the call targets for
Without PGO, it seems like we could clone the loop to expose the invariant method table fetch, make it non-faulting in the hot loop, and then hoist it out. I prototyped the cloning part and it's not too bad. But these loads are conditional within the loop, so we won't hoist them; fixing that may be a bit more involved. Also note that any OSR loop is quite likely to be very hot, since we know it has already iterated several thousand times. So we might want to be more aggressive in optimizing these loops (however, we have to be careful not to do a better job than Tier1, or we will see some really wonky perf behaviors).
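To make "cloning the loop based on type" concrete, here is a hand-written C# analogue. It is a sketch of the idea only, not the JIT's actual transformation: CountSet is a hypothetical helper, and the IList<bool>/bool[] pair stands in for the InnerMap hierarchy (the thread concerns a virtual call on InnerMap rather than an interface call, so this mirrors only the shape). The type test runs once, before the loop; the cloned fast path then indexes the array directly with no dispatch, while the fallback loop keeps the general per-iteration call.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical example: count set cells in a map that may be any IList<bool>.
static int CountSet(IList<bool> map)
{
    int count = 0;

    // Cloned fast path: the type check is done once, outside the loop, so
    // the loop body indexes a bool[] directly instead of dispatching through
    // the interface on every iteration.
    if (map is bool[] array)
    {
        for (int i = 0; i < array.Length; i++)
        {
            if (array[i]) count++;
        }
        return count;
    }

    // Fallback loop: every access is dispatched through IList<bool>'s indexer.
    for (int i = 0; i < map.Count; i++)
    {
        if (map[i]) count++;
    }
    return count;
}

bool[] cells = { true, false, true, true };
Console.WriteLine(CountSet(cells));                 // 3 (fast path)
Console.WriteLine(CountSet(new List<bool>(cells))); // 3 (fallback path)
```

With PGO, the JIT can pick the guarded type from observed behavior instead of from a type the programmer wrote down, which is why it helps in the .NET 8 measurements above.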
The motivating example is the OSR loop in dotnet#78110, where in OSR we don't see a deref before the loop, so we can't hoist the repeated virtual method lookups. It turns out that these indirs are conditional, so we end up not hoisting them, but at least this does the cloning part. It still needs some polish, but should be functionally correct.
As noted above, PGO in .NET 8 will provide a 20% performance boost. With .NET 8 Preview 5 (coming out next week), PGO is enabled by default, so I am going to close this issue.
Thanks @AndyAyersMS for the elaborate explanation and work on making dotnet even faster 😄
Description
I've been working on a Maze Generator as a personal hobby project for a number of years now and always try to keep it up to date with the latest .NET stuff.
Yesterday .NET 7.0 came out and I immediately upgraded my project to see if it would bring performance benefits.
Sadly the opposite seems to be true.
On my main computer:
Processor: AMD 5950x
RAM: 128GB
Operating System: Windows 11
dotnet SDK: 7.0.100
Before .NET 7 upgrade:
After .NET 7 upgrade:
On my laptop:
Processor: Intel Core i7 12700H
RAM: 128GB
Operating System: Windows 11
dotnet SDK: 7.0.100
I didn't make screenshots but the times went from:
Before: 2.1 seconds
After: 2.6 seconds
My code
The following file contains the implementation of the algorithm I'm using. Besides the code in this file, I'm using my own implementation of a random number generator and my own implementation of a BitArray to store the maze.
https://github.com/devedse/DeveMazeGeneratorCore/blob/master/DeveMazeGeneratorCore/Generators/AlgorithmBacktrack2Deluxe2.cs
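The BitArray-based map itself is not shown in this issue. As a hypothetical sketch of the general technique only (CreateGrid, GetCell and SetCell, the ulong[] storage and the row-major layout are all assumptions, not the author's InnerMap), a maze map can store one bit per cell and look cells up by (x, y):

```csharp
using System;

// Hypothetical bit-packed maze map: one bit per cell, row-major order,
// 64 cells per ulong word.
static ulong[] CreateGrid(int width, int height) => new ulong[(width * height + 63) / 64];

static bool GetCell(ulong[] bits, int width, int x, int y)
{
    int index = y * width + x;                                // linear cell index
    return (bits[index >> 6] & (1UL << (index & 63))) != 0;   // word, then bit
}

static void SetCell(ulong[] bits, int width, int x, int y, bool value)
{
    int index = y * width + x;
    ulong mask = 1UL << (index & 63);
    if (value) bits[index >> 6] |= mask;    // set the bit
    else bits[index >> 6] &= ~mask;         // clear the bit
}

const int Width = 5, Height = 3;
ulong[] grid = CreateGrid(Width, Height);
SetCell(grid, Width, 4, 2, true);
Console.WriteLine(GetCell(grid, Width, 4, 2)); // True
Console.WriteLine(GetCell(grid, Width, 3, 2)); // False
```

Packing 64 cells per word keeps large mazes compact, at the cost of a shift and mask on every access; an indexer like InnerMap's get_Item(int, int) would wrap the same arithmetic.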
Reproduce
I've created 2 branches that show the difference quite easily by simply running the ConsoleApp:
dotnet 6:
https://github.com/devedse/DeveMazeGeneratorCore
dotnet 7:
https://github.com/devedse/DeveMazeGeneratorCore/tree/dotnet7
Additional info
Even running them at the same time side by side you can clearly see the performance difference:
Some more research and doing a NativeAOT compilation to see if this differs (it does?!?)
I did some more investigation and compiled my code like this:
This results in 2 exe files that for some reason also differ quite significantly in performance:
DeveMazeGeneratorCore.ConsoleApp\bin\Release\net7.0\win-x64\DeveMazeGenerator.ConsoleApp.exe (9.265kb):
DeveMazeGeneratorCore.ConsoleApp\bin\Release\net7.0\win-x64\publish\DeveMazeGenerator.ConsoleApp.exe (100.360kb):
The second file (in the publish folder) does seem to run slightly faster than the original .NET 6.0 version.