JIT: Visit blocks in RPO during LSRA #107927

Merged
merged 8 commits into from
Sep 20, 2024

Member

@amanasifkhalid amanasifkhalid commented Sep 17, 2024

Part of #107749. LSRA currently does a lexical pass over the block list to build a visitation order. Since we intend to run block layout after LSRA with #107634, LSRA ideally shouldn't be sensitive to lexical ordering, and since the current logic tries to visit a block's predecessors before the block itself, it seems easier and faster to just use an RPO traversal.
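
For readers unfamiliar with the term, here is a minimal Python sketch of a reverse post-order (RPO) traversal over a toy flow graph. It is not the JIT's C++ implementation: the block names, the `cfg` dictionary, and the `reverse_post_order` helper are all hypothetical and only illustrate the property mentioned above. In an acyclic graph, RPO places each block after all of its predecessors; when loops are present, only predecessors reached through back edges can appear later.

```python
from typing import Dict, List


def reverse_post_order(succs: Dict[str, List[str]], entry: str) -> List[str]:
    """Return the blocks reachable from `entry` in reverse post-order."""
    visited = {entry}
    post_order: List[str] = []
    # Iterative DFS: each stack entry is (block, iterator over its successors).
    stack = [(entry, iter(succs.get(entry, [])))]
    while stack:
        block, successors = stack[-1]
        for succ in successors:
            if succ not in visited:
                visited.add(succ)
                stack.append((succ, iter(succs.get(succ, []))))
                break
        else:
            # All successors are done, so the block is finished (post-order).
            post_order.append(block)
            stack.pop()
    post_order.reverse()
    return post_order


# Hypothetical if/else diamond: entry -> cond -> {then, else} -> join -> exit
cfg = {
    "entry": ["cond"],
    "cond": ["then", "else"],
    "then": ["join"],
    "else": ["join"],
    "join": ["exit"],
    "exit": [],
}

order = reverse_post_order(cfg, "entry")
position = {block: i for i, block in enumerate(order)}

# In this acyclic graph, every edge goes forward in the RPO,
# i.e. each block is visited after all of its predecessors.
preds_first = all(
    position[block] < position[succ]
    for block, successors in cfg.items()
    for succ in successors
)
print(order, preds_first)
```

Note that the result depends only on the flow graph's edges, not on the order in which blocks happen to be laid out, which is the insensitivity to lexical ordering that the description asks for.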

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Sep 17, 2024
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@amanasifkhalid
Member Author

/azp run runtime-coreclr outerloop, Fuzzlyn

Azure Pipelines successfully started running 2 pipeline(s).

@amanasifkhalid amanasifkhalid marked this pull request as ready for review September 18, 2024 01:45
@amanasifkhalid
Member Author

cc @dotnet/jit-contrib, @AndyAyersMS @kunalspathak PTAL. Fuzzlyn failures are known or NaN false positives.

Note that LSRA previously had three ordering strategies: the default preds-first one, lexical order, and randomized order (which was never implemented). RPO looks like a viable replacement for the first one, and lexical order doesn't make much sense if we plan to move block layout later, so I removed the functionality for specifying LSRA's block order. Is it ok to remove this for now, and add it back in if/when we decide to implement a randomized order?

@amanasifkhalid
Member Author

Diffs are large, though a net size improvement. Looking at the instructions retired per collection, the larger MinOpts TP regressions are concentrated in collections with relatively few MinOpts methods, so I think the TP impact isn't as bad as it looks.

@kunalspathak
Member

Fuzzlyn linux/arm failures seem to expose some more issues with this change. We should address all of them before merging this PR.

@amanasifkhalid
Member Author

> Fuzzlyn linux/arm failures seem to expose some more issues with this change. We should address all of them before merging this PR.

The assertion varDsc->IsAlwaysAliveInMemory() || ((regSet.GetMaskVars() & regMask) == 0) looks like the one you just fixed, though it doesn't match the assert listed at the bottom, otherTargetInterval->registerType == TYP_DOUBLE -- the latter assert's message references a method that doesn't exist in the trimmed repro. I'm guessing this is a Fuzzlyn bug? I can't reproduce either assert with the provided repro; I'll try rerunning Fuzzlyn and see if we get anything actionable.

@amanasifkhalid
Member Author

/azp run Fuzzlyn

Azure Pipelines successfully started running 1 pipeline(s).

@amanasifkhalid
Member Author

@kunalspathak the Linux arm failure didn't repro in the last Fuzzlyn run, so if it is a bug, it doesn't readily repro. Aside from inner/outerloop tests, are there any other suites you'd like me to run? JitStress doesn't seem to do anything interesting for block ordering, though if it's still worthwhile to run LSRA stress modes, I can do that -- thanks!

@kunalspathak
Member

> @kunalspathak the Linux arm failure didn't repro in the last Fuzzlyn run, so if it is a bug, it doesn't readily repro. Aside from inner/outerloop tests, are there any other suites you'd like me to run? JitStress doesn't seem to do anything interesting for block ordering, though if it's still worthwhile to run LSRA stress modes, I can do that -- thanks!

Yes, since Fuzzlyn is randomized, it might not necessarily repro on every run, but we should take the failure that we saw in the previous run, see why it showed up with this change, and go from there. I would usually run the *jitstressregs* pipelines too.

@amanasifkhalid
Member Author

/azp run runtime-coreclr jitstressregs, runtime-coreclr jitstressregs-x86, runtime-coreclr jitstress2-jitstressregs

Azure Pipelines successfully started running 3 pipeline(s).

@amanasifkhalid
Member Author

Note that the diffs from the latest run look quite different because the collections got a bit messed up on x64: we're missing coreclr_tests and libraries_tests*, and diffs from smoke_tests.nativeaot are included multiple times.

@kunalspathak
Member

kunalspathak commented Sep 20, 2024

loop-aware "RPO" would be useful

What is loop-aware "RPO"?

@amanasifkhalid amanasifkhalid merged commit ddf8075 into dotnet:main Sep 20, 2024
157 of 160 checks passed
@amanasifkhalid amanasifkhalid deleted the lsra-rpo branch September 20, 2024 18:38
@amanasifkhalid
Member Author

> What is loop-aware "RPO"?

When visiting a block during an RPO traversal, we check if the block is a loop header, and if so, we visit the rest of the loop's body before visiting anything else -- value numbering currently does this, if you want to see what the implementation looks like. This visit ordering has the nice property of keeping loop bodies compact.
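
The ordering described above can be sketched in a few lines of Python. This is only an illustration of the idea as stated in the comment, not the JIT's value-numbering code: the `loop_aware_rpo` helper, the block names, and the precomputed `loops` map (loop header to the set of blocks in that loop) are assumptions made for the example.

```python
from typing import Dict, List, Set


def loop_aware_rpo(rpo: List[str], loops: Dict[str, Set[str]]) -> List[str]:
    """Reorder a plain RPO so that each loop body stays contiguous.

    `loops` maps a loop header to the set of blocks in that loop
    (assumed to be computed elsewhere, e.g. by loop detection).
    """
    position = {block: i for i, block in enumerate(rpo)}
    visited: Set[str] = set()
    order: List[str] = []

    def visit(block: str) -> None:
        if block in visited:
            return
        visited.add(block)
        order.append(block)
        if block in loops:
            # Loop header: visit the rest of the loop (in RPO) before
            # anything else. Nested headers recurse the same way.
            for member in sorted(loops[block], key=lambda b: position[b]):
                visit(member)

    for block in rpo:
        visit(block)
    return order


# Hypothetical plain RPO in which the loop exit lands between
# the loop header and the rest of the loop body.
plain_rpo = ["entry", "head", "exit", "body", "latch"]
loops = {"head": {"head", "body", "latch"}}

order = loop_aware_rpo(plain_rpo, loops)
print(order)
```

In this toy input the plain RPO interleaves `exit` with the loop, while the loop-aware order keeps `head`, `body`, and `latch` adjacent and defers `exit` until the loop has been visited.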

@AndyAyersMS
Member

The first one #108201

Still waiting on arm64 results (tomorrow). Also some of this might be mitigated by loop-aware.

@kunalspathak
Member

> Also some of this might be mitigated by loop-aware.

@amanasifkhalid - can we try locally if loop-aware helps mitigate some of the regressions?

@amanasifkhalid
Member Author

> @amanasifkhalid - can we try locally if loop-aware helps mitigate some of the regressions?

Sure, I'll try that today. I'll have the top regressions collated here soon.

@amanasifkhalid
Member Author

amanasifkhalid commented Sep 26, 2024

x64 regressions:

Notes Recent Score Orig Score Linux x64 Windows x64 ViperLinux x64 ViperWindows x64 Benchmark
1.97 1.97 1.97
1.97
System.Tests.Perf_String.IndexerCheckLengthHoisting
1.75 1.75 1.75
1.75
System.Text.Json.Document.Tests.Perf_EnumerateArray.EnumerateArray(TestCase: ArrayOfStrings)
1.74 1.75 1.74
1.75
System.Collections.IterateForEachNonGeneric(String).SortedList(Size: 512)
1.73 1.44 1.73
1.44
System.Threading.Tasks.ValueTaskPerfTest.CreateAndAwait_FromResult_ConfigureAwait
1.70 1.75 1.70
1.75
System.Collections.IterateForEachNonGeneric(Int32).SortedList(Size: 512)
1.70 1.71 1.70
1.71
System.Globalization.Tests.StringSearch.IndexOf_Word_NotFound(Options: (en-US, OrdinalIgnoreCase, False))
1.56 1.26 1.56
1.26
System.Collections.Sort(IntStruct).LinqOrderByExtension(Size: 512)
1.55 1.28 1.55
1.28
System.Collections.Sort(IntStruct).LinqQuery(Size: 512)
1.54 1.55 1.79
1.79
1.63
1.65
1.25
1.25
System.Globalization.Tests.StringSearch.IndexOf_Word_NotFound(Options: (en-US, None, False))
1.54 1.55 1.80
1.80
1.63
1.65
1.24
1.25
System.Globalization.Tests.StringSearch.IndexOf_Word_NotFound(Options: (, None, False))
1.52 1.27 1.52
1.27
System.Memory.Span(Int32).Fill(Size: 512)
1.51 1.52 1.51
1.52
System.Linq.Tests.Perf_Enumerable.SelectToList(input: List)
1.49 1.49 1.78
1.78
1.24
1.24
System.Globalization.Tests.StringSearch.IndexOf_Word_NotFound(Options: (en-US, IgnoreNonSpace, False))
1.47 1.44 1.47
1.44
System.Tests.Perf_Version.TryFormat4
1.47 1.47 1.47
1.47
System.Linq.Tests.Perf_Enumerable.SelectToArray(input: Array)
1.45 1.39 1.69
1.55
1.25
1.24
System.Globalization.Tests.StringSearch.LastIndexOf_Word_NotFound(Options: (en-US, None, False))
1.44 1.40 1.44
1.40
System.Text.RegularExpressions.Tests.Perf_Regex_Industry_RustLang_Sherlock.Count(Pattern: "(?i)Sher[a-z]+
1.44 1.43 1.67
1.65
1.24
1.24
System.Globalization.Tests.StringSearch.LastIndexOf_Word_NotFound(Options: (, None, False))
1.43 1.49 1.65
1.80
1.24
1.24
System.Globalization.Tests.StringSearch.LastIndexOf_Word_NotFound(Options: (en-US, IgnoreNonSpace, False))
1.43 1.36 1.43
1.36
System.Globalization.Tests.StringSearch.LastIndexOf_Word_NotFound(Options: (en-US, Ordinal, False))
1.41 1.40 1.26
1.24
1.58
1.58
System.Text.Json.Document.Tests.Perf_EnumerateObject.EnumerateProperties(TestCase: StringProperties)
1.40 1.39 1.24
1.23
1.58
1.58
System.Text.Json.Document.Tests.Perf_EnumerateObject.EnumerateProperties(TestCase: NumericProperties)
1.39 1.45 1.39
1.45
System.Text.RegularExpressions.Tests.Perf_Regex_Industry_RustLang_Sherlock.Count(Pattern: "(?i)Sherlock
1.39 1.39 1.23
1.23
1.23
1.27
1.53
1.53
1.59
1.55
System.Linq.Tests.Perf_OrderBy.OrderByCustomComparer(NumberOfPeople: 512)
1.38 1.46 1.38
1.46
System.Tests.Perf_UInt64.TryFormat(value: 18446744073709551615)
1.37 1.38 1.37
1.38
System.Linq.Tests.Perf_Enumerable.WhereAny_LastElementMatches(input: List)
1.36 1.36 1.36
1.36
SciMark2.kernel.benchSparseMult
1.35 1.62 1.35
1.62
System.Tests.Perf_String.Replace_Char(text: "This is a very nice sentence", oldChar: 'z', newChar: 'y')
1.33 1.25 1.46
1.30
1.22
1.20
System.Linq.Tests.Perf_Enumerable.All_AllElementsMatch(input: IEnumerable)
1.33 1.31 1.23
1.23
1.44
1.40
System.Text.RegularExpressions.Tests.Perf_Regex_Industry_RustLang_Sherlock.Count(Pattern: "aqj", Options: NonBacktracking)
1.33 1.34 1.33
1.34
System.Memory.Span(Int32).LastIndexOfAnyValues(Size: 512)
1.33 1.33 1.28
1.28
1.38
1.38
Microsoft.Extensions.Primitives.StringSegmentBenchmark.Equals_Valid
1.32 1.33 1.32
1.33
System.Collections.Tests.DictionarySequentialKeys.ContainsValue_17_Int_Int
1.31 1.31 1.23
1.25
1.40
1.38
System.Text.RegularExpressions.Tests.Perf_Regex_Industry_RustLang_Sherlock.Count(Pattern: "zqj", Options: None)
1.30 1.30 1.22
1.24
1.39
1.37
System.Text.RegularExpressions.Tests.Perf_Regex_Industry_RustLang_Sherlock.Count(Pattern: "zqj", Options: NonBacktracking)
1.30 1.33 1.23
1.25
1.36
1.41
System.Text.RegularExpressions.Tests.Perf_Regex_Industry_RustLang_Sherlock.Count(Pattern: "aqj", Options: None)
1.29 1.27 1.29
1.27
Benchstone.BenchI.BenchE.Test
1.29 1.79 1.29
1.79
Microsoft.Extensions.Primitives.StringSegmentBenchmark.Trim
1.29 1.32 1.29
1.32
System.Globalization.Tests.StringSearch.IndexOf_Word_NotFound(Options: (en-US, Ordinal, False))
1.28 1.29 1.28
1.29
System.Collections.TryGetValueTrue(Int32, Int32).SortedDictionary(Size: 512)
1.28 1.28 1.40
1.38
1.17
1.19
System.Text.RegularExpressions.Tests.Perf_Regex_Industry_RustLang_Sherlock.Count(Pattern: "\w+\s+Holmes\s+\w+", Options: NonBacktracking)
1.28 1.28 1.28
1.28
System.Buffers.Tests.ReadOnlySequenceTests(Byte).IterateGetPositionArray
1.27 1.25 1.27
1.25
System.Buffers.Tests.ReadOnlySequenceTests(Char).IterateGetPositionArray
1.27 1.27 1.27
1.27
System.Memory.Span(Int32).IndexOfAnyThreeValues(Size: 512)
1.27 1.26 1.44
1.43
1.11
1.11
System.Memory.ReadOnlySpan.IndexOfString(input: "foobardzsdzs", value: "rddzs", comparisonType: InvariantCulture)
1.27 1.27 1.27
1.27
System.Linq.Tests.Perf_Enumerable.SelectToList(input: Range)
1.26 1.22 1.26
1.22
System.Collections.Tests.Perf_SortedSet.EnumerateViewBetween
1.26 1.27 1.26
1.27
System.Linq.Tests.Perf_Enumerable.WhereAny_LastElementMatches(input: Array)
1.26 1.27 1.26
1.27
System.Globalization.Tests.StringEquality.Compare_DifferentFirstChar(Count: 1024, Options: (en-US, OrdinalIgnoreCase))
1.26 1.25 1.38
1.36
1.14
1.14
System.Linq.Tests.Perf_Enumerable.WhereSingleOrDefault_LastElementMatches(input: Array)
1.25 1.25 1.25
1.25
System.Memory.Span(Int32).LastIndexOfAnyValues(Size: 33)
1.25 1.20 1.32
1.22
1.18
1.18
System.Collections.Perf_LengthBucketsFrozenDictionary.TryGetValue_True_FrozenDictionary(Count: 100, ItemsPerBucket: 1)
1.24 1.23 1.24
1.23
System.Tests.Perf_Int64.Parse(value: "-9223372036854775808")
1.24 1.23 1.24
1.23
System.Tests.Perf_DateTimeOffset.ToString(format: "r")
1.24 1.23 1.22
1.22
1.26
1.25
System.Collections.IterateForEach(Int32).IEnumerable(Size: 512)
1.23 1.23 1.23
1.23
System.Collections.ContainsFalse(String).LinkedList(Size: 512)
1.23 1.23 1.23
1.23
System.Globalization.Tests.StringSearch.LastIndexOf_Word_NotFound(Options: (en-US, OrdinalIgnoreCase, False))
1.23 1.24 1.23
1.24
System.Memory.Span(Int32).IndexOfAnyTwoValues(Size: 512)
1.23 1.21 1.23
1.21
System.Memory.Span(Int32).IndexOfAnyThreeValues(Size: 33)
1.23 1.25 1.23
1.25
System.Memory.Span(Int32).SequenceEqual(Size: 512)
1.23 1.25 1.23
1.25
System.Linq.Tests.Perf_Enumerable.SelectToArray(input: List)
1.23 1.22 1.23
1.22
System.Linq.Tests.Perf_Enumerable.Aggregate_Seed(input: IEnumerable)
1.23 1.22 1.19
1.19
1.26
1.25
System.Text.RegularExpressions.Tests.Perf_Regex_Industry_SliceSlice.Count(Options: None)
1.22 1.22 1.19
1.17
1.26
1.27
System.Linq.Tests.Perf_Enumerable.ToArray(input: IEnumerable)
1.22 1.20 1.22
1.20
System.Linq.Tests.Perf_Enumerable.FirstWithPredicate_LastElementMatches(input: IEnumerable)
1.22 1.23 1.22
1.23
System.Tests.Perf_Uri.Ctor(input: "http://xn--hst-sna.with.xn--nicode-2ya")
1.21 1.21 1.21
1.21
System.Tests.Perf_UInt64.TryParseHex(value: "3039")
1.21 1.22 1.21
1.22
System.Tests.Perf_Int64.ParseSpan(value: "-9223372036854775808")
1.21 1.21 1.21
1.21
System.Tests.Perf_Int64.TryParse(value: "-9223372036854775808")
1.21 1.21 1.21
1.21
System.Text.RegularExpressions.Tests.Perf_Regex_Industry_RustLang_Sherlock.Count(Pattern: "aei", Options: None)
1.21 1.16 1.29
1.20
1.13
1.13
System.Collections.Perf_LengthBucketsFrozenDictionary.TryGetValue_True_FrozenDictionary(Count: 10, ItemsPerBucket: 1)
1.21 1.21 1.19
1.19
1.22
1.23
System.Text.RegularExpressions.Tests.Perf_Regex_Industry_SliceSlice.Count(Options: NonBacktracking)
1.20 1.24 1.20
1.24
System.Linq.Tests.Perf_Enumerable.Repeat
1.20 1.19 1.20
1.19
System.Tests.Perf_Int64.TryParseSpan(value: "-9223372036854775808")
1.20 1.21 1.20
1.21
System.Linq.Tests.Perf_Enumerable.SelectToList(input: Array)
1.20 1.16 1.20
1.16
System.Tests.Perf_UInt64.TryParseHex(value: "FFFFFFFFFFFFFFFF")
1.20 1.18 1.20
1.18
ByteMark.BenchAssignRectangular
1.20 1.19 1.20
1.19
System.Text.RegularExpressions.Tests.Perf_Regex_Industry_RustLang_Sherlock.Count(Pattern: "aei", Options: NonBacktracking)
1.19 1.19 1.19
1.19
System.Text.RegularExpressions.Tests.Perf_Regex_Industry_RustLang_Sherlock.Count(Pattern: "\s[a-zA-Z]{0,12}ing\s", Options: NonBacktracking)
1.19 1.18 1.16
1.15
1.21
1.22
System.Linq.Tests.Perf_Enumerable.Zip(input: IEnumerable)
1.18 1.19 1.18
1.23
1.15
1.14
1.20
1.20
System.Linq.Tests.Perf_Enumerable.SingleWithPredicate_LastElementMatches(input: IEnumerable)
1.18 1.17 1.18
1.17
PerfLabTests.CastingPerf.FooObjCastIfIsa
1.18 1.17 1.18
1.17
System.Tests.Perf_String.Trim_CharArr(s: "Test", c: [' ', ' '])
1.17 1.17 1.15
1.15
1.20
1.20
System.Linq.Tests.Perf_Enumerable.AnyWithPredicate_LastElementMatches(input: IEnumerable)
1.17 1.18 1.17
1.18
System.Linq.Tests.Perf_Enumerable.WhereSelect(input: IEnumerable)
1.17 1.10 1.17
1.10
System.Numerics.Tensors.Tests.Perf_NumberTensorPrimitives(Int32).BitwiseAnd_Scalar(BufferLength: 128)
1.17 1.16 1.13
1.11
1.22
1.22
System.Linq.Tests.Perf_Enumerable.WhereFirst_LastElementMatches(input: IEnumerable)
1.17 1.18 1.17
1.18
System.Collections.Tests.Perf_PriorityQueue(Int32, Int32).Dequeue_And_Enqueue(Size: 100)
1.17 1.17 1.17
1.17
System.Memory.Span(Int32).IndexOfAnyTwoValues(Size: 33)
1.17 1.13 1.17
1.13
System.Collections.Perf_SingleCharFrozenDictionary.TryGetValue_False_FrozenDictionary(Count: 10000)
1.17 1.16 1.17
1.16
System.Tests.Perf_String.Replace_String(text: "This is a very nice sentence", oldValue: "bad", newValue: "nice")
1.17 1.19 1.17
1.19
System.Collections.TryGetValueFalse(Int32, Int32).ImmutableDictionary(Size: 512)
1.17 1.15 1.17
1.15
System.Collections.IterateForEach(Int32).ImmutableSortedSet(Size: 512)
1.16 1.16 1.20
1.21
1.17
1.14
1.12
1.13
System.Linq.Tests.Perf_Enumerable.Sum(input: IEnumerable)
1.16 1.15 1.16
1.15
System.Tests.Perf_Uri.Ctor(input: "https://a.much.longer.domain.name")
1.16 1.15 1.16
1.15
BenchmarksGame.KNucleotide_9.RunBench
1.16 1.16 1.17
1.17
1.15
1.16
System.Linq.Tests.Perf_Enumerable.Select(input: Array)
1.16 1.15 1.16
1.15
System.Buffers.Tests.RentReturnArrayPoolTests(Byte).MultipleSerial(RentalSize: 4096, ManipulateArray: False, Async: False, UseSharedPool: True)
1.15 1.20 1.15
1.20
System.Linq.Tests.Perf_OrderBy.OrderByValueType(NumberOfPeople: 512)
1.15 1.15 1.15
1.15
System.Linq.Tests.Perf_Enumerable.Select(input: List)
1.15 1.12 1.15
1.12
System.Tests.Perf_Version.TryFormatL
1.15 1.17 1.15
1.17
System.Perf_Convert.FromBase64String
1.15 1.15 1.15
1.15
System.Text.RegularExpressions.Tests.Perf_Regex_Industry_RustLang_Sherlock.Count(Pattern: "\w+\s+Holmes\s+\w+", Options: Compiled)
1.15 1.29 1.15
1.29
System.Linq.Tests.Perf_Enumerable.Count(input: IEnumerable)
1.15 1.17 1.14
1.17
1.15
1.17
1.16
1.17
System.Linq.Tests.Perf_Enumerable.Range
1.15 1.15 1.11
1.11
1.19
1.19
System.Linq.Tests.Perf_Enumerable.WhereAny_LastElementMatches(input: IEnumerable)
1.15 1.15 1.18
1.18
1.12
1.13
System.Linq.Tests.Perf_Enumerable.Reverse(input: IEnumerable)
1.15 1.14 1.15
1.14
System.Collections.Perf_Frozen(Int16).ToFrozenDictionary(Count: 64)
1.15 1.15 1.15
1.15
System.IO.Tests.Perf_FileInfo.ctor_str
1.14 1.13 1.14
1.13
System.Perf_Convert.FromBase64Chars
1.14 1.13 1.14
1.13
System.Collections.Perf_Frozen(ReferenceType).ToFrozenDictionary(Count: 4)
1.14 1.14 1.14
1.14
System.Globalization.Tests.StringSearch.LastIndexOf_Word_NotFound(Options: (en-US, IgnoreCase, False))
1.14 1.14 1.14
1.14
System.Memory.ReadOnlySpan.IndexOfString(input: "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAXAAAAAAAAAAAAAAAAAAAAAAAAAAAAA", value: "x", comparisonType: InvariantCultureIgno
1.14 1.11 1.16
1.14
1.12
1.09
System.Tests.Perf_Uri.UnescapeDataString(input: "abc%20def%20ghi%20")
1.14 1.14 1.14
1.14
System.Text.RegularExpressions.Tests.Perf_Regex_Industry_SliceSlice.Count(Options: IgnoreCase, NonBacktracking)
1.14 1.12 1.14
1.12
Benchstone.BenchI.BubbleSort.Test
1.14 1.13 1.16
1.15
1.12
1.12
BenchmarksGame.ReverseComplement_1.RunBench
1.14 1.24 1.03
1.23
1.25
1.26
System.Linq.Tests.Perf_Enumerable.Where(input: Array)
1.14 1.19 1.17
1.33
1.17
1.17
1.07
1.09
System.Linq.Tests.Perf_Enumerable.ToList(input: IEnumerable)
1.13 1.14 1.13
1.14
System.Collections.Perf_SingleCharFrozenDictionary.TryGetValue_False_FrozenDictionary(Count: 1000)
1.13 1.13 1.13
1.13
System.Collections.IterateForEachNonGeneric(Int32).ArrayList(Size: 512)
1.13 1.11 1.13
1.11
System.Text.RegularExpressions.Tests.Perf_Regex_Common.Email_IsMatch(Options: None)
1.13 1.12 1.13
1.12
System.Globalization.Tests.StringSearch.IsSuffix_DifferentLastChar(Options: (en-US, IgnoreCase, False))
1.13 1.09 1.20
1.11
1.07
1.08
System.Linq.Tests.Perf_Enumerable.AnyWithPredicate_LastElementMatches(input: IOrderedEnumerable)
1.13 1.13 1.13
1.13
Benchstone.BenchI.LogicArray.Test
1.13 1.09 1.13
1.09
System.Text.RegularExpressions.Tests.Perf_Regex_Industry_RustLang_Sherlock.Count(Pattern: "(?s).*", Options: Compiled)
1.13 1.13 1.15
1.14
1.11
1.12
System.Tests.Perf_Single.TryParse(value: "3.4028235E+38")
1.13 1.13 1.07
1.07
1.19
1.19
System.Linq.Tests.Perf_Enumerable.Skip_One(input: IEnumerable)
1.13 1.10 1.11
1.09
1.15
1.12
System.Text.RegularExpressions.Tests.Perf_Regex_Industry_Leipzig.Count(Pattern: ".{0,2}(Tom
1.13 1.13 1.13
1.13
System.Tests.Perf_String.Split(s: "ABCDEFGHIJKLMNOPQRSTUVWXYZ", arr: [' '], options: RemoveEmptyEntries)
1.13 1.17 1.12
1.20
1.14
1.14
Microsoft.Extensions.Primitives.StringSegmentBenchmark.IndexOfAny
1.13 1.13 1.17
1.16
1.09
1.10
System.Linq.Tests.Perf_Enumerable.Select(input: IEnumerable)
1.13 1.13 1.13
1.13
MicroBenchmarks.Serializers.Json_ToStream(IndexViewModel).JsonNet_
1.13 1.14 1.13
1.14
System.Collections.Perf_DefaultFrozenDictionary.TryGetValue_False_FrozenDictionary(Count: 10)
1.13 1.13 1.13
1.13
System.Collections.Perf_SubstringFrozenDictionary.TryGetValue_False_FrozenDictionary(Count: 1000)
1.13 1.14 1.13
1.14
System.Numerics.Tensors.Tests.Perf_NumberTensorPrimitives(Int32).Max_Scalar(BufferLength: 128)
1.12 1.15 1.12
1.15
System.Text.RegularExpressions.Tests.Perf_Regex_Industry_Leipzig.Count(Pattern: ".{2,4}(Tom
1.12 1.13 1.13
1.15
1.12
1.11
System.Collections.Perf_SingleCharFrozenDictionary.TryGetValue_False_FrozenDictionary(Count: 10)
1.12 1.13 1.08
1.13
1.14
1.14
1.16
1.12
System.Linq.Tests.Perf_Enumerable.WhereSelect(input: List)
1.12 1.11 1.12
1.11
System.Tests.Perf_Enum.ToString_Flags(value: Red, Orange, Yellow, Green, Blue)
1.12 1.13 1.12
1.13
System.Numerics.Tests.Perf_BigInteger.Parse(numberString: 1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012
1.12 1.15 1.12
1.15
System.Collections.IterateForEach(Int32).SortedDictionary(Size: 512)
1.12 1.12 1.13
1.13
System.Numerics.Tests.Perf_BigInteger.ToStringX(numberString: 123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678
1.12 1.12 1.12
1.12
System.Globalization.Tests.StringSearch.IndexOf_Word_NotFound(Options: (, IgnoreCase, False))
1.12 1.11 1.12
1.12
1.12
1.11
System.Collections.CtorFromCollection(Int32).ConcurrentQueue(Size: 512)
1.12 1.12 1.12
1.12
System.Globalization.Tests.StringSearch.IndexOf_Word_NotFound(Options: (en-US, IgnoreCase, False))
1.12 1.11 1.12
1.11
System.Tests.Perf_DateTime.GetNow
1.12 1.13 1.12
1.13
System.Text.RegularExpressions.Tests.Perf_Regex_Common.Date_IsNotMatch(Options: Compiled)
1.12 1.12 1.12
1.12
MicroBenchmarks.Serializers.Xml_FromStream(Location).DataContractSerializer_BinaryXml_
1.12 1.11 1.14
1.14
1.09
1.09
System.Tests.Perf_Double.TryParse(value: "1.7976931348623157e+308")
1.12 1.10 1.12
1.10
System.Text.RegularExpressions.Tests.Perf_Regex_Common.Date_IsMatch(Options: Compiled)
1.12 1.11 1.12
1.11
System.Text.RegularExpressions.Tests.Perf_Regex_Common.Date_IsMatch(Options: IgnoreCase, Compiled)
1.12 1.14 1.10
1.09
1.14
1.20
Benchstone.BenchF.NewtR.Test
1.12 1.12 1.12
1.12
System.Numerics.Tests.Perf_BigInteger.Remainder(arguments: 1024,512 bits)
1.12 1.12 1.12
1.12
System.Linq.Tests.Perf_Enumerable.ElementAt(input: IEnumerable)
1.12 1.10 1.12
1.10
System.Collections.Sort(IntClass).Array(Size: 512)
1.12 1.11 1.12
1.11
System.Collections.CtorFromCollection(String).HashSet(Size: 512)
1.12 1.17 1.06
1.06
1.11
1.15
1.18
1.32
System.Buffers.Tests.ReadOnlySequenceTests(Byte).IterateGetPositionMemory
1.11 1.12 1.11
1.12
System.Globalization.Tests.StringSearch.LastIndexOf_Word_NotFound(Options: (, IgnoreCase, False))
1.11 1.13 1.11
1.13
System.Text.RegularExpressions.Tests.Perf_Regex_Common.Date_IsNotMatch(Options: IgnoreCase, Compiled)
1.11 1.08 1.11
1.08
System.Text.RegularExpressions.Tests.Perf_Regex_Industry_RustLang_Sherlock.Count(Pattern: "\w+\s+Holmes", Options: NonBacktracking)
1.11 1.09 1.11
1.09
System.Linq.Tests.Perf_Enumerable.SelectToArray(input: IList)
1.11 1.12 1.09
1.10
1.13
1.14
System.Linq.Tests.Perf_Enumerable.CastToSameType(input: IEnumerable)
1.11 1.11 1.11
1.11
System.Text.RegularExpressions.Tests.Perf_Regex_Common.Date_IsNotMatch(Options: None)
1.11 1.11 1.11
1.11
System.Linq.Tests.Perf_Enumerable.ToDictionary(input: List)
1.11 1.13 1.11
1.13
System.Text.RegularExpressions.Tests.Perf_Regex_Industry_SliceSlice.Count(Options: IgnoreCase)
1.11 1.08 1.11
1.08
System.IO.Tests.Perf_FileStream.SeekForward(fileSize: 1024, options: None)
1.11 1.11 1.05
1.05
1.18
1.18
System.Collections.Perf_SingleCharFrozenDictionary.TryGetValue_False_FrozenDictionary(Count: 100)
1.11 1.11 1.11
1.11
System.Numerics.Tensors.Tests.Perf_NumberTensorPrimitives(Int32).Add_Scalar(BufferLength: 128)
1.11 1.08 1.11
1.08
XmlDocumentTests.XmlNodeListTests.Perf_XmlNodeList.Enumerator
1.11 1.10 1.11
1.10
System.Buffers.Tests.RentReturnArrayPoolTests(Object).SingleSerial(RentalSize: 4096, ManipulateArray: False, Async: False, UseSharedPool: True)
1.11 1.10 1.11
1.10
ByteMark.BenchNumericSortJagged
1.11 1.11 1.12
1.12
1.10
1.10
System.Tests.Perf_Double.Parse(value: "1.7976931348623157e+308")
1.11 1.08 1.11
1.08
System.Tests.Perf_Single.TryParse(value: "12345")
1.11 1.11 1.14
1.15
1.07
1.07
System.Collections.Perf_SubstringFrozenDictionary.TryGetValue_True_FrozenDictionary(Count: 1000)
1.11 1.11 1.10
1.11
1.11
1.12
System.Tests.Perf_Decimal.TryParse(value: "123456.789")
1.11 1.16 1.11
1.16
System.Text.RegularExpressions.Tests.Perf_Regex_Cache.IsMatch(total: 400000, unique: 1, cacheSize: 15)
1.11 1.09 1.11
1.09
System.Tests.Perf_Decimal.Parse(value: "123456.789")
1.11 1.13 1.14
1.13
1.07
1.13
System.Linq.Tests.Perf_Enumerable.LastWithPredicate_FirstElementMatches(input: IOrderedEnumerable)
1.10 1.14 1.14
1.22
1.07
1.07
System.Collections.Perf_LengthBucketsFrozenDictionary.TryGetValue_True_FrozenDictionary(Count: 1000, ItemsPerBucket: 1)
1.10 1.09 1.10
1.09
System.Tests.Perf_Version.ToStringL
1.10 1.10 1.11
1.10
1.10
1.10
System.Text.RegularExpressions.Tests.Perf_Regex_Industry_RustLang_Sherlock.Count(Pattern: "(?s).*", Options: None)
1.10 1.08 1.10
1.08
System.Tests.Perf_Version.ToString4
1.10 1.10 1.10
1.10
System.Text.RegularExpressions.Tests.Perf_Regex_Industry_RustLang_Sherlock.Count(Pattern: "\s[a-zA-Z]{0,12}ing\s", Options: None)
1.10 1.17 1.10
1.17
System.Buffers.Tests.ReadOnlySequenceTests(Char).IterateGetPositionMemory
1.10 1.12 1.10
1.12
System.Text.RegularExpressions.Tests.Perf_Regex_Common.Email_IsNotMatch(Options: None)
1.10 1.11 1.10
1.11
System.Tests.Perf_Single.Parse(value: "12345")
1.10 1.10 1.10
1.10
System.Memory.ReadOnlySpan.IndexOfString(input: "AAAAA5AAAA", value: "5", comparisonType: InvariantCulture)
1.10 1.13 1.14
1.21
1.06
1.06
System.Linq.Tests.Perf_Enumerable.FirstWithPredicate_LastElementMatches(input: IOrderedEnumerable)
1.10 1.10 1.08
1.07
1.12
1.13
System.Linq.Tests.Perf_Enumerable.SelectToList(input: IEnumerable)
1.10 1.16 1.10
1.16
System.Tests.Perf_Int16.Parse(value: "0")
1.10 1.08 1.10
1.08
System.Collections.IterateFor(Int32).ImmutableList(Size: 512)
1.10 1.15 1.02
1.16
1.08
1.08
1.20
1.20
System.Linq.Tests.Perf_Enumerable.Where(input: List)
1.10 1.09 1.10
1.09
System.Collections.IterateForEachNonGeneric(Int32).Stack(Size: 512)
1.10 1.07 1.10
1.07
ByteMark.BenchLUDecomp
1.10 1.10 1.10
1.10
System.IO.Tests.Perf_Path.GetFullPathForLegacyLength
1.09 1.09 1.09
1.09
System.Collections.Perf_SingleCharFrozenDictionary.TryGetValue_True_FrozenDictionary(Count: 100)
1.09 1.08 1.09
1.08
System.Text.RegularExpressions.Tests.Perf_Regex_Industry_Leipzig.Count(Pattern: ".{0,2}(Tom
1.09 1.08 1.09
1.08
System.Linq.Tests.Perf_Enumerable.Select(input: IList)
1.09 1.09 1.09
1.09
System.Collections.Perf_SubstringFrozenDictionary.TryGetValue_False_FrozenDictionary(Count: 10)
1.09 1.06 1.09
1.06
System.Text.RegularExpressions.Tests.Perf_Regex_Cache.IsMatch(total: 400000, unique: 7, cacheSize: 15)
1.09 1.09 1.09
1.09
System.Collections.Perf_SingleCharFrozenDictionary.TryGetValue_True_FrozenDictionary(Count: 10)
1.09 1.12 1.09
1.12
System.Collections.AddGivenSize(String).HashSet(Size: 512)
1.09 1.09 1.09
1.09
System.Collections.Perf_SingleCharFrozenDictionary.TryGetValue_True_FrozenDictionary(Count: 1000)
1.09 1.09 1.09
1.09
System.Collections.IterateForEachNonGeneric(String).Stack(Size: 512)
1.09 1.08 1.09
1.08
System.Tests.Perf_Single.Parse(value: "3.4028235E+38")
1.09 1.09 1.09
1.09
System.Diagnostics.Perf_Activity.EnumerateActivityEventsLarge
1.09 1.09 1.09
1.10
1.09
1.09
System.Linq.Tests.Perf_Enumerable.SelectToArray(input: IEnumerable)
1.09 1.11 1.09
1.11
System.Tests.Perf_Double.Parse(value: "12345")
1.09 1.09 1.09
1.09
System.Linq.Tests.Perf_Enumerable.OrderBy(input: IEnumerable)
1.09 1.08 1.09
1.08
System.Numerics.Tensors.Tests.Perf_NumberTensorPrimitives(Int32).Max_Vector(BufferLength: 128)
1.09 1.07 1.09
1.07
System.Threading.Tests.Perf_CancellationToken.CreateManyRegisterMultipleDispose
1.09 1.08 1.12
1.11
1.06
1.06
System.Text.RegularExpressions.Tests.Perf_Regex_Common.Uri_IsMatch(Options: None)
1.09 1.09 1.09
1.09
System.Collections.ContainsFalse(String).Span(Size: 512)
1.09 1.09 1.09
1.09
Benchstone.BenchF.NewtE.Test
1.09 1.08 1.09
1.08
System.Collections.Perf_DefaultFrozenDictionary.TryGetValue_True_FrozenDictionary(Count: 10)
1.08 1.09 1.08
1.09
System.Collections.Perf_LengthBucketsFrozenDictionary.TryGetValue_True_FrozenDictionary(Count: 10000, ItemsPerBucket: 1)
1.08 1.10 1.08
1.10
System.IO.Tests.Perf_StreamWriter.WriteCharArray(writeLength: 2)
1.08 1.08 1.08
1.08
System.Linq.Tests.Perf_Enumerable.OrderByThenBy(input: IEnumerable)
1.08 1.07 1.08
1.07
System.Collections.TryGetValueFalse(Int32, Int32).SortedDictionary(Size: 512)
1.08 1.09 1.08
1.09
System.Collections.Perf_SingleCharFrozenDictionary.TryGetValue_True_FrozenDictionary(Count: 10000)
1.08 1.07 1.08
1.07
System.Collections.AddGivenSize(Int32).IDictionary(Size: 512)
1.08 1.05 1.08
1.05
System.Collections.TryAddGiventSize(String).Dictionary(Count: 512)
1.08 1.09 1.08
1.09
System.Text.Encodings.Web.Tests.Perf_Encoders.EncodeUtf16(arguments: Url,&lorem ipsum=dolor sit amet,16)
1.08 1.07 1.08
1.07
System.Text.RegularExpressions.Tests.Perf_Regex_Industry_Leipzig.Count(Pattern: "Twain", Options: None)
1.08 1.07 1.08
1.07
System.Text.RegularExpressions.Tests.Perf_Regex_Industry_Leipzig.Count(Pattern: "Twain", Options: NonBacktracking)
1.07 1.08 1.07
1.08
System.Tests.Perf_Int128.Parse(value: "-170141183460469231731687303715884105728")
1.07 1.07 1.07
1.07
System.Text.RegularExpressions.Tests.Perf_Regex_Industry_Mariomkas.Count(Pattern: "[\w]+://[^/\\s?#]+[^\\s?#]+(?:\?[^\\s#])?(?:#[^\\s])?", Options: Compiled)
1.07 1.08 1.07
1.08
System.Tests.Perf_Single.Parse(value: "-3.4028235E+38")
1.07 1.07 1.07
1.07
1.08
1.08
System.Text.Json.Document.Tests.Perf_EnumerateArray.EnumerateUsingIndexer(TestCase: Json400KB)
1.07 1.07 1.07
1.07
System.Buffers.Tests.ReadOnlySequenceTests(Char).IterateGetPositionSingleSegment
1.07 1.15 1.10
1.08
1.04
1.23
System.Memory.Span(Int32).BinarySearch(Size: 512)
1.07 1.06 1.07
1.06
System.Collections.CreateAddAndClear(Int32).SortedDictionary(Size: 512)
1.07 1.07 1.07
1.07
System.Tests.Perf_Int128.TryParseSpan(value: "-170141183460469231731687303715884105728")
1.07 1.06 1.07
1.06
System.Perf_Convert.ToBase64String(formattingOptions: InsertLineBreaks)
1.07 1.10 1.07
1.10
System.Tests.Perf_Uri.CtorIdnHostPathAndQuery(input: "https://a.much.longer.domain.name/path/with?key=value#fragment")
1.07 1.07 1.07
1.07
System.Collections.Perf_SubstringFrozenDictionary.TryGetValue_False_FrozenDictionary(Count: 100)
1.07 1.06 1.07
1.06
System.Collections.IndexerSet(String).ConcurrentDictionary(Size: 512)
1.07 1.13 1.07
1.13
System.Tests.Perf_Double.TryParse(value: "12345")
1.07 1.06 1.07
1.06
System.Collections.TryGetValueFalse(String, String).ConcurrentDictionary(Size: 512)
1.06 1.06 1.06
1.06
System.Buffers.Tests.RentReturnArrayPoolTests(Object).SingleSerial(RentalSize: 4096, ManipulateArray: False, Async: False, UseSharedPool: False)
1.06 1.06 1.06
1.06
System.Collections.TryGetValueTrue(Int32, Int32).Dictionary(Size: 512)
1.06 1.15 1.06
1.15
System.Linq.Tests.Perf_Enumerable.Take_All(input: IEnumerable)
1.06 1.05 1.07
1.06
1.06
1.05
System.Collections.IterateForEach(Int32).ImmutableList(Size: 512)
1.06 1.08 1.06
1.08
System.Collections.Perf_SubstringFrozenDictionary.TryGetValue_True_FrozenDictionary(Count: 100)
1.06 1.07 1.06
1.07
System.Text.Encodings.Web.Tests.Perf_Encoders.EncodeUtf16(arguments: Url,�2020,16)
1.06 1.08 1.06
1.08
System.Text.RegularExpressions.Tests.Perf_Regex_Industry_RustLang_Sherlock.Count(Pattern: "\b\w+n\b", Options: None)
1.06 1.06 1.06
1.06
System.MathBenchmarks.Double.Hypot
1.06 1.06 1.06
1.06
System.Perf_Convert.ToBase64CharArray(binaryDataSize: 1024, formattingOptions: InsertLineBreaks)
1.06 1.32 1.06
1.32
System.Text.Encodings.Web.Tests.Perf_Encoders.EncodeUtf16(arguments: Url,&lorem ipsum=dolor sit amet,512)
1.06 1.07 1.06
1.07
MicroBenchmarks.Serializers.Xml_ToStream(ClassImplementingIXmlSerialiable).DataContractSerializer_
1.06 1.09 0.94
1.09
1.09
1.09
1.16
1.10
System.Linq.Tests.Perf_Enumerable.WhereSelect(input: Array)
1.06 1.08 1.06
1.08
System.Tests.Perf_Uri.CtorIdnHostPathAndQuery(input: "http://xn--hst-sna.with.xn--nicode-2ya/path/with?key=value#fragment")
1.06 1.08 1.06
1.08
System.Collections.AddGivenSize(String).IDictionary(Size: 512)
1.06 1.05 1.06
1.05
System.Tests.Perf_Int128.TryFormat(value: -170141183460469231731687303715884105728)
1.06 1.06 1.06
1.06
System.Collections.TryGetValueTrue(Int32, Int32).IDictionary(Size: 512)
1.06 1.07 1.06
1.07
System.Text.RegularExpressions.Tests.Perf_Regex_Common.Date_IsMatch(Options: None)
1.06 1.05 1.06
1.05
System.Collections.Perf_SubstringFrozenDictionary.TryGetValue_False_FrozenDictionary(Count: 10000)
1.06 1.06 1.06
1.06
MicroBenchmarks.Serializers.Json_ToStream(IndexViewModel).DataContractJsonSerializer_
1.06 1.31 1.06
1.31
System.Collections.Sort(BigStruct).LinqOrderByExtension(Size: 512)
1.06 1.10 1.06
1.10
System.Collections.IterateForEachNonGeneric(String).ArrayList(Size: 512)
1.06 1.06 1.06
1.06
System.Collections.ContainsKeyFalse(Int32, Int32).FrozenDictionary(Size: 512)
1.05 1.05 1.05
1.05
System.Text.RegularExpressions.Tests.Perf_Regex_Industry_BoostDocs_Simple.IsMatch(Id: 10, Options: None)
1.05 1.12 0.99
1.11
1.08
1.17
1.09
1.09
System.Diagnostics.Perf_Activity.EnumerateActivityLinksLarge
1.05 1.06 1.05
1.06
System.Text.RegularExpressions.Tests.Perf_Regex_Industry_Leipzig.Count(Pattern: "(?i)Tom
1.05 1.40 0.63
1.12
1.75
1.75
System.Text.Json.Document.Tests.Perf_EnumerateArray.EnumerateArray(TestCase: ArrayOfNumbers)
1.05 1.08 1.05
1.08
System.Memory.ReadOnlySequence.Slice_Repeat(Segment: Multiple)
1.05 1.13 1.05
1.13
System.Tests.Perf_Int16.TryParse(value: "32767")
1.05 1.06 1.05
1.06
System.Text.Json.Serialization.Tests.WriteJson(Location).SerializeToWriter(Mode: Reflection)
1.04 1.10 1.04
1.10
System.Tests.Perf_Enum.ToString_Format_NonFlags(value: Monday, format: "g")
1.04 1.09 1.04
1.09
System.Tests.Perf_Enum.ToString_Format_NonFlags(value: 7, format: "G")
1.04 1.06 1.04
1.06
MicroBenchmarks.Serializers.Xml_ToStream(XmlElement).DataContractSerializer_
1.04 1.06 1.04
1.06
BenchmarksGame.FannkuchRedux_2.RunBench(n: 10, expectedSum: 73196)
1.04 1.06 1.04
1.06
System.Tests.Perf_String.Split(s: "ABCDEFGHIJKLMNOPQRSTUVWXYZ", arr: [' '], options: None)
1.04 1.23 1.04
1.23
System.Text.Encodings.Web.Tests.Perf_Encoders.EncodeUtf16(arguments: Url,�2020,512)
1.04 1.16 1.04
1.16
System.Collections.AddGivenSize(Int32).Queue(Size: 512)
1.03 1.20 0.94
1.07
1.14
1.34
System.Linq.Tests.Perf_Enumerable.Where(input: IEnumerable)
1.03 1.24 1.03
1.24
System.Collections.AddGivenSize(Int32).Stack(Size: 512)
1.03 1.12 1.03
1.12
Benchstone.BenchI.XposMatrix.Test
1.01 1.18 1.01
1.18
System.Text.Json.Tests.Perf_Reader.ReadReturnBytes(IsDataCompact: False, TestCase: DeepTree)
1.01 1.09 1.01
1.09
System.Tests.Perf_String.Trim_CharArr(s: "Test ", c: [' ', ' '])
1.00 1.08 1.00
1.08
System.Diagnostics.Perf_Activity.EnumerateActivityTagObjectsLarge
1.00 1.43 1.00
1.43
System.Collections.CreateAddAndClear(Int32).Array(Size: 512)
1.00 1.08 1.00
1.08
MicroBenchmarks.Serializers.Json_ToString(Location).SystemTextJson_Reflection_
0.95 1.14 0.95
1.14
System.Memory.Span(Int32).StartsWith(Size: 512)
0.90 1.74 0.90
1.74
System.Collections.CtorFromCollection(Int32).ImmutableStack(Size: 512)

@amanasifkhalid
Member Author

x64 improvements:

Notes Recent Score Orig Score Linux x64 Windows x64 ViperLinux x64 ViperWindows x64 Benchmark
1.20 0.87 1.20
0.87
System.Tests.Perf_Char.Char_IsLower(input: "Good afternoon, Constable!")
1.18 0.93 1.18
0.93
System.Linq.Tests.Perf_OrderBy.OrderByValueType(NumberOfPeople: 512)
1.09 0.88 1.09
0.88
System.Text.Perf_Ascii.ToLower_Bytes_Chars(Size: 128)
1.06 0.88 1.06
0.88
System.Memory.Span(Char).LastIndexOfValue(Size: 512)
1.03 0.86 1.03
0.86
System.Collections.IndexerSet(Int32).Dictionary(Size: 512)
1.01 0.94 1.01
0.94
System.Collections.Tests.Perf_PriorityQueue(Guid, Guid).K_Max_Elements(Size: 100)
1.01 0.83 1.01
0.83
System.Linq.Tests.Perf_Enumerable.WhereAny_LastElementMatches(input: Array)
1.01 0.93 1.01
0.93
System.Collections.IterateForEach(Int32).SortedSet(Size: 512)
1.00 0.84 1.00
0.84
System.Tests.Perf_String.IndexerCheckPathLength
1.00 0.74 1.00
0.74
System.Memory.Span(Int32).IndexOfValue(Size: 512)
1.00 0.70 1.00
0.70
System.Memory.Span(Int32).IndexOfAnyFiveValues(Size: 512)
1.00 0.89 1.00
0.89
System.Memory.Span(Int32).IndexOfAnyFiveValues(Size: 33)
1.00 0.87 1.00
0.87
System.Linq.Tests.Perf_Enumerable.Skip_One(input: IEnumerable)
0.98 0.91 0.98
0.91
System.Text.RegularExpressions.Tests.Perf_Regex_Common.MatchesWord(Options: None)
0.97 0.80 1.12
0.63
0.90
0.90
0.92
0.91
System.Text.Json.Document.Tests.Perf_EnumerateArray.EnumerateArray(TestCase: ArrayOfStrings)
0.97 0.91 0.97
0.91
System.Text.Perf_Ascii.ToUpper_Bytes(Size: 128)
0.97 0.93 0.97
0.93
System.Collections.CtorFromCollection(Int32).SortedList(Size: 512)
0.96 0.90 0.96
0.90
System.Tests.Perf_String.Trim(s: " Test")
0.96 0.93 0.96
0.93
System.Net.Tests.Perf_WebUtility.Decode_DecodingRequired
0.95 0.90 0.95
0.90
System.Text.Json.Tests.Perf_Reader.ReadSpanEmptyLoop(IsDataCompact: True, TestCase: Json400B)
0.95 0.94 0.95
0.94
System.Text.Json.Tests.Perf_Reader.ReadSingleSpanSequenceEmptyLoop(IsDataCompact: True, TestCase: Json400B)
0.95 0.91 0.95
0.91
System.Tests.Perf_Enum.ToString_Flags(value: 32)
0.95 0.80 1.05
0.79
0.86
0.81
System.Memory.Span(Char).BinarySearch(Size: 33)
0.95 0.91 0.95
0.91
System.Text.RegularExpressions.Tests.Perf_Regex_Industry_Mariomkas.Count(Pattern: "[\w]+://[^/\\s?#]+[^\\s?#]+(?:\?[^\\s#])?(?:#[^\\s])?", Options: None)
0.94 0.93 0.94
0.93
System.Text.Json.Serialization.Tests.WriteJson(Hashtable).SerializeToString(Mode: Reflection)
0.94 0.94 0.94
0.94
System.Linq.Tests.Perf_Enumerable.Concat_Once(input: IEnumerable)
0.94 0.94 0.94
0.94
System.Text.Json.Document.Tests.Perf_EnumerateObject.PropertyIndexer(TestCase: NumericProperties)
0.94 0.91 0.94
0.91
PerfLabTests.GetMember.GetMethod2
0.94 0.94 0.94
0.94
System.Linq.Tests.Perf_Enumerable.Concat_TenTimes(input: IEnumerable)
0.94 0.93 0.94
0.93
LinqBenchmarks.Where01LinqQueryX
0.94 0.94 0.94
0.94
System.Tests.Perf_Int128.TryParse(value: "170141183460469231731687303715884105727")
0.94 0.93 0.94
0.93
LinqBenchmarks.Where01LinqMethodX
0.94 0.94 0.94
0.94
System.Text.Json.Document.Tests.Perf_EnumerateObject.PropertyIndexer(TestCase: ObjectProperties)
0.94 0.92 0.94
0.92
System.Text.Json.Serialization.Tests.WriteJson(Hashtable).SerializeToUtf8Bytes(Mode: SourceGen)
0.94 0.94 0.94
0.94
System.Text.RegularExpressions.Tests.Perf_Regex_Common.Email_IsNotMatch(Options: None)
0.94 0.90 0.94
0.90
PerfLabTests.GetMember.GetMethod12
0.94 0.91 0.94
0.91
System.Text.RegularExpressions.Tests.Perf_Regex_Industry_Mariomkas.Count(Pattern: "[\w]+://[^/\\s?#]+[^\\s?#]+(?:\?[^\\s#])?(?:#[^\\s])?", Options: Compiled)
0.94 0.93 0.94
0.93
Span.IndexerBench.WriteViaIndexer1(length: 1024)
0.93 0.75 0.93
0.75
System.Memory.Span(Char).EndsWith(Size: 512)
0.93 0.93 0.93
0.93
System.Tests.Perf_String.Concat_CharEnumerable
0.93 0.93 0.93
0.93
System.Collections.Perf_LengthBucketsFrozenDictionary.TryGetValue_True_FrozenDictionary(Count: 100, ItemsPerBucket: 5)
0.93 0.90 0.93
0.90
System.Collections.IterateForEach(String).ImmutableSortedSet(Size: 512)
0.93 0.90 0.93
0.90
System.Text.Json.Serialization.Tests.WriteJson(Hashtable).SerializeObjectProperty(Mode: Reflection)
0.93 0.79 1.03
0.79
0.84
0.80
System.Memory.Span(Char).BinarySearch(Size: 512)
0.93 0.92 0.93
0.92
System.Text.RegularExpressions.Tests.Perf_Regex_Common.Uri_IsNotMatch(Options: None)
0.93 0.94 0.93
0.94
System.Collections.Concurrent.Count(Int32).Queue_EnqueueCountDequeue(Size: 512)
0.92 0.91 0.92
0.91
MicroBenchmarks.Serializers.Xml_FromStream(ClassImplementingIXmlSerialiable).DataContractSerializer_BinaryXml_
0.92 0.89 0.92
0.89
System.Text.Encodings.Web.Tests.Perf_Encoders.EncodeUtf16(arguments: UnsafeRelaxed,no (escaping /) required,512)
0.92 0.91 0.92
0.91
System.Tests.Perf_Enum.ToString_Flags(value: 36)
0.92 0.93 0.92
0.93
System.Collections.ContainsTrue(Int32).FrozenSet(Size: 512)
0.92 0.94 0.92
0.94
MicroBenchmarks.Serializers.Json_ToStream(IndexViewModel).JsonNet_
0.92 0.94 0.89
0.94
0.95
0.94
System.Text.Json.Document.Tests.Perf_EnumerateArray.EnumerateUsingIndexer(TestCase: ArrayOfNumbers)
0.92 0.92 0.92
0.92
Benchstone.MDBenchI.MDMidpoint.Test
0.92 0.92 0.93
0.93
0.91
0.92
System.Text.Json.Document.Tests.Perf_EnumerateObject.EnumerateProperties(TestCase: ObjectProperties)
0.92 0.84 0.92
0.84
System.Collections.TryGetValueFalse(Int32, Int32).FrozenDictionaryOptimized(Size: 512)
0.92 0.92 0.92
0.92
System.Linq.Tests.Perf_Enumerable.Take_All(input: IEnumerable)
0.92 0.92 0.92
0.92
System.Memory.Span(Int32).IndexOfAnyThreeValues(Size: 512)
0.92 0.92 0.92
0.92
System.IO.Tests.StreamReaderReadLineTests.ReadLine(LineLengthRange: [1025, 2048])
0.91 0.92 0.91
0.92
System.Text.RegularExpressions.Tests.Perf_Regex_Industry_Leipzig.Count(Pattern: ".{2,4}(Tom
0.91 0.93 0.91
0.93
System.Text.RegularExpressions.Tests.Perf_Regex_Common.ReplaceWords(Options: None)
0.91 0.91 0.91
0.91
System.Collections.IterateForEach(Int32).ImmutableList(Size: 512)
0.91 0.92 0.91
0.92
System.Tests.Perf_String.Split(s: "ABCDEFGHIJKLMNOPQRSTUVWXYZ", arr: [' '], options: RemoveEmptyEntries)
0.91 0.90 0.91
0.90
System.Tests.Perf_String.Split(s: "ABCDEFGHIJKLMNOPQRSTUVWXYZ", arr: [' '], options: None)
0.91 0.93 0.91
0.93
System.Text.Json.Serialization.Tests.WriteJson(Hashtable).SerializeToUtf8Bytes(Mode: Reflection)
0.91 0.91 0.91
0.91
System.Text.Json.Tests.Perf_Get.GetUInt64
0.91 0.91 0.91
0.91
PerfLabTests.GetMember.GetMethod4
0.91 0.87 0.91
0.87
System.Tests.Perf_Int64.Parse(value: "12345")
0.91 0.90 0.91
0.90
System.Buffers.Tests.SearchValuesCharTests.IndexOfAnyExcept(Values: "abcdefABCDEF0123456789Ü")
0.91 0.90 0.91
0.90
ByteMark.BenchLUDecomp
0.91 0.90 0.91
0.90
System.Text.RegularExpressions.Tests.Perf_Regex_Industry_Leipzig.Count(Pattern: "\p{Sm}", Options: None)
0.90 0.91 0.90
0.91
System.Text.RegularExpressions.Tests.Perf_Regex_Industry_Leipzig.Count(Pattern: ".{0,2}(Tom
0.90 0.89 0.90
0.89
System.Collections.Perf_DefaultFrozenDictionary.TryGetValue_True_FrozenDictionary(Count: 1000)
0.90 0.90 0.90
0.90
System.Text.Json.Document.Tests.Perf_EnumerateObject.PropertyIndexer(TestCase: StringProperties)
0.90 0.90 0.90
0.90
System.Linq.Tests.Perf_Enumerable.SelectToArray(input: IEnumerable)
0.90 0.91 0.90
0.91
System.Linq.Tests.Perf_Enumerable.WhereSingleOrDefault_LastElementMatches(input: List)
0.90 0.91 0.90
0.91
System.Text.RegularExpressions.Tests.Perf_Regex_Industry_Leipzig.Count(Pattern: "\p{Sm}", Options: NonBacktracking)
0.90 0.91 0.90
0.91
System.Buffers.Tests.ReadOnlySequenceTests(Byte).IterateForEachTenSegments
0.90 0.86 0.90
0.86
System.Collections.Tests.Perf_BitArray.BitArrayXor(Size: 512)
0.90 0.91 0.90
0.91
System.Linq.Tests.Perf_Enumerable.WhereSingle_LastElementMatches(input: List)
0.90 0.91 0.90
0.91
0.90
0.91
System.Text.Json.Document.Tests.Perf_EnumerateArray.EnumerateArray(TestCase: ArrayOfNumbers)
0.90 0.92 0.90
0.92
PerfLabTests.GetMember.GetMethod3
0.90 0.91 0.90
0.91
System.Collections.ContainsKeyTrue(String, String).ImmutableDictionary(Size: 512)
0.90 0.89 0.87
0.86
0.92
0.92
System.Text.RegularExpressions.Tests.Perf_Regex_Industry_Leipzig.Count(Pattern: "([A-Za-z]awyer
0.90 0.90 0.90
0.90
System.Linq.Tests.Perf_Enumerable.WhereSingle_LastElementMatches(input: IEnumerable)
0.90 0.91 0.90
0.91
PerfLabTests.GetMember.GetMethod10
0.90 0.89 0.90
0.89
System.Memory.Span(Int32).IndexOfAnyThreeValues(Size: 33)
0.89 0.93 0.89
0.93
System.Text.Json.Document.Tests.Perf_EnumerateArray.EnumerateUsingIndexer(TestCase: ArrayOfStrings)
0.89 0.92 0.89
0.92
PerfLabTests.GetMember.GetMethod5
0.89 0.85 0.89
0.85
System.Memory.Span(Char).Clear(Size: 512)
0.89 0.89 0.91
0.90
0.87
0.88
System.Collections.Perf_LengthBucketsFrozenDictionary.TryGetValue_True_FrozenDictionary(Count: 1000, ItemsPerBucket: 5)
0.89 0.89 0.89
0.89
System.Linq.Tests.Perf_Enumerable.WhereSingleOrDefault_LastElementMatches(input: IEnumerable)
0.89 0.83 0.89
0.83
System.Memory.Span(Int32).EndsWith(Size: 512)
0.89 0.88 0.89
0.88
System.Text.RegularExpressions.Tests.Perf_Regex_Common.MatchWord(Options: None)
0.89 0.88 0.89
0.88
System.Text.Perf_Ascii.ToUpper_Bytes(Size: 6)
0.89 0.88 0.88
0.88
System.Numerics.Tests.Perf_BigInteger.ToByteArray(numberString: 1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456
0.88 0.89 0.88
0.89
System.Text.Json.Document.Tests.Perf_EnumerateObject.EnumerateProperties(TestCase: StringProperties)
0.88 0.90 0.88
0.90
System.Linq.Tests.Perf_Enumerable.Repeat
0.88 0.88 0.88
0.88
System.Text.Json.Document.Tests.Perf_EnumerateObject.EnumerateProperties(TestCase: NumericProperties)
0.88 0.76 0.88
0.76
System.Numerics.Tensors.Tests.Perf_NumberTensorPrimitives(Single).Negate(BufferLength: 128)
0.88 0.88 0.86
0.86
0.90
0.90
System.IO.Tests.Perf_StreamWriter.WriteString(writeLength: 2)
0.88 0.87 0.88
0.87
System.Text.Encodings.Web.Tests.Perf_Encoders.EncodeUtf16(arguments: UnsafeRelaxed,no (escaping /) required,16)
0.88 0.85 0.84
0.82
0.92
0.88
System.Tests.Perf_Int64.ParseSpan(value: "12345")
0.88 0.87 0.88
0.87
System.Buffers.Tests.ReadOnlySequenceTests(Char).IterateGetPositionTenSegments
0.87 0.75 0.87
0.75
Microsoft.AspNetCore.Server.Kestrel.Performance.PipeThroughputBenchmark.Parse_SequentialAsync(Length: 128, Chunks: 16)
0.87 0.88 0.87
0.88
System.Tests.Perf_Int64.TryParse(value: "9223372036854775807")
0.87 0.87 0.87
0.87
System.Buffers.Tests.ReadOnlySequenceTests(Byte).IterateGetPositionTenSegments
0.87 0.84 0.87
0.84
System.Collections.IterateFor(String).ImmutableList(Size: 512)
0.87 0.86 0.87
0.86
System.Numerics.Tests.Perf_BigInteger.Subtract(arguments: 1024,1024 bits)
0.87 0.87 0.87
0.87
System.Collections.IterateForEach(Int32).SortedDictionary(Size: 512)
0.87 0.86 0.87
0.86
System.Diagnostics.Perf_Activity.EnumerateActivityTagsSmall
0.87 0.87 0.86
0.86
0.88
0.88
System.Text.Perf_Ascii.ToLower_Chars(Size: 6)
0.87 0.87 0.87
0.87
System.Tests.Perf_UInt64.Parse(value: "0")
0.87 0.84 0.87
0.84
System.Buffers.Tests.ReadOnlySequenceTests(Byte).IterateTryGetTenSegments
0.86 0.86 0.86
0.86
System.Collections.Tests.Perf_BitArray.BitArrayRightShift(Size: 512)
0.86 0.82 0.86
0.82
System.Memory.Span(Byte).IndexOfAnyThreeValues(Size: 512)
0.86 0.86 0.88
0.89
0.85
0.84
System.Runtime.InteropServices.Tests.SafeHandleTests.AddRef_GetHandle_Release
0.86 0.83 0.86
0.83
System.Numerics.Tensors.Tests.Perf_NumberTensorPrimitives(Single).IndexOfMax(BufferLength: 3079)
0.86 0.89 0.86
0.89
Microsoft.Extensions.Primitives.StringSegmentBenchmark.GetSegmentHashCode
0.86 0.87 0.86
0.87
PerfLabTests.GetMember.GetMethod15
0.86 0.84 0.86
0.84
System.Collections.ContainsTrue(String).ImmutableHashSet(Size: 512)
0.85 0.84 0.85
0.84
System.IO.Tests.StreamReaderReadLineTests.ReadLineAsync(LineLengthRange: [1025, 2048])
0.85 0.93 0.85
0.93
System.Net.Primitives.Tests.IPAddressPerformanceTests.TryFormat(address: 143.24.20.36)
0.85 0.85 0.85
0.85
System.Collections.IterateForEachNonGeneric(String).Queue(Size: 512)
0.85 0.88 0.85
0.88
System.Collections.TryGetValueTrue(String, String).ImmutableDictionary(Size: 512)
0.85 0.73 0.85
0.73
System.Tests.Perf_Int64.TryParseSpan(value: "9223372036854775807")
0.85 0.83 0.85
0.83
System.Linq.Tests.Perf_Enumerable.ToDictionary(input: List)
0.85 0.82 0.85
0.82
System.Buffers.Text.Tests.Utf8ParserTests.TryParseDecimal(value: 123456.789)
0.85 0.85 0.85
0.85
System.Collections.IterateForEachNonGeneric(String).ArrayList(Size: 512)
0.85 0.75 0.85
0.75
Microsoft.Extensions.DependencyInjection.ActivatorUtilitiesBenchmark.GetService_1Injected
0.84 0.85 0.81
0.82
0.88
0.88
System.Text.Perf_Ascii.ToUpper_Chars(Size: 6)
0.84 0.83 0.84
0.83
System.Linq.Tests.Perf_Enumerable.ToDictionary(input: Array)
0.84 0.84 0.84
0.84
Microsoft.Extensions.Primitives.StringSegmentBenchmark.TrimStart
0.84 0.77 0.84
0.77
Microsoft.Extensions.DependencyInjection.ActivatorUtilitiesBenchmark.GetService_3Injected
0.84 0.85 0.84
0.85
System.Numerics.Tensors.Tests.Perf_NumberTensorPrimitives(Single).Add_Vector(BufferLength: 128)
0.84 0.88 0.84
0.88
System.Memory.Span(Char).Reverse(Size: 512)
0.83 0.82 0.83
0.83
0.84
0.81
System.Tests.Perf_Int64.TryParseSpan(value: "12345")
0.83 0.64 0.83
0.64
System.Memory.Span(Char).Reverse(Size: 33)
0.83 0.64 0.83
0.64
Benchstone.BenchI.BenchE.Test
0.82 0.81 0.82
0.81
System.IO.Tests.Perf_Path.GetFullPathNoRedundantSegments
0.82 0.82 0.82
0.82
System.Memory.Span(Char).SequenceCompareToDifferent(Size: 4)
0.82 0.82 0.82
0.82
System.Memory.Span(Char).SequenceCompareTo(Size: 33)
0.82 0.84 0.80
0.85
0.84
0.84
System.Numerics.Tests.Perf_BigInteger.Divide(arguments: 1024,512 bits)
0.82 0.83 0.82
0.83
System.Globalization.Tests.StringSearch.IndexOf_Word_NotFound(Options: (en-US, OrdinalIgnoreCase, False))
0.82 0.78 0.83
0.77
0.81
0.80
System.Collections.IterateForEachNonGeneric(Int32).Stack(Size: 512)
0.81 0.81 0.81
0.81
System.Memory.Span(Char).SequenceCompareToDifferent(Size: 33)
0.80 0.80 0.80
0.80
System.Memory.Span(Char).SequenceCompareToDifferent(Size: 512)
0.80 0.80 0.80
0.80
System.Globalization.Tests.StringSearch.IndexOf_Word_NotFound(Options: (, IgnoreCase, False))
0.80 0.80 0.80
0.80
System.Memory.Span(Int32).SequenceCompareToDifferent(Size: 4)
0.80 0.80 0.80
0.80
System.Memory.Span(Int32).SequenceCompareToDifferent(Size: 512)
0.80 0.80 0.80
0.80
System.Memory.Span(Int32).SequenceCompareToDifferent(Size: 33)
0.79 0.80 0.79
0.80
System.Linq.Tests.Perf_Enumerable.SelectToArray(input: List)
0.79 0.79 0.79
0.79
System.Globalization.Tests.StringSearch.IndexOf_Word_NotFound(Options: (en-US, IgnoreCase, False))
0.79 0.85 0.79
0.85
System.Tests.Perf_Enum.GetName_Generic_Flags
0.79 0.81 0.78
0.82
0.80
0.80
System.Collections.IterateForEachNonGeneric(String).Stack(Size: 512)
0.78 0.82 0.78
0.82
System.Memory.Span(Byte).BinarySearch(Size: 33)
0.78 0.85 0.78
0.85
System.Buffers.Text.Tests.Utf8ParserTests.TryParseUInt32Hex(value: FFFFFFFFFFFFFFFF)
0.78 0.79 0.78
0.79
System.IO.Tests.BinaryWriterTests.WriteHalf
0.77 0.77 0.77
0.77
System.Tests.Perf_DateTime.ToString(format: "r")
0.76 0.82 0.76
0.82
System.Memory.Span(Byte).BinarySearch(Size: 512)
0.76 0.73 0.76
0.73
System.Memory.Span(Int32).EndsWith(Size: 33)
0.75 0.75 0.75
0.75
Microsoft.AspNetCore.Server.Kestrel.Performance.PipeThroughputBenchmark.Parse_SequentialAsync(Length: 4096, Chunks: 16)
0.74 0.73 0.74
0.73
System.Linq.Tests.Perf_Enumerable.SelectToList(input: IList)
0.74 0.74 0.74
0.74
Benchstone.BenchI.Midpoint.Test
0.74 0.75 0.74
0.75
System.Memory.Span(Byte).EndsWith(Size: 33)
0.73 0.85 0.73
0.85
System.Tests.Perf_Int64.Parse(value: "9223372036854775807")
0.71 0.86 0.71
0.86
System.Tests.Perf_Int64.ParseSpan(value: "9223372036854775807")
0.69 0.69 0.69
0.69
System.Tests.Perf_Enum.IsDefined_Generic_Flags
0.68 0.63 0.68
0.63
System.Memory.Span(Byte).SequenceEqual(Size: 4)
0.67 0.71 0.67
0.71
System.Memory.Span(Char).EndsWith(Size: 33)
0.66 0.66 0.66
0.66
System.Memory.Span(Char).EndsWith(Size: 4)
0.66 0.70 0.66
0.70
System.Memory.Span(Byte).EndsWith(Size: 4)
0.65 0.65 0.65
0.65
System.Collections.Tests.Perf_BitArray.BitArrayCopyToBoolArray(Size: 512)
0.61 0.75 0.61
0.75
System.Memory.Span(Int32).EndsWith(Size: 4)
0.60 0.60 0.60
0.60
System.Collections.Tests.Perf_BitArray.BitArrayCopyToByteArray(Size: 512)
0.58 0.58 0.58
0.58
System.Globalization.Tests.StringEquality.Compare_Same_Upper(Count: 1024, Options: (en-US, OrdinalIgnoreCase))
0.20 0.20 0.20
0.20
System.Memory.Span(Int32).Clear(Size: 33)

@kunalspathak
Member

@amanasifkhalid - thanks for sharing the data, but can you please summarize the take away from it and next steps?

@amanasifkhalid
Member Author

> thanks for sharing the data, but can you please summarize the take away from it and next steps?

I'm still trying to repro the top regressions locally -- if I revert this change on top of main, I don't see any meaningful change in benchmark results -- so I don't have any recommendations yet. I'll try reproing with the same baseline/diff commits from the regression report, and if I can repro it locally, I'll check if loop-aware RPO helps. Considering the most impacted benchmarks have loops, I expect loop-aware RPO will make a difference, though I don't know in which direction just yet.

You're correct that we have more regressions than improvements (286 vs 176) at the moment. To get an idea of how the magnitudes of regressions/improvements compare, here are some histograms:

image
Median: 1.13, Mean: 1.176

image
Median: 1.12, Mean: 1.162

image
Median: 0.875, Mean: 0.851

image
Median: 0.89, Mean: 0.871

Note that some improvements became regressions over time, and vice-versa, hence the odd tails for the recent scores. Looking at the original scores, it looks like the improvements tend to be bigger than the regressions, which seems promising for loop-aware RPO?

@amanasifkhalid
Member Author

amanasifkhalid commented Sep 27, 2024

I've looked at some regressions locally, and some look like they can easily be fixed by the loop-aware RPO. In the absence of high-fidelity edge likelihoods, we can end up with flowgraphs like this:


---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd preds           weight   [IL range]   [jump]                            [EH region]        [flags]
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BB01 [0000]  1                             1    [000..016)-> BB03(1)                 (always)                     i LIR hascall gcsafe idxlen
BB02 [0015]  1       BB03                 16    [018..023)-> BB03(1)                 (always)                     i LIR loophead idxlen bwd
BB03 [0001]  2       BB01,BB02             8    [016..023)-> BB04(0.5),BB02(0.5)     ( cond )                     i LIR keep loophead idxlen bwd
BB04 [0016]  1       BB03                  1    [022..03D)                           (return)                     i LIR idxlen bwd
---------------------------------------------------------------------------------------------------------------------------------------------------------------------

Since the profile-aware RPO considers only edge likelihoods, and both of BB03's successors have a likelihood of 0.5, there isn't an obvious successor to visit next, so the traversal can easily break up the loop body. In this case, the RPO produces:

Final LSRA Block Sequence:
BB01 (  1   )
BB03 (  8   )
BB04 (  1   )
BB02 ( 16   )

Whereas the old LSRA block order uses block weights to decide on the next successor, so it gets this one right:

Final LSRA Block Sequence:
BB01 (  1   )
BB03 (  8   )
BB02 ( 16   )
BB04 (  1   )
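
The difference between the two orderings above can be reproduced with a toy model. This is a minimal Python sketch, not the JIT's implementation: `rpo`, `by_likelihood`, and `by_weight` are hypothetical names, and the successor lists, likelihoods, and weights are copied from the flowgraph dump above. The sketch assumes a plain DFS-based reverse post-order in which the preferred successor is visited last, so that it lands directly after its predecessor.

```python
def rpo(succs, entry, priority):
    """Reverse post-order. Among a block's successors, the one with the
    highest priority is visited last by the DFS, so (if not yet visited)
    it ends up immediately after the block in the final order."""
    visited, postorder = set(), []

    def dfs(block):
        visited.add(block)
        # sorted() is stable: tied successors keep their listed order.
        for succ in sorted(succs.get(block, []), key=lambda s: priority(block, s)):
            if succ not in visited:
                dfs(succ)
        postorder.append(block)

    dfs(entry)
    return postorder[::-1]


# Data taken from the dump: BB03 branches to BB04 and BB02 with likelihood 0.5 each.
weights = {"BB01": 1, "BB02": 16, "BB03": 8, "BB04": 1}
likelihood = {("BB01", "BB03"): 1.0, ("BB02", "BB03"): 1.0,
              ("BB03", "BB04"): 0.5, ("BB03", "BB02"): 0.5}

# Two listings that differ only in how BB03's tied successors are enumerated.
listing_a = {"BB01": ["BB03"], "BB02": ["BB03"], "BB03": ["BB04", "BB02"]}
listing_b = {"BB01": ["BB03"], "BB02": ["BB03"], "BB03": ["BB02", "BB04"]}


def by_likelihood(block, succ):
    return likelihood[(block, succ)]


def by_weight(block, succ):
    return weights[succ]


likelihood_a = rpo(listing_a, "BB01", by_likelihood)
likelihood_b = rpo(listing_b, "BB01", by_likelihood)
weight_a = rpo(listing_a, "BB01", by_weight)
weight_b = rpo(listing_b, "BB01", by_weight)
```

With likelihoods alone, the result depends on how the tie happens to be broken: one listing keeps BB02 next to BB03, the other yields the BB01, BB03, BB04, BB02 sequence shown above. Using the successor's block weight as the priority keeps the loop body together for both listings.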

The loop-aware RPO gets such examples right because of the presence of loops, but it's otherwise not aware of successor blocks' weights. For example, consider this flowgraph, which doesn't have any loops:

---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd preds           weight   [IL range]   [jump]                            [EH region]        [flags]
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BB01 [0000]  1                             1    [000..008)-> BB02(0.5),BB06(0.5)     ( cond )                     i LIR
BB02 [0004]  1       BB01                  0.50 [025..02F)-> BB03(0.5),BB05(0.5)     ( cond )                     i LIR
BB03 [0006]  1       BB02                  0.50 [034..03B)-> BB04(0.5),BB10(0.5)     ( cond )                     i LIR
BB04 [0027]  1       BB03                  0.50 [034..03E)-> BB08(1)                 (always)                     i LIR
BB05 [0005]  1       BB02                  0.50 [02F..034)                           (return)                     i LIR
BB06 [0001]  1       BB01                  0.50 [008..013)-> BB07(0.5),BB09(0.5)     ( cond )                     i LIR
BB07 [0003]  1       BB06                  0.50 [018..025)-> BB08(1)                 (always)                     i LIR nullcheck
BB08 [0007]  2       BB04,BB07             0.50 [03E..040)                           (return)                     i LIR
BB09 [0002]  1       BB06                  0.50 [013..018)                           (return)                     i LIR
BB10 [0026]  1       BB03                  0    [034..035)                           (throw )                     i LIR rare hascall gcsafe
---------------------------------------------------------------------------------------------------------------------------------------------------------------------

Both RPO-based orderings interleave the cold block with the hot paths:

Final LSRA Block Sequence:
BB01 (  1   )
BB02 (  0.50)
BB03 (  0.50)
BB04 (  0.50)
BB10 (  0   )
BB05 (  0.50)
BB06 (  0.50)
BB07 (  0.50)
BB08 (  0.50)
BB09 (  0.50)

Whereas the previous implementation doesn't:

Final LSRA Block Sequence:
BB01 (  1   )
BB02 (  0.50)
BB03 (  0.50)
BB04 (  0.50)
BB05 (  0.50)
BB06 (  0.50)
BB07 (  0.50)
BB08 (  0.50)
BB09 (  0.50)
BB10 (  0   )

To handle these cases, I think we can emulate what we do during block reordering, and push rarely-run blocks to the end of the order. It's trivial to implement, and we don't have to worry about EH constraints like we do during block reordering. I think we eventually want these mismatches between likelihoods and block weights to disappear by running profile synthesis late in the frontend, though I don't think I'll get to that until later, so this seems like a decent fix for now.
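
As a rough illustration of that idea, here is a minimal Python sketch (not the JIT's code). `defer_rare_blocks` is a hypothetical helper, and "rarely run" is modeled as a block weight of zero, which is an assumption; the sequence and weights are copied from the example above.

```python
def defer_rare_blocks(sequence, weights):
    """Stable partition: keep the relative order of all blocks, but move
    blocks with zero weight to the end of the visitation order."""
    hot = [b for b in sequence if weights[b] > 0]
    rare = [b for b in sequence if weights[b] == 0]
    return hot + rare


# RPO-based sequence from the example, with the cold throw block BB10 interleaved.
rpo_sequence = ["BB01", "BB02", "BB03", "BB04", "BB10",
                "BB05", "BB06", "BB07", "BB08", "BB09"]
weights = {"BB01": 1.0, "BB02": 0.5, "BB03": 0.5, "BB04": 0.5, "BB05": 0.5,
           "BB06": 0.5, "BB07": 0.5, "BB08": 0.5, "BB09": 0.5, "BB10": 0.0}

adjusted = defer_rare_blocks(rpo_sequence, weights)
```

For this flowgraph, the adjusted sequence is BB01 through BB09 followed by BB10, which matches the order produced by the previous implementation.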

For the remaining regressions I looked at, I'm seeing slight differences in code layout due to more critical edges being split. This seems to happen when the old ordering breaks ties using bbNums and the RPO arbitrarily picks a different successor. I'd rather not re-introduce dependencies on lexical order, since these are likely to get in the way of moving block reordering completely to the backend, so perhaps we can start with the above changes and see where we stand after.

@AndyAyersMS @kunalspathak does this all sound reasonable? Thanks!

@AndyAyersMS
Member

Emulating reordering seems plausible, I guess, but then perhaps we should simply run ordering before LSRA (and re-ordering later if there are new blocks), and have LSRA just use the lexical order?

For benchmark runs I'm surprised we don't see PGO everywhere... are we measuring non-PGO code in some tests?

@amanasifkhalid
Member Author

> Emulating reordering seems plausible, I guess, but then perhaps we should simply run ordering before LSRA (and re-ordering later if there are new blocks), and have LSRA just use the lexical order?

I was thinking about going this route; from what we see above, better LSRA block orderings also tend to look like better block layouts, so it seems reasonable to just use lexical ordering. The only hurdles I see to this are the fact that we cannot move cold EH blocks to the end of the main body, and the fact that switch lowering can change flow in between block layout and LSRA. We already don't put much effort into ordering switch successors optimally (though 3-opt will probably fix this automatically), so maybe the latter point is fine? I'll give this a shot.

> For benchmark runs I'm surprised we don't see PGO everywhere... are we measuring non-PGO code in some tests?

As far as I know, all the microbenchmarks use PGO; the non-PGO examples were PerfScore regressions handpicked from non-tiered SPMI collections to illustrate limitations. For the few benchmark regressions I was able to repro locally, the churn was primarily driven by more critical edges being split, and thus more churn in code layout. My understanding of LSRA's edge resolution is limited, but I don't see an obvious fix to these cases.

@amanasifkhalid
Member Author

Looking at the arm64 improvements, there are some benchmarks that were initially regressed by the new block layout earlier this year, and then fixed by the new LSRA ordering, which leads me to believe the final code layout was fine -- perhaps LSRA's old sequencing logic was negatively interacting with layout churn. For example:
image

This looks like further motivation to either decouple LSRA ordering from lexical ordering completely, or to merge them.

@AndyAyersMS
Member

> we cannot move cold EH blocks to the end of the main body

Ah, good point... LSRA "layout" need not be EH aware at all.

sirntar pushed a commit to sirntar/runtime that referenced this pull request Sep 30, 2024
@amanasifkhalid
Member Author

> Ah, good point... LSRA "layout" need not be EH aware at all.

While snooping around LSRA, I noticed this TODO where the logic for remembering the first cold location assumes the first cold block is the beginning of a contiguous cold section. As mentioned above, block layout cannot satisfy this property when we have EH regions, so if keeping this state accurate is important, then we cannot rely on layout order for LSRA. The current RPO traversal doesn't ensure cold blocks are visited last either, so I think it's worth pursuing enabling this invariant as a next step, alongside getting loop-aware RPO checked in.
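
To make the invariant concrete, here is a minimal Python sketch of a check that a block sequence ends in one contiguous cold section. It is only an illustration: `cold_suffix_start` is a hypothetical helper, not something in LSRA, and treating zero-weight blocks as cold is an assumption. The two sequences are the ones from the earlier loop-free example.

```python
def cold_suffix_start(sequence, is_cold):
    """Return the index where the cold section starts if every cold block
    sits in one contiguous run at the end of the sequence. Return
    len(sequence) if there are no cold blocks, and None if a non-cold
    block follows the first cold block (the invariant is violated)."""
    for i, block in enumerate(sequence):
        if is_cold(block):
            return i if all(is_cold(b) for b in sequence[i:]) else None
    return len(sequence)


cold_blocks = {"BB10"}  # the zero-weight throw block from the earlier example


def is_cold(block):
    return block in cold_blocks


interleaved = ["BB01", "BB02", "BB03", "BB04", "BB10",
               "BB05", "BB06", "BB07", "BB08", "BB09"]
cold_last = ["BB01", "BB02", "BB03", "BB04", "BB05",
             "BB06", "BB07", "BB08", "BB09", "BB10"]

interleaved_start = cold_suffix_start(interleaved, is_cold)
cold_last_start = cold_suffix_start(cold_last, is_cold)
```

In this sketch, a remembered "first cold location" is only meaningful for the second sequence, where every block from that point on is cold; the interleaved sequence returns `None`.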

(Sorry for the recent silence on this front. I've been trying to figure out the source of a TP regression for a massive MinOpts method in #108147, but I haven't been able to get a good trace from pin on multiple machines. That change is a nice-to-have, so I guess we can get these tweaks into LSRA's FullOpts block order first.)
