
use new System.Text.Ascii APIs, remove internal helpers #48368

Merged: 5 commits merged into dotnet:main from the newAsciiApi branch on Jul 17, 2023


@adamsitnik (Member) commented May 22, 2023

This PR does the following:

  • replaces BytesOrdinalEqualsStringAndAscii with Ascii.Equals
  • replaces AsciiIgnoreCaseEquals with Ascii.EqualsIgnoreCase
  • replaces IsAscii with Ascii.IsValid
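A minimal usage sketch of the three replacement APIs, assuming .NET 8 (where System.Text.Ascii was added). The header-like inputs are made up for illustration; they are not taken from the call sites this PR changes.

```csharp
using System;
using System.Text;

// Illustrative input (hypothetical, not from the Kestrel code base).
ReadOnlySpan<byte> received = "Content-Length"u8;

// Replacement for BytesOrdinalEqualsStringAndAscii: ordinal UTF-16 vs. byte comparison.
bool exact = Ascii.Equals("Content-Length", received);

// Replacement for AsciiIgnoreCaseEquals: ASCII letters compared case-insensitively.
bool ignoreCase = Ascii.EqualsIgnoreCase("content-length", received);

// Replacement for IsAscii: true only if every byte is in the 0x00-0x7F range.
bool valid = Ascii.IsValid(received);

// Non-ASCII data makes the comparison return false instead of throwing.
ReadOnlySpan<byte> latin1 = new byte[] { 0xE9 };
bool nonAscii = Ascii.Equals("\u00e9", latin1);

Console.WriteLine($"{exact} {ignoreCase} {valid} {nonAscii}");
```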

Since only BytesOrdinalEqualsStringAndAscii had been vectorized so far, and all the new Ascii APIs are vectorized, I provided benchmarks only for BytesOrdinalEqualsStringAndAscii vs. Ascii.Equals.

Benchmark source code and results:

BenchmarkDotNet=v0.13.2.2052-nightly, OS=Windows 11 (10.0.22621.1702)
AMD Ryzen Threadripper PRO 3945WX 12-Cores, 1 CPU, 24 logical and 12 physical cores
.NET SDK=8.0.100-preview.4.23259.14
  [Host]     : .NET 8.0.0 (8.0.23.25905), X64 RyuJIT AVX2
  Job-CVKHLH : .NET 8.0.0 (42.42.42.42424), X64 RyuJIT AVX2

OutlierMode=DontRemove  LaunchCount=9 MemoryRandomization=True
| Method | Size | Equal | Mean | Ratio | Allocated |
|---|---:|---|---:|---:|---:|
| SystemAscii | 6 | False | 1.687 ns | 1.00 | - |
| AspNet | 6 | False | 2.810 ns | 1.67 | - |
| SystemAscii | 6 | True | 4.388 ns | 1.00 | - |
| AspNet | 6 | True | 3.536 ns | 0.81 | - |
| SystemAscii | 32 | False | 2.150 ns | 1.00 | - |
| AspNet | 32 | False | 3.074 ns | 1.43 | - |
| SystemAscii | 32 | True | 3.435 ns | 1.00 | - |
| AspNet | 32 | True | 3.304 ns | 0.96 | - |
| SystemAscii | 64 | False | 2.120 ns | 1.00 | - |
| AspNet | 64 | False | 3.075 ns | 1.45 | - |
| SystemAscii | 64 | True | 5.033 ns | 1.00 | - |
| AspNet | 64 | True | 4.681 ns | 0.93 | - |

Summary:

  • Ascii.Equals finishes faster when the inputs don't match.
  • The BytesOrdinalEqualsStringAndAscii helper is slightly faster when the inputs are equal (about 20% for 6 characters, 4% for 32 characters, and 7% for 64 characters). Most likely this is because the new Ascii APIs check both inputs (left and right) for invalid ASCII characters, while the existing ASP.NET helper checks only one of them (as it knows that the other one is always valid).

@@ -54,7 +54,8 @@ private static ulong GetAsciiStringAsLong(string str)
 {
     Debug.Assert(str.Length == 8, "String must be exactly 8 (ASCII) characters long.");

-    var bytes = Encoding.ASCII.GetBytes(str);
+    Span<byte> bytes = stackalloc byte[8];
+    Debug.Assert(Ascii.FromUtf16(str, bytes, out _) == OperationStatus.Done);
Member

This is compiled away in Release mode

Member Author

To my surprise, you are right. Thank you for catching this.

using System;
using System.Buffers;
using System.Buffers.Binary;
using System.Diagnostics;
using System.Text;

namespace DebugProof
{
    internal class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine(Test8("12345678"));
            Console.WriteLine(Test4("1234"));
        }

        static ulong Test8(string str)
        {
            Debug.Assert(str.Length == 8, "String must be exactly 8 (ASCII) characters long.");

            Span<byte> bytes = stackalloc byte[8];
            Debug.Assert(Ascii.FromUtf16(str, bytes, out _) == OperationStatus.Done);

            return BinaryPrimitives.ReadUInt64LittleEndian(bytes);
        }

        static ulong Test4(string str)
        {
            Debug.Assert(str.Length == 4, "String must be exactly 4 (ASCII) characters long.");

            Span<byte> bytes = stackalloc byte[4];
            Debug.Assert(Ascii.FromUtf16(str, bytes, out _) == OperationStatus.Done);

            return BinaryPrimitives.ReadUInt32LittleEndian(bytes);
        }
    }
}
PS C:\Users\adsitnik\source\repos\DebugProof> dotnet run -c Debug
4050765991979987505
875770417
PS C:\Users\adsitnik\source\repos\DebugProof> dotnet run -c Release
0
0
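Debug.Assert is marked [Conditional("DEBUG")], so in Release builds the compiler drops the entire call, including the Ascii.FromUtf16 argument, and the bytes are never filled in. Below is a sketch of one way to keep the conversion unconditional while asserting only on its result; GetAsciiStringAsLongSketch is a hypothetical name, and this is not necessarily the exact code that was merged.

```csharp
using System;
using System.Buffers;
using System.Buffers.Binary;
using System.Diagnostics;
using System.Text;

ulong value = GetAsciiStringAsLongSketch("12345678");
Console.WriteLine(value); // same result in Debug and Release builds

static ulong GetAsciiStringAsLongSketch(string str)
{
    Debug.Assert(str.Length == 8, "String must be exactly 8 (ASCII) characters long.");

    Span<byte> bytes = stackalloc byte[8];

    // The conversion always runs; only the check of its result is Debug-only.
    OperationStatus status = Ascii.FromUtf16(str, bytes, out _);
    Debug.Assert(status == OperationStatus.Done);

    return BinaryPrimitives.ReadUInt64LittleEndian(bytes);
}
```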

@@ -339,7 +340,7 @@ private void OnOriginFormTarget(TargetOffsetPathLength targetPath, Span<byte> ta
var previousValue = _parsedRawTarget;
if (ServerOptions.DisableStringReuse ||
previousValue == null || previousValue.Length != target.Length ||
!StringUtilities.BytesOrdinalEqualsStringAndAscii(previousValue, target))
Member

This method does a null check as well; we'd need to make sure that's OK for all the changed call sites.

Member Author

> This method does a null check as well, we'd need to make sure that's ok for all the changed callsites.

I am not sure I understand. So far, all callers of StringUtilities.BytesOrdinalEqualsStringAndAscii have ensured that the input is not null. If null were passed to StringUtilities.BytesOrdinalEqualsStringAndAscii, it would throw; with my changes, it won't.

Would you like me to keep the old helper method, have it simply Debug.Assert the input, and call the new Ascii API to do the job?

bool BytesOrdinalEqualsStringAndAscii(string previousValue, ReadOnlySpan<byte> newValue)
{
    Debug.Assert(previousValue is not null);

    return Ascii.Equals(previousValue, newValue);
}

Member

Sorry, I mistyped. I meant it checks for 0 not null.

Member Author

> Sorry, I mistyped. I meant it checks for 0 not null.

The 0 check is performed by TryGetAsciiString, which I am (on purpose) not touching in this PR:

if (!CheckBytesInAsciiRange(vector, avxZero))

private static bool CheckBytesInAsciiRange(Vector<sbyte> check)
{
// Vectorized byte range check, signed byte > 0 for 1-127
return Vector.GreaterThanAll(check, Vector<sbyte>.Zero);

Member

It is also done in BytesOrdinalEqualsStringAndAscii, which is why I'm bringing it up:
https://github.com/dotnet/aspnetcore/blob/main/src/Shared/ServerInfrastructure/StringUtilities.cs#L522

Member Author (@adamsitnik, May 24, 2023)

I took a closer look at BytesOrdinalEqualsStringAndAscii.

If the remainder left after the vectorized loop contains a zero (or the input is simply too small to take the vectorized code path) and the inputs are equal, it can still return true:

if (offset < count)
{
    var ch = (char)Unsafe.Add(ref bytes, offset);
    if (((ch & 0x80) != 0) || Unsafe.Add(ref str, offset) != ch)
    {
        goto NotEqual;
    }
}
// End of input reached, there are no inequalities via widening; so the input bytes are both ascii
// and a match to the string if it was converted via Encoding.ASCII.GetString(...)
return true;

But most importantly, it checks only one of the inputs for zero:

var vector = Unsafe.ReadUnaligned<Vector<sbyte>>(ref Unsafe.Add(ref bytes, offset));
if (!CheckBytesInAsciiRange(vector))

And from what I can see, the other input is always a constant (a known header, for example):

$@"// Matched a known header

It seems that only one of the inputs may contain null characters (because the other one is typically a predefined constant), and hence Ascii.Equals will always return false in such cases, simply because the inputs will not be equal?

@BrennanConroy is my understanding correct?

Edit: I just realized that the existing helper method ensures that the string input does not contain zeros:

https://github.com/dotnet/aspnetcore/blob/main/src/Shared/ServerInfrastructure/StringUtilities.cs#LL418C22-L418C41

private static bool IsValidHeaderString(string value)
{
// Method for Debug.Assert to ensure BytesOrdinalEqualsStringAndAscii
// is not called with an unvalidated string comparitor.
try
{
if (value is null)
{
return false;
}
new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true).GetByteCount(value);
return !value.Contains('\0');

So it should be safe to switch to Ascii.Equals, but to be extra safe I can keep the old helper method just to verify the input and delegate to Ascii.Equals:

public static bool BytesOrdinalEqualsStringAndAscii(string previousValue, ReadOnlySpan<byte> newValue)
{
    // previousValue is a previously materialized string which *must* have already passed validation.
    Debug.Assert(IsValidHeaderString(previousValue));
    
    return Ascii.Equals(previousValue, newValue);
}
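For context on why the '\0' guard matters, here is a small sketch assuming .NET 8: Ascii.Equals treats 0x00 as an ordinary ASCII character, so it will not reject zero bytes by itself. Once the string side is known to be free of '\0', as IsValidHeaderString asserts, a byte input containing 0 can never compare equal to it. HasNoNullChars is a hypothetical stand-in for only the '\0' part of IsValidHeaderString.

```csharp
using System;
using System.Text;

// 0x00 is valid ASCII, so two inputs that both contain '\0' compare equal.
bool bothContainZero = Ascii.Equals("ab\0", "ab\0"u8);

// With a '\0'-free string on one side, a zero byte on the other side is always a mismatch.
string validated = "abc";
bool validatedOk = HasNoNullChars(validated);
bool zeroByteMatches = Ascii.Equals(validated, "ab\0"u8);

Console.WriteLine($"{bothContainZero} {validatedOk} {zeroByteMatches}");

// Hypothetical stand-in for the '\0' check in IsValidHeaderString (UTF-8 validation omitted).
static bool HasNoNullChars(string value) => value is not null && !value.Contains('\0');
```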

Member

I agree with your assessment of the Http1Connection.cs usage, since the strings being compared are materialized via GetAsciiStringNonNullCharacters, which does the null check for us.

I think the HttpHeaders.Generated.cs usage is also fine since it's only comparing against known header values which definitely don't have \0 in them.

> So it should be safe to switch to Ascii.Equals, but to be extra safe I can keep the old helper method just to verify the input and delegate to Ascii.Equals:

That would be nice 😃

* reintroduce StringUtilities.BytesOrdinalEqualsStringAndAscii
* add an assert that ensures that at least one of the inputs does not contain a null character
* delegate to Ascii.Equals
@adamsitnik adamsitnik marked this pull request as ready for review May 25, 2023 14:21
@adamsitnik (Member Author)

@BrennanConroy I have added benchmarks with results, PTAL.

// Assert
Assert.False(result);
}
}
Member

Did we run this against the new implementation to make sure that we didn't introduce any behavior diff?

Member Author

> Did we run this against the new implementation to make sure that we didn't introduce any behavior diff?

That is a very good question. We have not, as the new APIs have great test coverage in dotnet/runtime:
https://github.com/dotnet/runtime/tree/main/src/libraries/System.Text.Encoding/tests/Ascii

including things like boundary checks for the vectorized code:

https://github.com/dotnet/runtime/blob/081f87c02d69184453111d80dd66baf672ec5b4e/src/libraries/System.Text.Encoding/tests/Ascii/IsValidCharTests.cs#L90-L99

But if you want, I can contribute the removed ASP.NET tests to the dotnet/runtime test suite.

Member

At least try undeleting them locally and running them against the new implementation to see if there are differences.

@javiercn (Member) left a comment

Changes look good to me.

@BrennanConroy (Member)

> Most probably the reason for that is that the new Ascii APIs check both inputs (left and right) for containing invalid Ascii characters, while the existing ASP.NET helper does it only for one of the inputs (as it knows that the other one is always valid)

Maybe I'm missing something, but does the Ascii API need to check both inputs? If it's doing an Equals check and only checks one input for valid Ascii, wouldn't it implicitly be checking the second input for valid Ascii via the equals check?
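The argument in the question can be checked with a scalar sketch. If each byte is required to be at most 0x7F and each char must equal its byte, then every char is at most 0x7F as well, so validating one side is enough for correctness. EqualsCheckingOneSide is a hypothetical helper written for illustration; it is neither the ASP.NET Core helper nor the dotnet/runtime implementation, and it says nothing about vectorized performance.

```csharp
using System;
using System.Text;

// Hypothetical scalar helper: only the byte side is range-checked.
// If bytes[i] <= 0x7F and str[i] == bytes[i], then str[i] <= 0x7F as well,
// so a non-ASCII char on the string side is rejected by the equality test.
static bool EqualsCheckingOneSide(ReadOnlySpan<char> str, ReadOnlySpan<byte> bytes)
{
    if (str.Length != bytes.Length)
    {
        return false;
    }

    for (int i = 0; i < bytes.Length; i++)
    {
        byte b = bytes[i];
        if (b > 0x7F || str[i] != (char)b)
        {
            return false;
        }
    }

    return true;
}

(string Text, byte[] Raw)[] cases =
{
    ("Host", "Host"u8.ToArray()),           // equal ASCII
    ("Hosx", "Host"u8.ToArray()),           // different ASCII
    ("\u00e9", new byte[] { 0xE9 }),        // same numeric value, but not ASCII
    ("H\u00e9", new byte[] { 0x48, 0x65 }), // non-ASCII char on the string side only
};

// Print the sketch's answer next to Ascii.Equals for each case.
foreach (var (text, raw) in cases)
{
    ReadOnlySpan<byte> span = raw;
    Console.WriteLine($"{EqualsCheckingOneSide(text, span)} {Ascii.Equals(text, span)}");
}
```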

@Tratcher (Member)

Does it throw or return false for invalid ASCII input?

@BrennanConroy (Member)

/benchmark plaintext aspnet-citrine-lin kestrel

@pr-benchmarks bot commented May 25, 2023

Benchmark started for plaintext on aspnet-citrine-lin with kestrel. Logs: link

@BrennanConroy (Member) commented May 25, 2023

I just tried the benchmark on one of my machines and got much bigger gaps in the happy path.

BenchmarkDotNet=v0.13.5, OS=Windows 11 (10.0.22621.1702/22H2/2022Update/SunValley2)
Intel Core i7-9700 CPU 3.00GHz, 1 CPU, 8 logical and 8 physical cores
.NET SDK=8.0.100-preview.5.23275.7
  [Host]     : .NET 8.0.0 (8.0.23.27214), X64 RyuJIT AVX2
  Job-DWFIYY : .NET 8.0.0 (8.0.23.27214), X64 RyuJIT AVX2

OutlierMode=DontRemove  MemoryRandomization=True
| Method | Size | Equal | Mean | Error | StdDev | Median | Ratio | RatioSD |
|---|---:|---|---:|---:|---:|---:|---:|---:|
| SystemAscii | 6 | False | 1.732 ns | 0.0219 ns | 0.0205 ns | 1.729 ns | 1.00 | 0.00 |
| AspNet | 6 | False | 2.440 ns | 0.0318 ns | 0.0297 ns | 2.433 ns | 1.41 | 0.02 |
| SystemAscii | 6 | True | 5.046 ns | 0.0264 ns | 0.0247 ns | 5.045 ns | 1.00 | 0.00 |
| AspNet | 6 | True | 3.585 ns | 0.0275 ns | 0.0257 ns | 3.578 ns | 0.71 | 0.01 |
| SystemAscii | 32 | False | 2.309 ns | 0.0602 ns | 0.0563 ns | 2.296 ns | 1.00 | 0.00 |
| AspNet | 32 | False | 2.914 ns | 0.0561 ns | 0.0525 ns | 2.892 ns | 1.26 | 0.04 |
| SystemAscii | 32 | True | 3.866 ns | 0.1062 ns | 0.2262 ns | 3.747 ns | 1.00 | 0.00 |
| AspNet | 32 | True | 3.213 ns | 0.0628 ns | 0.0587 ns | 3.233 ns | 0.82 | 0.05 |
| SystemAscii | 64 | False | 2.822 ns | 0.0861 ns | 0.1289 ns | 2.857 ns | 1.00 | 0.00 |
| AspNet | 64 | False | 3.383 ns | 0.0979 ns | 0.1128 ns | 3.393 ns | 1.20 | 0.07 |
| SystemAscii | 64 | True | 7.259 ns | 0.2143 ns | 0.6317 ns | 6.939 ns | 1.00 | 0.00 |
| AspNet | 64 | True | 4.861 ns | 0.0835 ns | 0.0781 ns | 4.858 ns | 0.68 | 0.04 |

Runtime version 8.0.0-preview.5.23272.14

@BrennanConroy (Member)

Benchmark run:

| application | plaintext.base | plaintext.pr | Change |
|---|---:|---:|---:|
| CPU Usage (%) | 99 | 100 | +1.01% |
| Cores usage (%) | 2,781 | 2,791 | +0.36% |
| Working Set (MB) | 126 | 126 | 0.00% |
| Private Memory (MB) | 656 | 654 | -0.30% |
| Build Time (ms) | 3,869 | 3,436 | -11.19% |
| Start Time (ms) | 205 | 213 | +3.90% |
| Published Size (KB) | 96,826 | 96,826 | 0.00% |
| Symbols Size (KB) | 53 | 53 | 0.00% |
| .NET Core SDK Version | 8.0.100-preview.5.23275.7 | 8.0.100-preview.5.23275.7 | |

| load | plaintext.base | plaintext.pr | Change |
|---|---:|---:|---:|
| CPU Usage (%) | 98 | 98 | 0.00% |
| Cores usage (%) | 2,752 | 2,734 | -0.65% |
| Working Set (MB) | 48 | 48 | 0.00% |
| Private Memory (MB) | 370 | 370 | 0.00% |
| Start Time (ms) | 0 | 0 | |
| First Request (ms) | 96 | 97 | +1.04% |
| Requests/sec | 11,689,046 | 11,630,543 | -0.50% |
| Requests | 176,411,336 | 175,462,792 | -0.54% |
| Mean latency (ms) | 1.33 | 1.27 | -4.51% |
| Max latency (ms) | 56.13 | 60.84 | +8.39% |
| Bad responses | 0 | 0 | |
| Socket errors | 0 | 0 | |
| Read throughput (MB/s) | 1,402.88 | 1,392.64 | -0.73% |
| Latency 50th (ms) | 0.68 | 0.70 | +2.20% |
| Latency 75th (ms) | 1.04 | 1.05 | +0.96% |
| Latency 90th (ms) | 2.22 | 2.12 | -4.50% |
| Latency 99th (ms) | 14.11 | 16.74 | +18.64% |

@adamsitnik (Member Author)

@BrennanConroy Could you please apply [ProcessCount(9)] and re-run the benchmarks? It instructs BDN to benchmark every scenario nine times (each time in a dedicated process), and combined with [MemoryRandomization] it should give us a better representation of the entire distribution.

Which scenario is the most common?

Is my understanding correct that the TechEmpower benchmark run shows basically no difference (all values seem to be within the margin of error)?

@BrennanConroy (Member)

I believe I figured out why Ascii.Equals is 20-30% slower than BytesOrdinalEqualsStringAndAscii. I made some changes and local testing is now showing Ascii.Equals is 20-30% faster than BytesOrdinalEqualsStringAndAscii.

I'll try to open a draft PR in runtime with the 3 changes needed to optimize it.

@amcasey amcasey added the area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions label Jun 6, 2023
@ghost commented Jun 13, 2023

Looks like this PR hasn't been active for some time and the codebase could have been changed in the meantime.
To make sure no breaking changes are introduced, please leave an /azp run comment here to rerun the CI pipeline and confirm success before merging the change.

@ghost ghost added the pending-ci-rerun When assigned to a PR indicates that the CI checks should be rerun label Jun 13, 2023
@adamsitnik (Member Author)

@BrennanConroy optimizations got merged to runtime: dotnet/runtime#87141

@javiercn @davidfowl Can I merge the PR now or should I wait until the changes propagate to this repo?

@adamsitnik (Member Author)

> Does it throw or return false for invalid ascii input?

@Tratcher apologies, I missed your question. It returns false for invalid ASCII.

@Tratcher (Member) commented Jul 6, 2023

/azp run

@ghost ghost removed the pending-ci-rerun When assigned to a PR indicates that the CI checks should be rerun label Jul 6, 2023
@azure-pipelines

Azure Pipelines successfully started running 3 pipeline(s).

@adamsitnik adamsitnik merged commit d994d31 into dotnet:main Jul 17, 2023
22 checks passed
@ghost ghost added this to the 8.0-preview7 milestone Jul 17, 2023
@adamsitnik adamsitnik deleted the newAsciiApi branch July 17, 2023 15:31
@davidfowl (Member)

Great job @BrennanConroy and @adamsitnik !

Labels: area-networking (Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions), Perf
6 participants