Ada results (4070)! #20

Open
oscarbg opened this issue Apr 23, 2023 · 3 comments

oscarbg commented Apr 23, 2023

PerfTest
To select adapter, use: PerfTest.exe [ADAPTER_INDEX]

Adapters found:
0: NVIDIA GeForce RTX 4070
1: Microsoft Basic Render Driver
Using adapter 0

Running 30 warm-up frames and 30 benchmark frames:
.............................XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Performance compared to Buffer<RGBA8>.Load random

Buffer<R8>.Load uniform: 8.139ms 1.463x
Buffer<R8>.Load linear: 8.502ms 1.401x
Buffer<R8>.Load random: 10.092ms 1.180x
Buffer<RG8>.Load uniform: 10.089ms 1.180x
Buffer<RG8>.Load linear: 9.256ms 1.287x
Buffer<RG8>.Load random: 8.333ms 1.429x
Buffer<RGBA8>.Load uniform: 8.154ms 1.460x
Buffer<RGBA8>.Load linear: 8.229ms 1.447x
Buffer<RGBA8>.Load random: 11.908ms 1.000x
Buffer<R16f>.Load uniform: 7.930ms 1.502x
Buffer<R16f>.Load linear: 7.982ms 1.492x
Buffer<R16f>.Load random: 8.136ms 1.464x
Buffer<RG16f>.Load uniform: 7.984ms 1.491x
Buffer<RG16f>.Load linear: 8.150ms 1.461x
Buffer<RG16f>.Load random: 11.856ms 1.004x
Buffer<RGBA16f>.Load uniform: 8.125ms 1.466x
Buffer<RGBA16f>.Load linear: 8.248ms 1.444x
Buffer<RGBA16f>.Load random: 8.623ms 1.381x
Buffer<R32f>.Load uniform: 7.950ms 1.498x
Buffer<R32f>.Load linear: 7.956ms 1.497x
Buffer<R32f>.Load random: 11.854ms 1.005x
Buffer<RG32f>.Load uniform: 7.944ms 1.499x
Buffer<RG32f>.Load linear: 8.006ms 1.487x
Buffer<RG32f>.Load random: 7.972ms 1.494x
Buffer<RGBA32f>.Load uniform: 15.772ms 0.755x
Buffer<RGBA32f>.Load linear: 15.816ms 0.753x
Buffer<RGBA32f>.Load random: 15.837ms 0.752x
ByteAddressBuffer.Load uniform: 7.205ms 1.653x
ByteAddressBuffer.Load linear: 6.301ms 1.890x
ByteAddressBuffer.Load random: 6.112ms 1.948x
ByteAddressBuffer.Load2 uniform: 10.265ms 1.160x
ByteAddressBuffer.Load2 linear: 9.044ms 1.317x
ByteAddressBuffer.Load2 random: 9.039ms 1.317x
ByteAddressBuffer.Load3 uniform: 12.291ms 0.969x
ByteAddressBuffer.Load3 linear: 12.033ms 0.990x
ByteAddressBuffer.Load3 random: 11.978ms 0.994x
ByteAddressBuffer.Load4 uniform: 15.934ms 0.747x
ByteAddressBuffer.Load4 linear: 19.940ms 0.597x
ByteAddressBuffer.Load4 random: 15.964ms 0.746x
ByteAddressBuffer.Load2 unaligned uniform: 10.423ms 1.142x
ByteAddressBuffer.Load2 unaligned linear: 9.037ms 1.318x
ByteAddressBuffer.Load2 unaligned random: 9.016ms 1.321x
ByteAddressBuffer.Load4 unaligned uniform: 15.938ms 0.747x
ByteAddressBuffer.Load4 unaligned linear: 19.903ms 0.598x
ByteAddressBuffer.Load4 unaligned random: 15.955ms 0.746x
StructuredBuffer<float>.Load uniform: 7.030ms 1.694x
StructuredBuffer<float>.Load linear: 5.768ms 2.064x
StructuredBuffer<float>.Load random: 5.749ms 2.071x
StructuredBuffer<float2>.Load uniform: 8.017ms 1.485x
StructuredBuffer<float2>.Load linear: 8.032ms 1.483x
StructuredBuffer<float2>.Load random: 5.807ms 2.051x
StructuredBuffer<float4>.Load uniform: 8.560ms 1.391x
StructuredBuffer<float4>.Load linear: 8.521ms 1.398x
StructuredBuffer<float4>.Load random: 8.696ms 1.369x
cbuffer{float4} load uniform: 78.939ms 0.151x
cbuffer{float4} load linear: 330.084ms 0.036x
cbuffer{float4} load random: 125.805ms 0.095x
Texture2D<R8>.Load uniform: 7.969ms 1.494x
Texture2D<R8>.Load linear: 7.993ms 1.490x
Texture2D<R8>.Load random: 7.967ms 1.495x
Texture2D<RG8>.Load uniform: 8.197ms 1.453x
Texture2D<RG8>.Load linear: 8.385ms 1.420x
Texture2D<RG8>.Load random: 8.205ms 1.451x
Texture2D<RGBA8>.Load uniform: 8.318ms 1.432x
Texture2D<RGBA8>.Load linear: 11.926ms 0.999x
Texture2D<RGBA8>.Load random: 16.152ms 0.737x
Texture2D<R16F>.Load uniform: 7.970ms 1.494x
Texture2D<R16F>.Load linear: 7.970ms 1.494x
Texture2D<R16F>.Load random: 7.979ms 1.492x
Texture2D<RG16F>.Load uniform: 7.979ms 1.492x
Texture2D<RG16F>.Load linear: 12.097ms 0.984x
Texture2D<RG16F>.Load random: 16.136ms 0.738x
Texture2D<RGBA16F>.Load uniform: 8.157ms 1.460x
Texture2D<RGBA16F>.Load linear: 21.618ms 0.551x
Texture2D<RGBA16F>.Load random: 31.902ms 0.373x
Texture2D<R32F>.Load uniform: 7.944ms 1.499x
Texture2D<R32F>.Load linear: 12.044ms 0.989x
Texture2D<R32F>.Load random: 16.292ms 0.731x
Texture2D<RG32F>.Load uniform: 7.999ms 1.489x
Texture2D<RG32F>.Load linear: 21.805ms 0.546x
Texture2D<RG32F>.Load random: 31.726ms 0.375x
Texture2D<RGBA32F>.Load uniform: 15.820ms 0.753x
Texture2D<RGBA32F>.Load linear: 32.516ms 0.366x
Texture2D<RGBA32F>.Load random: 31.546ms 0.377x
Texture2D<R8>.Sample(nearest) uniform: 16.020ms 0.743x
Texture2D<R8>.Sample(nearest) linear: 15.839ms 0.752x
Texture2D<R8>.Sample(nearest) random: 16.225ms 0.734x
Texture2D<RG8>.Sample(nearest) uniform: 16.323ms 0.730x
Texture2D<RG8>.Sample(nearest) linear: 15.803ms 0.754x
Texture2D<RG8>.Sample(nearest) random: 15.788ms 0.754x
Texture2D<RGBA8>.Sample(nearest) uniform: 15.974ms 0.745x
Texture2D<RGBA8>.Sample(nearest) linear: 16.169ms 0.736x
Texture2D<RGBA8>.Sample(nearest) random: 16.185ms 0.736x
Texture2D<R16F>.Sample(nearest) uniform: 16.365ms 0.728x
Texture2D<R16F>.Sample(nearest) linear: 16.029ms 0.743x
Texture2D<R16F>.Sample(nearest) random: 15.818ms 0.753x
Texture2D<RG16F>.Sample(nearest) uniform: 15.780ms 0.755x
Texture2D<RG16F>.Sample(nearest) linear: 16.151ms 0.737x
Texture2D<RG16F>.Sample(nearest) random: 15.795ms 0.754x
Texture2D<RGBA16F>.Sample(nearest) uniform: 16.326ms 0.729x
Texture2D<RGBA16F>.Sample(nearest) linear: 16.014ms 0.744x
Texture2D<RGBA16F>.Sample(nearest) random: 31.503ms 0.378x
Texture2D<R32F>.Sample(nearest) uniform: 16.004ms 0.744x
Texture2D<R32F>.Sample(nearest) linear: 15.830ms 0.752x
Texture2D<R32F>.Sample(nearest) random: 16.198ms 0.735x
Texture2D<RG32F>.Sample(nearest) uniform: 15.928ms 0.748x
Texture2D<RG32F>.Sample(nearest) linear: 15.985ms 0.745x
Texture2D<RG32F>.Sample(nearest) random: 31.506ms 0.378x
Texture2D<RGBA32F>.Sample(nearest) uniform: 31.343ms 0.380x
Texture2D<RGBA32F>.Sample(nearest) linear: 31.767ms 0.375x
Texture2D<RGBA32F>.Sample(nearest) random: 31.557ms 0.377x
Texture2D<R8>.Sample(bilinear) uniform: 15.994ms 0.745x
Texture2D<R8>.Sample(bilinear) linear: 16.214ms 0.734x
Texture2D<R8>.Sample(bilinear) random: 15.821ms 0.753x
Texture2D<RG8>.Sample(bilinear) uniform: 15.786ms 0.754x
Texture2D<RG8>.Sample(bilinear) linear: 15.774ms 0.755x
Texture2D<RG8>.Sample(bilinear) random: 15.800ms 0.754x
Texture2D<RGBA8>.Sample(bilinear) uniform: 15.939ms 0.747x
Texture2D<RGBA8>.Sample(bilinear) linear: 15.820ms 0.753x
Texture2D<RGBA8>.Sample(bilinear) random: 15.778ms 0.755x
Texture2D<R16F>.Sample(bilinear) uniform: 15.992ms 0.745x
Texture2D<R16F>.Sample(bilinear) linear: 15.820ms 0.753x
Texture2D<R16F>.Sample(bilinear) random: 15.821ms 0.753x
Texture2D<RG16F>.Sample(bilinear) uniform: 15.756ms 0.756x
Texture2D<RG16F>.Sample(bilinear) linear: 15.796ms 0.754x
Texture2D<RG16F>.Sample(bilinear) random: 15.760ms 0.756x
Texture2D<RGBA16F>.Sample(bilinear) uniform: 15.779ms 0.755x
Texture2D<RGBA16F>.Sample(bilinear) linear: 15.790ms 0.754x
Texture2D<RGBA16F>.Sample(bilinear) random: 31.697ms 0.376x
Texture2D<R32F>.Sample(bilinear) uniform: 15.805ms 0.753x
Texture2D<R32F>.Sample(bilinear) linear: 15.847ms 0.751x
Texture2D<R32F>.Sample(bilinear) random: 15.996ms 0.744x
Texture2D<RG32F>.Sample(bilinear) uniform: 15.761ms 0.756x
Texture2D<RG32F>.Sample(bilinear) linear: 15.770ms 0.755x
Texture2D<RG32F>.Sample(bilinear) random: 31.517ms 0.378x
Texture2D<RGBA32F>.Sample(bilinear) uniform: 62.698ms 0.190x
Texture2D<RGBA32F>.Sample(bilinear) linear: 62.823ms 0.190x
Texture2D<RGBA32F>.Sample(bilinear) random: 93.925ms 0.127x
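
For readers checking the numbers: the multiplier column appears to be the baseline time (Buffer<RGBA8>.Load random, 11.908ms in this run) divided by each row's time, so values above 1.000x are faster than the baseline. A minimal Python sketch that reproduces the column for three rows copied from the log above; the function name `relative_speed` is hypothetical and not part of PerfTest:

```python
# Baseline row from the log above: Buffer<RGBA8>.Load random
BASELINE_MS = 11.908


def relative_speed(time_ms, baseline_ms=BASELINE_MS):
    """Return baseline / time, rounded to 3 decimals like the log's 'x' column."""
    return round(baseline_ms / time_ms, 3)


# Timings copied verbatim from the results above (milliseconds).
rows = {
    "Buffer<R8>.Load uniform": 8.139,
    "StructuredBuffer<float>.Load random": 5.749,
    "cbuffer{float4} load linear": 330.084,
}

for name, ms in rows.items():
    print(f"{name}: {ms:.3f}ms {relative_speed(ms):.3f}x")
```

The printed multipliers match the 1.463x, 2.071x and 0.036x shown in the log, which supports the assumption about how the column is derived.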
oscarbg changed the title from "Ada results (4070)" to "Ada results (4070)!" on Apr 23, 2023
@TravisGesslein

This issue is quite old, but it looks like there might be something off with the test? The cbuffer results look horrendously slow.


oscarbg commented Dec 24, 2023

Thanks for the reminder to test again.
I have updated to the latest 545 drivers (the original run used 530):
one cbuffer change:
before
cbuffer{float4} load uniform: 78.939ms 0.151x
after:
cbuffer{float4} load uniform: 69.179ms 0.172x

EDIT: latest cbuffer results with max OC on this card:
cbuffer{float4} load uniform: 57.818ms 0.188x
cbuffer{float4} load linear: 305.236ms 0.036x
cbuffer{float4} load random: 115.551ms 0.094x


Dolkar commented Jun 17, 2024

I see similar results on a 4070 Super. I also have another observation, though:
If I remove the masking with a runtime constant like so:

//uint elemIdx = (htid + i) | loadConstants.elementsMask;
uint elemIdx = (htid + i);

I get the following results instead:

cbuffer{float4} load uniform: 2.533ms 3.950x
cbuffer{float4} load linear: 273.011ms 0.037x
cbuffer{float4} load random: 100.094ms 0.100x

In other cases, such as structured buffers, the change doesn't seem to make much of a difference.
It seems that, for some reason, Ada struggles with dynamically indexing into the constant buffer here, even if the index is uniform. But when the index ends up as a compile-time constant, the cbuffer outperforms the other buffers again.
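
As a sanity check on the numbers above, multiplying each reported time by its multiplier recovers the baseline the run was compared against. This is a minimal Python sketch, assuming the multiplier is baseline time divided by row time; the helper name `implied_baseline_ms` is hypothetical:

```python
# (access pattern, time in ms, multiplier) copied from the unmasked cbuffer results above.
rows = [
    ("uniform", 2.533, 3.950),
    ("linear", 273.011, 0.037),
    ("random", 100.094, 0.100),
]


def implied_baseline_ms(time_ms, ratio):
    """Recover the baseline time, assuming ratio = baseline / time."""
    return time_ms * ratio


for pattern, time_ms, ratio in rows:
    print(f"cbuffer {pattern}: implied baseline {implied_baseline_ms(time_ms, ratio):.2f}ms")
```

All three rows imply a baseline of roughly 10ms. So the 3.950x on the uniform case reflects a real drop in cbuffer load time, not a shift in the baseline.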
