Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[wip] MSL: Fix dynamically indexed pull interpolants #2364

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

ncesario-lunarg
Copy link

This is a first attempt at fixing the test failures mentioned in #1796 by copying scalarized/flattened arrays back into arrays in the fragment shader for each "unique" interpolate opration. This approach has as number of drawbacks such as negative impacts to runtime performance and shader size, as well as having limited coverage for dynamic offsets (InterpolateAtOffset) and samples (InterpolateAtSample).

This change currently fixes all the CTS tests mentioned in #1796 except dEQP-VK.pipeline.monolithic.multisample_interpolation.nonuniform_interpolant_indexing.offset, which requires some additional work (or running spirv-opt -O on the spirv before running spirv-cross also works), however, I'd like to get an idea of whether or not this approach would be acceptable in spirv-cross or an alternative is preferable.

This is a first attempt at fixing the test failures mentioned in KhronosGroup#1796
by copying scalarized/flattened arrays back into arrays in the fragment
shader for each "unique" interpolate opration. This approach has as
number of drawbacks such as negative impacts to runtime performance and
shader size, as well as having limited coverage for dynamic offsets
(InterpolateAtOffset) and samples (InterpolateAtSample).
@billhollings
Copy link
Contributor

billhollings commented Aug 15, 2024

FWIW...

The original 24 CTS failures in #1796 have already been fixed.

The 3 additional pull-model interpolation CTS failures that @aitor-lunarg added to #1796, are still occurring, but report a different error from the original (see #1796-comment).

So, the fixes here should just address pull-model interpolation, which I see it mostly does, but any extra aimed at the original 24 errors, would not be needed. However, I have not reviewed the code here in enough detail to confirm any unnecessary code.

@HansKristian-Work
Copy link
Contributor

It's very hard to reason from this PR what is being solved exactly. A solution that doesn't solve interpolateAtOffset and Samples is not good enough I think.

Do you have an example GLSL shader that demonstrates the problem? I barely remember what the pull model MSL stuff is and need a refresher.

@cdavis5e
Copy link
Contributor

I barely remember what the pull model MSL stuff is and need a refresher.

In Metal, pull-model interpolants use a special wrapper template, interpolant<T>. The various interpolations are then obtained using methods called on the interpolant object. The interpolant object itself cannot be used directly--all accesses must go through one of the interpolant accessor methods.

As I recall, the problem is that Metal only allows certain types in a stage_in struct; interpolant<T> is one of those types, but an array of interpolant<T> is not. I don't know if this is still true.

@ncesario-lunarg
Copy link
Author

ncesario-lunarg commented Aug 20, 2024

A solution that doesn't solve interpolateAtOffset and Samples is not good enough I think.

I agree. This particular change is trying to address the CTS failures, and if this is the correct approach there should be a follow up change to make it more general.

Do you have an example GLSL shader that demonstrates the problem?

The fragment shader from external/vulkancts/data/vulkan/amber/pipeline/nonuniform_interpolant_indexing/centroid.amber is an example of one of the CTS tests hitting this. In short, something like the following:

#version 450

layout(location = 0) sample in vec2 inVar[2];
layout(location = 0) out vec4 fs_out_color;

void main (void)
{
    int i = int(gl_FragCoord.x) % 2;

    // This is where spirv-cross will throw at https://github.com/KhronosGroup/SPIRV-Cross/blob/d33a39045dd8f5b87eadc3d4cc88b32918d1c41a/spirv_msl.cpp#L8395 due to i not being a constant.
    const float x = interpolateAtCentroid(inVar[i].x);

    fs_out_color = vec4(x, 0, 0, 1);
}

reproduces the issue. Replacing interpolateAtCentroid with any of the other interpolateAt* functions will have the same effect. With this proposed change, the resulting metal shader would be something like:

// ...

struct main0_in
{
    interpolant<float2, interpolation::perspective> inVar_0 [[user(locn0)]];
    interpolant<float2, interpolation::perspective> inVar_1 [[user(locn1)]];
};

fragment main0_out main0(main0_in in [[stage_in]], uint gl_SampleID [[sample_id]], float4 gl_FragCoord [[position]])
{
    main0_out out = {};
    spvUnsafeArray<float2, 2> inVar = {};

    // One new array per interpolate function times unique value parameter passed to that function.
    // This would obviously increase the shader size by quite a bit, though realistically perhaps not very
    // much after optimization. This could also be done via setting up a large "if/else" branch; I'm not
    // sure what's more amenable to metal.
    spvUnsafeArray<float2, 2> inVar_centroid_108;
    inVar[0] = in.inVar_0.interpolate_at_sample(gl_SampleID);
    inVar_centroid_108[0] = in.inVar_0.interpolate_at_centroid();
    inVar[1] = in.inVar_1.interpolate_at_sample(gl_SampleID);
    inVar_centroid_108[1] = in.inVar_1.interpolate_at_centroid();

    gl_FragCoord.xy += get_sample_position(gl_SampleID) - 0.5;
    int i = int(gl_FragCoord.x) % 2;

    // This is the "solution"
    float x = inVar2_centroid_108[i].x;

    out.fs_out_color += float4(x, 0.0, 0.0, 1.0);
    return out;
}

This change will work for the interpolateAt{Sample,Offset} cases, but the reason I say "limited coverage" for those cases is if we have something like:

// ...
float acc = 0.0;
for (int i = 0; i < 4; ++i) {
    acc += interpolateAtSample(inVar[i].x, i);
}
// ...

This is perhaps doable with loop unrolling, but if it's not possible to unroll the loop for some reason, I'm not sure how to approach it.

@HansKristian-Work
Copy link
Contributor

It seems like arrays should work, e.g. this compiles:

#pragma clang diagnostic ignored "-Wmissing-prototypes"

#include <metal_stdlib>
#include <simd/simd.h>

using namespace metal;

struct main0_out
{
    float4 FragColor [[color(0)]];
};

struct main0_in
{
    interpolant<float4, interpolation::perspective> Attr [[user(locn0)]];
    interpolant<float4, interpolation::perspective> Attr2 [[user(locn1)]];
	uint index [[user(locn2)]];
};

static inline __attribute__((always_inline))
float4 func(thread main0_in& in)
{
	thread interpolant<float4, interpolation::perspective> *Attrs[] = { &in.Attr, &in.Attr2 };
    return Attrs[in.index]->interpolate_at_sample(0);
}

fragment main0_out main0(main0_in in [[stage_in]])
{
    main0_out out = {};
    out.FragColor = func(in);
    return out;
}
define <{ <4 x float> }> @main0(%struct._interpolant_t addrspace(1)* %0, %struct._interpolant_t addrspace(1)* %1, i32 %2) local_unnamed_addr #0 {
  %4 = alloca [2 x %"struct.metal::interpolant"*], align 8
  %5 = alloca %struct.main0_in, align 8
  %6 = getelementptr inbounds %struct.main0_in, %struct.main0_in* %5, i64 0, i32 0, i32 0
  store %struct._interpolant_t addrspace(1)* %0, %struct._interpolant_t addrspace(1)** %6, align 8, !tbaa !20
  %7 = getelementptr inbounds %struct.main0_in, %struct.main0_in* %5, i64 0, i32 1, i32 0
  store %struct._interpolant_t addrspace(1)* %1, %struct._interpolant_t addrspace(1)** %7, align 8, !tbaa !27
  %8 = bitcast [2 x %"struct.metal::interpolant"*]* %4 to i8*
  call void @llvm.lifetime.start.p0i8(i64 16, i8* nonnull %8) #3
  %9 = getelementptr inbounds [2 x %"struct.metal::interpolant"*], [2 x %"struct.metal::interpolant"*]* %4, i64 0, i64 0
  %10 = getelementptr inbounds %struct.main0_in, %struct.main0_in* %5, i64 0, i32 0
  store %"struct.metal::interpolant"* %10, %"struct.metal::interpolant"** %9, align 8, !tbaa !28
  %11 = getelementptr inbounds [2 x %"struct.metal::interpolant"*], [2 x %"struct.metal::interpolant"*]* %4, i64 0, i64 1
  %12 = getelementptr inbounds %struct.main0_in, %struct.main0_in* %5, i64 0, i32 1
  store %"struct.metal::interpolant"* %12, %"struct.metal::interpolant"** %11, align 8, !tbaa !28
  %13 = zext i32 %2 to i64
  %14 = getelementptr inbounds [2 x %"struct.metal::interpolant"*], [2 x %"struct.metal::interpolant"*]* %4, i64 0, i64 %13
  %15 = load %"struct.metal::interpolant"*, %"struct.metal::interpolant"** %14, align 8, !tbaa !28
  %16 = getelementptr inbounds %"struct.metal::interpolant", %"struct.metal::interpolant"* %15, i64 0, i32 0
  %17 = load %struct._interpolant_t addrspace(1)*, %struct._interpolant_t addrspace(1)** %16, align 8, !tbaa !30
  %18 = call fast <4 x float> @air.interpolate_sample_perspective.v4f32(%struct._interpolant_t addrspace(1)* nocapture readonly %17, i32 0) #2
  call void @llvm.lifetime.end.p0i8(i64 16, i8* nonnull %8) #3
  %19 = insertvalue <{ <4 x float> }> undef, <4 x float> %18, 0
  ret <{ <4 x float> }> %19
}

@HansKristian-Work
Copy link
Contributor

If the array can be placed in a wrapper struct that resolves the pointer stuff, this sounds like the way to go. I'm not overly concerned with performance. This is an extremely esoteric feature that only has to work, not be optimal.

@ncesario-lunarg
Copy link
Author

It seems like arrays should work

Yep, that works and makes this so much easier; I should have tried that in metal first :/ I think I was making some erroneous assumptions based on

SPIRV-Cross/spirv_msl.cpp

Lines 8395 to 8396 in d33a390

SPIRV_CROSS_THROW("Trying to dynamically index into an array interface variable in tessellation. "
"This is currently unsupported.");
.

I've shifted to working on a different issue, but will try to find some cycles to wrap this up next week time permitting.

Thank you @HansKristian-Work!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants