Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider removing or redesigning the triangular operator #768

Open
a-sully opened this issue Oct 18, 2024 · 2 comments
Open

Consider removing or redesigning the triangular operator #768

a-sully opened this issue Oct 18, 2024 · 2 comments

Comments

@a-sully
Copy link
Contributor

a-sully commented Oct 18, 2024

I was looking into implementing the triangular operator in Chromium's CoreML backend when I realized... all three of Chromium's WebNN backends have to emulate the triangular operator using a mask!

Backend Strategy
DirectML May use the DML_DIAGONAL_MATRIX1_OPERATOR_DESC operator in some circumstances, but otherwise must fall back to creating a mask
TFLite Always creates a(n unoptimized, for now) mask
CoreML May use the band_part operator in some circumstances, but otherwise must fall back to creating a mask

Notably, CoreML's band_part operator is similar to DirectML's DML_DIAGONAL_MATRIX1_OPERATOR_DESC operator, except it's not capable of handling cases where the main diagonal is excluded from the mask. See https://crbug.com/374127244 for details.

triangular was added as part of in #478. Why was this specific behavior chosen? Did we consider alternative designs, especially since the operator as specified doesn't naively map to operators provided by any of the backends we're prototyping with?

What are the use cases we care about? Would an alternative design that aligns more closely with band_part and DML_DIAGONAL_MATRIX1_OPERATOR_DESC be sufficient for those use cases? For example, we could consider replacing triangular with a diagonal (naming TBD) operator which starts from the main diagonal and offers to include additional diagonals in the upper and lower triangles. This would trivially map to the aforementioned CoreML and DirectML operators

@fdwr
Copy link
Collaborator

fdwr commented Oct 25, 2024

So CoreML's band_part appears to miss the shift offset parameter, found in other API's...

...,and unfortunately band_part has a very special meaning (higher level policy) for -1 in the lower and upper offsets rather than just allowing a free form positive/negative range which would permit range shifting. 🤔 The limitation that the diagonal can't be excluded is the clincher.

Why was this specific behavior chosen?

Inspired by the existing API's above.

What are the use cases we care about?

It's used in Stable Diffusion's text encoder:

image

Though, the number of occurrences within a single model file (SD CLIP text encoder) is few - just one!

... since the operator as specified doesn't naively map to operators provided by any of the backends we're prototyping with? ...

DML_DIAGONAL_MATRIX1_OPERATOR_DESC supports it directly.

May use the DML_DIAGONAL_MATRIX1_OPERATOR_DESC operator in some circumstances, but otherwise must fall back to creating a mask...

The fallback is really for older DML versions (which shouldn't matter anymore after framework packages complete and the newest version is always available), but I see that 4D branch you're talking about (maybe I should extend DIAGONAL_MATRIX1 for the >4D case either way).

operator which starts from the main diagonal and offers to include additional diagonals in the upper and lower triangles

Unfortunately the primary motivating case (SD CLIP text encoder) excludes the diagonal line (k=1), and so that would not be sufficient.

      "input": [[0, 1, 2, 3],
                [4, 5, 6, 7],
                [8, 9,10,11]],
      "k": 1,
      "output": [[0, 1, 2, 3],
                 [0, 0, 6, 7],
                 [0, 0, 0,11]],

So, given the backend inconsistencies, rarity of occurrence in the model, and inability of CoreML to implement it directly, maybe this operator should be migrated to an aggregate operator (that functional concept Ningxin presented at TPAC2024), or decomposed when incompatible. I thought up this possible higher-level decomposition:

tallTensor = iota(tallShape, {start:0, step: 1}); // sizes = [10,1], broadcast later in add
wideTensor = iota(wideShape, {start:0, step:-1}); // sizes = [1,10], broadcast later in add
// To select the lower triangle including diagonal, set shiftDelta = 0
// To select the lower triangle excluding diagonal, set shiftDelta = 1
// To select the upper triangle including diagonal, use lesserOrEqual and shiftDelta = 0.
booleanMask = greaterOrEqual(add(tallTensor, wideTensor), shiftDelta);
output = /*select*/where(booleanMask, input, 0.0);

// iota is a simple helper function like std::iota, filling a sequence and returning a reshaped tensor.
// It could be implemented by uploading a constant sequence generated CPU-side, or generated
// on-the-fly via cumulativeSum.

Visualizing the broadcasted sum to generate a comparison mask:
image

Interested to hear @huningxin's thoughts too.

@a-sully
Copy link
Contributor Author

a-sully commented Oct 28, 2024

Thanks for the investigation, @fdwr!

maybe this operator should be migrated to an aggregate operator

I'm sure you'll be shocked to hear that I'm receptive to removing explicit support for this operator in WebNN :)

I'd like to clarify the your use of the word "migrated", though. Are you suggesting that removal of this operator should be blocked on implementation of the "aggregate operator" proposal?

Given the lack of cross-platform support and the straightforward decomposition (which browser backends are doing some form of today anyways!) I personally don't think we should block on that 🤔

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants