Consider removing or redesigning the triangular operator #768
Comments
So CoreML's band_part appears to lack the shift offset parameter found in other APIs...
...and unfortunately band_part has a very special meaning (higher level policy) for
Inspired by the existing APIs above.
It's used in Stable Diffusion's text encoder, though the number of occurrences within a single model file (SD CLIP text encoder) is small: just one!
DML_DIAGONAL_MATRIX1_OPERATOR_DESC supports it directly.
The fallback is really for older DML versions (which shouldn't matter anymore after framework packages complete and the newest version is always available), but I see that 4D branch you're talking about (maybe I should extend DIAGONAL_MATRIX1 for the >4D case either way).
Unfortunately the primary motivating case (SD CLIP text encoder) excludes the diagonal line (k=1), and so that would not be sufficient.

    "input": [[0, 1, 2, 3],
              [4, 5, 6, 7],
              [8, 9,10,11]],
    "k": 1,
    "output": [[0, 1, 2, 3],
               [0, 0, 6, 7],
               [0, 0, 0,11]],

So, given the backend inconsistencies, rarity of occurrence in the model, and inability of CoreML to implement it directly, maybe this operator should be migrated to an aggregate operator (that functional concept Ningxin presented at TPAC2024), or decomposed when incompatible. I thought up this possible higher-level decomposition:

    tallTensor = iota(tallShape, {start:0, step: 1}); // sizes = [10,1], broadcast later in add
    wideTensor = iota(wideShape, {start:0, step:-1}); // sizes = [1,10], broadcast later in add
    // To select the lower triangle including diagonal, set shiftDelta = 0
    // To select the lower triangle excluding diagonal, set shiftDelta = 1
    // To select the upper triangle including diagonal, use lesserOrEqual and shiftDelta = 0.
    booleanMask = greaterOrEqual(add(tallTensor, wideTensor), shiftDelta);
    output = /*select*/where(booleanMask, input, 0.0);
    // iota is a simple helper function like std::iota, filling a sequence and returning a reshaped tensor.
    // It could be implemented by uploading a constant sequence generated CPU-side, or generated
    // on-the-fly via cumulativeSum.

[Figure: the broadcasted sum used to generate the comparison mask]

Interested to hear @huningxin's thoughts too.
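The decomposition can be checked against the k=1 example with a small plain-Python sketch. It is only an illustration under assumptions: `iota` and `triangular_via_mask` are hypothetical helper names, nested lists stand in for tensors, and using a shift of -k for both triangles is one reading of the shiftDelta comments rather than anything specified.

```python
def iota(count, start=0, step=1):
    """Return [start, start + step, ...] with `count` entries (like std::iota)."""
    return [start + step * n for n in range(count)]


def index_sum(rows, cols):
    """Broadcasted add of a tall [rows, 1] and a wide [1, cols] sequence: entry [i][j] == i - j."""
    tall = iota(rows, start=0, step=1)    # row indices
    wide = iota(cols, start=0, step=-1)   # negated column indices
    return [[tall[i] + wide[j] for j in range(cols)] for i in range(rows)]


def triangular_via_mask(matrix, upper=True, k=0):
    """Emulate triangular on a 2D list with a comparison mask (assumed mapping: shift = -k)."""
    sums = index_sum(len(matrix), len(matrix[0]))
    if upper:
        # Keep elements on or above the k-th diagonal: j - i >= k, i.e. i - j <= -k.
        mask = [[s <= -k for s in row] for row in sums]
    else:
        # Keep elements on or below the k-th diagonal: j - i <= k, i.e. i - j >= -k.
        mask = [[s >= -k for s in row] for row in sums]
    # where(mask, input, 0)
    return [[v if keep else 0 for v, keep in zip(values, keeps)]
            for values, keeps in zip(matrix, mask)]


example = [[0, 1, 2, 3],
           [4, 5, 6, 7],
           [8, 9, 10, 11]]

for row in index_sum(3, 4):
    print(row)
print(triangular_via_mask(example, upper=True, k=1))
```

For the 3x4 example the broadcasted sum is `[[0, -1, -2, -3], [1, 0, -1, -2], [2, 1, 0, -1]]`, and comparing it with `<= -1` keeps exactly the elements strictly above the main diagonal.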
Thanks for the investigation, @fdwr!
I'm sure you'll be shocked to hear that I'm receptive to removing explicit support for this operator in WebNN :) I'd like to clarify your use of the word "migrated", though. Are you suggesting that removal of this operator should be blocked on implementation of the "aggregate operator" proposal? Given the lack of cross-platform support and the straightforward decomposition (which browser backends are doing some form of today anyways!) I personally don't think we should block on that 🤔
I was looking into implementing the triangular operator in Chromium's CoreML backend when I realized... all three of Chromium's WebNN backends have to emulate the triangular operator using a mask!

- DirectML can use the DML_DIAGONAL_MATRIX1_OPERATOR_DESC operator in some circumstances, but otherwise must fall back to creating a mask
- CoreML can use the band_part operator in some circumstances, but otherwise must fall back to creating a mask

Notably, CoreML's band_part operator is similar to DirectML's DML_DIAGONAL_MATRIX1_OPERATOR_DESC operator, except it's not capable of handling cases where the main diagonal is excluded from the mask. See https://crbug.com/374127244 for details.

triangular was added in #478. Why was this specific behavior chosen? Did we consider alternative designs, especially since the operator as specified doesn't naively map to operators provided by any of the backends we're prototyping with?

What are the use cases we care about? Would an alternative design that aligns more closely with band_part and DML_DIAGONAL_MATRIX1_OPERATOR_DESC be sufficient for those use cases? For example, we could consider replacing triangular with a diagonal (naming TBD) operator which starts from the main diagonal and offers to include additional diagonals in the upper and lower triangles. This would trivially map to the aforementioned CoreML and DirectML operators.
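To make the limitation concrete, here is a minimal Python sketch of a band-style operator that starts from the main diagonal and optionally includes extra diagonals on each side. The helper name, its signature, and the convention that a negative count keeps the whole triangle on that side are assumptions modeled on band_part-style semantics, not the actual CoreML or DirectML API.

```python
def band_part(matrix, lower, upper):
    """Hypothetical helper: keep `lower` sub-diagonals and `upper` super-diagonals.

    Assumed convention: a negative count keeps the entire triangle on that side.
    """
    def in_band(i, j):
        keep_lower = lower < 0 or (i - j) <= lower
        keep_upper = upper < 0 or (j - i) <= upper
        return keep_lower and keep_upper

    return [[value if in_band(i, j) else 0 for j, value in enumerate(row)]
            for i, row in enumerate(matrix)]


square = [[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]

# Upper triangle including the main diagonal (triangular with upper=true, k=0).
print(band_part(square, 0, -1))
# Main diagonal only.
print(band_part(square, 0, 0))
```

Under these assumed semantics the main diagonal is always inside the band, since `i - j == 0` passes both checks for every choice of `lower` and `upper`. That is why a case that excludes the diagonal (k=1) cannot be expressed this way and still needs a mask.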