Add ops.nn.dot_product_attention
#20286
Merged
I haven't checked the performance yet, but I believe adding this operation will be beneficial since both torch and jax have optimized implementations in their codebases.
From a performance perspective, we should be able to replace `_compute_attention` in `MultiHeadAttention` with this op if the input shapes are strictly 4D.

EDITED:
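For context, a minimal numpy sketch of the computation such an op performs on strictly 4D inputs. The `(batch, seq_len, num_heads, head_dim)` layout follows `jax.nn.dot_product_attention`; the shapes and variable names here are illustrative assumptions, not code from this PR:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical 4D inputs: (batch, seq_len, num_heads, head_dim).
B, T, N, H = 2, 5, 4, 8
rng = np.random.default_rng(0)
q = rng.standard_normal((B, T, N, H))
k = rng.standard_normal((B, T, N, H))
v = rng.standard_normal((B, T, N, H))

# Attention scores per head: (batch, num_heads, q_len, kv_len).
scores = np.einsum("btnh,bsnh->bnts", q, k) / np.sqrt(H)
# Weighted sum of values, back to (batch, seq_len, num_heads, head_dim).
out = np.einsum("bnts,bsnh->btnh", softmax(scores), v)
assert out.shape == (B, T, N, H)
```

With 4D inputs like these, the per-head loop inside `_compute_attention` collapses into the two `einsum` calls above, which is what makes delegating to a fused backend op attractive.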
It seems that CI is using `jax<=0.4.30` due to a python version limitation. I implemented a pure numpy version of `dot_product_attention` for the unit tests.

For backends:

- `jax.nn.dot_product_attention` if available; otherwise, adapts the impl from `jax==0.4.33`
- `jax==0.4.31` (no customizable `vmap`)
- `jax==0.4.31` (no customizable `vmap`)
- `torch.nn.functional.scaled_dot_product_attention`
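As a rough illustration, a pure numpy reference for the unit tests might look like the sketch below. The function name, the `bias`/`mask`/`is_causal` handling, and the large negative fill value are assumptions for illustration, not the actual test helper from this PR:

```python
import numpy as np

def np_dot_product_attention(query, key, value, bias=None, mask=None,
                             scale=None, is_causal=False):
    """Pure-numpy reference; inputs are (batch, seq, num_heads, head_dim)."""
    head_dim = query.shape[-1]
    scale = (1.0 / np.sqrt(head_dim)) if scale is None else scale
    # Logits per head: (batch, num_heads, q_len, kv_len).
    logits = np.einsum("btnh,bsnh->bnts", query, key) * scale
    if bias is not None:
        logits = logits + bias
    if is_causal:
        t, s = logits.shape[-2:]
        causal = np.tril(np.ones((t, s), dtype=bool))
        logits = np.where(causal, logits, -1e30)
    if mask is not None:
        logits = np.where(mask, logits, -1e30)
    # Stable softmax over the key axis.
    logits = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(logits)
    probs = probs / probs.sum(axis=-1, keepdims=True)
    return np.einsum("bnts,bsnh->btnh", probs, value)

# Self-attention with a causal mask on a tiny 4D input.
q = np.random.default_rng(0).standard_normal((1, 4, 2, 8))
out = np_dot_product_attention(q, q, q, is_causal=True)
```

A reference like this lets the unit tests compare every backend against the same ground truth, regardless of which jax version is available in CI.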