Releases: philipturner/metal-flash-attention

v1.0.1

28 Jul 21:33

Added fused biases to GEMM.
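
As a point of reference, here is a minimal CPU sketch in Swift of what a fused-bias GEMM computes. The actual work happens inside a single Metal kernel; all names below are hypothetical illustrations, not the library's API:

```swift
// Reference semantics for a fused-bias GEMM: D = A * B + bias,
// where the bias (one value per output column) is added inside the
// same kernel that computes the product, avoiding an extra memory pass.
// Hypothetical names; plain CPU code for clarity.
func gemmWithFusedBias(
  a: [Float], b: [Float], bias: [Float],
  m: Int, n: Int, k: Int
) -> [Float] {
  var d = [Float](repeating: 0, count: m * n)
  for i in 0..<m {
    for j in 0..<n {
      var acc: Float = 0
      for p in 0..<k {
        acc += a[i * k + p] * b[p * n + j]
      }
      // Bias addition fused into the accumulation epilogue.
      d[i * n + j] = acc + bias[j]
    }
  }
  return d
}
```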

v1.0.0

27 Jul 20:03

FlashAttention, dense and block-sparse.

The dense version consistently outperforms MPSGraph by 3-5x. In some edge cases, the speedup grows to 20x. MPSGraph is the modern API that Apple recommends for machine learning workloads on Metal.

The block-sparse version indirectly supports (and accelerates) triangular causal masks, but its work distribution is sub-optimal: it is sometimes 60% faster than the theoretical ceiling of the dense kernel, and sometimes no faster than dense, with nondeterministic performance. In this respect it is comparable to FlashAttention-2 (https://github.com/Dao-AILab/flash-attention).
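
To make the masking concrete, here is a hedged CPU sketch of causal attention in Swift. It illustrates the math, not the library's API; a block-sparse kernel can skip score blocks that lie entirely above the diagonal rather than computing and discarding them:

```swift
import Foundation

// Causal (triangular) attention, as plain CPU code.
// For query row i, keys at positions j > i are masked out.
// Hypothetical names and layout; each input is [seqLen][headDim].
func causalAttention(
  q: [[Float]], k: [[Float]], v: [[Float]]
) -> [[Float]] {
  let seqLen = q.count
  let headDim = q[0].count
  let scale = 1 / Float(headDim).squareRoot()
  var out = [[Float]](repeating: [Float](repeating: 0, count: headDim),
                      count: seqLen)
  for i in 0..<seqLen {
    // Causal mask: only attend to positions j <= i.
    var scores = (0...i).map { j -> Float in
      var dot: Float = 0
      for d in 0..<headDim { dot += q[i][d] * k[j][d] }
      return dot * scale
    }
    // Numerically stable softmax over the unmasked scores.
    let maxScore = scores.max()!
    var sum: Float = 0
    for j in scores.indices {
      scores[j] = exp(scores[j] - maxScore)
      sum += scores[j]
    }
    for j in scores.indices { scores[j] /= sum }
    // Weighted sum of values.
    for j in 0...i {
      for d in 0..<headDim { out[i][d] += scores[j] * v[j][d] }
    }
  }
  return out
}
```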

v0.2.0-alpha

08 Jul 00:29
Pre-release

Added support for fused transposes and batched GEMM.
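
For illustration, here is a minimal CPU sketch of batched GEMM with fused transposes, using hypothetical names rather than the library's API. Fusing the transpose means the kernel reads the operand with swapped indices instead of materializing a transposed copy; batching means one dispatch covers every matrix pair:

```swift
// Reference semantics: for each batch b, D[b] = op(A[b]) * op(B[b]),
// where op optionally transposes. Hypothetical names; CPU code for clarity.
func batchedGEMM(
  a: [Float], b: [Float],
  batch: Int, m: Int, n: Int, k: Int,
  transposeA: Bool, transposeB: Bool
) -> [Float] {
  var d = [Float](repeating: 0, count: batch * m * n)
  for bIdx in 0..<batch {
    let aBase = bIdx * m * k
    let bBase = bIdx * k * n
    let dBase = bIdx * m * n
    for i in 0..<m {
      for j in 0..<n {
        var acc: Float = 0
        for p in 0..<k {
          // Swapped indexing stands in for an explicit transpose pass.
          let aVal = transposeA ? a[aBase + p * m + i] : a[aBase + i * k + p]
          let bVal = transposeB ? b[bBase + j * k + p] : b[bBase + p * n + j]
          acc += aVal * bVal
        }
        d[dBase + i * n + j] = acc
      }
    }
  }
  return d
}
```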

v0.1.0-alpha

06 Jul 17:07
Pre-release

Initial alpha release.