Sparse attention support #76

neverix · 2021-12-04T20:14:56Z

Currently, the inference code builds the entire attention matrix and then masks it. Sparse attention implementations, such as Triton-based block-sparse kernels, are more efficient because they never compute the masked-out entries. Does the pre-training code support sparse attention? Will it ever be released?
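To illustrate the difference, here is a minimal pure-Python sketch, not the repository's code and not a Triton kernel. The causal local-window pattern in `allowed` and the function names `dense_masked_attention` and `sparse_attention` are assumptions made up for this example; the actual mask used by the model may differ.

```python
import math
import random

def softmax(xs):
    # Numerically stable softmax; exp(-inf) evaluates to 0.0 for masked entries.
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    total = sum(es)
    return [e / total for e in es]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def allowed(i, j, window):
    # Hypothetical sparsity pattern: query i sees itself and the
    # previous (window - 1) positions.
    return 0 <= i - j < window

def dense_masked_attention(q, k, v, window):
    # Computes all n * n scores, then masks the disallowed ones.
    n, scale = len(q), 1.0 / math.sqrt(len(q[0]))
    out = []
    for i in range(n):
        scores = [dot(q[i], k[j]) * scale for j in range(n)]
        scores = [s if allowed(i, j, window) else -math.inf
                  for j, s in enumerate(scores)]
        w = softmax(scores)
        out.append([sum(w[j] * v[j][c] for j in range(n))
                    for c in range(len(v[0]))])
    return out

def sparse_attention(q, k, v, window):
    # Computes only the scores that the mask would keep.
    n, scale = len(q), 1.0 / math.sqrt(len(q[0]))
    out = []
    for i in range(n):
        cols = range(max(0, i - window + 1), i + 1)
        w = softmax([dot(q[i], k[j]) * scale for j in cols])
        out.append([sum(wj * v[j][c] for wj, j in zip(w, cols))
                    for c in range(len(v[0]))])
    return out

random.seed(0)
n, d, window = 8, 4, 3
q, k, v = ([[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
           for _ in range(3))
dense = dense_masked_attention(q, k, v, window)
sparse = sparse_attention(q, k, v, window)
```

Both functions should produce the same output up to floating-point rounding. With `n = 8` and `window = 3`, the dense version evaluates 64 scores while the sparse version evaluates 21, and the gap grows with sequence length.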
