
gpt_big_code: make flash attention impl quantization friendly #1282

Merged

Commits on Sep 25, 2024

  1. gpt_big_code: make flash attention impl quantization friendly

    - introduce GaudiGPTBigCodeAttention class
    - wrap the FusedSDPA kernel in a separate ModuleFusedSDPA class (a sketch of this pattern follows the commit entry below)
    mgonchar committed Sep 25, 2024 · commit 88ad54e
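
For context, here is a minimal sketch of the pattern the commit describes: the fused scaled-dot-product-attention call is pulled out of the attention forward pass and exposed as its own nn.Module, so quantization tooling that hooks or replaces nn.Module instances can intercept it. The class and attribute names follow the commit message, but the signatures and the fallback to PyTorch's scaled_dot_product_attention are illustrative assumptions, not the actual optimum-habana implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ModuleFusedSDPA(nn.Module):
    """Thin nn.Module wrapper around a functional fused-SDPA kernel.

    Exposing the kernel as a submodule (instead of calling it directly
    inside the attention forward) lets quantization tools, which typically
    hook or patch nn.Module instances, observe and replace the call.
    `fused_sdpa_kernel` stands in for a real fused kernel (e.g. Habana's
    FusedSDPA); when it is absent we fall back to PyTorch's SDPA.
    """

    def __init__(self, fused_sdpa_kernel=None):
        super().__init__()
        self._kernel = fused_sdpa_kernel

    def forward(self, query, key, value, attn_mask=None, dropout_p=0.0, is_causal=False):
        if self._kernel is not None:
            # A real fused kernel would be invoked here.
            return self._kernel(query, key, value, attn_mask, dropout_p, is_causal)
        return F.scaled_dot_product_attention(
            query, key, value, attn_mask=attn_mask, dropout_p=dropout_p, is_causal=is_causal
        )


class GaudiGPTBigCodeAttention(nn.Module):
    """Illustrative attention block that owns the SDPA wrapper as a submodule."""

    def __init__(self, embed_dim, num_heads):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = embed_dim // num_heads
        self.c_attn = nn.Linear(embed_dim, 3 * embed_dim)
        self.c_proj = nn.Linear(embed_dim, embed_dim)
        # Quantization frameworks can now target this attribute like any
        # other submodule of the model.
        self.fused_scaled_dot_product_attention = ModuleFusedSDPA()

    def forward(self, hidden_states):
        bsz, seq_len, _ = hidden_states.shape
        q, k, v = self.c_attn(hidden_states).chunk(3, dim=-1)
        q = q.view(bsz, seq_len, self.num_heads, self.head_dim).transpose(1, 2)
        k = k.view(bsz, seq_len, self.num_heads, self.head_dim).transpose(1, 2)
        v = v.view(bsz, seq_len, self.num_heads, self.head_dim).transpose(1, 2)
        attn = self.fused_scaled_dot_product_attention(q, k, v, is_causal=True)
        attn = attn.transpose(1, 2).reshape(bsz, seq_len, -1)
        return self.c_proj(attn)
```

The design point is simply that a bare function call inside `forward` is invisible to module-level hooks, whereas a dedicated submodule can be swapped or instrumented by quantization frameworks without touching the attention code itself.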