gpt_big_code: make flash attention impl quantization friendly #1282
Conversation
(force-pushed from bb9418e to 31b1591)
Hi @imangohari1, the PR you are referring to fixes 2 particular test cases for a particular model. Which tests exactly do you want me to run?
Sorry, I don't quite understand what you mean in item (1) — which code should I modify to test it? I did manual launches and will provide logs for (2).
Thanks.
Hi @imangohari1, here are the results (my git history during the test):
So it gets better with my change :) In fact I believe this is some noise in the pass rates; the delta consists of these tests, which are not related to my change at all.
(force-pushed from 31b1591 to 745f74d)
Rebased and retested; the results are the same.
@mgonchar
Hi @imangohari1, this change doesn't do much; it basically just wraps the fused kernel call in a separate class. To test it you may try launching
and
In the second case flash attention is not used and my code is not involved in the calculation. To test for possible side effects of this change you may roll back and launch
with and without my commit. Also there is a test covering this. I launched all of it during my manual tests and it was working.
I have tested these commands before and after; I think they are fine. P.S. this PR has been tested with an integration branch and the results looked fine. CI #255
Hi @imangohari1, thanks for having a look. There are no comments from @jiminha in this PR; you've probably mixed it up with another one here. If you've tested it, I think this PR should be good to go to CI and merge. Can you please trigger it?
LGTM!
The PR has been tested both locally and with integration CI #255.
Please run
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
- introduce GaudiGPTBigCodeAttention class
- wrap the FusedSDPA kernel in a separate ModuleFusedSDPA class
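The commit message above is the core of the refactor: moving the fused kernel call behind an `nn.Module` so that module-based quantization tooling can hook or replace it, which a bare function call inside the attention forward would not allow. A minimal sketch of this wrapping idea, using PyTorch's `scaled_dot_product_attention` as a stand-in for the Gaudi fused kernel (the class body is an illustration under that assumption, not the actual optimum-habana implementation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ModuleFusedSDPA(nn.Module):
    """Thin nn.Module wrapper around a fused SDPA kernel call.

    Quantization frameworks generally instrument nn.Module instances,
    so exposing the kernel as a submodule makes it patchable, while a
    plain function call buried in the attention forward is not.
    """

    def __init__(self, fused_sdpa_fn):
        super().__init__()
        # Stand-in for the Habana FusedSDPA kernel (assumption in this sketch).
        self._fused_fn = fused_sdpa_fn

    def forward(self, query, key, value, attn_mask=None, dropout_p=0.0, is_causal=False):
        return self._fused_fn(
            query, key, value,
            attn_mask=attn_mask, dropout_p=dropout_p, is_causal=is_causal,
        )


# Hypothetical usage inside an attention layer's __init__:
#   self.fused_sdpa = ModuleFusedSDPA(<fused kernel>)
sdpa = ModuleFusedSDPA(F.scaled_dot_product_attention)
q = k = v = torch.randn(1, 2, 4, 8)  # (batch, heads, seq_len, head_dim)
out = sdpa(q, k, v, is_causal=True)
assert out.shape == q.shape
```

Because the wrapper is registered as a submodule, quantization passes that walk `model.named_modules()` can swap or annotate it without touching the surrounding attention code.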
(force-pushed from 745f74d to 88ad54e)
@regisss rebased, retested, and style corrected.