-
Notifications
You must be signed in to change notification settings - Fork 641
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix for Pascal NaN redux #408
base: main
Are you sure you want to change the base?
Conversation
Force push over-rode but it isn't fixed.
Doesn't seem to work on my Tesla P40 card, unless oogabooga webui has some other underlying issue as well? I built your patch-1 repository by running |
Nevermind, its a bnb issue. I found the 8bit_test.py from that issue you linked and get the same |
The half vs float is something else. I tested this when doing inference from said webui. You built for all arch, I don't think it's default. 0cc4m's script tests HW matmul first and that SHOULD fail.. I will give it a try and see what happens. This script, you mean? https://gist.github.com/0cc4m/a753b6a16a618cfbe747a74920dc50f6 Reading it, it also patches BnB.. that and is for a much previous version. |
I did some testing. Load the model: Inference: Output generated in 17.68 seconds (2.60 tokens/s, 46 tokens, context 71, seed 572183632) Training: INFO:Loading raw text file dataset... It's slightly faster using the adamw_bnb_8bit optimizer. On this gen of card it will never be super great due to the lack of HW matmul... but hey, us and the $3k V100 people are in the same boat. 💯 |
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. |
One change is good, the other would be a degradation in speed. Lets discuss how to fix this while maintaining speed for other GPUs. |
Haven't really compared this recently with the new codebase. In one of the issues a commenter said that only the first change was required. |
@Ph0rk0z I would love to make this PR actionable somehow but somehow I'm still struggling to understand what Tim means. According to him one of the changes is good to go and the other "that adds lines which do 16 bit computation by casting the entire matrix to 16 bit that is more inefficient in many cases" needs improvement. Do you know what exactly he means and is this something we could wrap up together? Maybe we can already merge the change that's good to go and handle the other one separately?
|
I think I understand the change in The change in forward is at least limited in surface area to GPUs from Volta and older, so I suspect this is the part that is "good." Has anyone run the unit tests with these changes? |
Can be tried with just the fwd change to see if it still NaNs. I think people were saying it worked. I basically moved to GPTQ/GGUF and this languished a while so haven't been paying attention and re-testing. My bad. Sat so long I didn't think it would be accepted. |
I removed backwards pass so people can try it. Haven't had time to test on my machine yet, I'm down to my P6000 and P100 here. |
9b72679
to
7800734
Compare
Force push over-rode but it isn't fixed.
I have tried this and it works. Please test it on your other cards.
All credit goes to: richardwth
@richardwth
fixes: #165