Accelerate v1.2.1 Causes Consistent Errors #2215
Labels
bug
Something isn't working
Comments
Spoke too soon. I eventually ran into the same error again.
Please check that this issue hasn't been reported before.
Expected Behavior
Fine-tuning Qwen 2.5 14B Instruct should run without errors.
Current behaviour
I was consistently getting errors across multiple cloud GPU providers (AWS, RunPod, Lambda) when fine-tuning with axolotl v0.6.0, which uses accelerate v1.2.1. Reverting to accelerate v1.1.0 immediately resolved the issue.
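For anyone who wants to check whether their environment matches this report, here is a minimal sketch. It assumes only the two version numbers mentioned above (1.2.1 failing, 1.1.0 working). The function name `check_accelerate` is hypothetical and is not part of axolotl or accelerate.

```python
from importlib import metadata

# Versions taken from this report: 1.2.1 showed the errors, 1.1.0 did not.
AFFECTED = "1.2.1"
KNOWN_GOOD = "1.1.0"


def check_accelerate(installed=None):
    """Return a short status message for the installed accelerate version.

    `installed` can be passed explicitly for testing; otherwise the
    version is read from the current environment's package metadata.
    """
    if installed is None:
        try:
            installed = metadata.version("accelerate")
        except metadata.PackageNotFoundError:
            return "accelerate is not installed"
    if installed == AFFECTED:
        return (
            f"accelerate {installed} matches the reported failing version; "
            f"try: pip install accelerate=={KNOWN_GOOD}"
        )
    return f"accelerate {installed} is not the version reported as failing"


if __name__ == "__main__":
    print(check_accelerate())
```

This only compares version strings. It does not diagnose the underlying error, and other versions may or may not be affected.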
Errors typically looked like this:
Steps to reproduce
Use my config YAML with any training data on 8xH100 or 8xA100.
Config yaml
Possible solution
No response
Which Operating Systems are you using?
Python Version
3.12
axolotl branch-commit
main/3742deb
Acknowledgements