FSDP fails with [ lm_head ] layer #332
Comments
Can you provide reproducible code? Liger is working fine with FSDP in https://github.com/linkedin/Liger-Kernel/tree/main/examples/huggingface
Yep, it works fine when training on linear and embedding layers, but not on lm_head. I'll try with the newest commits today.
Nope, still fails:
config.yaml for LLaMA-Factory:
@gotzmann I think this is due to a constraint of FSDP-1. FSDP-2 should resolve the issue, but it is still experimental. Any specific reason for you to split lm_head too?
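If sharding lm_head is not strictly needed, one way to act on that suggestion is to leave it out of FSDP-1's flat-parameter sharding. A minimal sketch, assuming a model with an `lm_head` attribute (the names here are assumptions about the reporter's setup, not code from Liger or LLaMA-Factory):

```python
import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def wrap_without_lm_head(model: torch.nn.Module) -> FSDP:
    """Wrap the model with FSDP-1 but keep lm_head out of sharding.

    Caveat: parameters of ignored modules are not managed or gradient-synced
    by FSDP, so this mainly helps confirm that sharding lm_head is what
    breaks; it is not a complete recipe for training lm_head.
    """
    return FSDP(model, ignored_modules=[model.lm_head])
```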
I believe that's because a standalone call is performed on lm_head. The Embedding layer is fine because, at this point, no Liger kernel or any other code in LLaMA-Factory tries to perform a standalone call on it. However, I know Lightning does an interesting trick (https://github.com/Lightning-AI/pytorch-lightning/blob/d3f9c83d6efa4f1def36aa6c199600946cdb9117/src/lightning/pytorch/strategies/strategy.py#L601-L648) to make sure these kinds of operations work smoothly under FSDP.
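For reference, the Lightning trick linked above boils down to all-gathering the full parameters before making a standalone module call. A rough sketch of the same idea using the public FSDP-1 API (`model` and `lm_head` are assumed names, and this is a sketch of the concept, not Liger or Lightning code):

```python
import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def call_lm_head_standalone(model: FSDP, hidden_states: torch.Tensor) -> torch.Tensor:
    # summon_full_params temporarily all-gathers the sharded flat parameters
    # back into their original shapes, so a direct call on lm_head sees a
    # 2-D (vocab_size, hidden_size) weight instead of a 1-D flat shard.
    # Note: summon_full_params is intended for use outside of forward/backward,
    # so this only illustrates the gather-before-call idea, not a drop-in fix.
    with FSDP.summon_full_params(model, writeback=False):
        return model.lm_head(hidden_states)
```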
🐛 Describe the bug
I'm trying to train a LLaMA model with all linear layers plus embeddings and the head.
While embeddings have no problems with FSDP over Liger, there are always exceptions when [ lm_head ] is added.
I've tried different versions and the latest not-yet-merged patches, but I still get the error:
RuntimeError: size mismatch, got input (2), mat (2x4096), vec (65667072)
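A guess at what that traceback is saying (not traced through Liger's code): under FSDP-1 the wrapped module's weights live in a 1-D flat parameter shard, so code that uses lm_head's weight directly ends up doing a matrix-vector product against a flat vector of 65667072 elements instead of a 2-D (vocab_size, 4096) matrix. Notably, 65667072 equals 128256 × 4096 / 8, consistent with an 8-way shard of a 128256 × 4096 lm_head weight. The shapes below are taken from the error message; the call itself is only illustrative:

```python
import torch

hidden = torch.randn(2, 4096)          # "mat (2x4096)" in the error
flat_shard = torch.empty(65_667_072)   # "vec (65667072)": a 1-D flattened shard

# Multiplying against the flat 1-D parameter instead of a gathered/reshaped
# (vocab_size, 4096) weight raises a size-mismatch error of the same shape.
try:
    torch.matmul(hidden, flat_shard)
except RuntimeError as e:
    print(e)  # size mismatch between the (2, 4096) input and the 1-D shard
```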
Reproduce
accelerate launch --config_file fsdp.yaml src/train.py sft.yaml
Versions
v3.1 and others, too