
Support for Sequence / Context Parallelism #1972

Open
5 tasks done
dwzhu-pku opened this issue Oct 15, 2024 · 1 comment
Labels
enhancement New feature or request

Comments

@dwzhu-pku

⚠️ Please check that this feature request hasn't been suggested before.

  • I searched previous Ideas in Discussions and didn't find any similar feature requests.
  • I searched previous Issues and didn't find any similar feature requests.

🔖 Feature description

Support sequence / context parallelism to allow supervised fine-tuning (SFT) on sequences longer than 128k tokens on A100/H100 GPUs. With only 8 H100 GPUs, we can currently fine-tune on sequences of no more than 64k tokens.

✔️ Solution

Axolotl is built on Accelerate and can already integrate with frameworks such as DeepSpeed to use their features. However, there is still no straightforward way to use sequence / context parallelism through these integrations. This repo may offer some clues: https://github.com/jzhang38/EasyContext . It seems we would only need to monkeypatch the model and make some changes to the data-loading procedure.
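For the data-loading side, here is a minimal sketch of one common way to split a long sequence across context-parallel ranks: the "zigzag" layout used by ring-attention implementations. It is an assumption that this matches what EasyContext does in detail, and `zigzag_shard` / `shard_batch` are hypothetical helper names, not an existing Axolotl or EasyContext API. The sequence is cut into `2 * world_size` chunks and rank `r` keeps chunk `r` plus chunk `2 * world_size - 1 - r`, so each rank gets one early and one late chunk and the causal-attention work is roughly balanced. The model-side monkeypatch (swapping in a ring or all-to-all attention kernel) is not shown.

```python
from typing import Dict, List


def zigzag_shard(seq: List[int], rank: int, world_size: int) -> List[int]:
    """Return the part of `seq` kept by context-parallel `rank` (hypothetical helper).

    The sequence is split into 2 * world_size equal chunks; rank r keeps chunk r
    and chunk (2 * world_size - 1 - r), pairing an early chunk with a late one.
    """
    n_chunks = 2 * world_size
    if len(seq) % n_chunks != 0:
        raise ValueError("sequence length must be divisible by 2 * world_size")
    chunk = len(seq) // n_chunks
    first = seq[rank * chunk:(rank + 1) * chunk]
    mirror = n_chunks - 1 - rank
    second = seq[mirror * chunk:(mirror + 1) * chunk]
    return first + second


def shard_batch(input_ids: List[int], labels: List[int], rank: int,
                world_size: int) -> Dict[str, List[int]]:
    """Shard one example for `rank` (hypothetical helper).

    Assumes `labels` are already shifted to next-token targets, because after
    sharding, neighbouring tokens may live on different ranks. Original
    position ids are sharded too, so rotary embeddings still see the true
    token positions.
    """
    position_ids = list(range(len(input_ids)))
    return {
        "input_ids": zigzag_shard(input_ids, rank, world_size),
        "labels": zigzag_shard(labels, rank, world_size),
        "position_ids": zigzag_shard(position_ids, rank, world_size),
    }
```

For example, with a 16-token sequence and 4 ranks, rank 0 keeps positions `[0, 1, 14, 15]` and rank 3 keeps `[6, 7, 8, 9]`, so every rank holds 4 tokens and the union of all shards is the full sequence.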

❓ Alternatives

No response

📝 Additional Context

No response

Acknowledgements

  • My issue title is concise, descriptive, and in title casing.
  • I have searched the existing issues to make sure this feature has not been requested yet.
  • I have provided enough information for the maintainers to understand and evaluate this request.
@dwzhu-pku added the enhancement label on Oct 15, 2024
@chiragjn
Contributor

+1
https://github.com/pytorch/torchtitan also has an FSDP2 implementation that includes 4D parallelism with context parallelism.
