
Memory consumption for Mamba2 8B finetuning (Nemo1) #11599

Open
YasamanJafari opened this issue Dec 15, 2024 · 0 comments

In the documentation, it is mentioned that fine-tuning Mamba2 8B should be possible on 2 80GB A100s, which makes sense: assuming everything is stored in fp32, the expected memory consumption is:

  • 32 GB for model params (8B params × 4 bytes)
  • 32 GB for gradients
  • 32 × 2 = 64 GB for optimizer states (Adam's two moments)

This sums to 128 GB (see the sketch below for the arithmetic). In practice, however, fine-tuning Mamba2 8B with the Nemo1 scripts takes around 240 GB.
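For reference, a minimal sketch of the accounting above. The 8B parameter count, plain fp32 training, and Adam's two fp32 moment buffers are assumptions; activations and framework overhead are deliberately left out:

```python
# Back-of-the-envelope fp32 memory budget for full fine-tuning with Adam.
# Assumes 8B parameters, no mixed precision, and no optimizer sharding;
# activation memory and framework overhead are not counted.
GB = 1e9
N_PARAMS = 8e9      # assumed model size
BYTES_FP32 = 4      # bytes per fp32 value

params_gb = N_PARAMS * BYTES_FP32 / GB       # 32 GB for weights
grads_gb = N_PARAMS * BYTES_FP32 / GB        # 32 GB for gradients
optim_gb = 2 * N_PARAMS * BYTES_FP32 / GB    # 64 GB for Adam's m and v

print(f"expected: {params_gb + grads_gb + optim_gb:.0f} GB")  # expected: 128 GB
```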

I would appreciate any information or explanation regarding this difference.

Thanks in advance for your assistance!
