fix calculate_shard_storages to handle optimizer correctly #2652
Hi, @henrylhtsang @joshuadeng @PaulZhang12 @TroyGarden can you take a look?
hi, @sarckk @dstaay-fb can you take a look?
@sarckk has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
@tiankongdeguiji thanks for the fix! looks good to me. I'll help you land the changes (EDIT: probably after 1st Jan)
thx!
hi, @sarckk can you help me land the changes? Are there any existing issues I should address?
Hi sorry for the delay. This increases HBM and DDR estimates across the board so I'm running extra checks to avoid any regressions internally. I will land it by this week
thx!
@tiankongdeguiji thanks for the patience. your diff has been landed
The parameter no longer has the `optimizer_class` attribute; it has been replaced by `optimizer_classes`. However, the current implementation of the `EmbeddingStorageEstimator` still relies on the outdated `optimizer_class` attribute to determine whether the parameter includes an optimizer. As a result, the `EmbeddingStorageEstimator` fails to estimate the storage requirements for the optimizer, leading to CUDA out-of-memory errors during training.
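
To illustrate the failure mode described above, here is a minimal, self-contained sketch. It is not the actual TorchRec code: the stand-in parameter object, the helper names (`has_optimizer_outdated`, `has_optimizer_updated`, `estimate_storage`), and the `optimizer_multiplier` value are all hypothetical and chosen only to show how checking the wrong attribute silently drops the optimizer's share of the estimate.

```python
from types import SimpleNamespace

# Hypothetical stand-in for a parameter that carries the newer
# `optimizer_classes` attribute and no `optimizer_class` attribute.
param = SimpleNamespace(optimizer_classes=[object])


def has_optimizer_outdated(p) -> bool:
    # Outdated check: looks for an attribute that no longer exists,
    # so it reports "no optimizer" even when one is attached.
    return hasattr(p, "optimizer_class")


def has_optimizer_updated(p) -> bool:
    # Updated check: looks for a non-empty `optimizer_classes`.
    return bool(getattr(p, "optimizer_classes", None))


def estimate_storage(tensor_bytes: int, with_optimizer: bool,
                     optimizer_multiplier: float = 1.0) -> int:
    # `optimizer_multiplier` is an illustrative assumption: the optimizer
    # state is modeled as a fixed multiple of the tensor size.
    optimizer_bytes = int(tensor_bytes * optimizer_multiplier) if with_optimizer else 0
    return tensor_bytes + optimizer_bytes


underestimate = estimate_storage(1000, has_optimizer_outdated(param))
corrected = estimate_storage(1000, has_optimizer_updated(param))
print(underestimate, corrected)
```

With the outdated check the optimizer state is left out of the total, so a plan that appears to fit in memory can still run out at training time. This is also consistent with the reviewer's note that the fix raises HBM and DDR estimates across the board.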