
A question about flops calculation #981

Closed
CSlearnerZM opened this issue Jun 25, 2023 · 2 comments · Fixed by #1044

Comments

@CSlearnerZM

I am trying to understand the get_flops function that calculates FLOPs, but I can't figure out why the attn term needs to be multiplied by the constant 60.

The paper Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM gives a formula for calculating FLOPs similar to get_flops (page 12). When activation checkpointing is not used, I think attn should be multiplied by 12, not 60, according to that formula. Of course, it is also possible that I misunderstood.

(Screenshot: the FLOPs formula from page 12 of the Megatron-LM paper.)
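For reference, here is a sketch of the paper's per-iteration FLOPs formula in Python (not the repo's actual get_flops code; the function and argument names here are illustrative). In the paper's notation, B is the batch size, s the sequence length, l the number of layers, h the hidden size, and V the vocabulary size. The s/(6h) term is the attention contribution: expanding it gives 12·Bs²lh without activation checkpointing (forward plus backward at 3× the forward cost), or 16·Bs²lh with checkpointing, which is where the expected factor of 12 rather than 60 comes from:

```python
def megatron_flops_per_iteration(
    batch_size,      # B
    seq_len,         # s
    num_layers,      # l
    hidden_size,     # h
    vocab_size,      # V
    checkpoint_activations=False,
):
    # Forward + backward costs 3x the forward pass; activation
    # checkpointing adds one extra forward pass, for a factor of 4.
    ckpt_factor = 4 if checkpoint_activations else 3
    return (
        24.0 * ckpt_factor * batch_size * seq_len * num_layers * hidden_size**2
        * (
            1.0
            + seq_len / (6.0 * hidden_size)                    # attention matmuls
            + vocab_size / (16.0 * num_layers * hidden_size)   # output logit layer
        )
    )
```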

Could you help me resolve this? Thanks!

@dashstander
Contributor

I think you're right! Thanks for bringing this to our attention @CSlearnerZM

@dashstander
Contributor

Ok, I have a branch that pretty much ports over the Megatron-DeepSpeed calculation from here
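For context, the Megatron-DeepSpeed calculation turns the per-iteration FLOPs above into a throughput number by dividing by the iteration time and the number of GPUs; a minimal sketch (function name hypothetical):

```python
def tflops_per_gpu(flops_per_iteration, iteration_time_s, world_size):
    # Per-GPU model TFLOP/s, the figure typically printed in training logs.
    return flops_per_iteration / (iteration_time_s * world_size * 1e12)
```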

@dashstander linked a pull request on Sep 25, 2023 that will close this issue