Use fast softmax only on prefill #1180

wszczurekhabana

Upstream of: HabanaAI#244

Original description:

Currently, running fast softmax during decode can degrade performance on some configurations, so this PR disables it for decode and keeps it for prefill only.
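The gating described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the function name `select_softmax_mode`, the mode strings `"fast"` and `"default"`, and the rule that a query length above 1 means prefill are all assumptions made for the example, and the actual attention kernel may use different names and values.

```python
def select_softmax_mode(q_len: int, fast_softmax_enabled: bool) -> str:
    """Pick a softmax mode for one attention call (hypothetical helper).

    Assumption: prefill processes the whole prompt at once (q_len > 1),
    while decode processes one new token per step (q_len == 1).
    """
    is_prefill = q_len > 1
    # Fast softmax is used only when the user enabled it AND we are in prefill.
    if fast_softmax_enabled and is_prefill:
        return "fast"
    # Decode steps, or runs with the option disabled, keep the regular softmax.
    return "default"


if __name__ == "__main__":
    print(select_softmax_mode(q_len=4096, fast_softmax_enabled=True))  # prefill
    print(select_softmax_mode(q_len=1, fast_softmax_enabled=True))     # decode
```

Keeping the decision in one small function makes it easy to apply the same rule wherever the attention kernel is invoked.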

Results:

| Model | Tokens | Batch Size | Nodes | Performance with Fast Softmax (Prefill Only) | Performance without Fast Softmax |
|---|---|---|---|---|---|
| Llama 2-70B | 31744/1024 | 12 | 8 | 105.27 | 97.08 |
| Llama 2-70B | 24576/8192 | 16 | 8 | 418.95 | 405.55 |
| Llama 2-70B | 16384/16384 | 24 | 8 | 673.99 | 665.84 |
| Llama 2-70B | 4096/4096 | 16 | 2 | 304.17 | 303.75 |
| Llama 2-70B | 4096/4096 | 59 | 4 | 1149.75 | 1147.10 |
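
To make the differences easier to compare, the relative change per configuration can be computed from the numbers above. This is a small sketch that assumes the performance figures are throughput-like (higher is better), which the description does not state explicitly; the helper name `relative_gain_percent` is made up for this example.

```python
# (tokens, batch size, nodes, with fast softmax on prefill only, without fast softmax)
RESULTS = [
    ("31744/1024", 12, 8, 105.27, 97.08),
    ("24576/8192", 16, 8, 418.95, 405.55),
    ("16384/16384", 24, 8, 673.99, 665.84),
    ("4096/4096", 16, 2, 304.17, 303.75),
    ("4096/4096", 59, 4, 1149.75, 1147.10),
]


def relative_gain_percent(with_fast: float, without_fast: float) -> float:
    """Percentage change of the prefill-only run relative to the run without fast softmax."""
    return (with_fast - without_fast) / without_fast * 100.0


gains = [relative_gain_percent(w, wo) for (_, _, _, w, wo) in RESULTS]

if __name__ == "__main__":
    for (tokens, batch, nodes, _, _), gain in zip(RESULTS, gains):
        print(f"{tokens:>12} bs={batch:<3} nodes={nodes}: {gain:+.2f}%")
```

Under that assumption, every configuration shows a positive change, with the first row showing the largest one and the two 4096/4096 rows the smallest.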

@wszczurekhabana

@libinta could you add the 1.17_dependency label to this PR?

@libinta added the "synapse 1.17_dependency" label (PR not backward compatible; can be merged only when synapse 1.17 is available) on Aug 1, 2024
vidyasiv added a commit to emascarenhas/optimum-habana that referenced this pull request Aug 2, 2024
@yafshar

yafshar commented Aug 2, 2024

@wszczurekhabana please close this PR; it is the same as #1159.

@yafshar

yafshar commented Aug 2, 2024

@libinta #1159 should have the correct label

@wszczurekhabana

Thanks, I did not see #1159. Closing.
