Update the Dockerfile with necessary changes from NeMo for H100 self attention accuracy. #550

Closed
jstjohn wants to merge 3 commits

Collaborator

@jstjohn jstjohn commented Dec 19, 2024

  • Noticed poor numerical accuracy on H100 relative to the same reference model when flash attention is involved.
  • Was able to reproduce the reference accuracy on H100 after making these changes to the Dockerfile.

…ve good H100 numerical accuracy in flash attention with TE
@jstjohn
Collaborator Author

jstjohn commented Dec 19, 2024

/build-ci

@jstjohn jstjohn requested a review from cspades December 19, 2024 01:19
Signed-off-by: John St. John <[email protected]>
pstjohn added a commit that referenced this pull request Dec 23, 2024
Cherry picks some additional commits into #550 to see if these also fix
CI for 24.10

---------

Signed-off-by: John St. John <[email protected]>
Co-authored-by: Farhad Ramezanghorbani <[email protected]>
Co-authored-by: John St John <[email protected]>
Co-authored-by: John St. John <[email protected]>
@pstjohn
Collaborator

pstjohn commented Jan 1, 2025

Closing, since it looks like these edits were all included in the other Dockerfile updates.

@pstjohn pstjohn closed this Jan 1, 2025
nvdreidenbach pushed a commit to nvdreidenbach/bionemo-framework that referenced this pull request Jan 3, 2025
Cherry picks some additional commits into NVIDIA#550 to see if these also fix
CI for 24.10

---------

Signed-off-by: John St. John <[email protected]>
Co-authored-by: Farhad Ramezanghorbani <[email protected]>
Co-authored-by: John St John <[email protected]>
Co-authored-by: John St. John <[email protected]>
Signed-off-by: Danny <[email protected]>
2 participants