Tweak dlrm #131

Draft
wants to merge 1 commit into
base: master

Conversation

Delaunay
Collaborator

No description provided.

@Delaunay
Collaborator Author

Delaunay commented May 29, 2023

DLRM hangs after 3-4 batches, which is why the GPU load is so low.
Is it waiting on data transfer?

K80
BASE    dlrm   1    0   39788.43   39788.43   1.4%   0.5%        1561
28f8b06 dlrm   1    0   19107.94   19107.94   5.2%   1.7%        1585
5a9ecee dlrm   1    0   19700.98   19700.98   5.9%   1.9%        1585


5a9ecee2b627
MI250 x4 dlrm    0 1        7057  513430.13  513430.13   0.9%   7.0%
MI250 x8 dlrm    0 1        5750  468302.98  468302.98   0.9%   6.6%
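
One way to test the data-transfer hypothesis above is to time how long the loop blocks while fetching the next batch versus how long the step itself takes. The snippet below is a minimal sketch, not the benchmark's actual code: `time_loader_vs_step`, `slow_batches`, and the `step` callable are hypothetical names introduced for illustration, and it uses only the standard library. On a GPU the step timing would only be meaningful with a device synchronization inside `step` (for example `torch.cuda.synchronize()` in PyTorch), since kernels launch asynchronously.

```python
import time


def time_loader_vs_step(batches, step, max_batches=10):
    """Split wall time into 'waiting for the next batch' and 'running the step'.

    `batches` is any iterable of batches; `step` is a hypothetical callable
    standing in for the training step (assumption, not the benchmark's code).
    """
    wait_s, step_s, n = 0.0, 0.0, 0
    it = iter(batches)
    while n < max_batches:
        t0 = time.perf_counter()
        try:
            batch = next(it)  # blocks while loading / transfer is in progress
        except StopIteration:
            break
        t1 = time.perf_counter()
        step(batch)  # on a GPU, synchronize inside `step` before returning
        t2 = time.perf_counter()
        wait_s += t1 - t0
        step_s += t2 - t1
        n += 1
    return {"batches": n, "wait_s": wait_s, "step_s": step_s}


def slow_batches(count, delay_s):
    """Hypothetical loader that stalls before each batch, to mimic a slow input pipeline."""
    for i in range(count):
        time.sleep(delay_s)
        yield i
```

If `wait_s` dominates `step_s` once the hang starts, the loop is starved for input rather than limited by compute, which would be consistent with the low GPU load reported above.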

@Delaunay
Collaborator Author

Need to check on an A100 to confirm whether this improves things or not.
