Add support for smem_epilogue when mma output is not cast to half #3620

Merged
merged 10 commits into from
Dec 25, 2024

Conversation

protonu
Collaborator

@protonu protonu commented Dec 19, 2024

Support non-stmatrix stores from registers to shared memory, followed by a TMA store, when the output of the mma op is not cast back to half precision. stmatrix works with half precision only.
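
For illustration, the path selection described above can be sketched as a small standalone helper. This is only a sketch under stated assumptions: `OutDtype`, `SmemStorePath`, and `chooseSmemStorePath` are hypothetical names made up for this example and are not nvFuser APIs. The only fact taken from the description is that stmatrix applies to half-precision output, so any other output type needs a plain register-to-shared-memory store before the TMA store.

```cpp
#include <cassert>

// Hypothetical dtype enum for illustration only; not nvFuser's DataType.
enum class OutDtype { Half, Float };

// Hypothetical labels for the two ways of moving the mma result from
// registers to shared memory ahead of the TMA store.
enum class SmemStorePath { Stmatrix, PlainStore };

// Assumption taken from the PR description: stmatrix is usable only when
// the epilogue output is half precision. Everything else falls back to a
// plain (non-stmatrix) store.
constexpr SmemStorePath chooseSmemStorePath(OutDtype dtype) {
  return dtype == OutDtype::Half ? SmemStorePath::Stmatrix
                                 : SmemStorePath::PlainStore;
}

// Compile-time checks of the two cases.
static_assert(chooseSmemStorePath(OutDtype::Half) == SmemStorePath::Stmatrix,
              "half output may use stmatrix");
static_assert(chooseSmemStorePath(OutDtype::Float) == SmemStorePath::PlainStore,
              "non-half output needs the plain store path");
```

In both cases the data still goes through shared memory and is written out with TMA. Only the register-to-shared-memory step differs.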

@protonu
Collaborator Author

protonu commented Dec 19, 2024

!test

@protonu protonu marked this pull request as ready for review December 19, 2024 15:32
csrc/scheduler/hopper_multi_matmul.cpp (outdated, resolved)
csrc/scheduler/hopper_multi_matmul.cpp (outdated, resolved)
csrc/scheduler/hopper_multi_matmul.cpp (outdated, resolved)
tests/cpp/test_matmul_scheduler.cpp (outdated, resolved)
@protonu protonu requested a review from jacobhinkle December 19, 2024 15:58
@protonu
Collaborator Author

protonu commented Dec 19, 2024

!test

1 similar comment
@protonu
Collaborator Author

protonu commented Dec 19, 2024

!test

@protonu protonu requested a review from jacobhinkle December 19, 2024 17:40
Collaborator

@rdspring1 rdspring1 left a comment

Refactor Follow-Up Proposal:
It looks like scheduleEpilogue could easily be split into two functions for readability, rather than remaining a single monolithic function.

void HopperMultipleMatmulScheduler::scheduleEpilogueWithVectorization() {}
void HopperMultipleMatmulScheduler::scheduleSmemEpilogue() {}

void HopperMultipleMatmulScheduler::scheduleEpilogue() {
  if (!params_->use_smem_epilogue) {
    scheduleEpilogueWithVectorization();
  } else {
    // Use stmatrix (optional) and tma store
    scheduleSmemEpilogue();
  }
}
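
To show how the proposed split would behave, here is a self-contained mock of the dispatch. It is a sketch only: `MockParams`, `MockEpilogueScheduler`, `runMock`, and the trace strings are hypothetical stand-ins invented for this example. Only the `use_smem_epilogue` flag and the three function names come from the proposal above. The real scheduler bodies are not modeled.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the scheduler parameters; only the flag used in
// the proposal is modeled.
struct MockParams {
  bool use_smem_epilogue = false;
};

// Hypothetical mock of the scheduler that records which path was taken
// instead of doing any real scheduling.
class MockEpilogueScheduler {
 public:
  explicit MockEpilogueScheduler(const MockParams* params) : params_(params) {}

  // Same dispatch shape as in the proposal.
  void scheduleEpilogue() {
    if (!params_->use_smem_epilogue) {
      scheduleEpilogueWithVectorization();
    } else {
      // Use stmatrix (optional) and TMA store.
      scheduleSmemEpilogue();
    }
  }

  const std::vector<std::string>& trace() const { return trace_; }

 private:
  void scheduleEpilogueWithVectorization() { trace_.push_back("vectorized"); }
  void scheduleSmemEpilogue() { trace_.push_back("smem"); }

  const MockParams* params_;
  std::vector<std::string> trace_;
};

// Runs the mock once and returns the recorded path.
std::vector<std::string> runMock(bool use_smem_epilogue) {
  MockParams params;
  params.use_smem_epilogue = use_smem_epilogue;
  MockEpilogueScheduler scheduler(&params);
  scheduler.scheduleEpilogue();
  return scheduler.trace();
}
```

With this shape each path can be read and tested independently, which is the readability benefit the proposal is after.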

Collaborator

@jacobhinkle jacobhinkle left a comment

LGTM

@protonu
Collaborator Author

protonu commented Dec 20, 2024

!test

4 similar comments
@protonu
Collaborator Author

protonu commented Dec 20, 2024

!test

@protonu
Collaborator Author

protonu commented Dec 21, 2024

!test

@protonu
Collaborator Author

protonu commented Dec 23, 2024

!test

@protonu
Collaborator Author

protonu commented Dec 23, 2024

!test

@protonu protonu force-pushed the pbasu_smem_epi_no_stmatrix branch from a3f138a to 494aea7 Compare December 23, 2024 21:02
@protonu
Collaborator Author

protonu commented Dec 23, 2024

!test

@protonu
Collaborator Author

protonu commented Dec 24, 2024

!test

@protonu protonu merged commit ee63c98 into main Dec 25, 2024
48 checks passed
@protonu protonu deleted the pbasu_smem_epi_no_stmatrix branch December 25, 2024 02:13