SM75 (Turing) support for FP6 kernel #942
Merged
Commits (5)
aef047a  SM75 support for FP6 kernel (tobiasvanderwerff)
d390179  More consistent argument ordering in benchmark function (tobiasvanderwerff)
0a4d70e  Add a note about SM75 support in the floatx README (tobiasvanderwerff)
650ba03  Handle FP6 + SM75 + N>=64 edge case (tobiasvanderwerff)
56718f9  Document changes made for FP6 SM75 support (tobiasvanderwerff)
Conversation
Is this change intentional? Usually we talk about matmuls as m, k, n, i.e. an m x k activation and a k x n weight. It seems odd to reverse them, and I'm unsure whether the benchmarks previously assumed the other ordering.
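A minimal sketch of the (m, k, n) convention the reviewer is referring to; the shape values are arbitrary placeholders, not taken from the benchmark:

```python
import torch

# The usual matmul convention: an m x k activation times a k x n weight
# gives an m x n output. Shape values below are arbitrary placeholders.
m, k, n = 16, 64, 32
activation = torch.randn(m, k)
weight = torch.randn(k, n)
output = activation @ weight
assert output.shape == (m, n)
```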
Looking at this again, I indeed made a mistake. The benchmark results are still correct; it's just that the list of shapes is different.

I took the benchmark shapes from here: https://github.com/usyd-fsalab/fp6_llm/blob/ce76774bcfc26b325c1b558abcf1935026d9abbc/tests/python/run.sh. It's a bit confusing since the author uses different variable names...

The code generating the list of shapes (under `__name__ == "__main__"`) is correct (it follows the author), and it calls `benchmark(m, n, k)`. If you think we should benchmark a different set of shapes, that would be fine too!

In summary, this change corrects my previous mistake.
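To make the mismatch concrete, here is a hypothetical, self-contained sketch; the stub body and shape values are illustrative, not the PR's actual code. Calling `benchmark(m, n, k)` against a signature declared as `(m, k, n)` silently swaps k and n inside the function:

```python
def benchmark(m: int, k: int, n: int) -> None:
    # Stub standing in for the real benchmark function; it only prints the
    # shapes it thinks it is measuring.
    print(f"activation: {m} x {k}, weight: {n} x {k}")

if __name__ == "__main__":
    # Placeholder shape list, loosely in the style of the fp6_llm run script;
    # these exact values are illustrative, not the PR's actual shapes.
    for m in (1, 8, 64):
        for n, k in ((8192, 8192), (10240, 8192)):
            # Call order is (m, n, k), but the signature is (m, k, n), so
            # inside the function k and n are swapped relative to the caller.
            benchmark(m, n, k)
```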
Yes, it's intentional. The function signature was `def benchmark(m: int, k: int, n: int):`, but the arguments were passed as `(m, n, k)`, so I thought that was unnecessarily confusing and wanted to change the ordering in either the function call or the function signature. In the function itself, the shapes become m x k for the activation and n x k for the weight.

I see one benchmark example (`benchmark_gpu_sparsity`, see below) where the ordering is m, k, n, so let me change the function signature back to that ordering for consistency.

ao/benchmarks/benchmark_gpu_sparsity.py, line 25 in 4b5b5ee
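A minimal sketch of the intended convention after the fix, assuming torch and placeholder shapes; none of this is the PR's actual benchmark body. The signature stays `(m, k, n)` and the call sites pass arguments in the same order:

```python
import torch

def benchmark(m: int, k: int, n: int) -> None:
    # Inside the function, the activation is m x k and the weight is n x k,
    # so multiplying by the transposed weight yields an m x n output.
    act = torch.randn(m, k)
    weight = torch.randn(n, k)
    out = act @ weight.T
    assert out.shape == (m, n)

# Call sites now pass arguments in the same (m, k, n) order as the signature.
benchmark(8, 8192, 8192)
```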
Missed your comment before I posted my own, @gau-nernst. Thanks for clarifying! I actually noticed that `m` gets passed as `n` to the actual kernel, which is slightly confusing. If you don't mind, I'll change this for consistency. I don't think it should affect the results, except that `m` will be swapped with `n` in the performance table.
@tobiasvanderwerff You mean `k` and `n`, right? Your current change looks correct. Yeah, it doesn't affect the results; it will only show results for different shapes instead.
What I meant is slightly different, @gau-nernst. I'm referring to the fact that the original authors do some odd switching of the shapes in `fp6_linear.cu`. The arguments that get passed are `_in_feats` (activations of shape m x k) and `_weights` (shape n x k), but then they unpack the shapes as `M = _weights.size(0)`, `K = _in_feats.shape(0)`, `N = _in_feats.shape(1)` (see below).

ao/torchao/csrc/cuda/fp6_llm/fp6_linear.cu, lines 150 to 158 in 4b5b5ee

So even though we pass the arguments correctly to the benchmark function as m, k, n, the names get switched inside the kernel. Anyway, this is mainly confusing when debugging the kernel, but it might actually be fine to just leave it as is.
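A hypothetical Python paraphrase of the shape unpacking quoted above from fp6_linear.cu, to make the name switch explicit. The function name is made up, the variable names mirror the C++ code, and the tensors are placeholders:

```python
import torch

def unpack_shapes(_in_feats: torch.Tensor, _weights: torch.Tensor):
    # Paraphrase of the C++ unpacking: the kernel's M/K/N names do not line
    # up with the caller's m/k/n.
    M = _weights.size(0)   # the kernel's M is the weight's n dimension
    K = _in_feats.size(0)  # the kernel's K is the activation's m dimension
    N = _in_feats.size(1)  # the kernel's N is the activation's k dimension
    return M, K, N

m, k, n = 16, 8192, 10240
M, K, N = unpack_shapes(torch.empty(m, k), torch.empty(n, k))
print(M, K, N)  # prints (n, m, k): the names are switched inside the kernel
```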