Add a one workgroup argmax benchmark #49
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request.
@kuhar @antiagainst Could you please review? Thanks.
Looks good modulo copyright headers. For code that is primarily based on some existing benchmarks, we use dual copyright (the original author + the new one). You can see an example here: https://github.com/google/uVkCompute/blob/main/benchmarks/vmt/vmt_main.cc.
#extension GL_KHR_shader_subgroup_arithmetic : enable
#extension GL_KHR_shader_subgroup_ballot : enable

layout(local_size_x = 16, local_size_y = 1, local_size_z = 1) in;
This is smaller than the native subgroup size for desktop GPUs. How does this perform if we increase this to 32 or 64?
I did not see any performance improvement.
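For readers following the workgroup-size question: changing `local_size_x` changes how the input is split across invocations, not the answer the benchmark should produce. The following is a minimal CPU-side sketch in C++ of that idea, not the shader from this PR. The names `Candidate`, `Better`, and `ArgmaxOneWorkgroup` are hypothetical, and the strided scan, the pairwise reduction, and the tie-break toward the lower index are assumptions made for illustration. It says nothing about GPU performance.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU model of a one-workgroup argmax; not the PR's shader.
struct Candidate {
  float value;
  std::size_t index;
};

// Prefer the larger value; break ties toward the lower index (an assumption).
static Candidate Better(Candidate a, Candidate b) {
  if (b.value > a.value) return b;
  if (b.value == a.value && b.index < a.index) return b;
  return a;
}

// Emulates `workgroup_size` invocations. Invocation `lane` scans elements
// lane, lane + workgroup_size, ... and keeps its best candidate. The per-lane
// candidates are then combined pairwise, the way a tree reduction would.
std::size_t ArgmaxOneWorkgroup(const std::vector<float>& data,
                               std::size_t workgroup_size) {
  assert(!data.empty() && workgroup_size > 0);

  // Phase 1: each emulated invocation reduces its own strided slice.
  std::vector<Candidate> lanes;
  for (std::size_t lane = 0; lane < workgroup_size && lane < data.size();
       ++lane) {
    Candidate best{data[lane], lane};
    for (std::size_t i = lane + workgroup_size; i < data.size();
         i += workgroup_size) {
      best = Better(best, {data[i], i});
    }
    lanes.push_back(best);
  }

  // Phase 2: pairwise reduction over the per-lane candidates.
  for (std::size_t active = lanes.size(); active > 1;) {
    const std::size_t half = (active + 1) / 2;
    for (std::size_t i = 0; i + half < active; ++i) {
      lanes[i] = Better(lanes[i], lanes[i + half]);
    }
    active = half;
  }
  return lanes[0].index;
}
```

Calling `ArgmaxOneWorkgroup(data, 16)`, `ArgmaxOneWorkgroup(data, 32)`, and `ArgmaxOneWorkgroup(data, 64)` on the same input should return the same index, which makes a model like this usable as a correctness reference while the workgroup size is being tuned.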
Force-pushed from e0a8a84 to 6604dbd.
LGTM
This PR is based on #47. I opened a new one because the old one got stale.