forked from openucx/ucc
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
EC/CUDA: optimize reduce with unrolling (openucx#657)
* EC/CUDA: optimize reduction with unrolling
* EC/CUDA: various opt on reduce kernel
* EC/CUDA: various opt on reduce strided kernel
* EC/CUDA: constant tuning and cleanup
* EC/CUDA: fix reduce_strided for large nbr of srcs
* CODESTYLE: clang-tidy cleanup
* EC/CUDA: add configurable nbr of threads
* EC/CUDA: fix error with new nvidia linter compiler
* EC/CUDA: fix minor revisions
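The core idea behind "optimize reduce with unrolling" is to unroll the per-element loop of a multi-source reduction so that each iteration issues several independent loads and adds. That idea can be sketched on the host in plain C. This is a minimal illustration under stated assumptions, not the kernel from this commit: the function name `reduce_strided_sum`, the restriction to float sums, the `UNROLL` factor of 4, and the layout in which source `k` starts at `base + k * stride` are all hypothetical.

```c
#include <stddef.h>

/* Hypothetical unroll factor; not a value taken from this commit. */
#define UNROLL 4

/* Hypothetical helper (not UCC's API): sums n_srcs float vectors of
 * length count into dst. Assumed layout: source k starts at
 * base + k * stride. Requires n_srcs >= 1. */
void reduce_strided_sum(float *dst, const float *base, size_t stride,
                        size_t n_srcs, size_t count)
{
    size_t i = 0;

    /* Unrolled main loop: each pass over the sources updates UNROLL
     * independent accumulators, so loads and adds can overlap. */
    for (; i + UNROLL <= count; i += UNROLL) {
        float acc[UNROLL];
        for (int u = 0; u < UNROLL; u++) {
            acc[u] = base[i + u];            /* source 0 */
        }
        for (size_t k = 1; k < n_srcs; k++) {
            const float *src = base + k * stride;
            for (int u = 0; u < UNROLL; u++) {
                acc[u] += src[i + u];
            }
        }
        for (int u = 0; u < UNROLL; u++) {
            dst[i + u] = acc[u];
        }
    }

    /* Remainder loop for counts that are not a multiple of UNROLL. */
    for (; i < count; i++) {
        float acc = base[i];
        for (size_t k = 1; k < n_srcs; k++) {
            acc += base[k * stride + i];
        }
        dst[i] = acc;
    }
}
```

The fixed-trip-count inner loops are the host-side analogue of `#pragma unroll` in a CUDA kernel: the compiler can fully unroll them, and the separate remainder loop keeps the unrolled path free of per-element bounds checks.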
1 parent 34adc11
commit e6d3919
Showing 5 changed files with 270 additions and 136 deletions.