Added new cuda kernel for encoder forwards using three dimensional kernels #459

ChrisDryden · 2024-05-25T03:44:23Z

Spent the afternoon trying to understand how multi-dimensional CUDA kernel instantiation works and came up with an example for the encoder forward pass, but I'm having the issue that for large block sizes it's slower. Would love for someone with more understanding of how this works to take a look.

This is to try to apply the advice that @ngc92 gave for this topic: #406

I'm having a hard time intuitively understanding why it could be slower, since it removes all of the modulo and division operations.
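For anyone following along, here is a rough sketch of what "reading the coordinates from blockIdx" can look like for the encoder forward pass. It is not the kernel from this PR: the name `encoder_forward_3d_sketch`, the helper `run_encoder_sketch`, the grid layout (channels on x, positions on y, batch on z), and the tiny sizes are all assumptions made for illustration. The point is only that b, t, and c come straight from the launch coordinates, so no division or modulo is needed to recover them.

```cuda
#include <cmath>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical sketch, not the kernel from this PR.
// One thread per output element; the grid is (ceil(C / blockDim.x), T, B).
__global__ void encoder_forward_3d_sketch(float* out, const int* inp,
                                          const float* wte, const float* wpe,
                                          int T, int C) {
    int c = blockIdx.x * blockDim.x + threadIdx.x;  // channel index
    int t = blockIdx.y;                             // position in the sequence
    int b = blockIdx.z;                             // batch index
    if (c < C) {  // only does real filtering when blockDim.x does not divide C
        int token = inp[b * T + t];
        out[(b * T + t) * C + c] = wte[token * C + c] + wpe[t * C + c];
    }
}

// Runs the sketch on small made-up sizes and returns the max abs difference
// against a plain CPU loop, or -1.0f if a CUDA call fails.
float run_encoder_sketch() {
    const int B = 2, T = 4, C = 48, V = 10;  // assumed sizes, illustration only
    int* inp = nullptr;
    float *wte = nullptr, *wpe = nullptr, *out = nullptr;
    bool ok = cudaMallocManaged(&inp, B * T * sizeof(int)) == cudaSuccess
           && cudaMallocManaged(&wte, V * C * sizeof(float)) == cudaSuccess
           && cudaMallocManaged(&wpe, T * C * sizeof(float)) == cudaSuccess
           && cudaMallocManaged(&out, B * T * C * sizeof(float)) == cudaSuccess;
    float max_diff = -1.0f;
    if (ok) {
        for (int i = 0; i < B * T; i++) inp[i] = (i * 3) % V;
        for (int i = 0; i < V * C; i++) wte[i] = 0.01f * i;
        for (int i = 0; i < T * C; i++) wpe[i] = -0.5f + 0.02f * i;

        dim3 block(32);  // C = 48, so the second block along x has idle threads
        dim3 grid((C + block.x - 1) / block.x, T, B);
        encoder_forward_3d_sketch<<<grid, block>>>(out, inp, wte, wpe, T, C);

        if (cudaDeviceSynchronize() == cudaSuccess) {
            max_diff = 0.0f;
            for (int b = 0; b < B; b++)
                for (int t = 0; t < T; t++)
                    for (int c = 0; c < C; c++) {
                        float ref = wte[inp[b * T + t] * C + c] + wpe[t * C + c];
                        float got = out[(b * T + t) * C + c];
                        max_diff = fmaxf(max_diff, fabsf(ref - got));
                    }
        }
    }
    cudaFree(inp); cudaFree(wte); cudaFree(wpe); cudaFree(out);
    return max_diff;
}

int main() {
    printf("max abs diff vs CPU: %g\n", run_encoder_sketch());
    return 0;
}
```

Note that with this layout each block only covers a slice of one (b, t) row, so whenever blockDim.x is larger than C, or does not divide it, some threads in the block fail the `c < C` check and do nothing.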

ChrisDryden · 2024-05-27T21:33:16Z

Figured out that I needed to change the block size dynamically based on the value of C and the current block size, and it is now around 0.0020 ms faster!

I found this by profiling: the bottleneck at the larger block sizes was that threads were reaching the c < C check, failing it, and not doing any work. With this change, the profiler shows that this condition check is no longer required for larger block sizes!
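One possible way to pick the block size from C is sketched below. This is a guess at the general idea, not the code in this PR: the helper `pick_block_x` and the example sizes are hypothetical. It looks for the largest multiple of the warp size that is no bigger than the requested block size and divides C evenly, so every thread in every block passes the c < C check.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper, not from this PR: returns the largest multiple of the
// warp size that is <= max_block_size and divides C evenly. If there is no
// such value it returns max_block_size unchanged, in which case the kernel
// still needs its c < C bounds check.
int pick_block_x(int C, int max_block_size) {
    const int warp = 32;
    int limit = max_block_size < C ? max_block_size : C;
    int start = limit / warp * warp;  // round down to a multiple of the warp size
    for (int x = start; x >= warp; x -= warp) {
        if (C % x == 0) return x;
    }
    return max_block_size;
}

int main() {
    const int B = 8, T = 1024, C = 768;  // example sizes, assumed for illustration
    const int requested[] = {256, 512, 1024};
    for (int s : requested) {
        dim3 block(pick_block_x(C, s));
        dim3 grid((C + block.x - 1) / block.x, T, B);
        printf("requested %4d -> block.x %4u, grid (%u, %u, %u)\n",
               s, block.x, grid.x, grid.y, grid.z);
    }
    return 0;
}
```

With C = 768, requested sizes of 256, 512, and 1024 come out as 256, 384, and 768, so no block is launched with threads that fall past the end of the channel dimension.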

Added new cuda kernel for encoder forwards using three dimensional kernels

b5a80ed

ChrisDryden mentioned this pull request May 25, 2024

2D and 3D tile divisions so that permutation coordinates can be read from threadIdx and blockIdx #406

Open

Modified the block size to not call extra kernels

62ee956
