Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ROCm] Implement RNN support #25755

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

Ruturaj4
Copy link
Collaborator

@Ruturaj4 Ruturaj4 commented Jan 7, 2025

Created from: ROCm#171

@Ruturaj4
Copy link
Collaborator Author

Ruturaj4 commented Jan 7, 2025

@dfm and @superbobry could you please take a look?

@github-actions github-actions bot force-pushed the ci_rnn_final-upstream branch from 0b07837 to 36d037e Compare January 7, 2025 19:08
Copy link
Collaborator

@superbobry superbobry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dfm want to have a look as well?

tests/experimental_rnn_test.py Outdated Show resolved Hide resolved
tests/experimental_rnn_test.py Outdated Show resolved Hide resolved
@google-ml-butler google-ml-butler bot added kokoro:force-run pull ready Ready for copybara import and testing labels Jan 8, 2025
Copy link
Collaborator

@dfm dfm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good overall - thanks! My main high level comment is that it would be useful to move as much of the #ifdef JAX_GPU_HIP logic into vendor.h rather than in rnn_kernels.cc directly. It's ok to have some, but the more we can move, the better. Can you look into redefining some of the macros in vendor.h to consolidate the logic there?

Comment on lines 465 to 466
mlir.register_lowering(rnn_fwd_p, gpu_rnn.cudnn_rnn_fwd_lowering, platform='cuda')
mlir.register_lowering(rnn_fwd_p, gpu_rnn.miopen_rnn_fwd_lowering, platform='rocm')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since gpu_rnn is in jaxlib, these changes will cause problems with version skew. JAX always needs to work with the most recent stable release of jaxlib. Perhaps you could protect this using hasattr(gpu_rnn, "miopen_rnn_fwd_lowering")?

Comment on lines 510 to 511
mlir.register_lowering(
rnn_bwd_p, gpu_rnn.miopen_rnn_bwd_lowering, platform='rocm')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, this needs to be protected against old version of jaxlib.

@Ruturaj4 Ruturaj4 force-pushed the ci_rnn_final-upstream branch from 36d037e to 4f1f486 Compare January 10, 2025 00:46
@Ruturaj4 Ruturaj4 force-pushed the ci_rnn_final-upstream branch from 4f1f486 to 1b871e2 Compare January 10, 2025 01:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pull ready Ready for copybara import and testing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants