Use out-of-place trmm when available #978
0695032 to 1e46a36
Neglect 4.X and require >= 5.
Please run the tests with the CI image to be sure that everything works correctly.
cscs-ci run
b12b618 to 5fbaa0d
cscs-ci run
The tests pass on hohgant. I've yet to verify that performance is unchanged.
5fbaa0d to fe46988
There seems to be no measurable impact from this change.
cscs-ci run
cscs-ci run
@msimberg This PR is based on a very old master, which makes the CI fail.
cscs-ci run
I'll try to summarize the various stages of trmm evolution, since it's a bit complicated to follow in the code:

- `rocblas_Xtrmm`
- `rocblas_Xtrmm_outofplace`
- `ROCBLAS_V3` definition that should enable the three-parameter version, but which only seems to have an effect in documentation...

Given that `trmm_outofplace` was deprecated in 5.6.X, I would probably for the moment not put an upper bound on when to switch back from `trmm_outofplace` to `trmm`. It's a guessing game, I suppose, until they actually release 5.7.0 or 6.0.0, but it seems safer to assume that `trmm_outofplace` is going to be removed in the next major release rather than one minor release after it was deprecated.

I have not tested this with rocblas 4.X (I'm not even sure it works without these changes; if I remember correctly, CMake support was badly broken before 5.something), but in theory it should work. I don't know how much we care about that?
I also have not done large-scale runs with this change, so I don't know if it affects performance (smaller tests say no).
I've also put a version cap on when the workaround for certain `Op`s not being supported by rocblas is applied.
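
As a rough illustration of the two kinds of version gating described above (a lower bound only for out-of-place trmm, and an upper cap for the `Op` workaround), here is a minimal sketch. Everything in it is hypothetical: the `Version` struct, the function names, and both threshold values are placeholders for illustration and are not taken from rocblas or from this PR's code.

```cpp
// Hypothetical sketch: none of these names come from rocblas or from this PR.
struct Version {
  int major;
  int minor;
  int patch;
};

// Lexicographic comparison: true if a is strictly older than b.
constexpr bool older(Version a, Version b) {
  if (a.major != b.major) return a.major < b.major;
  if (a.minor != b.minor) return a.minor < b.minor;
  return a.patch < b.patch;
}

// Placeholder thresholds, chosen only for illustration (assumptions).
constexpr Version kOutOfPlaceSince{5, 0, 0};    // assumed first version with out-of-place trmm
constexpr Version kOpWorkaroundUntil{6, 0, 0};  // assumed first version not needing the Op workaround

// Lower bound only: keep using out-of-place trmm on every newer version.
constexpr bool use_trmm_outofplace(Version v) {
  return !older(v, kOutOfPlaceSince);
}

// Upper cap: apply the Op workaround only to versions older than the cap.
constexpr bool needs_op_workaround(Version v) {
  return older(v, kOpWorkaroundUntil);
}
```

With a lower bound only, a future release that removes the out-of-place routine would need a new upper bound added here. That matches the "guessing game" described above: the cap is deferred until the removal version is actually known.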