Enable H100 in CMake (with A100 parameters) #804
Conversation
LGTM aka best effort.
(Let's pretend B100/200 has a limited audience ;-)
Sorry to repeat myself (see #656), but with these changes we are ruining the entire idea of autotuning... Then @dev-zero and @mkrack did it for A100, with about the same number of kernels. @mtaillefumier did it for Mi100, but with far fewer kernels (412), and I repeated the same kernel optimization for Mi250. Now, unless something prevents us from running the autotuning on H100 (or any new generation), I would not consider the option of re-using previous kernels. It is fine if you do it in CP2K (just like @hfp did, see cp2k/cp2k#3368), but this is not an option for DBCSR. My current idea is to introduce a "General" kernel and show a (*) in the output for the kernels that are not using it. Still, people should run the autotuning and contribute kernels to get the "best" performance. Of course, there is also the possibility of dropping the autotuning entirely and keeping the A100 kernels for any new GPU generation (including the default kernel). Do we have any measurements showing that the A100 kernels are good enough for H100?
The main issue with tuning for H100 is the following. Running the auto-tuning framework based on the A100 setup is trivial: it works, and we have done so as part of testing CP2K on Alps. The problem arises with the ML prediction framework, where I was not able to finish the procedure. As a result, we either have a handful of H100-tuned kernels, or a much more complete set of A100 (tuned + predicted) kernels. I like the latter option better.
The prediction code is rapidly aging, and this issue has been hanging for quite some time.
You don't need the ML prediction; that is a "fast" solution. Personally, I have never tried it. Note that @mkrack did not use it for the A100 kernels.
I totally agree with this, and that's why I marked it as a
As mentioned at PASC24, we tried to run the auto-tuning for GH200 at CSCS, but it is not clear to us who is actually responsible for contributing the kernels and checking that everything is in order. I was under the impression that you were about to get access to H100. If I recall our discussion at PASC24 correctly, you mentioned the following:
Therefore, I was under the impression that the whole auto-tuning pipeline needs attention, which is why I opened this PR as a temporary workaround (it might still be beneficial to target the proper architecture in the meantime).
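For illustration, here is a minimal CMake sketch of what such a workaround amounts to; the option and file names (`WITH_GPU`, `LIBSMM_ACC_PARAM_FILE`, `parameters_A100.json`) are placeholders chosen for this sketch, not necessarily the actual DBCSR build variables:

```cmake
# Sketch only: accept H100 as a GPU target, but fall back to the A100-tuned
# parameter set while still compiling for the Hopper architecture.
if (WITH_GPU STREQUAL "H100")
  set(LIBSMM_ACC_PARAM_FILE "parameters_A100.json")  # reuse A100 kernels for now
  set(CMAKE_CUDA_ARCHITECTURES 90)                   # sm_90 (Hopper)
elseif (WITH_GPU STREQUAL "A100")
  set(LIBSMM_ACC_PARAM_FILE "parameters_A100.json")
  set(CMAKE_CUDA_ARCHITECTURES 80)                   # sm_80 (Ampere)
endif()
```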
This is more in line with what we discussed, if I recall correctly, which is why I opened this PR. However, at the moment we don't have numbers directly comparing the two parameter sets. BTW, see lines 85 to 88 in 0f47720.
Let's turn this PR into an issue; I'm open to discussion. For the record, what I said at PASC is that the autotuning is old and needs a refresh (2018 was the last major refresh, by Shoshana, plus some minor updates by me and @dev-zero), but this is what we have and we are supposed to use it (or drop it). I didn't propose any workaround. Regarding lines 85 to 88 in 0f47720:
The entire machinery relies on users to provide optimized kernels, as described in the documentation. I likely need to add "user" to the documentation to make that clear; good point.
That's not correct: I used the ML prediction to create the 71048 predicted A100 kernels in addition to the 3043 autotuned ones. The scripts required some fixes at that time (summer 2023), but worked on JURECA, as they did for the P100. From my experience, I can offer the following comments:
Thanks @mkrack, this is very nice feedback, and thanks for the clarification (it turns out I did a grep for "predicted" in the wrong file! You are definitely right). OK, then I think we are coming to the conclusion that we can drop the ML prediction part and likely autotuning altogether (we will keep it for adding new kernels). I think @RMeli and @abussy came to the same conclusion. The strategy will then be to rename the files/parameters to "AMD" and "NVIDIA" and drop the specific GPU version.
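A hedged sketch of how such vendor-level parameter files could be wired up in CMake; `parameters_NVIDIA.json`, `parameters_AMD.json`, and the variable names are hypothetical, introduced only to illustrate the proposed renaming:

```cmake
# Sketch: one parameter file per vendor, with only the target architecture
# remaining GPU-specific (all names below are placeholders).
if (USE_ACCEL STREQUAL "cuda")
  set(LIBSMM_ACC_PARAM_FILE "parameters_NVIDIA.json")
  set(CMAKE_CUDA_ARCHITECTURES ${WITH_GPU_ARCH})     # e.g. 80 for A100, 90 for H100
elseif (USE_ACCEL STREQUAL "hip")
  set(LIBSMM_ACC_PARAM_FILE "parameters_AMD.json")
  set(CMAKE_HIP_ARCHITECTURES ${WITH_GPU_ARCH})      # e.g. gfx90a for Mi250
endif()
```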
Yes, apologies for the confusion. The workaround was my interpretation, also based on what is done for CP2K and on what I saw in the repository here (out of context).
Thank you everyone for the input. Let's move the discussion to #805.
Tested with Spack.