MI100 Support #24
Hi @LoggerHead22, this code appears to be a logic fault; thanks for pointing it out. We haven't tested FA on MI100, since most of our testing was done on MI250 and MI300, so we are limiting the supported archs. I am not sure whether it will work correctly on MI100, but you can try adding gfx908 to the valid archs. I expect the build process itself to be fine.
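To make the suggestion concrete: the build appears to gate on an allowlist of GPU architectures, so a target that is not on the list (such as gfx908) is rejected before anything is compiled. The following is a minimal sketch under stated assumptions: `DEFAULT_ALLOWED_ARCHS`, `parse_archs`, `validate_archs`, and the semicolon-separated arch string are hypothetical stand-ins, not the repository's actual setup code.

```python
# Hypothetical allowlist; the real setup script may use different names and entries.
DEFAULT_ALLOWED_ARCHS = ("native", "gfx90a", "gfx942")


def parse_archs(spec):
    """Split a semicolon-separated arch string such as "gfx90a;gfx908"."""
    return [a.strip() for a in spec.split(";") if a.strip()]


def validate_archs(archs, allowed=DEFAULT_ALLOWED_ARCHS):
    """Raise ValueError if any requested arch is not in the allowlist."""
    unsupported = [a for a in archs if a not in allowed]
    if unsupported:
        raise ValueError(f"unsupported GPU arch(s): {', '.join(unsupported)}")
    return archs


requested = parse_archs("gfx908")

# With the default allowlist, gfx908 (MI100) is rejected up front.
try:
    validate_archs(requested)
    error_message = None
except ValueError as err:
    error_message = str(err)
print(error_message)  # unsupported GPU arch(s): gfx908

# Extending the allowlist with gfx908 is enough to get past the guard.
print(validate_archs(requested, DEFAULT_ALLOWED_ARCHS + ("gfx908",)))  # ['gfx908']
```

Getting past a guard like this says nothing about numerical correctness on the new target; that still has to be checked by running the tests.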
Thanks for the clarification, @howiejayz. Your advice really helped: the code compiles for MI100 and runs. However, I encountered an error during the build that is caused by the logic of the patch.
This makes sense, because a dict is created here and then the code tries to read its hipified_path attribute. Replacing the dict with a HipifyResult object in the patch fixed it for me.
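The failure described here is ordinary Python behavior: a `dict` exposes its entries as keys, not attributes, so `result.hipified_path` raises `AttributeError`. A minimal sketch, assuming the patched build code only needs an object with a `hipified_path` attribute; `HipifyResultStandIn` and the helper functions are hypothetical stand-ins for illustration, not the real `HipifyResult` definition.

```python
from dataclasses import dataclass


@dataclass
class HipifyResultStandIn:
    """Hypothetical stand-in exposing the attribute the build code reads."""
    current_state: str
    hipified_path: str


def make_result_as_dict(path):
    # Buggy variant: the fields are dict keys, not attributes.
    return {"current_state": "DONE", "hipified_path": path}


def make_result_as_object(path):
    # Fixed variant: an object whose fields are real attributes.
    return HipifyResultStandIn(current_state="DONE", hipified_path=path)


def read_hipified_path(result):
    # Mirrors code that reads the result via attribute lookup.
    return result.hipified_path


path = "csrc/example_kernel.hip"  # hypothetical file name
try:
    read_hipified_path(make_result_as_dict(path))
except AttributeError as err:
    print(type(err).__name__)  # AttributeError

print(read_hipified_path(make_result_as_object(path)))  # csrc/example_kernel.hip
```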
Has this patch been merged into the main branch, or do we need to apply it in order to test?
I need MI100 support.
If you need hardware for testing on MI100, I can volunteer my server for this purpose. I have 8x MI100 with Infinity Fabric.
Hi @sabreshao @howiejayz, can you please give me a path forward? I have a bunch of MI100s and I would like to keep them busy. Without flash attention, I am blocked. Could you show me where in the code I would add support, or give me some advice?
Hi @ehartford! I currently have no time to test FA on MI100, but could you try to build and run it based on this comment?
I was able to compile flash attention for the MI100 using the Docker image. Simply adding gfx908 to the target arch array (or, in my case, removing everything but native and gfx908) makes it run fine. (Note: this also applies to the vLLM ROCm Docker image, which was my use case.) Attempts to compile outside of Docker seem to fail on ROCm 6.0 due to this issue, though I was unable to downgrade to ROCm 5.7 to test on my machine.
I managed to build for MI100 (gfx908) as well, but the env var didn't work, @TNT3530. This is because the setup is protected against unknown architectures and
Here's my PR; you folks might benefit from it: #38
How do I install flash attention for MI100? How does the procedure differ from the one in README.md?
@ehartford, passing the card arch to the build should be enough:
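To know which arch name to pass, one option is to read it from `rocminfo`, which lists each GPU agent with a line such as `Name: gfx908`. A minimal sketch, assuming the `rocminfo` output has already been captured as text; `gfx_archs_from_rocminfo` is a hypothetical helper, not part of any build script, and the sample below is an illustrative excerpt rather than real captured output.

```python
import re

# Matches GPU target names such as gfx908 or gfx90a.
GFX_PATTERN = re.compile(r"\bgfx[0-9a-f]+\b")


def gfx_archs_from_rocminfo(text):
    """Return the distinct gfx target names found in rocminfo output, sorted."""
    return sorted(set(GFX_PATTERN.findall(text)))


# Illustrative excerpt, not real captured output.
sample = """
  Name:                    Example CPU
  Name:                    gfx908
      Name:                    amdgcn-amd-amdhsa--gfx908:sramecc+:xnack-
  Name:                    gfx908
"""

print(gfx_archs_from_rocminfo(sample))  # ['gfx908']
```

Deduplicating with a set matters here because a multi-GPU node repeats the same target once per card, and the ISA lines repeat it again.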
Also curious whether support for MI100 was finalized.
This is awesome! Can't wait to try it! |
Just realized MI100 support was removed.
@jayz0123, was that intentional?
I can confirm that when this is patched away again to allow MI100 to build the package, the latest main builds and works fine on gfx908, at least for the dimensions I tried. So this restriction seems pretty silly, and it's quite puzzling why MI100 was removed from the array again, given that it still works fine.
Then someone doesn't want it to work on MI100.
Could you please make a PR that enables MI100 so I can test it?
pytest test_flash_attn_ck.py
warnings.warn(PytestDeprecationWarning(_DEFAULT_FIXTURE_LOOP_SCOPE_UNSET))
test_flash_attn_ck.py ................................................................................................................................................................................ [ 0%]
Thread 0x00007f15117fd640 (most recent call first):
Thread 0x00007f1511ffe640 (most recent call first):
Thread 0x00007f15127ff640 (most recent call first):
Thread 0x00007f15187ff640 (most recent call first):
Thread 0x00007f1c7f115000 (most recent call first):
Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, torch._C, torch._C._dynamo.autograd_compiler, torch._C._dynamo.eval_frame, torch._C._dynamo.guards, torch._C._dynamo.utils, torch._C._fft, torch._C._linalg, torch._C._nested, torch._C._nn, torch._C._sparse, torch._C._special (total: 24)
I can build it on MI100, but it fails the FA test.
Hi, the documentation says that this implementation is compatible only with the MI200 and MI300 GPUs. But what about the MI100 GPU?
The code contains conditions that formally match the MI100 and its gfx908 architecture.
Will this code be compatible with MI100 in practice? If not, are there any plans to add such support? Or what are the reasons that keep you from adding support for the MI100?