This repository has been archived by the owner on Jun 24, 2024. It is now read-only.
Multi-GPU support for inference #371
Labels:
- issue:enhancement (New feature or request)
- topic:clblast (https://github.com/CNugteren/CLBlast support)
- topic:cublas (https://developer.nvidia.com/cublas support)
With #325, we now have GPU acceleration. However, it is limited to a single GPU at present. We'll need to mimic the functionality from llama.cpp in order to distribute a model's tensors across multiple GPUs as appropriate.
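For illustration, here is a minimal sketch of the kind of proportional layer-to-GPU assignment llama.cpp performs with its `tensor_split` option. The `assign_layers` helper and its signature are hypothetical, not part of this repository's API; the real implementation would also have to place the actual tensor data on each device.

```rust
// Hypothetical sketch: assign each transformer layer to a GPU in proportion
// to the requested split fractions, similar in spirit to llama.cpp's
// `tensor_split`. Returns the GPU index for each layer.
fn assign_layers(n_layers: usize, splits: &[f32]) -> Vec<usize> {
    let total: f32 = splits.iter().sum();
    let mut assignment = Vec::with_capacity(n_layers);
    for layer in 0..n_layers {
        // Layer i belongs to the first GPU whose cumulative share of the
        // split covers the fraction i / n_layers.
        let frac = layer as f32 / n_layers as f32;
        let mut acc = 0.0;
        let mut gpu = splits.len() - 1;
        for (i, s) in splits.iter().enumerate() {
            acc += s / total;
            if frac < acc {
                gpu = i;
                break;
            }
        }
        assignment.push(gpu);
    }
    assignment
}

fn main() {
    // 32 layers split 3:1 between two GPUs: the first 24 layers land on
    // GPU 0 and the remaining 8 on GPU 1.
    let a = assign_layers(32, &[3.0, 1.0]);
    assert_eq!(a.iter().filter(|&&g| g == 0).count(), 24);
    assert_eq!(a.iter().filter(|&&g| g == 1).count(), 8);
    println!("{:?}", a);
}
```

A scheme like this only decides placement; the per-backend work (cuBLAS contexts per device, CLBlast queues, and cross-device transfers at layer boundaries) is where most of the effort would go.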