Ideas for models & also distributed inference over LAN. #6

ghchris2021 started this conversation in Ideas
-

Congratulations on the great FOSS project, and thank you very much. I look forward to seeing what becomes of it!

Per the request for ideas and aspirations for features / model support, I'll share my own thoughts.

For running larger models in general, my primary wishes for an inference system are:

A: Support distributed inference across multiple Linux PCs on an IP LAN, pooling any mix of GPU+VRAM and CPU+RAM resources, so that models needing more memory than the 16-24 GB VRAM of a typical GPU can still be served effectively (a rough sketch of the idea follows this post).

B: Support heterogeneous GPUs -- NVIDIA, Intel Arc, AMD RDNA -- in any combination: alone, together, or distributed.

For model support, my main desires (in no particular order) are:
Llama-3.1; DeepSeek-Coder-V2; DeepSeek-Chat (most recent); Mistral-Large; Codestral; Mixtral-8x22B; Gemma-2-27B; CodeGemma; Qwen2; CodeQwen.
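To make item A concrete, here is a minimal sketch of one way such LAN pooling could work: the model is split into pipeline stages, each stage runs on whichever host and local device can hold it, and activations are passed over plain TCP. Everything here -- the hostnames, the port, the helper names, and the two-stage split -- is invented for illustration; this is not KTransformers code.

```python
import io
import socket
import struct

import torch
import torch.nn as nn


def _recv_exactly(sock: socket.socket, n: int) -> bytes:
    chunks = []
    while n > 0:
        chunk = sock.recv(n)
        if not chunk:
            raise ConnectionError("peer closed the connection")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def send_tensor(sock: socket.socket, t: torch.Tensor) -> None:
    # Serialize the tensor and prefix it with a 4-byte big-endian length.
    buf = io.BytesIO()
    torch.save(t.cpu(), buf)
    data = buf.getvalue()
    sock.sendall(struct.pack("!I", len(data)) + data)


def recv_tensor(sock: socket.socket) -> torch.Tensor:
    (length,) = struct.unpack("!I", _recv_exactly(sock, 4))
    return torch.load(io.BytesIO(_recv_exactly(sock, length)))


def pick_device() -> torch.device:
    # Item B in miniature: use whatever accelerator this host happens to have.
    if torch.cuda.is_available():  # NVIDIA, or AMD RDNA via ROCm builds
        return torch.device("cuda")
    if getattr(torch, "xpu", None) and torch.xpu.is_available():  # Intel Arc
        return torch.device("xpu")
    return torch.device("cpu")


def serve_stage(layers: nn.Module, port: int) -> None:
    # One pipeline stage: receive an activation, run this host's slice
    # of the model on the local device, and send the result back.
    device = pick_device()
    layers = layers.to(device).eval()
    with socket.create_server(("0.0.0.0", port)) as srv:
        while True:
            conn, _ = srv.accept()
            with conn:
                x = recv_tensor(conn).to(device)
                with torch.no_grad():
                    y = layers(x)
                send_tensor(conn, y)


def run_remote(host: str, port: int, x: torch.Tensor) -> torch.Tensor:
    with socket.create_connection((host, port)) as s:
        send_tensor(s, x)
        return recv_tensor(s)


# Client: chain the stages hop by hop (hostnames are placeholders).
# h = run_remote("box-with-arc.lan", 9000, torch.randn(1, 4096))
# y = run_remote("box-with-4090.lan", 9000, h)
```

A real implementation would also need batching, KV-cache handling, and fault tolerance, but the shape of the problem is the same: memory-bound layer placement plus activation transport.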
Replies: 1 comment

-

Thank you for the great suggestions! Merely supporting heterogeneous GPUs would not be a problem for KTransformers because it is based on transformers/torch. It may not be as efficient as the highly optimized Marlin CUDA kernel, but it can still benefit from CPU offloading. We are also interested in implementing an Exo-like multi-machine operator.
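For readers unfamiliar with the CPU offloading mentioned above, this is roughly what it looks like in the plain transformers/torch stack that KTransformers builds on (KTransformers' own optimized kernels replace parts of this path). The model id and memory budgets below are placeholders.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2-7B-Instruct"  # example only; any causal LM works

tokenizer = AutoTokenizer.from_pretrained(model_id)

# device_map="auto" (backed by accelerate) fills GPU 0 up to its budget,
# then offloads the remaining layers to system RAM and swaps them in
# on demand during the forward pass.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
    max_memory={0: "16GiB", "cpu": "64GiB"},
)

inputs = tokenizer("Write a quicksort in Python.", return_tensors="pt").to("cuda")
out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```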