| Backend | Model | Features | Acceleration | Chronicler |
|---|---|---|---|---|
| pytorch | llama_orig | - adapter support<br>- visual input (multimodal adapter) | CUDA, MPS | instruct |
| pytorch | llama_hf | - LoRA support | CUDA | instruct |
| pytorch | gpt-2, gpt-j, auto-model | | CUDA | instruct |
| llama.cpp, remote-lcpp | any llama-based | - quantized GGML model support<br>- LoRA support<br>- built-in GPU acceleration<br>- memory manager support<br>- visual input (currently only in the llama.cpp server) | CPU, CUDA, Metal | instruct |
| mlc-pb | only models prebuilt for mlc-chat | - quantized MLC model support | CUDA, Vulkan, Metal | raw |
| remote_ob | any model supported by the Oobabooga web UI or Kobold.cpp | - all Oobabooga web UI features available via its API, including GPTQ support<br>- all Kobold.cpp features available via its API<br>- memory manager support | + | instruct |
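The LoRA support listed for the `llama_hf` backend corresponds to the standard Hugging Face loading pattern. A minimal sketch using the generic `transformers`/`peft` API rather than this project's own loader (the model id and adapter path are placeholders):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base HF-format LLaMA checkpoint on CUDA (placeholder model id).
base = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-7b", torch_dtype=torch.float16, device_map="cuda"
)
# Attach a LoRA adapter on top of the frozen base weights (placeholder path).
model = PeftModel.from_pretrained(base, "path/to/lora-adapter")
tokenizer = AutoTokenizer.from_pretrained("huggyllama/llama-7b")
```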
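For the llama.cpp-based backends, quantized model loading, LoRA, and built-in GPU offload map onto the upstream `llama-cpp-python` bindings. A sketch under that assumption (file paths are placeholders; the `llama.cpp` and `remote-lcpp` backends here may wrap this differently):

```python
from llama_cpp import Llama

llm = Llama(
    model_path="models/llama-7b.q4_0.bin",  # placeholder path to a quantized GGML file
    n_gpu_layers=32,                        # built-in GPU offload (CUDA/Metal builds)
    lora_path="loras/adapter.bin",          # optional LoRA adapter (placeholder path)
)
out = llm("USER: Hi!\nASSISTANT:", max_tokens=64)
print(out["choices"][0]["text"])
```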
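The `remote_ob` backend talks to an already-running server over HTTP. Purely as an illustration of that API style, here is a call against the KoboldAI-compatible endpoint that Kobold.cpp exposes (host, port, and sampling values are assumptions, and the Oobabooga web UI's API uses different field names):

```python
import requests

resp = requests.post(
    "http://localhost:5001/api/v1/generate",  # Kobold.cpp's default port is 5001
    json={
        "prompt": "USER: Hi!\nASSISTANT:",
        "max_length": 64,      # number of tokens to generate
        "temperature": 0.7,
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["results"][0]["text"])
```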