If you want to integrate more backends into llmaz, please refer to this PR. Contributions are always welcome.
llama.cpp enables LLM inference with minimal setup and state-of-the-art performance on a wide variety of hardware, both locally and in the cloud.
SGLang is a fast serving framework for large language models and vision language models.
text-generation-inference is a Rust, Python and gRPC server for text generation inference, used in production at Hugging Face to power Hugging Chat, the Inference API and Inference Endpoints.
vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs.
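In llmaz, the backend that serves a given model is chosen per workload. As a rough sketch only — the API group, version, and field names below are assumptions and may not match the current llmaz CRDs, so consult the project's API reference before use — a Playground spec might pin one of the backends above like this:

```yaml
# Hypothetical sketch: selecting an inference backend in llmaz.
# Field names (modelClaim, backendRuntimeConfig, backendName) are
# assumptions for illustration; verify against the llmaz API reference.
apiVersion: inference.llmaz.io/v1alpha1
kind: Playground
metadata:
  name: example-playground
spec:
  replicas: 1
  modelClaim:
    modelName: example-model      # a Model resource registered separately
  backendRuntimeConfig:
    backendName: vllm             # e.g. one of: vllm, sglang, llamacpp, tgi
```

Switching backends would then be a matter of changing the backend name rather than rewriting the deployment, which is the point of llmaz's pluggable backend design.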