Serge is a chat interface crafted with llama.cpp for running GGUF models. No API keys, entirely self-hosted!
- 🌐 SvelteKit frontend
- 💾 Redis for storing chat history & parameters
- ⚙️ FastAPI + LangChain for the API, wrapping calls to llama.cpp using the python bindings
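For context, here is a minimal sketch of what a call through the llama.cpp python bindings (llama-cpp-python) looks like. This is not Serge's actual code; the model path and parameters are placeholders:

```python
from llama_cpp import Llama

# Hypothetical model path; Serge keeps downloaded weights under /usr/src/app/weights
llm = Llama(model_path="/usr/src/app/weights/model.gguf", n_ctx=2048)

# Send a single chat message through the bindings' chat-completion helper
response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hello, who are you?"}],
    max_tokens=256,
)
print(response["choices"][0]["message"]["content"])
```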
🎥 Demo:
demo.webm
🐳 Docker:
```bash
docker run -d \
    --name serge \
    -v weights:/usr/src/app/weights \
    -v datadb:/data/db/ \
    -p 8008:8008 \
    ghcr.io/serge-chat/serge:latest
```
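Once the container is up, you can follow its startup output with the standard Docker CLI (nothing Serge-specific):

```bash
docker logs -f serge
```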
🐙 Docker Compose:
```yaml
services:
  serge:
    image: ghcr.io/serge-chat/serge:latest
    container_name: serge
    restart: unless-stopped
    ports:
      - 8008:8008
    volumes:
      - weights:/usr/src/app/weights
      - datadb:/data/db/

volumes:
  weights:
  datadb:
```
Then, just visit http://localhost:8008. You can find the API documentation at http://localhost:8008/api/docs.
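As a quick sanity check, you can hit those same URLs from the command line (plain curl; the paths are the ones given above):

```bash
# Should print 200 once the web UI is up
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8008
# Interactive OpenAPI documentation for the backend
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8008/api/docs
```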
On Windows, ensure you have Docker Desktop installed, WSL2 configured, and enough free RAM to run models.
Instructions for setting up Serge on Kubernetes can be found in the wiki.
| Category    | Models                             |
|-------------|------------------------------------|
| Alfred      | 40B                                |
| CodeLLaMA   | 7B, 13B                            |
| Falcon      | 7B, 7B-Instruct, 40B, 40B-Instruct |
| LLaMA 2     | 7B, 13B, 70B                       |
| Med42       | 70B                                |
| Meditron    | 7B, 70B                            |
| Mistral     | 7B, 7B-Instruct, 7B-OpenOrca       |
| Neural-Chat | 7B-v3.2                            |
| Notus       | 7B-v1                              |
| OpenChat    | 7B-v3.5                            |
| OpenLLaMA   | 3B-v2, 7B-v2, 13B-v2               |
| Orca 2      | 7B, 13B                            |
| PsyMedRP    | 13B-v1, 20B-v1                     |
| Starling LM | 7B-Alpha                           |
| Vicuna      | 7B-v1.5, 13B-v1.5                  |
| Zephyr      | 7B-Alpha, 7B-Beta                  |
Additional weights can be added to the serge_weights volume using docker cp:

```bash
docker cp ./my_weight.bin serge:/usr/src/app/weights/
```
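To confirm the file landed in the volume, you can list the weights directory inside the running container (standard docker exec, nothing Serge-specific):

```bash
docker exec serge ls -lh /usr/src/app/weights/
```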
LLaMA will crash if you don't have enough available memory for the model.
Need help? Join our Discord
Nathan Sarrazin and Contributors. Serge is free and open-source software licensed under the MIT License and Apache-2.0.
If you discover a bug or have a feature idea, feel free to open an issue or PR.
To run Serge in development mode:
```bash
git clone https://github.com/serge-chat/serge.git
cd serge/
docker compose -f docker-compose.dev.yml up -d --build
```
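The dev compose file builds the images locally; to watch the build and application output, you can tail the compose logs (standard docker compose usage):

```bash
docker compose -f docker-compose.dev.yml logs -f
```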