A Docker Compose setup to run a local ChatGPT-like application using Ollama, Open WebUI & Mistral-7B-v0.1.
Simply run:
docker compose up
The ollama-models-pull service triggers an API call to Ollama to pull the mistral model (~4 GB) and shuts down when it is done. You should see the progress in the logs of that service, which should end with:
{"status":"verifying sha256 digest"}
{"status":"writing manifest"}
{"status":"removing any unused layers"}
{"status":"success"}
To verify the list of downloaded models, you can call Ollama on http://localhost:11434/api/tags.
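For example, from a terminal on the host:
curl http://localhost:11434/api/tags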
The models are stored in a volume to avoid downloading them at each restart of Ollama.
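For illustration, the relevant part of docker-compose.yml typically looks like the sketch below (the volume name is an assumption; /root/.ollama is the data directory used by the official Ollama image):
services:
  ollama:
    volumes:
      - ollama-data:/root/.ollama
volumes:
  ollama-data: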
Once the model is downloaded, you can go to http://localhost (by default, the port mapped to the host is 80, but you can change it by editing the docker-compose.yml file). Next, sign up to create an account (everything stays local) and log in. At the top of the page, look for the Select a model dropdown menu and select mistral:latest. After selecting it, click on the Set as default link to avoid having to select it again each time you start a new discussion.
By default, Ollama is set to use 1 NVIDIA GPU:
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: 1
          capabilities: [ gpu ]
If you want to run on CPU, comment out the lines shown above in the docker-compose.yml file and then run docker compose up.
If you want to run on an NVIDIA GPU, make sure that your Docker daemon configuration file (usually /etc/docker/daemon.json) contains the following:
"runtimes": {
"nvidia": {
"path": "nvidia-container-runtime",
"runtimeArgs": []
}
}
You can also add "default-runtime": "nvidia" to make NVIDIA the default runtime for all containers.
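Putting these together, a complete /etc/docker/daemon.json could look like the sketch below (merge it with any options you already have, then restart the Docker daemon, e.g. sudo systemctl restart docker):
{
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}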
You should also have the NVIDIA CUDA Toolkit installed. To verify that Docker can access your GPU, you can run:
docker run --runtime nvidia --rm nvidia/cuda:12.3.1-base-ubuntu20.04 nvidia-smi
Note: Make sure that the version of the nvidia/cuda image is aligned with the CUDA version installed with the toolkit.
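To check which CUDA version is installed on the host, and pick a matching nvidia/cuda image tag, you can run:
nvcc --version
The CUDA Version field reported by nvidia-smi shows the highest version supported by the installed driver.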