Commit

Revamp llama.cpp docs (#1214)
* Revamp llama.cpp docs

* format

* update readme

* update index page

* update readme

* better formatting

* Update README.md

Co-authored-by: Victor Muštar <[email protected]>

* Update README.md

Co-authored-by: Victor Muštar <[email protected]>

* fix hashlink

* document llama hf args

* format

---------

Co-authored-by: Victor Muštar <[email protected]>
Mishig and gary149 authored May 29, 2024
1 parent ec61483 commit d5e51eb
Showing 3 changed files with 177 additions and 33 deletions.
93 changes: 80 additions & 13 deletions README.md
@@ -20,15 +20,79 @@ load_balancing_strategy: random

A chat interface using open source models, e.g. OpenAssistant or Llama. It is a SvelteKit app and it powers the [HuggingChat app on hf.co/chat](https://huggingface.co/chat).

0. [No Setup Deploy](#no-setup-deploy)
1. [Setup](#setup)
2. [Launch](#launch)
3. [Web Search](#web-search)
4. [Text Embedding Models](#text-embedding-models)
5. [Extra parameters](#extra-parameters)
6. [Common issues](#common-issues)
7. [Deploying to a HF Space](#deploying-to-a-hf-space)
8. [Building](#building)
0. [Quickstart](#quickstart)
1. [No Setup Deploy](#no-setup-deploy)
2. [Setup](#setup)
3. [Launch](#launch)
4. [Web Search](#web-search)
5. [Text Embedding Models](#text-embedding-models)
6. [Extra parameters](#extra-parameters)
7. [Common issues](#common-issues)
8. [Deploying to a HF Space](#deploying-to-a-hf-space)
9. [Building](#building)

## Quickstart

You can quickly run chat-ui against a local LLM text-generation server thanks to chat-ui's [llama.cpp server support](https://huggingface.co/docs/chat-ui/configuration/models/providers/llamacpp).

**Step 1 (Start the llama.cpp server):**

```bash
# install llama.cpp
brew install llama.cpp
# start llama.cpp server (using hf.co/microsoft/Phi-3-mini-4k-instruct-gguf as an example)
llama-server --hf-repo microsoft/Phi-3-mini-4k-instruct-gguf --hf-file Phi-3-mini-4k-instruct-q4.gguf -c 4096
```

The llama.cpp HTTP server will start on `http://localhost:8080`. Read more [here](https://huggingface.co/docs/chat-ui/configuration/models/providers/llamacpp).
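Because the GGUF file is downloaded and loaded on first launch, the server may not answer immediately. A minimal sketch for waiting until it is ready, assuming the server exposes llama.cpp's `GET /health` endpoint (which returns an error status while the model is still loading) and that `curl` is installed; the helper name `wait_for_llama_server` is hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical helper: poll llama.cpp's /health endpoint until it returns HTTP 2xx.
# Assumes the default address used in Step 1 (http://localhost:8080).
# timeout_s is approximate: it counts the 1-second sleeps between attempts.
wait_for_llama_server() {
  local base_url="${1:-http://localhost:8080}"
  local timeout_s="${2:-120}"
  local waited=0
  # -f: treat HTTP errors (e.g. 503 while the model loads) as failures
  # -s: silent; --max-time: don't hang on a stalled connection
  until curl -sf --max-time 2 "${base_url}/health" > /dev/null; do
    if [ "$waited" -ge "$timeout_s" ]; then
      echo "llama.cpp server not ready at ${base_url} after ~${timeout_s}s" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "llama.cpp server is ready at ${base_url}"
}

# Usage: allow up to ~5 minutes for the model download and load
# wait_for_llama_server http://localhost:8080 300
```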

**Step 2 (Tell chat-ui to use the local llama.cpp server):**

Add the following to your `.env.local`:

```ini
MODELS=`[
{
"name": "Local microsoft/Phi-3-mini-4k-instruct-gguf",
"tokenizer": "microsoft/Phi-3-mini-4k-instruct-gguf",
"preprompt": "",
"chatPromptTemplate": "<s>{{preprompt}}{{#each messages}}{{#ifUser}}<|user|>\n{{content}}<|end|>\n<|assistant|>\n{{/ifUser}}{{#ifAssistant}}{{content}}<|end|>\n{{/ifAssistant}}{{/each}}",
"parameters": {
"stop": ["<|end|>", "<|endoftext|>", "<|assistant|>"],
"temperature": 0.7,
"max_new_tokens": 1024,
"truncate": 3071
},
"endpoints": [{
"type" : "llamacpp",
"baseURL": "http://localhost:8080"
}],
},
]`
```

Read more [here](https://huggingface.co/docs/chat-ui/configuration/models/providers/llamacpp).

**Step 3 (Make sure MongoDB is running locally):**

```bash
docker run -d -p 27017:27017 --name mongo-chatui mongo:latest
```

Read more [here](#database).
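To confirm the container accepts connections before starting chat-ui, you can ping MongoDB from inside it. A minimal sketch, assuming Docker is available, the `mongo:latest` image bundles the `mongosh` shell (official images from 6.0 onward do), and the `mongo-chatui` container name from the command above; the helper name `mongo_ready` is hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only if MongoDB in the given container answers a ping.
# Assumes the container runs an official mongo image that ships mongosh.
mongo_ready() {
  local container="${1:-mongo-chatui}"
  local ok
  # db.runCommand({ ping: 1 }).ok evaluates to 1 when the server is up;
  # any docker/mongosh failure makes the function return 1.
  ok="$(docker exec "$container" mongosh --quiet --eval 'db.runCommand({ ping: 1 }).ok' 2>/dev/null)" || return 1
  [ "$ok" = "1" ]
}

# Usage:
# mongo_ready mongo-chatui && echo "MongoDB is up"
```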

**Step 4 (Start chat-ui):**

```bash
git clone https://github.com/huggingface/chat-ui
cd chat-ui
npm install
npm run dev -- --open
```

Read more [here](#launch).

<img class="hidden dark:block" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/chat-ui/llamacpp-dark.png" height="auto"/>

## No Setup Deploy

@@ -415,11 +479,14 @@ MODELS=`[{

chat-ui also supports the llama.cpp API server directly without the need for an adapter. You can do this using the `llamacpp` endpoint type.

If you want to run chat-ui with llama.cpp, you can do the following, using Zephyr as an example model:
If you want to run Chat UI with llama.cpp, you can do the following, using [microsoft/Phi-3-mini-4k-instruct-gguf](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf) as an example model:

1. Get [the weights](https://huggingface.co/TheBloke/zephyr-7B-beta-GGUF/tree/main) from the hub
2. Run the server with the following command: `./server -m models/zephyr-7b-beta.Q4_K_M.gguf -c 2048 -np 3`
3. Add the following to your `.env.local`:
```bash
# install llama.cpp
brew install llama.cpp
# start llama.cpp server
llama-server --hf-repo microsoft/Phi-3-mini-4k-instruct-gguf --hf-file Phi-3-mini-4k-instruct-q4.gguf -c 4096
```

```env
MODELS=`[
51 changes: 31 additions & 20 deletions docs/source/configuration/models/providers/llamacpp.md
@@ -7,32 +7,43 @@

Chat UI supports the llama.cpp API server directly without the need for an adapter. You can do this using the `llamacpp` endpoint type.

If you want to run Chat UI with llama.cpp, you can do the following, using Zephyr as an example model:
If you want to run Chat UI with llama.cpp, you can do the following, using [microsoft/Phi-3-mini-4k-instruct-gguf](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf) as an example model:

1. Get [the weights](https://huggingface.co/TheBloke/zephyr-7B-beta-GGUF/tree/main) from the hub
2. Run the server with the following command: `./server -m models/zephyr-7b-beta.Q4_K_M.gguf -c 2048 -np 3`
3. Add the following to your `.env.local`:
```bash
# install llama.cpp
brew install llama.cpp
# start llama.cpp server
llama-server --hf-repo microsoft/Phi-3-mini-4k-instruct-gguf --hf-file Phi-3-mini-4k-instruct-q4.gguf -c 4096
```

_Note: you can replace the `--hf-repo` and `--hf-file` values with any GGUF model on the [Hub](https://huggingface.co/models?library=gguf). For example, use `--hf-repo TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF` for [this repo](https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF) and `--hf-file tinyllama-1.1b-chat-v1.0.Q4_0.gguf` for [this file](https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF/blob/main/tinyllama-1.1b-chat-v1.0.Q4_0.gguf). If you switch models, also update `tokenizer`, `chatPromptTemplate`, and `stop` in `.env.local` to match the new model._

The llama.cpp HTTP server will start on `http://localhost:8080` (to change the port or other default options, see the [llama.cpp HTTP server README](https://github.com/ggerganov/llama.cpp/tree/master/examples/server)).
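Before wiring the server into Chat UI, you can check that it generates text by calling llama.cpp's `POST /completion` endpoint directly. A minimal sketch, assuming the default address and that `curl` is installed; the helper name `llama_smoke_test` and the default prompt are illustrative, and the prompt must not contain double quotes or backslashes because it is pasted into the JSON body without escaping:

```bash
#!/usr/bin/env bash
# Hypothetical helper: send one short raw prompt to llama.cpp's /completion endpoint
# and print the JSON response (the generated text is in its "content" field).
llama_smoke_test() {
  local base_url="${1:-http://localhost:8080}"
  local prompt="${2:-Hello, my name is}"
  # NOTE: $prompt is inserted into the JSON as-is; avoid quotes/backslashes in it.
  # n_predict caps the number of generated tokens to keep the check fast.
  curl -sf --max-time 60 "${base_url}/completion" \
    -H "Content-Type: application/json" \
    -d "{\"prompt\": \"${prompt}\", \"n_predict\": 16}"
}

# Usage:
# llama_smoke_test http://localhost:8080 "The capital of France is"
```

This sends a raw completion request, so Chat UI's `chatPromptTemplate` is not applied here; it only confirms the server is loaded and responding.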

Add the following to your `.env.local`:

```ini
MODELS=`[
{
"name": "Local Zephyr",
"chatPromptTemplate": "<|system|>\n{{preprompt}}</s>\n{{#each messages}}{{#ifUser}}<|user|>\n{{content}}</s>\n<|assistant|>\n{{/ifUser}}{{#ifAssistant}}{{content}}</s>\n{{/ifAssistant}}{{/each}}",
"name": "Local microsoft/Phi-3-mini-4k-instruct-gguf",
"tokenizer": "microsoft/Phi-3-mini-4k-instruct-gguf",
"preprompt": "",
"chatPromptTemplate": "<s>{{preprompt}}{{#each messages}}{{#ifUser}}<|user|>\n{{content}}<|end|>\n<|assistant|>\n{{/ifUser}}{{#ifAssistant}}{{content}}<|end|>\n{{/ifAssistant}}{{/each}}",
"parameters": {
"temperature": 0.1,
"top_p": 0.95,
"repetition_penalty": 1.2,
"top_k": 50,
"truncate": 1000,
"max_new_tokens": 2048,
"stop": ["</s>"]
"stop": ["<|end|>", "<|endoftext|>", "<|assistant|>"],
"temperature": 0.7,
"max_new_tokens": 1024,
"truncate": 3071
},
"endpoints": [
{
"url": "http://127.0.0.1:8080",
"type": "llamacpp"
}
]
}
"endpoints": [{
"type" : "llamacpp",
"baseURL": "http://localhost:8080"
}],
},
]`
```

<div class="flex justify-center">
<img class="block dark:hidden" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/chat-ui/llamacpp-light.png" height="auto"/>
<img class="hidden dark:block" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/chat-ui/llamacpp-dark.png" height="auto"/>
</div>
66 changes: 66 additions & 0 deletions docs/source/index.md
@@ -9,3 +9,69 @@ Open source chat interface with support for tools, web search, multimodal and ma
🐙 **Multimodal**: Accepts image file uploads on supported providers

👤 **OpenID**: Optionally set up OpenID for user authentication

## Quickstart Locally

You can quickly run chat-ui against a local LLM text-generation server thanks to chat-ui's [llama.cpp server support](https://huggingface.co/docs/chat-ui/configuration/models/providers/llamacpp).

**Step 1 (Start the llama.cpp server):**

```bash
# install llama.cpp
brew install llama.cpp
# start llama.cpp server (using hf.co/microsoft/Phi-3-mini-4k-instruct-gguf as an example)
llama-server --hf-repo microsoft/Phi-3-mini-4k-instruct-gguf --hf-file Phi-3-mini-4k-instruct-q4.gguf -c 4096
```

The llama.cpp HTTP server will start on `http://localhost:8080`. Read more [here](https://huggingface.co/docs/chat-ui/configuration/models/providers/llamacpp).

**Step 2 (Tell chat-ui to use the local llama.cpp server):**

Add the following to your `.env.local`:

```ini
MODELS=`[
{
"name": "Local microsoft/Phi-3-mini-4k-instruct-gguf",
"tokenizer": "microsoft/Phi-3-mini-4k-instruct-gguf",
"preprompt": "",
"chatPromptTemplate": "<s>{{preprompt}}{{#each messages}}{{#ifUser}}<|user|>\n{{content}}<|end|>\n<|assistant|>\n{{/ifUser}}{{#ifAssistant}}{{content}}<|end|>\n{{/ifAssistant}}{{/each}}",
"parameters": {
"stop": ["<|end|>", "<|endoftext|>", "<|assistant|>"],
"temperature": 0.7,
"max_new_tokens": 1024,
"truncate": 3071
},
"endpoints": [{
"type" : "llamacpp",
"baseURL": "http://localhost:8080"
}],
},
]`
```

Read more [here](https://huggingface.co/docs/chat-ui/configuration/models/providers/llamacpp).

**Step 3 (Make sure MongoDB is running locally):**

```bash
docker run -d -p 27017:27017 --name mongo-chatui mongo:latest
```

Read more [here](https://github.com/huggingface/chat-ui?tab=readme-ov-file#database).

**Step 4 (Start chat-ui):**

```bash
git clone https://github.com/huggingface/chat-ui
cd chat-ui
npm install
npm run dev -- --open
```

Read more [here](https://github.com/huggingface/chat-ui?tab=readme-ov-file#launch).

<div class="flex justify-center">
<img class="block dark:hidden" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/chat-ui/llamacpp-light.png" height="auto"/>
<img class="hidden dark:block" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/chat-ui/llamacpp-dark.png" height="auto"/>
</div>
