Update vllama docs for uqff
EricLBuehler committed Sep 30, 2024
1 parent e449543 commit ce02618
Showing 2 changed files with 19 additions and 3 deletions.
docs/UQFF.md (3 changes: 2 additions & 1 deletion)
@@ -174,4 +174,5 @@ Have you created a UQFF model on Hugging Face? If so, please [create an issue](h

| Name | Base model | UQFF model |
| -- | -- | -- |
-| Phi 3.5 Mini Instruct | microsoft/Phi-3.5-mini-instruct | EricB/Phi-3.5-mini-instruct-UQFF |
+| Phi 3.5 Mini Instruct | microsoft/Phi-3.5-mini-instruct | [EricB/Phi-3.5-mini-instruct-UQFF](https://huggingface.co/EricB/Phi-3.5-mini-instruct-UQFF) |
+| Llama 3.2 Vision | meta-llama/Llama-3.2-11B-Vision-Instruct | [EricB/Llama-3.2-11B-Vision-Instruct-UQFF](https://huggingface.co/EricB/Llama-3.2-11B-Vision-Instruct-UQFF) |
docs/VLLAMA.md (19 changes: 17 additions & 2 deletions)
@@ -2,7 +2,7 @@

Mistral.rs supports the Llama 3.2 Vision model, with examples for the Rust, Python, and HTTP APIs. ISQ quantization is supported, allowing the model to run with reduced memory requirements.
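
For instance, a minimal sketch of an ISQ load via the Python API (assuming the `mistralrs` package's `Runner` accepts an `in_situ_quant` argument and that `Q4K` is an available level):

```
from mistralrs import Runner, Which, VisionArchitecture

# ISQ sketch: load Llama 3.2 Vision and quantize the weights at load time.
# `in_situ_quant="Q4K"` is an assumption; substitute any supported ISQ level.
runner = Runner(
    which=Which.VisionPlain(
        model_id="meta-llama/Llama-3.2-11B-Vision-Instruct",
        arch=VisionArchitecture.VLlama,
    ),
    in_situ_quant="Q4K",
)
```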

-UQFF quantizations will be released shortly.
+UQFF quantizations are also available.

The Python and HTTP APIs support sending images as:
- URL
@@ -13,11 +13,14 @@ The Rust API takes an image from the [image](https://docs.rs/image/latest/image/) crate.

> Note: Some examples use the [Cephalo Llama 3.2 model](https://huggingface.co/lamm-mit/Cephalo-Llama-3.2-11B-Vision-Instruct-128k), a member of the [Cephalo](https://huggingface.co/collections/lamm-mit/cephalo-664f3342267c4890d2f46b33) model collection. This model is a finetune of Llama 3.2 with enhanced capabilities for scientific images. To use the base Llama 3.2 Vision model, simply use the [associated model ID](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct).
> Note: When using device mapping or model topology, only the text model and its layers are managed, because the text model contains most of the model's parameters. *The text model has 40 layers*.
## ToC
- [Interactive mode](#interactive-mode)
- [HTTP server](#http-server)
- [Rust API](#rust)
- [Python API](#python)
- [UQFF models](#uqff-models)

## Interactive mode

@@ -247,4 +250,16 @@ print(res.usage)
```

- You can find an example of encoding the [image via base64 here](../examples/python/phi3v_base64.py).
- You can find an example of loading an [image locally here](../examples/python/phi3v_local_img.py).
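
As a quick sketch of the base64 route linked above (the file name is hypothetical; the message format assumes the OpenAI-style `image_url` content parts used by the linked examples):

```
import base64

# Read a local image (hypothetical path) and wrap it in a data URL so it
# can be sent as an `image_url` content part in a chat message.
with open("demo.jpg", "rb") as f:
    encoded = base64.b64encode(f.read()).decode("utf-8")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{encoded}"}},
            {"type": "text", "text": "Describe this image."},
        ],
    }
]
```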

## UQFF models
[UQFF](UQFF.md) is a quantized file format, similar to GGUF, built on ISQ. It removes the memory and compute costs of running ISQ at load time by providing ready-made quantizations. The key advantage over GGUF is the flexibility to store multiple quantizations in one file.

We provide UQFF files ([EricB/Llama-3.2-11B-Vision-Instruct-UQFF](https://huggingface.co/EricB/Llama-3.2-11B-Vision-Instruct-UQFF)) for this Llama 3.2 Vision model.

You can use these UQFF files to easily run quantized versions of Llama 3.2 Vision without paying ISQ's load-time cost.

For example:
```
./mistralrs-server -i vision-plain -m meta-llama/Llama-3.2-11B-Vision-Instruct -a vllama --from-uqff EricB/Llama-3.2-11B-Vision-Instruct-UQFF/llama-3.2-11b-vision-q4k.uqff
```
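
The same load, sketched for the Python API. The `from_uqff` keyword is an assumption mirroring the `--from-uqff` flag above; treat this as a sketch, not confirmed API:

```
from mistralrs import Runner, Which, VisionArchitecture, ChatCompletionRequest

# Sketch: load the ready-made Q4K UQFF file instead of quantizing at load time.
# `from_uqff` is assumed to mirror the CLI's --from-uqff flag.
runner = Runner(
    which=Which.VisionPlain(
        model_id="meta-llama/Llama-3.2-11B-Vision-Instruct",
        arch=VisionArchitecture.VLlama,
        from_uqff="EricB/Llama-3.2-11B-Vision-Instruct-UQFF/llama-3.2-11b-vision-q4k.uqff",
    ),
)

res = runner.send_chat_completion_request(
    ChatCompletionRequest(
        model="llama-vision",
        messages=[{"role": "user", "content": "Why does UQFF load faster than ISQ?"}],
        max_tokens=64,
    )
)
print(res.choices[0].message.content)
```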
