ROCm 5.5.0 needs to be installed beforehand, read this if you haven't done it.
Transformers inference with 13.6GB VRAM:
INFO:Loading 7B-hf...
INFO:Loaded the model in 6.34 seconds.
Output generated in 9.37 seconds (21.24 tokens/s, 199 tokens, context 27, seed 1005554483)
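For reference, this fp16 Transformers path boils down to roughly the following outside the webui. This is only a sketch: the model path matches the log above but is assumed to be a local directory, and the prompt is a placeholder.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "models/7B-hf"  # assumed local path, matching the log above
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,  # fp16 weights, roughly 13-14GB VRAM for a 7B model
    device_map="auto",          # place layers on the ROCm GPU (exposed via the CUDA API)
)

inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```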
8-bit BitsAndBytes inference with 7.9GB VRAM:
INFO:Loading 7B-hf...
INFO:Loaded the model in 6.65 seconds.
Output generated in 40.70 seconds (4.89 tokens/s, 199 tokens, context 6, seed 603994963)
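The 8-bit path is essentially the same load call with `load_in_8bit=True`, assuming a bitsandbytes build that actually works on ROCm; the path is again a placeholder.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "models/7B-hf"  # assumed local path
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    load_in_8bit=True,  # int8 weights via bitsandbytes, about 8GB VRAM for a 7B model
    device_map="auto",
)

inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```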
4-bit GPTQ inference with 4.5GB VRAM:
INFO:Loading Wizard-Vicuna-7B-Uncensored-GPTQ...
INFO:Found the following quantized model: models/Wizard-Vicuna-7B-Uncensored-GPTQ/Wizard-Vicuna-7B-Uncensored-GPTQ-4bit-128g.no-act-order.safetensors
INFO:Loaded the model in 2.10 seconds.
Output generated in 17.17 seconds (17.42 tokens/s, 299 tokens, context 52, seed 722332956)
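The webui loads this pre-quantized safetensors file through its GPTQ-for-LLaMa code path. As an illustration only (this is not the webui's code, and it assumes the AutoGPTQ library runs on this ROCm setup, which may not hold), a similar pre-quantized model could be loaded standalone like this:

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_dir = "models/Wizard-Vicuna-7B-Uncensored-GPTQ"  # directory from the log above
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoGPTQForCausalLM.from_quantized(
    model_dir,
    device="cuda:0",        # ROCm devices are exposed through the CUDA API
    use_safetensors=True,   # load the 4-bit .safetensors checkpoint
)

inputs = tokenizer("Hello, my name is", return_tensors="pt").to("cuda:0")
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```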
8-bit BitsAndBytes has low performance at the moment, so GPTQ is recommended.
4-bit BitsAndBytes inference and GPTQ quantization don't work for now, and llama.cpp is not guaranteed to work either.
The Dockerfile can be found here. For more info, please visit https://github.com/evshiron/rocm_lab.