Getting error with LoRA adapters for Qwen 2.5 0.5B Instruct #733

Open
iddogino opened this issue Jan 9, 2025 · 0 comments

System Info

Docker Command:

docker run --gpus all --shm-size 1g -p 80:80 -d -v /root/data:/data -e HUGGING_FACE_HUB_TOKEN='hf_###' -e MODEL_ID='${model_name}' -e TRUST_REMOTE_CODE='true' ghcr.io/predibase/lorax:main

Hardware:
AWS g6.xlarge

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.03              Driver Version: 560.35.03      CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA L4                      On  |   00000000:31:00.0 Off |                    0 |
| N/A   35C    P0             26W /   72W |   17449MiB /  23034MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      4284      C   /opt/conda/bin/python3.10                   17440MiB |
+-----------------------------------------------------------------------------------------+

OS: Amazon Linux 2023

NAME="Amazon Linux"
VERSION="2023"
ID="amzn"
ID_LIKE="fedora"
VERSION_ID="2023"
PLATFORM_ID="platform:al2023"
PRETTY_NAME="Amazon Linux 2023.6.20241111"
ANSI_COLOR="0;33"
CPE_NAME="cpe:2.3:o:amazon:amazon_linux:2023"
HOME_URL="https://aws.amazon.com/linux/amazon-linux-2023/"
DOCUMENTATION_URL="https://docs.aws.amazon.com/linux/"
SUPPORT_URL="https://aws.amazon.com/premiumsupport/"
BUG_REPORT_URL="https://github.com/amazonlinux/amazon-linux-2023"
VENDOR_NAME="AWS"
VENDOR_URL="https://aws.amazon.com/"
SUPPORT_END="2028-03-15"

Model Used:

{
  "model_id": "Qwen/Qwen2.5-0.5B-Instruct",
  "model_sha": "7ae557604adf67be50417f59c2c2f167def9a775",
  "model_dtype": "torch.bfloat16",
  "model_device_type": "cuda",
  "model_pipeline_tag": "text-generation",
  "max_concurrent_requests": 128,
  "max_best_of": 2,
  "max_stop_sequences": 4,
  "max_input_length": 4095,
  "max_total_tokens": 4096,
  "waiting_served_ratio": 1.2,
  "max_batch_total_tokens": 1327744,
  "max_waiting_tokens": 20,
  "validation_workers": 2,
  "eager_prefill": false,
  "version": "0.1.0",
  "sha": null,
  "docker_label": null,
  "request_logger_url": null
}
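
For reference, the block above appears to be the server's startup info report. A minimal sketch of pulling it with Python, assuming the container is published on port 80 (as in the docker command above) and exposes the /info route:

import requests  # third-party HTTP client, assumed available

# Fetch the server's info report from the container published on port 80
# (as in the docker command above).
resp = requests.get("http://localhost:80/info", timeout=30)
resp.raise_for_status()
print(resp.json())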

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

Running the model directly (no adapters) works fine. However, when I use any adapter (my own, as well as several public adapters found on Hugging Face, listed below), I get the following error:

Request failed during generation: Server error: No suitable kernel. h_in=896 h_out=64 dtype=BFloat16
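
For reference, a non-streaming request of roughly the following shape triggers the error. This is a minimal sketch that assumes LoRAX's /generate route accepts an adapter_id parameter; the adapter id shown is a placeholder:

import requests

# Placeholder adapter id; substitute any of the adapters mentioned below.
ADAPTER_ID = "some-user/qwen2.5-0.5b-lora"

payload = {
    "inputs": "What is the capital of France?",
    "parameters": {
        "adapter_id": ADAPTER_ID,  # selects the LoRA adapter for this request
        "max_new_tokens": 64,
    },
}
resp = requests.post("http://localhost:80/generate", json=payload, timeout=60)
# With an adapter set, this returns the "No suitable kernel" server error above.
print(resp.status_code, resp.text)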

When I use streaming, I can see that the first token is generated before the request fails:

data: {"id":"null","object":"chat.completion.chunk","created":0,"model":"null","choices":[{"index":0,"delta":{"role":"assistant","content":"An"},"finish_reason":null}]}

data: {"error":"Request failed during generation: Server error: No suitable kernel. h_in=896 h_out=64 dtype=BFloat16","error_type":"generation"}

Some adapters I tried:

Expected behavior

The model should generate properly with a LoRA adapter applied.
