device-side assert triggered when trying to use LLaMA 3.2 Vision with grammar #2729

SokolAnn commented Nov 6, 2024

System Info

Version:
text-generation-launcher 2.4.0

Environment:

Target: x86_64-unknown-linux-gnu
Cargo version: 1.80.1
Commit sha: 0a655a0ab5db15f08e45d8c535e263044b944190
Docker label: sha-0a655a0

Hardware: 4 x A100

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.78                 Driver Version: 550.78         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A100 80GB PCIe          Off |   00000000:17:00.0 Off |                    0 |
| N/A   42C    P0             69W /  300W |       1MiB /  81920MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA A100 80GB PCIe          Off |   00000000:65:00.0 Off |                    0 |
| N/A   43C    P0             71W /  300W |       1MiB /  81920MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA A100 80GB PCIe          Off |   00000000:CA:00.0 Off |                    0 |
| N/A   35C    P0             61W /  300W |       1MiB /  81920MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA A100 80GB PCIe          Off |   00000000:E3:00.0 Off |                    0 |
| N/A   34C    P0             64W /  300W |       1MiB /  81920MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

Deployment specifics: I am using Apptainer instead of Docker. I don't think this is the cause, since some inference queries complete correctly.

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

Create a SIF image of the suggested version of TGI:
apptainer pull hf_tgi.sif docker://"ghcr.io/huggingface/text-generation-inference:2.4.0"

Run the meta-llama/Llama-3.2-11B-Vision-Instruct model:
apptainer run --nv --env "HF_TOKEN=$$SECRET$$" --bind ./models:/data:rw hf_tgi.sif --model-id "meta-llama/Llama-3.2-11B-Vision-Instruct" --port 27685 --revision "cee5b78e6faed15d5f2e6d8a654fd5b247c0d5ca"

The model will download, and the web server will spin up.

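Before adding the grammar, a plain request can confirm the server is healthy; normal inference like this works fine on this setup (the prompt is the same one used below, and max_new_tokens is just an illustrative parameter):

curl localhost:27685/generate \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{
    "inputs": "I saw a puppy a cat and a raccoon during my bike ride in the park",
    "parameters": {
        "max_new_tokens": 50
    }
}'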
After this, try to use curl to call the model with a grammar:

curl localhost:27685/generate \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{
    "inputs": "I saw a puppy a cat and a raccoon during my bike ride in the park",
    "parameters": {
        "repetition_penalty": 1.3,
        "grammar": {
            "type": "json",
            "value": {
                "properties": {
                    "location": {
                        "type": "string"
                    },
                    "activity": {
                        "type": "string"
                    },
                    "animals_seen": {
                        "type": "integer",
                        "minimum": 1,
                        "maximum": 5
                    },
                    "animals": {
                        "type": "array",
                        "items": {
                            "type": "string"
                        }
                    }
                },
                "required": ["location", "activity", "animals_seen", "animals"]
            }
        }
    }
}'

TGI then fails with a series of device-side assert errors and exits, and curl returns:

{"error":"Request failed during generation: Server error: Unexpected <class 'RuntimeError'>: CUDA error: device-side assert triggered\nCUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.\nFor debugging consider passing CUDA_LAUNCH_BLOCKING=1\nCompile with TORCH_USE_CUDA_DSA to enable device-side assertions.\n","error_type":"generation"}
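As the error message itself suggests, passing CUDA_LAUNCH_BLOCKING=1 should make the reported stack trace point at the actual failing kernel. A sketch of the same Apptainer invocation with that variable added (untested on my side; --env can be repeated):

apptainer run --nv \
    --env "HF_TOKEN=$$SECRET$$" \
    --env "CUDA_LAUNCH_BLOCKING=1" \
    --bind ./models:/data:rw \
    hf_tgi.sif \
    --model-id "meta-llama/Llama-3.2-11B-Vision-Instruct" \
    --port 27685 \
    --revision "cee5b78e6faed15d5f2e6d8a654fd5b247c0d5ca"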

Please note that normal inference works both via curl and via the OpenAI-compatible API with the same model on the same machine, so the problem is somehow related to "grammar". Using tools via the OpenAI-compatible API leads to exactly the same error; a sketch of such a call follows below.
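For completeness, the tools request that hits the same assert is shaped roughly like this; the tool name record_sighting and its schema are made up for illustration, and "tgi" is the placeholder model string the OpenAI-compatible endpoint accepts:

curl localhost:27685/v1/chat/completions \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{
    "model": "tgi",
    "messages": [
        {"role": "user", "content": "I saw a puppy a cat and a raccoon during my bike ride in the park"}
    ],
    "tools": [{
        "type": "function",
        "function": {
            "name": "record_sighting",
            "description": "Record animals seen on a trip",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string"},
                    "animals": {"type": "array", "items": {"type": "string"}}
                },
                "required": ["location", "animals"]
            }
        }
    }],
    "tool_choice": "auto"
}'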

Expected behavior

The model should return JSON output constrained by the provided schema, as in the example in the documentation.
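For reference, a successful grammar-constrained response from /generate should have the usual generated_text shape, with the text itself being valid JSON matching the schema; the field values below are only a plausible example:

{
    "generated_text": "{ \"activity\": \"bike riding\", \"animals\": [\"puppy\", \"cat\", \"raccoon\"], \"animals_seen\": 3, \"location\": \"park\" }"
}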
