Improve qwen vl impl #2943

drbh · 2025-01-22T16:50:01Z

This PR improves qwen2-vl in the following ways:

- re-enables and improves the test (removes max_tokens)
- adds estimation logic for vision model FLOPs and consumes the vision config in the launcher
  - this improves the startup input token limit (avoids defaulting to 4096 for all VLMs)
- removes CUDA graphs <=3 for qwen-vl in the launcher
- adds an implementation of RotaryPositionEmbeddingMultimodalSections in rotary.py
- slices position ids on subsequent CUDA graph warmup calls to avoid slicing in rotary (attempting to remove this now)
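
To illustrate the multimodal-sections rotary embedding mentioned above, here is a minimal pure-Python sketch. It assumes the Qwen2-VL-style scheme in which every token carries three position ids (temporal, height, width) and the rotary frequencies are divided into contiguous sections, one per axis. The function name `mrope_angles`, the toy section sizes, and the list-based layout are assumptions for illustration only; this is not the code in rotary.py.

```python
def mrope_angles(position_ids, sections, base=10000.0):
    """Return rotary angles per token for sectioned multimodal positions.

    position_ids: three equal-length lists (temporal, height, width).
    sections: number of rotary frequencies owned by each axis;
              their sum is half of the head dimension.
    """
    half_dim = sum(sections)
    # Standard RoPE inverse frequencies: base^(-j / half_dim)
    inv_freq = [base ** (-j / half_dim) for j in range(half_dim)]
    # axis_of[j] tells which position stream drives frequency j
    axis_of = [axis for axis, n in enumerate(sections) for _ in range(n)]
    seq_len = len(position_ids[0])
    return [
        [position_ids[axis_of[j]][t] * inv_freq[j] for j in range(half_dim)]
        for t in range(seq_len)
    ]


# Text tokens use the same id on all three axes, so this reduces to plain RoPE.
text_angles = mrope_angles([[0, 1, 2]] * 3, sections=[1, 2, 1])

# An image token at (t=5, h=7, w=9) mixes the three streams by section.
image_angles = mrope_angles([[5], [7], [9]], sections=[1, 2, 1])
```

In this sketch the cosine and sine of each angle would then be applied to the query and key vectors as in ordinary rotary embeddings.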

drbh · 2025-01-22T21:11:59Z

This PR improves the performance and responses of Qwen2-VL-based models. Small reproducible examples can be run with the startup commands and the script below.

Expected output for each startup command:

text-generation-launcher --model-id bytedance-research/UI-TARS-7B-DPO

{
    "generated_text": "The image depicts the Statue of Liberty, a renowned landmark located on Liberty Island in New York Bay."
}
{
    "generated_text": "The image features the logo of Flash Attention, a state-of-the-art attention mechanism designed for transformers,"
}
{
    "generated_text": "The image features a stylized illustration of a rabbit, rendered in a minimalist and abstract design. The"
}

text-generation-launcher --model-id Qwen/Qwen2-VL-2B-Instruct

{
    "generated_text": "The image depicts the iconic Statue of Liberty in New York City, with the city's skyline in the"
}
{
    "generated_text": "The image compares two different implementations of attention mechanisms in neural networks:\n\n### Standard Attention Implementation"
}
{
    "generated_text": "The image depicts a rabbit in an astronaut's suit standing on a rocky, red-brown planet with"
}

text-generation-launcher --model-id Qwen/Qwen2-VL-7B-Instruct --num-shard 2

{
    "generated_text": "The image depicts the iconic Statue of Liberty, a colossal neoclassical sculpture on Liberty Island in"
}
{
    "generated_text": "The image compares the standard attention implementation with Flash Attention in the context of memory and computation operations.\n\n###"
}
{
    "generated_text": "The image depicts an astronaut in a futuristic space suit standing on a rocky surface with a reddish-orange"
}

Script for testing a few images against the models:

import json

import requests

# Endpoint of the locally running text-generation-launcher
url = "http://127.0.0.1:3000/generate"

headers = {"Content-Type": "application/json"}

image_urls = [
    "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg",
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/tgi/flash-attn.png",
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/tgi/rabbit.png",
]

query = "Describe the image"

for image in image_urls:
    # The image is embedded as a markdown image link inside the user turn
    payload = {
        "inputs": f"<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\n![]({image}){query}<|im_end|>\n<|im_start|>assistant\n",
        "parameters": {"max_new_tokens": 20},
    }

    response = requests.post(url, headers=headers, json=payload, timeout=120)
    # Fail loudly on HTTP errors instead of printing an error body
    response.raise_for_status()

    # Pretty-print the JSON response
    print(json.dumps(response.json(), indent=4))
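
The prompt format used by the script can be isolated into a small helper, which makes it possible to check the payload without a running server. This is a sketch only: `build_payload` is a hypothetical name, not part of the PR or of the server API, and it simply reproduces the string the script above already sends.

```python
def build_payload(image_url: str, query: str, max_new_tokens: int = 20) -> dict:
    """Build a /generate request body with one image and one user query."""
    prompt = (
        "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
        # The image is passed as a markdown image link, followed by the query
        f"<|im_start|>user\n![]({image_url}){query}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )
    return {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}


payload = build_payload("http://example.com/cat.png", "Describe the image")
```

The resulting dictionary can be passed as the `json` argument of `requests.post`, as in the script.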

danieldk · 2025-01-23T09:34:52Z

server/text_generation_server/models/__init__.py

@@ -1248,7 +1248,7 @@ def get_model(
            revision=revision,
            quantize=quantize,
            speculator=speculator,
-            dtype=dtype,
+            dtype=torch.bfloat16,


Should be passed through with default_dtype I think?
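
One possible reading of this comment is that the caller's `dtype` should still be honored, with bfloat16 used only as a fallback. A minimal sketch of that idea follows; `resolve_dtype` is a hypothetical helper, and plain strings stand in for torch dtypes so the example runs without torch. It does not reproduce how `get_model` actually handles `default_dtype`.

```python
from typing import Optional


def resolve_dtype(requested: Optional[str], default_dtype: str = "bfloat16") -> str:
    """Prefer the explicitly requested dtype; fall back to the model default."""
    # Strings stand in for torch dtypes (assumption for this sketch)
    return default_dtype if requested is None else requested


fallback = resolve_dtype(None)        # no dtype given, use the default
explicit = resolve_dtype("float16")   # user choice is kept
```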

drbh added 5 commits January 21, 2025 22:31

feat: refactor model, improve startup and re enable tests

09ff966

fix: improve multimodal rotary embed caching

6ca2c60

fix: limit vision flop calc to qwen2 vl models and update config typing

9523944

fix: include clippy lint

0657ed7

feat: refactor position ids in warmup and bump tests

bd56cae

danieldk reviewed Jan 23, 2025
