Improve qwen vl impl #2943

Draft: wants to merge 5 commits into main

drbh (Collaborator) commented Jan 22, 2025

This PR improves qwen2-vl in the following ways:

  • re-enables and improves the test (removes max_tokens)
  • adds estimation logic for vision model FLOPs and consumes the vision config in the launcher
    • this improves the startup input token limit (avoids defaulting to 4096 for all VLMs)
  • removes CUDA graphs <=3 for qwen-vl in the launcher
  • adds an implementation of RotaryPositionEmbeddingMultimodalSections in rotary.py
  • slices position ids on subsequent CUDA graph warmup calls to avoid slicing in rotary (attempting to remove this now)
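For readers unfamiliar with multimodal rotary sections, here is a minimal sketch of the general idea, not the code in this PR. It assumes each token carries three position ids (temporal, height, width) and that the rotary frequencies are split into consecutive sections, one per axis. The function name `mrope_angles` and the section sizes in the usage lines are illustrative assumptions.

```python
def mrope_angles(position, sections, base=10000.0):
    """Return one rotation angle per rotary frequency for a single token.

    position: (temporal, height, width) position ids for the token.
    sections: number of consecutive frequencies owned by each axis; the
              sum is half of the rotary dimension (assumed layout).
    """
    half_dim = sum(sections)
    angles = []
    freq_index = 0
    for axis, width in enumerate(sections):
        for _ in range(width):
            # Standard RoPE inverse frequency: base^(-2i / dim), with dim = 2 * half_dim.
            inv_freq = base ** (-freq_index / half_dim)
            # The only multimodal twist: the position id depends on the section.
            angles.append(position[axis] * inv_freq)
            freq_index += 1
    return angles


# A text token uses the same id on all three axes, so it reduces to plain RoPE.
text_token = mrope_angles((7, 7, 7), [16, 24, 24])
# An image patch can have different temporal, height, and width ids.
image_patch = mrope_angles((3, 0, 5), [16, 24, 24])
```

Under these assumptions, text-only inputs behave exactly like ordinary rotary embeddings, which is why the sections only matter once image or video tokens are present.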

drbh (Collaborator, Author) commented Jan 22, 2025

This PR improves the performance and responses of qwen2-vl based models. Small reproducible examples can be run with the startup commands and the script below.

Expected output for each startup command:

text-generation-launcher --model-id bytedance-research/UI-TARS-7B-DPO
{
    "generated_text": "The image depicts the Statue of Liberty, a renowned landmark located on Liberty Island in New York Bay."
}
{
    "generated_text": "The image features the logo of Flash Attention, a state-of-the-art attention mechanism designed for transformers,"
}
{
    "generated_text": "The image features a stylized illustration of a rabbit, rendered in a minimalist and abstract design. The"
}
text-generation-launcher --model-id Qwen/Qwen2-VL-2B-Instruct
{
    "generated_text": "The image depicts the iconic Statue of Liberty in New York City, with the city's skyline in the"
}
{
    "generated_text": "The image compares two different implementations of attention mechanisms in neural networks:\n\n### Standard Attention Implementation"
}
{
    "generated_text": "The image depicts a rabbit in an astronaut's suit standing on a rocky, red-brown planet with"
}
text-generation-launcher --model-id Qwen/Qwen2-VL-7B-Instruct --num-shard 2
{
    "generated_text": "The image depicts the iconic Statue of Liberty, a colossal neoclassical sculpture on Liberty Island in"
}
{
    "generated_text": "The image compares the standard attention implementation with Flash Attention in the context of memory and computation operations.\n\n###"
}
{
    "generated_text": "The image depicts an astronaut in a futuristic space suit standing on a rocky surface with a reddish-orange"
}

Script for testing a few images against the models:

import requests
import json

url = "http://127.0.0.1:3000/generate"

headers = {"Content-Type": "application/json"}

image_urls = [
    "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg",
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/tgi/flash-attn.png",
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/tgi/rabbit.png",
]

for image in image_urls:
    query = "Describe the image"

    payload = {
        "inputs": f"<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\n![]({image}){query}<|im_end|>\n<|im_start|>assistant\n",
        "parameters": {"max_new_tokens": 20},
    }

    response = requests.post(url, headers=headers, json=payload)

    # print the response
    print(json.dumps(response.json(), indent=4))
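The prompt in the script packs a ChatML-style template and a markdown image link into a single f-string. As a small sketch of the same payload, the construction can be pulled into helpers so the template is easier to read and reuse. The names `build_prompt` and `build_payload` are hypothetical and are not part of TGI or this PR.

```python
def build_prompt(image_url, query, system="You are a helpful assistant."):
    """Build the same ChatML-style prompt as the script, with the image as a markdown link."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n![]({image_url}){query}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )


def build_payload(image_url, query, max_new_tokens=20):
    """Wrap the prompt in the JSON body the script posts to /generate."""
    return {
        "inputs": build_prompt(image_url, query),
        "parameters": {"max_new_tokens": max_new_tokens},
    }
```

With these helpers, the loop body would reduce to `payload = build_payload(image, "Describe the image")` followed by the same `requests.post` call.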

@@ -1248,7 +1248,7 @@ def get_model(
 revision=revision,
 quantize=quantize,
 speculator=speculator,
-dtype=dtype,
+dtype=torch.bfloat16,
Member

Should be passed through with default_dtype I think?
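One way to read this suggestion, sketched under assumptions: keep forwarding the caller's dtype and fall back to a per-model default only when none was requested, instead of hardcoding the value at the call site. The helper `resolve_dtype` is hypothetical, and plain strings stand in for torch dtypes so the sketch stays self-contained. It is not TGI's actual code.

```python
def resolve_dtype(dtype=None, default_dtype="bfloat16"):
    """Prefer an explicitly requested dtype; otherwise use the model's default.

    Strings stand in for torch dtypes here (assumption for illustration).
    """
    return default_dtype if dtype is None else dtype


# No dtype requested: the model default applies.
unset = resolve_dtype()
# An explicit request is passed through unchanged.
explicit = resolve_dtype("float16")
```

The difference from the hardcoded line in the diff is that an explicit user choice is still respected.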
