Improve qwen vl impl #2943
Draft: drbh wants to merge 5 commits into main from improve-qwen-vl-impl
This PR improves the performance and responses of Qwen2-VL based models. Small reproducible examples can be run with the startup commands and script below.

Expected output

text-generation-launcher --model-id bytedance-research/UI-TARS-7B-DPO

{
"generated_text": "The image depicts the Statue of Liberty, a renowned landmark located on Liberty Island in New York Bay."
}
{
"generated_text": "The image features the logo of Flash Attention, a state-of-the-art attention mechanism designed for transformers,"
}
{
"generated_text": "The image features a stylized illustration of a rabbit, rendered in a minimalist and abstract design. The"
}

text-generation-launcher --model-id Qwen/Qwen2-VL-2B-Instruct

{
"generated_text": "The image depicts the iconic Statue of Liberty in New York City, with the city's skyline in the"
}
{
"generated_text": "The image compares two different implementations of attention mechanisms in neural networks:\n\n### Standard Attention Implementation"
}
{
"generated_text": "The image depicts a rabbit in an astronaut's suit standing on a rocky, red-brown planet with"
}

text-generation-launcher --model-id Qwen/Qwen2-VL-7B-Instruct --num-shard 2

{
"generated_text": "The image depicts the iconic Statue of Liberty, a colossal neoclassical sculpture on Liberty Island in"
}
{
"generated_text": "The image compares the standard attention implementation with Flash Attention in the context of memory and computation operations.\n\n###"
}
{
"generated_text": "The image depicts an astronaut in a futuristic space suit standing on a rocky surface with a reddish-orange"
}

Script for testing a couple of images with the models:

import requests
import json

url = "http://127.0.0.1:3000/generate"
headers = {"Content-Type": "application/json"}
image_urls = [
    "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg",
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/tgi/flash-attn.png",
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/tgi/rabbit.png",
]

for image in image_urls:
    query = "Describe the image"
    payload = {
        "inputs": f"<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\n![]({image}){query}<|im_end|>\n<|im_start|>assistant\n",
        "parameters": {"max_new_tokens": 20},
    }
    response = requests.post(url, headers=headers, json=payload)
    # print the response
    print(json.dumps(response.json(), indent=4))
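The prompt in the script above wraps a markdown image reference and the query in ChatML-style markers. As a minimal sketch, the payload construction could be factored into small helpers so it can be checked without a running server. The names `build_prompt` and `build_payload` are hypothetical and not part of this PR or of text-generation-inference.

```python
def build_prompt(image_url: str, query: str,
                 system: str = "You are a helpful assistant.") -> str:
    """Build the same prompt string as the test script (hypothetical helper)."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n![]({image_url}){query}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


def build_payload(image_url: str, query: str, max_new_tokens: int = 20) -> dict:
    """Assemble the JSON body posted to /generate in the test script."""
    return {
        "inputs": build_prompt(image_url, query),
        "parameters": {"max_new_tokens": max_new_tokens},
    }
```

The returned dictionary can be passed as `json=` to `requests.post`, exactly as the script does with its inline `payload`.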
danieldk reviewed on Jan 23, 2025
@@ -1248,7 +1248,7 @@ def get_model(
     revision=revision,
     quantize=quantize,
     speculator=speculator,
-    dtype=dtype,
+    dtype=torch.bfloat16,
Should be passed through with default_dtype, I think?
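One way to read this suggestion: instead of hard-coding the dtype, keep forwarding the caller's `dtype` and supply bfloat16 only as a fallback. The sketch below illustrates that pattern under stated assumptions. `resolve_dtype` is a hypothetical name, plain strings stand in for torch dtypes so the snippet runs without PyTorch, and it is not the actual code in `get_model`.

```python
from typing import Optional


def resolve_dtype(dtype: Optional[str], default_dtype: str) -> str:
    """Prefer the caller's dtype; fall back to the model's default (hypothetical)."""
    return default_dtype if dtype is None else dtype


# No explicit dtype requested: the model default applies.
print(resolve_dtype(None, "bfloat16"))       # bfloat16
# An explicit dtype is respected rather than overwritten.
print(resolve_dtype("float16", "bfloat16"))  # float16
```

With this shape, a user-supplied dtype still takes effect, which the hard-coded `dtype=torch.bfloat16` in the diff would prevent.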
This PR improves qwen2-vl in the following ways:
… max_tokens …
… RotaryPositionEmbeddingMultimodalSections in rotary.py