support Qwen2-VL #2183

Open
junwenZhang opened this issue Sep 3, 2024 · 28 comments
Labels: feature request (New feature or request), new model, triaged (Issue has been triaged by maintainers)

Comments

@junwenZhang

System Info

Qwen2-VL adds a new feature, M-RoPE (multimodal rotary position embedding); please support it.
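
For context, a minimal sketch of the M-RoPE idea (not TensorRT-LLM or Qwen code; the function name and the section split are illustrative assumptions): rotary embeddings are computed from three position ids (temporal, height, width), with the rotary frequency dimensions split into per-axis sections.

    import torch

    def mrope_cos_sin(pos_ids_3d, head_dim=128, base=10000.0, sections=(16, 24, 24)):
        # pos_ids_3d: [3, seq_len] holding (temporal, height, width) positions.
        # sections are per-axis slices of the head_dim // 2 rotary frequencies.
        inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
        # per-axis angles: [3, seq_len, head_dim // 2]
        freqs = torch.einsum("as,d->asd", pos_ids_3d.float(), inv_freq)
        # take each frequency slice from the axis it is assigned to
        chunks, start = [], 0
        for axis, size in enumerate(sections):
            chunks.append(freqs[axis, :, start:start + size])
            start += size
        freqs = torch.cat(chunks, dim=-1)        # [seq_len, head_dim // 2]
        emb = torch.cat([freqs, freqs], dim=-1)  # [seq_len, head_dim]
        return emb.cos(), emb.sin()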

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

qwen2-vl open source model

Expected behavior

TensorRT-LLM supports Qwen2-VL.

actual behavior

TensorRT-LLM does not support it.

additional notes

no

junwenZhang added the bug (Something isn't working) label on Sep 3, 2024
@sunnyqgg
Collaborator

sunnyqgg commented Sep 4, 2024

Hi,
I'll do it.

lfr-0531 added the triaged (Issue has been triaged by maintainers), feature request (New feature or request), and new model labels and removed the bug (Something isn't working) label on Sep 7, 2024
@scdotbox

Where can I find the PR for Qwen2-VL support in TensorRT-LLM?

@zhaocc1106

Any updates?

@sunnyqgg
Collaborator

Hi, the work is in progress, I'll update it ASAP.

@Chenhaolin6

Any updates?

@junwenZhang
Author

Any updates?

@GuangyanZhang

Any updates?

@chenqy4933

@sunnyqgg Is there a clear timeline for completing the model? Thanks.

@pianogGG

Hi all,
This is expected to be merged into main in early November.

@fan-niu

fan-niu commented Nov 11, 2024

@pianogGG @sunnyqgg Hi, is this update available? Or is there any branch we can use first? Thanks

@sunnyqgg
Collaborator

sunnyqgg commented Nov 12, 2024

Hi, the code is under review and almost done, it'll be public soon.

@Hukongtao

Hi, the code is under review and almost done, it'll be public soon.

Hi, is there any update yet?

@linccnu

linccnu commented Nov 19, 2024

any updates?

@LugerW-A

How is the progress?

@sunnyqgg
Collaborator

It's supported, pls see examples/multimodal for more info.

@peki12345

It's supported, pls see examples/multimodal for more info.

Hi, Qwen2-VL runs successfully, but compared to running it directly with transformers, there is no significant improvement in latency or GPU memory usage. Is this expected?

@LugerW-A

I've encountered the same situation. For the Qwen2-VL 2B model, TRT_LLM is more than twice as slow as vllm.

@sunnyqgg
Collaborator

Hi @LugerW-A

  • For the Qwen2-VL 2B model, TRT_LLM is more than twice as slow as vllm.
    I have noticed this issue and already fixed it; I hope the fix will be public next week.
    @peki12345 Regarding GPU memory usage ==> is that for the ViT part or the LLM part?

@fan-niu

fan-niu commented Nov 27, 2024

@sunnyqgg Thanks for your contribution! @kaiyux Hi, could you help publish these changes? Thanks a lot.

@sunnyqgg @kaiyux Hi, currently tensorrtllm_backend does not support the Qwen2-VL model. Is there a solution for this? Or can you tell us how to add support to tensorrtllm_backend? Thanks!

@alimgl-pixel

Any update this week, @sunnyqgg?

@sunnyqgg
Collaborator

Hi,
It's updated; please try the latest main code.

Thanks.

@fan-niu

fan-niu commented Dec 11, 2024

Hi, it's updated; please try the latest main code.

Thanks.

@sunnyqgg Do you mean tensorrtllm_backend already supports the Qwen2-VL model? Is that tensorrtllm_backend v0.15.0?

@sunnyqgg
Collaborator

Hi @fan-niu
Unfortunately, as far as I know, tensorrtllm_backend does not support Qwen2-VL, and I'm not sure if anyone is working on it.

Thanks.

@fan-niu

fan-niu commented Dec 12, 2024

@sunnyqgg
Because tensorrtllm_backend is largely closed-source, do you have any suggestions for how I could implement this feature in tensorrtllm_backend myself? Thanks!

@zhaocc1106

zhaocc1106 commented Dec 13, 2024

I found reduced accuracy and incorrect output when sending two pictures to Qwen2-VL-7B, but a single picture is fine.
I also found the performance is lower than vLLM.

message:

min_pixels = 4 * 28 * 28
max_pixels = 1024 * 1024 / 4
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "file:///tmp/tmp.FD4KtwMJkZ/data/panda.jpg",
                "min_pixels": min_pixels,
                "max_pixels": max_pixels,
            },
            {
                "type": "image",
                "image": "file:///tmp/tmp.FD4KtwMJkZ/data/cat.png",
                "min_pixels": min_pixels,
                "max_pixels": max_pixels,
            },
            # Prompt: "Describe the differences between the two images."
            {"type": "text", "text": "描述一下两张图片的不同。"},
        ],
    }
]
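
For reference, a minimal sketch (my assumption of the setup, not code taken from this thread) of how the messages above are run through Hugging Face transformers to produce the hf output below; it assumes the qwen_vl_utils package is installed:

    from transformers import AutoProcessor, Qwen2VLForConditionalGeneration
    from qwen_vl_utils import process_vision_info

    model = Qwen2VLForConditionalGeneration.from_pretrained(
        "Qwen/Qwen2-VL-7B-Instruct", torch_dtype="auto", device_map="auto")
    processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-7B-Instruct")

    # Render the chat template and collect the two image inputs.
    text = processor.apply_chat_template(messages, tokenize=False,
                                         add_generation_prompt=True)
    image_inputs, video_inputs = process_vision_info(messages)
    inputs = processor(text=[text], images=image_inputs, videos=video_inputs,
                       padding=True, return_tensors="pt").to(model.device)

    # Generate and strip the prompt tokens before decoding.
    generated = model.generate(**inputs, max_new_tokens=256)
    trimmed = [out[len(inp):] for inp, out in zip(inputs.input_ids, generated)]
    print(processor.batch_decode(trimmed, skip_special_tokens=True)[0])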

hf output:

The difference between these two images is that they show different kinds of animals. The first image shows a red panda, while the second shows a cat. In addition, the backgrounds differ: the background of the first image is a tree, while the background of the second is a gray concrete floor.

trtllm output:

These two images look exactly the same: both show a reddish animal resting its head on a wooden board, with tree trunks and leaves in the background.

@zhaocc1106

zhaocc1106 commented Dec 17, 2024

I found reduced accuracy and incorrect output when sending two pictures to Qwen2-VL-7B, but a single picture is fine. I also found the performance is lower than vLLM.

The VisionAttentionOpt code can be changed as follows; then multi-picture requests work correctly:
https://github.com/NetEase-Media/grps_trtllm/blob/b7bde55c177314621311aed8bc060c6deb9a0ed5/tools/qwen2vl/build_vit_engine.py#L222 - 248

But for now, I found the performance is lower than vLLM with many concurrent requests; at 1~2 concurrency it is better than vLLM.

@sunnyqgg
Collaborator

Hi, please use the latest code, which went public today. For multi-batch accuracy, please change the attention_mask_vit in tensorrt_llm/runtime/multimodal_model_runner.py:

    # Block-diagonal attention mask over the concatenated image patches:
    # start from -inf (float16 min) everywhere, then zero out each image's
    # own span (given by cu_seqlens) so patches never attend across images.
    attention_mask_vit = torch.full([1, seq_length, seq_length],
                                    torch.finfo(torch.float16).min,
                                    device=image.device,
                                    dtype=image.dtype)
    for i in range(1, len(cu_seqlens)):
        attention_mask_vit[..., cu_seqlens[i - 1]:cu_seqlens[i],
                           cu_seqlens[i - 1]:cu_seqlens[i]] = 0

Please let me know if there are any other issues.

Thanks.

@zhaocc1106

zhaocc1106 commented Dec 17, 2024

Hi, please use the latest code, which went public today. For multi-batch accuracy, please change the attention_mask_vit in tensorrt_llm/runtime/multimodal_model_runner.py.

I think this is indeed the cause of the bug. While debugging, I found that qwen2vl uses VisionSdpaAttention by default. I used it instead of VisionAttention and found that this also fixes the issue.
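
For reference, a minimal sketch (an assumption about the export setup, not code from this thread) of forcing the SDPA vision attention when loading the Hugging Face model before exporting the ViT:

    from transformers import Qwen2VLForConditionalGeneration

    # attn_implementation="sdpa" selects VisionSdpaAttention for the vision
    # tower instead of the eager VisionAttention; the reporter found the
    # SDPA path handles multi-image requests correctly, while the exported
    # eager-attention variant did not.
    model = Qwen2VLForConditionalGeneration.from_pretrained(
        "Qwen/Qwen2-VL-7B-Instruct",
        torch_dtype="auto",
        attn_implementation="sdpa",
    )
    vision_tower = model.visual  # the module to export to ONNX / TensorRT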
