support Qwen2-VL #2183
Comments
Hi, where can I find the PR files for Qwen2-VL used by TensorRT-LLM?
Any updates?
Hi, the work is in progress; I'll update it ASAP.
Any updates?
Any updates?
Any updates?
@sunnyqgg Is there a clear timeline for completing the model? Thanks.
Hi all, the code is under review and almost done; it'll be public soon.
Hi, is there any update yet?
Any updates?
How is the progress?
It's supported; please see examples/multimodal for more info.
Hi, Qwen2-VL can run successfully, but compared to directly importing transformers, there is no significant improvement in latency or GPU memory usage. Is this expected?
I've encountered the same situation. For the Qwen2-VL 2B model, TRT-LLM is more than twice as slow as vLLM.
Hi @LugerW-A
@sunnyqgg thanks for your contribution!
@kaiyux Hi, could you help make these changes public? Thanks a lot.
@sunnyqgg @kaiyux Hi, currently tensorrtllm_backend does not support the Qwen2-VL model. Is there a solution for this? Or can you tell us how to add support to tensorrtllm_backend? Thanks!
Any updates this week, @sunnyqgg?
Hi, Thanks.
@sunnyqgg Do you mean tensorrtllm_backend already supports the Qwen2-VL model? Is that tensorrtllm_backend v0.15.0?
Hi @fan-niu, Thanks.
@sunnyqgg I found reduced accuracy and output errors with two images on Qwen2-VL-7B, but a single image is fine. message:
hf output:
trtllm output:
The VisionAttentionOpt code can be changed as follows; then multi-image requests work correctly. However, I found that performance is lower than vLLM under many concurrent requests, while at 1~2 concurrency it is better than vLLM.
Hi, please use the latest code, which became public today. For multi-batch accuracy, please change the attention_mask_vit in tensorrt_llm/runtime/multimodal_model_runner.py.
Please let me know if there are any other issues. Thanks.
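For readers hitting the same multi-image accuracy problem, here is a minimal sketch of what a per-image (block-diagonal) vision attention mask looks like; it mirrors the cu_seqlens-based masking in the Hugging Face Qwen2-VL vision tower. The function name and shapes below are illustrative assumptions, not the exact attention_mask_vit patch.

```python
import torch

# Sketch (assumed fix, not the exact patch): build a block-diagonal mask
# from cumulative per-image patch counts (cu_seqlens), so patches of one
# image never attend to patches of another image in the same batch.
def build_vit_attention_mask(cu_seqlens, seq_length):
    mask = torch.zeros(1, seq_length, seq_length, dtype=torch.bool)
    for i in range(1, len(cu_seqlens)):
        s, e = cu_seqlens[i - 1], cu_seqlens[i]
        mask[..., s:e, s:e] = True  # each image attends only within itself
    return mask

# Two images with 4 and 6 patches -> two diagonal True blocks.
print(build_vit_attention_mask(torch.tensor([0, 4, 10]), 10).int())
```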
I think this is indeed the cause of the bug. By debugging, I found that Qwen2-VL uses VisionSdpaAttention by default. Using it instead of VisionAttention also fixes the issue.
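For reference, one way to make the Hugging Face model use VisionSdpaAttention is to request the SDPA implementation at load time. This is a sketch against the public transformers API; the checkpoint name is just an example, and where the swap happens in the TRT-LLM export path is my assumption.

```python
from transformers import Qwen2VLForConditionalGeneration

# Requesting the SDPA attention implementation selects VisionSdpaAttention
# in the vision tower (it is also the default in recent transformers).
model = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-7B-Instruct",  # example checkpoint
    attn_implementation="sdpa",
    torch_dtype="auto",
)
```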
System Info
Qwen2-VL adds a new M-RoPE (Multimodal Rotary Position Embedding) feature; please support it.
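For context, M-RoPE drives different groups of rotary channels with separate temporal, height, and width position ids. Below is a minimal sketch of the idea; the mrope_section split is an assumption based on the released Qwen2-VL configs, not TensorRT-LLM code.

```python
import torch

def mrope_cos_sin(position_ids, head_dim=128, mrope_section=(16, 24, 24),
                  theta=1_000_000.0):
    # position_ids: (3, seq_len) -- one row each for time, height, width.
    inv_freq = 1.0 / theta ** (torch.arange(0, head_dim, 2).float() / head_dim)
    freqs = position_ids.float()[..., None] * inv_freq      # (3, seq, dim/2)
    # Each channel group takes frequencies from its own axis, then concat.
    chunks = torch.split(freqs, list(mrope_section), dim=-1)
    merged = torch.cat([c[i] for i, c in enumerate(chunks)], dim=-1)
    emb = torch.cat((merged, merged), dim=-1)                # (seq, head_dim)
    return emb.cos(), emb.sin()

# For text-only tokens all three axes share the same position ids.
cos, sin = mrope_cos_sin(torch.arange(8).repeat(3, 1))
print(cos.shape)  # torch.Size([8, 128])
```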
Who can help?
No response
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
Reproduction
qwen2-vl open source model
Expected behavior
TensorRT-LLM supports Qwen2-VL.
actual behavior
TensorRT-LLM does not support Qwen2-VL.
additional notes
no