INTERNVL2: how do you run joint inference with the vision part and the language module? #2
Comments
Take a look at README.md, which covers the InternVL2-related documentation.
Here is my situation: I accelerated the ViT model and the LLM separately, but the LLM takes an input_embedde image input. How did the author solve this?
trtllm supports a ptuning prompt table as input. For example, see this C++ code: https://github.com/NetEase-Media/grps_trtllm/blob/b473c258b85e598a852871f40ba86138d597e830/src/utils.cc#L604C8-L604C23 . Python has a corresponding API as well; see https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/multimodal . Also, if you have only fine-tuned the parameters, you can try running your model with this project.
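To make the prompt-table idea concrete, here is a minimal pure-Python sketch of the mechanism as I understand it: image positions in `input_ids` are filled with virtual token IDs starting at `vocab_size`, and any ID at or above `vocab_size` is looked up in the prompt table (the ViT output) instead of the word-embedding table. The helper names `build_input_ids` and `embed` and the toy sizes are hypothetical and only for illustration; this is not the TensorRT-LLM API or the code used in this project.

```python
def build_input_ids(text_ids_before, num_image_tokens, text_ids_after, vocab_size):
    """Fill the image slot with virtual IDs vocab_size, vocab_size + 1, ...

    Hypothetical helper: real token IDs are < vocab_size, so these
    out-of-vocabulary IDs can only refer to prompt-table rows.
    """
    virtual_ids = list(range(vocab_size, vocab_size + num_image_tokens))
    return text_ids_before + virtual_ids + text_ids_after


def embed(input_ids, embedding_table, prompt_table):
    """Look up each ID in the word-embedding table or in the prompt table."""
    vocab_size = len(embedding_table)
    rows = []
    for tok in input_ids:
        if tok < vocab_size:
            rows.append(embedding_table[tok])            # ordinary text token
        else:
            rows.append(prompt_table[tok - vocab_size])  # image feature row
    return rows


# Toy data (assumed sizes): 4-word vocabulary, hidden size 2, 3 image tokens.
vocab_size = 4
embedding_table = [[float(i), float(i)] for i in range(vocab_size)]
vit_features = [[10.0, 11.0], [12.0, 13.0], [14.0, 15.0]]  # stands in for ViT output

input_ids = build_input_ids([1, 2], len(vit_features), [3], vocab_size)
embeddings = embed(input_ids, embedding_table, vit_features)
```

With this layout the LLM engine still receives integer `input_ids`, and the image features travel separately as the prompt table, so no embedding tensor has to be passed in place of the token IDs.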
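I looked at how the InternVL2-4B source handles its inputs. On the first step its input_ids is empty, but hidden_states is a three-dimensional tensor, and the model generates the next input_idx. I looked at the phi3model you wrote, and it has no case where input_idx is empty on the first step.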
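I don't quite follow what you mean. For questions about how the implementation works, how about contacting me by email at [email protected]? I'll reply with my WeChat contact.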