Support vLLM inference for glm-4v
Same question here.
A recent PR adapted GLM-4v to vllm==0.6.2, and it should be merged soon. You can deploy the latest vllm and follow the inference examples in the PR 585 readme.
@sixsixcoder Does this also support inference with glm-4v-9b int4?
Not supported yet.
Feature request
Support vLLM inference for glm-4v.
Motivation
I want to deploy this on a server and have multiple clients send requests to a glm4v API server, but inference with Transformers returns results too slowly. Since glm4-chat runs noticeably faster with vllm inference, I hope 4v can also support vllm.
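The multi-client setup described above could be sketched as follows, assuming the server exposes an OpenAI-compatible `/v1/chat/completions` endpoint (as vLLM's API server does); the server URL, model name, and image bytes below are hypothetical placeholders:

```python
import base64


def build_glm4v_request(prompt: str, image_bytes: bytes, model: str = "glm-4v-9b"):
    """Build an OpenAI-compatible chat-completion payload with an inline image.

    The model name and the data-URL image format are assumptions; adjust them
    to match the actual server configuration.
    """
    image_b64 = base64.b64encode(image_bytes).decode("utf-8")
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"},
                    },
                ],
            }
        ],
        "stream": False,
    }


# Each client would POST this JSON to the (hypothetical) endpoint
# http://localhost:8000/v1/chat/completions; dummy bytes stand in for a real JPEG.
payload = build_glm4v_request("Describe this image.", b"\xff\xd8\xff")
```

Multiple clients can then issue these requests concurrently; vLLM's continuous batching is what makes this setup faster than looping over requests with Transformers.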
Your contribution