Can FastMoE be integrated into vLLM? #191
Comments
Good question. There has been no attempt at this so far.

FastMoE's single-GPU and multi-GPU parallel versions do not modify the kv-cache, so in theory it is orthogonal to paged attention, and the two can be used together.

Thanks, that is my understanding as well; I will go ahead and try the integration.
When defining the MoE, do we need to explicitly call something like `self.moe().cuda()` to put the FastMoE layer on the GPU?
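A minimal sketch of the usual PyTorch answer to this question: a FastMoE layer is an `nn.Module`, so registering it as a submodule and calling `.to(device)` (or `.cuda()`) on the parent model moves it along with everything else; no separate per-layer `.cuda()` call should be needed. The `MyMoEModel` class and the `nn.Linear` placeholder below are hypothetical stand-ins for brevity, not the fastmoe API; in real code the attribute would be something like an `fmoe.FMoE` instance.

```python
import torch
import torch.nn as nn


class MyMoEModel(nn.Module):
    """Hypothetical model; the placement rule shown is generic nn.Module behavior."""

    def __init__(self):
        super().__init__()
        # Placeholder for an MoE layer (in practice e.g. an fmoe.FMoE instance).
        # Because it is assigned as an attribute, it is registered as a submodule.
        self.moe = nn.Linear(16, 16)

    def forward(self, x):
        return self.moe(x)


# One .to(device) on the parent moves all registered submodules,
# including the MoE layer, so no explicit self.moe.cuda() is required.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = MyMoEModel().to(device)

y = model(torch.randn(2, 16, device=device))
```

The same applies to `.half()`, `.eval()`, and `model.parameters()`: as long as the MoE layer is assigned as a module attribute, it participates in all module-wide operations automatically.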