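Lines 87, 88, and 92 appear to concatenate up and gate first, and only then append the empty tensor at the end. Doesn't that scramble the split that happens later?
The overall flow is equivalent to [5504, 2048] -> [1376, 2048], i.e. the FFN dimension is cut into 4 shards.
Then gate and up are concatenated: [1376, 2048] -> [2752, 2048].
Then the freshly initialized empty tensor is appended: [2752, 2048] -> [2816, 2048].
The shape comes out right, but the internal layout becomes [(1376; 1376; 64), 2048]. When the tensor is split for the later computation, part of up is treated as gate, and the up half ends up holding both halves' worth of empty padding.
I'm not sure whether the derivation above is correct. Is there something I overlooked?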
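To make the concern concrete, here is a minimal NumPy sketch of the two layouts. It is not the repository's code: the function names `fuse_then_pad` and `pad_then_fuse` are hypothetical, zeros stand in for the empty tensor, the hidden size is shrunk from 2048 to 4, and it assumes the later step splits the fused tensor at its midpoint (row 1408).

```python
import numpy as np

FFN_SHARD = 1376   # 5504 / 4, rows per shard (from the issue)
PADDED = 2816      # padded row count of the fused gate+up tensor (from the issue)
HIDDEN = 4         # stand-in for 2048 to keep the sketch small (assumption)


def fuse_then_pad(gate, up, padded_rows):
    """Concatenate gate and up, then append all padding at the end."""
    fused = np.concatenate([gate, up], axis=0)
    pad = np.zeros((padded_rows - fused.shape[0], fused.shape[1]), dtype=fused.dtype)
    return np.concatenate([fused, pad], axis=0)


def pad_then_fuse(gate, up, padded_rows):
    """Pad gate and up separately to half the padded size, then concatenate."""
    half = padded_rows // 2

    def pad_to_half(x):
        pad = np.zeros((half - x.shape[0], x.shape[1]), dtype=x.dtype)
        return np.concatenate([x, pad], axis=0)

    return np.concatenate([pad_to_half(gate), pad_to_half(up)], axis=0)


# Mark rows by origin: 1.0 = gate, 2.0 = up, 0.0 = padding.
gate = np.full((FFN_SHARD, HIDDEN), 1.0)
up = np.full((FFN_SHARD, HIDDEN), 2.0)

a = fuse_then_pad(gate, up, PADDED)   # layout (1376; 1376; 64)
b = pad_then_fuse(gate, up, PADDED)   # layout (1376; 32; 1376; 32)

half = PADDED // 2                    # assumed midpoint split at row 1408
leaked_rows = int((a[:half, 0] == 2.0).sum())     # up rows landing in the gate half
pad_rows_in_up = int((a[half:, 0] == 0.0).sum())  # padding rows in the up half
clean_gate_half = bool((b[:half, 0] != 2.0).all())

print(leaked_rows, pad_rows_in_up, clean_gate_half)
```

Under these assumptions, the first layout puts 32 rows of up into the gate half and all 64 padding rows into the up half, while padding each half separately keeps them apart. Whether this matters in practice depends on how the downstream code actually splits the tensor.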