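Hello authors!
The paper says that the model is fine-tuned with shifted sparse attention, and that full attention can be used at inference time.
So at inference time, either shifted sparse attention or full attention can be used.
My question: for the experimental results reported in the paper, was inference run with shifted sparse attention or with full attention? Does the choice of attention at inference time make a difference to the results, and if so, how large is it? The paper does not seem to mention this.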
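To make the two options concrete, here is a minimal sketch of which keys each query can see under each pattern. It is an illustration only, not the authors' implementation: the function names, the toy sizes (8 tokens, groups of 4), and the choice to apply causality by original token position in the wrap-around group are all assumptions on my part.

```python
def full_mask(n):
    """Causal full attention: query i sees every key j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]


def group_mask(n, group, shift=0):
    """Causal attention restricted to groups of `group` tokens.

    `shift` rolls the group boundaries, which is what the shifted half of
    the heads does. Causality is applied on original positions here; this
    is a simplifying assumption for the wrap-around group.
    """
    def gid(pos):
        return ((pos - shift) % n) // group

    return [[j <= i and gid(i) == gid(j) for j in range(n)] for i in range(n)]


def show(mask):
    """Render a mask as text: 'x' = visible key, '.' = masked key."""
    return "\n".join("".join("x" if v else "." for v in row) for row in mask)


n, group = 8, 4                                    # hypothetical toy sizes
full = full_mask(n)                                # inference option: full attention
plain = group_mask(n, group)                       # unshifted heads
shifted = group_mask(n, group, shift=group // 2)   # shifted heads

for name, mask in [("full", full), ("plain", plain), ("shifted", shifted)]:
    print(name)
    print(show(mask))
    print()
```

In this sketch every query/key pair allowed by either grouped pattern is also allowed by full attention, so full attention at inference sees a superset of what was visible during fine-tuning. Whether that changes the reported numbers is exactly what I am asking about.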