Were the evaluation results in the paper obtained with shifted sparse attention or full attention at inference? #170

Open
zhangxiann opened this issue Jan 19, 2024 · 0 comments

zhangxiann commented Jan 19, 2024

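Hello authors!

The paper says that fine-tuning is done with shifted sparse attention, while full attention can be used at inference.

So at inference time, either shifted sparse attention or full attention can be used.

I would like to ask: for the experimental results reported in the paper, was shifted sparse attention or full attention used at inference? Does the choice between the two attention types make a difference in quality at inference time? If so, how large is the difference? (The paper does not seem to mention this.)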
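To make the two options in the question concrete, here is a small sketch of which keys each query can see under the two patterns. It is only an illustration and not the repository's implementation: the function names, the toy sequence length and group size, and the choice to clip the first and last groups instead of wrapping around the sequence ends are all assumptions made here.

```python
def full_causal_mask(n):
    """mask[i][j] is True when query i may attend to key j (plain causal attention)."""
    return [[j <= i for j in range(n)] for i in range(n)]


def grouped_causal_mask(n, group_size, shift=0):
    """Causal attention restricted to groups of `group_size` tokens.

    `shift` moves the group boundaries. In this sketch the first and last
    groups are simply clipped (an assumption; no wrap-around is modeled).
    """
    def group_of(i):
        return (i + shift) // group_size

    return [
        [j <= i and group_of(i) == group_of(j) for j in range(n)]
        for i in range(n)
    ]


def count_visible(mask):
    """Total number of (query, key) pairs that are allowed to interact."""
    return sum(sum(row) for row in mask)


n, g = 8, 4                                  # toy sizes, chosen for readability
full = full_causal_mask(n)
plain = grouped_causal_mask(n, g)            # groups [0..3], [4..7]
shifted = grouped_causal_mask(n, g, g // 2)  # groups [0..1], [2..5], [6..7]

print(count_visible(full), count_visible(plain), count_visible(shifted))
# prints: 36 20 16
```

In this toy setup, query 4 cannot see key 3 with unshifted groups (`plain[4][3]` is `False`) but can with the shifted boundaries (`shifted[4][3]` is `True`), which is how mixing the two patterns lets information cross group borders. Full attention allows every causal pair, so the question of whether the reported numbers used the sparse or the full pattern at inference is a real one.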
