For the evaluation results in the paper, does inference use shifted sparse attention or full attention? #170

Open

zhangxiann opened this issue Jan 19, 2024 · 0 comments

zhangxiann commented Jan 19, 2024

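Hello authors,

The paper says that fine-tuning is done with shifted sparse attention, and that full attention can be used at inference time.

So at inference time, either shifted sparse attention or full attention can be used.

My questions: for the experimental results in the paper, was inference run with shifted sparse attention or with full attention? Do the two attention types give different results at inference time? If so, how large is the difference? (The paper does not seem to mention this.)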
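
To make the question concrete, here is a minimal sketch of which keys a query can see under each mode. It is only my understanding, not the repository's implementation: it assumes the shift is equivalent to rolling the sequence by half a group before chunking it into groups, and it simplifies the causal mask to the original token order. `group_ids` and `visible_keys` are hypothetical helper names.

```python
def group_ids(seq_len, group_size, shifted):
    """Group index of each token position.

    Assumption: shifted heads roll the sequence by half a group before
    chunking, so position p lands in group ((p - offset) mod seq_len) // group_size.
    """
    offset = group_size // 2 if shifted else 0
    return [((p - offset) % seq_len) // group_size for p in range(seq_len)]


def visible_keys(query, seq_len, group_size=None, shifted=False):
    """Key positions a query position may attend to.

    group_size=None means full causal attention. Otherwise attention is
    restricted to keys in the same group (simplified causal mask: key <= query).
    """
    if group_size is None:
        return list(range(query + 1))
    groups = group_ids(seq_len, group_size, shifted)
    return [k for k in range(query + 1) if groups[k] == groups[query]]


if __name__ == "__main__":
    seq_len, group_size, query = 8, 4, 5
    print("full attention:   ", visible_keys(query, seq_len))
    print("unshifted heads:  ", visible_keys(query, seq_len, group_size, shifted=False))
    print("shifted heads:    ", visible_keys(query, seq_len, group_size, shifted=True))
```

With 8 tokens and a group size of 4, query position 5 sees every earlier token under full attention, but only part of the context under the grouped pattern, which is why I would expect the two inference modes to give different numbers.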