Questions Regarding "Sink Tokens" #65

Open
clarenceluo78 opened this issue Nov 9, 2023 · 0 comments

clarenceluo78 commented Nov 9, 2023

Hi! Thank you for your interesting paper and its implementation! I have a few questions I hope you can clarify:

  1. When employing the pre-trained model with a "sink token," is this token also prepended to the input during inference (see the sketch after this list)? If so, could you explain why Figure 7 presents visualizations with identical token lengths for the two models? If not, is the added trainable "sink token" identical or functionally equivalent to each model's BOS token (e.g. <s>), ensuring compatibility between inference and the training corpus?
  2. The ablation study on the number of initial tokens suggests that incorporating just one initial token still yields reasonable results for most models, except perhaps for Llama-2. Given this, and since four initial tokens appear to be optimal, have you experimented with training models using four additional "sink tokens" to match that setting?
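
To make question 1 concrete, here is a minimal sketch of what I mean by "prepending the sink token at inference." It assumes the sink token was added to the vocabulary as a dedicated learned embedding; the `<sink>` token string and the checkpoint name are placeholders of mine, not something from the paper or repo.

```python
# Hypothetical sketch: prepend a dedicated, trained "sink token" to the
# prompt at inference time, so it always sits at position 0 and can absorb
# the excess attention (analogous to how the BOS token behaves).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Assumes "<sink>" was added to the vocabulary and trained during pre-training.
sink_token_id = tokenizer.convert_tokens_to_ids("<sink>")

prompt_ids = tokenizer("Hello, world!", return_tensors="pt").input_ids
# Prepend the sink token id before generating.
input_ids = torch.cat([torch.tensor([[sink_token_id]]), prompt_ids], dim=1)

output = model.generate(input_ids, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

If instead the BOS token itself serves as the sink token, this extra step would be unnecessary, which is essentially what I am asking in the second half of question 1.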

Btw, my own research also touches on the role of initial tokens in LLMs, and I find your findings quite complementary to my experimental results. I would be delighted to discuss this further if you are interested, and good luck with your ICLR result :)
