
Results for Section 3.2 Rolling KV Cache (Without Pretraining) #61

Open
timljj opened this issue Nov 1, 2023 · 1 comment

Comments

timljj commented Nov 1, 2023

Hi,

Do you have any experimental results for attention sinks in the case without pre-training? From what I read, all the results shown in the paper come from models pre-trained with attention sinks.

Additionally, did you ever test smaller cache sizes, such as 128? If I understood correctly, the model should not break even with a smaller cache size.
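
For context on the smaller-cache question: the rolling KV cache in Section 3.2 keeps a few initial "attention sink" tokens plus a window of the most recent tokens, so a total budget of 128 would mean, for example, 4 sink entries + 124 recent entries. Below is a minimal, hypothetical sketch of that eviction step on Hugging Face-style legacy `past_key_values` tuples; it is not the repository's implementation, and it omits the cache-internal position remapping that StreamingLLM performs for RoPE models.

```python
# Hypothetical sketch of the start + recent eviction policy (not the repo's code).
# Assumes legacy tuple-format past_key_values: one (k, v) pair per layer with
# tensors shaped [batch, heads, seq_len, head_dim].
import torch

def evict_kv(past_key_values, start_size=4, recent_size=124):
    """Keep the first `start_size` (attention sink) entries and the last
    `recent_size` entries, for a total budget of start_size + recent_size."""
    trimmed = []
    for k, v in past_key_values:
        seq_len = k.size(2)
        if seq_len <= start_size + recent_size:
            trimmed.append((k, v))  # under budget: nothing to evict
            continue
        keep = lambda t: torch.cat(
            [t[:, :, :start_size], t[:, :, -recent_size:]], dim=2
        )
        trimmed.append((keep(k), keep(v)))
    return tuple(trimmed)
```

Note that the actual method recomputes rotary positions over cache slots rather than original token indices, so a naive trim like this will not reproduce the paper's numbers on its own.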

@timljj timljj changed the title Results Section 3.2 (Without Pretraining) Results for Section 3.2 (Without Pretraining) Nov 1, 2023
@timljj timljj changed the title Results for Section 3.2 (Without Pretraining) Results for Section 3.2 Rolling KV Cache (Without Pretraining) Nov 1, 2023
@Guangxuan-Xiao (Collaborator) commented:

We did not pre-train LLMs in most experiments; only Section 4.2 includes pre-training experiments. You can use StreamingLLM with off-the-shelf Llama models, just as in our demo.

Guangxuan
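
For anyone arriving here from search, this is a minimal usage sketch with an off-the-shelf Llama checkpoint. The import path, helper name (`enable_streaming_llm`), and parameter names (`start_size`, `recent_size`) are assumptions based on the repository's example script and may have changed, so check `examples/run_streaming_llama.py` for the current API.

```python
# Sketch only: the model name and start_size/recent_size values are illustrative,
# and the import path is an assumption based on the repo's example script.
from transformers import AutoModelForCausalLM, AutoTokenizer
from streaming_llm.enable_streaming_llm import enable_streaming_llm

model_name = "meta-llama/Llama-2-7b-chat-hf"  # any off-the-shelf Llama checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

# 4 attention-sink tokens + 124 recent tokens = a total cache budget of 128,
# matching the smaller-cache setting asked about above.
kv_cache = enable_streaming_llm(model, start_size=4, recent_size=124)
```

The returned cache object is then applied to `past_key_values` during token-by-token generation, as the demo script does in its streaming inference loop.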
