Several improvements related to KVCache #1870
- Shrink buffers returned by KVCache to just cover input_pos entries
- Refactor child classes of model.py classes to avoid copy and paste
I am interested in smart KV caches for memory-efficient inference.
Hello @mseeger Thanks for this and the other PR 👍 A couple of notes:
Hi @Andrei-Aksionov, let me know what I should change. Indeed, I could split this into 3:
It is just that 1. is really needed for the other two. I understand this single file argument. But maybe there should be a balance? In the end, you inherit code from

Another point: Why I am looking at LitGPT and not Hugging Face (a colleague pointed your project out to me) is because the structure is so much clearer. I can learn a lot about different models by immediately seeing what is common and what is different. In HF, this is next to impossible, because everything is copied for every new model.
Let's do #2 and #3 in two separate PRs for now.
Here I agree. It's easier to see the difference between a PEFT implementation and the base model if we reuse unchanged parts of the code from the base model. I would recommend focusing on #2 and #3 first.

> Another point: Why I am looking at LitGPT and not Hugging Face (a colleague pointed your project out to me) is because the structure is so much clearer. I can learn a lot about different models by immediately seeing what is common and what is different. In HF, this is next to impossible, because everything is copied for every new model.

So pleasant to hear that. Thanks 😊
Note that, depending on the attention implementation and how it handles masking, this may or may not have the impact you have in mind.
@t-vi I am subselecting the mask at the same time. But do you have anything else in mind?
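To illustrate the point about subselecting the mask together with the buffers, here is a dependency-free toy sketch (plain Python lists, a single head, a single query). It is not LitGPT code: `attention`, `MAX_SEQ_LEN`, `HEAD_SIZE`, and all values are hypothetical names and numbers chosen for illustration. It assumes that masked slots receive zero attention weight, so truncating keys, values, and mask to the first `input_pos + 1` entries should leave the output unchanged.

```python
import math


def attention(q, keys, values, mask):
    """Single-query softmax attention; slots with mask=False get weight 0."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [
        sum(qi * ki for qi, ki in zip(q, k)) * scale if keep else -math.inf
        for k, keep in zip(keys, mask)
    ]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]  # exp(-inf) == 0.0
    total = sum(weights)
    dim = len(values[0])
    return [
        sum(w * v[d] for w, v in zip(weights, values)) / total
        for d in range(dim)
    ]


MAX_SEQ_LEN = 8  # hypothetical cache capacity
HEAD_SIZE = 2    # hypothetical head size

# Pre-allocated, zero-filled buffers; only the first three slots are written.
k_buf = [[0.0] * HEAD_SIZE for _ in range(MAX_SEQ_LEN)]
v_buf = [[0.0] * HEAD_SIZE for _ in range(MAX_SEQ_LEN)]
filled_k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
filled_v = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
for pos, (k, v) in enumerate(zip(filled_k, filled_v)):
    k_buf[pos], v_buf[pos] = k, v

input_pos = 2        # position of the current query token
end = input_pos + 1  # number of cache entries actually in use
full_mask = [pos <= input_pos for pos in range(MAX_SEQ_LEN)]

q = [0.5, -0.25]
# Full-length buffers with the full mask ...
out_full = attention(q, k_buf, v_buf, full_mask)
# ... versus buffers and mask truncated together to the used entries.
out_short = attention(q, k_buf[:end], v_buf[:end], full_mask[:end])
```

Under these assumptions the two outputs agree, while the truncated call touches only `end` entries instead of `MAX_SEQ_LEN`. Whether that saves time in practice depends on the attention kernel, as noted above.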
OK, I'll close this one and submit a new one, where I drop the refactoring of constructors of
Implements what is asked for in #1867.

- KVCache buffers are only as large as config.n_query_groups
- Shrink buffers returned by KVCache to just cover input_pos entries
- Refactor child classes of model.py classes to avoid copy and paste

Also adding comments, docstrings here and there.
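As a rough illustration of the first item, the arithmetic below compares a cache sized per query head with one sized per query group. In grouped-query attention, several query heads share one key/value head, so the cache only needs one slot per group. The helper `kv_cache_elements` and the configuration values are hypothetical and not taken from LitGPT.

```python
def kv_cache_elements(batch_size, n_kv_heads, max_seq_length, head_size):
    """Number of elements in the key and value buffers combined."""
    return 2 * batch_size * n_kv_heads * max_seq_length * head_size


# Hypothetical configuration, chosen only for illustration.
n_head = 32
n_query_groups = 8
head_size = 128
max_seq_length = 4096
batch_size = 1

# Buffers sized by the number of query heads.
per_head = kv_cache_elements(batch_size, n_head, max_seq_length, head_size)
# Buffers sized by the number of query groups (shared key/value heads).
per_group = kv_cache_elements(batch_size, n_query_groups, max_seq_length, head_size)

ratio = per_head // per_group
print(per_head, per_group, ratio)  # 33554432 8388608 4
```

With these numbers the per-group cache is smaller by a factor of `n_head / n_query_groups`, here 4.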