Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Several improvements related to KVCache #1870

Closed
wants to merge 2 commits into from

Conversation

mseeger
Copy link
Contributor

@mseeger mseeger commented Dec 12, 2024

Implements what is asked for in #1867.

  • Ensure that KVCache buffers are only as large as config.n_query_groups
  • Shrink buffers returned by KVCache to just cover input_pos entries
  • Refactor child classes of model.py classes to avoid copy and paste

Also adding comments, docstrings here and there.

- Shrink buffers returned by KVCache to just cover input_pos entries
- Refactor child classes of model.py classes to avoid copy and paste
@mseeger
Copy link
Contributor Author

mseeger commented Dec 12, 2024

I am interested in smart KV caches for memory-efficient inference.

@Andrei-Aksionov
Copy link
Collaborator

Hello @mseeger

Thanks for this and the other PR 👍

A couple of notes:

  1. It's better to create multiple smaller PRs instead of a large one that covers multiple topics. In this case reviewing will be done much faster.
  2. Refactor child classes of model.py classes to avoid copy and paste. It's not a bug, but a feature :) Yes, code duplication makes it a bit more tedious to add changes, but also it's considered to be easier to read the code when everything (or the majority of the code) is in one file.
    It's even in the main readme:

Developer friendly - Easy debugging with no abstraction layers and single file implementations.

@mseeger
Copy link
Contributor Author

mseeger commented Dec 12, 2024

Hi @Andrei-Aksionov,

let me know what I should change. Indeed, I could split this into 3:

  1. Refactor child classes ...
  2. KV cache buffers
  3. Shrink buffers

It is just that 1. is really needed for the other two.

I understand this single file argument. But maybe there should be a balance? In the end, you inherit code from model.py in the other modules adapter.py, ..., lots of it, just the constructors are not properly done. I just introduced some simple factory functions. Maybe I did too much OOP, but for me, the new state is simpler, since I immediately see where the differences are going from the base case to adapter, etc.

Another point: Why I am looking at LitGPT and not Hugging Face (a colleague pointed your project out to me) is because the structure is so much clearer. I can learn a lot about different models by immediately seeing what is common and what is different. In HF, this is next to impossible, because everything is copied for every new model.

@Andrei-Aksionov
Copy link
Collaborator

let me know what I should change. Indeed, I could split this into 3:

1. Refactor child classes ...
2. KV cache buffers
3. Shrink buffers

Let's do #2 and #3 in two separate PRs for now.


... But maybe there should be a balance? ...
... since I immediately see where the differences are going from the base case to adapter, etc.

Here I agree. It's easier to see the difference between PEFT implementation and the base model, if we reuse unchanged parts of the code from the base model.
I'll think about it over the weekend.
Most likely we can do it in "baby steps", without touching the base model.py at first.

I would recommend to focus on #2 and #3 first.


Another point: Why I am looking at LitGPT and not Hugging Face (a colleague pointed your project out to me) is because the structure is so much clearer. I can learn a lot about different models by immediately seeing what is common and what is different. In HF, this is next to impossible, because everything is copied for every new model.

So pleasant to hear it. Thanks 😊

@t-vi
Copy link
Contributor

t-vi commented Dec 19, 2024

Shrink buffers returned by KVCache to just cover input_pos entries

Note that depending on the attention implementation and its handling of masking this may or may not have the impact you have in mind.

@mseeger
Copy link
Contributor Author

mseeger commented Dec 19, 2024

@t-vi I am subselecting the mask at the same time. But do you have anything else in mind?

@mseeger
Copy link
Contributor Author

mseeger commented Dec 26, 2024

OK, I'll close this one and submit a new one, where I drop the refactoring of constructors of model.py classes.

@mseeger mseeger closed this Dec 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants