Several improvements related to KVCache #1870

mseeger · 2024-12-12T14:43:24Z

Implements what is asked for in #1867.

Ensure that KVCache buffers are only as large as config.n_query_groups
Shrink buffers returned by KVCache to just cover input_pos entries
Refactor child classes of model.py classes to avoid copy and paste

Also adding comments, docstrings here and there.

- Shrink buffers returned by KVCache to just cover input_pos entries - Refactor child classes of model.py classes to avoid copy and paste

mseeger · 2024-12-12T14:53:52Z

I am interested in smart KV caches for memory-efficient inference.

Andrei-Aksionov · 2024-12-12T15:33:13Z

Hello @mseeger

Thanks for this and the other PR 👍

A couple of notes:

It's better to create multiple smaller PRs instead of a large one that covers multiple topics. In this case reviewing will be done much faster.
Refactor child classes of model.py classes to avoid copy and paste. It's not a bug, but a feature :) Yes, code duplication makes it a bit more tedious to add changes, but also it's considered to be easier to read the code when everything (or the majority of the code) is in one file.
It's even in the main readme:

Developer friendly - Easy debugging with no abstraction layers and single file implementations.

mseeger · 2024-12-12T17:11:09Z

Hi @Andrei-Aksionov,

let me know what I should change. Indeed, I could split this into 3:

Refactor child classes ...
KV cache buffers
Shrink buffers

It is just that 1. is really needed for the other two.

I understand this single file argument. But maybe there should be a balance? In the end, you inherit code from model.py in the other modules adapter.py, ..., lots of it, just the constructors are not properly done. I just introduced some simple factory functions. Maybe I did too much OOP, but for me, the new state is simpler, since I immediately see where the differences are going from the base case to adapter, etc.

Another point: Why I am looking at LitGPT and not Hugging Face (a colleague pointed your project out to me) is because the structure is so much clearer. I can learn a lot about different models by immediately seeing what is common and what is different. In HF, this is next to impossible, because everything is copied for every new model.

Andrei-Aksionov · 2024-12-12T17:28:39Z

let me know what I should change. Indeed, I could split this into 3:

1. Refactor child classes ...
2. KV cache buffers
3. Shrink buffers

Let's do #2 and #3 in two separate PRs for now.

... But maybe there should be a balance? ...
... since I immediately see where the differences are going from the base case to adapter, etc.

Here I agree. It's easier to see the difference between PEFT implementation and the base model, if we reuse unchanged parts of the code from the base model.
I'll think about it over the weekend.
Most likely we can do it in "baby steps", without touching the base model.py at first.

I would recommend to focus on #2 and #3 first.

Another point: Why I am looking at LitGPT and not Hugging Face (a colleague pointed your project out to me) is because the structure is so much clearer. I can learn a lot about different models by immediately seeing what is common and what is different. In HF, this is next to impossible, because everything is copied for every new model.

So pleasant to hear it. Thanks 😊

t-vi · 2024-12-19T11:51:02Z

Shrink buffers returned by KVCache to just cover input_pos entries

Note that depending on the attention implementation and its handling of masking this may or may not have the impact you have in mind.

mseeger · 2024-12-19T20:57:37Z

@t-vi I am subselecting the mask at the same time. But do you have anything else in mind?

mseeger · 2024-12-26T12:58:08Z

OK, I'll close this one and submit a new one, where I drop the refactoring of constructors of model.py classes.

- Ensure that KVCache buffers are only as large as config.n_query_groups

5e24aa5

- Shrink buffers returned by KVCache to just cover input_pos entries - Refactor child classes of model.py classes to avoid copy and paste

mseeger requested review from rasbt and lantiga as code owners December 12, 2024 14:43

Fix

28a1c5b

mseeger mentioned this pull request Dec 12, 2024

Improvements of KVCache and refactoring of subclasses of classes in model.py #1867

Open

mseeger closed this Dec 26, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Several improvements related to KVCache #1870

Several improvements related to KVCache #1870

mseeger commented Dec 12, 2024

mseeger commented Dec 12, 2024

Andrei-Aksionov commented Dec 12, 2024

mseeger commented Dec 12, 2024

Andrei-Aksionov commented Dec 12, 2024

t-vi commented Dec 19, 2024

mseeger commented Dec 19, 2024

mseeger commented Dec 26, 2024

Several improvements related to KVCache #1870

Several improvements related to KVCache #1870

Conversation

mseeger commented Dec 12, 2024

mseeger commented Dec 12, 2024

Andrei-Aksionov commented Dec 12, 2024

mseeger commented Dec 12, 2024

Andrei-Aksionov commented Dec 12, 2024

t-vi commented Dec 19, 2024

mseeger commented Dec 19, 2024

mseeger commented Dec 26, 2024