[llama] Store KV Cache on CPU and Use PyTorch SDPA for Next Token Generation #1182

Open · wants to merge 4 commits into base: main

Commits on Sep 18, 2024

  1. cpu_kv and cpu_sdpa on llama

    Signed-off-by: Yu Zhentao <[email protected]>
    zhentaoyu committed Sep 18, 2024
    aee4795
  2. refactor code and add README

    Signed-off-by: Yu Zhentao <[email protected]>
    zhentaoyu committed Sep 18, 2024
    1b4ee20
  3. fix kv_cache_on_host if statement and add non_blocking copy

    Signed-off-by: Yu Zhentao <[email protected]>
    zhentaoyu committed Sep 18, 2024
    fd29d4e
  4. add long-context example in README

    Signed-off-by: Yu Zhentao <[email protected]>
    zhentaoyu committed Sep 18, 2024
    74e94ff
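For context, the sketch below illustrates the general pattern the PR title and commit messages describe: keep the per-layer KV cache in CPU memory and compute single-token decode attention with PyTorch's torch.nn.functional.scaled_dot_product_attention, using non_blocking host/device copies. The class name, tensor shapes, and the use of a kv_cache_on_host flag (named after the third commit) are illustrative assumptions, not the PR's actual implementation.

```python
# Hypothetical sketch of CPU-resident KV cache + CPU SDPA decode, assuming the
# approach named in the PR title; not the code from this PR.
import torch
import torch.nn.functional as F


class CPUKVCacheAttention:
    def __init__(self, kv_cache_on_host: bool = True):
        self.kv_cache_on_host = kv_cache_on_host
        self.k_cache = None  # [batch, heads, seq, head_dim], kept on CPU when offloaded
        self.v_cache = None

    def decode_step(self, q, k_new, v_new):
        """q, k_new, v_new: [batch, heads, 1, head_dim] tensors on the accelerator."""
        if self.kv_cache_on_host:
            # Copy the new token's K/V to the host; non_blocking mirrors the
            # "add non_blocking copy" commit (truly async only with pinned memory).
            k_cpu = k_new.to("cpu", non_blocking=True)
            v_cpu = v_new.to("cpu", non_blocking=True)
            self.k_cache = k_cpu if self.k_cache is None else torch.cat([self.k_cache, k_cpu], dim=2)
            self.v_cache = v_cpu if self.v_cache is None else torch.cat([self.v_cache, v_cpu], dim=2)
            # Run attention for the single new query token on the CPU via PyTorch SDPA.
            out = F.scaled_dot_product_attention(q.to("cpu"), self.k_cache, self.v_cache)
            return out.to(q.device, non_blocking=True)
        # Fallback: keep the cache and the attention computation on the accelerator.
        self.k_cache = k_new if self.k_cache is None else torch.cat([self.k_cache, k_new], dim=2)
        self.v_cache = v_new if self.v_cache is None else torch.cat([self.v_cache, v_new], dim=2)
        return F.scaled_dot_product_attention(q, self.k_cache, self.v_cache)
```

The trade-off this pattern targets is memory, not speed: for long contexts the KV cache can exceed device memory, so holding it in host RAM and paying per-token copy plus CPU attention cost keeps generation feasible, which is consistent with the "add long-context example in README" commit.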