[llama] Store KV Cache on CPU and Use PyTorch SDPA for Next Token Generation #1182

Open · wants to merge 4 commits into base: main

Commits on Sep 18, 2024

  1. cpu_kv and cpu_sdpa on llama

    Signed-off-by: Yu Zhentao <[email protected]>
    zhentaoyu committed Sep 18, 2024
    aee4795
  2. refactor code and add README

    Signed-off-by: Yu Zhentao <[email protected]>
    zhentaoyu committed Sep 18, 2024
    1b4ee20
  3. fix kv_cache_on_host if statement and add non_blocking copy

    Signed-off-by: Yu Zhentao <[email protected]>
    zhentaoyu committed Sep 18, 2024
    fd29d4e
  4. add long-context example in README

    Signed-off-by: Yu Zhentao <[email protected]>
    zhentaoyu committed Sep 18, 2024
    74e94ff
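For context, the sketch below illustrates the general pattern the PR title and commit messages describe: keep the per-layer KV cache in CPU memory and compute single-token decode attention with PyTorch's torch.nn.functional.scaled_dot_product_attention, using non_blocking host/device copies. The class name, tensor shapes, and the use of a kv_cache_on_host flag (named after the third commit) are illustrative assumptions, not the PR's actual implementation.

```python
# Hypothetical sketch of CPU-resident KV cache + CPU SDPA decode, assuming the
# approach named in the PR title; not the code from this PR.
import torch
import torch.nn.functional as F


class CPUKVCacheAttention:
    def __init__(self, kv_cache_on_host: bool = True):
        self.kv_cache_on_host = kv_cache_on_host
        self.k_cache = None  # [batch, heads, seq, head_dim], kept on CPU when offloaded
        self.v_cache = None

    def decode_step(self, q, k_new, v_new):
        """q, k_new, v_new: [batch, heads, 1, head_dim] tensors on the accelerator."""
        if self.kv_cache_on_host:
            # Copy the new token's K/V to the host; non_blocking mirrors the
            # "add non_blocking copy" commit (truly async only with pinned memory).
            k_cpu = k_new.to("cpu", non_blocking=True)
            v_cpu = v_new.to("cpu", non_blocking=True)
            self.k_cache = k_cpu if self.k_cache is None else torch.cat([self.k_cache, k_cpu], dim=2)
            self.v_cache = v_cpu if self.v_cache is None else torch.cat([self.v_cache, v_cpu], dim=2)
            # Run attention for the single new query token on the CPU via PyTorch SDPA.
            out = F.scaled_dot_product_attention(q.to("cpu"), self.k_cache, self.v_cache)
            return out.to(q.device, non_blocking=True)
        # Fallback: keep the cache and the attention computation on the accelerator.
        self.k_cache = k_new if self.k_cache is None else torch.cat([self.k_cache, k_new], dim=2)
        self.v_cache = v_new if self.v_cache is None else torch.cat([self.v_cache, v_new], dim=2)
        return F.scaled_dot_product_attention(q, self.k_cache, self.v_cache)
```

The trade-off this pattern targets is memory, not speed: for long contexts the KV cache can exceed device memory, so holding it in host RAM and paying per-token copy plus CPU attention cost keeps generation feasible, which is consistent with the "add long-context example in README" commit.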