Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add PagedAttention #13

Merged
merged 54 commits into from
Dec 3, 2023
Merged
Show file tree
Hide file tree
Changes from 51 commits
Commits
Show all changes
54 commits
Select commit Hold shift + click to select a range
ec98e4f
Add new paged attention basic layout
EricLBuehler Nov 25, 2023
f6dc172
Update readme
EricLBuehler Nov 25, 2023
859338c
Add commend
EricLBuehler Nov 25, 2023
680df93
Add calls to vllm paged attention impl via FFI
EricLBuehler Nov 26, 2023
15a8194
Debug state
EricLBuehler Nov 27, 2023
da1cd99
Use cc .std, .opt_level, .include, still debug
EricLBuehler Nov 27, 2023
c0c9e0e
Use setup.py
EricLBuehler Nov 27, 2023
268a963
Better install
EricLBuehler Nov 27, 2023
fec3e9e
Add build.rs
EricLBuehler Nov 27, 2023
4c21697
Add some installation instructions
EricLBuehler Nov 28, 2023
2e1575f
Update installation instructions
EricLBuehler Nov 28, 2023
e382d8e
Add pytorch installation method
EricLBuehler Nov 28, 2023
11cfa2d
Update pytorch installation method
EricLBuehler Nov 28, 2023
7e61a17
Update pytorch installation method
EricLBuehler Nov 28, 2023
be6bf5f
Update pytorch installation method
EricLBuehler Nov 28, 2023
56a14c2
Update pytorch installation method
EricLBuehler Nov 28, 2023
1d4700a
Update readme
EricLBuehler Nov 28, 2023
75cd454
Add the Compiling PagedAttention CUDA kernels section
EricLBuehler Nov 28, 2023
8eeb250
Fix typo
EricLBuehler Nov 28, 2023
e453c05
Make directions explicit
EricLBuehler Nov 28, 2023
1df0f81
Only compile specific kernel
EricLBuehler Nov 29, 2023
32a72c0
Small update
EricLBuehler Nov 29, 2023
0746cc5
Add basic framework of PagedAttention.forward
EricLBuehler Nov 30, 2023
3dfb563
Add initial BlockDiagonalCausalMask
EricLBuehler Dec 1, 2023
c7cfca9
temporary save
EricLBuehler Dec 1, 2023
97e91c5
Implement attention bias
EricLBuehler Dec 1, 2023
eba2ce8
Refactor, mostly implement PagedAttention
EricLBuehler Dec 2, 2023
ad0dfd6
Clippy fixes
EricLBuehler Dec 2, 2023
75105db
Implement .copy_ manually
EricLBuehler Dec 2, 2023
d76f42e
Hide memory-efficient-attention behind a feature flag
EricLBuehler Dec 2, 2023
fbbe865
Describe feature flags
EricLBuehler Dec 2, 2023
9fe413f
TODO: multi_query_kv_attention
EricLBuehler Dec 2, 2023
3f57043
TODO: memory_efficient_attention_forward
EricLBuehler Dec 2, 2023
9ac36a5
Update readme
EricLBuehler Dec 2, 2023
3a850bc
Add scaled dot product attn
EricLBuehler Dec 3, 2023
7e2e4e6
Add flash_attn support
EricLBuehler Dec 3, 2023
f0d9832
Remove all lossy 'as' conversions
EricLBuehler Dec 3, 2023
d946898
Update readme
EricLBuehler Dec 3, 2023
f40f9c9
Fix typos
EricLBuehler Dec 3, 2023
0d9f3b5
Update readme
EricLBuehler Dec 3, 2023
85de7d5
Update readme
EricLBuehler Dec 3, 2023
8eff99a
Add minimal CUDA install script for CI
EricLBuehler Dec 3, 2023
173ca49
Debugging CI
EricLBuehler Dec 3, 2023
82ce3da
Debugging CI
EricLBuehler Dec 3, 2023
6d81221
Debugging CI
EricLBuehler Dec 3, 2023
40a8589
Debugging CI
EricLBuehler Dec 3, 2023
c8145b9
Debugging CI
EricLBuehler Dec 3, 2023
9a26979
Add no-paged-attention for CI
EricLBuehler Dec 3, 2023
a3e2d90
Add no-paged-attention for CI
EricLBuehler Dec 3, 2023
eb25e82
Update CI to not use CUDA
EricLBuehler Dec 3, 2023
323da9f
Update CI to not use CUDA
EricLBuehler Dec 3, 2023
cb9c583
Split up attn_bias
EricLBuehler Dec 3, 2023
7e6607f
Remove link from Contributing
EricLBuehler Dec 3, 2023
416bd19
Merge branch 'master' into paged_attention
EricLBuehler Dec 3, 2023
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
64 changes: 32 additions & 32 deletions .github/workflows/ci.yml
Original file line number Diff line number Diff line change
Expand Up @@ -41,38 +41,38 @@ jobs:
with:
command: fmt
args: --all -- --check

clippy:
name: Clippy
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- uses: actions-rs/toolchain@v1
with:
profile: minimal
toolchain: stable
override: true
- run: rustup component add clippy
- uses: actions-rs/cargo@v1
with:
command: clippy
args: --workspace --tests --examples -- -D warnings

docs:
name: Docs
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- uses: actions-rs/toolchain@v1
with:
profile: minimal
toolchain: stable
override: true
- uses: actions-rs/cargo@v1
with:
command: doc
args: --workspace

#clippy:
# name: Clippy
# runs-on: ubuntu-latest
# steps:
# - uses: actions/checkout@v2
# - uses: actions-rs/toolchain@v1
# with:
# profile: minimal
# toolchain: stable
# override: true
# - run: rustup component add clippy
# - uses: actions-rs/cargo@v1
# with:
# command: clippy
# args: --workspace --tests --examples -- -D warnings
#
#docs:
# name: Docs
# runs-on: ubuntu-latest
# steps:
# - uses: actions/checkout@v2
# - uses: actions-rs/toolchain@v1
# with:
# profile: minimal
# toolchain: stable
# override: true
# - uses: actions-rs/cargo@v1
# with:
# command: doc
# args: --workspace
typos:
name: Typos
runs-on: ubuntu-latest
Expand Down
39 changes: 39 additions & 0 deletions .github/workflows/install.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,39 @@
import subprocess

subprocess.run(["sudo", "apt", "update", "-y"])
subprocess.run(["sudo", "apt", "install", "libssl-dev", "-y"])
subprocess.run(["sudo", "apt", "install", "pkg-config", "-y"])

try:
import torch
works = True
except:
works = False

if works:
first = subprocess.run(["sudo", "find", "/", "-name", "libtorch_cpu.so"]).stdout.split("\n")
else:
first = []

nvcc_release = subprocess.run(["nvcc", "--version"])
assert nvcc_release.returncode == 0

nvcc_release = nvcc_release.stdout.split("\n")[3] #Cuda compilation tools, release 11.5, V11.5.119
nvcc_release = nvcc_release.split("release ")[1] #['Cuda compilation tools, ', '11.5, V11.5.119']
nvcc_release = float(nvcc_release[1].split(","))

print(f"Got nvcc version {nvcc_release}")
if nvcc_release<=11.8:
subprocess.run(["pip", "install", "torch==2.1.0", "torchvision==0.16.0", "torchaudio==2.1.0", "--index-url", "https://download.pytorch.org/whl/cu118"])
else:
subprocess.run(["pip", "install", "torch==2.1.0", "torchvision==0.16.0", "torchaudio==2.1.0", "--index-url", "https://download.pytorch.org/whl/cu121"])

after = subprocess.run(["sudo", "find", "/", "-name", "libtorch_cpu.so"]).stdout.split("\n")
different = list(filter(lambda x: x not in first, after))[0]

with open("~/.bashrc", "a") as f:
f.write("# candle-vllm")
f.write(f"export LD_LIBRARY_PATH={different}:$LD_LIBRARY_PATH")
f.write("export LIBTORCH_USE_PYTORCH=1")

subprocess.run(["source", "~/.bashrc"])
Loading
Loading