forked from mlc-ai/mlc-llm
Commit
[Serving] Support batched prefill and benchmark
This PR adds batched prefill to the current serving framework, which improves prefill throughput. Some data structures are adjusted to reduce runtime overhead. The PR also adds a benchmark of the serving engine that takes a real-time dataset as input.
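As a rough illustration of what batching prefill requests can look like, the sketch below greedily packs several pending prompts into one flattened token array with an offset array marking where each prompt starts and ends. It is a minimal sketch and not the code from this commit: the function name `pack_prefill_batch`, the `max_total_tokens` budget, and the `indptr` layout are all assumptions made for the example.

```python
from typing import List, Tuple


def pack_prefill_batch(
    prompts: List[List[int]], max_total_tokens: int
) -> Tuple[List[int], List[int], int]:
    """Greedily pack pending prompts into one prefill batch (hypothetical helper).

    Returns (flat_tokens, indptr, num_taken), where prompt i occupies
    flat_tokens[indptr[i]:indptr[i + 1]].
    """
    flat_tokens: List[int] = []
    indptr: List[int] = [0]
    num_taken = 0
    for prompt in prompts:
        # Stop at the first prompt that would exceed the token budget,
        # so requests are admitted in arrival order.
        if len(flat_tokens) + len(prompt) > max_total_tokens:
            break
        flat_tokens.extend(prompt)
        indptr.append(len(flat_tokens))
        num_taken += 1
    return flat_tokens, indptr, num_taken


if __name__ == "__main__":
    pending = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
    tokens, indptr, taken = pack_prefill_batch(pending, max_total_tokens=6)
    print(tokens, indptr, taken)
```

Running one forward pass over the packed tokens instead of one pass per prompt is the usual source of the throughput gain; the budget keeps a single batch from growing without bound.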
1 parent 36ea52d, commit d1f3c5a
Showing 11 changed files with 561 additions and 278 deletions.
Submodule flashinfer updated (28 files):
+3 −4 | .clang-format
+1 −0 | .gitignore
+1 −0 | .gitmodules
+1 −0 | include/flashinfer.cuh
+28 −0 | include/flashinfer/cascade.cuh
+73 −38 | include/flashinfer/cp_async.cuh
+516 −497 | include/flashinfer/decode.cuh
+19 −26 | include/flashinfer/layout.cuh
+16 −3 | include/flashinfer/math.cuh
+156 −39 | include/flashinfer/mma.cuh
+221 −120 | include/flashinfer/page.cuh
+25 −14 | include/flashinfer/permuted_smem.cuh
+986 −570 | include/flashinfer/prefill.cuh
+2 −2 | include/flashinfer/rope.cuh
+9 −14 | include/flashinfer/state.cuh
+33 −17 | include/flashinfer/utils.cuh
+141 −299 | include/flashinfer/vec_dtypes.cuh
+162 −0 | include/flashinfer/wrapper.cuh
+27 −60 | src/bench_batch_decode.cu
+17 −24 | src/bench_single_decode.cu
+42 −56 | src/bench_single_prefill.cu
+28 −44 | src/cpu_reference.h
+29 −74 | src/test_batch_decode.cu
+271 −36 | src/test_batch_prefill.cu
+59 −81 | src/test_page.cu
+18 −24 | src/test_single_decode.cu
+24 −37 | src/test_single_prefill.cu
+167 −147 | src/tvm_wrapper.cu