how to get the maximum throughput? #942
- Why is the gen throughput only 1397 token/s?
- For users, using the default parameters is fine. I tested it on an A100 in the following way and did not reproduce your issue.
python -m sglang.launch_server --model-path meta-llama/Llama-2-13b-chat-hf --enable-torch-compile --disable-radix-cache
python -m vllm.entrypoints.openai.api_server --model meta-llama/Llama-2-13b-chat-hf --disable-log-requests
python3 -m sglang.bench_serving --backend sglang --num-prompts 3000
python3 -m sglang.bench_serving --backend vllm --num-prompts 3000
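A quick way to cross-check the bench_serving numbers is to drive the OpenAI-compatible endpoint directly and measure aggregate token/s yourself. The sketch below is mine, not from the thread: it assumes the sglang server launched above is listening on http://localhost:30000 and that /v1/completions returns a standard usage.completion_tokens field, so adjust the URL and model to your own setup.

```python
# Minimal throughput sanity check against an OpenAI-compatible completions endpoint.
# Assumptions (not from the thread): server at http://localhost:30000, standard
# "usage" object in the response. Adjust URL/model to your setup.
import concurrent.futures
import time

import requests

URL = "http://localhost:30000/v1/completions"   # assumed sglang default port
MODEL = "meta-llama/Llama-2-13b-chat-hf"
NUM_REQUESTS = 64                               # small concurrent load, just a sanity check
MAX_TOKENS = 128


def one_request(i: int) -> int:
    """Send one completion request and return the number of generated tokens."""
    payload = {
        "model": MODEL,
        "prompt": f"Write a short paragraph about topic {i}.",
        "max_tokens": MAX_TOKENS,
        "temperature": 0.0,
    }
    resp = requests.post(URL, json=payload, timeout=300)
    resp.raise_for_status()
    return resp.json()["usage"]["completion_tokens"]


start = time.time()
with concurrent.futures.ThreadPoolExecutor(max_workers=NUM_REQUESTS) as pool:
    tokens = list(pool.map(one_request, range(NUM_REQUESTS)))
elapsed = time.time() - start

print(f"generated {sum(tokens)} tokens in {elapsed:.1f}s "
      f"-> {sum(tokens) / elapsed:.1f} token/s aggregate")
```

This is only meant to confirm the server sustains a sensible aggregate rate under concurrency; for real comparisons, stick with the bench_serving commands above.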
Checklist
Describe the bug
Decode batch. #running-req: 1010, #token: 55006, token usage: 0.93, gen throughput (token/s): 1397.13, #queue-req: 12279
Why is the throughput only 1397 token/s?
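One way to read that log line (my own back-of-the-envelope, not something stated in the thread): the reported gen throughput is the aggregate decode rate shared across all running requests, so each individual request advances at only about 1.4 token/s.

```python
# Back-of-the-envelope reading of the decode log line above.
running_req = 1010          # "#running-req" from the log
gen_throughput = 1397.13    # aggregate "gen throughput (token/s)" from the log

print(f"~{gen_throughput / running_req:.2f} token/s per running request")  # ~1.38
```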
Reproduction
--schedule-conservativeness 0.3
Environment