how to get the maximum throughput? #942
- Why is the gen throughput only 1397 token/s?
- For users, using the default parameters is fine. I tested it on an A100 in the following way and did not reproduce your issue.
python -m sglang.launch_server --model-path meta-llama/Llama-2-13b-chat-hf --enable-torch-compile --disable-radix-cache
python -m vllm.entrypoints.openai.api_server --model meta-llama/Llama-2-13b-chat-hf --disable-log-requests
python3 -m sglang.bench_serving --backend sglang --num-prompts 3000
python3 -m sglang.bench_serving --backend vllm --num-prompts 3000
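A quick way to cross-check the bench_serving numbers is to drive the OpenAI-compatible endpoint directly and measure aggregate token/s yourself. The sketch below is mine, not from the thread: it assumes the sglang server launched above is listening on http://localhost:30000 and that /v1/completions returns a standard usage.completion_tokens field, so adjust the URL and model to your own setup.

```python
# Minimal throughput sanity check against an OpenAI-compatible completions endpoint.
# Assumptions (not from the thread): server at http://localhost:30000, standard
# "usage" object in the response. Adjust URL/model to your setup.
import concurrent.futures
import time

import requests

URL = "http://localhost:30000/v1/completions"   # assumed sglang default port
MODEL = "meta-llama/Llama-2-13b-chat-hf"
NUM_REQUESTS = 64                               # small concurrent load, just a sanity check
MAX_TOKENS = 128


def one_request(i: int) -> int:
    """Send one completion request and return the number of generated tokens."""
    payload = {
        "model": MODEL,
        "prompt": f"Write a short paragraph about topic {i}.",
        "max_tokens": MAX_TOKENS,
        "temperature": 0.0,
    }
    resp = requests.post(URL, json=payload, timeout=300)
    resp.raise_for_status()
    return resp.json()["usage"]["completion_tokens"]


start = time.time()
with concurrent.futures.ThreadPoolExecutor(max_workers=NUM_REQUESTS) as pool:
    tokens = list(pool.map(one_request, range(NUM_REQUESTS)))
elapsed = time.time() - start

print(f"generated {sum(tokens)} tokens in {elapsed:.1f}s "
      f"-> {sum(tokens) / elapsed:.1f} token/s aggregate")
```

This is only meant to confirm the server sustains a sensible aggregate rate under concurrency; for real comparisons, stick with the bench_serving commands above.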
Checklist
Describe the bug
Decode batch. #running-req: 1010, #token: 55006, token usage: 0.93, gen throughput (token/s): 1397.13, #queue-req: 12279
Why is the throughput only 1397 token/s?
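One way to read that log line (my own back-of-the-envelope, not something stated in the thread): the reported gen throughput is the aggregate decode rate shared across all running requests, so each individual request advances at only about 1.4 token/s.

```python
# Back-of-the-envelope reading of the decode log line above.
running_req = 1010          # "#running-req" from the log
gen_throughput = 1397.13    # aggregate "gen throughput (token/s)" from the log

print(f"~{gen_throughput / running_req:.2f} token/s per running request")  # ~1.38
```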
Reproduction
--schedule-conservativeness 0.3
Environment