[Bug] 0.6.2 vs 0.4.2 with the qwen1.5b model: inference on 0.6.2 is about 3x slower #2752

Open
1 of 3 tasks
xliangwu opened this issue Nov 13, 2024 · 5 comments

Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.
  • 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.

Describe the bug

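We have been running version 0.4.2 in production. We recently needed function calling, so we upgraded to the latest version and found that inference is several times slower.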

Reproduction

0.6.2
[image]

0.4.2
[image: baf4597731421fdbdf33bdba7073dbc0]

Environment

3090Ti, CentOS

Error traceback

No response

xliangwu commented Nov 13, 2024

Some additional information.
The engine parameters that lmdeploy logs after updating the config:
2024-11-13 12:35:13,900 - lmdeploy - INFO - async_engine.py:168 - updated backend_config=TurbomindEngineConfig(dtype='auto', model_format='hf', tp=1, session_len=16384, max_batch_size=12, cache_max_entry_count=0.8, cache_chunk_size=-1, cache_block_seq_len=64, enable_prefix_caching=True, quant_policy=0, rope_scaling_factor=0.0, use_logn_attn=False, download_dir=None, revision=None, max_prefill_token_num=8192, num_tokens_per_iter=8192, max_prefill_iters=2)
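
When comparing two versions, it can help to turn this log line into a dictionary and diff it against the equivalent line from the other version, since changed defaults are a common cause of performance differences. The following is a minimal sketch using only the Python standard library. The helper names `parse_backend_config` and `diff_configs` are hypothetical and not part of lmdeploy, and it assumes the log keeps the `backend_config=Name(key=value, ...)` form shown above.

```python
import ast

_MISSING = "<missing>"  # placeholder for keys present in only one config


def parse_backend_config(log_line: str) -> dict:
    """Extract keyword arguments from a 'backend_config=Name(...)' log line."""
    marker = "backend_config="
    start = log_line.index(marker) + len(marker)
    # Parse the repr as a Python expression without executing it.
    call = ast.parse(log_line[start:].strip(), mode="eval").body
    if not isinstance(call, ast.Call):
        raise ValueError("expected a constructor-style repr after 'backend_config='")
    # literal_eval only accepts literals (str, int, float, bool, None, ...).
    return {kw.arg: ast.literal_eval(kw.value) for kw in call.keywords}


def diff_configs(old: dict, new: dict) -> dict:
    """Return {key: (old_value, new_value)} for every key whose value differs."""
    keys = sorted(set(old) | set(new))
    return {
        k: (old.get(k, _MISSING), new.get(k, _MISSING))
        for k in keys
        if old.get(k, _MISSING) != new.get(k, _MISSING)
    }


# Shortened excerpt of the log line quoted in this issue.
line = (
    "2024-11-13 12:35:13,900 - lmdeploy - INFO - async_engine.py:168 - "
    "updated backend_config=TurbomindEngineConfig(tp=1, session_len=16384, "
    "max_batch_size=12, cache_max_entry_count=0.8, enable_prefix_caching=True)"
)
config = parse_backend_config(line)
print(config)
```

Running the same parser on the corresponding 0.4.2 log line and passing both results to `diff_configs` would list only the parameters that differ between the two deployments.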

Watching GPU utilization, 0.4.2 reaches 98%, but 0.6.2 only gets to 40%.
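
To back up a utilization comparison like this with numbers, one option is to sample `nvidia-smi` at a fixed interval while each version is under load. The sketch below assumes `nvidia-smi` is on the PATH and uses its `--query-gpu=utilization.gpu` CSV output. The function names are hypothetical, and the parser assumes every GPU reports a numeric value.

```python
import subprocess
import time

# Prints one utilization percentage per GPU, one per line, no header or units.
QUERY = ["nvidia-smi", "--query-gpu=utilization.gpu", "--format=csv,noheader,nounits"]


def parse_utilization(output: str) -> list[int]:
    """Parse one integer percentage per GPU from the CSV output."""
    return [int(line.strip()) for line in output.splitlines() if line.strip()]


def sample_utilization(samples: int = 10, interval_s: float = 1.0) -> list[list[int]]:
    """Collect `samples` readings, waiting `interval_s` seconds between them."""
    readings = []
    for _ in range(samples):
        out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
        readings.append(parse_utilization(out))
        time.sleep(interval_s)
    return readings


def mean_per_gpu(readings: list[list[int]]) -> list[float]:
    """Average the readings column-wise, giving one mean per GPU."""
    return [sum(col) / len(col) for col in zip(*readings)]
```

Calling `mean_per_gpu(sample_utilization())` while the same request load runs against each version gives two averages that can be posted alongside the screenshots.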

@lvhan028
Collaborator

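Please provide steps to reproduce.

Every lmdeploy release goes through model accuracy evaluation and inference speed testing, and the results have always met expectations.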
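
A reproduction is easier to compare when the same timing harness is run unchanged against both versions. The following is a minimal sketch of such a harness, not lmdeploy's own benchmark. `generate` is a hypothetical callable that the reporter would implement around whichever client they use. It must send one prompt and return the number of tokens generated. The default concurrency of 12 only mirrors the `max_batch_size=12` in the logged config and is an assumption about the load pattern.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence


def measure_throughput(
    generate: Callable[[str], int],
    prompts: Sequence[str],
    concurrency: int = 12,
) -> dict:
    """Send all prompts with a fixed number of in-flight requests and time them."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # Each call returns the number of tokens generated for one prompt.
        token_counts = list(pool.map(generate, prompts))
    elapsed = time.perf_counter() - start
    total = sum(token_counts)
    return {
        "requests": len(prompts),
        "tokens": total,
        "seconds": elapsed,
        "tokens_per_s": total / elapsed if elapsed > 0 else float("inf"),
    }
```

Reporting `tokens_per_s` for the same prompts, model, and concurrency on 0.4.2 and 0.6.2 would show whether the slowdown also appears outside the production service.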

@lvhan028 lvhan028 self-assigned this Nov 13, 2024
@lvhan028
Collaborator

cc @zhulinJulia24

@zhulinJulia24
Collaborator

[image]
On a single A100 with lmdeploy 0.6.2.post1, the benchmark speed is as expected, and GPU utilization is 99%, which is also as expected.

@xliangwu
Author

Thanks. I will prepare my data and environment, and then provide the data needed to reproduce this.
