
Remove threadsafe #2907

Merged: 17 commits merged into InternLM:main on Jan 3, 2025

Conversation

@grimoire (Collaborator) commented Dec 17, 2024:

  • Thread-safe mode has been removed.
  • asyncio.Queue -> asyncio.Event
  • Better host performance

Note that EOS would be output in this PR.
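
As a rough illustration of the second bullet (hypothetical names only, not the engine's actual code), replacing a per-item `asyncio.Queue` handshake with an `asyncio.Event` plus shared state removes queue bookkeeping from the host loop:

```python
import asyncio

# Hypothetical sketch of the asyncio.Queue -> asyncio.Event change; names and
# structure are illustrative only, not the engine's actual implementation.
class OutputSlot:
    """Producer stores the latest outputs and sets an event; consumer waits on it."""

    def __init__(self):
        self._event = asyncio.Event()
        self._outputs = None

    def put(self, outputs):
        self._outputs = outputs
        self._event.set()            # wake the waiting coroutine

    async def get(self):
        await self._event.wait()     # no per-item queue allocation or locking
        self._event.clear()
        return self._outputs
```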

@lvhan028 (Collaborator):

We have users who run the PyTorch engine in a multi-threaded environment.
Please provide a guide for them on migrating to the non-threadsafe PyTorch engine.

@lvhan028 (Collaborator):

Add a WARNING that thread-safe mode has been removed.

@lvhan028 (Collaborator):

"Better host performance", so what's the performance now?

@grimoire (Collaborator, Author):

> "Better host performance", so what's the performance now?

llama3-8b, tp=1, 3000 prompt, 256 concurrency

concurrency: 256
elapsed_time: 133.107s

first token latency(s)(min, max, ave): 0.119, 4.574, 0.621
per-token latency(s) percentile(50, 75, 95, 99): [0.028, 0.03, 0.284, 0.47]

number of prompt tokens: 676779
number of completion tokens: 612685
token throughput (completion token): 4602.956 token/s
token throughput (prompt + completion token): 9687.436 token/s
RPS (request per second): 22.538 req/s
RPM (request per minute): 1352.297 req/min

llama3-8b, tp=1, 10000 prompt, 512 concurrency

concurrency: 512
elapsed_time: 386.856s

first token latency(s)(min, max, ave): 0.259, 7.529, 0.823
per-token latency(s) percentile(50, 75, 95, 99): [0, 0.055, 0.894, 1.138]

number of prompt tokens: 2238358
number of completion tokens: 1995438
token throughput (completion token): 5158.094 token/s
token throughput (prompt + completion token): 10944.123 token/s
RPS (request per second): 25.849 req/s
RPM (request per minute): 1550.966 req/min

@lvhan028 (Collaborator):

> "Note that EOS would be output in this PR."

@lzhangzz, will the tm (TurboMind) refactoring you are working on output the EOS and stop_token_id to async_engine?

@RunningLeon (Collaborator) left a review:


LGTM

@lzhangzz (Collaborator):

> "Note that EOS would be output in this PR." @lzhangzz, will the tm (TurboMind) refactoring you are working on output the EOS and stop_token_id to async_engine?

We need to discuss how EOS/stop_token_ids should be skipped in the async engine.

For some models, EOS is part of their chat template; we may exclude the token from the response, but the step should not be rewound (i.e. the token is kept in the kv cache).

However, for models like vicuna, EOS must be excluded from both the response and the kv cache (rewind the step to the token before EOS).
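
A hedged sketch of the two policies described above (the function and the `rewind_step` callback are hypothetical names, not the async engine's API):

```python
# Hypothetical illustration of the two EOS policies discussed above;
# not the engine's actual code.
def strip_eos(token_ids, eos_token_id, keep_eos_in_cache, rewind_step):
    """Drop EOS from the response; optionally rewind so it leaves the kv cache too.

    rewind_step: hypothetical callback that rolls the session back n tokens.
    """
    if token_ids and token_ids[-1] == eos_token_id:
        token_ids = token_ids[:-1]       # EOS never appears in the response text
        if not keep_eos_in_cache:
            rewind_step(1)               # vicuna-style: EOS is evicted from the kv cache
    return token_ids
```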

Review thread on the docs diff, at these lines:

print(output[1].text)

If you do need multithreading, it would be easy to wrap it like below:
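
(The snippet that "like below" refers to lives in the reviewed docs and is not reproduced in this thread. As an illustration only, one generic way to wrap an async-only API for multithreaded callers is to run a single event loop in a dedicated thread; `AsyncRunner` and `async_infer` are placeholders, not lmdeploy's documented recipe.)

```python
import asyncio
import threading

# Illustration only: generic pattern for calling an async-only API from worker
# threads. `async_infer` is a placeholder coroutine, not lmdeploy's actual API.
class AsyncRunner:
    def __init__(self):
        self._loop = asyncio.new_event_loop()
        self._thread = threading.Thread(target=self._loop.run_forever, daemon=True)
        self._thread.start()

    def run(self, coro):
        # Thread-safe: schedules the coroutine on the loop thread and blocks for the result.
        return asyncio.run_coroutine_threadsafe(coro, self._loop).result()

async def async_infer(prompt: str) -> str:
    await asyncio.sleep(0)               # stand-in for real async inference
    return f'response to: {prompt}'

runner = AsyncRunner()
print(runner.run(async_infer('hello')))  # callable from any worker thread
```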
A collaborator commented:
@lzhangzz After PR #2968, can users use the pipeline API in multithreaded code?

Review thread on the diff, at these lines:

    adapter_name: Optional[str] = None,
    use_tqdm: bool = False,
    **kwargs):
async def async_batch_infer(
A collaborator asked:

Is this API purely for multithread migration?

@grimoire (Collaborator, Author) replied:

Yes

@lvhan028 merged commit aabc90d into InternLM:main on Jan 3, 2025. 5 checks passed.
@chengyuma:

Could I ask what the reason for the performance improvement in this PR is? Is it because the Queue was replaced with asyncio.Queue?

@grimoire (Collaborator, Author):

@chengyuma The key is using coroutines to overlap CPU and GPU computation and avoid the GPU waiting.

@chengyuma:

> @chengyuma The key is using coroutines to overlap CPU and GPU computation and avoid the GPU waiting.

Multithreading can also avoid GPU waiting. What is the advantage of coroutines in this scenario: better scheduling, or lighter-weight scheduling?

@grimoire (Collaborator, Author):

@chengyuma Python multithreading is subject to the GIL, so it is not truly parallel.

@chengyuma:

> @chengyuma Python multithreading is subject to the GIL, so it is not truly parallel.

I know that, but coroutines are not parallel either.

@grimoire (Collaborator, Author):

@chengyuma Coroutines are controllable: you can make sure enough kernels have been launched before an await switches tasks. Threads are much less controllable.
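
A tiny sketch of that point (illustrative only, no real GPU work): with coroutines, a task yields only at an explicit await, so all kernel launches for a step can be enqueued before control switches, whereas an OS thread may be preempted at any instruction.

```python
import asyncio

# Illustrative only: coroutine switching happens only at the explicit await, so
# each request enqueues all of its "kernel launches" before yielding, letting
# another request's host-side work overlap with GPU execution.
async def forward_step(name: str, num_kernels: int = 4):
    for i in range(num_kernels):
        # stand-in for a non-blocking CUDA kernel launch
        print(f'{name}: launch kernel {i}')
    await asyncio.sleep(0)   # the only point where the scheduler may switch tasks

async def main():
    await asyncio.gather(forward_step('req-0'), forward_step('req-1'))

asyncio.run(main())
```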

@chengyuma:

> @chengyuma Coroutines are controllable: you can make sure enough kernels have been launched before an await switches tasks. Threads are much less controllable.

Got it, that makes sense. Thanks!
