Torchrun launching multiple api_server #2402
Conversation
docs/en/llm/api_server.md
Outdated
@@ -249,6 +249,33 @@ curl http://{server_ip}:{server_port}/v1/chat/interactive \
lmdeploy serve gradio api_server_url --server-name ${gradio_ui_ip} --server-port ${gradio_ui_port}
```

## Launch multiple api servers

The following is one possible way to launch multiple api servers through torchrun. Create a Python script with the code below.
Hi @AllentDan, in what scenarios would this feature be used?
Some researchers tend to use torchrun.
Conflicts: lmdeploy/serve/openai/api_server.py
fi
# Launch torchrun and put it in the background
# To emphasize again: in a multi-node setup there is no need to pass --nnodes, --master-addr, etc.; it is equivalent to running a single-node torchrun once on each machine.
torchrun \
Is it run manually once on each node?
No, the cluster scheduler automatically runs this script on every node.
2. Launch the torchrun script: `torchrun --nproc_per_node 2 script.py InternLM/internlm2-chat-1_8b --proxy_url http://{proxy_node_name}:{proxy_node_port}`. **Note**: in multi-node multi-GPU setups, do not use the default url `0.0.0.0:8000`; pass an address with the real IP instead, e.g. `11.25.34.55:8000`. In the multi-node case, no communication between child nodes is needed, so there is no need to specify torchrun parameters such as `--nnodes`; it is enough to ensure that each node runs a single-node torchrun once.

```python
import os
```
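The Python snippet above is truncated in this diff view. As an illustration only (not the PR's actual script), here is a minimal sketch of the idea: each torchrun rank picks its own port and registers its api_server with the proxy. The import path and the `server_name`/`server_port`/`proxy_url` parameters of `serve` are assumptions based on this discussion, not guaranteed API.

```python
# Sketch: one api_server per torchrun rank, each registering with a proxy.
import os
import socket


def rank_server_port(base_port: int, local_rank: int) -> int:
    """Give every local rank its own port: base_port + local_rank."""
    return base_port + local_rank


def main(model_path: str, proxy_url: str, base_port: int = 23333) -> None:
    # torchrun sets LOCAL_RANK for every process it spawns
    local_rank = int(os.environ['LOCAL_RANK'])
    # use the node's real IP, not the default 0.0.0.0
    server_name = socket.gethostbyname(socket.gethostname())
    server_port = rank_server_port(base_port, local_rank)

    # assumed import path and keyword arguments (see hedge above)
    from lmdeploy.serve.openai.api_server import serve
    serve(model_path,
          server_name=server_name,
          server_port=server_port,
          proxy_url=proxy_url)


if __name__ == '__main__' and 'LOCAL_RANK' in os.environ:
    import sys
    main(sys.argv[1], sys.argv[2])
```

With `--nproc_per_node 2`, rank 0 serves on `base_port` and rank 1 on `base_port + 1`, so the two servers never collide on one machine.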
Could this part of the script be moved into the CLI?
torchrun --nproc_per_node 2 lmdeploy serve node <model_path> --proxy-url ip:port
That doesn't seem feasible. Besides, providing the script directly also makes it easier for users to customize it for their own needs.
source /path/to/your/home/miniconda3/bin/activate /path/to/your/home/miniconda3/envs/your_env
export HOME=/path/to/your/home
# Get the master node's IP address (assuming MLP_WORKER_0_HOST holds the master node's IP)
MASTER_IP=${MLP_WORKER_0_HOST}
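Pieced together from the fragments quoted in this review, a launcher for a scheduler that injects `MLP_WORKER_0_HOST` / `MLP_ROLE_INDEX` (as on Volcano Engine) might look roughly like this. The port number and the `lmdeploy serve proxy` invocation are illustrative assumptions, not the PR's verbatim script:

```shell
#!/bin/bash
# Illustrative cluster launcher; MLP_* variables are assumed to be injected
# by the scheduler (defaults below only make the script safe to dry-run).
MASTER_IP=${MLP_WORKER_0_HOST:-127.0.0.1}
PROXY_PORT=8000
PROXY_URL="http://${MASTER_IP}:${PROXY_PORT}"
echo "proxy url: ${PROXY_URL}"

# Only the rank-0 node starts the proxy; every node runs its own torchrun.
if [ "${MLP_ROLE_INDEX:-0}" -eq 0 ] && command -v lmdeploy >/dev/null; then
    lmdeploy serve proxy --server-name "${MASTER_IP}" --server-port "${PROXY_PORT}" &
fi

# No --nnodes / --master-addr needed: each node does a single-node torchrun.
if command -v torchrun >/dev/null; then
    torchrun --nproc_per_node 2 script.py InternLM/internlm2-chat-1_8b \
        --proxy_url "${PROXY_URL}" &
    wait
fi
```

The scheduler runs this same script once per node, which matches the "single-node torchrun per machine" point made above.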
Are MLP_WORKER_0_HOST and MLP_ROLE_INDEX environment variables on Volcano Engine?
Yes.
LGTM