Deploying and Serving LLMs with DeepSpeed

This guide describes how to deploy and serve LLMs with DeepSpeed in order to leverage features such as automatic tensor parallelism (AutoTP).
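Tensor parallelism splits large weight matrices across devices so that each device stores and multiplies only a shard. The sketch below is a conceptual illustration of one common pattern, a column-wise split of a linear layer, simulated with plain Python lists in a single process. It is not DeepSpeed's implementation, and the helper names (matmul, split_columns, column_parallel_matmul) are hypothetical. Each simulated rank multiplies the input by its column shard, and concatenating the partial outputs (the all-gather step on real hardware) reproduces the unsharded result.

```python
# Conceptual sketch of column-parallel tensor parallelism (not DeepSpeed code).
from typing import List

Matrix = List[List[float]]


def matmul(x: Matrix, w: Matrix) -> Matrix:
    """Plain matrix product x @ w."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)] for row in x]


def split_columns(w: Matrix, tp_size: int) -> List[Matrix]:
    """Split the output columns of w into tp_size contiguous shards."""
    n_cols = len(w[0])
    if n_cols % tp_size != 0:
        raise ValueError("number of columns must be divisible by tp_size")
    shard = n_cols // tp_size
    return [[row[r * shard:(r + 1) * shard] for row in w] for r in range(tp_size)]


def column_parallel_matmul(x: Matrix, w: Matrix, tp_size: int) -> Matrix:
    """Simulate tp_size ranks: each multiplies x by its own weight shard,
    then the partial outputs are concatenated along the last dimension."""
    partials = [matmul(x, shard) for shard in split_columns(w, tp_size)]
    return [sum((p[i] for p in partials), []) for i in range(len(x))]


# Usage: a 1x2 input and a 2x4 weight split across 2 simulated ranks.
x = [[1.0, 2.0]]
w = [[1.0, 0.0, 2.0, 1.0],
     [0.0, 1.0, 1.0, 3.0]]
print(column_parallel_matmul(x, w, tp_size=2))  # [[1.0, 2.0, 4.0, 7.0]]
```

The split only works when the sharded dimension divides evenly by the number of ranks, which is why the sketch raises an error otherwise.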

Setup

Please follow setup.md to set up the environment first. Additionally, you will need to install the DeepSpeed dependencies as follows:

pip install .[deepspeed]

Configure Serving Parameters

Please follow the serving document to configure the parameters. In the configuration file, set deepspeed to true to enable the DeepSpeed AutoTP feature:

deepspeed: true
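As an illustration of how the flag above is meant to be read, the following is a minimal sketch of a hypothetical helper, deepspeed_enabled, that scans a flat "key: value" configuration fragment and reports whether DeepSpeed AutoTP is switched on. It is not part of this project; it assumes the value is a YAML-style boolean and, as a further assumption, treats a missing flag as disabled. A real configuration file should be loaded with a proper YAML library rather than this line-based parser.

```python
# Hypothetical helper: read the deepspeed flag from a flat "key: value" fragment.
def deepspeed_enabled(config_text: str) -> bool:
    """Return True only if the fragment sets deepspeed to true (case-insensitive)."""
    for raw_line in config_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop trailing comments and whitespace
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        if key == "deepspeed":
            return value.lower() == "true"
    return False  # assumption: an absent flag leaves DeepSpeed AutoTP disabled


print(deepspeed_enabled("deepspeed: true"))  # True
```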

Deploy and Test

Please follow the serving document for deploying and testing.
