This guide describes how to deploy and serve LLMs with DeepSpeed in order to leverage features such as automatic tensor parallelism (AutoTP).
First, follow setup.md to set up the environment. Additionally, install the DeepSpeed dependencies as shown below.
pip install .[deepspeed]
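To confirm that the install step above succeeded, a quick check like the following may help. It is a minimal sketch that uses only the Python standard library; the helper name `is_installed` is hypothetical and not part of this project, and the check only assumes that the extra installs a package importable as `deepspeed`.

```python
import importlib.util


def is_installed(package: str) -> bool:
    """Return True if the import system can locate `package` (it is not imported)."""
    return importlib.util.find_spec(package) is not None


if __name__ == "__main__":
    # Assumption: the DeepSpeed dependency is importable under the name "deepspeed".
    print("deepspeed installed:", is_installed("deepspeed"))
```

If this reports False, re-run the install command in the same environment that will be used for serving.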
Follow the serving document to configure the parameters. In the configuration file, set deepspeed
to true to enable the DeepSpeed AutoTP feature.
deepspeed: true
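Before deploying, it can be useful to verify that the flag is actually set in the configuration file. The sketch below is illustrative only: `deepspeed_enabled` is a hypothetical helper, and it assumes `deepspeed` appears as a flat top-level key exactly as in the line above. If the real configuration nests the key, or if a YAML parser is available, a proper YAML load would be the more robust choice.

```python
def deepspeed_enabled(config_text: str) -> bool:
    """Return True if a top-level `deepspeed: true` entry is present.

    Assumption: the key is flat and top-level; nested keys are ignored.
    """
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0].rstrip()  # drop trailing comments
        if not line or line[0].isspace():
            continue  # skip blank lines, comment-only lines, and nested keys
        key, sep, value = line.partition(":")
        if sep and key.strip() == "deepspeed":
            return value.strip().lower() == "true"
    return False


if __name__ == "__main__":
    print(deepspeed_enabled("port: 8000\ndeepspeed: true\n"))
```

The helper deliberately returns False when the key is absent, so a missing setting is treated the same as an explicitly disabled one.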
Please follow the serving document for deploying and testing.