
TorchAcc

TorchAcc is an AI training acceleration framework developed by Alibaba Cloud’s PAI team.

TorchAcc is built on PyTorch/XLA and provides an easy-to-use interface to accelerate the training of PyTorch models. At the same time, TorchAcc has implemented extensive optimizations for distributed training, memory management, and computation specifically for GPUs, ultimately achieving improved ease of use, better GPU training performance, and enhanced scalability for distributed training.

Documentation

Highlighted Features

  • Rich distributed parallelism strategies

    • Data Parallelism
    • Fully Sharded Data Parallelism
    • Tensor Parallelism
    • Pipeline Parallelism
    • Context Parallelism
  • Memory efficient

  • High Performance

  • Easy-to-use API

    You can accelerate your transformer models with just a few lines of code using TorchAcc, as sketched below.
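A minimal sketch of what this looks like, assuming the top-level torchacc.accelerate() entry point; check the documentation for the exact signature and configuration options:

import torch
import torchacc as ta

# Build an ordinary PyTorch model and optimizer.
model = torch.nn.Linear(1024, 1024)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Assumption: ta.accelerate() wraps the model so that subsequent training
# steps are traced and compiled through PyTorch/XLA; see the documentation
# for the available options (e.g. distributed strategy, mixed precision).
model = ta.accelerate(model)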

Architecture Overview

The main goal of TorchAcc is to provide a high-performance AI training framework. It uses IR abstractions at different layers and combines static graph compilation optimization (XLA) and dynamic graph compilation optimization (BladeDISC) with distributed optimization techniques, offering a comprehensive end-to-end optimization solution from the underlying operators up to the top-level models.
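The XLA layer builds on PyTorch/XLA's lazy tensor tracing. The snippet below is a plain PyTorch/XLA illustration of that mechanism, not TorchAcc-specific code: operations on XLA tensors are recorded into a graph, and mark_step() hands the accumulated graph to the XLA compiler for optimization and execution.

import torch
import torch_xla.core.xla_model as xm

# Tensors on an XLA device are traced lazily instead of executing eagerly.
device = xm.xla_device()
x = torch.randn(128, 128, device=device)
y = (x @ x).relu().sum()

# mark_step() cuts the recorded graph, compiles it with XLA, and runs it.
xm.mark_step()
print(y.item())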

Installation

Docker

sudo docker run --gpus all --net host --ipc host --shm-size 10G -it --rm --cap-add=SYS_PTRACE registry.cn-hangzhou.aliyuncs.com/pai-dlc/acc:r2.3.0-cuda12.1.0-py3.10 bash

Build from source

See the contribution guide.

Getting Started

We provide a straightforward example of training a Transformer model with TorchAcc that illustrates the TorchAcc API. You can start training by executing the following command:

torchrun --nproc_per_node=4 benchmarks/transformer.py --bf16 --acc --disable_loss_print --fsdp_size=4 --gc

LLM training examples

Utilizing HuggingFace Transformers

If you are familiar with the HuggingFace Transformers Trainer, you can easily accelerate a Transformer model with TorchAcc; see the HuggingFace Transformers example.
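As a rough, hypothetical sketch only (it reuses the ta.accelerate() assumption from above together with a standard Trainer loop; the supported integration path is the one described in the TorchAcc HuggingFace Transformers example):

import torchacc as ta
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Assumption: wrap the model with TorchAcc before handing it to the Trainer.
model = ta.accelerate(model)

# Tiny toy dataset so the sketch is self-contained.
dataset = Dataset.from_dict(dict(tokenizer(["hello world"] * 16)))

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", per_device_train_batch_size=2, num_train_epochs=1),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()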

LLM training acceleration with FlashModels

If you want to try the latest features of TorchAcc, or want to use the TorchAcc interface more flexibly for model acceleration, you can use our LLM acceleration library, FlashModels. FlashModels integrates distributed implementations of commonly used open-source LLMs and provides a wealth of examples and benchmarks.

https://github.com/AlibabaPAI/FlashModels

SFT using modelscope/swift

Coming soon.

Contributing

See the contribution guide.

Contact Us

You can contact us by joining our DingTalk group.

License

Apache License 2.0
