YOLOX for PyTorch

This repository provides scripts to train the YOLOX model on the Intel® Gaudi® AI accelerator to achieve state-of-the-art accuracy. To obtain model performance data, refer to the Intel Gaudi Model Performance Data page. For more information about training deep learning models using Gaudi, visit developer.habana.ai. Before you get started, make sure to review the Supported Configurations.

The YOLOX demo included in this release trains YOLOX-S in lazy mode with different batch sizes, using either FP32 or BF16 mixed precision.

Model Overview

YOLOX is an anchor-free object detector that adopts the YOLO architecture with a DarkNet53 backbone. The anchor-free mechanism greatly reduces the number of model parameters and therefore simplifies the detector. YOLOX also improves on the previous YOLO series with a decoupled head, an advanced label assignment strategy, and strong data augmentation. The decoupled head contains a 1x1 conv layer followed by two parallel branches, each with two 3x3 conv layers, for the classification and regression tasks respectively, which helps the model converge faster with better accuracy. The label assignment strategy, SimOTA, selects the top k predictions with the lowest cost as the positive samples for a ground truth object. SimOTA not only reduces training time by approximating the assignment instead of solving an optimization problem, but also improves the AP of the model. In addition, Mosaic and MixUp image augmentations are applied during training to further improve accuracy. With these techniques, YOLOX achieves a better trade-off between training speed and accuracy than its counterparts across all model sizes.
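The top-k selection step described above can be illustrated with a minimal sketch. It assumes a precomputed cost matrix and a fixed k; the function name `select_positives` and the toy costs are hypothetical and are not taken from this repository. The actual SimOTA implementation derives the cost from the classification and regression losses, chooses k per ground truth, and resolves predictions matched to several objects, all of which this sketch omits.

```python
def select_positives(cost, k):
    """Return, for each ground-truth object, the indices of its k lowest-cost predictions.

    cost[g][p] is the assignment cost between ground truth g and prediction p.
    This is a simplified illustration with a fixed k, not the full SimOTA procedure.
    """
    assignments = []
    for row in cost:
        # Rank prediction indices by ascending cost and keep the first k.
        ranked = sorted(range(len(row)), key=lambda p: row[p])
        assignments.append(sorted(ranked[:k]))
    return assignments


# Toy cost matrix (made-up values): 2 ground-truth objects, 5 candidate predictions.
cost = [
    [0.9, 0.2, 0.4, 0.8, 0.1],
    [0.3, 0.7, 0.6, 0.05, 0.5],
]
positives = select_positives(cost, k=2)
print(positives)  # [[1, 4], [0, 3]]
```

Because the selection is a simple ranking per ground truth, it avoids running an iterative solver for the assignment, which is the source of the training-time saving mentioned above.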

This repository is a PyTorch implementation of YOLOX, based on the source code from https://github.com/Megvii-BaseDetection/YOLOX. More details can be found in the paper YOLOX: Exceeding YOLO Series in 2021 by Zheng Ge, Songtao Liu, Feng Wang, Zeming Li, and Jian Sun.

Setup

Please follow the instructions provided in the Gaudi Installation Guide to set up the environment including the $PYTHON environment variable. To achieve the best performance, please follow the methods outlined in the Optimizing Training Platform Guide. The guides will walk you through the process of setting up your system to run the model on Gaudi.

Clone Intel Gaudi Model-References

In the Docker container, clone this repository and switch to the branch that matches your Intel Gaudi software version. You can run the hl-smi utility to determine the Intel Gaudi software version.

git clone -b [Intel Gaudi software version] https://github.com/HabanaAI/Model-References

Go to the PyTorch YOLOX directory:

cd Model-References/PyTorch/computer_vision/detection/yolox

Install Model Requirements

Install the required packages and add the current directory to PYTHONPATH:

pip install -r requirements.txt
pip install -v -e .
export PYTHONPATH=$PWD:$PYTHONPATH

Setting up the Dataset

Download the COCO 2017 dataset from http://cocodataset.org using the following commands:

cd Model-References/PyTorch/computer_vision/detection/yolox
source download_dataset.sh

You can either set the YOLOX_DATADIR environment variable to the dataset location:

export YOLOX_DATADIR=/data/COCO

Or create a datasets sub-directory and add a symbolic link in it that points to the COCO dataset path:

mkdir datasets
ln -s /data/COCO ./datasets/COCO

Alternatively, you can pass the COCO dataset location to the --data_dir argument of the training commands.
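Whichever option you choose, it can help to confirm that the directory looks like an extracted COCO 2017 download before launching a long run. The sketch below is only a sanity check under an assumption: it expects the sub-directories annotations, train2017, and val2017, which is the usual COCO 2017 layout but is not stated in this README. The helper name `missing_coco_entries` is hypothetical.

```python
import tempfile
from pathlib import Path

# Assumed layout of an extracted COCO 2017 download; adjust if yours differs.
EXPECTED_SUBDIRS = ("annotations", "train2017", "val2017")


def missing_coco_entries(root):
    """Return the expected sub-directories that are absent under root."""
    root = Path(root)
    return [name for name in EXPECTED_SUBDIRS if not (root / name).is_dir()]


# Demonstration on a throwaway directory that only contains annotations/.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "annotations").mkdir()
    demo_missing = missing_coco_entries(tmp)

print(demo_missing)  # ['train2017', 'val2017']
```

In practice you would call `missing_coco_entries("/data/COCO")`, or pass the value of YOLOX_DATADIR, and expect an empty list.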

Training Examples

Run Single Card and Multi-Card Training Examples

NOTE: YOLOX only supports Lazy mode.

Run training on 1 HPU:

  • FP32 data type, train for 500 steps:

    $PYTHON tools/train.py \
        --name yolox-s --devices 1 --batch-size 16 --data_dir /data/COCO --hpu steps 500 output_dir ./yolox_output
  • BF16 data type, train for 500 steps:

    PT_HPU_AUTOCAST_LOWER_PRECISION_OPS_LIST=ops_bf16_yolox.txt PT_HPU_AUTOCAST_FP32_OPS_LIST=ops_fp32_yolox.txt $PYTHON tools/train.py \
        --name yolox-s --devices 1 --batch-size 16 --data_dir /data/COCO --hpu --autocast \
        steps 500 output_dir ./yolox_output

Run training on 8 HPUs:

NOTE: The mpirun map-by PE attribute value may vary depending on your setup. For the recommended calculation, refer to the instructions detailed in mpirun Configuration.

  • FP32 data type, train for 2 epochs:

    export MASTER_ADDR=localhost
    export MASTER_PORT=12355
    mpirun -n 8 --bind-to core --map-by socket:PE=6 --rank-by core --report-bindings --allow-run-as-root \
    $PYTHON tools/train.py \
        --name yolox-s --devices 8 --batch-size 128 --data_dir /data/COCO --hpu max_epoch 2 output_dir ./yolox_output
  • BF16 data type, train for 2 epochs:

    export MASTER_ADDR=localhost
    export MASTER_PORT=12355
    PT_HPU_AUTOCAST_LOWER_PRECISION_OPS_LIST=ops_bf16_yolox.txt PT_HPU_AUTOCAST_FP32_OPS_LIST=ops_fp32_yolox.txt mpirun -n 8 --bind-to core --map-by socket:PE=6 --rank-by core --report-bindings --allow-run-as-root \
    $PYTHON tools/train.py \
        --name yolox-s --devices 8 --batch-size 128 --data_dir /data/COCO --hpu --autocast \
        max_epoch 2 output_dir ./yolox_output
  • BF16 data type, train for 300 epochs:

    export MASTER_ADDR=localhost
    export MASTER_PORT=12355
    PT_HPU_AUTOCAST_LOWER_PRECISION_OPS_LIST=ops_bf16_yolox.txt PT_HPU_AUTOCAST_FP32_OPS_LIST=ops_fp32_yolox.txt mpirun -n 8 --bind-to core --map-by socket:PE=6 --rank-by core --report-bindings --allow-run-as-root \
    $PYTHON tools/train.py \
        --name yolox-s --devices 8 --batch-size 128 --data_dir /data/COCO --hpu --autocast \
        print_interval 100 max_epoch 300 save_history_ckpt False eval_interval 300 output_dir ./yolox_output
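The batch sizes in the commands above are related by simple arithmetic, under the assumption (carried over from upstream YOLOX and not stated in this README) that --batch-size is the total batch size across all devices. The helper `per_device_batch_size` is hypothetical and only illustrates the split.

```python
def per_device_batch_size(global_batch_size, num_devices):
    """Split a global batch evenly across devices, rejecting uneven splits."""
    if num_devices <= 0:
        raise ValueError("num_devices must be positive")
    if global_batch_size % num_devices != 0:
        raise ValueError("global batch size must be divisible by the device count")
    return global_batch_size // num_devices


# Values taken from the 1-HPU and 8-HPU commands above.
single_card = per_device_batch_size(16, 1)
eight_cards = per_device_batch_size(128, 8)
print(single_card, eight_cards)  # 16 16
```

Under that assumption, the 8-HPU examples keep the same per-device load as the single-card examples, so scaling to more devices changes the global batch size rather than the work done by each card.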

Supported Configurations

Device   Intel Gaudi Software Version   PyTorch Version
Gaudi    1.16.2                         2.2.2

Changelog

1.12.0

  • Removed PT_HPU_LAZY_MODE environment variable.
  • Removed flag use_lazy_mode.
  • Removed HMP data type.
  • Updated run commands to allow overriding the default lower precision and FP32 lists of ops.

1.10.0

  • Enabled mixed precision training using PyTorch autocast on Gaudi.

Training Script Modifications

The following are the changes made to the training scripts:

  • Added source code to enable training on CPU.

  • Added source code to support Gaudi devices.

    • Enabled HMP data type.

    • Added support to run training in Lazy mode.

    • Re-implemented loss function with TorchScript and deployed the function to CPU.

    • Enabled distributed training with HCCL backend on 8 HPUs.

    • Added mark_step() calls to trigger execution.

Known Issues

Eager mode is not supported.