Distributed training (Distributed Data Parallel) demo with ResNet50.
- To run this demo in Intel(R) DevCloud, check out the devcloud branch.
- Run with the torch.distributed.launch script
python -m torch.distributed.launch --nproc_per_node=2 resnet_ddp.py
- Run with torchrun
torchrun --nproc_per_node=2 resnet_ddp.py
- Run with the IPEX launch script
source /opt/intel/oneapi/mpi/latest/env/vars.sh
python launch.py --distributed --nproc_per_node 2 resnet_ddp.py
- Run with Horovod
horovodrun -np 2 python resnet_ddp.py
- Select the communication backend with --backend
python resnet_ddp.py --backend [ccl|nccl|gloo|...]
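
The launchers above differ mainly in how they tell each process its rank and the total number of processes. The following is a minimal sketch, not the demo's actual code, of how a script might parse the --backend option and resolve rank and world size from the environment. The helper names and the "gloo" default are hypothetical. The environment variable list is also an assumption: torchrun sets RANK and WORLD_SIZE, and PMI_RANK and PMI_SIZE are what Intel MPI typically exports.

```python
import argparse
import os

# (rank, world size) environment variable pairs, checked in order.
# Assumption: RANK/WORLD_SIZE come from torchrun, PMI_RANK/PMI_SIZE from Intel MPI.
_ENV_PAIRS = [("RANK", "WORLD_SIZE"), ("PMI_RANK", "PMI_SIZE")]


def resolve_rank_and_world_size(env=None):
    """Return (rank, world_size); fall back to a single process (0, 1)."""
    env = os.environ if env is None else env
    for rank_key, size_key in _ENV_PAIRS:
        if rank_key in env and size_key in env:
            return int(env[rank_key]), int(env[size_key])
    return 0, 1


def parse_args(argv=None):
    """Parse the command-line options used in this sketch."""
    parser = argparse.ArgumentParser(description="DDP demo options (sketch)")
    # Backend names follow the usage line above; the default is an assumption.
    parser.add_argument("--backend", default="gloo", type=str)
    # torch.distributed.launch may pass the local rank as a command-line argument.
    parser.add_argument("--local_rank", "--local-rank", default=0, type=int)
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    rank, world_size = resolve_rank_and_world_size()
    print(f"backend={args.backend} rank={rank} world_size={world_size}")
    # A real script would then initialize the process group, for example:
    # torch.distributed.init_process_group(
    #     backend=args.backend, rank=rank, world_size=world_size)
    # which also needs MASTER_ADDR and MASTER_PORT set in the environment.
```

Run without any launcher, this sketch falls back to rank 0 in a world of size 1, so the same script can also be started as a plain single process.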