We compare our results with some popular frameworks and official releases in terms of speed.
- 8 NVIDIA Tesla V100 (32G) GPUs
- Intel(R) Xeon(R) Gold 6146 CPU @ 3.20GHz
- Python 3.7
- PyTorch 1.4
- CUDA 10.1
- CUDNN 7.6.03
- NCCL 2.4.08
We measure the average training time per iteration, including both data processing and model training. Training speed is reported in s/iter; lower is better. Note that we skip the first 50 iterations, as they may include device warmup time.
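The averaging rule described above can be sketched in a few lines of Python. This is only an illustration: the function name `average_iter_time` and the synthetic timings are assumptions made for the example, not part of any toolbox.

```python
from statistics import mean


def average_iter_time(iter_times, warmup_iters=50):
    """Return the mean seconds per iteration, ignoring the warmup samples.

    `iter_times` is a sequence of per-iteration wall-clock times in seconds.
    The first `warmup_iters` entries are dropped before averaging.
    """
    steady = list(iter_times)[warmup_iters:]
    if not steady:
        raise ValueError("need more than `warmup_iters` measurements")
    return mean(steady)


# Synthetic example: 50 slow warmup iterations followed by 100 steady ones.
times = [2.0] * 50 + [0.5] * 100
print(average_iter_time(times))
```

Dropping the warmup samples keeps one-off startup costs from inflating the reported s/iter.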
We compare MMAction2 with other video understanding toolboxes under the same data and model settings, using training time per iteration as the metric. We use the following versions:
- commit id 7f3490d (1/5/2020) of MMAction
- commit id 8d53d6f (5/5/2020) of Temporal-Shift-Module
- commit id 8299c98 (7/7/2020) of PySlowFast
- commit id f13707f (12/12/2018) of BSN (boundary sensitive network)
- commit id 45d0514 (17/10/2019) of BMN (boundary matching network)
To ensure a fair comparison, all experiments were conducted on the same hardware and with the same dataset. The rawframe dataset was generated by the data preparation tools. The video dataset is a special version of the resized video cache called '256p dense-encoded video', generated by the scripts here, which decodes faster. As the table below shows, it gives a significant improvement over normal 256p videos, especially when the sampling is sparse (as in TSN).
For each model setting, we kept the data preprocessing identical so that every toolbox receives the same feature input. We also used Memcached, a distributed caching system, to load the data so that IO time is the same across toolboxes. The exception is the comparison with PySlowFast, which by default reads raw videos directly from disk; to keep those comparisons fair, the corresponding runs use the Disk backend, as listed in the table.
We provide the training logs from which we calculated the average iteration time, with the actual settings logged inside. Feel free to verify them and file an issue if something does not make sense.
Model | input | io backend | batch size x gpus | MMAction2 (s/iter) | GPU mem(GB) | MMAction (s/iter) | GPU mem(GB) | Temporal-Shift-Module (s/iter) | GPU mem(GB) | PySlowFast (s/iter) | GPU mem(GB) |
---|---|---|---|---|---|---|---|---|---|---|---|
TSN | 256p rawframes | Memcached | 32x8 | 0.32 | 8.1 | 0.38 | 8.1 | 0.42 | 10.5 | x | x |
TSN | 256p videos | Disk | 32x8 | 1.42 | 8.1 | x | x | x | x | TODO | TODO |
TSN | 256p dense-encoded video | Disk | 32x8 | 0.61 | 8.1 | x | x | x | x | TODO | TODO |
I3D heavy | 256p videos | Disk | 8x8 | 0.34 | 4.6 | x | x | x | x | 0.44 | 4.6 |
I3D heavy | 256p dense-encoded video | Disk | 8x8 | 0.35 | 4.6 | x | x | x | x | 0.36 | 4.6 |
I3D | 256p rawframes | Memcached | 8x8 | 0.43 | 5.0 | 0.56 | 5.0 | x | x | x | x |
TSM | 256p rawframes | Memcached | 8x8 | 0.31 | 6.9 | x | x | 0.41 | 9.1 | x | x |
Slowonly | 256p videos | Disk | 8x8 | 0.32 | 3.1 | TODO | TODO | x | x | 0.34 | 3.4 |
Slowonly | 256p dense-encoded video | Disk | 8x8 | 0.25 | 3.1 | TODO | TODO | x | x | 0.28 | 3.4 |
Slowfast | 256p videos | Disk | 8x8 | 0.69 | 6.1 | x | x | x | x | 1.04 | 7.0 |
Slowfast | 256p dense-encoded video | Disk | 8x8 | 0.68 | 6.1 | x | x | x | x | 0.96 | 7.0 |
R(2+1)D | 256p videos | Disk | 8x8 | 0.45 | 5.1 | x | x | x | x | x | x |
R(2+1)D | 256p dense-encoded video | Disk | 8x8 | 0.44 | 5.1 | x | x | x | x | x | x |
Model | MMAction2 (s/iter) | BSN(boundary sensitive network) (s/iter) | BMN(boundary matching network) (s/iter) |
---|---|---|---|
BSN (TEM + PEM + PGM) | 0.074(TEM)+0.040(PEM) | 0.101(TEM)+0.040(PEM) | x |
BMN (bmn_400x100_2x8_9e_activitynet_feature) | 3.27 | x | 3.30 |
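
To make the effect of dense encoding concrete, the short sketch below computes the speedup of '256p dense-encoded video' over normal '256p videos' using the MMAction2 (s/iter) column of the recognition table above. The numbers are copied from that table; the helper name `speedup` is an assumption made for the example.

```python
# (model, s/iter on 256p videos, s/iter on 256p dense-encoded video),
# taken from the MMAction2 column of the recognition table.
rows = [
    ("TSN", 1.42, 0.61),
    ("I3D heavy", 0.34, 0.35),
    ("Slowonly", 0.32, 0.25),
    ("Slowfast", 0.69, 0.68),
    ("R(2+1)D", 0.45, 0.44),
]


def speedup(baseline, candidate):
    """Ratio of iteration times; values above 1 mean `candidate` is faster."""
    return baseline / candidate


for model, normal, dense in rows:
    print(f"{model:10s} {speedup(normal, dense):.2f}x")
```

The ratio is largest for TSN, the sparsely sampled model, and close to 1 for the densely sampled I3D heavy, Slowfast, and R(2+1)D settings.
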
- MMAction2

```shell
# rawframes
bash tools/slurm_train.sh ${PARTATION_NAME} benchmark_tsn configs/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py --work-dir work_dirs/benchmark_tsn_rawframes

# videos
bash tools/slurm_train.sh ${PARTATION_NAME} benchmark_tsn configs/recognition/tsn/tsn_r50_video_1x1x3_100e_kinetics400_rgb.py --work-dir work_dirs/benchmark_tsn_video
```

- MMAction

```shell
python -u tools/train_recognizer.py configs/TSN/tsn_kinetics400_2d_rgb_r50_seg3_f1s1.py
```

- Temporal-Shift-Module

```shell
python main.py kinetics RGB --arch resnet50 --num_segments 3 --gd 20 --lr 0.02 --wd 1e-4 --lr_steps 20 40 --epochs 1 --batch-size 256 -j 32 --dropout 0.5 --consensus_type=avg --eval-freq=10 --npb --print-freq 1
```
- MMAction2

```shell
# rawframes
bash tools/slurm_train.sh ${PARTATION_NAME} benchmark_i3d configs/recognition/i3d/i3d_r50_32x2x1_100e_kinetics400_rgb.py --work-dir work_dirs/benchmark_i3d_rawframes

# videos
bash tools/slurm_train.sh ${PARTATION_NAME} benchmark_i3d configs/recognition/i3d/i3d_r50_video_heavy_8x8x1_100e_kinetics400_rgb.py --work-dir work_dirs/benchmark_i3d_video
```

- MMAction

```shell
python -u tools/train_recognizer.py configs/I3D_RGB/i3d_kinetics400_3d_rgb_r50_c3d_inflate3x1x1_seg1_f32s2.py
```

- PySlowFast

```shell
python tools/run_net.py --cfg configs/Kinetics/I3D_8x8_R50.yaml DATA.PATH_TO_DATA_DIR ${DATA_ROOT} NUM_GPUS 8 TRAIN.BATCH_SIZE 64 TRAIN.AUTO_RESUME False LOG_PERIOD 1 SOLVER.MAX_EPOCH 1 > pysf_i3d_r50_8x8_video.log
```
You may reproduce the result by writing a simple script to parse out the value of the field 'time_diff'.
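One possible shape for such a script is sketched below. It assumes that each training-iteration line in the redirected log contains a fragment like `"time_diff": 0.44`; the exact log layout may differ between PySlowFast versions, so treat the regular expression and the sample lines as assumptions and adjust them to your log. The helper names `parse_time_diff` and `average_after_warmup` are hypothetical.

```python
import re
from statistics import mean

# Assumption: the field appears as "time_diff": <number> (quotes optional).
TIME_DIFF = re.compile(
    r'"?time_diff"?\s*[:=]\s*([0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?)'
)


def parse_time_diff(lines):
    """Collect every time_diff value found in an iterable of log lines."""
    values = []
    for line in lines:
        match = TIME_DIFF.search(line)
        if match:
            values.append(float(match.group(1)))
    return values


def average_after_warmup(values, warmup_iters=50):
    """Average the values after dropping the first `warmup_iters` entries."""
    steady = values[warmup_iters:]
    if not steady:
        raise ValueError("not enough iterations after skipping warmup")
    return mean(steady)


# Hypothetical sample lines standing in for a real log file.
sample_lines = [
    'json_stats: {"_type": "train_iter", "iter": "1/100", "time_diff": 1.50}',
    "an unrelated line without the field",
    'json_stats: {"_type": "train_iter", "iter": "2/100", "time_diff": 0.40}',
    'json_stats: {"_type": "train_iter", "iter": "3/100", "time_diff": 0.60}',
]
values = parse_time_diff(sample_lines)
print(average_after_warmup(values, warmup_iters=1))
```

For a real run, pass an open file object for the log written by the command above (for example `pysf_i3d_r50_8x8_video.log`) to `parse_time_diff` and keep the default `warmup_iters=50`, matching the measurement rule used in this benchmark.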
- MMAction2

```shell
bash tools/slurm_train.sh ${PARTATION_NAME} benchmark_slowfast configs/recognition/slowfast/slowfast_r50_video_4x16x1_256e_kinetics400_rgb.py --work-dir work_dirs/benchmark_slowfast_video
```

- PySlowFast

```shell
python tools/run_net.py --cfg configs/Kinetics/SLOWFAST_4x16_R50.yaml DATA.PATH_TO_DATA_DIR ${DATA_ROOT} NUM_GPUS 8 TRAIN.BATCH_SIZE 64 TRAIN.AUTO_RESUME False LOG_PERIOD 1 SOLVER.MAX_EPOCH 1 > pysf_slowfast_r50_4x16_video.log
```
You may reproduce the result by writing a simple script to parse out the value of the field 'time_diff'.
- MMAction2

```shell
bash tools/slurm_train.sh ${PARTATION_NAME} benchmark_slowonly configs/recognition/slowonly/slowonly_r50_video_4x16x1_256e_kinetics400_rgb.py --work-dir work_dirs/benchmark_slowonly_video
```

- PySlowFast

```shell
python tools/run_net.py --cfg configs/Kinetics/SLOW_4x16_R50.yaml DATA.PATH_TO_DATA_DIR ${DATA_ROOT} NUM_GPUS 8 TRAIN.BATCH_SIZE 64 TRAIN.AUTO_RESUME False LOG_PERIOD 1 SOLVER.MAX_EPOCH 1 > pysf_slowonly_r50_4x16_video.log
```
You may reproduce the result by writing a simple script to parse out the value of the field 'time_diff'.
- MMAction2

```shell
bash tools/slurm_train.sh ${PARTATION_NAME} benchmark_r2plus1d configs/recognition/r2plus1d/r2plus1d_r34_video_8x8x1_180e_kinetics400_rgb.py --work-dir work_dirs/benchmark_r2plus1d_video
```