Investigate Batch size scaling in DDP setups #60

Open · Delaunay opened this issue Feb 16, 2023 · 0 comments

Delaunay (Collaborator) commented Feb 16, 2023

The CI now has 2 GPUs, so we have full coverage of the planning methods.
Here is the comparison between 1 and 2 GPUs.

vit_l_32: performance decreases (100.73 → 51.80, roughly halved); is the model too small? (relies on NVLink)
resnet152: performance stays roughly the same (87.15 → 96.54, ~1.1x instead of the ~2x expected from linear scaling); the batch size is probably not scaled to use both GPUs. See the sketch after the table.

bench                Plan          metric    1 GPU   2 GPUs
-----------------------------------------------------------
hf_t5                         train_rate     2.28     2.22
bert                          train_rate    18.64    17.85
learning_to_paint             train_rate   753.86   802.15
efficientnet_b4               train_rate    27.40    27.23
convnext_large                train_rate     2.11     2.10
ppo                  DDP      train_rate   918.81   834.84
resnet50                      train_rate    38.66    36.31
hf_reformer                   train_rate     3.86     3.71
soft_actor_critic             train_rate 12553.41 12258.97
super_slomo                   train_rate     1.35     1.29
dlrm                          train_rate 37923.96 38091.68
efficientnet_b0               train_rate    89.85    85.08
regnet_y_128gf                train_rate     1.23     1.26
td3                           train_rate 14678.08 13971.88
squeezenet1_1                 train_rate   173.54   163.83
vit_l_32             DDP      train_rate   100.73    51.80
resnet152            DDP      train_rate    87.15    96.54
stargan                       train_rate    26.83    24.76
efficientnet_b7               train_rate    11.41    11.18
speech_transformer            train_rate    35.99    35.87
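For reference, a minimal sketch of what batch-size scaling under DDP usually looks like. This assumes a standard torch.distributed setup; the helper names (`make_loader`, `global_train_rate`) are hypothetical, not milabench's actual code. If either the per-rank batch is not set this way, or the reported train_rate only counts one rank's samples, a flat number like resnet152's is what you would expect:

```python
# Hypothetical sketch of batch-size scaling in a standard PyTorch DDP setup.
# make_loader and global_train_rate are illustrative, not milabench's API.
import torch
import torch.distributed as dist
from torch.utils.data import DataLoader, DistributedSampler


def make_loader(dataset, per_device_batch_size):
    # In DDP, batch_size is per rank: the effective (global) batch is
    # per_device_batch_size * world_size, so adding a second GPU should
    # roughly double the samples processed per step.
    sampler = DistributedSampler(dataset)
    return DataLoader(dataset, batch_size=per_device_batch_size, sampler=sampler)


def global_train_rate(local_samples, elapsed_seconds):
    # If the metric only counts one rank's samples, 2 GPUs look no faster
    # than 1. Summing across ranks reports the true aggregate throughput.
    total = torch.tensor([float(local_samples)], dtype=torch.float64)
    dist.all_reduce(total, op=dist.ReduceOp.SUM)
    return total.item() / elapsed_seconds
```

Either cause (per-rank batch not scaled with world size, or a per-rank rate reported instead of the aggregate) would produce a flat train_rate even when both GPUs are busy.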