
Multi gpu training #69

Open · psteinb wants to merge 4 commits into main
Conversation

@psteinb psteinb commented Feb 3, 2020

This needs a bit more testing, but I think going multi-GPU is fairly straightforward. Or did you try that already?


psteinb commented Feb 11, 2020

Almost there; apparently there is a problematic interplay between tf and keras:
tensorflow/tensorflow#30728
keras-team/keras#13057
keras-team/keras#13255
I need to check how to fix this.

@psteinb psteinb changed the title from "WIP: Multi gpu training" to "Multi gpu training" on Feb 12, 2020

psteinb commented Feb 12, 2020

Done implementing multi-GPU training. I hope putting that into the constructor of N2V was the right choice. I also added an example notebook, examples/2D/denoising2D_BSD68/BSD68_reproducibility_multi_gpu.ipynb, derived from the existing BSD68 reproducibility example.

I'll supply more extensive numbers later; my current estimates for training n2v from this notebook are:

  • a single P100 with tf 1.12 and keras 2.2.4: ~93 seconds per epoch after warm-up
  • two P100s with tf 1.12 and keras 2.2.4: ~56 seconds per epoch after warm-up

I'll provide 4-GPU numbers later. Note that this "improvement" is expected to be non-linear, as keras internally parallelizes across the batch dimension: a batch size of 128 is split into 2 sub-batches of 64 images. As discussed earlier, this approach is currently not supported with tf 1.14 and keras 2.2.{4,5} due to the bugs mentioned above.
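
A minimal sketch of how the wrapping could look, assuming keras.utils.multi_gpu_model is used under the hood (the exact wiring inside the N2V constructor in this branch may differ):

from keras.utils import multi_gpu_model

def build_training_model(template_model, num_gpus):
    # Wrap a single-GPU Keras model for data-parallel training.
    if num_gpus <= 1:
        return template_model
    # multi_gpu_model splits each incoming batch across the GPUs,
    # e.g. a batch of 128 on 2 GPUs becomes 2 sub-batches of 64 images.
    # The returned model is then compiled and trained as usual.
    return multi_gpu_model(template_model, gpus=num_gpus)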

Would love to hear your feedback on this.


tibuch commented Jun 24, 2020

Thank you for this PR!

I have this on my to-do list, but wasn't able to get my hands on a multi-GPU system. I guess the cluster should work for testing.

Although I am very confident that it just works, I would like to test it as well :)


psteinb commented Jun 24, 2020

Thanks for having a look. Last time I checked, all configurations with >=3 GPUs fail to run due to some problems with the keras data augmentation. Maybe this could be addressed by bringing n2v 100% to tf.keras?

@snehashis-roy

Hi,
I want to use 2 GPUs for training. As explained in the notebook, I used the following config:

config = N2VConfig(X_train, unet_kern_size=3, unet_n_depth=3, unet_n_first=64,
                   train_steps_per_epoch=int(dim[0] / 128), train_epochs=50, train_loss='mse',
                   batch_norm=True, train_num_gpus=2,
                   train_batch_size=64, n2v_perc_pix=1.0, n2v_patch_shape=(128, 128),
                   n2v_manipulator='uniform_withCP', n2v_neighborhood_radius=5)

I set CUDA_VISIBLE_DEVICES to 1,2 before running the training, and I installed N2V with pip install n2v. My TF-GPU version is 1.14.1, with keras 2.2.5 and numpy 1.19.1.

The training still uses 1 GPU. Please let me know what I am missing.
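
As a quick, illustrative sanity check (assuming TF 1.x), one can confirm which GPUs TensorFlow actually sees; note that CUDA_VISIBLE_DEVICES only takes effect if it is set before TensorFlow is imported:

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1,2"  # must be set before importing tensorflow

from tensorflow.python.client import device_lib
gpus = [d.name for d in device_lib.list_local_devices() if d.device_type == "GPU"]
print(gpus)  # two visible GPUs should show up as '/device:GPU:0' and '/device:GPU:1'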


tibuch commented Aug 20, 2020

Hi @piby2,

This functionality is not part of the official N2V release yet.

If you would like to test it, you would have to clone the fork psteinb/n2v and check out the branch multi_gpu_training. Then run pip install . from inside the git repo to install this version.
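
For reference, the steps above would look roughly like this (the clone URL is assumed from the usual GitHub naming of the fork):

git clone https://github.com/psteinb/n2v.git
cd n2v
git checkout multi_gpu_training
pip install .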
