
Question about U-net architecture for segmentation #464

Closed
joriswuts opened this issue Jun 1, 2020 · 10 comments
Labels
question Further information is requested

Comments

@joriswuts

Hi,

I am a master student in biomedical engineering at the University of Ghent. I have been using MONAI for my master's thesis for the last couple of months, which is about the segmentation of bone lesions from multi-modal whole-body MRI. When comparing the model architecture of MONAI with most U-nets described in the literature, I noticed that the MONAI U-net uses strided convolutions instead of max-pooling for downsampling. I was wondering why that is and whether there are any reference papers suggesting this.

Kind regards,
Joris Wuts

@Nic-Ma Nic-Ma added the question Further information is requested label Jun 1, 2020
@Nic-Ma
Contributor

Nic-Ma commented Jun 1, 2020

Hi @ericspod ,

Could you please provide more information here, since you wrote the initial UNet code?
Thanks.

@joriswuts
Author

Hi,

I initialized the UNet with this code:

from monai.networks.layers import Norm
from monai.networks.nets import UNet

model = UNet(
    dimensions=3,
    in_channels=3,
    out_channels=2,
    channels=(16, 32, 64, 128, 256),
    strides=(2, 2, 2, 2),
    num_res_units=2,
    norm=Norm.BATCH,
).to(device)
The model works great for the purpose I am using it for, but for the report of my master's dissertation I wanted to give an in-depth visualisation of the model.

When visualising the computational graph in Tensorboard I noticed there are no max pooling layers and that the downsampling is done by the strided convolutions.

Also, on the upsampling part I am a bit confused as to where the residual layers are and how they connect. It seems to me that it is somewhat similar to the model below. Is there a schematic visualisation of the model somewhere? That would really help me to understand the model better.
Thanks in advance,

Joris Wuts
( https://raw.githubusercontent.com/mattmacy/vnet.pytorch/master/images/diagram.png)

@ericspod
Member

ericspod commented Jun 1, 2020

Hi @joriswuts,
It might be helpful to print the network to stdout; it prints a tree structure which is a bit hard to read at first but does show the structure of the UNet correctly. Upsample layers are given as:

(2): Sequential(
  (conv): Convolution(
	(conv): ConvTranspose3d(64, 16, kernel_size=(3, 3, 3), stride=(2, 2, 2), padding=(1, 1, 1), output_padding=(1, 1, 1))
	(norm): BatchNorm3d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
	(act): PReLU(num_parameters=1)
  )
  (resunit): ResidualUnit(
	(conv): Sequential(
	  (unit0): Convolution(
		(conv): Conv3d(16, 16, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
		(norm): BatchNorm3d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
		(act): PReLU(num_parameters=1)
	  )
	)
	(residual): Identity()
  )
)

What this boils down to is a sequence of layers composed of a transpose convolution, normalization, and activation, followed by a residual unit with a single sequence of convolution-normalization-activation layers.
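If it helps to see this in plain PyTorch, here is a minimal sketch of that upsample block (my own illustration, not the actual MONAI classes):

import torch.nn as nn

class UpBlockSketch(nn.Module):
    # Transpose convolution, normalization, and activation, followed by
    # a residual unit whose skip path is a plain identity, because the
    # residual unit changes neither the channel count nor the spatial size.
    def __init__(self, in_ch=64, out_ch=16):
        super().__init__()
        self.up = nn.Sequential(
            nn.ConvTranspose3d(in_ch, out_ch, kernel_size=3, stride=2,
                               padding=1, output_padding=1),
            nn.BatchNorm3d(out_ch),
            nn.PReLU(),
        )
        self.res = nn.Sequential(
            nn.Conv3d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm3d(out_ch),
            nn.PReLU(),
        )

    def forward(self, x):
        x = self.up(x)
        return x + self.res(x)  # identity residual path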

A way of thinking about UNet is that it is a tower of large layers, each containing one level of the encode and decode paths with the skip connection between them. You can then describe the structure of one layer, and the whole network as a stack of these. I've attached a diagram of this that I've used in other papers; it's very similar to your configuration other than the use of instance norm.

[diagram: schematic of the UNet layer structure, attached by ericspod]
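To make the "tower of layers" idea concrete, here is a simplified sketch of how one level can be assembled (an illustration of the structure, not MONAI's exact code):

import torch
import torch.nn as nn

class SkipCat(nn.Module):
    # The skip connection: concatenate a submodule's output with its
    # own input along the channel dimension.
    def __init__(self, submodule):
        super().__init__()
        self.submodule = submodule

    def forward(self, x):
        return torch.cat([x, self.submodule(x)], dim=1)

def unet_level(down, subblock, up):
    # One level of the tower: downsample, run the deeper levels, rejoin
    # via the skip connection, then upsample. The whole network is this
    # pattern applied recursively, with a plain block at the bottom.
    return nn.Sequential(down, SkipCat(subblock), up)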

@joriswuts
Author

@ericspod
Thanks a lot for this information. I was just searching through your other papers and found a clarifying explanation of the use of strided convolutions instead of max pooling layers.

Kind regards,
Joris Wuts

@wyli wyli closed this as completed Jun 1, 2020
@joriswuts
Author

@ericspod, Just to be completely sure: could it be that the residual connections in my model have a stride-1 convolution in them instead of a simple identity operation? I added an image of the first downsampling block in Tensorboard. I noticed that a stride-1 convolution in the residual layer is also proposed in several papers referenced in the documentation of DLTK and MONAI (e.g. https://arxiv.org/pdf/1603.05027.pdf).
If I interpret the MONAI code correctly, the convolution is also in the residual part of the upsampling path.
[screenshot: Tensorboard graph of the first downsampling block, 2020-06-01]

@ericspod
Member

ericspod commented Jun 2, 2020

Residual units are implemented in UNet with the ResidualUnit class. The residual part uses a convolution to change the input dimensions to match the output dimensions if this is necessary, but uses nn.Identity if not; this way you don't have a needless convolutional layer in the residual path. In the upsampling stream of the network the residual unit comes after the transpose convolution, so it won't need to do this. In the downsampling part of the network the residual path will have a convolution, since the number of channels and the dimensions do change.
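A minimal sketch of that choice (illustrative only; the kernel size MONAI actually uses for the residual convolution may differ):

import torch.nn as nn

def residual_path(in_ch, out_ch, stride):
    # Identity when nothing changes; otherwise a strided convolution
    # to match the main path's output channels and spatial size.
    if in_ch == out_ch and stride == 1:
        return nn.Identity()
    return nn.Conv3d(in_ch, out_ch, kernel_size=1, stride=stride)

So in your Tensorboard screenshot, the stride-2 downsampling block changes both the channel count and the resolution, which is why its residual path contains a convolution rather than an identity.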

@AnnaKlemen

@ericspod Is it possible to find the settings used in the referenced paper (Left-Ventricle Quantification Using Residual U-Net), e.g. optimizer, learning rate, weight decay, learning rate scheduler etc.? :)
I have tried with the settings from the "Road Extraction by Deep Residual U-Net" paper, but it really does not work on my problem. (SGD, starting lr: 0.001, reduced by a factor of 0.1 every 20 epochs)
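In PyTorch terms, what I tried corresponds roughly to this (a sketch, assuming model is the network being trained):

import torch

optimizer = torch.optim.SGD(model.parameters(), lr=0.001)
# Decay the learning rate by a factor of 0.1 every 20 epochs;
# call scheduler.step() once per epoch.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.1)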

@ericspod
Member

I used Adam with a learning rate of 0.001 with default values otherwise. I didn't do learning rate scheduling in this paper.
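In PyTorch that is simply (a sketch, assuming model is the network being trained):

import torch

# Adam at lr=0.001, every other argument left at its default,
# and no learning rate scheduler.
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)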

@AnnaKlemen

> I used Adam with a learning rate of 0.001 with default values otherwise. I didn't do learning rate scheduling in this paper.

Thanks, that is also what I have had the most luck with :D

@BK0x7C8

BK0x7C8 commented Aug 31, 2023

> Hi,
>
> I am a master student in biomedical engineering at the University of Ghent. I have been using MONAI for my master's thesis for the last couple of months, which is about the segmentation of bone lesions from multi-modal whole-body MRI. When comparing the model architecture of MONAI with most U-nets described in the literature, I noticed that the MONAI U-net uses strided convolutions instead of max-pooling for downsampling. I was wondering why that is and whether there are any reference papers suggesting this.
>
> Kind regards, Joris Wuts

Hey there,

I am currently in the same spot as you, writing a thesis for my biomedical engineering studies. I am using the MONAI U-Net too, and yes, it seems the max pooling operations are replaced by strided convolutions. I came to the same conclusion by printing out the model sequences.
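For what it's worth, a quick way to check this programmatically (a sketch, assuming model is the MONAI UNet instantiated as above):

import torch.nn as nn

# Collect any pooling layers in the module tree; for the UNet
# configured above this prints an empty list, since downsampling is
# done by the stride-2 convolutions instead.
pools = [m for m in model.modules()
         if isinstance(m, (nn.MaxPool3d, nn.AvgPool3d))]
print(pools)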
