
Question about U-net architecture for segmentation #464

Closed
joriswuts opened this issue Jun 1, 2020 · 10 comments
Labels
question Further information is requested

Comments

@joriswuts

Hi,

I am a master student in biomedical engineering at the University of Ghent. I have been using MONAI for my master's thesis for the last couple of months, which is about the segmentation of bone lesions from multi-modal whole-body MRI. When comparing the model architecture of MONAI with most U-nets described in the literature, I noticed that the MONAI U-net uses strided convolutions instead of max-pooling for downsampling. I was wondering why that is and whether there are any reference papers suggesting this.

Kind regards,
Joris Wuts

@Nic-Ma Nic-Ma added the question Further information is requested label Jun 1, 2020
@Nic-Ma
Contributor

Nic-Ma commented Jun 1, 2020

Hi @ericspod ,

Could you please provide more information here, since you wrote the initial UNet code?
Thanks.

@joriswuts
Author

Hi,

I initialized the UNet with this code:

from monai.networks.layers import Norm
from monai.networks.nets import UNet

model = UNet(
    dimensions=3,
    in_channels=3,
    out_channels=2,
    channels=(16, 32, 64, 128, 256),
    strides=(2, 2, 2, 2),
    num_res_units=2,
    norm=Norm.BATCH,
).to(device)
The model works great for the purpose I am using it for, but for the report of my master's dissertation I wanted to give an in-depth visualisation of the model.

When visualising the computational graph in Tensorboard I noticed there are no max pooling layers and that the downsampling is done by the strided convolutions.

Also, on the upsampling part I am a bit confused as to where the residual layers are and how they connect. It seems to me that it is somewhat similar to the model below. Is there a schematic visualisation of the model somewhere? That would really help me to understand the model better.
Thanks in advance,

Joris Wuts
( https://raw.githubusercontent.com/mattmacy/vnet.pytorch/master/images/diagram.png)

@ericspod
Member

ericspod commented Jun 1, 2020

Hi @joriswuts,
It might be helpful to print the network to stdout; it prints a tree structure which is a bit hard to read at first but does show the structure of the UNet correctly. Upsample layers are given as:

(2): Sequential(
  (conv): Convolution(
	(conv): ConvTranspose3d(64, 16, kernel_size=(3, 3, 3), stride=(2, 2, 2), padding=(1, 1, 1), output_padding=(1, 1, 1))
	(norm): BatchNorm3d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
	(act): PReLU(num_parameters=1)
  )
  (resunit): ResidualUnit(
	(conv): Sequential(
	  (unit0): Convolution(
		(conv): Conv3d(16, 16, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
		(norm): BatchNorm3d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
		(act): PReLU(num_parameters=1)
	  )
	)
	(residual): Identity()
  )
)

What this boils down to is a sequence of layers composed of a transpose convolution, normalization, and activation, followed by a residual unit with a single sequence of convolution-normalization-activation layers.
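If it helps to see this in plain PyTorch, here is a minimal sketch of that upsample block (my own illustration, not the actual MONAI classes):

import torch.nn as nn

class UpBlockSketch(nn.Module):
    # Transpose convolution, normalization, and activation, followed by
    # a residual unit whose skip path is a plain identity, because the
    # residual unit changes neither the channel count nor the spatial size.
    def __init__(self, in_ch=64, out_ch=16):
        super().__init__()
        self.up = nn.Sequential(
            nn.ConvTranspose3d(in_ch, out_ch, kernel_size=3, stride=2,
                               padding=1, output_padding=1),
            nn.BatchNorm3d(out_ch),
            nn.PReLU(),
        )
        self.res = nn.Sequential(
            nn.Conv3d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm3d(out_ch),
            nn.PReLU(),
        )

    def forward(self, x):
        x = self.up(x)
        return x + self.res(x)  # identity residual path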

A way of thinking about UNet is that it is a tower of large layers, each containing one level of the encode and decode paths with the skip connection between them. You can then describe the structure of one layer, and the whole network as a stack of these. I've attached a diagram of this that I've used in other papers; it's very similar to your configuration other than the use of instance norm.

[diagram: schematic of the UNet layer structure, attached by ericspod]
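To make the "tower of layers" idea concrete, here is a simplified sketch of how one level can be assembled (an illustration of the structure, not MONAI's exact code):

import torch
import torch.nn as nn

class SkipCat(nn.Module):
    # The skip connection: concatenate a submodule's output with its
    # own input along the channel dimension.
    def __init__(self, submodule):
        super().__init__()
        self.submodule = submodule

    def forward(self, x):
        return torch.cat([x, self.submodule(x)], dim=1)

def unet_level(down, subblock, up):
    # One level of the tower: downsample, run the deeper levels, rejoin
    # via the skip connection, then upsample. The whole network is this
    # pattern applied recursively, with a plain block at the bottom.
    return nn.Sequential(down, SkipCat(subblock), up)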

@joriswuts
Author

@ericspod
Thanks a lot for this information. I was just searching through your other papers and found a clarifying explanation of the use of strided convolutions instead of max pooling layers.

Kind regards,
Joris Wuts

@wyli wyli closed this as completed Jun 1, 2020
@joriswuts
Author

@ericspod, Just to be completely sure: could it be that the residual connections in my model have a stride-1 convolution in them instead of a simple identity operation? I added an image of the first downsampling block in Tensorboard. I noticed that a stride-1 convolution in the residual layer is also proposed in several papers referenced in the documentation of DLTK and MONAI (e.g. https://arxiv.org/pdf/1603.05027.pdf).
If I interpret the MONAI code correctly, the convolution is also in the residual part of the upsampling path.
[screenshot: Tensorboard graph of the first downsampling block, 2020-06-01]

@ericspod
Member

ericspod commented Jun 2, 2020

Residual units are implemented in UNet with the ResidualUnit class. The residual part uses a convolution to change the input dimensions to match the output dimensions if this is necessary, but uses nn.Identity if not; this way you don't have a needless convolutional layer in the residual path. In the upsampling stream of the network the residual unit comes after the transpose convolution, so it won't need to do this. In the downsampling part of the network the residual path will have a convolution, since the number of channels and the dimensions do change.
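A minimal sketch of that choice (illustrative only; the kernel size MONAI actually uses for the residual convolution may differ):

import torch.nn as nn

def residual_path(in_ch, out_ch, stride):
    # Identity when nothing changes; otherwise a strided convolution
    # to match the main path's output channels and spatial size.
    if in_ch == out_ch and stride == 1:
        return nn.Identity()
    return nn.Conv3d(in_ch, out_ch, kernel_size=1, stride=stride)

So in your Tensorboard screenshot, the stride-2 downsampling block changes both the channel count and the resolution, which is why its residual path contains a convolution rather than an identity.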

@AnnaKlemen

@ericspod Is it possible to find the settings used in the referenced paper (Left-Ventricle Quantification Using Residual U-Net), e.g. optimizer, learning rate, weight decay, learning rate scheduler etc.? :)
I have tried with the settings from the "Road Extraction by Deep Residual U-Net" paper, but it really does not work on my problem. (SGD, starting lr: 0.001, reduced by a factor of 0.1 every 20 epochs)
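In PyTorch terms, what I tried corresponds roughly to this (a sketch, assuming model is the network being trained):

import torch

optimizer = torch.optim.SGD(model.parameters(), lr=0.001)
# Decay the learning rate by a factor of 0.1 every 20 epochs;
# call scheduler.step() once per epoch.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.1)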

@ericspod
Member

I used Adam with a learning rate of 0.001 with default values otherwise. I didn't do learning rate scheduling in this paper.
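In PyTorch that is simply (a sketch, assuming model is the network being trained):

import torch

# Adam at lr=0.001, every other argument left at its default,
# and no learning rate scheduler.
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)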

@AnnaKlemen

> I used Adam with a learning rate of 0.001 with default values otherwise. I didn't do learning rate scheduling in this paper.

Thanks, that is also what I have had the most luck with :D

@BK0x7C8

BK0x7C8 commented Aug 31, 2023

> Hi,
>
> I am a master student in biomedical engineering at the University of Ghent. I have been using MONAI for my master's thesis for the last couple of months, which is about the segmentation of bone lesions from multi-modal whole-body MRI. When comparing the model architecture of MONAI with most U-nets described in the literature, I noticed that the MONAI U-net uses strided convolutions instead of max-pooling for downsampling. I was wondering why that is and whether there are any reference papers suggesting this.
>
> Kind regards, Joris Wuts

Hey there,

I am currently in the same spot as you, writing a thesis for my biomedical engineering studies. I am using the MONAI U-Net too, and yes, it seems the max pooling operations are replaced by strided convolutions. I came to the same conclusion by printing out the model sequences.
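For what it's worth, a quick way to check this programmatically (a sketch, assuming model is the MONAI UNet instantiated as above):

import torch.nn as nn

# Collect any pooling layers in the module tree; for the UNet
# configured above this prints an empty list, since downsampling is
# done by the stride-2 convolutions instead.
pools = [m for m in model.modules()
         if isinstance(m, (nn.MaxPool3d, nn.AvgPool3d))]
print(pools)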
