Residual Units question #1258

Closed · kissievendor opened this issue Nov 18, 2020 · 13 comments
Labels: question (Further information is requested)

kissievendor commented Nov 18, 2020

I don't completely understand the Residual Units in the UNet.

In my understanding, the Residual Units are used in place of Skip Connections.
My network works fine without the Residual Units. But my question is, how do I report the network architecture without the Residual Units?

Or do the Residual Units serve another purpose?

Also, without the Residual Units, the network seems to do only one convolution per layer (see comment below).

Thank you so much!

Best,
Kirsten

Nic-Ma added the question (Further information is requested) label on Nov 19, 2020
Nic-Ma (Contributor) commented Nov 19, 2020

Hi @ericspod ,

I think this is a FAQ, could you please share some info?
Thanks in advance.

kissievendor (Author) commented Nov 19, 2020

Related to this question, I think: how is it that the number of parameters is so low in this network?
One would think that strided convolutions would add parameters, in contrast with max pooling.

This is my network + summary:
[screenshots: network definition and model summary]

ericspod (Member) commented Nov 19, 2020

Residual units are used to define the layers of the encode and decode paths of the network; skip connections are still present and are not substituted in any way. The classical UNet implements each layer as a sequence of convolutions followed by maxpooling on the encode side, and upsampling followed by convolutions on the decode side. With our network, when num_res_units is set to some value greater than 0, the encode layers are residual units of however many convolutions num_res_units is set to, with the first having the given stride (2 in your case). On the decode side the convolution is transposed, so the upsampling is done without adding new parameters, since it replaces what would otherwise be a convolution. Here a very similar network is used with 2 residual convolutions and is illustrated in Figure 1. Also, I answered a similar question here.
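As a minimal sketch (assuming MONAI and PyTorch are installed, and using a small 2D configuration purely for illustration), you can build the network with and without residual units and print both to compare the resulting structures:

from monai.networks.nets import UNet

# Small illustrative configuration:
# 2D, 1 input channel, 1 output channel, channels (2, 4, 8), strides (2, 2).
plain = UNet(2, 1, 1, (2, 4, 8), (2, 2), num_res_units=0)
residual = UNet(2, 1, 1, (2, 4, 8), (2, 2), num_res_units=2)

# Printing the modules shows single Convolution blocks in the first case
# and ResidualUnit blocks (with the skip connections still present) in the second.
print(plain)
print(residual)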

kissievendor (Author)

Thank you for your quick and clear response.

But as you can see from the summary, when num_res_units is 0, the network only has one convolution before the number of features is doubled, whereas with the residual units there are two, plus the residual connection.
Also, other networks that work with strided convolutions (e.g. V-Net) use successive convolutions.

So is this intentional, and does it still count as a 3D UNet?

ericspod (Member)

With num_res_units set to 0, the decode layers have only the transpose convolution, followed by normalization/dropout/activation layers, except in the topmost layer which has none of these. Looking at the network UNet(2,1,1,(2,4,8),(2,2)), this prints:

UNet(
  (model): Sequential(
    (0): Convolution(
      (conv): Conv2d(1, 2, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
      (adn): ADN(
        (N): InstanceNorm2d(2, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
        (D): Dropout(p=0.0, inplace=False)
        (A): PReLU(num_parameters=1)
      )
    )
    (1): SkipConnection(
      (submodule): Sequential(
        (0): Convolution(
          (conv): Conv2d(2, 4, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
          (adn): ADN(
            (N): InstanceNorm2d(4, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
            (D): Dropout(p=0.0, inplace=False)
            (A): PReLU(num_parameters=1)
          )
        )
        (1): SkipConnection(
          (submodule): Convolution(
            (conv): Conv2d(4, 8, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (adn): ADN(
              (N): InstanceNorm2d(8, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
              (D): Dropout(p=0.0, inplace=False)
              (A): PReLU(num_parameters=1)
            )
          )
        )
        (2): Convolution(
          (conv): ConvTranspose2d(12, 2, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), output_padding=(1, 1))
          (adn): ADN(
            (N): InstanceNorm2d(2, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
            (D): Dropout(p=0.0, inplace=False)
            (A): PReLU(num_parameters=1)
          )
        )
      )
    )
    (2): Convolution(
      (conv): ConvTranspose2d(4, 1, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), output_padding=(1, 1))
    )
  )
)

The second-last decode layer is

(conv): ConvTranspose2d(12, 2, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), output_padding=(1, 1))
(adn): ADN(
  (N): InstanceNorm2d(2, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
  (D): Dropout(p=0.0, inplace=False)
  (A): PReLU(num_parameters=1)
)

while the final one is just a transpose convolution layer.
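As a quick sanity check (a sketch, assuming an input whose spatial size is divisible by the product of the strides), the num_res_units=0 network still maps an input to an output of the same spatial size:

import torch
from monai.networks.nets import UNet

net = UNet(2, 1, 1, (2, 4, 8), (2, 2))  # num_res_units defaults to 0
x = torch.rand(1, 1, 64, 64)            # one single-channel 64x64 image
print(net(x).shape)                     # expected: torch.Size([1, 1, 64, 64])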

kissievendor (Author)

Okay. So without the residual units the network is not equal to a UNet, because it has only a single convolution per layer?

ericspod (Member)

Structurally it's the same in that it's an autoencoder architecture with skip connections, but yes, it is not completely equivalent to the original definition.

kissievendor (Author)

Okay, thank you.
Do you perhaps know a paper I could cite for this network (no residual units)?

I would use BasicUnit instead, but I don't think it is working yet.

In any case thank you for the explanation.

ericspod (Member)

Please cite the paper in the UNet docstring; it uses residual units, but we don't have any reference for a version that doesn't.

kissievendor (Author)

And what then is the difference between 1 and 2 residual units?
Is 2 the number needed to get the 'normal' network they use in the paper in the docstring?

Because then you would have three convolutions per layer, I think.

ericspod (Member)

The num_res_units value states how many sets of convolution/normalization/dropout/activation layers each residual unit has. In the paper this is 2.
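To relate this back to the earlier question about parameter counts, here is a small sketch (same illustrative configuration as above, not the poster's actual network) that compares the number of parameters for different num_res_units values:

from monai.networks.nets import UNet

for n in (0, 1, 2):
    net = UNet(2, 1, 1, (2, 4, 8), (2, 2), num_res_units=n)
    total = sum(p.numel() for p in net.parameters())
    print(f"num_res_units={n}: {total} parameters")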

kissievendor (Author) commented Nov 20, 2020

But isn't there one convolution too many per layer, as can be seen in this summary? This summary is different from print(model), but I just want to make sure it is right.
[screenshot: model summary]

ericspod (Member)

Here is the output for UNet(2,1,1,(2,4,8),(2,2),num_res_units=2):

UNet(
  (model): Sequential(
    (0): ResidualUnit(
      (conv): Sequential(
        (unit0): Convolution(
          (conv): Conv2d(1, 2, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
          (adn): ADN(
            (N): InstanceNorm2d(2, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
            (D): Dropout(p=0.0, inplace=False)
            (A): PReLU(num_parameters=1)
          )
        )
        (unit1): Convolution(
          (conv): Conv2d(2, 2, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (adn): ADN(
            (N): InstanceNorm2d(2, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
            (D): Dropout(p=0.0, inplace=False)
            (A): PReLU(num_parameters=1)
          )
        )
      )
      (residual): Conv2d(1, 2, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
    )
    (1): SkipConnection(
      (submodule): Sequential(
        (0): ResidualUnit(
          (conv): Sequential(
            (unit0): Convolution(
              (conv): Conv2d(2, 4, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
              (adn): ADN(
                (N): InstanceNorm2d(4, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
                (D): Dropout(p=0.0, inplace=False)
                (A): PReLU(num_parameters=1)
              )
            )
            (unit1): Convolution(
              (conv): Conv2d(4, 4, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
              (adn): ADN(
                (N): InstanceNorm2d(4, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
                (D): Dropout(p=0.0, inplace=False)
                (A): PReLU(num_parameters=1)
              )
            )
          )
          (residual): Conv2d(2, 4, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
        )
        (1): SkipConnection(
          (submodule): ResidualUnit(
            (conv): Sequential(
              (unit0): Convolution(
                (conv): Conv2d(4, 8, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
                (adn): ADN(
                  (N): InstanceNorm2d(8, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
                  (D): Dropout(p=0.0, inplace=False)
                  (A): PReLU(num_parameters=1)
                )
              )
              (unit1): Convolution(
                (conv): Conv2d(8, 8, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
                (adn): ADN(
                  (N): InstanceNorm2d(8, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
                  (D): Dropout(p=0.0, inplace=False)
                  (A): PReLU(num_parameters=1)
                )
              )
            )
            (residual): Conv2d(4, 8, kernel_size=(1, 1), stride=(1, 1))
          )
        )
        (2): Sequential(
          (0): Convolution(
            (conv): ConvTranspose2d(12, 2, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), output_padding=(1, 1))
            (adn): ADN(
              (N): InstanceNorm2d(2, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
              (D): Dropout(p=0.0, inplace=False)
              (A): PReLU(num_parameters=1)
            )
          )
          (1): ResidualUnit(
            (conv): Sequential(
              (unit0): Convolution(
                (conv): Conv2d(2, 2, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
                (adn): ADN(
                  (N): InstanceNorm2d(2, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
                  (D): Dropout(p=0.0, inplace=False)
                  (A): PReLU(num_parameters=1)
                )
              )
            )
            (residual): Identity()
          )
        )
      )
    )
    (2): Sequential(
      (0): Convolution(
        (conv): ConvTranspose2d(4, 1, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), output_padding=(1, 1))
        (adn): ADN(
          (N): InstanceNorm2d(1, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
          (D): Dropout(p=0.0, inplace=False)
          (A): PReLU(num_parameters=1)
        )
      )
      (1): ResidualUnit(
        (conv): Sequential(
          (unit0): Convolution(
            (conv): Conv2d(1, 1, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          )
        )
        (residual): Identity()
      )
    )
  )
)

The second-last decode layer is

(0): Convolution(
  (conv): ConvTranspose2d(12, 2, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), output_padding=(1, 1))
  (adn): ADN(
    (N): InstanceNorm2d(2, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
    (D): Dropout(p=0.0, inplace=False)
    (A): PReLU(num_parameters=1)
  )
)
(1): ResidualUnit(
  (conv): Sequential(
    (unit0): Convolution(
      (conv): Conv2d(2, 2, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (adn): ADN(
        (N): InstanceNorm2d(2, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
        (D): Dropout(p=0.0, inplace=False)
        (A): PReLU(num_parameters=1)
      )
    )
  )
  (residual): Identity()
)

Here we have the transpose convolution followed by a residual unit with one convolution. I guess this does vary from what I'd said, in that the transpose convolution is outside the residual connection, which is needed since the input has to be upsampled, but each decode layer does have only two convolutions, as per the num_res_units value.
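A small sketch (assuming the same illustrative configuration as above) that pulls out the topmost decode stage so the transpose convolution followed by the single-convolution residual unit can be inspected directly:

from monai.networks.nets import UNet

net = UNet(2, 1, 1, (2, 4, 8), (2, 2), num_res_units=2)
# net.model is a Sequential of (first encode layer, nested SkipConnection, last decode stage);
# index 2 is the topmost decode stage: a transpose Convolution followed by a ResidualUnit.
print(net.model[2])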

wyli closed this as completed on Dec 4, 2020