Residual Units question #1258

Closed · kissievendor opened this issue Nov 18, 2020 · 13 comments
Labels: question (Further information is requested)

kissievendor commented Nov 18, 2020

I don't completely understand the Residual Units in the UNet.

In my understanding, the Residual Units are used in place of Skip Connections.
My network works fine without the Residual Units. But my question is, how do I report the network architecture without the Residual Units?

Or do the Residual Units serve another purpose?

Also, without the Residual Units, the network seems to do only one convolution per layer (see comment below).

Thank you so much!

Best,
Kirsten

Nic-Ma added the question (Further information is requested) label on Nov 19, 2020
Nic-Ma (Contributor) commented Nov 19, 2020

Hi @ericspod ,

I think this is a FAQ, could you please share some info?
Thanks in advance.

kissievendor (Author) commented Nov 19, 2020

Related to this question, I think: how is it that the number of parameters is so low in this network?
One would think that strided convolutions would add parameters, in contrast with max pooling.

This is my network + summary:
[screenshots: network definition and model summary]

ericspod (Member) commented Nov 19, 2020

Residual units are used to define the layers of the encode and decode paths of the network; skip connections are still present and are not substituted in any way. The classical UNet implements each layer as a sequence of convolutions followed by maxpooling on the encode side, and upsampling followed by convolutions on the decode side. With our network, when num_res_units is set to some value greater than 0, the encode layers are residual units of however many convolutions num_res_units is set to, with the first having the given stride (2 in your case). On the decode side the convolution is transposed, so the upsampling is done without adding new parameters, since it replaces what would otherwise be a convolution. Here a very similar network is used with 2 residual convolutions and is illustrated in Figure 1. Also, I answered a similar question here.
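As a minimal sketch (assuming MONAI and PyTorch are installed, and using a small 2D configuration purely for illustration), you can build the network with and without residual units and print both to compare the resulting structures:

from monai.networks.nets import UNet

# Small illustrative configuration:
# 2D, 1 input channel, 1 output channel, channels (2, 4, 8), strides (2, 2).
plain = UNet(2, 1, 1, (2, 4, 8), (2, 2), num_res_units=0)
residual = UNet(2, 1, 1, (2, 4, 8), (2, 2), num_res_units=2)

# Printing the modules shows single Convolution blocks in the first case
# and ResidualUnit blocks (with the skip connections still present) in the second.
print(plain)
print(residual)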

kissievendor (Author)

Thank you for your quick and clear response.

But as you can see from the summary, when num_res_units is 0, the network only has one convolution before the number of features is doubled, whereas with the residual units there are two, plus the residual connection.
Also, other networks that work with strided convolutions (e.g. V-Net) use successive convolutions.

So is this intentional, and does it still count as a 3D UNet?

ericspod (Member)

With num_res_units set to 0, the decode layers have only the transpose convolution, followed by normalization/dropout/activation layers, except in the topmost layer which has none of these. Looking at the network UNet(2,1,1,(2,4,8),(2,2)), this prints:

UNet(
  (model): Sequential(
    (0): Convolution(
      (conv): Conv2d(1, 2, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
      (adn): ADN(
        (N): InstanceNorm2d(2, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
        (D): Dropout(p=0.0, inplace=False)
        (A): PReLU(num_parameters=1)
      )
    )
    (1): SkipConnection(
      (submodule): Sequential(
        (0): Convolution(
          (conv): Conv2d(2, 4, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
          (adn): ADN(
            (N): InstanceNorm2d(4, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
            (D): Dropout(p=0.0, inplace=False)
            (A): PReLU(num_parameters=1)
          )
        )
        (1): SkipConnection(
          (submodule): Convolution(
            (conv): Conv2d(4, 8, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (adn): ADN(
              (N): InstanceNorm2d(8, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
              (D): Dropout(p=0.0, inplace=False)
              (A): PReLU(num_parameters=1)
            )
          )
        )
        (2): Convolution(
          (conv): ConvTranspose2d(12, 2, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), output_padding=(1, 1))
          (adn): ADN(
            (N): InstanceNorm2d(2, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
            (D): Dropout(p=0.0, inplace=False)
            (A): PReLU(num_parameters=1)
          )
        )
      )
    )
    (2): Convolution(
      (conv): ConvTranspose2d(4, 1, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), output_padding=(1, 1))
    )
  )
)

The second-last decode layer is

(conv): ConvTranspose2d(12, 2, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), output_padding=(1, 1))
(adn): ADN(
  (N): InstanceNorm2d(2, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
  (D): Dropout(p=0.0, inplace=False)
  (A): PReLU(num_parameters=1)
)

while the final one is just a transpose convolution layer.
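As a quick sanity check (a sketch, assuming an input whose spatial size is divisible by the product of the strides), the num_res_units=0 network still maps an input to an output of the same spatial size:

import torch
from monai.networks.nets import UNet

net = UNet(2, 1, 1, (2, 4, 8), (2, 2))  # num_res_units defaults to 0
x = torch.rand(1, 1, 64, 64)            # one single-channel 64x64 image
print(net(x).shape)                     # expected: torch.Size([1, 1, 64, 64])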

kissievendor (Author)

Okay. So without the residual units the network is not equal to a UNet, because it has only a single convolution per layer?

ericspod (Member)

Structurally it's the same in that it's an autoencoder architecture with skip connections, but yes, it is not completely equivalent to the original definition.

kissievendor (Author)

Okay, thank you.
Do you perhaps know a paper I could cite for this network (no residual units)?

I would use BasicUnit instead, but I don't think it is working yet.

In any case thank you for the explanation.

ericspod (Member)

Please cite the paper in the UNet docstring; it uses residual units, but we don't have any reference for a version that doesn't.

kissievendor (Author)

And what then is the difference between 1 and 2 residual units?
Is 2 the number needed to get the 'normal' network they use in the paper in the docstring?

Because then you would have three convolutions per layer, I think.

ericspod (Member)

The num_res_units value states how many sets of convolution/normalization/dropout/activation layers each residual unit has. In the paper this is 2.
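To relate this back to the earlier question about parameter counts, here is a small sketch (same illustrative configuration as above, not the poster's actual network) that compares the number of parameters for different num_res_units values:

from monai.networks.nets import UNet

for n in (0, 1, 2):
    net = UNet(2, 1, 1, (2, 4, 8), (2, 2), num_res_units=n)
    total = sum(p.numel() for p in net.parameters())
    print(f"num_res_units={n}: {total} parameters")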

kissievendor (Author) commented Nov 20, 2020

But isn't there one convolution too many per layer, as can be seen in this summary? This summary is different from print(model), but I just want to make sure it is right.
[screenshot: model summary]

ericspod (Member)

Here is the output for UNet(2,1,1,(2,4,8),(2,2),num_res_units=2):

UNet(
  (model): Sequential(
    (0): ResidualUnit(
      (conv): Sequential(
        (unit0): Convolution(
          (conv): Conv2d(1, 2, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
          (adn): ADN(
            (N): InstanceNorm2d(2, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
            (D): Dropout(p=0.0, inplace=False)
            (A): PReLU(num_parameters=1)
          )
        )
        (unit1): Convolution(
          (conv): Conv2d(2, 2, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (adn): ADN(
            (N): InstanceNorm2d(2, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
            (D): Dropout(p=0.0, inplace=False)
            (A): PReLU(num_parameters=1)
          )
        )
      )
      (residual): Conv2d(1, 2, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
    )
    (1): SkipConnection(
      (submodule): Sequential(
        (0): ResidualUnit(
          (conv): Sequential(
            (unit0): Convolution(
              (conv): Conv2d(2, 4, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
              (adn): ADN(
                (N): InstanceNorm2d(4, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
                (D): Dropout(p=0.0, inplace=False)
                (A): PReLU(num_parameters=1)
              )
            )
            (unit1): Convolution(
              (conv): Conv2d(4, 4, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
              (adn): ADN(
                (N): InstanceNorm2d(4, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
                (D): Dropout(p=0.0, inplace=False)
                (A): PReLU(num_parameters=1)
              )
            )
          )
          (residual): Conv2d(2, 4, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
        )
        (1): SkipConnection(
          (submodule): ResidualUnit(
            (conv): Sequential(
              (unit0): Convolution(
                (conv): Conv2d(4, 8, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
                (adn): ADN(
                  (N): InstanceNorm2d(8, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
                  (D): Dropout(p=0.0, inplace=False)
                  (A): PReLU(num_parameters=1)
                )
              )
              (unit1): Convolution(
                (conv): Conv2d(8, 8, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
                (adn): ADN(
                  (N): InstanceNorm2d(8, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
                  (D): Dropout(p=0.0, inplace=False)
                  (A): PReLU(num_parameters=1)
                )
              )
            )
            (residual): Conv2d(4, 8, kernel_size=(1, 1), stride=(1, 1))
          )
        )
        (2): Sequential(
          (0): Convolution(
            (conv): ConvTranspose2d(12, 2, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), output_padding=(1, 1))
            (adn): ADN(
              (N): InstanceNorm2d(2, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
              (D): Dropout(p=0.0, inplace=False)
              (A): PReLU(num_parameters=1)
            )
          )
          (1): ResidualUnit(
            (conv): Sequential(
              (unit0): Convolution(
                (conv): Conv2d(2, 2, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
                (adn): ADN(
                  (N): InstanceNorm2d(2, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
                  (D): Dropout(p=0.0, inplace=False)
                  (A): PReLU(num_parameters=1)
                )
              )
            )
            (residual): Identity()
          )
        )
      )
    )
    (2): Sequential(
      (0): Convolution(
        (conv): ConvTranspose2d(4, 1, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), output_padding=(1, 1))
        (adn): ADN(
          (N): InstanceNorm2d(1, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
          (D): Dropout(p=0.0, inplace=False)
          (A): PReLU(num_parameters=1)
        )
      )
      (1): ResidualUnit(
        (conv): Sequential(
          (unit0): Convolution(
            (conv): Conv2d(1, 1, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          )
        )
        (residual): Identity()
      )
    )
  )
)

The second-last decode layer is

(0): Convolution(
  (conv): ConvTranspose2d(12, 2, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), output_padding=(1, 1))
  (adn): ADN(
    (N): InstanceNorm2d(2, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
    (D): Dropout(p=0.0, inplace=False)
    (A): PReLU(num_parameters=1)
  )
)
(1): ResidualUnit(
  (conv): Sequential(
    (unit0): Convolution(
      (conv): Conv2d(2, 2, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (adn): ADN(
        (N): InstanceNorm2d(2, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
        (D): Dropout(p=0.0, inplace=False)
        (A): PReLU(num_parameters=1)
      )
    )
  )
  (residual): Identity()
)

Here we have the transpose convolution followed by a residual unit with one convolution. I guess this does vary from what I'd said, in that the transpose convolution is outside the residual connection, which is needed since the input has to be upsampled, but each decode layer does have only two convolutions, as per the num_res_units value.
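A small sketch (assuming the same illustrative configuration as above) that pulls out the topmost decode stage so the transpose convolution followed by the single-convolution residual unit can be inspected directly:

from monai.networks.nets import UNet

net = UNet(2, 1, 1, (2, 4, 8), (2, 2), num_res_units=2)
# net.model is a Sequential of (first encode layer, nested SkipConnection, last decode stage);
# index 2 is the topmost decode stage: a transpose Convolution followed by a ResidualUnit.
print(net.model[2])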

wyli closed this as completed on Dec 4, 2020