-
Hi there, Thank you for timm! It's been immensely helpful for my projects. I am currently working on training a model using the Mlp class from timm's repository, specifically the implementation found here: timm.layers.mlp.Mlp. I have some questions regarding the use_conv parameter:
Thanks for your time and assistance!
Replies: 1 comment 1 reply
-
Keep bugs / features in discussions please.

If use_conv is True, yes, it's expected that the input shape is NCHW; it does not matter what H and W are. This mode is intended for places where you want an MLP in a predominantly convolutional network, or between convolutional / 2D spatial (avg pool, max pool) layers, and you don't want to reshape/permute. If you have N?C input, use the default. I used it in ConvNeXt and MaxViT (see pytorch-image-models/timm/models/convnext.py, Lines 130 to 164 at 3196d6b).

Theoretically there is no difference between the two paths; practically, the kernels used are a bit different, so you may find one is faster than the other for certain shapes on certain hardware.
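To make the "theoretically no difference" point concrete, here is a minimal numpy sketch (not timm's actual code) showing that a 1x1 convolution over an NCHW tensor computes the same map as a linear layer applied to the channel dimension of the permuted NHWC tensor; all shapes and names below are illustrative assumptions:

```python
import numpy as np

# Illustrative sizes (hypothetical, not from timm)
rng = np.random.default_rng(0)
N, C_in, C_out, H, W = 2, 8, 16, 4, 4
w = rng.standard_normal((C_out, C_in))   # shared weight for both paths
b = rng.standard_normal(C_out)           # shared bias

x_nchw = rng.standard_normal((N, C_in, H, W))

# "use_conv=True" path: a 1x1 conv is a contraction over the channel axis of NCHW
y_conv = np.einsum('oc,nchw->nohw', w, x_nchw) + b[None, :, None, None]

# default path: permute to NHWC, apply the linear map, permute back to NCHW
x_nhwc = x_nchw.transpose(0, 2, 3, 1)
y_lin = (x_nhwc @ w.T + b).transpose(0, 3, 1, 2)

print(np.allclose(y_conv, y_lin))  # prints True: the two paths agree numerically
```

The practical speed difference the reply mentions comes from the framework dispatching different kernels (conv vs. matmul) for the two formulations, not from the math.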