A mismatch in shufflenetv1 about ReLU #53

Open · zjykzj opened this issue Jun 3, 2021 · 6 comments

@zjykzj commented Jun 3, 2021

Hi @nmaac @megvii-model, in ShuffleNetV1 there are two places where this implementation differs from the paper.

The first is the channel shuffle placed before/after the dwconv; this was already addressed in #16 #40.

The second is the usage of ReLU in shufflenetv1_unit.py, here:

        if self.stride == 1:
            # stride=1 unit: elementwise add, then ReLU on the sum
            return F.relu(x + x_proj)
        elif self.stride == 2:
            # stride=2 unit: AvgPool shortcut concatenated with the main branch;
            # ReLU is applied only to the main branch, not after the concat
            return torch.cat((self.branch_proj(x_proj), F.relu(x)), 1)

When stride=2, this differs from the paper's description, and it also differs from common usage. I want to know whether this is a better design or just a mistake. Looking forward to your reply.

@nmaac (Collaborator) commented Jun 3, 2021

The code is exactly the same as the paper (channel shuffle and relu). Please note that relu+avgpool+relu = relu+avgpool.
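
A quick way to check that identity (a minimal PyTorch sketch, not code from this repo): once a feature map has passed a ReLU it is non-negative, average pooling keeps it non-negative, so a second ReLU changes nothing.

    import torch
    import torch.nn.functional as F

    torch.manual_seed(0)
    y = F.relu(torch.randn(2, 16, 28, 28))                    # any post-ReLU feature map, so y >= 0
    pooled = F.avg_pool2d(y, kernel_size=3, stride=2, padding=1)

    # avgpool of a non-negative tensor is non-negative, so the extra ReLU is a no-op
    print(torch.equal(F.relu(pooled), pooled))                 # True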

@zjykzj (Author) commented Jun 3, 2021

> The code is exactly the same as the paper (channel shuffle and relu). Please note that relu+avgpool+relu = relu+avgpool.

Nice work!!! Let me briefly describe your implementation.

There are three stages in ShuffleNetV1, and the resolution downsampling is performed in the first block of each stage (using AvgPool on the shortcut branch).

For the first stage, the upstream operations are Conv2d -> BN -> ReLU -> MaxPool, so there is no need to apply an extra ReLU to the identity/shortcut map.

For the following stages, the upstream operation is a block like this:

        if self.stride == 1:
            return F.relu(x + x_proj)

Since ReLU + AvgPool = ReLU + AvgPool + ReLU, there is again no need to apply an extra activation to the identity/shortcut map.
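
Putting that together for the stride=2 unit (again only a sketch with made-up shapes, not the repo's module): because the block input x_proj is already non-negative, the paper's ReLU after the concat gives exactly the same result as applying ReLU only to the main branch, as the code above does.

    import torch
    import torch.nn.functional as F

    torch.manual_seed(0)
    x_proj = F.relu(torch.randn(2, 24, 28, 28))       # block input: already >= 0 (previous block ends in ReLU/MaxPool)
    x = torch.randn(2, 24, 14, 14)                     # main-branch output before the final activation

    def branch_proj(t):                                # stand-in for the AvgPool shortcut branch
        return F.avg_pool2d(t, 3, stride=2, padding=1)

    paper_style = F.relu(torch.cat((branch_proj(x_proj), x), 1))   # ReLU after the concat, as in the paper figure
    repo_style = torch.cat((branch_proj(x_proj), F.relu(x)), 1)    # ReLU only on the main branch, as in this repo

    print(torch.equal(paper_style, repo_style))        # True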

@zjykzj (Author) commented Jun 3, 2021

There is a similar trick in the ShuffleNetV2 implementation here. The paper describes the block like this:

[figure-3: ShuffleNetV2 block diagram from the paper]

When stride=1, the paper separates the feature maps with a channel split operation; in this implementation the same effect is achieved by the channel shuffle operation.
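
A minimal sketch of that trick (not necessarily the repo's exact code): a 2-group channel shuffle already interleaves the two branches, so simply taking the two channel halves of its output plays the role of the next unit's channel split.

    import torch

    def shuffle_then_split(x):
        # 2-group channel shuffle whose two output halves serve as the
        # channel split for the next unit, so no separate split op is needed
        b, c, h, w = x.shape
        assert c % 2 == 0
        x = x.reshape(b, 2, c // 2, h, w)              # view channels as 2 groups
        x = x.transpose(1, 2).reshape(b, c, h, w)      # interleave the groups (the shuffle)
        return x[:, :c // 2], x[:, c // 2:]            # two halves feed the two branches

    x_proj, x_main = shuffle_then_split(torch.randn(2, 8, 4, 4))
    print(x_proj.shape, x_main.shape)                  # torch.Size([2, 4, 4, 4]) twice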

@zjykzj (Author) commented Jun 3, 2021

So the last ShuffleV2Block's channel shuffle operation can be ignored, right? @nmaac

@nmaac (Collaborator) commented Jun 3, 2021

The last block is followed by a fully connected layer, so the additional channel shuffle has no effect and can be omitted.
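
Why it has no effect (a small sketch, not code from this repo): a fixed channel permutation right before the classifier can be folded into the fully connected layer's weights, so keeping or dropping that last shuffle yields the same family of models.

    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    c, n_cls = 16, 10
    pooled = torch.randn(2, c)                         # globally pooled features entering the classifier
    perm = torch.randperm(c)                           # a fixed channel permutation, e.g. one last channel shuffle

    fc = nn.Linear(c, n_cls)
    out_shuffled = fc(pooled[:, perm])                 # shuffle channels, then apply the FC layer

    fc_absorbed = nn.Linear(c, n_cls)                  # same FC with the permutation folded into its weights
    with torch.no_grad():
        fc_absorbed.weight.copy_(fc.weight[:, torch.argsort(perm)])
        fc_absorbed.bias.copy_(fc.bias)
    out_plain = fc_absorbed(pooled)                    # no shuffle at all

    print(torch.allclose(out_shuffled, out_plain))     # True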

@zjykzj (Author) commented Jun 4, 2021

> The last block is followed by a fully connected layer, so the additional channel shuffle has no effect and can be omitted.

nice!!!
