A mismatch in shufflenetv1 about ReLU #53

Open · zjykzj opened this issue Jun 3, 2021 · 6 comments

@zjykzj commented Jun 3, 2021

Hi @nmaac @megvii-model, in ShuffleNetV1 there are two places where this implementation differs from the paper.

The first is the channel shuffle placed before/after the dwconv; this was already addressed in #16 #40.

The second is the usage of ReLU in shufflenetv1_unit.py, here:

        if self.stride == 1:
            # stride=1 unit: elementwise add, then ReLU on the sum
            return F.relu(x + x_proj)
        elif self.stride == 2:
            # stride=2 unit: AvgPool shortcut concatenated with the main branch;
            # ReLU is applied only to the main branch, not after the concat
            return torch.cat((self.branch_proj(x_proj), F.relu(x)), 1)

When stride=2, this differs from the paper's description, and it also differs from common usage. I want to know whether this is a better design or just a mistake. Looking forward to your reply.

@nmaac (Collaborator) commented Jun 3, 2021

The code is exactly the same as the paper (channel shuffle and relu). Please note that relu+avgpool+relu = relu+avgpool.
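
A quick way to check that identity (a minimal PyTorch sketch, not code from this repo): once a feature map has passed a ReLU it is non-negative, average pooling keeps it non-negative, so a second ReLU changes nothing.

    import torch
    import torch.nn.functional as F

    torch.manual_seed(0)
    y = F.relu(torch.randn(2, 16, 28, 28))                    # any post-ReLU feature map, so y >= 0
    pooled = F.avg_pool2d(y, kernel_size=3, stride=2, padding=1)

    # avgpool of a non-negative tensor is non-negative, so the extra ReLU is a no-op
    print(torch.equal(F.relu(pooled), pooled))                 # True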

@zjykzj (Author) commented Jun 3, 2021

> The code is exactly the same as the paper (channel shuffle and relu). Please note that relu+avgpool+relu = relu+avgpool.

Nice work!!! Let me briefly describe your implementation.

There are three stages in ShuffleNetV1, and the resolution downsampling is performed in the first block of each stage (using AvgPool on the shortcut branch).

For the first stage, the upstream operations are Conv2d -> BN -> ReLU -> MaxPool, so there is no need to apply an extra ReLU to the identity/shortcut map.

For the following stages, the upstream operation is a block like this:

        if self.stride == 1:
            return F.relu(x + x_proj)

Since ReLU + AvgPool = ReLU + AvgPool + ReLU, there is again no need to apply an extra activation to the identity/shortcut map.
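
Putting that together for the stride=2 unit (again only a sketch with made-up shapes, not the repo's module): because the block input x_proj is already non-negative, the paper's ReLU after the concat gives exactly the same result as applying ReLU only to the main branch, as the code above does.

    import torch
    import torch.nn.functional as F

    torch.manual_seed(0)
    x_proj = F.relu(torch.randn(2, 24, 28, 28))       # block input: already >= 0 (previous block ends in ReLU/MaxPool)
    x = torch.randn(2, 24, 14, 14)                     # main-branch output before the final activation

    def branch_proj(t):                                # stand-in for the AvgPool shortcut branch
        return F.avg_pool2d(t, 3, stride=2, padding=1)

    paper_style = F.relu(torch.cat((branch_proj(x_proj), x), 1))   # ReLU after the concat, as in the paper figure
    repo_style = torch.cat((branch_proj(x_proj), F.relu(x)), 1)    # ReLU only on the main branch, as in this repo

    print(torch.equal(paper_style, repo_style))        # True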

@zjykzj (Author) commented Jun 3, 2021

There is a similar trick in the ShuffleNetV2 implementation here. The paper describes the block like this:

[figure-3: ShuffleNetV2 block diagram from the paper]

When stride=1, the paper separates the feature maps with a channel split operation; in this implementation the same effect is achieved by the channel shuffle operation.
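
A minimal sketch of that trick (not necessarily the repo's exact code): a 2-group channel shuffle already interleaves the two branches, so simply taking the two channel halves of its output plays the role of the next unit's channel split.

    import torch

    def shuffle_then_split(x):
        # 2-group channel shuffle whose two output halves serve as the
        # channel split for the next unit, so no separate split op is needed
        b, c, h, w = x.shape
        assert c % 2 == 0
        x = x.reshape(b, 2, c // 2, h, w)              # view channels as 2 groups
        x = x.transpose(1, 2).reshape(b, c, h, w)      # interleave the groups (the shuffle)
        return x[:, :c // 2], x[:, c // 2:]            # two halves feed the two branches

    x_proj, x_main = shuffle_then_split(torch.randn(2, 8, 4, 4))
    print(x_proj.shape, x_main.shape)                  # torch.Size([2, 4, 4, 4]) twice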

@zjykzj (Author) commented Jun 3, 2021

So the last ShuffleV2Block's channel shuffle operation can be ignored, right? @nmaac

@nmaac (Collaborator) commented Jun 3, 2021

The last block is followed by a fully connected layer, so the additional channel shuffle has no effect and can be omitted.
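
Why it has no effect (a small sketch, not code from this repo): a fixed channel permutation right before the classifier can be folded into the fully connected layer's weights, so keeping or dropping that last shuffle yields the same family of models.

    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    c, n_cls = 16, 10
    pooled = torch.randn(2, c)                         # globally pooled features entering the classifier
    perm = torch.randperm(c)                           # a fixed channel permutation, e.g. one last channel shuffle

    fc = nn.Linear(c, n_cls)
    out_shuffled = fc(pooled[:, perm])                 # shuffle channels, then apply the FC layer

    fc_absorbed = nn.Linear(c, n_cls)                  # same FC with the permutation folded into its weights
    with torch.no_grad():
        fc_absorbed.weight.copy_(fc.weight[:, torch.argsort(perm)])
        fc_absorbed.bias.copy_(fc.bias)
    out_plain = fc_absorbed(pooled)                    # no shuffle at all

    print(torch.allclose(out_shuffled, out_plain))     # True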

@zjykzj (Author) commented Jun 4, 2021

> The last block is followed by a fully connected layer, so the additional channel shuffle has no effect and can be omitted.

nice!!!
