A mismatch in shufflenetv1 about ReLU #53
Comments
The code is exactly the same as the paper (channel shuffle and ReLU). Please note that relu + avgpool + relu = relu + avgpool.
Nice work!!! Let me briefly describe the implementation. There are three stages in ShuffleNetV1, and the resolution downsampling is performed in the first block of each stage (using AvgPool on the shortcut path). For the first stage, the upstream operation is Conv2d -> BN -> ReLU -> MaxPool, so there is no need to apply ReLU to the identity map. For the following stages, the upstream operation is a block like this,
so ReLU + AvgPool = ReLU + AvgPool + ReLU, and again there is no need for an extra activation on the identity map.
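A minimal sketch of that equality, assuming PyTorch and an arbitrary feature-map shape: ReLU output is non-negative, and average pooling of non-negative values stays non-negative, so an extra ReLU on the AvgPool shortcut is a no-op.

```python
import torch
import torch.nn.functional as F

x = torch.randn(2, 16, 32, 32)                       # hypothetical feature map
pre = F.relu(x)                                       # upstream ReLU already applied
pooled = F.avg_pool2d(pre, kernel_size=3, stride=2, padding=1)
extra = F.relu(pooled)                                # the "extra" ReLU on the identity map

# Average pooling of non-negative values is non-negative, so the extra ReLU changes nothing.
print(torch.equal(pooled, extra))                     # True: relu + avgpool + relu == relu + avgpool
```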
There is a similar trick in the ShuffleNetV2 implementation here. The paper describes the stride=1 block like this, separating the feature maps with a channel split,
so the last ShuffleV2Block's channel shuffle seems unnecessary?
The last block is followed by a fully connected layer, so the additional channel shuffle has no effect and can be omitted.
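To make this concrete, here is a small sketch assuming PyTorch; the `channel_shuffle` helper, layer sizes, and the global-average-pool step are illustrative assumptions, not the repository's actual code. It shows that a fixed channel permutation just before the classifier can be folded into the fully connected layer's weights, so it adds nothing.

```python
import torch
import torch.nn as nn

def channel_shuffle(x, groups):
    # illustrative ShuffleNet-style channel shuffle: reshape, transpose, flatten
    n, c, h, w = x.shape
    return x.view(n, groups, c // groups, h, w).transpose(1, 2).reshape(n, c, h, w)

x = torch.randn(4, 8, 7, 7)          # hypothetical last-block output
fc = nn.Linear(8, 10)                # hypothetical classifier

# Path 1: channel shuffle, global average pool, then the classifier.
y1 = fc(channel_shuffle(x, groups=2).mean(dim=(2, 3)))

# Path 2: skip the shuffle and fold the same permutation into the FC weights.
perm = channel_shuffle(torch.arange(8.).view(1, 8, 1, 1), groups=2).view(8).long()
inv = torch.empty_like(perm)
inv[perm] = torch.arange(8)          # inverse permutation
fc2 = nn.Linear(8, 10)
fc2.weight.data = fc.weight.data[:, inv]
fc2.bias.data = fc.bias.data.clone()
y2 = fc2(x.mean(dim=(2, 3)))

print(torch.allclose(y1, y2, atol=1e-6))  # True: the shuffle is absorbed by the classifier
```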
Nice!!!
Hi @nmaac @megvii-model, in ShuffleNetV1 there are two modifications in this implementation compared with the paper.
One is the channel shuffle before/after the dwconv; this was fixed in #16 #40.
The other is the usage of ReLU in shufflenetv1_unit.py here.
When stride=2, it differs from the paper's description, and it also differs from common usage. I want to know whether this is a better design or just a mistake. Looking forward to your reply.
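For readers following the thread, here is a hedged sketch, assuming PyTorch, of the two ReLU placements being contrasted for the stride=2 unit; the function names and tensor shapes are illustrative and this is not the actual shufflenetv1_unit.py code. As the maintainer's reply above notes, the two variants coincide whenever the shortcut input is already post-ReLU.

```python
import torch
import torch.nn.functional as F

def tail_relu_after_concat(shortcut_in, branch_out):
    # variant A: concatenate shortcut and branch, then one ReLU over the whole tensor
    shortcut = F.avg_pool2d(shortcut_in, kernel_size=3, stride=2, padding=1)
    return F.relu(torch.cat([shortcut, branch_out], dim=1))

def tail_relu_on_branch_only(shortcut_in, branch_out):
    # variant B: ReLU only on the main branch, AvgPool shortcut left untouched
    shortcut = F.avg_pool2d(shortcut_in, kernel_size=3, stride=2, padding=1)
    return torch.cat([shortcut, F.relu(branch_out)], dim=1)

# If the shortcut input is already non-negative (it follows an upstream ReLU),
# the two variants give identical outputs.
shortcut_in = F.relu(torch.randn(1, 8, 16, 16))   # post-ReLU upstream feature map
branch_out = torch.randn(1, 8, 8, 8)              # main-branch output before its final ReLU
print(torch.equal(tail_relu_after_concat(shortcut_in, branch_out),
                  tail_relu_on_branch_only(shortcut_in, branch_out)))  # True
```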