Hi Kuang,

I think a batch normalization layer is missing after the last residual block and before the pooling/fully-connected layer in the current pre-activation ResNet implementation.
Something like out = F.relu(self.bn(out), inplace=True) should be inserted there, where self.bn = nn.BatchNorm2d(512) should be defined in the __init__ function, as is done in other public implementations such as this one.
The current pre-activation ResNet cannot converge on CIFAR-10: I tried the SGD optimizer with an initial learning rate of 0.1 on PreActResNet18, and the loss goes to NaN within the first several iterations.
Fixing the code as proposed leads to good convergence.
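For concreteness, here is a minimal sketch of the change I have in mind, written against a typical kuangliu-style PreActResNet (the block, layer, and attribute names here are illustrative assumptions, not copied verbatim from this repo):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PreActBlock(nn.Module):
    """Pre-activation basic block: BN -> ReLU -> Conv, applied twice."""
    expansion = 1

    def __init__(self, in_planes, planes, stride=1):
        super().__init__()
        self.bn1 = nn.BatchNorm2d(in_planes)
        self.conv1 = nn.Conv2d(in_planes, planes, kernel_size=3,
                               stride=stride, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3,
                               stride=1, padding=1, bias=False)
        if stride != 1 or in_planes != self.expansion * planes:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_planes, self.expansion * planes,
                          kernel_size=1, stride=stride, bias=False))

    def forward(self, x):
        out = F.relu(self.bn1(x))
        shortcut = self.shortcut(out) if hasattr(self, 'shortcut') else x
        out = self.conv1(out)
        out = self.conv2(F.relu(self.bn2(out)))
        return out + shortcut


class PreActResNet(nn.Module):
    def __init__(self, block, num_blocks, num_classes=10):
        super().__init__()
        self.in_planes = 64
        self.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1,
                               padding=1, bias=False)
        self.layer1 = self._make_layer(block, 64, num_blocks[0], stride=1)
        self.layer2 = self._make_layer(block, 128, num_blocks[1], stride=2)
        self.layer3 = self._make_layer(block, 256, num_blocks[2], stride=2)
        self.layer4 = self._make_layer(block, 512, num_blocks[3], stride=2)
        # Proposed addition: final BN before pooling and the classifier
        # (512 for PreActResNet18, where block.expansion == 1).
        self.bn = nn.BatchNorm2d(512 * block.expansion)
        self.linear = nn.Linear(512 * block.expansion, num_classes)

    def _make_layer(self, block, planes, num_blocks, stride):
        strides = [stride] + [1] * (num_blocks - 1)
        layers = []
        for s in strides:
            layers.append(block(self.in_planes, planes, s))
            self.in_planes = planes * block.expansion
        return nn.Sequential(*layers)

    def forward(self, x):
        out = self.conv1(x)
        out = self.layer1(out)
        out = self.layer2(out)
        out = self.layer3(out)
        out = self.layer4(out)
        # Proposed addition: BN + ReLU after the last residual block.
        out = F.relu(self.bn(out), inplace=True)
        out = F.avg_pool2d(out, 4)
        out = out.view(out.size(0), -1)
        return self.linear(out)


def PreActResNet18():
    return PreActResNet(PreActBlock, [2, 2, 2, 2])


if __name__ == "__main__":
    net = PreActResNet18()
    print(net(torch.randn(2, 3, 32, 32)).shape)  # torch.Size([2, 10])
```

With this structure, the output of the last residual block is normalized and activated before global average pooling, matching the original pre-activation ResNet design.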
:-)
Thanks,
Haotao
moyix added a commit to moyix/pytorch-cifar that referenced this issue on Dec 8, 2021.
It may be because there are two consecutive batch normalization layers.
There is already a batch normalization layer (self.bn1) after self.conv1, and in self.layer1, self._make_layer() creates another batch normalization layer right after self.bn1. Two consecutive batch normalization layers can easily lead to NaN.
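A small, self-contained illustration of the structure described above (the stem names are hypothetical and not verified against the repo's exact code): a stem BatchNorm feeding directly into the first pre-activation block's BatchNorm, with no nonlinearity in between.

```python
import torch
import torch.nn as nn

# Hypothetical stem: Conv followed by a BatchNorm (self.bn1 in the comment above).
stem = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1, bias=False),
    nn.BatchNorm2d(64),
)
# In a pre-activation block, the first operation is another BatchNorm, so the
# stem's BN output is normalized again with nothing in between.
first_block_bn = nn.BatchNorm2d(64)

x = torch.randn(8, 3, 32, 32)
out = first_block_bn(stem(x))  # two consecutive BatchNorm layers
print(out.shape)  # torch.Size([8, 64, 32, 32])
```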