Added a flag to allow skipping the first projection in small ResNets #11176
Description
This should fix #10583
In that issue, I described how the small ResNets implemented here have an extra convolution (for projection) that is present neither in the original paper nor in PyTorch.
This PR makes it possible to remove this extra convolution via a keyword argument that defaults to the previous behaviour, so the change remains non-breaking.
This PR was created from #10584 because that PR had problems being merged into the codebase with the manual copybara sync.
Type of change
For a new feature or function, please create an issue first to discuss it
with us before submitting a pull request.
Note: I didn't wait for a discussion because this seemed like a relatively small and simple change,
so I don't mind having written it even if it is rejected in the end.
Tests
I added tests to check the parameter count.
I also checked offline that the number of non-trainable parameters matches PyTorch's.
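For reference, the kind of parameter-count comparison involved can be sketched in plain Python. This is only an illustration under stated assumptions, not the test code from this PR: it assumes a ResNet-18 layout with bias-free convolutions, counts only trainable weights, and uses a hypothetical argument name `project_first_block` that is not the actual flag added here.

```python
# Sketch: trainable parameter count of a ResNet-18-style network,
# with and without a 1x1 projection in the first block of the first stage.
# Assumptions: bias-free convolutions, batch norm with scale and offset,
# 1000-class dense head. `project_first_block` is a hypothetical name.

def conv_params(kernel, c_in, c_out):
    # Bias-free convolution: kernel * kernel * c_in * c_out weights.
    return kernel * kernel * c_in * c_out

def bn_trainable_params(channels):
    # Scale (gamma) and offset (beta); moving statistics are non-trainable.
    return 2 * channels

def basic_block_params(c_in, c_out, use_projection):
    n = conv_params(3, c_in, c_out) + bn_trainable_params(c_out)
    n += conv_params(3, c_out, c_out) + bn_trainable_params(c_out)
    if use_projection:
        # 1x1 convolution plus batch norm on the shortcut branch.
        n += conv_params(1, c_in, c_out) + bn_trainable_params(c_out)
    return n

def resnet18_trainable_params(project_first_block, num_classes=1000):
    n = conv_params(7, 3, 64) + bn_trainable_params(64)  # stem
    c_in = 64
    for stage, c_out in enumerate([64, 128, 256, 512]):
        for block in range(2):
            if block == 0:
                # Later stages change the channel count, so they always project.
                use_projection = project_first_block if stage == 0 else True
            else:
                use_projection = False
            n += basic_block_params(c_in, c_out, use_projection)
            c_in = c_out
    n += 512 * num_classes + num_classes  # dense head with bias
    return n

without_projection = resnet18_trainable_params(project_first_block=False)
with_projection = resnet18_trainable_params(project_first_block=True)
print(without_projection)                    # 11689512
print(with_projection - without_projection)  # 4224
```

Under these assumptions, the count without the first projection equals the 11,689,512 parameters of PyTorch's ResNet-18, and the extra projection accounts for 4,096 convolution weights plus 128 batch norm weights.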
Checklist