Question on Default init_values in SwinTransformerV2CrBlock #2233

@zhaohm14 The norm sits at the end of the residual path, so its weight is the last scaling applied before the branch is merged with the shortcut. That makes it similar to layer-scale, skip-init, and ResNet zero-init BN, all of which scale the residual by a single scalar or one value per channel and typically start at 0 or a very small value.
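
To make the point concrete, here is a minimal framework-free sketch (plain Python rather than the actual timm code) of a residual block whose branch ends in a norm. The names `layer_norm`, `branch`, and `res_post_norm_block` are hypothetical, and `branch` is just a stand-in for the attention or MLP sub-layer. It only illustrates that a norm weight of 0 makes the block start out as an identity mapping:

```python
import math

def layer_norm(v, weight, bias, eps=1e-5):
    """Normalize v to zero mean / unit variance, then apply a per-channel affine."""
    mean = sum(v) / len(v)
    var = sum((a - mean) ** 2 for a in v) / len(v)
    return [w * (a - mean) / math.sqrt(var + eps) + b
            for a, w, b in zip(v, weight, bias)]

def branch(v):
    """Hypothetical stand-in for the attention/MLP sub-layer."""
    return [3.0 * a + 1.0 for a in v]

def res_post_norm_block(x, init_value):
    """Compute x + norm(branch(x)) with the norm weight set to init_value."""
    dim = len(x)
    weight = [init_value] * dim  # plays the role of the norm's learnable weight
    bias = [0.0] * dim           # norm bias starts at zero
    residual = layer_norm(branch(x), weight, bias)
    return [a + r for a, r in zip(x, residual)]

x = [0.5, -1.0, 2.0, 4.0]
out_zero = res_post_norm_block(x, init_value=0.0)  # residual is fully gated off
out_one = res_post_norm_block(x, init_value=1.0)   # residual has unit scale

print(out_zero == x)  # the block is an identity map at init
print(out_one == x)   # the branch changes the output
```

With `init_value=0.0` the output equals the shortcut input, which is the same starting condition that layer-scale, skip-init, and zero-init BN aim for; the weight then grows during training as the branch becomes useful.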

Answer selected by zhaohm14