Avoiding accidental errors with sanity checks #5053
AkshitaB started this conversation in Show and tell
Replies: 1 comment · 12 replies
- The sanity check fails for me when using ... However, I don't see this argument anywhere? When I search the entire repo for it, I only see it appear in a commit message and this guide. And I don't see it where I would expect (https://github.com/allenai/allennlp/blob/main/allennlp/training/trainer.py)
Developing neural models nowadays no longer involves just writing your own models in PyTorch. Often, one leverages existing modules and builds new architectures on top of existing ones. Thus, it is all the more crucial to sanity check your code to avoid accidental issues.
This post describes an architecture-based sanity check for model development in AllenNLP: ensuring that normalization layers are not combined with layers containing bias. We added this tool to the library to help users make sure that their models are robust. We also discovered such unintended layer combinations when sanity checking our own models! Read on to find out more.
Why should normalization layers not be combined with layers containing bias?
Batch normalization is a technique for normalizing the inputs to each layer in the network over a batch, so that the distribution of each layer's inputs stays stable during training. This helps speed up training, as one can use higher learning rates.
To put it simply, BatchNorm standardizes the inputs (transforms them to have a mean of 0 and a variance of 1), and then rescales and offsets them using the learned parameters gamma and beta. The paper also notes that a bias term in the layer preceding the normalization can be ignored, since its effect is cancelled by the mean subtraction and its role is subsumed by beta.
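In symbols (with batch mean $\mu_B$, batch variance $\sigma_B^2$, a small constant $\epsilon$, and learned parameters $\gamma$ and $\beta$), the transformation and the reason the preceding bias is redundant can be written as:

$$
\mathrm{BN}(z) = \gamma\,\frac{z - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}} + \beta,
\qquad
\mathrm{BN}(Wu + b) = \mathrm{BN}(Wu),
$$

since adding the constant $b$ shifts $z$ and $\mu_B$ by the same amount (and leaves $\sigma_B^2$ unchanged), so it cancels in the numerator.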
That is, the batch normalization layer already contains a bias term. Thus, it is unnecessary to include bias in the layer that precedes it. The bias term, if present, will be unused (as it will not receive any gradients).
Thus, it can be useful in some situations to reduce the extra parameters and computations by removing the bias terms in the preceding layer.
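To see this in practice, here is a minimal PyTorch sketch (the layer sizes, batch size, and loss below are arbitrary choices for illustration):

```python
import torch

torch.manual_seed(0)
linear = torch.nn.Linear(4, 8)   # bias=True by default
norm = torch.nn.BatchNorm1d(8)   # batch normalization right after the biased layer

x = torch.randn(16, 4)
target = torch.randn(16, 8)
loss = torch.nn.functional.mse_loss(norm(linear(x)), target)
loss.backward()

print(linear.weight.grad.abs().max())  # clearly non-zero: the weights still receive gradients
print(linear.bias.grad.abs().max())    # ~0 (floating-point noise): the mean subtraction cancels the bias
```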
Need for automatically detecting normalization-bias layer combinations
For convolution and linear layers in torch, the bias term can be excluded simply by setting `bias=False`, for instance: `linear = torch.nn.Linear(inp, out, bias=False)`.
However, when developing slightly more complex models that use pre-implemented modules, it can be hard to keep track of the computation graph. One can inadvertently add a normalization layer without excluding the bias term. Thus, it is helpful to have tools that detect such combinations.
Sanity check in AllenNLP: `NormalizationBiasVerification`
One can check for normalization-bias combinations during model development in AllenNLP.
Consider a toy model in which a layer with a bias term feeds directly into a normalization layer.
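A minimal sketch of such a model, together with an assumed way of running the verification on it (the `ToyModel` name and layer sizes are illustrative, and the exact import path and `check` call of `NormalizationBiasVerification` are assumptions that may differ across AllenNLP versions):

```python
import torch
# Import path is an assumption; in some AllenNLP versions the class lives
# under allennlp.confidence_checks rather than allennlp.sanity_checks.
from allennlp.sanity_checks.normalization_bias_verification import NormalizationBiasVerification

class ToyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(4, 8)    # bias=True by default
        self.norm = torch.nn.BatchNorm1d(8)    # normalization directly after a biased layer

    def forward(self, x):
        return self.norm(self.linear(x))

model = ToyModel()
verification = NormalizationBiasVerification(model)
# Running the check on sample inputs triggers the forward hooks and flags the
# (linear.bias -> norm) combination. The exact call signature is an assumption.
verification.check(torch.randn(16, 4))
```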
Running the check flags the combination, reporting the pair formed by the linear layer's bias and the batch normalization layer that follows it.
How it works
The `NormalizationBiasVerification` registers forward hooks on the model's modules at the time of initialization. The "check" involves running the forward method on sample inputs. The hooks create a computation graph, which is used to determine if any layers containing bias are followed by any normalization layers.
An actual example
The following example is from one of our QaNet models for reading comprehension. It highlights how such layer combinations can go unnoticed.
As can be seen, this model builds on `Seq2SeqEncoder` objects. At first glance, it is not immediately obvious what the computation graph will look like. Here is what the verification reports: in instantiating the model with `Seq2SeqEncoder` objects, we discover that the final computation graph contains a normalization layer right after `_encoding_proj_layer` and `_modeling_proj_layer`.

AllenNLP models
We found that the following models in AllenNLP failed the sanity check: those involving `BertPredictionHeadTransform`. We find that the extra bias term in these models can be safely ignored.

Note
This verification runs by default in AllenNLP's `GradientDescentTrainer` when training. It can be turned off by setting `run_sanity_checks` to `False` (in AllenNLP v2.2.0, you need to set `enable_default_callbacks` to `False`).

References
Ioffe, S., & Szegedy, C. (2015). Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. ArXiv, abs/1502.03167.