Avoiding accidental errors with sanity checks #5053
AkshitaB started this conversation in Show and tell
Replies: 1 comment · 12 replies
- The sanity check fails for me when using ... However, I don't see this argument anywhere? When I search the entire repo for it, I only see it appear in a commit message and this guide. And I don't see it where I would expect (https://github.com/allenai/allennlp/blob/main/allennlp/training/trainer.py)
Developing neural models nowadays no longer involves just writing your own models in PyTorch. Often, one leverages existing modules and builds new architectures on top of existing ones. Thus, it is all the more crucial to sanity check your code to avoid accidental issues.
This post describes an architecture-based sanity check for model development in AllenNLP: ensuring that normalization layers are not combined with layers containing bias. We added this tool to the library to help users make sure that their models are robust. We also discovered such unintended layer combinations when sanity checking our own models! Read on to find out more.
Why should normalization layers not be combined with layers containing bias?
Batch normalization is a technique for normalizing the inputs to each layer in the network over a batch, so that the distribution of each layer's inputs stays stable during training. This helps speed up training, as one can use higher learning rates.
To put it simply, BatchNorm standardizes the inputs (transforms them to have a mean of 0 and a variance of 1), and then rescales and offsets them using the learned parameters gamma and beta. The paper also notes that a bias term in the layer preceding the normalization can be ignored, since its effect is cancelled by the mean subtraction and its role is subsumed by beta.
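In symbols (with batch mean $\mu_B$, batch variance $\sigma_B^2$, a small constant $\epsilon$, and learned parameters $\gamma$ and $\beta$), the transformation and the reason the preceding bias is redundant can be written as:

$$
\mathrm{BN}(z) = \gamma\,\frac{z - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}} + \beta,
\qquad
\mathrm{BN}(Wu + b) = \mathrm{BN}(Wu),
$$

since adding the constant $b$ shifts $z$ and $\mu_B$ by the same amount (and leaves $\sigma_B^2$ unchanged), so it cancels in the numerator.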
That is, the batch normalization layer already contains a bias term. Thus, it is unnecessary to include bias in the layer that precedes it. The bias term, if present, will be unused (as it will not receive any gradients).
Thus, it can be useful in some situations to reduce the extra parameters and computations by removing the bias terms in the preceding layer.
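To see this in practice, here is a minimal PyTorch sketch (the layer sizes, batch size, and loss below are arbitrary choices for illustration):

```python
import torch

torch.manual_seed(0)
linear = torch.nn.Linear(4, 8)   # bias=True by default
norm = torch.nn.BatchNorm1d(8)   # batch normalization right after the biased layer

x = torch.randn(16, 4)
target = torch.randn(16, 8)
loss = torch.nn.functional.mse_loss(norm(linear(x)), target)
loss.backward()

print(linear.weight.grad.abs().max())  # clearly non-zero: the weights still receive gradients
print(linear.bias.grad.abs().max())    # ~0 (floating-point noise): the mean subtraction cancels the bias
```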
Need for automatically detecting normalization-bias layer combinations
For convolution and linear layers in torch, the bias term can be excluded simply by setting `bias=False`, for instance: `linear = torch.nn.Linear(inp, out, bias=False)`.
However, when developing slightly more complex models that use pre-implemented modules, it can be hard to keep track of the computation graph. One can inadvertently add a normalization layer without excluding the bias term. Thus, it is helpful to have tools that detect such combinations.
Sanity check in AllenNLP: `NormalizationBiasVerification`
One can check for normalization-bias combinations during model development in AllenNLP.
Consider a toy model in which a layer with a bias term feeds directly into a normalization layer.
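A minimal sketch of such a model, together with an assumed way of running the verification on it (the `ToyModel` name and layer sizes are illustrative, and the exact import path and `check` call of `NormalizationBiasVerification` are assumptions that may differ across AllenNLP versions):

```python
import torch
# Import path is an assumption; in some AllenNLP versions the class lives
# under allennlp.confidence_checks rather than allennlp.sanity_checks.
from allennlp.sanity_checks.normalization_bias_verification import NormalizationBiasVerification

class ToyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(4, 8)    # bias=True by default
        self.norm = torch.nn.BatchNorm1d(8)    # normalization directly after a biased layer

    def forward(self, x):
        return self.norm(self.linear(x))

model = ToyModel()
verification = NormalizationBiasVerification(model)
# Running the check on sample inputs triggers the forward hooks and flags the
# (linear.bias -> norm) combination. The exact call signature is an assumption.
verification.check(torch.randn(16, 4))
```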
Running the check flags the combination, reporting the pair formed by the linear layer's bias and the batch normalization layer that follows it.
How it works
The `NormalizationBiasVerification` registers forward hooks on the model's modules at the time of initialization. The "check" involves running the forward method on sample inputs. The hooks create a computation graph, which is used to determine if any layers containing bias are followed by any normalization layers.
An actual example
The following example is from one of our QaNet models for reading comprehension. It highlights how such layer combinations can go unnoticed.
As can be seen, this model builds on `Seq2SeqEncoder` objects. At first glance, it is not immediately obvious what the computation graph will look like. Here is what the verification reports: in instantiating the model with `Seq2SeqEncoder` objects, we discover that the final computation graph contains a normalization layer right after `_encoding_proj_layer` and `_modeling_proj_layer`.

AllenNLP models
We found that the following models in AllenNLP failed the sanity check: those involving `BertPredictionHeadTransform`. We find that the extra bias term in these models can be safely ignored.

Note
This verification runs by default in AllenNLP's `GradientDescentTrainer` when training. It can be turned off by setting `run_sanity_checks` to `False` (in AllenNLP v2.2.0, you need to set `enable_default_callbacks` to `False`).

References
Ioffe, S., & Szegedy, C. (2015). Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. ArXiv, abs/1502.03167.