We present the code and instructions to reproduce the neural machine translation experiments from our NeurIPS 2022 Spotlight paper "Understanding the Failure of Batch Normalization for Transformers in NLP".
For the other tasks in the paper (language modeling, named entity recognition, and text classification), you can reproduce the corresponding results by modifying the normalization module in the same way. Due to licensing reasons, we do not include those codebases here. We are still adding new features.
The code is based on fairseq (v0.9.0).
The BN/RBN module is located at fairseq/modules/norm/mask_batchnorm3d.py.
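If you want to adapt BN/RBN to another task, a simple way to find where this module is imported and selected is to search the codebase; the search term below only assumes the file name shown above, and the actual call sites may look different:
grep -rn "mask_batchnorm3d" fairseq/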
Install PyTorch (we use Python 3.6 and PyTorch 1.7.1; higher versions of Python and PyTorch should also work):
conda create -n rbn python=3.6
conda activate rbn
conda install pytorch==1.7.1 torchvision torchaudio cudatoolkit=11.0 -c pytorch
(or: pip install torch==1.7.1+cu110 torchvision==0.8.2+cu110 torchaudio==0.7.2 -f https://download.pytorch.org/whl/torch_stable.html)
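As an optional sanity check (not part of the original instructions), you can verify that PyTorch is installed and can see your GPU:
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"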
Install fairseq by:
cd RegularizedBN
pip install --editable ./
Install other requirements:
pip install -r requirements.txt
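To confirm that the editable install picked up this repository, you can optionally check which fairseq is being imported:
python -c "import fairseq; print(fairseq.__version__, fairseq.__file__)"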
Download the data from Google Drive and extract it into data-bin. You can also download it from Baidu Netdisk.
cd data-bin
unzip iwslt14.tokenized.de-en.zip
cd ..
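After extraction, you can list the data directory to make sure it unpacked correctly; the exact file names may vary, but a fairseq data-bin directory typically contains dictionary files (dict.de.txt, dict.en.txt) and binarized .bin/.idx files for the train/valid/test splits:
ls data-bin/iwslt14.tokenized.de-en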
Train the model (8 GB of GPU memory is enough):
chmod +x ./iwslt14_bash/train-iwslt14-pre-max-epoch.sh ./iwslt14_bash/train-iwslt14-post-max-epoch.sh
For Pre-Norm Transformer:
BN:
CUDA_VISIBLE_DEVICES=0 ./iwslt14_bash/train-iwslt14-pre-max-epoch.sh batch_1_1
RBN:
CUDA_VISIBLE_DEVICES=1 ./iwslt14_bash/train-iwslt14-pre-max-epoch.sh batch_diff_0.1_0.01
LN:
CUDA_VISIBLE_DEVICES=2 ./iwslt14_bash/train-iwslt14-pre-max-epoch.sh layer_1
For Post-Norm Transformer:
BN:
CUDA_VISIBLE_DEVICES=0 ./iwslt14_bash/train-iwslt14-post-max-epoch.sh batch_1_1
RBN:
CUDA_VISIBLE_DEVICES=1 ./iwslt14_bash/train-iwslt14-post-max-epoch.sh batch_diff_60_0
LN:
CUDA_VISIBLE_DEVICES=2 ./iwslt14_bash/train-iwslt14-post-max-epoch.sh layer_1
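After training, BLEU can be computed with the standard fairseq generation command; the checkpoint path below is hypothetical (it depends on where the training scripts save their checkpoints), and the data directory name assumes the extraction step above, so adjust both as needed:
CUDA_VISIBLE_DEVICES=0 fairseq-generate data-bin/iwslt14.tokenized.de-en --path checkpoints/checkpoint_best.pt --batch-size 128 --beam 5 --remove-bpe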