A TensorFlow 2.0 implementation of Adapters for NLP, as described in the paper "Parameter-Efficient Transfer Learning for NLP", built on top of HuggingFace's Transformers.
Houlsby et al. (2019) introduced adapters as an alternative approach to adaptation for transfer learning in NLP with deep transformer-based architectures. Adapters are task-specific neural modules inserted between layers of a pre-trained network. After copying the weights from a pre-trained network, the pre-trained weights are frozen and only the adapters are trained.
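For illustration only, here is a minimal sketch of such an adapter module as a Keras layer (the default bottleneck size, the initializer, and the `gelu` activation string are assumptions; `gelu` is only available as a Keras activation in more recent TensorFlow releases):

```python
import tensorflow as tf

class Adapter(tf.keras.layers.Layer):
    """Bottleneck adapter: down-project, non-linearity, up-project, plus a residual connection."""

    def __init__(self, bottleneck_size=64, non_linearity="gelu", **kwargs):
        super().__init__(**kwargs)
        self.bottleneck_size = bottleneck_size
        self.non_linearity = tf.keras.activations.get(non_linearity)

    def build(self, input_shape):
        hidden_size = int(input_shape[-1])
        # Near-zero initialization keeps the adapter close to an identity function
        # at the start of training, so inserting it does not perturb the pre-trained network.
        init = tf.keras.initializers.TruncatedNormal(stddev=1e-3)
        self.down_project = tf.keras.layers.Dense(
            self.bottleneck_size, kernel_initializer=init, name="down_project")
        self.up_project = tf.keras.layers.Dense(
            hidden_size, kernel_initializer=init, name="up_project")
        super().build(input_shape)

    def call(self, hidden_states):
        x = self.down_project(hidden_states)
        x = self.non_linearity(x)
        x = self.up_project(x)
        return hidden_states + x  # residual connection around the bottleneck
```

In the configuration studied by Houlsby et al., one such module is inserted after both the self-attention and the feed-forward sub-layers of each transformer block.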
Adapters provide several benefits over full fine-tuning and over other approaches that yield compact models, such as multi-task learning:
- A lightweight alternative to full fine-tuning that trains only a small number of parameters per task without sacrificing performance (see the sketch after this list).
- A high degree of parameter sharing between downstream tasks, since the original network parameters remain frozen.
- Unlike multi-task learning, which requires simultaneous access to all tasks, adapters allow training on downstream tasks sequentially: adding a new task does not require complete joint retraining, and there is no need to weigh losses or balance training-set sizes.
- Adapters for each task are trained separately, so the model does not forget how to perform previous tasks (avoiding catastrophic forgetting).
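As a rough illustration of the frozen-backbone setup, the sketch below selects only adapter, layer-norm, and classification-head variables for gradient updates. The variable-name filtering and the `compute_loss` helper are assumptions for illustration, not this repository's API:

```python
import tensorflow as tf

def select_adapter_variables(model):
    """Pick only adapter, layer-norm, and classifier-head variables for training."""
    trainable = [
        v for v in model.trainable_variables
        if any(key in v.name for key in ("adapter", "layer_norm", "classifier"))
    ]
    frozen = len(model.trainable_variables) - len(trainable)
    print(f"Updating {len(trainable)} variables, keeping {frozen} frozen.")
    return trainable

# In a custom training loop, gradients are applied only to the selected variables:
# with tf.GradientTape() as tape:
#     loss = compute_loss(model, batch)          # hypothetical helper
# variables = select_adapter_variables(model)
# grads = tape.gradient(loss, variables)
# optimizer.apply_gradients(zip(grads, variables))
```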
Learn more in the paper "Parameter-Efficient Transfer Learning for NLP".
An example of training adapters in BERT's encoder layers on the MRPC classification task:
```bash
pip install transformers

python run_tf_glue_adapter_bert.py \
  --casing bert-base-uncased \
  --bottleneck_size 64 \
  --non_linearity gelu \
  --task mrpc \
  --batch_size 32 \
  --epochs 10 \
  --max_seq_length 128 \
  --learning_rate 3e-4 \
  --warmup_ratio 0.1 \
  --saved_models_dir "saved_models"
```