This is the source code of "On Transferability of Prompt Tuning for Natural Language Processing", an NAACL 2022 paper [pdf].
Prompt tuning (PT) is a promising parameter-efficient method to utilize extremely large pre-trained language models (PLMs), which can achieve comparable performance to full-parameter fine-tuning by only tuning a few soft prompts. However, PT requires much more training time than fine-tuning. Intuitively, knowledge transfer can help to improve the efficiency. To explore whether we can improve PT via prompt transfer, we empirically investigate the transferability of soft prompts across different downstream tasks and PLMs in this work. We find that (1) in zero-shot setting, trained soft prompts can effectively transfer to similar tasks on the same PLM and also to other PLMs with a cross-model projector trained on similar tasks; (2) when used as initialization, trained soft prompts of similar tasks and projected prompts of other PLMs can significantly accelerate training and also improve the performance of PT. Moreover, to explore what decides prompt transferability, we investigate various transferability indicators and find that the overlapping rate of activated neurons strongly reflects the transferability, which suggests how the prompts stimulate PLMs is essential. Our findings show that prompt transfer is promising for improving PT, and further research shall focus more on prompts' stimulation to PLMs.
- Prompt-Transferability-1.0 provides the original code and details to reproduce the results in the paper.
- Prompt-Transferability-2.0-latest refactors Prompt-Transferability-1.0 and provides more user-friendly code. This README.md mainly demonstrates the usage of Prompt-Transferability-2.0-latest.
- python==3.8.0
We recommend creating a new Anaconda environment to manage the required packages.
conda create -n prompt_transfer python=3.8.0
conda activate prompt_transfer
bash requirements.sh
Note: If the system reports an error about torch, please install the version that matches your CPU or GPU.
Users can also create the environment directly from Prompt-Transferability-2.0-latest/environment.yml:
conda env create -f environment.yml
Execute Prompt-Transferability-2.0-latest/download_data.sh to download the datasets.
cd Prompt-Transferability-2.0-latest
bash download_data.sh
You can use PromptHub for various purposes, including prompt training, evaluation, cross-task transfer, cross-model transfer, and activated neuron analysis. The Colab notebook and the example script also demonstrate the usage. Alternatively, you can run the bash file for a quick example.
cd Prompt-Transferability-2.0-latest
bash example/train.sh
The content of train.sh is:
DATASET=sst2
LEARNINGRATE=1e-2
EPOCH=3
python example/train.py \
--output_dir outputs \
--dataset $DATASET \
--learning_rate $LEARNINGRATE \
--num_train_epochs $EPOCH \
--save_total_limit 1 \
--evaluation_strategy epoch \
--save_strategy epoch \
--load_best_model_at_end true \
--metric_for_best_model combined_score
The train.py script called above shows an example that includes prompt training, evaluation, and activated neuron analysis on a specific dataset.
from prompt_hub.hub import PromptHub
from prompt_hub.training_args import PromptTrainingArguments
OUTPUT = "outputs"
DATASET_1 = "sst2"
DATASET_2 = "rotten_tomatoes"
MODEL = "roberta-base"
PROMPTLEN = 100
LEARNINGRATE = 1e-2
# Training config
args = PromptTrainingArguments(
    output_dir=OUTPUT,
    dataset=DATASET_1,
    backbone=MODEL,
    prompt_len=PROMPTLEN,
    learning_rate=LEARNINGRATE,
)
trainer = PromptHub(args=args)
# Prompt training and evaluation
trainer.train_prompt()
trainer.eval_prompt()
# Cross-task evaluation
cross_task_eval_results = trainer.cross_task_eval(MODEL, DATASET_1, DATASET_2)
# Activated neuron
activated_neuron_before_relu, activated_neuron_after_relu = trainer.activated_neuron(args.backbone, args.dataset)
- OUTPUT: Output directory.
- DATASET_1: Source (training) dataset. This framework supports all datasets here. We select sst2 as the example.
- DATASET_2: Target dataset. This framework supports all datasets here. We select rotten_tomatoes as the example.
- MODEL: Backbone model. This framework currently supports Bert, Roberta, GPT, and T5 v1.1. We select roberta-base as the example.
- PROMPTLEN: The length of the prompt. We set 100 here.
- LEARNINGRATE: Learning rate. We set 1e-2 here.
Users can use the well-trained prompts in Prompt-Transferability-2.0-latest/task_prompt_emb or re-train the prompts on their own by following the instructions below.
We first need to define a set of arguments or configurations, including backbone (backbone model), dataset, prompt_len (the length of the soft prompt), etc. Then we instantiate a PromptHub object and pass it these arguments.
from prompt_hub.hub import PromptHub
from prompt_hub.training_args import PromptTrainingArguments
# Training config
OUTPUT = "outputs"
DATASET_1 = "sst2"
DATASET_2 = "rotten_tomatoes"
MODEL = "roberta-base"
PROMPTLEN = 100
LEARNINGRATE = 1e-2
args = PromptTrainingArguments(
    output_dir=OUTPUT,
    dataset=DATASET_1,
    backbone=MODEL,
    prompt_len=PROMPTLEN,
    learning_rate=LEARNINGRATE,
)
trainer = PromptHub(args=args)
For a complete list of arguments, please refer to Prompt-Transferability-2.0-latest/prompt_hub/training_args.py and HuggingFace transformers.training_arguments.
Then we start training a soft prompt. Optionally, you can pass a backbone and a dataset to train_prompt to overwrite the defaults set in the arguments above.
# Optional arguments to overwrite default parameters
# trainer.train_prompt('roberta-large', 'sst2')
trainer.train_prompt()
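As a rough illustration of what is being trained here, the sketch below shows the general idea of prompt tuning: a small set of trainable vectors is prepended to the token embeddings while the backbone stays frozen. This is a minimal sketch, not the code inside PromptHub; the function name prepend_soft_prompt and the toy sizes are hypothetical.

```python
from typing import List

Vector = List[float]


def prepend_soft_prompt(soft_prompt: List[Vector], token_embeddings: List[Vector]) -> List[Vector]:
    """Return the sequence a frozen backbone would see: prompt vectors, then token embeddings."""
    hidden = len(soft_prompt[0])
    for vec in soft_prompt + token_embeddings:
        if len(vec) != hidden:
            raise ValueError("all vectors must share the backbone's hidden size")
    # List concatenation builds a new list; the inputs are left unchanged.
    return soft_prompt + token_embeddings


# Toy sizes (hypothetical): hidden size 4, prompt length 3, 2 input tokens.
soft_prompt = [[0.1] * 4 for _ in range(3)]       # the only trainable parameters
token_embeddings = [[1.0] * 4 for _ in range(2)]  # output of the frozen embedding layer
sequence = prepend_soft_prompt(soft_prompt, token_embeddings)
print(len(sequence))  # 3 prompt positions + 2 token positions
```

With prompt_len set to 100 as in the configuration above, the trainable part would be 100 such vectors of the backbone's hidden size.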
With the prompt (trained on a specific dataset with a given backbone model), we execute the following code to evaluate its performance.
# Optional arguments to overwrite default parameters
# eval_results = trainer.eval_prompt('roberta-base', 'sst2')
eval_results = trainer.eval_prompt()
Prompts can directly transfer among tasks. Here, we provide an example that transfers the prompt trained on the SST2 dataset to the Rotten Tomatoes dataset.
cross_task_eval_results = trainer.cross_task_eval('roberta-base', 'sst2', 'rotten_tomatoes')
Prompts can use a well-trained projector to transfer among different backbones.
We first use the SST2 dataset to train a projector that transfers prompts from roberta-base to roberta-large.
trainer.cross_model_train(source_model='roberta-base', target_model='roberta-large', task='sst2')
Then, we utilize it to transfer the prompt among different models.
cross_model_eval_results = trainer.cross_model_eval(source_model='roberta-base', target_model='roberta-large', task='sst2')
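The README does not describe the projector's architecture, so the following is only a sketch under stated assumptions: it treats the projector as a small two-layer perceptron with a ReLU in between, applied to each prompt token to map it from the source hidden size to the target hidden size. All names, the activation choice, and the toy dimensions are hypothetical.

```python
import random
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (n x k) and b (k x m)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add_bias(m: Matrix, bias: List[float]) -> Matrix:
    return [[v + b for v, b in zip(row, bias)] for row in m]


def relu(m: Matrix) -> Matrix:
    return [[max(0.0, v) for v in row] for row in m]


def project_prompt(prompt: Matrix, w1: Matrix, b1: List[float], w2: Matrix, b2: List[float]) -> Matrix:
    """Map each prompt token from the source hidden size to the target hidden size."""
    hidden = relu(add_bias(matmul(prompt, w1), b1))
    return add_bias(matmul(hidden, w2), b2)


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


# Toy sizes (assumed): prompt length 3, source dim 4, projector hidden dim 6, target dim 8.
rng = random.Random(0)
source_prompt = random_matrix(3, 4, rng)
w1, b1 = random_matrix(4, 6, rng), [0.0] * 6
w2, b2 = random_matrix(6, 8, rng), [0.0] * 8
projected = project_prompt(source_prompt, w1, b1, w2, b2)
print(len(projected), len(projected[0]))  # prompt length is kept, width becomes the target dim
```

For roberta-base to roberta-large, the source and target widths would be 768 and 1024; the weights would be learned rather than random.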
Prompts can be seen as a way to stimulate the artificial neurons of PLMs so that the knowledge they hold is used to perform downstream tasks. We further observe that similar prompts activate similar neurons; thus, the activated neurons can serve as a transferability indicator.
Definition of neurons: the output values between the 1st and 2nd layers of the feed-forward network (FFN) in every layer of a PLM (refer to Section 6.1 in the paper).
Given a model and the trained task-specific prompt, you can obtain the values of the activated neurons.
activated_neuron_before_relu, activated_neuron_after_relu = trainer.activated_neuron('roberta-base', 'sst2')
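To make the neuron definition concrete, the sketch below computes the values between the two FFN layers for a single toy input, both before and after a ReLU, mirroring the before_relu / after_relu naming of the return values. It is an illustration only, with hypothetical weights and a hypothetical function name; it does not show how PromptHub extracts these values from a real backbone.

```python
from typing import List, Tuple


def ffn_first_layer(
    x: List[float], w1: List[List[float]], b1: List[float]
) -> Tuple[List[float], List[float]]:
    """Return the first FFN layer's output before and after ReLU.

    x has length d_in, w1 has shape d_in x d_ff, b1 has length d_ff.
    """
    pre = [sum(xi * wi for xi, wi in zip(x, col)) + b for col, b in zip(zip(*w1), b1)]
    post = [max(0.0, v) for v in pre]
    return pre, post


# Toy input and weights (hypothetical): d_in = 2, d_ff = 3.
x = [1.0, -2.0]
w1 = [[0.5, -1.0, 2.0],
      [0.25, 0.5, 1.0]]
b1 = [0.0, 0.0, 0.5]

pre, post = ffn_first_layer(x, w1, b1)
activated = [i for i, v in enumerate(post) if v > 0]  # neurons with a positive output
print(pre, post, activated)
```

In a real PLM, one such vector exists for every layer, so the per-layer values are typically concatenated before prompts are compared.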
You can calculate the similarity/transferability between two prompts via their activated neurons.
cos_sim = trainer.neuron_similarity(backbone='roberta-base', task1='sst2', task2='rotten_tomatoes')
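As an illustration of what such a similarity could measure, the sketch below compares two toy activation vectors in two ways: cosine similarity on the raw values, and an overlap rate computed as cosine similarity between binary masks that mark which neurons are positive. Treating the overlap rate this way is an assumption made for the example; the exact computation inside neuron_similarity is not described in this README, and the helper names are hypothetical.

```python
import math
from typing import List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention for an all-zero vector
    return dot / (norm_a * norm_b)


def overlap_rate(a: List[float], b: List[float]) -> float:
    """Cosine similarity between binary masks of activated (value > 0) neurons."""
    mask_a = [1.0 if v > 0 else 0.0 for v in a]
    mask_b = [1.0 if v > 0 else 0.0 for v in b]
    return cosine_similarity(mask_a, mask_b)


# Toy activation values for two prompts on the same backbone (hypothetical numbers).
neurons_task1 = [0.5, -1.0, 2.0, 0.0]
neurons_task2 = [1.0, 0.3, 1.5, -0.2]

raw_sim = cosine_similarity(neurons_task1, neurons_task2)
overlap = overlap_rate(neurons_task1, neurons_task2)
print(round(raw_sim, 4), round(overlap, 4))
```

The mask-based variant ignores magnitudes and only asks whether the two prompts switch on the same neurons.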
To further demonstrate the importance of task-specific neurons, we mask them and find the model performance on the corresponding task will degrade. Visualization of activated neurons is also supported.
eval_metric, mask = trainer.mask_activated_neuron(args.backbone, args.dataset, ratio=0.2)
trainer.plot_neuron()
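The masking step can be pictured with the sketch below, which zeroes out the most strongly activated fraction of a toy neuron vector. How mask_activated_neuron actually selects neurons is not stated in this README, so ranking by activation value and the helper names used here are assumptions made for illustration.

```python
from typing import List


def top_ratio_mask(activations: List[float], ratio: float) -> List[float]:
    """Return a 0/1 mask that is 0 for the top `ratio` fraction of neurons by value."""
    k = int(round(len(activations) * ratio))
    ranked = sorted(range(len(activations)), key=lambda i: activations[i], reverse=True)
    mask = [1.0] * len(activations)
    for i in ranked[:k]:
        mask[i] = 0.0
    return mask


def apply_mask(activations: List[float], mask: List[float]) -> List[float]:
    return [a * m for a, m in zip(activations, mask)]


# Toy activation values (hypothetical); mask the top 40% most activated neurons.
activations = [0.1, 2.0, 0.0, 1.5, 0.3]
mask = top_ratio_mask(activations, ratio=0.4)
masked = apply_mask(activations, mask)
print(mask, masked)
```

Evaluating the model with such a mask applied is what lets the drop in task performance be attributed to the masked neurons.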
Please cite our paper if it is helpful to your work!
@inproceedings{su-etal-2022-transferability,
title = "On Transferability of Prompt Tuning for Natural Language Processing",
author = "Su, Yusheng and
Wang, Xiaozhi and
Qin, Yujia and
Chan, Chi-Min and
Lin, Yankai and
Wang, Huadong and
Wen, Kaiyue and
Liu, Zhiyuan and
Li, Peng and
Li, Juanzi and
Hou, Lei and
Sun, Maosong and
Zhou, Jie",
booktitle = "Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies",
month = jul,
year = "2022",
address = "Seattle, United States",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2022.naacl-main.290",
doi = "10.18653/v1/2022.naacl-main.290",
pages = "3949--3969"
}