This is the official code of our EMNLP 2022 (Findings) paper Holistic Sentence Embeddings for Better Out-of-Distribution Detection and EACL 2023 (Findings) paper Fine-Tuning Deteriorates General Textual Out-of-Distribution Detection by Distorting Task-Agnostic Features.
This repository implements the OOD detection algorithms developed by us (Avg-Avg, published at EMNLP 2022, and GNOME, published at EACL 2023) along with the following baselines:
Algorithm | Paper |
---|---|
MC Dropout | Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning (ICML 2016) |
Maximum Softmax Probability | A Baseline for Detecting Misclassified and Out-of-Distribution Examples in Neural Networks (ICLR 2017) |
ODIN | Enhancing The Reliability of Out-of-distribution Image Detection in Neural Networks (ICLR 2018) |
Maha Distance | A Simple Unified Framework for Detecting Out-of-Distribution Samples and Adversarial Attacks (NeurIPS 2018) |
LOF | Deep Unknown Intent Detection with Margin Loss (ACL 2019) |
Energy Score | Energy-based Out-of-distribution Detection (NeurIPS 2020) |
ContraOOD | Contrastive Out-of-Distribution Detection for Pretrained Transformers (EMNLP 2021) |
KNN Distance | Out-of-Distribution Detection with Deep Nearest Neighbors (ICML 2022) |
D2U | D2U: Distance-to-Uniform Learning for Out-of-Scope Detection (NAACL 2022) |
Python: 3.7.9
To install the dependencies, run
pip install -r requirements.txt
For the datasets used in our papers, please download the nlp_ood_datasets.zip
file from this Google Drive link and unzip it under the root directory (a 'dataset' directory will be created).
Vanilla training with cross-entropy loss:
python train.py --model roberta-base --output_dir <YOUR_DIR> --seed 13 --dataset sst-2 --log_file <YOUR_LOG_FILE> --lr 2e-5 --epochs 5 --batch_size 16
Add `--loss_type scl` or `--loss_type margin` to use the supervised contrastive auxiliary targets proposed in Contrastive Out-of-Distribution Detection for Pretrained Transformers:
python train.py --model roberta-base --loss scl --output_dir <YOUR_DIR> --seed 13 --dataset sst-2 --log_file <YOUR_LOG_FILE> --lr 2e-5 --epochs 5 --batch_size 16
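For reference, the supervised contrastive term is conceptually a SupCon-style loss over pooled sentence embeddings, added on top of cross-entropy. The sketch below is a minimal illustration of that idea; the temperature, pooling, and weighting are assumptions, and the exact formulation lives in `train.py`.

```python
# Minimal sketch of a supervised contrastive auxiliary loss (illustrative only).
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(features, labels, temperature=0.3):
    """features: (batch, dim) pooled sentence embeddings; labels: (batch,) class ids."""
    features = F.normalize(features, dim=-1)                     # work in cosine-similarity space
    sim = features @ features.t() / temperature                  # (batch, batch) pairwise similarities
    batch = labels.size(0)
    logits_mask = ~torch.eye(batch, dtype=torch.bool, device=features.device)  # drop self-pairs
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & logits_mask      # same-label pairs
    sim = sim.masked_fill(~logits_mask, float("-inf"))
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)   # log-softmax over non-self pairs
    pos_counts = pos_mask.sum(1)
    valid = pos_counts > 0                                       # skip examples with no positive
    sum_log_prob_pos = log_prob.masked_fill(~pos_mask, 0.0).sum(1)
    return -(sum_log_prob_pos[valid] / pos_counts[valid]).mean()

# In practice this is combined with cross-entropy, e.g. loss = ce_loss + lambda_scl * scl_loss.
```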
Add `--optimizer recadam` to use the RecAdam optimizer:
python train.py --model roberta-base --optimizer recadam --output_dir <YOUR_DIR> --seed 13 --dataset sst-2 --log_file <YOUR_LOG_FILE> --lr 2e-5 --epochs 5 --batch_size 16
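RecAdam ("Recall and Learn") keeps fine-tuned weights close to the pre-trained ones by mixing the task objective with an annealed quadratic penalty toward the pre-trained parameters. Below is a simplified sketch of that anchoring term only, with made-up schedule constants; it is not the RecAdam optimizer itself, which folds this penalty into the Adam update.

```python
# Simplified sketch of the "recall" anchoring idea behind RecAdam (illustrative only).
import torch

def anchoring_penalty(model, pretrained_state, step, k=0.1, t0=250, coeff=1.0):
    """pretrained_state: a snapshot of model.state_dict() taken before fine-tuning."""
    # Annealed weight: close to 1 early in training, decaying toward 0 later
    # (schedule constants here are illustrative; RecAdam's schedule is configurable).
    anneal = torch.sigmoid(torch.tensor(-k * (step - t0))).item()
    penalty = 0.0
    for name, param in model.named_parameters():
        if name in pretrained_state:
            ref = pretrained_state[name].to(param.device)
            penalty = penalty + ((param - ref) ** 2).sum()   # L2 distance to pre-trained weights
    return coeff * anneal * penalty
```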
Extract features from a fine-tuned model first:
python extract_full_features.py --dataset sst-2 --ood_datasets 20news,trec,wmt16 --output_dir <YOUR_FT_DIR> --model roberta-base --pretrained_model <PATH_TO_FINETUNED_MODEL>
GNOME additionally needs features from the pre-trained model (no fine-tuned checkpoint is passed):
python extract_full_features.py --dataset sst-2 --ood_datasets 20news,trec,wmt16 --output_dir <YOUR_PRE_DIR> --model roberta-base
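Conceptually, the extraction step runs the encoder with `output_hidden_states=True` and keeps every layer's `[CLS]` and mean-pooled token vectors, so any token/layer pooling can be applied afterwards. A minimal sketch of that idea; batching, saving to disk, and the exact tensor layout in `extract_full_features.py` will differ.

```python
# Minimal sketch of layer-wise feature extraction with Huggingface Transformers.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModel.from_pretrained("roberta-base").eval()

@torch.no_grad()
def extract_layerwise_features(texts):
    enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    out = model(**enc, output_hidden_states=True)
    mask = enc["attention_mask"].unsqueeze(-1).float()            # (batch, seq, 1)
    cls_feats, avg_feats = [], []
    for layer in out.hidden_states:                               # embedding output + 12 transformer layers
        cls_feats.append(layer[:, 0])                             # first-token ([CLS]/<s>) vector per layer
        avg_feats.append((layer * mask).sum(1) / mask.sum(1))     # mean over real tokens per layer
    # (num_layers, batch, hidden) tensors, ready for layer pooling later
    return torch.stack(cls_feats), torch.stack(avg_feats)
```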
Maha with `last-cls` pooled features (last-layer `[CLS]` token):
python ood_test_embedding.py --dataset sst-2 --ood_datasets 20news,trec,wmt16 --input_dir <YOUR_FT_DIR> --token_pooling cls --layer_pooling last
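For reference, the Mahalanobis detector fits class-conditional Gaussians with a shared covariance on in-distribution training features and scores a test example by its distance to the nearest class mean. A minimal NumPy sketch with illustrative names; see `ood_test_embedding.py` for the exact estimator used here.

```python
# Minimal sketch of Mahalanobis-distance OOD scoring (illustrative only).
import numpy as np

def fit_maha(train_feats, train_labels):
    """train_feats: (N, d) pooled features; train_labels: (N,) class ids."""
    classes = np.unique(train_labels)
    means = np.stack([train_feats[train_labels == c].mean(0) for c in classes])
    centered = np.concatenate([train_feats[train_labels == c] - means[i]
                               for i, c in enumerate(classes)])
    cov = centered.T @ centered / len(centered)          # shared (tied) covariance
    precision = np.linalg.pinv(cov)
    return means, precision

def maha_score(feats, means, precision):
    """Higher score = more likely OOD (distance to the closest class mean)."""
    diffs = feats[:, None, :] - means[None, :, :]        # (N, num_classes, d)
    dists = np.einsum('ncd,de,nce->nc', diffs, precision, diffs)
    return dists.min(axis=1)
```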
Avg-Avg (Ours, EMNLP 2022), i.e., Maha with `avg-avg` pooled features:
python ood_test_embedding.py --dataset sst-2 --ood_datasets 20news,trec,wmt16 --input_dir <YOUR_FT_DIR> --token_pooling avg --layer_pooling avg
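The only difference from the previous command is the pooling: Avg-Avg averages token embeddings within each layer and then averages across layers before applying the same Mahalanobis detector. A minimal sketch of that pooling, assuming hidden states as returned with `output_hidden_states=True`:

```python
# Minimal sketch of avg-avg pooling (average over tokens, then over layers).
import torch

def avg_avg_pool(hidden_states, attention_mask):
    """hidden_states: tuple of (batch, seq, hidden) tensors, one per layer."""
    mask = attention_mask.unsqueeze(-1).float()
    per_layer = [(h * mask).sum(1) / mask.sum(1) for h in hidden_states]  # token average per layer
    return torch.stack(per_layer, dim=0).mean(0)                          # average across layers
```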
KNN (with the pooling of your choice; `cls`-`last` by default):
python ood_test_embedding_knn.py --dataset sst-2 --ood_datasets 20news,trec,wmt16 --input_dir <YOUR_FEATURE_DIR> --token_pooling <cls/avg> --layer_pooling <last/avg/a list of layer indexes like 1,2,11,12>
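Conceptually, the KNN detector scores a test feature by its distance to the k-th nearest neighbour among L2-normalized in-distribution training features. A brute-force sketch with an illustrative k; the script's defaults and any nearest-neighbour backend may differ.

```python
# Minimal brute-force sketch of the KNN OOD score (illustrative only).
import numpy as np

def knn_score(train_feats, test_feats, k=10):
    train = train_feats / np.linalg.norm(train_feats, axis=1, keepdims=True)
    test = test_feats / np.linalg.norm(test_feats, axis=1, keepdims=True)
    dists = np.linalg.norm(test[:, None, :] - train[None, :, :], axis=-1)  # (N_test, N_train)
    return np.sort(dists, axis=1)[:, k - 1]      # distance to k-th neighbour; higher = more likely OOD
```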
GNOME (Ours, EACL 2023) (`--std` for score normalization, `--ensemble_way mean/min` for choosing the aggregator `mean` or `min`):
python ood_test_embedding_gnome.py --dataset sst-2 --ood_datasets 20news,trec,wmt16 --ft_dir <YOUR_FT_DIR> --pre_dir <YOUR_PRE_DIR> --std --ensemble_way mean
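Conceptually, GNOME computes one Mahalanobis score from the pre-trained model's features and one from the fine-tuned model's features, standardizes each score using in-distribution statistics (the role of `--std`), and aggregates them with `mean` or `min` (`--ensemble_way`). A minimal sketch, reusing the Mahalanobis sketch above and assuming in-distribution validation scores are available for normalization; the exact procedure lives in `ood_test_embedding_gnome.py`.

```python
# Minimal sketch of the GNOME ensemble over pre-trained and fine-tuned detectors.
import numpy as np

def gnome_score(pre_val, ft_val, pre_test, ft_test, ensemble_way="mean"):
    """pre_*/ft_*: Mahalanobis scores from pre-trained / fine-tuned features
    (e.g. from the maha_score sketch above); *_val are in-distribution
    validation scores used only for standardization."""
    z_pre = (pre_test - pre_val.mean()) / pre_val.std()   # --std: per-detector z-normalization
    z_ft = (ft_test - ft_val.mean()) / ft_val.std()
    stacked = np.stack([z_pre, z_ft])
    # --ensemble_way: aggregate the two normalized scores.
    return stacked.mean(axis=0) if ensemble_way == "mean" else stacked.min(axis=0)
```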
Note: Our algorithms Avg-Avg and GNOME are evaluated on features extracted from the model trained with the vanilla cross-entropy loss. To reproduce the results of Contrastive Out-of-Distribution Detection for Pretrained Transformers, use the model trained with contrastive targets to extract features instead.
To test MSP (`base`), Energy (`energy`), D2U (`d2u`), ODIN (`odin`), LOF (`lof`), or MC Dropout (`mc`), specify the method via the `ood_method` argument of the `test.py` script:
python test.py --dataset sst-2 --ood_datasets 20news,trec,wmt16 --model roberta-base --pretrained_model <PATH_TO_FINETUNED_MODEL> --ood_method base/energy/d2u/odin/lof/mc
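For reference, the logit-based methods reduce to simple score functions over the classifier outputs. The sketch below shows MSP, Energy, and one possible distance-to-uniform score, with signs arranged so that higher means more likely OOD; ODIN (input perturbation plus temperature scaling), LOF, and MC Dropout need more machinery and are omitted. The exact definitions are in `test.py`.

```python
# Minimal sketches of logit-based OOD scores (illustrative only; higher = more likely OOD).
import torch
import torch.nn.functional as F

def msp_score(logits):
    return -F.softmax(logits, dim=-1).max(dim=-1).values            # low max-probability => OOD

def energy_score(logits, temperature=1.0):
    return -temperature * torch.logsumexp(logits / temperature, dim=-1)  # high energy => OOD

def d2u_score(logits):
    # Out-of-scope inputs tend to produce near-uniform distributions, so a small
    # distance to uniform signals OOD; here KL(uniform || predicted), negated.
    probs = F.softmax(logits, dim=-1)
    uniform = torch.full_like(probs, 1.0 / probs.size(-1))
    kl = (uniform * (uniform.log() - probs.log())).sum(-1)
    return -kl
```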
If you find this repository useful for your research, please consider citing our papers:
@inproceedings{chen-etal-2022-holistic,
    title = "Holistic Sentence Embeddings for Better Out-of-Distribution Detection",
    author = "Chen, Sishuo and Bi, Xiaohan and Gao, Rundong and Sun, Xu",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2022",
    month = dec,
    year = "2022",
    address = "Abu Dhabi, United Arab Emirates",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.findings-emnlp.497",
    pages = "6676--6686"
}

@inproceedings{chen-etal-2023-fine,
    title = "Fine-Tuning Deteriorates General Textual Out-of-Distribution Detection by Distorting Task-Agnostic Features",
    author = "Chen, Sishuo and Yang, Wenkai and Bi, Xiaohan and Sun, Xu",
    booktitle = "Findings of the Association for Computational Linguistics: EACL 2023",
    month = may,
    year = "2023",
    address = "Dubrovnik, Croatia",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.findings-eacl.41",
    pages = "564--579"
}
This repository relies on resources from FSSD_OoD_Detection, Huggingface Transformers, and RecAdam. We thank the original authors for open-sourcing their work.