Official repository of LaREx, from the UAI 2024 paper "Latent Representation Entropy Density for Distribution Shift Detection".
Distribution shift detection is paramount in safety-critical tasks that rely on Deep Neural Networks (DNNs). The detection task entails deriving a confidence score to assert whether a new input sample aligns with the training data distribution of the DNN model. While DNN predictive uncertainty offers an intuitive confidence measure, exploring uncertainty-based distribution shift detection with simple sample-based techniques has been relatively overlooked in recent years due to computational overhead and lower performance than plain post-hoc methods. This paper proposes using simple sample-based techniques for estimating uncertainty and employing the entropy density from intermediate representations to detect distribution shifts. We demonstrate the effectiveness of our method using standard benchmark datasets for out-of-distribution detection and across different common perception tasks with Convolutional Neural Network architectures. Our scope extends beyond classification, encompassing image-level distribution shift detection for object detection and semantic segmentation tasks. Our results show that our method's performance is comparable to existing State-of-the-Art methods while being computationally faster and lighter than other Bayesian approaches, affirming its practical utility.
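As a quick intuition for the approach described above, the following is a minimal, illustrative sketch (not the repository's implementation) of how the pieces can fit together: several stochastic forward passes per input produce latent samples, a per-input entropy is estimated from them, and a density fitted on in-distribution entropies yields a log-density that serves as the shift score. The input `ind_mc_latents` is assumed to be available, and the entropy estimator and density model used in the paper may differ.

```python
# Minimal, illustrative sketch of entropy-density scoring (not the repository code).
import numpy as np
from scipy.stats import gaussian_kde

def mc_latent_entropy(latent_samples: np.ndarray, eps: float = 1e-6) -> float:
    """Differential entropy of a diagonal-Gaussian fit to MC latent samples.

    latent_samples: (n_mc_samples, latent_dim), e.g. from several MC-Dropout
    forward passes for a single input.
    """
    var = latent_samples.var(axis=0) + eps
    # Entropy of N(mu, diag(var)) = 0.5 * sum(log(2*pi*e*var))
    return 0.5 * float(np.sum(np.log(2.0 * np.pi * np.e * var)))

# 1) Fit an entropy density on in-distribution data.
#    ind_mc_latents: iterable of (n_mc_samples, latent_dim) arrays, assumed available.
ind_entropies = np.array([mc_latent_entropy(z) for z in ind_mc_latents])
entropy_density = gaussian_kde(ind_entropies)

# 2) Score a new input: a low log-density suggests a distribution shift.
def shift_score(latent_samples: np.ndarray) -> float:
    return float(entropy_density.logpdf(mc_latent_entropy(latent_samples))[0])
```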
Our method has been tested in three use cases:
- Image classification using ResNet18,
- Object detection using FasterRCNN,
- Semantic segmentation using DeeplabV3 and UNet.
Each one of them has its own working environment and usage, detailed below.
We make extensive use of the PyTorch implementation of ResNet18. For this use case, go to the `image_classification` folder, create an environment with python=3.7, and install `requirements-classification.txt`.
The config file `config/config.yaml` serves as the single configuration for training, extracting latent-space samples, and analyzing the samples.
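For illustration only, such a single config might be consumed from Python as shown below; the key names are hypothetical placeholders, not the actual fields of `config/config.yaml`.

```python
# Illustrative only: reading one shared YAML config for all pipeline stages.
# The keys below are hypothetical; see config/config.yaml for the real schema.
import yaml

with open("config/config.yaml") as f:
    cfg = yaml.safe_load(f)

checkpoint_path = cfg.get("model_checkpoint")   # hypothetical key
ood_datasets = cfg.get("ood_datasets", [])      # hypothetical key
n_mc_samples = cfg.get("mc_samples", 10)        # hypothetical key
```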
- Specify your parameters in the configuration files in the `config` folder.
- Use the `train_classifier.py` script. By default, this script logs to a local mlflow server. The best and last checkpoints will be saved in the `lightning_logs` folder (a generic sketch of this setup follows below).
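The `lightning_logs` folder suggests a PyTorch Lightning training loop; the snippet below is a generic, hedged sketch of how such a trainer can be wired to MLflow logging and checkpointing, not a transcript of `train_classifier.py` (experiment name, URI, and the `model`/`datamodule` objects are placeholders).

```python
# Generic sketch of Lightning training with MLflow logging and checkpointing.
# Not train_classifier.py; names and objects below are placeholders.
import pytorch_lightning as pl
from pytorch_lightning.callbacks import ModelCheckpoint
from pytorch_lightning.loggers import MLFlowLogger

logger = MLFlowLogger(experiment_name="larex_classification",
                      tracking_uri="http://127.0.0.1:5000")  # point to your mlflow server
checkpoint_cb = ModelCheckpoint(monitor="val_loss", save_top_k=1, save_last=True)

trainer = pl.Trainer(max_epochs=100, logger=logger, callbacks=[checkpoint_cb])
trainer.fit(model, datamodule=datamodule)  # `model` / `datamodule` are your LightningModule / DataModule
```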
- Choose the OoD datasets in the `configs/config.yaml` file.
- Specify your model checkpoint in the same config file, using the checkpoint obtained in the previous step.
- Choose the baselines to be computed, also in the config file.
- Run the `extract_mcd_samples.py` script. All extracted information will be saved in a folder called `Mcd_samples` (the extraction pattern is sketched below).
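Conceptually, the extraction step keeps dropout stochastic at test time and records an intermediate representation over several forward passes. The standalone sketch below illustrates that pattern only; the layer choice and sample count are assumptions, the model is assumed to contain dropout layers, and `extract_mcd_samples.py` drives the real extraction from the config.

```python
# Minimal sketch of MC-Dropout latent extraction with a forward hook (illustrative only).
import torch
from torch import nn

def enable_mc_dropout(model: nn.Module) -> None:
    """Keep dropout layers stochastic while the rest of the model stays in eval mode."""
    for m in model.modules():
        if isinstance(m, (nn.Dropout, nn.Dropout2d)):
            m.train()

def extract_mc_latents(model: nn.Module, layer: nn.Module, x: torch.Tensor,
                       n_samples: int = 10) -> torch.Tensor:
    """Return (n_samples, latent_dim) latent vectors for a single input x (batch size 1)."""
    captured = []
    handle = layer.register_forward_hook(lambda mod, inp, out: captured.append(out.flatten(1)))
    model.eval()
    enable_mc_dropout(model)
    with torch.no_grad():
        for _ in range(n_samples):
            model(x)
    handle.remove()
    return torch.stack([c.squeeze(0) for c in captured])
```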
Without modifying the `config.yaml` file, run the `analyze_mcd_samples.py` script. The results will be automatically logged to an mlflow server: specify your server URI in the script, or comment that line out to log to a local mlflow server. The results will also be saved to a CSV file.
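The logging pattern described above (a remote or local MLflow server plus a CSV dump) boils down to the standard MLflow calls shown below; the URI, experiment and metric names, and `results_df` (assumed to be a pandas DataFrame computed earlier) are placeholders, not values from the script.

```python
# Illustrative MLflow + CSV logging pattern; names, URI, and results_df are placeholders.
import mlflow

mlflow.set_tracking_uri("http://your-mlflow-server:5000")  # remove to log to a local ./mlruns
mlflow.set_experiment("larex_ood_analysis")

with mlflow.start_run(run_name="resnet18_ood_analysis"):
    mlflow.log_param("n_mc_samples", 10)       # placeholder parameter
    mlflow.log_metric("auroc", 0.95)           # placeholder value
    results_df.to_csv("results.csv", index=False)  # results_df assumed computed earlier
    mlflow.log_artifact("results.csv")
```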
This section builds on the work of VOS: Virtual Outlier Synthesis.
- Refer to the VOS installation process, and get the BDD100k InD and OoD datasets and the checkpoints from their repository.
- For the detectron2 library, one modification to the Region Proposal Network (RPN) was necessary: the addition of a module called MCDRpnHead in the `detectron2/modeling/proposal_generator/rpn.py` script. For convenience, the modified version of the detectron2 library can be cloned from here (a rough sketch of this kind of change is shown below, after the extraction command).
- Specify your OoD dataset in `configs/Inference/mcd.yaml`: set the `OOD_DATASET` parameter to either 'openimages_ood_val' for OpenImages OoD, or 'coco_ood_val_bdd' for OoD COCO.
- Run:

  ```bash
  python get_mcd_and_entropy_samples_bdd_ood.py \
    --dataset-dir path/to/dataset/dir \
    --test-dataset bdd_custom_val \
    --config-file BDD-Detection/faster-rcnn/vanilla.yaml \
    --inference-config Inference/mcd.yaml \
    --random-seed 0 \
    --image-corruption-level 0 \
    --visualize 0
  ```
Extracted samples will be saved in the `MCD_evaluation_data` folder.
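For orientation, a dropout-enabled RPN head in detectron2 can be written along the lines sketched below. This is only an illustration of the kind of change involved (dropout on the shared RPN conv features, kept active for MC sampling); the actual MCDRpnHead in the modified fork may be implemented differently.

```python
# Sketch of a dropout-enabled RPN head for detectron2 (not the fork's exact MCDRpnHead).
import torch.nn.functional as F
from detectron2.modeling.proposal_generator.rpn import RPN_HEAD_REGISTRY, StandardRPNHead

@RPN_HEAD_REGISTRY.register()
class DropoutRPNHead(StandardRPNHead):
    """StandardRPNHead with dropout on the shared conv features, always stochastic."""

    def forward(self, features):
        pred_objectness_logits = []
        pred_anchor_deltas = []
        for x in features:
            t = F.relu(self.conv(x))                # relu is idempotent if conv already applies it
            t = F.dropout(t, p=0.5, training=True)  # keep dropout active at inference (MC sampling)
            pred_objectness_logits.append(self.objectness_logits(t))
            pred_anchor_deltas.append(self.anchor_deltas(t))
        return pred_objectness_logits, pred_anchor_deltas

# A custom head is selected through the MODEL.RPN.HEAD_NAME config key in detectron2.
```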
- Specify the name of your extracted files and the OoD dataset to analyze in the `configs/MCD_evaluation/config_rcnn.yaml` file.
- Run `python mcd_analysis.py`. The results will be automatically logged to an mlflow server and saved to a CSV file (a minimal example of such an evaluation follows below).
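The analysis typically compares InD and OoD score distributions with threshold-free metrics such as AUROC. A minimal example is shown below; `ind_scores` and `ood_scores` are assumed to be 1-D arrays of per-image confidence scores loaded from the extracted files, with higher values meaning more in-distribution.

```python
# Minimal InD-vs-OoD evaluation sketch; ind_scores / ood_scores are assumed inputs.
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

labels = np.concatenate([np.ones_like(ind_scores), np.zeros_like(ood_scores)])
scores = np.concatenate([ind_scores, ood_scores])

auroc = roc_auc_score(labels, scores)
fpr, tpr, _ = roc_curve(labels, scores)
fpr_at_95_tpr = float(fpr[np.argmax(tpr >= 0.95)])  # first operating point reaching 95% TPR
print(f"AUROC: {auroc:.4f}  FPR@95TPR: {fpr_at_95_tpr:.4f}")
```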
- Create an environment with python=3.7 and install `requirements-segmentation.txt`, which is shared by DeepLab and UNet.
- Download the WoodScape or Cityscapes datasets.
- In either of their respective folders, train the DeepLab model by running:

  ```bash
  python train_deeplab_v3p.py -m deeplabv3p-backbone-dropblock2d --batch 16 --epochs 100 --loss_type focal_loss --dataset woodscape --datapath /your_path_to_dataset/WoodScape
  ```

  or train the UNet model by running:

  ```bash
  python train_unet_sem_seg.py --batch 16 --epochs 100 --loss_type focal_loss --dataset woodscape --datapath /your_path_to_dataset/WoodScape
  ```

- Use the notebooks for feature extraction and results analysis (a rough illustration follows below).
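As a rough, generic illustration of turning per-pixel Monte Carlo outputs into a single image-level quantity (the notebooks presumably implement the paper's latent entropy-density analysis, which differs from this), one can average a per-pixel predictive entropy map over the image:

```python
# Generic illustration only: image-level predictive entropy from MC segmentation outputs.
# mc_logits is assumed to have shape (n_mc_samples, num_classes, H, W) for one image.
import torch

def image_level_predictive_entropy(mc_logits: torch.Tensor, eps: float = 1e-12) -> float:
    probs = torch.softmax(mc_logits, dim=1).mean(dim=0)         # (C, H, W): mean over MC samples
    pixel_entropy = -(probs * (probs + eps).log()).sum(dim=0)   # (H, W): per-pixel entropy
    return float(pixel_entropy.mean())                          # average over the image
```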
This work has been supported by the French government under the "France 2030" program, as part of the SystemX Technological Research Institute within the confiance.ai Program.
This publication was made possible by the use of the FactoryIA supercomputer, financially supported by the Ile-de-France Regional Council.
If you find any part of this code useful in your research, please consider citing our paper:
```bibtex
@inproceedings{arnez2024latent,
  title     = {Latent Representation Entropy Density for Distribution Shift Detection},
  author    = {Fabio Arnez and Daniel Alfonso Montoya Vasquez and Ansgar Radermacher and Fran{\c{c}}ois Terrier},
  booktitle = {The 40th Conference on Uncertainty in Artificial Intelligence},
  year      = {2024},
  url       = {https://openreview.net/forum?id=1CKLfh3Ge7}
}
```