Unsupervised SAM (UnSAM) is a "segment anything" model for promptable and automatic whole-image segmentation which does not require human annotations.
Segment Anything without Supervision
XuDong Wang, Jingfeng Yang, Trevor Darrell
UC Berkeley
Preprint
[project page] [arxiv] [colab (UnSAM)] [colab (pseudo-label)] [bibtex]
- 07/01/2024 Initial commit
- The performance gap between unsupervised segmentation models and SAM can be significantly reduced. UnSAM not only advances the state-of-the-art in unsupervised segmentation by 10% but also achieves comparable performance with the labor-intensive, fully-supervised SAM.
- The supervised SAM can also benefit from our self-supervised labels. Trained on only 1% of SA-1B images, a lightly semi-supervised UnSAM can often segment entities overlooked by supervised SAM, exceeding SAM’s AR by over 6.7% and AP by 3.9% on SA-1B.
See installation instructions.
See Preparing Datasets for UnSAM.
UnSAM has two major stages: 1) generating pseudo-masks with divide-and-conquer and 2) learning unsupervised segmentation models from pseudo-masks of unlabeled data.
Our Divide-and-Conquer approach can be used to provide multi-granular masks without human supervision.
If you want to run Divide-and-Conquer locally, we provide demo_dico.py, which visualizes the pseudo-masks.
Please download the CutLER checkpoint from here, and then run:
cd divide_and_conquer
# --postprocess requires a GPU
python demo_dico.py \
--input /path/to/input/image \
--output /path/to/save/output \
--preprocess true \
--postprocess true \
--opts MODEL.WEIGHTS /path/to/cutler_checkpoint \
MODEL.DEVICE gpu
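When the demo needs to be launched from a script rather than typed by hand, the command above can be assembled programmatically. The following is a minimal sketch, not part of the repository: `build_dico_command` is a hypothetical helper, and passing `false` to switch off `--preprocess` or `--postprocess` is an assumption (the README only shows `true`).

```python
import shlex


def build_dico_command(image_path, output_path, checkpoint_path,
                       preprocess=True, postprocess=True, device="gpu"):
    """Return the argv list for demo_dico.py, mirroring the README command.

    Hypothetical helper: only the flags shown in the README are used.
    The "false" value for the two boolean flags is an assumption.
    """
    return [
        "python", "demo_dico.py",
        "--input", str(image_path),
        "--output", str(output_path),
        "--preprocess", "true" if preprocess else "false",
        "--postprocess", "true" if postprocess else "false",
        "--opts",
        "MODEL.WEIGHTS", str(checkpoint_path),
        "MODEL.DEVICE", device,
    ]


cmd = build_dico_command(
    "/path/to/input/image",
    "/path/to/save/output",
    "/path/to/cutler_checkpoint",
)
# Print a shell-safe, single-line version of the command for inspection.
print(shlex.join(cmd))
```

To execute it, something like `subprocess.run(cmd, cwd="divide_and_conquer", check=True)` should work, since the README runs the script from inside that directory.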
We provide a few demo images in docs/demos/. Below are some visualizations of the pseudo-masks on the demo images.
Try out the UnSAM demo using Colab (no GPU needed):
If you want to run UnSAM or UnSAM+ demos locally, we provide demo_whole_image.py, which can run the builtin configs.
Please download UnSAM/UnSAM+'s checkpoints from the model zoo.
Run it with:
cd whole_image_segmentation
python demo_whole_image.py \
--input /path/to/input/image \
--output /path/to/save/output \
--opts \
MODEL.WEIGHTS /path/to/UnSAM_checkpoint \
MODEL.DEVICE cpu
The configs are made for training, so MODEL.WEIGHTS must be set to a model from the model zoo for evaluation.
This command runs inference and saves the results locally.
- To run on CPU, add MODEL.DEVICE cpu after --opts.
- To save outputs to a directory (for images) or a file (for webcam or video), use --output.
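To run the whole-image demo over a folder of images, one option is to generate one command per image. The sketch below is illustrative only: `whole_image_commands` and `IMAGE_SUFFIXES` are hypothetical names, and it assumes one image per `--input` call and a directory passed to `--output`, as described in the notes above.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to match your data.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def whole_image_commands(input_dir, output_dir, checkpoint_path, device="cpu"):
    """Yield one demo_whole_image.py argv list per image in input_dir.

    Hypothetical helper: it only reuses the flags shown in the README.
    """
    input_dir = Path(input_dir)
    for image in sorted(input_dir.iterdir()):
        # Skip anything that does not look like an image file.
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        yield [
            "python", "demo_whole_image.py",
            "--input", str(image),
            "--output", str(output_dir),
            "--opts",
            "MODEL.WEIGHTS", str(checkpoint_path),
            "MODEL.DEVICE", device,
        ]
```

Each yielded list can be passed to `subprocess.run(cmd, cwd="whole_image_segmentation", check=True)`. Because the working directory changes, absolute paths for the images, output directory, and checkpoint are the safer choice.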
Below are some visualizations of the model predictions on the demo images.
The command below launches a Gradio app and prints its link in the terminal; users can interact with our model through that page.
Please download UnSAM/UnSAM+'s checkpoints from the model zoo.
For details of the command line arguments, see demo_promptable.py -h or look at its source code to understand its behavior.
- To run on CPU, pass --device cpu.
python demo_promptable.py \
--ckpt /path/to/UnSAM_checkpoint \
--conf_files configs/semantic_sam_only_sa-1b_swinT.yaml \
--device gpu
Below are some visualizations of the model predictions on the demo images.
To evaluate a model's performance on 7 different datasets, please refer to datasets/README.md for instructions on preparing the datasets. Next, select a model from the model zoo, specify the "model_weights", "config_file" and the path to "DETECTRON2_DATASETS" in tools/eval.sh, then run the script.
bash tools/{promptable, whole_image}_eval.sh
UnSAM achieves state-of-the-art results on unsupervised image segmentation, using a ResNet50 backbone and training with only 1% of SA-1B data. We show zero-shot unsupervised image segmentation performance on 7 different datasets: COCO, LVIS, ADE20K, Entity, SA-1B, Part-ImageNet and PACO.
Methods | Models | Backbone | # of Train Images | Avg. | COCO | LVIS | ADE20K | Entity | SA-1B | Part-ImageNet | PACO |
---|---|---|---|---|---|---|---|---|---|---|---|
Prev. Unsup. SOTA | - | ViT-Base | 0.2M | 30.1 | 30.5 | 29.1 | 31.1 | 33.5 | 33.3 | 36.0 | 17.1 |
UnSAM (ours) | - | ResNet50 | 0.1M | 39.2 | 40.5 | 37.7 | 35.7 | 39.6 | 41.9 | 51.6 | 27.5 |
UnSAM (ours) | download | ResNet50 | 0.4M | 41.1 | 42.0 | 40.5 | 37.5 | 41.0 | 44.5 | 52.7 | 29.7 |
UnSAM+ outperforms SAM on most of the evaluated benchmarks (including SA-1B) when UnSAM is trained on 1% of SA-1B with both ground-truth masks and our unsupervised labels. This demonstrates that the supervised SAM can also benefit from our self-supervised labels.
Methods | Models | Backbone | # of Train Images | Avg. | COCO | LVIS | ADE20K | Entity | SA-1B | Part-ImageNet | PACO |
---|---|---|---|---|---|---|---|---|---|---|---|
SAM | - | ViT-Base | 11M | 42.1 | 49.6 | 46.1 | 45.8 | 45.9 | 60.8 | 28.3 | 18.1 |
UnSAM+ (ours) | download | ResNet50 | 0.1M | 48.8 | 52.2 | 50.8 | 45.3 | 49.8 | 64.8 | 46.0 | 32.3 |
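As a quick sanity check on how the Avg. column relates to the per-dataset columns, the sketch below recomputes a plain arithmetic mean from the numbers in the two whole-image tables above. Treating Avg. as an unweighted mean over the 7 datasets is an assumption; the recomputed values agree with the reported ones to within rounding (the largest difference is for UnSAM+, presumably because the per-dataset entries are themselves rounded).

```python
# Per-dataset scores copied from the two tables above, in column order:
# COCO, LVIS, ADE20K, Entity, SA-1B, Part-ImageNet, PACO.
scores = {
    "Prev. Unsup. SOTA": [30.5, 29.1, 31.1, 33.5, 33.3, 36.0, 17.1],
    "UnSAM (0.1M)": [40.5, 37.7, 35.7, 39.6, 41.9, 51.6, 27.5],
    "UnSAM (0.4M)": [42.0, 40.5, 37.5, 41.0, 44.5, 52.7, 29.7],
    "SAM": [49.6, 46.1, 45.8, 45.9, 60.8, 28.3, 18.1],
    "UnSAM+": [52.2, 50.8, 45.3, 49.8, 64.8, 46.0, 32.3],
}

# Values of the Avg. column as reported in the tables.
reported_avg = {
    "Prev. Unsup. SOTA": 30.1,
    "UnSAM (0.1M)": 39.2,
    "UnSAM (0.4M)": 41.1,
    "SAM": 42.1,
    "UnSAM+": 48.8,
}


def mean(values):
    """Unweighted arithmetic mean (assumed to be how Avg. is defined)."""
    return sum(values) / len(values)


for name, values in scores.items():
    print(f"{name}: recomputed {mean(values):.2f}, reported {reported_avg[name]}")

# Gap between UnSAM+ and SAM on the reported average.
gap = reported_avg["UnSAM+"] - reported_avg["SAM"]
print(f"UnSAM+ minus SAM (reported Avg.): {gap:.1f}")
```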
Despite using a backbone that is 3× smaller and being trained on only 1% of SA-1B, our lightly semi-supervised UnSAM+ surpasses the fully-supervised SAM on the promptable segmentation task on COCO.
Methods | Models | Backbone | # of Train Images | Point (Max) | Point (Oracle) |
---|---|---|---|---|---|
SAM | - | ViT-B/8 (85M) | 11M | 52.1 | 68.2 |
UnSAM (ours) | download | Swin-Tiny (25M) | 0.1M | 40.3 | 59.5 |
UnSAM+ (ours) | download | Swin-Tiny (25M) | 0.1M | 52.4 | 69.5 |
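The "3× smaller" and "1% of SA-1B" figures can be read directly off the table above. The short sketch below does that arithmetic using only the table's numbers; the variable names are illustrative.

```python
# Backbone sizes (millions of parameters) and training-set sizes
# (millions of images), copied from the promptable segmentation table.
sam_params_m, unsam_params_m = 85, 25
sam_images_m, unsam_images_m = 11, 0.1

# How many times larger SAM's backbone is than the Swin-Tiny backbone.
param_ratio = sam_params_m / unsam_params_m

# Fraction of SAM's training images used by UnSAM / UnSAM+.
data_fraction = unsam_images_m / sam_images_m

# UnSAM+ minus SAM on the two reported point-prompt metrics.
point_max_gain = 52.4 - 52.1
point_oracle_gain = 69.5 - 68.2

print(f"Backbone size ratio: {param_ratio:.1f}x")
print(f"Training data fraction: {data_fraction:.1%}")
print(f"Point (Max) gain: {point_max_gain:.1f}")
print(f"Point (Oracle) gain: {point_oracle_gain:.1f}")
```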
The majority of UnSAM, CutLER, Detectron2 and DINO are licensed under the CC-BY-NC license; however, portions of the project are available under separate license terms: Mask2Former, Semantic-SAM, CascadePSP, Bilateral Solver and CRF are licensed under the MIT license. If you later add other third-party code, please keep this license info updated, and please let us know if that component is licensed under something other than CC-BY-NC, MIT, or CC0.
This codebase is based on CutLER, SAM, Mask2Former, Semantic-SAM, CascadePSP, BFS, CRF, DINO and Detectron2. We thank the authors for open-sourcing their code.
UnSAM's wide range of detection capabilities may introduce challenges similar to those of many other visual recognition methods. Because an image can contain arbitrary instances, the model output may be affected.
If you have any general questions, feel free to email XuDong Wang. If you have code or implementation-related questions, please email us or open an issue in this codebase (we recommend opening an issue, because your question may help others).
If you find our work inspiring or use our codebase in your research, please consider giving a star ⭐ and a citation.
@misc{wang2024segmentsupervision,
title={Segment Anything without Supervision},
author={XuDong Wang and Jingfeng Yang and Trevor Darrell},
year={2024},
eprint={2406.20081},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2406.20081},
}