Discover slices of data on which your models underperform.
pip install "domino[clip,text] @ git+https://github.com/HazyResearch/domino@main"
For more detailed installation instructions, see the docs.
import domino
To learn more, follow along in our tutorial on Google Colab or dive into the docs.
Machine learning models that achieve high overall accuracy often make systematic errors on coherent slices of validation data. Domino provides tools to help discover these slices.
What is a slice? A slice is a set of data samples that share a common characteristic. As an example, in large image datasets, photos of vintage cars comprise a slice (i.e. all images in the slice share a common subject). The term slice has a number of synonyms that you might be more familiar with (e.g. subgroup, subpopulation, stratum).
Slice discovery is the task of mining unstructured input data (e.g. images, videos, audio) for semantically meaningful subgroups on which a model performs poorly. We refer to automated techniques that mine input data for semantically meaningful slices as slice discovery methods (SDMs). Given a labeled validation dataset and a trained classifier, an SDM computes a set of slicing functions that partition the dataset into slices. This process is illustrated below.
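To make the idea of a slicing function concrete, here is a minimal sketch in plain Python. It does not use the domino API: the toy samples, the `subject` field, and the hand-written predicates are all hypothetical stand-ins. A real SDM has no such metadata and would have to learn its slicing functions from the inputs themselves; the sketch only shows what happens once slicing functions exist, namely partitioning the validation set and comparing per-slice accuracy against overall accuracy.

```python
# Toy validation set. Every field name and value here is hypothetical:
# "subject" stands in for a characteristic an SDM would have to discover.
samples = [
    {"subject": "vintage car", "label": 1, "pred": 0},
    {"subject": "vintage car", "label": 1, "pred": 0},
    {"subject": "vintage car", "label": 1, "pred": 1},
    {"subject": "modern car", "label": 1, "pred": 1},
    {"subject": "modern car", "label": 1, "pred": 1},
    {"subject": "bicycle", "label": 0, "pred": 0},
    {"subject": "bicycle", "label": 0, "pred": 0},
    {"subject": "bicycle", "label": 0, "pred": 1},
]

# A slicing function maps a sample to True (in the slice) or False.
# These are written by hand for illustration only.
slicing_functions = {
    "vintage_cars": lambda s: s["subject"] == "vintage car",
    "modern_cars": lambda s: s["subject"] == "modern car",
    "bicycles": lambda s: s["subject"] == "bicycle",
}


def accuracy(rows):
    """Fraction of rows whose prediction matches the label."""
    return sum(r["label"] == r["pred"] for r in rows) / len(rows)


def slice_report(samples, slicing_functions):
    """Apply each slicing function and compute accuracy on the resulting slice."""
    report = {}
    for name, fn in slicing_functions.items():
        members = [s for s in samples if fn(s)]
        if members:  # skip empty slices to avoid dividing by zero
            report[name] = accuracy(members)
    return report


overall = accuracy(samples)
report = slice_report(samples, slicing_functions)
worst = min(report, key=report.get)  # slice with the lowest accuracy

print(f"overall accuracy: {overall:.3f}")
for name, acc in sorted(report.items(), key=lambda kv: kv[1]):
    print(f"{name}: {acc:.3f}")
print(f"worst slice: {worst}")
```

In this toy data the model looks acceptable overall but fails on most vintage car images, which is the kind of systematic error slice discovery is meant to surface.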
This repository is named domino in reference to the pizza chain of the same name, known for its reliable slice deliveries. It is a slice discovery hub that provides implementations of popular slice discovery methods under a common API. It also provides tools for running quantitative evaluations of slice discovery methods.
For a full list of implemented methods, see the docs.
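As a rough illustration of what a quantitative evaluation of a slice discovery method can look like, the sketch below scores one predicted slice against a known ground-truth slice using precision-at-k. This is an assumption made for illustration: it is plain Python, not domino's evaluation tooling, and the function name `precision_at_k`, the scores, and the membership flags are all hypothetical.

```python
def precision_at_k(scores, in_true_slice, k):
    """Fraction of the k highest-scoring samples that belong to the true slice.

    scores: per-sample slice-membership scores from a (hypothetical) SDM.
    in_true_slice: per-sample booleans marking the ground-truth slice.
    """
    if len(scores) != len(in_true_slice):
        raise ValueError("scores and in_true_slice must have the same length")
    if not 0 < k <= len(scores):
        raise ValueError("k must be between 1 and the number of samples")
    # Rank sample indices from highest to lowest score.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top_k = ranked[:k]
    return sum(in_true_slice[i] for i in top_k) / k


# Hypothetical outputs for six validation samples.
scores = [0.9, 0.8, 0.1, 0.7, 0.2, 0.6]
in_true_slice = [True, True, False, False, False, True]

p_at_3 = precision_at_k(scores, in_true_slice, k=3)
print(f"precision@3: {p_at_3:.3f}")
```

A score near 1 means the samples the method is most confident about really do belong to the underperforming slice; a low score means the predicted slice is mixing in unrelated samples.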
Useful References:
- 📄 Domino (ICLR 22)
- 📄 User study evaluating slice discovery methods (arXiv)
- 📄 PlaneSpot (TMLR 22)
- 📄 Spotlight (FAccT 22)
Blogposts:
- 🌍 BlogPost
Reach out to Sabri Eyuboglu (eyuboglu [at] stanford [dot] edu) if you would like to get involved or contribute!