- [05/09/2024]: update processed Jigsaw datasets, refer to the datasets/readme.md
- [08/23/2023]: update all running logs from our $45,079$ experiments ($14,428$ GPU hours)!
- [08/18/2023]: add a jupyter notebook tutorial for running FFB!
- [08/18/2023]: add a step-by-step guideline for running FFB!
- [08/18/2023]: add NLP task - Jigsaw Toxic Comment Classification!
- [08/01/2023]: design a logo for FFB!
- [07/12/2023]: update the datasets and downloading instructions!
The Fair Fairness Benchmark is a PyTorch-based framework for evaluating the fairness of machine learning models. The framework is designed to be simple and customizable, making it accessible to researchers with varying levels of expertise. The benchmark includes a set of predefined fairness metrics and algorithms, but users can easily modify or add new metrics and algorithms to suit their specific research questions. For more information, please refer to our paper FFB: A Fair Fairness Benchmark for In-Processing Group Fairness Methods.
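For a flavor of what such a metric looks like, here is a minimal sketch of a demographic parity gap computed from hard predictions and a binary sensitive attribute. It is an illustration only, not the benchmark's own implementation; the function name and arguments are ours.

```python
import torch

def demographic_parity_gap(y_pred: torch.Tensor, s: torch.Tensor) -> torch.Tensor:
    """Absolute gap in positive-prediction rates between two sensitive groups.

    y_pred: hard binary predictions in {0, 1}, shape (N,)
    s:      binary sensitive attribute in {0, 1}, shape (N,)
    """
    rate_s0 = y_pred[s == 0].float().mean()  # P(y_hat = 1 | s = 0)
    rate_s1 = y_pred[s == 1].float().mean()  # P(y_hat = 1 | s = 1)
    return (rate_s0 - rate_s1).abs()
```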
This benchmark aims to be
- minimalistic
- hackable
- beginner-friendly
- torch-idiomatic
- reference implementation for researchers
- ......
Please refer to datasets/readme.md for the dataset download instructions.
- UCI Adult: U.S. census data predicting whether an individual's income exceeds $50K using demographic and financial details (see the loading sketch after this list).
- COMPAS: Criminal defendants' records used to predict recidivism within two years.
- German Credit: Information about credit applicants at a German bank used for credit risk rating prediction.
- Bank Marketing: Data from a Portuguese bank used to predict client subscription to term deposit.
- ACS: From the American Community Survey, used for multiple prediction tasks such as income and employment.
- KDD Census: Like UCI Adult but with more instances, used to predict if an individual’s income is over $50K.
- CelebFaces Attributes: 200K+ celebrity face images, each annotated with 40 binary labels of specific facial attributes.
- UTKFace: Over 20k face images from diverse ethnicities and ages, annotated with age, gender, and ethnicity.
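As an illustration of how one of these tabular datasets is typically set up for a fairness task (not the benchmark's own loading code; the local path is an assumption, and the column names follow the standard UCI Adult format), one could load UCI Adult and split off the target and a sensitive attribute like this:

```python
import pandas as pd

# Standard UCI Adult columns; adjust the path to wherever you placed the file
# following datasets/readme.md.
columns = [
    "age", "workclass", "fnlwgt", "education", "education-num", "marital-status",
    "occupation", "relationship", "race", "sex", "capital-gain", "capital-loss",
    "hours-per-week", "native-country", "income",
]
df = pd.read_csv("datasets/adult/adult.data", names=columns, skipinitialspace=True)

y = (df["income"] == ">50K").astype(int)  # target: income over $50K
s = (df["sex"] == "Male").astype(int)     # example binary sensitive attribute
X = df.drop(columns=["income"])           # features (still need encoding/scaling)
```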
The statistics of the datasets are as follows:
- ERM: Standard machine learning method that minimizes the empirical risk of the training data. Serves as a common baseline for fairness methods.
- DiffDP, DiffEopp, DiffEodd: Gap-regularization methods for demographic parity, equalized opportunity, and equalized odds. These fairness definitions cannot be optimized directly, but the gap regularizers are differentiable and can be minimized with gradient descent (see the sketch after this list).
- PRemover: Aims to minimize the mutual information between the prediction accuracy and the sensitive attributes.
- HSIC: Minimizes the Hilbert-Schmidt Independence Criterion between the prediction accuracy and the sensitive attributes.
- AdvDebias: Learns a classifier that maximizes the prediction ability and simultaneously minimizes an adversary's ability to predict the sensitive attributes from the predictions.
- LAFTR: A fair representation learning method aiming to learn an intermediate representation that minimizes the classification loss, reconstruction error, and the adversary's ability to predict the sensitive attributes from the representation.
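To make the gap-regularization idea concrete, below is a minimal, illustrative sketch of a DiffDP-style demographic parity regularizer in PyTorch: the differentiable counterpart of the metric sketched earlier, computed on predicted probabilities instead of hard predictions. It shows the technique only and is not the repository's exact implementation.

```python
import torch
import torch.nn.functional as F

def dp_gap(logits: torch.Tensor, s: torch.Tensor) -> torch.Tensor:
    """Differentiable demographic parity gap: difference in mean predicted
    positive probability between the two sensitive groups."""
    p = torch.sigmoid(logits).squeeze(-1)  # predicted P(y = 1 | x)
    return (p[s == 1].mean() - p[s == 0].mean()).abs()

def total_loss(logits, y, s, lam: float) -> torch.Tensor:
    """Task loss + lam * fairness regularizer (lam plays the role of the
    --lam flag in the example commands below)."""
    task = F.binary_cross_entropy_with_logits(logits.squeeze(-1), y.float())
    return task + lam * dp_gap(logits, s)
```

Replacing the group-wise gap in mean predictions with gaps in true-positive (and false-positive) rates yields the analogous regularizers for equalized opportunity and equalized odds.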
1. Not all widely used fairness datasets stably exhibit fairness issues. We found that in some cases, the bias in these datasets is either not consistently present or its manifestation varies significantly. This finding indicates that relying on these datasets for fairness analysis might not always provide stable or reliable results.
2. The utility-fairness performance of current fairness methods exhibits trade-offs. We conduct experiments with various in-processing fairness methods and analyze how the trade-off can be adjusted to cater to specific needs while maintaining a balance between accuracy and fairness.
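One simple way to trace this trade-off is to sweep the fairness-strength weight (the --lam flag used by the example commands in this README) and compare the resulting accuracy and fairness metrics. A minimal sketch, reusing the flags from the DiffDP example below with arbitrary lam values:

```python
import subprocess

# Sweep the fairness weight; all other flags mirror the DiffDP example command.
for lam in [0.0, 0.5, 1.0, 2.0, 5.0]:  # illustrative values
    subprocess.run([
        "python", "-u", "./ffb_tabular_diffdp.py",
        "--dataset", "acs", "--model", "diffdp",
        "--sensitive_attr", "race", "--target_attr", "income",
        "--batch_size", "4096", "--lam", str(lam),
        "--seed", "89793", "--log_freq", "1", "--num_training_steps", "150",
    ], check=True)
```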
To install the Fair Fairness Benchmark, simply clone this repository and install the required dependencies by running the following command:
pip install -r requirements.txt
python -u ./ffb_tabular_erm.py --dataset acs --model erm --sensitive_attr age --target_attr income --batch_size 32 --seed 89793 --log_freq 1 --num_training_steps 150
python -u ./ffb_tabular_diffdp.py --dataset acs --model diffdp --sensitive_attr race --target_attr income --batch_size 4096 --lam 1.4 --seed 89793 --log_freq 1 --num_training_steps 150
Ensure you have Anaconda or Miniconda installed on your system. If not, download and install it from the official Miniconda site. The important Python packages are:
pandas==1.5.3
torch==1.13.1+cu116
wandb==0.14.0
scikit-learn==1.2.2
tabulate==0.9.0
statsmodels==0.13.5
# Navigate to your preferred directory
cd path/to/your/directory
# Clone the repository from GitHub
git clone https://github.com/ahxt/fair_fairness_benchmark.git
# Navigate to the cloned directory
cd fair_fairness_benchmark
# Create a new conda environment
conda create --name ffb_env python=3.8
# Activate the environment
conda activate ffb_env
# Install required packages
pip install -r requirements.txt
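To quickly confirm that the environment matches the versions listed above (an optional sanity check, not part of the repository's instructions), you can print the installed versions:

```python
from importlib.metadata import version

# Optional check of the key dependencies listed above.
for pkg in ["pandas", "torch", "wandb", "scikit-learn", "tabulate", "statsmodels"]:
    print(pkg, version(pkg))
```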
Weights & Biases (wandb) is a tool for experiment tracking, and the code in this repository uses it. We highly recommend using wandb for tracking; if you prefer not to, simply delete all lines of code that include "wandb". Please follow these steps to set up wandb.
# Install wandb
pip install wandb
# Login to your wandb account. If you don't have one, you'll be prompted to create it.
wandb login
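As an optional alternative to deleting the wandb lines, wandb itself can be turned off at run time. The snippet below uses general wandb features and is not specific to this repository (the project name is a placeholder):

```python
import os
import wandb

# Option 1: set the environment variable so subsequent wandb.init calls are disabled.
os.environ["WANDB_MODE"] = "disabled"

# Option 2: initialize a run in disabled mode so wandb.log(...) becomes a no-op.
wandb.init(project="ffb", mode="disabled")
```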
Before running, ensure you've downloaded the necessary datasets as per the instructions in datasets/readme.md.
# Run the first example
python -u ./ffb_tabular_erm.py --dataset acs --model erm --sensitive_attr age --target_attr income --batch_size 32 --seed 89793 --log_freq 1 --num_training_steps 150
# Run the second example
python -u ./ffb_tabular_diffdp.py --dataset acs --model diffdp --sensitive_attr race --target_attr income --batch_size 4096 --lam 1.4 --seed 89793 --log_freq 1 --num_training_steps 150
We welcome contributions from the research community to improve and extend the Fair Fairness Benchmark. If you have an idea for a new metric or algorithm, or would like to report a bug, please open an issue or submit a pull request.
The Fair Fairness Benchmark is released under the MIT License.
If you find our resources useful, please kindly cite our paper.
@misc{han2023ffb,
title={FFB: A Fair Fairness Benchmark for In-Processing Group Fairness Methods},
author={Xiaotian Han and Jianfeng Chi and Yu Chen and Qifan Wang and Han Zhao and Na Zou and Xia Hu},
year={2023},
eprint={2306.09468},
archivePrefix={arXiv},
primaryClass={cs.LG}
}