Benchopt is a package to simplify and make more transparent and reproducible the comparisons of optimization algorithms. This benchmark is dedicated to simulation-based inference (SBI) algorithms. The goal of SBI is to approximate the posterior distribution of a stochastic model (or simulator):
q_{\phi}(\theta \mid x) \approx p(\theta \mid x) = \frac{p(x \mid \theta) p(\theta)}{p(x)}
where \theta denotes the model parameters and x is an observation. In SBI, the likelihood p(x \mid \theta) is implicitly modeled by the stochastic simulator. Placing a prior p(\theta) over the simulator parameters allows us to generate samples from the joint distribution p(\theta, x) = p(x \mid \theta) p(\theta), which can then be used to approximate the posterior distribution p(\theta \mid x), e.g. via the training of a deep generative model q_{\phi}(\theta \mid x).
In this benchmark we only consider amortized SBI algorithms that allow for inference for any new observation x, without simulating new data after the initial training phase.
CPU, Python 3.8 - 3.11
If you are using a macOS device with an M1 (ARM) processor, run the following command before proceeding with the benchopt installation instructions below:
conda install pyarrow
This benchmark can be run using the following commands:
pip install -U benchopt
git clone https://github.com/JuliaLinhart/benchmark_sbi
cd benchmark_sbi
benchopt install .
benchopt run .
Alternatively, options can be passed to benchopt <install/run>
to restrict the installations/runs to some solvers or datasets:
benchopt <install/run> -s 'npe_sbi[flow=nsf]' -d 'slcp[train_size=4096]' --n-repetitions 3
Use benchopt run -h
for more details about these options, or visit https://benchopt.github.io/api.html.
Results are saved in the outputs/ folder, with a .html file that offers a visual interface showing convergence plots for the different datasets, solvers and metrics. They were obtained by running
benchopt run --n-repetitions 10 --max-runs 1000 --timeout 1000000000000000000
where the parameters max-runs and timeout are set to high values to avoid stopping the algorithms prematurely, before they converge.
Everyone is welcome to contribute by adding datasets, solvers (algorithms) or metrics.
Datasets represent different prior-simulator pairs that define a joint distribution p(\theta, x) = p(\theta) p(x \mid \theta). The data they are expected to return (Dataset.get_data) consist of a set of training parameter-observation pairs (\theta, x), a set of testing parameter-observation pairs (\theta, x), and a set of reference posterior-observation pairs (p(\theta \mid x), x). To add a dataset, add a file in the datasets folder. It is possible to directly use a simulator from the sbibm package, as showcased with the provided two_moons example dataset, or to implement a custom simulator from scratch, as we did for the slcp example dataset.
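To illustrate the expected structure, here is a minimal sketch of a new dataset file, assuming the benchopt BaseDataset interface. The toy simulator, the dataset name and the keyword names returned by get_data are hypothetical and must match what this benchmark's Objective expects.

```python
# Minimal sketch of a dataset file (e.g. a new file in the datasets folder).
# The simulator and the keys returned by get_data are hypothetical.
import torch

from benchopt import BaseDataset


def simulator(theta):
    # Toy simulator: observations are noisy copies of the parameters.
    return theta + 0.1 * torch.randn_like(theta)


class Dataset(BaseDataset):
    name = "gaussian_example"  # hypothetical dataset name
    parameters = {"train_size": [1024, 4096]}

    def get_data(self):
        prior = torch.distributions.MultivariateNormal(torch.zeros(2), torch.eye(2))

        # Draw (theta, x) pairs from the joint p(theta, x) = p(theta) p(x | theta).
        theta_train = prior.sample((self.train_size,))
        x_train = simulator(theta_train)
        theta_test = prior.sample((256,))
        x_test = simulator(theta_test)

        # Reference posteriors are optional; None if not available.
        return dict(
            prior=prior,
            theta_train=theta_train, x_train=x_train,
            theta_test=theta_test, x_test=x_test,
            ref_posteriors=None,
        )
```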
Solvers represent different amortized SBI algorithms (NRE, NPE, FMPE, ...) or different implementations (sbi, lampe, ...) of such algorithms. They are initialized (Solver.set_objective) with the training pairs and the prior p(\theta). After training (Solver.run), they are expected to return (Solver.get_result) a pair of functions log_prob and sample that evaluate the posterior log-density \log q_{\phi}(\theta \mid x) and generate parameters \theta \sim q_{\phi}(\theta \mid x), respectively. To add a solver, add a file in the solvers folder.
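As a companion sketch, a solver file could look like the following, assuming the benchopt BaseSolver interface. The argument names of set_objective and the return format of get_result are assumptions, and the Gaussian "training" is only a placeholder for an actual conditional density estimator q_{\phi}(\theta \mid x).

```python
# Minimal sketch of a solver file (e.g. a new file in the solvers folder).
# A real solver would train a conditional estimator q_phi(theta | x); here a
# single Gaussian is fitted to the training parameters as a placeholder.
import torch

from benchopt import BaseSolver


class Solver(BaseSolver):
    name = "gaussian_baseline"  # hypothetical solver name
    sampling_strategy = "iteration"  # run() receives an iteration budget

    def set_objective(self, theta_train, x_train, prior):
        # Argument names are assumed; they must match what the benchmark's
        # Objective passes to solvers.
        self.theta_train = theta_train
        self.x_train = x_train
        self.prior = prior

    def run(self, n_iter):
        # "Training" here is a one-shot moment matching of the parameters.
        mean = self.theta_train.mean(0)
        cov = torch.cov(self.theta_train.T) + 1e-3 * torch.eye(self.theta_train.shape[1])
        self.posterior = torch.distributions.MultivariateNormal(mean, cov)

    def get_result(self):
        # The benchmark expects two callables: log_prob(theta, x) and sample(x, n).
        def log_prob(theta, x):
            return self.posterior.log_prob(theta)

        def sample(x, n):
            return self.posterior.sample((n,))

        return dict(log_prob=log_prob, sample=sample)
```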
Metrics evaluate the quality of the estimated posterior obtained from the solver. The main objective is the expected negative log-likelihood \mathbb{E}_{p(\theta, x)} [ - \log q_{\phi}(\theta \mid x) ] over the test set. Other metrics, such as the C2ST and EMD scores, are computed (Objective.compute) using the reference posteriors (if available). To add a metric, implement it in the benchmark_utils.metrics.py file.
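Finally, a new metric could be sketched as a plain function in that file. The signatures below, which take the solver's log_prob and sample callables together with test or reference data, are assumptions about how Objective.compute would call them.

```python
# Sketch of additional metrics for the benchmark_utils.metrics.py file.
# The signatures are hypothetical; Objective.compute would call them with the
# solver's log_prob / sample callables and the corresponding test or reference data.
import torch


def expected_nll(log_prob, theta_test, x_test):
    # Monte Carlo estimate of E_{p(theta, x)}[-log q_phi(theta | x)] on the test set.
    return -log_prob(theta_test, x_test).mean().item()


def posterior_mean_error(sample, theta_ref, x_obs, n_samples=1024):
    # Distance between the estimated posterior mean and the mean of reference
    # posterior samples for one observation x_obs.
    est_mean = sample(x_obs, n_samples).mean(0)
    ref_mean = theta_ref.mean(0)
    return torch.linalg.norm(est_mean - ref_mean).item()
```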