---
title: "*SMPyBandits*: an Experimental Framework for Single and Multi-Players Multi-Arms Bandits Algorithms in Python"
fancyhdrtitle: SMPyBandits
author-meta: Lilian Besson
authors:
affiliations:
tags:
date: 23 March 2018
bibliography: paper.bib
bibtex: true
natbib: false
biblio-style: ieeetr
link-citations: true
colorlinks: true
numbersections: true
section-titles: true
usehancyhdr: true
fontsize: 10pt
geometry: scale=0.71
fontfamily: palatino
---
SMPyBandits is a package for numerical simulations of single-player and multi-player Multi-Armed Bandit (MAB) algorithms, written in Python (2 or 3). This library is the most complete open-source implementation of state-of-the-art algorithms tackling various kinds of sequential learning problems referred to as Multi-Armed Bandits. It aims to be extensive, simple to use and maintain, with a clean and well-documented codebase. It allows fast prototyping of simulations and experiments, with an easy configuration system and command-line options to customize experiments.
Multi-Armed Bandit (MAB) problems are well-studied sequential decision making problems in which an agent repeatedly chooses an action (the "arm" of a one-armed bandit) in order to maximize some total reward [@Robbins52], [@LaiRobbins85]. Initial motivation for their study came from the modeling of clinical trials, as early as 1933 with the seminal work of Thompson [@Thompson33], where arms correspond to different treatments with unknown, random effect. Since then, MAB models have been proved useful for many more applications, that range from cognitive radio [@Jouini09] to online content optimization like news article recommendation [@Li10], online advertising [@LiChapelle11], A/B Testing [@Kaufmann14], [@Jamieson17], or portfolio optimization [@Sani12].
More formally, a stochastic MAB problem is defined by a set of arms, each associated with an unknown reward distribution: at each time step, the agent pulls one arm and receives a random reward drawn from that arm's distribution. SMPyBandits supports Bernoulli, binomial, Poisson, and generic discrete distributions, as well as exponential, gamma, Gaussian and uniform continuous distributions, which can be truncated to an interval.
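To make the setting concrete, here is a minimal self-contained sketch (an illustration of the model, not SMPyBandits' actual API) that simulates a Bernoulli bandit problem and measures the regret of a uniformly random policy against the best fixed arm:

```python
import random

def random_policy_regret(means, horizon, seed=0):
    """Simulate a Bernoulli MAB: at each step, a uniformly random arm is
    pulled; return the expected regret versus always playing the best arm."""
    rng = random.Random(seed)
    best_mean = max(means)
    cumulated = 0.0
    for _ in range(horizon):
        arm = rng.randrange(len(means))  # the (bad) policy: pick any arm
        cumulated += means[arm]          # expected reward of the pulled arm
    return best_mean * horizon - cumulated

# Uniform exploration never concentrates on the best arm,
# so its regret grows linearly with the horizon.
regret = random_policy_regret([0.1, 0.2, 0.9], horizon=1000)
```

A good bandit algorithm (such as those implemented in the library) instead achieves regret growing only logarithmically with the horizon.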
SMPyBandits is a complete open-source implementation of over 65 single-player (classical) bandit algorithms.
It uses a well-designed hierarchical structure and class inheritance scheme to minimize redundancy in the codebase.
For instance, most existing algorithms are index-based, and new ones can be written easily by inheriting from the IndexPolicy
class.
An index-based algorithm computes an index for each arm at every time step, and simply plays the arm with the highest index.
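For illustration, this pattern can be sketched as follows. The class names echo the library's, but this is a simplified stand-alone sketch, not SMPyBandits' actual API:

```python
import math

class IndexPolicy:
    """Minimal index-based policy: play the arm with the highest index."""
    def __init__(self, nb_arms):
        self.nb_arms = nb_arms
        self.pulls = [0] * nb_arms      # number of pulls of each arm
        self.rewards = [0.0] * nb_arms  # cumulated reward of each arm
        self.t = 0

    def compute_index(self, arm):
        raise NotImplementedError  # subclasses only define the index

    def choice(self):
        self.t += 1
        # Play each arm once first, then maximize the index.
        for arm in range(self.nb_arms):
            if self.pulls[arm] == 0:
                return arm
        indexes = [self.compute_index(arm) for arm in range(self.nb_arms)]
        return max(range(self.nb_arms), key=indexes.__getitem__)

    def get_reward(self, arm, reward):
        self.pulls[arm] += 1
        self.rewards[arm] += reward

class UCB(IndexPolicy):
    """UCB1 index: empirical mean plus an exploration bonus."""
    def compute_index(self, arm):
        mean = self.rewards[arm] / self.pulls[arm]
        return mean + math.sqrt(2.0 * math.log(self.t) / self.pulls[arm])
```

A new index-based algorithm only needs to override `compute_index`; the bookkeeping (pulls, rewards, tie-breaking on first visits) is inherited.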
For Cognitive Radio and other applications, a well-studied extension is the multi-player setting, where several decentralized players interact with the same set of arms. SMPyBandits implements the main multi-player algorithms from the literature (rhoRand, MEGA, MusicalChair, and our state-of-the-art algorithms RandTopM and MCTopM from @BessonALT2018).
For comparison, realistic or full-knowledge centralized algorithms are also implemented.
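As an illustration of the multi-player setting, the standard collision model can be sketched as follows (a schematic sketch, not the library's implementation): when several players pull the same arm at the same time step, they collide and receive no reward.

```python
import random

def play_round(means, choices, rng):
    """One time step of the multi-player collision model: a player earns a
    Bernoulli reward only if it is alone on its chosen arm, else reward 0."""
    rewards = []
    for player, arm in enumerate(choices):
        collided = choices.count(arm) > 1
        reward = 0.0 if collided else float(rng.random() < means[arm])
        rewards.append(reward)
    return rewards

rng = random.Random(42)
means = [0.9, 0.8, 0.1]
# Players 0 and 1 collide on arm 0 (no reward); player 2 is alone on arm 2.
rewards = play_round(means, [0, 0, 2], rng)
```

The difficulty of the problem is that players must learn to occupy the best arms *without colliding*, while communicating nothing to each other.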
With this numerical framework, simulations can run on a single CPU or a multi-core machine using joblib [@joblib],
and summary plots are automatically saved as high-quality PNG, PDF and EPS, using matplotlib [@matplotlib] and seaborn [@seaborn].
Raw data from each simulation is also saved in an HDF5 file using h5py [@h5py], an efficient and compressed binary format, to allow easy post-mortem exploration of simulation results.
Making new simulations is very easy: one only needs to write a configuration script (configuration.py), without needing complete knowledge of the internal code architecture.
A complete Sphinx documentation, for each algorithm and all parts of the codebase, even including the constants in the different configuration files, is available here: SMPyBandits.GitHub.io.
We show how to install SMPyBandits, and an example of how to run a simple experiment. This bash snippet[^1] shows how to clone the code[^2], and install the requirements for Python 3 (once):
```bash
# 1. get the code in the folder you want
$ git clone https://GitHub.com/SMPyBandits/SMPyBandits.git
$ cd SMPyBandits
# 2. install the requirements
$ pip install -r requirements.txt
```
Launching simulations is easy: for instance, this snippet shows how to start N=1000 repetitions of a simple Bernoulli problem with K=9 arms, a horizon of T=10000, using 4 parallel jobs. Passing parameters through environment variables (N=1000 etc.) on the command line is not required, but it is convenient:
```bash
# 3. run a single-player simulation
$ BAYES=False ARM_TYPE=Bernoulli N=1000 T=10000 K=9 N_JOBS=4 \
    MEANS=[0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9] python3 main.py configuration.py
```
A small script (configuration.py) is used to import the arm and policy classes, and to define the problems and the experiments. Choosing the algorithms is easy: one simply customizes the configuration["policies"] list in the configuration.py file.
For instance, one can compare the standard anytime klUCB algorithm against the non-anytime variant klUCBPlusPlus, and also against UCBalpha (with alpha=1) and Thompson (with Beta posterior).
```python
configuration["policies"] = [
    { "archtype": klUCB, "params": { "klucb": klucbBern } },
    { "archtype": klUCBPlusPlus, "params": { "horizon": HORIZON, "klucb": klucbBern } },
    { "archtype": UCBalpha, "params": { "alpha": 1 } },
    { "archtype": Thompson, "params": { "posterior": Beta } }
]
```
Running the simulation as shown above will save figures in a sub-folder, as well as save data (pulls, rewards and regret) in an HDF5 file[^hdf5].
Figure \ref{fig:plot1} below shows the average regret for these four algorithms.
[^hdf5]: E.g., this simulation produces the HDF5 file GitHub.com/SMPyBandits/SMPyBandits/blob/master/plots/paper/example.hdf5.
SMPyBandits was used for the following research articles:
- For @BessonALT2018, we used SMPyBandits for all the simulations for multi-player bandit algorithms[^3]. We designed the two RandTopM and MCTopM algorithms and proved that they enjoy logarithmic regret in the usual setting, and significantly outperform the previous state-of-the-art solutions (i.e., rhoRand, MEGA and MusicalChair).
- In @BessonWCNC2018, we used SMPyBandits to illustrate and compare different aggregation algorithms[^4]. We designed a variant of the Exp3 algorithm for online aggregation of experts [@Bubeck12], called Aggregator. Aggregating experts is a well-studied idea in sequential learning and in machine learning in general. We showed that it can be used in practice to select, on the run, the best bandit algorithm for a certain problem from a fixed pool of experts. This idea and algorithm can have an interesting impact for Opportunistic Spectrum Access applications [@Jouini09] that use multi-armed bandit algorithms for sequential learning and network efficiency optimization.
- In @Besson2018c, we used SMPyBandits to illustrate and compare different "doubling trick" schemes[^5]. In sequential learning, an algorithm is anytime if it does not need to know the horizon $T$ of the experiment. A well-known trick for transforming any non-anytime algorithm into an anytime variant is the "Doubling Trick": start with a horizon $T_0 \in \mathbb{N}^*$, and when $t > T_i$, use $T_{i+1} = 2 T_i$. We studied two generic sequences of growing horizons (geometric and exponential), and we proved two theorems that generalize previous results. A geometric sequence suffices to conserve minimax regret bounds (in $R_T = \mathcal{O}(\sqrt{T})$), with a constant multiplicative loss $\ell \leq 4$, but cannot be used to conserve a logarithmic regret bound (in $R_T = \mathcal{O}(\log(T))$). An exponential sequence can be used to conserve logarithmic bounds, with a constant multiplicative loss also $\ell \leq 4$ in the usual setting. It is still an open question whether a well-tuned exponential sequence can conserve minimax bounds, or "weak" minimax bounds (in $R_T = \mathcal{O}(\sqrt{T \log(T)})$).
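The geometric doubling trick described above can be sketched in a few lines. This is a schematic illustration with a hypothetical make_algorithm factory and play_one_step method, not the library's actual wrapper:

```python
def doubling_trick(make_algorithm, total_horizon, first_horizon=100):
    """Turn a non-anytime algorithm into an anytime one: run it on the
    geometric sequence of horizons T_0, 2*T_0, 4*T_0, ..., restarting
    a fresh instance each time the current horizon T_i is exceeded."""
    horizons = []
    t, T_i = 0, first_horizon
    while t < total_horizon:
        algorithm = make_algorithm(horizon=T_i)  # fresh run, tuned for T_i
        steps = min(T_i, total_horizon) - t      # play until t > T_i
        for _ in range(steps):
            algorithm.play_one_step()
        t += steps
        horizons.append(T_i)
        T_i *= 2                                 # T_{i+1} = 2 * T_i
    return horizons
```

With first_horizon=100 and total_horizon=1000, the wrapped algorithm is run on the horizons 100, 200, 400, 800 and 1600; the exponential variant studied in the paper grows the sequence faster than this geometric one.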
This library is written in Python [@python], for versions 2.7+ or 3.4+, using matplotlib [@matplotlib] for 2D plotting, numpy [@numpy] for data storage, random number generation and operations on arrays, scipy [@scipy] for statistical and special functions, and seaborn [@seaborn] for pretty plotting and colorblind-aware colormaps.
Optional dependencies include joblib
[@joblib] for parallel simulations, numba
[@numba] for automatic speed-up on small functions, sphinx
[@sphinx] for generating the documentation, virtualenv
[@virtualenv] for launching simulations in isolated environments, and jupyter
[@jupyter] used with ipython
[@ipython] to experiment with the code.
Footnotes

[^1]: See SMPyBandits.GitHub.io/How_to_run_the_code.html for more details.

[^2]: SMPyBandits is also available on Pypi, see pypi.org/project/SMPyBandits. You can install it directly with sudo pip install SMPyBandits, or from a virtualenv [@virtualenv].

[^3]: See the page SMPyBandits.GitHub.io/MultiPlayers.html on the documentation.

[^4]: See the page SMPyBandits.GitHub.io/Aggregation.html on the documentation.

[^5]: See the page SMPyBandits.GitHub.io/DoublingTrick.html on the documentation.