This short document aims at documenting the API used in my SMPyBandits environment, and at closing this issue #3.
- Arms are defined in this folder (`Arms/`), see for example `Arms.Bernoulli`.
- MAB algorithms (also called policies) are defined in this folder (`Policies/`), see for example `Policies.Dummy` for a fully random policy, `Policies.EpsilonGreedy` for the epsilon-greedy random policy, `Policies.UCB` for the "simple" UCB algorithm, or also `Policies.BayesUCB` and `Policies.klUCB` for two UCB-like algorithms, `Policies.AdBandits` for the AdBandits algorithm, and `Policies.Aggregator` for my aggregated bandits algorithms.
- Environments to encapsulate data are defined in this folder (`Environment/`): MAB problems use the class `Environment.MAB`, simulation results are stored in an `Environment.Result`, and the class to evaluate multi-policy single-player multi-environment simulations is `Environment.Evaluator`.
- `very_simple_configuration.py` imports all the classes, and defines the simulation parameters as a dictionary (JSON-like); a minimal sketch of such a dictionary is given below.
- `main.py` runs the simulations, then displays the final ranking of the different policies and plots the results (saved to the folder `plots/`).
For more details, see these UML diagrams.
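As an illustration, here is a minimal sketch of such a configuration dictionary. The key names (`horizon`, `repetitions`, `environment`, `policies`, ...) are reproduced from memory and should be double-checked against `very_simple_configuration.py`:

```python
# Minimal sketch of a configuration dictionary (the exact key names and format
# are assumptions: check very_simple_configuration.py for the reference version).
from Arms import Bernoulli
from Policies import Uniform, UCB

configuration = {
    "horizon": 10000,       # Horizon (number of time steps) of each simulation
    "repetitions": 100,     # Number of repetitions
    "n_jobs": 1,            # Number of parallel jobs
    "verbosity": 5,         # Verbosity level of the output
    # One MAB problem, with three Bernoulli arms of these means
    "environment": [
        {"arm_type": Bernoulli, "params": [0.1, 0.5, 0.9]},
    ],
    # The policies to compare on this problem
    "policies": [
        {"archtype": Uniform, "params": {}},
        {"archtype": UCB, "params": {}},
    ],
}
```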
To customize the experiments or run different simulations:

- Change the default settings defined in `Environment/plotsettings.py`.
- Change the config file, i.e., `configuration.py` for single-player simulations, or `configuration_multiplayers.py` for multi-player simulations. A good example of a very simple configuration file is given in `very_simple_configuration.py`.
- Change the main script, i.e., `main.py` for single-player simulations, `main_multiplayers.py` for multi-player simulations. Some plots can be disabled or enabled by commenting out a few lines, and some options are given as flags (constants at the beginning of the file).
- If needed, change, improve or add some methods to the simulation environment class, i.e., `Environment.Evaluator` for single-player simulations, and `Environment.EvaluatorMultiPlayers` for multi-player simulations. They use a class to store their simulation results, `Environment.Result` and `Environment.ResultMultiPlayers` respectively. A sketch of how the single-player evaluator is typically driven is given right after this list.
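For reference, the single-player pipeline in `main.py` essentially builds an `Environment.Evaluator` from the configuration dictionary, runs it, then ranks the policies and plots. The sketch below only illustrates this flow; the method names (`startAllEnv`, `printFinalRanking`, `plotRegrets`) are assumptions and should be checked against `Environment/Evaluator.py` and `main.py`:

```python
# Hypothetical sketch of the single-player pipeline (method names are assumptions,
# check Environment/Evaluator.py and main.py for the real API).
from Environment import Evaluator
from configuration import configuration   # the configuration dictionary

evaluation = Evaluator(configuration)
evaluation.startAllEnv()                   # run all repetitions on all MAB problems
for envId in range(len(evaluation.envs)):
    evaluation.printFinalRanking(envId)    # final ranking of the policies
    evaluation.plotRegrets(envId)          # regret plots, saved to the plots/ folder
```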
In other words, what's the API of this project?
To write a new arm:

- Make a new file, e.g., `MyArm.py`.
- Save it in `Arms/`.
- The file should contain a class of the same name, inheriting from `Arms.Arm`, e.g., like this: `class MyArm(Arm): ...` (no need for any `super` call).
- This class `MyArm` has to have at least an `__init__(...)` method to create the arm object (with or without arguments, named or not); a `__str__` method to print it as a string; a `draw(t)` method to draw a reward from this arm (`t` is the time, which can be used or not); and it should have a `mean()` method that gives/computes the mean of the arm.
- Finally, add it to the `Arms/__init__.py` file: `from .MyArm import MyArm`.
- For examples, see `Arms.Bernoulli`, `Arms.Gaussian`, `Arms.Exponential`, `Arms.Poisson`.
- For example, use this template:

```python
from .Arm import Arm

class MyArm(Arm):

    def __init__(self, *args, **kwargs):
        # TODO Finish this method that initializes the arm MyArm
        pass

    def __str__(self):
        return "MyArm({})".format('...')  # TODO

    def draw(self, t=None):
        # TODO Simulate a pull of this arm. t is the time, it might be used or not
        pass

    def mean(self):
        # TODO Return the mean of this arm
        pass
```
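As a concrete illustration, here is a hypothetical arm (not part of the project) that always returns the same reward; it simply fills in the four required methods of the template above:

```python
from .Arm import Arm

class ConstantArm(Arm):
    """Hypothetical arm giving a constant reward, for illustration only."""

    def __init__(self, value=0.5):
        self.value = value

    def __str__(self):
        return "ConstantArm({})".format(self.value)

    def draw(self, t=None):
        # The reward does not depend on the time t
        return self.value

    def mean(self):
        return self.value
```

It would then be registered with `from .ConstantArm import ConstantArm` in `Arms/__init__.py`.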
To write a new (single-player) policy:

- Make a new file, e.g., `MyPolicy.py`.
- Save it in `Policies/`.
- The file should contain a class of the same name; it can inherit from `Policies.IndexPolicy` if it is a simple index policy, e.g., like this: `class MyPolicy(IndexPolicy): ...` (no need for any `super` call), or simply like `class MyPolicy(object): ...`.
- This class `MyPolicy` has to have at least an `__init__(nbArms, ...)` method to create the policy object (with or without arguments, named or not), with at least the parameter `nbArms` (number of arms); a `__str__` method to print it as a string; a `choice()` method to choose an arm (an index among `0, ..., nbArms - 1`, e.g., at random, or based on a maximum index if it is an index policy); a `getReward(arm, reward)` method called when the arm `arm` gave the reward `reward`; and finally a `startGame()` method (possibly empty) which is called when a new simulation is run.
- Optionally, a policy class can have a `handleCollision(arm)` method to handle a collision after choosing the arm `arm` (e.g., update an internal index, change a fixed offset, etc.).
- Finally, add it to the `Policies/__init__.py` file: `from .MyPolicy import MyPolicy`.
- For examples, see `Policies.Uniform` for a fully randomized policy, `Policies.EpsilonGreedy` for a simple exploratory policy, `Policies.Softmax` for another simple approach, and `Policies.UCB` for the classical Upper Confidence Bounds policy (based on indexes, so inheriting from `Policies.IndexPolicy`). There are also `Policies.Thompson` and `Policies.BayesUCB` for Bayesian policies (using a posterior, e.g., a Beta posterior), and `Policies.klUCB` for a policy based on the Kullback-Leibler divergence. For less classical approaches, `Policies.AdBandit` is an approach combining the Bayesian and frequentist points of view, and `Policies.Aggregator` is my aggregating policy.
- For example, use this template:

```python
import random

class MyPolicy(object):

    def __init__(self, nbArms, *args, **kwargs):
        self.nbArms = nbArms
        # TODO Finish this method that initializes the policy MyPolicy

    def __str__(self):
        return "MyPolicy({})".format('...')  # TODO

    def startGame(self):
        pass  # Can be non-trivial, TODO if needed

    def getReward(self, arm, reward):
        # TODO After the arm 'arm' has been pulled, it gave the reward 'reward'
        pass  # Can be non-trivial, TODO if needed

    def choice(self):
        # TODO Do a smart choice of arm
        return random.randrange(self.nbArms)

    def handleCollision(self, arm):
        pass  # Can be non-trivial, TODO if needed
```
Other `choice...()` methods can be added, if this policy `MyPolicy` has to be used for multiple plays, ranked plays, etc.
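As a concrete illustration, here is a hypothetical policy (not part of the project) that pulls each arm once and then always plays the arm with the best empirical mean; it fills in the methods of the template above:

```python
import numpy as np

class FollowTheLeader(object):
    """Hypothetical policy playing the arm with the best empirical mean (illustration only)."""

    def __init__(self, nbArms):
        self.nbArms = nbArms
        self.pulls = np.zeros(nbArms, dtype=int)   # number of pulls of each arm
        self.rewards = np.zeros(nbArms)            # sum of rewards obtained from each arm

    def __str__(self):
        return "FollowTheLeader"

    def startGame(self):
        self.pulls.fill(0)
        self.rewards.fill(0)

    def getReward(self, arm, reward):
        self.pulls[arm] += 1
        self.rewards[arm] += reward

    def choice(self):
        if np.any(self.pulls == 0):
            # Explore: play an arm that has never been pulled yet
            return int(np.argmin(self.pulls))
        # Exploit: play the arm with the best empirical mean
        return int(np.argmax(self.rewards / self.pulls))
```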
To write a new multi-player policy:

- Make a new file, e.g., `MyPoliciesMultiPlayers.py`.
- Save it in `PoliciesMultiPlayers/`.
- The file should contain a class of the same name, e.g., like this: `class MyPoliciesMultiPlayers(object): ...`.
- This class `MyPoliciesMultiPlayers` has to have at least an `__init__` method to create the multi-player policy object; a `__str__` method to print it as a string; and a `children` attribute that gives a list of players (single-player policies).
- Finally, add it to the `PoliciesMultiPlayers/__init__.py` file: `from .MyPoliciesMultiPlayers import MyPoliciesMultiPlayers`.

For examples, see `PoliciesMultiPlayers.OracleNotFair` and `PoliciesMultiPlayers.OracleFair` for full-knowledge centralized policies (fair or not), and `PoliciesMultiPlayers.CentralizedFixed` and `PoliciesMultiPlayers.CentralizedCycling` for non-full-knowledge centralized policies (fair or not). There is also the `PoliciesMultiPlayers.Selfish` decentralized policy, where all players run without any knowledge of the number of players and without any communication (fully decentralized). `PoliciesMultiPlayers.Selfish` is the simplest possible example I could give as a template.
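No code template was given above for this case, so here is a minimal hedged sketch of what such a class could look like, assuming that `children` can simply be a list of single-player policy objects; the real classes (e.g., `PoliciesMultiPlayers.Selfish`) use a slightly more involved mechanism, so check them for the exact conventions:

```python
# Minimal hypothetical sketch of a multi-player policy wrapper.
# It only illustrates the three requirements listed above (__init__, __str__, children);
# see PoliciesMultiPlayers.Selfish for a real, working example.
class MyPoliciesMultiPlayers(object):

    def __init__(self, nbPlayers, nbArms, playerAlgo, *args, **kwargs):
        self.nbPlayers = nbPlayers
        # One independent single-player policy per player
        self.children = [playerAlgo(nbArms, *args, **kwargs) for _ in range(nbPlayers)]

    def __str__(self):
        return "MyPoliciesMultiPlayers({} x {})".format(self.nbPlayers, self.children[0])
```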
MIT Licensed (file LICENSE).
© 2016-2018 Lilian Besson.