Friendly machine learning for the LHCb experiment. The project should enable one to train and compare classifiers on a training dataset.
The programming language is Python, and the analysis is performed in IPython notebooks. IPython is an interactive Python shell commonly used in machine learning; it is good for development, analysis and presenting results (plots, histograms and so on).
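
To illustrate what "train and compare classifiers" means in practice, here is a minimal sketch using plain scikit-learn. It assumes a synthetic dataset from `make_classification` in place of real LHCb data and compares classifiers by ROC AUC. It does not use the repository's own API (such as `ClassifiersDict`).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a signal/background dataset (an assumption, not LHCb data).
X, y = make_classification(n_samples=2000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=42)

classifiers = {
    "ada": AdaBoostClassifier(n_estimators=100, random_state=42),
    "gb": GradientBoostingClassifier(n_estimators=100, random_state=42),
}

# Train each classifier, then compare them by ROC AUC on the held-out half.
scores = {}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    proba = clf.predict_proba(X_test)[:, 1]  # predicted probability of class 1 (signal)
    scores[name] = roc_auc_score(y_test, proba)

for name, auc in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: ROC AUC = {auc:.3f}")
```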
### Example notebooks
- Dalitz Demo (several uniform classifiers on the dataset from the uBoost paper)
- Decay of tau into three muons
- Generation of toy Monte-Carlo
- Any other notebook from the repository can be viewed in nbviewer

### Main features
- uniform classifiers: classifiers whose predictions have low correlation with the mass (or some other variable(s))
  - measures of uniformity (SDE, Theil, CvM, KS)
  - an optimized implementation of uBoost
  - uGradientBoosting with different losses (FlatnessLoss is especially interesting)
- parameter optimization: see the `grid_search` module, which provides a simulated-annealing-like optimization of parameters on a dataset; this optimization can be run on a cluster.
- plots, plots, plots: see the `reports` module, a good way to visualize learning curves, ROC curves and the flatness of predictions along variables.
- toy Monte-Carlo: the `toymc` module generates a new set of events with the same distribution as an existing set of events; the special notebook 'ToyMonteCarlo' demonstrates and analyzes its results.
- parallelism: `ClassifiersDict` from `reports` can train classifiers on an IPython cluster. uBoost is quite slow, but it has a built-in parallelism option: the different BDTs inside uBoost can be trained in parallel on a cluster.
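
A uniformity measure quantifies how much the distribution of predictions changes along the mass: for a perfectly uniform classifier, the predictions in every mass bin follow the global distribution. The following is an illustrative KS-style sketch in NumPy. It assumes equal-population bins in mass and a population-weighted average of per-bin Kolmogorov-Smirnov distances. The name `ks_flatness` and this binning choice are hypothetical, not necessarily how the repository defines its KS measure.

```python
import numpy as np

def ks_flatness(predictions, mass, n_bins=10):
    """Hypothetical KS-style flatness: population-weighted mean KS distance
    between the predictions in each mass bin and all predictions.
    0 means predictions are distributed identically in every mass bin."""
    predictions = np.asarray(predictions, dtype=float)
    mass = np.asarray(mass, dtype=float)
    # Equal-population bin edges along the mass variable (an assumed choice).
    edges = np.quantile(mass, np.linspace(0, 1, n_bins + 1))
    bin_index = np.clip(np.searchsorted(edges, mass, side="right") - 1, 0, n_bins - 1)

    global_sorted = np.sort(predictions)
    total = 0.0
    for b in range(n_bins):
        in_bin = np.sort(predictions[bin_index == b])
        if len(in_bin) == 0:
            continue  # duplicate edges can leave a bin empty
        # Both empirical CDFs are step functions, so the KS supremum is
        # attained at one of the observed prediction values.
        points = np.concatenate([global_sorted, in_bin])
        cdf_global = np.searchsorted(global_sorted, points, side="right") / len(global_sorted)
        cdf_bin = np.searchsorted(in_bin, points, side="right") / len(in_bin)
        ks = np.max(np.abs(cdf_global - cdf_bin))
        total += ks * len(in_bin) / len(predictions)
    return total

# Usage on toy data: predictions independent of mass vs. strongly mass-dependent.
rng = np.random.default_rng(0)
mass = rng.uniform(0, 1, 5000)
flat_score = ks_flatness(rng.uniform(0, 1, 5000), mass)
correlated_score = ks_flatness(mass + 0.05 * rng.normal(size=5000), mass)
print(f"flat: {flat_score:.3f}, mass-correlated: {correlated_score:.3f}")
```

Weighting by bin population keeps sparsely populated mass regions from dominating the measure; a small value for the independent predictions and a large one for the mass-correlated predictions is the behaviour a uniformity measure should show.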

### Getting started
To run most of the notebooks, only IPython and some Python libraries are needed.
To run the example notebooks on a machine, you need:
- IPython
- several Python libraries, which can be installed with any package manager for Python (`apt-get` works too, but the Ubuntu repositories contain quite old versions of the libraries, so `pip` is the better choice)

The libraries you need are `numpy`, `scipy`, `pandas`, `scikit-learn`, `matplotlib`, `rootpy` and `root-numpy` (and possibly a few others). They can be installed from the command line:

    sudo pip install numpy scipy pandas scikit-learn matplotlib rootpy root-numpy

IPython can be installed with pip as well:

    sudo pip install "ipython[all]"

To use the repository, clone it with git and install it with pip in editable mode:

    git clone https://github.com/anaderi/lhcb_trigger_ml.git
    cd lhcb_trigger_ml
    sudo pip install -e .

To start IPython, use the shell script `hep_ml/runIpython.sh`.

To work with `.root` files, you need CERN ROOT with the PyROOT package (check that ROOT is installed by typing `root` in the console).
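
To check that the dependencies listed above are available, a small script like the following may help. It is a sketch, not part of the repository: `check_environment` and `REQUIRED_MODULES` are hypothetical names. It assumes the import names `sklearn`, `root_numpy`, `IPython` and `ROOT` (PyROOT), which differ from the pip package names. It only looks the modules up with `importlib.util.find_spec`; it does not import them.

```python
import importlib.util
import shutil

# pip/package name -> import name (they differ for scikit-learn, root-numpy, PyROOT).
REQUIRED_MODULES = {
    "numpy": "numpy",
    "scipy": "scipy",
    "pandas": "pandas",
    "scikit-learn": "sklearn",
    "matplotlib": "matplotlib",
    "rootpy": "rootpy",
    "root-numpy": "root_numpy",
    "ipython": "IPython",
    "pyROOT": "ROOT",
}

def check_environment():
    """Return {name: bool} telling whether each dependency can be found."""
    status = {pkg: importlib.util.find_spec(mod) is not None
              for pkg, mod in REQUIRED_MODULES.items()}
    # The `root` executable should be on PATH if CERN ROOT is installed.
    status["root (executable)"] = shutil.which("root") is not None
    return status

if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name:20s} {'OK' if ok else 'MISSING'}")
```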