hep_ml

Friendly machine learning for the LHCb experiment. The project lets one train and compare classifiers on a training dataset.

The programming language is Python. The analysis is performed in IPython notebooks, an interactive shell for Python that is commonly used in machine learning and works well for development, analysis and presenting results (plots, histograms and so on).

Brief demos:

Main points

  • working on uniform classifiers: classifiers whose predictions have low correlation with mass (or some other variable(s))
    • measures of uniformity (SDE, Theil, CvM, KS)
    • an optimized implementation of uBoost
    • uGradientBoosting (with different losses; FlatnessLoss is especially interesting)
  • parameter optimization
    See the grid_search module: it provides a simulated annealing-like optimization of parameters on a dataset, and this optimization can be run on a cluster.
  • plots, plots, plots
    See the reports module: it is a good way to visualize learning curves, ROC curves and the flatness of predictions along variables.
  • toy Monte Carlo
    The toymc module contains a procedure that generates a new set of events with the same distribution as an existing set of events, and the 'ToyMonteCarlo' notebook demonstrates it and analyzes its results.
  • parallelism
    ClassifiersDict from reports can train classifiers on an IPython cluster.
    uBoost is quite slow, so it has a built-in parallelism option: the different BDTs inside uBoost can be trained in parallel on a cluster.
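
To make the idea of a uniformity measure concrete, here is a minimal sketch of a binned, Cramér-von Mises-style flatness score. It is an illustration only and not the implementation used in this repository: the function names `empirical_cdf` and `binned_cvm_flatness`, the binning scheme and the weighting are all assumptions. The score compares the distribution of predictions inside each mass bin with the global distribution, so a value near zero suggests the predictions do not depend on mass.

```python
from bisect import bisect_right


def empirical_cdf(sorted_values, x):
    """Fraction of sorted_values that are <= x."""
    return bisect_right(sorted_values, x) / len(sorted_values)


def binned_cvm_flatness(predictions, masses, bin_edges):
    """Hypothetical flatness score (not the hep_ml API).

    Weighted average of a Cramer-von Mises-style distance between the
    prediction distribution in each mass bin and the global distribution.
    Events whose mass falls outside bin_edges only enter the global part.
    """
    global_sorted = sorted(predictions)
    n_total = len(predictions)
    score = 0.0
    for low, high in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = sorted(p for p, m in zip(predictions, masses) if low <= m < high)
        if not in_bin:
            continue  # empty bins carry no weight
        # Mean squared difference of the two empirical CDFs,
        # evaluated at every prediction in the full sample.
        distance = sum(
            (empirical_cdf(in_bin, x) - empirical_cdf(global_sorted, x)) ** 2
            for x in global_sorted
        ) / n_total
        score += len(in_bin) / n_total * distance
    return score


if __name__ == "__main__":
    masses = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
    edges = [0.0, 0.5, 1.0]
    flat = [0.2, 0.8, 0.4, 0.6, 0.2, 0.8, 0.4, 0.6]  # same shape in both bins
    correlated = list(masses)                         # prediction follows mass
    print("flat:", binned_cvm_flatness(flat, masses, edges))
    print("correlated:", binned_cvm_flatness(correlated, masses, edges))
```

In this toy input the first set of predictions has the same distribution in both mass bins, while the second follows the mass, so the second should score higher.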

### Getting started

To run most of the notebooks, only IPython and some Python libraries are needed.

To run the example notebooks, a machine should have:

  • IPython
  • Some Python libraries, which can be installed with any Python package manager (apt-get works too, but the Ubuntu repositories contain quite old versions of the libraries); pip is the better choice

The libraries you need are numpy, scipy, pandas, scikit-learn, matplotlib, rootpy, root-numpy and possibly others. The packages are installed from the command line:

sudo pip install numpy scipy pandas scikit-learn matplotlib rootpy root-numpy
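
After installation, a short script like the following can show which of the listed libraries the interpreter can actually see. It is a sketch, not part of the repository: the `REQUIRED_MODULES` table and the `missing_packages` helper are hypothetical names. It relies only on the standard library and accounts for scikit-learn and root-numpy being imported as `sklearn` and `root_numpy`.

```python
from importlib.util import find_spec

# pip name -> import name for the packages listed above.
# The two names differ for scikit-learn and root-numpy.
REQUIRED_MODULES = {
    "numpy": "numpy",
    "scipy": "scipy",
    "pandas": "pandas",
    "scikit-learn": "sklearn",
    "matplotlib": "matplotlib",
    "rootpy": "rootpy",
    "root-numpy": "root_numpy",
}


def missing_packages(required=REQUIRED_MODULES):
    """Return the pip names of packages whose module cannot be found."""
    return sorted(
        pip_name
        for pip_name, module in required.items()
        if find_spec(module) is None  # None means the module is not installed
    )


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed packages are importable.")
```

The check only looks the modules up without importing them, so it stays fast and does not fail if one library is broken at import time.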

IPython can be installed via pip as well:

sudo pip install "ipython[all]" 

To use the repository, clone it with git and install it in editable mode:

git clone https://github.com/anaderi/lhcb_trigger_ml.git
cd lhcb_trigger_ml
sudo pip install -e .

To run IPython, there is a shell script: hep_ml/runIpython.sh

To work with .root files, you need CERN ROOT with the PyROOT package (check that it is installed by typing root in the console).
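
The same check can be done from Python. The snippet below is a minimal sketch under the assumption that ROOT exposes its command-line tool as `root` on the PATH and its Python bindings as the `ROOT` module; the helper name `root_status` is hypothetical.

```python
import shutil
from importlib.util import find_spec


def root_status():
    """Report whether the root executable and the PyROOT module are visible."""
    return {
        # Full path to the executable, or None if it is not on the PATH.
        "root_executable": shutil.which("root"),
        # True if a module named ROOT can be located (it is not imported here).
        "pyroot_importable": find_spec("ROOT") is not None,
    }


if __name__ == "__main__":
    for name, value in root_status().items():
        print(name, "->", value)
```

If the executable is found but the module is not, the ROOT environment setup may not have been applied to the Python session being used.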