This project provides utilities for creating track quality classifiers for the Level-1 track finding. It includes:
- Datasets
- Models
- Analysis Tools
Datasets is a general utility for translating ROOT files generated by the Level-1 tracking ntuple makers into .h5 files split into X and y, test and train.
```python
Dataset = TrackDataSet("Dataset_name")
```
Instantiate the Dataset class; for more dataset classes see dataset.py.
```python
Dataset.load_data_from_root("root_file_path.root", n_batches)
```
Load the ROOT data file, providing its full file path. `n_batches` is used if you want to use exactly the same number of events across files for direct comparison. If the full dataset is needed, set it to a large value (100,000+).
```python
Dataset.generate_test_train()
```
Generate the test/train split. This currently shuffles the dataset, applies the feature transformation, balances particle IDs and fakes, and creates a 0.1 test/train split. `generate_test()` and `generate_train()` can be used to create only the testing or training dataset, as shown below. See dataset.py to create your own features or balancing.
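For example, when only one split is needed (method names taken from the description above):

```python
Dataset.generate_test()   # build only the testing dataset
Dataset.generate_train()  # build only the training dataset
```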
```python
Dataset.save_test_train_h5("filepath")
```
Save the files. This will create the filepath directory containing X_train.h5, X_test.h5, y_train.h5 and y_test.h5, along with a JSON file describing the files.
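Putting these steps together, a minimal end-to-end sketch; the import path, dataset name and file paths are placeholders to adapt to your checkout:

```python
# Hypothetical end-to-end dataset creation using the calls documented above.
# The import path is an assumption based on dataset.py; adjust as needed.
from dataset import TrackDataSet

dataset = TrackDataSet("Degradation10_Train")             # placeholder name
dataset.load_data_from_root("/path/to/TrackNtuple.root",  # placeholder path
                            100000)                       # large value => full dataset
dataset.generate_test_train()                             # shuffle, transform, balance, 0.1 split
dataset.save_test_train_h5("Datasets/Degradation10/")     # writes X/y train/test .h5 + JSON
```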
A model is an algorithm that predicts y based on X. Within this folder there is the generic class `TrackQualityModel` that is inherited from. There are a number of general quality models:
- CutTrackQualityModel
  - Performs basic $\chi^2$ cuts on the tracks
- GBDTTrackQualityModel
  - Subclassed into:
    - SklearnClassifierModel (using sklearn)
    - XGBoostClassifierModel (XGBoost using the sklearn interface)
    - FullXGBoostClassifierModel (XGBoost with the full XGBoost interface, exposing more parameters to tweak)
    - TFDFClassifierModel (TensorFlow Decision Forests, soon to be deprecated)
- NNTrackQualityModel
  - Unused, needs updating and expanding
To train a model, first generate a dataset as above. Create the model with name "name":

```python
model = XGBoostClassifierModel("name")
```
Load the dataset located in "Train_Dataset_path":

```python
model.load_data("Train_Dataset_path")
```

Train the model:

```python
model.train()
```

Save the model in the filepath:

```python
model.save_model("filepath")
```
Parameters of the model can be adapted before training with:
```python
model.min_child_weight['value'] = 1.37
model.alpha['value'] = 0.93
model.early_stopping['value'] = 5
model.learning_rate['value'] = 0.32
model.n_estimators['value'] = 60
model.subsample['value'] = 0.25
model.max_depth['value'] = 3
model.gamma['value'] = 0.0
model.rate_drop['value'] = 0.79
model.skip_drop['value'] = 0.15
```
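Putting the training steps together, a minimal sketch (the model name and file paths are placeholders):

```python
# Hypothetical training run using the calls documented above.
model = XGBoostClassifierModel("degradation10_bdt")   # placeholder name
model.max_depth['value'] = 3                          # optional parameter tweaks, as above
model.n_estimators['value'] = 60
model.load_data("Datasets/Degradation10_Train/")      # placeholder dataset path
model.train()
model.save_model("Models/degradation10/")             # placeholder save path
```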
Various scripts for training BDTs (single training, combined datasets, or incremental training) are in Scripts.
Assuming a model has already been trained as above, first create the model:

```python
model = XGBoostClassifierModel("name")
```

Load the saved model:

```python
model.load_model("filepath")
```

Load the testing dataset:

```python
model.load_data("Test_Dataset_path")
```

Test the model:

```python
model.test()
```
Evaluate the model. This runs a ROC curve calculation and creates some plots in the save_dir:

```python
model.evaluate(plot=True, save_dir="save_dir")
```
Plot the model importances:

```python
plot_model(model, "save_dir")
```
As the evaluation takes a "long" time, the model can be saved after this step; the predictions and ROC calculations will be saved as arrays:

```python
model.full_save("save_dir")
```

And loaded with:

```python
model.full_load("save_dir")
```
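Putting the evaluation steps together, a minimal sketch (file paths are placeholders):

```python
# Hypothetical evaluation run using the calls documented above.
model = XGBoostClassifierModel("name")
model.load_model("Models/degradation10/")         # placeholder model path
model.load_data("Datasets/Degradation10_Test/")   # placeholder dataset path
model.test()
model.evaluate(plot=True, save_dir="eval_plots")  # ROC calculation + plots
plot_model(model, "eval_plots")                   # model importances
model.full_save("eval_plots")                     # cache predictions and ROC arrays
```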
To see a single model being evaluated, run EvaluateModel.
In order to evaluate the quantisation of the model and the firmware resource usage, there are utilities to synthesize the model.
To use the HLS and HDL options you will need a Vivado install and to add it to your path with the following:
```bash
export PATH="/opt/Xilinx/Vitis_HLS/2021.2/include:$PATH"
source /opt/Xilinx/Vivado/2021.2/settings64.sh
export BUILD_VIVADO_VERSION=2021.2/
export BUILD_VIVADO_BASE=/opt/Xilinx/Vivado/
```
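To check the tools are visible after sourcing the settings, something like:

```bash
# Sanity check: both commands ship with the 2021.2 installs referenced above
which vitis_hls
vivado -version
```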
To run the synthesis, first define which precisions to use in a list; single precisions are allowed but must still be in list format:
```python
precisions = ['ap_fixed<12,6>', 'ap_fixed<11,6>', 'ap_fixed<11,5>',
              'ap_fixed<10,6>', 'ap_fixed<10,5>', 'ap_fixed<10,4>']
```
Then run the synthesis:

```python
synth_model(model, sim=True, hdl=True, hls=True, cpp=True, onnx=True, python=True,
            test_events=10000,
            precisions=precisions,
            save_dir="filepath")
```
- The `sim` flag runs a C simulation of the BDT, needed to evaluate whether the selected precision loses performance.
- The `hdl` flag runs the HDL synthesis and produces a resource vs precision plot.
- The `hls` flag runs the HLS synthesis and produces a resource vs precision plot.
- The `cpp` flag runs a C++ version of the tree needed for bit-accurate emulation in CMSSW; the model produced is ported to CMSSW.
- The `onnx` flag generates an ONNX model needed for deployment as simulation in CMSSW.
- The `python` flag runs a Python version of the tree for comparison.
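Before committing to a full HDL/HLS build, it can be useful to run the C simulation alone over one candidate precision; a sketch reusing the `synth_model` signature above (it is assumed the boolean flags can simply be set to False):

```python
# Quick precision check: C simulation only, single precision.
synth_model(model,
            sim=True, hdl=False, hls=False, cpp=False, onnx=False, python=False,
            test_events=10000,
            precisions=['ap_fixed<12,6>'],
            save_dir="synth_sim_only")   # placeholder save path
```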
Run CompareModels.py; this takes a long time.
Use the scripts in AnalysisTools/TrackRun. For single-dataset evaluation, to check track parameters, use:
```bash
python track_efficiency_BDT_scan.py Degradation10/Degradation10_Test_TrackNtuple.root Degradation10 True 100000
```
This will run on the Degradation10 test sample and produce plots in the Degradation10 directory.
To compare BDT ROC performance, adapt track_efficiency_BDT_scan (namely lines 35 and 36) for your trained models, then run with:
```bash
python track_efficiency_BDT_scan.py Degradation10/Degradation10_Test_TrackNtuple.root Degradation10 True True 100000
```
This will create the ROCCurve.png plot of all BDTs evaluated on the Degradation10 sample. For quick evaluation, reduce the final number to 1 (this is the number of batches of tracks to evaluate), as shown below.
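For example, based on the single-dataset command above:

```bash
python track_efficiency_BDT_scan.py Degradation10/Degradation10_Test_TrackNtuple.root Degradation10 True 1
```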
Use an Anaconda or Miniconda install and create the tq environment with:

```bash
conda env create -f environment.yml
```

Then run:

```bash
conda activate tq
```

Followed by (adapting the path to your local checkout):

```bash
export PYTHONPATH="/home/cb719/Documents/L1Trigger/Tracker/TrackQuality/TrackQuality_package/"
```
If you have a Vivado installation and want to run the firmware building, inspect setupenv and adapt it for your file paths.
Known issues:

- Dataset config dict showing wrong number of events, h5filepath and loaded timestamp
- Dataset saves test or train when one is not generated