git clone git@github.com:HEP-KBFI/ml-tau-reco.git
No further software installation is needed, since everything runs inside a common Singularity image.
The development environment (all system and Python libraries) is defined in scripts/run-env.sh.
Run a short test of the full pipeline using
./scripts/run-env.sh ./scripts/test_pipeline.sh
To update the libraries in the development environment, see https://github.com/HEP-KBFI/singularity.
To avoid formatting differences caused by different editors, run the following before every commit:
./scripts/run-env.sh pre-commit run --all-files
The repository is set up to test the basic functionality of the code in a GitHub Action, configured in test.yml, which launches scripts/test_pipeline.sh.
To contribute code, commit it on a new branch and push that branch:
git checkout main
git pull origin
git checkout -b my_new_feature_branch
git commit ...
git push origin my_new_feature_branch
Then open a PR on GitHub for your new branch. The basic tests should pass, and your code should be exercised by the tests so that others can rely on it.
Submit the training as a batch job with
sbatch scripts/trainSimpleDnn.sh
Launch the notebook server on manivald once
[manivald] ./scripts/run-env.sh jupyter notebook
Note the port XXXX and the token. You may close the SSH session, since screen
keeps your notebook server running.
Open an SSH tunnel for the notebook from your laptop to manivald, replacing XXXX and MYUSER:
[laptop] ssh -N -f -L XXXX:localhost:XXXX [email protected]
Navigate from your laptop browser to the notebook address that begins with https://localhost:XXXX.
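The tunnel command above can also be assembled programmatically, which avoids typing the port twice. The following is a minimal sketch, not part of the repository: the function name `tunnel_argv` and the host `manivald.example.org` are placeholders, and the real host name must be substituted.

```python
def tunnel_argv(port, user, host):
    """Build the argument list for an SSH local port forward.

    -N: run no remote command, -f: go to the background,
    -L: forward the local port to the same port on the remote side.
    """
    port = int(port)
    if not 1 <= port <= 65535:
        raise ValueError(f"invalid TCP port: {port}")
    forward = f"{port}:localhost:{port}"
    return ["ssh", "-N", "-f", "-L", forward, f"{user}@{host}"]


# Placeholder values; replace them with your own port, user and host.
print(" ".join(tunnel_argv(8888, "myuser", "manivald.example.org")))
```

The returned list can be passed directly to `subprocess.run` if the tunnel should be opened from a script.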
The general command to produce the ntuples is
./scripts/run-env.sh python3 src/edm4hep_to_ntuple.py
It takes all configuration settings from config/ntupelizer. Any parameter can be overridden on the command line.
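To illustrate how `key=value` command-line overrides of a configuration typically behave, here is a minimal standard-library sketch. It is not the mechanism used by the scripts in this repository, which is not described here. The function `apply_overrides` and the keys in `defaults` are hypothetical.

```python
import copy


def apply_overrides(config, overrides):
    """Return a copy of `config` with `key=value` overrides applied.

    Dotted keys such as `output.dir=/tmp/x` address nested dictionaries.
    """
    result = copy.deepcopy(config)
    for item in overrides:
        key, sep, value = item.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {item!r}")
        node = result
        *parents, leaf = key.split(".")
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value  # values stay strings in this sketch
    return result


# Hypothetical defaults, standing in for the contents of config/ntupelizer.
defaults = {"n_files": "-1", "output": {"dir": "ntuples"}}
updated = apply_overrides(defaults, ["n_files=1", "output.dir=/tmp/ntuples"])
print(updated)
```

The defaults are copied rather than modified in place, so the configuration file contents stay the single source of truth and each run only differs by the overrides given on the command line.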
The same configuration file is used to check the validity of the ntuple. The validation script is run as follows:
./scripts/run-env.sh python ./src/validation.py
Feel free to implement or suggest further validation tests, as only the most basic ones exist so far.
To run your tauBuilder code on the general ntuples and produce tauBuilder ntuples for the metric evaluation, adapt src/runTauBuilder.py and run:
[manivald] ./scripts/run-env.sh python3 src/runBuilder.py builder=HPS n_files=1 verbosity=1 output_dir=/local/veelken/CLIC_tau_ntuples/$VERSION
By default it processes both datasets. To process only ZH_Htautau, append samples_to_process=['ZH_Htautau'] to the end of the command.
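When such a command is generated from a script, the list-valued argument needs care. A POSIX shell consumes unquoted single quotes before the program sees the argument; whether that matters depends on how the script parses the value, which is not stated here. The sketch below builds the argument list in Python and uses `shlex.join` to produce a safely quoted command line. The helper `builder_command` and the output directory `/tmp/tau_ntuples` are hypothetical.

```python
import shlex


def builder_command(builder, n_files, output_dir, samples=None, verbosity=1):
    """Return the argument list for a runBuilder.py invocation."""
    argv = [
        "./scripts/run-env.sh", "python3", "src/runBuilder.py",
        f"builder={builder}",
        f"n_files={n_files}",
        f"verbosity={verbosity}",
        f"output_dir={output_dir}",
    ]
    if samples:
        # Keep the inner single quotes as part of the argument itself.
        quoted = ",".join(f"'{name}'" for name in samples)
        argv.append(f"samples_to_process=[{quoted}]")
    return argv


argv = builder_command("HPS", 1, "/tmp/tau_ntuples", samples=["ZH_Htautau"])
command_line = shlex.join(argv)  # shell-safe string, e.g. for a job script
print(command_line)
```

Passing `argv` directly to `subprocess.run` sidesteps shell quoting entirely. The joined string is only needed when the command has to be written into a shell script.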
After updating, in config/metrics, the paths where the tauBuilder for a given algorithm has written its output, run:
./scripts/run-env.sh python3 src/calculate_metrics.py
The code in this repo was used for the paper "Tau lepton identification and reconstruction: a new frontier for jet-tagging ML algorithms", which describes the individual algorithms implemented here in more detail. The paper is available as a preprint.
Our implementation of tau lepton impact parameters using the Key4HEP format is documented in impactparameters/impact.pdf