
Step 2: Extractor2Histo

Anne-Laure Pequegnot edited this page Jan 21, 2015 · 8 revisions

Presentation

PreSkim> cd ../Extractor2Histo

The aim of this step is to convert the Extractuples into histograms. This is also where the full selection is done (implementation of the different cut variables, choice between the MVA and Chi2 methods...).

The script that fills this role is Extractor2Histo.cpp. You can feed it either directly with the Extractuples lists or with the skims you created in step 1. The Python files are the configuration files, which have to be tuned to run on your own datasets.

How to use it

Usually, you feed the script with the skims created in the previous step. So the first thing to do is to create symbolic links pointing to your step 1 skims in the corresponding folders, as you already did in step 1.

Extractor2Histo> mkdir skims
Extractor2Histo> cd skims
skims> mkdir data
skims> mkdir semie
skims> mkdir semimu
skims> cd data
data> ln -s ../../../PreSkim/skims/<aDate>/data/* .
data> cd ../semie
semie> ln -s ../../../PreSkim/skims/<aDate>/semie/* .
semie> cd ../semimu
semimu> ln -s ../../../PreSkim/skims/<aDate>/semimu/* .
semimu> cd ../../
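If you prefer to script this setup, the same link tree can be built from Python. This is a minimal sketch that is not part of the repository: the helper name `link_skims` and its arguments are hypothetical, and `a_date` stands for the same `<aDate>` directory as in the commands above.

```python
import os
from pathlib import Path
from typing import List

CATEGORIES = ("data", "semie", "semimu")


def link_skims(extractor_dir: Path, preskim_dir: Path, a_date: str) -> List[Path]:
    """Mirror the shell commands above: create skims/<category>/ under
    extractor_dir and symlink every entry of preskim_dir/skims/<a_date>/<category>/."""
    links = []
    for category in CATEGORIES:
        link_dir = extractor_dir / "skims" / category
        link_dir.mkdir(parents=True, exist_ok=True)
        source_dir = preskim_dir / "skims" / a_date / category
        for source in sorted(source_dir.iterdir()):
            link = link_dir / source.name
            if not link.is_symlink():
                # Relative target, like ln -s ../../../PreSkim/skims/<aDate>/<category>/*
                link.symlink_to(os.path.relpath(source, link_dir))
            links.append(link)
    return links
```

Run from the Extractor2Histo directory, `link_skims(Path("."), Path("../PreSkim"), a_date)` produces the same relative links as the `ln -s` commands, and calling it a second time leaves existing links untouched.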

Now edit the python configuration files to adapt the script to your own datasets.

For data

With extractDataFromSkim.py:

The script extractDataFromSkim.py is structured as follows:

files = [
    ["myHistoOutputFile1.root", "skims/data/mySkimOutputFile1.root", "type"],
    ["myHistoOutputFile2.root", "skims/data/mySkimOutputFile2.root", "type"]
]

where

  • myHistoOutputFile1.root is the name of your output ROOT file. It will be stored in the plots//1-btag/data/ and plots//2-btag/data/ directories, where 1-btag and 2-btag correspond to your selection: "requiring exactly one b-tagged jet" or "requiring at least 2 b-tagged jets". The script creates these directories automatically (aCategory stands for semimu or semie);
  • skims/data/mySkimOutputFile1.root is the step 1 skim you have just linked;
  • "type" specifies whether your dataset belongs to the semi-muonic or the semi-electronic channel.
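To make the meaning of each entry concrete, here is a hypothetical sketch of how such a `files` list could be expanded into one job per b-tag selection. It is not the actual extractDataFromSkim.py code: the helper name `expand_data_jobs`, the assumption that "type" takes the values semimu or semie, and its mapping onto the `--semimu`/`--semie` options of extractorToHisto are all assumptions.

```python
from typing import List, Sequence, Tuple

# Assumption: "type" is semimu or semie, mapped onto the channel options
# of extractorToHisto.
CHANNEL_FLAGS = {"semimu": "--semimu", "semie": "--semie"}
BTAG_SELECTIONS = (1, 2)  # 1-btag: exactly one b-tagged jet; 2-btag: at least two


def expand_data_jobs(files: Sequence[Sequence[str]]) -> List[Tuple[str, str, str, int]]:
    """Return one (output, skim, channel_flag, n_btag) job per entry and b-tag selection."""
    jobs = []
    for entry in files:
        if len(entry) != 3:
            raise ValueError(f"expected [output, skim, type], got {entry!r}")
        output, skim, channel = entry
        if channel not in CHANNEL_FLAGS:
            raise ValueError(f"unknown type {channel!r}, expected 'semimu' or 'semie'")
        for n_btag in BTAG_SELECTIONS:
            jobs.append((output, skim, CHANNEL_FLAGS[channel], n_btag))
    return jobs
```

Validating the entries up front means a typo in "type" fails immediately instead of after a long run.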

For MC

With extractMCFromSkim.py:

The script extractMCFromSkim.py is structured as follows:

files = [
    ["myHistoOutputFile1.root", "skims/%s/mySkimOutputFile1.root"],
    ["myHistoOutputFile2.root", "skims/%s/mySkimOutputFile2.root"]
]

where

  • myHistoOutputFile1.root is the name of your output ROOT file. It will be stored in the plots//1-btag// and plots//2-btag// directories, where 1-btag and 2-btag correspond to your selection: "requiring exactly one b-tagged jet" or "requiring at least 2 b-tagged jets". The script creates these directories automatically (aCategory stands for semimu or semie);
  • skims/%s/mySkimOutputFile1.root is the step 1 skim you have just linked. Note that here you don't have to specify the type (semimu or semie): %s stands for it, and the script automatically replaces it with each type.
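The %s placeholder is plain Python string formatting. A minimal sketch of the substitution, assuming the script loops over both categories (the helper name `expand_mc_entries` is hypothetical):

```python
from typing import List, Sequence, Tuple

CHANNELS = ("semimu", "semie")


def expand_mc_entries(files: Sequence[Sequence[str]],
                      channels: Sequence[str] = CHANNELS) -> List[Tuple[str, str, str]]:
    """Return one (channel, output, skim) triple per entry and channel,
    filling the %s placeholder of the skim path with the channel name."""
    expanded = []
    for output, skim_pattern in files:
        for channel in channels:
            expanded.append((channel, output, skim_pattern % channel))
    return expanded
```

For example, "skims/%s/mySkimOutputFile1.root" becomes "skims/semimu/mySkimOutputFile1.root" and "skims/semie/mySkimOutputFile1.root".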

This is the usage of extractorToHisto:

./extractorToHisto  {--input-list <string>|-i <string>} {--data|--mc}
                       {--semimu|--semie} [--weight <double>] [-n <int>]
                       [--pdf-syst <string>] [--pileup-syst <string>]
                       [--trigger-syst <string>] [--jec-syst <string>]
                       [--pileup <string>] [--mva] [--skim] --b-tag <int>
                       -o <string> [--] [--version] [-h]


Where: 

   --input-list <string>
     (OR required)  A text file containing a list of input files
         -- OR --
   -i <string>,  --input-file <string>
     (OR required)  The input file


   --data
     (OR required)  Is this data?
         -- OR --
   --mc
     (OR required)  Is this mc?


   --semimu
     (OR required)  Is this semi-mu channel?
         -- OR --
   --semie
     (OR required)  Is this semi-e channel?


   --weight <double>
     MC generator weight

   -n <int>,  -- <int>
     Maximal number of entries to process

   --pdf-syst <string>
     PDF systematic to compute

   --pileup-syst <string>
     PU profile to use for pileup reweigthing

   --trigger-syst <string>
     Computing trigger weight systematic

   --jec-syst <string>
     Computing trigger weight for this JEC up / down

   --pileup <string>
     PU profile used for MC production

   --mva
     Use MVA instead of chi2

   --skim
     Run over a skimmed file

   --b-tag <int>
     (required)  Number of b-tagged jet to require

   -o <string>,  --output-file <string>
     (required)  output file

   --,  --ignore_rest
     Ignores the rest of the labeled arguments following this flag.

   --version
     Displays version information and exits.

   -h,  --help
     Displays usage information and exits.


   Convert extractor tuples to histograms

For example, if you want to run on a specific data skim, type this command:

Extractor2Histo> ./extractorToHisto -i skims/data/MTT_SingleMu_Run2012D-TOPMuPlusJets-prompt.root --data --semimu -o test_output.root --skim --b-tag 2
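If you drive extractorToHisto from Python, you can build the argument list from the options documented above. This is a hypothetical helper, not part of the repository. It only uses options from the usage text and enforces the "OR required" groups (exactly one input source, data or MC, semimu or semie).

```python
import subprocess
from typing import List, Optional


def build_command(output: str, n_btag: int, *, is_data: bool, channel: str,
                  input_file: Optional[str] = None, input_list: Optional[str] = None,
                  weight: Optional[float] = None, mva: bool = False, skim: bool = False,
                  executable: str = "./extractorToHisto") -> List[str]:
    """Assemble an extractorToHisto command line from the documented options."""
    if (input_file is None) == (input_list is None):
        raise ValueError("give exactly one of input_file or input_list")
    if channel not in ("semimu", "semie"):
        raise ValueError(f"unknown channel {channel!r}")
    cmd = [executable]
    cmd += ["-i", input_file] if input_file is not None else ["--input-list", input_list]
    cmd.append("--data" if is_data else "--mc")
    cmd.append("--" + channel)
    if weight is not None:
        cmd += ["--weight", str(weight)]  # MC generator weight
    if mva:
        cmd.append("--mva")  # MVA instead of chi2
    if skim:
        cmd.append("--skim")  # input is a step 1 skim
    cmd += ["--b-tag", str(n_btag), "-o", output]
    return cmd


def run_command(cmd: List[str]) -> None:
    """Run the command, raising CalledProcessError on a non-zero exit code."""
    subprocess.run(cmd, check=True)
```

`build_command("test_output.root", 2, input_file="skims/data/MTT_SingleMu_Run2012D-TOPMuPlusJets-prompt.root", is_data=True, channel="semimu", skim=True)` reproduces the example above. The options come out in a different order, which should not matter for named options.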

Run the script

Now you are ready to run the script. Type the following commands:

Extractor2Histo> source ../setup_lyoserv_env.sh
Extractor2Histo> make
Extractor2Histo> ./extractDataFromSkim.py
Extractor2Histo> ./extractMCFromSkim.py
Extractor2Histo> ./extractMCSystematicsFromSkim.py

Compute systematic errors

Special treatment for PDF systematics

Systematic uncertainties due to Parton Distribution Functions (PDFs) are computed following this recipe. More information can be found here:

To summarize, a PDF set contains the nominal PDF plus error PDFs, called eigenvectors. You also have to take into account the variation of the strong coupling constant alpha_s(Mz).

The signal samples used in the HTT analysis are generated with the CT10 LO PDF set, whereas the TT_powheg samples use CT10 NNLO: don't forget to modify preSkim.cpp so that it uses the correct PDF set for each kind of sample.

All the uncertainties, due to the PDFs and to αs, are evaluated at 68% C.L. Because the CT10 fits follow the standard CTEQ convention, in which PDF uncertainties are given at 90% C.L., the PDF uncertainty is divided by 1.64485 and a factor 5/6 is applied to the alpha_s uncertainty to obtain the corresponding 68% C.L. values. We take ∆αs = 0.0012 as the 68% C.L. variation of the strong coupling constant: the nominal alpha_s(Mz) is 0.118, so we use 0.119 and 0.117 for the up and down alpha_s systematics respectively.

  • PDF uncertainty:

For each member of the PDF set corresponding to the error PDFs (member 0 is the nominal value), we compute an additional weight to apply to the event weight, using the formula described in the reweighting method. Note that the positive variations along the eigenvectors correspond to the even member numbers, and the negative ones to the odd member numbers. Then compute the up and down variations due to the PDF uncertainty in each mtt bin, using the formula described in paragraph 2.1, without forgetting the normalization factor.

See script computePdfSystematics.py.

  • Alpha_s uncertainty: using the same reweighting method, compute in each mtt bin the variation between the nominal value and the value obtained with the new PDF up (down) weight corresponding to alpha_s(Mz) = 0.119 (0.117).

See script computeAlphasSystematics.py.

  • Combine PDF and alpha_s systematics

To do this, you just have to sum in quadrature the different variations you have just computed.

See script combineAlphasPdfSystematics.py.
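The three steps above can be sketched per mtt bin as follows. This is an illustration under explicit assumptions, not the content of computePdfSystematics.py, computeAlphasSystematics.py or combineAlphasPdfSystematics.py. We assume the asymmetric Hessian formula (the paragraph 2.1 formula of the recipe may differ) and pair error members as (1, 2), (3, 4), ... for each eigenvector. We also read the 5/6 factor as a multiplication and leave out the normalization factor mentioned above. The per-event weight is the standard ratio of parton densities for member i to the nominal member 0.

```python
import math
from typing import Sequence, Tuple

CTEQ_90_TO_68 = 1.64485     # CTEQ PDF uncertainties (90% C.L.) are divided by this
ALPHAS_68_FACTOR = 5.0 / 6  # factor quoted in the text, read here as a multiplication


def pdf_member_weight(f_i_1: float, f_i_2: float, f_0_1: float, f_0_2: float) -> float:
    """Event weight for error member i: product of the two parton densities
    evaluated with member i, divided by the same product for member 0."""
    return (f_i_1 * f_i_2) / (f_0_1 * f_0_2)


def pdf_bin_uncertainty(nominal: float, members: Sequence[float]) -> Tuple[float, float]:
    """(up, down) PDF uncertainty of one mtt bin at 68% C.L.

    members[k] is the bin content reweighted with error member k + 1;
    consecutive members are paired as the two directions of one eigenvector.
    """
    if len(members) % 2:
        raise ValueError("expected an even number of error members")
    up2 = down2 = 0.0
    for a, b in zip(members[0::2], members[1::2]):
        # Asymmetric Hessian formula: keep the largest shift in each direction,
        # so the result does not depend on which member is the "+" direction.
        up2 += max(a - nominal, b - nominal, 0.0) ** 2
        down2 += max(nominal - a, nominal - b, 0.0) ** 2
    return math.sqrt(up2) / CTEQ_90_TO_68, math.sqrt(down2) / CTEQ_90_TO_68


def alphas_bin_uncertainty(nominal: float, up: float, down: float) -> Tuple[float, float]:
    """Shifts of one bin for alpha_s(Mz) = 0.119 (up) and 0.117 (down), scaled by 5/6."""
    return abs(up - nominal) * ALPHAS_68_FACTOR, abs(down - nominal) * ALPHAS_68_FACTOR


def combine_in_quadrature(*deltas: float) -> float:
    """Sum independent uncertainties in quadrature."""
    return math.sqrt(sum(d * d for d in deltas))


def total_bin_uncertainty(nominal: float, members: Sequence[float],
                          alphas_up: float, alphas_down: float) -> Tuple[float, float]:
    """Combined (up, down) PDF + alpha_s uncertainty of one mtt bin."""
    pdf_up, pdf_down = pdf_bin_uncertainty(nominal, members)
    as_up, as_down = alphas_bin_uncertainty(nominal, alphas_up, alphas_down)
    return combine_in_quadrature(pdf_up, as_up), combine_in_quadrature(pdf_down, as_down)
```

Using the max-based formula makes the sketch insensitive to the even/odd sign convention of the members, as long as both members of an eigenvector are paired together.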

Finally, just execute these few commands:

Extractor2Histo> ./computePdfSystematics.py
Extractor2Histo> ./computeAlphasSystematics.py
Extractor2Histo> ./combineAlphasPdfSystematics.py

Compute total systematic errors for the error bands in plotIt

Extractor2Histo> ./createSystHistograms -i systematicSamples.yml