Fitting a CMS-style simplified likelihood from the full likelihood #1528
Hey guys, I wish to fit a simplified likelihood, the way CMS defined it, from a full likelihood. To this end, I wish to fit a per-signal multivariate Gaussian (and later add an asymmetric term) for all nuisances together, i.e. one single multivariate Gaussian "nuisance" term that correlates all signal regions.

I start with a simple toy model:

```python
model = pyhf.simplemodels.uncorrelated_background(
    signal=[1.0, 1.0], bkg=[30.0, 52.0], bkg_uncertainty=[3.0, 7.0]
)
```

I take it that this is a model with a single channel but two bins. I define the data to "sit" exactly at the background expectation values:

```python
>>> data = [30, 52] + model.config.auxdata
>>> print("data", data)
data [30, 52, 100.0, 55.183673469387756]
```

The auxdata is said to be the latter two numbers of `data`, which is already the first thing I do not understand: where do these numbers come from?

Ultimately, what I want from the procedure is a sampled model that gives me expected yields that account for the Poissonian stats error plus the errors from the nuisances. I assume at this stage I already misunderstand something, though.

Anyways, I fit the model:

```python
result, result_obj = pyhf.infer.mle.fit(
    data, model, return_uncertainties=True, return_result_obj=True
)
```

I would read this as a fit of three parameters:

```python
>>> print(result)
array([[0.03637303, 5.36795087],
       [0.99984465, 0.09354211],
       [0.99982958, 0.10435276]])
```

From this output I assume I indeed fitted three parameters. But the nuisances are maybe not Gaussians? Maybe lognormals or something else? Also, I am not sure about the number of parameters, because

```python
>>> model.config.parameters
['mu', 'uncorr_bkguncrt']
```

But I do see that the covariance matrix is also 3x3:

```
>>> result_obj.minuit.covariance
        x0        x1       x2
x0    17.6    -0.137   -0.166
x1  -0.137   0.00875  0.00129
x2  -0.166   0.00129   0.0109
```

So is this the covariance matrix of the fitted parameters? Once I understand this, I will sample the model:

```python
sampled_parameters = np.random.multivariate_normal(
    result_obj.minuit.values, result_obj.minuit.covariance, size=50000
)
```

And from these I can obtain sampled yields by plugging the sampled parameters into the model:

```python
model_predictions = [
    model.expected_data(p, include_auxdata=False) for p in sampled_parameters
]
```

The sample covariance of these predictions should then give me the covariance matrix of the nuisances. I would then subtract the error terms due to the Poissonians and voilà! I would have a covariance matrix for a simplified likelihood. Except -- not yet.

Wolfgang
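The final subtraction step described above can be sketched with plain numpy on made-up numbers (all values here are hypothetical stand-ins for yields sampled from a fitted model, not outputs of the fit above):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical inputs: pretend these are post-fit expected yields obtained by
# pushing multivariate-normal parameter draws through the model.
nominal_yields = np.array([30.0, 52.0])
yield_cov = np.array([[40.0, 5.0], [5.0, 60.0]])  # made-up total covariance
sampled_yields = rng.multivariate_normal(nominal_yields, yield_cov, size=50_000)

# Sample covariance of the yields across all draws.
total_cov = np.cov(sampled_yields, rowvar=False)

# One reading of the subtraction step: remove the per-bin Poisson variance
# (for a Poisson, variance equals mean) from the diagonal, leaving the
# nuisance-induced covariance for the simplified likelihood.
poisson_var = np.diag(sampled_yields.mean(axis=0))
simplified_cov = total_cov - poisson_var
```

Whether the Poisson variance should be subtracted at all depends on whether the sampled yields already include statistical fluctuations; with pure expectation values as above they do not.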
Replies: 1 comment
Hi, I will try to start by answering some questions. This model

```python
import pyhf

model = pyhf.simplemodels.uncorrelated_background(
    signal=[1.0, 1.0], bkg=[30.0, 52.0], bkg_uncertainty=[3.0, 7.0]
)
```

implements the following:

- a signal normalization `mu` (a `normfactor` modifier),
- an `uncorr_bkguncrt` `shapesys` modifier that looks like a single modifier but actually has two parameters, and implements two Poisson constraint terms, each term acting on only one bin.
We can have a look at these parameters:

```python
for p in model.config.par_order:
    param_set = model.config.param_set(p)
    print(f"parameter: {p}")
    print(f"  param set: {param_set}")
    print(f"  # of par.: {param_set.n_parameters}")
```

This results in one unconstrained parameter for `mu` and two parameters, constrained by Poisson terms, for `uncorr_bkguncrt`.
The auxiliary data for `uncorr_bkguncrt` consists of the two values `[100.0, 55.183673469387756]`, which act as the observed counts in the two Poisson constraint terms.

The covariance matrix you want to build is supposed to be for the post-fit model prediction per bin? By design, these per-bin uncertainties will be close to the `bkg_uncertainty` values used to build the model.

In the MLE fit, you are fitting three parameters: the free-floating signal normalization and the two nuisance parameters.

```python
data = [30, 52] + model.config.auxdata
result, result_obj = pyhf.infer.mle.fit(
    data, model, return_uncertainties=True, return_result_obj=True
)
```
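As for where the auxiliary data comes from: for a `shapesys` modifier, the constraint term for each bin is a Poisson whose observed (auxiliary) count is `(b / δb)²`. The numbers from the question can be reproduced by hand with plain numpy:

```python
import numpy as np

bkg = np.array([30.0, 52.0])
bkg_uncertainty = np.array([3.0, 7.0])

# shapesys auxiliary data: one Poisson constraint per bin, with
# observed count (b / delta_b)^2
auxdata = (bkg / bkg_uncertainty) ** 2
# auxdata -> [100.0, 55.18367346...]
```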
I am using a small helper function to construct readable parameter names:

```python
def get_parameter_names(model):
    labels = []
    for parname in model.config.par_order:
        for i_par in range(model.config.param_set(parname).n_parameters):
            labels.append(
                f"{parname}[bin_{i_par}]"
                if model.config.param_set(parname).n_parameters > 1
                else parname
            )
    return labels


for parname, res in zip(get_parameter_names(model), result):
    print(f"{parname}: {res}")
```

The three results are:

```
mu: [0.03637303 5.36795087]
uncorr_bkguncrt[bin_0]: [0.99984465 0.09354211]
uncorr_bkguncrt[bin_1]: [0.99982958 0.10435276]
```
The signal normalization factor is consistent with 0, as expected from the data we are fitting. The best-fit values for the two nuisance parameters are 1, i.e. nothing is pulled. You can see again here how `uncorr_bkguncrt` is a single modifier known to `model.config.parameters`, even though it carries two underlying parameters.