Add more useful log likelihoods #1376

Open
1 of 3 tasks
ben18785 opened this issue Aug 5, 2021 · 10 comments
@ben18785
Collaborator

ben18785 commented Aug 5, 2021

There are a number of log-likelihoods which would be useful (this came up in discussion for the PKPD app):

  • log-normal: used a lot in PKPD modelling and elsewhere for handling multiplicative noise (see Log-normal log-likelihood #1378)
  • a version of the current MultiplicativeGaussianLogLikelihood where $\eta=1$: it seems like the eta free-to-vary case is just too hard to identify in most cases
  • Poisson / negative-binomial: useful for handling counts in epidemiological modelling
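
As a rough illustration of the count likelihoods in the last bullet, here is a minimal sketch in plain Python. The function names and the mean/dispersion parameterisation (`phi`, with variance `mean + mean**2 / phi`) are assumptions made for this example and are not taken from any existing API; other parameterisations of the negative binomial are equally valid.

```python
import math


def poisson_log_likelihood(counts, means):
    """Sum of log Poisson(k | mean) over paired observed counts and model outputs."""
    total = 0.0
    for k, lam in zip(counts, means):
        if lam <= 0:
            return -math.inf
        total += k * math.log(lam) - lam - math.lgamma(k + 1)
    return total


def negative_binomial_log_likelihood(counts, means, phi):
    """Negative binomial with mean `m` and variance m + m**2 / phi (assumed form)."""
    if phi <= 0:
        return -math.inf
    total = 0.0
    for k, m in zip(counts, means):
        if m <= 0:
            return -math.inf
        total += (
            math.lgamma(k + phi) - math.lgamma(phi) - math.lgamma(k + 1)
            + phi * math.log(phi / (phi + m))
            + k * math.log(m / (phi + m))
        )
    return total
```

In this parameterisation the negative binomial approaches the Poisson as `phi` grows, so `phi` acts as an overdispersion parameter that would be inferred alongside the model parameters.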

@DavAug I think you had a perspective on the log-normal?

@DavAug
Member

DavAug commented Aug 5, 2021

Yes, I have thought a little about the situation where we'd like to have a constant standard deviation on the log scale. This is the log-normal noise model, which most software packages state as

log X = log y + sigma * eta,

where X is the measurable quantity, y is the ODE output and eta is a standard normal random variable. Then log X has the expected value log y and standard deviation sigma. Note, however, that the expectation of X is not y, which is usually the standard assumption for error models around ODE solutions. In other words, y is not the underlying "truth" when we assume that measurement noise is corrupting the measurements without bias. This has consequences for the inference problem and will change the inferred parameters for the model y.

Looking a little bit more into this you can find that y is the median of X and that the expectation of X is given by

E[X] = y * exp(sigma**2 / 2).

So, in order to impose the common assumption that E[X] = y, we need to shift the mean of the Gaussian:

log X = log y + mu + sigma * eta,

with mu = -sigma**2 / 2 (to see this, let y -> y * exp(mu) in the expectation above and set it equal to y, or look up the mean of the log-normal for general mu and sigma on Wikipedia). This will give us a distribution that has mean E[X] = y and a constant standard deviation sigma on the log scale.
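
The two statements above (y is the median of X without the shift, and the mean of X with it) can be checked with a quick simulation. This is only a sketch using the standard library; the helper name `sample_log_normal`, the `mean_adjust` flag and the values of `y`, `sigma` and `n` are arbitrary choices for illustration.

```python
import math
import random
import statistics


def sample_log_normal(y, sigma, n, mean_adjust, rng):
    """Draw n samples of X with log X = log y + mu + sigma * eta, eta ~ N(0, 1)."""
    # mu = -sigma**2 / 2 shifts the mean of X onto y; mu = 0 leaves y as the median.
    mu = -0.5 * sigma**2 if mean_adjust else 0.0
    return [y * math.exp(mu + sigma * rng.gauss(0.0, 1.0)) for _ in range(n)]


rng = random.Random(1)
y, sigma, n = 2.0, 0.5, 200_000

unshifted = sample_log_normal(y, sigma, n, False, rng)
shifted = sample_log_normal(y, sigma, n, True, rng)

# Each line prints a sample statistic next to the value it should be close to.
print(statistics.fmean(unshifted), y * math.exp(sigma**2 / 2))
print(statistics.median(unshifted), y)
print(statistics.fmean(shifted), y)
```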

Note, the current MultiplicativeGaussianLogLikelihood is an approximation of the log-normal distribution for sigma << 1. Here, the standard deviation on the log scale is in general not constant for varying y.
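
One way to see this, as a sketch: assume the multiplicative Gaussian model has the form X = y + y^η σ ε, with η the exponent mentioned in the first comment and ε a standard normal draw (written ε here to avoid clashing with the eta in the equations above). That form is an assumption of this note and is not spelled out in the thread. Expanding the logarithm to first order gives

```latex
\begin{align}
X &= y + y^{\eta}\,\sigma\,\epsilon = y\left(1 + \sigma\, y^{\eta - 1}\,\epsilon\right), \\
\log X &= \log y + \log\left(1 + \sigma\, y^{\eta - 1}\,\epsilon\right)
        \approx \log y + \sigma\, y^{\eta - 1}\,\epsilon
        \quad \text{for } \sigma\, y^{\eta - 1} \ll 1 .
\end{align}
```

Under that assumption the log-scale standard deviation is approximately σ y^(η-1), which depends on y unless η = 1.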

@ben18785
Collaborator Author

ben18785 commented Aug 5, 2021

Interesting. I suppose we could include the shift of the mean as an optional argument to the log-likelihood then? I would probably still keep the default such that y is the median of X, since this is standard (and I can see arguments for modelling the median as opposed to the mean).
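
A minimal sketch of what such an optional argument could look like, in plain Python. The function name, its signature and the `mean_adjust` flag are hypothetical and chosen for this example only; the default leaves y as the median, as suggested above.

```python
import math


def log_normal_log_likelihood(data, model_output, sigma, mean_adjust=False):
    """Log-likelihood for log X = log y + mu + sigma * eta, eta ~ N(0, 1).

    mean_adjust=False: mu = 0, so y is the median of X (default).
    mean_adjust=True:  mu = -sigma**2 / 2, so y is the mean of X.
    """
    if sigma <= 0:
        return -math.inf
    mu = -0.5 * sigma**2 if mean_adjust else 0.0
    total = 0.0
    for x, y in zip(data, model_output):
        if x <= 0 or y <= 0:
            return -math.inf
        residual = math.log(x) - math.log(y) - mu
        # Log-normal log-density: the -log(x) term is the Jacobian of the log transform.
        total += (
            -math.log(x) - math.log(sigma) - 0.5 * math.log(2 * math.pi)
            - residual**2 / (2 * sigma**2)
        )
    return total
```

Because the shift depends on sigma, the two settings generally lead to different inferred model parameters for the same data, which is the point raised in the previous comment.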

Re: the multiplicative model, yep, I see the approximation. I don't think it matters too much if the standard deviation is non-constant on the log scale: or rather, it may do, if that isn't an appropriate assumption to make. That's why it'd be interesting to write a short paper on this.

@DavAug
Member

DavAug commented Aug 5, 2021

Sounds good! Do you have an example for when the median might be more appropriate to model?

@ben18785
Collaborator Author

ben18785 commented Aug 5, 2021

I don't have an example, but the general arguments for modelling the median probably apply. Namely, that it can be more representative of the bulk of the data and less susceptible to outliers. In my opinion, what matters most is that people are aware of this when choosing a noise model.

Another related question: how do people in PKPD model constant + multiplicative noise? Is it similar to how our ConstantAndMultiplicativeGaussianLogLikelihood works, or something else?

@DavAug
Member

DavAug commented Aug 5, 2021

Hmm, I guess it depends on the application as always 😂, but I am not yet convinced that modelling the median is appropriate when you have ODEs that are not as rigid as a simple median over a distribution of data points, in particular in situations where the number of data points is not much larger than the number of model parameters. Anyway, worth exploring.

Yep, ConstantAndMultiplicativeGaussianLogLikelihood is the standard PKPD implementation.

@ben18785
Collaborator Author

ben18785 commented Aug 5, 2021

A practical reason to model the median would be if you were worried about model misspecification, and so about the influence of outliers, I suppose. I can actually see this being more, not less, reasonable when you have fewer data points.

@DavAug
Member

DavAug commented Aug 5, 2021

Ok I see! For interpolation I can see how that may help. I guess I am thinking more in terms of extrapolation based on your model. If you are worried about misspecification or outliers in your dataset, you can forget about those predictions anyway, and the median estimation will probably not fix that either. But I suspect that parameters based on the median will have significant consequences for the predicted dynamics at future times, even if your model is not misspecified and there are no outliers, simply because your ODE model estimates parameters with a bias.

@ben18785
Collaborator Author

ben18785 commented Aug 5, 2021

Yep, all worth exploring!

ben18785 added a commit that referenced this issue Aug 6, 2021
@ben18785
Collaborator Author

ben18785 commented Aug 6, 2021

I'm wondering what's best to do about the MultiplicativeGaussianLogLikelihood model. Since the model with unfixed eta seems to be so unwieldy as to be not useful in practice, I'm in favour of just replacing the old model with one where eta=1. @DavAug @MichaelClerx Any thoughts?
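
For reference, a minimal sketch of what the eta=1 variant could look like in plain Python. It assumes the noise model X = y + y**eta * sigma * eps with eps a standard normal draw, so that eta=1 gives X ~ Normal(y, (sigma * y)**2); that model form and the function name are assumptions for this example and do not describe the existing class.

```python
import math


def multiplicative_gaussian_log_likelihood(data, model_output, sigma):
    """Log-likelihood for X = y + y * sigma * eps, eps ~ N(0, 1) (exponent fixed at 1)."""
    if sigma <= 0:
        return -math.inf
    total = 0.0
    for x, y in zip(data, model_output):
        sd = sigma * abs(y)  # noise standard deviation scales linearly with the model output
        if sd == 0:
            return -math.inf
        total += -0.5 * math.log(2 * math.pi) - math.log(sd) - (x - y)**2 / (2 * sd**2)
    return total
```

With the exponent fixed, sigma is the only noise parameter left to infer, which is what removes the identifiability problem described in the first comment.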

@MichaelClerx
Member

Not from me!

3 participants