feature suggestion: posterior probability of being in the zero-inflation state #1551

Open
jsocolar opened this issue Oct 15, 2023 · 10 comments

@jsocolar (Contributor)
I wrote the following function to predict the posterior probability of being in the zero-inflation state for the discrete zero-inflated families in brms. If of interest, I'll submit a PR with doc + tests.

posterior_zi_prob <- function(brmsfit, ...) {
  disc_zi_families <- c(
    "zero_inflated_poisson", "zero_inflated_negbinomial",
    "zero_inflated_binomial", "zero_inflated_beta_binomial"
  )

  fam <- brmsfit$family$family

  if (!(fam %in% disc_zi_families)) {
    stop2(paste0(
      "zi_probs available only for discrete zero-inflated families (",
      paste(disc_zi_families, collapse = ", "),
      "). Supplied brmsfit has family: ",
      fam
    ))
  }

  # Marginal likelihood of each observation: L0 + L1
  lik <- exp(log_lik(brmsfit, ...))

  # zi on the probability scale; this equals L1 when the observation is zero
  zi_lik_given_obs_zero <- posterior_linpred(
    brmsfit,
    dpar = "zi",
    transform = TRUE,
    ...
  )

  resp_name <- brmsfit$formula$resp
  resp_data <- brmsfit$data[[resp_name]]
  resp_is_zero <- as.integer(resp_data == 0)

  # Zero out L1 for nonzero observations, which cannot arise from zero inflation
  zi_lik <- sweep(zi_lik_given_obs_zero, MARGIN = 2, resp_is_zero, `*`)

  # Posterior probability of the zero-inflation state: L1 / (L0 + L1)
  zi_lik / lik
}
@paul-buerkner (Owner)
I am a bit confused. How does your suggestion differ from just predicting zi directly via posterior_epred(..., dpar = "zi")? After all, zi is by definition the probability of being in the zero-inflation arm.

@jsocolar (Contributor, Author)
This refers to a situation where we care about the posterior probability that a particular observation arises from the latent zero-inflation state.

This probability of being in the zero-inflation state is zero given a nonzero observation, and higher than zi given an observed zero.

@paul-buerkner (Owner)
I don't fully understand when this is needed. posterior_epred(dpar = "zi") gives the posterior predicted probability for any (new) observation given predictors. You make it sound as if we now additionally condition the predictions on (new?) observed data. Or am I misunderstanding something?

@jsocolar (Contributor, Author) commented Oct 16, 2023
You're absolutely correct that we're conditioning on the observed data (not necessarily new data; the more common use case will be the data used in model fitting). I think what you're missing is why this can be useful :)

Sometimes with these zero-inflated models, we actually want inference, row-wise, on which observations came from the zero-inflation process. In other words, we want to "un-marginalize" over the latent discrete state $Z$, where $Z = 1$ says that the observation arises from zero inflation and $Z = 0$ says it does not. (Throughout this comment, I suppress indexing on the observation id (i.e. the row); $Z$ is a vector over the data rows). If we were sampling the latent discrete parameter we'd write Z ~ bernoulli(zi).

A good concrete example is an occupancy model fitted as a zero-inflated binomial or zero-inflated beta-binomial (but I don't think the use cases for this are restricted to occupancy modeling). A common goal in inference from an occupancy model is to get a posterior distribution for occupancy across the actual study sites (usually not new data, but the actual sites used in fitting). That is, we want a posterior distribution, per site, of the probability that that site is occupied, conditioning on the observed data.

Let $L_0$ be the likelihood of the observed outcome via $Z = 0$, and $L_1$ the likelihood via $Z = 1$. The overall likelihood is $L_0 + L_1$, which is how we marginalize out $Z$. The posterior probability is $p(Z = 1) = \frac{L_1}{L_0 + L_1}$. The easiest intuitive foothold is that when we observe a nonzero outcome, $L_1$ is zero, and as expected we are a posteriori certain that $Z = 0$. Note also that when we observe a zero outcome, the posterior $Z$ state is not simply 1 with probability zi: in that case $L_1$ is simply zi, while $L_0 = (1 - zi) \cdot p_0$, where $p_0$ is the count distribution's probability of a zero. Because $L_0 + L_1 = zi + (1 - zi) \cdot p_0$ is less than one whenever $p_0 < 1$, the posterior probability of $Z = 1$ is higher than zi.
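To make the arithmetic concrete, here is a minimal sketch in base R with made-up numbers (zi = 0.3 and a Poisson rate lambda = 2 are assumptions for illustration, not from any fitted model):

```r
# Hypothetical zero-inflated Poisson: zi = 0.3, lambda = 2, observed y = 0
zi     <- 0.3
lambda <- 2

L1 <- zi                           # likelihood of y = 0 via Z = 1 (zero inflation)
L0 <- (1 - zi) * dpois(0, lambda)  # likelihood of y = 0 via Z = 0 (count process)

p_z1 <- L1 / (L0 + L1)             # posterior P(Z = 1 | y = 0)
round(p_z1, 2)                     # 0.76, noticeably higher than zi = 0.3

# For any nonzero observation, L1 = 0, so P(Z = 1 | y > 0) = 0.
```

As expected, conditioning on an observed zero pushes the posterior probability of the zero-inflation state well above the marginal zi.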

Is that clarifying?

In an occupancy modeling context, this is useful in case we want to report (posterior distributions for) descriptive statistics of the set of occupied sites, or in general compute any derived quantity based on $Z$. This turns out to be quite common. Likewise, I'm pretty sure that there are applications of zero-inflated Poisson and zero-inflated negative binomial models where inference on $Z$ itself is desired.

Edit: a final note is that when we desire inference about the posterior distribution for $Z$, it's still better to report probabilities rather than Bernoulli samples from those probabilities (which is what the unmarginalized model would do), because there's some loss of information when sampling, and it's always possible to sample later.

@jsocolar jsocolar reopened this Oct 16, 2023
@jsocolar (Contributor, Author) commented Oct 20, 2023

Just a note on naming: I guess posterior_zi_prob() is confusing and not well differentiated from posterior_linpred(..., dpar = "zi", transform = T) and posterior_epred(..., dpar = "zi"). Maybe something like latent_zi_state_prob() would be better.

@paul-buerkner (Owner)
Thank you for explaining. So this is basically getting the conditional mixture weights per observation? If so, perhaps we can generalize pp_mixture to not only work with formal mixture models (created via mixture) but also with de-facto mixture models such as zero-inflated and hurdle models? This would, I think, lead to a more consistent interface.

@jsocolar (Contributor, Author)
Sounds good. Note that in hurdle models the mixture weights are always either 0 or 1 and depend only on data, with no dependency on parameters. Do you think it's worthwhile to enable pp_mixture to compute the trivial mixture weights for hurdle models? For zero-inflated models, only a subset of the weights are trivial.
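To illustrate the triviality: in a hurdle model the zero component produces all of the zeros and only zeros, so the conditional mixture weight is a deterministic function of the data. A minimal sketch (the vector y here is made up for illustration):

```r
# Hurdle models: an observation belongs to the zero component
# if and only if it is zero, regardless of parameter values.
y <- c(0, 3, 0, 1)
w_zero <- as.numeric(y == 0)  # P(Z = 1 | y): 1 for zeros, 0 otherwise
w_zero                        # 1 0 1 0
```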

@paul-buerkner (Owner)
Yeah, that makes sense. For consistency, it could still be nice to include hurdle models even though the answer is trivial.

@jsocolar (Contributor, Author)
Edge case: what is the desired return of pp_mixture if a zero-inflated or hurdle (or zero-one-inflated) model is included as a mixture component in an explicit mixture model, e.g.

set.seed(1)
dat <- data.frame(
  y = runif(100)
)
mix2 <- mixture(gaussian, zero_one_inflated_beta)
fit2 <- brm(bf(y ~ 1), dat, family = mix2)

I tried checking if the behavior is already defined for nested mixtures in brms, but it appears not to be, as this errors:

mix3 <- mixture(mixture(gaussian, zero_one_inflated_beta), lognormal)
fit3 <- brm(bf(y ~ 1), dat, family = mix3)

My instinct is not to return the zero-inflation mixture weights at all and just return the weights of the explicitly defined (via mixture()) mixture components, and I note that any other choice would substantially change the existing behavior of pp_mixture(fit3), rather than merely introduce new functionality.

@paul-buerkner (Owner)
I think we should have an argument that needs to be turned on explicitly for zi/hu models to be used directly within pp_mixture. This way we ensure that the function's behavior does not change too much based on the input alone, without explicit user intervention.
