
Posterior predictive confidence intervals differ from calculation by predict.bsts #30

Open
tcuongd opened this issue May 12, 2019 · 1 comment


tcuongd commented May 12, 2019

Hi, from looking at the code I can see that CausalImpact generates the posterior predictive samples by taking the state-value draws and adding sampled noise whose standard deviation equals the sigma.obs draws from the bsts model object.

When I try to replicate the inference step using predict.bsts, I get different results. In particular, the credible intervals produced by generating the posterior predictive with predict.bsts are wider than those CausalImpact reports; the bounds of the relative-difference credible interval can differ by 2-3%. Is there an inherent difference between how CausalImpact's prediction and predict.bsts work? I've attached example code below.

library(magrittr)
library(CausalImpact)

# Dummy data
set.seed(1)
x1 <- 8000 + arima.sim(model = list(ar = 0.99), n = 100)
y <- 1.2 * x1 + rnorm(100, 0, 1)
y[71:100] <- y[71:100] + 10
data <- cbind(y, x1)

pre.period <- c(1, 70)
post.period <- c(71, 100)

# Run CausalImpact
impact <- CausalImpact(data, pre.period, post.period, model.args = list(niter = 10000, standardize.data = FALSE))

# Predictions from bsts
bsts_model <- impact$model$bsts.model
bsts_predict <-
  predict.bsts(bsts_model,
               newdata = data[post.period[1]:post.period[2], "x1"])

# Actuals as a matrix
observed_post <-
  data[post.period[1]:post.period[2], "y"] %>% 
  rep(nrow(bsts_predict$distribution)) %>% 
  matrix(nrow = nrow(bsts_predict$distribution), byrow = TRUE)

# Calculate differences
ppd_diff <- observed_post - bsts_predict$distribution
# Cumulative to last day of post period
diff_cum <- rowSums(ppd_diff)
reldiff_cum <- rowSums(ppd_diff)/rowSums(observed_post)

# Calculated stats
cat("Absolute difference:\n")
c(quantile(diff_cum, 0.025), mn = mean(diff_cum), quantile(diff_cum, 0.975)) %>% 
  print()
# Returns 75.6, 204.3, 342.3

cat("Relative difference:\n")
c(quantile(reldiff_cum, 0.025), mn = mean(reldiff_cum), quantile(reldiff_cum, 0.975)) %>% 
  print()
# Returns 0.026%, 0.071%, 0.12%

# Compare to CausalImpact CIs
summary(impact)
# Returns 123.2, 203.7, 288.0
#         0.043%, 0.071%, 0.1%

nredell commented Jul 23, 2019

I ran across this issue as well. My guess is that the difference lies in the fact that predict.bsts is predicting from period 101 onward. If you check bsts_model$timestamp.info from the code above and compare it with what the predict.bsts help docs say, it makes sense:

The timestamps give the time points at which each prediction is desired. They must be interpretable as integer (0 or larger) time steps following the last time stamp in object. If NULL, then the requested predictions are interpreted as being at 1, 2, 3, ... steps following the training data.

Looking through https://github.com/cran/bsts/blob/d20c0f3e3dc8a02bf4485234673004892d4a92a1/R/format.prediction.data.R might shed some light.
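Since predict.bsts only forecasts beyond the training data, an alternative is to rebuild the in-sample counterfactual draws directly from the fitted bsts object, the way the original post describes CausalImpact doing it (state draws plus observation noise). A rough sketch, assuming the `bsts_model` and `post.period` objects from the code above; the variable names here (`state.samples`, `post.samples`, etc.) are my own, not CausalImpact's:

```r
library(bsts)

# Drop burn-in draws (SuggestBurn is exported by bsts)
burn <- SuggestBurn(0.1, bsts_model)
state <- bsts_model$state.contributions[-(1:burn), , , drop = FALSE]
sigma.obs <- bsts_model$sigma.obs[-(1:burn)]

# state is [draws x components x time]; sum the components
# to get the predictive mean for each draw and time point
state.samples <- rowSums(aperm(state, c(1, 3, 2)), dims = 2)

# Add observation noise; sigma.obs recycles column-major,
# so each MCMC draw keeps its own sigma across time points
noise <- matrix(rnorm(length(state.samples), 0, sigma.obs),
                nrow = nrow(state.samples))
y.samples <- state.samples + noise

# In-sample counterfactual draws for the post period
post.samples <- y.samples[, post.period[1]:post.period[2]]
```

Because CausalImpact fits the model with the post-period response withheld, these columns are genuine counterfactual draws for periods 71-100, whereas predict.bsts would place the same 30 predictions at periods 101-130. Quantiles of `observed_post - post.samples` should then line up much more closely with summary(impact).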
