Julian Faraway 2024-08-20
See the introduction for an overview.
This example is discussed in more detail in my book Extending the Linear Model with R.
Required libraries:
library(faraway)
library(ggplot2)
library(lme4)
library(INLA)
library(knitr)
library(cmdstanr)
register_knitr_engine(override = FALSE)
library(brms)
library(mgcv)
Load in and plot the data:
data(penicillin, package="faraway")
summary(penicillin)
 treat      blend        yield
 A:5    Blend1:4    Min.   :77
 B:5    Blend2:4    1st Qu.:81
 C:5    Blend3:4    Median :87
 D:5    Blend4:4    Mean   :86
        Blend5:4    3rd Qu.:89
                    Max.   :97
ggplot(penicillin,aes(x=blend,y=yield,group=treat,linetype=treat))+geom_line()
ggplot(penicillin,aes(x=treat,y=yield,group=blend,linetype=blend))+geom_line()
The production of penicillin uses a raw material, corn steep liquor,
which is quite variable and can only be made in blends sufficient for
four runs. There are four processes, A, B, C and D, for the production.
See help(penicillin) for more information about the data.
In this example, the treatments are the four processes. These are the specific four processes of interest that we wish to compare. The five blends are five among many blends that would be randomly created during production. We are not interested in these five specific blends but are interested in how the blends vary. An interaction between blends and treatments would complicate matters. But (a) there is no reason to expect this exists and (b) with only one replicate per treatment and blend combination, it is difficult to check for an interaction.
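As a rough check (a minimal sketch, not part of the original analysis), Tukey's one-degree-of-freedom test for nonadditivity adds the squared fitted values from the additive model as an extra regressor; a small p-value for that term would suggest a multiplicative interaction:

amod <- aov(yield ~ blend + treat, penicillin)
pen2 <- transform(penicillin, fitsq = fitted(amod)^2)  # squared fitted values
anova(aov(yield ~ blend + treat + fitsq, pen2))        # examine the fitsq line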
The plots show no outliers, no skewness, no obviously unequal variances and no clear evidence of interaction. Let’s proceed.
- Is there a difference between treatments? If so, what?
- Is there variation between the blends? What is the extent of this variation?
Consider the model:

$$y_{ij} = \mu + \tau_i + v_j + \epsilon_{ij}$$

where the $\tau_i$ and $v_j$ are fixed treatment and blend effects and the errors $\epsilon_{ij}$ are independent $N(0, \sigma^2)$.
lmod <- aov(yield ~ blend + treat, penicillin)
summary(lmod)
Df Sum Sq Mean Sq F value Pr(>F)
blend 4 264 66.0 3.50 0.041
treat 3 70 23.3 1.24 0.339
Residuals 12 226 18.8
There is no significant difference between the treatments. The blends do meet the 5% level for statistical significance. But this asserts a difference between these particular five blends. It’s less clear what this means about blends in general. We can get the estimated parameters with:
coef(lmod)
(Intercept) blendBlend2 blendBlend3 blendBlend4 blendBlend5 treatB treatC treatD
90 -9 -7 -4 -10 1 5 2
Blend 1 and treatment A are the reference levels. We can also use a sum (or deviation) coding:
op <- options(contrasts=c("contr.sum", "contr.poly"))
lmod <- aov(yield ~ blend + treat, penicillin)
coef(lmod)
(Intercept) blend1 blend2 blend3 blend4 treat1 treat2 treat3
86 6 -3 -1 2 -2 -1 3
options(op)
The fit is the same but the parameterization is different. We can get the full set of estimated effects as:
model.tables(lmod)
Tables of effects
blend
blend
Blend1 Blend2 Blend3 Blend4 Blend5
6 -3 -1 2 -4
treat
treat
A B C D
-2 -1 3 0
Since we are not interested in the blends specifically, we may wish to treat blend as a random effect. The model becomes:

$$y_{ij} = \mu + \tau_i + v_j + \epsilon_{ij}$$

where the $\tau_i$ are fixed treatment effects but the blend effects are now random, $v_j \sim N(0, \sigma_v^2)$, independent of the errors $\epsilon_{ij} \sim N(0, \sigma^2)$.
op <- options(contrasts=c("contr.sum", "contr.poly"))
mmod <- lmer(yield ~ treat + (1|blend), penicillin)
faraway::sumary(mmod)
Fixed Effects:
coef.est coef.se
(Intercept) 86.00 1.82
treat1 -2.00 1.68
treat2 -1.00 1.68
treat3 3.00 1.68
Random Effects:
Groups Name Std.Dev.
blend (Intercept) 3.43
Residual 4.34
---
number of obs: 20, groups: blend, 5
AIC = 118.6, DIC = 128
deviance = 117.3
options(op)
We get the same fixed effect estimates but now we have an estimated blend SD. We can get random effect estimates:
ranef(mmod)$blend
(Intercept)
Blend1 4.28788
Blend2 -2.14394
Blend3 -0.71465
Blend4 1.42929
Blend5 -2.85859
which are a shrunk version of the fixed effect estimates; for this balanced design, the shrinkage factor has a simple closed form, checked in the sketch below.
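As a check (a sketch, assuming mmod and the aov fit lmod from above are still in the workspace), the BLUPs here equal the fixed blend effects multiplied by $\hat\sigma_v^2/(\hat\sigma_v^2 + \hat\sigma^2/4)$, since each blend has four runs:

vc <- as.data.frame(VarCorr(mmod))
sigv2 <- vc$vcov[vc$grp == "blend"]        # blend variance
sige2 <- vc$vcov[vc$grp == "Residual"]     # error variance
shrink <- sigv2/(sigv2 + sige2/4)          # about 0.71 here
shrink * model.tables(lmod)$tables$blend   # matches ranef(mmod)$blend

We can test for a difference of the fixed effects with: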
anova(mmod)
Analysis of Variance Table
npar Sum Sq Mean Sq F value
treat 3 70 23.3 1.24
No p-value is supplied because there is some doubt in general over the validity of the null F-distribution. In this specific example, with a simple balanced design, it can be shown that the null F is correct. As it happens, it is the same as that produced in the all-fixed-effects analysis earlier:
anova(lmod)
Analysis of Variance Table
Response: yield
Df Sum Sq Mean Sq F value Pr(>F)
blend 4 264 66.0 3.50 0.041
treat 3 70 23.3 1.24 0.339
Residuals 12 226 18.8
So no evidence of a difference between the treatments. More general tests are available, such as the Kenward-Roger method, which adjusts the degrees of freedom; see Extending the Linear Model with R for details.
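A minimal sketch of the Kenward-Roger comparison, assuming the pbkrtest package is installed:

library(pbkrtest)
fullmod <- lmer(yield ~ treat + (1|blend), penicillin, REML=TRUE)
nullmod <- lmer(yield ~ 1 + (1|blend), penicillin, REML=TRUE)
KRmodcomp(fullmod, nullmod)   # F-test for treat with adjusted df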
We can test the hypothesis $H_0: \sigma_v^2 = 0$ using a parametric bootstrap. First compute the observed likelihood ratio statistic:
rmod <- lmer(yield ~ treat + (1|blend), penicillin)
nlmod <- lm(yield ~ treat, penicillin)
as.numeric(2*(logLik(rmod)-logLik(nlmod,REML=TRUE)))
[1] 2.7629
Now simulate from the null model to estimate the null distribution of the statistic:

lrstatf <- numeric(1000)
for(i in 1:1000){
   ryield <- unlist(simulate(nlmod))                       # simulate response under the null
   nlmodr <- lm(ryield ~ treat, penicillin)                # refit the null model
   rmodr <- lmer(ryield ~ treat + (1|blend), penicillin)   # refit the alternative
   lrstatf[i] <- 2*(logLik(rmodr)-logLik(nlmodr,REML=TRUE))
}
mean(lrstatf > 2.7629)
[1] 0.043
The result falls just below the 5% level for significance. Because of resampling variability, we should repeat with more bootstrap samples. At any rate, the evidence for variation between the blends is not decisive.
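As an alternative to the bootstrap loop, a sketch assuming the RLRsim package is installed; exactRLRT simulates from the exact null distribution of the restricted likelihood ratio statistic:

library(RLRsim)
exactRLRT(mmod)   # tests H0: blend SD = 0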
Integrated nested Laplace approximation is a method of Bayesian computation which uses approximation rather than simulation. More can be found on this topic in Bayesian Regression Modeling with INLA and the chapter on GLMMs.
Use the most recent computational methodology:
inla.setOption(inla.mode="compact")
inla.setOption("short.summary",TRUE)
Fit the default INLA model:
formula = yield ~ treat+f(blend, model="iid")
result = inla(formula, family="gaussian", data=penicillin)
summary(result)
Fixed effects:
mean sd 0.025quant 0.5quant 0.975quant mode kld
(Intercept) 84.047 2.421 79.264 84.044 88.842 84.045 0
treatB 0.948 3.424 -5.832 0.950 7.713 0.950 0
treatC 4.924 3.424 -1.860 4.928 11.686 4.927 0
treatD 1.942 3.424 -4.839 1.945 8.706 1.944 0
Model hyperparameters:
mean sd 0.025quant 0.5quant 0.975quant mode
Precision for the Gaussian observations 3.7e-02 1.30e-02 1.8e-02 3.60e-02 6.70e-02 0.036
Precision for blend 2.3e+04 2.49e+04 1.7e+03 1.53e+04 8.87e+04 92.349
The precision for the blend effect looks implausibly large. There is a problem with the default gamma prior: it needs to be more informative.
Try a half-normal prior on the blend precision. I have used the variance of the response to help with the scaling, so this is more informative.
resprec <- 1/var(penicillin$yield)
formula = yield ~ treat+f(blend, model="iid", prior="logtnormal", param=c(0, resprec))
result = inla(formula, family="gaussian", data=penicillin)
summary(result)
Fixed effects:
mean sd 0.025quant 0.5quant 0.975quant mode kld
(Intercept) 84.029 2.766 78.529 84.028 89.536 84.029 0
treatB 0.967 2.715 -4.426 0.969 6.346 0.969 0
treatC 4.952 2.715 -0.444 4.955 10.328 4.955 0
treatD 1.963 2.715 -3.430 1.966 7.342 1.965 0
Model hyperparameters:
mean sd 0.025quant 0.5quant 0.975quant mode
Precision for the Gaussian observations 0.064 0.024 0.028 0.060 0.121 0.062
Precision for blend 0.101 0.115 0.013 0.067 0.401 0.061
Looks more plausible. Compute the transforms to an SD scale for the blend and error. Make a table of summary statistics for the posteriors:
sigmaalpha <- inla.tmarginal(function(x) 1/sqrt(exp(x)),result$internal.marginals.hyperpar[[2]])
sigmaepsilon <- inla.tmarginal(function(x) 1/sqrt(exp(x)),result$internal.marginals.hyperpar[[1]])
restab=sapply(result$marginals.fixed, function(x) inla.zmarginal(x,silent=TRUE))
restab=cbind(restab, inla.zmarginal(sigmaalpha,silent=TRUE))
restab=cbind(restab, inla.zmarginal(sigmaepsilon,silent=TRUE))
restab=cbind(restab, sapply(result$marginals.random$blend,function(x) inla.zmarginal(x, silent=TRUE)))
colnames(restab) = c("mu","B-A","C-A","D-A","blend","error",levels(penicillin$blend))
data.frame(restab) |> kable()
| | mu | B.A | C.A | D.A | blend | error | Blend1 | Blend2 | Blend3 | Blend4 | Blend5 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| mean | 84.029 | 0.96707 | 4.9522 | 1.9633 | 4.2033 | 4.1666 | 4.3137 | -2.1561 | -0.71861 | 1.4373 | -2.8751 |
| sd | 2.7633 | 2.7127 | 2.7128 | 2.7127 | 1.8778 | 0.78733 | 2.8285 | 2.6148 | 2.5477 | 2.5732 | 2.672 |
| quant0.025 | 78.529 | -4.4253 | -0.4433 | -3.4298 | 1.59 | 2.8761 | -0.67902 | -7.6627 | -5.9798 | -3.526 | -8.5146 |
| quant0.25 | 82.269 | -0.78751 | 3.1982 | 0.20892 | 2.8516 | 3.6028 | 2.3412 | -3.7615 | -2.2488 | -0.18053 | -4.5377 |
| quant0.5 | 84.022 | 0.9628 | 4.949 | 1.9593 | 3.854 | 4.0742 | 4.1905 | -2.0191 | -0.65279 | 1.3096 | -2.7351 |
| quant0.75 | 85.776 | 2.7113 | 6.697 | 3.7077 | 5.157 | 4.6339 | 6.1014 | -0.45159 | 0.81706 | 2.9849 | -1.0686 |
| quant0.975 | 89.522 | 6.3329 | 10.315 | 7.3284 | 8.841 | 5.9576 | 10.222 | 2.7274 | 4.319 | 6.8054 | 1.9715 |
Also construct a plot of the SD posteriors:
ddf <- data.frame(rbind(sigmaalpha,sigmaepsilon),errterm=gl(2,nrow(sigmaalpha),labels = c("blend","error")))
ggplot(ddf, aes(x,y, linetype=errterm))+geom_line()+xlab("yield")+ylab("density")+xlim(0,15)
Posterior for the blend SD is more diffuse than the error SD. Posterior for the blend SD has zero density at zero.
Is there any variation between blends? We framed this question as a hypothesis test previously but that is not sensible in this framework. We might ask the probability that the blend SD is zero. Since we have posited a continuous prior that places no discrete mass on zero, the posterior probability will be zero, regardless of the data. Instead we might ask the probability that the blend SD is small. Given the response is measured to the nearest integer, 1 is a reasonable representation of small if we take this to mean the smallest amount we care about. (Clearly you cannot rely on the degree of rounding to make such decisions in general.)
We can compute the probability that the blend SD is smaller than 1:
inla.pmarginal(1, sigmaalpha)
[1] 0.0010724
The probability is very small.
Now try a more informative gamma prior for the blend precision. Define it so that the mean of the gamma prior is the inverse of the variance of the residuals from the fixed-effects model. We expect the two error variances to be lower than this variance, so this is an overestimate. The variance of the gamma prior (for the precision) is controlled by the apar shape parameter; smaller values are less informative.
apar <- 0.5
lmod <- lm(yield ~ treat, data=penicillin)
bpar <- apar*var(residuals(lmod))
lgprior <- list(prec = list(prior="loggamma", param = c(apar,bpar)))
formula = yield ~ treat+f(blend, model="iid", hyper = lgprior)
result <- inla(formula, family="gaussian", data=penicillin)
summary(result)
Fixed effects:
mean sd 0.025quant 0.5quant 0.975quant mode kld
(Intercept) 84.029 2.892 78.293 84.028 89.770 84.028 0
treatB 0.968 2.683 -4.356 0.970 6.280 0.970 0
treatC 4.953 2.683 -0.374 4.956 10.262 4.956 0
treatD 1.964 2.683 -3.361 1.966 7.275 1.966 0
Model hyperparameters:
mean sd 0.025quant 0.5quant 0.975quant mode
Precision for the Gaussian observations 0.064 0.025 0.028 0.060 0.123 0.061
Precision for blend 0.072 0.057 0.012 0.057 0.223 0.060
Compute the summaries as before:
sigmaalpha <- inla.tmarginal(function(x) 1/sqrt(exp(x)),result$internal.marginals.hyperpar[[2]])
sigmaepsilon <- inla.tmarginal(function(x) 1/sqrt(exp(x)),result$internal.marginals.hyperpar[[1]])
restab=sapply(result$marginals.fixed, function(x) inla.zmarginal(x,silent=TRUE))
restab=cbind(restab, inla.zmarginal(sigmaalpha,silent=TRUE))
restab=cbind(restab, inla.zmarginal(sigmaepsilon,silent=TRUE))
restab=cbind(restab, sapply(result$marginals.random$blend,function(x) inla.zmarginal(x, silent=TRUE)))
colnames(restab) = c("mu","B-A","C-A","D-A","blend","error",levels(penicillin$blend))
data.frame(restab) |> kable()
| | mu | B.A | C.A | D.A | blend | error | Blend1 | Blend2 | Blend3 | Blend4 | Blend5 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| mean | 84.029 | 0.96784 | 4.9533 | 1.9642 | 4.5338 | 4.1625 | 4.6551 | -2.3269 | -0.77545 | 1.5517 | -3.1027 |
| sd | 2.8893 | 2.6805 | 2.6805 | 2.6805 | 1.7544 | 0.78904 | 2.8393 | 2.7563 | 2.7315 | 2.7412 | 2.7779 |
| quant0.025 | 78.295 | -4.356 | -0.3734 | -3.3603 | 2.1248 | 2.8541 | -0.67398 | -7.9765 | -6.2796 | -3.7967 | -8.8347 |
| quant0.25 | 82.208 | -0.77063 | 3.2154 | 0.22588 | 3.287 | 3.598 | 2.8127 | -4.0082 | -2.4517 | -0.16542 | -4.7957 |
| quant0.5 | 84.021 | 0.96347 | 4.9499 | 1.9601 | 4.1817 | 4.0761 | 4.5511 | -2.2751 | -0.76163 | 1.504 | -3.0352 |
| quant0.75 | 85.834 | 2.6959 | 6.682 | 3.6924 | 5.3983 | 4.6353 | 6.3732 | -0.5987 | 0.90791 | 3.2133 | -1.3447 |
| quant0.975 | 89.754 | 6.2665 | 10.249 | 7.2622 | 8.9184 | 5.9434 | 10.557 | 2.9786 | 4.602 | 7.1132 | 2.188 |
Make the plots:
ddf <- data.frame(rbind(sigmaalpha,sigmaepsilon),errterm=gl(2,nrow(sigmaalpha),labels = c("blend","error")))
ggplot(ddf, aes(x,y, linetype=errterm))+geom_line()+xlab("yield")+ylab("density")+xlim(0,15)
Posterior for blend SD has no weight near zero.
We can compute the probability that the blend SD is smaller than 1:
inla.pmarginal(1, sigmaalpha)
[1] 2.6757e-05
The probability is very small.
In Simpson et al (2015), penalized complexity (PC) priors are proposed. These require that we specify a scaling for the SDs of the random effects. We use the SD of the residuals of the fixed-effects-only model (what might be called the base model in the paper) to provide this scaling. The param = c(3*sdres, 0.01) below encodes the prior statement that P(sigma_blend > 3*sdres) = 0.01.
lmod <- lm(yield ~ treat, data=penicillin)
sdres <- sd(residuals(lmod))
pcprior <- list(prec = list(prior="pc.prec", param = c(3*sdres,0.01)))
formula <- yield ~ treat + f(blend, model="iid", hyper = pcprior)
result <- inla(formula, family="gaussian", data=penicillin)
summary(result)
Fixed effects:
mean sd 0.025quant 0.5quant 0.975quant mode kld
(Intercept) 84.030 2.637 78.797 84.029 89.271 84.029 0
treatB 0.966 2.765 -4.529 0.968 6.446 0.968 0
treatC 4.950 2.765 -0.548 4.954 10.427 4.953 0
treatD 1.962 2.765 -3.534 1.965 7.441 1.964 0
Model hyperparameters:
mean sd 0.025quant 0.5quant 0.975quant mode
Precision for the Gaussian observations 0.064 0.024 0.028 0.060 0.122 0.060
Precision for blend 0.138 0.167 0.016 0.088 0.569 0.077
Compute the summaries as before:
sigmaalpha <- inla.tmarginal(function(x) 1/sqrt(exp(x)),result$internal.marginals.hyperpar[[2]])
sigmaepsilon <- inla.tmarginal(function(x) 1/sqrt(exp(x)),result$internal.marginals.hyperpar[[1]])
restab=sapply(result$marginals.fixed, function(x) inla.zmarginal(x,silent=TRUE))
restab=cbind(restab, inla.zmarginal(sigmaalpha,silent=TRUE))
restab=cbind(restab, inla.zmarginal(sigmaepsilon,silent=TRUE))
restab=cbind(restab, sapply(result$marginals.random$blend,function(x) inla.zmarginal(x, silent=TRUE)))
colnames(restab) = c("mu","B-A","C-A","D-A","blend","error",levels(penicillin$blend))
data.frame(restab) |> kable()
| | mu | B.A | C.A | D.A | blend | error | Blend1 | Blend2 | Blend3 | Blend4 | Blend5 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| mean | 84.03 | 0.96586 | 4.9504 | 1.962 | 3.6957 | 4.1815 | 3.9335 | -1.9668 | -0.65547 | 1.3113 | -2.6233 |
| sd | 2.6344 | 2.7623 | 2.7623 | 2.7623 | 1.7136 | 0.79235 | 2.7297 | 2.4345 | 2.3398 | 2.3762 | 2.5142 |
| quant0.025 | 78.798 | -4.5283 | -0.54716 | -3.533 | 1.3349 | 2.8687 | -0.68688 | -7.1652 | -5.5419 | -3.2052 | -7.9915 |
| quant0.25 | 82.345 | -0.81832 | 3.1669 | 0.17798 | 2.4618 | 3.6147 | 1.9534 | -3.4579 | -2.0351 | -0.18014 | -4.1947 |
| quant0.5 | 84.023 | 0.96162 | 4.9473 | 1.958 | 3.371 | 4.0943 | 3.8086 | -1.7908 | -0.56255 | 1.1441 | -2.4521 |
| quant0.75 | 85.702 | 2.7396 | 6.7249 | 3.7359 | 4.5582 | 4.656 | 5.6811 | -0.34807 | 0.72739 | 2.7249 | -0.86314 |
| quant0.975 | 89.256 | 6.4325 | 10.414 | 7.4278 | 7.9427 | 5.9709 | 9.6527 | 2.4591 | 3.9556 | 6.338 | 1.7675 |
Make the plots:
ddf <- data.frame(rbind(sigmaalpha,sigmaepsilon),errterm=gl(2,nrow(sigmaalpha),labels = c("blend","error")))
ggplot(ddf, aes(x,y, linetype=errterm))+geom_line()+xlab("yield")+ylab("density")+xlim(0,15)
Posterior for blend SD has no weight at zero. Results are comparable to previous analyses.
We can plot the posterior marginals of the random effects:
nlevels = length(unique(penicillin$blend))
rdf = data.frame(do.call(rbind,result$marginals.random$blend))
rdf$blend = gl(nlevels,nrow(rdf)/nlevels,labels=1:nlevels)
ggplot(rdf,aes(x=x,y=y,group=blend, color=blend)) +
geom_line() +
xlab("") + ylab("Density")
There is substantial overlap and we cannot distinguish the blends.
We can compute the probability that the blend SD is smaller than 1:
inla.pmarginal(1, sigmaalpha)
[1] 0.0048721
The probability is still very small.
STAN performs Bayesian inference using MCMC. I use cmdstanr to access Stan from R.

You see below the Stan code to fit our model. R Markdown allows the use of Stan chunks (elsewhere I have R chunks). The STAN chunk will be compiled to mod. The chunk header is:

cmdstan, output.var="mod", override = FALSE
data {
  int<lower=0> N;                        // number of observations
  int<lower=0> Nt;                       // number of treatments
  int<lower=0> Nb;                       // number of blends (blocks)
  array[N] int<lower=1,upper=Nt> treat;  // treatment index per observation
  array[N] int<lower=1,upper=Nb> blk;    // blend index per observation
  vector[N] y;                           // yield
}
parameters {
  vector[Nb] eta;                        // standardized blend effects
  vector[Nt] trt;                        // treatment means (implicit flat prior)
  real<lower=0> sigmablk;                // blend SD (implicit flat prior)
  real<lower=0> sigmaepsilon;            // error SD (implicit flat prior)
}
transformed parameters {
  vector[Nb] bld;
  vector[N] yhat;
  bld = sigmablk * eta;                  // non-centered parameterization
  for (i in 1:N)
    yhat[i] = trt[treat[i]]+bld[blk[i]];
}
model {
  eta ~ normal(0, 1);
  y ~ normal(yhat, sigmaepsilon);
}
We have used uninformative priors for the treatment effects and the two variances. We prepare the data in a format consistent with the Stan program; it needs to be a list.
ntreat <- as.numeric(penicillin$treat)
blk <- as.numeric(penicillin$blend)
penidat <- list(N=nrow(penicillin), Nt=max(ntreat), Nb=max(blk), treat=ntreat, blk=blk, y=penicillin$yield)
Do the MCMC sampling:
fit <- mod$sample(
data = penidat,
seed = 123,
chains = 4,
parallel_chains = 4,
refresh = 500 # print update every 500 iters
)
Running MCMC with 4 parallel chains...
Chain 1 Iteration: 1 / 2000 [ 0%] (Warmup)
Chain 1 Iteration: 500 / 2000 [ 25%] (Warmup)
Chain 1 Iteration: 1000 / 2000 [ 50%] (Warmup)
Chain 1 Iteration: 1001 / 2000 [ 50%] (Sampling)
Chain 1 Iteration: 1500 / 2000 [ 75%] (Sampling)
Chain 1 Iteration: 2000 / 2000 [100%] (Sampling)
Chain 2 Iteration: 1 / 2000 [ 0%] (Warmup)
Chain 2 Iteration: 500 / 2000 [ 25%] (Warmup)
Chain 2 Iteration: 1000 / 2000 [ 50%] (Warmup)
Chain 2 Iteration: 1001 / 2000 [ 50%] (Sampling)
Chain 2 Iteration: 1500 / 2000 [ 75%] (Sampling)
Chain 2 Iteration: 2000 / 2000 [100%] (Sampling)
Chain 3 Iteration: 1 / 2000 [ 0%] (Warmup)
Chain 3 Iteration: 500 / 2000 [ 25%] (Warmup)
Chain 3 Iteration: 1000 / 2000 [ 50%] (Warmup)
Chain 3 Iteration: 1001 / 2000 [ 50%] (Sampling)
Chain 3 Iteration: 1500 / 2000 [ 75%] (Sampling)
Chain 3 Iteration: 2000 / 2000 [100%] (Sampling)
Chain 4 Iteration: 1 / 2000 [ 0%] (Warmup)
Chain 4 Iteration: 500 / 2000 [ 25%] (Warmup)
Chain 4 Iteration: 1000 / 2000 [ 50%] (Warmup)
Chain 4 Iteration: 1001 / 2000 [ 50%] (Sampling)
Chain 4 Iteration: 1500 / 2000 [ 75%] (Sampling)
Chain 4 Iteration: 2000 / 2000 [100%] (Sampling)
Chain 1 finished in 0.2 seconds.
Chain 2 finished in 0.2 seconds.
Chain 3 finished in 0.2 seconds.
Chain 4 finished in 0.1 seconds.
All 4 chains finished successfully.
Mean chain execution time: 0.2 seconds.
Total execution time: 0.3 seconds.
We get some warnings but nothing too serious.
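A hedged way to check this more systematically is cmdstanr's built-in diagnostic summary:

fit$diagnostic_summary()   # divergences, max treedepth hits and E-BFMI per chain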
Extract the draws into a convenient dataframe format:
draws_df <- fit$draws(format = "df")
Check the diagnostics on the most problematic parameter:
ggplot(draws_df,
aes(x=.iteration,y=sigmablk,color=factor(.chain))) + geom_line() +
labs(color = 'Chain', x="Iteration")
which is satisfactory.
We consider only the parameters of immediate interest:
fit$summary(c("trt","sigmablk","sigmaepsilon","bld"))
# A tibble: 11 × 10
variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 trt[1] 84.2 84.1 3.54 3.22 78.6 90.1 1.01 571. 474.
2 trt[2] 85.2 85.1 3.31 2.99 79.9 90.9 1.01 631. 424.
3 trt[3] 89.2 89.2 3.50 3.15 83.8 95.0 1.01 580. 370.
4 trt[4] 86.2 86.2 3.44 3.00 80.8 92.2 1.01 561. 374.
5 sigmablk 5.22 4.40 3.58 2.61 1.12 12.5 1.01 447. 298.
6 sigmaepsilon 4.95 4.77 1.13 1.00 3.51 7.01 1.00 1225. 2229.
7 bld[1] 3.97 3.78 3.50 3.29 -1.08 9.84 1.01 588. 514.
8 bld[2] -2.28 -1.92 3.28 2.84 -8.15 2.41 1.01 613. 492.
9 bld[3] -0.899 -0.729 3.16 2.47 -6.53 4.27 1.01 566. 425.
10 bld[4] 1.22 1.10 3.17 2.62 -3.73 6.42 1.01 703. 702.
11 bld[5] -2.97 -2.65 3.39 3.11 -8.90 1.76 1.01 665. 374.
We see that the posterior block SD and error SD are similar in mean but the former is more variable. The effective sample sizes are adequate (although it would be easy to generate more given the rapid computation).
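If we wanted more, a sketch (the iteration count is my choice, not from the source):

fitlong <- mod$sample(data = penidat, seed = 123, chains = 4,
    parallel_chains = 4, iter_sampling = 4000, refresh = 0)  # 4x the default draws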
We plot the block and error SDs posteriors:
sdf = stack(draws_df[,startsWith(colnames(draws_df),"sigma")])
colnames(sdf) = c("yield","sigma")
levels(sdf$sigma) = c("block","epsilon")
ggplot(sdf, aes(x=yield,color=sigma)) + geom_density()
We see that the error SD can be localized much more than the block SD. We can compute the chance that the block SD is less than one. We’ve chosen 1 as the response is only measured to the nearest integer so an SD of less than one would not be particularly noticeable.
fit$summary("sigmablk", tailprob = ~ mean(. <= 1))
# A tibble: 1 × 2
variable tailprob
<chr> <dbl>
1 sigmablk 0.0432
We see that this probability is small and would be smaller if we had specified a lower threshold.
We can also look at the blend effects:
sdf = stack(draws_df[,startsWith(colnames(draws_df),"bld")])
colnames(sdf) = c("yield","blend")
levels(sdf$blend) = as.character(1:5)
ggplot(sdf, aes(x=yield,color=blend)) + geom_density()
We see that all five blend distributions clearly overlap zero. We can also look at the treatment effects:
sdf = stack(draws_df[,startsWith(colnames(draws_df),"trt")])
colnames(sdf) = c("yield","trt")
levels(sdf$trt) = LETTERS[1:4]
ggplot(sdf, aes(x=yield,color=trt)) + geom_density()
We did not include an intercept so the treatment effects are not differences from zero. We see the distributions overlap substantially.
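Since interest centers on comparisons, pairwise treatment differences are easily formed from the draws; for example, C minus A:

dCA <- draws_df$`trt[3]` - draws_df$`trt[1]`          # C - A, draw by draw
c(mean = mean(dCA), quantile(dCA, c(0.025, 0.975)))   # posterior summary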
BRMS stands for Bayesian Regression Models with STAN. It provides a convenient wrapper to STAN functionality.
Fitting the model is very similar to lmer as seen above:
suppressMessages(bmod <- brm(yield ~ treat + (1|blend), penicillin, cores=4, backend = "cmdstanr"))
We get some warnings but not as severe as seen with our STAN fit above. We can obtain some posterior densities and diagnostics with:
plot(bmod)
Looks OK.
We can look at the STAN code that brms used with:
stancode(bmod)
// generated with brms 2.19.0
functions {
}
data {
int<lower=1> N; // total number of observations
vector[N] Y; // response variable
int<lower=1> K; // number of population-level effects
matrix[N, K] X; // population-level design matrix
// data for group-level effects of ID 1
int<lower=1> N_1; // number of grouping levels
int<lower=1> M_1; // number of coefficients per level
array[N] int<lower=1> J_1; // grouping indicator per observation
// group-level predictor values
vector[N] Z_1_1;
int prior_only; // should the likelihood be ignored?
}
transformed data {
int Kc = K - 1;
matrix[N, Kc] Xc; // centered version of X without an intercept
vector[Kc] means_X; // column means of X before centering
for (i in 2 : K) {
means_X[i - 1] = mean(X[ : , i]);
Xc[ : , i - 1] = X[ : , i] - means_X[i - 1];
}
}
parameters {
vector[Kc] b; // population-level effects
real Intercept; // temporary intercept for centered predictors
real<lower=0> sigma; // dispersion parameter
vector<lower=0>[M_1] sd_1; // group-level standard deviations
array[M_1] vector[N_1] z_1; // standardized group-level effects
}
transformed parameters {
vector[N_1] r_1_1; // actual group-level effects
real lprior = 0; // prior contributions to the log posterior
r_1_1 = sd_1[1] * z_1[1];
lprior += student_t_lpdf(Intercept | 3, 87, 5.9);
lprior += student_t_lpdf(sigma | 3, 0, 5.9)
- 1 * student_t_lccdf(0 | 3, 0, 5.9);
lprior += student_t_lpdf(sd_1 | 3, 0, 5.9)
- 1 * student_t_lccdf(0 | 3, 0, 5.9);
}
model {
// likelihood including constants
if (!prior_only) {
// initialize linear predictor term
vector[N] mu = rep_vector(0.0, N);
mu += Intercept;
for (n in 1 : N) {
// add more terms to the linear predictor
mu[n] += r_1_1[J_1[n]] * Z_1_1[n];
}
target += normal_id_glm_lpdf(Y | Xc, mu, b, sigma);
}
// priors including constants
target += lprior;
target += std_normal_lpdf(z_1[1]);
}
generated quantities {
// actual population-level intercept
real b_Intercept = Intercept - dot_product(means_X, b);
}
We see that brms is using Student t distributions with 3 degrees of freedom for the priors. For the two error SDs, these will be truncated at zero to form half-t distributions. You can get a more explicit description of the priors with prior_summary(bmod). These are qualitatively similar to the half-normal and the PC prior used in the INLA fit.
We examine the fit:
summary(bmod)
Family: gaussian
Links: mu = identity; sigma = identity
Formula: yield ~ treat + (1 | blend)
Data: penicillin (Number of observations: 20)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Multilevel Hyperparameters:
~blend (Number of levels: 5)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 3.82 2.34 0.37 9.85 1.01 752 717
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 84.18 2.79 78.60 89.95 1.00 1354 970
treatB 0.90 3.15 -5.45 7.08 1.00 2352 2158
treatC 4.96 3.13 -1.37 11.08 1.00 2464 2550
treatD 1.91 3.13 -4.58 7.91 1.00 2448 2294
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 4.80 1.05 3.23 7.38 1.00 1263 1937
Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
The parameterisation of the treatment effects is different from the STAN version but not in an important way.
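For example (a sketch using the standard brms draw names), treatment means comparable to the STAN trt parameters can be recovered by adding the intercept draws to the contrasts:

bdf <- as_draws_df(bmod)
mean(bdf$b_Intercept)                  # treatment A mean
mean(bdf$b_Intercept + bdf$b_treatB)   # treatment B mean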
We can estimate the tail probability as before:
bps = as_draws_df(bmod)
mean(bps$sd_blend__Intercept < 1)
[1] 0.07625
About the same as seen previously. The priors used here put greater weight on smaller values of the SD.
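If we wanted something closer to the half-normal used in the INLA fit, brms allows the default to be replaced; a sketch (the scale of 5 is my assumption, chosen near the residual SD, and not from the source):

bmod2 <- brm(yield ~ treat + (1|blend), penicillin,
    prior = set_prior("normal(0, 5)", class = "sd"),  # half-normal on the blend SD
    cores = 4, backend = "cmdstanr")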
It is possible to fit some GLMMs within the GAM framework of the mgcv package. An explanation of this can be found in this blog post. The blend term must be a factor for this to work:
gmod = gam(yield ~ treat + s(blend, bs = 're'), data=penicillin, method="REML")
and look at the summary output:
summary(gmod)
Family: gaussian
Link function: identity
Formula:
yield ~ treat + s(blend, bs = "re")
Parametric coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 84.00 2.48 33.94 3.5e-14
treatB 1.00 2.75 0.36 0.721
treatC 5.00 2.75 1.82 0.091
treatD 2.00 2.75 0.73 0.479
Approximate significance of smooth terms:
edf Ref.df F p-value
s(blend) 2.86 4 2.5 0.038
R-sq.(adj) = 0.361 Deviance explained = 55.8%
-REML = 51.915 Scale est. = 18.833 n = 20
We get the fixed effect estimates. We also get a test on the random effect (as described in this article). The hypothesis of no variation between the blends is rejected.
We can get an estimate of the blend and error SD:
gam.vcomp(gmod)
Standard deviations and 0.95 confidence intervals:
std.dev lower upper
s(blend) 3.4339 1.2853 9.1744
scale 4.3397 2.9088 6.4746
Rank: 2/2
which is the same as the REML estimate from lmer earlier.
The random effect estimates for the five blends can be found with:
coef(gmod)
(Intercept) treatB treatC treatD s(blend).1 s(blend).2 s(blend).3 s(blend).4 s(blend).5
84.00000 1.00000 5.00000 2.00000 4.28789 -2.14394 -0.71465 1.42930 -2.85859
which is again the same as before.
In Wood (2019), a simplified version of INLA is proposed. We first construct the GAM model without fitting it and then use the ginla() function to perform the computation.
gmod = gam(yield ~ treat + s(blend, bs = 're'), data=penicillin, fit = FALSE)
gimod = ginla(gmod)
We get the posterior density for the intercept as:
plot(gimod$beta[1,],gimod$density[1,],type="l",xlab="yield",ylab="density")
and for the treatment effects as:
xmat = t(gimod$beta[2:4,])
ymat = t(gimod$density[2:4,])
matplot(xmat, ymat,type="l",xlab="yield",ylab="density")
legend("right",c("B","C","D"),col=1:3,lty=1:3)
and for the blend random effects as:
xmat = t(gimod$beta[5:9,])
ymat = t(gimod$density[5:9,])
matplot(xmat, ymat,type="l",xlab="yield",ylab="density")
legend("right",paste0("blend",1:5),col=1:5,lty=1:5)
It is not straightforward to obtain the posterior densities of the hyperparameters.
See the Discussion of the single random effect model for general comments. In this example, the default model for INLA failed because the default prior was insufficiently informative, while the default priors in STAN produced more credible results. As in the simple single random effect example, the conclusions were very sensitive to the choice of prior. There was a substantive difference between STAN and INLA, particularly regarding the lower tail of the blend SD. Although the priors are not identical, preventing a direct comparison, STAN does give higher weight to the lower tail. This is concerning.
sessionInfo()
R version 4.4.1 (2024-06-14)
Platform: x86_64-apple-darwin20
Running under: macOS Sonoma 14.6.1
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: Europe/London
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] mgcv_1.9-1 nlme_3.1-165 brms_2.21.0 Rcpp_1.0.13 cmdstanr_0.8.1 knitr_1.48 INLA_24.06.27
[8] sp_2.1-4 lme4_1.1-35.5 Matrix_1.7-0 ggplot2_3.5.1 faraway_1.0.8
loaded via a namespace (and not attached):
[1] tidyselect_1.2.1 farver_2.1.2 dplyr_1.1.4 loo_2.8.0 fastmap_1.2.0
[6] tensorA_0.36.2.1 digest_0.6.36 estimability_1.5.1 lifecycle_1.0.4 Deriv_4.1.3
[11] sf_1.0-16 StanHeaders_2.32.10 processx_3.8.4 magrittr_2.0.3 posterior_1.6.0
[16] compiler_4.4.1 rlang_1.1.4 tools_4.4.1 utf8_1.2.4 yaml_2.3.10
[21] data.table_1.15.4 labeling_0.4.3 bridgesampling_1.1-2 pkgbuild_1.4.4 classInt_0.4-10
[26] plyr_1.8.9 abind_1.4-5 KernSmooth_2.23-24 withr_3.0.1 grid_4.4.1
[31] stats4_4.4.1 fansi_1.0.6 xtable_1.8-4 e1071_1.7-14 colorspace_2.1-1
[36] inline_0.3.19 emmeans_1.10.3 scales_1.3.0 MASS_7.3-61 cli_3.6.3
[41] mvtnorm_1.2-5 rmarkdown_2.27 generics_0.1.3 RcppParallel_5.1.8 rstudioapi_0.16.0
[46] reshape2_1.4.4 minqa_1.2.7 DBI_1.2.3 proxy_0.4-27 rstan_2.32.6
[51] stringr_1.5.1 splines_4.4.1 bayesplot_1.11.1 parallel_4.4.1 matrixStats_1.3.0
[56] vctrs_0.6.5 boot_1.3-30 jsonlite_1.8.8 systemfonts_1.1.0 units_0.8-5
[61] glue_1.7.0 nloptr_2.1.1 codetools_0.2-20 ps_1.7.7 distributional_0.4.0
[66] stringi_1.8.4 gtable_0.3.5 QuickJSR_1.3.1 munsell_0.5.1 tibble_3.2.1
[71] pillar_1.9.0 htmltools_0.5.8.1 Brobdingnag_1.2-9 R6_2.5.1 fmesher_0.1.7
[76] evaluate_0.24.0 lattice_0.22-6 backports_1.5.0 rstantools_2.4.0 class_7.3-22
[81] svglite_2.1.3 coda_0.19-4.1 gridExtra_2.3 checkmate_2.3.2 xfun_0.46
[86] pkgconfig_2.0.3