Julian Faraway 06 January 2023
See the introduction for an overview.
This example is discussed in more detail in my book Extending the Linear Model with R
Required libraries:
library(faraway)
library(ggplot2)
library(lme4)
library(pbkrtest)
library(RLRsim)
library(INLA)
library(knitr)
library(rstan, quietly=TRUE)
library(brms)
library(mgcv)
Read about our analysis of some data from the Junior Schools Project. In addition to a math test, students also took a test in English. Although it would be possible to analyze the English test results in the same way that we analyzed the math scores, additional information may be obtained from analyzing them simultaneously. Hence we view the data as having a bivariate response with English and math scores for each student. The student is a nested factor within the class which is in turn nested within the school. We express the multivariate response for each individual by introducing an additional level of nesting at the individual level. So we might view this as just another nested model except that there is a fixed subject effect associated with this lowest level of nesting.
We set up the data in a format with one test score per line with an
indicator subject
identifying which type of test was taken. We scale
the English and math test scores by their maximum possible values, 40
and 100, respectively, to aid comparison:
data(jsp, package="faraway")
jspr <- jsp[jsp$year==2,]
mjspr <- data.frame(rbind(jspr[,1:6],jspr[,1:6]),
subject=factor(rep(c("english","math"),c(953,953))),
score=c(jspr$english/100,jspr$math/40))
We can plot the data
ggplot(mjspr, aes(x=raven, y=score))+geom_jitter(alpha=0.25)+facet_grid(gender ~ subject)
We now fit a model for the data that includes all the variables of interest that incorporates some of the interactions that we suspect might be present. See Extending the Linear Model with R,
mjspr$craven <- mjspr$raven-mean(mjspr$raven)
mmod <- lmer(score ~ subject*gender + craven*subject + social + (1|school) + (1|school:class) + (1|school:class:id),mjspr)
faraway::sumary(mmod)
Fixed Effects:
coef.est coef.se
(Intercept) 0.44 0.03
subjectmath 0.37 0.01
gendergirl 0.06 0.01
craven 0.02 0.00
social2 0.01 0.03
social3 -0.02 0.03
social4 -0.07 0.03
social5 -0.05 0.03
social6 -0.09 0.03
social7 -0.10 0.03
social8 -0.08 0.04
social9 -0.05 0.03
subjectmath:gendergirl -0.06 0.01
subjectmath:craven 0.00 0.00
Random Effects:
Groups Name Std.Dev.
school:class:id (Intercept) 0.10
school:class (Intercept) 0.02
school (Intercept) 0.05
Residual 0.12
---
number of obs: 1906, groups: school:class:id, 953; school:class, 90; school, 48
AIC = -1705.6, DIC = -1951.1
deviance = -1846.4
The model being fit for school
We can test some fixed effects:
mmod <- lmer(score ~ subject*gender+craven*subject+social+ (1|school)+(1|school:class)+(1|school:class:id),mjspr, REML=FALSE)
mmodr <- lmer(score ~ subject*gender+craven+subject+social+(1|school)+(1|school:class)+(1|school:class:id),mjspr, REML=FALSE)
KRmodcomp(mmod, mmodr)
large : score ~ subject + gender + craven + social + (1 | school) + (1 |
school:class) + (1 | school:class:id) + subject:gender +
subject:craven
small : score ~ subject * gender + craven + subject + social + (1 | school) +
(1 | school:class) + (1 | school:class:id)
stat ndf ddf F.scaling p.value
Ftest 16 1 950 1 6.9e-05
We are testing for a subject by gender interaction. We can see that this effect is strongly statistically significant.
We can compute confidence intervals for the parameters:
set.seed(123)
confint(mmod, method="boot", oldNames=FALSE)
2.5 % 97.5 %
sd_(Intercept)|school:class:id 0.0903727 0.1085795
sd_(Intercept)|school:class 0.0000000 0.0416423
sd_(Intercept)|school 0.0235886 0.0599511
sigma 0.1110622 0.1206808
(Intercept) 0.3952423 0.4937078
subjectmath 0.3496975 0.3816522
gendergirl 0.0441439 0.0830535
craven 0.0156687 0.0192079
social2 -0.0384890 0.0660042
social3 -0.0810421 0.0348846
social4 -0.1211400 -0.0210216
social5 -0.1080110 0.0020657
social6 -0.1482613 -0.0270518
social7 -0.1574203 -0.0324216
social8 -0.1747016 -0.0057936
social9 -0.1033001 0.0065696
subjectmath:gendergirl -0.0797471 -0.0365518
subjectmath:craven -0.0057538 -0.0021502
The lower end of the class confidence interval is zero while the school random effect is clearly larger. There is some variation associated with individuals.
Integrated nested Laplace approximation is a method of Bayesian computation which uses approximation rather than simulation. More can be found on this topic in Bayesian Regression Modeling with INLA and the chapter on GLMMs
Use the most recent computational methodology:
inla.setOption(inla.mode="experimental")
inla.setOption("short.summary",TRUE)
Need to construct unique labels for nested factor levels of class and student:
mjspr$school <- factor(mjspr$school)
mjspr$classch <- factor(paste(mjspr$school,mjspr$class,sep="."))
mjspr$classchid <- factor(paste(mjspr$school,mjspr$class,mjspr$id,sep="."))
formula <- score ~ subject*gender+craven*subject+social + f(school, model="iid") + f(classch, model="iid") + f(classchid, model="iid")
result <- inla(formula, family="gaussian", data=mjspr)
summary(result)
Fixed effects:
mean sd 0.025quant 0.5quant 0.975quant mode kld
(Intercept) 0.442 0.026 0.390 0.442 0.493 0.442 0
subjectmath 0.367 0.008 0.351 0.367 0.382 0.367 0
gendergirl 0.064 0.010 0.044 0.064 0.084 0.064 0
craven 0.017 0.001 0.016 0.017 0.019 0.017 0
social2 0.014 0.027 -0.039 0.014 0.067 0.014 0
social3 -0.021 0.029 -0.078 -0.021 0.036 -0.021 0
social4 -0.071 0.026 -0.122 -0.071 -0.021 -0.071 0
social5 -0.050 0.029 -0.107 -0.050 0.006 -0.050 0
social6 -0.089 0.031 -0.149 -0.089 -0.029 -0.089 0
social7 -0.100 0.032 -0.162 -0.100 -0.038 -0.100 0
social8 -0.082 0.042 -0.165 -0.082 0.001 -0.082 0
social9 -0.048 0.027 -0.102 -0.048 0.006 -0.048 0
subjectmath:gendergirl -0.059 0.011 -0.080 -0.059 -0.038 -0.059 0
subjectmath:craven -0.004 0.001 -0.006 -0.004 -0.002 -0.004 0
Model hyperparameters:
mean sd 0.025quant 0.5quant 0.975quant mode
Precision for the Gaussian observations 73.61 3.38 67.11 73.56 80.44 73.49
Precision for school 418.85 129.75 181.64 411.67 671.79 405.43
Precision for classch 14210.33 43224.49 1097.46 5304.42 87203.49 1793.65
Precision for classchid 96.52 7.30 82.24 96.50 111.00 96.94
is computed
Maybe OK but let’s try some more informative priors.
Now try more informative gamma priors for the precisions. Define it so
the mean value of gamma prior is set to the inverse of the variance of
the residuals of the fixed-effects only model. We expect the error
variances to be lower than this variance so this is an overestimate. The
variance of the gamma prior (for the precision) is controlled by the
apar
parameter.
apar <- 0.5
lmod <- lm(score ~ subject*gender+craven*subject+social,mjspr)
bpar <- apar*var(residuals(lmod))
lgprior <- list(prec = list(prior="loggamma", param = c(apar,bpar)))
formula = score ~ subject*gender+craven*subject+social+f(school, model="iid", hyper = lgprior)+f(classch, model="iid", hyper = lgprior)+f(classchid, model="iid", hyper = lgprior)
result <- inla(formula, family="gaussian", data=mjspr)
summary(result)
Fixed effects:
mean sd 0.025quant 0.5quant 0.975quant mode kld
(Intercept) 0.441 0.027 0.387 0.441 0.495 0.441 0
subjectmath 0.367 0.008 0.351 0.367 0.382 0.367 0
gendergirl 0.062 0.010 0.042 0.062 0.082 0.062 0
craven 0.017 0.001 0.016 0.017 0.019 0.017 0
social2 0.013 0.028 -0.041 0.013 0.067 0.013 0
social3 -0.020 0.029 -0.077 -0.020 0.038 -0.020 0
social4 -0.069 0.026 -0.120 -0.069 -0.018 -0.069 0
social5 -0.050 0.029 -0.108 -0.050 0.007 -0.050 0
social6 -0.085 0.031 -0.146 -0.085 -0.024 -0.085 0
social7 -0.098 0.032 -0.161 -0.098 -0.036 -0.098 0
social8 -0.080 0.043 -0.164 -0.080 0.004 -0.080 0
social9 -0.046 0.028 -0.101 -0.046 0.008 -0.046 0
subjectmath:gendergirl -0.059 0.011 -0.080 -0.059 -0.038 -0.059 0
subjectmath:craven -0.004 0.001 -0.006 -0.004 -0.002 -0.004 0
Model hyperparameters:
mean sd 0.025quant 0.5quant 0.975quant mode
Precision for the Gaussian observations 73.78 3.38 67.31 73.72 80.61 73.62
Precision for school 367.26 112.60 193.27 351.60 632.55 322.34
Precision for classch 473.37 133.13 263.72 456.02 783.90 423.35
Precision for classchid 99.80 8.55 84.21 99.36 117.87 98.37
is computed
Compute the transforms to an SD scale for the field and error. Make a table of summary statistics for the posteriors:
sigmaschool <- inla.tmarginal(function(x) 1/sqrt(exp(x)),result$internal.marginals.hyperpar[[2]])
sigmaclass <- inla.tmarginal(function(x) 1/sqrt(exp(x)),result$internal.marginals.hyperpar[[3]])
sigmaid <- inla.tmarginal(function(x) 1/sqrt(exp(x)),result$internal.marginals.hyperpar[[4]])
sigmaepsilon <- inla.tmarginal(function(x) 1/sqrt(exp(x)),result$internal.marginals.hyperpar[[1]])
restab=sapply(result$marginals.fixed, function(x) inla.zmarginal(x,silent=TRUE))
restab=cbind(restab, inla.zmarginal(sigmaschool,silent=TRUE))
restab=cbind(restab, inla.zmarginal(sigmaclass,silent=TRUE))
restab=cbind(restab, inla.zmarginal(sigmaid,silent=TRUE))
restab=cbind(restab, inla.zmarginal(sigmaepsilon,silent=TRUE))
colnames(restab) = c(names(lmod$coef),"school","class","id","epsilon")
data.frame(restab)
X.Intercept. subjectmath gendergirl craven social2 social3 social4 social5 social6
mean 0.44092 0.36656 0.062025 0.01735 0.013317 -0.019764 -0.068989 -0.050492 -0.085273
sd 0.027403 0.0077025 0.010284 0.00093429 0.027504 0.029309 0.026179 0.029123 0.031004
quant0.025 0.38713 0.35145 0.041843 0.015517 -0.040668 -0.077295 -0.12038 -0.10766 -0.14614
quant0.25 0.42239 0.36135 0.055067 0.016718 -0.0052859 -0.039588 -0.086696 -0.07019 -0.10624
quant0.5 0.44087 0.36655 0.062003 0.017348 0.013259 -0.019826 -0.069044 -0.050554 -0.085336
quant0.75 0.45934 0.37174 0.068939 0.017978 0.031804 -6.3476e-05 -0.051392 -0.030917 -0.064432
quant0.975 0.4946 0.38165 0.08217 0.01918 0.067192 0.037645 -0.017711 0.0065533 -0.024549
social7 social8 social9 subjectmath.gendergirl subjectmath.craven school class id
mean -0.098472 -0.08022 -0.046173 -0.059194 -0.0037203 0.053973 0.047292 0.10037
sd 0.031979 0.042735 0.027725 0.010696 0.00092951 0.0081267 0.0065314 0.0042548
quant0.025 -0.16125 -0.16411 -0.1006 -0.080189 -0.0055448 0.039858 0.035797 0.092174
quant0.25 -0.1201 -0.10912 -0.064925 -0.066428 -0.004349 0.048197 0.042663 0.09745
quant0.5 -0.098538 -0.080309 -0.046231 -0.059217 -0.0037222 0.053312 0.046815 0.10032
quant0.75 -0.076976 -0.051494 -0.027536 -0.052005 -0.0030955 0.059031 0.051399 0.10322
quant0.975 -0.035836 0.003486 0.0081337 -0.038244 -0.0018996 0.071718 0.06141 0.10888
epsilon
mean 0.11651
sd 0.0026509
quant0.025 0.11142
quant0.25 0.11469
quant0.5 0.11646
quant0.75 0.11828
quant0.975 0.12183
Also construct a plot of the SD posteriors:
ddf <- data.frame(rbind(sigmaschool,sigmaclass,sigmaid,sigmaepsilon),
errterm=gl(4,nrow(sigmaepsilon),
labels = c("school","class","id","epsilon")))
ggplot(ddf, aes(x,y, linetype=errterm, color=errterm))+geom_line()+xlab("score")+ylab("density")+xlim(0,0.15)
Posteriors for the school and class assign no weight to values close to zero.
In Simpson et al (2015), penalized complexity priors are proposed. This requires that we specify a scaling for the SDs of the random effects. We use the SD of the residuals of the fixed effects only model (what might be called the base model in the paper) to provide this scaling.
lmod <- lm(score ~ subject*gender+craven*subject+social,mjspr)
sdres <- sd(residuals(lmod))
pcprior <- list(prec = list(prior="pc.prec", param = c(3*sdres,0.01)))
formula = score ~ subject*gender+craven*subject+social+f(school, model="iid", hyper = pcprior)+f(classch, model="iid", hyper = pcprior)+f(classchid, model="iid", hyper = pcprior)
result <- inla(formula, family="gaussian", data=mjspr)
summary(result)
Fixed effects:
mean sd 0.025quant 0.5quant 0.975quant mode kld
(Intercept) 0.441 0.027 0.389 0.441 0.494 0.441 0
subjectmath 0.367 0.008 0.351 0.367 0.382 0.367 0
gendergirl 0.063 0.010 0.043 0.063 0.083 0.063 0
craven 0.017 0.001 0.016 0.017 0.019 0.017 0
social2 0.014 0.027 -0.040 0.014 0.067 0.014 0
social3 -0.021 0.029 -0.078 -0.021 0.036 -0.021 0
social4 -0.070 0.026 -0.121 -0.070 -0.020 -0.070 0
social5 -0.050 0.029 -0.107 -0.050 0.006 -0.050 0
social6 -0.088 0.031 -0.148 -0.088 -0.027 -0.088 0
social7 -0.099 0.032 -0.161 -0.099 -0.037 -0.099 0
social8 -0.081 0.042 -0.165 -0.081 0.002 -0.081 0
social9 -0.047 0.028 -0.101 -0.047 0.007 -0.047 0
subjectmath:gendergirl -0.059 0.011 -0.080 -0.059 -0.038 -0.059 0
subjectmath:craven -0.004 0.001 -0.006 -0.004 -0.002 -0.004 0
Model hyperparameters:
mean sd 0.025quant 0.5quant 0.975quant mode
Precision for the Gaussian observations 73.67 3.38 67.25 73.58 80.57 73.39
Precision for school 412.31 131.39 204.31 395.84 716.83 364.90
Precision for classch 5411.85 8091.11 739.90 3116.42 24907.73 1484.85
Precision for classchid 97.71 8.25 82.58 97.32 115.06 96.48
is computed
Compute the summaries as before:
sigmaschool <- inla.tmarginal(function(x) 1/sqrt(exp(x)),result$internal.marginals.hyperpar[[2]])
sigmaclass <- inla.tmarginal(function(x) 1/sqrt(exp(x)),result$internal.marginals.hyperpar[[3]])
sigmaid <- inla.tmarginal(function(x) 1/sqrt(exp(x)),result$internal.marginals.hyperpar[[4]])
sigmaepsilon <- inla.tmarginal(function(x) 1/sqrt(exp(x)),result$internal.marginals.hyperpar[[1]])
restab=sapply(result$marginals.fixed, function(x) inla.zmarginal(x,silent=TRUE))
restab=cbind(restab, inla.zmarginal(sigmaschool,silent=TRUE))
restab=cbind(restab, inla.zmarginal(sigmaclass,silent=TRUE))
restab=cbind(restab, inla.zmarginal(sigmaid,silent=TRUE))
restab=cbind(restab, inla.zmarginal(sigmaepsilon,silent=TRUE))
colnames(restab) = c(names(lmod$coef),"school","class","id","epsilon")
data.frame(restab)
X.Intercept. subjectmath gendergirl craven social2 social3 social4 social5 social6 social7
mean 0.44146 0.36656 0.063242 0.017388 0.013804 -0.020589 -0.070472 -0.050412 -0.087583 -0.099294
sd 0.0266 0.0077092 0.010278 0.00092627 0.027283 0.029036 0.025932 0.028876 0.030755 0.031691
quant0.025 0.38924 0.35143 0.043073 0.01557 -0.03975 -0.077584 -0.12137 -0.10709 -0.14796 -0.1615
quant0.25 0.42347 0.36135 0.056287 0.016761 -0.00465 -0.04023 -0.088012 -0.069944 -0.10838 -0.12073
quant0.5 0.44141 0.36655 0.063219 0.017386 0.013747 -0.02065 -0.070526 -0.050472 -0.087645 -0.099359
quant0.75 0.45934 0.37175 0.070151 0.018011 0.032144 -0.0010709 -0.05304 -0.031001 -0.066908 -0.07799
quant0.975 0.49356 0.38166 0.083375 0.019202 0.067246 0.036286 -0.019679 0.00615 -0.027347 -0.037223
social8 social9 subjectmath.gendergirl subjectmath.craven school class id epsilon
mean -0.081413 -0.047174 -0.059194 -0.0037203 0.051128 0.018925 0.10143 0.1166
sd 0.042437 0.027502 0.010705 0.00093031 0.0082355 0.007883 0.0042421 0.0026571
quant0.025 -0.16472 -0.10116 -0.080207 -0.0055464 0.037452 0.006418 0.093292 0.11145
quant0.25 -0.11012 -0.065776 -0.066435 -0.0043496 0.045239 0.012943 0.098514 0.11478
quant0.5 -0.081501 -0.047231 -0.059217 -0.0037222 0.05021 0.018086 0.10136 0.11657
quant0.75 -0.052886 -0.028687 -0.051998 -0.0030949 0.05608 0.023924 0.10426 0.11838
quant0.975 0.0017069 0.0066922 -0.038227 -0.0018981 0.069695 0.036409 0.10995 0.12188
Make the plots:
ddf <- data.frame(rbind(sigmaschool,sigmaclass,sigmaid,sigmaepsilon),
errterm=gl(4,nrow(sigmaepsilon),
labels = c("school","class","id","epsilon")))
ggplot(ddf, aes(x,y, linetype=errterm, color=errterm))+geom_line()+xlab("score")+ylab("density")+xlim(0,0.15)
Class variation is quite small compared to the other sources.
STAN performs Bayesian inference using MCMC. Set up STAN to use multiple cores. Set the random number seed for reproducibility.
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())
Fit the model. Requires use of STAN command file multiple.stan. We view the code here:
writeLines(readLines("../stancode/multiple.stan"))
data {
int<lower=0> Nobs;
int<lower=0> Npreds;
int<lower=0> Nlev1;
int<lower=0> Nlev2;
int<lower=0> Nlev3;
vector[Nobs] y;
matrix[Nobs,Npreds] x;
int<lower=1,upper=Nlev1> levind1[Nobs];
int<lower=1,upper=Nlev2> levind2[Nobs];
int<lower=1,upper=Nlev3> levind3[Nobs];
real<lower=0> sdscal;
}
parameters {
vector[Npreds] beta;
real<lower=0> sigmalev1;
real<lower=0> sigmalev2;
real<lower=0> sigmalev3;
real<lower=0> sigmaeps;
vector[Nlev1] eta1;
vector[Nlev2] eta2;
vector[Nlev3] eta3;
}
transformed parameters {
vector[Nlev1] ran1;
vector[Nlev2] ran2;
vector[Nlev3] ran3;
vector[Nobs] yhat;
ran1 = sigmalev1 * eta1;
ran2 = sigmalev2 * eta2;
ran3 = sigmalev3 * eta3;
for (i in 1:Nobs)
yhat[i] = x[i]*beta+ran1[levind1[i]]+ran2[levind2[i]]+ran3[levind3[i]];
}
model {
eta1 ~ normal(0, 1);
eta2 ~ normal(0, 1);
eta3 ~ normal(0, 1);
sigmalev1 ~ cauchy(0, 2.5*sdscal);
sigmalev2 ~ cauchy(0, 2.5*sdscal);
sigmalev3 ~ cauchy(0, 2.5*sdscal);
sigmaeps ~ cauchy(0, 2.5*sdscal);
y ~ normal(yhat, sigmaeps);
}
We have used uninformative priors for the treatment effects but slightly informative half-cauchy priors for the variances. All the fixed effects have been collected into a single design matrix. The school and class variables need to be renumbered into consecutive positive integers. Somewhat inconvenient since the schools are numbered up to 50 but have no data for two schools so only 48 schools are actually used.
mjspr$craven <- mjspr$raven-mean(mjspr$raven)
lmod <- lm(score ~ subject*gender+craven*subject+social,mjspr)
sdscal <- sd(residuals(lmod))
Xmatrix <- model.matrix(lmod)
mjspr$school <- factor(mjspr$school)
mjspr$classch <- factor(paste(mjspr$school,mjspr$class,sep="."))
mjspr$classchid <- factor(paste(mjspr$school,mjspr$class,mjspr$id,sep="."))
jspdat <- list(Nobs=nrow(mjspr),
Npreds=ncol(Xmatrix),
Nlev1=length(unique(mjspr$school)),
Nlev2=length(unique(mjspr$classch)),
Nlev3=length(unique(mjspr$classchid)),
y=mjspr$score,
x=Xmatrix,
levind1=as.numeric(mjspr$school),
levind2=as.numeric(mjspr$classch),
levind3=as.numeric(mjspr$classchid),
sdscal=sdscal)
Break the fitting of the model into three steps. We use 5x the default number of iterations to ensure sufficient sample size for the later estimations.
rt <- stanc("../stancode/multiple.stan")
sm <- stan_model(stanc_ret = rt, verbose=FALSE)
set.seed(123)
system.time(fit <- sampling(sm, data=jspdat, iter=10000))
user system elapsed
670.801 24.611 248.132
For the error SD:
pname <- "sigmaeps"
muc <- rstan::extract(fit, pars=pname, permuted=FALSE, inc_warmup=FALSE)
mdf <- reshape2::melt(muc)
ggplot(mdf,aes(x=iterations,y=value,color=chains)) + geom_line() + ylab(mdf$parameters[1])
For the school SD
pname <- "sigmalev1"
muc <- rstan::extract(fit, pars=pname, permuted=FALSE, inc_warmup=FALSE)
mdf <- reshape2::melt(muc)
ggplot(mdf,aes(x=iterations,y=value,color=chains)) + geom_line() + ylab(mdf$parameters[1])
For the class SD
pname <- "sigmalev2"
muc <- rstan::extract(fit, pars=pname, permuted=FALSE, inc_warmup=FALSE)
mdf <- reshape2::melt(muc)
ggplot(mdf,aes(x=iterations,y=value,color=chains)) + geom_line() + ylab(mdf$parameters[1])
For the id SD
pname <- "sigmalev3"
muc <- rstan::extract(fit, pars=pname, permuted=FALSE, inc_warmup=FALSE)
mdf <- reshape2::melt(muc)
ggplot(mdf,aes(x=iterations,y=value,color=chains)) + geom_line() + ylab(mdf$parameters[1])
All these are satisfactory.
Examine the main parameters of interest:
print(fit,pars=c("beta","sigmalev1","sigmalev2","sigmalev2","sigmaeps"))
Inference for Stan model: multiple.
4 chains, each with iter=10000; warmup=5000; thin=1;
post-warmup draws per chain=5000, total post-warmup draws=20000.
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
beta[1] 0.44 0 0.03 0.39 0.42 0.44 0.46 0.49 4703 1
beta[2] 0.37 0 0.01 0.35 0.36 0.37 0.37 0.38 22269 1
beta[3] 0.06 0 0.01 0.04 0.06 0.06 0.07 0.08 15107 1
beta[4] 0.02 0 0.00 0.02 0.02 0.02 0.02 0.02 21172 1
beta[5] 0.01 0 0.03 -0.04 0.00 0.01 0.03 0.07 4939 1
beta[6] -0.02 0 0.03 -0.08 -0.04 -0.02 0.00 0.04 5047 1
beta[7] -0.07 0 0.03 -0.12 -0.09 -0.07 -0.05 -0.02 4464 1
beta[8] -0.05 0 0.03 -0.11 -0.07 -0.05 -0.03 0.01 5148 1
beta[9] -0.09 0 0.03 -0.15 -0.11 -0.09 -0.07 -0.03 5220 1
beta[10] -0.10 0 0.03 -0.16 -0.12 -0.10 -0.08 -0.04 5677 1
beta[11] -0.08 0 0.04 -0.16 -0.11 -0.08 -0.05 0.00 7628 1
beta[12] -0.05 0 0.03 -0.10 -0.07 -0.05 -0.03 0.01 4721 1
beta[13] -0.06 0 0.01 -0.08 -0.07 -0.06 -0.05 -0.04 22953 1
beta[14] 0.00 0 0.00 -0.01 0.00 0.00 0.00 0.00 33522 1
sigmalev1 0.05 0 0.01 0.03 0.04 0.05 0.05 0.07 2681 1
sigmalev2 0.02 0 0.01 0.00 0.01 0.02 0.03 0.05 1374 1
sigmaeps 0.12 0 0.00 0.11 0.11 0.12 0.12 0.12 8110 1
Samples were drawn using NUTS(diag_e) at Fri Jan 6 10:46:26 2023.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
Remember that the beta correspond to the following parameters:
colnames(Xmatrix)
[1] "(Intercept)" "subjectmath" "gendergirl" "craven"
[5] "social2" "social3" "social4" "social5"
[9] "social6" "social7" "social8" "social9"
[13] "subjectmath:gendergirl" "subjectmath:craven"
The results are comparable to the REML fit. The effective sample sizes are sufficient.
We can use extract to get at various components of the STAN fit. First consider the SDs for random components:
postsig <- rstan::extract(fit, pars=c("sigmaeps","sigmalev1","sigmalev2","sigmalev3"))
ref <- reshape2::melt(postsig,value.name="score")
ref$L1 = factor(ref$L1)
levels(ref$L1) = c("epsilon","school","class","id")
ggplot(data=ref,aes(x=score, color=L1))+geom_density()+guides(color=guide_legend(title="SD"))
As usual the error SD distribution is more concentrated. The class SD is more diffuse, smaller and gives some weight to values close to zero. Now the treatment effects:
ref <- reshape2::melt(rstan::extract(fit, pars="beta"))
colnames(ref)[2:3] <- c("parameter","score")
ref$parameter <- factor(colnames(Xmatrix)[ref$parameter])
ggplot(ref, aes(x=score))+geom_density()+geom_vline(xintercept = 0) + facet_wrap(~parameter,scales="free")
BRMS stands for Bayesian
Regression Models with STAN. It provides a convenient wrapper to STAN
functionality. We specify the model as in lmer()
above. I have used
more than the standard number of iterations because this reduces some
problems and does not cost much computationally.
suppressMessages(bmod <- brm(score ~ subject*gender + craven*subject + social + (1|school) + (1|school:class) + (1|school:class:id),data=mjspr,iter=10000, cores=4))
We get some minor warnings. We can obtain some posterior densities and diagnostics with:
plot(bmod, variable = "^s", regex=TRUE)
We have chosen only the random effect hyperparameters since this is where problems will appear first. Looks OK. We can see some weight is given to values of the class effect SD close to zero.
We can look at the STAN code that brms
used with:
stancode(bmod)
// generated with brms 2.18.0
functions {
}
data {
int<lower=1> N; // total number of observations
vector[N] Y; // response variable
int<lower=1> K; // number of population-level effects
matrix[N, K] X; // population-level design matrix
// data for group-level effects of ID 1
int<lower=1> N_1; // number of grouping levels
int<lower=1> M_1; // number of coefficients per level
int<lower=1> J_1[N]; // grouping indicator per observation
// group-level predictor values
vector[N] Z_1_1;
// data for group-level effects of ID 2
int<lower=1> N_2; // number of grouping levels
int<lower=1> M_2; // number of coefficients per level
int<lower=1> J_2[N]; // grouping indicator per observation
// group-level predictor values
vector[N] Z_2_1;
// data for group-level effects of ID 3
int<lower=1> N_3; // number of grouping levels
int<lower=1> M_3; // number of coefficients per level
int<lower=1> J_3[N]; // grouping indicator per observation
// group-level predictor values
vector[N] Z_3_1;
int prior_only; // should the likelihood be ignored?
}
transformed data {
int Kc = K - 1;
matrix[N, Kc] Xc; // centered version of X without an intercept
vector[Kc] means_X; // column means of X before centering
for (i in 2:K) {
means_X[i - 1] = mean(X[, i]);
Xc[, i - 1] = X[, i] - means_X[i - 1];
}
}
parameters {
vector[Kc] b; // population-level effects
real Intercept; // temporary intercept for centered predictors
real<lower=0> sigma; // dispersion parameter
vector<lower=0>[M_1] sd_1; // group-level standard deviations
vector[N_1] z_1[M_1]; // standardized group-level effects
vector<lower=0>[M_2] sd_2; // group-level standard deviations
vector[N_2] z_2[M_2]; // standardized group-level effects
vector<lower=0>[M_3] sd_3; // group-level standard deviations
vector[N_3] z_3[M_3]; // standardized group-level effects
}
transformed parameters {
vector[N_1] r_1_1; // actual group-level effects
vector[N_2] r_2_1; // actual group-level effects
vector[N_3] r_3_1; // actual group-level effects
real lprior = 0; // prior contributions to the log posterior
r_1_1 = (sd_1[1] * (z_1[1]));
r_2_1 = (sd_2[1] * (z_2[1]));
r_3_1 = (sd_3[1] * (z_3[1]));
lprior += student_t_lpdf(Intercept | 3, 0.6, 2.5);
lprior += student_t_lpdf(sigma | 3, 0, 2.5)
- 1 * student_t_lccdf(0 | 3, 0, 2.5);
lprior += student_t_lpdf(sd_1 | 3, 0, 2.5)
- 1 * student_t_lccdf(0 | 3, 0, 2.5);
lprior += student_t_lpdf(sd_2 | 3, 0, 2.5)
- 1 * student_t_lccdf(0 | 3, 0, 2.5);
lprior += student_t_lpdf(sd_3 | 3, 0, 2.5)
- 1 * student_t_lccdf(0 | 3, 0, 2.5);
}
model {
// likelihood including constants
if (!prior_only) {
// initialize linear predictor term
vector[N] mu = rep_vector(0.0, N);
mu += Intercept;
for (n in 1:N) {
// add more terms to the linear predictor
mu[n] += r_1_1[J_1[n]] * Z_1_1[n] + r_2_1[J_2[n]] * Z_2_1[n] + r_3_1[J_3[n]] * Z_3_1[n];
}
target += normal_id_glm_lpdf(Y | Xc, mu, b, sigma);
}
// priors including constants
target += lprior;
target += std_normal_lpdf(z_1[1]);
target += std_normal_lpdf(z_2[1]);
target += std_normal_lpdf(z_3[1]);
}
generated quantities {
// actual population-level intercept
real b_Intercept = Intercept - dot_product(means_X, b);
}
We see that brms
is using student t distributions with 3 degrees of
freedom for the priors. For the three error SDs, this will be truncated
at zero to form half-t distributions. You can get a more explicit
description of the priors with prior_summary(bmod)
. These are
qualitatively similar to the the PC prior used in the INLA fit.
We examine the fit:
summary(bmod)
Family: gaussian
Links: mu = identity; sigma = identity
Formula: score ~ subject * gender + craven * subject + social + (1 | school) + (1 | school:class) + (1 | school:class:id)
Data: mjspr (Number of observations: 1906)
Draws: 4 chains, each with iter = 10000; warmup = 5000; thin = 1;
total post-warmup draws = 20000
Group-Level Effects:
~school (Number of levels: 48)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 0.05 0.01 0.02 0.07 1.01 1076 524
~school:class (Number of levels: 90)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 0.02 0.01 0.00 0.05 1.01 853 612
~school:class:id (Number of levels: 953)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 0.10 0.00 0.09 0.11 1.00 3700 7429
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 0.44 0.03 0.39 0.49 1.00 3577 5953
subjectmath 0.37 0.01 0.35 0.38 1.00 12670 14306
gendergirl 0.06 0.01 0.04 0.08 1.00 8311 10174
craven 0.02 0.00 0.02 0.02 1.00 12032 13873
social2 0.01 0.03 -0.04 0.07 1.00 3634 6670
social3 -0.02 0.03 -0.08 0.04 1.00 3897 7206
social4 -0.07 0.03 -0.12 -0.02 1.00 3381 6115
social5 -0.05 0.03 -0.11 0.01 1.00 3805 7224
social6 -0.09 0.03 -0.15 -0.03 1.00 4037 7403
social7 -0.10 0.03 -0.16 -0.04 1.00 3881 7358
social8 -0.08 0.04 -0.16 -0.00 1.00 5633 9493
social9 -0.05 0.03 -0.10 0.01 1.00 3601 6673
subjectmath:gendergirl -0.06 0.01 -0.08 -0.04 1.00 9947 10819
subjectmath:craven -0.00 0.00 -0.01 -0.00 1.00 33301 14201
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 0.12 0.00 0.11 0.12 1.00 6693 10845
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
The results are consistent with those seen previously.
It is possible to fit some GLMMs within the GAM framework of the mgcv
package. An explanation of this can be found in this
blog
gmod = gam(score ~ subject*gender + craven*subject + social +
s(school,bs="re") +
s(classch,bs="re") +
s(classchid,bs="re"),
data=mjspr, method="REML")
and look at the summary output:
summary(gmod)
Family: gaussian
Link function: identity
Formula:
score ~ subject * gender + craven * subject + social + s(school,
bs = "re") + s(classch, bs = "re") + s(classchid, bs = "re")
Parametric coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.441578 0.026459 16.69 < 2e-16
subjectmath 0.366565 0.007710 47.54 < 2e-16
gendergirl 0.063351 0.010254 6.18 8.6e-10
craven 0.017390 0.000925 18.81 < 2e-16
social2 0.013754 0.027230 0.51 0.6136
social3 -0.020768 0.028972 -0.72 0.4736
social4 -0.070708 0.025868 -2.73 0.0064
social5 -0.050474 0.028818 -1.75 0.0801
social6 -0.087852 0.030672 -2.86 0.0042
social7 -0.099408 0.031607 -3.15 0.0017
social8 -0.081623 0.042352 -1.93 0.0542
social9 -0.047337 0.027445 -1.72 0.0848
subjectmath:gendergirl -0.059194 0.010706 -5.53 3.9e-08
subjectmath:craven -0.003720 0.000930 -4.00 6.7e-05
Approximate significance of smooth terms:
edf Ref.df F p-value
s(school) 28.8 47 14.86 <2e-16
s(classch) 14.2 89 0.89 0.22
s(classchid) 540.6 942 1.54 <2e-16
R-sq.(adj) = 0.794 Deviance explained = 85.8%
-REML = -870.79 Scale est. = 0.013592 n = 1906
We get the fixed effect estimates. We also get tests on the random effects (as described in this article. The hypothesis of no variation is rejected for the school and id but not for the class. This is consistent with earlier findings.
We can get an estimate of the operator and error SD:
gam.vcomp(gmod)
Standard deviations and 0.95 confidence intervals:
std.dev lower upper
s(school) 0.047230 0.0324228 0.068801
s(classch) 0.024123 0.0089072 0.065330
s(classchid) 0.101253 0.0930705 0.110154
scale 0.116583 0.1114571 0.121945
Rank: 4/4
The point estimates are the same as the REML estimates from lmer
earlier. The confidence intervals are different. A bootstrap method was
used for the lmer
fit whereas gam
is using an asymptotic
approximation resulting in substantially different results. Given the
problems of parameters on the boundary present in this example, the
bootstrap results appear more trustworthy.
The fixed effect estimates can be found with:
coef(gmod)[1:14]
(Intercept) subjectmath gendergirl craven social2
0.4415780 0.3665647 0.0633509 0.0173905 0.0137536
social3 social4 social5 social6 social7
-0.0207677 -0.0707076 -0.0504741 -0.0878520 -0.0994077
social8 social9 subjectmath:gendergirl subjectmath:craven
-0.0816234 -0.0473366 -0.0591943 -0.0037203
The remaining random effects are too numerous to print.
In Wood (2019), a simplified
version of INLA is proposed. The first construct the GAM model without
fitting and then use the ginla()
function to perform the computation.
gmod = gam(score ~ subject*gender + craven*subject + social +
s(school,bs="re") +
s(classch,bs="re") +
s(classchid,bs="re"),
data=mjspr, fit = FALSE)
gimod = ginla(gmod)
We get the posterior density for the intercept as:
plot(gimod$beta[1,],gimod$density[1,],type="l",xlab="score",ylab="density")
We get the posterior density for the math effect as:
plot(gimod$beta[2,],gimod$density[2,],type="l",xlab="score",ylab="density")
and for the social effects as:
xmat = t(gimod$beta[5:12,])
ymat = t(gimod$density[5:12,])
matplot(xmat, ymat,type="l",xlab="score",ylab="density")
legend("left",paste0("social",2:9),col=1:8,lty=1:8)
We can see some overlap between the effects, but strong evidence of a negative outcome relative to social class 1 for some classes.
It is not straightforward to obtain the posterior densities of the hyperparameters.
See the Discussion of the single random effect model for general comments.
-
As with the previous analyses, sometimes the INLA posteriors for the hyperparameters have densities which do not give weight to close-to-zero values where other analyses suggest this might be reasonable.
-
There is relatively little disagreement between the methods and much similarity.
-
There were no major computational issue with the analyses (in contrast with some of the other examples)
-
The
mgcv
analyses (both standard and ginla) took much longer than previous analyses because the sample size is larger and there are a large number of random effects — slower than any of the other analyses.
sessionInfo()
R version 4.2.1 (2022-06-23)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur ... 10.16
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] mgcv_1.8-41 nlme_3.1-161 brms_2.18.0 Rcpp_1.0.9 rstan_2.26.13
[6] StanHeaders_2.26.13 knitr_1.41 INLA_22.12.16 sp_1.5-1 foreach_1.5.2
[11] RLRsim_3.1-8 pbkrtest_0.5.1 lme4_1.1-31 Matrix_1.5-3 ggplot2_3.4.0
[16] faraway_1.0.9
loaded via a namespace (and not attached):
[1] minqa_1.2.5 colorspace_2.0-3 ellipsis_0.3.2 markdown_1.4 base64enc_0.1-3
[6] rstudioapi_0.14 Deriv_4.1.3 farver_2.1.1 MatrixModels_0.5-1 DT_0.26
[11] fansi_1.0.3 mvtnorm_1.1-3 bridgesampling_1.1-2 codetools_0.2-18 splines_4.2.1
[16] shinythemes_1.2.0 bayesplot_1.10.0 jsonlite_1.8.4 nloptr_2.0.3 broom_1.0.2
[21] shiny_1.7.4 compiler_4.2.1 backports_1.4.1 assertthat_0.2.1 fastmap_1.1.0
[26] cli_3.5.0 later_1.3.0 htmltools_0.5.4 prettyunits_1.1.1 tools_4.2.1
[31] igraph_1.3.5 coda_0.19-4 gtable_0.3.1 glue_1.6.2 reshape2_1.4.4
[36] dplyr_1.0.10 posterior_1.3.1 V8_4.2.2 vctrs_0.5.1 svglite_2.1.0
[41] iterators_1.0.14 crosstalk_1.2.0 tensorA_0.36.2 xfun_0.36 stringr_1.5.0
[46] ps_1.7.2 mime_0.12 miniUI_0.1.1.1 lifecycle_1.0.3 gtools_3.9.4
[51] MASS_7.3-58.1 zoo_1.8-11 scales_1.2.1 colourpicker_1.2.0 promises_1.2.0.1
[56] Brobdingnag_1.2-9 inline_0.3.19 shinystan_2.6.0 yaml_2.3.6 curl_4.3.3
[61] gridExtra_2.3 loo_2.5.1 stringi_1.7.8 highr_0.10 dygraphs_1.1.1.6
[66] checkmate_2.1.0 boot_1.3-28.1 pkgbuild_1.4.0 systemfonts_1.0.4 rlang_1.0.6
[71] pkgconfig_2.0.3 matrixStats_0.63.0 distributional_0.3.1 evaluate_0.19 lattice_0.20-45
[76] purrr_1.0.0 labeling_0.4.2 rstantools_2.2.0 htmlwidgets_1.6.0 processx_3.8.0
[81] tidyselect_1.2.0 plyr_1.8.8 magrittr_2.0.3 R6_2.5.1 generics_0.1.3
[86] DBI_1.1.3 pillar_1.8.1 withr_2.5.0 xts_0.12.2 abind_1.4-5
[91] tibble_3.1.8 crayon_1.5.2 utf8_1.2.2 rmarkdown_2.19 grid_4.2.1
[96] callr_3.7.3 threejs_0.3.3 digest_0.6.31 xtable_1.8-4 tidyr_1.2.1
[101] httpuv_1.6.7 RcppParallel_5.1.5 stats4_4.2.1 munsell_0.5.0 shinyjs_2.1.0