Skip to content

Latest commit

 

History

History
1009 lines (809 loc) · 40.6 KB

jspmultilevel.md

File metadata and controls

1009 lines (809 loc) · 40.6 KB

Multilevel Design

Julian Faraway 2024-08-30

See the introduction for an overview.

This example is discussed in more detail in my book Extending the Linear Model with R

Required libraries:

library(faraway)
library(ggplot2)
library(lme4)
library(pbkrtest)
library(RLRsim)
library(INLA)
library(knitr)
library(cmdstanr)
register_knitr_engine(override = FALSE)
library(brms)
library(mgcv)

Data

Multilevel models is a term used for models for data with hierarchical structure. The term is most commonly used in the social sciences. We can use the methodology we have already developed to fit some of these models.

We take as our example some data from the Junior School Project collected from primary (U.S. term is elementary) schools in inner London. We math test score result from year two as the response and try to model this as a function of gender, social class and the Raven’s test score from the first year which might be taken as a measure of ability when entering the school. We subset the data to ignore the math scores from the first two years, we centre the Raven score and create a combined class-by-school label:

data(jsp, package="faraway")
jspr <- jsp[jsp$year==2,]
jspr$craven <- jspr$raven-mean(jspr$raven)
jspr$classch <- paste(jspr$school,jspr$class,sep=".")

We can plot the data

ggplot(jspr, aes(x=raven, y=math))+xlab("Raven Score")+ylab("Math Score")+geom_point(position = position_jitter())

ggplot(jspr, aes(x=social, y=math))+xlab("Social Class")+ylab("Math Score")+geom_boxplot()

Mixed Effect Model

Although the data supports a more complex model, we simplify to having the centred Raven score and the social class as fixed effects and the school and class nested within school as random effects. See Extending the Linear Model with R,

mmod <- lmer(math ~ craven + social+(1|school)+(1|school:class),jspr)
faraway::sumary(mmod)
Fixed Effects:
            coef.est coef.se
(Intercept) 32.01     1.03  
craven       0.58     0.03  
social2     -0.36     1.09  
social3     -0.78     1.16  
social4     -2.12     1.04  
social5     -1.36     1.16  
social6     -2.37     1.23  
social7     -3.05     1.27  
social8     -3.55     1.70  
social9     -0.89     1.10  

Random Effects:
 Groups       Name        Std.Dev.
 school:class (Intercept) 1.02    
 school       (Intercept) 1.80    
 Residual                 5.25    
---
number of obs: 953, groups: school:class, 90; school, 48
AIC = 5949.7, DIC = 5933
deviance = 5928.3 

We can see the math score is strongly related to the entering Raven score. We see that the math score tends to be lower as social class goes down. We also see the most substantial variation at the individual level with smaller amounts of variation at the school and class level.

We test the random effects:

mmodc <- lmer(math ~ craven + social+(1|school:class),jspr)
mmods <- lmer(math ~ craven + social+(1|school),jspr)
exactRLRT(mmodc, mmod, mmods)
    simulated finite sample distribution of RLRT.
    
    (p-value based on 10000 simulated values)

data:  
RLRT = 1.85, p-value = 0.082
exactRLRT(mmods, mmod, mmodc)
    simulated finite sample distribution of RLRT.
    
    (p-value based on 10000 simulated values)

data:  
RLRT = 7.64, p-value = 0.0026

The first test is for the class effect which just fails to meet the 5% significance level. The second test is for the school effect and shows strong evidence of differences between schools.

We can test the social fixed effect:

mmodm <- lmer(math ~ craven + (1|school)+(1|school:class),jspr)
KRmodcomp(mmod, mmodm)
large : math ~ craven + social + (1 | school) + (1 | school:class)
small : math ~ craven + (1 | school) + (1 | school:class)
        stat    ndf    ddf F.scaling p.value
Ftest   2.76   8.00 930.34         1  0.0052

We see the social effect is significant.

We can compute confidence intervals for the parameters:

confint(mmod, method="boot")
              2.5 %   97.5 %
.sig01       0.0000  1.79403
.sig02       1.0024  2.49296
.sigma       5.0214  5.49923
(Intercept) 29.8730 34.05140
craven       0.5207  0.64835
social2     -2.5176  1.75458
social3     -3.0038  1.50360
social4     -3.9892 -0.08310
social5     -3.7584  1.03529
social6     -4.6960  0.32164
social7     -5.3252 -0.30700
social8     -7.0114 -0.15446
social9     -2.8744  1.17524

The lower end of the class confidence interval is zero while the school random effect is clearly larger. This is consistent with the earlier tests.

INLA

Integrated nested Laplace approximation is a method of Bayesian computation which uses approximation rather than simulation. More can be found on this topic in Bayesian Regression Modeling with INLA and the chapter on GLMMs

Use the most recent computational methodology:

inla.setOption(inla.mode="experimental")
inla.setOption("short.summary",TRUE)
formula <- math ~ social+craven + f(school, model="iid") + f(classch, model="iid")
result <- inla(formula, family="gaussian", data=jspr)
summary(result)
Fixed effects:
              mean    sd 0.025quant 0.5quant 0.975quant   mode kld
(Intercept) 31.784 1.014     29.795   31.784     33.773 31.784   0
social2     -0.176 1.096     -2.325   -0.176      1.973 -0.176   0
social3     -0.518 1.165     -2.803   -0.518      1.767 -0.518   0
social4     -1.832 1.037     -3.865   -1.832      0.201 -1.832   0
social5     -1.156 1.157     -3.425   -1.156      1.115 -1.156   0
social6     -2.217 1.230     -4.629   -2.217      0.195 -2.217   0
social7     -2.929 1.267     -5.414   -2.930     -0.445 -2.930   0
social8     -3.349 1.708     -6.700   -3.349      0.002 -3.349   0
social9     -0.645 1.098     -2.799   -0.645      1.509 -0.645   0
craven       0.585 0.032      0.522    0.585      0.649  0.585   0

Model hyperparameters:
                                            mean       sd 0.025quant 0.5quant 0.975quant     mode
Precision for the Gaussian observations 3.60e-02 2.00e-03      0.033 3.60e-02   4.00e-02    0.036
Precision for school                    2.20e+04 2.42e+04   1470.033 1.45e+04   8.62e+04 4008.419
Precision for classch                   2.63e-01 6.90e-02      0.155 2.54e-01   4.24e-01    0.236

 is computed 

As usual, the default priors result in precisions for the random effects which are unbelievably large and we need to change the default prior.

Informative Gamma priors on the precisions

Now try more informative gamma priors for the random effect precisions. Define it so the mean value of gamma prior is set to the inverse of the variance of the residuals of the fixed-effects only model. We expect the error variances to be lower than this variance so this is an overestimate. The variance of the gamma prior (for the precision) is controlled by the apar shape parameter.

apar <- 0.5
lmod <- lm(math ~ social+craven, jspr)
bpar <- apar*var(residuals(lmod))
lgprior <- list(prec = list(prior="loggamma", param = c(apar,bpar)))
formula = math ~ social+craven+f(school, model="iid", hyper = lgprior)+f(classch, model="iid", hyper = lgprior)
result <- inla(formula, family="gaussian", data=jspr)
summary(result)
Fixed effects:
              mean    sd 0.025quant 0.5quant 0.975quant   mode kld
(Intercept) 32.001 1.061     29.921   32.001     34.083 32.001   0
social2     -0.395 1.099     -2.550   -0.395      1.760 -0.395   0
social3     -0.760 1.171     -3.056   -0.760      1.537 -0.760   0
social4     -2.098 1.045     -4.147   -2.098     -0.048 -2.098   0
social5     -1.422 1.164     -3.704   -1.422      0.861 -1.422   0
social6     -2.349 1.238     -4.777   -2.349      0.080 -2.349   0
social7     -3.056 1.277     -5.560   -3.056     -0.551 -3.056   0
social8     -3.553 1.708     -6.903   -3.553     -0.202 -3.553   0
social9     -0.887 1.107     -3.058   -0.887      1.286 -0.887   0
craven       0.586 0.032      0.522    0.586      0.650  0.586   0

Model hyperparameters:
                                         mean    sd 0.025quant 0.5quant 0.975quant  mode
Precision for the Gaussian observations 0.037 0.002      0.033    0.037      0.040 0.037
Precision for school                    0.266 0.086      0.137    0.253      0.471 0.229
Precision for classch                   0.341 0.102      0.183    0.327      0.580 0.301

 is computed 

Results are more credible.

Compute the transforms to an SD scale for the random effect terms. Make a table of summary statistics for the posteriors:

sigmasch <- inla.tmarginal(function(x) 1/sqrt(exp(x)),result$internal.marginals.hyperpar[[2]])
sigmacla <- inla.tmarginal(function(x) 1/sqrt(exp(x)),result$internal.marginals.hyperpar[[3]])
sigmaepsilon <- inla.tmarginal(function(x) 1/sqrt(exp(x)),result$internal.marginals.hyperpar[[1]])
restab=sapply(result$marginals.fixed, function(x) inla.zmarginal(x,silent=TRUE))
restab=cbind(restab, inla.zmarginal(sigmasch,silent=TRUE))
restab=cbind(restab, inla.zmarginal(sigmacla,silent=TRUE))
restab=cbind(restab, inla.zmarginal(sigmaepsilon,silent=TRUE))
colnames(restab) = c("mu",result$names.fixed[2:10],"school SD","class SD","epsilon")
data.frame(restab) |> kable()
mu social2 social3 social4 social5 social6 social7 social8 social9 craven school.SD class.SD epsilon
mean 32.001 -0.39524 -0.75976 -2.0978 -1.4222 -2.3491 -3.0556 -3.5527 -0.8867 0.58598 2.0119 1.7676 5.2218
sd 1.0602 1.0982 1.1702 1.0444 1.1631 1.2375 1.2762 1.7073 1.1068 0.032394 0.31501 0.2586 0.12369
quant0.025 29.921 -2.5505 -3.0565 -4.1476 -3.705 -4.7781 -5.5605 -6.9037 -3.059 0.52239 1.461 1.3162 4.9839
quant0.25 31.284 -1.1381 -1.5513 -2.8042 -2.209 -3.1861 -3.9188 -4.7075 -1.6354 0.56407 1.7881 1.5841 5.1368
quant0.5 31.999 -0.39765 -0.76229 -2.1001 -1.4248 -2.3517 -3.0583 -3.5564 -0.8891 0.58591 1.9882 1.7475 5.2197
quant0.75 32.714 0.34287 0.026772 -1.3958 -0.64048 -1.5173 -2.1978 -2.4051 -0.14276 0.60776 2.2092 1.9293 5.3044
quant0.975 34.079 1.7561 1.5326 -0.051925 0.85633 0.075022 -0.55573 -0.2081 1.2815 0.64943 2.6957 2.3301 5.4696

Also construct a plot the SD posteriors:

ddf <- data.frame(rbind(sigmasch,sigmacla,sigmaepsilon),errterm=gl(3,nrow(sigmasch),labels = c("school","class","epsilon")))
ggplot(ddf, aes(x,y, linetype=errterm))+geom_line()+xlab("math")+ylab("density")

Posteriors look OK although no weight given to smaller values.

Penalized Complexity Prior

In Simpson et al (2015), penalized complexity priors are proposed. This requires that we specify a scaling for the SDs of the random effects. We use the SD of the residuals of the fixed effects only model (what might be called the base model in the paper) to provide this scaling.

lmod <- lm(math ~ craven + social, jspr)
sdres <- sd(residuals(lmod))
pcprior <- list(prec = list(prior="pc.prec", param = c(3*sdres,0.01)))
formula = math ~ social+craven+f(school, model="iid", hyper = pcprior)+f(classch, model="iid", hyper = pcprior)
result <- inla(formula, family="gaussian", data=jspr)
summary(result)
Fixed effects:
              mean    sd 0.025quant 0.5quant 0.975quant   mode kld
(Intercept) 31.985 1.036     29.953   31.984     34.019 31.984   0
social2     -0.343 1.093     -2.487   -0.343      1.802 -0.343   0
social3     -0.748 1.165     -3.032   -0.748      1.536 -0.748   0
social4     -2.088 1.039     -4.126   -2.088     -0.050 -2.088   0
social5     -1.347 1.158     -3.617   -1.347      0.925 -1.347   0
social6     -2.346 1.231     -4.761   -2.346      0.069 -2.346   0
social7     -3.029 1.268     -5.517   -3.029     -0.541 -3.029   0
social8     -3.519 1.701     -6.856   -3.519     -0.182 -3.519   0
social9     -0.861 1.102     -3.022   -0.861      1.301 -0.861   0
craven       0.584 0.032      0.521    0.584      0.648  0.584   0

Model hyperparameters:
                                         mean    sd 0.025quant 0.5quant 0.975quant  mode
Precision for the Gaussian observations 0.036 0.002      0.033    0.036      0.040 0.036
Precision for school                    0.331 0.135      0.137    0.307      0.661 0.265
Precision for classch                   1.321 1.300      0.265    0.942      4.755 0.549

 is computed 

Compute the summaries as before:

sigmasch <- inla.tmarginal(function(x) 1/sqrt(exp(x)),result$internal.marginals.hyperpar[[2]])
sigmacla <- inla.tmarginal(function(x) 1/sqrt(exp(x)),result$internal.marginals.hyperpar[[3]])
sigmaepsilon <- inla.tmarginal(function(x) 1/sqrt(exp(x)),result$internal.marginals.hyperpar[[1]])
restab=sapply(result$marginals.fixed, function(x) inla.zmarginal(x,silent=TRUE))
restab=cbind(restab, inla.zmarginal(sigmasch,silent=TRUE))
restab=cbind(restab, inla.zmarginal(sigmacla,silent=TRUE))
restab=cbind(restab, inla.zmarginal(sigmaepsilon,silent=TRUE))
colnames(restab) = c("mu",result$names.fixed[2:10],"school SD","class SD","epsilon")
data.frame(restab) |> kable()
mu social2 social3 social4 social5 social6 social7 social8 social9 craven school.SD class.SD epsilon
mean 31.985 -0.34287 -0.74793 -2.0884 -1.3467 -2.3463 -3.0289 -3.5191 -0.86102 0.58445 1.8449 1.0766 5.2422
sd 1.0359 1.0929 1.1639 1.0386 1.1572 1.2306 1.2678 1.7005 1.1014 0.032139 0.3726 0.38077 0.12493
quant0.025 29.952 -2.4879 -3.0323 -4.1269 -3.618 -4.7618 -5.5175 -6.8568 -3.0229 0.52137 1.2335 0.46184 4.9988
quant0.25 31.284 -1.0821 -1.5352 -2.7909 -2.1294 -3.1787 -3.8864 -4.6693 -1.606 0.56271 1.5781 0.79497 5.1569
quant0.5 31.982 -0.34525 -0.75047 -2.0907 -1.3492 -2.3489 -3.0316 -3.5227 -0.86337 0.58438 1.802 1.0372 5.2415
quant0.75 32.681 0.39171 0.034362 -1.3903 -0.56885 -1.5192 -2.1768 -2.3761 -0.12069 0.60605 2.0661 1.3098 5.3263
quant0.975 34.015 1.7982 1.5322 -0.053897 0.92036 0.064124 -0.54555 -0.18776 1.2965 0.64741 2.6917 1.9307 5.4894

Make the plots:

ddf <- data.frame(rbind(sigmasch,sigmacla,sigmaepsilon),errterm=gl(3,nrow(sigmasch),labels = c("school","class","epsilon")))
ggplot(ddf, aes(x,y, linetype=errterm))+geom_line()+xlab("math")+ylab("density")

Posteriors put more weight on lower values compared to gamma prior.

STAN

STAN performs Bayesian inference using MCMC. I use cmdstanr to access Stan from R.

You see below the Stan code to fit our model. Rmarkdown allows the use of Stan chunks (elsewhere I have R chunks). The chunk header looks like this.

STAN chunk will be compiled to ‘mod’. Chunk header is:

cmdstan, output.var="mod", override = FALSE
data {
     int<lower=0> Nobs;
     int<lower=0> Npreds;
     int<lower=0> Nlev1;
     int<lower=0> Nlev2;
     array[Nobs] real y;
     matrix[Nobs,Npreds] x;
     array[Nobs] int<lower=1,upper=Nlev1> levind1;
     array[Nobs] int<lower=1,upper=Nlev2> levind2;
     real<lower=0> sdscal;
}
parameters {
           vector[Npreds] beta;
           real<lower=0> sigmalev1;
           real<lower=0> sigmalev2;
           real<lower=0> sigmaeps;

           vector[Nlev1] eta1;
           vector[Nlev2] eta2;
}
transformed parameters {
  vector[Nlev1] ran1;
  vector[Nlev2] ran2;
  vector[Nobs] yhat;

  ran1  = sigmalev1 * eta1;
  ran2  = sigmalev2 * eta2;

  for (i in 1:Nobs)
    yhat[i] = x[i]*beta+ran1[levind1[i]]+ran2[levind2[i]];

}
model {
  eta1 ~ normal(0, 1);
  eta2 ~ normal(0, 1);
  sigmalev1 ~ cauchy(0, 2.5*sdscal);
  sigmalev2 ~ cauchy(0, 2.5*sdscal);
  sigmaeps ~ cauchy(0, 2.5*sdscal);
  y ~ normal(yhat, sigmaeps);
}

We have used uninformative priors for the treatment effects but slightly informative half-cauchy priors for the three variances. All the fixed effects have been collected into a single design matrix. The school and class variables need to be renumbered into consecutive positive integers. Somewhat inconvenient since the schools are numbered up to 50 but have no data for two schools so only 48 schools are actually used.

lmod <- lm(math ~ craven + social, jspr)
sdscal <- sd(residuals(lmod))
Xmatrix <- model.matrix( ~ craven + social, jspr)
jspr$school <- factor(jspr$school)
jspr$classch <- factor(paste(jspr$school,jspr$class,sep="."))
jspdat <- list(Nobs=nrow(jspr),
               Npreds=ncol(Xmatrix),
               Nlev1=length(unique(jspr$school)),
               Nlev2=length(unique(jspr$classch)),
               y=jspr$math,
               x=Xmatrix,
               levind1=as.numeric(jspr$school),
               levind2=as.numeric(jspr$classch),
               sdscal=sdscal)

Do the MCMC sampling:

fit <- mod$sample(
  data = jspdat, 
  seed = 123, 
  chains = 4, 
  parallel_chains = 4,
  refresh = 0 # print no updates
)
Running MCMC with 4 parallel chains...

Chain 3 finished in 6.1 seconds.
Chain 1 finished in 6.2 seconds.
Chain 4 finished in 6.2 seconds.
Chain 2 finished in 8.2 seconds.

All 4 chains finished successfully.
Mean chain execution time: 6.7 seconds.
Total execution time: 8.5 seconds.

Diagnostics

Extract the draws into a convenient dataframe format:

draws_df <- fit$draws(format = "df")

For the School SD:

ggplot(draws_df,
       aes(x=.iteration,y=sigmalev1,color=factor(.chain))) + geom_line() +
  labs(color = 'Chain', x="Iteration")

For the class SD

ggplot(draws_df,
       aes(x=.iteration,y=sigmalev2,color=factor(.chain))) + geom_line() +
  labs(color = 'Chain', x="Iteration")

All these are satisfactory.

Output Summary

Display the parameters of interest:

fit$summary(c("beta","sigmalev1","sigmalev2","sigmaeps"))
# A tibble: 13 × 10
   variable    mean median     sd    mad     q5    q95  rhat ess_bulk ess_tail
   <chr>      <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl> <dbl>    <dbl>    <dbl>
 1 beta[1]   32.0   32.0   1.08   1.11   30.2   33.8    1.01     833.    1517.
 2 beta[2]    0.584  0.584 0.0323 0.0312  0.531  0.636  1.00    8121.    3032.
 3 beta[3]   -0.351 -0.352 1.14   1.14   -2.22   1.56   1.01    1004.    1653.
 4 beta[4]   -0.753 -0.757 1.21   1.25   -2.70   1.28   1.01     910.    1554.
 5 beta[5]   -2.11  -2.12  1.08   1.08   -3.91  -0.297  1.01     843.    1318.
 6 beta[6]   -1.35  -1.37  1.21   1.22   -3.33   0.690  1.00     955.    1721.
 7 beta[7]   -2.36  -2.36  1.28   1.27   -4.49  -0.248  1.00     956.    1587.
 8 beta[8]   -3.04  -3.05  1.31   1.29   -5.21  -0.859  1.00    1013.    1853.
 9 beta[9]   -3.53  -3.54  1.72   1.72   -6.35  -0.654  1.00    1595.    2493.
10 beta[10]  -0.879 -0.884 1.14   1.15   -2.73   1.00   1.01     889.    1641.
11 sigmalev1  1.78   1.80  0.437  0.387   0.995  2.43   1.02     342.     478.
12 sigmalev2  1.02   1.02  0.496  0.523   0.220  1.86   1.02     319.     612.
13 sigmaeps   5.26   5.26  0.127  0.124   5.06   5.48   1.00    5433.    3166.

Remember that the beta correspond to the following parameters:

colnames(Xmatrix)
 [1] "(Intercept)" "craven"      "social2"     "social3"     "social4"     "social5"     "social6"     "social7"    
 [9] "social8"     "social9"    

The results are comparable to the REML fit. The effective sample sizes are sufficient.

Posterior Distributions

We can use extract to get at various components of the STAN fit. First consider the SDs for random components:

sdf = stack(draws_df[,c("sigmaeps","sigmalev1","sigmalev2")])
colnames(sdf) = c("math","SD")
levels(sdf$SD) = c("individual","school","class")
ggplot(sdf, aes(x=math,color=SD)) + geom_density() 

As usual the error SD distribution is a more concentrated. The school SD is more diffuse and smaller whereas the class SD is smaller still. Now the treatement effects, considering the social class parameters first:

sdf = stack(draws_df[,4:11])
colnames(sdf) = c("math","social")
levels(sdf$social) = 2:9
ggplot(sdf, aes(x=math,color=social)) + geom_density() 

Now just the raven score parameter:

ggplot(draws_df, aes(x=.data[["beta[2]"]])) + geom_density() + xlab("Math per Raven")

Now for the schools:

sdf = stack(draws_df[,startsWith(colnames(draws_df),"ran1")])
colnames(sdf) = c("math","school")
levels(sdf$school) = 1:48
ggplot(sdf, aes(x=math,group=school)) + geom_density() 

We can see the variation between schools. A league table might be used to rank the schools but the high overlap in these distributions show that such a ranking should not be interpreted too seriously.

BRMS

BRMS stands for Bayesian Regression Models with STAN. It provides a convenient wrapper to STAN functionality. We specify the model as in lmer() above. I have used more than the standard number of iterations because this reduces some problems and does not cost much computationally.

bmod <- brm(math ~ craven + social+(1|school)+(1|school:class),
            data=jspr,iter=10000, cores=4, refresh=0,
            backend = "cmdstanr")
Running MCMC with 4 parallel chains...

Chain 4 finished in 11.4 seconds.
Chain 3 finished in 13.0 seconds.
Chain 2 finished in 13.7 seconds.
Chain 1 finished in 14.7 seconds.

All 4 chains finished successfully.
Mean chain execution time: 13.2 seconds.
Total execution time: 14.7 seconds.

We get some minor warnings. We can obtain some posterior densities and diagnostics with:

plot(bmod, variable = "^s", regex=TRUE)

We have chosen only the random effect hyperparameters since this is where problems will appear first. Looks OK. We can see some weight is given to values of the class effect SD close to zero.

We can look at the STAN code that brms used with:

stancode(bmod)
// generated with brms 2.21.0
functions {
}
data {
  int<lower=1> N;  // total number of observations
  vector[N] Y;  // response variable
  int<lower=1> K;  // number of population-level effects
  matrix[N, K] X;  // population-level design matrix
  int<lower=1> Kc;  // number of population-level effects after centering
  // data for group-level effects of ID 1
  int<lower=1> N_1;  // number of grouping levels
  int<lower=1> M_1;  // number of coefficients per level
  array[N] int<lower=1> J_1;  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_1_1;
  // data for group-level effects of ID 2
  int<lower=1> N_2;  // number of grouping levels
  int<lower=1> M_2;  // number of coefficients per level
  array[N] int<lower=1> J_2;  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_2_1;
  int prior_only;  // should the likelihood be ignored?
}
transformed data {
  matrix[N, Kc] Xc;  // centered version of X without an intercept
  vector[Kc] means_X;  // column means of X before centering
  for (i in 2:K) {
    means_X[i - 1] = mean(X[, i]);
    Xc[, i - 1] = X[, i] - means_X[i - 1];
  }
}
parameters {
  vector[Kc] b;  // regression coefficients
  real Intercept;  // temporary intercept for centered predictors
  real<lower=0> sigma;  // dispersion parameter
  vector<lower=0>[M_1] sd_1;  // group-level standard deviations
  array[M_1] vector[N_1] z_1;  // standardized group-level effects
  vector<lower=0>[M_2] sd_2;  // group-level standard deviations
  array[M_2] vector[N_2] z_2;  // standardized group-level effects
}
transformed parameters {
  vector[N_1] r_1_1;  // actual group-level effects
  vector[N_2] r_2_1;  // actual group-level effects
  real lprior = 0;  // prior contributions to the log posterior
  r_1_1 = (sd_1[1] * (z_1[1]));
  r_2_1 = (sd_2[1] * (z_2[1]));
  lprior += student_t_lpdf(Intercept | 3, 32, 5.9);
  lprior += student_t_lpdf(sigma | 3, 0, 5.9)
    - 1 * student_t_lccdf(0 | 3, 0, 5.9);
  lprior += student_t_lpdf(sd_1 | 3, 0, 5.9)
    - 1 * student_t_lccdf(0 | 3, 0, 5.9);
  lprior += student_t_lpdf(sd_2 | 3, 0, 5.9)
    - 1 * student_t_lccdf(0 | 3, 0, 5.9);
}
model {
  // likelihood including constants
  if (!prior_only) {
    // initialize linear predictor term
    vector[N] mu = rep_vector(0.0, N);
    mu += Intercept;
    for (n in 1:N) {
      // add more terms to the linear predictor
      mu[n] += r_1_1[J_1[n]] * Z_1_1[n] + r_2_1[J_2[n]] * Z_2_1[n];
    }
    target += normal_id_glm_lpdf(Y | Xc, mu, b, sigma);
  }
  // priors including constants
  target += lprior;
  target += std_normal_lpdf(z_1[1]);
  target += std_normal_lpdf(z_2[1]);
}
generated quantities {
  // actual population-level intercept
  real b_Intercept = Intercept - dot_product(means_X, b);
}

We see that brms is using student t distributions with 3 degrees of freedom for the priors. For the three error SDs, this will be truncated at zero to form half-t distributions. You can get a more explicit description of the priors with prior_summary(bmod). These are qualitatively similar to the the PC prior used in the INLA fit.

We examine the fit:

summary(bmod)
 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: math ~ craven + social + (1 | school) + (1 | school:class) 
   Data: jspr (Number of observations: 953) 
  Draws: 4 chains, each with iter = 10000; warmup = 5000; thin = 1;
         total post-warmup draws = 20000

Multilevel Hyperparameters:
~school (Number of levels: 48) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     1.80      0.40     0.95     2.56 1.00     2920     2386

~school:class (Number of levels: 90) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.98      0.49     0.08     1.94 1.00     1965     2797

Regression Coefficients:
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept    32.02      1.04    29.99    34.06 1.00     5458     8223
craven        0.58      0.03     0.52     0.65 1.00    23350    11998
social2      -0.35      1.09    -2.48     1.78 1.00     6051    10140
social3      -0.78      1.17    -3.05     1.53 1.00     6304     9836
social4      -2.11      1.04    -4.14    -0.07 1.00     5432     8635
social5      -1.36      1.16    -3.62     0.92 1.00     6143     8728
social6      -2.37      1.23    -4.76     0.02 1.00     6540    10512
social7      -3.05      1.27    -5.52    -0.53 1.00     7102    11188
social8      -3.55      1.71    -6.90    -0.19 1.00    10173    12589
social9      -0.88      1.10    -3.05     1.29 1.00     5866     9667

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     5.26      0.13     5.02     5.52 1.00    20364    13125

Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

The results are consistent with those seen previously.

MGCV

It is possible to fit some GLMMs within the GAM framework of the mgcv package. An explanation of this can be found in this blog

gmod = gam(math ~ craven + social+s(school,bs="re")+s(classch,bs="re"),data=jspr, method="REML")

and look at the summary output:

summary(gmod)
Family: gaussian 
Link function: identity 

Formula:
math ~ craven + social + s(school, bs = "re") + s(classch, bs = "re")

Parametric coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  32.0107     1.0350   30.93   <2e-16
craven        0.5841     0.0321   18.21   <2e-16
social2      -0.3611     1.0948   -0.33    0.742
social3      -0.7767     1.1649   -0.67    0.505
social4      -2.1196     1.0396   -2.04    0.042
social5      -1.3632     1.1585   -1.18    0.240
social6      -2.3703     1.2330   -1.92    0.055
social7      -3.0482     1.2703   -2.40    0.017
social8      -3.5473     1.7027   -2.08    0.038
social9      -0.8863     1.1031   -0.80    0.422

Approximate significance of smooth terms:
            edf Ref.df    F p-value
s(school)  27.4     47 2.67  <2e-16
s(classch) 15.6     89 0.33   0.052

R-sq.(adj) =  0.378   Deviance explained = 41.2%
-REML = 2961.8  Scale est. = 27.572    n = 953

We get the fixed effect estimates. We also get tests on the random effects (as described in this article. The hypothesis of no variation is rejected for the school but not for the class. This is consistent with earlier findings.

We can get an estimate of the operator and error SD:

gam.vcomp(gmod)
Standard deviations and 0.95 confidence intervals:

           std.dev   lower  upper
s(school)   1.7967 1.21058 2.6667
s(classch)  1.0162 0.41519 2.4875
scale       5.2509 5.00863 5.5049

Rank: 3/3

The point estimates are the same as the REML estimates from lmer earlier. The confidence intervals are different. A bootstrap method was used for the lmer fit whereas gam is using an asymptotic approximation resulting in substantially different results. Given the problems of parameters on the boundary present in this example, the bootstrap results appear more trustworthy.

The fixed and random effect estimates can be found with:

coef(gmod)
  (Intercept)        craven       social2       social3       social4       social5       social6       social7 
    32.010740      0.584117     -0.361080     -0.776732     -2.119649     -1.363206     -2.370314     -3.048249 
      social8       social9   s(school).1   s(school).2   s(school).3   s(school).4   s(school).5   s(school).6 
    -3.547252     -0.886345     -2.256262     -0.422953      0.836164     -1.256567     -0.677051      0.186343 
  s(school).7   s(school).8   s(school).9  s(school).10  s(school).11  s(school).12  s(school).13  s(school).14 
     1.323942      0.364974     -2.022321      0.558287     -0.505067      0.016585     -0.615911      0.421654 
 s(school).15  s(school).16  s(school).17  s(school).18  s(school).19  s(school).20  s(school).21  s(school).22 
    -0.219725      0.441527     -0.204674      0.621186     -0.304768     -2.509540     -1.069436     -0.182572 
 s(school).23  s(school).24  s(school).25  s(school).26  s(school).27  s(school).28  s(school).29  s(school).30 
     2.242250      1.135253      1.155773      0.375632     -2.487656     -2.501408      1.101826      2.361060 
 s(school).31  s(school).32  s(school).33  s(school).34  s(school).35  s(school).36  s(school).37  s(school).38 
     0.056199     -1.044550      2.557490     -0.981792      2.559601      0.588989      2.437473     -1.000318 
 s(school).39  s(school).40  s(school).41  s(school).42  s(school).43  s(school).44  s(school).45  s(school).46 
    -2.022936      1.655924     -0.279566     -0.092512     -2.221997      0.216297      1.662840     -0.657181 
 s(school).47  s(school).48  s(classch).1  s(classch).2  s(classch).3  s(classch).4  s(classch).5  s(classch).6 
     0.066572      0.592921      0.573442     -1.295266     -0.101570      0.280178      0.328046      0.171502 
 s(classch).7  s(classch).8  s(classch).9 s(classch).10 s(classch).11 s(classch).12 s(classch).13 s(classch).14 
    -0.222321     -0.438808     -0.666694      0.672000     -0.197043     -0.085737      0.220633     -0.432686 
s(classch).15 s(classch).16 s(classch).17 s(classch).18 s(classch).19 s(classch).20 s(classch).21 s(classch).22 
     0.362391      0.280296     -0.139043     -0.065479      0.452835     -0.254105     -0.135311     -0.539712 
s(classch).23 s(classch).24 s(classch).25 s(classch).26 s(classch).27 s(classch).28 s(classch).29 s(classch).30 
     0.442210     -0.204360     -0.598493     -0.164090     -0.178044      0.181223     -0.239631      0.717342 
s(classch).31 s(classch).32 s(classch).33 s(classch).34 s(classch).35 s(classch).36 s(classch).37 s(classch).38 
     0.363191      0.266493      0.103262     -0.045999      0.166171     -0.692729     -0.103123     -0.800251 
s(classch).39 s(classch).40 s(classch).41 s(classch).42 s(classch).43 s(classch).44 s(classch).45 s(classch).46 
     0.583420     -0.315914     -0.057735      0.410232      0.374896      0.436353     -0.055897      0.017979 
s(classch).47 s(classch).48 s(classch).49 s(classch).50 s(classch).51 s(classch).52 s(classch).53 s(classch).54 
    -0.397021      0.062848      0.224533      0.880602     -0.286941     -0.047087     -0.267009      0.603827 
s(classch).55 s(classch).56 s(classch).57 s(classch).58 s(classch).59 s(classch).60 s(classch).61 s(classch).62 
    -0.232530      0.447571      0.188430      0.779797     -0.243148     -0.076874     -0.402001     -0.647178 
s(classch).63 s(classch).64 s(classch).65 s(classch).66 s(classch).67 s(classch).68 s(classch).69 s(classch).70 
     0.529764     -1.156667      0.131640      0.935589     -0.029596     -0.710862      0.069198      0.487627 
s(classch).71 s(classch).72 s(classch).73 s(classch).74 s(classch).75 s(classch).76 s(classch).77 s(classch).78 
     0.044350      0.037317      0.073938     -0.053456     -0.268045      0.122984     -0.101686     -0.252028 
s(classch).79 s(classch).80 s(classch).81 s(classch).82 s(classch).83 s(classch).84 s(classch).85 s(classch).86 
     0.035425      0.189688      0.258015     -0.198400      0.373615      0.049941      0.311116      0.069280 
s(classch).87 s(classch).88 s(classch).89 s(classch).90 
    -0.263633      0.582019     -0.587131     -0.641870 

GINLA

In Wood (2019), a simplified version of INLA is proposed. The first construct the GAM model without fitting and then use the ginla() function to perform the computation.

gmod = gam(math ~ craven + social+s(school,bs="re")+s(classch,bs="re"),
           data=jspr, fit = FALSE)
gimod = ginla(gmod)

We get the posterior density for the intercept as:

plot(gimod$beta[1,],gimod$density[1,],type="l",xlab="math",ylab="density")

We get the posterior density for the raven effect as:

plot(gimod$beta[2,],gimod$density[2,],type="l",xlab="math per raven",ylab="density")

and for the social effects as:

xmat = t(gimod$beta[3:10,])
ymat = t(gimod$density[3:10,])
matplot(xmat, ymat,type="l",xlab="math",ylab="density")
legend("left",paste0("social",2:9),col=1:8,lty=1:8)

We can see some overlap between the effects, but strong evidence of a negative outcome relative to social class 1 for some classes.

It is not straightforward to obtain the posterior densities of the hyperparameters.

Discussion

See the Discussion of the single random effect model for general comments.

  • As with the previous analyses, sometimes the INLA posteriors for the hyperparameters have densities which do not give weight to close-to-zero values where other analyses suggest this might be reasonable.

  • There is relatively little disagreement between the methods and much similarity.

  • There were no major computational issue with the analyses (in contrast with some of the other examples)

  • The mgcv analyses took a little longer than previous analyses because the sample size is larger (but still were quite fast).

  • August 2024: rerunning the code with the latest versions of packages produces much the same results except for the PC prior INLA fit. The current fit is more inline with the other Bayesian fits. In contrast the INLA with gamma prior results was similar to the previous year.

Package version info

sessionInfo()
R version 4.4.1 (2024-06-14)
Platform: x86_64-apple-darwin20
Running under: macOS Sonoma 14.6.1

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/London
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] mgcv_1.9-1     nlme_3.1-166   brms_2.21.0    Rcpp_1.0.13    cmdstanr_0.8.1 knitr_1.48     INLA_24.06.27 
 [8] sp_2.1-4       RLRsim_3.1-8   pbkrtest_0.5.3 lme4_1.1-35.5  Matrix_1.7-0   ggplot2_3.5.1  faraway_1.0.8 

loaded via a namespace (and not attached):
 [1] tidyselect_1.2.1     farver_2.1.2         dplyr_1.1.4          loo_2.8.0            fastmap_1.2.0       
 [6] tensorA_0.36.2.1     digest_0.6.37        estimability_1.5.1   lifecycle_1.0.4      Deriv_4.1.3         
[11] sf_1.0-16            StanHeaders_2.32.10  processx_3.8.4       magrittr_2.0.3       posterior_1.6.0     
[16] compiler_4.4.1       rlang_1.1.4          tools_4.4.1          utf8_1.2.4           yaml_2.3.10         
[21] data.table_1.16.0    labeling_0.4.3       bridgesampling_1.1-2 pkgbuild_1.4.4       classInt_0.4-10     
[26] plyr_1.8.9           abind_1.4-5          KernSmooth_2.23-24   withr_3.0.1          purrr_1.0.2         
[31] grid_4.4.1           stats4_4.4.1         fansi_1.0.6          xtable_1.8-4         e1071_1.7-14        
[36] colorspace_2.1-1     inline_0.3.19        emmeans_1.10.4       scales_1.3.0         MASS_7.3-61         
[41] cli_3.6.3            mvtnorm_1.2-6        rmarkdown_2.28       generics_0.1.3       RcppParallel_5.1.9  
[46] rstudioapi_0.16.0    reshape2_1.4.4       minqa_1.2.8          DBI_1.2.3            proxy_0.4-27        
[51] rstan_2.32.6         stringr_1.5.1        splines_4.4.1        bayesplot_1.11.1     parallel_4.4.1      
[56] matrixStats_1.3.0    vctrs_0.6.5          boot_1.3-31          jsonlite_1.8.8       systemfonts_1.1.0   
[61] tidyr_1.3.1          units_0.8-5          glue_1.7.0           nloptr_2.1.1         codetools_0.2-20    
[66] ps_1.7.7             distributional_0.4.0 stringi_1.8.4        gtable_0.3.5         QuickJSR_1.3.1      
[71] munsell_0.5.1        tibble_3.2.1         pillar_1.9.0         htmltools_0.5.8.1    Brobdingnag_1.2-9   
[76] R6_2.5.1             fmesher_0.1.7        evaluate_0.24.0      lattice_0.22-6       backports_1.5.0     
[81] broom_1.0.6          rstantools_2.4.0     class_7.3-22         gridExtra_2.3        svglite_2.1.3       
[86] coda_0.19-4.1        checkmate_2.3.2      xfun_0.47            pkgconfig_2.0.3