```
library(tidyverse)
set.seed(123)
n_school <- 10
mean_n_class <- 7
mean_n_student <- 5
n_class <- rpois(n_school, mean_n_class)
schools <- map2_df(
  seq_len(n_school), n_class,
  ~tibble(
    school = .x, class = seq_len(.y), students = rpois(.y, mean_n_student)
  )
) %>%
  group_by(school, class) %>%
  do(
    student = tibble(student = seq_len(.$students))
  ) %>%
  unnest(student) %>%
  mutate(
    class2 = interaction(class, school, drop = TRUE),
    student2 = interaction(class2, student, drop = TRUE)
  )
```

# Nested and crossed random effects in `lme4`

People often get confused about how to code nested and crossed random effects in the `lme4` package (Bates et al. 2015). I will try to clarify this using some artificial data sets.

## Nested random effects

Nested random effects assume that there is some kind of hierarchy in the grouping of the observations, e.g. schools and classes: a class groups a number of students and a school groups a number of classes. There is a one-to-many relationship between the random effects: a school can contain multiple classes, but a class can only be part of one school. Let's start by creating a simple example with fake data to explain the design. The figure shows the contingency matrix for the dataset.

`schools` contains 3 design variables: `school`, `class` and `student`. Each numbering restarts at 1 when the higher-level number changes (Figure 1). Hence the ids of class and student are not unique. Therefore I added two new variables, `class2` and `student2`, which are unique ids for each class and student. The next step is adding the expected and observed values.

```
with(schools, table(class2, school)) %>%
  image(
    col = grey.colors(10, start = 1, end = 0), axes = FALSE, xlab = "Class",
    ylab = "School"
  )
```

```
school_sd <- 2
class_sd <- 2
noise_sd <- 1
intercept <- 50
school_effect <- rnorm(n_school, mean = 0, sd = school_sd)
class_effect <- rnorm(length(levels(schools$class2)), mean = 0, sd = class_sd)
schools <- schools %>%
  mutate(
    mu = intercept + school_effect[school] + class_effect[class2],
    y = mu + rnorm(n(), mean = 0, sd = noise_sd)
  )
```

### Explicit nesting

The first option is to use explicit nesting. Here we add a random effect for each hierarchical level and use the `:` notation to add all higher levels. This can be expanded to more than two levels, e.g. `(1|A) + (1|A:B) + (1|A:B:C) + (1|A:B:C:D)`. The nice thing about this notation is twofold: a) the nesting is explicit and clear for all readers; b) it is insensitive to the order, e.g. `(1|A) + (1|A:B)` is identical to `(1|B:A) + (1|A)`.

```
library(lme4)
lmer(y ~ (1 | school) + (1 | school:class), data = schools)
```

```
Linear mixed model fit by REML ['lmerMod']
Formula: y ~ (1 | school) + (1 | school:class)
Data: schools
REML criterion at convergence: 1290.668
Random effects:
Groups Name Std.Dev.
school:class (Intercept) 1.631
school (Intercept) 1.806
Residual 1.073
Number of obs: 366, groups: school:class, 74; school, 10
Fixed Effects:
(Intercept)
50.27
```

```
lmer(y ~ (1 | class:school) + (1 | school), data = schools)
```

```
Linear mixed model fit by REML ['lmerMod']
Formula: y ~ (1 | class:school) + (1 | school)
Data: schools
REML criterion at convergence: 1290.668
Random effects:
Groups Name Std.Dev.
class:school (Intercept) 1.631
school (Intercept) 1.806
Residual 1.073
Number of obs: 366, groups: class:school, 74; school, 10
Fixed Effects:
(Intercept)
50.27
```
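The identical fits above follow from the fact that `:` on factors builds one group per observed combination, regardless of order. A tiny base-R sketch (the factors `A` and `B` below are made up for illustration, not part of the post's data) shows that the partition of observations is the same either way:

```
# Hypothetical factors for illustration only
A <- factor(rep(1:2, each = 4))
B <- factor(rep(1:2, times = 4))
# Interacting in either order yields the same partition of the data
g1 <- droplevels(A:B)
g2 <- droplevels(B:A)
nlevels(g1) == nlevels(g2)  # TRUE: same number of groups
# Every group of row indices under A:B is also a group under B:A
all(tapply(seq_along(g1), g1, toString) %in% tapply(seq_along(g2), g2, toString))
```

Only the level labels differ (`"1:2"` versus `"2:1"`), which is why `lmer` reports the same variance components under either spelling.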

### Shorthand nesting

`(1|A) + (1|A:B)` can be abbreviated to `(1|A/B)`. However, I recommend against it because here the order is important, as seen in the example below. `(1|B/A)` expands to `(1|B) + (1|B:A)`, which is clearly a different model than `(1|A) + (1|A:B)`. I've seen many people get confused about the order, so I recommend being explicit instead of using the shorthand.

```
lmer(y ~ (1 | school / class), data = schools)
```

```
Linear mixed model fit by REML ['lmerMod']
Formula: y ~ (1 | school/class)
Data: schools
REML criterion at convergence: 1290.668
Random effects:
Groups Name Std.Dev.
class:school (Intercept) 1.631
school (Intercept) 1.806
Residual 1.073
Number of obs: 366, groups: class:school, 74; school, 10
Fixed Effects:
(Intercept)
50.27
```

```
lmer(y ~ (1 | class / school), data = schools)
```

```
boundary (singular) fit: see help('isSingular')
```

```
Linear mixed model fit by REML ['lmerMod']
Formula: y ~ (1 | class/school)
Data: schools
REML criterion at convergence: 1320.039
Random effects:
Groups Name Std.Dev.
school:class (Intercept) 2.330
class (Intercept) 0.000
Residual 1.073
Number of obs: 366, groups: school:class, 74; class, 11
Fixed Effects:
(Intercept)
50.42
optimizer (nloptwrap) convergence code: 0 (OK) ; 0 optimizer warnings; 1 lme4 warnings
```
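The discrepancy above can be traced to the grouping factors each shorthand expands to. A quick base-R sketch (with a made-up balanced design, not the post's data): because the class labels are reused across schools, the un-interacted term in `(1|class/school)` pools unrelated classes together.

```
# Hypothetical design: 3 schools, classes labelled 1-2 within each school
d <- expand.grid(school = factor(1:3), class = factor(1:2))
# (1|school/class) = (1|school) + (1|school:class): 3 and 6 groups
# (1|class/school) = (1|class)  + (1|class:school): 2 and 6 groups
nlevels(d$school)                      # 3 school groups
nlevels(d$class)                       # only 2: class "1" pooled over schools
nlevels(droplevels(d$school:d$class))  # 6 unique school-class combinations
```

The interaction term is the same in both expansions; it is the marginal `(1|class)` term that makes `(1|class/school)` a different (and here meaningless) model.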

### Implicit nesting

With implicit nesting, the nesting is 'defined' in the data: each level of a random effect has a one-to-many relation with the levels of the lower random effect. E.g. each class id is unique for a given class in a given school and cannot refer to a class in any other school. This is how we constructed the `class2` variable in our data. With implicit nesting the code can be abbreviated to `(1|A) + (1|B)`. Note that the `(1|A) + (1|A:B)` and `(1|A/B)` notations remain valid.
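The data property that implicit nesting relies on is easy to verify in base R. A small sketch with made-up data, mirroring how `class2` was built with `interaction()`: every level of the unique id must occur in exactly one school.

```
# Hypothetical data: 2 schools with 3 classes each, class labels reused
d <- expand.grid(school = factor(1:2), class = factor(1:3))
d$class2 <- interaction(d$class, d$school, drop = TRUE)
# Implicit nesting holds: every class2 level belongs to exactly one school,
# i.e. each row of the contingency table has a single non-zero cell
all(rowSums(table(d$class2, d$school) > 0) == 1)  # TRUE
```

If this check fails for your id variable, `(1|A) + (1|B)` would silently fit a crossed model instead of a nested one.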

```
lmer(y ~ (1 | school) + (1 | class2), data = schools)
```

```
Linear mixed model fit by REML ['lmerMod']
Formula: y ~ (1 | school) + (1 | class2)
Data: schools
REML criterion at convergence: 1290.668
Random effects:
Groups Name Std.Dev.
class2 (Intercept) 1.631
school (Intercept) 1.806
Residual 1.073
Number of obs: 366, groups: class2, 74; school, 10
Fixed Effects:
(Intercept)
50.27
```

```
lmer(y ~ (1 | school) + (1 | school:class2), data = schools)
```

```
Linear mixed model fit by REML ['lmerMod']
Formula: y ~ (1 | school) + (1 | school:class2)
Data: schools
REML criterion at convergence: 1290.668
Random effects:
Groups Name Std.Dev.
school:class2 (Intercept) 1.631
school (Intercept) 1.806
Residual 1.073
Number of obs: 366, groups: school:class2, 74; school, 10
Fixed Effects:
(Intercept)
50.27
```

```
lmer(y ~ (1 | school / class2), data = schools)
```

```
Linear mixed model fit by REML ['lmerMod']
Formula: y ~ (1 | school/class2)
Data: schools
REML criterion at convergence: 1290.668
Random effects:
Groups Name Std.Dev.
class2:school (Intercept) 1.631
school (Intercept) 1.806
Residual 1.073
Number of obs: 366, groups: class2:school, 74; school, 10
Fixed Effects:
(Intercept)
50.27
```

## Crossed random effects

Crossed random effects appear when two (or more) variables can be used to create distinct groupings. Think about factories and products where a factory can produce a range of products, and a product can be manufactured in different factories. The contingency matrix of such a design is shown in the next figure.

```
n_factory <- 10
n_product <- 10
mean_n_sample <- 5
factories <- expand.grid(
  factory = seq_len(n_factory),
  product = seq_len(n_product)
) %>%
  mutate(
    samples = rpois(n(), mean_n_sample)
  ) %>%
  group_by(factory, product) %>%
  do(
    sample = tibble(sample = seq_len(.$samples))
  ) %>%
  unnest(sample) %>%
  mutate(
    sample2 = interaction(factory, product, sample, drop = TRUE)
  )
```

`factories` contains 3 design variables: `factory`, `product` and `sample`. Most of the `factory` and `product` combinations are present in the data and they are meaningful: product 1 in factory 1 is the same product as product 1 in factory 2 (Figure 2).

```
with(factories, table(product, factory)) %>%
  image(
    col = grey.colors(10, start = 1, end = 0), axes = FALSE, xlab = "Product",
    ylab = "Factory"
  )
```
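The contrast with the nested design can be checked directly on contingency tables. In this base-R sketch (made-up labels, not the simulated data), the crossed design fills the whole table, while a nested design has exactly one school per class:

```
# Crossed: every factory-product combination occurs
crossed <- expand.grid(factory = factor(1:3), product = factor(1:4))
all(table(crossed$factory, crossed$product) > 0)  # TRUE: full table

# Nested, for comparison: each unique class occurs in exactly one school
nested <- data.frame(school = factor(rep(1:2, each = 3)),
                     class2 = factor(1:6))
all(colSums(table(nested$school, nested$class2) > 0) == 1)  # TRUE
```

This is the tabular counterpart of the two contingency-matrix figures: block-diagonal for nesting, (mostly) full for crossing.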

```
factory_sd <- 3
product_sd <- 2
noise_sd <- 1
intercept <- 50
factory_effect <- rnorm(n_factory, mean = 0, sd = factory_sd)
product_effect <- rnorm(n_product, mean = 0, sd = product_sd)
factories <- factories %>%
  mutate(
    mu = intercept + factory_effect[factory] + product_effect[product],
    y = mu + rnorm(n(), mean = 0, sd = noise_sd)
  )
```

### Coding

The coding for crossed random effects is easy: `(1|A) + (1|B) + (1|C)`.

```
lmer(y ~ (1 | factory) + (1 | product), data = factories)
```

```
Linear mixed model fit by REML ['lmerMod']
Formula: y ~ (1 | factory) + (1 | product)
Data: factories
REML criterion at convergence: 1461.96
Random effects:
Groups Name Std.Dev.
factory (Intercept) 2.8210
product (Intercept) 1.6837
Residual 0.9957
Number of obs: 481, groups: factory, 10; product, 10
Fixed Effects:
(Intercept)
49.36
```

## Recommendations

- each level of a random effect should be defined by a single variable: e.g. `class2` and `student2` in `schools`; `factory`, `product` and `sample2` in `factories`
- use explicit nesting even when the data set would allow implicit nesting
- don’t use the shorthand nesting

## Session info

```
sessioninfo::session_info()
```

```
─ Session info ───────────────────────────────────────────────────────────────
setting value
version R version 4.3.1 (2023-06-16)
os Ubuntu 22.04.3 LTS
system x86_64, linux-gnu
ui X11
language nl_BE:nl
collate nl_BE.UTF-8
ctype nl_BE.UTF-8
tz Europe/Brussels
date 2023-09-02
pandoc 3.1.1 @ /usr/lib/rstudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
─ Packages ───────────────────────────────────────────────────────────────────
package * version date (UTC) lib source
boot 1.3-28.1 2022-11-22 [1] CRAN (R 4.3.0)
cli 3.6.1 2023-03-23 [1] CRAN (R 4.3.0)
colorspace 2.1-0 2023-01-23 [1] CRAN (R 4.3.0)
digest 0.6.32 2023-06-26 [1] CRAN (R 4.3.1)
dplyr * 1.1.2 2023-04-20 [1] CRAN (R 4.3.0)
evaluate 0.21 2023-05-05 [1] CRAN (R 4.3.0)
fansi 1.0.4 2023-01-22 [1] CRAN (R 4.3.0)
fastmap 1.1.1 2023-02-24 [1] CRAN (R 4.3.0)
forcats * 1.0.0 2023-01-29 [1] CRAN (R 4.3.0)
generics 0.1.3 2022-07-05 [1] CRAN (R 4.3.0)
ggplot2 * 3.4.2 2023-04-03 [1] CRAN (R 4.3.0)
glue 1.6.2 2022-02-24 [1] CRAN (R 4.3.0)
gtable 0.3.3 2023-03-21 [1] CRAN (R 4.3.0)
hms 1.1.3 2023-03-21 [1] CRAN (R 4.3.0)
htmltools 0.5.5 2023-03-23 [1] CRAN (R 4.3.0)
htmlwidgets 1.6.2 2023-03-17 [1] CRAN (R 4.3.0)
jsonlite 1.8.7 2023-06-29 [1] CRAN (R 4.3.1)
knitr 1.43 2023-05-25 [1] CRAN (R 4.3.0)
lattice 0.21-8 2023-04-05 [4] CRAN (R 4.3.0)
lifecycle 1.0.3 2022-10-07 [1] CRAN (R 4.3.0)
lme4 * 1.1-34 2023-07-04 [1] CRAN (R 4.3.1)
lubridate * 1.9.2.9000 2023-05-15 [1] https://inbo.r-universe.dev (R 4.3.0)
magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.3.0)
MASS 7.3-60 2023-05-04 [4] CRAN (R 4.3.1)
Matrix * 1.5-4.1 2023-05-18 [1] CRAN (R 4.3.0)
minqa 1.2.5 2022-10-19 [1] CRAN (R 4.3.0)
munsell 0.5.0 2018-06-12 [1] CRAN (R 4.3.0)
nlme 3.1-162 2023-01-31 [1] CRAN (R 4.3.0)
nloptr 2.0.3 2022-05-26 [1] CRAN (R 4.3.0)
pillar 1.9.0 2023-03-22 [1] CRAN (R 4.3.0)
pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.3.0)
purrr * 1.0.1 2023-01-10 [1] CRAN (R 4.3.0)
R6 2.5.1 2021-08-19 [1] CRAN (R 4.3.0)
Rcpp 1.0.10 2023-01-22 [1] CRAN (R 4.3.0)
readr * 2.1.4 2023-02-10 [1] CRAN (R 4.3.0)
rlang 1.1.1 2023-04-28 [1] CRAN (R 4.3.0)
rmarkdown 2.23 2023-07-01 [1] CRAN (R 4.3.1)
rstudioapi 0.14 2022-08-22 [1] CRAN (R 4.3.0)
scales 1.2.1 2022-08-20 [1] CRAN (R 4.3.0)
sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.3.0)
stringi 1.7.12 2023-01-11 [1] CRAN (R 4.3.0)
stringr * 1.5.0 2022-12-02 [1] CRAN (R 4.3.0)
tibble * 3.2.1 2023-03-20 [1] CRAN (R 4.3.0)
tidyr * 1.3.0 2023-01-24 [1] CRAN (R 4.3.0)
tidyselect 1.2.0 2022-10-10 [1] CRAN (R 4.3.0)
tidyverse * 2.0.0 2023-02-22 [1] CRAN (R 4.3.0)
timechange 0.2.0 2023-01-11 [1] CRAN (R 4.3.0)
tzdb 0.4.0 2023-05-12 [1] CRAN (R 4.3.0)
utf8 1.2.3 2023-01-31 [1] CRAN (R 4.3.0)
vctrs 0.6.3 2023-06-14 [1] CRAN (R 4.3.0)
withr 2.5.0 2022-03-03 [1] CRAN (R 4.3.0)
xfun 0.39 2023-04-20 [1] CRAN (R 4.3.0)
yaml 2.3.7 2023-01-23 [1] CRAN (R 4.3.0)
[1] /home/thierry/R/x86_64-pc-linux-gnu-library/4.3
[2] /usr/local/lib/R/site-library
[3] /usr/lib/R/site-library
[4] /usr/lib/R/library
──────────────────────────────────────────────────────────────────────────────
```

## References

Bates, Douglas, Martin Mächler, Ben Bolker, and Steve Walker. 2015. “Fitting Linear Mixed-Effects Models Using lme4.” *Journal of Statistical Software* 67 (1): 1–48. https://doi.org/10.18637/jss.v067.i01.