```
library(tidyverse)
set.seed(123)
n_school <- 10
mean_n_class <- 7
mean_n_student <- 5
n_class <- rpois(n_school, mean_n_class)
schools <- map2_df(
  seq_len(n_school), n_class,
  ~tibble(
    school = .x, class = seq_len(.y), students = rpois(.y, mean_n_student)
  )
) %>%
  group_by(school, class) %>%
  do(
    student = tibble(student = seq_len(.$students))
  ) %>%
  unnest(student) %>%
  mutate(
    class2 = interaction(class, school, drop = TRUE),
    student2 = interaction(class2, student, drop = TRUE)
  )
```

# Nested and crossed random effects in `lme4`

People often get confused about how to code nested and crossed random effects in the `lme4` package (Bates et al. 2015). I will try to clarify this using some artificial data sets.

## Nested random effects

Nested random effects assume that there is some kind of hierarchy in the grouping of the observations, e.g. schools and classes: a class groups a number of students and a school groups a number of classes. There is a one-to-many relationship between the random effects: a school can contain multiple classes, but a class can only be part of one school. Let's start by creating a simple example with fake data to explain the design. The figure shows the contingency matrix for the dataset.

`schools` contains 3 design variables: `school`, `class` and `student`. Each numbering restarts at 1 when the higher-level number changes (Figure 1). Hence the ids of class and student are not unique. Therefore I added two new variables, `class2` and `student2`, which are unique ids for each class and student. The next step is adding the expected and observed values.

```
with(schools, table(class2, school)) %>%
  image(
    col = grey.colors(10, start = 1, end = 0), axes = FALSE, xlab = "Class",
    ylab = "School"
  )
```

```
school_sd <- 2
class_sd <- 2
noise_sd <- 1
intercept <- 50
school_effect <- rnorm(n_school, mean = 0, sd = school_sd)
class_effect <- rnorm(length(levels(schools$class2)), mean = 0, sd = class_sd)
schools <- schools %>%
  mutate(
    mu = intercept + school_effect[school] + class_effect[class2],
    y = mu + rnorm(n(), mean = 0, sd = noise_sd)
  )
```

### Explicit nesting

The first option is to use explicit nesting. Here we add a random effect for each hierarchical level and use the `:` notation to add all higher levels. This can be expanded to more than two levels, e.g. `(1|A) + (1|A:B) + (1|A:B:C) + (1|A:B:C:D)`. The nice thing about this notation is twofold: a) the nesting is explicit and clear for all readers; b) it is insensitive to the order, e.g. `(1|A) + (1|A:B)` is identical to `(1|B:A) + (1|A)`.

```
library(lme4)
lmer(y ~ (1 | school) + (1 | school:class), data = schools)
```

```
Linear mixed model fit by REML ['lmerMod']
Formula: y ~ (1 | school) + (1 | school:class)
Data: schools
REML criterion at convergence: 1290.668
Random effects:
Groups Name Std.Dev.
school:class (Intercept) 1.631
school (Intercept) 1.806
Residual 1.073
Number of obs: 366, groups: school:class, 74; school, 10
Fixed Effects:
(Intercept)
50.27
```

```
lmer(y ~ (1 | class:school) + (1 | school), data = schools)
```

```
Linear mixed model fit by REML ['lmerMod']
Formula: y ~ (1 | class:school) + (1 | school)
Data: schools
REML criterion at convergence: 1290.668
Random effects:
Groups Name Std.Dev.
class:school (Intercept) 1.631
school (Intercept) 1.806
Residual 1.073
Number of obs: 366, groups: class:school, 74; school, 10
Fixed Effects:
(Intercept)
50.27
```
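The identical fits above follow from the fact that `:` on factors builds one group per observed combination, regardless of order. A tiny base-R sketch (the factors `A` and `B` below are made up for illustration, not part of the post's data) shows that the partition of observations is the same either way:

```
# Hypothetical factors for illustration only
A <- factor(rep(1:2, each = 4))
B <- factor(rep(1:2, times = 4))
# Interacting in either order yields the same partition of the data
g1 <- droplevels(A:B)
g2 <- droplevels(B:A)
nlevels(g1) == nlevels(g2)  # TRUE: same number of groups
# Every group of row indices under A:B is also a group under B:A
all(tapply(seq_along(g1), g1, toString) %in% tapply(seq_along(g2), g2, toString))
```

Only the level labels differ (`"1:2"` versus `"2:1"`), which is why `lmer` reports the same variance components under either spelling.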

### Shorthand nesting

`(1|A) + (1|A:B)` can be abbreviated to `(1|A/B)`. However, I recommend against it because here the order is important, as seen in the example below. `(1|B/A)` expands to `(1|B) + (1|B:A)`, which is clearly a different model than `(1|A) + (1|A:B)`. I've seen many people get confused about the order, so I recommend being explicit instead of using the shorthand.

```
lmer(y ~ (1 | school / class), data = schools)
```

```
Linear mixed model fit by REML ['lmerMod']
Formula: y ~ (1 | school/class)
Data: schools
REML criterion at convergence: 1290.668
Random effects:
Groups Name Std.Dev.
class:school (Intercept) 1.631
school (Intercept) 1.806
Residual 1.073
Number of obs: 366, groups: class:school, 74; school, 10
Fixed Effects:
(Intercept)
50.27
```

```
lmer(y ~ (1 | class / school), data = schools)
```

```
boundary (singular) fit: see help('isSingular')
```

```
Linear mixed model fit by REML ['lmerMod']
Formula: y ~ (1 | class/school)
Data: schools
REML criterion at convergence: 1320.039
Random effects:
Groups Name Std.Dev.
school:class (Intercept) 2.330
class (Intercept) 0.000
Residual 1.073
Number of obs: 366, groups: school:class, 74; class, 11
Fixed Effects:
(Intercept)
50.42
optimizer (nloptwrap) convergence code: 0 (OK) ; 0 optimizer warnings; 1 lme4 warnings
```
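The discrepancy above can be traced to the grouping factors each shorthand expands to. A quick base-R sketch (with a made-up balanced design, not the post's data): because the class labels are reused across schools, the un-interacted term in `(1|class/school)` pools unrelated classes together.

```
# Hypothetical design: 3 schools, classes labelled 1-2 within each school
d <- expand.grid(school = factor(1:3), class = factor(1:2))
# (1|school/class) = (1|school) + (1|school:class): 3 and 6 groups
# (1|class/school) = (1|class)  + (1|class:school): 2 and 6 groups
nlevels(d$school)                      # 3 school groups
nlevels(d$class)                       # only 2: class "1" pooled over schools
nlevels(droplevels(d$school:d$class))  # 6 unique school-class combinations
```

The interaction term is the same in both expansions; it is the marginal `(1|class)` term that makes `(1|class/school)` a different (and here meaningless) model.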

### Implicit nesting

With implicit nesting, the nesting is 'defined' in the data: each level of a random effect has a one-to-many relation with the levels of the lower random effect. E.g. each class id is unique for a given class in a given school and cannot refer to a class in any other school. This is how we constructed the `class2` variable in our data. With implicit nesting the code can be abbreviated to `(1|A) + (1|B)`. Note that the `(1|A) + (1|A:B)` and `(1|A/B)` notations remain valid.
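The data property that implicit nesting relies on is easy to verify in base R. A small sketch with made-up data, mirroring how `class2` was built with `interaction()`: every level of the unique id must occur in exactly one school.

```
# Hypothetical data: 2 schools with 3 classes each, class labels reused
d <- expand.grid(school = factor(1:2), class = factor(1:3))
d$class2 <- interaction(d$class, d$school, drop = TRUE)
# Implicit nesting holds: every class2 level belongs to exactly one school,
# i.e. each row of the contingency table has a single non-zero cell
all(rowSums(table(d$class2, d$school) > 0) == 1)  # TRUE
```

If this check fails for your id variable, `(1|A) + (1|B)` would silently fit a crossed model instead of a nested one.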

```
lmer(y ~ (1 | school) + (1 | class2), data = schools)
```

```
Linear mixed model fit by REML ['lmerMod']
Formula: y ~ (1 | school) + (1 | class2)
Data: schools
REML criterion at convergence: 1290.668
Random effects:
Groups Name Std.Dev.
class2 (Intercept) 1.631
school (Intercept) 1.806
Residual 1.073
Number of obs: 366, groups: class2, 74; school, 10
Fixed Effects:
(Intercept)
50.27
```

```
lmer(y ~ (1 | school) + (1 | school:class2), data = schools)
```

```
Linear mixed model fit by REML ['lmerMod']
Formula: y ~ (1 | school) + (1 | school:class2)
Data: schools
REML criterion at convergence: 1290.668
Random effects:
Groups Name Std.Dev.
school:class2 (Intercept) 1.631
school (Intercept) 1.806
Residual 1.073
Number of obs: 366, groups: school:class2, 74; school, 10
Fixed Effects:
(Intercept)
50.27
```

```
lmer(y ~ (1 | school / class2), data = schools)
```

```
Linear mixed model fit by REML ['lmerMod']
Formula: y ~ (1 | school/class2)
Data: schools
REML criterion at convergence: 1290.668
Random effects:
Groups Name Std.Dev.
class2:school (Intercept) 1.631
school (Intercept) 1.806
Residual 1.073
Number of obs: 366, groups: class2:school, 74; school, 10
Fixed Effects:
(Intercept)
50.27
```

## Crossed random effects

Crossed random effects appear when two (or more) variables can be used to create distinct groupings. Think about factories and products where a factory can produce a range of products, and a product can be manufactured in different factories. The contingency matrix of such a design is shown in the next figure.

```
n_factory <- 10
n_product <- 10
mean_n_sample <- 5
factories <- expand.grid(
  factory = seq_len(n_factory),
  product = seq_len(n_product)
) %>%
  mutate(
    samples = rpois(n(), mean_n_sample)
  ) %>%
  group_by(factory, product) %>%
  do(
    sample = tibble(sample = seq_len(.$samples))
  ) %>%
  unnest(sample) %>%
  mutate(
    sample2 = interaction(factory, product, sample, drop = TRUE)
  )
```

`factories` contains 3 design variables: `factory`, `product` and `sample`. Most of the `factory` and `product` combinations are present in the data and they are meaningful: product 1 in factory 1 is the same product as product 1 in factory 2 (Figure 2).

```
with(factories, table(product, factory)) %>%
  image(
    col = grey.colors(10, start = 1, end = 0), axes = FALSE, xlab = "Product",
    ylab = "Factory"
  )
```
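The contrast with the nested design can be checked directly on contingency tables. In this base-R sketch (made-up labels, not the simulated data), the crossed design fills the whole table, while a nested design has exactly one school per class:

```
# Crossed: every factory-product combination occurs
crossed <- expand.grid(factory = factor(1:3), product = factor(1:4))
all(table(crossed$factory, crossed$product) > 0)  # TRUE: full table

# Nested, for comparison: each unique class occurs in exactly one school
nested <- data.frame(school = factor(rep(1:2, each = 3)),
                     class2 = factor(1:6))
all(colSums(table(nested$school, nested$class2) > 0) == 1)  # TRUE
```

This is the tabular counterpart of the two contingency-matrix figures: block-diagonal for nesting, (mostly) full for crossing.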

```
factory_sd <- 3
product_sd <- 2
noise_sd <- 1
intercept <- 50
factory_effect <- rnorm(n_factory, mean = 0, sd = factory_sd)
product_effect <- rnorm(n_product, mean = 0, sd = product_sd)
factories <- factories %>%
  mutate(
    mu = intercept + factory_effect[factory] + product_effect[product],
    y = mu + rnorm(n(), mean = 0, sd = noise_sd)
  )
```

### Coding

The coding for crossed random effects is easy: `(1|A) + (1|B) + (1|C)`.

```
lmer(y ~ (1 | factory) + (1 | product), data = factories)
```

```
Linear mixed model fit by REML ['lmerMod']
Formula: y ~ (1 | factory) + (1 | product)
Data: factories
REML criterion at convergence: 1461.96
Random effects:
Groups Name Std.Dev.
factory (Intercept) 2.8210
product (Intercept) 1.6837
Residual 0.9957
Number of obs: 481, groups: factory, 10; product, 10
Fixed Effects:
(Intercept)
49.36
```

## Recommendations

- each level of a random effect should be defined by a single variable: e.g. `class2` and `student2` in `schools`; `factory`, `product` and `sample2` in `factories`
- use explicit nesting even when the data set would allow implicit nesting
- don’t use the shorthand nesting

## Session info

```
sessioninfo::session_info()
```

```
─ Session info ───────────────────────────────────────────────────────────────
setting value
version R version 4.3.1 (2023-06-16)
os Ubuntu 22.04.3 LTS
system x86_64, linux-gnu
ui X11
language nl_BE:nl
collate nl_BE.UTF-8
ctype nl_BE.UTF-8
tz Europe/Brussels
date 2023-09-02
pandoc 3.1.1 @ /usr/lib/rstudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
─ Packages ───────────────────────────────────────────────────────────────────
package * version date (UTC) lib source
boot 1.3-28.1 2022-11-22 [1] CRAN (R 4.3.0)
cli 3.6.1 2023-03-23 [1] CRAN (R 4.3.0)
colorspace 2.1-0 2023-01-23 [1] CRAN (R 4.3.0)
digest 0.6.32 2023-06-26 [1] CRAN (R 4.3.1)
dplyr * 1.1.2 2023-04-20 [1] CRAN (R 4.3.0)
evaluate 0.21 2023-05-05 [1] CRAN (R 4.3.0)
fansi 1.0.4 2023-01-22 [1] CRAN (R 4.3.0)
fastmap 1.1.1 2023-02-24 [1] CRAN (R 4.3.0)
forcats * 1.0.0 2023-01-29 [1] CRAN (R 4.3.0)
generics 0.1.3 2022-07-05 [1] CRAN (R 4.3.0)
ggplot2 * 3.4.2 2023-04-03 [1] CRAN (R 4.3.0)
glue 1.6.2 2022-02-24 [1] CRAN (R 4.3.0)
gtable 0.3.3 2023-03-21 [1] CRAN (R 4.3.0)
hms 1.1.3 2023-03-21 [1] CRAN (R 4.3.0)
htmltools 0.5.5 2023-03-23 [1] CRAN (R 4.3.0)
htmlwidgets 1.6.2 2023-03-17 [1] CRAN (R 4.3.0)
jsonlite 1.8.7 2023-06-29 [1] CRAN (R 4.3.1)
knitr 1.43 2023-05-25 [1] CRAN (R 4.3.0)
lattice 0.21-8 2023-04-05 [4] CRAN (R 4.3.0)
lifecycle 1.0.3 2022-10-07 [1] CRAN (R 4.3.0)
lme4 * 1.1-34 2023-07-04 [1] CRAN (R 4.3.1)
lubridate * 1.9.2.9000 2023-05-15 [1] https://inbo.r-universe.dev (R 4.3.0)
magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.3.0)
MASS 7.3-60 2023-05-04 [4] CRAN (R 4.3.1)
Matrix * 1.5-4.1 2023-05-18 [1] CRAN (R 4.3.0)
minqa 1.2.5 2022-10-19 [1] CRAN (R 4.3.0)
munsell 0.5.0 2018-06-12 [1] CRAN (R 4.3.0)
nlme 3.1-162 2023-01-31 [1] CRAN (R 4.3.0)
nloptr 2.0.3 2022-05-26 [1] CRAN (R 4.3.0)
pillar 1.9.0 2023-03-22 [1] CRAN (R 4.3.0)
pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.3.0)
purrr * 1.0.1 2023-01-10 [1] CRAN (R 4.3.0)
R6 2.5.1 2021-08-19 [1] CRAN (R 4.3.0)
Rcpp 1.0.10 2023-01-22 [1] CRAN (R 4.3.0)
readr * 2.1.4 2023-02-10 [1] CRAN (R 4.3.0)
rlang 1.1.1 2023-04-28 [1] CRAN (R 4.3.0)
rmarkdown 2.23 2023-07-01 [1] CRAN (R 4.3.1)
rstudioapi 0.14 2022-08-22 [1] CRAN (R 4.3.0)
scales 1.2.1 2022-08-20 [1] CRAN (R 4.3.0)
sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.3.0)
stringi 1.7.12 2023-01-11 [1] CRAN (R 4.3.0)
stringr * 1.5.0 2022-12-02 [1] CRAN (R 4.3.0)
tibble * 3.2.1 2023-03-20 [1] CRAN (R 4.3.0)
tidyr * 1.3.0 2023-01-24 [1] CRAN (R 4.3.0)
tidyselect 1.2.0 2022-10-10 [1] CRAN (R 4.3.0)
tidyverse * 2.0.0 2023-02-22 [1] CRAN (R 4.3.0)
timechange 0.2.0 2023-01-11 [1] CRAN (R 4.3.0)
tzdb 0.4.0 2023-05-12 [1] CRAN (R 4.3.0)
utf8 1.2.3 2023-01-31 [1] CRAN (R 4.3.0)
vctrs 0.6.3 2023-06-14 [1] CRAN (R 4.3.0)
withr 2.5.0 2022-03-03 [1] CRAN (R 4.3.0)
xfun 0.39 2023-04-20 [1] CRAN (R 4.3.0)
yaml 2.3.7 2023-01-23 [1] CRAN (R 4.3.0)
[1] /home/thierry/R/x86_64-pc-linux-gnu-library/4.3
[2] /usr/local/lib/R/site-library
[3] /usr/lib/R/site-library
[4] /usr/lib/R/library
──────────────────────────────────────────────────────────────────────────────
```

## References

Bates, Douglas, Martin Mächler, Ben Bolker, and Steve Walker. 2015. “Fitting Linear Mixed-Effects Models Using lme4.” *Journal of Statistical Software* 67 (1): 1–48. https://doi.org/10.18637/jss.v067.i01.