Nested and crossed random effects in `lme4`

Author

Thierry Onkelinx

Published

July 18, 2017

People often get confused on how to code nested and crossed random effects in the lme4 package (Bates et al. 2015). I will try to make this more clear using some artificial data sets.

Nested random effects

Nested random effects assume that there is some kind of hierarchy in the grouping of the observations. E.g. schools and classes. A class groups a number of students and a school groups a number of classes. There is a one-to-many relationship between the random effects. E.g. a school can contain multiple classes but a class can only be part of one school. Lets start by creating a simple example with fake data to explain the design. The figure shows the contingency matrix for the dataset.

library(tidyverse)
set.seed(123)
n_school <- 10
mean_n_class <- 7
mean_n_student <- 5

n_class <- rpois(n_school, mean_n_class)
schools <- map2_df(
  seq_len(n_school), n_class,
  ~tibble(
    school = .x, class = seq_len(.y), students = rpois(.y, mean_n_student)
  )
) %>%
  group_by(school, class) %>%
  do(
    student = tibble(student = seq_len(.$students))
  ) %>%
  unnest(student) %>%
  mutate(
    class2 = interaction(class, school, drop = TRUE),
    student2 = interaction(class2, student, drop = TRUE)
  )

schools contains 3 design variables: school, class and student. Each numbering restarts at 1 when the higher level number changes (Figure 1). Hence the id of class and student are not unique. Therefore I added two new variables class2 and student2 which are unique id’s for each class and student. The next step is adding the expected and observed values.

with(schools, table(class2, school)) %>%
  image(
    col = grey.colors(10, start = 1, end = 0),  axes = FALSE,  xlab = "Class",
    ylab = "School"
  )

Figure 1: Contingency matrix for the `schools`data set

school_sd <- 2
class_sd <- 2
noise_sd <- 1
intercept <- 50

school_effect <- rnorm(n_school, mean = 0, sd = school_sd)
class_effect <- rnorm(length(levels(schools$class2)), mean = 0, sd = class_sd)
schools <- schools %>%
  mutate(
    mu = intercept + school_effect[school] + class_effect[class2],
    y = mu + rnorm(n(), mean = 0, sd = noise_sd)
  )

Explicit nesting

The first option is to use explicit nesting. Here we add a random effect for each hierarchical level and use the : notation to add all higher levels. This can be expanded to more than two levels. E.g. (1|A) + (1|A:B) + (1|A:B:C) + (1|A:B:C:D). The nice thing about this notation is twofold: a) the nesting is explicit and clear for all readers; b) it is insensitive for the order: e.g. (1|A) + (1|A:B) is identical to (1|B:A) + (1|A).

library(lme4)
lmer(y ~ (1 | school) + (1 | school:class), data = schools)

Linear mixed model fit by REML ['lmerMod']
Formula: y ~ (1 | school) + (1 | school:class)
   Data: schools
REML criterion at convergence: 1290.668
Random effects:
 Groups       Name        Std.Dev.
 school:class (Intercept) 1.631   
 school       (Intercept) 1.806   
 Residual                 1.073   
Number of obs: 366, groups:  school:class, 74; school, 10
Fixed Effects:
(Intercept)  
      50.27

lmer(y ~ (1 | class:school) + (1 | school), data = schools)

Linear mixed model fit by REML ['lmerMod']
Formula: y ~ (1 | class:school) + (1 | school)
   Data: schools
REML criterion at convergence: 1290.668
Random effects:
 Groups       Name        Std.Dev.
 class:school (Intercept) 1.631   
 school       (Intercept) 1.806   
 Residual                 1.073   
Number of obs: 366, groups:  class:school, 74; school, 10
Fixed Effects:
(Intercept)  
      50.27

Shorthand nesting

(1|A) + (1|A:B) can be abbreviated into (1|A/B). However, I recommend against it because here the order is important as seen in the example below. (1|B/A) expands to (1|B) + (1|B:A), which is clearly a different model than (1|A) + (1|A:B). I’ve seen many people being confused about the order, therefore I recommend to be explicit instead of using shorthand.

lmer(y ~ (1 | school / class), data = schools)

Linear mixed model fit by REML ['lmerMod']
Formula: y ~ (1 | school/class)
   Data: schools
REML criterion at convergence: 1290.668
Random effects:
 Groups       Name        Std.Dev.
 class:school (Intercept) 1.631   
 school       (Intercept) 1.806   
 Residual                 1.073   
Number of obs: 366, groups:  class:school, 74; school, 10
Fixed Effects:
(Intercept)  
      50.27

lmer(y ~ (1 | class / school), data = schools)

boundary (singular) fit: see help('isSingular')

Linear mixed model fit by REML ['lmerMod']
Formula: y ~ (1 | class/school)
   Data: schools
REML criterion at convergence: 1320.039
Random effects:
 Groups       Name        Std.Dev.
 school:class (Intercept) 2.330   
 class        (Intercept) 0.000   
 Residual                 1.073   
Number of obs: 366, groups:  school:class, 74; class, 11
Fixed Effects:
(Intercept)  
      50.42  
optimizer (nloptwrap) convergence code: 0 (OK) ; 0 optimizer warnings; 1 lme4 warnings

Implicit nesting

With implicit nesting, the nesting is ‘defined’ in the data. That is each level of a random effect has a one-to-many relation with the levels of the lower random effect. E.g. each class id is unique for a given class in a given school and cannot refer to a class in any other school. This is how we constructed the class2 variable in our data. With implicit nesting the code can be abbreviated to (1|A) + (1|B). Note that the (1|A) + (1|A:B) and (1|A/B) notations remain valid.

lmer(y ~ (1 | school) + (1 | class2), data = schools)

Linear mixed model fit by REML ['lmerMod']
Formula: y ~ (1 | school) + (1 | class2)
   Data: schools
REML criterion at convergence: 1290.668
Random effects:
 Groups   Name        Std.Dev.
 class2   (Intercept) 1.631   
 school   (Intercept) 1.806   
 Residual             1.073   
Number of obs: 366, groups:  class2, 74; school, 10
Fixed Effects:
(Intercept)  
      50.27

lmer(y ~ (1 | school) + (1 | school:class2), data = schools)

Linear mixed model fit by REML ['lmerMod']
Formula: y ~ (1 | school) + (1 | school:class2)
   Data: schools
REML criterion at convergence: 1290.668
Random effects:
 Groups        Name        Std.Dev.
 school:class2 (Intercept) 1.631   
 school        (Intercept) 1.806   
 Residual                  1.073   
Number of obs: 366, groups:  school:class2, 74; school, 10
Fixed Effects:
(Intercept)  
      50.27

lmer(y ~ (1 | school / class2), data = schools)

Linear mixed model fit by REML ['lmerMod']
Formula: y ~ (1 | school/class2)
   Data: schools
REML criterion at convergence: 1290.668
Random effects:
 Groups        Name        Std.Dev.
 class2:school (Intercept) 1.631   
 school        (Intercept) 1.806   
 Residual                  1.073   
Number of obs: 366, groups:  class2:school, 74; school, 10
Fixed Effects:
(Intercept)  
      50.27

Crossed random effects

Crossed random effects appear when two (or more) variables can be used to create distinct groupings. Think about factories and products where a factory can produce a range of products, and a product can be manufactured in different factories. The contingency matrix of such a design is shown in the next figure.

n_factory <- 10
n_product <- 10
mean_n_sample <- 5

factories <- expand.grid(
  factory = seq_len(n_factory),
  product = seq_len(n_product)
) %>%
  mutate(
    samples = rpois(n(), mean_n_sample)
  ) %>%
  group_by(factory, product) %>%
  do(
    sample = tibble(sample = seq_len(.$samples))
  ) %>%
  unnest(sample) %>%
  mutate(
    sample2 = interaction(factory, product, sample, drop = TRUE)
  )

factories contains 3 design variables: factory, product and sample. Most of the factory and product combinations are present in the data and they are meaningful. Product 1 in factory 1 is the same product as product 1 in factory 2 (Figure 2).

with(factories, table(product, factory)) %>%
  image(
    col = grey.colors(10, start = 1, end = 0),  axes = FALSE, xlab = "Product",
    ylab = "Factory"
  )

Figure 2: Contingency matrix for the `factories`data set

factory_sd <- 3
product_sd <- 2
noise_sd <- 1
intercept <- 50

factory_effect <- rnorm(n_factory, mean = 0, sd = factory_sd)
product_effect <- rnorm(n_product, mean = 0, sd = product_sd)
factories <- factories %>%
  mutate(
    mu = intercept + factory_effect[factory] + product_effect[product],
    y = mu + rnorm(n(), mean = 0, sd = noise_sd)
  )

Coding

The coding for crossed random effects is easy: (1|A) + (1|B) + (1|C).

lmer(y ~ (1 | factory) + (1 | product), data = factories)

Linear mixed model fit by REML ['lmerMod']
Formula: y ~ (1 | factory) + (1 | product)
   Data: factories
REML criterion at convergence: 1461.96
Random effects:
 Groups   Name        Std.Dev.
 factory  (Intercept) 2.8210  
 product  (Intercept) 1.6837  
 Residual             0.9957  
Number of obs: 481, groups:  factory, 10; product, 10
Fixed Effects:
(Intercept)  
      49.36

Recommendations

each level of a random effect should be defined by a single variable: e.g. class2 and student2 in schools; factory, product and sample2 in factories
use explicit nesting even when the data set would allow implicit nesting
don’t use the shorthand nesting

Session info

sessioninfo::session_info()

─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.3.1 (2023-06-16)
 os       Ubuntu 22.04.3 LTS
 system   x86_64, linux-gnu
 ui       X11
 language nl_BE:nl
 collate  nl_BE.UTF-8
 ctype    nl_BE.UTF-8
 tz       Europe/Brussels
 date     2023-09-02
 pandoc   3.1.1 @ /usr/lib/rstudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)

─ Packages ───────────────────────────────────────────────────────────────────
 package     * version    date (UTC) lib source
 boot          1.3-28.1   2022-11-22 [1] CRAN (R 4.3.0)
 cli           3.6.1      2023-03-23 [1] CRAN (R 4.3.0)
 colorspace    2.1-0      2023-01-23 [1] CRAN (R 4.3.0)
 digest        0.6.32     2023-06-26 [1] CRAN (R 4.3.1)
 dplyr       * 1.1.2      2023-04-20 [1] CRAN (R 4.3.0)
 evaluate      0.21       2023-05-05 [1] CRAN (R 4.3.0)
 fansi         1.0.4      2023-01-22 [1] CRAN (R 4.3.0)
 fastmap       1.1.1      2023-02-24 [1] CRAN (R 4.3.0)
 forcats     * 1.0.0      2023-01-29 [1] CRAN (R 4.3.0)
 generics      0.1.3      2022-07-05 [1] CRAN (R 4.3.0)
 ggplot2     * 3.4.2      2023-04-03 [1] CRAN (R 4.3.0)
 glue          1.6.2      2022-02-24 [1] CRAN (R 4.3.0)
 gtable        0.3.3      2023-03-21 [1] CRAN (R 4.3.0)
 hms           1.1.3      2023-03-21 [1] CRAN (R 4.3.0)
 htmltools     0.5.5      2023-03-23 [1] CRAN (R 4.3.0)
 htmlwidgets   1.6.2      2023-03-17 [1] CRAN (R 4.3.0)
 jsonlite      1.8.7      2023-06-29 [1] CRAN (R 4.3.1)
 knitr         1.43       2023-05-25 [1] CRAN (R 4.3.0)
 lattice       0.21-8     2023-04-05 [4] CRAN (R 4.3.0)
 lifecycle     1.0.3      2022-10-07 [1] CRAN (R 4.3.0)
 lme4        * 1.1-34     2023-07-04 [1] CRAN (R 4.3.1)
 lubridate   * 1.9.2.9000 2023-05-15 [1] https://inbo.r-universe.dev (R 4.3.0)
 magrittr      2.0.3      2022-03-30 [1] CRAN (R 4.3.0)
 MASS          7.3-60     2023-05-04 [4] CRAN (R 4.3.1)
 Matrix      * 1.5-4.1    2023-05-18 [1] CRAN (R 4.3.0)
 minqa         1.2.5      2022-10-19 [1] CRAN (R 4.3.0)
 munsell       0.5.0      2018-06-12 [1] CRAN (R 4.3.0)
 nlme          3.1-162    2023-01-31 [1] CRAN (R 4.3.0)
 nloptr        2.0.3      2022-05-26 [1] CRAN (R 4.3.0)
 pillar        1.9.0      2023-03-22 [1] CRAN (R 4.3.0)
 pkgconfig     2.0.3      2019-09-22 [1] CRAN (R 4.3.0)
 purrr       * 1.0.1      2023-01-10 [1] CRAN (R 4.3.0)
 R6            2.5.1      2021-08-19 [1] CRAN (R 4.3.0)
 Rcpp          1.0.10     2023-01-22 [1] CRAN (R 4.3.0)
 readr       * 2.1.4      2023-02-10 [1] CRAN (R 4.3.0)
 rlang         1.1.1      2023-04-28 [1] CRAN (R 4.3.0)
 rmarkdown     2.23       2023-07-01 [1] CRAN (R 4.3.1)
 rstudioapi    0.14       2022-08-22 [1] CRAN (R 4.3.0)
 scales        1.2.1      2022-08-20 [1] CRAN (R 4.3.0)
 sessioninfo   1.2.2      2021-12-06 [1] CRAN (R 4.3.0)
 stringi       1.7.12     2023-01-11 [1] CRAN (R 4.3.0)
 stringr     * 1.5.0      2022-12-02 [1] CRAN (R 4.3.0)
 tibble      * 3.2.1      2023-03-20 [1] CRAN (R 4.3.0)
 tidyr       * 1.3.0      2023-01-24 [1] CRAN (R 4.3.0)
 tidyselect    1.2.0      2022-10-10 [1] CRAN (R 4.3.0)
 tidyverse   * 2.0.0      2023-02-22 [1] CRAN (R 4.3.0)
 timechange    0.2.0      2023-01-11 [1] CRAN (R 4.3.0)
 tzdb          0.4.0      2023-05-12 [1] CRAN (R 4.3.0)
 utf8          1.2.3      2023-01-31 [1] CRAN (R 4.3.0)
 vctrs         0.6.3      2023-06-14 [1] CRAN (R 4.3.0)
 withr         2.5.0      2022-03-03 [1] CRAN (R 4.3.0)
 xfun          0.39       2023-04-20 [1] CRAN (R 4.3.0)
 yaml          2.3.7      2023-01-23 [1] CRAN (R 4.3.0)

 [1] /home/thierry/R/x86_64-pc-linux-gnu-library/4.3
 [2] /usr/local/lib/R/site-library
 [3] /usr/lib/R/site-library
 [4] /usr/lib/R/library

──────────────────────────────────────────────────────────────────────────────

References

Bates, Douglas, Martin Mächler, Ben Bolker, and Steve Walker. 2015. “Fitting Linear Mixed-Effects Models Using lme4.” Journal of Statistical Software 67 (1): 1–48. https://doi.org/10.18637/jss.v067.i01.