Required number of levels for a random effects

Author

Thierry Onkelinx

Published

September 3, 2018

When we analyse a mixed models, the question often arises whether a covariate should be used as a random effect or as a fixed effects. Let’s assume a simple design. Two types of fertilizer are tested on a number of fields (ns). Each field is split in two and the fertilizers are assigned at random to these halves while making sure that each field has both treatments. From a conceptual point of view, we are only interested in the effect of the fertilizers. However, we need to take the potential effect of the field into account. Hence we use fertiliser as a fixed effect and field as a random effect.

Example design
field Part 1 Part2
1 A B
2 A B
3 A B
4 B A
5 B A
6 B A
7 B A
8 A B
9 A B
10 B A

While this makes conceptually sense, we might run into computational problems. Instead of estimating each individual field effect, we want to estimate the variability due to field effect. This is the variance of the random effect, rather than the estimates for the individual random effect levels. But how precise is the estimate of this random effect variance? The sample variance s2 follows a scaled χ2 distribution.

(ns1)s2σ2χns12

This equation makes it straightforward to calculate the distribution of the r=σ2/s2 ratio for a given ns. We can rewrite this as σ2=rs2 or the true variance is a multiple r of the estimated variance s2. The distribution of r is derived below.

(ns1)s2σ2χns12

s2σ2χns12(ns1)

r=σ2s2(ns1)χns12

Even when with a large number of levels (ns=1000), there still is a reasonable amount of uncertainty around σ2 since the 95% interval of r ranges from 0.918 to 1.094. The full distribution for r when ns=1000 is shown in the figure below.

Figure 1: Distribution of the relative true variance (σ2s2) when ns=1000. Dashed line: σ2s2=1 Dotted lines: 2.5% and 97.5% quantiles of σ2s2. Dash-dotted line: median of σ2s2.

A thousand random effect levels is not always feasible. So what will happen if we use a more realistic number of random effect levels, e.g. ns=20. The distribution of r will be wider as ns decreases. At ns=20, the 95% interval of r will range from 0.578 to 2.133.

Figure 2: Distribution of the relative true variance (σ2s2) when ns=20. Dashed line: σ2s2=1 Dotted lines: 2.5% and 97.5% quantiles of σ2s2. Dash-dotted line: median of σ2s2.

How low can we go? The figure below depicts the density of r when ns=4. The 95% interval of r ranges from 0.321 to 13.902, so we can hardly do any inference on the estimated variance.

Figure 3: Distribution of the relative true variance (σ2s2) when ns=4. Dashed line: σ2s2=1 Dotted lines: 2.5% and 97.5% quantiles of σ2s2. Dash-dotted line: median of σ2s2.

Recommendations

We see that the number of random effect levels has a strong impact on the uncertainty of the estimate variance. The figure below displays the 97.5% quantile of r for a large range of ns values. Since the right tail of r is heavier than the left tail, it is a good idea to focus on the 97.5% quantile. Based on this figure we can give the following recommendations:

  • get ns>1000 levels when an accurate estimate of the random effect variance is crucial. E.g. when a single number will be use for power calculations.
  • get ns>100 levels when a reasonable estimate of the random effect variance is sufficient. E.g. power calculations with sensitivity analysis of the random effect variance.
  • get ns>20 levels for an experimental study
  • in case 10<ns<20 you should validate the model very cautious before using the output
  • in case ns<10 it is safer to use the variable as a fixed effect.

Figure 4: Uncertainty on the random effect variance in function of the number of random effect levels.

Session info

These R packages were used to create this post.

─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.3.1 (2023-06-16)
 os       Ubuntu 22.04.3 LTS
 system   x86_64, linux-gnu
 ui       X11
 language nl_BE:nl
 collate  nl_BE.UTF-8
 ctype    nl_BE.UTF-8
 tz       Europe/Brussels
 date     2023-09-02
 pandoc   3.1.1 @ /usr/lib/rstudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)

─ Packages ───────────────────────────────────────────────────────────────────
 package     * version    date (UTC) lib source
 cli           3.6.1      2023-03-23 [1] CRAN (R 4.3.0)
 colorspace    2.1-0      2023-01-23 [1] CRAN (R 4.3.0)
 digest        0.6.32     2023-06-26 [1] CRAN (R 4.3.1)
 dplyr       * 1.1.2      2023-04-20 [1] CRAN (R 4.3.0)
 evaluate      0.21       2023-05-05 [1] CRAN (R 4.3.0)
 fansi         1.0.4      2023-01-22 [1] CRAN (R 4.3.0)
 farver        2.1.1      2022-07-06 [1] CRAN (R 4.3.0)
 fastmap       1.1.1      2023-02-24 [1] CRAN (R 4.3.0)
 forcats     * 1.0.0      2023-01-29 [1] CRAN (R 4.3.0)
 generics      0.1.3      2022-07-05 [1] CRAN (R 4.3.0)
 ggplot2     * 3.4.2      2023-04-03 [1] CRAN (R 4.3.0)
 glue          1.6.2      2022-02-24 [1] CRAN (R 4.3.0)
 gtable        0.3.3      2023-03-21 [1] CRAN (R 4.3.0)
 hms           1.1.3      2023-03-21 [1] CRAN (R 4.3.0)
 htmltools     0.5.5      2023-03-23 [1] CRAN (R 4.3.0)
 htmlwidgets   1.6.2      2023-03-17 [1] CRAN (R 4.3.0)
 jsonlite      1.8.7      2023-06-29 [1] CRAN (R 4.3.1)
 knitr       * 1.43       2023-05-25 [1] CRAN (R 4.3.0)
 labeling      0.4.2      2020-10-20 [1] CRAN (R 4.3.0)
 lifecycle     1.0.3      2022-10-07 [1] CRAN (R 4.3.0)
 lubridate   * 1.9.2.9000 2023-05-15 [1] https://inbo.r-universe.dev (R 4.3.0)
 magrittr      2.0.3      2022-03-30 [1] CRAN (R 4.3.0)
 munsell       0.5.0      2018-06-12 [1] CRAN (R 4.3.0)
 pillar        1.9.0      2023-03-22 [1] CRAN (R 4.3.0)
 pkgconfig     2.0.3      2019-09-22 [1] CRAN (R 4.3.0)
 purrr       * 1.0.1      2023-01-10 [1] CRAN (R 4.3.0)
 R6            2.5.1      2021-08-19 [1] CRAN (R 4.3.0)
 readr       * 2.1.4      2023-02-10 [1] CRAN (R 4.3.0)
 rlang         1.1.1      2023-04-28 [1] CRAN (R 4.3.0)
 rmarkdown     2.23       2023-07-01 [1] CRAN (R 4.3.1)
 rstudioapi    0.14       2022-08-22 [1] CRAN (R 4.3.0)
 scales        1.2.1      2022-08-20 [1] CRAN (R 4.3.0)
 sessioninfo   1.2.2      2021-12-06 [1] CRAN (R 4.3.0)
 stringi       1.7.12     2023-01-11 [1] CRAN (R 4.3.0)
 stringr     * 1.5.0      2022-12-02 [1] CRAN (R 4.3.0)
 tibble      * 3.2.1      2023-03-20 [1] CRAN (R 4.3.0)
 tidyr       * 1.3.0      2023-01-24 [1] CRAN (R 4.3.0)
 tidyselect    1.2.0      2022-10-10 [1] CRAN (R 4.3.0)
 tidyverse   * 2.0.0      2023-02-22 [1] CRAN (R 4.3.0)
 timechange    0.2.0      2023-01-11 [1] CRAN (R 4.3.0)
 tzdb          0.4.0      2023-05-12 [1] CRAN (R 4.3.0)
 utf8          1.2.3      2023-01-31 [1] CRAN (R 4.3.0)
 vctrs         0.6.3      2023-06-14 [1] CRAN (R 4.3.0)
 withr         2.5.0      2022-03-03 [1] CRAN (R 4.3.0)
 xfun          0.39       2023-04-20 [1] CRAN (R 4.3.0)
 yaml          2.3.7      2023-01-23 [1] CRAN (R 4.3.0)

 [1] /home/thierry/R/x86_64-pc-linux-gnu-library/4.3
 [2] /usr/local/lib/R/site-library
 [3] /usr/lib/R/site-library
 [4] /usr/lib/R/library

──────────────────────────────────────────────────────────────────────────────