Temporal autocorrelation in INLA

statistics
mixed-models
Author

Thierry Onkelinx

Published

March 5, 2018

One of the reason why I often use INLA is because it allows for correlated random effects. In this blog post, I will handle random effect with temporal autocorrelation. INLA has several options for this. There are two major types of model, the first handles discrete time step, the latter continuous time steps.

Dummy data set

This blog post was inspired by a post on the R-Sig-Mixed models mailing list. Therefore I created some example data relevant for that post. The set-up were 34 participants received five random prompts per day for six weeks, asking them whether they were craving a particular drug.

The chunk below sets the design. Timestamp assumes that the participant were only prompted between 8:00 and 22:00. Hour are the timestamps rounded to the hour.

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.2          ✔ readr     2.1.4     
✔ forcats   1.0.0          ✔ stringr   1.5.0     
✔ ggplot2   3.4.2          ✔ tibble    3.2.1     
✔ lubridate 1.9.2.9000     ✔ tidyr     1.3.0     
✔ purrr     1.0.1          
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
set.seed(20180305)
n_id <- 34
n_day <- 7 * 6
n_prompt <- 5
expand.grid(
  ID = seq_len(n_id),
  Day = seq_len(n_day),
  Prompt = seq_len(n_prompt)
) %>%
  mutate(
    Timestamp = runif(n(), min = 8, max = 21.99999),
    Timestamp2 = round(Timestamp, 2),
    Hour = floor(Timestamp)
  ) -> example

Next we need to create a true model. Let’s assume a first order random walking along Timestamp and a random intercept per participant (Figure 1). A real life model would be more complicated. Choosing a simpler model keeps the computation time down.

data.frame(
  Timestamp2 = seq(min(example$Timestamp2), max(example$Timestamp2), by = 0.01)
) %>%
  mutate(fixed = cumsum(rnorm(n(), sd = 0.05))) -> effect
re <- rnorm(n_id, sd = 0.5)
example %>%
  inner_join(effect, by = "Timestamp2") %>%
  mutate(
    random = re[ID],
    eta = fixed + random,
    mu = plogis(eta + rnorm(n(), sd = 0.1)),
    Craving = rbinom(n(), size = 1, prob = mu),
    DayTime = Day + Timestamp / 24
  ) -> example
library(plotly)
p <- ggplot(example, aes(x = DayTime, y = plogis(fixed))) +
  geom_line() +
  xlab("Time (in days)") +
  ylab("Probability of craving")
ggplotly(p)
Figure 1: Overal true trend in carving for the ‘average’ participant.

Discrete time step

Some temporal models

  • ar1
    • the current value is \(\rho\) times the previous value + a noise term \(\varepsilon_{ij}\)
    • \(x_i = \rho x_{i-1} + \varepsilon_{ij}\)
    • restrictions
      • \(|\rho|<1\)
      • \(\varepsilon_{ij} \sim \mathcal{N}(0, \tau^{-1})\)
  • rw1
    • the current value is the previous value plus an increment \(\Delta x_i\)
    • \(x_i = x_{i - 1} + \Delta x_i\)
    • restrictions
      • \(\Delta x_i \sim \mathcal{N}(0, \tau^{-1})\)
      • \(\sum \Delta x_i = 0\)
  • rw2
    • while rw1 models the difference between two time points, rw2 models the difference between the two consecutive differences.
    • think of rw1 as the first derivative and rw2 as the second derivative
    • \(\Delta ^2 x_i = (x_{i + 1} - x_i) - (x_i - x_{i - 1})\)
    • \(x_i = \frac{x_{i + 1} + x_{i - 1}}{2} - \frac{\Delta ^2 x_i}{2}\)
    • So the current value is the average of the next and the previous value minus half of \(\Delta ^2 x_i\)
    • restrictions
      • \(\Delta^2 x_i \sim \mathcal{N}(0, \tau^{-1})\)
      • \(\sum \Delta x_i = 0\)

Fitting the models

We are using the defaults priors, except for the rw1 model where we use the currently recommended prior, which is not the default prior. We fitted the ar1 model once without sum-to-zero constraint (default) and once with sum-to-zero constraint.

library(INLA)
model_ar1 <- inla(
  Craving ~ f(Hour, model = "ar1") + f(ID, model = "iid"),
  family = "binomial", data = example, control.compute = list(waic = TRUE)
)
model_ar1c <- inla(
  Craving ~ f(Hour, model = "ar1", constr = TRUE) + f(ID, model = "iid"),
  family = "binomial", data = example, control.compute = list(waic = TRUE)
)
pc_prec <- list(theta = list(prior = "pc.prec", param = c(0.5, 0.01)))
model_rw1 <- inla(
  Craving ~ f(Hour, model = "rw1", hyper = pc_prec) + f(ID, model = "iid"),
  family = "binomial", data = example, control.compute = list(waic = TRUE)
)
model_rw2 <- inla(
  Craving ~ f(Hour, model = "rw2") + f(ID, model = "iid"),
  family = "binomial", data = example, control.compute = list(waic = TRUE)
)

Comparing the models

The fitted values for the models are very similar (Figure 2). The main difference is that the rw2 model is much smoother than the other models. The fitted values for the ar1 and rw1 model are very similar.

Figure 3 show the effect of the random effect of Hour under the different models. Overall, the pattern are very similar to those of the fitted values. This is expected since, besides the temporal pattern of Hour, the models only contains random intercepts. Note that uses the original scale, whereas Figure 3 uses the logit scale.

The default ar1 model has a much larger uncertainty than rw1 for both the random effect of Hour and the fixed intercept and yet the uncertainty for the fitted values is nearly identical for both models. This is because the default ar1 has no sum-to-zero constraint which makes the intercept unidentifiable when there is only one replica of the ar1 model. When the sum-to-zero constraint it also added to the ar1 model, then the results are very similar to those from the rw1 model.

p <- bind_rows(
  model_ar1$summary.fitted.values %>%
    select(fitted = 1, lcl = 3, ucl = 5) %>%
    mutate(Model = "ar1") %>%
    bind_cols(example),
  model_ar1c$summary.fitted.values %>%
    select(fitted = 1, lcl = 3, ucl = 5) %>%
    mutate(Model = "ar1c") %>%
    bind_cols(example),
  model_rw1$summary.fitted.values %>%
    select(fitted = 1, lcl = 3, ucl = 5) %>%
    mutate(Model = "rw1") %>%
    bind_cols(example),
  model_rw2$summary.fitted.values %>%
    select(fitted = 1, lcl = 3, ucl = 5) %>%
    mutate(Model = "rw2") %>%
    bind_cols(example)
) %>%
  filter(ID == 1) %>%
  ggplot(aes(x = Hour, y = fitted, ymin = lcl, ymax = ucl)) +
  geom_ribbon(alpha = 0.1, aes(fill = Model)) +
  geom_line(aes(y = lcl, colour = Model), linetype = 3) +
  geom_line(aes(y = ucl, colour = Model), linetype = 3) +
  geom_line(aes(colour = Model))
ggplotly(p)
Figure 2: Comparsion of the fitted values for ID == 1 under the different models
p <- bind_rows(
  model_ar1$summary.random$Hour %>%
    select(Hour = 1, mean = 2, lcl = 4, ucl = 6) %>%
    mutate(Model = "ar1"),
  model_ar1c$summary.random$Hour %>%
    select(Hour = 1, mean = 2, lcl = 4, ucl = 6) %>%
    mutate(Model = "ar1c"),
  model_rw1$summary.random$Hour %>%
    select(Hour = 1, mean = 2, lcl = 4, ucl = 6) %>%
    mutate(Model = "rw1"),
  model_rw2$summary.random$Hour %>%
    select(Hour = 1, mean = 2, lcl = 4, ucl = 6) %>%
    mutate(Model = "rw2")
) %>%
  ggplot(aes(x = Hour, y = mean, ymin = lcl, ymax = ucl)) +
  geom_ribbon(alpha = 0.1, aes(fill = Model)) +
  geom_line(aes(colour = Model))
ggplotly(p)
Figure 3: Comparsion of the random effects of Hour under the different models
bind_rows(
  model_ar1$summary.fixed %>%
    select(mean = 1, lcl = 3, ucl = 5) %>%
    mutate(model = "ar1"),
  model_ar1c$summary.fixed %>%
    select(mean = 1, lcl = 3, ucl = 5) %>%
    mutate(model = "ar1c"),
  model_rw1$summary.fixed %>%
    select(mean = 1, lcl = 3, ucl = 5) %>%
    mutate(model = "rw1"),
  model_rw2$summary.fixed %>%
    select(mean = 1, lcl = 3, ucl = 5) %>%
    mutate(model = "rw2")
) %>%
  ggplot(aes(x = model, y = mean, ymin = lcl, ymax = ucl, colour = model)) +
  geom_errorbar() +
  geom_point() +
  coord_flip()

Figure 4: Comparison of the fixed effects of the different models

The \(WAIC\) values of the three models are also very similar.

c(
  ar1 = model_ar1$waic$waic, ar1c = model_ar1c$waic$waic,
  rw1 = model_rw1$waic$waic, rw2 = model_rw2$waic$waic
) |>
  sort()
     rw1      ar1     ar1c      rw2 
7828.555 7829.420 7829.680 7831.265 

Continuous time step

Some temporal models

  • crw2
    • the equivalent of rw2 for irregular locations
    • might required some grouping when the distance between some locations is small
    • no equation provided in the online helpfile
    • restrictions
      • \(\sum \Delta x_i = 0\)
  • ou
    • Stands for the Ornstein-Uhlenbeck process
    • continuous time analogue to the discrete time ar1
    • \(dx_t = -\phi x_t + \sigma dW_t\)
  • rw1
    • binning the continuous time stamps in a large number of equally spaces bins allows to apply rw1 (also ar1 and rw2)

Fitting the models

The Timestamp was split in 100 groups for the crw2 model. We fitted the ou model both with and without sum-to-zero constraint. The time stamp was rounded to 0.01 hours for the rw1 model.

model_crw2 <- inla(
  Craving ~ f(inla.group(Timestamp, n = 100), model = "crw2") +
    f(ID, model = "iid"),
  family = "binomial", data = example, control.compute = list(waic = TRUE)
)
model_ou <- inla(
  Craving ~ f(Timestamp, model = "ou") + f(ID, model = "iid"),
  family = "binomial", data = example, control.compute = list(waic = TRUE)
)
model_ouc <- inla(
  Craving ~ f(Timestamp, model = "ou", constr = TRUE) + f(ID, model = "iid"),
  family = "binomial", data = example, control.compute = list(waic = TRUE)
)
model_rw1t <- inla(
  Craving ~ f(Timestamp2, model = "rw1", hyper = pc_prec) +
    f(ID, model = "iid"),
  family = "binomial", data = example, control.compute = list(waic = TRUE)
)

Comparing the models

The fitted values in Figure 5 are more fine grained than those in Figure 2, simply because we use more fine grained time information. The crw2 model generates a rather smooth pattern. The rw1 model produces similar patterns that the ou model, although the ou seems to be a bit more extreme than the rw1 model. The credible interval of ou, ouc and rw1 are very similar.

When we look at the random effects (Figure 6) and the fixed effects (Figure 7), we can draw similar conclusions as from Figure 3 and Figure 7: the ou without sum-to-zero constraint has wider credible intervals. Adding the sum-to-zero constrains to ou shrinks the credible intervals, although they remain wide than those for rw1.

p <- bind_rows(
  model_ou$summary.fitted.values %>%
    select(fitted = 1, lcl = 3, ucl = 5) %>%
    mutate(Model = "ou") %>%
    bind_cols(example),
  model_ouc$summary.fitted.values %>%
    select(fitted = 1, lcl = 3, ucl = 5) %>%
    mutate(Model = "ouc") %>%
    bind_cols(example),
  model_rw1t$summary.fitted.values %>%
    select(fitted = 1, lcl = 3, ucl = 5) %>%
    mutate(Model = "rw1t") %>%
    bind_cols(example),
  model_crw2$summary.fitted.values %>%
    select(fitted = 1, lcl = 3, ucl = 5) %>%
    mutate(Model = "crw2") %>%
    bind_cols(example)
) %>%
  filter(ID == 1) %>%
  ggplot(aes(x = Timestamp, y = fitted, ymin = lcl, ymax = ucl)) +
  geom_ribbon(alpha = 0.1, aes(fill = Model)) +
  geom_line(aes(y = lcl, colour = Model), linetype = 3) +
  geom_line(aes(y = ucl, colour = Model), linetype = 3) +
  geom_line(aes(colour = Model))
ggplotly(p)
Figure 5: Comparsion of the fitted values for ID == 1 under the different continuous time step models
p <- bind_rows(
  model_ou$summary.random$Timestamp %>%
    select(Timestamp = 1, mean = 2, lcl = 4, ucl = 6) %>%
    mutate(Model = "ou"),
  model_ouc$summary.random$Timestamp %>%
    select(Timestamp = 1, mean = 2, lcl = 4, ucl = 6) %>%
    mutate(Model = "ouc"),
  model_rw1t$summary.random$Timestamp2 %>%
    select(Timestamp = 1, mean = 2, lcl = 4, ucl = 6) %>%
    mutate(Model = "rw1t"),
  model_crw2$summary.random$`inla.group(Timestamp, n = 100)` %>%
    select(Timestamp = 1, mean = 2, lcl = 4, ucl = 6) %>%
    filter(Timestamp > 0) %>%
    mutate(Model = "crw2")
) %>%
  ggplot(aes(x = Timestamp, y = mean, ymin = lcl, ymax = ucl)) +
  geom_ribbon(alpha = 0.1, aes(fill = Model)) +
  geom_line(aes(colour = Model))
ggplotly(p)
Figure 6: Comparsion of the random effects of Timestamp under the different continuous time step models
bind_rows(
  model_ou$summary.fixed %>%
    select(mean = 1, lcl = 3, ucl = 5) %>%
    mutate(model = "ou"),
  model_ouc$summary.fixed %>%
    select(mean = 1, lcl = 3, ucl = 5) %>%
    mutate(model = "ouc"),
  model_rw1t$summary.fixed %>%
    select(mean = 1, lcl = 3, ucl = 5) %>%
    mutate(model = "rw1t"),
  model_crw2$summary.fixed %>%
    select(mean = 1, lcl = 3, ucl = 5) %>%
    mutate(model = "crw2")
) %>%
  ggplot(aes(x = model, y = mean, ymin = lcl, ymax = ucl, colour = model)) +
  geom_errorbar() +
  geom_point() +
  coord_flip()

Figure 7: Comparison of the fixed effects of the different continuous time step models

The ou model and the rw1 model on the fine grained time stamps have the lowest \(WAIC\). Note that the data was generated using an rw1 model at this fine grained level.

sort(c(
  ar1 = model_ar1$waic$waic, rw1 = model_rw1$waic$waic,
  rw2 = model_rw2$waic$waic, crw2 = model_crw2$waic$waic,
  ou = model_ou$waic$waic, ouc = model_ouc$waic$waic,
  rw1t = model_rw1t$waic$waic))
      ou      ouc     rw1t     crw2      rw1      ar1      rw2 
7813.569 7813.570 7813.898 7826.407 7828.555 7829.420 7831.265 

Each fitted inla object contains information on the time that was used to fit the model. It is interesting to see that with this example the ar1 model is much faster than the rw1 model with similar detail (on Hour). Likewise is the ou model faster than the rw1 model (both on Timestamp). Applying the sum-to-zero constraint on ar1 and ou requires some extra time.

c(
  ar1 = model_ar1$cpu.used[4], ar1c = model_ar1c$cpu.used[4],
  rw1 = model_rw1$cpu.used[4], rw2 = model_rw2$cpu.used[4],
  rw1t = model_rw1t$cpu.used[4], crw2 = model_crw2$cpu.used[4],
  ou = model_ou$cpu.used[4], ouc = model_ouc$cpu.used[4]
) |>
  sort()
 rw1.Total  rw2.Total ar1c.Total crw2.Total  ar1.Total rw1t.Total  ouc.Total 
  2.580671   2.785755   2.875859   3.194484   3.194496   3.283555   5.224722 
  ou.Total 
  6.720574 

Session info

sessioninfo::session_info()
─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.3.1 (2023-06-16)
 os       Ubuntu 22.04.3 LTS
 system   x86_64, linux-gnu
 ui       X11
 language nl_BE:nl
 collate  nl_BE.UTF-8
 ctype    nl_BE.UTF-8
 tz       Europe/Brussels
 date     2023-08-30
 pandoc   3.1.1 @ /usr/lib/rstudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)

─ Packages ───────────────────────────────────────────────────────────────────
 package      * version    date (UTC) lib source
 class          7.3-22     2023-05-03 [4] CRAN (R 4.3.1)
 classInt       0.4-9      2023-02-28 [1] CRAN (R 4.3.1)
 cli            3.6.1      2023-03-23 [1] CRAN (R 4.3.0)
 codetools      0.2-19     2023-02-01 [1] CRAN (R 4.3.0)
 colorspace     2.1-0      2023-01-23 [1] CRAN (R 4.3.0)
 crosstalk      1.2.0      2021-11-04 [1] CRAN (R 4.3.0)
 data.table     1.14.8     2023-02-17 [1] CRAN (R 4.3.0)
 DBI            1.1.3      2022-06-18 [1] CRAN (R 4.3.0)
 digest         0.6.32     2023-06-26 [1] CRAN (R 4.3.1)
 dplyr        * 1.1.2      2023-04-20 [1] CRAN (R 4.3.0)
 e1071          1.7-13     2023-02-01 [1] CRAN (R 4.3.1)
 ellipsis       0.3.2      2021-04-29 [1] CRAN (R 4.3.0)
 evaluate       0.21       2023-05-05 [1] CRAN (R 4.3.0)
 fansi          1.0.4      2023-01-22 [1] CRAN (R 4.3.0)
 farver         2.1.1      2022-07-06 [1] CRAN (R 4.3.0)
 fastmap        1.1.1      2023-02-24 [1] CRAN (R 4.3.0)
 forcats      * 1.0.0      2023-01-29 [1] CRAN (R 4.3.0)
 generics       0.1.3      2022-07-05 [1] CRAN (R 4.3.0)
 ggplot2      * 3.4.2      2023-04-03 [1] CRAN (R 4.3.0)
 glue           1.6.2      2022-02-24 [1] CRAN (R 4.3.0)
 gtable         0.3.3      2023-03-21 [1] CRAN (R 4.3.0)
 hms            1.1.3      2023-03-21 [1] CRAN (R 4.3.0)
 htmltools      0.5.5      2023-03-23 [1] CRAN (R 4.3.0)
 htmlwidgets    1.6.2      2023-03-17 [1] CRAN (R 4.3.0)
 httr           1.4.6      2023-05-08 [1] CRAN (R 4.3.0)
 INLA         * 23.06.29   2023-06-30 [1] local
 inlabru        2.8.0      2023-06-20 [1] CRAN (R 4.3.1)
 jsonlite       1.8.7      2023-06-29 [1] CRAN (R 4.3.1)
 KernSmooth     2.23-21    2023-05-03 [1] CRAN (R 4.3.0)
 knitr          1.43       2023-05-25 [1] CRAN (R 4.3.0)
 labeling       0.4.2      2020-10-20 [1] CRAN (R 4.3.0)
 lattice        0.21-8     2023-04-05 [4] CRAN (R 4.3.0)
 lazyeval       0.2.2      2019-03-15 [1] CRAN (R 4.3.0)
 lifecycle      1.0.3      2022-10-07 [1] CRAN (R 4.3.0)
 lubridate    * 1.9.2.9000 2023-05-15 [1] https://inbo.r-universe.dev (R 4.3.0)
 magrittr       2.0.3      2022-03-30 [1] CRAN (R 4.3.0)
 Matrix       * 1.5-4.1    2023-05-18 [1] CRAN (R 4.3.0)
 MatrixModels   0.5-1      2022-09-11 [1] CRAN (R 4.3.0)
 munsell        0.5.0      2018-06-12 [1] CRAN (R 4.3.0)
 pillar         1.9.0      2023-03-22 [1] CRAN (R 4.3.0)
 pkgconfig      2.0.3      2019-09-22 [1] CRAN (R 4.3.0)
 plotly       * 4.10.2     2023-06-03 [1] CRAN (R 4.3.0)
 proxy          0.4-27     2022-06-09 [1] CRAN (R 4.3.1)
 purrr        * 1.0.1      2023-01-10 [1] CRAN (R 4.3.0)
 R6             2.5.1      2021-08-19 [1] CRAN (R 4.3.0)
 Rcpp           1.0.10     2023-01-22 [1] CRAN (R 4.3.0)
 readr        * 2.1.4      2023-02-10 [1] CRAN (R 4.3.0)
 rlang          1.1.1      2023-04-28 [1] CRAN (R 4.3.0)
 rmarkdown      2.23       2023-07-01 [1] CRAN (R 4.3.1)
 rstudioapi     0.14       2022-08-22 [1] CRAN (R 4.3.0)
 scales         1.2.1      2022-08-20 [1] CRAN (R 4.3.0)
 sessioninfo    1.2.2      2021-12-06 [1] CRAN (R 4.3.0)
 sf             1.0-13     2023-05-24 [1] CRAN (R 4.3.0)
 sp           * 2.0-0      2023-06-22 [1] CRAN (R 4.3.1)
 stringi        1.7.12     2023-01-11 [1] CRAN (R 4.3.0)
 stringr      * 1.5.0      2022-12-02 [1] CRAN (R 4.3.0)
 tibble       * 3.2.1      2023-03-20 [1] CRAN (R 4.3.0)
 tidyr        * 1.3.0      2023-01-24 [1] CRAN (R 4.3.0)
 tidyselect     1.2.0      2022-10-10 [1] CRAN (R 4.3.0)
 tidyverse    * 2.0.0      2023-02-22 [1] CRAN (R 4.3.0)
 timechange     0.2.0      2023-01-11 [1] CRAN (R 4.3.0)
 tzdb           0.4.0      2023-05-12 [1] CRAN (R 4.3.0)
 units          0.8-2      2023-04-27 [1] CRAN (R 4.3.0)
 utf8           1.2.3      2023-01-31 [1] CRAN (R 4.3.0)
 vctrs          0.6.3      2023-06-14 [1] CRAN (R 4.3.0)
 viridisLite    0.4.2      2023-05-02 [1] CRAN (R 4.3.0)
 withr          2.5.0      2022-03-03 [1] CRAN (R 4.3.0)
 xfun           0.39       2023-04-20 [1] CRAN (R 4.3.0)
 yaml           2.3.7      2023-01-23 [1] CRAN (R 4.3.0)

 [1] /home/thierry/R/x86_64-pc-linux-gnu-library/4.3
 [2] /usr/local/lib/R/site-library
 [3] /usr/lib/R/site-library
 [4] /usr/lib/R/library

──────────────────────────────────────────────────────────────────────────────