Lecture 2

Some practical tips on inlabru

Sara Martino

Dept. of Mathematical Science, NTNU

Janine Illian

University of Glasgow

Jafet Belmont

University of Glasgow

Fitting an LGM with inlabru

The general workflow:

# Define model components
comps <- component_1(...) +
  component_2(...) + 
  ...

# Define the model predictor
pred <- linear_function(component_1,
                            component_2, ...)

# Build the observation model
lik <- bru_obs(formula = pred,
               family = ... ,
               data = ... ,
                ...)

# Fit the model
fit <- bru(comps, lik, ...)


Basic component features


The basic syntax for components is:

~ my_component_name(
  main = ...,
  model = ...
)
  • my_component_name: a user-chosen label for the model component. This label is later used as input in predict() and generate().
  • main: here we define the input data for the component.
  • model: the type of model component; see ?component and INLA::inla.list.models("latent").

Examples

  • beta_0(1) or Intercept(1) defines an intercept
  • cov_effect(altitude, model = "linear") defines a linear effect of altitude on the predictor.
  • time_effect(time, model = "rw2") defines a smooth effect of time on the predictor.
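
Taken together, components are summed into a single specification. A minimal sketch combining the example components above (altitude and time are assumed to be columns in the data later passed to bru_obs()):

```r
# Sketch: the example components combined into one component formula.
# Nothing is evaluated here; bru() resolves the names against the data later.
cmp <- ~ -1 + Intercept(1) +
  cov_effect(altitude, model = "linear") +
  time_effect(time, model = "rw2")
```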

Shortcuts

Some inlabru shortcuts

In the slides we have fitted the model as:

cmp = ~ -1 + beta0(1) + beta1(covariate, model = "linear")
formula = y ~ beta0 + beta1
lik = bru_obs(formula = formula,
              family = "gaussian",
              data = df)
fit = bru(cmp, lik)

but in the practical you have

cmp = ~ -1 + beta0(1) + beta1(covariate, model = "linear")

lik = bru_obs(formula = y ~ .,
              family = "gaussian",
              data = df)
fit = bru(cmp, lik)

  1. You don’t have to define the formula outside of the bru_obs() function; it can also be defined inside the function!
lik = bru_obs(formula = y ~ beta0 + beta1,
              family = "gaussian",
              data = df)
  2. The expression y ~ . just means “take all the components and sum them together”. It also tells the bru() function that your predictor is linear.

Why do we need the -1 in the formula?

  • inlabru provides a dedicated term, Intercept(), for specifying an intercept that is shared across all data.

  • When this term is omitted, a common intercept is added by default, unless the formula or component explicitly includes -1.

One wrong and two correct ways of coding

# This is WRONG: we get one intercept too many :-)
cmp = ~ beta0(1) + beta1(covariate, model = "linear") 
lik = bru_obs(formula = y ~ . ,
              data = df)
m1 = bru(cmp, lik)
round(m1$summary.fixed[,c(1,2)],3)
           mean     sd
beta0     0.445 22.361
beta1     0.736  0.326
Intercept 0.445 22.361
# This is correct..
cmp = ~ Intercept(1) + beta1(covariate, model = "linear") 
lik = bru_obs(y~ .,
              data = df)
m1 = bru(cmp, lik)
round(m1$summary.fixed[,c(1,2)],3)
           mean    sd
Intercept 0.890 0.174
beta1     0.736 0.326
# This is also correct
cmp = ~  -1 + beta0(1) + beta1(covariate, model = "linear")
lik = bru_obs(y~. ,
              data = df)
m1 = bru(cmp, lik)
round(m1$summary.fixed[,c(1,2)],3)
       mean    sd
beta0 0.890 0.174
beta1 0.736 0.326

Categorical variables and interactions

Effects of categorical variables and Interactions

inlabru has a dedicated component type for this, called "fixed".

Let’s look at the mtcars dataset that is in R

                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

Effects of categorical variables in inlabru

We want to fit a model where gear is the only covariate.

  • gear has three categories: 3, 4 and 5
  • We want to model it as a factor effect
mtcars$gear = factor(mtcars$gear)
m1 = lm(mpg~ gear, data = mtcars)
m1$coef
(Intercept)       gear4       gear5 
  16.106667    8.426667    5.273333 

How do we do this in inlabru?

Effects of categorical variables in inlabru

Option 1: Use the fixed model

# create the model matrix

cmp1 = ~ -1 + gear0(~ gear, model = "fixed")
lik1 = bru_obs(formula = mpg ~.,
               data = mtcars)
fit1 = bru(cmp1, lik1)
fit1$summary.random$gear0[,c(1:3)]
           ID      mean       sd
1 (Intercept) 16.103112 1.205027
2       gear4  8.414875 1.807342
3       gear5  5.253896 2.407237

NOTE

  • gear0(~ gear, model = "fixed") uses an lm-style formula definition, so an intercept is automatically included.
  • The results are stored in the summary.random part of the fitted model, not in summary.fixed.

Effects of categorical variables in inlabru

  • Option 2: Fixed effects are just random effects with fixed precision 😄
cmp2 = ~ -1 + gear_effect(gear, model = "iid", fixed = TRUE, initial = -6)
# or: cmp2 = ~ Intercept(1) + gear_effect(gear_id, model = "iid", fixed = TRUE, initial = -6, constr = TRUE)
lik2 = bru_obs(formula = mpg ~.,
               data = mtcars)
fit2 = bru(cmp2, lik2)
fit2$summary.random$gear_effect$mean
[1] 16.04855 24.42278 21.15033

What is happening here?

This is just a re-parametrization: the iid component estimates one coefficient per gear level, whereas lm() uses gear 3 as the reference category.

c(m1$coef[1], m1$coef[1] + m1$coef[2], m1$coef[1] + m1$coef[3])
# 16.10667 24.53333 21.38000 -- compare with fit2$summary.random$gear_effect$mean

Effects of categorical variables in inlabru

Alternative (and equivalent) coding options

cmp1 = ~ -1 + gear(~ gear, model = "fixed")      # One reference category
cmp2 = ~ -1 + gear(~ gear - 1, model = "fixed")  # No common intercept
cmp3 = ~ -1 + gear(gear, model = "iid", fixed = TRUE, initial = -6)  # No common intercept

NOTE

When using the fixed model, inlabru internally creates a model matrix using the function MatrixModels::model.Matrix(). The syntax

cmp1 = ~ -1 + gear(main = ~ gear, 
                   model = "fixed")     #One reference category

is equivalent to

cmp1 = ~ -1 + gear(main = ~ MatrixModels::model.Matrix(gear, .data.),
                   model = "fixed")     #One reference category

Interactions of linear covariates in inlabru

Here the easiest option is to use again the fixed model type.

Example 1

Say we want to fit a model with interactions between (categorical) gear and (binary) vs.

In lm we can do

m2 = lm(mpg ~ gear * vs, data = mtcars)
m2$coefficients
(Intercept)       gear4       gear5          vs    gear4:vs    gear5:vs 
  15.050000    5.950000    4.075000    5.283333   -1.043333    5.991667 

Similarly in inlabru

cmp = ~ -1 + fixed_effects( ~ (gear * vs), model = "fixed")
lik = bru_obs(formula = mpg ~ .,
              data = mtcars)
fit = bru(cmp, lik)
fit$summary.random$fixed_effects$mean
[1] 15.0434624  5.8986288  4.0890388  5.2876289 -0.9880638  5.8809905
fit$summary.random$fixed_effects$sd
[1] 1.179235 3.100019 2.353196 2.618198 4.074736 5.205994

Interactions of linear covariates in inlabru

Example 2

Another example, this time an interaction between (categorical) gear and (continuous) disp.

Using lm:

m2 = lm(mpg ~ gear * disp, data = mtcars)
m2$coef
(Intercept)       gear4       gear5        disp  gear4:disp  gear5:disp 
24.51556635 15.05196338  7.14538044 -0.02577046 -0.09644222 -0.02500467 

Using inlabru and the fixed model

cmp = ~ -1 + fixed_effects(~gear * disp, model = "fixed")
lik = bru_obs(formula = mpg ~ .,
              data = mtcars)
fit = bru(cmp, lik)
fit$summary.random$fixed_effects$mean
[1] 24.50106644 14.96913649  7.11471930 -0.02572927 -0.09575870 -0.02486888

Interactions of linear covariates in inlabru

We can also use the idea that “fixed effects are just random effects with fixed precision” 😄

NOTE: This idea can be very useful in more complex contexts where the fixed model does not work (for example, if one wants a spatially varying effect of a covariate!)

Let’s look at the interaction between (categorical) gear and (continuous) disp.

\[ \begin{aligned} \eta_i & = \beta_0 + \beta_{0,\text{GEAR}_i} + \beta_1x_i + \beta_{1,\text{GEAR}_i} x_i\\ & = \beta^*_{0,\text{GEAR}_i} +\beta^*_{1,\text{GEAR}_i} x_i \end{aligned} \]


\[ \begin{aligned} \eta_i & = \beta_0 + \beta_{0,\text{GEAR}_i} + \beta_1x_i + \beta_{1,\text{GEAR}_i} x_i\\ & = \underbrace{\beta^*_{0,\text{GEAR}_i}}_{\text{iid}} +\underbrace{\beta^*_{1,\text{GEAR}_i}}_{\text{iid}}\ \underbrace{x_i}_{\text{weights}} \end{aligned} \]

Interactions of linear covariates in inlabru

The model for the linear predictor is:

cmp = ~ -1 + beta0(gear, model = "iid", initial = -6, fixed = TRUE) +
  beta1(gear, disp, model = "iid", initial = -6, fixed = TRUE)  # disp enters as weights
lik = bru_obs(formula = mpg ~ .,
              data = mtcars)
fit = bru(cmp, lik)

fit$summary.random$beta0$mean
[1] 24.15664 38.93821 31.16917
fit$summary.random$beta1$mean
[1] -0.02475087 -0.11752667 -0.04884794

Compare with the lm results

m2$coefficients
(Intercept)       gear4       gear5        disp  gear4:disp  gear5:disp 
24.51556635 15.05196338  7.14538044 -0.02577046 -0.09644222 -0.02500467 

The difference is just in the different parametrization 😄
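
The correspondence can be checked by hand: summing the lm() treatment contrasts reproduces the per-gear coefficients. A sketch using the m2$coef values printed above (the results agree closely with the posterior means above; small differences come from priors and numerical approximation):

```r
# Convert lm()'s reference-category coefficients (m2$coef above) into the
# per-gear parametrization estimated by the iid components.
b <- c(24.51556635, 15.05196338, 7.14538044,
       -0.02577046, -0.09644222, -0.02500467)
beta0_star <- c(b[1], b[1] + b[2], b[1] + b[3])  # intercepts for gear 3, 4, 5
beta1_star <- c(b[4], b[4] + b[5], b[4] + b[6])  # slopes for gear 3, 4, 5
round(beta0_star, 2)  # close to fit$summary.random$beta0$mean
round(beta1_star, 4)  # close to fit$summary.random$beta1$mean
```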

Getting results from the fitted model

The result object

The fitted model is stored in an inlabru object

library(car)
data("Salaries")
# it is a good idea to scale the response
Salaries$salary  = Salaries$salary/1000
Salaries[1:3,]
      rank discipline yrs.since.phd yrs.service  sex salary
1     Prof          B            19          18 Male 139.75
2     Prof          B            20          16 Male 173.20
3 AsstProf          B             4           3 Male  79.75
cmp = ~ Intercept(1) + sex(sex, model = "iid", initial = -6, fixed = TRUE) +
  time(yrs.since.phd, model = "linear")
lik = bru_obs(formula = salary ~ .,
              data = Salaries)
result = bru(cmp, lik)
summary(result)
inlabru version: 2.13.0.9033 
INLA version: 26.02.06 
Latent components:
Intercept: main = linear(1)
sex: main = iid(sex)
time: main = linear(yrs.since.phd)
Observation models:
  Model tag: <No tag>
    Family: 'gaussian'
    Data class: 'data.frame'
    Response class: 'numeric'
    Predictor: salary ~ Intercept + sex + time
    Additive/Linear/Rowwise: TRUE/TRUE/TRUE
    Used components: effect[Intercept, sex, time], latent[] 
Time used:
    Pre = 0.492, Running = 0.19, Post = 0.014, Total = 0.696 
Fixed effects:
            mean     sd 0.025quant 0.5quant 0.975quant   mode kld
Intercept 73.634 13.214     47.722   73.634     99.546 73.634   0
time       0.976  0.108      0.764    0.976      1.188  0.976   0

Random effects:
  Name    Model
    sex IID model

Model hyperparameters:
                                         mean   sd 0.025quant 0.5quant
Precision for the Gaussian observations 0.001 0.00      0.001    0.001
                                        0.975quant  mode
Precision for the Gaussian observations      0.002 0.001

Marginal log-Likelihood:  -1909.39 
 is computed 
Posterior summaries for the linear predictor and the fitted values are computed
(Posterior marginals needs also 'control.compute=list(return.marginals.predictor=TRUE)')

Getting results I

Some results are very easy to get as they are contained in the result object:

  • Summaries for fixed effects
result$summary.fixed
                mean         sd 0.025quant   0.5quant 0.975quant       mode
Intercept 73.6338668 13.2142639 47.7218377 73.6339188  99.545600 73.6339190
time       0.9758411  0.1079614  0.7641549  0.9757971   1.187779  0.9757972
                   kld
Intercept 5.468309e-11
time      3.966806e-10
  • Summaries for random effects
result$summary.random$sex
      ID     mean       sd 0.025quant 0.5quant 0.975quant     mode          kld
1 Female 10.74085 13.21356 -15.169724 10.74088   36.65129 10.74088 5.470145e-11
2   Male 18.96517 13.12601  -6.773644 18.96517   44.70400 18.96517 5.501394e-11
  • Summaries for hyperparameters
result$summary.hyperpar
                                               mean          sd  0.025quant
Precision for the Gaussian observations 0.001332104 9.44603e-05 0.001152964
                                           0.5quant  0.975quant        mode
Precision for the Gaussian observations 0.001329884 0.001523951 0.001325018

Getting results II

  • Marginals for fixed effects
result$marginals.fixed$Intercept[1:3,]
            x            y
[1,] 17.31393 3.445719e-06
[2,] 24.47190 2.993035e-05
[3,] 32.77900 2.546511e-04
  • Marginals for random effects
result$marginals.random$sex$index.1[1:3,]
             x            y
[1,] -45.57561 3.445950e-06
[2,] -38.41813 2.993244e-05
[3,] -30.11160 2.546681e-04
  • Marginals for hyperparameters
result$marginals.hyperpar$`Precision for the Gaussian observations`[1:3,]
               x         y
[1,] 0.001003775  4.078690
[2,] 0.001018635  8.164013
[3,] 0.001059874 43.850525

predict() and generate() functions

Often one wants more “complex” results from the model.

For example, if we fit the following model to the mtcars data:

\[ \eta_i = \beta_{0,\text{GEAR}_i} + \beta_{1,\text{GEAR}_i}x_i,\ i =1,\dots,n \]

We might want to recover the three regression lines

\[ \beta_{0,\text{GEAR}_i} + \beta_{1,\text{GEAR}_i}x_i ,\ \text{GEAR} = 3,4,5 \]

To do this we can use the predict() and generate() functions.

  • predict() draws samples from the fitted posterior distribution \(\pi(\mathbf{u},\theta|\mathbf{y})\), computes what you ask for, and produces a summary of the samples (mean, sd, quantiles, etc.)

  • generate() draws samples from the fitted posterior distribution \(\pi(\mathbf{u},\theta|\mathbf{y})\), computes what you ask for, and returns the samples!
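
The relationship between the two can be sketched without INLA at all: generate() hands back the raw draws, while predict() reduces the same kind of draws to summary statistics. A toy Monte Carlo illustration, with plain rnorm() draws standing in for posterior samples:

```r
set.seed(1)
# Stand-in for generate(): the raw posterior samples themselves.
samples <- rnorm(1000, mean = 2, sd = 0.5)
# Stand-in for predict(): a summary of those samples.
summ <- data.frame(mean   = mean(samples),
                   sd     = sd(samples),
                   q0.025 = unname(quantile(samples, 0.025)),
                   q0.975 = unname(quantile(samples, 0.975)))
```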

Example 1

cmp = ~ -1 + fixed_effects(~gear * disp, model = "fixed")
lik = bru_obs(formula = mpg ~ .,
              data = mtcars)
fit = bru(cmp, lik)

new_data = data.frame(gear = factor(rep(3:5, each = 400)),
                      disp = seq(71, 470, 1))
res = predict(fit, new_data, ~ fixed_effects)

res[1:5,]
  gear disp     mean       sd   q0.025     q0.5   q0.975   median
1    3   71 22.39604 2.034613 18.99907 22.54192 26.31040 22.54192
2    3   72 22.37112 2.027809 18.98687 22.51678 26.27229 22.51678
3    3   73 22.34619 2.021007 18.97468 22.49164 26.23418 22.49164
4    3   74 22.32127 2.014208 18.96248 22.46650 26.19608 22.46650
5    3   75 22.29635 2.007411 18.95029 22.44135 26.15797 22.44135
  mean.mc_std_err sd.mc_std_err
1       0.2369176     0.1672814
2       0.2361205     0.1666981
3       0.2353237     0.1661147
4       0.2345271     0.1655314
5       0.2337307     0.1649481

Example 1

res %>% ggplot() + geom_line(aes(disp, mean, group = gear, color = gear)) +
  geom_ribbon(aes(disp, ymin = q0.025 , ymax = q0.975,
                  group = gear, fill = gear), alpha = 0.5)

Example 2

cmp = ~ -1 + beta0(gear, model = "iid", initial = -6, fixed = TRUE) +
  beta1(gear, disp, model = "iid", initial = -6, fixed = TRUE)
lik = bru_obs(formula = mpg ~ .,
              data = mtcars)
fit = bru(cmp, lik)

res = predict(fit, new_data, ~ beta0 + beta1)

res[1:5,]
  gear disp     mean       sd   q0.025     q0.5   q0.975   median
1    3   71 22.20469 2.082109 17.86798 22.45977 25.67829 22.45977
2    3   72 22.18086 2.074446 17.85878 22.43372 25.64251 22.43372
3    3   73 22.15703 2.066786 17.84957 22.40766 25.60673 22.40766
4    3   74 22.13320 2.059129 17.84037 22.38161 25.57094 22.38161
5    3   75 22.10936 2.051474 17.83117 22.35555 25.53516 22.35555
  mean.mc_std_err sd.mc_std_err
1       0.2387793     0.1528418
2       0.2379132     0.1523430
3       0.2370475     0.1518446
4       0.2361822     0.1513464
5       0.2353171     0.1508486

Example 2

res %>% ggplot() + geom_line(aes(disp, mean, group = gear, color = gear)) +
  geom_ribbon(aes(disp, ymin = q0.025 , ymax = q0.975,
                  group = gear, fill = gear), alpha = 0.5)

Yet another example

Let’s fit the model for the Tokyo rainfall data \[ \begin{aligned} y_t|\eta_t&\sim\text{Bin}(n_t, p_t),\qquad t = 1,\dots,366\\ \eta_t &= \text{logit}(p_t)= \beta_0 + f(\text{time}_t) \end{aligned} \]

data("Tokyo")
cmp = ~ -1 + Intercept(1) + time(time, model = "rw2")
formula = y ~ Intercept + time
lik = bru_obs(formula = formula,
              data = Tokyo,
              Ntrials = n,
              family = "binomial")
fit = bru(cmp, lik)

We now want to extract results…what do we want?

  • The time effect \(f(\text{time}_t)\) ?
  • The linear predictor \(\eta_t = \beta_0 + f(\text{time}_t)\)?
  • The estimated probability of precipitation \(p_t = \text{inv_logit}(\eta_t)\) ?

We can get all of them with the predict() or generate() functions!

Example - predict()

preds1 = predict(object = fit, newdata = Tokyo, ~ time)
preds2 = predict(object = fit, newdata = Tokyo, ~ Intercept + time)
inv_logit = function(x) ((1 + exp(-x))^(-1))
preds3 = predict(object = fit, newdata = Tokyo, ~ inv_logit(Intercept + time))

or

inv_logit = function(x) ((1 + exp(-x))^(-1))
preds = predict(object = fit, newdata = Tokyo,
                ~ data.frame(time_eff = time,
                             lin_pred = Intercept + time,
                             probs = inv_logit(Intercept + time)),
                n.samples = 1000
                )
# preds is then a list
round(preds$probs[1:3,],3)
  y n time  mean    sd q0.025  q0.5 q0.975 median mean.mc_std_err sd.mc_std_err
1 0 2    1 0.177 0.084  0.059 0.159  0.377  0.159           0.003         0.003
2 0 2    2 0.174 0.079  0.060 0.158  0.358  0.158           0.003         0.002
3 1 2    3 0.172 0.075  0.063 0.157  0.343  0.157           0.003         0.002

Example - predict()

preds$probs %>% ggplot() + geom_line(aes(time, mean)) +
  geom_ribbon(aes(time, ymin = q0.025, ymax = q0.975), alpha = 0.5)

Example - generate()

samples = generate(object = fit, newdata = Tokyo,
                ~ data.frame(time_eff = time,
                             lin_pred = Intercept + time,
                             probs = inv_logit(Intercept + time)),
                n.samples = 20
                )
# samples is a list of length 20 (n.samples); each element of the list looks like:

samples[[1]][1:3,]
      time_eff   lin_pred     probs
1  0.145248856 -0.9073913 0.2875340
2 -0.008300456 -1.0609406 0.2571298
3 -0.126704192 -1.1793443 0.2351701

Example - generate()

data.frame(time = Tokyo$time, sapply(samples, function(x) x$probs)) %>%
  pivot_longer(-time) %>%
  ggplot() + geom_line(aes(time, value, group = name, color = factor(name))) +
  theme(legend.position = "none")

The _latent suffix

  • Sometimes, one wants to sample directly from latent parameters without specifying any new input data (or, in other words, to sample with an identity projection matrix)

  • The keyword _latent can be appended to a latent component name to retrieve that

For example, the following code recovers the intercept and the time effect at day 13

p = predict(object = fit, newdata = c(), ~ data.frame(int = Intercept_latent,
                                                      f_1 = time_latent[13]))

NOTE: Here newdata is just an empty object

Manipulating posterior marginals


We may be interested in functions of the posterior marginals computed by the bru() function.

For example, we might want to :

  • compute the mean of a posterior marginal
  • compute the median of a posterior marginal
  • sample from the posterior marginal

All this can be done with the help of the inla.*marginal() family of functions

Example from the Tokyo example

Obtain the posterior marginal for the standard deviation of the random effect, compute its mode and mean, and sample from it.

post_prec = fit$marginals.hyperpar$`Precision for time`
post_sd = inla.tmarginal(fun = function(x) 1/sqrt(x),
                         post_prec)
post_sd_mode = inla.mmarginal(post_sd)
post_sd_mean = inla.emarginal(fun = function(x) x, post_sd)
post_sd_sample = inla.rmarginal(1000, post_sd)
ggplot() + geom_line(data = post_sd, aes(x,y))  +
  geom_histogram(data = data.frame(samples = post_sd_sample), 
                 aes(x = samples, y = after_stat(density)),      
                 color = "black", fill = "lightblue")+
  geom_vline(xintercept = post_sd_mode, color = "red") + 
    geom_vline(xintercept = post_sd_mean, color = "blue") 

The inla.*marginal() family

Function                             Usage
inla.dmarginal(x, marginal, ...)     Density at a vector of evaluation points x
inla.pmarginal(q, marginal, ...)     Distribution function at a vector of quantiles q
inla.qmarginal(p, marginal, ...)     Quantile function at a vector of probabilities p
inla.rmarginal(n, marginal)          Generate n random deviates
inla.hpdmarginal(p, marginal, ...)   Highest posterior density interval at level p
inla.emarginal(fun, marginal, ...)   Expected value of the transformation given by fun
inla.mmarginal(marginal)             Compute the mode
inla.smarginal(marginal, ...)        Smoothed density (returns x-values and interpolated y-values)
inla.tmarginal(fun, marginal, ...)   Transform the marginal using the function fun
inla.zmarginal(marginal)             Summary statistics for the marginal
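
All of these functions operate on the same simple representation: a marginal is stored as a two-column matrix of (x, density) values. As a sketch (a toy standard-normal grid, not actual INLA output), the expectation computed by inla.emarginal() is essentially numerical integration over that grid:

```r
# Toy marginal: a grid of x-values with densities, like INLA's marginal objects.
x <- seq(-5, 5, length.out = 401)
marg <- cbind(x = x, y = dnorm(x))
# Trapezoidal rule; conceptually what inla.emarginal(fun, marginal) computes.
trapz <- function(x, y) sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
e_marg <- function(fun) trapz(marg[, "x"], fun(marg[, "x"]) * marg[, "y"])
e_marg(function(x) x)    # posterior mean: ~0 for a standard normal
e_marg(function(x) x^2)  # second moment: ~1
```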

NAs in inlabru

Missing values in inlabru

The bru() function treats missing values in the dataset differently according to their role in the model.

Take the model definition:

cmp = ~ Intercept(1) + covariate(x, model = "linear") + random(z, model = "iid")
formula = y ~ Intercept + covariate + random
  • NA’s in the response y

    If y[i] = NA, this means that y[i] is not observed, hence gives no contribution to the likelihood.

  • NA’s in fixed effect x

    If x[i] = NA this means that x[i] is not part of the linear predictor for y[i]. For fixed effects, this is equivalent to x[i]=0, hence internally we make this change: x[is.na(x)] = 0

  • NA’s in random effect z

    If z[i] = NA, this means that the random effect does not contribute to the linear predictor for y[i].
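
The fixed-effect rule above can be illustrated in plain R (a sketch of the stated convention, not inlabru internals):

```r
# An NA in a fixed-effect covariate behaves like 0: that observation's
# linear predictor gets no contribution from the covariate.
x <- c(1.2, NA, 0.7)
x[is.na(x)] <- 0           # the change inlabru makes internally
beta <- 2                  # some hypothetical fixed-effect coefficient
eta <- beta * x            # observation 2 gets no covariate contribution
eta
```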

Can inlabru deal with missing covariates?

No, inlabru has no way to “impute” or integrate out missing covariates.

You have to adjust your model to account for missing covariates.

Sometimes, you can formulate a joint model for the data and the covariates, but this is case-specific.

Missing values when using model = "fixed"

When using model = "fixed", missing values can give an error:

df = data.frame(y = rnorm(5),
           x = as.factor(c("a","b", NA, "b", "a")))
cmp = ~ -1  + fixed(~x, model = "fixed")
lik = bru_obs(formula = y~.,
              data = df)
fit = bru(cmp, lik)
Error in `iinla()`:
! The total number of response values (N=5) and predictor values (N=4) do not match.
  This is likely due to a mistake in the component or predictor constructions.

The problem here is related to the model.matrix() function, which is run internally and which, by default, removes rows with missing values

dim(df)
[1] 5 2
dim(model.matrix(~x, data = df))
[1] 4 2

This makes the length of the response (5) different from the length of the covariate (4).

One solution is to force model.matrix() not to remove the missing values

options(na.action = "na.pass")
fit = bru(cmp, lik)
fit$summary.random$fixed[,c(1:4)]
           ID       mean        sd 0.025quant
1 (Intercept)  0.5621983 0.7106791 -0.8834028
2          xb -0.7319059 1.1232657 -3.0150691

WARNING: this is a global option; it might interfere with other parts of your code!
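
A safer pattern is to change the option only around the fit and restore it immediately afterwards; options() returns the old values, so this is a one-liner each way (a sketch; the bru() call itself is commented out):

```r
# Temporarily allow NAs through model.matrix(), then restore the old setting
# so the global option does not leak into the rest of the session.
old <- options(na.action = "na.pass")
# fit <- bru(cmp, lik)   # the inlabru fit would go here
options(old)             # restore the previous na.action
```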

Getting help while using inlabru

Which priors have I used?

You might wonder which priors you have used in your model…

fit = bru(cmp, lik)

INLA::inla.priors.used(fit)

NB: inla.priors.used() is a function from the INLA package, so you need to call it explicitly with the INLA:: prefix (or load INLA first).

Getting help!

The inla.doc() function provides documentation for the different priors, latent models and likelihoods.

inla.doc("ar1")
inla.doc("pc.prior")
inla.doc("gaussian")

Which models are implemented?

Likelihoods

# which likelihoods?
inla.list.models("likelihood")
Section [likelihood]
     0binomial                     New 0-inflated Binomial                 
     0binomialS                    New 0-inflated Binomial Swap            
     0poisson                      New 0-inflated Poisson                  
     0poissonS                     New 0-inflated Poisson Swap             
     1poisson                      New 1-inflated Poisson                  
     1poissonS                     New 1-inflated Poisson Swap             
     agaussian                     The aggregated Gaussian likelihoood     
     bcgaussian                    The Box-Cox Gaussian likelihoood        
     bell                          The Bell likelihood                     
     beta                          The Beta likelihood                     
     betabinomial                  The Beta-Binomial likelihood            
     betabinomialna                The Beta-Binomial Normal approximation likelihood
     bgev                          The blended Generalized Extreme Value likelihood
     binomial                      The Binomial likelihood                 
     binomialmix                   Binomial mixture                        
     cbinomial                     The clustered Binomial likelihood       
     cennbinomial2                 The CenNegBinomial2 likelihood (similar to cenpoisson2)
     cenpoisson                    Then censored Poisson likelihood        
     cenpoisson2                   Then censored Poisson likelihood (version 2)
     circularnormal                The circular Gaussian likelihoood       
     cloglike                      User-defined likelihood                 
     coxph                         Cox-proportional hazard likelihood      
     dgompertzsurv                 destructive gompertz (survival) distribution
     dgp                           Discrete generalized Pareto likelihood  
     egp                           Exteneded Generalized Pareto likelihood 
     exponential                   The Exponential likelihood              
     exponentialsurv               The Exponential likelihood (survival)   
     exppower                      The exponential power likelihoood       
     fl                            The fl likelihood                       
     fmri                          fmri distribution (special nc-chi)      
     fmrisurv                      fmri distribution (special nc-chi)      
     gamma                         The Gamma likelihood                    
     gammacount                    A Gamma generalisation of the Poisson likelihood
     gammajw                       A special case of the Gamma likelihood  
     gammajwsurv                   A special case of the Gamma likelihood (survival)
     gammasurv                     The Gamma likelihood (survival)         
     gammasv                       The Gamma likelihood with constant rate 
     gaussian                      The Gaussian likelihoood                
     gaussianjw                    The GaussianJW likelihoood              
     gev                           The Generalized Extreme Value likelihood
     ggaussian                     Generalized Gaussian                    
     ggaussianS                    Generalized GaussianS                   
     gompertz                      gompertz distribution                   
     gompertzsurv                  gompertz distribution                   
     gp                            Generalized Pareto likelihood           
     gpoisson                      The generalized Poisson likelihood      
     iidgamma                      (experimental)                          
     iidlogitbeta                  (experimental)                          
     lavm                          Link adjusted von Mises circular distribution
     loggammafrailty               (experimental)                          
     logistic                      The Logistic likelihoood                
     loglogistic                   The loglogistic likelihood              
     loglogisticsurv               The loglogistic likelihood (survival)   
     lognormal                     The log-Normal likelihood               
     lognormalsurv                 The log-Normal likelihood (survival)    
     logperiodogram                Likelihood for the log-periodogram      
     mgamma                        The modal Gamma likelihood              
     mgammasurv                    The modal Gamma likelihood (survival)   
     nbinomial                     The negBinomial likelihood              
     nbinomial2                    The negBinomial2 likelihood             
     nmix                          Binomial-Poisson mixture                
     nmixnb                        NegBinomial-Poisson mixture             
     npoisson                      The Normal approximation to the Poisson likelihood
     nvm                           Normal approx of the von Mises circular distribution
     nzpoisson                     The nzPoisson likelihood                
     obeta                         The ordered Beta likelihood             
     occupancy                     Occupancy likelihood                    
     poisson                       The Poisson likelihood                  
     poisson.special1              The Poisson.special1 likelihood         
     pom                           Likelihood for the proportional odds model
     qkumar                        A quantile version of the Kumar likelihood
     qloglogistic                  A quantile loglogistic likelihood       
     qloglogisticsurv              A quantile loglogistic likelihood (survival)
     rcpoisson                     Randomly censored Poisson               
     sem                           The SEM likelihoood                     
     simplex                       The simplex likelihood                  
     sn                            The Skew-Normal likelihoood             
     stdgaussian                   The stdGaussian likelihoood             
     stochvol                      The Gaussian stochvol likelihood        
     stochvolln                    The Log-Normal stochvol likelihood      
     stochvolnig                   The Normal inverse Gaussian stochvol likelihood
     stochvolsn                    The SkewNormal stochvol likelihood      
     stochvolt                     The Student-t stochvol likelihood       
     t                             Student-t likelihood                    
     tpoisson                      Thinned Poisson                         
     tstrata                       A stratified version of the Student-t likelihood
     tweedie                       Tweedie distribution                    
     vm                            von Mises circular distribution         
     weibull                       The Weibull likelihood                  
     weibullsurv                   The Weibull likelihood (survival)       
     wrappedcauchy                 The wrapped Cauchy likelihood           
     xbinomial                     The Binomial likelihood (experimental version)
     xpoisson                      The Poisson likelihood (expert version) 
     zeroinflatedbetabinomial0     Zero-inflated Beta-Binomial, type 0     
     zeroinflatedbetabinomial1     Zero-inflated Beta-Binomial, type 1     
     zeroinflatedbetabinomial2     Zero inflated Beta-Binomial, type 2     
     zeroinflatedbinomial0         Zero-inflated Binomial, type 0          
     zeroinflatedbinomial1         Zero-inflated Binomial, type 1          
     zeroinflatedbinomial2         Zero-inflated Binomial, type 2          
     zeroinflatedcenpoisson0       Zero-inflated censored Poisson, type 0  
     zeroinflatedcenpoisson1       Zero-inflated censored Poisson, type 1  
     zeroinflatednbinomial0        Zero inflated negBinomial, type 0       
     zeroinflatednbinomial1        Zero inflated negBinomial, type 1       
     zeroinflatednbinomial1strata2 Zero inflated negBinomial, type 1, strata 2
     zeroinflatednbinomial1strata3 Zero inflated negBinomial, type 1, strata 3
     zeroinflatednbinomial2        Zero inflated negBinomial, type 2       
     zeroinflatedpoisson0          Zero-inflated Poisson, type 0           
     zeroinflatedpoisson1          Zero-inflated Poisson, type 1           
     zeroinflatedpoisson2          Zero-inflated Poisson, type 2           
     zeroninflatedbinomial2        Zero and N inflated binomial, type 2    
     zeroninflatedbinomial3        Zero and N inflated binomial, type 3    
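
Any name in this list can be passed as the family argument of bru_obs(). As a hedged sketch (the data frame df and its columns y and x are hypothetical):

library(inlabru)

# Components: an intercept plus a linear covariate effect
cmp <- ~ Intercept(1) + beta(x, model = "linear")

# Swap the default Gaussian family for any listed likelihood,
# e.g. a zero-inflated Poisson (type 1)
lik <- bru_obs(formula = y ~ Intercept + beta,
               family = "zeroinflatedpoisson1",
               data = df)   # df with columns y and x is assumed

fit <- bru(cmp, lik)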

Which models are implemented?

Components

# which latent models? (components)
inla.list.models("latent")
Section [latent]
     2diid                         (This model is obsolete)                
     ar                            Auto-regressive model of order p (AR(p))
     ar1                           Auto-regressive model of order 1 (AR(1))
     ar1c                          Auto-regressive model of order 1 w/covariates
     besag                         The Besag area model (CAR-model)        
     besag2                        The shared Besag model                  
     besagproper                   A proper version of the Besag model     
     besagproper2                  An alternative proper version of the Besag model
     bym                           The BYM-model (Besag-York-Mollier model)
     bym2                          The BYM-model with the PC priors        
     cgeneric                      Generic latent model specified using C  
     clinear                       Constrained linear effect               
     copy                          Create a copy of a model component      
     crw2                          Exact solution to the random walk of order 2
     dmatern                       Dense Matern field                      
     fgn                           Fractional Gaussian noise model         
     fgn2                          Fractional Gaussian noise model (alt 2) 
     generic                       A generic model                         
     generic0                      A generic model (type 0)                
     generic1                      A generic model (type 1)                
     generic2                      A generic model (type 2)                
     generic3                      A generic model (type 3)                
     iid                           Gaussian random effects in dim=1        
     iid1d                         Gaussian random effect in dim=1 with Wishart prior
     iid2d                         Gaussian random effect in dim=2 with Wishart prior
     iid3d                         Gaussian random effect in dim=3 with Wishart prior
     iid4d                         Gaussian random effect in dim=4 with Wishart prior
     iid5d                         Gaussian random effect in dim=5 with Wishart prior
     iidkd                         Gaussian random effect in dim=k with Wishart prior
     intslope                      Intercept-slope model with Wishart-prior
     linear                        Alternative interface to a fixed effect 
     log1exp                       A nonlinear model of a covariate        
     logdist                       A nonlinear model of a covariate        
     matern2d                      Matern covariance function on a regular grid
     meb                           Berkson measurement error model         
     mec                           Classical measurement error model       
     ou                            The Ornstein-Uhlenbeck process          
     prw2                          Proper random walk of order 2           
     revsigm                       Reverse sigmoidal effect of a covariate 
     rgeneric                      Generic latent model specified using R  
     rw1                           Random walk of order 1                  
     rw2                           Random walk of order 2                  
     rw2d                          Thin-plate spline model                 
     rw2diid                       Thin-plate spline with iid noise        
     scopy                         Create a scopy of a model component     
     seasonal                      Seasonal model for time series          
     sigm                          Sigmoidal effect of a covariate         
     slm                           Spatial lag model                       
     spde                          A SPDE model                            
     spde2                         A SPDE2 model                           
     spde3                         A SPDE3 model                           
     z                             The z-model in a classical mixed model formulation
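
Any of these latent models can be used as the model argument of an inlabru component. A hedged sketch combining two entries from the list (df and its columns are hypothetical):

library(inlabru)

cmp <- ~ Intercept(1) +
  time_effect(time, model = "rw2") +   # smooth trend (random walk, order 2)
  site_effect(site, model = "iid")     # exchangeable Gaussian random effects

lik <- bru_obs(formula = y ~ Intercept + time_effect + site_effect,
               family = "gaussian",
               data = df)

fit <- bru(cmp, lik)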

Which models are implemented?

Priors

# which priors?
inla.list.models("prior")
Section [prior]
     betacorrelation               Beta prior for the correlation          
     dirichlet                     Dirichlet prior                         
     expression:                   A generic prior defined using expressions
     flat                          A constant prior                        
     gamma                         Gamma prior                             
     gaussian                      Gaussian prior                          
     invalid                       Void prior                              
     jeffreystdf                   Jeffreys prior for the dof              
     laplace                       Laplace prior                           
     linksnintercept               Skew-normal-link intercept-prior        
     logflat                       A constant prior for log(theta)         
     loggamma                      Log-Gamma prior                         
     logiflat                      A constant prior for log(1/theta)       
     logitbeta                     Logit prior for a probability           
     logtgaussian                  Truncated Gaussian prior                
     logtnormal                    Truncated Normal prior                  
     minuslogsqrtruncnormal        (obsolete)                              
     mvnorm                        A multivariate Normal prior             
     none                          No prior                                
     normal                        Normal prior                            
     pc                            Generic PC prior                        
     pc.alphaw                     PC prior for alpha in Weibull           
     pc.ar                         PC prior for the AR(p) model            
     pc.cor0                       PC prior correlation, basemodel cor=0   
     pc.cor1                       PC prior correlation, basemodel cor=1   
     pc.dof                        PC prior for log(dof-2)                 
     pc.egptail                    PC prior for the tail in the EGP likelihood
     pc.fgnh                       PC prior for the Hurst parameter in FGN 
     pc.gamma                      PC prior for a Gamma parameter          
     pc.gammacount                 PC prior for the GammaCount likelihood  
     pc.gevtail                    PC prior for the tail in the GEV likelihood
     pc.matern                     PC prior for the Matern SPDE            
     pc.mgamma                     PC prior for a Gamma parameter          
     pc.prec                       PC prior for log(precision)             
     pc.prw2.range                 PC prior for the range in PRW2          
     pc.range                      PC prior for the range in the Matern SPDE
     pc.sn                         PC prior for the skew-normal            
     pc.spde.GA                    (experimental)                          
     pom                           #classes-dependent prior for the POM model
     ref.ar                        Reference prior for the AR(p) model, p<=3
     rprior:                       A R-function defining the prior         
     table:                        A generic tabulated prior               
     wishart1d                     Wishart prior dim=1                     
     wishart2d                     Wishart prior dim=2                     
     wishart3d                     Wishart prior dim=3                     
     wishart4d                     Wishart prior dim=4                     
     wishart5d                     Wishart prior dim=5                     
     wishartkd                     Wishart prior                           
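
Priors for hyperparameters are set through the hyper argument of a component. For example, a PC prior on the precision of a rw2 effect, encoding P(sd > 1) = 0.01 (a sketch; the component names are hypothetical):

library(inlabru)

cmp <- ~ Intercept(1) +
  time_effect(time,
              model = "rw2",
              hyper = list(prec = list(prior = "pc.prec",
                                       param = c(1, 0.01))))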

Take home message

Take home message

  • Linear models with interactions

    • inlabru has a special model to implement this
    • but it is also worth understanding that “fixed effects are just random effects with fixed precision” 😄
  • the predict() and generate() functions are powerful tools (based on sampling from the posterior distribution) for computing derived quantities of interest!

  • missing values are usually not a problem, but in some cases they have to be treated carefully.
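
As a hedged illustration of the predict()/generate() workflow mentioned above (fit and newdata are hypothetical objects from an earlier model fit):

# Posterior summaries of any expression involving the components
pred <- predict(fit, newdata, ~ exp(Intercept + time_effect),
                n.samples = 1000)

# Raw posterior samples, for custom post-processing
samps <- generate(fit, newdata, ~ Intercept + time_effect,
                  n.samples = 100)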