Including covariates

library(ggplot2)
library(phenomix)
library(dplyr)
library(TMB)

Covariates

Covariates may be included in phenomix models in several ways. First, we allow them to affect the mean,

μ = XB + Zα

where fixed effect covariates X are linked to the mean via coefficients B and random effects α are linked to the mean via design matrix Z. For simplicity, we only assume that the random effects are IID in time, e.g. μδ ∼ Normal(0, σμ). The random effects are optional of course, and may be turned on / off with the est_mu_re argument.

Fixed effects for the mean at minimum include a global intercept, but may include any other covariates via the formula interface. The formula is included as an argument to the fit() function, and defaults to mu ~ 1, or an intercept only model. Including temperature could be done as mu ~ temp, which also includes an intercept.

Importantly, if covariates are to be included in the mean or standard deviation, a second data frame covar_data must be included that has a unique covariate value for each time slice of data (e.g. year). For example, in the example above with temperature, covar data would have to look something like this:

# temp is a dummy variable here representing annual deviations in temperature, 
# but you could replace it here with your own data
covar_data = data.frame(year = 2000:2020, 
                        temp = rnorm(21,mean=0,sd=1))

Similarly, we could include a linear trend in the mean mu ~ nyear, where nyear is a numeric version of year. One way to pass in the data would be simply making nyear redundant with year,

covar_data = data.frame(year = 2000:2020)
covar_data$nyear <- covar_data$year

However, we’ve found that approach can be slightly unstable. Instead, we can center or scale the numeric year variable. Here, we’ll scale it to start at 1.

covar_data = data.frame(year = 2000:2020)
covar_data$nyear <- covar_data$year - min(covar_data$year) + 1

In addition to the mean, we allow covariates to affect the standard deviation. The overall approach is exactly as above with the mean, however a couple points are worth highlighting:

  • With the mean, we did not use a link function. With the standard deviation(s) we assume the covariate effects are estimated in log-space, to keep standard deviations positive.

log(σ1) = XB + Zα * For asymmetric models, we assume that the same covariates act on both the left and right sides of the peak (in other words, we do not allow separate formulas for each side)

  • The relationship between covariates and the variances is specified through a separate formula, sigma ~ 1

  • For asymmetric models, we estimate different coefficients on each side. So for a model with sigma~temp we could estimate

  • Random effects in the standard deviation are turned on/off with the est_sigma_re argument in create_data()