Covariates may be included in phenomix
models in several
ways. First, we allow them to affect the mean,
μ = XB + Zα
where fixed effect covariates X are linked to the mean via
coefficients B and
random effects α are
linked to the mean via design matrix Z. For simplicity, we only
assume that the random effects are IID in time, e.g. μδ ∼ Normal(0, σμ).
The random effects are optional of course, and may be turned on / off
with the est_mu_re
argument.
Fixed effects for the mean at minimum include a global intercept, but
may include any other covariates via the formula interface. The formula
is included as an argument to the fit()
function, and
defaults to mu ~ 1
, or an intercept only model. Including
temperature could be done as mu ~ temp
, which also includes
an intercept.
Importantly, if covariates are to be included in the mean or standard
deviation, a second data frame covar_data
must be included
that has a unique covariate value for each time slice of data
(e.g. year). For example, in the example above with temperature, covar
data would have to look something like this:
# temp is a dummy variable here representing annual deviations in temperature,
# but you could replace it here with your own data
covar_data = data.frame(year = 2000:2020,
temp = rnorm(21,mean=0,sd=1))
Similarly, we could include a linear trend in the mean
mu ~ nyear
, where nyear is a numeric version of year. One
way to pass in the data would be simply making nyear
redundant with year
,
However, we’ve found that approach can be slightly unstable. Instead, we can center or scale the numeric year variable. Here, we’ll scale it to start at 1.
covar_data = data.frame(year = 2000:2020)
covar_data$nyear <- covar_data$year - min(covar_data$year) + 1
In addition to the mean, we allow covariates to affect the standard deviation. The overall approach is exactly as above with the mean, however a couple points are worth highlighting:
log(σ1) = XB + Zα * For asymmetric models, we assume that the same covariates act on both the left and right sides of the peak (in other words, we do not allow separate formulas for each side)
The relationship between covariates and the variances is
specified through a separate formula, sigma ~ 1
For asymmetric models, we estimate different coefficients on each
side. So for a model with sigma~temp
we could
estimate
Random effects in the standard deviation are turned on/off with
the est_sigma_re
argument in
create_data()