Excerpt from Bolker, B. M., 2015. Linear and generalized linear mixed models. In G. A. Fox, S. Negrete-Yankelevich,
and V. J. Sosa (eds.), Ecological Statistics: Contemporary theory and application. Oxford University Press.
ISBN 978-0-19-967255-4. In press.
https://global.oup.com/academic/product/ecological-statistics-9780199672547?cc=ca&lang=en&

Random effects

The traditional view of random effects is as a way to do correct statistical tests when some observations are correlated. When samples are collected in groups (within sites in the tundra example above, or within experimental blocks of any kind), we violate the assumption of independent observations that is part of most statistical models. There will be some variation within groups (s²_within) and some among groups (s²_among); the total variance is s²_total = s²_within + s²_among; and therefore the correlation between any two observations in the same group is r = s²_among/s²_total (observations that come from different groups are uncorrelated). Sometimes one can solve this problem easily by taking group averages. For example, if we were testing for differences between deciduous and evergreen trees, where every member of a species has the same leaf habit, we could simply calculate species’ average responses, throwing away the variation within species, and do a t-test between the deciduous and evergreen species means. If the data are balanced (i.e., if we sample the same number of trees for each species), this procedure is exactly equivalent to testing the fixed effect in a classical mixed model ANOVA with a fixed effect of leaf habit and a random effect of species. This approach correctly incorporates the facts that (1) repeated sampling within species reduces the uncertainty associated with within-group variance, but (2) we have fewer independent data points than observations: in this case, only as many as we have groups (species) in our study.
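
As a minimal sketch of the variance decomposition described above, the following Python snippet computes the within-group correlation as the among-group share of the total variance, s²_among/s²_total. The function name and the numerical values are hypothetical and chosen only for illustration.

```python
def intraclass_correlation(s2_within, s2_among):
    """Correlation between two observations from the same group.

    Assumes the total variance is the sum of the within-group and
    among-group variances, as in the text.
    """
    s2_total = s2_within + s2_among
    return s2_among / s2_total


# Illustrative values only (not taken from any data set)
print(intraclass_correlation(1.0, 3.0))  # 0.75: most variation is among groups
print(intraclass_correlation(1.0, 0.0))  # 0.0: no among-group variation, so no correlation
```

When s²_among is zero the grouping can be ignored; as it grows relative to s²_within, observations within a group become increasingly redundant.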

These basic ideas underlie all classical mixed model ANOVA analyses, although the formulas get more complex when treatments vary within grouping variables, or when different fixed effects vary at the levels of different grouping variables (e.g., randomized-block and split-plot designs). For simple nested designs, simpler approaches like the averaging procedure described above are usually best (Murtaugh 2007). However, mixed model ANOVA is still extremely useful for a wide range of more complicated designs, and as discussed below, traditional mixed model ANOVA itself falls short for cases such as unbalanced designs or non-Normal data.

We can also think of random effects as a way to combine information from different levels within a grouping variable. Consider the tundra ecosystem example, where we want to estimate linear trends (slopes) across time for many sites. If we had only a few years sampled from a few sites, we might have to pool the data, ignoring the differences in trend among sites. Pooling assumes that s²_among (the variance in slopes among sites) is effectively zero, so that the individual observations are uncorrelated (r=0). On the other hand, if we had many years sampled from each site, and especially if we had a small number of sites, we might want to estimate the slope for each site individually, or in other words to estimate a fixed effect of time for each site. Treating the grouping factor (site) as a fixed effect assumes that information about one site gives us no information about the slope at any other site; this is equivalent, for the purposes of parameter estimation, to treating s²_among as infinite. Treating site as a random effect compromises between the extremes of pooling and estimating separate (fixed) estimates; we acknowledge, and try to quantify, the variability in slope among sites. Because the trends are assumed to come from a population (of slopes) with a well-defined mean, the predicted slopes in CO₂ flux for each site are a weighted average between the trend for that site and the overall mean trend across all sites; the smaller and noisier the sample for a particular site, the more its slope is compressed toward the population mean (Figure 13.1). For technical reasons, these values (the deviation of each site’s value from the population average) are called conditional modes, rather than estimates. The conditional modes are also sometimes called random effects, but this could also refer to the grouping variables (the sites themselves, in the tundra example). 
Confusingly, both the conditional modes and the estimates of the among-site variances can be considered parameters of the random effects part of the model. For example, if we had independently estimated the trend at one site (i.e., as a fixed effect) as -5 g C/m²/year, with an estimated variance of 1, while the mean rate of all the sites was -8 g C/m²/year with an among-site variance of 3, then our predicted value for that site would be (m_site/s²_within + m_overall/s²_among)/(1/s²_within + 1/s²_among) = (-5/1 + -8/3)/(1/1 + 1/3) = -5.75 g C/m²/year. Because s²_within < s²_among -- the trend estimate for the site is relatively precise compared to the variance among sites -- the random-effects prediction is closer to the site-specific value than to the overall mean. (Stop and plug in a few different values of among-site variance to convince yourself that this formula agrees with the verbal description above of how variance-weighted averaging works when s²_among is either very small or very large relative to s²_within.)
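
The variance-weighted average in the worked example can be checked with a short Python sketch. The function name is hypothetical; the inputs are the numbers from the example (site trend -5 with variance 1, overall mean -8 with among-site variance 3), and the loop simply tries a few other among-site variances.

```python
def shrunken_estimate(m_site, s2_within, m_overall, s2_among):
    """Precision-weighted average of a site-specific estimate and the overall mean."""
    w_site = 1.0 / s2_within     # weight (precision) of the site-specific estimate
    w_overall = 1.0 / s2_among   # weight (precision) of the overall mean
    return (m_site * w_site + m_overall * w_overall) / (w_site + w_overall)


# Worked example from the text: approximately -5.75 g C/m^2/year
print(shrunken_estimate(-5.0, 1.0, -8.0, 3.0))

# Try other among-site variances: small values pull the prediction toward
# the overall mean (-8), large values leave it near the site estimate (-5).
for s2_among in (0.01, 1.0, 3.0, 100.0):
    print(s2_among, round(shrunken_estimate(-5.0, 1.0, -8.0, s2_among), 3))
```

This is only the two-quantity weighted average from the example; fitting an actual mixed model estimates the variances and the conditional modes jointly.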

Random effects are especially useful when we have (1) lots of levels (e.g. many species or blocks), (2) relatively little data on each level (although we need multiple samples from most of the levels), and (3) uneven sampling across levels (Box 13.1).

Frequentists and Bayesians define random effects somewhat differently, which affects the way they use them. Frequentists define random effects as categorical variables whose levels are chosen at random from a larger population, e.g. species chosen at random from a list of endemic species. Bayesians define random effects as sets of variables whose parameters are drawn from a distribution. The frequentist definition is philosophically coherent, and you will encounter researchers (including reviewers and supervisors) who insist on it, but it can be practically problematic. For example, it implies that you can’t use species as a random effect when you have observed all of the species at your field site -- since the list of species is not a sample from a larger population -- or use year as a random effect -- since researchers rarely run an experiment in randomly sampled years: they usually use either a series of consecutive years, or the haphazard set of years when they could get into the field. This problem applies to both the gopher tortoise and tick examples, each of which uses data from consecutive years.

BOX 13.1.

You may want to treat a predictor variable as a random effect if you:

· don’t want to test hypotheses about differences between responses at particular levels of the grouping variable;

· do want to quantify the variability among levels of the grouping variable;

· do want to make predictions about unobserved levels of the grouping variable;

· do want to combine information across levels of the grouping variable;

· have variation in information per level (number of samples or noisiness);

· have levels that are randomly sampled from/representative of a larger population;

· have a categorical predictor that is a nuisance variable (i.e. it is not of direct interest, but should be controlled for).

cf. Crawley (2002), Gelman (2005)

If you have sampled fewer than 5 levels of the grouping variable, you should strongly consider treating it as a fixed effect even if one or more of the criteria above apply.

Random effects can also be described as predictor variables where you are interested in making inferences about the distribution of values (i.e., the variance among the values of the response at different levels) rather than in testing the differences of values between particular levels. Choosing a random effect trades the ability to test hypotheses about differences among particular levels (low vs. high nitrogen, 2001 vs. 2002 vs. 2003) for the ability to (1) quantify the variance among levels (variability among sites, among species, etc.) and (2) generalize to levels that were not measured in your experiment. If you treat species as a fixed effect, you can’t say anything about an unmeasured species; if you use it as a random effect, then you can guess that an unmeasured species will have a value equal to the population mean estimated from the species you did measure. Of course, as with all statistical generalization, your levels (e.g. years) must be chosen in some way that, if not random, is at least representative of the population you want to generalize to.

People sometimes say that random effects are “factors that you aren’t interested in”. This is not always true. While it is often the case in ecological experiments (where variation among sites is usually just a nuisance), the variation among levels is sometimes of great interest, for example in evolutionary studies where the variation among genotypes is the raw material for natural selection, or in demographic studies where among-year variation lowers long-term growth rates. Conversely, fixed effects are sometimes used to control for uninteresting variation, e.g. using mass as a covariate to control for effects of body size.

You will also hear that “you can’t say anything about the (predicted) value of a conditional mode.” This is not true either – you can’t formally test a null hypothesis that the value is equal to zero, or that the values of two different levels are equal, but it is still perfectly sensible to look at the predicted value, and even to compute a standard error of the predicted value (e.g. see the error bars around the conditional modes in Figure 13.1). Particularly in management contexts, researchers may care very much about which sites are particularly good or bad relative to the population average, and how much better or worse they are than the average. Even though it’s difficult to compute formal inferential summaries such as p-values, you can still make common-sense statements about the conditional modes and their uncertainties.

The Bayesian framework has a simpler definition of random effects. Under a Bayesian approach, a fixed effect is one where we estimate each parameter (e.g. the mean for each species within a genus) independently (with independently specified priors), while for a random effect the parameters for each level are modeled as being drawn from a distribution (usually Normal); in standard statistical notation, species_mean ~ Normal(genus_mean, s²_species).
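
To make the notation species_mean ~ Normal(genus_mean, s²_species) concrete, here is a small generative sketch in Python using only the standard library. It only simulates draws from the assumed distribution; it does not fit a model. The function name, parameter values, and seed are hypothetical.

```python
import random


def draw_species_means(genus_mean, s2_species, n_species, seed=None):
    """Draw species-level means from Normal(genus_mean, s2_species)."""
    rng = random.Random(seed)
    sd_species = s2_species ** 0.5  # gauss() takes a standard deviation, not a variance
    return [rng.gauss(genus_mean, sd_species) for _ in range(n_species)]


# Five hypothetical species within one genus (illustrative values only)
species_means = draw_species_means(genus_mean=10.0, s2_species=4.0, n_species=5, seed=1)
print(species_means)
```

Under the random-effects view, the quantities of interest are genus_mean and s²_species, which describe the whole population of species; under the fixed-effects view, each of the five species means would instead be estimated independently.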

I said above that random effects are most useful when the grouping variable has many measured levels. Conversely, random effects are generally ineffective when the grouping variable has too few levels. You usually can’t use random effects when the grouping variable has fewer than 5 levels, and random effects variance estimates are unstable with fewer than 8 levels, because you are trying to estimate a variance from a very small sample. In the classic ANOVA approach, where all of the variance estimates are derived from simple sums-of-squares calculations, random effects calculations work as long as you have at least two samples (although their power will be very low, and sometimes you can get negative variance estimates). In the modern mixed modeling approach, you tend to get warnings and errors from the software instead, or estimates of zero variance, but in any case the results will be unreliable (section 13.5 offers a few tricks for handling this case). Both the gopher tortoise and grouse tick examples have year as a categorical variable that would ideally be treated as random, but we treat it as fixed because there are only three years sampled: treating years as a random effect would most likely estimate the among-year variance as zero.
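
The negative variance estimates mentioned above can be reproduced with a small Python sketch. It assumes the standard method-of-moments estimator for a balanced one-way design, (MS_among - MS_within)/n, where n is the number of observations per group; the function name and the toy data are hypothetical.

```python
from statistics import mean


def among_group_variance(groups):
    """Sums-of-squares estimate of the among-group variance (balanced design only)."""
    k = len(groups)        # number of groups (levels of the grouping variable)
    n = len(groups[0])     # observations per group
    if k < 2 or n < 2 or any(len(g) != n for g in groups):
        raise ValueError("need a balanced design with >= 2 groups and >= 2 observations each")
    group_means = [mean(g) for g in groups]
    grand_mean = mean(group_means)  # equals the overall mean because the design is balanced
    ms_among = n * sum((m - grand_mean) ** 2 for m in group_means) / (k - 1)
    ss_within = sum((y - m) ** 2 for g, m in zip(groups, group_means) for y in g)
    ms_within = ss_within / (k * (n - 1))
    return (ms_among - ms_within) / n


# Two well-separated groups: a positive estimate (about 4.17)
print(among_group_variance([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]))

# Two groups with identical means but noisy observations: a negative estimate (-2.5)
print(among_group_variance([[1.0, 5.0], [2.0, 4.0]]))
```

With only two groups the among-group mean square rests on a single degree of freedom, so the estimate is driven almost entirely by sampling noise; this is the small-sample instability the paragraph describes.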