% Ben Bolker
% Mon Oct 29 23:20:23 2012
Licensed under the Creative Commons attribution-noncommercial license. Please share & remix noncommercially, mentioning its origin.
Data from Toby Marthews, r-sig-mixed-models mailing list:
load("starling.RData")
library(ggplot2)
library(grid)
theme_set(theme_bw())
## squash panels together (need grid package loaded)
zmargin <- theme(panel.margin=unit(0,"lines"))
ggplot(dataf,aes(mnth,stmass))+
geom_point()+
geom_line(aes(group=subject))+
facet_grid(.~roostsitu)+
zmargin
You could also try
ggplot(dataf,aes(mnth,stmass,colour=roostsitu))+
geom_point()+
geom_line(aes(group=subject))
but I find the former more pleasant to look at.
It's pretty obvious that the starting (November) mass varies among roost situations (tree/nest-box/etc.), and that mass increases from November to January, but we might like to know if the slopes differ among situations. That means our fixed effects would be ~roostsitu*month
, with our attention focused on the roostsitu:month
(interaction) term. For random effects, we can allow both intercept (obviously) and slope (maybe) to vary among individuals, via ~1+mnth|subject
or equivalent … in this case, because measurements are only taken in two months, we could also write the random term as ~1|subject/mnth
. However, it turns out that we can't actually estimate random slopes for this model, because every individual is only measured twice. That means that the variance in the slopes would be completely confounded with the residual variance (I only figured this out when I went to run the model in lme4
and it complained: lme
gave me an answer, but (as we shall see) it didn't necessarily make sense …
library(nlme)
lme1 <- lme(stmass~mnth*roostsitu,
random=~1|subject/mnth,data=dataf)
The variance components apparently make sense: look at VarCorr(lme1)
. However, if we try to get the approximate confidence intervals via intervals(lme1,which="var-cov")
we get an error: cannot get confidence intervals on var-cov components: Non-positive definite approximate variance-covariance
. This is a bad sign.
lme2 <- lme(stmass~mnth*roostsitu,random=~1|subject,data=dataf)
We can now get estimates, although the subject-level random effects are very uncertain: see intervals(lme2,which="var-cov")
(we may investigate this further in a future lab).
summary(lme2)
and try to interpret all the pieces.Diagnostic plots: fitted vs. residuals, coloured according to the roostsitu
variable:
plot(lme2,col=dataf$roostsitu)
Q-Q plot (a straight line indicates normality)
qqnorm(lme2,col=dataf$roostsitu)
Boxplots of residuals subdivided by roostsitu
(it's a quirk that you have to put the grouping variable on the left side of the formula here):
plot(lme2,roostsitu~resid(.))
One good way to assess the results of a model fit is to look at a coefficient plot:
library(coefplot2)
coefplot2(lme2)
Stop and explain to yourself what these parameters mean. If you're not sure, try resetting the base level of the roostsitu
factor: dataf2 <- transform(dataf,roostsitu=relevel(roostsitu,ref="other"))
, predict what will happen to the results, and re-run the analysis.)
Exercise: This is actually a slightly trivial example, because there are only two measurements for each individual: thus we can actually get the same answers using a \( t \) test (as long as we only care about the changes within subjects).
Rearrange the data to git differences by subject:
library(plyr)
dataf2 <- ddply(dataf,.(subject), function(x) {
with(x,data.frame(roostsitu,massdiff=diff(stmass)))})
Draw a picture:
ggplot(dataf2,aes(x=roostsitu,y=massdiff))+geom_boxplot()+
geom_dotplot(binaxis="y",stackdir="center",fill="red",alpha=0.5)
Analyze the data with lm
and convince yourself that the estimates (fixed-effect coefficients, residual variance, etc.) are equivalent to those found from the previous analysis.
lme4
The lmer
syntax is almost identical, except that the random effects (in parentheses) are added to the formula rather than being expressed separately:
detach("package:nlme",unload=TRUE) ## nlme and lme4 don't like each other
## Warning: 'nlme' namespace cannot be unloaded: namespace 'nlme' is imported
## by 'lme4.0' so cannot be unloaded
library(lme4)
## Warning: the specification for S3 class "family" in package 'lme4' seems
## equivalent to one from package 'lme4.0' and is not turning on duplicate
## class definitions for this class
## Warning: the specification for class "lmList" in package 'lme4' seems
## equivalent to one from package 'lme4.0' and is not turning on duplicate
## class definitions for this class
lmer1 <- lmer(stmass~mnth*roostsitu+(1|subject),data=dataf)
Compare the results (you can use coefplot2(list(lmer1,lme2))
to compare the fixed effects graphically).
See what other options you have for exploring the results.
MCMCglmm
library(MCMCglmm)
mcmcglmm1 <- MCMCglmm(stmass~mnth*roostsitu,
random=~subject,data=dataf,
verbose=FALSE)
(you can leave verbose
at its default value of TRUE
).
summary()
: printing out the a raw MCMCglmm
model is ugly).For MCMC approaches, it is your responsibility to check that the chain(s) are well-behaved.
Try this:
library(coda)
xyplot(as.mcmc(mcmcglmm1$Sol))
JAGS
/R2jags
jagsmodel <- function() {
for (i in 1:ntot) {
eta[i] <- inprod(X[i,],beta) ## fixed effects
eta2[i] <- eta[i] + u1[subject[i]] ## add random effect
stmass[i] ~ dnorm(eta2[i],tau.res)
}
for (i in 1:nindiv) {
u1[i] ~ dnorm(0,tau.indiv)
}
## priors
for (i in 1:ncoef) {
beta[i] ~ dnorm(0,0.001)
}
## traditional but sometimes dangerous -- discuss in class
tau.indiv ~ dgamma(0.01,0.01)
tau.res ~ dgamma(0.01,0.01)
## for convenience, translate precisions to std devs
sd.indiv <- pow(tau.indiv,-0.5)
sd.res <- pow(tau.res,-0.5)
}
source("writeModel.R")
write.model(jagsmodel,"starling.bug")
JAGS machinery: specify data list, starting values, run model:
library(R2jags)
## Warning: the specification for S3 class "bugs" in package 'R2jags' seems
## equivalent to one from package 'coefplot2' and is not turning on duplicate
## class definitions for this class
modelMat <- model.matrix(~mnth * roostsitu,data=dataf)
jagsData <- list(X=modelMat,
stmass=dataf$stmass,
subject=as.numeric(dataf$subject),
ntot=nrow(dataf),ncoef=ncol(modelMat),
nindiv=length(unique(dataf$subject)))
jagsInits <- list(list(beta=rep(0,8),tau.indiv=0.1,tau.res=0.1))
jags1 <- jags(data=jagsData,
inits=jagsInits,
model="starling.bug",
parameters=c("beta","sd.indiv","sd.res"),n.chains=1)
Notes:
y <- a + b*x
, but for more complex models it rapidly becomes worthwhile to construct a model matrix in R and pass it to JAGS
, so that all you need is to multiply (inprod
essentially does a single row of the \( X \beta \) calculation at a time)ncoef
to 8 even if you know that's what it is for this particular example) – instead, use the dimensions of your data to set themwrite.model
is a handy function from R2WinBUGS
– since we don't need the rest of it, I copied the code and made it availableIn order to do more fun stuff with the results, it's a good idea to (1) convert to an mcmc
object (for coda
quantitative and graphical diagnostics) and (2) rename the beta
variables more meaningfully. JAGS arranges output variables alphabetically: in this case we know the beta
values are the first 8 columns of the output, but we will again try to avoid hard-coding:
jagsmm1 <- as.mcmc(jags1)
betacols <- grep("^beta",colnames(jagsmm1)) ## find beta columns
colnames(jagsmm1)[betacols] <- colnames(modelMat)
Trace plots, density plots, and coefficient plots:
xyplot(jagsmm1) ## trace plot: default single-column layout
xyplot(jagsmm1,layout=c(3,4),asp="fill")
densityplot(jagsmm1,layout=c(3,4),asp="fill")
coefplot2(jags1) ## coefficient plot
Run the Raftery-Lewis diagnostic (suitable for a single chain):
raftery.diag(jagsmm1)
##
## Quantile (q) = 0.025
## Accuracy (r) = +/- 0.005
## Probability (s) = 0.95
##
## You need a sample size of at least 3746 with these values of q, r and s
Exercise:
list(list(beta=rep(0,8),tau.indiv=0.1,tau.res=0.1), list(beta=rep(1,8),tau.indiv=0.1,tau.res=0.1),beta=rep(0,8),tau.indiv=1,tau.res=1)),n.chains=3,...
; examine the output; run gelman.diag
from the coda
package on the mcmc
results (may need lump.mcmc.list
from the emdbook
package … ?)culcdat <- read.csv("culcitalogreg.csv",
colClasses=c(rep("factor",2),
"numeric",
rep("factor",6)))
## abbreviate slightly
levels(culcdat$ttt.1) <- c("none","crabs","shrimp","both")
Adjust contrasts for the treatment, to compare (1) no-symbiont vs symbiont cases, (2) crabs vs shrimp, (3) effects of a single pair/type of symbionts vs effects of having both:
contrasts(culcdat$ttt) <-
matrix(c(3,-1,-1,-1,
0,1,-1,0,
0,1,1,-2),
nrow=4,
dimnames=list(c("none","C","S","CS"),
c("symb","C.vs.S","twosymb")))
library(lme4)
library(MASS)
culcmod0 <- glmmPQL(predation~ttt,random=~1|block,family=binomial,data=culcdat,
verbose=FALSE)
culcmod1 <- glmer(predation~ttt+(1|block),family=binomial,data=culcdat)
culcmod2 <- glmer(predation~ttt+(1|block),family=binomial,data=culcdat,nAGQ=8)
coefplot2(list(glmmPQL=culcmod0,Laplace=culcmod1,
GHQ8=culcmod2),col=c(1,2,4),legend.x="right")
Try it with glmmADMB
and MCMCglmm
:
library(MCMCglmm)
library(glmmADMB)
## Warning: the specification for S3 class "glmmadmb" in package 'glmmADMB'
## seems equivalent to one from package 'coefplot2' and is not turning on
## duplicate class definitions for this class
culcmod3 <- glmmadmb(predation~ttt,random=~1|block,family="binomial",
data=culcdat)
culcdat$nopred <- 1-culcdat$predation
culcmod4 <- MCMCglmm(cbind(predation,nopred)~ttt,random=~block,family="multinomial2",
data=culcdat,verbose=FALSE)
Check out the results. MCMCglmm
doesn't seem to be doing well: need stronger priors?
culcmodel <- function() {
for (i in 1:ncoef) {
beta[i] ~ dnorm(0,0.001) ## fixed-effect parameters: priors
}
sd.b ~ dunif(0,maxsdprior) ## prior for block variance
tau.b <- 1/sd.b^2
for (i in 1:nblock) {
u[i] ~ dnorm(0,tau.b) ## priors for block random effects
}
for (i in 1:nobs) {
## linear predictor: design matrix*coeffs + random effects
eta[i] <- inprod(X[i,],beta)+u[block[i]]
p[i] <- 1/(1+exp(-eta[i])) ## convert to probabilities
obs[i] ~ dbern(p[i]) ## Bernoulli response
}
}
write.model(culcmodel,"culcita.bug")
cModelMat <- model.matrix(~ttt,data=culcdat)
cJagsDat <- list(nblock=length(unique(culcdat$block)),
ncoef=ncol(cModelMat),
nobs=nrow(culcdat),
maxsdprior=5,
obs=culcdat$predation,
block=culcdat$block,
X=cModelMat)
cJagsInit <- with(cJagsDat,list(list(beta=rep(0,ncoef),
sd.b=1,u=rep(0,nblock))))
load.module("glm")
cjags <- jags(data=cJagsDat,
inits=cJagsInit,
n.chains=1,
model.file="culcita.bug",
parameters=c("beta","sd.b"))
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph Size: 790
##
## Initializing model
If time permits, check out the Contraception
data set from the mlmRev
package …