% Ben Bolker
% Tue Nov 6 11:04:55 2012
Licensed under the Creative Commons attribution-noncommercial license. Please share & remix noncommercially, mentioning its origin.
Packages/versions used:
## coefplot2 glmmADMB lme4 MCMCglmm nlme pbkrtest
## 0.1.3.2 0.7.2.12 0.999999-0 2.16 3.1-105 0.3-2
## R2jags
## 0.03-08
lme4, available from R-forge, has additional capabilities, such as predict() methods; at this writing (2012-11-06) it is somewhat unstable for GLMMs, but hopefully that will be fixed soon. coefplot2 should be available from R-forge via install.packages("coefplot2",repos="http://r-forge.r-project.org"); if that fails because R-forge is having a bad day, try install.packages("coefplot2",repos="http://www.math.mcmaster.ca/bolker/R",type="source"), after making sure that all the dependencies are installed (coda, lme4, reshape).
load("starling.RData")
Fit the lme model:
library(nlme)
lme2 <- lme(stmass~mnth*roostsitu,random=~1|subject,data=dataf)
Wald tests:
printCoefmat(summary(lme2)$tTable,digits=3,has.Pval=TRUE)
## Value Std.Error DF t-value p-value
## (Intercept) 83.60 1.33 36.00 62.85 < 2e-16 ***
## mnthJan 7.20 1.86 36.00 3.87 0.00045 ***
## roostsitunest-box -4.20 1.88 36.00 -2.23 0.03188 *
## roostsituinside -5.00 1.88 36.00 -2.66 0.01165 *
## roostsituother -8.20 1.88 36.00 -4.36 0.00010 ***
## mnthJan:roostsitunest-box 3.60 2.63 36.00 1.37 0.18024
## mnthJan:roostsituinside 2.40 2.63 36.00 0.91 0.36834
## mnthJan:roostsituother 1.60 2.63 36.00 0.61 0.54743
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(You can see this in context with summary(lme2); I used the more complicated command here to isolate just the coefficient matrix.)
We conclude that the interactions are not doing much,
but there's definitely an effect of the roosts.
This agrees with the picture from coefplot2:
library(coefplot2)
coefplot2(lme2)
However, we probably want to test the overall effect of the interactions, not the individual levels. Here are the sequential (type I) ANOVA results:
anova(lme2)
## numDF denDF F-value p-value
## (Intercept) 1 36 31144 <.0001
## mnth 1 36 95 <.0001
## roostsitu 3 36 11 <.0001
## mnth:roostsitu 3 36 1 0.5838
Because of the design of this particular study, the denominator degrees of freedom (the denDF column) are identical for all effects.
If we want to evaluate the marginal sums of squares, i.e. dropping one term at a time from the model, we usually want to change the model to use sum-to-zero contrasts:
lme2B <- update(lme2,
contrasts=list(mnth="contr.sum",roostsitu="contr.sum"))
The alternative approach is to use options(contrasts=c("contr.sum","contr.poly")) and then refit the model, but I prefer to use the contrasts argument because it is more explicit.
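For concreteness, here is what sum-to-zero contrasts look like for a four-level factor such as roostsitu; this is just an illustration using base R's contr.sum (each column sums to zero, so in a balanced design the intercept becomes the grand mean):
contr.sum(4)  ## contrast matrix for a 4-level factor; columns sum to zero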
Type III (marginal) results:
anova(lme2B,type="marginal")
## numDF denDF F-value p-value
## (Intercept) 1 36 31144 <.0001
## mnth 1 36 95 <.0001
## roostsitu 3 36 11 <.0001
## mnth:roostsitu 3 36 1 0.5838
In this case the results are identical because the original design is balanced (hence orthogonal). This will not be true if (1) the data are unbalanced (often the case for ANOVA [categorical-predictor] designs, and almost always for regression designs) or (2) the model is a GLMM or a nonlinear model.
The explicit model-comparison approach uses a likelihood ratio test rather than an \( F \) test (i.e., it does not correct for the fact that the denominator sum of squares is estimated with error). In this case it hardly matters.
lme2C <- update(lme2B,method="ML")
lme2D <- update(lme2C,. ~ . - mnth:roostsitu)
anova(lme2C,lme2D)
## Model df AIC BIC logLik Test L.Ratio p-value
## lme2C 1 10 468.4 492.3 -224.2
## lme2D 2 7 464.6 481.3 -225.3 1 vs 2 2.134 0.545
If we now want to use the model-comparison approach on the reduced (no-interaction) model to test the significance of roostsitu, we can use update to drop the roostsitu effect, but we also have to make sure to update the contrasts argument so that it only refers to predictors that remain in the reduced model (otherwise we get an error).
lme2E <- update(lme2D,.~.-roostsitu,
contrasts=list(mnth="contr.sum"))
anova(lme2D,lme2E)
## Model df AIC BIC logLik Test L.Ratio p-value
## lme2D 1 7 464.6 481.3 -225.3
## lme2E 2 4 483.9 493.5 -238.0 1 vs 2 25.34 <.0001
If we want to test the random effect, we would in principle remove the random effect and test with anova, but this is a bit problematic here: lme can't fit a model without any random effects. Let's try this with gls instead:
gls1 <- gls(stmass~mnth*roostsitu,
data=dataf,method="ML")
(a1 <- anova(lme2C,gls1))
## Model df AIC BIC logLik Test L.Ratio p-value
## lme2C 1 10 468.4 492.3 -224.2
## gls1 2 9 466.5 487.9 -224.2 1 vs 2 0.01516 0.902
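One caveat: the null hypothesis puts the variance on the boundary of its feasible space, so the naive LRT p-value is conservative; a common quick correction is to halve it. A minimal sketch, using the anova table stored above:
## halve the chi-squared(1) p-value to allow for testing a
## variance component on the boundary of its parameter space
0.5*pchisq(a1$L.Ratio[2],df=1,lower.tail=FALSE)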
This seems reasonable: let's see if we can confirm this result with RLRsim.
library(RLRsim)
exactRLRT(m=lme2B)
##
## simulated finite sample distribution of RLRT. (p-value based on
## 10000 simulated values)
##
## data:
## RLRT = 0, p-value = 1
(This reports a \( p \) value of exactly 1: this is “consistent with” the \( p \) value of 0.902 above, but the degenerate value worries me a little.)
It is often a good idea to detach("package:nlme") before loading lme4 (the packages sometimes conflict), but in this case we won't, in part because coefplot2 depends on both.
library(lme4)
lmer1 <- lmer(stmass~mnth*roostsitu+(1|subject),data=dataf)
For lmer we get the same ANOVA table, but (1) without the Intercept term included, (2) with sum-of-squares and mean-squares columns included, (3) without denominator df or \( p \) values, and (4) with slightly different precision (one more significant figure):
(a2 <- anova(lmer1))
## Analysis of Variance Table
## Df Sum Sq Mean Sq F value
## mnth 1 1656 1656 95.46
## roostsitu 3 552 184 10.61
## mnth:roostsitu 3 34 11 0.66
We have to deduce the number of degrees of freedom from standard rules:
with(dataf,table(mnth,roostsitu))
## roostsitu
## mnth tree nest-box inside other
## Nov 10 10 10 10
## Jan 10 10 10 10
If we think about this in terms of the paired \( t \)-test, there are 40 paired comparisons (one per subject) and 4 estimated parameters (one for each roost site), leaving 40 - 4 = 36 df.
If you wanted to compute the \( p \)-values by hand, you could:
fvals <- a2[,"F value"]
numdf <- a2[,"Df"]
dendf <- 36
pf(fvals,numdf,dendf,lower.tail=FALSE)
## [1] 1.152e-11 3.833e-05 5.838e-01
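Rather than hardcoding 36, you could also derive the denominator df from the design; a small sketch, using the subject and roostsitu columns from above:
## denominator df = number of subjects minus the number of
## between-subject (roost-site) parameters: 40 - 4 = 36
nsubj <- length(unique(dataf$subject))
nsubj - length(levels(dataf$roostsitu))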
Alternatively, we can try a parametric bootstrap: the pbkrtest package can do this, or we can just set it up by hand:
lmer3 <- lmer(stmass~mnth*roostsitu+(1|subject),data=dataf)
lmer4 <- lmer(stmass~mnth+roostsitu+(1|subject),data=dataf)
pboot <- function(m0,m1) {
  ## simulate a response vector from the null (reduced) model
  s <- simulate(m0)
  ## refit both models to the simulated response and return
  ## the likelihood ratio test statistic
  L0 <- logLik(refit(m0,s[[1]]))
  L1 <- logLik(refit(m1,s[[1]]))
  2*(L1-L0)
}
pboot(lmer4,lmer3)
## 'log Lik.' 20.22 (df=10)
## replicate() would also work, but rlply from the plyr package
## can show a progress bar (add .progress="text")
library(plyr)
boothist <- rlply(1000,pboot(lmer4,lmer3))
boothist <- unlist(boothist)
hist(boothist,breaks=50,col="gray")
obsval <- 2*(logLik(lmer3)-logLik(lmer4))
abline(v=obsval,col=2,lwd=2)
mean(boothist>obsval)
## [1] 0.582
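For comparison, here is the classical \( \chi^2 \) approximation to the same likelihood ratio test (3 df for the three interaction coefficients); with a balanced design like this one we would expect it to be close to the bootstrap result:
## classical chi-squared approximation to the LRT p-value
pchisq(as.numeric(obsval),df=3,lower.tail=FALSE)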
library(pbkrtest)
KRmodcomp(lmer3,lmer4)
## F-test with Kenward-Roger approximation; computing time: 0.50 sec.
## large : stmass ~ mnth * roostsitu + (1 | subject)
## small : stmass ~ mnth + roostsitu + (1 | subject)
## stat ddf F.scaling p.value
## Ftest 0.66 36.000 1 0.58
## FtestU 0.66 36.000 0.58
In this case, the Kenward-Roger correction appropriately does nothing different: we have a classical balanced design, so no correction is actually necessary. But it does give a denominator df and a \( p \) value for this lmer model, which is handy …
## Computing predicted values and superimposing them on data plots
In lme, predict has a level argument that specifies which levels of the random effects should be included (0 = none, i.e. population-level predictions; 1 = predictions at the subject level; more complex models might have additional nested levels). The stable version of lmer doesn't have a predict method: you can try the development version, or look at the prediction code on the GLMM FAQ.
library(ggplot2)
library(grid)
theme_set(theme_bw())
## squash panels together (need grid package loaded)
zmargin <- theme(panel.margin=unit(0,"lines"))
dataf$pred <- predict(lme2,level=0) ## population level
dataf$pred1 <- predict(lme2,level=1) ## individual level
g0 <- ggplot(dataf,aes(mnth,stmass))+
geom_point()+
geom_line(aes(group=subject))+
facet_grid(.~roostsitu)+
zmargin
g0 + geom_line(colour="gray",aes(y=pred1,group=subject)) +
geom_line(colour="red",aes(y=pred,group=subject))
There is so much shrinkage (the among-individual variance is very small) that we can barely see the individual-level predictions (gray lines) behind the population-level predictions (red lines).
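To see why the shrinkage is so strong, it helps to look at the estimated variance components; the among-subject standard deviation is tiny relative to the residual standard deviation:
VarCorr(lme2)  ## among-subject vs. residual variance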
Unfortunately, computing confidence intervals for the predictions is a little tricky: again, there is some code on the GLMM FAQ for this (also see below under MCMCglmm).
ggplot(dataf,aes(mnth,pred1))+
geom_line(aes(group=subject,x=as.numeric(mnth)),colour="gray")+
facet_wrap(~roostsitu,scale="free_y",nrow=1)+
geom_line(aes(y=pred,x=as.numeric(mnth)),colour="red")
For most cases you will want to set up a new data frame for prediction rather than just using the covariates from the original data (e.g. if the data are sampled irregularly, or sparsely), and use the newdata argument of predict. The expand.grid function is handy in this context too.
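For example, a minimal sketch (reusing the factor levels from the original data):
## build a balanced prediction grid and compute population-level predictions
newdat <- expand.grid(mnth=levels(dataf$mnth),
                      roostsitu=levels(dataf$roostsitu))
newdat$pred <- predict(lme2,newdata=newdat,level=0)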
## MCMCglmm
We use pr=TRUE to store the random effects (which are sometimes quite large, hence not stored by default); we use verbose=FALSE to turn off the progress messages, which would be ugly in this document but are generally useful.
library(MCMCglmm)
mcmcglmm1 <- MCMCglmm(stmass~mnth*roostsitu,
random=~subject,data=dataf,
verbose=FALSE,pr=TRUE)
library(coda)
summary(mcmcglmm1)
##
## Iterations = 3001:12991
## Thinning interval = 10
## Sample size = 1000
##
## DIC: 466.8
##
## G-structure: ~subject
##
## post.mean l-95% CI u-95% CI eff.samp
## subject 2.37e-08 1.87e-16 5.88e-08 147
##
## R-structure: ~units
##
## post.mean l-95% CI u-95% CI eff.samp
## units 18.2 13.1 24.5 1000
##
## Location effects: stmass ~ mnth * roostsitu
##
## post.mean l-95% CI u-95% CI eff.samp pMCMC
## (Intercept) 83.572 81.130 86.557 1257 <0.001 ***
## mnthJan 7.229 3.584 10.943 1497 <0.001 ***
## roostsitunest-box -4.175 -7.749 -0.355 1201 0.034 *
## roostsituinside -4.981 -8.870 -1.340 1000 0.006 **
## roostsituother -8.125 -11.819 -4.518 1124 <0.001 ***
## mnthJan:roostsitunest-box 3.582 -1.755 8.797 1299 0.176
## mnthJan:roostsituinside 2.405 -2.867 7.741 1172 0.366
## mnthJan:roostsituother 1.574 -3.567 7.031 1195 0.528
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
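The subject variance is essentially zero and its effective sample size is fairly low, so it is worth inspecting the chains for the variance components; plotting the $VCV component gives trace and density plots:
## trace and density plots for the variance components
plot(mcmcglmm1$VCV)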
MCMCglmm makes it a little bit easier to get confidence intervals on the predictions. The marginal argument specifies which random effects we want the predictions to “marginalise” (i.e., average over). The default is to use all of the random effects in the original model (i.e. ~subject in this case), i.e. to predict the average population-level outcome. The approach taken below to get the subject-level predictions (i.e. marginalise over nothing, since subject is the only random effect) is a bit of a hack: this may be easier in future versions.
mpred0 <- predict(mcmcglmm1,interval="confidence")
colnames(mpred0) <- paste("mpred0",c("fit","lwr","upr"),sep=".")
mpred1 <- predict(mcmcglmm1,interval="confidence",
marginal=c("","")) ## HACK
colnames(mpred1) <- paste("mpred1",c("fit","lwr","upr"),sep=".")
dataf <- data.frame(dataf,mpred0,mpred1)
Testing that we get the same results for the level-1 predictions:
ggplot(dataf,aes(x=pred1,y=mpred1.fit))+geom_point()+
geom_abline(slope=1,intercept=0)+
labs(x="lme prediction",y="MCMCglmm prediction")
Now we can plot the confidence intervals:
g0 <- ggplot(dataf,aes(x=mnth,y=stmass))+
stat_sum(aes(size=factor(..n..)))+
facet_grid(~roostsitu)+
scale_size_discrete(name="n",range=c(2,5))+
zmargin
g0 + geom_line(aes(x=as.numeric(mnth),y=mpred0.fit),colour="red")+
geom_ribbon(aes(x=as.numeric(mnth),y=mpred0.fit,
ymin=mpred0.lwr,ymax=mpred0.upr),fill="red",
alpha=0.3)
We can plot individual predictions and confidence intervals as well, although in this case the plot is almost identical:
g0 + geom_line(aes(x=as.numeric(mnth),y=mpred1.fit,group=subject),
colour="blue")+
geom_ribbon(aes(x=as.numeric(mnth),y=mpred1.fit,group=subject,
ymin=mpred1.lwr,ymax=mpred1.upr),fill="blue",
alpha=0.05)
## JAGS/R2jags
jagsmodel <- function() {
for (i in 1:ntot) {
eta[i] <- inprod(X[i,],beta) ## fixed effects
eta2[i] <- eta[i] + u1[subject[i]] ## add random effect
stmass[i] ~ dnorm(eta2[i],tau.res)
}
for (i in 1:nindiv) {
u1[i] ~ dnorm(0,tau.indiv)
}
## priors
for (i in 1:ncoef) {
beta[i] ~ dnorm(0,0.001)
}
## traditional but sometimes dangerous -- discuss in class
tau.indiv ~ dgamma(0.01,0.01)
tau.res ~ dgamma(0.01,0.01)
## for convenience, translate precisions to std devs
sd.indiv <- pow(tau.indiv,-0.5)
sd.res <- pow(tau.res,-0.5)
}
source("writeModel.R")
write.model(jagsmodel,"starling.bug")
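(If you don't have the local writeModel.R file, an equivalent write.model function is, as far as I know, provided by the R2WinBUGS package, on which R2jags depends:)
## alternative, assuming R2WinBUGS is installed:
## R2WinBUGS::write.model(jagsmodel,"starling.bug")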
JAGS machinery: specify data list, starting values, run model:
library(R2jags)
## Warning: the specification for S3 class "bugs" in package 'R2jags' seems
## equivalent to one from package 'coefplot2' and is not turning on duplicate
## class definitions for this class
modelMat <- model.matrix(~mnth * roostsitu,data=dataf)
jagsData <- list(X=modelMat,
stmass=dataf$stmass,
subject=as.numeric(dataf$subject),
ntot=nrow(dataf),ncoef=ncol(modelMat),
nindiv=length(unique(dataf$subject)))
jagsInits <- list(list(beta=rep(0,8),tau.indiv=0.1,tau.res=0.1))
jags1 <- jags(data=jagsData,
inits=jagsInits,
model="starling.bug",
parameters=c("beta","sd.indiv","sd.res"),n.chains=1)
jagsmm1 <- as.mcmc(jags1)
betacols <- grep("^beta",colnames(jagsmm1)) ## find beta columns
colnames(jagsmm1)[betacols] <- colnames(modelMat)
Look at the summary:
summary(jagsmm1)
##
## Iterations = 1:1000
## Thinning interval = 1
## Number of chains = 1
## Sample size per chain = 1000
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## (Intercept) 83.40 1.629 0.0515 0.0507
## mnthJan 7.41 1.881 0.0595 0.0500
## roostsitunest-box -3.89 2.179 0.0689 0.0661
## roostsituinside -4.79 2.364 0.0747 0.0837
## roostsituother -7.94 2.337 0.0739 0.0738
## mnthJan:roostsitunest-box 3.23 2.909 0.0920 0.0835
## mnthJan:roostsituinside 2.17 2.856 0.0903 0.0872
## mnthJan:roostsituother 1.32 2.809 0.0888 0.0758
## deviance 457.14 16.948 0.5359 0.5291
## sd.indiv 0.47 0.462 0.0146 0.0143
## sd.res 4.33 2.692 0.0851 0.0828
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## (Intercept) 80.7992 82.564 83.498 84.300 85.934
## mnthJan 4.0037 6.205 7.390 8.617 10.989
## roostsitunest-box -7.7195 -5.160 -3.918 -2.725 -0.173
## roostsituinside -8.3498 -6.045 -4.863 -3.459 -1.019
## roostsituother -11.8071 -9.349 -7.996 -6.671 -4.109
## mnthJan:roostsitunest-box -2.0426 1.453 3.250 5.158 8.441
## mnthJan:roostsituinside -3.2824 0.488 2.257 3.950 7.095
## mnthJan:roostsituother -4.2925 -0.490 1.306 3.133 6.738
## deviance 441.8382 453.260 456.152 459.959 468.565
## sd.indiv 0.0648 0.137 0.273 0.632 1.650
## sd.res 3.5201 3.956 4.181 4.451 5.080
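As a quick sanity check, we can put the posterior means of the fixed effects side by side with the lme estimates; with these weak priors they should be similar (a sketch):
## posterior means vs. lme fixed-effect estimates
cbind(JAGS=summary(jagsmm1)$statistics[colnames(modelMat),"Mean"],
      lme=nlme::fixef(lme2))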
## GLMMs: Culcita example
Moving on to GLMMs: read in the Culcita data set (binary predation outcomes for corals protected by different combinations of symbionts, arranged in blocks):
culcdat <- read.csv("culcitalogreg.csv",
                    colClasses=c(rep("factor",2),
                                 "numeric",
                                 rep("factor",6)))
## abbreviate slightly
levels(culcdat$ttt.1) <- c("none","crabs","shrimp","both")
Adjust the contrasts for the treatment in order to compare (1) no-symbiont vs. symbiont cases, (2) crabs vs. shrimp, and (3) the effects of a single pair/type of symbionts vs. the effects of having both:
contrasts(culcdat$ttt) <-
matrix(c(3,-1,-1,-1,
0,1,-1,0,
0,1,1,-2),
nrow=4,
dimnames=list(c("none","C","S","CS"),
c("symb","C.vs.S","twosymb")))
library(lme4)
library(MASS)
culcmod0 <- glmmPQL(predation~ttt,random=~1|block,family=binomial,data=culcdat,
verbose=FALSE)
culcmod1 <- glmer(predation~ttt+(1|block),family=binomial,data=culcdat)
culcmod2 <- glmer(predation~ttt+(1|block),family=binomial,data=culcdat,nAGQ=8)
coefplot2(list(glmmPQL=culcmod0,Laplace=culcmod1,
GHQ8=culcmod2),col=c(1,2,4),legend.x="right")
Try it with glmmADMB and MCMCglmm:
library(MCMCglmm)
library(glmmADMB)
## Warning: the specification for S3 class "glmmadmb" in package 'glmmADMB'
## seems equivalent to one from package 'coefplot2' and is not turning on
## duplicate class definitions for this class
culcmod3 <- glmmadmb(predation~ttt,random=~1|block,family="binomial",
data=culcdat)
culcdat$nopred <- 1-culcdat$predation
culcmod4 <- MCMCglmm(cbind(predation,nopred)~ttt,random=~block,family="multinomial2",
data=culcdat,verbose=FALSE)
Check out the results. MCMCglmm doesn't seem to be doing well: it may need stronger priors. Finally, we can write the same model in JAGS:
culcmodel <- function() {
for (i in 1:ncoef) {
beta[i] ~ dnorm(0,0.001) ## fixed-effect parameters: priors
}
sd.b ~ dunif(0,maxsdprior) ## prior for block variance
tau.b <- 1/sd.b^2
for (i in 1:nblock) {
u[i] ~ dnorm(0,tau.b) ## priors for block random effects
}
for (i in 1:nobs) {
## linear predictor: design matrix*coeffs + random effects
eta[i] <- inprod(X[i,],beta)+u[block[i]]
p[i] <- 1/(1+exp(-eta[i])) ## convert to probabilities
obs[i] ~ dbern(p[i]) ## Bernoulli response
}
}
write.model(culcmodel,"culcita.bug")
cModelMat <- model.matrix(~ttt,data=culcdat)
cJagsDat <- list(nblock=length(unique(culcdat$block)),
ncoef=ncol(cModelMat),
nobs=nrow(culcdat),
maxsdprior=5,
obs=culcdat$predation,
block=culcdat$block,
X=cModelMat)
cJagsInit <- with(cJagsDat,list(list(beta=rep(0,ncoef),
sd.b=1,u=rep(0,nblock))))
load.module("glm")
cjags <- jags(data=cJagsDat,
inits=cJagsInit,
n.chains=1,
model.file="culcita.bug",
parameters=c("beta","sd.b"))
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph Size: 790
##
## Initializing model
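As before, we could convert the fit to a coda object and relabel the coefficient columns before summarizing; a sketch following the starling workflow above:
cjagsmm <- as.mcmc(cjags)
cbetacols <- grep("^beta",colnames(cjagsmm))
colnames(cjagsmm)[cbetacols] <- colnames(cModelMat)
summary(cjagsmm)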
If time permits, check out the Contraception data set from the mlmRev package …