\documentclass{article} \usepackage[dvips]{graphicx,color} \usepackage{amsmath} \usepackage{epsfig} \usepackage{geometry} \geometry{letterpaper, left=1in, bottom=1in, right=1in, top=1in, foot=0.5in, head=0.5in} \usepackage{Sweave} \title{a spatial lme example} \author{Ottar Bj{\o}rnstad} \date{\today} \begin{document} \maketitle \section*{spatial dependence} The data contained in {\bf habsel.txt} represents the number of animals captured (\$captures) in 71 permanent traps located at co-ordinates \$x and \$y on a small Norwegian island. Also reported is various habitat variables in the viscinity of each trap. There is a bunch of things measured but we will focus on \$lichen (the cover of lichen), \$veg (the total vegetation cover, \$heather (the cover of heather), \$moss (the cover of moss) and \$stdp (a measure of structural complexity). <>= habsel=read.table("habsel.txt", header=TRUE) symbols(x=habsel$x, y=habsel$y, circles=habsel$captures, inches=.1) @ <<>>= fit1=lm(log(captures+1)~ lichen+veg+heather+moss+stdp, data=habsel) summary(fit1) @ So it looks like there is some abundance-environment association. HOwever, the issue with these types of observational data is that there may be strong spatial autocorrelation that may invalidate a simple regression of abundance against environment. We can do a quick look at this correlation in residuals from an 'independence' model. (the log-transformation is to make the capture variable a bit more Gaussian-looking): <>= require(ncf) cres=spline.correlog(x=habsel$x, y=habsel$y, z=resid(fit1), resamp=0) plot(cres) @ So there seems to be some spatial autocorrelation. To do the correct model taking into account the autocorrelation, we use corr-argument available in mle. First, however, we need to make sure mle understands that all the data belongs to a single dependence group. And then we fit the model assuming exponential spatial dependence: -- OTTAR TALK ABOUT PARAMETRIC CORRELATION FUNCTIONS HERE <<>>= require(nlme) habsel$grp=rep(1, dim(habsel)[1]) fit2=lme(log(captures+1)~ lichen+veg+heather+moss+stdp, data=habsel, random = ~1|grp, corr=corSpatial(form=~x+y, type="exponential", nugget=F), method="ML") summary(fit2) @ Excersize: interpret the results! What does 'range' mean? We can test for significant spatial correlation by comparing this model with the mle-model without spatial dependence: <<>>= fit3=lme(log(captures+1)~ lichen+veg+heather+moss+stdp, data=habsel, random = ~1|grp, method="ML") anova(fit2, fit3) @ Excersize: Fit a new model with 'Gaussian' (not exponential) spatial autorrelation. Which model better fits the data? \section*{Some other types of dependence} \subsection*{random blocks: maternal effects} The data contained in {\bf mouse.txt} has the following variables (this is a slightly expanded version of the data we looked at in the Bayes class). <<>>= mouse<-read.table("mouse.txt", sep="\t", header=T) names(mouse) @ data\$wtX is the weight of offspring at the age of X days; \$sex is 1 = male, 2 = female; \$code is the indiviudal tag, \$ind is the individual number of the 95 individuals; \$cage is the the identifier of the litter (as well as the identity of the mother); \$mweight is the weight of the mother; and \$lsize is the litter size. The biological questions are: (1) Do males and females differ with respect to their \textit{average} growth. (2) Is there a maternal effect? Do individuals from the same litter tend to be similar? (3) Do sex effect differ by litter. let's remove any individual with missing data: <<>>= mouse=na.omit(mouse) @ We might first take an 'empirical' tack on the question (this is not necessary for the lme modelling, but it is interesting). Lets fit a model for the weight on day 11, ignoring any interdependence between littermates. Then look whether there is any signature of dependence in the residuals. First fit the model: <<>>= fit=lm(wt11~as.factor(sex)+mweight+lsize, data=mouse) @ To look at autocorrelation recall the definition of correlation: $cor(x,y)=(x-\mu_x) (y-\mu_y)/\sigma_x \sigma_y$. The following bit of R-code will will calculate the autocorrelation-matrix among all residuals: <<>>= scres=(resid(fit)-mean(resid(fit)))/sd(resid(fit)) rcor=outer(scres,scres) @ And here is one way to flag all littermates (with a 1, and all non-littermates or 'self-comparison' with 0): <<>>= mat=outer(mouse$cage, mouse$cage, "-") mat[mat!=0]=-1 mat=mat+1 diag(mat)=0 @ With this it is easy to calculate the within-litter autocorrelation: <<>>= mean(rcor[mat==1]) @ There is clearly a substantial interdependence. Now lets model the interdependence specifically (and using weight at day 5 as a covariate. To do this we use the \textbf{lme}-function in the \textbf{nlme}-package: <<>>= library(nlme) fit=lme(wt11~as.factor(sex)+wt5+mweight+lsize, random=~1|cage, data=mouse) @ The above is the random-intercept model. \\ Excersize: use summary() to look at the fitted model. What is your interpretation? Advanced Q: What's up with the degrees-of-freedom for the fixed effects (hint: think of split-plot vs randomized block design)? Usually we may want to look at whether there is random variability in other parameters. For instance, we may be interestedin asking: Do mothers who have unusually large female offspring also tend to have unusually large offspring for the litter size? To do this (and make sure that R manages to converge on good values), we have to change some control-parameters in the model: <<>>= fit=lme(wt11~as.factor(sex)+wt5+lsize, random=~as.factor(sex)+lsize|cage, data=mouse, control=lmeControl(maxIter=500,msMaxIter=500, opt="optim")) @ Excersize: use summary() and random.effects() to answer: \textit{Do mothers who have unusually large female offspring also tend to have unusually large offspring for the litter size?} To examine the model fit, we can use: <<>>= plot(fit, wt11 ~ fitted(.) | cage, abline = c(0,1)) @ \subsection*{repeated measures} Of course, the full data-set is really a repeated measures data set (each individual was weighed on many occations). mle() can help us the full model if we want to (we first reshape the data): <<>>= mouse2=reshape(mouse, idvar="ind", varying=list(names(mouse)[7:18]), v.names="wt", direction="long", times=c(0,1,2,3,4,5,7,9,11,13,15, 18)) fit=lme(wt~as.factor(sex)+time+lsize, random=~time|cage/ind, data=mouse2, control=lmeControl(maxIter=500,msMaxIter=500, opt="optim")) @ Iterpret the result (NB! |cage/ind means that individual is nested within cage). Is most of the random variation in growth at the maternal level or the individual level? \end{document}