% version of chap8 that uses formula interface to trim ...
<>=
library(bbmle)
library(emdbook)
library(plotrix)
library(lattice)
library(Hmisc)
library(MASS)
source("chapskel.R")
@
\section*{Summary}
This chapter combines all the methods we've considered so far to carry out some more complete analyses of the example data sets, specifically the data of \cite{VoneshBolker2005} on tadpole predation, \cite{Wilson2004} on goby survival, and \cite{DuncanDuncan2000} on seed predation.
\section{Tadpole predation experiments}
\subsection{Introduction}
The goal of Vonesh and Bolker's (2005) tadpole predation study was to quantify the effects of prey size and density on predation rate, and to use the results along with data on growth rates to understand the tradeoffs between growth and survival. The response variable in all of the data we will consider here is the number of tadpoles killed by a given number or density of predators in a specified amount of time; the covariates are the (initial) number of tadpoles (which gives rise to a \emph{functional response} curve) and the size of tadpoles (estimating the presence of a ``size refuge''). The binomial distribution is an obvious choice as a stochastic model for predation data, because the data are a discrete sample with a fixed upper limit. The challenge for the frog predation data is to decide on deterministic models that adequately describe the changes in predation probability with tadpole size and density.
\subsection{Fitting the size-predation curve}
\cite{VoneshBolker2005} used the function
\begin{equation}
\gamma(S) = \frac{e^{\epsilon (\phi-S)}}{1+ e^{\beta \epsilon (\phi-S)}}
\label{eq:VB1}
\end{equation}
to represent the dependence of predation probability, $\gamma(S)$, on prey size $S$. The location parameter $\phi$ represents a baseline prey size at which 50\% of tadpoles are eaten; $\epsilon$ is the rate of change of mortality with size, controlling the steepness of the curve; and $\beta$ determines the asymmetry of the curve --- the extent to which prey escape predation at both small and large sizes. If $\beta=1$ then (\ref{eq:VB1}) describes a logistic predation function that decreases (if $\epsilon>0$) or increases (if $\epsilon<0$) with size. Some slightly tedious calculus establishes that the most vulnerable size is $\hat S = \phi+\log(\beta-1)/(\epsilon \beta)$, which gives a predation probability
$$
(\beta-1)^{-1/\beta}/\left(1+1/(\beta-1)\right).
$$
The peak predation probability depends only on $\beta$. If $\beta<1$, then the function is monotonically decreasing, with no peak. (To find $\hat S$, solve $d\gamma/dS=0$ for $S$, using the quotient and chain rules to calculate the derivative, and remembering that you only need to worry about setting the numerator to zero. Then plug $\hat S$ back into $\gamma(S)$ to find the predation probability.)
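If you want to check the calculus: writing $u=\epsilon(\phi-S)$, so that $\gamma=e^u/(1+e^{\beta u})$ and $du/dS=-\epsilon$, the derivative is
$$
\frac{d\gamma}{dS} = -\epsilon \cdot
  \frac{e^{u}\left(1+(1-\beta)e^{\beta u}\right)}{\left(1+e^{\beta u}\right)^2},
$$
which is zero when $e^{\beta u}=1/(\beta-1)$, i.e. when $u=-\log(\beta-1)/\beta$, or equivalently $S = \hat S = \phi+\log(\beta-1)/(\epsilon \beta)$. Substituting $e^{\hat u}=(\beta-1)^{-1/\beta}$ and $e^{\beta \hat u}=1/(\beta-1)$ back into $\gamma$ gives the peak predation probability above; the requirement that $\beta-1$ be positive inside the logarithm is another way of seeing that the curve has an interior peak only when $\beta>1$.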
%Setting up functions to draw the curve of
%predation vs. size and calculate the
%location and height of the peak:
<>=
sizefun = function(S,phi=20,eps=1,beta=1) {
  exp(eps*(phi-S))/(1+exp(beta*eps*(phi-S)))
}
maxxval = function(phi=20,eps=1,beta=1) {
  phi+log(beta-1)/(beta*eps)
}
maxyval = function(beta) {
  (beta-1)^(-1/beta)/(1+1/(beta-1))
}
@
\begin{figure}
<>=
op = par(cex=1.2,mar=c(4,4,0,1)+0.1,
  mgp=c(2.5,0.75,0),las=1,bty="l")
curve(sizefun(x),from=5,to=40,xlab="Prey size",ylab="Predation risk")
curve(sizefun(x,beta=1.1),add=TRUE,lty=2)
curve(sizefun(x,beta=3),add=TRUE,lty=3)
curve(sizefun(x,beta=1.1,eps=5),add=TRUE,lty=4)
text(x=c(19.5,5,6.5,28),y=c(0.9,0.5,0.08,0.5),
  c(expression(list(beta==1,epsilon==1)),expression(list(beta==1.1,epsilon==1)),
    expression(list(beta==1.1,epsilon==5)),
    expression(list(beta==3,epsilon==1))),
  adj=0)
arrows(27.5,0.49,21.5,0.4)
par(op)
@
\caption{Modified logistic function from \cite{VoneshBolker2005} (eq. \ref{eq:VB1}). Location parameter $\phi=20$ for all curves.}
\label{fig:modlogist}
\end{figure}
A more traditional function to describe a humped dependence of predation on size is the \emph{generalized Ricker function} \citep{Persson+1998},
\begin{equation}
y= b \left(\frac{S}{a} \exp\left(1-\frac{S}{a}\right)\right)^\alpha.
\end{equation}
This function is basically a reparameterization of the Ricker function ($y=ax e^{-bx}$) with an added power parameter $\alpha$ that can broaden or narrow the peak; if $\alpha=1$, the generalized Ricker reduces to the standard Ricker function. A third possibility is another modification of the Ricker, which I will call the truncated Ricker: this function shifts the Ricker's origin away from zero by a distance $t$, and sets the function to zero below $t$ (so that it doesn't become negative):
\begin{equation}
y =
\begin{cases}
0 & \mbox{if } S<t \\
b \left(\frac{S-t}{a} \exp\left(1-\frac{S-t}{a}\right)\right) & \mbox{otherwise}
\end{cases}
\end{equation}
Attach the data:
<<>>=
data(ReedfrogSizepred)
attach(ReedfrogSizepred)
@
Define the functions (\code{modlogist} for the modified logistic, \code{powricker} and \code{tricker} for the generalized (power) and truncated Ricker):
<<>>=
modlogist = function(x,eps,beta,phi) {
  exp(eps*(phi-x))/(1+exp(beta*eps*(phi-x)))
}
##ricker = function(x,a,b) {
##  a*x*exp(-b*x)
##}
powricker = function(x,a,b,alpha) {
  b*(x/a*exp(1-x/a))^alpha
}
tricker = function(x,a,b,t,min=0.0001) {
  ifelse(x<t,min,b*((x-t)/a*exp(1-(x-t)/a)))
}
@
and the corresponding negative log-likelihood functions, assuming a binomial distribution for the number killed out of the 10 tadpoles exposed in each trial:
<<>>=
NLL.modlogist = function(eps,beta,phi) {
  p.pred = modlogist(TBL,eps,beta,phi)
  -sum(dbinom(Kill,size=10,prob=p.pred,log=TRUE))
}
NLL.modlogist.bb = function(eps,beta,phi,theta) {
  p.pred = modlogist(TBL,eps,beta,phi)
  -sum(dbetabinom(Kill,size=10,prob=p.pred,theta=theta,log=TRUE))
}
##NLL.ricker = function(a,b) {
##  p.pred = ricker(TBL,a,b)
##  -sum(dbinom(Kill,size=10,prob=p.pred,log=TRUE))
##}
NLL.powricker = function(a,b,alpha) {
  p.pred = powricker(TBL,a,b,alpha)
  -sum(dbinom(Kill,size=10,prob=p.pred,log=TRUE))
}
NLL.tricker = function(a,b,t) {
  p.pred = tricker(TBL,a,b,t)
  -sum(dbinom(Kill,size=10,prob=p.pred,log=TRUE))
}
@
Eyeballing the data (Figure~\ref{fig:frogsizepred}) gives approximate starting parameters for the modified logistic of $\{\phi=15,\beta=1.1,\epsilon=5\}$ (compare Figure~\ref{fig:modlogist}, and use $\phi$ to shift the peak to approximately $S=15$). I'll start the beta-binomial version at the best-fit parameters for the binomial model and add $\theta=1000$ (representing very little overdispersion --- the beta-binomial becomes binomial as $\theta \to \infty$), setting the \code{parscale} control option to let \R\ know that we expect this parameter to be larger than the others.
(In an initial exploration with worse starting parameter guesses, I also played around with options like \code{method="Nelder-Mead"} and setting the \code{maxit} control parameter larger in order to get the optimization to work.) <<>>= FSP.modlogist = mle2(NLL.modlogist, start=list(eps=5,beta=1.1,phi=15)) FSP.modlogist.bb = mle2(NLL.modlogist.bb, start=as.list(c(coef(FSP.modlogist),list(theta=1000))), control=list(parscale=c(1,1,1,1000))) @ The beta-binomial fit estimates $\theta=\Sexpr{round(coef(FSP.modlogist.bb)["theta"])}$, evidence that the beta-binomial model is not really necessary; the decrease in negative log-likelihood is only \Sexpr{-round(c(logLik(FSP.modlogist.bb)-logLik(FSP.modlogist)),3)}. We hardly need to run the likelihood ratio test (\code{anova(FSP.modlogist,FSP.modlogist.bb)}) or the AIC calculation (\code{AICtab(FSP.modlogist,FSP.modlogist.bb)}). Even dividing the $p$ value for the Likelihood Ratio Test by 2 to account for the fact that the null hypothesis is on the boundary (i.e., the beta-binomial model reduces to the binomial model when $\theta \to \infty$) makes no difference to the conclusions. If we try to get confidence limits on $\theta$, however, we run into trouble: <>= confint(FSP.modlogist.bb,which="theta") @ \begin{verbatim} Profiling has found a better solution, so original fit had not converged: New minimum= 12.13806 Parameter values: eps beta.beta phi theta 0.3577257 8.9873023 9.7457033 3405.1429647 Error in onestep(step) : try restarting fit from values above \end{verbatim} Refitting the parameters from this new starting point (using \code{modlogist} instead of \code{modlogist.bb}, and extending the maximum number of iterations): <<>>= FSP.modlogist2 = mle2(NLL.modlogist, start=list(eps=0.357,beta=8.99,phi=9.75), control=list(maxit=1000)) @ The parameters of this fit are quite different <<>>= rbind(coef(FSP.modlogist),coef(FSP.modlogist2)) @ and the negative log-likelihood is slightly lower (\Sexpr{round(-logLik(FSP.modlogist2),2)} vs. \Sexpr{round(-logLik(FSP.modlogist),2)}). You can use \code{plot(calcslice(FSP.modlogist,FSP.modlogist2))} to calculate and plot the negative log-likelihoods along a ``slice'' through parameter space, showing that the two different fits probably do represent distinct local minima (Figure~\ref{fig:voneshmult}). However, despite fitting the data a little better the fit seems unrealistic, spiking up abruptly to a high predation rate and then dropping exponentially (Figure~\ref{fig:frogsizepred}). Fitting the generalized and truncated Ricker models: <<>>= ##FSP.ricker = mle2(NLL.ricker,start=list(a=0.4,b=0.3)) FSP.powricker = mle2(NLL.powricker,start=list(a=0.4,b=0.3,alpha=1)) FSP.tricker = mle2(NLL.tricker,start=list(a=0.4,b=0.3,t=8)) ## @ <>= alpha.ci = confint(FSP.powricker,parm="alpha",quietly=TRUE) @ The confidence limits on $\alpha$ for the generalized Ricker (\code{confint(FSP.powricker,parm="alpha")}) are $\{\Sexpr{round(alpha.ci[1],2)},\Sexpr{round(alpha.ci[2],2)}\}$ --- % the standard Ricker ($\alpha=1$) is clearly not competitive. 
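Another way to double-check this conclusion is to refit the generalized Ricker with $\alpha$ held fixed at 1, using the \code{fixed=} argument to \code{mle2}, and compare the two fits with a likelihood ratio test. This is just a sketch (not run here; \code{FSP.ricker1} is an illustrative name):
<<eval=FALSE>>=
## refit the generalized Ricker with alpha held at 1 (i.e. a standard Ricker)
FSP.ricker1 = mle2(NLL.powricker,start=list(a=0.4,b=0.3,alpha=1),
    fixed=list(alpha=1))
## likelihood ratio test against the full generalized Ricker
anova(FSP.ricker1,FSP.powricker)
@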
Calculating AIC values with \code{AICtab}:
<>=
a1 = AICtab(FSP.powricker,FSP.modlogist,FSP.tricker,FSP.modlogist2,
  weights=TRUE,sort=TRUE)
a1$AIC = round(a1$AIC,1)
a1$weight = round(a1$weight,3)
attr(a1,"row.names") = c("modified logistic (fit 2)",
  "truncated Ricker",
  "modified logistic (fit 1)",
  "generalized Ricker")
latex(a1,file="",title="",table.env=FALSE)
@
None of the models are nested in each other (all have the same number of parameters), and all the fits are (almost) within 2 log-likelihood units of each other. Burnham and Anderson would recommend using the weighted predictions of all the models in subsequent analyses, but in this case (where we are just trying to gain qualitative insights into life-history tradeoffs) this extra complication feels unnecessary. Here I would be willing to override the narrow definition of ``best fit'' and discard the first two models because I don't really believe that predation risk is going to increase sharply as tadpoles grow bigger than 9~mm, as suggested by the truncated Ricker or by the second fit to the modified logistic. I might even choose the generalized Ricker, the worst-fitting model, over the first fit of the modified logistic, because the generalized Ricker is better established in the literature. The lesson here is that the sparser the data, the more you have to use your judgment in selecting a model --- whether or not you are explicitly Bayesian.
\begin{figure}
<>=
op = par(lwd=2,bty="l",las=1,mgp=c(3,1,0),cex=1.5)
sizeplot(TBL,Kill,xlab="Tadpole size\n(total body length in mm)",
  ylab="", xlim=c(0,40),scale=1,ylim=c(0,9))
mtext(side=2,"Number killed",line=2,cex=1.5,las=0)
with(as.list(coef(FSP.modlogist)),curve(modlogist(x,eps,beta,phi)*10,
  add=TRUE))
## with(as.list(coef(FSP.ricker)),curve(ricker(x,a,b)*10,add=TRUE,lty=2))
with(as.list(coef(FSP.powricker)),curve(powricker(x,a,b,alpha)*10,add=TRUE,lty=2))
with(as.list(coef(FSP.tricker)),curve(tricker(x,a,b,t)*10,add=TRUE,lty=3))
with(as.list(coef(FSP.modlogist2)),curve(modlogist(x,eps,beta,phi)*10,
  add=TRUE,lty=4))
legend("topright",c("modified logistic",
  "generalized Ricker",
  "truncated Ricker",
  "modified logistic #2"),cex=0.7,
  lty=1:4,bty="n")
@
\caption{Size-predation relationship for \emph{Hyperolius spinigularis} tadpoles: modified logistic, generalized and truncated Ricker fits}
\label{fig:frogsizepred}
\end{figure}
\subsection{Fitting the functional response curve}
The other data set we will examine from \cite{VoneshBolker2005} is the functional response experiment, which varied the density of tadpoles (with total body length $\approx 12.8$~mm). As many as 67\% (10/15) of the tadpoles in an experiment were eaten, suggesting that we should allow for the effect of depletion over the course of the experiment. The standard model for saturating functional responses is the Holling type~II response, $N=aPT N_0/(1+ahN_0)$, where $N$ is the number eaten, $N_0$ is the starting number/density, $a$ and $h$ are baseline attack rate and handling time, $P$ is the predator number or density and $T$ is the total exposure time\footnote{$P$ and $T$ are usually ignored in the Holling equation, giving the function units of ``number eaten per predator per unit time'', but we include them here for consistency with the Rogers equation.}. The Rogers random-predator equation, which allows for depletion, is
\begin{equation}
N = N_0 \left( 1- e^{a (N h - P T)} \right)
\label{eq:rogers}
\end{equation}
where $P$ is the number of predators, and $T$ is the total time of exposure.
(The predator-exposure factor $PT$ would just be multiplied by the Holling equation.) The Rogers random-predator equation (\ref{eq:rogers}) contains $N$ on both the left- and right-hand sides of the equation; traditionally, one has had to use iterative numerical methods to compute the function \citep{VoneshBolker2005}. However, the \emph{Lambert W} function \citep{Corless+1996}, which gives the solution to the equation $W(x)e^{W(x)}=x$, can be used to compute the Rogers equation efficiently: in terms of the Lambert $W$ the Rogers equation is \begin{equation} N = N_0 - \frac{W\left(a h N_0 e^{-a (P T- h N_0)}\right)}{a h}. \end{equation} Implement this equation (using the \code{lambertW} function in the \code{emdbook} package) in \R, as well as the Holling type~II function for comparison: <<>>= rogers.pred = function(N0,a,h,P,T) { N0 - lambertW(a*h*N0*exp(-a*(P*T-h*N0)))/(a*h) } holling2.pred = function(N0,a,h,P,T) { a*N0*P*T/(1+a*h*N0) } @ <>= curve(rogers.pred(x,a=1,h=0.2,P=1,T=1),from=0,to=60, ylab="Number eaten/unit time",xlab="Initial number",ylim=c(0,5), main="Predation: a=1, h=0.2") curve(rogers.pred(x,a=1,h=0.2,P=1,T=5)/5,add=TRUE,lty=2,from=0) curve(rogers.pred(x,a=1,h=0.2,P=1,T=0.2)*5,add=TRUE,lty=3,from=0) curve(rogers.pred(x,a=1,h=0.2,P=1,T=10)/10,add=TRUE,lty=4,from=0) curve(holling2.pred(x,a=1,h=0.2),add=TRUE,lty=1,lwd=2,from=0) abline(h=5) legend(30,2, c(paste("Rogers, T=",c(0.2,1,5,10),sep=""), "Holling type II"),lwd=c(rep(1,4),2),lty=c(3,1,2,4,1)) @ Attach the data: <<>>= data(ReedfrogFuncresp) attach(ReedfrogFuncresp) @ Write the likelihood functions: <<>>= NLL.rogers = function(a,h,T,P) { if (a<0 || h<0) return(NA) prop.exp = rogers.pred(Initial,a,h,P,T)/Initial -sum(dbinom(Killed,prob=prop.exp,size=Initial,log=TRUE)) } NLL.holling2 = function(a,h,P=1,T=1) { -sum(dbinom(Killed,prob=a*T*P/(1+a*h*Initial),size=Initial,log=TRUE)) } @ In the Rogers likelihood function I constrained the range of the function by simply returning \code{NA} if $a<0$ or $h<0$, rather than using constrained optimization; if you are not using \code{L-BFGS-B}, this shortcut sometimes works. What about initial values? Eyeballing the data (Figure~\ref{fig:frogfuncresp}), the initial slope of the functional response curve is about 0.5 (50\% of tadpoles are killed at low densities) and the asymptote looks like it might be at around 50. These values correspond to $aPT=0.5$ or $a=0.5/(PT) \approx 0.012$ and $PT/h=50$ or $h \approx 0.84$. These values will be overestimates, but still usable, as starting points for the Rogers estimation as well: <<>>= FFR.rogers = mle2(NLL.rogers,start=list(a=0.012,h=0.84), data=list(T=14,P=3)) FFR.holling2 = mle2(NLL.holling2,start=list(a=0.012,h=0.84), data=list(T=14,P=3)) @ Running \code{AICtab(FFR.rogers,FFR.holling2,weights=TRUE)} shows that the Holling type~II is a marginally better fit (0.3 log-likelihood unit difference): <>= a2 = AICtab(FFR.rogers,FFR.holling2,sort=TRUE,weights=TRUE) a2$AIC = round(a2$AIC,1) a2$weight = round(a2$weight,3) attr(a2,"row.names") = c("Holling type II", "Rogers") latex(a2,file="",title="",table.env=FALSE) @ The best-fit Holling and Rogers curves are practically indistinguishable in the plot (Figure~\ref{fig:frogfuncresp}) as well; however, we strongly prefer the Rogers curve on biological grounds, because we know that predators are depleting tadpole prey significantly over the course of the experiment. 
The ``Rogers (no depletion)'' curve in Figure~\ref{fig:frogfuncresp} shows that depletion decreases the effect of predation by about two tadpoles across the board --- as much as a 40\% effect at low numbers. It will be important to take depletion into account when we compare experiments with different exposure times and predator densities below.
<>=
esttab = signif(rbind(coef(FFR.rogers),coef(FFR.holling2)),3)
dimnames(esttab) = list(c("Rogers","Holling type II"),
  c("$a$","$h$"))
latex(esttab,file="",title="",table.env=FALSE)
@
Taking depletion into account leads to a \Sexpr{round(100*(coef(FFR.rogers)["a"]/coef(FFR.holling2)["a"]-1))}\% increase in the estimated attack rate and a \Sexpr{round(100*(coef(FFR.rogers)["h"]/coef(FFR.holling2)["h"]-1))}\% increase in the estimated handling time.
\begin{figure}
<>=
op = par(lwd=2,bty="l",las=1,mgp=c(3,1,0),cex=1.5)
plot(Initial,Killed,xlim=c(0,100))
with(as.list(coef(FFR.rogers)),
  curve(rogers.pred(x,a,h,T=14,P=3),add=TRUE))
with(as.list(coef(FFR.rogers)),
  curve(a*14*3*x/(1+a*h*x),add=TRUE,lty=3))
with(as.list(coef(FFR.holling2)),
  curve(a*14*3*x/(1+a*h*x),add=TRUE,lty=2))
legend("topleft",
  c("Rogers","Rogers (no depletion)",
    "Holling"),
  lty=c(1,3,2),bty="n")
par(op)
@
\caption{Functional response fit to frog predation data. Both Holling type~II and Rogers random-predator fits are shown, but are barely distinguishable. The ``Rogers (no depletion)'' curve plots the expected functional response from the estimated Rogers parameters in the absence of depletion.}
\label{fig:frogfuncresp}
\end{figure}
\subsection{Combined effects of size and density}
\cite{VoneshBolker2005} combined the effects of size and density by algebraically combining the parameters of the separate size and density fits. Here, we will instead combine all the data in a single likelihood function, estimating the functional response (handling time) parameter ($h$) and the size-dependent attack rate parameters ($\epsilon$, $\beta$, and $\phi$) at the same time\footnote{It would be realistic to make the handling time vary as a function of size as well \citep{Persson+1998}, but unfortunately we don't have enough data.}. The only thing we need to sort out is that the experiments were run in different volumes, as well as with different numbers of predators and for different lengths of time. The functional response experiments were run in 300~L tanks (1.2 $\times$ 0.8 $\times$ 0.4~m high) filled to 220~L; the size experiments were run in 35~L plastic tubs (0.32~m in diameter) filled to 25~L. Based on the way that predators foraged, \cite{VoneshBolker2005} assumed that predation success depended on the area of the foraging arena ($1.2 \cdot 0.8 = 0.96 \mbox{ m}^2$ vs $\pi ((0.32)/2)^2 = 0.080 \mbox{ m}^2$) rather than its volume. To make the predation probabilities match, we have to divide the predator numbers by area\footnote{But \emph{not} the prey numbers --- figuring this out reminded me of the old riddle ``if a hen and a half lays an egg and a half in a day and a half, how many eggs can one hen lay in a day?''}. It is convenient to collect the auxiliary parameters for each experiment (number of predators, area, exposure time, etc.)
in a couple of lists: <<>>= xpars.Funcresp = list(T=14,P=3,vol=220,area=1.2*0.8,size=12.8) xpars.Sizepred = list(T=3,P=2,vol=25,area=pi*0.16^2,initprey=10) @ Put together a combined data set representing the initial numbers, size, number killed, predator density, and exposure time for both experiments, using \code{rep} to repeat values where necessary: <<>>= n.Funcresp = nrow(ReedfrogFuncresp) n.Sizepred = nrow(ReedfrogSizepred) combInit = c(ReedfrogFuncresp$Initial,rep(xpars.Sizepred$initprey,n.Sizepred)) combSize = c(rep(xpars.Funcresp$size,n.Funcresp),ReedfrogSizepred$TBL) combKilled = c(ReedfrogFuncresp$Killed,ReedfrogSizepred$Kill) combP = rep(c(xpars.Funcresp$P/xpars.Funcresp$area, xpars.Sizepred$P/xpars.Sizepred$area), c(n.Funcresp,n.Sizepred)) combT = rep(c(xpars.Funcresp$T,xpars.Sizepred$T), c(n.Funcresp,n.Sizepred)) @ Write a combined function for the expected proportion eaten, computing the attack rate $a$ from the parameters $\epsilon$, $\beta$, and $\phi$ and combining it with the handling time $h$: <<>>= prop.eaten = function(N0,S,h,P,T,eps,beta,phi,minprop=.Machine$double.eps) { a = modlogist(S,eps=eps,beta=beta,phi=phi) N.eaten = rogers.pred(N0,a=a,h=h,P=P,T=T) prop = N.eaten/N0 prop[prop<=0] = minprop prop[prop>=1] = 1-minprop prop } @ The value \verb!.Machine$double.eps! is a built-in constant corresponding to the smallest difference between numeric values your computer can keep track of without rounding (it is \Sexpr{scinot(.Machine$double.eps,digits=2)} on the machine I am using). Using \code{minprop} to adjust values that are $\le 0$ or $\ge 1$ takes care of the cases where the \code{rogers.pred} function returns an expected proportion eaten slightly less than zero, or exactly equal to 1 (which causes an infinite negative log-likelihood if no tadpoles are eaten); these minor errors happen because of round-off error. A negative log-likelihood function incorporating the proportion eaten: <<>>= NLL.rogerscomb = function(a,h,eps,beta,phi,T=combT,P=combP) { if (h<0) return(NA) prob = prop.eaten(combInit,combSize,h,P,T,eps,beta,phi) dprob = dbinom(combKilled,prob=prob,size=combInit,log=TRUE) -sum(dprob) } @ Set the starting values by combining $h$ from the Rogers fit (which has to be put inside its own \code{list}) with the attack rates from the size-dependence fit (which will be a slight underestimate since they don't incorporate the effects of handling time): <<>>= startvals = c(list(h=coef(FFR.rogers)["h"]),as.list(coef(FSP.modlogist))) @ Finding the optimum, avoiding the alternate fit (fit \#2 above) when profiling, and avoiding overflow errors is quite finicky in this case. The easiest way to avoid the alternate fit is to restrict $\beta$, but using the \code{L-BFGS-B} optimizer leads to lots of headaches with \code{NA}s being produced in the Lambert $W$ function. I used a two-stage method --- first, optimizing with \code{method="Nelder-Mead"} and using \code{confint(FPcomb,method="quad")} to get approximate confidence limits: <>= FPcomb = mle2(NLL.rogerscomb,start=startvals, method="Nelder-Mead") confint(FPcomb,method="quad") @ <>= FPci2 = try(confint(FPcomb),silent=TRUE) startvals2 = list(h=0.88254539,eps=-0.01068388, beta=0.93631205,phi=383.67804375) FPcomb2 = mle2(NLL.rogerscomb,start=startvals2, method="Nelder-Mead",control=list(parscale=c(1,1,1,500))) @ Then using slightly larger values for the upper and lower bounds to refit the model and get more precise confidence limits (\code{confint} must use the same optimization rules that were used in the original fit). 
Getting this to work took some frustrating trial and error, including incorporating debugging statements like
<>=
cat(h,eps,beta,phi,"\n")
@
or
<>=
if (any(!is.finite(prob))) cat("NAs:",h,eps,beta,phi,"\n")
@
or
<>=
if (any(!is.finite(dprob))) { browser() }
@
into the \code{NLL.rogerscomb} function to track down where the problems were occurring in order to set bounds that would prevent \code{NA}s. \code{cat} prints a list of variables to the screen in the middle of a function evaluation (\code{"\textbackslash n"} specifies a new line), while \code{browser} stops the function and lets you examine the values of different variables. In the course of this exploration I also went back and incorporated the minimum and maximum bounds in \code{prop.eaten}, which I had initially left out.
<>=
FPcomb = mle2(NLL.rogerscomb,start=startvals,
  method="L-BFGS-B",
  lower=c(0.7,0.5,1,14),
  upper=c(1.8,2.25,2,20),
  control=list(parscale=c(1,1,1,10)))
FPcomb.ci = confint(FPcomb)
@
<>=
NLL.rogerscomb.bb = function(a,h,eps,beta,phi,
  theta,T=combT,P=combP) {
  if (h<0) return(NA)
  prob = prop.eaten(combInit,combSize,h,P,T,eps,beta,phi)
  dprob = dbetabinom(combKilled,prob=prob,size=combInit,theta=theta,log=TRUE)
  -sum(dprob)
}
FPcomb.bb = mle2(NLL.rogerscomb.bb,start=c(startvals,list(theta=100)),
  method="L-BFGS-B",
  lower=c(0.7,0.5,1,14,0.1),
  upper=c(1.8,2.25,2,20,Inf),
  control=list(parscale=c(1,1,1,10,100)))
theta.ci = confint(FPcomb.bb,parm="theta")
@
What is the combined estimate of the proportion eaten under the conditions of the size-predation experiment (12.8~mm body length, 2 predators in an area of \Sexpr{round(xpars.Sizepred$area,2)}~m$^2$ for 3 days)? How well does it match the estimate based only on the size-predation experiment? (That is, does combining the data change the baseline estimate from the size-predation experiment?) Figure~\ref{fig:FPfit2} is mildly alarming at first sight, showing that the estimate of the size refuge changes markedly when we incorporate the data from the functional response experiment. That suggests a major difference between the two experiments. A closer look, however, shows that the major difference between the results falls in a region where we don't have any data, between 12.8 and 21~mm body length. The slightly higher predation rate in the functional response experiment (even corrected for predator exposure) pulls the curve up.
% SKIP delta method
% Calculating the uncertainty from the original
% estimate (size-predation data only), by the delta
% method (calculating the point estimate, the
% variance by the delta method, and
% $\mu \pm 1.96 \sigma$):
<>=
c1 = coef(FSP.modlogist)
FSP.expprop.mean = modlogist(12.8,c1["eps"],c1["beta"],c1["phi"])
FSP.exppropvar = deltavar(exp(eps*(phi-12.8))/(1+exp(beta*eps*(phi-12.8))),
  meanval=coef(FSP.modlogist),Sigma=vcov(FSP.modlogist))
FSP.expprop.deltaci = FSP.expprop.mean+c(-1.96,1.96)*sqrt(FSP.exppropvar)
@
How would we go about quantifying the uncertainty in the two curves and convincing ourselves that they're not (statistically) significantly different?
Calculating the estimates of the proportion eaten at size 12.8~mm from the size-predation fit alone: <<>>= c1 = coef(FSP.modlogist) FSP.expprop.mean = modlogist(12.8,c1["eps"],c1["beta"],c1["phi"]) @ and from the combined fit: <<>>= c2 = coef(FPcomb) FP.expprop.mean = prop.eaten(N0=10,S=12.8,c2["h"], P=2/0.08,T=3,eps=c2["eps"],beta=c2["beta"],phi=c2["phi"]) @ The estimated predation proportions are \Sexpr{round(FSP.expprop.mean,2)} for the size-predation experiment alone and \Sexpr{round(FP.expprop.mean,2)} for the combined data --- % a difference that certainly might be biologically significant, if it were statistically significant. As discussed in Chapter~7, population projection intervals are a simple way to calculate the confidence intervals of a quantity of interest that is not a parameter in the model. Using \code{mvrnorm} to generate 5000 values from the sampling distribution of the parameters and calculating the 95\% population projection intervals of the size-predation data: <<>>= set.seed(1001) FSP.expprop.pars = mvrnorm(5000,mu=c1,Sigma=vcov(FSP.modlogist)) FSP.expprop.val = numeric(5000) for (i in 1:5000) { FSP.expprop.val[i] = modlogist(12.8,FSP.expprop.pars[i,1], FSP.expprop.pars[i,2],FSP.expprop.pars[i,3]) } FSP.expprop.ppi = quantile(FSP.expprop.val,c(0.025,0.975)) @ Doing the same thing for the combined fit: <>= FP.exppropvar = deltavar(prop.eaten(N0=10,S=12.8,h,P=2/0.08,T=3,eps,beta,phi), meanval=coef(FPcomb),Sigma=vcov(FPcomb)) FP.expprop.deltaci = FP.expprop.mean+c(-1.96,1.96)*sqrt(FP.exppropvar) @ %Using \code{mvrnorm} to generate 5000 values from %the sampling distribution of the parameters and %calculating the 95\% population projection intervals: <>= FP.expprop.pars = mvrnorm(5000,mu=c2,Sigma=vcov(FPcomb)) FP.expprop.val = numeric(5000) for (i in 1:5000) { FP.expprop.val[i] = prop.eaten(N0=10,S=12.8,P=2/0.08,T=3, h=FP.expprop.pars[i,"h"], eps=FP.expprop.pars[i,"eps"], beta=FP.expprop.pars[i,"beta"], phi=FP.expprop.pars[i,"phi"]) } FP.expprop.ppi = quantile(FP.expprop.val,c(0.025,0.975)) @ <>= ##esttab = rbind(c(FSP.expprop.mean,NA,NA), ## c(NA,FSP.expprop.deltaci), ## c(NA,FSP.expprop.ppi), ## c(FP.expprop.mean,NA,NA), ## c(NA,FP.expprop.deltaci), ## c(NA,FP.expprop.ppi)) ## dimnames(esttab) = list(c("size-pred", ## "delta method", ## "PPI", ## "combined", ## "delta method", ## "PPI"), ## c("mean","low","high")) esttab = rbind(c(FSP.expprop.mean,FSP.expprop.ppi), c(FP.expprop.mean,FP.expprop.ppi)) dimnames(esttab) = list(c("size-pred","combined"), c("mean","low","high")) latex(round(esttab,3),file="",title="",table.env=FALSE) @ The results show that the uncertainty in the estimates is large enough that at least the confidence limits of the size-predation estimates (\Sexpr{round(FSP.expprop.ppi[1],2)}, \Sexpr{round(FSP.expprop.ppi[2],2)}) overlap with the estimate from the combined data (\Sexpr{round(FP.expprop.mean,2)}), if not \emph{vice versa}. \cite{VoneshBolker2005} took results like these (although they did not try fitting the combined data as we have done here) and used them together with size-dependent growth rate estimates from a growth experiment to simulate the survival of tadpoles hatching at different sizes. They found that because smaller-starting tadpoles grew faster through the window of vulnerability between 10 and 20~mm, their overall survival was comparable to tadpoles that hatched at a larger size. This analysis has opened up several more questions. 
\begin{itemize} \item Because it must compromise between two sets of data with slightly different survival rates, the fit of the combined curve to the size-predation data is slightly worse than the fit of the size-predation curve itself (Figure~\ref{fig:FPfit2}). We initially rejected the need for a beta-binomial model to account for overdispersion, but the larger deviations suggest that it might be worth testing again. \item Following \cite{VoneshBolker2005}, we assumed that predator efficiency scaled with area, not volume; this approach may have understated the predator threat in the functional response experiment, leading to an inflation of the expected proportion eaten per unit of exposure. <>= spredarea = with(xpars.Sizepred,P*T/area) frpredarea = with(xpars.Funcresp,P*T/area) spredvol = with(xpars.Sizepred,P*T/vol) frpredvol = with(xpars.Funcresp,P*T/vol) @ The total predator exposure ($P \times T$) in the functional response experiment was $14 \times 3 =42$ predator-days, in contrast to $3 \times 2 = 6$ predator-days for the size-predation experiment. If we calculate $PT/\mbox{area}$ for each experiment and take the ratio, we get a relative risk of $\Sexpr{round(frpredarea,1)}/\Sexpr{round(spredarea,1)} = \Sexpr{round(frpredarea/spredarea,1)}$; overall predator pressure per unit area was lower in the functional response experiment. On the other hand, repeating the same calculation but scaling by volume gives a risk ratio of $\Sexpr{round(frpredvol,2)}/\Sexpr{round(spredvol,2)} = \Sexpr{round(frpredvol/spredvol,2)}$ --- less difference, leading to less inflation of the per-predator risk in the functional response. We could adjust the model by adding a scaling factor to account for the differences between the experiments, and tentatively interpret it in terms of the geometry of the foraging arena \citep{Petersen+1999}. 
While we clearly don't have enough data to make a decision just from these two experiments, the slight discrepancy between the results of the two experiments does open up some interesting questions \ldots \end{itemize} \begin{figure} <>= svec = seq(0,40,by=0.25) FP.expprop = prop.eaten(N0=10,S=svec,c2["h"], P=2/0.08,T=3,eps=c2["eps"],beta=c2["beta"],phi=c2["phi"]) FSP.expprop = with(as.list(coef(FSP.modlogist)), modlogist(svec,eps,beta,phi)) FP.expprop.pars = mvrnorm(5000,mu=c2,Sigma=vcov(FPcomb)) FP.expprop.vals = t(apply(FP.expprop.pars,1, function(x) { with(as.list(x),prop.eaten(N0=10,S=svec,P=2/0.08,T=3, h=h, eps=eps, beta=beta, phi=phi))})) FP.expprop.pars = mvrnorm(5000,mu=c2,Sigma=vcov(FPcomb)) FSP.expprop.vals = t(apply(FSP.expprop.pars,1, function(x) { with(as.list(x),modlogist(svec,eps,beta,phi))})) FSP.env = t(apply(FSP.expprop.vals,2,quantile,c(0.025,0.975))) FP.env = t(apply(FP.expprop.vals,2,quantile,c(0.025,0.975))) @ <>= op = par(lwd=2,bty="l",las=1,mgp=c(3,1,0),cex=1.5) svec = seq(0,40,by=0.25) with(ReedfrogSizepred,sizeplot(TBL,Kill/10,ylim=c(0,1), xlab="Total body length", ylab="Proportion eaten")) lines(svec,FP.expprop) lines(svec,FSP.expprop,lty=2) abline(v=12.8,col="gray") ## matlines(svec,FSP.env,lty=2,col="gray") ## matlines(svec,FP.env,lty=1,col="gray") ## c3 = coef(FPcomb2) ## FP.expprop2 = prop.eaten(N0=10,S=svec,c3["h"], ## P=2/0.08,T=3,eps=c3["eps"],beta=c3["beta"],phi=c3["phi"]) ## lines(svec,FP.expprop2,lty=3) ## bit = 0.3 ## plotCI(12.8-bit,FP.expprop.mean,li=FP.expprop.ppi[1],ui=FP.expprop.ppi[2], ## add=TRUE,pch=NA) ## plotCI(12.8+bit,FSP.expprop.mean,li=FSP.expprop.ppi[1],ui=FSP.expprop.ppi[2], ## add=TRUE,pch=NA,slty=2) legend("topright",c("size-pred.","combined"),lty=2:1,bty="n") par(op) @ \caption{Observed number eaten as a function of size; predicted values from size-predation experiment only and from all data combined.} \label{fig:FPfit2} \end{figure} <>= detach(ReedfrogSizepred) detach(ReedfrogFuncresp) @ \section{Goby survival analysis} Next, we will take a look at the effects of density and ``quality'' (spatial variation in habitat quality correlated with natural rates of immigration) on the survival of the small marine fishes (gobies) \emph{Elacatinus evelynae} and \emph{E. prochilos} in field experiments \citep{Wilson2004}. The questions here are straightforward: how fast do fish die (or disappear) at different levels of density and quality? Do quality, density, or their interaction (i.e. an effect of quality on the density-dependent mortality rate) have significant effects? As a reminder, the data contain information on the survival of marine gobies in experiments where ambient density was manipulated on coral heads with different background settlement rates. Settlement rates were suspected to be correlated with some unknown aspect of environmental quality, such as flow patterns or availability of refuges \citep{WilsonOsenberg2002}, which revealed itself through lower mortality rates (Figure~\ref{fig:gobyrep}). \subsection{Preliminaries} Attach the data: <<>>= library(emdbookx) data(GobySurvival) attach(GobySurvival) @ In the data, time starts from day~1 (the day the fish were put on the reef) and runs until day~12; any fish that survived past the end of the experiment (i.e. they were still present on day~12) were given a ``last day seen'' (\code{d2}) value of 70 in the original data set. 
For the following analysis, time should start from zero and run to $\infty$ (the cumulative distribution functions we will be using can handle infinite values), so we will subtract 1 from \code{d1} and \code{d2} and set the ending value of \code{d2} to \code{Inf}: <<>>= day1 = d1-1 day2 = ifelse(d2==70,Inf,d2-1) @ %We need one more adjustment; for some parameters, %the Weibull <>= day1 = pmax(day1,.Machine$double.eps) @ \begin{figure} <>= xmax = 1 op = par(cex=1.5,lwd=2, mgp=c(2.5,0.75,0),las=1,bty="l") par(mfrow=c(1,2)) shapes = c(1,0.5,0.2) curve(pweibull(x,shape=shapes[1],scale=0.5,lower.tail=FALSE),from=0.001,to=xmax,log="y", ylab="Fraction surviving",xlab="Time",ylim=c(0.1,1),axes=FALSE) axis(side=1,at=c(0,0.5,1)) axis(side=2) box() curve(pweibull(x,shape=shapes[2],scale=0.5,lower.tail=FALSE),add=TRUE,lty=2) curve(pweibull(x,shape=shapes[3],scale=0.5,lower.tail=FALSE),add=TRUE,lty=3) ## abline(h=exp(-1),col=2) ## abline(v=0.5,col=2) text(rep(xmax+0.1,3),pweibull(xmax,shape=shapes,scale=0.5,lower.tail=FALSE), paste("shape=",shapes,sep=""), adj=0,xpd=NA) text(corner.loc(x=1,y=1),adj=1,"scale=0.5",cex=1.5) par(mar=c(5,3,4,4)+0.1) scales = c(0.5,1,2) curve(pweibull(x,shape=0.5,scale=scales[1],lower.tail=FALSE),from=0.001,to=xmax, log="y", axes=FALSE,ylim=c(0.1,1),xlab="Time",ylab="") box() axis(side=1,at=c(0,0.5,1)) curve(pweibull(x,shape=0.5,scale=scales[2],lower.tail=FALSE),add=TRUE,lty=2) curve(pweibull(x,shape=0.5,scale=scales[3],lower.tail=FALSE),add=TRUE,lty=3) text(rep(xmax+0.1,3),pweibull(xmax,shape=0.5,scale=scales,lower.tail=FALSE), paste("scale=",scales,sep=""), adj=0,xpd=NA) ## abline(h=exp(-1)) text(corner.loc(x=1,y=1),adj=1,"shape=0.5",cex=1.5) par(op) @ \caption{Comparisons of Weibull distributions with differing scale and shape parameters. The \R\ commands to plot the curves are variations on \code{curve(pweibull(x,shape=...,scale=...,lower.tail=FALSE))}.} \label{fig:weib} \end{figure} \begin{figure} <>= op = par(lwd=2,bty="l",las=1,mgp=c(3,1,0),cex=1.5) dens.cat = factor(ifelse(density>median(density),"high","low"), levels=c("low","high")) qual.cat = factor(ifelse(qual>median(qual),"high","low"), levels=c("low","high")) intcat = interaction(qual.cat,dens.cat) cattab = table(intcat) ## when (average) did they die? meansurv = (d1+d2)/2 ## tabulate morttab = table(meansurv,intcat) ## reverse order morttab = morttab[nrow(morttab):1,] ## get cumulative sum: this is number surviving until day x csurvtab = apply(morttab,2,cumsum) ## divide by total number in category cnsurvtab = sweep(csurvtab,2,cattab,"/") days = as.numeric(rownames(csurvtab)) matplot(days,cnsurvtab,type="s",xlab="Time (days)", ylab="Proportion of cohort surviving", log="y",col=1,xlim=c(0,40)) legend("topright", c("high quality/high density", "low quality/low density", "high quality/low density", "low quality/high density"),bty="n", lty=c(4,1,2,3),cex=0.9) par(op) @ \caption{Goby survival curves by quality and density categories (above/below median values), based on mean survival time \code{(d1+d2)/2}.} \label{fig:gobyrep} \end{figure} %Survival rates over time: %calculate mortality per day? <>= numsurv = diag(table(d1,d2)) days = c(1,4,8,11,70) csurv = nrow(X)-c(0,cumsum(numsurv)) 1-exp(log(csurv[2:5]/csurv[1:4])/diff(days)) @ As discussed in Chapter~\ref{chap:opt}, we will use the Weibull distribution to fit the data, allowing for the observed decrease in mortality rate over time. 
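As a reminder of why the Weibull can describe declining mortality: in \R's parameterization the Weibull survival function is $1-F(t)=e^{-(t/s)^a}$ for shape $a$ and scale $s$, so the \emph{per capita} mortality rate (the hazard) is
$$
\frac{f(t)}{1-F(t)} = \frac{a}{s}\left(\frac{t}{s}\right)^{a-1},
$$
which decreases over time if $a<1$, is constant (reducing to exponential survival) if $a=1$, and increases if $a>1$.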
We are interested in whether mortality is density-dependent, and whether quality affects either the density-independent or the density-dependent mortality rate. We may need to allow for the possibility that different experiments show different results (this data set combines the results from 5~experiments run over the course of three years). The most complete model of the survival time of an individual fish in experiment $x$ with density (number of neighboring fish) $d$ and quality (background settlement rate) $q$ would be:
\begin{equation}
\begin{split}
T & \sim \mbox{Weibull}(a_{x}(d,q),s_{x}(d,q)) \\
a_{x}(d,q) & = \exp(\alpha_{a,x} + \beta_{a,x} \cdot q + (\gamma_{a,x} + \delta_{a,x} \cdot q) d) \\
s_{x}(d,q) & = \exp(\alpha_{s,x} + \beta_{s,x} \cdot q + (\gamma_{s,x} + \delta_{s,x} \cdot q) d)
\end{split}
\end{equation}
In other words, we are fitting the shape and scale parameters on the log scale. For both the shape and the scale parameter we are allowing for a baseline value ($\alpha$), a linear effect (on the log scale) of quality ($\beta$), a linear effect of density ($\gamma$), and an interaction between density and quality ($\delta$) --- i.e., a linear effect of quality on the density-dependent mortality coefficient. As indicated by the $x$ in the subscripts, we are also allowing each parameter to be different in each experiment, for a total of 40 (!) parameters. Given that we have only \Sexpr{length(day1)} observations, unevenly divided among experiments (with as few as \Sexpr{min(table(exper))} observations in an experiment), and that each observation tells us fairly imprecisely when a fish disappeared, this model is certainly more complex than we can hope to fit. We might try anyway, fitting all possible submodels and using model-selection rules to decide which pieces really belong in the model\footnote{The statistical equivalent of the advice of a crusading abbot who, when asked how to tell the innocents and the heretics apart, said, ``Kill them all, God will recognize his own'' \ldots}, but even so there would be far too many submodels to consider. There are two possibilities for the intercept parameter $\alpha$ (the same for all experiments or different among experiments), and three for each of the other parameters $\beta$, $\gamma$, and $\delta$ (zero for all experiments, meaning no effect of density or quality or their interaction; non-zero but the same for all experiments; or different for different experiments). There are 34~possible models for shape and 34 for scale%
\footnote{You might expect $2 \times 3 \times 3 \times 3=54$ for each parameter of the Weibull, but there are a few combinations that don't make sense --- specifically, fitting the $\delta$ (density-quality interaction parameter) if either the density or quality effect is set to zero.},
or $34^2=1156$ models in total, even for this moderate-sized problem! We must make some \emph{a priori} decisions about which parameters to drop --- decisions made harder by the difficulty of graphically representing the dependence of survival on continuous covariates. Figure~\ref{fig:weib} shows the effects of the shape and scale parameters on the Weibull distribution. Comparing these differences to the survival curves in Figure~\ref{fig:gobyrep} suggests that the scale, but not the shape, of the Weibull distribution varies between density and quality categories.
Figure~\ref{fig:gobyrep} also suggests an interaction between quality and density categories, because survival in the low quality/high density category is considerably below that in any other category. Figure~\ref{fig:gobyrep} does not separate the results of different experiments. It might be worth drawing this figure to check, but for now we will assume that the only possible difference among experiments is in the baseline scale parameter, not in the effects of density and quality. \cite{Wilson2004} used a standard survival analysis to demonstrate non-significant interactions between experiment and density/quality, supporting this assumption. % alpha(2) x beta(2) x delta(2) x gamma(2) = 16 % { 0, {x, b , d , g} , {xb, xd, xg, bd, bg, dg}, {xbd, xbg, bdg, xdg}, % xbdg} -- but (bg) and (dg) combinations are not allowed so % { 0, {x, b , d , g} , {xb, xd, xg, bd}, {xbd }, % -- but (bg) and (bd) combinations are not allowed so % down to 10? (16 - 4 - 4 + 2) These simplifications reduce our most complex model to \begin{equation} \begin{split} T & \sim \mbox{Weibull}(a,s_{x}(d,q)) \\ s_{x}(d,q) & = \exp(\alpha_{s,x} + \beta_{s} \cdot q + (\gamma_{s} + \delta_{s} \cdot q) d), \end{split} \end{equation} with 9~parameters (5 for treatment effects on scale, 3 for the effects of density, quality and their interaction on scale, and 1 for the shape parameter). Our suite of models reduces to 10. If we denote the simplest model (a single shape and scale parameter) by \code{0}; the presence of treatment effects ($\alpha_i \neq \alpha_j$ for at least one pair of experiments) by \code{x}; a quality effect ($\beta_{s} \neq 0$) by \code{q}; a density effect ($\gamma_s >0$) by \code{d}; and a quality-density interaction ($\delta_s >0$) by \code{i}, then our remaining models with their numbers of parameters are: \\ \begin{center} \begin{tabular}{ccccc} \code{0} (2) & \code{x} (6) & \code{xq} (7) & \code{xqd} (8) & \code{xqdi} (9) \\ & \code{q} (3) & \code{xd} (7) & \code{qdi} (5) & \\ & \code{d} (3) & \code{qd} (4) & & \end{tabular} \end{center} (These are all combinations of \code{x}, \code{q}, \code{d}, and \code{i}, with the restriction that \code{i} cannot be included without both \code{q} and \code{d}.) If we wanted to allow the shape parameter to vary with quality and density, but not experiment, we would have a most-complex model with 12~parameters and a total of 40 ($4 \times 10$) model possibilities. Here is the most complex model, which fits scale and shape parameters that differ with quality, density, and their interaction: <<>>= NLL.GS.xqdi = function(lscale0,lscale.q,lscale.d,lscale.i, lscale.x2,lscale.x3,lscale.x4,lscale.x5, lshape) { lscalediff = c(0,lscale.x2,lscale.x3,lscale.x4,lscale.x5) scale=exp(lscale0+lscalediff[exper]+ lscale.q*qual+(lscale.d+lscale.i*qual)*density) shape = exp(lshape) -sum(log(pweibull(day2,shape,scale)- pweibull(day1,shape,scale))) } @ The only unusual thing here is that we've parameterized the difference among experiments so that the baseline parameter (\code{lscale0}) represents the log of the scale parameter (at density=0 and quality=0) in experiment~1, while the experiment parameters (\code{lscale.x2}, etc.) represent the differences between experiment~1 and the other experiments: this parameterization is consistent with the way that other functions in \R\ define parameters, and makes it possible to test the hypothesis that all experiments are the same by setting \code{lscale.x2} and the other experiment parameters to zero. 
The differences among parameters are indexed by \code{exper} and added to the baseline value along with the effects of density and quality. Since we don't know exactly when (between \code{day1} and \code{day2}) a given fish disappeared, we calculate the probability that it disappeared somewhere between \code{day1} and \code{day2} taking the difference between the probability that it disappeared before \code{day2} (\code{pweibull(day2,...)}) and the probability that it disappeared before \code{day1} (\code{pweibull(day1,...)}); we take the log only \emph{after} calculating the difference. What about starting values for this model? The mean of the Weibull distribution with shape $a$ and scale $s$ is $s \Gamma(1+1/a)$, which for an exponential ($a=1$) is equal to $s$. We'll start $\log(s)$ from the log of the overall mean survival time (calculated from \code{d1} and \code{d2} rather than \code{day1} and \code{day2} because \code{day2} contains infinite values that will mess up the mean calculation), and $\log(a)$ from 0, which represents an exponential distribution. Since the rest of the parameters represent differences from the baseline case, we'll try starting them all from zero. <>= totmeansurv = mean((d1+d2)/2) startvals.GS = list(lscale0=log(totmeansurv), lscale.x2=0,lscale.x3=0,lscale.x4=0,lscale.x5=0, lscale.q=0,lscale.d=0,lscale.i=0, lshape=0) GS.xqdi = mle2(NLL.GS.xqdi,startvals.GS) @ <>= ## this also works! dicweib = function(x,shape,scale,log=FALSE) { if (is.matrix(x)) { day1 = x[,1] day2 = x[,2] } else { day1 = x[1] day2 = x[2] } v = log(pweibull(day2,shape,scale)-pweibull(day1,shape,scale)) if (log) v else exp(v) } fexper = factor(exper) mle2(cbind(day1,day2)~dicweib(exp(shape),exp(scale)), parameters=list(scale~fexper+qual*density), start=list(scale=log(totmeansurv),shape=0)) @ Looking at the estimates of the parameters and their approximate $p$-values: <<>>= summary(GS.xqdi) @ From this summary, it appears that there may be an effect of experiment (experiments 2 and 3 both show significantly shorter survival times than experiment~1), an effect of density, and a shape parameter that is significantly less than 1 ($\log(a)<0$) --- that is, \emph{per capita} mortality declines significantly with time. In a stepwise analysis, we would continue by dropping the interaction term from the model (it doesn't really make sense to drop the parameters for experiments 4 and 5, since they are part of the overall difference among experiments). One shortcut for dropping terms from an \code{mle} fit, rather than writing another likelihood function that is missing one term, is to use the \code{fixed=} argument to set a subset of the parameters to zero. For example, to drop the interaction term from the model: <>= GS.xqd = mle2(NLL.GS.xqdi,startvals.GS,fixed=list(lscale.i=0)) @ We can use the Likelihood Ratio Test on particular series of nested hypotheses to test specific conclusions. For example, we might be most interested in testing whether quality and density have an effect. We attempt to drop the interaction term first, then quality, then density. Because the differences between experiments are potentially important, and an unavoidable part of the experimental design, we leave them in the model. Therefore we want to test the sequence of models \code{xqdi} $\to$ \code{xqd} $\to$ \code{xd} $\to$ \code{x}. 
Fitting the remaining two models in the sequence: <>= GS.xd = mle2(NLL.GS.xqdi,startvals.GS, fixed=list(lscale.i=0,lscale.q=0)) GS.x = mle2(NLL.GS.xqdi,startvals.GS, fixed=list(lscale.i=0,lscale.q=0,lscale.d=0)) @ Applying \code{anova} to run the Likelihood Ratio Test: <<>>= anova(GS.xqdi,GS.xqd,GS.xd,GS.x) @ This analysis confirms the results of \code{summary} on the most complex model to some extent. It finds that the effect of the interaction (model 1 vs. 2) is insignificant and the effect of density is significant at $p=0.03$ (model 3 vs. 4). The effect of quality (when added to a model that already accounts for density) is weakly significant. The parameter values (\code{coef(GS.xqd)}) show the positive effect of quality (\Sexpr{round(coef(GS.xqd)["lscale.q"],3)}) to be about half the negative effect of density (\Sexpr{round(coef(GS.xqd)["lscale.d"],3)}), on the log scale; adding one competitor to a reef decreases the scale parameter (and hence survival) by a factor of $e^{\Sexpr{round(coef(GS.xqd)["lscale.d"],3)}}= \Sexpr{round(exp(coef(GS.xqd)["lscale.d"]),2)}$, while an additional background settler indicates some element of quality that increases survival on average by a factor of $e^{\Sexpr{round(coef(GS.xqd)["lscale.q"],3)}}= \Sexpr{round(exp(coef(GS.xqd)["lscale.q"]),2)}$. <>= GS.qdi = mle2(NLL.GS.xqdi,startvals.GS,fixed=list(lscale.x2=0,lscale.x3=0,lscale.x4=0,lscale.x5=0)) GS.qd = mle2(NLL.GS.xqdi,startvals.GS, fixed=list(lscale.i=0, lscale.x2=0,lscale.x3=0,lscale.x4=0,lscale.x5=0)) GS.xq = mle2(NLL.GS.xqdi,startvals.GS, fixed=list(lscale.i=0,lscale.d=0)) GS.d = mle2(NLL.GS.xqdi,startvals.GS, fixed=list(lscale.i=0,lscale.x2=0,lscale.x3=0,lscale.x4=0, lscale.x5=0,lscale.q=0)) GS.q = mle2(NLL.GS.xqdi,startvals.GS, fixed=list(lscale.i=0, lscale.x2=0,lscale.x3=0,lscale.x4=0,lscale.x5=0,lscale.d=0)) GS.0 = mle2(NLL.GS.xqdi,startvals.GS, fixed=list(lscale.i=0,lscale.x2=0,lscale.x3=0,lscale.x4=0, lscale.x5=0,lscale.d=0,lscale.q=0)) @ <>= a1 = AICtab(GS.xqdi,GS.xqd,GS.qdi,GS.xq,GS.xd,GS.qd,GS.x,GS.q,GS.d,GS.0,delta=TRUE,weights=TRUE) a2 = AICctab(GS.xqdi,GS.xqd,GS.qdi,GS.xq,GS.xd,GS.qd,GS.x,GS.q,GS.d,GS.0, nobs=nrow(GobySurvival),delta=TRUE) b1 = BICtab(GS.xqdi,GS.xqd,GS.qdi,GS.xq,GS.xd,GS.qd,GS.x,GS.q,GS.d,GS.0, nobs=nrow(GobySurvival),delta=TRUE) atab = cbind(a1$df,a1$dAIC,a1$weight,a2$dAICc,b1$dBIC) rownames(atab) = gsub("GS\\.","",attr(a1,"row.names")) ## need to double backslashes colnames(atab) = c("params","$\\Delta \\mbox{AIC}$","AIC weights","$\\Delta \\mbox{AIC}_c$","$\\Delta \\mbox{BIC}$") atab = round(atab[order(atab[,2]),],3) @ Alternatively, we can simply fit the remaining 6 models (\code{qdi}, \code{qd}, \code{xq}, \code{d}, \code{q}, \code{0} --- not shown) and use information criteria (\code{AICtab}, \code{AIcctab}, or \code{BICtab}) to get the following results: <>= latex(round(atab,2),file="",table.env=FALSE,title="model") @ This is perhaps too much information --- because of the different weighting used by the different information criteria, they give qualitatively different answers. AIC and \aicc\ prefer the model that incorporates the effects of quality and density, with all the models considered plausible ($\Delta \mbox{AIC}, \Delta \mbox{AIC}_c <6$ for all candidate models) but with the simplest models weighted very little; in contrast, BIC prefers the simplest models (\code{0}, \code{d}), ruling out the most complex ones ($\Delta \mbox{BIC}>10$ for \code{xqd}, \code{xqdi}, \code{xd}, \code{xq}, \code{x}). 
What should one conclude in this situation, with too many possible answers? There isn't really a good answer, except that in reality one should decide \emph{in advance} which model selection approach (if any) comes closest to answering the kind of question you have, rather than trying several and then having to choose among the answers. Here there is fairly strong evidence that density has an effect, and based on the coefficients the effect of quality is about half as strong (per fish present) as that of density. In terms of the range of values used in the experiment, density and quality have approximately equivalent effects (density has a range of 9, from 2 to 11, while quality ranges from 1 to 18). There aren't too many loose ends in this particular analysis, but there are a number of possible directions for further exploration: \begin{itemize} \item We have followed standard survival analysis in making the mortality rate an exponential function of covariates such as density. Fisheries biologists commonly model mortality as a linear (additive) function of density instead (i.e., $\mbox{Prob}(\mbox{survival to }t) \propto e^{-a+b\cdot d}$ rather than $\mbox{Prob}(\mbox{survival to }t) \propto e^{-e^{a+b\cdot d}}$). The exponential analysis is more convenient because it guarantees that the mortality rate will always be positive regardless of the parameters, thus avoiding the need for constrained optimization. For small mortality rates the analysis will give approximately the same answers, since by Taylor expansion the exponential is approximately linear near zero. It would be interesting to re-do the analysis with a linear model and see how similar the answers were. More challengingly, one could explore the dependence of survival on density and quality in more detail --- perhaps graphically --- and see if a more flexible function could give a better answer. \item We ignored differences in shape parameter; it would be interesting to go back and explore the possibilities of differences in shape (representing the differences in change in mortality over time) some more, and with a wider variety of data; does the shape parameter vary with the mode of mortality? \end{itemize} \section{Seed removal} For the Duncan seed predation/seed removal data, some of the ecological questions are: how does the probability of seed removal vary as a function of distance from the forest edge (10 or 25~m)? With species, possibly as a function of seed mass? By time? Since most of the predictor variables are categorical in this case (species; distance from forest), the deterministic models are relatively simple --- simply different probabilities for different levels of the factors. On the other hand, the distribution of the number of seeds taken is unusual, so most of the initial modeling effort will go into finding an appropriate stochastic model. \subsection{Preliminaries} Pull in the data: <<>>= data(SeedPred) @ Drop \code{NA}s and records where there are zero seeds available: attach the results. <<>>= SeedPred = na.omit(subset(SeedPred,available>0)) attach(SeedPred) @ About 90\% of the data consist of ``zero taken'' entries. We don't want to ignore these data, but sometimes we can see more if we look only at the non-zero cases: we'll use \code{nz} for that case. 
<<>>=
nz = subset(SeedPred,taken>0)
@
\subsection{Stochastic model: which distribution?}
I used \code{barchart} from the \code{lattice} package to look at the data in a variety of different ways --- %
rearranging the order of the factors in the table to get different arrangements of panels and bars, plotting data with zero-taken data included and excluded, and adding or dropping factors from the \code{table} command to see coarser views of the data:
<>=
barchart(table(nz$taken,nz$available,nz$dist,nz$species),stack=FALSE)
barchart(table(nz$taken,nz$species,nz$dist,nz$available),stack=FALSE)
barchart(table(nz$species,nz$available,nz$dist,nz$taken),stack=FALSE)
barchart(table(nz$available,nz$dist,nz$taken),stack=FALSE)
barchart(table(nz$available,nz$species,nz$taken),stack=FALSE)
@
I could also have included the argument \code{subset=taken>0} to restrict the plots to non-zero data. Plot all data (not just cases where some seeds are taken):
<<>>=
barchart(table(available,dist,taken),stack=FALSE)
@
Plot by date:
<>=
tcumfac = cut(nz$tcum,breaks=c(0,20,40,60,180))
barchart(table(nz$available,tcumfac,nz$taken),stack=FALSE)
barchart(table(available,tcumfac,taken),stack=FALSE)
@
Two additional useful arguments are \code{auto.key=TRUE}, to draw a legend for the bar colors, and \code{scales=list(relation="free")}, to allow different scales in each panel.
\begin{figure}
<>=
pcomb = table(nz$available,nz$taken)
pcomb = sweep(pcomb,1,rowSums(pcomb),"/")
trellis.par.set(canonical.theme(color=FALSE))
print(barchart(pcomb[-1,],stack=FALSE,auto.key=list(x=0.8,y=0.8,corner=c(1,1),
  title="Taken",reverse.rows=TRUE),
  xlab="Frequency",ylab="Seeds available"))
@
\caption{Distribution of overall number of seeds taken as a function of the number available, when number available $>1$ and number taken $>0$.}
\label{fig:seeddist}
\end{figure}
As with the reed frog predation experiment, the data are discrete and the results have an upper limit (i.e., the number of seeds available for removal at the beginning of the interval). The zero-inflated binomial introduced in Chapter~4 might make sense, if there were more zeros in the data set than expected from the binomial sampling process (e.g. if the probability distribution had modes both at zero and away from zero). This distribution would be appropriate if predators sometimes missed the site entirely. However, Figure~\ref{fig:seeddist} shows that the seed removal data set doesn't look like a zero-inflated binomial either, because the distribution is lowest in the middle and increases gradually for higher or lower values. Compare that with Figure~\ref{fig:zi} (p.~\pageref{fig:zi}), which shows that the probability distribution function of the zero-inflated binomial distribution usually drops toward zero, then has a spike at zero ($p(0)>p(1)$, $p(1)<p(2)$, etc.). Define the zero-inflated binomial and zero-inflated beta-binomial distributions:
<<>>=
dzibinom = function(x,prob,size,zprob,log=FALSE) {
  logv = log(1 - zprob) + dbinom(x, prob = prob, size = size, log = TRUE)
  logv = ifelse(x == 0, log(zprob + exp(logv)), logv)
  if (log) logv else exp(logv)
}
dzibb = function(x,size,prob,theta,zprob,log=FALSE) {
  logv = ifelse(x>size, NA,log(1 - zprob) +
    dbetabinom(x, prob = prob, size = size, theta=theta, log = TRUE))
  logv = ifelse(x == 0, log(zprob + exp(logv)), logv)
  if (log) logv else exp(logv)
}
@
Next I took a shortcut and used the formula interface to \code{mle2} rather than writing an explicit negative log-likelihood function.
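For comparison, the longhand version would look something like the following sketch (not run here; \code{NLL.zibb} and \code{SP.zibb.explicit} are just illustrative names), using the \code{dzibb} function defined above:
<<eval=FALSE>>=
## explicit negative log-likelihood equivalent of the formula-interface fit below
NLL.zibb = function(prob,theta,logitzprob) {
  -sum(dzibb(taken,size=available,prob=prob,theta=theta,
             zprob=plogis(logitzprob),log=TRUE))
}
SP.zibb.explicit = mle2(NLL.zibb,start=list(prob=0.5,theta=1,logitzprob=0))
@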
I fitted the zero-inflation probability on a logit scale, using \code{plogis} to transform it on the fly, since it must be between 0 and 1: <>= SP.zibb = mle2(taken~dzibb(size=available,prob,theta,plogis(logitzprob)), start=list(prob=0.5,theta=1,logitzprob=0)) print(SP.zibb) @ There were warnings about \code{NaN}s in \code{lbeta}, but the final answers look reasonable. I was surprised to see that the zero-inflation probability was so small: \code{plogis(\Sexpr{round(coef(SP.zibb)["logitzprob"],2)})}= \Sexpr{round(plogis(coef(SP.zibb)["logitzprob"]),3)}. I suspected that the zero-inflation parameter and the overdispersion parameter ($\theta$) might both be affecting the number of zeros, so I checked the correlations among the parameters: <<>>= cov2cor(vcov(SP.zibb)) @ Indeed, \code{logitzprob} and \code{prob} are 99\% correlated --- suggesting that we could drop the zero-inflation parameter from the model. <>= SP.bb = mle2(taken~dbetabinom(prob,theta,size=available), start=list(prob=0.5,theta=1)) @ <<>>= logLik(SP.bb)-logLik(SP.zibb) @ The log-likelihood difference is only about \Sexpr{round(c(logLik(SP.bb)-logLik(SP.zibb)),2)}. Even allowing for the fact that the null value of the zero-inflation parameter is on the boundary, so that the appropriate $\bar \chi_1^2$ $p$-value is half the usual $\chi_1^2$ $p$-value, this difference is certainly not significant. Just for completeness, I fitted the zero-inflated binomial too (although I didn't think it would fit well): <>= SP.zib = mle2(taken~dzibinom(size=available,prob=p, zprob=plogis(logitzprob)), start=list(p=0.2,logitzprob=0)) @ Using AIC to compare all three distributions: <<>>= AICtab(SP.zib,SP.zibb,SP.bb,sort=TRUE,weights=TRUE) @ Figure~\ref{fig:distcomp} compares the predictions of the different distributions, with stacked barplots showing the breakdown of different numbers of seeds taken for each number of seeds available. The \R\ code to calculate this distribution for the data first computes the table of number-taken-by-number-available, then uses \code{sweep} to divide each column (margin 2) by its sum: <<>>= comb = table(taken,available) pcomb = sweep(comb,2,colSums(comb),"/") @ The equivalent computation for the zero-inflated beta-binomial sets up an empty matrix with 6 rows (for 0 to 5 seeds taken) and 5 columns (for 1 to 5 seeds available). For each number available $N$, it then sets the first $N+1$ rows in column $N$ of the matrix to the predicted probability of taking 0 to $N$ seeds. <<>>= mtab = matrix(0,nrow=6,ncol=5) for (N in 1:5) { cvals = coef(SP.zibb) mtab[1:(N+1),N] = dzibb(0:N,size=N,prob=cvals["prob"], theta=cvals["theta"], zprob=plogis(cvals["logitzprob"])) } @ Similar calculations work for the other two distributions. <>= mtab2 = matrix(0,nrow=6,ncol=5) for (av in 1:5) { mtab2[1:(av+1),av] = dbetabinom(0:av,prob=coef(SP.bb)["prob"], theta=coef(SP.bb)["theta"],size=av) } mtab3 = matrix(0,nrow=6,ncol=5) for (av in 1:5) { mtab3[1:(av+1),av] = dzibinom(0:av,prob=coef(SP.zib)["p"], zprob=plogis(coef(SP.zib)["logitzprob"]), size=av) } @ \begin{figure} <>= tmpf = function(m) { m2 = m[-1,] sweep(m2,2,colSums(m2),"/") } op = par(mfrow=c(2,2),mar=c(2,2,2,2)) ##mosaicplot(tmpf(pcomb),main="data") barplot(tmpf(pcomb),main="data",axes=FALSE) barplot(tmpf(mtab),main="Z-I beta-binomial",axes=FALSE) barplot(tmpf(mtab2),main="beta-binomial",axes=FALSE) barplot(tmpf(mtab3),main="Z-I binomial",axes=FALSE) par(op) @ \caption{Observed and predicted distribution of number of seeds taken as a function of number available. 
(Zero-taken results are omitted, and columns are rescaled to add to 1.)} \label{fig:distcomp} \end{figure} As we would expect from the statistical results so far, the zero-inflated beta-binomial and beta-binomial predictions look nearly identical, and much closer to the observed distribution than the zero-inflated binomial predictions. However, there are still visible discrepancies for the cases of 4 and 5 seeds available --- the predicted distributions are more regular and more even than the observed ones. We can calculate standard $\chi^2$ $p$-values comparing the observed and predicted numbers taken for each number of seeds available: <<>>= pval = numeric(5) for (N in 1:5) { obs = comb[1:(N+1),N] prob = mtab[1:(N+1),N] pval[N] = chisq.test(obs,p=prob)$p.value } @ The $p$-values are: <>= latex(t(matrix(format.pval(pval,digits=2,eps=0.001), dimnames=list(1:5,NULL))),file="",title="",table.env=FALSE) @ There are still statistically significant discrepancies between the expected and observed distributions when 4 or 5 seeds are available. We could try to find a way to make the stochastic model more complex and accurate, but we have reached the limit of what we can do with simple models, and we may also have reached the limit of what we can do with the data. The mechanism for the pattern remains obscure. While I can imagine mechanisms that would lead to all seeds or none being taken, it's hard to see why it's least likely that 3 out of 5 available seeds would be taken. I suspect that there is some disaggregation of the data by species, date, etc., that would divide stations into those where few or many seeds were taken, with an extreme pattern in each case that combines to create the observed bimodal pattern, but I haven't been able to find it. \subsection{Deterministic model: differences among species, etc.} Now we can check for differences among distances from the forest, species, and possibly differences in space and time: how does the distribution of number of seeds removed vary? Does $p$, the overall probability that a seed will be removed, vary? Does $\theta$ (the overdispersion parameter, which in this case is more related to the probability that any seeds will be removed) vary? Do they both vary? \subsubsection{Differences among transects (distance from edge)} \label{contrasts3} \code{mle2}'s formula interface allows us to specify that some parameters vary among groups, by giving a \code{parameters} argument, a list with a formula for each parameter that varies across groups (p.~\pageref{mle2params}). Here I wanted to parameterize the model so that \code{mle2} would estimate the probability and overdispersion parameter for each transect, rather than estimating the parameters for the first group and the differences between subsequent groups and the first, so I used the formulas \code{prob\~{}dist-1} and \code{theta\~{}dist-1} to fit the model without an intercept. <>= SP.bb.dist = mle2(taken~dbetabinom(prob,size=available,theta), parameters=list(prob~dist-1,theta~dist-1),start=as.list(coef(SP.bb))) @ A Likelihood Ratio Test on the two models suggests a significant difference between transects: <<>>= anova(SP.bb,SP.bb.dist) @ Reparameterizing the model in terms of differences between the 10-m and 25-m transect rather than the $p$ and $\theta$ values for each transect (i.e., dropping the \code{-1} in the parameter formulas) allows us to calculate confidence limits on the differences between transects. At the same time, I decided to switch to fitting $p$ on a logit scale and $\theta$ on a log scale.
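(An aside on why the transformed scales are convenient: any real number the optimizer proposes maps back to an admissible parameter value, so no constraints are needed.)

<<>>=
plogis(c(-5,0,5))  ## inverse-logit: always strictly between 0 and 1
exp(c(-5,0,5))     ## always positive
@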
With the formula interface, I can do the inverse transformations on the fly with \code{plogis} and \code{exp}. Set up starting values, using \code{qlogis} (the inverse of \code{plogis}) and \code{log} to transform the estimated values of the $p$ and $\theta$ parameters from above. <<>>= startvals =list(lprob=qlogis(coef(SP.bb.dist)["prob.dist10"]), ltheta=log(coef(SP.bb.dist)["theta.dist10"])) @ <>= SP.bb.dist2 = mle2(taken~dbetabinom(plogis(lprob),exp(ltheta),size=available), parameters=list(lprob~dist,ltheta~dist),start=startvals) @ The summary of the model now gives us approximate $p$-values on the parameters, showing that the difference between transects is caused by a change in $p$ and not a change in $\theta$. <<>>= summary(SP.bb.dist2) @ (The highly significant $p$-values for \code{lprob.10} and \code{ltheta.10} are not biologically significant: they merely show that $\mbox{logit}(p_{10}) \neq 0$ (i.e. $p_{10} \neq 0.5$) and $\log \theta \neq 0$ ($\theta \neq 1$), neither of which is ecologically interesting.) Now reduce the model, allowing only $p$ to vary between transects: <>= SP.bb.probdist = mle2(taken~dbetabinom(plogis(lprob),exp(ltheta),size=available), parameters=list(lprob~dist),start=startvals) @ Both the LRT and the AIC approaches suggest that the best model is one in which $p$ varies between transects but $\theta$ does not (although the AIC table suggests that the more complex model with differing $\theta$ should be kept in consideration): <<>>= anova(SP.bb,SP.bb.probdist,SP.bb.dist) AICtab(SP.bb,SP.bb.probdist,SP.bb.dist,sort=TRUE,weights=TRUE) @ How big is the difference between transects? <<>>= c1 = coef(SP.bb.probdist) plogis(c(c1[1],c1[1]+c1[2])) @ The difference is small --- 6\% vs 7\% probability of removal per observation. This difference is unlikely to be ecologically significant, and reminds us that when we have a big data set (\Sexpr{nrow(SeedPred)} observations) even small differences can be statistically significant. On the other hand, \cite{DuncanDuncan2000} failed to find a significant difference between the transects --- so the likelihood framework is more powerful, and has given us answers in terms (average percent difference in probability of removal) that we can understand. \subsubsection{Differences among species} Now I proceeded to test differences among species. First I tried a model with both $\theta$ and $p$ varying. (Both parameters are again fitted on transformed scales, logit and log respectively.) <>= SP.bb.sp = mle2(taken~dbetabinom(plogis(lprob),exp(ltheta),size=available), parameters=list(lprob~species,ltheta~species),start=startvals) @ The parameter estimates (shown in full by \code{summary(SP.bb.sp)}: here I have dropped one column of the table) suggest that, as in the case of differences among transects, differences in $p$ and not $\theta$ are driving the differences among species: <>= s1 = summary(SP.bb.sp) s1@coef = s1@coef[,-3] ## drop z value printCoefmat(s1@coef,has.Pvalue=TRUE) @ So I fitted a model with only probability $p$, and not overdispersion $\theta$, varying by species: <>= SP.bb.probsp = mle2(taken~dbetabinom(plogis(lprob),exp(ltheta),size=available), parameters=list(lprob~species),start=startvals) @ Once again, both Likelihood Ratio tests and AIC suggest that only the $p$ parameters differ among species: <<>>= anova(SP.bb.sp,SP.bb.probsp,SP.bb) AICtab(SP.bb.sp,SP.bb.probsp,SP.bb,sort=TRUE,weights=TRUE) @ Now I want to know whether seed mass and $p$ are related. 
If they were, I could fit a likelihood model where $p$ was treated as a function of seed mass, reducing the number of parameters to estimate and perhaps allowing me to predict removal probabilities for other species on the basis of their seed masses. <>= tmpfun = calc_mle2_function(taken~dbetabinom(plogis(lprob),exp(ltheta),size=available), parameters=list(lprob~species-1),start=startvals)$fn pars = coef(SP.bb.probsp) pars[2:8] = pars[2:8]+pars[1] names(pars) = NULL do.call("tmpfun",as.list(pars)) @ <>= SP.bb.probsp0 = mle2(taken~dbetabinom(plogis(lprob),exp(ltheta),size=available), parameters=list(lprob~species-1),start=startvals, method="L-BFGS-B",lower=rep(-10,9),upper=rep(10,9)) @ Fitting this model was numerically problematic. In my first attempt, using default methods and parameters, \code{mle2} found a ridiculous answer (all the logit-probabilities were strongly negative, giving removal probabilities near zero) and crashed while evaluating the Hessian. I used \code{skip.hessian=TRUE} to temporarily stop \code{mle2} from crashing and \code{trace=TRUE} to see where it was going. Switching to \code{method="Nelder-Mead"} helped stabilize the calculation, but it failed to converge until I increased the number of iterations to 3000 (\code{control=list(maxit=3000)}), and even then it got stuck on a solution that was worse than the previous model. (In this case, since all I am doing is reparameterizing the previous model, \code{mle2} ought to be able to achieve an equally good fit.) I then went back to BFGS and tried changing the size of the finite difference interval both down (\code{control=list(ndeps=rep(1e-4,9))}) and up (\code{control=list(ndeps=rep(1e-2,9))}), neither of which helped. I finally got the model to fit as well as the previous parameterization by switching to L-BFGS-B and setting the parameter boundaries to disallow ridiculous fits. %% tried: method="Nelder-Mead", control=list(maxit=3000), ndeps=rep(1e-4,9), %% ndeps=rep(1e-2,9), checking initial parameters, trace=TRUE,skip.hessian=TRUE %% move to supplement/online? % I have to reconstruct the estimated $p$ % values for each species by taking the % baseline value (for species~1) and % adding the estimated differences between % species, then logistic (inverse-logit) % transforming with \code{plogis}: <>= lprob0 = coef(SP.bb.probsp)["lprob0"] lprobdiff = coef(SP.bb.probsp)[2:nsp] probvals = c(lprob0,lprob0+lprobdiff) predprob = plogis(probvals) @ % To calculate the confidence intervals, % I compute the standard deviations % of the removal probabilities. % Since the removal probability % for species $i$ is parameterized % as $p_i=p_0+\Delta p_i$, % the standard error % $\sigma_{p_i} = \sqrt{\sigma^2_{p_i}} = % \sqrt{\sigma^2_{p_0}+\sigma^2_{\Delta p_i}}$. <>= sdvals = coef(summary(SP.bb.probsp))[,"Std.
Error"] sdvals0 = sdvals["lprob0"] sdvalsdiff = sqrt(sdvals[1]^2+sdvals[2:nsp]^2) sdvals = c(sdvals0,sdvalsdiff) @ % Finally, I calculate the (approximate) % confidence intervals on the basis % of $\mu \pm 1.96 \sigma$ and logistic transform: <>= ci.lo = plogis(probvals-1.96*sdvals) ci.hi = plogis(probvals+1.96*sdvals) @ <<>>= predprob = plogis(coef(SP.bb.probsp0))[1:8] SP.bb.ci = plogis(confint(SP.bb.probsp0,method="quad"))[1:8,] @ \begin{figure} <>= op = par(cex=1.5,mar=c(4,4,0,1)+0.1,bty="l",mgp=c(2.5,0.75,0),las=1) data(SeedPred_mass) plotCI(SeedPred_mass,predprob,li=SP.bb.ci[,1],ui=SP.bb.ci[,2],pch=NA,gap=0.012, xlab="Seed mass",ylab="Removal probability (p)") textrec(SeedPred_mass,predprob,levels(SeedPred$species),cex=0.75, bg="white",expand=1.2) text(SeedPred_mass,predprob,levels(SeedPred$species),cex=0.75) box() par(op) @ \caption{Removal probability parameter ($p$) as a function of seed mass: error bars show quadratic confidence intervals.} \label{fig:seedmass} \end{figure} Figure~\ref{fig:seedmass} shows the results: rather than the possible trend towards higher seed removal for larger seeds that I might have expected, the figure shows slightly elevated removal rates for the two smallest-seeded species (explained by Duncan and Duncan as a possible artifact of small seeds being washed out of the trays by rainfall), and a hugely elevated rate for species \code{abz}; in this case, I would want to go back and see if there was something special about this species' characteristics or the way it was handled in the experiment. \subsubsection{Is there a species-distance interaction?} The initial scan of the data suggested that some species might be more sensitive to the distance from the edge; this possibility is certainly biologically sensible (some species might be taken by specialized seed predators that have more restricted movement), and it is the kind of information that could easily be masked by looking at aggregated data. <>= NLL.bb.probsptrans = function(p) { nsp = 8 lprob0 = p[1] lprobdiff = p[2:nsp] lprobtr = p[nsp+1] lprobtrdiff = p[(nsp+2):(2*nsp)] lprob = c(lprob0,lprob0+lprobdiff, lprob0+lprobtr,lprob0+lprobtr+lprobdiff+lprobtrdiff) probvec = plogis(lprob) ltheta0 = p[2*nsp+1] prob = probvec[interaction(SeedPred$species,SeedPred$dist)] theta = exp(ltheta0) -sum(dbetabinom(taken,size=available,prob=prob,theta=theta,log=TRUE)) } parnames(NLL.bb.probsptrans) = c(parnames(NLL.bb.sp)[1:nsp], paste("tr",parnames(NLL.bb.sp)[1:nsp],sep=""), "ltheta") svec.probtr = c(svec[1:nsp],rep(0,nsp),svec[nsp+1]) names(svec.probtr) = parnames(NLL.bb.probsptrans) SP.bb.probsptrans = mle2(NLL.bb.probsptrans,start=svec.probtr,vecpar=TRUE) @ Using the formula interface, we can simply say \code{lprob\~{}species*dist} to allow for such an interaction: if you need to code such a model by hand, \code{interaction(f1,f2)} will create a factor that represents the interaction of factors \code{f1} and \code{f2}. <>= SP.bb.probspdist = mle2(taken~dbetabinom(plogis(lprob),exp(ltheta),size=available), parameters=list(lprob~species*dist),start=startvals, method="L-BFGS-B",lower=rep(-10,9),upper=rep(5,9)) @ I had to restrict the upper bounds still further, to 5, to make L-BFGS-B happy, since values of 10 gave \code{NaN} results for some parameter combinations. 
A likelihood ratio test (\code{anova(SP.bb.probsp,SP.bb.probspdist)}) gives a $p$-value of \Sexpr{round(anova(SP.bb.probsp,SP.bb.probspdist)[2,5],3)}; AIC says that the model without distance $\times$ species interaction is best, but only by a little bit: <<>>= AICtab(SP.bb,SP.bb.probsp,SP.bb.probspdist,SP.bb.sp,SP.bb.probdist,SP.bb.dist, weights=TRUE,sort=TRUE) @ \subsubsection{Other issues: time} A minor issue that I have largely neglected so far is that the intervals between observations varied between 3 and 14~days. To account for these differences in exposure time, I could use a model like $p=1-e^{-r (\Delta t)}$, which assumes that seeds are taken at a constant rate $r$. Do the predictions improve, or the conclusions change, if I account for the time interval allowed for removal? Before going to the trouble of building a model, let's look at the data again. Calculate the mean and standard error of the proportion taken, using \code{tapply} to calculate means and standard deviations of proportions divided up by the time interval (\code{tint}); then use \code{table} to calculate the number of observations for each time interval and divide by $\sqrt{n}$ to convert standard deviations to standard errors. <>= mean.prop.taken = tapply(taken/available,tint,mean,na.rm=TRUE) sd.prop.taken = tapply(taken/available,tint,sd,na.rm=TRUE) n.tint = table(tint) se.prop.taken = sd.prop.taken/sqrt(n.tint) @ Figure~\ref{fig:tint}a is a surprise: the model $p=1-e^{-r(\Delta t)}$ suggests the proportion taken should increase rather than decrease with $\Delta t$. What's going on? Figure~\ref{fig:tint}b, which plots the time interval between observations against date, gives the answer: the short-interval (3--4~day) observations were mostly made before May, when the removal rate was high, while the longest intervals between observations (10~days) are in September. \begin{figure} <>= par(mfrow=c(1,2)) op = par(cex=1.5, mar=c(4,4,0,1)+0.1,bty="l",mgp=c(2.5,0.75,0), bty="l",las=1) nint = table(tint) mtaken = tapply(taken/available,tint,mean,na.rm=TRUE) setaken = tapply(taken/available,tint,sd,na.rm=TRUE)/sqrt(nint) par(mfrow=c(1,2)) plotCI(3:10,mtaken,setaken,xlab="Interval (days)",ylab="Proportion taken") corner.label2("a","topleft",inset=0.025) plot(date,tint,xlab="Date",ylab="Interval (days)") corner.label2("b","topleft",inset=0.025) par(op) @ \caption{Relationships between proportion removed and time interval ($\Delta t$), and between $\Delta t$ and date. \R\ command: \code{plotCI(3:10,mean.prop.taken,se.prop.taken)}.} \label{fig:tint} \end{figure} Which brings us to the issue of temporal variation: we already know from Figure~\ref{fig:seedtime} in Chapter 2 that the removal rate decreases over time. Figure~\ref{fig:seeddate} shows the relationship between proportion removed and date, calculated in the same way as the removal--$\Delta t$ relationship. Removal appears to decrease exponentially with time. Replotting the data with a logarithmic $y$ scale suggests that the removal rate might level off above zero, but it's hard to tell. Similarly, it's hard to know what causes the anomalously low proportions for some sampling dates throughout the study and the anomalously high proportions at the very end of the study. 
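Here is one way to do the log-scale replotting mentioned above (a sketch, not part of the original figure code; \code{prop.by.date} is a name introduced here, and I am assuming \code{date} is stored as a \code{Date}):

<<eval=FALSE>>=
## mean proportion taken on each sampling date, on a logarithmic y axis;
## dates where the mean is exactly zero are dropped (with a warning)
prop.by.date = tapply(taken/available,date,mean,na.rm=TRUE)
plot(as.Date(names(prop.by.date)),prop.by.date,log="y",
     xlab="Date",ylab="Mean proportion taken")
@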
Nevertheless, we can add a parameter to the model allowing for exponential decrease in removal rate over time: <>= NLL.bb.probspdate = function(p) { nsp = 8 lprob0 = p[1] lprobdiff = p[2:nsp] lprob = c(lprob0,lprob0+lprobdiff) probvec = plogis(lprob) ltheta0 = p[nsp+1] date = p[nsp+2] prob = probvec[SeedPred$species]*exp(-SeedPred$tcum*date) theta = exp(ltheta0) -sum(dbetabinom(taken,size=available,prob=prob,theta=theta,log=TRUE)) } parnames(NLL.bb.probspdate) = c(parnames(NLL.bb.sp)[1:nsp],"ltheta","date") svec.probspdate = c(svec[1:(nsp+1)],0.001) names(svec.probspdate) = parnames(NLL.bb.probspdate) SP.bb.probspdate = mle2(NLL.bb.probspdate,start=svec.probspdate,vecpar=TRUE) @ <>= SP.bb.probspdate = mle2(taken~dbetabinom(plogis(lprob)*exp(-tcum*date), exp(ltheta),size=available), parameters=list(lprob~species), start=c(startvals,date=0), method="L-BFGS-B",lower=c(rep(-10,9),0),upper=c(rep(5,9),2)) @ The model incorporating date is \Sexpr{round(c(logLik(SP.bb.probspdate)-logLik(SP.bb.probsp)),1)} log-likelihood units better --- the model should definitely include the effect of date. <>= SP.bb.probdate = mle2(taken~dbetabinom(plogis(lprob)*exp(-tcum*date), size=available,exp(ltheta)), start=c(startvals,date=0), method="L-BFGS-B",lower=c(rep(-10,2),0),upper=c(rep(5,2),2)) @ \begin{figure} <>= op = par(cex=1.5,mar=c(4,4,0,1)+0.1,bty="l",mgp=c(2,0.75,0),las=1) dates = unique(date) ndate = table(date) mtaken = tapply(taken/available,date,mean,na.rm=TRUE) setaken = tapply(taken/available,date,sd,na.rm=TRUE)/sqrt(ndate) plotCI(dates,mtaken,setaken,xlab="Date",ylab="",ylim=c(0.001,0.4)) mtext(side=2,"Proportion taken",line=2.5,cex=1.5,las=0) t0 = as.numeric(SeedPred$date[1]) p0 = plogis(coef(SP.bb.probdate)["lprob"]) dpar = coef(SP.bb.probdate)["date"] curve(p0*exp(-dpar*(x-t0)),add=TRUE) par(op) @ \caption{Proportion taken as a function of date. The line shows fitted exponential dependence ($p=\Sexpr{round(p0,2)} \times e^{-\Sexpr{round(dpar,3)} t}$), based on a fitted model that lumps all the species together.} \label{fig:seeddate} \end{figure} We have gotten a lot of mileage from these data, but as always there are more questions we could ask: do removal rates drop off faster for some species than for others? Can we figure out what causes the anomalous samples in Figure~\ref{fig:seeddate}? Once we have split the data according to these criteria, does the original bimodal distribution become simpler? <>= detach(SeedPred) @
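One way to start on the first of these questions would be to let the decay rate vary by species, putting it on a log scale so that it stays positive without box constraints. This is only a sketch: \code{SP.bb.probspdate2} and \code{ldate} are hypothetical names, the fit may need the same kind of optimizer coaxing as the earlier species-level fits, and \code{data=SeedPred} is supplied explicitly because the data frame has now been detached.

<<eval=FALSE>>=
## species-specific exponential decline in removal rate:
## exp(ldate) is the per-day decay rate, positive by construction
SP.bb.probspdate2 = mle2(taken~dbetabinom(plogis(lprob)*exp(-tcum*exp(ldate)),
                         exp(ltheta),size=available),
    parameters=list(lprob~species,ldate~species),
    start=c(startvals,list(ldate=log(0.01))),data=SeedPred)
anova(SP.bb.probspdate,SP.bb.probspdate2)  ## compare with the single-rate model
@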