\documentclass{article}
\title{Generalized linear models}
\usepackage{hyperref}
\usepackage{sober}
\usepackage{url}
\usepackage[english]{babel}  %% to work around texi2dvi ~ bug
\usepackage[utf8]{inputenc}  %% to make fancy quotes etc. work
\usepackage{fancyvrb}        %% for \verb, chunks in footnotes
\VerbatimFootnotes
\usepackage{amsmath}         %% boldmath etc.
\usepackage{natbib}          %% references
\bibliographystyle{chicago}
\title{Generalized linear models for disease ecologists}
\author{Ben Bolker}
\date{\today}
\newcommand{\code}[1]{{\tt #1}}
\newcommand{\fixme}[1]{{\textbf{FIXME: #1}}}
\newcommand{\ex}[1]{{\textbf{Exercise: }}}
\newcommand{\pkglink}[1]{\href{http://cran.r-project.org/web/packages/#1}{\nolinkurl{#1}}}
\newcommand{\rflink}[1]{\href{https://r-forge.r-project.org/projects/#1/}{\nolinkurl{#1}}}
\begin{document}
\maketitle

\includegraphics[width=2.64cm,height=0.93cm]{cc-attrib-nc.png}
\begin{minipage}[b]{3in}
{\tiny Licensed under the Creative Commons 
  attribution-noncommercial license
(\url{http://creativecommons.org/licenses/by-nc/3.0/}).
Please share \& remix noncommercially,
mentioning its origin.}
\end{minipage}

%% Sweave/knitr options
%% default size as created by R, and output size, center alignment
%% leave code alignment/breaks as they are
\SweaveOpts{
  fig.width=4,fig.height=4,out.width="\\textwidth",
  fig.align="center",tidy=FALSE,echo=TRUE,message=FALSE,warning=FALSE}
<<setup,echo=FALSE,message=FALSE>>=
## stuff for making pretty PDFs -- **ignore** if running
##  interactively ...
if (require(knitr)) {
    ## 
    require(highlight)
    knit_hooks$set(
                   fig=function(before, options, envir) {
                       if (before) {  ## set base-graphics options
                           par(bty="l",las=1)
                       } else { }
                   },
                   ## from Yihui Xie: https://gist.github.com/1805862
                   ##  allow printing of external code chunks, highlighted, with line numbers
                   printfun <- function(before, options, envir) {
                       if (before) return()
                       txt <- capture.output(dump(options$printfun, ''))
                       ## reformat if tidy=TRUE
                       if (options$tidy) txt = formatR::tidy.source(text = txt, output = FALSE)$text.tidy
                       con = textConnection(txt)
                       on.exit(close(con))
                       out = capture.output({
                           highlight::highlight(con, renderer = highlight::renderer_latex(document= FALSE),
                                                showPrompts = options$prompt, size = options$size,show_line_numbers=TRUE)
                       })
                       paste(out, collapse = '\n')
                   })
    ## set null device so graphics device doesn't keep popping up on the
    ##  screen when knitting
    olddev <- options(device = function(...) {
        .Call("R_GD_nullDevice", PACKAGE = "grDevices")
    }) 
}
library(ggplot2)
theme_update(theme_bw())
library(grid)
zmargin <- opts(panel.margin=unit(0,"lines"))
@ 


\section{Introduction}

You will need the \code{bbmle}, \code{ggplot2}, and
\code{coefplot2} packages installed,
and optionally \code{glmmML}, \code{lme4}, and
\code{glmmADMB}. \code{coefplot2} and \code{glmmADMB} need to
be installed via
<<coefplot2install,eval=FALSE>>=
install.packages("coefplot2",repos="http://r-forge.r-project.org")
install.packages("glmmADMB",repos="http://r-forge.r-project.org")
@ 
\paragraph{Why linear models?}

Stats 101: everything is normally distributed (and the residuals
have constant variance).
If the response variable depends on a \emph{continuous} (as opposed to
\emph{categorical}) predictor variable, then the relationship is linear.

Linear models are:
\begin{itemize}
\item Fast and numerically stable (works for huge and/or wonky data sets)
\item No need for starting values
\item Easy to account for \emph{random effects} (experimental blocks etc.)
\item Lots of data (especially in economics, business, etc.) is reasonably normal
\item Transformations can often fix problems with heteroscedasticity (non-constant variance)/non-normality/non-linearity
\item Assuming normality (equivalently \emph{least-squares} solutions) is \\
  usually OK \emph{asymptotically} (large data sets)
\end{itemize}
 
\paragraph{Why not?}
\begin{itemize}
\item More specific models are more efficient/powerful
\item Would like a model that reflects the data better
\item Some data (discrete, zero-rich) are resistant to transformation
\item Understanding \emph{effect sizes} on transformed scales is hard
\item Linear relationships often force out-of-bounds predictions (proportions outside $\{0,1\}$; negative counts)
\item If you need arguments in favor of GLMs: \cite{ohara_not_2010,warton_arcsine_2011}
\end{itemize}
  
Generalized linear models (GLMs) allow (some) non-normal data (e.g. Poisson, binomial) and (some) nonlinear relationships (e.g. exponential, logistic curves).  Retain advantages of linear models (fast, stable, usually don't need starting values \ldots)

\paragraph{Basic definition}
Need to specify
\begin{itemize}
\item distribution (\emph{family}) (e.g. Poisson=count data, binomial=proportion [count!] data) 
\item \emph{link function} (e.g. log, logit): linearizing transformation (don't actually transform data)
\item response variable and \emph{continuous} and \emph{categorical} predictors
  (R formulas: \verb+r~x+, \verb!r~x+y! (additive model), \verb!r~x*y! (interaction))
\end{itemize}

Model is linear on scale of link function.

Almost all GLMs are logistic regressions.

<<scrape1,eval=FALSE,echo=FALSE>>=
library(stringr)
sscrape <- function(string="logistic+regression") {
    sstring0 <- "http://scholar.google.ca/scholar?as_q=&num=10&btnG=Search+Scholar&as_epq=STRING&as_oq=&as_eq=&as_occt=any&as_sauthors=&as_publication=&as_ylo=&as_yhi=&as_sdt=1.&as_sdtp=on&as_sdts=5&hl=en"
    sstring <- sub("STRING",string,sstring0)
    rr <- readLines(url(sstring))
    ## rr2 <- rr[grep("[Rr]esults",rr)[1]]
    rr2 <- rr
    rstr <- gsub(",","",
                 gsub("</b>.+$","",
                      gsub("^.+[Rr]esults.+of about <b>","",rr2)))
    rstr <- str_extract(rr2,"About [0-9,]+ results")
    as.numeric()))
}
@ 

<<gscrapedata,echo=FALSE>>=
fn <- "gscrape.RData"
## could use a caching solution for Sweave (cacheSweave, weaver package,
##  pgfSweave ... but they're all slightly wonky with keep.source at
##  the moment
if (!file.exists(fn)) {
  gscrape <- sapply(c("generalized+linear+model",
                      "logistic+regression","Poisson+regression","binomial+regression"),sscrape)
  save("gscrape",file=fn)
} else load(fn)
@       
These data were scraped from Google Scholar hits on the relevant search terms.

<<scrapepix,echo=FALSE,fig.height=3,fig.width=6>>=
d <- data.frame(n=names(gscrape),v=gscrape)
d$n <- reorder(d$n,d$v)
ggplot(d,aes(x=v,y=n))+geom_point(size=5)+
    xlim(0.5e4,2e6)+
    scale_x_log10(limits=c(1e4,2e6))+
    geom_text(aes(label=v),colour="red",vjust=2)+
    labs(y="",x="Google Scholar hits")
@ 

\begin{itemize}
\item logistic regression: \emph{logit} link (logistic inverse-link)
\item Poisson regression: \emph{log} link (exponential inverse-link)
\end{itemize}

\section{Logistic regression}

\subsection{Preliminaries}

Binary (or maybe binomial) data.  Logit link (others are possible
but usually impossible to distinguish based on data).

Read in the data and look at it:
<<getgopherdat,fig.height=3>>=
dat <- read.table("gophertortoise.txt",header=TRUE)
dat2 <- droplevels(subset(dat,Sex %in% c("M","F"))) 
(gplot1 <- ggplot(dat2,aes(size,status,colour=Sex))+
 stat_sum(alpha=0.5))
@ 

Visualizing binary data is tricky --- here's one way:
<<eval=FALSE>>=
dat2$sizecat <- cut(dat2$size,breaks=c(seq(200,280,by=20),320))
proptab <- ddply(dat2,c("sizecat","Sex"),
      function(x) {
          data.frame(n=nrow(x),
                     size=mean(x$size),
                     status=mean(x$status))
      })
gplot1 + geom_point(data=proptab,aes(size=n),shape=2)
@ 
\subsection{Picture and basic fit}
We can quickly get
\code{ggplot} to add a GLM fit to the \code{gplot1} graph by
adding \verb+geom_smooth(method="glm",family=binomial)+
(try it!).  However, this is only convenient for looking at pictures
(not for testing hypotheses, finding good predictive models, etc.).

Try a basic \code{glm} fit:
<<>>=
(mod1 <- glm(status~size*Sex,family=binomial,data=dat2))
@ 

\subsection{Single-model methods}

What does R know how to do with \code{mod1}?  Try this:
<<results="hide">>=
class(mod1)
methods(class="glm")
@ 

<<eval=FALSE>>=
coef(mod1)           ## coefficients
summary(mod1)        ## various summary info
coef(summary(mod1))  ## just the coefficient table with p-values etc.
confint(mod1)        ## profile confidence intervals
fitted(mod1)         ## fitted values
predict(mod1)        ## predictions ON LINEAR PREDICTOR scale
## ... on response scale (probabilities)
predict(mod1,type="response")  
pframe <- data.frame(size=250,Sex="M")
predict(mod1,newdata=pframe)   ## predictions for new data
## simulated data from the fitted model
simulate(mod1)                 
residuals(mod1)                ## residuals(deviance)
plot(mod1)           ## these are terrible!
AIC(mod1)
deviance(mod1)       ## -2 log L
logLik(mod1)
## SEQUENTIAL analysis of deviance
anova(mod1,test="Chisq")            
drop1(mod1,test="Chisq")            ## drop single terms and test
## marginal tests (danger Will Robinson!)
drop1(mod1,test="Chisq",scope=.~.)  
@ 

Notes and pitfalls:
\begin{itemize}
\item the $p$-values given by \code{summary} etc. are approximate (Wald tests), those
  from \code{drop1}/\code{anova} (likelihood ratio tests) 
  are better (although still approximate)
\item \code{predict} gives predictions on the linear predictor (logit)
  scale by default (not probabilities): use \code{type="response"} for
  probabilities
\item diagnosing lack of fit etc. is hard for binary data.
  One possibility is to compare the GLM fit with a non-parametric
  fit, as follows:
<<fig.width=4,fig.height=3>>=
library(scales)
gplot1+geom_smooth(method="glm",family="binomial")+
    geom_smooth(linetype=2,fill="purple",alpha=0.1)+
    scale_y_continuous(limits=c(-0.05,1.05),oob=squish)
@ 
(The standard goodness-of-fit test for logistic regression,
Hosmer-Lemeshow, has some problems \citep{hosmer_comparison_1997};
an improved version is available by using \code{lrm}
in the \pkglink{rms} package and using \code{resid(f, 'gof')})
\item \code{anova} actually gives an analysis of \emph{deviance},
  and it is \textbf{sequential} (``type I'' in SAS language)
\item \verb+drop1(glm_fit,scope=.~.)+ does a marginal (``type III'')
  analysis, which has its own issues when there are interactions
  in the model (you need to be \emph{very} careful interpreting the
  meaning of the main effects)
\end{itemize}


\subsection{Interpreting parameters}

What do the parameters mean???

People like me are always complaining that researchers should
consider \emph{effect size}, the biological significance
of the parameters, not just the statistical significance.
In order to do this, you need to understand what the parameters
mean.

There are two hard parts of interpreting GLM parameters:
(1) contrasts (how R parameterizes the differences between groups) 
(this issue is general to all modeling in R)
and (2) the log-odds scale (this is specific to logistic regression).

<<>>=
coef(mod1)
@ 
\begin{description}
\item[intercept] ($\approx -10$): the logit probability
  (log-odds) of seropositivity for a female with size$=0$ (i.e.,
  log-odds in the \emph{baseline} condition)
\item[size] ($\approx 0.05$): increase in log-odds per unit
  (mm) of size, \emph{for females} (baseline)
\item[SexM] ($\approx 6.6$): difference between females and males
  at size 0
\item[size:sexM] ($\approx -0.02$): difference between male
  and female slope, on the log-odds scale
\end{description}

<<>>=
 ## probability of infection of a female, size=0
plogis(-10.7)
## probability of infection of a female, size=100
plogis(-10.7+0.048*100)
## probability of infection of a male, size=0
plogis(-10.7+6.6) 
## probability of infection of a male, size=100
plogis(-10.7+6.6+(0.048-0.023)*100) 
@ 

<<csizemodel>>=
dat2$csize <- dat2$size-250
mod1c <- update(mod1,.~csize*Sex)
@ 
\ex{} Confirm for yourself that the \code{Intercept} and \code{SexM} terms have changed,
but not the slopes.

Sometimes it's more useful to scale the continuous predictors
as well; one easy procedure is to center (subtract the mean)
and divide by the standard deviation of the predictor, which
puts all of the predictors on a similar scale
\citep{schielzeth_simple_2010}. (The \code{scale} function
in R also does this, but with some side effects we don't want.)
<<>>=
dat2$scsize <- with(dat2,(size-mean(size))/sd(size))
mod1sc <- update(mod1,.~scsize*Sex)
@ 

Which of the following plots is more useful?
<<fig.height=3,fig.width=8,out.width="\\textwidth",fig.keep="last">>=
library(coefplot2) 
par(mfrow=c(1,2))
coefplot2(mod1)
coefplot2(mod1sc)
@ 

\ex{1}. Change the baseline level from females to males
(the default is alphabetical) by using 
<<eval=FALSE>>=
dat2$Sex <- relevel(dat2$Sex,"M")
@ 
Refit either the scaled or the unscaled model and convince yourself
that you understand how the parameters have changed (and that the
overall meaning of the model has not changed).

\subsection{Comparing models}

Most GLM inference etc. is based on comparing \emph{multiple}
models rather than a single model (the likelihood ratio test.
Allow for the interaction of age $\times$ sex:
<<>>=
mod2sc <- update(mod1sc,.~.-scsize:Sex)
mod3sc <- update(mod1sc,.~scsize)
mod4sc <- update(mod1sc,.~Sex)
mod5sc <- update(mod1sc,.~1)
anova(mod1sc,mod2sc,mod3sc,mod5sc,test="Chisq")
@ 
(the $p$-values are the tests of each model against the next,
i.e. significance tests of the parameters that are being dropped)

Or:
<<>>=
library(bbmle)
AICtab(mod1sc,mod2sc,mod3sc,mod4sc,weights=TRUE)
@ 

Or:
<<fig.width=8,fig.height=4>>=
coefplot2(list(mod1sc,mod2sc,mod3sc,mod4sc),
          legend=TRUE,legend.x="topright",
          col=c(1,2,4,5))
@ 
(I should probably have named these models more informatively!)

Model comparison:
<<echo=FALSE,fig.width=8>>=
pframe <- expand.grid(scsize=seq(min(dat2$scsize),max(dat2$scsize),length=101),
                      Sex=levels(dat2$Sex))
pframe$size <- pframe$scsize*sd(dat2$size)+mean(dat2$size)
pframe$mod1sc <- predict(mod1sc,newdata=pframe,type="response")
pframe$mod2sc <- predict(mod2sc,newdata=pframe,type="response")
pframe$mod3sc <- predict(mod3sc,newdata=pframe,type="response")
pframe$mod4sc <- predict(mod4sc,newdata=pframe,type="response")
pframe$mod5sc <- predict(mod5sc,newdata=pframe,type="response")
library(reshape2)
pframe2 <- melt(pframe,id.var=1:3,value.name="status")
ggplot(dat2,aes(x=size,y=status,colour=Sex))+stat_sum(alpha=0.5,aes(size=..n..),
                             show_guide=FALSE)+
    geom_line(data=pframe2)+facet_grid(.~variable)+zmargin+
    scale_x_continuous(breaks=c(220,280))
@ 

\ex{}: convince yourself that the overall fit of the model (based on
\code{logLik}) and the likelihood ratio test of the full model (comparing
the full model with the null/constant model) do not change when you 
change the parameterization (either by scaling size, or by changing
the baseline level).

The logit/log-odds scale is initially hard to understand, but
it has really useful properties for estimating changes in risk.
The hardest part is that the effects of changes in a predictor
on probability depending on the baseline risk.

If you need to, you can use the \code{dredge} function from
the \pkglink{MuMIn} package to fit all subsets of a model \ldots

Some rules of thumb:
\begin{itemize}
\item When the starting probability is very low, the logistic curve
  is approximately exponential, so parameters approximately
  describe proportional changes
  (e.g. parameter of 0.1 $\approx$ 10\% (\emph{relative}) increase
in probability per unit change)
\item when the starting probability is intermediate (say 0.3-0.7), the
\emph{absolute} change in probability per unit change is $r/4$ ($\to$ parameter of
0.1 implies 0.025 increase in probability per unit change)
\item when the starting probability is high the change in complementary
  risk (probability of the event \emph{not} happening) changes proportionally
\end{itemize}

\subsection{Binomial regression}
If you have $N>1$ per category,
you can either specify the results 
\begin{itemize}
\item as $(k,N-k)$ (e.g.
  \verb!cbind(num_dead,num_alive)~x+y+z!); you would often compute this
  on the fly, as \verb!cbind(num_dead,num_total-num_dead)~x+y+z!).
\item with the \code{weights} specification, e.g.
  \verb!prop~x+y+z,weights=num_total! or
  \verb!num_dead/num_tot~x+y+z,weights=num_total!
\end{itemize}


\ex{}:  aggregate the data by size class.
<<>>=
library(plyr)
sizetab <- ddply(dat2,c("size","Sex"),
                function(x) c(tot=nrow(x),pos=sum(x$status)))
sizetab$prop <- sizetab$pos/sizetab$tot
@ 

use \code{sizetab} to run the analysis above as a binomial model.
The parameters should stay the same, and the \emph{differences}
in log-likelihood, deviance, and AIC among models should be the
same, but the baseline values of log-likelihood etc. will differ.

\section{Poisson/negative binomial regression}

Gopher tortoise shell data.
<<>>=
load("gopherdat2.RData")
head(Gdat)
@ 

<<shellplot,fig.width=6>>=
ggplot(Gdat,aes(x=prev,y=shells/(Area*density),
                colour=factor(year),group=Site))+
    geom_text(aes(label=Site))+geom_line(colour="gray")
@ 

Fit a Poisson model (log link), with an \emph{offset} of 
area times density.  An offset adds a term to the linear predictor,
without fitting the parameters: if $P$=prevalence, $a$=area,
$d$=density
\begin{equation*}
  \begin{split}
\log(\lambda) & = \beta_0+\beta_1 P + \log(a d) \\
\log(\lambda)- \log(ad) & = \beta_0+\beta_1 P \\
\log\left(\frac{\lambda}{ad}\right) & = \beta_0+\beta_1 P
\end{split}
\end{equation*}

<<>>=
m_pois_prev <- glm(shells~prev+offset(log(Area*density)),data=Gdat)
@ 

A Poisson model assumes variance=mean, which is usually wrong.
There is often \emph{overdispersion}: we can test for it by
looking at the sum of squared Pearson residuals ($\sum (\mbox{expected}_i-\mbox{obs}_i)^2/\mbox{var}_i$, where $\mbox{var}_i$ is the expected variance of point $i$ based on the model), which
should be (\emph{approximately}) $\sim \chi^2_{n-p}$ if the data are really Poisson:
<<check_overdisp>>=
devsq <- sum(residuals(m_pois_prev,type="pearson")^2)
devsq/m_pois_prev$df.resid  ## ratio should be approx. 1
pchisq(devsq,df=m_pois_prev$df.resid,lower.tail=FALSE) 
@ 

There are (at least) two ways to account for the overdispersion,
by fitting a quasi-likelihood model
<<>>=
library(MASS)
m_quasi_prev <- update(m_pois_prev,family="quasipoisson")
m_nb_prev <- glm.nb(shells~prev+offset(log(Area*density)),data=Gdat)
@ 
\ex{} compare these three models (Poisson, quasi-Poisson, negative binomial)
looking at \code{coef(summary(fit))} and at 
\code{coefplot2(list(model1,model2,model3), xlim=c(-0.01,0.08))}.
What do you conclude about the coefficient estimates?

The \pkglink{aod} package incorporates a wider spectrum of methods and tests
for dealing with overdispersion (which, oddly, is more widely recognized
in Poisson than in binomial models).

\section{Intermediate topics}

\subsection{Offsets}
Offsets add known components to the model (as shown above).
They're most commonly used to account for unequal sampling areas/times,
in cases where we expect results to be \emph{strictly proportional}
to the offset (time, area sampled).  This is often a way to handle
ratios (e.g. sibling negotiations per chick in a study of begging
behavior by owlets) without losing the discrete nature of the response.

Can also be used in tricky ways, e.g. to fit the Ricker model.

Suppose we think $N(t+1)=a N(t) e^{-b N(t)}$.
On the log scale this 
is
$$ 
\log N(t+1) = \log a + \log N(t) - b N(t)
$$
This is a linear equation in $N(t)$ plus an offset of $\log N(t)$.
In other words, if we use a log link
then we can fit \verb!N ~ Nprev + offset(log(Nprev))!
(The intercept term $\log a$ and the slope $-b$ are implicit
in the R formula.)
\ex{} What model would we fitting if we included $\log N$ as
a variable rather than an offset?

\subsection{Alternative link functions}

Don't have to use the standard link functions.
We could try to use them to get a slightly better fit to 
the shape of the nonlinearity, but I use them more often
as a slightly tricky way to fit more \emph{mechanistic} models
(see e.g. \cite{strong_model_1999}).

Warning: you're more likely to run into convergence problems,
be asked to specify starting values, etc. when using non-standard
link functions.

\begin{itemize}
  \item Holling type~II functional response via the \emph{inverse} link:
    if number eaten is $N_e=aN/(1+ahN)$, then risk of being eaten is
    $p=N_e/N= a/(1+ahN)$. If we use the inverse link then we are fitting
    $(1/p)=1/a+(1/h)\cdot N$ --- this linear in $N$ \ldots  (we can even fit
    Holling type~III responses by doing a bit more algebra and including $1/N$
    as a predictor)
%% p = (aN^2/(1+ahN^2))/N = aN/(1+ahN^2)
%% 1/p     =(1/h)N + (1/a)(1/N)
  \item Chain binomial: suppose the probability of infection if $p=1-\exp(-\beta I_t)$.
    Then $1-p=\exp(-\beta I_t)$, or in other words the log of the probability of
    \emph{not} being infected is $-\beta I_t$.
    If we have chain-binomial data, then, we can fit
<<eval=FALSE>>=
glm(prob_not_inf ~ previous_inf-1, weights=previous_susc, family=binomial(link="log"))
@ 
<<>>=
load("cbsim.RData")
cbsim$Sprev <- c(NA,cbsim$S[-nrow(cbsim)])
cbsim$Iprev <- c(NA,cbsim$I[-nrow(cbsim)])
cbsim$pnotinf <- cbsim$S/cbsim$Sprev
cbplot <- ggplot(cbsim,aes(pnotinf,Iprev))+geom_point()
fit1 <- glm(pnotinf ~ Iprev -1 , family=binomial(link="log"), 
    weights=Sprev,data=cbsim)
confint(fit1)
## cbplot+geom_smooth(method="glm",family=binomial(link="log"),
##                   formula=~x-1)
pp <- profile(fit1)
likdat <- data.frame(likdev=pp$Iprev$z^2,Iprev=pp$Iprev$par.vals[,"Iprev"])
ggplot(likdat,aes(x=Iprev,y=likdev))+geom_line()
@      
<<>>=
## chain-binomial data from Becker:
cbdat <- data.frame(chains=c(1,11,111,12,1111,112,121,13,11111,
                    1112,1121,113,1211,122,131,14),
                    n=c(423,131,36,24,14,8,11,3,4,2,2,2,3,1,0,0))  ## total 664
## Greenwood, no slope; Reed-Frost, no intercept
## log(m_{ij} q_{ij})= log(m_{ij})+alpha_i+beta_i j
@     
This is discussed by \cite{becker_analysis_1989}; you can see some of it on Google books at \url{http://tinyurl.com/chainbinom}
\item Another useful case is the \emph{complementary log-log} (\code{cloglog}) link with
  binomial data.  Suppose we measure the amount of mortality over
  differing exposure periods of length $\Delta t$.  If we use the
  \code{cloglog} link with an offset of $\log(\Delta t)$ we get the
  right behavior.

  The \code{cloglog} function is $C(x)=\log(-\log(1-\mu))$, its inverse is
  $C^{-1}(x) = 1-\exp(-\exp(x))$.
  Thus if we expect mortality $\mu$ over a period $\Delta t=1$ and
  the linear predictor $\eta=C^{-1}(\mu)$ then
  $C^{-1}(\eta+\log \Delta t)=(1-\exp(-\exp(\eta) \cdot \Delta t))$, which is
  what we want.
\end{itemize}
\subsection{Bias-reduced GLM}

\emph{Separation of variables} 
refers to the situation where some threshold value of predictor(s)
can perfectly separate the 0 and 1 responses --- in this case
the maximum likelihood solution doesn't exist and \code{glm} breaks.

The \pkglink{brglm} (bias-reduced GLM) and \pkglink{logistf} (Firth logistic)
packages implement a specific solution
to this problem; more generally, it tends to provide more reliable
results for very small data sets (people tend to forget that GLM
relies, more heavily than regular linear models, on asymptotic assumptions) --- but
it currently works only for binomial models.


Another option is to use Bayesian GLMs (e.g. \code{bayesglm} from
the \pkglink{arm} package); adding even a weak Bayesian prior can stabilize the fit.


\subsection{Generalized linear mixed models}

This can be a fairly hairy topic in general (see \cite{bolker_generalized_2009},
and \url{http://glmm.wikidot.com/faq}, but at root the concepts are fairly
simple: here we add a \emph{random effect} of tortoise ID to the model.

What is a random effect?  There are actually several possible, overlapping answers:
\begin{itemize}
\item Effects that are drawn (randomly?) from a larger population of possible effects
\item Effects where we are interested in the distribution of the levels (variance among levels),
  rather than the values of specific levels
\item Effects that are ``nuisance'' aspects of the experimental design
\item Effects that we want to estimate with \emph{shrinkage}, i.e. 
  pulling poorly estimated values toward the population average
\end{itemize}
In general fitting as random effects works best when we have many levels with
small and uneven amounts of data per level; it works very poorly when there are
fewer than 4--6 levels.
Typical examples: experimental blocks (spatial or temporal); genotypes or
individuals within populations; taxa (species, genera) within higher-level taxa
(genera, families).
The interactions of fixed and random effects get treated as random;
for example, among-pond variation in trends over time (around the 
population-level average trend).

<<>>=
library(lme4)
(mod1mix <- glmer(status~scsize*Sex+(1|TortID),family=binomial,data=dat2))
library(glmmML)
(mod1mixB <- glmmML(status~scsize*Sex,cluster=TortID,family=binomial,data=dat2,
                    method="ghq"))
@ 
<<fig.width=8,fig.height=4>>=
coefplot2(list(GLM=mod1sc,GLMM=mod1mix,GLMM_AGQ=mod1mixB),
          legend=TRUE,legend.x="topright",
          col=c(1,2,4))
@ 

Things to keep in mind:
\begin{itemize}
\item getting reliable $p$ values etc. is hard, unless you have so many groups in your random effect ($>40$) that the finite-size correction doesn't matter
\item it's easy to put make a model that's too complex to fit reliably.  In the example above we have to use the more 
\item there are a variety of packages that can fit GLMMs, with overlapping capabilities (\pkglink{lme4}, \pkglink{glmmML}, \rflink{glmmADMB}, \pkglink{MCMCglmm}) --- if possible, it's good to fit difficult models with more than one package, to cross-check
\end{itemize}


\section{Top 10 GLM mistakes}
\begin{itemize}
\item applying discrete models (Poisson, binomial) to non-discrete data
\item ignoring overdispersion
\item equating negative binomial with binomial rather than Poisson
\item using GLMs where linear models will do (i.e. \code{glm} instead of \code{lm}) (harmless but annoying)
\item ignoring blocking factors (failing to use GLMMs where necessary)
\item confusion in interpreting effects
\item worrying about marginal rather than conditional distributions of data$^*$
\item applying $\pm$ standard errors 
\item using $(k,N)$ rather than $(k,N-k)$ in binomial models
\item getting confused by predictions on the linear predictor scale
\end{itemize}

\section{Topics left out}

\begin{itemize}
\item Zero-inflated/hurdle models (\pkglink{pscl}, \pkglink{tweedie}, \pkglink{glmmADMB}); 
\item generalized additive models (\pkglink{mgcv}); 
\item penalized regressions and shrinkage methods (\pkglink{glmnet}, \pkglink{penalized}); 
\item polytomous/ordinal data;
\item spatial/temporal/phylogenetic correlations; 
\item GLMs on \textbf{big} data (\pkglink{biglm}) \ldots
\end{itemize}
  
See also slides at \url{http://www.slideshare.net/bbolker/},
in particular \url{http://www.slideshare.net/bbolker/glms-and-extensions-in-r},
\url{http://www.slideshare.net/bbolker/trondheim-glmm},
\url{http://www.slideshare.net/bbolker/opensource-glmm-tools-7562082}
\bibliography{../eeid}

<<echo=FALSE>>=
options(device=olddev$device)
@ 

\end{document}