\documentclass[11pt]{article}
\usepackage{hyperref}
\usepackage{amsmath}
%\usepackage{sober}
\usepackage{times}
%\usepackage{natbib}
\newcommand{\code}[1]{{\tt #1}}
\title{MLE basics, with pictures}
\author{Ben Bolker}
\date{\today}
%\bibliographystyle{ESA1009}
\begin{document}
\maketitle

\section{One-parameter likelihood model: binomial}

Examples of looking at likelihoods: we'll start with a basic binomial model of a single population --- say that $k$ is the number of susceptibles infected in a specified time period, out of a total of $N$ susceptible individuals. We'll suppose that (1) all susceptible individuals have the same \emph{per-individual} probability of infection $p$, and (2) all individuals are infected (or not infected) independently. Then the distribution of the number infected will be binomial:
\begin{equation}
k \sim \mbox{Binomial}(p,N)
\end{equation}
or
\begin{equation}
\mbox{Prob}(k|p,N) = \binom{N}{k} p^k (1-p)^{N-k}.
\end{equation}
Since the joint probability of independent events equals the product of their probabilities, you can think of this as (probability of $k$ independent ``successes'' (infections), each with probability $p$) $\times$ (probability of $N-k$ independent ``failures'' (non-infections), each with probability $1-p$) --- times a complicated bit at the beginning that accounts for the order of events and makes the probabilities of all possible values of $k$ add up to 1.0, as they should.

For concreteness, let's say that $N=100$ and $k=20$. To calculate the probability of this outcome for a particular infection probability $p$, use \code{dbinom(20,size=100,prob=p)}; for example, \code{dbinom(20,size=100,prob=0)} is 0 (there is no chance of getting 20 successes if the per-trial probability is zero), while \code{dbinom(20,size=100,prob=0.2)} is \Sexpr{round(dbinom(20,size=100,prob=0.2),3)}.

There are two important ways to use \code{dbinom()} to look at the distribution. In the first, we can generate a distribution of possible outcomes (from 0 to 100 in this case) and see what their probabilities are, for a fixed $p$:
<<>>=
kvec = 0:100
plot(kvec,dbinom(kvec,size=100,prob=0.2))
abline(v=20)
@
Note that while $k=20$ is the most likely outcome (with a probability of \Sexpr{round(dbinom(20,size=100,prob=0.2),3)}), getting 19 infections is almost equally likely (\Sexpr{round(dbinom(19,size=100,prob=0.2),3)}). Extreme outcomes (say, $k<5$ or $k>40$) are very unlikely indeed. We can see just how extreme by plotting the log-probability:
<<>>=
plot(kvec,dbinom(kvec,size=100,prob=0.2,log=TRUE))
abline(v=20)
@
(This is almost the same as \code{plot(kvec,log(dbinom(kvec,size=100,prob=0.2)))}, but slightly more accurate for small probabilities. R uses natural logarithms by default; use \code{log10()} to get base-10 logarithms.) Alternatively, we could use
<<>>=
plot(kvec,dbinom(kvec,size=100,prob=0.2),log="y")
@
which plots the probabilities on a logarithmic $y$ scale rather than calculating the log-probabilities. There is at least \emph{some} probability of getting 100\% infection with these parameters, even if it is on the order of $10^{-70}$ \ldots

Most maximum likelihood procedures search for the minimum \emph{negative} log-likelihood, rather than the maximum log-likelihood:
<<>>=
plot(kvec,-dbinom(kvec,size=100,prob=0.2,log=TRUE))
abline(v=20)
@

Now turn things around and suppose we have data (i.e.\ $k=20$), and want to calculate the \emph{likelihood curve} --- the probability of a fixed $k$ for different values of $p$.
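As a quick numerical check first --- a minimal sketch (the grid of $p$ values and its spacing are arbitrary choices of mine) --- we can evaluate the negative log-likelihood over a grid and confirm that its minimum falls at the observed proportion, $\hat p = k/N = 0.2$:
<<>>=
pgrid = seq(0.01,0.99,by=0.01)  ## avoid the endpoints, where the NLL is infinite
nll = -dbinom(20,size=100,prob=pgrid,log=TRUE)
pgrid[which.min(nll)]           ## the minimum is at p = 0.2
@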
Since $p$ must be between 0 and 1, we can also plot the whole likelihood curve:
<<>>=
pvec = seq(0,1,by=0.01)
plot(pvec,dbinom(20,size=100,prob=pvec))
abline(v=0.2)
@
Now the peak is at $\hat p=0.2$ (i.e., the maximum likelihood estimate is equal to the observed proportion of successes), but once again the likelihood (probability of the data) is almost as high for nearby values (\Sexpr{round(dbinom(20,size=100,prob=0.19),3)} for $p=0.19$ or \Sexpr{round(dbinom(20,size=100,prob=0.21),3)} for $p=0.21$). Note also that the maximum likelihood is still pretty low: the probability of getting exactly the expected number of successes (20 out of 100) is only \Sexpr{round(dbinom(20,size=100,prob=0.2),3)} --- precisely because you might easily get a value just below or above the expected value.

You can also take a bit of a shortcut and try
<<>>=
curve(dbinom(20,size=100,prob=x),from=0,to=1)
@
to save the trouble of setting up the vector of probabilities. (\code{curve()} is a ``magic'' function --- you \emph{must} use \code{x} to denote the $x$ variable. It doesn't work very well for functions like \code{dbinom(x,100,0.2)}, which only make sense for integer values of \code{x}.)

\textbf{Exercises.}
\begin{enumerate}
\item Try increasing or decreasing the number of trials (say $N=10$ or $N=1000$) and see what happens to the shape of the probability distribution for a fixed $p$ (\code{dbinom(x,size=N,prob=0.2)}) or the likelihood curve (\code{dbinom(20,size=N,prob=x)}). Can you explain the results?
\item \code{rbinom(n,size,prob)} will pick \code{n} binomial random numbers, and \code{table()} will tabulate the number of outcomes for each value of $k$. Use these two functions to generate a plot of outcomes for 10,000 binomial samples, each with $p=0.2$, $N=100$, and compare the results to the expected distribution. (Hint: you will either have to scale the results of your ``experiment'' from frequencies to proportions, or multiply the theoretical probabilities by the number of samples to get expected frequencies.) (In general, sampling random numbers is a good way to familiarize yourself with the properties of distributions.)
\end{enumerate}

Finding maximum likelihood estimates automatically, with \code{mle2}:
<<>>=
minuslogl = function(p,N,k) {
  -dbinom(k,prob=p,size=N,log=TRUE)
}
library(bbmle)
fit1 = mle2(minuslogl,start=list(p=0.9),data=list(N=100,k=20))
@
This leads to warnings, but the answer seems OK. To get rid of the warnings, you could use \code{method="L-BFGS-B"}:
<<>>=
mle2(minuslogl,start=list(p=0.9),
     method="L-BFGS-B",lower=0.001,upper=0.999,
     data=list(N=100,k=20))
@
(I have to set the limits slightly in from 0 and 1, because R tries to evaluate the function \emph{at} the limits and fails.)

Exercise: convince yourself that in this case the starting value is irrelevant (as long as it's between 0 and 1).

Exercise: try finding the quadratic and profile confidence limits (using \code{confint()} or \code{confint(...,method="quad")}).

Challenge: use \code{as.data.frame(profile(fit1))} to extract a data frame from the profile (you may need to get the latest version of \code{bbmle} from the development site: \code{install.packages("bbmle",repos="http://r-forge.r-project.org")}). Then draw a graph to compare the quadratic approximation to the actual profile. (The \code{z} value in the data frame is the \emph{signed square root} of the difference from the minimum deviance; to get the negative log-likelihood, you will need to calculate $z^2/2$.)
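For the confidence-limit exercise, a minimal sketch of the two calls (both methods are named above; \code{method="quad"} requests the quadratic approximation):
<<>>=
confint(fit1)                 ## profile (likelihood-based) limits
confint(fit1,method="quad")   ## quadratic (Wald) approximation
@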
\section{Two-parameter likelihood model: binomial}

Now let's make things slightly more complicated and suppose that we have multiple binomial samples (all with $N=50$) along a transect. The $x$ values range from 0 to 1, and the $y$ values (say, numbers of infected individuals) range from 0 to 50.
<<>>=
set.seed(1001)  ## generate the data
x = sort(round(runif(20),3))
prob = plogis(10*(x-0.75))
y = rbinom(20,size=50,prob=prob)
curve(plogis(10*(x-0.75)),from=0,to=1,ylim=c(0,1))
points(x,y/50)
@
<<>>=
bdat <- data.frame(x = c(0.006, 0.056, 0.081, 0.138, 0.182,
                         0.267, 0.289, 0.363, 0.413, 0.419,
                         0.427, 0.43, 0.44, 0.443, 0.765,
                         0.775, 0.862, 0.881, 0.888, 0.986),
                   y = c(0, 0, 1, 0, 0, 1, 1, 0, 4, 1,
                         2, 5, 5, 2, 31, 32, 42, 38, 38, 48))
plot(y~x,data=bdat)
@
We'll use a \emph{logistic function} to describe the trend in the infection probability --- $p = 1/(1+e^{-b(x-a)})$. You can code this directly in R (\code{1/(1+exp(-b*(x-a)))}), or you can use \code{plogis(b*(x-a))}. (A slightly more standard parameterization would be \code{plogis(a+b*x)}, but our version is easier because it separates the slope and intercept terms --- we can change the rate at which the probability goes from 0 to 1 ($b$) and the point at which it equals 0.5 ($a$) independently.)

A basic likelihood function:
<<>>=
minuslogl2 = function(a,b,N,y,x) {
  -sum(dbinom(y,prob=plogis(b*(x-a)),size=N,log=TRUE))
}
@
Since we have multiple values, we have to use \code{sum()} to add up the log-likelihoods (assuming, as usual, that the observations are all independent).

We need starting values for $a$ and $b$. We can just see that the half-maximum is somewhere around 0.6 ($a \approx 0.6$). How about the slope? We could use
<<>>=
plot(qlogis(y/50)~x,data=bdat)
@
to get a rough idea of the slope (\code{qlogis} is the logit transform, the inverse of the logistic function: on this scale, \code{qlogis(y/50)} should equal $b(x-a)$), or
<<>>=
plot(log(y/50)~x,data=bdat)
@
and look at the \emph{initial} slope ($\log(p) \approx b(x-a)$ for small $p$, so the slope on the log scale is approximately $b$), or look at the doubling time:
\begin{equation}
T_2 = \log(2)/b \approx 0.7/b \quad\Rightarrow\quad b \approx 0.7/T_2.
\end{equation}
The doubling time is \emph{approximately} $\Delta x=0.2$, so we'll use $b=3.5$ to start.
<<>>=
fit2 = mle2(minuslogl2,start=list(a=0.6,b=3.5),
            data=c(list(N=50),bdat))
@
We use \code{c()} to combine the $N$ value with the $x$ and $y$ data.
<<>>=
confint(fit2)
@
<<>>=
plot(y~x,data=bdat)
curve(50*plogis(coef(fit2)["b"]*(x-coef(fit2)["a"])),add=TRUE)
@
Pretty nice fit!
<<>>=
library(emdbook)
c1 = curve3d(minuslogl2(a=x,b=y,N=50,x=bdat$x,y=bdat$y),
             from=c(0.65,7),to=c(0.8,12),
             sys3d="image",col=gray((50:0)/50),
             xlab="a",ylab="b")
contour(c1$x,c1$y,c1$z,add=TRUE)
contour(c1$x,c1$y,c1$z,level=-logLik(fit2)+qchisq(0.95,2)/2,
        add=TRUE,col=2,labels="95%")
library(ellipse)
lines(ellipse(vcov(fit2),centre=coef(fit2)),col="blue")
points(coef(fit2)["a"],coef(fit2)["b"],pch=16)
points(0.75,10)
@
I've added the true values of the parameters with which I simulated the data ($a=0.75$, $b=10$). They lie just \emph{outside} the 95\% confidence region --- this does happen sometimes. Since we have a fairly large data set and have set up the model so that the parameters are nearly independent, the picture is very nice and elliptical.

Exercise: redraw the surface with the lower $a$ limit at 0.5 instead of 0.65: on this scale, the outer contours are slightly avocado-shaped (to continue the fruit \& vegetable analogies), but not nearly as ``banana-like'' as the chain-binomial example (analyzed elsewhere).
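One way to set up that redraw --- a minimal sketch, identical to the call above except for the lower \code{from} limit (the object name \code{c2} is an arbitrary choice):
<<>>=
c2 = curve3d(minuslogl2(a=x,b=y,N=50,x=bdat$x,y=bdat$y),
             from=c(0.5,7),to=c(0.8,12),   ## lower a limit now 0.5
             sys3d="image",col=gray((50:0)/50),
             xlab="a",ylab="b")
contour(c2$x,c2$y,c2$z,add=TRUE)
@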
Challenge: add the 99.9\% confidence region and (using the \code{level} argument to the \code{ellipse} function) the quadratic approximation for it to the graph. Also try the 50\% confidence region. How does the accuracy of the quadratic approximation change?
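One possible starting point --- a minimal sketch, assuming the contour plot from the chunk above is still the active graphics device (in Sweave each chunk normally starts a new figure, so you may need to repeat the surface-drawing code first); the colors and line types are arbitrary choices:
<<>>=
for (conf in c(0.999,0.5)) {
  ## likelihood-ratio contour at confidence level 'conf'
  contour(c1$x,c1$y,c1$z,
          levels=-logLik(fit2)+qchisq(conf,2)/2,
          add=TRUE,col=4,labels=paste(100*conf,"%",sep=""))
  ## corresponding quadratic (ellipse) approximation
  lines(ellipse(vcov(fit2),centre=coef(fit2),level=conf),
        col="blue",lty=2)
}
@

\end{document}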