Stats 756: Biostatistics

Winter/spring 2010; Ben Bolker, bolker@mcmaster.ca

I will be teaching a Biostatistics course in the spring. The course web page will be at http://www.math.mcmaster.ca/bolker/classes/s756. The time and location are Mon/Thurs 1:00-2:30, HH (Hamilton Hall) 207.

I will focus on practical (but advanced) statistical techniques useful for population-level biology, i.e. ecology, evolution, and infectious disease epidemiology.

The course will primarily use R.

Primary topics include:

- data manipulation and visualization
- review of generalized linear models, and extensions such as models of overdispersion (e.g. negative binomial, lognormal-Poisson) and zero-inflation
- mixed models: classical (review of nested/split-plot/etc.); ’modern’ linear mixed models; and generalized linear mixed models

Pair or group projects will be a large component of the class.

We will use Faraway (2006) as a textbook: I haven’t gone through it in great detail, but it covers a sensible range of topics and should be a good resource to fall back on. However, I expect the course will go well beyond this (with primary literature, scanned book chapters, and notes as additional resources).

I would like to encourage both mathematicians and statisticians (with weak or absent biological knowledge) and biologists (with solid statistical knowledge: see below) to take the course; one hoped-for benefit of the course will be learning to communicate and work across disciplinary boundaries.

- basic knowledge of R: data structures (vector, matrix, list), simple data manipulation and summaries, basic plotting. If you don’t already know R but are comfortable with programming and/or statistical computation, you should be able to pick it up quickly. There are many introductory resources on the web: http://www.math.mcmaster.ca/bolker/emdbook/lab1.pdf is a good place to start your review
- solid background in basic statistical concepts and procedures (hypothesis tests, t-test, ANOVA, regression)
- some familiarity with generalized linear models. Useful references/reminders:
- some linear algebra would be quite helpful, but I will try to help the biologists in the course understand what’s going on at a qualitative level

Having worked with real, messy data sets is not required but will be helpful.

I will make every effort to adapt the course to the interests and needs of those who sign up, but here are some of the things I do not intend to cover in the course:

- basic statistics for biologists
- specialized approaches for phylogenetic inference
- specialized approaches for bioinformatics – microarray, sequence, SNP data (although some concepts will carry over)
- the topics in the catalog description (“classical biostatistics”, i.e. analysis of multidimensional contingency tables; design and analysis of clinical trials; etc.); the course will not follow that description especially closely
- analysis of time series/dynamical data (state space models etc.)
- I will probably not prove any theorems

As a general rule, in graduate courses I am primarily interested in seeing that students are making a serious effort to engage the material; different students will learn different things in this class depending on their background and interests.

Grading will be based on a combination of lab assignments, in which you follow a prescribed set of R code to explore a problem and then answer some related exercises (30%); participation, both informal and ’formal’ (class presentations, submitting discussion questions, leading discussion on readings) (30%); and a group project, which will be both written up and presented in class (40%).

Crawley, M. J. (2002). Statistical Computing: An Introduction to Data Analysis using S-PLUS. John Wiley & Sons.

Faraway, J. J. (2006). Extending Linear Models with R: Generalized Linear, Mixed Effects and Nonparametric Regression Models. Chapman & Hall/CRC.

Venables, W. N. and Ripley, B. D. (2002). Modern Applied Statistics with S (4th ed.). Springer.

Wood, S. N. (2006). Generalized Additive Models: An Introduction with R. Chapman & Hall/CRC.