11 November 2021

outline

  • experimental design
  • statistical philosophy
  • statistical tests & assumptions
  • analysis platforms

experimental design

the most important thing

  • Design your experiment well and execute it well:
    then you needn’t worry too much in advance about statistics
  • Design or execute it badly and you’re doomed:
    statistics can’t save you
  • randomization, replication, control

randomization

  • random assignment to treatments
  • poorer alternative: haphazard assignment
    (“convenience sampling”)
  • stratification
    (i.e., randomize within groups)
  • related: experimental blinding

Yellowstone aspen regeneration (Brice et al. 2021)

replication

  • how big does your experiment need to be?
  • power: probability of detecting an effect of a particular size,
    if one exists
  • more generally: how much information? what kinds of mistakes? (Gelman and Carlin 2014)
  • underpowered studies
    • failure is likely
    • cheating is likely
    • significance filter \(\to\) biased estimates
  • overpowered studies waste time, lives, $$$
  • pseudoreplication (Hurlbert 1984; Davies et al. 2015):
    confounding sampling units with treatment units

power analysis

  • need to guess effect size and variability
    • minimum interesting biological effect size
    • previous studies
    • ask your supervisor
  • OK to simplify design (e.g. ANOVA \(\to\) \(t\)-test)
  • methods
apropos("^power")                            ## base-R power functions
library("sos"); findFn("{power analysis}")   ## search contributed packages

power analysis example

With \(n=15\) per group, are we likely to see a clear difference between 10% (control) and 20% (treatment) mortality (a doubling of mortality)?

power analysis example (continued)

power.prop.test(n=15,p1=0.1,p2=0.2)
## 
##      Two-sample comparison of proportions power calculation 
## 
##               n = 15
##              p1 = 0.1
##              p2 = 0.2
##       sig.level = 0.05
##           power = 0.1141268
##     alternative = two.sided
## 
## NOTE: n is number in *each* group

Uh-oh!
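The analytical result can be double-checked by simulation (a sketch, using `prop.test`, which applies a continuity correction, so the estimate may run slightly below the analytical value):

```r
## Simulate many experiments with n = 15 per group and count how often
## a two-sided test of proportions is significant at alpha = 0.05
set.seed(101)
nsim <- 2000
pvals <- replicate(nsim, {
  x1 <- rbinom(1, size = 15, prob = 0.1)   ## control deaths out of 15
  x2 <- rbinom(1, size = 15, prob = 0.2)   ## treatment deaths out of 15
  suppressWarnings(prop.test(c(x1, x2), c(15, 15))$p.value)
})
est_power <- mean(pvals < 0.05, na.rm = TRUE)  ## NaN when both counts are 0
est_power  ## low, consistent with power.prop.test
```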

How many samples per group would we need to get power=0.8?

power.prop.test(power=0.8,p1=0.1,p2=0.2,sig.level=0.05)
## 
##      Two-sample comparison of proportions power calculation 
## 
##               n = 198.9634
##              p1 = 0.1
##              p2 = 0.2
##       sig.level = 0.05
##           power = 0.8
##     alternative = two.sided
## 
## NOTE: n is number in *each* group

Uh-oh!

what should we do?

increasing power (control)

  • increase sample size (ugh)
  • increase rejection threshold (e.g. \(\alpha=0.1\))
    (probably can’t get away with this)
  • maximize desired variation (e.g. large doses)
  • measure a clearer outcome (gene expression, hormone levels, etc.)

  • minimize undesired variation
    • within-subjects designs
      (paired, randomized-block, crossover)
  • minimize environmental variation
    (e.g. environmental chambers; clonal or inbred lines)
  • isolate desired effects: positive/negative controls
    (vehicle-only, cage treatments, etc.)
  • control for variation statistically
    (e.g. include body size as a covariate)
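The payoff from minimizing undesired variation can be made concrete with `power.t.test` (hypothetical numbers: halving the residual SD cuts the required sample size roughly fourfold):

```r
## Same effect size (delta = 1), same target power, different residual SD
n_sd2 <- power.t.test(delta = 1, sd = 2, power = 0.8)$n  ## noisy design
n_sd1 <- power.t.test(delta = 1, sd = 1, power = 0.8)$n  ## controlled design
c(noisy = n_sd2, controlled = n_sd1)  ## per-group sample sizes
```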

statistical philosophy

the other most important thing

  • don’t snoop!
    (= don’t look at your results before deciding how to analyze them)
  • (but do explore your data graphically after deciding on a tentative analysis plan!)
  • reproducibility crisis: Ioannidis (2005), Simmons et al. (2011)
  • pre-register; think about what your questions are and how you will test them before you look at your data

fishing expeditions

“The Garden of Forking Paths” (Gelman and Loken 2014)

don’t lean on p-values too much

  • focus on effect sizes/CIs
  • eschew vacuous hypotheses
  • don’t accept the null hypothesis
  • “the difference between significant and non-significant is not significant” (Gelman et al. 2006)
  • statistical clarity (Dushoff et al. 2018)
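In R, reporting an effect size with its confidence interval takes no more work than reporting a p-value (a sketch using the built-in `cars` data):

```r
## Fit a simple linear model and report the slope with its 95% CI,
## rather than only its p-value
fit <- lm(dist ~ speed, data = cars)
coef(summary(fit))["speed", ]   ## estimate, std. error, t, p
confint(fit, "speed")           ## 95% confidence interval for the slope
```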

regression table (OK)

1978 automobile data (Chambers et al. 2018)

                     (1)
(Intercept)        8.010   (6.206)
mpg               -0.187 * (0.088)
trunk             -0.013   (0.105)
length             0.055   (0.036)
turn              -0.200   (0.140)
N                 74
R2                 0.251
logLik          -173.832
AIC              359.665
*** p < 0.001; ** p < 0.01; * p < 0.05.

coefficient plot (better!)
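A base-R sketch of such a plot (using the built-in `mtcars` data rather than the 1978 automobile data): estimates as points, 95% CIs as horizontal segments, with a reference line at zero.

```r
## Coefficient plot: estimates and 95% CIs instead of a table
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)
est <- coef(fit)[-1]                       ## drop the intercept
ci  <- confint(fit)[-1, , drop = FALSE]
plot(est, seq_along(est), xlim = range(ci), yaxt = "n",
     xlab = "estimate", ylab = "", pch = 16)
axis(2, at = seq_along(est), labels = names(est), las = 1)
segments(ci[, 1], seq_along(est), ci[, 2], seq_along(est))
abline(v = 0, lty = 2)                     ## reference line at zero effect
```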

statistical tests

assumptions

  • independence (hard to test!)
  • homogeneity of variance (vs. heteroscedasticity)
  • linearity
  • Normality (least important)
    • outliers; skew; “fat tails” (Student 1927)
    • distributional assumptions apply to the conditional distribution of the response variable

diagnostics

  • hypothesis tests are not generally appropriate:
    they answer the wrong question
  • graphical diagnostics
    • residuals plots (linearity, heteroscedasticity)
    • influence plots (outliers)
    • Q-Q plots (Normality)
    • Box-Cox plots (transformations)
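All of these diagnostics are one or two lines in base R (illustrated here on the built-in `cars` data; the Box-Cox profile requires the MASS package, which ships with R):

```r
## Standard graphical diagnostics for a fitted linear model
fit <- lm(dist ~ speed, data = cars)
par(mfrow = c(2, 2))
plot(fit)           ## residuals vs fitted, Q-Q, scale-location, leverage
MASS::boxcox(fit)   ## Box-Cox profile: suggests a power transformation
```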

(the data set)

diagnostics

diagnostics (performance pkg)

dealing with violations

  • drop outliers (report both analyses)
  • transform (e.g. log transform: Box-Cox analysis)
  • non-parametric (rank-based) tests
    (e.g. Mann-Whitney-Wilcoxon, Kruskal-Wallis)
  • relax assumptions/do fancier stats, e.g.
    • logistic regression (0/1 outcomes)
    • quadratic regression (nonlinearity)
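Two of these remedies side by side, on hypothetical skewed (log-normal) samples:

```r
## Skewed data: a t-test on the log scale vs. a rank-based alternative
set.seed(42)
a <- rlnorm(20, meanlog = 0)       ## "control" group
b <- rlnorm(20, meanlog = 0.5)     ## "treatment" group
p_log  <- t.test(log(a), log(b))$p.value   ## transform, then t-test
p_rank <- wilcox.test(a, b)$p.value        ## Mann-Whitney-Wilcoxon
c(log_t = p_log, wilcoxon = p_rank)
```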

what should you use?

  • try to connect scientific & statistical questions
  • data type
    • see decision tree or table
    • if your question doesn’t fit in this tree,
      think about how much you like statistics …
  • nonparametric stats
    • slight loss of power
    • stronger assumptions than you think
    • \(p\)-values only - no effect size

computational platforms

Criteria

  • simple/weak vs. complex/powerful
  • GUI vs command-line
  • default: use what your lab uses

Excel

  • ubiquitous
  • open alternatives (LibreOffice/OpenOffice)
  • data in plain sight
  • good enough for simple stuff
  • occasional traps (McCullough et al. (2008); date handling; etc.)
  • archive your data as CSV, not XLSX
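CSV round-trips cleanly between programs; in R (hypothetical file and column names):

```r
## Archive a data frame as plain-text CSV and read it back
dat <- data.frame(id = 1:3, mass = c(1.2, 3.4, 2.2))
write.csv(dat, "experiment1.csv", row.names = FALSE)
dat2 <- read.csv("experiment1.csv")
all.equal(dat, dat2)   ## TRUE: nothing lost in the round trip
```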

stats packages

  • SPSS, JMP …
  • more reliable than Excel
  • more powerful than Excel
  • point & click (mostly)

R

  • powerful; free & open
  • reproducible: script-based
  • hardest to learn
  • graphical interfaces: R Commander or Jamovi
  • great for data manipulation, graphics
    (once you learn how)

t-test (R)

x1 <- c(1.5, 2.5, 2.1)
x2 <- c(1.1, 1.4, 1.5)
t.test(x1, x2)
## 
##  Welch Two Sample t-test
## 
## data:  x1 and x2
## t = 2.226, df = 2.6648, p-value = 0.1236
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.3759079  1.7759079
## sample estimates:
## mean of x mean of y 
##  2.033333  1.333333

t-test (Excel)

Further resources

References

Brice, EM et al. 2021. “Sampling bias exaggerates a textbook example of a trophic cascade.” Ecology Letters. doi:10.1111/ele.13915. https://onlinelibrary.wiley.com/doi/abs/10.1111/ele.13915.

Chambers, JM et al. 2018. Graphical Methods for Data Analysis. Chapman & Hall/CRC.

Davies, GM et al. 2015. “Don’t let spurious accusations of pseudoreplication limit our ability to learn from natural experiments (and other messy kinds of ecological monitoring).” Ecology and Evolution. doi:10.1002/ece3.1782. http://onlinelibrary.wiley.com/doi/10.1002/ece3.1782/abstract.

Dushoff, J et al. 2018. “I can see clearly now: Reinterpreting statistical significance.” arXiv preprint. https://arxiv.org/abs/1810.06387.

Gelman, A et al. 2014. “Beyond power calculations: Assessing Type S (sign) and Type M (magnitude) errors.” Perspectives on Psychological Science 9 (6): 641–651. doi:10.1177/1745691614551642. http://pps.sagepub.com/content/9/6/641.

———. 2006. “The difference between ‘significant’ and ‘not significant’ is not itself statistically significant.” The American Statistician 60 (4): 328–331. doi:10.1198/000313006X152649. http://www.tandfonline.com/doi/abs/10.1198/000313006X152649.

Hurlbert, SH. 1984. “Pseudoreplication and the design of ecological field experiments.” Ecological Monographs 54 (2): 187–211. doi:10.2307/1942661. http://www.esajournals.org/doi/abs/10.2307/1942661.

Ioannidis, JPA. 2005. “Why most published research findings are false.” PLoS Medicine 2 (8): e124. doi:10.1371/journal.pmed.0020124. http://dx.doi.org/10.1371/journal.pmed.0020124.

McCullough, BD et al. 2008. “On the accuracy of statistical procedures in Microsoft Excel 2007.” Computational Statistics & Data Analysis 52 (10): 4570–4578. doi:10.1016/j.csda.2008.03.004. http://www.sciencedirect.com/science/article/pii/S0167947308001606.

Simmons, JP et al. 2011. “False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant.” Psychological Science 22 (11): 1359–1366. doi:10.1177/0956797611417632. http://pss.sagepub.com/content/22/11/1359.

Student. 1927. “Errors of routine analysis.” Biometrika 19 (1/2): 151–164. doi:10.2307/2332181. http://www.jstor.org/stable/2332181.