---
title: "ANOVA Example"
output: html_notebook
---
R comes with some built in Data, call "PlantGrowth", which contains results from an experiment to compare yields (as measured by dried weight of plants) obtained under a control and two different treatment conditions.
```{r}
PlantGrowth
```
First, we can find the grand mean:
```{r}
grand.mean<-mean(PlantGrowth$weight)
grand.mean
```
We can break the data into each treatment:
```{r}
control<-PlantGrowth[PlantGrowth$group=="ctrl",1]
control
```
```{r}
tr1<-PlantGrowth[PlantGrowth$group=="trt1",1]
tr1
```
```{r}
tr2<-PlantGrowth[PlantGrowth$group=="trt2",1]
tr2
```
Notice that all the treatments have the same size:
```{r}
length(tr1)==length(tr2)&&length(tr2)==length(control)
```
```{r}
length(control)
```
And we can find the treatment means:
```{r}
control.mean<-mean(control)
control.mean
```
```{r}
tr1.mean<-mean(tr1)
tr1.mean
```
```{r}
tr2.mean<-mean(tr2)
tr2.mean
```
And we can compute the treatment variances:
```{r}
S2.control<-var(control)
S2.control
```
```{r}
S2.tr1<-var(tr1)
S2.tr1
```
```{r}
S2.tr2<-var(tr2)
S2.tr2
```
Let's populate an ANOVA table. We first need the sums of squares.
```{r}
#SS Total
ss.total<-sum((PlantGrowth$weight-grand.mean)^2)#has 30-1=29 degrees of freedom
ss.total
```
```{r}
treatment.means<-c(control.mean,tr1.mean,tr2.mean)
```
```{r}
#SSTreatments.
ss.tr<-length(control)*sum((treatment.means-grand.mean)^2) #has 3-1=2 degrees of freedom
ss.tr
```
```{r}
#SS Error
ss.e<-(length(control)-1)*(S2.control+S2.tr1+S2.tr2) #has 3(10-1)=27 degrees of freedom
ss.e
```
Note in particular that:
```{r}
(ss.tr+ss.e)
```
We can now compute the mean squares. Note, we only need the mean square of the error and the mean square of the treatments:
```{r}
ms.tr<-ss.tr/(3-1)
ms.tr
```
```{r}
ms.e<-ss.e/(3*(10-1))
ms.e
```
At this point, we can compute the observed value of the F-statistic:
```{r}
f.obs<-ms.tr/ms.e
f.obs
```
Now this seems substatially larger than 1, but we should compute the p-value just to be sure.
```{r}
p.value<-pf(f.obs,2,27,lower.tail = FALSE) #note, we take lower.tail=FALSE in order to take the upper tail (since we want to find the probability that F>f)
p.value
```
So, we can check this against various significance levels.
```{r}
p.value>0.05
```
So H_0 is rejected at the 5% significance level.
However, at the 1% significance level, we see that
```{r}
p.value>0.01
```
and so we cannot reject the null hypothesis at the 1% significance level.
That's a lot of work, but we can also used built in function:
```{r}
res.aov <- aov(weight ~ group, data = PlantGrowth)
summary(res.aov)
```
and we see we have everything in the table automatically. :o