---
title: "ANOVA Example"
output: html_notebook
---


R comes with some built in Data, call "PlantGrowth", which contains results from an experiment to compare yields (as measured by dried weight of plants) obtained under a control and two different treatment conditions.


```{r}
PlantGrowth
```


First, we can find the grand mean: 

```{r}
grand.mean<-mean(PlantGrowth$weight)
grand.mean
```

We can break the data into each treatment: 

```{r}
control<-PlantGrowth[PlantGrowth$group=="ctrl",1]
control
```

```{r}
tr1<-PlantGrowth[PlantGrowth$group=="trt1",1]
tr1
```

```{r}
tr2<-PlantGrowth[PlantGrowth$group=="trt2",1]
tr2
```

Notice that all the treatments have the same size: 

```{r}
length(tr1)==length(tr2)&&length(tr2)==length(control)
```
```{r}
length(control)
```


And we can find the treatment means: 

```{r}
control.mean<-mean(control)
control.mean
```

```{r}
tr1.mean<-mean(tr1)
tr1.mean
```


```{r}
tr2.mean<-mean(tr2)
tr2.mean
```
 
And we can compute the treatment variances: 

```{r}
S2.control<-var(control)
S2.control
```

```{r}
S2.tr1<-var(tr1)
S2.tr1
```

```{r}
S2.tr2<-var(tr2)
S2.tr2
```

 
Let's populate an ANOVA table. We first need the sums of squares. 


```{r}
#SS Total
ss.total<-sum((PlantGrowth$weight-grand.mean)^2)#has 30-1=29 degrees of freedom
ss.total
```

```{r}
treatment.means<-c(control.mean,tr1.mean,tr2.mean)
```


```{r}
#SSTreatments.
ss.tr<-length(control)*sum((treatment.means-grand.mean)^2) #has 3-1=2 degrees of freedom
ss.tr
```

```{r}
#SS Error
ss.e<-(length(control)-1)*(S2.control+S2.tr1+S2.tr2) #has 3(10-1)=27 degrees of freedom
ss.e
```

Note in particular that: 

```{r}
(ss.tr+ss.e)
```


We can now compute the mean squares. Note, we only need the mean square of the error and the mean square of the treatments: 

```{r}
ms.tr<-ss.tr/(3-1)
ms.tr
```

```{r}
ms.e<-ss.e/(3*(10-1))
ms.e
```

At this point, we can compute the observed value of the F-statistic: 

```{r}
f.obs<-ms.tr/ms.e
f.obs
```

Now this seems substatially larger than 1, but we should compute the p-value just to be sure. 

```{r}
p.value<-pf(f.obs,2,27,lower.tail = FALSE) #note, we take lower.tail=FALSE in order to take the upper tail (since we want to find the probability that F>f)
p.value
```


So, we can check this against various significance levels. 

```{r}
p.value>0.05
```

So H_0 is rejected at the 5% significance level. 

However, at the 1% significance level, we see that 

```{r}
p.value>0.01
```

and so we cannot reject the null hypothesis at the 1% significance level. 


That's a lot of work, but we can also used built in function: 


```{r}
res.aov <- aov(weight ~ group, data = PlantGrowth)

summary(res.aov)
```

and we see we have everything in the table automatically. :o