--- title: "ANOVA Example" output: html_notebook --- R comes with some built in Data, call "PlantGrowth", which contains results from an experiment to compare yields (as measured by dried weight of plants) obtained under a control and two different treatment conditions. ```{r} PlantGrowth ``` First, we can find the grand mean: ```{r} grand.mean<-mean(PlantGrowth$weight) grand.mean ``` We can break the data into each treatment: ```{r} control<-PlantGrowth[PlantGrowth$group=="ctrl",1] control ``` ```{r} tr1<-PlantGrowth[PlantGrowth$group=="trt1",1] tr1 ``` ```{r} tr2<-PlantGrowth[PlantGrowth$group=="trt2",1] tr2 ``` Notice that all the treatments have the same size: ```{r} length(tr1)==length(tr2)&&length(tr2)==length(control) ``` ```{r} length(control) ``` And we can find the treatment means: ```{r} control.mean<-mean(control) control.mean ``` ```{r} tr1.mean<-mean(tr1) tr1.mean ``` ```{r} tr2.mean<-mean(tr2) tr2.mean ``` And we can compute the treatment variances: ```{r} S2.control<-var(control) S2.control ``` ```{r} S2.tr1<-var(tr1) S2.tr1 ``` ```{r} S2.tr2<-var(tr2) S2.tr2 ``` Let's populate an ANOVA table. We first need the sums of squares. ```{r} #SS Total ss.total<-sum((PlantGrowth$weight-grand.mean)^2)#has 30-1=29 degrees of freedom ss.total ``` ```{r} treatment.means<-c(control.mean,tr1.mean,tr2.mean) ``` ```{r} #SSTreatments. ss.tr<-length(control)*sum((treatment.means-grand.mean)^2) #has 3-1=2 degrees of freedom ss.tr ``` ```{r} #SS Error ss.e<-(length(control)-1)*(S2.control+S2.tr1+S2.tr2) #has 3(10-1)=27 degrees of freedom ss.e ``` Note in particular that: ```{r} (ss.tr+ss.e) ``` We can now compute the mean squares. Note, we only need the mean square of the error and the mean square of the treatments: ```{r} ms.tr<-ss.tr/(3-1) ms.tr ``` ```{r} ms.e<-ss.e/(3*(10-1)) ms.e ``` At this point, we can compute the observed value of the F-statistic: ```{r} f.obs<-ms.tr/ms.e f.obs ``` Now this seems substatially larger than 1, but we should compute the p-value just to be sure. ```{r} p.value<-pf(f.obs,2,27,lower.tail = FALSE) #note, we take lower.tail=FALSE in order to take the upper tail (since we want to find the probability that F>f) p.value ``` So, we can check this against various significance levels. ```{r} p.value>0.05 ``` So H_0 is rejected at the 5% significance level. However, at the 1% significance level, we see that ```{r} p.value>0.01 ``` and so we cannot reject the null hypothesis at the 1% significance level. That's a lot of work, but we can also used built in function: ```{r} res.aov <- aov(weight ~ group, data = PlantGrowth) summary(res.aov) ``` and we see we have everything in the table automatically. :o