---
title: "Jan 20"
output: html_notebook
---

Degree of Reading Power (DRP) data for third graders:

```{r}
drp <- read.csv("DRP.csv")
```

Display the top of the data:

```{r}
head(drp, 10)
```

These are test scores. The "Treatment" group received some special training, while the "Control" group did not.

Let's create vectors containing the control and treatment data:

```{r}
control <- drp$Score[drp$Group == "Control"]
treatment <- drp$Score[drp$Group == "Treatment"]
```

Let's find the range of these two groups:

```{r}
range(treatment)
```

```{r}
range(control)
```

The `range()` function returns a vector containing the minimum and maximum of its argument, in that order. The range as defined in class is the difference between the two, which we can compute like this:

```{r}
range_treatment <- max(treatment) - min(treatment)
range_treatment
```

```{r}
range_control <- max(control) - min(control)
range_control
```

So we see there is quite a difference in the spread of the two data sets as measured by the range.

Let's look at the data visually:

```{r}
stem(treatment, scale = 2)
```

```{r}
stem(control, scale = 2)
```

Let's compute the interquartile ranges:

```{r}
IQR(treatment)
```

```{r}
IQR(control)
```

Exercise: write a function that takes a vector and returns a vector containing its outliers, or NULL if there are none.

Let's create side-by-side boxplots of our two data sets:

```{r}
type <- c("Treatment", "Control")
boxplot(treatment, control,
        names = type,
        horizontal = FALSE,
        main = "Degree of Reading Power Test, Third Grade",
        ylab = "Score")
```
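
One possible approach to the outlier exercise above is sketched here. It assumes that "outlier" means a value more than 1.5 times the IQR below the first quartile or above the third quartile; that rule is an assumption, so substitute the definition from class if it differs. The name `find_outliers`, the multiplier argument `k`, and the `toy` vector are made up for illustration and are not part of the DRP data.

```{r}
# Hypothetical helper: return the values of x lying more than k * IQR
# below the first quartile or above the third quartile, or NULL if none.
find_outliers <- function(x, k = 1.5) {
  x <- x[!is.na(x)]                                # ignore missing values
  q <- quantile(x, c(0.25, 0.75), names = FALSE)   # first and third quartiles
  spread <- q[2] - q[1]                            # interquartile range
  out <- x[x < q[1] - k * spread | x > q[2] + k * spread]
  if (length(out) == 0) NULL else out
}

# Made-up values for illustration only (not taken from DRP.csv)
toy <- c(1, 2, 3, 4, 100)
find_outliers(toy)
```

The same function could then be applied to the vectors built earlier, for example `find_outliers(treatment)` and `find_outliers(control)`. Note that `boxplot()` computes its quartiles slightly differently from `quantile()`, so the points it draws as outliers may not always match this function exactly.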