% Ben Bolker
% Thu Nov 1 12:45:50 2012
Licensed under the
Creative Commons attribution-noncommercial license.
Please share & remix noncommercially, mentioning its origin.
First load the mlmRev and ggplot2 packages:
library(mlmRev) ## for Oxboys data
library(ggplot2)
As shown in the lecture, make a minimal plot of the Oxboys data set (see ?Oxboys for information on the data set):
ggplot(Oxboys, aes(x = age, y = height)) + geom_point()
Now play around with it.
- Try geom_line() (it probably doesn't do what you want).
- You can assign a ggplot specification to a variable: define g0 <- ggplot(Oxboys,aes(x=age,y=height,colour=Subject)) (i.e. add colours to the mapping) and try g0+geom_point().
- Try g0+geom_line(), or g0+geom_line()+geom_point().
- Try g0+geom_point()+geom_smooth().
- Use geom_smooth(method="lm") to add linear regression lines rather than the default loess (locally weighted regression) smooths.
- To put the subjects in numerical order, use Oxboys$Subject <- factor(Oxboys$Subject,levels=1:26) (or Oxboys <- transform(Oxboys,Subject=factor(Subject,levels=1:26))). (You need to redefine g0 after you do this, because it has stored the original version of the data internally.)
- Alternatively, use the reorder function to get a (slightly) more sensible ordering.
- Use theme_set(theme_bw()) to change the theme. Try it. (It also makes it easier to see colours.) (Use theme_set(theme_gray()) if you want to restore the default theme.)
- To turn off the confidence intervals, use se=FALSE in the geom_smooth() specification. If you want to colour the confidence intervals along with the lines, try adding aes(fill=Subject) inside the geom_smooth() call. If instead you want to give the confidence region a single colour and make it more transparent (a compromise between the default values and using se=FALSE to turn the intervals off completely), add fill="blue",alpha=0.1 (not wrapped inside an aes() statement) to the geom_smooth() call.
- Add aes(shape=Subject) to the geom_point() call, and add +scale_shape_manual(values=1:26) to the end of your R command (you will get a series of warnings about unimplemented pch value '26'). scale_shape_manual is our first example of using scales, another component of ggplot: it allows customization of the values (colours, shapes, sizes, etc.) used in mappings.
- Instead of geom_point(), use geom_text(aes(label=letters[Subject])) to add text (if you had subjects with names you could use those names as the label aesthetic rather than using the built-in letters vector on the fly).

In the first geom_line() plot above, ggplot didn't distinguish between subjects in drawing the lines: by default, ggplot groups by whatever aesthetic mappings have been defined (e.g. colour, shape, etc.). You can explicitly specify this, or override the default behaviour, by using the group aesthetic.
- Use geom_line(aes(group=Subject)) to draw a separate line for each subject.
- Add geom_smooth(aes(group=1),method="lm",size=1.5) to one of your previous plots to get a linear regression model of the pooled data (i.e. group=1 specifies that the only group is the whole data set), with a fatter line than usual.

Another feature of ggplot is faceting: creating sub-plots (called “facets” or in other contexts “small multiples” or “trellis plots” or “conditioning plots”) for different subsets of the data. Add facet_wrap(~Subject) to one of your previous plots. (To create a two-dimensional grid of plots, use facet_grid(x~y), where x and y are two separate factors you want to use to define the rows and columns of the sub-plot array.)

Take a look at the online ggplot documentation to get an idea of some of the other options.
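As a rough sketch of how the grouping and faceting pieces fit together, the following combines per-subject lines, a pooled regression line, and a faceted version of the same data. It assumes a recent ggplot2 (3.4.0 or later), where the line width written as size=1.5 in this handout is spelled linewidth=1.5; the names g1 and g2, the black line colour, and se=FALSE are choices made for this illustration only.

```r
library(mlmRev)   # for the Oxboys data
library(ggplot2)

# Base specification: colour is mapped to Subject
g0 <- ggplot(Oxboys, aes(x = age, y = height, colour = Subject))

# One line per subject, plus a single regression line fitted to the pooled data.
# aes(group = 1) overrides the default grouping by colour, so the smooth
# treats the whole data set as one group.
g1 <- g0 +
  geom_line(aes(group = Subject)) +
  geom_smooth(aes(group = 1), method = "lm", formula = y ~ x,
              linewidth = 1.5, colour = "black", se = FALSE)

# One panel per subject, each with its own points and regression line
g2 <- g0 +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  facet_wrap(~Subject)
```

Type g1 or g2 at the prompt (or call print(g1)) to draw the plot. Note that the pooled line is left out of g2: faceting splits the data by subject, so a group=1 smooth inside each panel would be fitted to that subject only.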
Get the data set on effects of fruiting and simulated herbivory on Arabidopsis (password-protected), which is described in more detail in the material here (search for “Bolker et al 2009”), and use read.csv to import it. (You can call it whatever you want; in the examples below I'll call it dat.)
The data are the total number of fruits set (total.fruits), subdivided by status (a nuisance variable, the way the plants were handled); amd (simulated herbivory treatment); nutrient (nutrient level; you may want to create an fNutrient variable within the data set that is a factor instead of a number); rack (which of two experimental racks was used); gen (genotype, again probably should be a factor); popu (population); reg (region).
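One possible way to do the import and the factor conversions suggested above is sketched here. The helper add_factors and the file name arabidopsis.csv are hypothetical (neither comes from the handout); substitute the path of the file you actually downloaded.

```r
# Hypothetical helper (not from the handout): add factor versions of the
# numeric grouping variables described above.
add_factors <- function(dat) {
  transform(dat,
            fNutrient = factor(nutrient),  # nutrient level as a factor
            gen = factor(gen))             # genotype as a factor
}

# Hypothetical file name; replace with the location of your downloaded copy.
dat_file <- "arabidopsis.csv"
if (file.exists(dat_file)) {
  dat <- add_factors(read.csv(dat_file))
  str(dat)  # check that fNutrient and gen are now factors
}
```

Keeping the original numeric nutrient column alongside fNutrient leaves both versions available for plotting.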
Start out by mapping the most important variables to the x location, y location, and colour aesthetics:
g0 <- ggplot(dat, aes(x = factor(nutrient), y = total.fruits, colour = amd))
Now experiment with different ways of displaying the data:
- Try geom_point, or geom_boxplot, or stat_sum(aes(size=..n..)) (the last counts the number of overlapping points and sets the point size according to the number of points).
- Try stat_summary(fun.y=mean,geom="line",aes(x=nutrient)): this summarizes each group within the data by the mean and adds a corresponding line. The aes(x=nutrient) is a bit of a trick, required because ggplot won't draw lines across horizontal axes that are defined as factors (as in this case). You could also try stat_summary(fun.data=mean_cl_normal), which draws points + normal error bars. Try adding a group aesthetic to stat_summary to get it to take the mean within groups rather than overall.
- Use shape to distinguish unresolved variables such as status.

See if you can successfully reproduce Figures 1, 3, and 4 from this PDF (although for Figure 1 you might want to use coord_flip to produce horizontal boxplots with easier-to-read, horizontal labels).
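The counting and summarizing layers described above can be combined roughly as follows. This is a sketch under stated assumptions: it targets a recent ggplot2 (3.3.0 or later), where fun.y is spelled fun and ..n.. is written after_stat(n), and it uses an explicit group aesthetic rather than the aes(x=nutrient) trick to get lines across the factor axis. The function name plot_fruits and the alpha value are made up for the illustration.

```r
library(ggplot2)

# Hypothetical helper: dat is assumed to have the columns nutrient,
# total.fruits and amd described in the handout.
plot_fruits <- function(dat) {
  ggplot(dat, aes(x = factor(nutrient), y = total.fruits, colour = amd)) +
    # point size reflects the number of overlapping observations
    stat_sum(aes(size = after_stat(n)), alpha = 0.5) +
    # mean within each herbivory treatment, joined across nutrient levels;
    # group = amd makes each treatment a single line
    stat_summary(aes(group = amd), fun = mean, geom = "line")
}
```

With the data loaded as dat, plot_fruits(dat) draws the figure. Swapping the stat_summary layer for stat_summary(fun.data = mean_cl_normal) gives points with normal error bars instead; that summary function needs the Hmisc package to be installed.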
Alternatively, or in addition to the previous example, use ggplot to explore your own data or another data set you can find lying around (try data() in R, or ask the instructor …)