\documentclass[12pt]{article}
%\documentclass[12pt,reqno,final]{amsart}   % different style

%% Changelog:
%% 2012-05-06 BMB modifications: 
%%    * \I -> \emph (or \code); \B -> \textbf
%%    * revised for knitr
%%    * got rid of attach()

%%%%%%%%%
\input{stuTeX}
\newcommand{\proginput}[1]{{\sf #1}}
%%%%%%%%%
%\usepackage{times}
%\usepackage{helvetica}

\geometry{left=0.75in, bottom=0.75in, right=0.75in, top=0.75in, foot=0.5in, head=0.5in}
\parindent = 1.0cm

%%%%%%%%%%%%%%%%%%%%%
%           Sweave Options                     %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%     prefix.string = folder & prefix for figs                          %
%     height & width = dimensions of plot passed to device     %
%     eps = write eps file also                                            %
%     keep.source = keeps results looking nice                     %
%     fig = produce figure if called                                      %
%     echo = repeat the chunk call                                      %
%     strip.white = white lines before/after output                 %
%     eval = evaluate chunk                                               %
%     results = show results verbatim, tex, or hide                %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%\SweaveOpts{prefix.string= Figures/eeid, height= 6, width= 8, eps= false, keep.source= true, fig= false, echo= true, strip.white= true, eval= true, results= verbatim}
%\setkeys{Gin}{width=0.85\textwidth}
\SweaveOpts{fig.path="Figures/",fig.height=6,fig.width=8,out.width="0.85\\textwidth"}
%%%%%%%%%%%%%%%%%%%%%%

%%%%%%%%%%%%%%%%%%%%%%
\title{\R Basics Tutorial for Ecologists}
\date{}

%\address{
%  Stu Field,
%  Department of Biology, 
%  Colorado State University, 
%  Fort Collins, Colorado
%  80523-1878    USA
%}

%\author[Field]{Stuart G. Field}
%\email{sgf at colostate dot edu}

\author{EEID Workshop: by Stu Field%
%
\footnote{Special thanks to the organizers of the $7^{th}$ annual Ecology \& Evolution of Infectious Disease Workshop for insightful comments, suggestions, \& examples in development of this tutorial, however all careless grammar, errors, and typos are my own.}
}

<<options, echo=FALSE, results="hide", message=FALSE>>=
require(popbio)
require(gplots)
require(lattice)
require(deSolve)
glop <- options(digits=5)
## on.exit(options(.glop))  ## BMB: this line seems to prevent options taking effect
pdf.options(useDingbats=FALSE)
#options(device= function(...) {.Call("R_GD_nullDevice", PACKAGE= "grDevices")})
# read in relevant data   #
mydata <- read.csv("Datafiles/TreeData.csv")
popdata <- read.csv("Datafiles/Pop6C.csv")
@

%%%%%%%%%%%%%%%
\begin{document}
\maketitle
%%%%%%%%%%%%%%%
%%%%%%%%%%%%
%      Timestamp       %
%%%%%%%%%%%%
\begin{flushright}
  \fbox{\parbox{0.37\textwidth}{\B{TimeStamp:} \timestamp}}
\end{flushright}
%%%%%%%%%%


%%%%%%%%%%%%%
%       Title figure         %
%%%%%%%%%%%%%
\tabv
\fig[H]
\centering
\includegraphics[width=9cm, angle=0]{Figures/rglImage}
\efig


\vfill
\fig[H]
\centering
\hfill \includegraphics[width=2cm, angle=0]{Figures/NSF}
\efig


%%%
\np %
%%%%%%%%
\tableofcontents 
%%%%%%%%

%%%
\np %
%%%%%%%%%%%%%%%%%%%
\section{What exactly is \RR?} \label{sec:R}
%%%%%%%%%%%%%%%%%%
\R is an object-oriented scripting language that combines:%
%
\footnote{From Ben Bolker (2008). \emph{Ecological Models and Data in R}. 
     Princeton Univ. Press, Ch. 1.}
%
\bit
   \item A dialect of the programming language \pkg{S}, developed by John Chambers at Bell Labs, that can be used for numerical simulation of both deterministic and stochastic dynamic models 
   \item An extensive set of functions for classical and modern statistical data analysis and modeling 
   \item Graphics functions for visualizing data and model output 
   \item A user interface with a few basic menus
\eit

\R is an \emph{open source} project, available for free download via the Internet (\url{http://www.r-project.org}). Originally a research project in statistical computing, it is now managed by an international core development team that includes a number of well-respected statisticians, and is widely used by statistical researchers (and a growing number of theoretical ecologists and ecological modelers) as a platform for making new methods available to users. 
%% BMB: no-one uses S-PLUS any more!
%% The commercial implementation of \pkg{S} (called \pkg{S}-\textsc{plus}) offers an Office-style ``point \& click'' interface that \R lacks. For our purposes, however, the advantage of this user-friendly front-end is balanced by the fact that \R is built on a faster, much less memory-hungry implementation of \pkg{S}. 
A standard installation of of \R includes extensive documentation, including an introductory manual ($\sim$ 100 pages).%
%
\footnote{See \code{R-intro.pdf} accompanying this tutorial.}


%%%%%%%%%%%%%%
\section{Introduction to \R} \label{sec:intro}
%%%%%%%%%%%%%%
Reading a tutorial about the applications and capabilities of \R is a great start, but until you do it yourself and are actually forced to solve problems on your own in \RR, you will become neither confident nor proficient. This is precisely what the following exercises are designed to do; give you practical programming experience in \RR. Before you start, I recommend you get used to running \R from script files, rather than directly from the command window (\ie The Console). It'll make your life much simpler, especially for more complex computations. For those of you using MacOS, \R already comes with a fairly nice script editor. Alternatively, for those using Windows, I would recommend a supplementary script editor like \textsf{RStudio}, which is becoming rather popular in recent years (\code{\blue{http://www.rstudio.org/}}).\footnote{Really, any editor or GUI that can (1) hot-paste code chunks into R; (2) do syntax
highlighting; (3) match closing with opening brackets will do.
\textbf{Do not under any circumstances use Microsoft Word to edit your R code \ldots}}
%There is also \textsf{Tinn-R}, which has been around a while although recent upgrades seem to have only made it more complicated and less reliable.%
%
%\footnote{Older versions of Tinn-\R seem to be more stable \& straight-forward (try \code{v1.17.2.4}).}

%At the top of your script file you should start with %\code{rm(list=ls())}, which removes all objects from \RR's memory. %You'll use it often. To remove a single object use %\code{rm(objectname)}. 

You may occasionally want to run the command
\code{rm(list=ls())} to remove all objects from \RR's memory. 
However, it is \emph{not} good practice to embed this in your
script files by default, because it can trip up anyone else who
uses your code and isn't expecting it.
To remove a single object use \code{rm(objectname)}.

The objective of this tutorial is to create your own library of scripts for analyses which grows with your knowledge of \RR. Always be sure to annotate your scripts \emph{diligently} with the \verb+#+ symbol, otherwise you may come back to the script months later and not remember what you did (extremely frustrating!). I tend to have paired files with the same name, \eg \code{XYZdata.csv} and \code{XYZdata.R}, where the \code{*.R} file contains all the necessary commands, analyses, and plots for that particular data set in \code{*.csv}.

%%%%%%%%%%%%%%%%%%
\subsection{Installing Packages}
%%%%%%%%%%%%%%%%%%
Packages containing additional functions for \R to use can be downloaded from the internet.%
%
\footnote{The process is fairly simple, although some issues with admin rights may arise if you don't have rights to modify the \R directory.} 
%
There are essentially two steps to using packages: (1) you need to download and install the package onto your machine (once), and (2) you need to load the package (each \R session). Step (2) is simple: type \code{require(packagename)}. Step (1) is done as follows (for a \textsf{PC}):

\benN
   \item Make sure you are connected to the internet.
   \item Click on \proginput{Install package(s)} from the drop-down \proginput{Packages} menu.
   \item Select your CRAN mirror (somewhere within your current country is probably a good idea).
   \item Select the package you want from the menu (in alphabetical order).
   \item Click \textsf{OK} and wait for it to install (should auto-install).
   \item The package is now installed and ready for you to call for it (i.e., load it as stated above).
\een

Alternatively, if you know the name of the package you want to install you can just type \code{install.packages("pkgname")} at the \R prompt.

Not all packages are independent (functions from one package may use functions from within other packages). Usually the required dependencies will load/install automatically; on the other hand, if you get an error message you will have to install them yourself. If you go to the \proginput{Packages} menu and click on \proginput{Load package}, you will see a list of the packages that have been successfully installed on your machine.

For MacOS, the process is very similar except you click on \proginput{Packages \& Data} from the main menu and then \proginput{Package Installer} for package installation. There is an option to also install dependencies at this point. To see which packages are currently installed on the machine, go to \proginput{Package Manager}, where you can check the box of the package of interest to load it.

%%%%%%%%%%%%%%%%%%
\subsection{The Help Menu}
%%%%%%%%%%%%%%%%%%
One of the best features of \R is the extensive \textbf{Help menu} for general information about syntax and arguments for a given function -- just type \code{?functionname}. But be forewarned, it can take a little time to decipher these menus; at first they may seem like gibberish but with time this will pass. It is a very handy tool that I use almost daily. There are a few key sections of the Help Menu to highlight:


%%%%%%%%%%%%%
\begin{table}[H]
\centering
\belowcaptionskip=0.25cm
\caption{Summary of the key sections of \RR's Help Menu.}
\begin{tabular}{l l}
\hline\hline
\textbf{Section} 		& \textbf{Description} \\[5pt]
\hline
\code{line 1} & Gives the function name and the package in \code{\{\dots\}} \\
\code{Description} & A short general description of what the function does \\
\code{Usage} & Illustrates proper syntax; with function arguments and defaults \\
\code{Arguments} & Details about the various function arguments (important!) \\
\code{Details} & Any additional notes and caveats that a user should know \\
\code{Value} & Description of what the function returns following execution \\
\code{References} & Who wrote it and their source if applicable \\
\code{See Also} & Related functions that might also be useful \\
\code{Examples} & Most useful \dots fully executable examples! \\[10pt]
\hline\hline
\end{tabular}
\label{tab:helpmenu}
\end{table}
%%%%%%%%%%%%%


\noi Let's take a look at the Help menu using the function \code{apply()} as an example. In the \code{R Console } type
\code{?apply}:
this should bring up a \textbf{Help} window with details about the \code{apply} function. First, you can see that this function \emph{applies} a given function (\code{FUN}) to the row or column (\code{MARGIN}) of an object (\code{X}). The object (\code{X}) will have to be a data frame, matrix, or array (any 2-dimensional object). The \code{\dots} refer to any additional arguments of (\code{FUN}) you may want to include. The \code{Usage} section tells you how to formulate your syntax and construct your arguments within the function and the \code{Arguments} section gives more information about the actual pieces of information \code{apply()} will need to to its job. Here is an example:

<<apply>>=
Mx = matrix(1:16, nrow=4, ncol=4)    # create a matrix called Mx
Mx
apply(Mx, 1, mean)
apply(Mx, c(1,2), sqrt)
@

This tells \R to calculate the mean of the \emph{rows} \code{(,1,)} of \code{Mx} (by column is \code{(,2,)}). In addition, the Help menu says in the \code{MARGIN} description that using \code{c(1,2)} tells \code{apply()} to perform the function by row \emph{and} by column (\ie entry-wise). The fourth line above uses the \code{sqrt()} function to take the entry-wise square-root of \code{Mx}. \textbf{Note:} how you construct the command depends upon the function you want use (\ie it wouldn't make sense to use the \code{mean} function with an entry-wise construction because you'd be taking the mean of one value -- probably not what you had in mind). Likewise, the construction above is pretty redundant (and unnecessarily complex), since it's exactly the same as \code{sqrt(Mx)}. Lastly, take a look at the \code{Examples} section to see \code{apply()} in action.

I have provided the full \R code associated with all figures and exercises in this tutorial. The intent is that you first try to complete the exercises on your own, then only use this code when/if you get stuck. If you're \emph{really} stuck, I suggest you leave it, try to move on, and we will take care of it during the workshop. Good luck and happy Rrrrring!


\np
%%%%%%%%%%%%%%%%%
\section{Walk before you run} \label{sec:walking}
%%%%%%%%%%%%%%%%%
Before we get into more advanced procedures in \RR, you will need to familiarize yourself with the basic input structure of \RR, completing interactive calculations, assigning variables, and becoming comfortable with the \R console and visual interface.

At the \R console notice the drop down menus and familiarize yourself with where you can find many of the options you will need during an \R session. The console itself, upon startup, will state the version of \R you are running and some basic information about the developers of the software. Also of importance here is the command to exit \RR, which is \code{q()}. Commands can be typed directly into the console at the \code{>} prompt and executed by pressing \code{Enter}. As stated previously, for shorter, non-repetitive commands, utilizing the console is fine, but for the most part I strongly suggest you make it a priority to learn to work from script files. You may then execute lines or chunks of \R code by sending them to the console and in this way you are able to repeat a command by resending the code rather than retyping the entire line(s) in the console (which is extremely tedious). For Windows script editor: place the cursor on the line of interest, or highlight multiple lines, and hit \code{CTRL+r}. For the Mac: hit \code{Cmd+Enter} from the built in Mac editor. Of course, if using an auxiliary script editor (such as \code{RStudio}), you bypass this process and send code chunks directly from the external script editor. For now, lets use the console for the following simple interactive calculations.

%%%%%%%%%%
\subsection{Basic calculations and input}
%%%%%%%%%%
\R uses most of the familiar mathematical operators you may be accustomed to from other computer languages. Addition, subtraction, multiplication, division, exponentiation, $e^x$, etc. are \code{+, -, *, /, \^{}, exp(x)}, respectively. In addition, when working from the console you may use the arrow up key ($\uparrow$) to scroll through the previous few commands to the prompt and edit them (using the $\leftarrow \to$ keys) as desired. Lets begin with some simple computations: type \code{6 + 9} and then \code{Enter} at the command prompt (\code{>}) of the \R Console.

<<addition,prompt=TRUE,comment="">>=
6+9
@

The answer is returned with \code{[1]} next to it, which represents the number of the first element of that output row (in this case only one entry). Try some simple calculator-type calculations on your own to get used to the input/output style of \RR. Rules of mathematical order of operations also apply. Try typing

<<operations,prompt=TRUE,comment="">>=
7-12/3^2
@

although, when making more complex calculations I recommend using parentheses, this is much less for \R than for your own sake. Also note, you may separate an equation (line break) to a second line by hitting \code{Enter} and this will result in a \emph{continuation prompt} (\code{+}), where \R tells you that you haven't finished and prompts you to continue.%
%
\footnote{For the remainder of this tutorial I have removed the continuation prompt (\code{+}) in printed \R code to simplify the cutting and pasting of code from the tutorial, but you will likely see it during your session, particularly if you directly use the Console.}

<<continuation,prompt=TRUE,comment="",tidy=FALSE>>=
(18 * 9) * (2*(3 + 0.5)) -	# R knows you want to subtract something
   1.5*(81)^0.5
@

\noi You may also separate multiple different commands on a single line using the semicolon (\code{;}).

<<semicolon,tidy=FALSE,prompt=TRUE,comment="">>=
7/2; 8*3; 9/3*10; 64^(1/3); log(45)
@

Up until now the input and output are printed in this document more or less exactly as you will see it on the Console, 
showing the prompt (\verb+>+) and continuation (\code{+}) characters; from now on the prompt and continuation characters will be missing, and the results will have comment characters (\verb+##+) in front of them, to make it easier for you to cut and paste the results into \R if you like.

%%%%%%%%%%%%%%%%%%
\subsection{Variable assignment}
%%%%%%%%%%%%%%%%%%
\R carries out calculations with variables in the same manner as numbers (as above). You may assign values to variables (\ie create objects) by using either the \code{=} or \code{<-} symbol, which is a ``less than'' symbol immediately followed by a ``minus'' sign.%
%
\footnote{In practice \code{<-} can be used interchangeably with 
   \code{=}, in almost every situation you will encounter. Opinions differ about which is better.}
%
This symbol can be viewed as ``gets assigned to'': \code{z} ``gets assigned to'' \code{y*(x/2 + 7) + sqrt(y)}. When assigning/naming variables, remember they must begin with a letter, but may contain numbers and decimal points thereafter, and that assignment is case-sensitive (\code{variable.1} $\not=$ \code{Variable1}). Try the following:

<<assignment>>=
x = 7
y = 9
z <- y * (x/2 + 7) + sqrt(y)
@

Notice the result (\code{z}) was not displayed in the console. Once assigned, a variable (or object) is stored in \R memory as an object and to view the result you must first ``call'' the variable by typing the variable name and pressing \code{Enter}. In calculating \code{z}, \R remembers that \code{x} and \code{y} have already been assigned a numeric value and uses those values in calculating \code{z}.

<<calling_variables>>=
x; y; z
@

A trick you can use if you want to both assign and print the value is to surround the entire expression in parentheses:

<<using_variables>>=
(Ex1 <- y^x - z^2)
@

Try using the up-arrow ($\uparrow$) to edit the value of \code{y=5} and then recalculate \code{z} and \code{Ex1}. We can now try a more practical example combining assignment and calculation using a list of numbers called a vector (more on vectors in \S~\ref{sec:vectors}). Remember that \R is case sensitive, so \code{X} $\not=$ \code{x}.

<<calculating_variables>>=
X <- 1:10
Y <- (1-1/X) + 2
X
Y
@

This will return two lists%
%
\footnote{I'm using the term ``list'' loosely here, in \R the term \code{list} is a unique class of object. Technically, \code{X} and \code{Y} are \emph{vectors}.} %
%
of numbers which can then be plotted to display the relationship between $x$ and $y$. We will discuss plotting in more detail in \S~\ref{sec:plots}.

\fig[H]
\centering
<<plot_xy>>=
plot(X, Y)
@
\caption{Your very first plot using \R -- the relationship between \code{X} and \code{Y}.\label{fig:xy}}
\efig


%%%%%
\np %%%
%%%%%%%%%%%%%%%%%%%%%%%%
\section{Importing Data} \label{sec:importing}
%%%%%%%%%%%%%%%%%%%%%%%%
First things first. You need to get your data, which was perhaps collected in the field and now exists as an Excel worksheet or something similar, loaded into \R memory. For the \S~\ref{sec:importing}  and ~\ref{sec:exploring} I suggest reading the supplement by Ben Bolker entitled \emph{Importing data into} \RR. In addition, Chapter~2 of his book, \emph{Ecological Models and Data in R},%
%
\footnote{Bolker, B. (2008). \emph{Ecological Models and Data in R}. Princeton University Press. pp. 408. Graciously provided in \code{pdf} format by B. Bolker (ISBN: 0691125228; \url{http://www.math.mcmaster.ca/~bolker/emdbook/}).} %
% 
provides a more comprehensive discussion of how to shape your data into a compatible form for analysis in \R (including troubleshooting tips for common data formatting issues). Importing data is a relatively simple but essential process, yet you would be surprised how many unexpected issues can arise. The simplest, but by no means the only, way is to save your \pkg{Excel} file as a comma delimited file (\code{filename.csv}), using \proginput{Save as}. Two small hints:

\benN
   \item make sure your home \R directory is set to the location where your datafile is located.
   \item make sure your headings are clear but simple (\ie no spaces, funny characters, etc.). 
\een

The function \code{read.csv()} converts the \code{csv} file into a \emph{data frame}. A data frame is essentially a 2-dimensional array that can contain any combination of vectors (columns of data) that are of integer, numeric, character, or factor classes. This differs from a \code{matrix} object which can only contain one class of data.

One last note on data file organization: it's important to know the difference between \emph{wide} format and \emph{long} format. Table~\ref{tab:dataorg} shows the same data organized in two ways. On the left, it's in long format: there are a few long columns since there's one row for each observation. On the right, it's in wide format: more, shorter columns, with multiple observations on each row. For some analyses it's important to have your data in one format or another. Wide format is often easier to read, but many analyses require the data in long format. There is a function in the base installation of \RR, \code{reshape()}, that converts between the formats (somehow I can never remember how to use it! -- \code{?reshape}). Lately I've been using the \code{melt()} and \code{cast()} functions (confusingly within the \pkg{\{reshape\}} \emph{package}), instead of \code{reshape()}. Use whatever tool/function you like to get your data into long format.


%%%%%%%%%%% \multicolumn{3}{c|}{Three Merged Cols} \\
\begin{table}[H] 
\centering
\belowcaptionskip=0.25cm
\caption{Long vs. wide format.}
\begin{tabular}{c c c | c c c c c c }
\hline\hline
\multicolumn{3}{c|}{\textbf{Long format}} & \multicolumn{6}{c}{\textbf{Wide format}}\\
%
No. & Species & Variable & No. & A & No. & B & No. & C \\ 
\hline
	1 &         A & 10.88 &    1 & 10.88  & 6  & 13.38  & 11 & 4.31 \\
	2 &         A & 11.59 &    2 & 11.59  & 7  & 14.97  & 12 & 6.13 \\
	3 &         A & 10.91 &    3 & 10.91  & 8  & 12.01  & 13 & 5.66 \\
	4 &         A & 10.78 &    4 & 10.78  & 9  & 15.14  & 14 & 4.99 \\
	5 &         A & 10.00 &    5 & 10.00  & 10 & 12.60  & 15 & 5.77 \\
	6 &         B & 13.38    \\
	7 &         B & 14.97    \\
	8 &         B & 12.01    \\ 
	9 &         B & 15.14    \\
	10 &        B & 12.60    \\
	11 &        C & 4.31     \\
	12 &        C & 6.13     \\
	13 &        C & 5.66     \\
	14 &        C & 4.99     \\
	15 &        C & 5.77     \\
	\hline
\end{tabular}
\label{tab:dataorg}
\end{table}
%%%%%%%%%%%%


%%%%%%%%%%%%%%
\subsection*{Exercises}
%%%%%%%%%%%%%%
\benN
	\item Read the \code{TreeData.csv} file into \R memory using 
	\code{read.csv("TreeData.csv"}. Add the argument
	\code{(\dots, row.names= 1)} and note what it does. Be sure to
	redefine (overwrite) this data frame back to the original (without the 
	\code{row.names} argument) because you will use the original data below. 
	It is usually easier to assign the new data frame a name such as \code{mydata}
	so that you can refer to it later if necessary (\eg 
	\code{tree.diameter = mydata\$dbh}).
	
	\item So you don't get hung up at this preliminary stage, if you are having
	difficulty reading \code{TreeData.csv} in to memory I have provided a
	supplementary ``cheat'' file called \code{TreeData.Rdata} which should help.
	Simply double-click this file and its contents (the object \code{mydata}) should
	be read into memory). Alternatively, type 
	\code{load("TreeData.Rdata")} directly in the console.%
	%
	\footnote{Be sure that the working directory is set correctly!} %
	%
	Make sure this was done correctly by calling \code{mydata} in the console (or
	type \code{ls()} to make sure \code{mydata} is in \RR's list of objects in
	memory). I have also included the file \code{R\_Tutorial\_All\_Data.Rdata}
	which contains \textbf{all} the data sets required for this tutorial. Try to read the
	data into memory using \code{read.csv()} but as a last resort use the 
	\code{*.Rdata} files.
	
	\item Call the data frame and take a look at what was loaded into memory. 
	Explore the \code{ls()} function as well as these below to see what they do.

		\bit
		   \item \code{summary(mydata)}
		   \item \code{sapply(mydata, class)}
		   \item \code{names(mydata)}
		   \item \code{attributes(mydata)}
%		   \item \code{attach(mydata)}
% 		   \item \code{detach(mydata)}
		\eit
		
		% Notice what \code{ls()} gives you before and after you \code{attach} your
		% data frame. Are there any new objects? I prefer to assign each variable 
		% individually for this reason, so I know there is nothing going on in the
		% background I cannot see.

		% \textbf{Note about attach():} Advanced \R users do not routinely employ attach
		% in their work, because it can lead to unexpected problems in resolving
		% names (\eg you can end up with multiple copies of the same variable 
		% name, each of a different length and each meaning something completely 
		% different). The problem is that if you manipulate an ``attached object'', the
		% changes aren't propagated back to the original data, therefore \code{xyz}
		% and \code{mydata\$xyz} are no longer are the same. Most modelling 
		% functions like \code{lm()} or \code{glm()} have a \code{data=} argument
		% so attach becomes unnecessary.

	\item The first thing you'll need to do is make sure your data set is complete, 
	that means no pesky \code{NA}s (the code for missing data in \RR). First,
	determine if some of those \code{NA}s should actually be zeros. You can
	identify whether you have a complete data set with the command
	\code{complete.cases(mydata)} or if you have a \emph{huge} data set you
	may want to simplify your life with \code{which(!complete.cases(mydata))}
	and hope \R returns \code{integer(0)}. If not, and you're sure you want to
	remove cases with with missing data, you can remove them with 
	\code{mydata <- na.omit(mydata)}.%
	%
	\footnote{The Bolker references give a excellent and detailed description
	     regarding how to deal with \code{NA}s.}	%
	%
	\item You wish to determine how many of each species of tree is present 
	within the dataset. This is particularly effective if you are organizing data by a 
	factor (such as species, which you just identified as a factor with
	\code{sapply()}%
	%
	\footnote{The \pkg{gdata} package has a simplified version called \code{ll()},
	     try \code{ll(mydata, dim=TRUE)}.} %
	%
	function above), or some discrete variable rather than a continuous one. Try
	assigning a column of \code{mydata} a name%
	%
	\footnote{It is a good idea to avoid using the following: \code{c}, \code{q}, 
          \code{t}, \code{C}, \code{D}, \code{F}, \code{I}, and \code{T}.
	     These  are already reserved for other meanings in \R (\code{T} and \code{F} 
             are abbreviations for \code{TRUE} and \code{FALSE}). It's also a good idea to avoid 
             \code{l} and \code{O} (lower-case letter ``L'' and 
	     upper-case letter ``O''), because they are easily confused with 1 (one) and 0
	     (zero).} %
	%
	(\eg \code{species = mydata\$spp}) and using the \code{table()} function, 
	\code{table(species)}. Repeat this with other variables within the data frame.

	\item Lastly, especially for data frames with factors (which you will probably 
	want to do analyses by), the function \code{levels()} can be useful. Explore
	this function.
\een


\np
%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Exploring Your Data} \label{sec:exploring}
%%%%%%%%%%%%%%%%%%%%%%%%%%%
Check to see that your \code{mydata} is still in memory using the \code{ls()} function. If not, re-read it into memory. Then type \code{summary(mydata)} to get a basic overview of the descriptive statistics of your data.

You'll have to be able to manipulate your data set at some point. This is primarily done using the \code{sort()} and \code{order()} functions. Make sure you know the difference between them! Typically, \code{sort()} will organize according to your specifications \emph{independently} of the other columns, which you may not want to do (be thankful \R doesn't rewrite your original \code{*.csv} file!), and is therefore more appropriate for arranging stand alone vectors (see \S~\ref{sec:vectors}) as opposed to data frames or matrices (see \S~\ref{sec:matrix}). Lets try a simple example:

<<order_dataframe, eval=FALSE>>=
order(mydata$dbh)
mydata[order(mydata$dbh), 1:ncol(mydata)]
@

The first line returns a vector indicating the row numbers of the entries in the \code{dbh} column in increasing order. The second line tells \R to rearrange the data frame by increasing values of \emph{dbh} and to order \textbf{all} columns in this fashion. The part before the comma determines the order of and which rows to include. The part after the comma tells \R which columns to return (I used the \code{ncol} function to extract the number of columns of the data frame). Alternatively you could simply leave the column declaration blank which defaults to all columns:

<<decreasing_dataframe, eval=FALSE>>=
mydata[order(mydata$dbh), ]
mydata[order(-mydata$dbh), ]  # decreasing
@

The default for \code{order()} is increasing; you may add the negative symbol (\code{-}) to arrange the data frame according to \emph{decreasing} values of the desired column. This will work only for \code{numeric} or \code{integer} class variables; for non-numeric sorting variables you must use \code{rev(order())} -- for example:

<<order_characterCol, eval=FALSE>>=
mydata[rev(order(mydata$Infected)), ]   # for a factor (character string) 
@


%%%%%%%%%%%%%%%%
\begin{table}[H] \centering
\belowcaptionskip=0.25cm
\caption {List of the main logical operators used by \RR.}
\begin{tabular}{l l}
\hline \hline
	\code{x < y} & less than \\
	\code{x > y} & greater than \\
	\code{x <= y} & less than or equal to \\
	\code{x >= y} & greater than or equal to \\
	\code{x == y} & equal to \\
	\code{x != y} & not equal to \\
	\code{\&} & AND \\
	\code{\&\&} & AND with IF \\
	\code{|} & OR \\
	\code{||} & OR with IF \\
\hline
\end{tabular}
\label{tab:ops}
\end{table}
%%%%%%%%%%

%%%%%%%%%%%%%%
\subsection*{Exercises}
%%%%%%%%%%%%%%
\benN
	\item Now your turn; use the \code{order()} function to re-sort
	\code{mydata} according to \emph{increasing} bark thickness, then
	decreasing sapwood area (be sure that the neighboring columns are also
	affected!). If you want the changes to be stored as new objects in memory,
	name it first (\eg \code{mydata2 <- mydata[order(mydata\$dbh),]}). Also try
	sorting by the column \emph{spp} (both $\uparrow \downarrow$), which by
	now you know to be a factor: what happens?
	
	\item You can also sort by multiple variables/factors. Sort \code{mydata} first 
	by \code{species} alphabetically and then by decreasing \code{NobarkArea}.
	\item Now create a new data frame called \code{tree1} which contains only
	\code{tree \#}, \code{SapDepth}, and \code{SapArea} and is sorted by
	decreasing \code{SapDepth}. I suggest you play with this a bit as you will be
	doing much of this with your own data and the data sets we will be using
	during this workshop. There is a function called \code{subset()} that can
	simplify this process, but since this function does not do any sorting, you'll
	have to do this in two steps. If you have time you should explore this way too.
	How would you make a data frame containing only those trees measured in
	\code{Winter}?
	
	\item You can also exclude certain cases from the data frame that do not 
	satisfy a given set of arguments. For example:%
	%
%	\footnote{For this command and all that follow throughout this tutorial, if you
%	     attached your data using \code{attach(mydata)} you do \textbf{not} need to
%	     refer	to objects from the data frame using this construction. Nevertheless, 
%	     I would steer clear of \code{attach()} and see my note about it above.} %

<<subset_dataframe, eval=FALSE>>=
mydata[mydata$BarkThick != 0.0, ]
@

\dots~check how many cases were removed

<<BTzero>>=
BTzero <- mydata[mydata$BarkThick != 0.0, ]
nrow(BTzero)
nrow(mydata)
@

	In the first call above you are asking \R to return \textbf{all rows} (notice the
	comma) where the column \code{BarkThick} is \textbf{not} equal to zero. These
	logical arguments (\ie \code{!=}) can also be used in combination with others
	using the \verb+&+ symbol. For example, limit the data frame to only those
	trees which have a sapwood area $\le$ 100 ${\rm cm}^2$ \emph{and} have a
	sapwood depth $\ge$ 2 (there should be 7). If you think you'll be using this 
	new data set, simply assign it a new variable name and work with that new
	data frame. Table~\ref{tab:ops} shows other operators for comparisons and selecting data.
	
	\item No preliminary exploration is complete without taking a look at some 
	graphs (psst: advisors \emph{love} graphs \Smiley), so lets take a look at the
	usual suspects in data exploration.
	
		\bit
			\item Histogram (the overall distribution of your data)
			\item Boxplot (discontinuous data; relationships)
			\item Scatterplot (continuous data; relationships)
			\item Multiple scatterplots (data separated by \eg sex, season, etc.)
			\item Barplot (data displayed by category as bars)
		\eit
	
\noi Here is the \R code for Fig.~\ref{fig:expplot} (ensure that \code{mydata} is still in memory!):


\fig[H]
\centering
<<expplot, tidy=FALSE,out.width="\\textwidth">>=
par(mfrow=c(1,3)) ## create plot array of 1 row x 3 columns
hist(mydata$SapDepth, xlab= "Sapwood Depth",
     main= "Histogram: Sapwood Depth", col= "gray50")
boxplot(SapDepth ~ spp, data= mydata, ylab= "SapDepth", 
        col= "darkslateblue", main= "Boxplot: Sapwood Depth by Species")
plot(mydata$dbh, mydata$Heartwood, pch= 17, col= "darkred", 
     ylab= expression(paste(plain(Area)," ",(cm^2))), xlab= "DBH (cm)", 
     main= "DBH vs. Heartwood & Sapwood")
points(mydata$dbh, mydata$SapArea, pch= 19, col= "darkgreen")
legend("topleft", legend= c("Heartwood", "Sapwood"), 
       pch= c(17, 19), col= c("darkred", "darkgreen"), bg= "gray95")
@
\caption{Quick and dirty visual exploration of data.\label{fig:expplot}}
\efig
	
	Fig.~\ref{fig:expplot} shows examples of various types of exploratory plots. 
	Use the code above to replicate it.
	Look at \code{mydata} and explore some
	relationships for yourself graphically (\eg do species differ in sapwood
	depth?). Upon visual inspection of Fig.~\ref{fig:expplot}b it looks like they do.
        To confirm this, you would then have to show if this difference is statistically
	significant (see \S~\ref{sec:stats}). From the scatterplot it also appears there
	is a relationship between dbh and the two chosen tree characteristics. A
	correlation or linear regression analysis would bear this out statistically (see
	again \S~\ref{sec:stats}).
	
Lastly, the \code{qplot()} function (from the \code{ggplot2} package, which will be covered
more later in the week) allows us (among other things) to plot each species in a separate
color \emph{or} in its own separate subplot.
In the first case shown in Figure~\ref{fig:xyplot}, we specify that \code{SapDepth} and \code{dbh} are the $x$ and $y$
coordinates respectively, we tell \R to look within the data frame \code{mydata}
for the variables, and we specify that the points should be colored according
to species.  In the second case, we say that the plot should be divided into \emph{facets}
according to species (we have to specify it as a formula, which is why we say
\verb+~spp+ instead of just \code{spp}).

Try it out for yourself with different dependent and independent variables.
What happens when if you use a two-sided formula
for the facets (\verb+Infected~spp+) instead of just \verb+~spp+
as your grouping factor? 
(You will need the \code{ggplot2} package installed and loaded to
use \code{qplot}.)

\fig[H]
\centering
% fake
<<eval=FALSE>>=
qplot(SapDepth,dbh,data=mydata,color=spp) ## left subfigure
qplot(SapDepth,dbh,data=mydata,facets=~spp) ## right subfigure
@ 
<<message=FALSE,echo=FALSE>>=
library(ggplot2)
library(gridExtra)
grid.arrange(qplot(SapDepth,dbh,data=mydata,color=spp),
             qplot(SapDepth,dbh,data=mydata)+facet_wrap(~spp),
             ncol=2)
## +facet_wrap(~spp)
## xyplot(SapDepth ~ dbh | spp, data= mydata, 
##        xlab= "dbh (cm)",
##        ylab= "Sapwood Depth", 
##        pch= 19, 
##        col= "navy", 
##        as.table= TRUE)
@
\caption{More quick and dirty visual exploration of data.\label{fig:xyplot}}
\efig

	% The function \code{coplot()} produces similar graphics output to
	% \code{xyplot()}, however it may be more appropriate for multivariate data,
	% especially where the ``organizing'' variable (spp in Fig.~\ref{fig:xyplot}) is
	% continuous (\eg temperature, altitude, etc.). Experiment with \code{coplot()}
	% on the above instead of \code{xyplot()} and compare.
	
	\item Make a Barplot of the mean \emph{Area Without Bark} first by 
	\code{season} and then by \code{species}. The conclusions from these results are 
	meaningless, but you'll soon have a fairly complete script for making nice \R 
	plots. I will start you off with the season, can you follow what is happening at 
	each step? \code{tapply()}%
	%
	\footnote{Similar to \code{apply()} from above, but can be applied to 1D 
	     objects. It can be viewed as a combination of the 	\code{table()} and 
	     \code{apply()} functions.} %
	%
	is an extremely useful function that applies a function to the values in a vector
	and produces a table based on one or more grouping variables (\eg factors) and 
	shows the result of that function for each group. This sounds complicated but it 
	really isn't. To illustrate, try typing 
	\code{tapply(mydata\$dbh, mydata\$season, sd)}. What did \R return? All I'm 
	doing below is telling \R to take that table and make it a vector instead. Now 
	complete the Barplot exercise for season, and then repeat for species or another
	dependent variable.

<<Nobark_barplot, eval=FALSE>>=
Nobark.season <- tapply(mydata$NobarkArea, 
                                  mydata$season, mean)
x.axis <- levels(mydata$season)
barplot(Nobark.season, names= x.axis, ylim= c(0, max(Nobark.season)))
@

Notice that when defining the vector containing means, they are combined by 
\code{tapply()} from left to right \emph{alphabetically}. This is good because
in the next step I define the x-axis with the \code{levels()} function, which
also arranges its levels alphabetically. Otherwise it would decouple the
mean value with the season to which it corresponds! Pay attention to these
kinds of operations: always double check against known values (do they look
right?) and always scrutinize the graph (does it meet your expectations?).
\een


%%%%
\np %%
%%%%%%%%%%%%%%%%%%%%%
\section{Vectors} \label{sec:vectors}
%%%%%%%%%%%%%%%%%%%%%
A vector is basically just a list of values, all the same type or class (numeric, logical, character). The numeric vector is a list of numbers and is one of the most often manipulated objects in \RR. Actually, a number in \R is treated as a vector of length 1. They are used for everything from calculating means, plotting graphs, storing outputs of functions, or simply storing parameters for later use within a larger program. It is essential to learn to master their manipulation. Some \R functions create vectors, but you can also create/assign vectors yourself. For example:

<<v1>>= 
v1 <- c(1, 2, 7, 8, 9, 13)
@

Simply typing \code{1, 2, 7, 8, 9, 13} without \code{c()} results in an error. In addition, vector elements can be named. For example:

<<v2>>=
v2 <- c(first = 1, second = 1.2, third = 1.6, fourth = 2.1)
@

There are numerous ways to refer to elements in vectors (\emph{vector addressing}):
\bit
\item by position: \code{v1[1]} or (multiple elements) \code{v1[c(1,2)]}
\item by name: \code{v2["first"]}; only if you've already assigned element
	names as in \code{v2}
\item by exclusion: \code{v1[-1]} drops the first element
\item with logical vectors: \code{v2[c(TRUE,TRUE,FALSE,FALSE)]} gives the first
	two elements only; more usefully, \code{v2[v2>1.2]} first computes a logical
	vector \code{[FALSE,FALSE,TRUE,TRUE]} and then uses it to select the last
	two elements only. 
\eit

Most of \RR's functions are \emph{vectorized}; if you give them vectors they automatically do the right thing (which can frustrate \pkg{MATLAB} users!). Functions of two vectors such as \code{(c(1,3,5,7) + c(2,4))} will automatically replicate the shorter vector until it is as long as the longer one, then carry out the calculation and give a warning message if the longer vector is not an even multiple of the shorter. For example:

<<vector_operations>>=
v1
3 * v1
v1 + c(3,2,1)
v1 * c(2,4)
v1 * c(1,2,3,4)
@


Some of the most basic and useful functions for manipulating vectors are in Table~\ref{tab:vecs}. \textbf{Remember:} operations and functions are generally applied to vectors in \R on an \emph{element-by-element} basis.

%%%%%%%%%%%%%%%
\begin{table}[H]\centering
\belowcaptionskip=0.25cm
\caption{A few of useful commands for creating and modifying vectors in \RR.}
\begin{tabular}{l l}
\hline \hline
	\code{1:x}&Vector from 1 to $x$ by increments of 1. \\
	\code{seq(from,to,by=x)}&Vector values in increments of $x$ (if $x$ $<$ 0 = decreasing) \\
	\code{seq(from,to,length=x)} & Vector of evenly spaced values of $x$ length \\
	\code{c(u,v,\dots)}&Combine numbers and/or vectors into a single \\ &
	 vector (from left $\to$ right) \\
	\code{rep(a, b)}& Creates vector of repeating $a$ elements by amounts of $b$\\
	\code{length(x)}& Number of entries/elements in vector $x$\\
	\code{abs(x)}& Absolute value of $x$ \\
	\code{cos(x),sin(),tan()}& Cosine, sine, tangent of angle x in radians \\
	\code{exp(x)}& Exponential function, $e^x$ \\
	\code{log(x)}& Natural logarithm of $x$ ($log_e$)\\
	\code{log10(x)}& Common logarithm of $x$ ($log_{10}$)\\
	\code{sqrt(x)}& Square root of $x$ \\
	\code{head(x,n)}& Returns the first $n$ entries of vector $x$ \\
	\code{tail(x,n)}& Returns the last $n$ entries of vector $x$ \\
	\code{round(x,n)}& Round entries of $x$ to $n$ decimal places \\
\hline
\end{tabular}
\label{tab:vecs}
\end{table}
%%%%%%%%%%%%

%%%%%%%%%%%%%%
\subsection*{Exercises} \label{sec:vector_ex}
%%%%%%%%%%%%%%%
\benN
	\item Create a vector with the numbers: \code{2,5,8,2,1,9,4,7,5,6} named 
	\code{v3}.
	
	\item Create a vector with the numbers: \code{2,2,2,5,5,9,9,8,8,8,8}. 
	Try to use the \code{rep()} command and call it \code{v4}.
	
	\item Create a vector called \code{v5} that ranges from 0 to 6.5 by
	increments of 0.5. How many entries are in this vector? (I don't want to see
	anyone pointing at the screen and counting!).
	
	\item Understand why \code{rep(1:5, 1:5)} and \code{rep(1:6, c(1, 2, 3, 3, 2,
	1))} produce what they do.%
	%
	\footnote{The \code{each=} argument within \code{rep()} can also be very
	      useful.} %
	
	\item Create a vector called \code{all.vecs} that includes all the numbers
	you just created (in \code{v3,v4,v5}) in one long vector (any order).
	
	\item Create a vector of 15 random numbers from 50 $\to$ 150 and 
	name it \code{mass}.%
	%
	\footnote{See Table~\ref{tab:distns} for more information about random
	     number generation.} %
	%
	In this case assume a uniform distribution.
	
	\item Assume this is the yearly increase in mass for some treatment of 
	interest, create a new vector called \code{mass.d} representing the
	\emph{daily} weight gain. Now create a new vector called \code{mass.r}
	with these daily values rounded to two decimal places. What is the average
	daily weight gain?
	
	\item As I said in \S~\ref{sec:exploring}, I think \code{sort()} is more 
	appropriate for stand-alone vectors, so try using it on \code{mass} (both
	increasing and decreasing).
	
	\item Create a new vector of length 7 with \binpI \item the minimum value 
	   \item maximum value \item mean \item sum \item median \item and range 
	   \einp of \code{mass.r}.
	
	\item Use the entries in the vector above to multiply the mean daily weight
	gain by the total weight gain, divided by the square root of its median,
	plus the log of the minimum daily gain. I cannot imagine why you would want to
	do this, but you get the point; you can use vectors to make calculations and
	then use those vector entries to make subsequent calculations. For example,
	you could place all the parameters of a model into one vector, then call them
	independently by vector addressing we learned above (\ie \code{[]} brackets).
	
	\item Now let's use the imported tree data to do something. The process is 
          similar to above. Assume individual leaf area is a function of numerous other 
          tree characteristics and can be calculated as follows:

		\Eqn \label{eqn:leafarea}
                \text{Leaf Area} =  \big( 0.1 \cdot \delta ^{(1-\beta)} \big) \ + \ 
                \sqrt{\frac{\tau + 1}{\sigma}}
		\eqn

	where $\delta$, $\beta$, $\tau,$ and $\sigma$ are tree diameter
	(\code{dbh}), bark thickness, area without bark, and sapwood area
	respectively. This would be similar to writing an equation in \code{Excel} and
	then copying it down the column. \textbf{Note}: you could use either \code{sqrt()}
	or \verb+^0.5+), but for cube roots or higher, you must use the
	``fractional exponents'' syntax construction. Create an object called 
	\code{LeafArea} that represents Eqn~\eqref{eqn:leafarea}. You will use this
	object later in this tutorial. The resulting vector should look like this:
\een

<<LeafArea, echo=FALSE>>=
LeafArea <- 0.1*mydata$dbh^(1-mydata$BarkThick) + sqrt((mydata$NobarkArea+1)/mydata$SapArea)
@

<<LeafAreaCall>>=
LeafArea
@


%%%%
\np %%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Generating Random Numbers} \label{sec:rvec}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\R provides great flexibility and options for generating random sequences of numbers from various distributions. Table~\ref{tab:distns} shows the major functions used to produce numbers according to these distributions.

\begin{table}[H] \centering
\belowcaptionskip=0.25cm
\caption{Random number generation in \R from various distributions.}
\begin{tabular}{l l}\hline\hline
  \code{runif(n,min,max)} & Random $n$ numbers from uniform distribution \\
  \code{rnorm(n,mean,sd)} & Random $n$ numbers from a Normal distribution\\
   &  with given mean and standard deviation \\ 
  \code{rpois(n,lambda)} & Random $n$ numbers of Poisson distribution\\
  \code{rbinom(n,size,p)} & Random $n$ numbers from Binomial distribution; \\ & size = \# trials; p = probability of ``successful'' trial \\
  \code{rnbinom(n,size,mu)} & Same as above except Negative binomial distribution \\ & (\code{mu} = alternative parameterization via the mean: \code{mu} must be specified by name) \\
  \code{rgamma(n,shape,scale)} & Random $n$ numbers from Gamma distribution \\
\hline
\end{tabular}
\label{tab:distns}
\end{table}

Of course you can also determine the probability density, distribution, and quantile functions for each of these distributions using either \code{d}, \code{p}, or \code{q} instead of \code{r}. For example, \code{dbinom}, \code{pbinom}, \code{qbinom} will produce the probability density, distribution, and quantile functions respectively from a Binomial distribution for a given set of parameters.

%%%%%%%%%%%%%
\subsection*{Exercises}
%%%%%%%%%%%%%
\benN
	\item It is important to get some experience ``playing'' with these functions,
	especially what they look like and what they're used for. Look at 
	Fig.~\ref{fig:normaldist} which shows what each of these functions do. Try to 
	reproduce it using the \code{curve()} and \code{hist()} functions.%
	%
	\footnote{These functions are introduced in \S~\ref{sec:plots}. Look ahead for 
	      clues.} %
	%
	Below is the code for the first plot to get you started:%
	%
	\footnote{Modified from Crawley, M.J. (2007). \emph{The R Book}. John Wiley \&
	     Sons. pp. 950.} %
	%
	
% Just to show the 1st one; not all figs present here
<<dnorm, eval=FALSE>>=
par(mfrow=c(2,2))
curve(dnorm, from= -4, to= 4, xlab= "z", ylab= "Probability Density", 
   main= "Density", col= "darkgreen", lwd=2)
@

\fig[H]
\centering
<<normaldist, fig=TRUE, echo=FALSE, width=7>>=
par(mfrow=c(2,2), oma = c(0, 0, 3, 0), bg="white")
curve(dnorm, -4, 4, xlab= "z", ylab= "Probability Density",
   main= "Density", col="darkgreen", lwd=2)
curve(pnorm, -4, 4, xlab= "z", ylab= "Probability",
   main= "Probability", col="darkgreen", lwd=2)
curve(qnorm, 0, 1, xlab= "p", ylab= "Quantiles (z)",
   main= "Quantiles", col="darkgreen", lwd=2)
set.seed(101)
hist(rnorm(1000), xlab= "z", ylab= "Frequency",
   main= "Random Numbers", col="darkgreen")
mtext("The Normal Distribution", outer= TRUE, cex= 1.5, font=2)  # main title
@
\caption{Reproduced shamelessly from \emph{The R Book} (p. 219).\label{fig:normaldist}}
\efig

\item What does the following \R code (below) produce and why? Can you 
  predict what the plot will display before you run it? It is often convenient when
  generating random numbers to set the \emph{seed} using \code{set.seed()}
  so that you can ensure that the \emph{same} random numbers are generated
  each time. Doesn't that make it \emph{non}-random? Yes, but it's good for
  ``debugging'' and code verification (and making sure students get the same
  result you do!). The actual seed can be any arbitrary integer (666 produced a
  nice histogram). Can you produce something similar, but with
  \emph{SapDepth} as in Fig.~\ref{fig:expplot}a?

<<nD, tidy=FALSE, eval=FALSE>>=
set.seed(666)
nD <- rnorm(1000, mean=400, sd=25)
hist(nD, main= "", xlab= "nD", ylab= "Probability Density", 
   col= "gray88", freq= FALSE)
curve(dnorm(x, mean(nD), sd(nD)), from= min(nD), to= max(nD),
   col= "navy", lwd= 2, add= TRUE)
@

\een


%%%%
\np %%
%%%%%%%%%%%%%%%%%%%%%%
\section{The Matrix} \label{sec:matrix}
%%%%%%%%%%%%%%%%%%%%%%
Besides evoking images of bullets flying in 3D surround slo-mo, the matrix (or 2-dimensional array) can be put to good use in \RR. Aside from their obvious use in population projections and classical matrix algebra. Be careful! \RR's default multiplication is set to the entry-wise product%
%
\footnote{The Hadamard product. Remember, the default in \R is almost always 
     entry-wise.} %
%
using the \code{*} symbol. For proper matrix multiplication you will want to use \verb+%*%+ (see Table~\ref{tab:matrix}). If nothing else, matrices are useful in storing output of a given program/function, which is then to be used to some other purpose (\eg plotting).


%%%%%%%%%%%%%%%%
\begin{table}[H] \centering
\belowcaptionskip=0.25cm
\caption{A few important functions for working with matrices in \RR.}
\begin{tabular}{l l}
\hline \hline
	\code{matrix(v,m,n)} & $m \times n$ matrix using the values in $v$ \\
	\code{nrow(),ncol()}& Number of rows and columns of a data frame or matrix\\
	\code{t(A)} & Transpose (exchange rows and columns) of matrix \textbf{A} \\
	\code{dim(A)} & Dimensions of matrix \textbf{A}. (dim(A)[1]=rows, dim(A)[2]=cols) \\
	\code{edit(A)} & Call up ``spreadsheet'' interface to edit the values in \textbf{A} \\
	\code{diag(v,n)} & A diagonal $n \times n$ matrix with v on diagonal, 0 
	elsewhere \\& (by default v = 1, so \code{diag(n)} gives an $n \times n$ 
	identity matrix; \\&if v = vector it is placed on the diagonal and $n$ = length(v))\\
	\code{cbind(a,b,c,\dots)} & Combine \textbf{compatible} objects by attaching them by columns \\& (\eg a data frame with only numeric data) \\
	\code{rbind(a,b,c,\dots)} & Same as \code{cbind} but attaching them by rows 
	(can also bind \\& multiple matrices in this manner if dimensions match) \\
	\code{as.matrix(x)} & Convert object of another type to a matrix (if possible) \\
	\code{outer(v,w)} & Outer product of vectors \code{v}, \code{w}: the matrix
	         whose $(i, j)^{th}$ \\ & element is \code{v[i]*w[j]}\\ 
	\code{A[,n]} & Returns a vector of column $n$ \\
	\code{A[m,]} & Returns a vector of row $m$ \\
	\code{A + B} & Matrix addition (or - for subtraction); entrywise \\
	\code{sum(A)} & Sum of all elements of matrix \textbf{A} \\
	\code{A \%*\% B} & Matrix multiplication ($A \times B$); (or \%/\% for division) \\
	\code{A * B} & Entrywise multiplication ($A \times B$); the Hadamard product \\
	\code{det(A)} & The determinant of matrix \textbf{A}\\
	\code{eigen(A)} & The eigenvalues and right eigenvectors of matrix \textbf{A} \\
\hline
\end{tabular}
\label{tab:matrix}
\end{table}
%%%%%%%%%%

%%%%%%%%%%%%%
\subsection*{Exercises}
%%%%%%%%%%%%%%
\benN
	\item Use the construction \code{matrix(v, nrow, ncol)} to create the matrix
	in Eqn~\eqref{eqn:ex1} where $v$ is a data vector created using the
	\code{seq()} function from Table~\ref{tab:vecs}. You will probably need to modify the function with the argument \code{byrow=TRUE}.
	
		\Eqn \label{eqn:ex1}
		\textbf{Mat} = 
			\Mat
				16 & 12 & 8 \\
				4 & 0 & -4 \\
				-8 & -12 & -16 \\
			\mat
		\eqn

	\item Create the matrix \textbf{A} below in Eqn~\eqref{eqn:A}. You can do 
	this either by using the \code{c()} command and typing the entries
	individually or first creating a vector with the entries you want and then
	putting that vector as the first argument in the \code{matrix()} function. 
	Try it both ways. 

		\Eqn \label{eqn:A}
		\textbf{A} = 
			\Mat
			2 & 3 & 7 & 8.8 & 11 \\
			3 & 4.3 & 8 & 9 & 12 \\
			8 & 16 & 0.1 & 5 & 9 \\
			5 & 0.4 & 9 & 1.7 & 3\\
			0.7 & 6 & 5.9 & 4 & 7
			\mat
		\eqn

	\item Use \code{dim()} to determine the dimensions of \textbf{A}. The output is
	a two column vector (rows, columns), which is the standard for \R (rows first, 
	then columns), however this example is perhaps not ideal as nrows = ncols 
	(5 $\times$ 5). Alternatively, you may use \code{nrow()} or \code{ncol()}
	instead of \code{dim()[1]} and \code{dim()[2]} respectively. These
	commands are especially useful when writing functions (see \S~
	\ref{sec:functions}).
	
      \item Use the \code{rowSums()} command to create a vector that contains
	the totals of each row of \textbf{A}. Now do the same for the columns; can you 
	guess what the command is called? It might be useful to give these vectors a
	name. Now try using the \code{apply()} function from \S~\ref{sec:intro} to
	do the same thing.
	
      \item Now use the commands \code{cbind()} and \code{rbind()} to attach these 
	two vector of sums to create a new matrix \textbf{B} using the tools above 
	which looks like Eqn~\eqref{eqn:B}.

		\Eqn \label{eqn:B}
		\textbf{B} = 
			\Mat
			2 &      3 &       7 &       8.8 &    11 &   31.8 \\
			3 &      4.3 &    8 &       9 &       12 &   36.3 \\
			8 &      16 &     0.1 &     5 &       9 &    38.1 \\
			5 &      0.4 &    9 &       1.7 &     3 &    19.1 \\
			0.7 &   6 &       5.9 &     4 &       7 &    23.6 \\
			18.7 & 29.7 &   30.0 &   28.5 &   42 &   148.9
			\mat
		\eqn
		
              \item Add the vector you created earlier called \code{LeafArea} to
                \code{mydata} as a column in a similar fashion as above.
	
              \item It's sometimes necessary to export your data frame or matrix. Use 
                the \code{write.csv()} command to do this (it will go to your home directory). 
                Try exporting matrix \textbf{B} to a \code{.csv} file called 
                \code{MatrixSum.csv}.%
	%
                \footnote{If you do export a modified data frame, be sure to rename the \emph{new} 
                  modified data set with a unique file name. It's potentially disastrous to 
                  re-enter your data (so be careful). It's usually possible, and preferable, to 
                  reshape/manipulate your data \emph{within} the \R environment using commands
                  at the beginning of your script, as opposed to overwriting \code{*.csv} files.}	%
                If you get stuck, don't forget about \code{?write.csv}.
	
	\item Matrix population models use projection matrices to estimate future 
	populations using a set of difference equations which are condensed into a 
	matrix which contains the vital rates of various stages/classes. The simplest 
	of these models takes the form:\label{q:map}

		\Eqn \label{eqn:map}
			\vec x_{(t+1)} = \textbf{M} \; \cdot \; \vec x_{(t)}
		\eqn

	where $\vec x$ is a vector containing the number of individuals in a given age, 
	stage, or class and \textbf{M} is the projection matrix. Consider the projection matrix
	\textbf{M} in Eqn~\eqref{eqn:M} which contains 3 classes. The diagonal represents
	the survival rates of the classes, the sub-diagonal (0.1 and 0.3) represents the
	transition probability (\ie class 1 $\to$ 2 and class 2 $\to$ 3) and the top row
	represents the fecundity of each class (the number of class 1 individuals 
	produced per individuals of class 2 or 3). Class 1 does not reproduce but if it
	did, entry \textbf{M}[1,1] would be $s_1 + f_1$.

		\Eqn \label{eqn:M}
		\textbf{M} =
			\Mat
				s_1 & f_2   & f_3 \\
				t_1 & s_2  & 0 \\
				0    & t_2  & s_3
			\mat
			=
			\Mat
				0.3 & 1.0 & 3.0 \\
				0.1 & 0.4 & 0.0 \\
				0.0 & 0.3 & 0.8
			\mat
		\eqn

	Assuming $\vec x_1$ = [1 \ 2 \ 4], calculate $\vec x_2$. Be careful,
	remember how \RR's matrix multiplication works (you'll need to use
	\verb+%*%+). We'll come back to Eqn~\eqref{eqn:M} later.
\een


%%%
\np %
%%%%%%%%%%%%%%%%%%%%
\section{Loops} \label{sec:loops}
%%%%%%%%%%%%%%%%%%%%
Loops are essential in any programming language, master them and you're well on your way. There are two basic loops in \RR, the \code{for}-loop and the \emph{while}-loop. The \code{for}-loop runs through the loop $x$ number of iterations, then exits the loop and continues to the rest of the program. The \code{while}-loop continues looping until some predetermined criteria (\code{condition}) is met (\eg as long as $x$ is $\le$ then 1000, continue on looping). In \code{while}-loops it is often a good idea to place a \code{counter} within the loop (see below) so that later you can determine how many iterations were required to exit the loop. In this sense, \code{for}-loops could be considered a subset of \code{while}-loops (if the counter were used in the condition statement). In addition, your variable in defining the condition must appear somewhere within the loop and must change as the loop iterates (otherwise you'll enter the loop and never get out!). Below is an example of a simple \code{for}-loop:

<<for_loop>>=
Popn <- 1     ### create an initial population size
Gen <- 10     ### how many generations
for (n in 2:Gen) {
   Popn[n] <- Popn[n - 1] * 2
}
Popn
@

After calling \code{Popn}, this should look familiar, the population appears to exhibit exponential growth. \textbf{Note} the values over which you are asking \R to iterate (2:Gen). Have you seen this construction before? What do you get when you enter \code{2:10} in the console? What this means is that it is possible to skip iterations by using something like: \code{for (i in seq(2,10,2))}. 

\noi Here is a similar example to the above using a \code{while}-loop:

<<while_loop>>=
pop.vec = pop.now <- 1; count <- 1
while (pop.now <= 25000) {
   pop.now <- pop.now * 2
   pop.vec <- c(pop.vec, pop.now)
   count <- count + 1
}
pop.vec
@
	
So essentially what you're telling \R is that it should start with the current population of 1, the storage vector that will contain the population trajectory is also 1 (since it hasn't started yet), and I've added a counter so that following the simulation I will be able to know how many iterations were required to satisfy the condition that the population be $\le 25000$. Notice that I've used the \code{c()} function to \emph{combine} the previous population sizes with the current population size; \R is essentially adding a new number (\code{pop.now}) to an ever growing population vector (\code{pop.vec}), until the population is more than 25000. Now call \code{pop.vec} and \code{n} to see the simple projection and how many iterations it took to do so. Now, you can modify this code to replicate the classic expression, ``double a penny every day for a month and you'll be a millionaire.'' Is this true? How long does it take to become a millionaire?

Lastly, you can also do loops within loops (aka \emph{nested} loops), which is quite common but no more complicated than regular sequential loops, just keep track of what you're doing as they can get messy. Also, people tend to use \code{n} or \code{i} as counters in \code{for}-loops (I don't know why), but it could be any letter or even combination of letters you want.

%%%%%%%%%%%%%
\subsection*{Exercises} 
%%%%%%%%%%%%%
\benN
\item Try adding some ``noise'' or stochasticity to the first population growth 
  model (the \code{for}-loop) using some random number generation, then 
  roughly plot the results. We'll do more intense plotting in \S~\ref{sec:plots}. 
  For now try:

<<plotPopn, eval=FALSE>>=
plot(1:Gen, Popn, type="b")
@

\fig[H]
\centering
<<Popn, fig=TRUE, echo=FALSE>>=
par(mfrow=c(1,2))
plot(1:Gen, Popn, type= "b")
plot(1:Gen, jitter(Popn, amount=50), type= "b", pch= 19)
@
\caption{Quick graph of exponential population growth using a simple \code{for}-loop.\label{fig:Popn}}
\efig

	Alternatively, you could add noise to a data vector using the \code{jitter()} 
	function.
		
	\item Use a loop to create a vector representing a Fibonacci sequence running 
	from 0 $\to$ 610. I suggest starting with the \code{Fib.v = c(0,1)}, then add
	to this vector at each iteration of the loop. In the end it should look like this:

<<Fib.v, echo=FALSE>>=
Fib.v <- c(0,1)
while (max(Fib.v) < 610) Fib.v<-c(Fib.v, Fib.v[length(Fib.v)-1]+Fib.v[length(Fib.v)])
@

<<Fib.v_call>>=
Fib.v
@

Then:

		\benI
			\item Use this vector to create the following matrix:

<<FibMat, echo=FALSE>>=
Fibonacci1 <- matrix(Fib.v, 4,4)
Fibonacci1
@

		\item Use the \code{sample()} function to create a matrix of random
		Fibonacci numbers (see \code{?sample} to determine the proper syntax
		if necessary). For example:%
		%
		\footnote{Yours won't be \emph{exactly} the same because of random sampling!}\\
		%

<<FibMat2, echo=FALSE>>=
Fibonacci2 <- matrix(sample(Fib.v, length(Fib.v), replace=FALSE), 4, 4)
Fibonacci2
@
	\een

	\item Write a script using a \emph{nested} \code{for}-loop which produces the 
	following $5 \times 5$ projection matrix:%
	%
	\footnote{It is not absolutely necessary to use a nested \code{for}-loop to 
	     accomplish this but it's a good exercise.} %
	%

			\Eqn \label{eqn:nested}
			\textbf{NestMat} =
			\Mat
				0 & 1 & 2 & 3 & 4 \\
				0.1 & 0 & 0 & 0 & 0 \\
				0 & 0.2 & 0 & 0 & 0 \\
				0 & 0 & 0.3 & 0 & 0\\
				0 & 0 & 0 & 0.4 & 0
			\mat \nonumber
			\eqn
	
	\item Use a loop and Eqn~\eqref{eqn:M} to project the 3 class population
	described above for 20 generations (\ie $\vec x_i$, $i=1\dots20$). You will
	need to use a secondary matrix to store the results of the iterations (and then
	to plot them in \S~\ref{sec:plots}). You should use Exercise~\ref{q:map} in
	\S~\ref{sec:matrix} to guide you.
\een


%%%
\np %
%%%%%%%%%%%%%%%%%%%%%%
\section{Plotting and Graphing} \label{sec:plots}
%%%%%%%%%%%%%%%%%%%%%%
Remember what I said in \S~\ref{sec:exploring} about advisors and graphs. That aside, plots are an excellent way to summarize various data visually and are essential if you're ever going to publish your data. Luckily, \R makes great high-quality plots of all types, colors, and varieties. I also recommend you figure out how to save your plots as a pdf so you can send them to your advisor \Smiley.

\begin{table}[H] \centering
\belowcaptionskip=0.5cm
\renewcommand{\arraystretch}{2}
\tabcolsep=0.33cm
\caption {List of the 8 numeric colors used by \RR. Thereafter, numeric colors are recycled, thus \code{col=9} is again \code{"black"}. There are many many other colors available using a string argument construction (\ie \code{"midnightblue")}:
  see \code{colors()}.}
\begin{tabular}{l l | l l | l l}
\hline \hline
	No. & Color & No. & Color & No. & Color \\
	\hline
	1 & Black & 4 & \blue{Blue} & 7 & \color{yellow}{Yellow} \\
	2 & \red{Red} & 5 & \color{cyan}{Cyan} & 8 & \color{light-gray}{Gray} \\
	3 & \green{Green} & 6 & \color{magenta}{Magenta} & 9 & Black \\
\hline
\end{tabular}
\label{tab:cols}
\end{table}

One of the simplest ways view, visually explore, or summarize data is to plot them. The usual suspects are barplots, boxplots, scatterplots, and 
pie charts. However, depending on your audience they can be useful.

I can never remember the individual numeric codes for the various \textbf{points}, \textbf{line types}, and \textbf{colors} available in \R plotting functions, so Fig.~\ref{fig:pchsymbols} below is a cheat-sheet:


<<pchsymbols, echo=TRUE, tidy=FALSE, fig=FALSE>>=
plot(1:25, 1:25, xlab="",ylab="",pch=1:25,col=1:25,cex=2)
grid(lty=1, col="gray90")
points(1:25, 1:25, xlab="",ylab="",pch=1:25,col=1:25,cex=2)
title("Plotting symbol, line type, & color codes")
legend("topleft", legend=1:6, lty=1:6, lwd=1.5, ncol=2, bg="gray95")
legend("bottomright", legend=1:8, col=1:8, ncol=3, pch=19, bg="gray95")
@


\fig[H]
\centering
<<figsymbols, fig=TRUE, echo=FALSE>>=
<<pchsymbols>>
@
\caption{Basic numeric plotting codes used by \RR. For points, the symbols are accessed using the \code{pch=} argument and range $1 \to 25$ (some are redundant). Line types are shown in the legend, are accessed using the \code{lty=} argument, and range $1 \to 6$. Color codes (see Table~\ref{tab:cols}) range from $1 \to 8$ and are accessed using the \code{col=} argument.\label{fig:pchsymbols}}
\efig

First create some dummy data to plot:

<<dummydata>>=
set.seed(876)
InfStatus <- factor(sample(c("Susceptible","Infected","Recovered"),
   size= 50, replace= TRUE))
I <- table(InfStatus); I
Genotype <- factor(sample(c("RR","Rr","rr"), size= 50, replace= TRUE))
G <- table(Genotype); G
table(Genotype, InfStatus)
@

\fig[H]
\centering
<<dummyplots, fig.height=6, fig.width=6,fig.keep="last">>=
par(mfrow=c(2,2), mar=c(3,2,2,1), oma= c(0,0,3,0), bg="white")
plot(InfStatus, ylim= c(0,27)); box() ### or barplot(I); box()
barplot(table(Genotype, InfStatus), ylim= c(0,13), beside= TRUE); box()
legend("topright", c("RR","Rr","rr"), fill= c("gray40","gray70","gray90"),
   ncol= 1, cex= 0.75)
boxplot(rnorm(50, mean=15, sd=3) ~ Genotype, col= "gray75")
pie(G, col= c("gray50","gray70","gray90"))
mtext("Basic R Plots", outer= TRUE, cex= 1.5, font= 2)  # main title
@
\caption{Examples of a few basic exploratory plots (and code syntax) for barplots, boxplots, and pie charts in \RR.\label{fig:dummyplots}}
\efig

There are literally \emph{endless} ways to modify graphics plots in \R including colors, axis type, line type, adding a legend (as above), add a grid, background colors, labels, borders, etc. What happens if you change the \code{beside} argument = \code{FALSE} in Fig.~\ref{fig:dummyplots}b? The help pages for most of these functions can point you in the right direction for the syntax of what you would like to do. You may also want to use the \code{par()} to modify your plots; type \code{?par} to see how truly massive this help file is! Odds are you'll find what you need buried in there, somewhere.

%%%%%%%%%%%%%
\subsection*{Exercises}
%%%%%%%%%%%%%
\benN
	\item Boxplots can be a particularly good way of presenting data and exploring 
	relationships without losing a lot of important information about the
	distribution of the data (especially for continuous data; I don't like barplots for 
	this reason). Take a look at Fig.~\ref{fig:expplot}b and reproduce it for 
	another variable in the data set. Next, say you want to explore the possibility
	of a relationship between season, infection and some other continuous variable 
	(perhaps trees become infected at a particular time of year as a function of 
	this variable?). Make a doubly grouped boxplot to explore this possibility.
	
	\item Symbols plots are also a great way to add information about the 
	data points of a basic scatter plot. The \code{symbols()} function is great for
	this, and is especially good for geographical data where $x$ and $y$ are grid
	coordinates.%
	%
	\footnote{Especially longitude and latitude.} %
	%
	Read the data file \code{InfectionByCounty.csv} into \R and make a geographical
	plot of infection counts for these data with blue solid circles and gray edges.
	
	\item Now lets generate some fake data of a typical infectious disease 
	outbreak. Lets assume that if an individual becomes infected they are infected
	for life (\ie no recovery). We define the model by assuming some general
	underlying deterministic function, then assume some stochasticity about this
	deterministic function. We assume the variation around this model is normally
	distributed and will use the \code{rnorm()} function.

<<XYZ, eval=FALSE>>=
set.seed(125)
t <- seq(1, 30, by= 0.5)
y.det <- 0.4 / (1 + exp(0.4 * (15 - t)))   # underlying deterministic model
p.inf <- rnorm(length(y.det), mean= y.det, sd= 0.02) # add normal noise
plot(t, p.inf, main= "Outbreak XYZ",
   ylab= "Proportion Infected", xlab= "time")
@

% make the actual plot
\fig[H]
\centering
<<outbreak, fig=TRUE, echo=FALSE>>=
par(mfrow=c(1,2))
set.seed(125)
<<XYZ>>
plot(t, p.inf, ylab= "Proportion Infected", pch=19, main= "Outbreak XYZ", xlab= "time", col= 4); grid()
curve(0.4 / (1 + exp(0.4*(15 - x))), from= 0, to= 40, col= 6, lwd= 3, add= TRUE)
@
\caption{Some fake generated outbreak data.\label{fig:outbreak}}
\efig

	Now plot these data, you should get something like Fig.~\ref{fig:outbreak}a.
	You can use the \code{lines()} function to add lines to an existing plot as shown
	in Fig.~\ref{fig:outbreak}b. See if you can replicate it using the \code{pch=}
	and \code{col=} arguments within the \code{plot()} function along with the
	\code{grid()} function for gridlines.

	\item Plot the population projection you produced in the final exercise of
	\S~\ref{sec:loops}. You should get something like Fig.~\ref{fig:3class}. 
	Play with the arguments to get it to look like Fig.~\ref{fig:3class}, including
	the legend.

<<MatProjection, echo=FALSE>>=
Gen <- 20
StoreMat <- matrix(0, Gen, 3)
pars <- c(0.3, 0.1, 0, 1, 0.4, 0.3, 3, 0, 0.8)
ipop <- c(1,2,4)
M <- matrix(pars, 3, 3)		### Projection matrix from above
StoreMat[1,] <- ipop			### place 1st row as initial population (1,2,4)
### Now the loop ###
for (n in 2:Gen){
   StoreMat[n,]= M %*% StoreMat[n-1,]
}
@

<<echo=FALSE>>=
par(mfrow=c(1,1))
@

\fig[H]
\centering
<<class3, echo=FALSE>>=
plot(1:Gen, StoreMat[,1], type="l", col=1, , xlab="No. of generations",
   main="Three Class Population Example", ylab="No. of Individuals", lwd=2)
lines(1:Gen, StoreMat[,2], type="l", col="navy", lwd=2)
lines(1:Gen, StoreMat[,3], type="l", col="darkred", lwd=2)
legend("topleft", legend=c("Class 1","Class 2","Class 3"), lty=1, col=c("black",
   "dark blue", "dark red"), lwd=2, bg="gray95")
@
\caption{Quick population projection of 3 class model.\label{fig:3class}}
\efig
		
	\item \label{itm:popmat} Matplot is a very good command for quickly graphing
	data multiple lines in the same plotting window. Your data frame or matrix 
	must be arranged in a particular format however. Each variable to be plotted
	must be arranged by column and must have the same independent variable 
	(\eg time). Read \code{Pop6C.csv} into memory and use \code{matplot()} to
	produce Fig.~\ref{fig:matplot}. If you are having trouble importing 
	\code{*.csv} files I have also included an \R script file of these data 
	(\code{Pop6C.R}). Don't forget the gridlines and legend. Are the horizontal 
	gridlines%
	%
	\footnote{See \code{?grid}} %
	%
	on top of the legend and plot-lines or beneath them?


\fig[H]
\centering
<<matplot, fig=TRUE, echo=FALSE>>=
matplot(1:20, popdata, type="n", xlab= "Generations", 
   ylab= "Number of Individuals per Class", 
   main="Population Projection for 6 Classes")
   grid(NA, NULL, col="gray90", lty=1)
matplot(1:20, popdata, type="l", xlab= "", lwd= 1.5, ylab= "", main="", add=TRUE)
legend("topleft", 
   legend= c("Class 1","Class 2","Class 3","Class 4","Class 5","Class 6"),
   lty= c(1:5), ncol= 2, 
   col= c(1:6), bg= "gray95")
@
\caption{Using \code{matplot} to quickly graph simultaneous variables with the same independent variable (time/generations in this case).\label{fig:matplot}}
\efig
	
	\item The next essential function for your \R toolbox is \code{curve()}, which
	is a \emph{magical} function that takes a mathematical function/expression
	and plots over a given range of $x$. Most of the arguments are identical to 
	the \code{plot()} function so you don't have to learn many new ones. Graph
	the following functions with \code{curve()} (to start try from=--10, to=10)
	and finally use the \code{add=TRUE} argument to create the graph in Fig. 
	\ref{fig:outbreak}b.
	
		\begin{eqnarray*}
			f(x) &=& \frac{1}{1 + x^2}\\
			f(x) &=& 2x^3 - 8x^2 + 2x + 6\\
			f(x) &=& \frac{0.4}{(1+e^{(0.4(15-x))}) }
		\end{eqnarray*} \
		
	\item Yet another useful plotting function is \code{abline(a,b)}. This nifty 
	little function adds a straight line with intercept = a and slope = b to an existing
	plot (in school I learned \textbf{m} = slope and \textbf{b} = intercept, so now I find it
	personally confusing to have b = slope; so just be careful). \code{bmline()}
	would make more sense, but that's just me \Smiley.
	Familiarize yourself with the following:
	
		\bit
			\item \code{abline(5, 0.66)}
			\item \code{abline(h = 4)}
			\item \code{abline(v = 2)}
		\eit
	You must add the line to an \emph{existing} plot, but all the familiar
	arguments for plotting lines apply here, you can change its thickness, colour,
	type, etc., just as you would any other line using \code{lines()},
	\code{curve()}, or \code{plot()}. We'll be using \code{abline()} again later in
	\S~\ref{sec:stats}.
\een


%%%
\np %
%%%%%%%%%%%%%%%%%%%%%%
\section{Creating Functions} \label{sec:functions}
%%%%%%%%%%%%%%%%%%%%%%
Writing your own functions (aka subroutines) in \R is in my opinion the most useful tool/strategy in \R and much of it's power stems from this flexibility. Let's start simple with the generalized format of a function:

<<eval=FALSE>>=
myfun <- function(x, y, z) {
  expression_1
  expression_2
  expression_n
  Output    ## if desired
  ## or list(Output)
 }
@ 

In this very generic example, \code{x}, \code{y}, and \code{z} are the arguments for the function and will have to be provided by the user (you). They are essentially the ingredients the function requires to perform its job and can be vectors, scalars, matrices, or data frames (any supported \R object). It is also possible to provide the function with a default value for an argument by writing \code{function(x, y=5, z)} for example. If no value for \code{y} is provided, the function will use \code{y=5}.

The expressions within the function will describe what \code{myfun} will do to \code{x}, \code{y}, and \code{z} in order to produce what you want as an output. The last line usually ends with what you want the function to spit out when you hit \code{Enter} (this must be described within the expressions). If there is more than one object you wish to output, use \code{list()}. The function \code{cat()} can be especially useful for printing some output during the completion of a function (useful for debugging or tracking a long simulation). In addition, an output line is not absolutely \emph{necessary}, only if you want \R to return something when you hit \code{Enter}.

Lets try a very simple example. Below is a function that already exists in \RR, the \code{mean()} function, which I will emulate so you can double check it with the ``real'' function that is included in the \code{base} package installation of \RR.

<<Xbar_fun>>=
xbar <- function(x) {
   mu <- sum(x)/length(x)
   mu
}
@

So, now to use the function you just created, simply type \code{xbar(vector)} where \code{vector} is a predetermined list of numbers already in memory.%
%
\footnote{This function will also work on matrices because they can be easily
     converted into vectors.} %
%
Type \code{ls()} and find a vector (or matrix) already in memory and execute your new function, then double check it against the \code{mean(vector)} function provided with \RR. Here it is applied to \code{LeafArea} from \S~\ref{sec:vector_ex}.

<<Xbar_call>>=
xbar(LeafArea); mean(LeafArea)
xbar(Mx)
xbar(21:49)
@

Also notice that in the list of objects in memory, your new function \code{xbar} is now present. Lastly, it is important to remember that the variable names you choose in a function's arguments and expressions are \emph{not} saved as objects in \RR's global memory and are used only within the function \emph{locally} to tell \R how to manipulate the arguments you've given it. As a result, they can be used elsewhere in your program (\ie in the above, \code{mu}, \code{x}, \code{y}, and \code{z} can already exist as objects, which is good because in a long program you may start running out of obvious abbreviations for your variables). This also means, however, that you \emph{cannot} use a function to reassign the value of a variable.

OK, one last example and then you're on your own. Say for some reason you wanted a function that created a matrix of random numbers. There are numerous ways to do this, here is one:

<<rMat_fun>>=
rMat <- function(m, n, min, max, dec=0) {
   random <- runif(m*n, min, max)
   A <- matrix(random, m, n)
   round(A, dec)
}
rMat(3, 7, 5, 37)
@

Line one tells \R that the new \code{rMat()} is going to be a function and defines the arguments that will be required for that function to work. The arguments \code{m} and \code{n} are the dimensions of the matrix (rows, cols), \code{min} and \code{max} define the range of numbers the matrix should contain, and \code{dec} is the number of decimal places the numbers within the matrix should be rounded to (default set to zero decimal places). The second line creates a vector of uniform random numbers of the desired range using the \code{runif()} function already provided with \RR. Line three creates a matrix called \textbf{A} from those random numbers and of the desired dimensions. The final line tells \R to spit out \textbf{A} with its elements rounded to the desired number of places. Play around with \code{rMat()} until you feel you're comfortable with the syntax of writing functions, then move on to the exercises. 

%%%%%%%%%%%%%
\subsection*{Exercises}
%%%%%%%%%%%%%
\benN
	\item Below is a function that produces the 95\% confidence
	intervals (assuming a normal distribution). It uses Eqn~\eqref{eqn:se} to
	calculate the standard error of the mean, and returns the upper and lower
	CI95s along with the mean of any vector (\code{x}) provided as an argument.
		
		\Eqn\label{eqn:se}
			SE_m = \frac{sd}{\sqrt{n}}
		\eqn 
	
	\np
		
<<CI95fun>>=
CI95 <- function(x) {
   mean <- mean(x)
   sd <- sd(x)
   n <- length(x)
   se <- sd/sqrt(n)
   CI.vec <- c((mean-(1.96*se)), mean, (mean+(1.96*se)))
   names(CI.vec) <- c("2.5%", "Mean", "97.5%")
   CI.vec
}
@
	
	Explore this function for a few minutes then use it on a few of the vectors in 
	\S~\ref{sec:vectors} or on any of the variables in \code{mydata} (\eg 
	Heartwood area). Remember, the vector does not have to be named \code{x}
	per se, it can have any name since \code{x} above is only stored locally 
	\emph{within} the function itself (and therefore not a global variable).
	
<<CI95_call>>=
CI95(mydata$Heartwood)
@

	\item Write a function that, given a vector of length=3, containing the
	numbers of individuals of each genotype [$AA$, $Aa$, $aa$], will produce the
	allele frequencies of the population (lets assume no sampling error). These are
	your bog standard $p$'s and $q$'s from Hardy-Weinberg.
	
	\item Write a function that incorporates the steps you used above in 
	Eqn~\eqref{eqn:B} to produce matrix \textbf{B}, given an initial matrix 
	\textbf{A} (\ie given a matrix, create a new matrix with an extra column and row
	which contains the row and column sums of the original matrix).
	\item Look at the function below. What does it do? Can you predict its output?
	First execute the function with its default arguments (\code{NormFun()}), then
	explore arguments of your choosing. Finally, modify the core of the function to 
	explore areas of your own interest (BinomialFun?).

<<NormFun_fun,tidy=FALSE>>=
NormFun <- function(n=1000, mu=400, sdv=25, brks=25) {
   Y <- rnorm(n, mu, sdv)
   hist(Y, main="", xlab="", freq= FALSE, col= "gray88", breaks= brks)
   curve(dnorm(x, mu, sdv),
         col="navy", add=TRUE)
}
@

\een

Last note on functions: it is common and often useful to use a function as an argument in yet another function (a function \emph{within} a function), analogous to \emph{nested}-loops in \S~\ref{sec:loops}. You will be doing this often in \S~\ref{sec:ode} below.


%%%
\np %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Ordinary Differential Equations (ODEs)} \label{sec:ode}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Ordinary differential equations (ODEs) govern how one dependent variable (population size or number of infectious individuals) changes relative to changes in some other independent variable (\eg time). ODEs can be used to mathematically describe mechanistic, biological relationships between dependent and independent variables and to project this relationship in time in order to understand the dynamics of the dependent variables. Based on this mechanistic relationship, once you have the determined equations, the next step is to investigate the dynamics of your model (\eg changes over time) for a given set of parameter values. This section will show you how to do just than in \RR. First, you will need to load the \code{deSolve} package since we will be relying heavily upon \code{ode()} in this section. This function is the workhorse for ODEs and is peculiar in that it, as mentioned briefly in \S~\ref{sec:functions}, takes a function as one of its arguments.

\bit
	\item \code{ode}'s main arguments are the starting values (\code{y}), the
	times at which you want to compute the values of the variables you are
	interested in (\code{times}), a derivative function (\code{func}), and some
	parameters (\code{parms}). 
	\item \code{func} must take as its \emph{first three} arguments the current time 
	(\code{t}), the current values of the variables (\code{y}), and a vector containing the 
	parameter values. It must also return a list (using \code{list(item1)}
        where the \code{item1} is the vector of derivatives calculated at the current time
        and system state (see below).
\eit

\noi Here are two examples to get you started:%
%


%%%%%%%%%%%%
\subsection{Logistic growth}
%%%%%%%%%%%%%%%%
We will start with  logistic growth partly because it has only one dependent variable (and one differential equation to solve) and partly because most of you are familiar with this type of growth. The model equation that describes how the population size changes over time ($\frac{dN}{dt}$) for this kind of system at its simplest looks like this:

\Eqn \label{eqn:logistic}
   \frac{dN}{dt} = rN \bigg(1 - \frac{N}{K}\bigg)
\eqn

\noi where $N$ is the population size (dependent variable), $r$ is the instantaneous growth rate (a parameter), and $K$ is the carrying capacity (also a parameter). To solve this differential equation in \R we are going to give \code{ode} a function which describes the model Eqn~\eqref{eqn:logistic}, some values for $r$ and $K$, and some initial conditions (\eg the initial values of time and population size). \code{ode} likes its arguments to be as follows:

<<eval=FALSE>>=
ode(initial_values, time_interval, gradient_function, parameters)
@ 

\dots~ so lets first create our function describing Eqn~\eqref{eqn:logistic}:

<<LogisGrow_fun>>=
LogisGrow <- function(t, y, parms) {
   N <- y[1]
   dN <- parms[1] * N * (1 - N/parms[2]) 
   list(dN) 
} 
@

Notice our function has the format that \code{ode} likes. The \code{t} will be used as a time step by \code{ode()}, the \code{y} is used as the initial value of \code{N} (\code{y} can be a vector if there are more than one dependent variables to follow), and \code{parms} is the vector containing the values of the parameters to be used in the differential equation. The fourth line defines how \code{N} changes with changes in time ($\frac{dN}{dt}$) and the final line returns a vector containing the rate equations (in this case only one). Next solve the equation using \code{ode()}:

<<logistic>>=
logistic <- as.data.frame(ode(y= c(N= 0.1), times= seq(0, 10, by=0.1), 
     func= LogisGrow, parms= c(r= 0.9, K= 5)))
head(logistic)	# take a peek at the first few rows
@

Notice the start values, time interval, and parameter values were defined directly as arguments within \code{ode}. The starting value of $N=0.1$ and $r$ and $K$ have been set to 0.9 and 5 using the \code{parms=} argument. The time interval has been set using the \code{seq()} function we learned back in \S~\ref{sec:vectors} and goes from 0 $\to10$ by increments of 0.1 (it's simply a vector of time steps). Lastly, \code{ode()} will produce a matrix of the solutions with \code{time} as Col 1 by default. Notice I have asked \R to produce a \code{data frame} instead of a matrix which is slightly more convenient for plotting as in Fig.~\ref{fig:logistic}.

\fig[H]
\centering
<<logplot>>=
plot(N ~ time, data= logistic,
     main= "Simple Logistic Growth Model", col= "navy")
@
\caption{Logistic growth model with $r = 0.9$ and $K = 5$.\label{fig:logistic}}
\efig


%%%%%%%%%%%%%%
\subsection{The SI model}
%%%%%%%%%%%%%%
Now that you've gone through a simple example with one differential equation, let's follow up with a model system of a set of two, coupled ODEs following two dependent variables of interest. The same general steps apply, we just scale up the process slightly. 

One of the most basic ODE disease models many of you are probably familiar with is the SI model%
%
\footnote{Loads of assumptions but still quite reasonable for some systems.} %
%
which stands for the \textbf{S}usceptible and \textbf{I}nfected classes within a host population (we're not concerned here about parasite populations \Smiley).

In Eqn~\eqref{eqn:SI}, $\beta$ is the transmission rate which governs how quickly $S$ become $I$, $S$ is the number of susceptible hosts in the population, and $I/N$ is the proportion infected (frequency-dependent infection). One standard assumption is a closed system (\ie $N$ is constant) and therefore $N = S + I$. We will be describing the rates of change of the susceptible and infected classes and since these two classes sum to a constant, only one differential equation needs to be explicitly described in the model (since $I = N - S$, $\frac{dI}{dt}$ is superfluous), yet for illustrative purposes we include both equations as an example of how one might set up a two dimensional (2D) model in \RR.

What do the differential equations look like for a simple SI model?%
%
\footnote{Just for fun, google SI model and see what you get \Smiley} %
%
Pay attention, you'll be doing a full SI\textbf{R} (with an additional Recovered/Removed class) in a moment. Here we go:\\

\Eqn\label{eqn:SI}
\begin{gathered}
	\frac{dS}{dt} = -\beta S \bigg(\frac{I}{N}\bigg) \hfill \\
	\frac{dI}{dt} = \beta S \bigg(\frac{I}{N}\bigg) \hfill \\
\end{gathered}
\eqn \\

\noi Here is the \R code used for the SI model above:

<<SIfun>>=
tInt <- seq(0, 25, by = 1/2) 
pars <- c(beta= 0.75) 
Initial <- c(S = 4999, I = 1) 
#
#     The function     #
#
SIfun <- function(t, y, parms) { 
    with(as.list(c(y,parms)), {
        dS <- -beta * S * (I/(S+I))
        dI <- beta * S * (I/(S+I))
        ODEs <- c(dS, dI) 
        list(ODEs)
    })
} 
@

The \code{with()} function is a \emph{magical} functions: the point here is that it enables you to write out your ODE equations in a more familiar way, as opposed to having to refer to parameters with respect to the \code{parms} vector (\ie \code{beta = parms[1]}, $\dots$) or your state variables with respect to the \code{y} vector (however, in order to use \code{with} in this way you must have given appropriate names to your parameter and initial-state vectors, and (of course) the sets of names must not overlap). Compare this construction with the \code{logistic} ODE function above with I didn't use \code{with()}. 
%If you like, you may think of \code{with()} as a temporary form of \code{attach()}, but without the potential pitfalls associated with \code{attach()}.

<<SIout_dataframe>>=
SIout <- as.data.frame(ode(Initial, times= tInt, func= SIfun, parms= pars))
head(SIout)
@

\fig[H]
\centering
<<SIout, fig=TRUE, echo=TRUE>>=
par(mfrow = c(1,2)) 
plot(S ~ time, data= SIout, ylab= "Number",
     col= "navy")
points(I ~ time, data= SIout, col= "darkred")
legend("right",bg="gray95",c("S","I"),pch=1,col=c("navy","darkred"))
plot(I ~ S, data= SIout, ylab= "Susceptible", xlab= "Infected")
@
\caption{Results of the simple SI model system of ODEs of Eqn (\ref{eqn:SI}).\label{fig:SIout}}
\efig

Notice that this time I created vectors up front defining the time interval, parameter values, and initial starting conditions and then simply used that vector name as an argument in \code{ode}. This is often a simpler way to a) change values quickly when exploring a model, and b) organize your parameters easily if there are many (here there was only one). I've included my summary plot of the system (Fig.~\ref{fig:SIout}) but I'll leave it to you to code for yourself using the matrix \code{SIout} (since $S$ and $I$ are on similar scales, use \code{points()} to plot both on the same plot). What do you notice about the susceptible and infected populations? Does the epidemic peak quickly, then peter out, are there oscillations, etc.? Do the dynamics make sense considering the coupled differential equations of the model? Try repeating the process while exploring a few values of $\beta$. How do the population dynamics respond?

%%%%%%%%%%%%%
\subsection*{Exercises}
%%%%%%%%%%%%%
\benN
	\item Now it is your turn to expand on Eqn~\eqref{eqn:SI} with a
	complete SIR model. Remember you can assume a closed system and
	therefore $N = S + I + R =$ constant, so you need only write two ODEs,
	however I suggest as a learning exercise that you do all three explicitly. Since
	this is an abstract example and we don't have any ``real'' parameters to go on,
	you may have to play around with them until you get something interesting.

	\item Next, consider an different type of two-dimensional model. Imagine a 
	population that grows exponentially. Growth is partially determined by the 
	activation temperature of some enzyme. This activation temperature is most 
	efficient when it matches the temperature of the environment.  Individuals who 
	do not match the environment pay a cost in terms of growth. We can write the 
	ODEs for changes in population size $x$ (the ecological part of the model) and 
	for changes in the mean activation temperature in the population, $\alpha$ (the 
	evolution of the trait), as follows:
	 
	\Eqn
	\begin{gathered}
		\frac{dx}{dt} = x \big( r - (\alpha - \alpha_{opt})^2 \big) \hfill \\
		\frac{d\alpha}{dt} = \varepsilon \big( -2 (\alpha - \alpha_{opt})\big)
	\end{gathered}
	\eqn \

	where $\alpha_{opt}$ is the temperature of the environment and $r$ is the 
	growth rate of the population (try to see how the ecological equation captures 
	the reduced growth rate when the mean activation temperature is \emph{not}
	the same as the temperature of the environment), and $\varepsilon$ is the
	genetic variation of the population for that trait (var($\alpha$)). The
	programming here is analogous to a classical interaction model, so don't let it
	intimidate you.%
	%
	\footnote{This is why I had you do the 2D SIR model above!} %
	%

	\benI
		\item What do the dynamics of population size, $x$, look like without 
		evolution (\ie there is no genetic variation)? Try the following values: $r =
		0.2$ and $\alpha_{opt} = 10$ with initial conditions $x(0) = 2$ and $
		\alpha(0) = 9.15$ for a timespan from 0 $\to$ 100. Then make a
		plot of $x$ vs. $t$.
		
		\item What do the dynamics of population size look like when $\alpha$ 
		evolves? What do the evolutionary dynamics of $\alpha$ itself look like?
		Try the following parameter values: $r = 0.2$, $\alpha_{opt} = 10$, and $
		\varepsilon$ = 0.03 with initial conditions $x(0) = 2$ and $\alpha(0) = 
		9.15$ for a timespan from 0 $\to$ 100. \textbf{Hint}: make plots of $x
		$ vs. $t$ and $\alpha$ vs. $t$. How does evolution change the population 
		dynamics? Now set $\alpha(0) = 10.85$ and observe the population 
		dynamics of the system. Is this in agreement with your understanding of 
		how stabilizing selection functions? Explore various parameter values both
		here and above in ($i$) while making predictions regarding the behaviour of
		the model. What happens when there is more genetic variation in the
		population?
	\een
\een


%%%%
\np %%
%%%%%%%%%%%%%%%%%%%%%%
\section{Basic Statistics} \label{sec:stats}
%%%%%%%%%%%%%%%%%%%%%%
Of course one of the great advantages of \R is that you can perform many statistical analyses right here, within the \R environment, without having to go to another program. The variety of analyses available in \R is vast.  Unless you're pretty heavy into statistics, it is likely \R will be able to fulfill your needs. So let's start with basic exploratory, comparative, and relationship statistics. Table~\ref{tab:statfunct} shows some of the more common functions you'll need. First let's return to an example we came across in \S~\ref{sec:exploring}. We were curious about a relationship we discovered in Fig.~\ref{fig:expplot}c, now we get to test whether this putative relationship is statistically significant. Assume you've already determined a simple linear regression is appropriate, this is how we create the statistical model in \RR:

<<dbh_Heartwood, eval=FALSE>>=
plot(Heartwood ~ dbh, data= mydata)
@

Don't close this plotting window, you will be using it in a moment (if you did, simply re-plot it using the $\uparrow$ key). Now actually perform the regression analysis using a linear model via the \code{lm()} function and give the object the name \code{fit}.

<<fit_lm>>=
fit <- lm(Heartwood ~ dbh, data= mydata)
@

Notice that there is no immediate output of \code{lm()} as with most \R functions; in order to see the results we must use \code{summary(fit)}. Now we can add the regression line from \code{fit} to our existing plot using our old friend \code{abline()}. Do this and see if you can reproduce Fig.~\ref{fig:lm} without using the provided code. Can you use the various aesthetic arguments of \code{plot()} to reproduce it accurately?

<<lm_summary>>=
summary(fit)
@

\fig[H]
\centering
<<lm_plot,tidy=FALSE>>=
plot(Heartwood ~ dbh, data= mydata,
   main= "Heartwood Area vs. DBH",
   ylab= expression(paste(plain(HeartwoodArea)," ",(cm^2))),
   xlab= "Diameter at breast height (dbh)",
   pch= 19, col= 1)
grid(NA,NULL, lty= 4)
abline(fit, col= 2, lty= 4, lwd= 2)
legend("topleft", legend=c("lm(fit)"), col= 2, lty= 4, bg= "gray85", box.lty=0)
@
\caption{Graphical summary of the linear regression, \code{lm()}, performed		on our tree allometric data (dbh vs. heartwood area).\label{fig:lm}}
\efig

In general, \R provides \emph{accessor} methods, such as \code{coef}, for extracting the information from complicated objects like linear model fits.  You can see what methods are available for a given class, such as \code{lm} 
(\code{class(fit)} tells you what class you should be inspecting), by typing \code{methods(class="lm")}.  If you try this\footnote{Note that there is a bug in the current version of the \code{gdata} package, which the \code{gplots} package loaded automatically; in order to make \code{methods(class="lm")} work you'll first have to \code{detach("package:gplots",unload=TRUE); detach("package:gdata",unload=TRUE)} (and later reload them if you need them again).}, you'll see that there is (for example) a \code{residuals.lm}, telling you that you can type \code{residuals(fit)} to extract the residuals (and possibly \code{?residuals.lm} for more information).
<<echo=FALSE,results="hide",message=FALSE>>=
detach("package:gplots",unload=TRUE)
detach("package:gdata",unload=TRUE)
methods(class="lm")
library(gdata)
@ 
It is always best to use the accessor methods (\code{coef}, \code{residuals}, etc.) if you can. However,
sometimes you have to peek under the hood and see all the underlying components by typing \code{attributes(fit)}. 

<<fit_attributes>>=
attributes(fit)
coef(fit)
@

You can then extract the individual components via \verb+fit$name+, where \code{name} is the name of the bit you want.
Using this syntax, convince yourself that you get the same values for the residuals by extracting the component with \verb+$+ or using the accessor method.

Lastly, you can manipulate these objects or save them, perhaps for use in other analyses. What does \code{coef(fit)[1]} produce? And \code{coef(fit)[2]}?

%%%%%%%%%%%%%%%%%
\begin{table}[H] \centering
\belowcaptionskip=0.25cm
\caption{A few of the functions in \R for statistical modeling and data 
analysis. There are \textbf{many} more, but you will have to learn about them somewhere else. The statistical functions such as var and 
sd assume values are samples from a population and compute an estimate of the population statistic (\eg for example sd(1:3)=1).}
\begin{tabular}{l l}
\hline \hline
	\code{hist(x)} & Histogram plot of value in $v$ \\
	\code{mean(x),var(x),sd(x)} & Estimate of population mean, variance, standard \\ & deviation based on data values in $x$ \\
	\code{median(x)} & Median value of $x$ \\
	\code{cor.test(x,y)} & Correlation between the two vectors $x$ and $y$\\
	\code{t.test} & One and two sample Student's $t$-test \\
	\code{pairwise.t.test}& Pairwise $t$-test \\
	\code{chisq.test} & $\chi^2$ test; differences in frequencies \\
	\code{binom.test}& Binomial test for differences in proportions, binomial samples \\
	\code{aov, anova} & Analysis of variance or deviance \\
	\code{var.test} & Test of equal variance of sample means (Levene's test) \\
	\code{ks.test(x,pnorm)} & Kolmogorov-Smirnov test of $x$ to Normal distribution \\
	\code{ks.test(x,y)} & K-S test, do $x$ and $y$ come from different distributions? \\
	\code{wilcox.test} & Mann-Whitney or Wilcoxon U non-parametric rank tests \\
	\code{bartlett.test} & Bartless's test for equal variance, multiple treatments \\
	\code{kruskal.test} & Kruskal-Wallis non-parametric rank test, multiple treats \\
	\code{lm} & Linear models (regression, ANOVA, ANCOVA)\\ 
	\code{glm} & Generalized linear models (\eg logistic, Poisson)\\
	\code{nls} & Fit nonlinear models by least-squares \\ 
        \code{optim} & Minimize (or maximize) a function over one or more parameters \\
	\code{lme}, \code{nlme} & Linear and nonlinear mixed-effects models\\
            & (repeated measures, block effects, spatial models)\\
	\code{manova} & Multivariate analysis \\
\hline
\end{tabular}
\label{tab:statfunct}
\end{table}
%%%%%%%%

%%%%%%%%%%%%%
\subsection*{Exercises}
%%%%%%%%%%%%%
\benN
	\item In the linear model above, lets assume new information comes available
	which violates the assumptions of a linear regression. How might you now 
	explore the relationship between tree diameter and heartwood area? How might 
	you look at parametric and non-parametric alternatives?
	\item Create a new factor variable (\ie column) to be added to
	\code{mydata} called \code{Temp} which groups the seasons spring/summer
	and fall/winter two into levels of either \emph{Warm} or \emph{Cool}. There
	should be two levels for \code{Temp}. Now conduct a simple $t$-test for any
	of the continuous variables in \code{mydata}. For example, is there a 
	difference in bark thickness depending on whether it is warm or cool? Or does 
	bark thickness correlate with whether a tree is infected or not? I suggest you 
	first check for a normal distribution and equal variances. Depending on the 
	variable you choose, you may need to find a non-parametric rank test.
	\item Based on Fig.~\ref{fig:expplot} it seems as though species differ in 
	sapwood depth. Test this hypothesis using a simple ANOVA. Then examine
	other continuous variables via boxplots, look for differences by species (or 
	season), and test for statistical significance. Come up with a few of your own
	questions and hypotheses, then test them.
	\item \label{ex:bars} For any of the comparisons above, produce a barplot
	including error bars (SEMs or CI95s).%
	%
	\footnote{As mentioned, I prefer boxplots for this purpose because it shows the
	     actual data points, but some advisors/journals/fields traditionally prefer
	     barplots, so it's good practice to know how.} %
	%
	First try for yourself using the function you created in \S~\ref{sec:functions},
	and adding the appropriate lines the the barplot. If you run into trouble I have
	included a function in the script file \code{R Tutorial Script.R} for creating
	barplots with error bars called \code{BarplotBars()} along with an example in
	TextBox~\ref{box:BarplotBars}. Alternatively,	you can ``cheat'' (I do it all the 
	time) and use the function \code{barplot2()}%
	%
	\footnote{Within the \pkg{gplots} package.} %
	%
	which can simplify the process.
\een

By now you should have a pretty firm introduction to the practical applications of \R for ecologists. You will be able to import and explore data and plot it to produce pretty graphs, perform some simulations (usually via loops), write your own functions, and carry out basic statistical analyses. You will rely heavily on these tools over the next few days during the \textbf{\number\year~EEID Workshop}.


%%%
\np %
%%%%%%%%%%%%%%%%%%%%%%%%%%
\section*{Tutorial Contributors} \label{sec:thanks}
%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{description}
\item[Stu Field,] Dept. of Biology, Colorado State University, Fort Collins, CO.
\item[Ben Bolker,] Depts. of Math \& Stats and Biology, McMaster University, Hamilton, Ontario.
\item[Colleen Webb,] Dept. of Biology, Colorado State University, Fort Collins, CO.
\item[Mike Antolin,] Dept. of Biology, Colorado State University, Fort Collins, CO.
\item[Aaron King,] Depts. of Ecology \& Evolutionary Biology and Mathematics, University of Michigan, Ann Arbor, MI.
\item[John Drake,] Odum School of Ecology, University of Georgia, Athens, GA.
\end{description}


\vfill
\noi \hzline \\ \centering{\textbf{Last updated: }\today}


%%%%%%%%%%%%%%%%
%           TextBoxes              %
%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%
\begin{textbox}
\caption{Here is the sample code for producing error bars for the exercise in \S~\ref{sec:stats}.\ref{ex:bars} above. You must \emph{first} load the function for making error bars, \code{BarplotBars()}, which is provided in the \R code script file \code{R\_cheat\_script.R}.}
<<textbox1, eval=FALSE, tidy=FALSE>>=
##     Barplots with Error Bars    #
BarplotBars()	# load this function first


##         Produce necessary arguments for error.bars
f <- factor(rep(LETTERS[1:4],each=20))
x <- runif(80)
data <- data.frame(x,f)
N <- as.numeric(table(data$f))
FactorMeans <- tapply(data$x, data$f, mean)
sd <- tapply(data$x, data$f, sd)
sem <- sd/sqrt(N)
ci <- (FactorMeans + (sem * 1.96)) - (FactorMeans - (sem * 1.96))
labels <- as.factor(levels(data$f))

##        Make the Barplot with Error Bars (SEMs & CIs)       #
par(mfrow=c(1,2))
BarplotBars(FactorMeans, sem, labels, axis=2)
title("Barplot with SEM bars")
BarplotBars(FactorMeans, ci, labels, bcol=c(1,8), lcol=2, type=4, w=50)
title("Barplot with CI95 bars")

##        Points with Error Bars       
require(gplots)
f <- factor(rep(LETTERS[1:4], each=20))
x <- runif(length(f))
data <- data.frame(x,f)
N <- as.numeric(table(data$f))
Means <- tapply(data$x, data$f, mean)
sd <- tapply(data$x, data$f, sd)
sem <- sd/sqrt(N)
ci <- (FactorMeans + (sem * 1.96)) - (FactorMeans - (sem * 1.96))
Treatments <- c(A=1,B=2,C=3,D=4)

par(mfrow=c(1,2))
plotCI(Treatments, Means, uiw=sem, pch= 19, xaxt="n")
   axis(1, 1:length(Treatments), LETTERS[1:length(Treatments)])
plotCI(Treatments, Means, uiw=ci, pch= 19, xaxt="n")
   axis(1, 1:length(Treatments), LETTERS[1:length(Treatments)])
@
\label{box:BarplotBars}
\end{textbox}
%%%%%%%%%%
%%%%%%%%%%


%%%%%%%%%%%
%%%%%%%%%%%
\begin{textbox}
\caption{The \R code below was used to project the imaginary population described in \S~\ref{sec:plots}.\ref{itm:popmat} while exploring the \code{matplot()} function. The population projections are stored in the files \code{Pop6C.csv}, \code{Pop6C.R}, and \code{Pop6C.Rdata} and are plotted in Fig.~\ref{fig:matplot}. By the way, this is an example of a stage-structured $SI$ model in discrete time.}
<<textbox2, eval=FALSE, tidy=FALSE>>=
##       Used for creating Pop6C.csv 

require(popbio)
s1 = 0.97
s2 = 0.08
s3 = 0.93
s4 = 0.06
s5 = 0.99
FA = 0.14
FJ = FA * 0.66       
B = 0.015
F.sel = 0.12
S.sel = 0.9
I.pop = c(20, 14, 9, 25, 15, 19)
#
PM <- matrix(c(s1*(1-B), FJ, FA, 0, FJ*F.sel, FA*F.sel, 
               s2*(1-B), s3*(1-B), 0, 0, 0, 0, 
               0, s4*(1-B), s5*(1-B), 0, 0, 0,
               s1*B, 0, 0, s1*S.sel, 0, 0,
               s2*B, s3*B, 0, s2*S.sel, s3*S.sel, 0,
               0, s4*B, s5*B, 0, s4*S.sel, s5*S.sel), 
       nrow=6, ncol=6, byrow=TRUE)
pop.projection(PM, I.pop, iterations= 20)
popdata <- pop.projection(PM, I.pop, iterations= 20)$stage.vectors
write.csv(round(t(popdata), 2), file= "Pop6C.csv")
@
\label{box:Pop6C}
\end{textbox}
%%%%%%%%
%%%%%%%%


%%%%%%%%%
%%%%%%%%%
\begin{textbox}
\caption{The following \R code was used to create the figure on the title page of the tutorial (it also shows several variations on this theme). The \code{curve3d()} is a plotting function within the \pkg{emdbook} package. The \code{rgl} environment opens a special interactive graphics window allowing the user to rotate the image in 3D -- very cool!}
<<textbox3, eval=FALSE, tidy=FALSE>>=
x <- seq(-10, 10, length = 50)
y <- seq(-10, 10, length = 50)

rotsinc <- function(x, y) {
   sinc <- function(x) {
      y <- sin(x)/x 
      y[is.na(y)] <- 1; y}
   10 * sinc(sqrt(x^2+y^2))
}
z <- outer(x, y, rotsinc)
##  using persp()
persp(x, y, z, theta = 30, phi = 30, expand = 0.5, ltheta= 0,
   shade=0.4, col = "navy", axes=FALSE)
persp(x, y, z, theta = 30, phi = 30, expand = 0.5, col = "navy",
   ltheta = 120, shade = 0.5, ticktype = "detailed", xlab = "X",
   ylab = "Y", zlab = "Z")
##      using curve3d()     
require(emdbook)
curve3d(outer(x, y, rotsinc), from= c(-10,-10), to= c(10,10),
        zlab="Z", ylab="Y", xlab="X", col="gray35")  ## with persp
curve3d(outer(x, y, rotsinc), from= c(-10,-10), to= c(10,10),
        zlab= "Z", ylab= "Y", xlab= "X", sys3d= "rgl", col= "navy")
curve3d(outer(x, y, rotsinc), from= c(-10,-10), to= c(10,10), zlab="", ylab="",
        xlab="", sys3d= "rgl", col= rainbow(40), axes= FALSE, box= FALSE)
@
\label{box:image}
\end{textbox}
%%%%%%%%%
%%%%%%%%%


%%%%%%%%%
%%%%%%%%%
\begin{textbox}
\caption{Summary of some critical and useful functions in \RR.}
\begin{table}[H]
\centering
\belowcaptionskip=0.5cm
\begin{tabular}{l l}
\textbf{Function} 		& \textbf{Description} \\[10pt]
\code{ls}			& 	lists contents of \R workspace/global environment \\
\code{rm}			& 	removes objects from R workspace \\
\code{save}			& 	save selected objects \\
\code{+},\code{-},\code{*},\code{/},\verb+^+	&	arithmetic operators \\
\verb+%*%+		&	matrix multiplication \\
\code{t}				&	matrix transpose \\
\code{solve}		&	matrix inverse (and solving linear equations) \\
\code{c}				&	combines (concatenates) objects, simplest way to make vectors \\
\code{seq}			&	creates vectors that are regular sequences \\
\code{rep}			&	replicates vectors \\
\code{length}		&	returns length of a vector \\
\code{sum}			&	returns the sum \\
\code{mean}		&	returns the mean \\
\code{median}		&	returns the median \\
\code{sd}			&	returns the standard deviation ($n-1$ in denominator) \\
\code{min}			&	returns minimum \\
\code{max}			&	returns maximum \\
\code{sort}			&	sort a vector (rearranges the vector in order) \\
\code{order}		&	returns indices of vectors that will order them \\
\code{rank}			&	returns rank of each element in vector \\
\code{==}, \code{<}, \code{>} 	&	comparison operators \\
\code{<=}, \code{>=}, \code{!=} 		& \\
\code{|}, \verb+&+		&	\code{OR}, \code{AND} \\
\code{is.na}			&	tests for missing value \code{NA} \\
\code{which}		&	does logical comparison and indicates which elements are
                             \code{TRUE} \\ & that is, gives the \code{TRUE} indices of 
                             a logical object \\
\code{any}			&	does logical comparison returns 1 (\code{TRUE}) if any of the 
						comparisons \\ & are \code{TRUE}, 
						\ie is at least one of the values true? \\
\code{exp}			&	returns \code{e} to that power \\
\code{log}			&	returns natural logarithm (to the base e) \\
\code{log10}		&	returns logarithm (to the base 10) \\
\code{sqrt}			&	returns square root \\
\code{table}		&	does frequencies and cross-tabs \\
\code{help}			&	help page on specified function \\
\code{cbind}		&	combine by columns \\
\code{rbind}		&	combine by rows \\
\code{matrix}		&	create a matrix \\
\code{vector}		&	create a vector \\
\code{nrow}		&	number of rows in an array or data frame \\
\code{ncol}			&	number of columns in an array or data frame \\
\code{dim}			&	dimensions of an array or data frame \\
\code{array}		&	create an array \\[10pt]
\end{tabular}
\end{table}
\end{textbox}
%%%%%%%%%
%%%%%%%%%


%%%%%%%%%
%%%%%%%%%
\begin{textbox}
\caption{Summary of some critical and useful functions in \RR.}
\begin{table}[H]
\centering
\belowcaptionskip=0.5cm
\begin{tabular}{l l}
\textbf{Function} 		& \textbf{Description} \\[10pt]
\code{is.vector}	&	answers the question, is this a vector \code{TRUE} or 
                             \code{FALSE} \\
\code{as.vector}	&	attempts to coerce object into a vector \\
\code{read.table}	&	reads data from a text file \\
\code{read.csv}	&	reads data from a text file with comma separated data \\
\code{write.table}	&	writes a data frame to a text file \\
\code{is.data.frame}	&	tests object to see if it is data frame \\
\code{as.data.frame}	&	coerces object into data frame \\
\code{is.factor}	&	tests object to see if it is a factor \\
\code{as.factor}	&	coerces object into a factor \\
%\code{attach}		&	reference variables in a data frame without having to use \\
%                         &	data frame column headers name (undo with \code{detach})\\
\code{head, tail}	&	list the first, last six rows \\
\code{names}		&	returns names of elements of object \\
\code{colnames}	&	returns or sets column names of object \\
\code{rownames}	&	returns or sets row names of object \\
\code{subset}		&	select part of a vector, matrix, or data frame \\
\code{merge}		&	merge two data frames \\
\code{lm}			&	multiple linear regression \\
\code{glm}			&	generalized linear regression \\
\code{anova}		&	analysis of variance \\
\code{chisq.test}	& 	Pearson's Chi-squared test for count data \\
\code{summary}	&	shows results of various model fitting functions \\
\code{predict}		&	predicted results from model \\
\code{hist}			&	histogram \\
\code{boxplot}		&	box plot \\
\code{plot}			&	scatterplot \\
\code{lines}			&	connects points sequentially with lines (added to a plot) \\
\code{segments}	&	add lines to a plot (between pairs of points) \\
\code{text}			&	add text to a plot \\
\code{legend}		&	add a legend to a plot \\
\code{abline}		&	add a line to a plot by specifying its slope and intercept \\
                         &	passing an lm object will result in adding the predicted line 
                              to the plot \\
\code{x11}			&	open another graphics window (\pkg{PC}) \\
\code{pdf}			&	open a pdf file for recording graphics \\
\code{dev.off}		&	close graphics device \\
\code{par(mfrow)}	&	arranges multiple plots on same page (by row) \\
\code{sample}		&	produces a random sample of the specified values \\
\code{set.seed}	&	sets seed for next random sample (repeat random sample)\\
\code{rnorm}		&	produces a random sample from a normal distribution \\
\code{qnorm}		&	quantiles (percentiles) of normal distribution \\
\code{pnorm}		&	CDF of normal distribution \\
\code{dnorm}		&	PDF of normal distribution \\
\code{rbinom}		&	produces a random sample from a binomial distribution 
\\[10pt]
\end{tabular}
\end{table}
\end{textbox}
%%%%%%%%%
%%%%%%%%%

<<Close_devices, echo=FALSE, results="hide">>=
dev.off()
@

%%%%%%%%%
\end{document}
%%%%%%%%%