\documentclass[12pt]{article}
%\documentclass[12pt,reqno,final]{amsart} % different style
%% Changelog:
%% 2012-05-06 BMB modifications:
%%   * \I -> \emph (or \code); \B -> \textbf
%%   * revised for knitr
%%   * got rid of attach()
%%%%%%%%%
\input{stuTeX}
\newcommand{\proginput}[1]{{\sf #1}}
%%%%%%%%%
%\usepackage{times}
%\usepackage{helvetica}
\geometry{left=0.75in, bottom=0.75in, right=0.75in, top=0.75in, foot=0.5in, head=0.5in}
\parindent = 1.0cm
%%%%%%%%%%%%%%%%%%%%%
%   Sweave Options   %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% prefix.string = folder & prefix for figs %
% height & width = dimensions of plot passed to device %
% eps = write eps file also %
% keep.source = keeps results looking nice %
% fig = produce figure if called %
% echo = repeat the chunk call %
% strip.white = white lines before/after output %
% eval = evaluate chunk %
% results = show results verbatim, tex, or hide %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%\SweaveOpts{prefix.string= Figures/eeid, height= 6, width= 8, eps= false, keep.source= true, fig= false, echo= true, strip.white= true, eval= true, results= verbatim}
%\setkeys{Gin}{width=0.85\textwidth}
\SweaveOpts{fig.path="Figures/",fig.height=6,fig.width=8,out.width="0.85\\textwidth"}
%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%
\title{\R Basics Tutorial for Ecologists}
\date{}
%\address{
%  Stu Field,
%  Department of Biology,
%  Colorado State University,
%  Fort Collins, Colorado
%  80523-1878 USA
%}
%\author[Field]{Stuart G.
%Field}
%\email{sgf at colostate dot edu}
\author{EEID Workshop: by Stu Field%
%
\footnote{Special thanks to the organizers of the $7^{th}$ annual Ecology \&
Evolution of Infectious Disease Workshop for insightful comments,
suggestions, \& examples in the development of this tutorial; however, all
careless grammar errors and typos are my own.}
}

<<>>=
require(popbio)
require(gplots)
require(lattice)
require(deSolve)
glop <- options(digits=5)
## on.exit(options(glop))  ## BMB: this line seems to prevent options taking effect
pdf.options(useDingbats=FALSE)
#options(device= function(...) {.Call("R_GD_nullDevice", PACKAGE= "grDevices")})
# read in relevant data #
mydata  <- read.csv("Datafiles/TreeData.csv")
popdata <- read.csv("Datafiles/Pop6C.csv")
@
%%%%%%%%%%%%%%%
\begin{document}
\maketitle
%%%%%%%%%%%%%%%
%%%%%%%%%%%%
% Timestamp %
%%%%%%%%%%%%
\begin{flushright}
\fbox{\parbox{0.37\textwidth}{\B{TimeStamp:} \timestamp}}
\end{flushright}
%%%%%%%%%%
%%%%%%%%%%%%%%%
% Title figure %
%%%%%%%%%%%%%%%
\tabv
\fig[H]
\centering
\includegraphics[width=9cm, angle=0]{Figures/rglImage}
\efig
\vfill
\fig[H]
\centering
\hfill
\includegraphics[width=2cm, angle=0]{Figures/NSF}
\efig
%%%
\np
%
%%%%%%%%
\tableofcontents
%%%%%%%%
%%%
\np
%
%%%%%%%%%%%%%%%%%%%
\section{What exactly is \RR?} \label{sec:R}
%%%%%%%%%%%%%%%%%%
\R is an object-oriented scripting language that combines:%
%
\footnote{From Ben Bolker (2008). \emph{Ecological Models and Data in R}.
Princeton Univ. Press, Ch.
1.}
%
\bit
\item A dialect of the programming language \pkg{S}, developed by John
  Chambers at Bell Labs, that can be used for numerical simulation of both
  deterministic and stochastic dynamic models
\item An extensive set of functions for classical and modern statistical
  data analysis and modeling
\item Graphics functions for visualizing data and model output
\item A user interface with a few basic menus
\eit
\R is an \emph{open source} project, available for free download via the
Internet (\url{http://www.r-project.org}). Originally a research project in
statistical computing, it is now managed by an international core
development team that includes a number of well-respected statisticians, and
is widely used by statistical researchers (and a growing number of
theoretical ecologists and ecological modelers) as a platform for making new
methods available to users.
%% BMB: no-one uses S-PLUS any more!
%% The commercial implementation of \pkg{S} (called \pkg{S}-\textsc{plus})
%% offers an Office-style ``point \& click'' interface that \R lacks. For our
%% purposes, however, the advantage of this user-friendly front-end is
%% balanced by the fact that \R is built on a faster, much less
%% memory-hungry implementation of \pkg{S}.
A standard installation of \R includes extensive documentation, including an
introductory manual ($\sim$ 100 pages).%
%
\footnote{See \code{R-intro.pdf} accompanying this tutorial.}
%%%%%%%%%%%%%%
\section{Introduction to \R} \label{sec:intro}
%%%%%%%%%%%%%%
Reading a tutorial about the applications and capabilities of \R is a great
start, but until you do it yourself and are actually forced to solve
problems on your own in \RR, you will become neither confident nor
proficient. This is precisely what the following exercises are designed to
do: give you practical programming experience in \RR. Before you start, I
recommend you get used to running \R from script files, rather than directly
from the command window (\ie The Console).
It'll make your life much simpler, especially for more complex computations.
For those of you using MacOS, \R already comes with a fairly nice script
editor. Alternatively, for those using Windows, I would recommend a
supplementary script editor like \textsf{RStudio}, which has become rather
popular in recent years
(\code{\blue{http://www.rstudio.org/}}).\footnote{Really, any editor or GUI
that can (1) hot-paste code chunks into R; (2) do syntax highlighting; (3)
match closing with opening brackets will do. \textbf{Do not under any
circumstances use Microsoft Word to edit your R code \ldots}}
%There is also \textsf{Tinn-R}, which has been around a while although
%recent upgrades seem to have only made it more complicated and less
%reliable.%
%
%\footnote{Older versions of Tinn-\R seem to be more stable \&
%straightforward (try \code{v1.17.2.4}).}
%At the top of your script file you should start with
%\code{rm(list=ls())}, which removes all objects from \RR's memory.
%You'll use it often. To remove a single object use
%\code{rm(objectname)}.
You may occasionally want to run the command \code{rm(list=ls())} to remove
all objects from \RR's memory. However, it is \emph{not} good practice to
embed this in your script files by default, because it can trip up anyone
else who uses your code and isn't expecting it. To remove a single object
use \code{rm(objectname)}. The objective of this tutorial is to create your
own library of scripts for analyses that grows with your knowledge of \RR.
Always be sure to annotate your scripts \emph{diligently} with the \verb+#+
symbol, otherwise you may come back to the script months later and not
remember what you did (extremely frustrating!). I tend to have paired files
with the same name, \eg \code{XYZdata.csv} and \code{XYZdata.R}, where the
\code{*.R} file contains all the necessary commands, analyses, and plots for
that particular data set in \code{*.csv}.
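A quick illustration of the difference between removing one object and
wiping the whole workspace (the object names \code{temp1} and \code{temp2}
are throwaways invented for this example):

```r
temp1 <- 5            # create two throwaway objects
temp2 <- "hello"
exists("temp1")       # TRUE -- it is in the workspace
rm(temp1)             # remove just this one object
exists("temp1")       # FALSE
exists("temp2")       # TRUE -- temp2 is untouched
```

Running \code{rm(list=ls())} instead would have removed \code{temp2} (and
everything else) as well.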
%%%%%%%%%%%%%%%%%%
\subsection{Installing Packages}
%%%%%%%%%%%%%%%%%%
Packages containing additional functions for \R to use can be downloaded
from the internet.%
%
\footnote{The process is fairly simple, although some issues with admin
rights may arise if you don't have rights to modify the \R directory.}
%
There are essentially two steps to using packages: (1) you need to download
and install the package onto your machine (once), and (2) you need to load
the package (each \R session). Step (2) is simple: type
\code{require(packagename)}. Step (1) is done as follows (for a
\textsf{PC}):
\benN
\item Make sure you are connected to the internet.
\item Click on \proginput{Install package(s)} from the drop-down
  \proginput{Packages} menu.
\item Select your CRAN mirror (somewhere within your current country is
  probably a good idea).
\item Select the package you want from the menu (in alphabetical order).
\item Click \textsf{OK} and wait for it to install (should auto-install).
\item The package is now installed and ready for you to call for it (i.e.,
  load it as stated above).
\een
Alternatively, if you know the name of the package you want to install you
can just type \code{install.packages("pkgname")} at the \R prompt. Not all
packages are independent (functions from one package may use functions from
within other packages). Usually the required dependencies will load/install
automatically; on the other hand, if you get an error message you will have
to install them yourself. If you go to the \proginput{Packages} menu and
click on \proginput{Load package}, you will see a list of the packages that
have been successfully installed on your machine. For MacOS, the process is
very similar except you click on \proginput{Packages \& Data} from the main
menu and then \proginput{Package Installer} for package installation. There
is an option to also install dependencies at this point.
To see which packages are currently installed on the machine, go to
\proginput{Package Manager}, where you can check the box of the package of
interest to load it.
%%%%%%%%%%%%%%%%%%
\subsection{The Help Menu}
%%%%%%%%%%%%%%%%%%
One of the best features of \R is the extensive \textbf{Help menu} for
general information about syntax and arguments for a given function -- just
type \code{?functionname}. But be forewarned, it can take a little time to
decipher these menus; at first they may seem like gibberish but with time
this will pass. It is a very handy tool that I use almost daily. There are a
few key sections of the Help Menu to highlight:
%%%%%%%%%%%%%
\begin{table}[H]
\centering
\belowcaptionskip=0.25cm
\caption{Summary of the key sections of \RR's Help Menu.}
\begin{tabular}{l l}
\hline\hline
\textbf{Section} & \textbf{Description} \\[5pt]
\hline
\code{line 1} & Gives the function name and the package in \code{\{\dots\}} \\
\code{Description} & A short general description of what the function does \\
\code{Usage} & Illustrates proper syntax; with function arguments and defaults \\
\code{Arguments} & Details about the various function arguments (important!) \\
\code{Details} & Any additional notes and caveats that a user should know \\
\code{Value} & Description of what the function returns following execution \\
\code{References} & Who wrote it and their source if applicable \\
\code{See Also} & Related functions that might also be useful \\
\code{Examples} & Most useful \dots fully executable examples! \\[10pt]
\hline\hline
\end{tabular}
\label{tab:helpmenu}
\end{table}
%%%%%%%%%%%%%
\noi Let's take a look at the Help menu using the function \code{apply()} as
an example. In the \code{R Console} type \code{?apply}: this should bring up
a \textbf{Help} window with details about the \code{apply} function. First,
you can see that this function \emph{applies} a given function (\code{FUN})
to the row or column (\code{MARGIN}) of an object (\code{X}).
The object (\code{X}) will have to be a data frame, matrix, or array (any
2-dimensional object). The \code{\dots} refer to any additional arguments of
(\code{FUN}) you may want to include. The \code{Usage} section tells you how
to formulate your syntax and construct your arguments within the function
and the \code{Arguments} section gives more information about the actual
pieces of information \code{apply()} will need to do its job. Here is an
example:
<<>>=
Mx = matrix(1:16, nrow=4, ncol=4)   # create a matrix called Mx
Mx
apply(Mx, 1, mean)
apply(Mx, c(1,2), sqrt)
@
This tells \R to calculate the mean of the \emph{rows} (\code{MARGIN= 1}) of
\code{Mx} (by column is \code{MARGIN= 2}). In addition, the Help menu says
in the \code{MARGIN} description that using \code{c(1,2)} tells
\code{apply()} to perform the function by row \emph{and} by column (\ie
entry-wise). The fourth line above uses the \code{sqrt()} function to take
the entry-wise square-root of \code{Mx}. \textbf{Note:} how you construct
the command depends upon the function you want to use (\ie it wouldn't make
sense to use the \code{mean} function with an entry-wise construction
because you'd be taking the mean of one value -- probably not what you had
in mind). Likewise, the construction above is pretty redundant (and
unnecessarily complex), since it's exactly the same as \code{sqrt(Mx)}.
Lastly, take a look at the \code{Examples} section to see \code{apply()} in
action.

I have provided the full \R code associated with all figures and exercises
in this tutorial. The intent is that you first try to complete the exercises
on your own, then only use this code when/if you get stuck. If you're
\emph{really} stuck, I suggest you leave it, try to move on, and we will
take care of it during the workshop. Good luck and happy Rrrrring!
\np
%%%%%%%%%%%%%%%%%
\section{Walk before you run} \label{sec:walking}
%%%%%%%%%%%%%%%%%
Before we get into more advanced procedures in \RR, you will need to
familiarize yourself with the basic input structure of \RR, completing
interactive calculations, assigning variables, and becoming comfortable with
the \R console and visual interface. At the \R console notice the drop-down
menus and familiarize yourself with where you can find many of the options
you will need during an \R session. The console itself, upon startup, will
state the version of \R you are running and some basic information about the
developers of the software. Also of importance here is the command to exit
\RR, which is \code{q()}. Commands can be typed directly into the console at
the \code{>} prompt and executed by pressing \code{Enter}. As stated
previously, for shorter, non-repetitive commands, utilizing the console is
fine, but for the most part I strongly suggest you make it a priority to
learn to work from script files. You may then execute lines or chunks of \R
code by sending them to the console and in this way you are able to repeat a
command by resending the code rather than retyping the entire line(s) in the
console (which is extremely tedious). For the Windows script editor: place
the cursor on the line of interest, or highlight multiple lines, and hit
\code{CTRL+r}. For the Mac: hit \code{Cmd+Enter} from the built-in Mac
editor. Of course, if using an auxiliary script editor (such as
\code{RStudio}), you bypass this process and send code chunks directly from
the external script editor. For now, let's use the console for the following
simple interactive calculations.
%%%%%%%%%%
\subsection{Basic calculations and input}
%%%%%%%%%%
\R uses most of the familiar mathematical operators you may be accustomed to
from other computer languages. Addition, subtraction, multiplication,
division, exponentiation, $e^x$, etc. are
\code{+, -, *, /, \^{}, exp(x)}, respectively.
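Each of the operators just listed can be tried directly at the prompt; for
example (the particular numbers here are arbitrary):

```r
10 + 3        # addition
10 - 3        # subtraction
10 * 3        # multiplication
10 / 3        # division
10^3          # exponentiation
exp(1)        # e^1, approximately 2.7183
```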
In addition, when working from the console you may use the up-arrow key
($\uparrow$) to scroll through the previous few commands to the prompt and
edit them (using the $\leftarrow$ and $\rightarrow$ keys) as desired. Let's
begin with some simple computations: type \code{6 + 9} and then \code{Enter}
at the command prompt (\code{>}) of the \R Console.
<<>>=
6+9
@
The answer is returned with \code{[1]} next to it, which represents the
number of the first element of that output row (in this case only one
entry). Try some simple calculator-type calculations on your own to get used
to the input/output style of \RR. Rules of mathematical order of operations
also apply. Try typing
<<>>=
7-12/3^2
@
although when making more complex calculations I recommend using
parentheses; this is less for \RR's sake than for your own. Also note, you
may continue an equation (line break) on a second line by hitting
\code{Enter} and this will result in a \emph{continuation prompt}
(\code{+}), where \R tells you that you haven't finished and prompts you to
continue.%
%
\footnote{For the remainder of this tutorial I have removed the continuation
prompt (\code{+}) in printed \R code to simplify the cutting and pasting of
code from the tutorial, but you will likely see it during your session,
particularly if you directly use the Console.}
<<>>=
(18 * 9) * (2*(3 + 0.5)) -   # R knows you want to subtract something
  1.5*(81)^0.5
@
\noi You may also separate multiple different commands on a single line
using the semicolon (\code{;}).
<<>>=
7/2; 8*3; 9/3*10; 64^(1/3); log(45)
@
Up until now the input and output are printed in this document more or less
exactly as you will see it on the Console, showing the prompt (\verb+>+) and
continuation (\code{+}) characters; from now on the prompt and continuation
characters will be missing, and the results will have comment characters
(\verb+##+) in front of them, to make it easier for you to cut and paste the
results into \R if you like.
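To see why the bracketed \code{[1]} mentioned above is useful, print a
vector long enough to wrap over several output rows; each row begins with
the index of its first element (exactly where the rows break depends on the
width of your console):

```r
x <- 1:30   # a vector with 30 elements
x           # printed over several rows; each row begins with [index]
```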
%%%%%%%%%%%%%%%%%%
\subsection{Variable assignment}
%%%%%%%%%%%%%%%%%%
\R carries out calculations with variables in the same manner as numbers (as
above). You may assign values to variables (\ie create objects) by using
either the \code{=} or \code{<-} symbol, which is a ``less than'' symbol
immediately followed by a ``minus'' sign.%
%
\footnote{In practice \code{<-} can be used interchangeably with \code{=} in
almost every situation you will encounter. Opinions differ about which is
better.}
%
This symbol can be read as ``gets'': \code{z} ``gets'' the value of
\code{y*(x/2 + 7) + sqrt(y)}. When assigning/naming variables, remember they
must begin with a letter, but may contain numbers and decimal points
thereafter, and that assignment is case-sensitive (\code{variable.1}
$\not=$ \code{Variable1}). Try the following:
<<>>=
x = 7
y = 9
z <- y * (x/2 + 7) + sqrt(y)
@
Notice the result (\code{z}) was not displayed in the console. Once
assigned, a variable (or object) is stored in \R memory as an object and to
view the result you must first ``call'' the variable by typing the variable
name and pressing \code{Enter}. In calculating \code{z}, \R remembers that
\code{x} and \code{y} have already been assigned a numeric value and uses
those values in calculating \code{z}.
<<>>=
x; y; z
@
A trick you can use if you want to both assign and print the value is to
surround the entire expression in parentheses:
<<>>=
(Ex1 <- y^x - z^2)
@
Try using the up-arrow ($\uparrow$) to change the assignment to \code{y = 5}
and then recalculate \code{z} and \code{Ex1}. We can now try a more
practical example combining assignment and calculation using a list of
numbers called a vector (more on vectors in \S~\ref{sec:vectors}). Remember
that \R is case sensitive, so \code{X} $\not=$ \code{x}.
<<>>=
X <- 1:10
Y <- (1-1/X) + 2
X
Y
@
This will return two lists%
%
\footnote{I'm using the term ``list'' loosely here, in \R the term
\code{list} is a unique class of object.
Technically, \code{X} and \code{Y} are \emph{vectors}.}
%
%
of numbers which can then be plotted to display the relationship between $x$
and $y$. We will discuss plotting in more detail in \S~\ref{sec:plots}.
\fig[H]
\centering
<<>>=
plot(X, Y)
@
\caption{Your very first plot using \R -- the relationship between \code{X}
and \code{Y}.\label{fig:xy}}
\efig
%%%%%
\np
%%%
%%%%%%%%%%%%%%%%%%%%%%%%
\section{Importing Data} \label{sec:importing}
%%%%%%%%%%%%%%%%%%%%%%%%
First things first. You need to get your data, which was perhaps collected
in the field and now exists as an Excel worksheet or something similar,
loaded into \R memory. For \S\S~\ref{sec:importing} and
\ref{sec:exploring} I suggest reading the supplement by Ben Bolker entitled
\emph{Importing data into} \RR. In addition, Chapter~2 of his book,
\emph{Ecological Models and Data in R},%
%
\footnote{Bolker, B. (2008). \emph{Ecological Models and Data in R}.
Princeton University Press. pp. 408. Graciously provided in \code{pdf}
format by B. Bolker (ISBN: 0691125228;
\url{http://www.math.mcmaster.ca/~bolker/emdbook/}).}
%
%
provides a more comprehensive discussion of how to shape your data into a
compatible form for analysis in \R (including troubleshooting tips for
common data formatting issues). Importing data is a relatively simple but
essential process, yet you would be surprised how many unexpected issues can
arise. The simplest, but by no means the only, way is to save your
\pkg{Excel} file as a comma delimited file (\code{filename.csv}), using
\proginput{Save as}. Two small hints:
\benN
\item make sure your home \R directory is set to the location where your
  datafile is located.
\item make sure your headings are clear but simple (\ie no spaces, funny
  characters, etc.).
\een
The function \code{read.csv()} converts the \code{csv} file into a
\emph{data frame}.
A data frame is essentially a 2-dimensional array that can contain any
combination of vectors (columns of data) that are of integer, numeric,
character, or factor classes. This differs from a \code{matrix} object which
can only contain one class of data. One last note on data file organization:
it's important to know the difference between \emph{wide} format and
\emph{long} format. Table~\ref{tab:dataorg} shows the same data organized in
two ways. On the left, it's in long format: there are a few long columns
since there's one row for each observation. On the right, it's in wide
format: more, shorter columns, with multiple observations on each row. For
some analyses it's important to have your data in one format or another.
Wide format is often easier to read, but many analyses require the data in
long format. There is a function in the base installation of \RR,
\code{reshape()}, that converts between the formats (somehow I can never
remember how to use it! -- \code{?reshape}). Lately I've been using the
\code{melt()} and \code{cast()} functions (confusingly within the
\pkg{\{reshape\}} \emph{package}), instead of \code{reshape()}. Use whatever
tool/function you like to get your data into long format.
%%%%%%%%%%%
\begin{table}[H]
\centering
\belowcaptionskip=0.25cm
\caption{Long vs. wide format.}
\begin{tabular}{c c c | c c c c c c }
\hline\hline
\multicolumn{3}{c|}{\textbf{Long format}} &
\multicolumn{6}{c}{\textbf{Wide format}}\\
No. & Species & Value & No. & A & No. & B & No.
& C \\
\hline
1 & A & 10.88 & 1 & 10.88 & 6 & 13.38 & 11 & 4.31 \\
2 & A & 11.59 & 2 & 11.59 & 7 & 14.97 & 12 & 6.13 \\
3 & A & 10.91 & 3 & 10.91 & 8 & 12.01 & 13 & 5.66 \\
4 & A & 10.78 & 4 & 10.78 & 9 & 15.14 & 14 & 4.99 \\
5 & A & 10.00 & 5 & 10.00 & 10 & 12.60 & 15 & 5.77 \\
6 & B & 13.38 \\
7 & B & 14.97 \\
8 & B & 12.01 \\
9 & B & 15.14 \\
10 & B & 12.60 \\
11 & C & 4.31 \\
12 & C & 6.13 \\
13 & C & 5.66 \\
14 & C & 4.99 \\
15 & C & 5.77 \\
\hline
\end{tabular}
\label{tab:dataorg}
\end{table}
%%%%%%%%%%%%
%%%%%%%%%%%%%%
\subsection*{Exercises}
%%%%%%%%%%%%%%
\benN
\item Read the \code{TreeData.csv} file into \R memory using
\code{read.csv("TreeData.csv")}. Add the argument \code{(\dots, row.names=
1)} and note what it does. Be sure to redefine (overwrite) this data frame
back to the original (without the \code{row.names} argument) because you
will use the original data below. It is usually easier to assign the new
data frame a name such as \code{mydata} so that you can refer to it later if
necessary (\eg \code{tree.diameter = mydata\$dbh}).
\item So you don't get hung up at this preliminary stage, if you are having
difficulty reading \code{TreeData.csv} into memory I have provided a
supplementary ``cheat'' file called \code{TreeData.Rdata} which should help.
Simply double-click this file and its contents (the object \code{mydata})
should be read into memory. Alternatively, type
\code{load("TreeData.Rdata")} directly in the console.%
%
\footnote{Be sure that the working directory is set correctly!}
%
%
Make sure this was done correctly by calling \code{mydata} in the console
(or type \code{ls()} to make sure \code{mydata} is in \RR's list of objects
in memory). I have also included the file
\code{R\_Tutorial\_All\_Data.Rdata} which contains \textbf{all} the data
sets required for this tutorial. Try to read the data into memory using
\code{read.csv()} but as a last resort use the \code{*.Rdata} files.
\item Call the data frame and take a look at what was loaded into memory.
Explore the \code{ls()} function as well as these below to see what they do.
\bit
\item \code{summary(mydata)}
\item \code{sapply(mydata, class)}
\item \code{names(mydata)}
\item \code{attributes(mydata)}
% \item \code{attach(mydata)}
% \item \code{detach(mydata)}
\eit
% Notice what \code{ls()} gives you before and after you \code{attach} your
% data frame. Are there any new objects? I prefer to assign each variable
% individually for this reason, so I know there is nothing going on in the
% background I cannot see.
% \textbf{Note about attach():} Advanced \R users do not routinely employ
% attach in their work, because it can lead to unexpected problems in
% resolving names (\eg you can end up with multiple copies of the same
% variable name, each of a different length and each meaning something
% completely different). The problem is that if you manipulate an
% ``attached object'', the changes aren't propagated back to the original
% data, therefore \code{xyz} and \code{mydata\$xyz} are no longer the same.
% Most modelling functions like \code{lm()} or \code{glm()} have a
% \code{data=} argument so attach becomes unnecessary.
\item The first thing you'll need to do is make sure your data set is
complete, that means no pesky \code{NA}s (the code for missing data in
\RR). First, determine if some of those \code{NA}s should actually be zeros.
You can identify whether you have a complete data set with the command
\code{complete.cases(mydata)} or if you have a \emph{huge} data set you may
want to simplify your life with \code{which(!complete.cases(mydata))} and
hope \R returns \code{integer(0)}.
If not, and you're sure you want to remove cases with missing data, you can
remove them with \code{mydata <- na.omit(mydata)}.%
%
\footnote{The Bolker references give an excellent and detailed description
regarding how to deal with \code{NA}s.}
%
%
\item You wish to determine how many of each species of tree are present
within the dataset. This is particularly effective if you are organizing
data by a factor (such as species, which you just identified as a factor
with the \code{sapply()}%
%
\footnote{The \pkg{gdata} package has a simplified version called
\code{ll()}, try \code{ll(mydata, dim=TRUE)}.}
%
%
function above), or some discrete variable rather than a continuous one.
Try assigning a column of \code{mydata} a name%
%
\footnote{It is a good idea to avoid using the following: \code{c},
\code{q}, \code{t}, \code{C}, \code{D}, \code{F}, \code{I}, and \code{T}.
These are already reserved for other meanings in \R (\code{T} and \code{F}
are abbreviations for \code{TRUE} and \code{FALSE}). It's also a good idea
to avoid \code{l} and \code{O} (lower-case letter ``L'' and upper-case
letter ``O''), because they are easily confused with 1 (one) and 0 (zero).}
%
%
(\eg \code{species = mydata\$spp}) and using the \code{table()} function,
\code{table(species)}. Repeat this with other variables within the data
frame.
\item Lastly, especially for data frames with factors (which you will
probably want to do analyses by), the function \code{levels()} can be
useful. Explore this function.
\een
\np
%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Exploring Your Data} \label{sec:exploring}
%%%%%%%%%%%%%%%%%%%%%%%%%%%
Check to see that your \code{mydata} is still in memory using the
\code{ls()} function. If not, re-read it into memory. Then type
\code{summary(mydata)} to get a basic overview of the descriptive
statistics of your data. You'll have to be able to manipulate your data set
at some point. This is primarily done using the \code{sort()} and
\code{order()} functions.
Make sure you know the difference between them! Typically, \code{sort()}
will organize according to your specifications \emph{independently} of the
other columns, which you may not want to do (be thankful \R doesn't rewrite
your original \code{*.csv} file!), and is therefore more appropriate for
arranging stand-alone vectors (see \S~\ref{sec:vectors}) as opposed to data
frames or matrices (see \S~\ref{sec:matrix}). Let's try a simple example:
<<>>=
order(mydata$dbh)
mydata[order(mydata$dbh), 1:ncol(mydata)]
@
The first line returns a vector indicating the row numbers of the entries
in the \code{dbh} column in increasing order. The second line tells \R to
rearrange the data frame by increasing values of \emph{dbh} and to order
\textbf{all} columns in this fashion. The part before the comma determines
the order of and which rows to include. The part after the comma tells \R
which columns to return (I used the \code{ncol} function to extract the
number of columns of the data frame). Alternatively you could simply leave
the column declaration blank which defaults to all columns:
<<>>=
mydata[order(mydata$dbh), ]
mydata[order(-mydata$dbh), ]   # decreasing
@
The default for \code{order()} is increasing; you may add the negative
symbol (\code{-}) to arrange the data frame according to \emph{decreasing}
values of the desired column.
This will work only for \code{numeric} or \code{integer} class variables;
for non-numeric sorting variables you must use \code{rev(order())} -- for
example:
<<>>=
mydata[rev(order(mydata$Infected)), ]   # for a factor (character string)
@
%%%%%%%%%%%%%%%%
\begin{table}[H]
\centering
\belowcaptionskip=0.25cm
\caption{List of the main logical operators used by \RR.}
\begin{tabular}{l l}
\hline
\hline
\code{x < y} & less than \\
\code{x > y} & greater than \\
\code{x <= y} & less than or equal to \\
\code{x >= y} & greater than or equal to \\
\code{x == y} & equal to \\
\code{x != y} & not equal to \\
\code{\&} & AND (element-wise) \\
\code{\&\&} & AND (single values, \eg in \code{if()} clauses) \\
\code{|} & OR (element-wise) \\
\code{||} & OR (single values, \eg in \code{if()} clauses) \\
\hline
\end{tabular}
\label{tab:ops}
\end{table}
%%%%%%%%%%
%%%%%%%%%%%%%%
\subsection*{Exercises}
%%%%%%%%%%%%%%
\benN
\item Now your turn; use the \code{order()} function to re-sort
\code{mydata} according to \emph{increasing} bark thickness, then decreasing
sapwood area (be sure that the neighboring columns are also affected!). If
you want the changes to be stored as new objects in memory, name it first
(\eg \code{mydata2 <- mydata[order(mydata\$dbh),]}). Also try sorting by the
column \emph{spp} (both $\uparrow \downarrow$), which by now you know to be
a factor: what happens?
\item You can also sort by multiple variables/factors. Sort \code{mydata}
first by \code{species} alphabetically and then by decreasing
\code{NobarkArea}.
\item Now create a new data frame called \code{tree1} which contains only
\code{tree \#}, \code{SapDepth}, and \code{SapArea} and is sorted by
decreasing \code{SapDepth}. I suggest you play with this a bit as you will
be doing much of this with your own data and the data sets we will be using
during this workshop. There is a function called \code{subset()} that can
simplify this process, but since this function does not do any sorting,
you'll have to do this in two steps. If you have time you should explore
this way too.
How would you make a data frame containing only those trees measured in
\code{Winter}?
\item You can also exclude certain cases from the data frame that do not
satisfy a given set of arguments. For example:%
%
% \footnote{For this command and all that follow throughout this tutorial,
% if you attached your data using \code{attach(mydata)} you do \textbf{not}
% need to refer to objects from the data frame using this construction.
% Nevertheless, I would steer clear of \code{attach()} and see my note about
% it above.}
%
<<>>=
mydata[mydata$BarkThick != 0.0, ]
@
\dots~check how many cases were removed
<<>>=
BTzero <- mydata[mydata$BarkThick != 0.0, ]
nrow(BTzero)
nrow(mydata)
@
In the first call above you are asking \R to return \textbf{all rows}
(notice the comma) where the column \code{BarkThick} is \textbf{not} equal
to zero. These logical arguments (\ie \code{!=}) can also be used in
combination with others using the \verb+&+ symbol. For example, limit the
data frame to only those trees which have a sapwood area $\le$ 100
${\rm cm}^2$ \emph{and} have a sapwood depth $\ge$ 2 (there should be 7).
If you think you'll be using this new data set, simply assign it a new
variable name and work with that new data frame. Table~\ref{tab:ops} shows
other operators for comparisons and selecting data.
\item No preliminary exploration is complete without taking a look at some
graphs (psst: advisors \emph{love} graphs \Smiley), so let's take a look at
the usual suspects in data exploration.
\bit
\item Histogram (the overall distribution of your data)
\item Boxplot (discontinuous data; relationships)
\item Scatterplot (continuous data; relationships)
\item Multiple scatterplots (data separated by \eg sex, season, etc.)
\item Barplot (data displayed by category as bars)
\eit
\noi Here is the \R code for Fig.~\ref{fig:expplot} (ensure that \code{mydata} is still in memory!):
\fig[H]
\centering
<>=
par(mfrow=c(1,3))  ## create plot array of 1 row x 3 columns
hist(mydata$SapDepth, xlab= "Sapwood Depth",
     main= "Histogram: Sapwood Depth", col= "gray50")
boxplot(SapDepth ~ spp, data= mydata, ylab= "SapDepth",
        col= "darkslateblue", main= "Boxplot: Sapwood Depth by Species")
plot(mydata$dbh, mydata$Heartwood, pch= 17, col= "darkred",
     ylab= expression(paste(plain(Area)," ",(cm^2))),
     xlab= "DBH (cm)", main= "DBH vs. Heartwood & Sapwood")
points(mydata$dbh, mydata$SapArea, pch= 19, col= "darkgreen")
legend("topleft", legend= c("Heartwood", "Sapwood"),
       pch= c(17, 19), col= c("darkred", "darkgreen"), bg= "gray95")
@
\caption{Quick and dirty visual exploration of data.\label{fig:expplot}}
\efig
Fig.~\ref{fig:expplot} shows examples of various types of exploratory plots. Use the code above to replicate it. Look at \code{mydata} and explore some relationships for yourself graphically (\eg do species differ in sapwood depth?). Upon visual inspection of Fig.~\ref{fig:expplot}b it looks like they do. To confirm, you would then have to show that this difference is statistically significant (see \S~\ref{sec:stats}). From the scatterplot it also appears there is a relationship between dbh and the two chosen tree characteristics. A correlation or linear regression analysis would bear this out statistically (see again \S~\ref{sec:stats}). Lastly, the \code{qplot()} function (from the \code{ggplot2} package, which will be covered in more depth later in the week) allows us (among other things) to plot each species in a separate color \emph{or} in its own separate subplot.
In the first case shown in Figure~\ref{fig:xyplot}, we specify that \code{SapDepth} and \code{dbh} are the $x$ and $y$ coordinates respectively, we tell \R to look within the data frame \code{mydata} for the variables, and we specify that the points should be colored according to species. In the second case, we say that the plot should be divided into \emph{facets} according to species (we have to specify it as a formula, which is why we say \verb+~spp+ instead of just \code{spp}). Try it out for yourself with different dependent and independent variables. What happens if you use a two-sided formula for the facets (\verb+Infected~spp+) instead of just \verb+~spp+ as your grouping factor? (You will need the \code{ggplot2} package installed and loaded to use \code{qplot}.)
\fig[H]
\centering
% fake
<>=
qplot(SapDepth, dbh, data=mydata, color=spp)    ## left subfigure
qplot(SapDepth, dbh, data=mydata, facets=~spp)  ## right subfigure
@
<>=
library(ggplot2)
library(gridExtra)
grid.arrange(qplot(SapDepth, dbh, data=mydata, color=spp),
             qplot(SapDepth, dbh, data=mydata) + facet_wrap(~spp),
             ncol=2)
## +facet_wrap(~spp)
## xyplot(SapDepth ~ dbh | spp, data= mydata,
##        xlab= "dbh (cm)",
##        ylab= "Sapwood Depth",
##        pch= 19,
##        col= "navy",
##        as.table= TRUE)
@
\caption{More quick and dirty visual exploration of data.\label{fig:xyplot}}
\efig
% The function \code{coplot()} produces similar graphics output to
% \code{xyplot()}, however it may be more appropriate for multivariate data,
% especially where the ``organizing'' variable (spp in Fig.~\ref{fig:xyplot}) is
% continuous (\eg temperature, altitude, etc.). Experiment with \code{coplot()}
% on the above instead of \code{xyplot()} and compare.
\item Make a Barplot of the mean \emph{Area Without Bark}, first by \code{season} and then by \code{species}. The conclusions from these results are meaningless, but you'll soon have a fairly complete script for making nice \R plots.
I will start you off with the season; can you follow what is happening at each step? \code{tapply()}%
%
\footnote{Similar to \code{apply()} from above, but can be applied to 1D objects. It can be viewed as a combination of the \code{table()} and \code{apply()} functions.}
%
is an extremely useful function that applies a function to the values in a vector, grouped by one or more grouping variables (\eg factors), and produces a table showing the result of that function for each group. This sounds complicated but it really isn't. To illustrate, try typing \code{tapply(mydata\$dbh, mydata\$season, sd)}. What did \R return? All I'm doing below is telling \R to take that table and treat it as a vector instead. Now complete the Barplot exercise for season, and then repeat for species or another dependent variable.
<>=
Nobark.season <- tapply(mydata$NobarkArea, mydata$season, mean)
x.axis <- levels(mydata$season)
barplot(Nobark.season, names= x.axis, ylim= c(0, max(Nobark.season)))
@
Notice that when defining the vector containing the means, \code{tapply()} combines them from left to right \emph{alphabetically}. This is good because in the next step I define the x-axis with the \code{levels()} function, which also arranges its levels alphabetically. Otherwise each mean value would be decoupled from the season to which it corresponds! Pay attention to these kinds of operations: always double-check against known values (do they look right?) and always scrutinize the graph (does it meet your expectations?).
\een
%%%% \np %%
%%%%%%%%%%%%%%%%%%%%%
\section{Vectors} \label{sec:vectors}
%%%%%%%%%%%%%%%%%%%%%
A vector is basically just a list of values, all of the same type or class (numeric, logical, character). The numeric vector is a list of numbers and is one of the most often manipulated objects in \RR. In fact, a single number in \R is treated as a vector of length 1.
They are used for everything from calculating means, plotting graphs, and storing outputs of functions, to simply storing parameters for later use within a larger program. It is essential to master their manipulation. Some \R functions create vectors, but you can also create/assign vectors yourself. For example:
<>=
v1 <- c(1, 2, 7, 8, 9, 13)
@
Simply typing \code{1, 2, 7, 8, 9, 13} without \code{c()} results in an error. In addition, vector elements can be named. For example:
<>=
v2 <- c(first = 1, second = 1.2, third = 1.6, fourth = 2.1)
@
There are numerous ways to refer to elements in vectors (\emph{vector addressing}):
\bit
\item by position: \code{v1[1]} or (multiple elements) \code{v1[c(1,2)]}
\item by name: \code{v2["first"]}; only if you've already assigned element names as in \code{v2}
\item by exclusion: \code{v1[-1]} drops the first element
\item with logical vectors: \code{v2[c(TRUE,TRUE,FALSE,FALSE)]} gives the first two elements only; more usefully, \code{v2[v2>1.2]} first computes a logical vector \code{[FALSE,FALSE,TRUE,TRUE]} and then uses it to select the last two elements only.
\eit
Most of \RR's functions are \emph{vectorized}; if you give them vectors they automatically do the right thing (which can frustrate \pkg{MATLAB} users!). Operations on two vectors, such as \code{c(1,3,5,7) + c(2,4)}, automatically replicate (``recycle'') the shorter vector until it is as long as the longer one, then carry out the calculation -- with a warning message if the longer vector's length is not an even multiple of the shorter's. For example:
<>=
v1
3 * v1
v1 + c(3,2,1)
v1 * c(2,4)
v1 * c(1,2,3,4)
@
Some of the most basic and useful functions for manipulating vectors are in Table~\ref{tab:vecs}. \textbf{Remember:} operations and functions are generally applied to vectors in \R on an \emph{element-by-element} basis.
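As a quick sketch of the recycling and logical-indexing rules just described (using standalone toy vectors, not \code{mydata}):

```r
## recycling: the shorter vector is repeated to match the longer one
v <- c(1, 3, 5, 7)
v + c(2, 4)        # same as c(1+2, 3+4, 5+2, 7+4), i.e. 3 7 7 11

## logical indexing: build a TRUE/FALSE vector, then use it to subset
keep <- v > 4      # FALSE FALSE TRUE TRUE
v[keep]            # 5 7
```

Note that no loop is needed in either case; the element-by-element behavior is automatic.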
%%%%%%%%%%%%%%%
\begin{table}[H]
\centering
\belowcaptionskip=0.25cm
\caption{A few useful commands for creating and modifying vectors in \RR.}
\begin{tabular}{l l}
\hline \hline
\code{1:x} & Vector from 1 to $x$ by increments of 1 \\
\code{seq(from,to,by=x)} & Vector of values in increments of $x$ (if $x < 0$, decreasing) \\
\code{seq(from,to,length=x)} & Vector of $x$ evenly spaced values \\
\code{c(u,v,\dots)} & Combine numbers and/or vectors into a single \\
 & vector (from left $\to$ right) \\
\code{rep(a, b)} & Repeats the elements of $a$, $b$ times ($b$ may be a vector of counts) \\
\code{length(x)} & Number of entries/elements in vector $x$ \\
\code{abs(x)} & Absolute value of $x$ \\
\code{cos(x),sin(x),tan(x)} & Cosine, sine, tangent of angle $x$ in radians \\
\code{exp(x)} & Exponential function, $e^x$ \\
\code{log(x)} & Natural logarithm of $x$ ($\log_e$) \\
\code{log10(x)} & Common logarithm of $x$ ($\log_{10}$) \\
\code{sqrt(x)} & Square root of $x$ \\
\code{head(x,n)} & Returns the first $n$ entries of vector $x$ \\
\code{tail(x,n)} & Returns the last $n$ entries of vector $x$ \\
\code{round(x,n)} & Round entries of $x$ to $n$ decimal places \\
\hline
\end{tabular}
\label{tab:vecs}
\end{table}
%%%%%%%%%%%%
%%%%%%%%%%%%%%
\subsection*{Exercises} \label{sec:vector_ex}
%%%%%%%%%%%%%%%
\benN
\item Create a vector named \code{v3} with the numbers: \code{2,5,8,2,1,9,4,7,5,6}.
\item Create a vector with the numbers: \code{2,2,2,5,5,9,9,8,8,8,8}. Try to use the \code{rep()} command and call it \code{v4}.
\item Create a vector called \code{v5} that ranges from 0 to 6.5 by increments of 0.5. How many entries are in this vector? (I don't want to see anyone pointing at the screen and counting!).
\item Understand why \code{rep(1:5, 1:5)} and \code{rep(1:6, c(1, 2, 3, 3, 2, 1))} produce what they do.%
%
\footnote{The \code{each=} argument within \code{rep()} can also be very useful.}
%
\item Create a vector called \code{all.vecs} that includes all the numbers you just created (in \code{v3,v4,v5}) in one long vector (any order).
\item Create a vector of 15 random numbers from 50 $\to$ 150 and name it \code{mass}.%
%
\footnote{See Table~\ref{tab:distns} for more information about random number generation.}
%
In this case assume a uniform distribution.
\item Assuming this is the yearly increase in mass for some treatment of interest, create a new vector called \code{mass.d} representing the \emph{daily} weight gain. Now create a new vector called \code{mass.r} with these daily values rounded to two decimal places. What is the average daily weight gain?
\item As I said in \S~\ref{sec:exploring}, I think \code{sort()} is more appropriate for stand-alone vectors, so try using it on \code{mass} (both increasing and decreasing).
\item Create a new vector of length 7 with
\binpI
\item the minimum value
\item maximum value
\item mean
\item sum
\item median
\item and range
\einp
of \code{mass.r}.
\item Use the entries in the vector above to multiply the mean daily weight gain by the total weight gain, divided by the square root of its median, plus the log of the minimum daily gain. I cannot imagine why you would want to do this, but you get the point: you can use vectors to make calculations and then use those vector entries in subsequent calculations. For example, you could place all the parameters of a model into one vector, then call them independently by the vector addressing we learned above (\ie \code{[]} brackets).
\item Now let's use the imported tree data to do something. The process is similar to above.
Assume individual leaf area is a function of numerous other tree characteristics and can be calculated as follows:
\Eqn \label{eqn:leafarea}
\text{Leaf Area} = \big( 0.1 \cdot \delta ^{(1-\beta)} \big) \ + \ \sqrt{\frac{\tau + 1}{\sigma}}
\eqn
where $\delta$, $\beta$, $\tau,$ and $\sigma$ are tree diameter (\code{dbh}), bark thickness, area without bark, and sapwood area respectively. This would be similar to writing an equation in \code{Excel} and then copying it down the column. \textbf{Note}: you could use either \code{sqrt()} or \verb+^0.5+, but for cube roots or higher you must use the ``fractional exponent'' syntax. Create an object called \code{LeafArea} that represents Eqn~\eqref{eqn:leafarea}. You will use this object later in this tutorial. The resulting vector should look like this:
\een
<>=
LeafArea <- 0.1*mydata$dbh^(1-mydata$BarkThick) +
            sqrt((mydata$NobarkArea+1)/mydata$SapArea)
@
<>=
LeafArea
@
%%%% \np %%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Generating Random Numbers} \label{sec:rvec}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\R provides great flexibility for generating random sequences of numbers from various distributions. Table~\ref{tab:distns} shows the major functions used to produce numbers according to these distributions.
\begin{table}[H]
\centering
\belowcaptionskip=0.25cm
\caption{Random number generation in \R from various distributions.}
\begin{tabular}{l l}
\hline\hline
\code{runif(n,min,max)} & $n$ random numbers from a uniform distribution \\
\code{rnorm(n,mean,sd)} & $n$ random numbers from a Normal distribution \\
 & with given mean and standard deviation \\
\code{rpois(n,lambda)} & $n$ random numbers from a Poisson distribution \\
\code{rbinom(n,size,p)} & $n$ random numbers from a Binomial distribution; \\
 & size = \# trials; p = probability of ``successful'' trial \\
\code{rnbinom(n,size,mu)} & Same as above except Negative binomial distribution \\
 & (\code{mu} = alternative parameterization via the mean; \code{mu} must be specified by name) \\
\code{rgamma(n,shape,scale)} & $n$ random numbers from a Gamma distribution \\
\hline
\end{tabular}
\label{tab:distns}
\end{table}
Of course you can also obtain the probability density, cumulative distribution, and quantile functions for each of these distributions by using \code{d}, \code{p}, or \code{q} in place of \code{r}. For example, \code{dbinom}, \code{pbinom}, and \code{qbinom} give the probability density, distribution, and quantile functions respectively for a Binomial distribution with a given set of parameters.
%%%%%%%%%%%%%
\subsection*{Exercises}
%%%%%%%%%%%%%
\benN
\item It is important to get some experience ``playing'' with these functions, especially what they look like and what they're used for. Look at Fig.~\ref{fig:normaldist}, which shows what each of these functions does. Try to reproduce it using the \code{curve()} and \code{hist()} functions.%
%
\footnote{These functions are introduced in \S~\ref{sec:plots}. Look ahead for clues.}
%
Below is the code for the first plot to get you started:%
%
\footnote{Modified from Crawley, M.J. (2007). \emph{The R Book}. John Wiley \& Sons. pp.
950.}
%
% Just to show the 1st one; not all figs present here
<>=
par(mfrow=c(2,2))
curve(dnorm, from= -4, to= 4, xlab= "z", ylab= "Probability Density",
      main= "Density", col= "darkgreen", lwd=2)
@
\fig[H]
\centering
<>=
par(mfrow=c(2,2), oma= c(0, 0, 3, 0), bg="white")
curve(dnorm, -4, 4, xlab= "z", ylab= "Probability Density",
      main= "Density", col="darkgreen", lwd=2)
curve(pnorm, -4, 4, xlab= "z", ylab= "Probability",
      main= "Probability", col="darkgreen", lwd=2)
curve(qnorm, 0, 1, xlab= "p", ylab= "Quantiles (z)",
      main= "Quantiles", col="darkgreen", lwd=2)
set.seed(101)
hist(rnorm(1000), xlab= "z", ylab= "Frequency",
     main= "Random Numbers", col="darkgreen")
mtext("The Normal Distribution", outer= TRUE, cex= 1.5, font=2)  # main title
@
\caption{Reproduced shamelessly from \emph{The R Book} (p. 219).\label{fig:normaldist}}
\efig
\item What does the following \R code produce and why? Can you predict what the plot will display before you run it? It is often convenient when generating random numbers to set the \emph{seed} using \code{set.seed()} to ensure that the \emph{same} random numbers are generated each time. Doesn't that make it \emph{non}-random? Yes, but it's good for ``debugging'' and code verification (and making sure students get the same result you do!). The seed can be any arbitrary integer (666 produced a nice histogram). Can you produce something similar, but with \emph{SapDepth} as in Fig.~\ref{fig:expplot}a?
<>=
set.seed(666)
nD <- rnorm(1000, mean=400, sd=25)
hist(nD, main= "", xlab= "nD", ylab= "Probability Density",
     col= "gray88", freq= FALSE)
curve(dnorm(x, mean(nD), sd(nD)), from= min(nD), to= max(nD),
      col= "navy", lwd= 2, add= TRUE)
@
\een
%%%% \np %%
%%%%%%%%%%%%%%%%%%%%%%
\section{The Matrix} \label{sec:matrix}
%%%%%%%%%%%%%%%%%%%%%%
Besides evoking images of bullets flying in 3D surround slo-mo, the matrix (or 2-dimensional array) can be put to good use in \RR.
Matrices have obvious uses in population projections and classical matrix algebra, but be careful! \RR's default multiplication is the entry-wise product%
%
\footnote{The Hadamard product. Remember, the default in \R is almost always entry-wise.}
%
using the \code{*} symbol. For proper matrix multiplication you will want to use \verb+%*%+ (see Table~\ref{tab:matrix}). If nothing else, matrices are useful for storing the output of a given program/function, which can then be used for some other purpose (\eg plotting).
%%%%%%%%%%%%%%%%
\begin{table}[H]
\centering
\belowcaptionskip=0.25cm
\caption{A few important functions for working with matrices in \RR.}
\begin{tabular}{l l}
\hline \hline
\code{matrix(v,m,n)} & $m \times n$ matrix using the values in $v$ \\
\code{nrow(),ncol()} & Number of rows and columns of a data frame or matrix \\
\code{t(A)} & Transpose (exchange rows and columns) of matrix \textbf{A} \\
\code{dim(A)} & Dimensions of matrix \textbf{A} (\code{dim(A)[1]} = rows, \code{dim(A)[2]} = cols) \\
\code{edit(A)} & Call up ``spreadsheet'' interface to edit the values in \textbf{A} \\
\code{diag(v,n)} & A diagonal $n \times n$ matrix with $v$ on the diagonal, 0 elsewhere \\
 & (by default $v$ = 1, so \code{diag(n)} gives an $n \times n$ identity matrix; \\
 & if $v$ is a vector it is placed on the diagonal and $n$ = \code{length(v)}) \\
\code{cbind(a,b,c,\dots)} & Combine \textbf{compatible} objects by attaching them by columns \\
 & (\eg a data frame with only numeric data) \\
\code{rbind(a,b,c,\dots)} & Same as \code{cbind} but attaching them by rows (can also bind \\
 & multiple matrices in this manner if dimensions match) \\
\code{as.matrix(x)} & Convert object of another type to a matrix (if possible) \\
\code{outer(v,w)} & Outer product of vectors \code{v}, \code{w}: the matrix whose $(i, j)^{th}$ \\
 & element is \code{v[i]*w[j]} \\
\code{A[,n]} & Returns a vector of column $n$ \\
\code{A[m,]} & Returns a vector of row $m$ \\
\code{A + B} & Matrix addition (or \code{-} for subtraction); entrywise \\
\code{sum(A)} & Sum of all elements of matrix \textbf{A} \\
\code{A \%*\% B} & Matrix multiplication ($A \times B$) \\
\code{A * B} & Entrywise multiplication; the Hadamard product \\
\code{det(A)} & The determinant of matrix \textbf{A} \\
\code{eigen(A)} & The eigenvalues and right eigenvectors of matrix \textbf{A} \\
\hline
\end{tabular}
\label{tab:matrix}
\end{table}
%%%%%%%%%%
%%%%%%%%%%%%%
\subsection*{Exercises}
%%%%%%%%%%%%%%
\benN
\item Use the construction \code{matrix(v, nrow, ncol)} to create the matrix in Eqn~\eqref{eqn:ex1} where $v$ is a data vector created using the \code{seq()} function from Table~\ref{tab:vecs}. You will probably need to modify the function with the argument \code{byrow=TRUE}.
\Eqn \label{eqn:ex1}
\textbf{Mat} = \Mat 16 & 12 & 8 \\ 4 & 0 & -4 \\ -8 & -12 & -16 \\ \mat
\eqn
\item Create the matrix \textbf{A} below in Eqn~\eqref{eqn:A}. You can do this either by using the \code{c()} command and typing the entries individually, or by first creating a vector with the entries you want and then passing that vector as the first argument to the \code{matrix()} function. Try it both ways.
\Eqn \label{eqn:A}
\textbf{A} = \Mat 2 & 3 & 7 & 8.8 & 11 \\ 3 & 4.3 & 8 & 9 & 12 \\ 8 & 16 & 0.1 & 5 & 9 \\ 5 & 0.4 & 9 & 1.7 & 3 \\ 0.7 & 6 & 5.9 & 4 & 7 \mat
\eqn
\item Use \code{dim()} to determine the dimensions of \textbf{A}. The output is a two-element vector (rows, columns), which is the standard for \R (rows first, then columns); this example is perhaps not ideal, however, since nrows = ncols (5 $\times$ 5). Alternatively, you may use \code{nrow()} or \code{ncol()} instead of \code{dim()[1]} and \code{dim()[2]} respectively. These commands are especially useful when writing functions (see \S~\ref{sec:functions}).
\item Use the \code{rowSums()} command to create a vector that contains the totals of each row of \textbf{A}. Now do the same for the columns; can you guess what the command is called?
It might be useful to give these vectors names. Now try using the \code{apply()} function from \S~\ref{sec:intro} to do the same thing.
\item Now use the commands \code{cbind()} and \code{rbind()} to attach these two vectors of sums, creating a new matrix \textbf{B} that looks like Eqn~\eqref{eqn:B}.
\Eqn \label{eqn:B}
\textbf{B} = \Mat 2 & 3 & 7 & 8.8 & 11 & 31.8 \\ 3 & 4.3 & 8 & 9 & 12 & 36.3 \\ 8 & 16 & 0.1 & 5 & 9 & 38.1 \\ 5 & 0.4 & 9 & 1.7 & 3 & 19.1 \\ 0.7 & 6 & 5.9 & 4 & 7 & 23.6 \\ 18.7 & 29.7 & 30.0 & 28.5 & 42 & 148.9 \mat
\eqn
\item Add the vector you created earlier called \code{LeafArea} to \code{mydata} as a column, in a similar fashion to the above.
\item It's sometimes necessary to export your data frame or matrix. Use the \code{write.csv()} command to do this (it will go to your home directory). Try exporting matrix \textbf{B} to a \code{.csv} file called \code{MatrixSum.csv}.%
%
\footnote{If you do export a modified data frame, be sure to give the \emph{new} modified data set a unique file name. It's potentially disastrous to have to re-enter your data (so be careful). It's usually possible, and preferable, to reshape/manipulate your data \emph{within} the \R environment using commands at the beginning of your script, as opposed to overwriting \code{*.csv} files.}
%
If you get stuck, don't forget about \code{?write.csv}.
\item Matrix population models use projection matrices to estimate future populations: a set of difference equations is condensed into a matrix containing the vital rates of the various stages/classes. The simplest of these models takes the form:\label{q:map}
\Eqn \label{eqn:map}
\vec x_{(t+1)} = \textbf{M} \; \cdot \; \vec x_{(t)}
\eqn
where $\vec x$ is a vector containing the number of individuals in a given age, stage, or class and \textbf{M} is the projection matrix. Consider the projection matrix \textbf{M} in Eqn~\eqref{eqn:M}, which contains 3 classes.
The diagonal represents the survival rates of the classes, the sub-diagonal (0.1 and 0.3) represents the transition probabilities (\ie class 1 $\to$ 2 and class 2 $\to$ 3), and the top row represents the fecundity of each class (the number of class 1 individuals produced per individual of class 2 or 3). Class 1 does not reproduce, but if it did, entry \textbf{M}[1,1] would be $s_1 + f_1$.
\Eqn \label{eqn:M}
\textbf{M} = \Mat s_1 & f_2 & f_3 \\ t_1 & s_2 & 0 \\ 0 & t_2 & s_3 \mat = \Mat 0.3 & 1.0 & 3.0 \\ 0.1 & 0.4 & 0.0 \\ 0.0 & 0.3 & 0.8 \mat
\eqn
Assuming $\vec x_1$ = [1 \ 2 \ 4], calculate $\vec x_2$. Be careful: remember how \RR's matrix multiplication works (you'll need to use \verb+%*%+). We'll come back to Eqn~\eqref{eqn:M} later.
\een
%%% \np %
%%%%%%%%%%%%%%%%%%%%
\section{Loops} \label{sec:loops}
%%%%%%%%%%%%%%%%%%%%
Loops are essential in any programming language; master them and you're well on your way. There are two basic loops in \RR, the \code{for}-loop and the \code{while}-loop. The \code{for}-loop runs through the loop a fixed number of iterations, then exits the loop and continues with the rest of the program. The \code{while}-loop continues looping until some predetermined criterion (the \code{condition}) is met (\eg as long as $x \le 1000$, keep looping). In \code{while}-loops it is often a good idea to place a counter within the loop (see below) so that later you can determine how many iterations were required to exit the loop. In this sense, \code{for}-loops could be considered a subset of \code{while}-loops (if the counter were used in the condition statement). In addition, the variable defining the condition must appear somewhere within the loop and must change as the loop iterates (otherwise you'll enter the loop and never get out!).
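The ``\code{for}-loops are a subset of \code{while}-loops'' point can be sketched directly (a toy example with a hypothetical counter \code{i}, not one of the exercises):

```r
## a for-loop counting to 5...
out1 <- c()
for (i in 1:5) {
  out1 <- c(out1, i^2)
}

## ...is equivalent to a while-loop whose condition uses a counter
out2 <- c()
i <- 1
while (i <= 5) {        # the counter appears in the condition...
  out2 <- c(out2, i^2)
  i <- i + 1            # ...and must change inside the loop
}
identical(out1, out2)   # TRUE
```

If the line \code{i <- i + 1} were omitted, the condition would never change and the \code{while}-loop would run forever.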
Below is an example of a simple \code{for}-loop:
<>=
Popn <- 1    ### create an initial population size
Gen <- 10    ### how many generations
for (n in 2:Gen) {
  Popn[n] <- Popn[n - 1] * 2
}
Popn
@
After calling \code{Popn}, this should look familiar: the population appears to exhibit exponential growth. \textbf{Note} the values over which you are asking \R to iterate (\code{2:Gen}). Have you seen this construction before? What do you get when you enter \code{2:10} in the console? This also means it is possible to skip iterations by using something like \code{for (i in seq(2,10,2))}.
\noi Here is a similar example to the above using a \code{while}-loop:
<>=
pop.vec = pop.now <- 1; count <- 1
while (pop.now <= 25000) {
  pop.now <- pop.now * 2
  pop.vec <- c(pop.vec, pop.now)
  count <- count + 1
}
pop.vec
@
So essentially what you're telling \R is this: start with a current population of 1; the storage vector that will contain the population trajectory is also 1 (since the simulation hasn't started yet); and I've added a counter so that after the simulation I can see how many iterations were required before the condition (population $\le 25000$) failed. Notice that I've used the \code{c()} function to \emph{combine} the previous population sizes with the current population size; \R is essentially adding a new number (\code{pop.now}) to an ever-growing population vector (\code{pop.vec}) until the population exceeds 25000. Now call \code{pop.vec} and \code{count} to see the simple projection and how many iterations it took. Now, you can modify this code to test the classic expression, ``double a penny every day for a month and you'll be a millionaire.'' Is this true? How long does it take to become a millionaire? Lastly, you can also put loops within loops (aka \emph{nested} loops), which is quite common but no more complicated than regular sequential loops; just keep track of what you're doing as they can get messy.
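A minimal sketch of a nested loop (a toy multiplication table, deliberately not one of the exercises):

```r
## fill a 3 x 4 matrix one cell at a time with a loop inside a loop
tab <- matrix(0, nrow = 3, ncol = 4)
for (i in 1:3) {          # outer loop: rows
  for (j in 1:4) {        # inner loop: columns
    tab[i, j] <- i * j    # each cell is row index times column index
  }
}
tab
```

The inner loop runs to completion once for every single iteration of the outer loop, which is exactly why bookkeeping gets messy as nesting deepens.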
Also, people tend to use \code{n} or \code{i} as counters in \code{for}-loops (I don't know why), but the counter could be any letter, or even any combination of letters you want.
%%%%%%%%%%%%%
\subsection*{Exercises}
%%%%%%%%%%%%%
\benN
\item Try adding some ``noise'' or stochasticity to the first population growth model (the \code{for}-loop) using some random number generation, then roughly plot the results. We'll do more intense plotting in \S~\ref{sec:plots}. For now try:
<>=
plot(1:Gen, Popn, type="b")
@
\fig[H]
\centering
<>=
par(mfrow=c(1,2))
plot(1:Gen, Popn, type= "b")
plot(1:Gen, jitter(Popn, amount=50), type= "b", pch= 19)
@
\caption{Quick graph of exponential population growth using a simple \code{for}-loop.\label{fig:Popn}}
\efig
Alternatively, you could add noise to a data vector using the \code{jitter()} function.
\item Use a loop to create a vector representing a Fibonacci sequence running from 0 $\to$ 610. I suggest starting with \code{Fib.v <- c(0,1)}, then adding to this vector at each iteration of the loop. In the end it should look like this:
<>=
Fib.v <- c(0,1)
while (max(Fib.v) < 610)
  Fib.v <- c(Fib.v, Fib.v[length(Fib.v)-1] + Fib.v[length(Fib.v)])
@
<>=
Fib.v
@
Then:
\benI
\item Use this vector to create the following matrix:
<>=
Fibonacci1 <- matrix(Fib.v, 4, 4)
Fibonacci1
@
\item Use the \code{sample()} function to create a matrix of random Fibonacci numbers (see \code{?sample} to determine the proper syntax if necessary).
For example:%
%
\footnote{Yours won't be \emph{exactly} the same because of random sampling!}\\
%
<>=
Fibonacci2 <- matrix(sample(Fib.v, length(Fib.v), replace=FALSE), 4, 4)
Fibonacci2
@
\een
\item Write a script using a \emph{nested} \code{for}-loop which produces the following $5 \times 5$ projection matrix:%
%
\footnote{It is not absolutely necessary to use a nested \code{for}-loop to accomplish this but it's a good exercise.}
%
\Eqn \label{eqn:nested}
\textbf{NestMat} = \Mat 0 & 1 & 2 & 3 & 4 \\ 0.1 & 0 & 0 & 0 & 0 \\ 0 & 0.2 & 0 & 0 & 0 \\ 0 & 0 & 0.3 & 0 & 0 \\ 0 & 0 & 0 & 0.4 & 0 \mat \nonumber
\eqn
\item Use a loop and Eqn~\eqref{eqn:M} to project the 3-class population described above for 20 generations (\ie $\vec x_i$, $i=1\dots20$). You will need a secondary matrix to store the results of the iterations (and then to plot them in \S~\ref{sec:plots}). Use Exercise~\ref{q:map} in \S~\ref{sec:matrix} to guide you.
\een
%%% \np %
%%%%%%%%%%%%%%%%%%%%%%
\section{Plotting and Graphing} \label{sec:plots}
%%%%%%%%%%%%%%%%%%%%%%
Remember what I said in \S~\ref{sec:exploring} about advisors and graphs. That aside, plots are an excellent way to summarize data visually and are essential if you're ever going to publish your data. Luckily, \R makes great high-quality plots of all types, colors, and varieties. I also recommend you figure out how to save your plots as a PDF so you can send them to your advisor \Smiley.
\begin{table}[H]
\centering
\belowcaptionskip=0.5cm
\renewcommand{\arraystretch}{2}
\tabcolsep=0.33cm
\caption{List of the 8 numeric colors used by \RR. Thereafter, numeric colors are recycled, thus \code{col=9} is again \code{"black"}. There are many, many other colors available using a string argument (\ie \code{"midnightblue"}): see \code{colors()}.}
\begin{tabular}{l l | l l | l l}
\hline \hline
No. & Color & No. & Color & No.
& Color \\ \hline
1 & Black & 4 & \blue{Blue} & 7 & \color{yellow}{Yellow} \\
2 & \red{Red} & 5 & \color{cyan}{Cyan} & 8 & \color{light-gray}{Gray} \\
3 & \green{Green} & 6 & \color{magenta}{Magenta} & 9 & Black \\
\hline
\end{tabular}
\label{tab:cols}
\end{table}
One of the simplest ways to view, visually explore, or summarize data is to plot them. The usual suspects are barplots, boxplots, scatterplots, and pie charts; depending on your audience, each of these can be useful. I can never remember the individual numeric codes for the various \textbf{points}, \textbf{line types}, and \textbf{colors} available in \R plotting functions, so Fig.~\ref{fig:pchsymbols} below is a cheat-sheet:
<>=
plot(1:25, 1:25, xlab="", ylab="", pch=1:25, col=1:25, cex=2)
grid(lty=1, col="gray90")
points(1:25, 1:25, pch=1:25, col=1:25, cex=2)
title("Plotting symbol, line type, & color codes")
legend("topleft", legend=1:6, lty=1:6, lwd=1.5, ncol=2, bg="gray95")
legend("bottomright", legend=1:8, col=1:8, ncol=3, pch=19, bg="gray95")
@
\fig[H]
\centering
<>=
<>
@
\caption{Basic numeric plotting codes used by \RR. For points, the symbols are accessed using the \code{pch=} argument and range $1 \to 25$ (some are redundant). Line types are shown in the legend, are accessed using the \code{lty=} argument, and range $1 \to 6$.
Color codes (see Table~\ref{tab:cols}) range from $1 \to 8$ and are accessed using the \code{col=} argument.\label{fig:pchsymbols}}
\efig
First create some dummy data to plot:
<>=
set.seed(876)
InfStatus <- factor(sample(c("Susceptible","Infected","Recovered"),
                           size= 50, replace= TRUE))
I <- table(InfStatus); I
Genotype <- factor(sample(c("RR","Rr","rr"), size= 50, replace= TRUE))
G <- table(Genotype); G
table(Genotype, InfStatus)
@
\fig[H]
\centering
<>=
par(mfrow=c(2,2), mar=c(3,2,2,1), oma= c(0,0,3,0), bg="white")
plot(InfStatus, ylim= c(0,27)); box()   ### or barplot(I); box()
barplot(table(Genotype, InfStatus), ylim= c(0,13), beside= TRUE); box()
legend("topright", c("RR","Rr","rr"),
       fill= c("gray40","gray70","gray90"), ncol= 1, cex= 0.75)
boxplot(rnorm(50, mean=15, sd=3) ~ Genotype, col= "gray75")
pie(G, col= c("gray50","gray70","gray90"))
mtext("Basic R Plots", outer= TRUE, cex= 1.5, font= 2)  # main title
@
\caption{Examples of a few basic exploratory plots (and code syntax) for barplots, boxplots, and pie charts in \RR.\label{fig:dummyplots}}
\efig
There are literally \emph{endless} ways to modify graphics plots in \R, including colors, axis types, line types, adding a legend (as above), adding a grid, background colors, labels, borders, etc. What happens if you change the \code{beside} argument to \code{FALSE} in Fig.~\ref{fig:dummyplots}b? The help pages for most of these functions can point you in the right direction for the syntax of what you would like to do. You may also want to use \code{par()} to modify your plots; type \code{?par} to see how truly massive this help file is! Odds are you'll find what you need buried in there, somewhere.
%%%%%%%%%%%%%
\subsection*{Exercises}
%%%%%%%%%%%%%
\benN
\item Boxplots can be a particularly good way of presenting data and exploring relationships without losing a lot of important information about the distribution of the data (especially for continuous data; I don't like barplots for this reason).
Take a look at Fig.~\ref{fig:expplot}b and reproduce it for another variable in the data set. Next, say you want to explore the possibility of a relationship between season, infection, and some other continuous variable (perhaps trees become infected at a particular time of year as a function of this variable?). Make a doubly grouped boxplot to explore this possibility.
\item Symbols plots are also a great way to add information about the data points of a basic scatter plot. The \code{symbols()} function is great for this, and is especially good for geographical data where $x$ and $y$ are grid coordinates.%
%
\footnote{Especially longitude and latitude.}
%
%
Read the data file \code{InfectionByCounty.csv} into \R and make a geographical plot of infection counts for these data with blue solid circles and gray edges.
\item Now let's generate some fake data for a typical infectious disease outbreak. Let's assume that if an individual becomes infected they are infected for life (\ie no recovery). We define the model by assuming some general underlying deterministic function, then assume some stochasticity about this deterministic function. We assume the variation around this model is normally distributed and will use the \code{rnorm()} function.
<>=
set.seed(125)
t <- seq(1, 30, by= 0.5)
y.det <- 0.4 / (1 + exp(0.4 * (15 - t)))   # underlying deterministic model
p.inf <- rnorm(length(y.det), mean= y.det, sd= 0.02)   # add normal noise
plot(t, p.inf, main= "Outbreak XYZ", ylab= "Proportion Infected", xlab= "time")
@
% make the actual plot
\fig[H] \centering
<>=
par(mfrow= c(1,2))
set.seed(125)
<>
plot(t, p.inf, ylab= "Proportion Infected", pch= 19, main= "Outbreak XYZ",
     xlab= "time", col= 4); grid()
curve(0.4 / (1 + exp(0.4*(15 - x))), from= 0, to= 40, col= 6, lwd= 3, add= TRUE)
@
\caption{Some fake generated outbreak data.\label{fig:outbreak}}
\efig
Now plot these data; you should get something like Fig.~\ref{fig:outbreak}a.
You can use the \code{lines()} function to add lines to an existing plot, as shown in Fig.~\ref{fig:outbreak}b. See if you can replicate it using the \code{pch=} and \code{col=} arguments within the \code{plot()} function, along with the \code{grid()} function for gridlines.
\item Plot the population projection you produced in the final exercise of \S~\ref{sec:loops}. You should get something like Fig.~\ref{fig:3class}. Play with the arguments to get it to look like Fig.~\ref{fig:3class}, including the legend.
<>=
Gen <- 20
StoreMat <- matrix(0, Gen, 3)
pars <- c(0.3, 0.1, 0, 1, 0.4, 0.3, 3, 0, 0.8)
ipop <- c(1,2,4)
M <- matrix(pars, 3, 3)    ### Projection matrix from above
StoreMat[1,] <- ipop       ### place 1st row as initial population (1,2,4)
### Now the loop ###
for (n in 2:Gen) {
  StoreMat[n,] <- M %*% StoreMat[n-1,]
}
@
<>=
par(mfrow= c(1,1))
@
\fig[H] \centering
<>=
plot(1:Gen, StoreMat[,1], type= "l", col= 1, xlab= "No. of generations",
     main= "Three Class Population Example", ylab= "No. of Individuals", lwd= 2)
lines(1:Gen, StoreMat[,2], type= "l", col= "navy", lwd= 2)
lines(1:Gen, StoreMat[,3], type= "l", col= "darkred", lwd= 2)
legend("topleft", legend= c("Class 1","Class 2","Class 3"), lty= 1,
       col= c("black", "dark blue", "dark red"), lwd= 2, bg= "gray95")
@
\caption{Quick population projection of 3 class model.\label{fig:3class}}
\efig
\item \label{itm:popmat} \code{matplot()} is a very good function for quickly graphing multiple lines of data in the same plotting window. Your data frame or matrix must, however, be arranged in a particular format: each variable to be plotted must be arranged by column, and all must share the same independent variable (\eg time). Read \code{Pop6C.csv} into memory and use \code{matplot()} to produce Fig.~\ref{fig:matplot}. If you are having trouble importing \code{*.csv} files I have also included an \R script file of these data (\code{Pop6C.R}). Don't forget the gridlines and legend.
Are the horizontal gridlines%
%
\footnote{See \code{?grid}}
%
%
on top of the legend and plot-lines or beneath them?
\fig[H] \centering
<>=
matplot(1:20, popdata, type= "n", xlab= "Generations",
        ylab= "Number of Individuals per Class",
        main= "Population Projection for 6 Classes")
grid(NA, NULL, col= "gray90", lty= 1)
matplot(1:20, popdata, type= "l", xlab= "", lwd= 1.5, ylab= "", main= "", add= TRUE)
legend("topleft", legend= c("Class 1","Class 2","Class 3","Class 4","Class 5","Class 6"),
       lty= c(1:5), ncol= 2, col= c(1:6), bg= "gray95")
@
\caption{Using \code{matplot} to quickly graph simultaneous variables with the same independent variable (time/generations in this case).\label{fig:matplot}}
\efig
\item The next essential function for your \R toolbox is \code{curve()}, a \emph{magical} function that takes a mathematical function/expression and plots it over a given range of $x$. Most of the arguments are identical to the \code{plot()} function, so you don't have to learn many new ones. Graph the following functions with \code{curve()} (to start, try \code{from= -10, to= 10}), and finally use the \code{add= TRUE} argument to create the graph in Fig.~\ref{fig:outbreak}b.
\begin{eqnarray*}
f(x) &=& \frac{1}{1 + x^2}\\
f(x) &=& 2x^3 - 8x^2 + 2x + 6\\
f(x) &=& \frac{0.4}{1 + e^{0.4(15-x)}}
\end{eqnarray*}
\
\item Yet another useful plotting function is \code{abline(a, b)}. This nifty little function adds a straight line with intercept $= a$ and slope $= b$ to an existing plot (in school I learned \textbf{m} = slope and \textbf{b} = intercept, so I find it personally confusing to have $b$ = slope; just be careful). \code{bmline()} would make more sense, but that's just me \Smiley.
Familiarize yourself with the following:
\bit
\item \code{abline(5, 0.66)}
\item \code{abline(h = 4)}
\item \code{abline(v = 2)}
\eit
You must add the line to an \emph{existing} plot, but all the familiar arguments for plotting lines apply here: you can change its thickness, colour, type, etc., just as you would any other line drawn with \code{lines()}, \code{curve()}, or \code{plot()}. We'll be using \code{abline()} again later in \S~\ref{sec:stats}.
\een
%%% \np %
%%%%%%%%%%%%%%%%%%%%%%
\section{Creating Functions} \label{sec:functions}
%%%%%%%%%%%%%%%%%%%%%%
Writing your own functions (aka subroutines) is, in my opinion, the most useful tool/strategy in \RR, and much of \RR's power stems from this flexibility. Let's start simple with the generalized format of a function:
<>=
myfun <- function(x, y, z) {
  expression_1
  expression_2
  expression_n
  Output            ## if desired
  ## or list(Output)
}
@
In this very generic example, \code{x}, \code{y}, and \code{z} are the arguments for the function and will have to be provided by the user (you). They are essentially the ingredients the function requires to perform its job and can be vectors, scalars, matrices, or data frames (any supported \R object). It is also possible to provide the function with a default value for an argument by writing, for example, \code{function(x, y=5, z)}. If no value for \code{y} is provided, the function will use \code{y=5}. The expressions within the function describe what \code{myfun} will do to \code{x}, \code{y}, and \code{z} in order to produce what you want as an output. The last line usually contains what you want the function to return when you hit \code{Enter} (this must be described within the expressions). If there is more than one object you wish to output, use \code{list()}. The function \code{cat()} can be especially useful for printing some output during the execution of a function (useful for debugging or tracking a long simulation).
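To make the default-argument and \code{cat()} ideas concrete, here is a minimal sketch; the function name \code{addup} and its message are invented purely for illustration:

```r
## Toy example (invented names): 'y' has a default value of 5, and cat()
## prints a progress-style message before the result is returned.
addup <- function(x, y = 5) {
  cat("summing", length(x), "values, then adding y =", y, "\n")
  sum(x) + y
}
addup(1:10)         # y falls back to its default of 5
addup(1:10, y = 0)  # the default is overridden
```

Note that the message printed by \code{cat()} is a side effect; the value the function returns is still its last evaluated expression.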
In addition, an output line is not absolutely \emph{necessary}; include one only if you want \R to return something when you hit \code{Enter}. Let's try a very simple example. Below is a function that emulates one that already exists in \RR, the \code{mean()} function, so you can double-check it against the ``real'' function included in the \code{base} package installation of \RR.
<>=
xbar <- function(x) {
  mu <- sum(x)/length(x)
  mu
}
@
So, now to use the function you just created, simply type \code{xbar(vector)}, where \code{vector} is a predetermined list of numbers already in memory.%
%
\footnote{This function will also work on matrices because they can be easily converted into vectors.}
%
%
Type \code{ls()} and find a vector (or matrix) already in memory and execute your new function, then double-check it against the \code{mean(vector)} function provided with \RR. Here it is applied to \code{LeafArea} from \S~\ref{sec:vector_ex}.
<>=
xbar(LeafArea); mean(LeafArea)
xbar(Mx)
xbar(21:49)
@
Also notice that in the list of objects in memory, your new function \code{xbar} is now present. Lastly, it is important to remember that the variable names you choose in a function's arguments and expressions are \emph{not} saved as objects in \RR's global memory; they are used only \emph{locally} within the function to tell \R how to manipulate the arguments you've given it. As a result, the same names can be used elsewhere in your program (\ie in the above, \code{mu}, \code{x}, \code{y}, and \code{z} can already exist as objects, which is good because in a long program you may start running out of obvious abbreviations for your variables). This also means, however, that you \emph{cannot} use a function to reassign the value of a global variable. OK, one last example and then you're on your own. Say for some reason you wanted a function that created a matrix of random numbers.
There are numerous ways to do this; here is one:
<>=
rMat <- function(m, n, min, max, dec= 0) {
  random <- runif(m*n, min, max)
  A <- matrix(random, m, n)
  round(A, dec)
}
rMat(3, 7, 5, 37)
@
Line one tells \R that the new \code{rMat()} is going to be a function and defines the arguments that will be required for that function to work. The arguments \code{m} and \code{n} are the dimensions of the matrix (rows, cols), \code{min} and \code{max} define the range of numbers the matrix should contain, and \code{dec} is the number of decimal places to which the numbers within the matrix should be rounded (the default is zero decimal places). The second line creates a vector of uniform random numbers in the desired range using the \code{runif()} function already provided with \RR. Line three creates a matrix called \textbf{A} from those random numbers with the desired dimensions. The final line tells \R to return \textbf{A} with its elements rounded to the desired number of places. Play around with \code{rMat()} until you feel comfortable with the syntax of writing functions, then move on to the exercises.
%%%%%%%%%%%%%
\subsection*{Exercises}
%%%%%%%%%%%%%
\benN
\item Below is a function that produces 95\% confidence intervals (assuming a normal distribution). It uses Eqn~\eqref{eqn:se} to calculate the standard error of the mean, and returns the lower CI95, the mean, and the upper CI95 of any vector (\code{x}) provided as an argument.
\Eqn\label{eqn:se}
SE_m = \frac{sd}{\sqrt{n}}
\eqn
\np
<>=
CI95 <- function(x) {
  mean <- mean(x)
  sd <- sd(x)
  n <- length(x)
  se <- sd/sqrt(n)
  CI.vec <- c((mean-(1.96*se)), mean, (mean+(1.96*se)))
  names(CI.vec) <- c("2.5%", "Mean", "97.5%")
  CI.vec
}
@
Explore this function for a few minutes, then use it on a few of the vectors in \S~\ref{sec:vectors} or on any of the variables in \code{mydata} (\eg Heartwood area).
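As a quick self-contained check of \code{CI95()} (its definition is repeated here so the snippet stands alone), apply it to a simulated vector; the name \code{reps} and the simulation values are invented for illustration:

```r
## CI95() as defined above, repeated so this snippet is self-contained
CI95 <- function(x) {
  mean <- mean(x)
  sd <- sd(x)
  n <- length(x)
  se <- sd/sqrt(n)
  CI.vec <- c((mean-(1.96*se)), mean, (mean+(1.96*se)))
  names(CI.vec) <- c("2.5%", "Mean", "97.5%")
  CI.vec
}
set.seed(42)
reps <- rnorm(100, mean= 10, sd= 2)   # simulated data (illustrative)
CI95(reps)                            # lower CI95, mean, upper CI95
```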
Remember, the vector does not have to be named \code{x} per se; it can have any name, since \code{x} above is stored only \emph{locally} within the function itself (and is therefore not a global variable).
<>=
CI95(mydata$Heartwood)
@
\item Write a function that, given a vector of length 3 containing the numbers of individuals of each genotype [$AA$, $Aa$, $aa$], will produce the allele frequencies of the population (let's assume no sampling error). These are your bog-standard $p$'s and $q$'s from Hardy-Weinberg.
\item Write a function that incorporates the steps you used above in Eqn~\eqref{eqn:B} to produce matrix \textbf{B}, given an initial matrix \textbf{A} (\ie given a matrix, create a new matrix with an extra column and row containing the row and column sums of the original matrix).
\item Look at the function below. What does it do? Can you predict its output? First execute the function with its default arguments (\code{NormFun()}), then explore arguments of your choosing. Finally, modify the core of the function to explore areas of your own interest (BinomialFun?).
<>=
NormFun <- function(n= 1000, mu= 400, sdv= 25, brks= 25) {
  Y <- rnorm(n, mu, sdv)
  hist(Y, main= "", xlab= "", freq= FALSE, col= "gray88", breaks= brks)
  curve(dnorm(x, mu, sdv), col= "navy", add= TRUE)
}
@
\een
One last note on functions: it is common and often useful to use a function as an argument to yet another function (a function \emph{within} a function), analogous to the \emph{nested} loops in \S~\ref{sec:loops}. You will be doing this often in \S~\ref{sec:ode} below.
%%% \np %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Ordinary Differential Equations (ODEs)} \label{sec:ode}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Ordinary differential equations (ODEs) govern how one dependent variable (\eg population size or number of infectious individuals) changes relative to changes in some other independent variable (\eg time).
ODEs can be used to mathematically describe mechanistic, biological relationships between dependent and independent variables and to project these relationships in time in order to understand the dynamics of the dependent variables. Once you have determined the equations describing this mechanistic relationship, the next step is to investigate the dynamics of your model (\eg changes over time) for a given set of parameter values. This section will show you how to do just that in \RR. First, you will need to load the \code{deSolve} package, since we will be relying heavily upon \code{ode()} in this section. This function is the workhorse for ODEs and is peculiar in that, as mentioned briefly in \S~\ref{sec:functions}, it takes a function as one of its arguments.
\bit
\item \code{ode}'s main arguments are the starting values (\code{y}), the times at which you want to compute the values of the variables you are interested in (\code{times}), a derivative function (\code{func}), and some parameters (\code{parms}).
\item \code{func} must take as its \emph{first three} arguments the current time (\code{t}), the current values of the variables (\code{y}), and a vector containing the parameter values. It must also return a list (using \code{list(item1)}, where \code{item1} is the vector of derivatives calculated at the current time and system state; see below).
\eit
\noi Here are two examples to get you started:%
%
%%%%%%%%%%%%
\subsection{Logistic growth}
%%%%%%%%%%%%%%%%
We will start with logistic growth, partly because it has only one dependent variable (and one differential equation to solve) and partly because most of you are familiar with this type of growth.
The model equation that describes how the population size changes over time ($\frac{dN}{dt}$) for this kind of system, at its simplest, looks like this:
\Eqn \label{eqn:logistic}
\frac{dN}{dt} = rN \bigg(1 - \frac{N}{K}\bigg)
\eqn
\noi where $N$ is the population size (dependent variable), $r$ is the instantaneous growth rate (a parameter), and $K$ is the carrying capacity (also a parameter). To solve this differential equation in \R we are going to give \code{ode} a function describing the model in Eqn~\eqref{eqn:logistic}, some values for $r$ and $K$, and some initial conditions (\eg the initial values of time and population size). \code{ode} likes its arguments to be as follows:
<>=
ode(initial_values, time_interval, gradient_function, parameters)
@
\dots~ so let's first create our function describing Eqn~\eqref{eqn:logistic}:
<>=
LogisGrow <- function(t, y, parms) {
  N <- y[1]
  dN <- parms[1] * N * (1 - N/parms[2])
  list(dN)
}
@
Notice our function has the format that \code{ode} likes. The \code{t} will be used as a time step by \code{ode()}, the \code{y} is used as the initial value of \code{N} (\code{y} can be a vector if there is more than one dependent variable to follow), and \code{parms} is the vector containing the values of the parameters used in the differential equation. The third line defines how \code{N} changes with time ($\frac{dN}{dt}$) and the final line returns a list containing the rate equations (in this case only one). Next solve the equation using \code{ode()}:
<>=
logistic <- as.data.frame(ode(y= c(N= 0.1), times= seq(0, 10, by= 0.1),
                              func= LogisGrow, parms= c(r= 0.9, K= 5)))
head(logistic)   # take a peek at the first few rows
@
Notice the start values, time interval, and parameter values were defined directly as arguments within \code{ode}. The starting value is $N = 0.1$, and $r$ and $K$ have been set to 0.9 and 5 using the \code{parms=} argument.
The time interval has been set using the \code{seq()} function we learned back in \S~\ref{sec:vectors} and goes from $0 \to 10$ in increments of 0.1 (it's simply a vector of time steps). Lastly, \code{ode()} will produce a matrix of the solutions with \code{time} as column 1 by default. Notice I have asked \R to produce a data frame instead of a matrix, which is slightly more convenient for plotting, as in Fig.~\ref{fig:logistic}.
\fig[H] \centering
<>=
plot(N ~ time, data= logistic, main= "Simple Logistic Growth Model", col= "navy")
@
\caption{Logistic growth model with $r = 0.9$ and $K = 5$.\label{fig:logistic}}
\efig
%%%%%%%%%%%%%%
\subsection{The SI model}
%%%%%%%%%%%%%%
Now that you've gone through a simple example with one differential equation, let's follow up with a system of two coupled ODEs following two dependent variables of interest. The same general steps apply; we just scale up the process slightly. One of the most basic ODE disease models many of you are probably familiar with is the SI model%
%
\footnote{Loads of assumptions but still quite reasonable for some systems.}
%
%
which stands for the \textbf{S}usceptible and \textbf{I}nfected classes within a host population (we're not concerned here about parasite populations \Smiley). In Eqn~\eqref{eqn:SI}, $\beta$ is the transmission rate, which governs how quickly $S$ becomes $I$; $S$ is the number of susceptible hosts in the population; and $I/N$ is the proportion infected (frequency-dependent infection). One standard assumption is a closed system (\ie $N$ is constant), and therefore $N = S + I$. We will be describing the rates of change of the susceptible and infected classes, and since these two classes sum to a constant, only one differential equation needs to be explicitly described in the model (since $I = N - S$, $\frac{dI}{dt}$ is superfluous); yet for illustrative purposes we include both equations as an example of how one might set up a two-dimensional (2D) model in \RR.
What do the differential equations look like for a simple SI model?%
%
\footnote{Just for fun, google SI model and see what you get \Smiley}
%
%
Pay attention, you'll be doing a full SI\textbf{R} (with an additional Recovered/Removed class) in a moment. Here we go:\\
\Eqn\label{eqn:SI}
\begin{gathered}
\frac{dS}{dt} = -\beta S \bigg(\frac{I}{N}\bigg) \hfill \\
\frac{dI}{dt} = \beta S \bigg(\frac{I}{N}\bigg) \hfill \\
\end{gathered}
\eqn
\\
\noi Here is the \R code used for the SI model above:
<>=
tInt <- seq(0, 25, by= 1/2)
pars <- c(beta= 0.75)
Initial <- c(S= 4999, I= 1)
# # The function # #
SIfun <- function(t, y, parms) {
  with(as.list(c(y, parms)), {
    dS <- -beta * S * (I/(S+I))
    dI <- beta * S * (I/(S+I))
    ODEs <- c(dS, dI)
    list(ODEs)
  })
}
@
The \code{with()} function is a \emph{magical} function: the point here is that it enables you to write out your ODE equations in a more familiar way, as opposed to having to refer to parameters with respect to the \code{parms} vector (\ie \code{beta = parms[1]}, $\dots$) or your state variables with respect to the \code{y} vector (however, in order to use \code{with} in this way you must have given appropriate names to your parameter and initial-state vectors, and (of course) the sets of names must not overlap). Compare this construction with the \code{logistic} ODE function above, where I didn't use \code{with()}.
%If you like, you may think of \code{with()} as a temporary form of \code{attach()}, but without the potential pitfalls associated with \code{attach()}.
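To see concretely what \code{with()} buys you, here is a minimal sketch (reusing the named vectors from the SI example) that computes the same derivative both ways:

```r
## Named state and parameter vectors, as in the SI example above
y     <- c(S = 4999, I = 1)
parms <- c(beta = 0.75)

## indexing style (as in the logistic example)
dS1 <- -parms[["beta"]] * y[["S"]] * (y[["I"]] / sum(y))

## with() style (as inside SIfun): names are used directly
dS2 <- with(as.list(c(y, parms)), -beta * S * (I / (S + I)))

dS1 - dS2   # the two styles give the same derivative
```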
<>=
SIout <- as.data.frame(ode(Initial, times= tInt, func= SIfun, parms= pars))
head(SIout)
@
\fig[H] \centering
<>=
par(mfrow= c(1,2))
plot(S ~ time, data= SIout, ylab= "Number", col= "navy")
points(I ~ time, data= SIout, col= "darkred")
legend("right", bg= "gray95", c("S","I"), pch= 1, col= c("navy","darkred"))
plot(I ~ S, data= SIout, xlab= "Susceptible", ylab= "Infected")
@
\caption{Results of the simple SI model system of ODEs of Eqn~\eqref{eqn:SI}.\label{fig:SIout}}
\efig
Notice that this time I created vectors up front defining the time interval, parameter values, and initial starting conditions, and then simply used those vector names as arguments in \code{ode}. This is often a simpler way to a) change values quickly when exploring a model, and b) organize your parameters easily if there are many (here there was only one). I've included my summary plot of the system (Fig.~\ref{fig:SIout}) but I'll leave it to you to code for yourself using the data frame \code{SIout} (since $S$ and $I$ are on similar scales, use \code{points()} to plot both on the same plot). What do you notice about the susceptible and infected populations? Does the epidemic peak quickly and then peter out? Are there oscillations? Do the dynamics make sense considering the coupled differential equations of the model? Try repeating the process while exploring a few values of $\beta$. How do the population dynamics respond?
%%%%%%%%%%%%%
\subsection*{Exercises}
%%%%%%%%%%%%%
\benN
\item Now it is your turn to expand on Eqn~\eqref{eqn:SI} with a complete SIR model. Remember you can assume a closed system, and therefore $N = S + I + R =$ constant, so you need only write two ODEs; however, I suggest as a learning exercise that you do all three explicitly. Since this is an abstract example and we don't have any ``real'' parameters to go on, you may have to play around with them until you get something interesting.
\item Next, consider a different type of two-dimensional model.
Imagine a population that grows exponentially. Growth is partially determined by the activation temperature of some enzyme. This activation temperature is most efficient when it matches the temperature of the environment. Individuals who do not match the environment pay a cost in terms of growth. We can write the ODEs for changes in population size $x$ (the ecological part of the model) and for changes in the mean activation temperature in the population, $\alpha$ (the evolution of the trait), as follows: \Eqn \begin{gathered} \frac{dx}{dt} = x \big( r - (\alpha - \alpha_{opt})^2 \big) \hfill \\ \frac{d\alpha}{dt} = \varepsilon \big( -2 (\alpha - \alpha_{opt})\big) \end{gathered} \eqn \ where $\alpha_{opt}$ is the temperature of the environment and $r$ is the growth rate of the population (try to see how the ecological equation captures the reduced growth rate when the mean activation temperature is \emph{not} the same as the temperature of the environment), and $\varepsilon$ is the genetic variation of the population for that trait (var($\alpha$)). The programming here is analogous to a classical interaction model, so don't let it intimidate you.% % \footnote{This is why I had you do the 2D SIR model above!} % % \benI \item What do the dynamics of population size, $x$, look like without evolution (\ie there is no genetic variation)? Try the following values: $r = 0.2$ and $\alpha_{opt} = 10$ with initial conditions $x(0) = 2$ and $ \alpha(0) = 9.15$ for a timespan from 0 $\to$ 100. Then make a plot of $x$ vs. $t$. \item What do the dynamics of population size look like when $\alpha$ evolves? What do the evolutionary dynamics of $\alpha$ itself look like? Try the following parameter values: $r = 0.2$, $\alpha_{opt} = 10$, and $ \varepsilon$ = 0.03 with initial conditions $x(0) = 2$ and $\alpha(0) = 9.15$ for a timespan from 0 $\to$ 100. \textbf{Hint}: make plots of $x $ vs. $t$ and $\alpha$ vs. $t$. How does evolution change the population dynamics? 
Now set $\alpha(0) = 10.85$ and observe the population dynamics of the system. Is this in agreement with your understanding of how stabilizing selection functions? Explore various parameter values both here and above in ($i$) while making predictions regarding the behaviour of the model. What happens when there is more genetic variation in the population?
\een
\een
%%%% \np %%
%%%%%%%%%%%%%%%%%%%%%%
\section{Basic Statistics} \label{sec:stats}
%%%%%%%%%%%%%%%%%%%%%%
Of course, one of the great advantages of \R is that you can perform many statistical analyses right here, within the \R environment, without having to go to another program. The variety of analyses available in \R is vast. Unless you're pretty heavy into statistics, it is likely \R will be able to fulfill your needs. So let's start with basic exploratory, comparative, and relationship statistics. Table~\ref{tab:statfunct} shows some of the more common functions you'll need. First let's return to an example we came across in \S~\ref{sec:exploring}. We were curious about a relationship we discovered in Fig.~\ref{fig:expplot}c; now we get to test whether this putative relationship is statistically significant. Assuming you've already determined that a simple linear regression is appropriate, this is how we create the statistical model in \RR:
<>=
plot(Heartwood ~ dbh, data= mydata)
@
Don't close this plotting window, you will be using it in a moment (if you did, simply re-plot it using the $\uparrow$ key). Now actually perform the regression analysis using a linear model via the \code{lm()} function, and give the object the name \code{fit}.
<>=
fit <- lm(Heartwood ~ dbh, data= mydata)
@
Notice that, unlike most \R functions, \code{lm()} produces no immediate output; in order to see the results we must use \code{summary(fit)}. Now we can add the regression line from \code{fit} to our existing plot using our old friend \code{abline()}.
Do this and see if you can reproduce Fig.~\ref{fig:lm} without using the provided code. Can you use the various aesthetic arguments of \code{plot()} to reproduce it accurately?
<>=
summary(fit)
@
\fig[H] \centering
<>=
plot(Heartwood ~ dbh, data= mydata, main= "Heartwood Area vs. DBH",
     ylab= expression(paste(plain(HeartwoodArea)," ",(cm^2))),
     xlab= "Diameter at breast height (dbh)", pch= 19, col= 1)
grid(NA, NULL, lty= 4)
abline(fit, col= 2, lty= 4, lwd= 2)
legend("topleft", legend= c("lm(fit)"), col= 2, lty= 4, bg= "gray85", box.lty= 0)
@
\caption{Graphical summary of the linear regression, \code{lm()}, performed on our tree allometric data (dbh vs. heartwood area).\label{fig:lm}}
\efig
In general, \R provides \emph{accessor} methods, such as \code{coef}, for extracting information from complicated objects like linear model fits. You can see what methods are available for a given class, such as \code{lm} (\code{class(fit)} tells you what class you should be inspecting), by typing \code{methods(class="lm")}. If you try this\footnote{Note that there is a bug in the current version of the \code{gdata} package, which the \code{gplots} package loaded automatically; in order to make \code{methods(class="lm")} work you'll first have to \code{detach("package:gplots", unload=TRUE); detach("package:gdata", unload=TRUE)} (and later reload them if you need them again).}, you'll see that there is (for example) a \code{residuals.lm} method, telling you that you can type \code{residuals(fit)} to extract the residuals (and possibly \code{?residuals.lm} for more information).
<>=
detach("package:gplots", unload= TRUE)
detach("package:gdata", unload= TRUE)
methods(class= "lm")
library(gdata)
@
It is always best to use the accessor methods (\code{coef}, \code{residuals}, etc.) if you can. However, sometimes you have to peek under the hood and see all the underlying components by typing \code{attributes(fit)}.
<>=
attributes(fit)
coef(fit)
@
You can then extract the individual components via \verb+fit$name+, where \code{name} is the name of the bit you want. Using this syntax, convince yourself that you get the same values for the residuals by extracting the component with \verb+$+ or by using the accessor method. Lastly, you can manipulate these objects or save them, perhaps for use in other analyses. What does \code{coef(fit)[1]} produce? And \code{coef(fit)[2]}?
%%%%%%%%%%%%%%%%%
\begin{table}[H]
\centering
\belowcaptionskip=0.25cm
\caption{A few of the functions in \R for statistical modeling and data analysis. There are \textbf{many} more, but you will have to learn about them somewhere else. Statistical functions such as \code{var} and \code{sd} assume the values are samples from a population and compute an estimate of the population statistic (\eg \code{sd(1:3)} = 1).}
\begin{tabular}{l l}
\hline \hline
\code{hist(x)} & Histogram plot of the values in $x$ \\
\code{mean(x), var(x), sd(x)} & Estimate of population mean, variance, standard \\
& deviation based on data values in $x$ \\
\code{median(x)} & Median value of $x$ \\
\code{cor.test(x,y)} & Correlation between the two vectors $x$ and $y$\\
\code{t.test} & One and two sample Student's $t$-test \\
\code{pairwise.t.test}& Pairwise $t$-test \\
\code{chisq.test} & $\chi^2$ test; differences in frequencies \\
\code{binom.test}& Binomial test for differences in proportions, binomial samples \\
\code{aov, anova} & Analysis of variance or deviance \\
\code{var.test} & $F$ test comparing the variances of two samples \\
\code{ks.test(x, pnorm)} & Kolmogorov-Smirnov test of $x$ against the Normal distribution \\
\code{ks.test(x, y)} & K-S test; do $x$ and $y$ come from different distributions?
\\
\code{wilcox.test} & Mann-Whitney $U$ or Wilcoxon non-parametric rank tests \\
\code{bartlett.test} & Bartlett's test for equal variance, multiple treatments \\
\code{kruskal.test} & Kruskal-Wallis non-parametric rank test, multiple treatments \\
\code{lm} & Linear models (regression, ANOVA, ANCOVA)\\
\code{glm} & Generalized linear models (\eg logistic, Poisson)\\
\code{nls} & Fit nonlinear models by least-squares \\
\code{optim} & Minimize (or maximize) a function over one or more parameters \\
\code{lme}, \code{nlme} & Linear and nonlinear mixed-effects models\\
& (repeated measures, block effects, spatial models)\\
\code{manova} & Multivariate analysis of variance \\
\hline
\end{tabular}
\label{tab:statfunct}
\end{table}
%%%%%%%%
%%%%%%%%%%%%%
\subsection*{Exercises}
%%%%%%%%%%%%%
\benN
\item In the linear model above, let's assume new information becomes available that violates the assumptions of a linear regression. How might you now explore the relationship between tree diameter and heartwood area? How might you look at parametric and non-parametric alternatives?
\item Create a new factor variable (\ie column) to be added to \code{mydata}, called \code{Temp}, which groups the seasons spring/summer and fall/winter into two levels, \emph{Warm} and \emph{Cool}. There should be two levels for \code{Temp}. Now conduct a simple $t$-test for any of the continuous variables in \code{mydata}. For example, is there a difference in bark thickness depending on whether it is warm or cool? Or does bark thickness correlate with whether a tree is infected or not? I suggest you first check for a normal distribution and equal variances. Depending on the variable you choose, you may need to find a non-parametric rank test.
\item Based on Fig.~\ref{fig:expplot} it seems as though species differ in sapwood depth. Test this hypothesis using a simple ANOVA.
Then examine other continuous variables via boxplots, look for differences by species (or season), and test for statistical significance. Come up with a few of your own questions and hypotheses, then test them.
\item \label{ex:bars} For any of the comparisons above, produce a barplot including error bars (SEMs or CI95s).%
%
\footnote{As mentioned, I prefer boxplots for this purpose because they show the distribution of the actual data, but some advisors/journals/fields traditionally prefer barplots, so it's good practice to know how.}
%
%
First try for yourself using the function you created in \S~\ref{sec:functions}, adding the appropriate lines to the barplot. If you run into trouble, I have included a function for creating barplots with error bars, called \code{BarplotBars()}, in the script file \code{R Tutorial Script.R}, along with an example in TextBox~\ref{box:BarplotBars}. Alternatively, you can ``cheat'' (I do it all the time) and use the function \code{barplot2()}%
%
\footnote{Within the \pkg{gplots} package.}
%
%
which can simplify the process.
\een
By now you should have a pretty firm introduction to the practical applications of \R for ecologists. You will be able to import, explore, and plot data to produce pretty graphs, perform some simulations (usually via loops), write your own functions, and carry out basic statistical analyses. You will rely heavily on these tools over the next few days during the \textbf{\number\year~EEID Workshop}.
%%% \np %
%%%%%%%%%%%%%%%%%%%%%%%%%%
\section*{Tutorial Contributors} \label{sec:thanks}
%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{description}
\item[Stu Field,] Dept. of Biology, Colorado State University, Fort Collins, CO.
\item[Ben Bolker,] Depts. of Math \& Stats and Biology, McMaster University, Hamilton, Ontario.
\item[Colleen Webb,] Dept. of Biology, Colorado State University, Fort Collins, CO.
\item[Mike Antolin,] Dept. of Biology, Colorado State University, Fort Collins, CO.
\item[Aaron King,] Depts.
of Ecology \& Evolutionary Biology and Mathematics, University of Michigan, Ann Arbor, MI.
\item[John Drake,] Odum School of Ecology, University of Georgia, Athens, GA.
\end{description}
\vfill
\noi \hzline \\
\centering{\textbf{Last updated: }\today}
%%%%%%%%%%%%%%%%
% TextBoxes %
%%%%%%%%%%%%%%%%
\begin{textbox}
\caption{Here is the sample code for producing error bars for the exercise in \S~\ref{sec:stats}.\ref{ex:bars} above. You must \emph{first} load the function for making error bars, \code{BarplotBars()}, which is provided in the \R code script file \code{R\_cheat\_script.R}.}
<<>>=
## Barplots with Error Bars
# BarplotBars()   # load this function first
## Produce the necessary arguments for the error bars
f <- factor(rep(LETTERS[1:4], each=20))
x <- runif(80)
data <- data.frame(x, f)
N <- as.numeric(table(data$f))
FactorMeans <- tapply(data$x, data$f, mean)
sd <- tapply(data$x, data$f, sd)
sem <- sd/sqrt(N)
ci <- 2 * 1.96 * sem   # full width of the 95% CI
labels <- as.factor(levels(data$f))
## Make the Barplot with Error Bars (SEMs & CIs)
# par(mfrow=c(1,2))
BarplotBars(FactorMeans, sem, labels, axis=2)
title("Barplot with SEM bars")
BarplotBars(FactorMeans, ci, labels, bcol=c(1,8), lcol=2, type=4, w=50)
title("Barplot with CI95 bars")
## Points with Error Bars
require(gplots)
f <- factor(rep(LETTERS[1:4], each=20))
x <- runif(length(f))
data <- data.frame(x, f)
N <- as.numeric(table(data$f))
Means <- tapply(data$x, data$f, mean)
sd <- tapply(data$x, data$f, sd)
sem <- sd/sqrt(N)
ci <- 2 * 1.96 * sem   # full width of the 95% CI
Treatments <- c(A=1, B=2, C=3, D=4)
par(mfrow=c(1,2))
plotCI(Treatments, Means, uiw=sem, pch=19, xaxt="n")
axis(1, 1:length(Treatments), LETTERS[1:length(Treatments)])
plotCI(Treatments, Means, uiw=ci, pch=19, xaxt="n")
axis(1, 1:length(Treatments), LETTERS[1:length(Treatments)])
@
\label{box:BarplotBars}
\end{textbox}
%%%%%%%%%%
%%%%%%%%%%
%%%%%%%%%%%
%%%%%%%%%%%
\begin{textbox}
\caption{The \R
code below was used to project the imaginary population described in \S~\ref{sec:plots}.\ref{itm:popmat} while exploring the \code{matplot()} function. The population projections are stored in the files \code{Pop6C.csv}, \code{Pop6C.R}, and \code{Pop6C.Rdata} and are plotted in Fig.~\ref{fig:matplot}. By the way, this is an example of a stage-structured $SI$ model in discrete time.}
<<>>=
## Used for creating Pop6C.csv
require(popbio)
s1 <- 0.97
s2 <- 0.08
s3 <- 0.93
s4 <- 0.06
s5 <- 0.99
FA <- 0.14
FJ <- FA * 0.66
B  <- 0.015
F.sel <- 0.12
S.sel <- 0.9
I.pop <- c(20, 14, 9, 25, 15, 19)
## Projection matrix, one row per line for readability
PM <- matrix(c(s1*(1-B), FJ,       FA,       0,        FJ*F.sel, FA*F.sel,
               s2*(1-B), s3*(1-B), 0,        0,        0,        0,
               0,        s4*(1-B), s5*(1-B), 0,        0,        0,
               s1*B,     0,        0,        s1*S.sel, 0,        0,
               s2*B,     s3*B,     0,        s2*S.sel, s3*S.sel, 0,
               0,        s4*B,     s5*B,     0,        s4*S.sel, s5*S.sel),
             nrow=6, ncol=6, byrow=TRUE)
proj <- pop.projection(PM, I.pop, iterations=20)
proj                             # print the projection summary
popdata <- proj$stage.vectors
write.csv(round(t(popdata), 2), file="Pop6C.csv")
@
\label{box:Pop6C}
\end{textbox}
%%%%%%%%
%%%%%%%%
%%%%%%%%%
%%%%%%%%%
\begin{textbox}
\caption{The following \R code was used to create the figure on the title page of the tutorial (it also shows several variations on this theme). \code{curve3d()} is a plotting function from the \pkg{emdbook} package.
The \code{rgl} environment opens a special interactive graphics window allowing the user to rotate the image in 3D -- very cool!}
<<>>=
x <- seq(-10, 10, length = 50)
y <- seq(-10, 10, length = 50)
rotsinc <- function(x, y) {
  sinc <- function(x) {
    y <- sin(x)/x
    y[is.na(y)] <- 1
    y
  }
  10 * sinc(sqrt(x^2 + y^2))
}
z <- outer(x, y, rotsinc)
## using persp()
persp(x, y, z, theta = 30, phi = 30, expand = 0.5, ltheta = 0,
      shade = 0.4, col = "navy", axes = FALSE)
persp(x, y, z, theta = 30, phi = 30, expand = 0.5, col = "navy",
      ltheta = 120, shade = 0.5, ticktype = "detailed",
      xlab = "X", ylab = "Y", zlab = "Z")
## using curve3d()
require(emdbook)
curve3d(outer(x, y, rotsinc), from = c(-10,-10), to = c(10,10),
        zlab = "Z", ylab = "Y", xlab = "X", col = "gray35")
## with rgl
curve3d(outer(x, y, rotsinc), from = c(-10,-10), to = c(10,10),
        zlab = "Z", ylab = "Y", xlab = "X", sys3d = "rgl", col = "navy")
curve3d(outer(x, y, rotsinc), from = c(-10,-10), to = c(10,10),
        zlab = "", ylab = "", xlab = "", sys3d = "rgl",
        col = rainbow(40), axes = FALSE, box = FALSE)
@
\label{box:image}
\end{textbox}
%%%%%%%%%
%%%%%%%%%
%%%%%%%%%
%%%%%%%%%
\begin{textbox}
\caption{Summary of some critical and useful functions in \RR.}
\begin{table}[H]
\centering
\belowcaptionskip=0.5cm
\begin{tabular}{l l}
\textbf{Function} & \textbf{Description} \\[10pt]
\code{ls} & lists contents of the \R workspace/global environment \\
\code{rm} & removes objects from the \R workspace \\
\code{save} & save selected objects \\
\code{+},\code{-},\code{*},\code{/},\verb+^+ & arithmetic operators \\
\verb+%*%+ & matrix multiplication \\
\code{t} & matrix transpose \\
\code{solve} & matrix inverse (and solving linear equations) \\
\code{c} & combines (concatenates) objects, simplest way to make vectors \\
\code{seq} & creates vectors that are regular sequences \\
\code{rep} & replicates vectors \\
\code{length} & returns the length of a vector \\
\code{sum} & returns the sum \\
\code{mean} & returns the mean \\
\code{median} & returns the median \\
\code{sd} & returns
the standard deviation ($n-1$ in denominator) \\
\code{min} & returns the minimum \\
\code{max} & returns the maximum \\
\code{sort} & sorts a vector (rearranges the vector in order) \\
\code{order} & returns the indices of vectors that will order them \\
\code{rank} & returns the rank of each element in a vector \\
\code{==}, \code{<}, \code{>} & comparison operators \\
\code{<=}, \code{>=}, \code{!=} & \\
\code{|}, \verb+&+ & \code{OR}, \code{AND} \\
\code{is.na} & tests for the missing value \code{NA} \\
\code{which} & does a logical comparison and indicates which elements are \code{TRUE} \\
 & that is, gives the \code{TRUE} indices of a logical object \\
\code{any} & does a logical comparison and returns \code{TRUE} if any of the comparisons \\
 & are \code{TRUE}, \ie is at least one of the values true? \\
\code{exp} & returns $e$ to that power \\
\code{log} & returns the natural logarithm (base $e$) \\
\code{log10} & returns the logarithm (base 10) \\
\code{sqrt} & returns the square root \\
\code{table} & builds frequency tables and cross-tabulations \\
\code{help} & help page on a specified function \\
\code{cbind} & combine by columns \\
\code{rbind} & combine by rows \\
\code{matrix} & create a matrix \\
\code{vector} & create a vector \\
\code{nrow} & number of rows in an array or data frame \\
\code{ncol} & number of columns in an array or data frame \\
\code{dim} & dimensions of an array or data frame \\
\code{array} & create an array \\[10pt]
\end{tabular}
\end{table}
\end{textbox}
%%%%%%%%%
%%%%%%%%%
%%%%%%%%%
%%%%%%%%%
\begin{textbox}
\caption{Summary of some critical and useful functions in \RR~(continued).}
\begin{table}[H]
\centering
\belowcaptionskip=0.5cm
\begin{tabular}{l l}
\textbf{Function} & \textbf{Description} \\[10pt]
\code{is.vector} & tests whether the object is a vector (\code{TRUE} or \code{FALSE}) \\
\code{as.vector} & attempts to coerce an object into a vector \\
\code{read.table} & reads data from a text file \\
\code{read.csv} & reads data from a text file of comma-separated values \\
\code{write.table} & writes a data frame to a text file \\
\code{is.data.frame} & tests an object to see if it is a data frame \\
\code{as.data.frame} & coerces an object into a data frame \\
\code{is.factor} & tests an object to see if it is a factor \\
\code{as.factor} & coerces an object into a factor \\
%\code{attach} & reference variables in a data frame without having to use \\
% & data frame column header names (undo with \code{detach})\\
\code{head, tail} & list the first / last six rows \\
\code{names} & returns the names of the elements of an object \\
\code{colnames} & returns or sets the column names of an object \\
\code{rownames} & returns or sets the row names of an object \\
\code{subset} & select part of a vector, matrix, or data frame \\
\code{merge} & merge two data frames \\
\code{lm} & fit linear models (\eg multiple linear regression) \\
\code{glm} & fit generalized linear models \\
\code{anova} & analysis of variance tables for fitted models \\
\code{chisq.test} & Pearson's Chi-squared test for count data \\
\code{summary} & shows the results of various model-fitting functions \\
\code{predict} & predicted values from a fitted model \\
\code{hist} & histogram \\
\code{boxplot} & box plot \\
\code{plot} & scatterplot \\
\code{lines} & connects points sequentially with lines (added to a plot) \\
\code{segments} & add lines to a plot (between pairs of points) \\
\code{text} & add text to a plot \\
\code{legend} & add a legend to a plot \\
\code{abline} & add a line to a plot by specifying its slope and intercept \\
 & passing an \code{lm} object adds the fitted line to the plot \\
\code{x11} & open another graphics window (PC) \\
\code{pdf} & open a pdf file for recording graphics \\
\code{dev.off} & close the graphics device \\
\code{par(mfrow)} & arranges multiple plots on the same page (by row) \\
\code{sample} & produces a random sample of the specified values \\
\code{set.seed} & sets the seed for the next random sample (makes it repeatable)\\
\code{rnorm} & produces a random sample from a normal distribution \\
\code{qnorm} & quantiles
(percentiles) of the normal distribution \\
\code{pnorm} & CDF of the normal distribution \\
\code{dnorm} & PDF of the normal distribution \\
\code{rbinom} & produces a random sample from a binomial distribution \\[10pt]
\end{tabular}
\end{table}
\end{textbox}
%%%%%%%%%
%%%%%%%%%
<<>>=
dev.off()
@
%%%%%%%%%
\end{document}
%%%%%%%%%