Exercise #1 - R Hints - Getting Started

2005-10-07


Here are some suggestions that will get you started on Exercise #1 with R.

To launch R in the Student Technology Centre, double-click on the shortcut to R in the course folder. It will open in C:\Temp. You can change the working directory with "Change dir" under the File menu, but the default is the best choice for now. If your lab computer crashes, files in C:\Temp will be restored on reboot but anything you saved elsewhere will be lost.
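The working directory can also be checked and changed from the R prompt. Here is a minimal sketch; it uses R's per-session temporary folder as a stand-in for C:\Temp so that it runs on any machine, and the names old_dir and new_dir are just for illustration.

```r
# Show the folder R is currently working in
old_dir <- getwd()
print(old_dir)

# Change to another folder. tempdir() is only a stand-in here;
# in the lab you would write setwd("C:/Temp") instead.
# (Inside R strings, use forward slashes or doubled backslashes.)
setwd(tempdir())
new_dir <- getwd()
print(new_dir)

# Go back to where we started
setwd(old_dir)
```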

When you are working at home, you may want a separate folder for each R project. To do that, make a copy of the R shortcut on the desktop and edit its properties (the "Start in" field) so that R opens in that folder when you double-click on the shortcut.

Get the gasoline prices data page from the course web site, and save it as a text file called prices.txt in C:\Temp.

Open prices.txt with Notepad and edit it so that the column names are in the top line, the data start on the second line, and there is nothing else in the file. Make sure that the bottom line is the last line of data, with no extraneous blanks or carriage returns at the end of the file. I also recommend changing any upper case letters in the column names to lower case, to make them easier to type later. Save prices.txt.
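One way to check that the edited file is clean is to count the fields on each line with count.fields(); every line, including the header, should give the same count. This is only a sketch: demo_file is a small temporary stand-in for prices.txt holding the first three rows of the data.

```r
# Write a small stand-in for prices.txt (header plus three data rows)
demo_file <- tempfile(fileext = ".txt")
writeLines(c("date day sunoco petro.can",
             "1993-11-16 T 46.8 46.8",
             "1993-11-17 W 46.8 46.8",
             "1993-11-18 R 46.8 46.8"),
           demo_file)

# Number of whitespace-separated fields on each line of the file
fields <- count.fields(demo_file)
print(fields)

# TRUE when every line has the same number of fields
all_same <- length(unique(fields)) == 1
print(all_same)
```

With the real file you would call count.fields("prices.txt") from the working directory.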

Since the data file is in standard format (rectangular array, rows are cases, columns are variables), you can use read.table() to make it into a "data frame" in R. A data frame is an R object made up of columns of equal length, and each column is a sub-object that can be referenced by name. Call this data frame prices.
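To see what "columns referenced by name" means, here is a small data frame built by hand. The name prices_demo is made up for this sketch; the values are the first four rows of the gasoline data.

```r
# A data frame is a set of named columns of equal length
prices_demo <- data.frame(
  day    = c("T", "W", "R", "F"),
  sunoco = c(46.8, 46.8, 46.8, 52.5)
)

# Column names and number of rows
print(names(prices_demo))
print(nrow(prices_demo))

# Refer to one column by name with the $ operator
print(prices_demo$sunoco)
```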

My R session is shown below.

Note that the file name is in quotes in read.table() because it isn't an R object. Because the file is in the current working directory, we don't need to give the full path. Because the first row gives the column names, we specify "header=T" (T is shorthand for TRUE).
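The same call can be tried on a small file anywhere on disk. In this sketch demo_file is a temporary file with the first three rows of the data, so a full path is passed to read.table(); header = TRUE is written out in full because T is an ordinary variable that could be reassigned.

```r
# Create a small stand-in for prices.txt in a temporary location
demo_file <- tempfile(fileext = ".txt")
writeLines(c("date day sunoco petro.can",
             "1993-11-16 T 46.8 46.8",
             "1993-11-17 W 46.8 46.8",
             "1993-11-18 R 46.8 46.8"),
           demo_file)

# The first line supplies the column names
prices_demo <- read.table(demo_file, header = TRUE)
print(names(prices_demo))
print(prices_demo)
```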

To see what is in an object, type the name of the object. Note the use of subscripts to give only the first 10 rows. The notation 1:10 generates a vector of integers from 1 to 10. A negative subscript means to exclude that element while including all others.
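Here is a short sketch of these subscript rules on a plain vector. The vector x holds the first five sunoco prices; the other names are just for illustration.

```r
x <- c(46.8, 46.8, 46.8, 52.5, 52.5)

# 1:3 is the vector of integers 1, 2, 3
first_three <- x[1:3]
print(first_three)

# A negative subscript drops that element and keeps the rest
without_first <- x[-1]
print(without_first)

# Dropping the last element without typing its position
without_last <- x[-length(x)]
print(without_last)
```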

The rest of the commands carry out some of the required graphs.

Note that type = "l" in the plot() command is the letter "l" for "lines", not the number 1. Other choices are "p" for "points", "b" for "both lines and points", "h" for histogram-like vertical lines, etc.
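To compare the plot types side by side, something like the following can be used. It is only a sketch: sunoco here is the first ten prices from the data, and the 2 x 2 layout is my choice, not part of the exercise.

```r
sunoco <- c(46.8, 46.8, 46.8, 52.5, 52.5, 52.5, 52.5, 51.9, 51.9, 49.8)
types <- c("p", "l", "b", "h")

# Put four plots on one page, remembering the old layout
old_par <- par(mfrow = c(2, 2))

# One plot per type, labelled with the type used
for (t in types) {
  plot(sunoco, type = t, main = paste("type =", t))
}

# Restore the previous layout
par(old_par)
```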

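The lag -1 plot can be made using the general-purpose plot() command, or with the specialized function lag.plot() if the library of time series functions is loaded. (In current versions of R the time series functions are part of the stats package, which is loaded automatically, so library(ts) is no longer needed.)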
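The plot() version of the lag plot pairs each price with the one that follows it. Here is a sketch that uses length() instead of typing the number of rows; sunoco is the first ten prices, and earlier and later are names made up for illustration.

```r
sunoco <- c(46.8, 46.8, 46.8, 52.5, 52.5, 52.5, 52.5, 51.9, 51.9, 49.8)
n <- length(sunoco)

# Drop the last value to get "today", drop the first to get "tomorrow"
earlier <- sunoco[-n]
later   <- sunoco[-1]

# Each point is (price on one day, price on the next day)
plot(earlier, later, xlab = "price at time t", ylab = "price at time t + 1")

# The specialized function, from the stats package in current R
lag.plot(sunoco, lags = 1)
```

Using n rather than a typed number means the same lines work if more rows are added to the data.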

I also show how split() is used to make comparative box plots. Note the function c() used to combine elements into a vector.
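Here is a small sketch of split() and c() on the first ten rows of the data. The names by_day and day_order are made up for illustration. With so few values the box plots are not meaningful; the point is only to show how the pieces fit together.

```r
day    <- c("T", "W", "R", "F", "S", "Su", "M", "T", "W", "R")
sunoco <- c(46.8, 46.8, 46.8, 52.5, 52.5, 52.5, 52.5, 51.9, 51.9, 49.8)

# split() returns a list with one vector of prices per day
by_day <- split(sunoco, day)
print(names(by_day))   # sorted order, not calendar order
print(by_day$T)

# c() builds a vector of names; indexing the list with it reorders the groups
day_order <- c("Su", "M", "T", "W", "R", "F", "S")
boxplot(by_day[day_order])
```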

These commands give the default graphs. They can be enhanced with graphics parameters. To learn about graphics parameters, type ?par to get help for function par().
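As a starting point, here is a sketch that sets a few graphics parameters and then restores the defaults. The particular choices (two plots side by side, solid points, colours, labels) are only examples; sunoco is the first ten prices from the data.

```r
sunoco <- c(46.8, 46.8, 46.8, 52.5, 52.5, 52.5, 52.5, 51.9, 51.9, 49.8)

# par() returns the old settings so they can be restored later
old_par <- par(mfrow = c(1, 2), pch = 16)

# Many parameters can also be given directly in the plot() call
plot(sunoco, type = "b", col = "blue",
     main = "Sunoco", xlab = "day number", ylab = "price")
plot(sunoco, type = "h", lwd = 2,
     main = "Sunoco", xlab = "day number", ylab = "price")

# Put the graphics parameters back the way they were
par(old_par)
```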

> prices <- read.table("prices.txt", header=T)
> prices[1:10,]
         date day sunoco petro.can
1  1993-11-16   T   46.8      46.8
2  1993-11-17   W   46.8      46.8
3  1993-11-18   R   46.8      46.8
4  1993-11-19   F   52.5      52.5
5  1993-11-20   S   52.5      52.5
6  1993-11-21  Su   52.5      52.5
7  1993-11-22   M   52.5      52.5
8  1993-11-23   T   51.9      51.9
9  1993-11-24   W   51.9      51.9
10 1993-11-25   R   49.8      49.8
> plot(prices$petro.can, prices$sunoco)
> plot(prices$sunoco)
> plot(prices$sunoco,type="l")
> plot(prices$sunoco[-430], prices$sunoco[-1])
> library(ts)
> lag.plot(prices$sunoco)
> lag.plot(prices$sunoco, lag = -1)
> split(prices$sunoco, prices$day)
$F
 [1] 52.5 49.7 46.7 45.7 45.2 45.7 43.9 43.9 48.5 44.9 45.5 49.5 46.9 52.5 46.9
[16] 52.5 48.5 45.5 52.5 51.5 48.9 46.3 53.5 49.9 53.5 49.9 48.4 52.5 49.9 48.5
[31] 48.8 55.9 53.6 56.7 53.4 53.7 52.5 55.7 54.7 50.8 53.5 50.5 49.5 49.9 48.4
[46] 47.4 45.4 44.9 51.9 47.7 53.7 49.9 46.9 51.5 47.9 48.9 47.6 51.2 49.8 48.7
[61] 47.8
 
$M
 [1] 52.5 49.7 46.7 45.7 49.1 45.7 43.9 49.0 47.5 49.5 45.5 49.5 46.5 49.9 52.5
[16] 52.5 46.5 52.5 49.9 51.5 48.5 46.3 52.8 49.9 52.5 48.8 53.5 49.9 49.1 54.5
[31] 54.5 54.5 51.6 55.5 55.5 53.7 51.7 54.9 52.1 56.5 50.5 49.9 53.5 49.5 51.7
[46] 46.5 44.9 49.5 50.1 47.7 51.9 47.9 53.2 48.7 47.7 48.7 46.9 51.2 49.7 48.7
[61] 52.5
 
$R
 [1] 46.8 49.8 46.7 45.9 45.3 48.7 43.8 43.9 48.5 44.9 46.5 49.5 48.9 52.5 46.9
[16] 46.2 48.9 45.5 52.5 48.5 49.5 46.3 46.3 51.5 53.9 49.9 48.4 52.5 49.9 48.5
[31] 48.9 55.9 53.7 51.6 53.4 54.7 52.5 55.7 54.7 50.8 53.5 50.5 49.8 50.7 48.5
[46] 47.6 45.6 44.9 48.5 49.7 53.9 50.9 46.9 51.9 47.9 49.2 47.6 46.9 49.8 48.7
[61] 47.8 51.5
 
$S
 [1] 52.5 49.7 46.7 45.7 49.5 45.7 43.9 49.5 47.5 49.5 45.5 49.5 46.5 52.5 46.9
[16] 52.5 46.5 52.9 49.9 51.5 48.9 46.3 53.5 49.9 52.5 48.9 53.5 49.9 49.9 54.9
[31] 54.9 54.9 53.5 56.5 52.6 53.5 52.5 55.5 52.5 56.5 53.5 49.9 53.9 49.7 52.3
[46] 46.9 45.3 49.5 50.7 47.7 52.6 48.3 53.5 48.9 47.9 48.7 47.6 51.2 49.8 48.7
[61] 52.5
 
$Su
 [1] 52.5 49.7 46.7 45.7 49.1 45.7 43.9 49.0 47.5 49.5 45.5 49.5 46.5 49.9 52.5
[16] 52.5 46.5 52.5 49.9 51.5 48.9 46.3 52.8 49.9 52.5 48.8 53.5 49.9 49.1 54.5
[31] 54.9 54.9 52.7 55.6 56.5 54.7 51.7 55.5 52.5 56.5 50.5 49.9 53.9 49.6 52.1
[46] 46.9 44.9 49.5 50.1 47.7 52.6 48.3 53.2 48.7 47.9 48.7 46.9 51.2 49.7 48.7
[61] 52.5
 
$T
 [1] 46.8 51.9 48.1 45.9 45.7 48.9 45.4 43.9 49.0 46.9 49.5 45.5 48.9 46.5 49.9
[16] 52.5 51.9 45.5 52.5 49.9 51.5 48.5 46.3 51.5 49.9 50.9 48.8 53.5 49.9 49.1
[31] 48.9 49.7 54.5 51.6 55.5 55.5 52.9 51.6 54.7 51.9 56.5 50.5 49.9 52.7 49.3
[46] 47.9 46.4 44.9 49.4 49.9 47.3 51.6 46.9 52.7 48.7 47.6 48.7 46.9 50.9 49.1
[61] 48.7 52.5
 
$W
 [1] 46.8 51.9 46.7 45.9 45.7 48.7 45.4 43.9 48.5 46.5 46.7 49.7 48.9 46.5 48.9
[16] 46.3 50.8 45.5 52.5 48.9 49.9 46.3 46.3 51.5 48.9 50.5 48.8 53.5 49.9 48.6
[31] 48.9 49.6 54.5 51.6 54.3 55.5 52.5 51.5 54.7 51.9 56.5 50.5 49.8 52.5 48.9
[46] 47.7 45.9 44.9 49.4 49.7 47.3 51.6 46.9 52.3 47.9 49.5 48.7 46.9 50.5 48.9
[61] 47.9 51.5
 
> boxplot(split(prices$sunoco, prices$day))
> boxplot(split(prices$sunoco, prices$day)[c("Su","M","T","W","R","F","S")])
> prices$diff <- c(prices$sunoco[-1]-prices$sunoco[-430], NA)
> weekdays <- c("Su","M","T","W","R","F","S")
> boxplot(split(prices$diff, prices$day)[weekdays])
> hist(prices$diff)
> hist(prices$diff[prices$diff != 0])
>
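The day-to-day differences computed by hand in the session can also be obtained with the built-in function diff(). This sketch checks that the two agree on the first ten prices; by_hand, built_in and padded are names made up for illustration.

```r
sunoco <- c(46.8, 46.8, 46.8, 52.5, 52.5, 52.5, 52.5, 51.9, 51.9, 49.8)
n <- length(sunoco)

# Next day's price minus today's price, as in the session
by_hand <- sunoco[-1] - sunoco[-n]

# diff() computes the same successive differences
built_in <- diff(sunoco)
print(all.equal(by_hand, built_in))

# There is one fewer difference than prices, so pad with NA
# before storing the result as a column of the same length
padded <- c(built_in, NA)
print(length(padded))
```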

At any time during the session, you can save the current Workspace; you can use the default file name .RData and save it in the current working directory. You can also save the History file .Rhistory. If these files are in C:\Temp when R is opened later, then you will begin the session with the same workspace and history you had when they were last saved. (You may need to "Load History" on some systems, or "Load Workspace" if default file names were not used.)
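The workspace can also be saved and reloaded from the prompt. Here is a sketch that uses a temporary file rather than .RData so that nothing in your working directory is overwritten; ws_file and demo_value are names made up for illustration.

```r
# A file name other than the default .RData, in a temporary folder
ws_file <- file.path(tempdir(), "demo.RData")

# Something to save
demo_value <- c(46.8, 52.5)

# Save every object in the workspace to the file
save.image(file = ws_file)

# Remove the object, then bring it back from the saved workspace
rm(demo_value)
load(ws_file)
print(demo_value)
```

Calling save.image() with no arguments writes to .RData in the working directory. The history has matching functions savehistory() and loadhistory(), but whether they work depends on the interface you are running R in.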


Statistics 2MA3 Statistics 3N03