Getting Started in Splus and R

2002-09-19


The Basics

Spend some time with the On-line Splus Tutorial from Dalhousie University. If you are going to be using the UNIX versions of Splus, read the relevant sections of Running Splus at McMaster.

Setting your Working Directories in Windows

If you are using R in the BSB Windows labs, copy the shortcut to R from the course folder to the desktop. Edit the Properties of the new shortcut, so that R will "Start in" a folder of your choosing, such as "D:\Temp\myRproject1". Save the workspace image and session history to this folder. After your session, copy the whole folder myRproject1 to a floppy disk or by FTP to a remote server, or e-mail it to yourself. To begin a new session where you left off, copy myRproject1 from the floppy disk or remote server to the hard drive of the machine you are on.

The same procedure will work for Splus 2000, but by default Splus will start in D:\Temp\sp2000\userid so you will not usually have to set the "Start in" folder to anything else.

Random Data, Histograms, and Density Functions

Plot a line graph of the standard normal probability density function and add a title to the graph.

> xs <- seq(-4, 4, length = 100)
> plot(xs, dnorm(xs), type = "l", main = "Standard Normal pdf")

Generate 100 observations from a standard normal distribution, compute the mean and standard deviation, and plot a histogram.

> xx <- rnorm(100)
> mean(xx)
[1] 0.1211835
> sqrt(var(xx))
[1] 1.181486
> hist(xx)

Plot a histogram with break points from -4 to +4 in steps of 0.5.

> xb <- seq(-4, 4, by = 0.5)
> hist(xx, breaks = xb)

Add a standard normal density function to this plot, scaled to match the area under the histogram.

> lines(xs, 100*0.5*dnorm(xs))

Note that plot() initiates a new plot but lines() adds lines to an existing plot. Explain the scaling factor.

Same thing, but scale the histogram to be a density so the normal pdf doesn't have to be rescaled.

> hist(xx, breaks = xb, prob = T)
> lines(xs, dnorm(xs))

Repeat for a sample of 1000 standard normal observations.

Generate 100 observations from a gamma distribution with shape parameter 0.5; compute the sample mean and standard deviation and plot an appropriately scaled histogram with a gamma density function superimposed. Repeat for 1000 observations.


Exploratory Data Analysis of Forestry Remote Sensing

Each observation is a stand of trees observed by satellite and on the ground.

  ht    Mean height
  dbh   Mean diameter (at breast height)
  bnd5  Satellite Band 5
  cci   Crown cover index (fraction of ground covered by the canopy)
  type  "Damaged" (road construction, wind damage, etc.) or "Undamaged".

Get the data and save it in a file called trees.txt. This is the 1995 data extracted from a larger Case Study. Click here to see the Case Study. Use

> trees <- read.table("trees.txt",header=T)

to read data from the file into a data frame called trees. Attach the data frame to your session. Confirm the names of the 5 variables. Look at the first 6 rows to see what is there. Draw a scatterplot matrix for all variables, then for the 4 quantitative variables, for all stands of trees. Draw a scatterplot matrix for the 4 quantitative variables for the undamaged stands alone, and for the damaged stands alone.

> attach(trees)
> dimnames(trees)[[2]]
[1] "ht"   "dbh"  "bnd5" "cci"  "type"
 
> trees[1:6,]
    ht dbh bnd5        cci      type
1 1.00  NA  102 0.04146902 undamaged
2 1.21  NA   99 0.05835823 undamaged
3 1.21  NA   99 0.06029240 undamaged
4 1.18  NA  102 0.06414085 undamaged
5 1.03  NA  103 0.04427970 undamaged
6 1.06  NA  104 0.05017752 undamaged
 
> pairs(trees)
> pairs(trees[, 1:4])
> pairs(trees[type == "undamaged", 1:4])
> pairs(trees[type == "damaged", 1:4])
 

There are some missing values in dbh. Does pairs() eliminate missing vales pairwise or listwise? One way to find out would be to do a pairs plot of the first 71 rows.

Draw a box plot to compare crown cover index in the damaged and undamaged stands.

Because we have attached the data frame, we can refer directly to the variables within it, i.e. we can write cci instead of trees$cci.

> boxplot(split(cci, type))

What have you learned about the trees? Can the satellite readings predict crown cover? Discuss any special features you see in the data.


Spin Plots

Try brush and spin plots in Splus for the trees data, spinning the variables 3 at a time.

The best way to move objects between R and Splus is to use dput() in one environment to create a text file with a portable version of the object, then dget() to import the file to the workspace of the other.

For example, if you have created the object trees in R, write it to a file called trees.dat in your R working directory. Be careful not to conflict with existing file names.

> dput(trees, "trees.dat")

Move the file trees.dat to your Splus working directory. Now you can create a new object trees in Splus with

> trees <- dget("trees.dat")

You don't have to move the file if you can remember the path. Remember that in R and Splus you can specify a path either with forward slashes or double back-slashes; the DOS convention of single back-slashes is not permitted. Either of the following will work if the file trees.dat is in D:\Temp.

> trees <- dget("D:/Temp/trees.dat")
> trees <- dget("D:\\Temp\\trees.dat")

Stars and Faces

Try the data in Table 1.2.3 of Srivastava. The data are available online.


Writing a Function

Use fix(by2) to write a function by2() which multiplies a given value by 2. Note that the value returned by a function is the last value computed (unlike other programming languages which require an assignment statement) and the default value of the argument is 1.

> by2
function(x = 1)
{
        2 * x
}
 
> by2()
[1] 2
> by2(99)
[1] 198

Use fix(norm3d) to write a function to prepare a grid, compute the standard bivariate normal probability density function over the grid, and display it as either a perspective plot or a contour plot.

Note that the scaling factor scl could be omitted in this calculation as it does not affect the shape of the distribution. Look at the first few rows of xynor; what did expand.grid do? What is the purpose of invisible() on the last line?

> norm3d
function(view = "p")
{
        xnor <- seq(-3., 3., by = 0.1)
        ynor <- xnor
        xynor <- expand.grid(list(xnor, ynor))
        scl <- 1/(2 * pi)
        znor <- matrix(scl * exp(-0.5 * (xynor[, 1.]^2. + xynor[, 2.]^2.)),
                nrow = length(xnor))
        if(view == "p")
                persp(xnor, ynor, znor)
        if(view == "c")
                contour(xnor, ynor, znor)
        invisible()
}	

There is more than one way to make this plot. Try the different functions described in Plotting the Bivariate Standard Normal Density in Splus or R.


Plotting an Ellipse

Try the example discussed in class, Drawing an ellipse in Splus or R.


Statistics 4M03/6M03
Last modified 2002-09-19 13:50