Exercise #2

Statistics 2MA3 - Exercise #2

2002-01-20

Organizing your work in R

I think the best way to manage the .RData and .Rhistory files is to set up a separate folder for each project or assignment. Make a copy of the shortcut to the R application, then right-click the shortcut icon and set its properties so that R starts in the project folder you made. R will then store .RData and .Rhistory in that folder; copy both files to a floppy if you want to take your work to another computer at the end of a session. If you are working in the BSB lab, remember that you can't write to Drive K, so the best place to create the folder is in D:\Temp.
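As a rough sketch of the same idea from inside R, you can check which folder R is working in and save the workspace there by hand. The folder name exercise2 is hypothetical, and a temporary directory stands in for D:\Temp so that the example runs on any machine.

```r
# Hypothetical project folder; in the BSB lab it might sit under D:\Temp instead.
proj_dir <- file.path(tempdir(), "exercise2")
dir.create(proj_dir, showWarnings = FALSE)

old_wd <- setwd(proj_dir)   # switch to the project folder, remembering the old one
getwd()                     # confirm where .RData and .Rhistory will be written

x <- c(1, 2, 3)             # some work worth keeping
save.image()                # writes .RData in the current folder
if (interactive()) {
  # writes .Rhistory; only attempted in an interactive session
  try(savehistory(), silent = TRUE)
}

saved <- file.exists(".RData")   # TRUE if the workspace file was written
setwd(old_wd)                    # go back to where we started
```

Starting R from a shortcut that points at the project folder does the setwd() step for you, which is why the shortcut approach is less work over a whole term.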

Text Plots

Use the data frame mydata you set up when you were learning R.

Add a column to the data frame, giving the names of the subjects, "Joe","Bill","Sam","Beth","Sue".
Fit a straight line through the plot of y against x1, that is, compute the simple linear regression of y on x1.
Plot y against x1, but hide the points.
Use text() to place the subjects' names in place of the points on the graph.
Add a title to the graph.
Add the fitted line to the graph.

> mydata$name <- c("Joe","Bill","Sam","Beth","Sue")
> mydata
    y  x1 x2 name
1 1.2 1.5  1  Joe
2 3.6 2.5  1 Bill
3 5.1 6.0  1  Sam
4 4.2 3.1  2 Beth
5 2.1 2.2  2  Sue
 
> lmfit <- lm(y~x1, data=mydata)
> lmfit
 
Call:
lm(formula = y ~ x1, data = mydata)
 
Coefficients:
(Intercept)           x1
     0.8519       0.7804
 
> plot(mydata$x1, mydata$y, xlab="x1", ylab="y", type="n")
> text(mydata$x1, mydata$y, mydata$name)
> title("An X-Y Text Plot")
> abline(lmfit)
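
If mydata is no longer in your workspace, the session above can be reproduced from scratch. The sketch below assumes that the values printed in the listing are the complete data frame; it rebuilds mydata from them and then repeats the same steps.

```r
# Rebuild mydata from the values printed above (assumed to be the whole data set).
mydata <- data.frame(
  y  = c(1.2, 3.6, 5.1, 4.2, 2.1),
  x1 = c(1.5, 2.5, 6.0, 3.1, 2.2),
  x2 = c(1, 1, 1, 2, 2)
)
mydata$name <- c("Joe", "Bill", "Sam", "Beth", "Sue")

# Simple linear regression of y on x1.
lmfit <- lm(y ~ x1, data = mydata)

# Draw an empty plot frame (type = "n"), then put the names where the points would be.
plot(mydata$x1, mydata$y, xlab = "x1", ylab = "y", type = "n")
text(mydata$x1, mydata$y, labels = mydata$name)
title("An X-Y Text Plot")
abline(lmfit)            # add the fitted line

round(coef(lmfit), 4)    # should agree with the coefficients printed above
```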

Plotting Distributions in Excel

Plot the Binomial distribution by setting up a spreadsheet with consecutive values of x in the first column, f(x) in the second column, and values for n and p in nearby cells. The graph should automatically redraw if n or p changes. Repeat for the Poisson distribution.

If you're not sure what I'm asking for here, see the Excel workbook distributions.xls.
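
To check the numbers in your spreadsheet, the same two-column tables can be computed in R. This is only a cross-check, not a substitute for the Excel exercise. The values n = 10, p = 0.3 and lambda = 3 are arbitrary choices for illustration, and the Poisson table is cut off at x = 15 because the distribution has no upper limit.

```r
# Binomial: x = 0, 1, ..., n in one column and f(x) in the next.
n <- 10     # illustrative value only
p <- 0.3    # illustrative value only
x <- 0:n
binom_tab <- data.frame(x = x, fx = dbinom(x, size = n, prob = p))

# Poisson: same layout, truncated at an arbitrary upper value of x.
lambda <- 3   # illustrative value only
xp <- 0:15
pois_tab <- data.frame(x = xp, fx = dpois(xp, lambda = lambda))

# Vertical-line plots, the usual way to draw a discrete distribution.
plot(binom_tab$x, binom_tab$fx, type = "h", xlab = "x", ylab = "f(x)",
     main = "Binomial distribution")
plot(pois_tab$x, pois_tab$fx, type = "h", xlab = "x", ylab = "f(x)",
     main = "Poisson distribution")

sum(binom_tab$fx)   # the binomial probabilities should add up to 1
```

In Excel, the corresponding worksheet functions are BINOMDIST and POISSON, each with the cumulative argument set to FALSE.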

Looking at Random Data

Generate 20 observations from a standard normal distribution and draw a graph showing a histogram (as relative frequencies), a smoothed density estimate, a dot plot, and the true standard normal probability density function.

Repeat this a few times with n = 20, then a few times each with n = 40, n = 100, n = 1000, and n = 10000. How many observations do you need before you can say with any certainty whether or not a given sample came from a Normal distribution?

If you have time, do this again for a skewed distribution, such as the chi-square distribution on 1 or 3 degrees of freedom.

Since creating the graph involves several steps, you might want to write a function normdat(n) so you don't have to type in all the steps every time. The easiest way to write this function is to type fix(normdat); this will open a text editor, where you can write the function, then save it as you exit the editor. The same command fix(normdat) is also a convenient way to edit or modify an existing function.

> normdat
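function(n = 50)
{
        # n observations from a standard normal distribution
        xdat <- rnorm(n)
        # histogram on the relative-frequency (density) scale
        hist(xdat, prob = TRUE)
        # smoothed density estimate
        lines(density(xdat))
        # dot plot along the horizontal axis
        points(xdat, rep(0, n))
        # true standard normal density, drawn as a dashed line
        xgr <- seq(-4, 4, length.out = 100)
        lines(xgr, dnorm(xgr), lty = 2)
}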

Once normdat() is written, you just have to type normdat(20) a few times, normdat(40) a few times, etc., to complete the exercise.
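
For the optional skewed case, the same pattern can be reused with the chi-square distribution. The sketch below is one possible version, not the only way to do it: the name chisqdat and its df argument are hypothetical, and unlike normdat() it returns the sample invisibly so you can inspect it afterwards.

```r
# chisqdat() is a hypothetical name; it follows the same steps as normdat().
chisqdat <- function(n = 50, df = 3) {
  xdat <- rchisq(n, df = df)                 # n chi-square observations
  hist(xdat, prob = TRUE,
       main = paste("Chi-square, df =", df, ", n =", n))
  lines(density(xdat))                       # smoothed density estimate
  points(xdat, rep(0, n))                    # dot plot along the axis
  # Start the grid just above 0: the true density is unbounded at 0 when df = 1.
  xgr <- seq(0.01, max(xdat), length.out = 200)
  lines(xgr, dchisq(xgr, df = df), lty = 2)  # true density, dashed
  invisible(xdat)
}

# Four independent samples of size 20 on one page, for side-by-side comparison.
old_par <- par(mfrow = c(2, 2))
for (i in 1:4) chisqdat(20, df = 3)
par(old_par)
```

Note that the smoothed density estimate does not know the data are non-negative, so it will tend to spill below zero and understate the true density near the origin, especially for 1 degree of freedom.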