Statistics 4P03/6P03

2001-01-24


A First Look at Autocorrelation

Most of the statistical methods you have learned so far assume that the observations in a sample are independent, so it is important to be able to tell when that assumption isn't satisfied.

The most common form of dependence arises when observations are made sequentially and observations close together in the sequence are related. For example, if the exchange rate of the Canadian dollar is observed daily and it is trading at an above-average rate today, it will probably also be above average tomorrow. This is called a positive lag-1 autocorrelation, where "lag 1" refers to the one-day time step.

A good summary of time-series terms and methods is found in Appendix A.14 of Chatfield, Problem Solving.

Here is a simple exercise to show how autocorrelation may be detected. Begin by generating a sample of 200 independent standard normal observations.

> normind <- rnorm(200)

To generate 200 autocorrelated observations, try a simple autoregression model, that is, make each observation a linear combination of the previous observation and a new independent standard normal error.

X_i = a X_{i-1} + b e_i

Find the mean and variance of X_i. Show that the distribution of X_i will converge to a stationary distribution if |a| < 1, and that the process will be stationary from the beginning if X_0 ~ N(0, \sigma_0^2) and b^2 = \sigma_0^2 (1 - a^2). Find Cor(X_i, X_{i+k}), k = 1, 2, 3, ..., for the stationary process.
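One way to check your answers is by simulation. The sketch below is written for R rather than S-Plus, and the values a = 0.8, b = 1, the number of replicates and the seed are illustrative assumptions, not part of the exercise. It starts many replicate series at zero, which is not the stationary distribution, and records the variance across replicates at each step, so the values can be compared with the variance you derive.

```r
set.seed(1)                  # arbitrary seed, only to make the run repeatable
a <- 0.8                     # illustrative coefficient with |a| < 1
b <- 1                       # illustrative error scale
n_rep  <- 5000               # number of independent replicate series
n_step <- 30                 # number of time steps to follow

x <- numeric(n_rep)          # every replicate starts at 0 (a non-stationary start)
v <- numeric(n_step)         # variance across replicates at each step
for (i in seq_len(n_step)) {
  x <- a * x + b * rnorm(n_rep)   # one step of the autoregression for all replicates
  v[i] <- var(x)
}

round(v[c(1, 2, 5, 10, 30)], 2)   # variance at selected steps
```

If the variances level off, the limiting value should agree with your formula for the stationary variance. Trying a value of a with |a| >= 1 should show what happens when the condition fails.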

Setting a = b = 1/sqrt(2) gives stationarity with unit variance. The S-Plus code to generate observations from this series is given below. Note the use of rep() to initialize the vector of observations, and the use of a for() loop to fill in the values after the first one.

> normdep <- rep(0,200)
> normdep[1] <- rnorm(1)
> for(i in 2:200) normdep[i] <- (normdep[i-1]/sqrt(2))+(rnorm(1)/sqrt(2))
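The same recipe can be wrapped in a function so that other values of a are easy to try. This is a minimal sketch for R. The name ar1_sim and its arguments are hypothetical, not something supplied with the course, and the default b = sqrt(1 - a^2) is chosen so that a standard normal starting value gives unit variance throughout.

```r
# Hypothetical helper (not part of the handout): simulate n values of
# X_i = a * X_{i-1} + b * e_i with independent standard normal errors e_i.
ar1_sim <- function(n, a = 1 / sqrt(2), b = sqrt(1 - a^2)) {
  e <- rnorm(n)                  # e[1] is used as the N(0, 1) starting value
  x <- numeric(n)                # same role as rep(0, n) in the code above
  x[1] <- e[1]
  for (i in seq_len(n)[-1]) {    # i = 2, ..., n; skipped entirely when n = 1
    x[i] <- a * x[i - 1] + b * e[i]
  }
  x
}

set.seed(4)                      # arbitrary seed, only to make the run repeatable
normdep2 <- ar1_sim(200)         # same model as normdep, different random draws
weakdep  <- ar1_sim(200, a = 0.2)  # a more weakly autocorrelated series
```

With the defaults, ar1_sim(200) follows the same model as the loop above. Series with different values of a can then be passed through the same displays to see how strong the dependence must be before it becomes visible.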

Try the following displays and statistics with the independent and dependent samples: observations, histogram, mean, variance, sequence plot, lag-1 scatterplot, lag-1 autocorrelation, autocorrelation function, spectrum. Which displays and which statistics reveal the autocorrelation? Which ones do not?

> normind
> normdep
> hist(normind)
> hist(normdep)
> mean(normind)
> mean(normdep)
> var(normind)
> var(normdep)
> plot(normind,type="l")
> plot(normdep,type="l")
> plot(normind[-200],normind[-1])
> plot(normdep[-200],normdep[-1])
> cor(normind[-200],normind[-1])
> cor(normdep[-200],normdep[-1])
> acf(normind)
> acf(normdep)
> spectrum(normind)
> spectrum(normdep)
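To judge whether a lag-1 correlation is large, it helps to have a yardstick. For n independent observations the sample lag-1 autocorrelation is approximately normal with mean 0 and variance 1/n, so values beyond about 2/sqrt(n) in absolute value are unusual. The sketch below, written for R, regenerates both samples and compares their lag-1 correlations with that limit. The seed and the helper name lag1 are illustrative assumptions.

```r
set.seed(2001)                   # arbitrary seed, only to make the run repeatable
n <- 200
normind <- rnorm(n)
normdep <- numeric(n)
normdep[1] <- rnorm(1)
for (i in 2:n) normdep[i] <- normdep[i - 1] / sqrt(2) + rnorm(1) / sqrt(2)

# Hypothetical helper: the same statistic as cor(x[-200], x[-1]) for any length
lag1 <- function(x) cor(x[-length(x)], x[-1])

band  <- 2 / sqrt(n)             # rough two-standard-error limit under independence
r_ind <- lag1(normind)
r_dep <- lag1(normdep)
c(independent = r_ind, dependent = r_dep, band = band)

# acf() computes a closely related lag-1 value; element 1 of $acf is lag 0
acf(normdep, plot = FALSE)$acf[2]
```

The value from acf() differs slightly from cor() because acf() centres both copies of the series on the overall mean and divides by the lag-0 sum of squares, so the two agree closely but not exactly.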
