1998-01-30 (Week 4)


Assignment #1

Due: 1998-02-13

Choose one of the Case Studies prepared for the 1998 Annual Meeting of the Statistical Society of Canada, "Effects of effluent on the Kapuskasing River" or "Habitat availability for cutthroat trout in Dutch Creek, Alberta". Using S, apply EDA and submit a report showing what you have learned about the data. Be creative. Your report should include a variety of graphs.


What to Prepare for the Next Class...

Our discussion of the fish growth data raised several issues which I have investigated further.

I was interested to know how many observations there were at each "inage" and each "ygrow".

> attach(operc1)
> table(inage,ygrow)
   76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95
 1  0  1  0  1  0  2  2  3  5  8  7  9 13 21  7  4  1  0  0  0
 2  0  0  1  0  1  0  1  3  3  5  8  6  8 12 20  7  4  1  0  0
 3  0  0  1  2  0  3  2  5  9  9 15 19 17 25 22 43 22  4  1  0
 4  0  0  0  1  2  0  5  2  7  9  9 16 24 19 30 25 46 23  4  1
 5  0  0  0  0  1  2  0  5  2  7  9  9 16 24 20 30 26 46 23  3
 6  0  0  0  0  0  1  2  0  5  2  7  9  9 16 24 20 30 26 38 15
 7  0  0  0  0  0  0  1  2  0  5  2  7  9  9 16 24 20 29 22 22
 8  0  0  0  0  0  0  0  1  2  0  5  2  7  9  9 16 24 20 22 14
 9  0  0  0  0  0  0  0  0  1  2  0  5  2  7  9  9 16 24 14 14
10  0  0  0  0  0  0  0  0  0  1  2  0  5  2  7  9  9 16 17  7
11  0  0  0  0  0  0  0  0  0  0  1  2  0  5  2  7  9  9  9  8
12  0  0  0  0  0  0  0  0  0  0  0  1  2  0  5  1  7  9  2  3
13  0  0  0  0  0  0  0  0  0  0  0  0  1  2  0  5  2  7  7  0
14  0  0  0  0  0  0  0  0  0  0  0  0  0  1  2  0  5  2  4  2
15  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  2  0  5  1  1
16  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  2  0  4  0
17  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  2  0  2
18  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  2  0
>

Here is the ANOVA table from last week's notes, first as given by Splus, then expressed in a format you may be more familiar with.

Splus ANOVA Table:

> anova(operc.fit.1,test="F")
Analysis of Deviance Table
 
Gaussian model
 
Response: lingrow1
 
Terms added sequentially (first to last)
           Df Deviance Resid. Df Resid. Dev  F Value     Pr(F)
      NULL                  1636   1027.621
     inage 17 781.5660      1619    246.055 310.5557 0.0000000
     ygrow 18   3.7770      1601    242.278   1.4174 0.1133496
      site  2   2.9352      1599    239.343   9.9135 0.0000527
inage:site 32   7.3652      1567    231.978   1.5547 0.0251957
>

Reorganized ANOVA table:

        SV   DF         SS        MS         F          P
 
     inage   17   781.5660   45.9745  310.5557  0.0000000
     ygrow   18     3.7770    0.2098    1.4174  0.1133496
      site    2     2.9352    1.4676    9.9135  0.0000527
inage:site   32     7.3652    0.2302    1.5547  0.0251957
  residual 1567   231.978     0.1480
 
     TOTAL 1636  1027.621

Conclusions:

There is evidence that the mean log growth increments vary with site and with increment age, and that the effect of increment age is different at different sites.

The following notched box plot shows how log growth increment varies with increment age, all sites combined. What do the notches show? Can you plot this separately for each site?

> attach(operc1)
> boxplot(split(lingrow1, inage), notch = T)

The graph below (created as a "Category Plot" in Systat) plots the mean log growth increment against increment age, separately for each of the three sites. Most of the difference between sites seems to be at the older ages. Is there an anomaly at increment age 1? Is there something odd happening at increment age 14, or can the kinks be explained by small sample sizes?

Here is the same graph created in Splus. It needs a legend. The lines all appear solid here but may be differentiated by dots and dashes on other graphics devices. In the code below, split produces a list of vectors and sapply applies the mean function to each one to return a vector of means, one for each of the 18 increment ages.

> attach(operc1)
> meanff <- sapply(split(lingrow1[site=="downff"],inage[site=="downff"]),mean)
> meantb <- sapply(split(lingrow1[site=="downtb"],inage[site=="downtb"]),mean)
> meanup <- sapply(split(lingrow1[site=="up2"],inage[site=="up2"]),mean)
> matplot(c(1:18), matrix(c(meanff, meantb, meanup), ncol = 3), type = "l",
xlab  = "increment age", ylab = "log growth increment")

We might also consider how precisely the lines in the plot are estimated. For simplicity, I combined all three sites, so the graph below is a lot like the notched box plot above. I computed a 95% confidence interval for each mean, and connected all the lower bounds, means, and upper bounds by lines.

> attach(operc1)
> meanall <- sapply(split(lingrow1, inage), mean)
> varall <- sapply(split(lingrow1, inage), var)
> nall <- sapply(split(lingrow1, inage), length)
> meanall.lb <- meanall - qnorm(0.975) * sqrt(varall/nall)
> meanall.ub <- meanall + qnorm(0.975) * sqrt(varall/nall)
> matplot(c(1:18),matrix(c(meanall.lb,meanall,meanall.ub), ncol=3),type="l")


Back to S4P03