1998-01-23 (Week 3)

What to Prepare for the Next Class...

Most of you still need to get caught up.

Learn the difference between commands in UNIX and functions in S. What does it mean when we say that S is "object-oriented"?

When we meet next Friday, I will expect you to have looked at the "Effects of Effluent on the Kapuskasing River" data and we will discuss what we have learned.

I will also expect you to have set up the data files for the other SSC Case Study, "Habitat Availability in Dutch Creek," as data frames in S. You may find that this process is messier than you would like.

I plan to talk about model fitting, beginning with Chapter 10 in Spector. Have a look at the following functions: lm(), glm(), anova() and update() and see how to specify a model. Note that lm() doesn't work on hydra and glm() isn't available on hydra, so you will have to use data.

Try some simple examples of model fitting. In particular, try the two examples on the S2MA3 web site: simple linear regression with repeated x-values and two-factor ANOVA with replication. These examples are worked out in detail there. Convince yourself that it was easier to learn S and do the calculations in S than it would have been to do the calculations "by hand."

I think I have taken the analysis of the Kapuskasing River data as far as it can go. I created a new column (lingrow) with the growth increments (ingrow) transformed to logs, then another new column (lingrow1) with the outlier removed.I made a new data frame (operc1) with rows removed if lingrow1 == NA, since the model fitting functions won't work with missing data. I had to use glm() because the design is unbalanced. My results are as follows.

> operc[1:10,]
    date   site fish flen fage yrclass inage ingrow ygrow  lingrow lingrow1
 1 sep94 downff  503 39.7    5      89     1     NA    89       NA       NA
 2 sep94 downff  503 39.7    5      89     2     NA    90       NA       NA
 3 sep94 downff  503 39.7    5      89     3 6.8811    91 1.928779 1.928779
 4 sep94 downff  503 39.7    5      89     4 4.8336    92 1.575592 1.575592
 5 sep94 downff  503 39.7    5      89     5 3.2650    93 1.183260 1.183260
 6 sep94 downff  505 46.2   10      84     1     NA    84       NA       NA
 7 sep94 downff  505 46.2   10      84     2     NA    85       NA       NA
 8 sep94 downff  505 46.2   10      84     3 3.1331    86 1.142023 1.142023
 9 sep94 downff  505 46.2   10      84     4 5.4613    87 1.697687 1.697687
10 sep94 downff  505 46.2   10      84     5 5.9248    88 1.779147 1.779147
> operc1<-operc[!is.na(lingrow1),]
> operc1[1:10,]
    date   site fish flen fage yrclass inage ingrow ygrow   lingrow  lingrow1
 3 sep94 downff  503 39.7    5      89     3 6.8811    91 1.9287785 1.9287785
 4 sep94 downff  503 39.7    5      89     4 4.8336    92 1.5755915 1.5755915
 5 sep94 downff  503 39.7    5      89     5 3.2650    93 1.1832598 1.1832598
 8 sep94 downff  505 46.2   10      84     3 3.1331    86 1.1420229 1.1420229
 9 sep94 downff  505 46.2   10      84     4 5.4613    87 1.6976869 1.6976869
10 sep94 downff  505 46.2   10      84     5 5.9248    88 1.7791469 1.7791469
11 sep94 downff  505 46.2   10      84     6 4.5335    89 1.5114943 1.5114943
12 sep94 downff  505 46.2   10      84     7 1.1277    90 0.1201802 0.1201802
13 sep94 downff  505 46.2   10      84     8 1.5824    91 0.4589427 0.4589427
14 sep94 downff  505 46.2   10      84     9 1.2797    92 0.2466257 0.2466257
> operc.fit<-glm(lingrow1~inage+ygrow+site,data=operc1)
> anova(operc.fit,test="F")
Analysis of Deviance Table
 
Gaussian model
 
Response: lingrow1
 
Terms added sequentially (first to last)
      Df Deviance Resid. Df Resid. Dev  F Value     Pr(F)
 NULL                  1636   1027.621
inage 17 781.5660      1619    246.055 307.1458 0.0000000
ygrow 18   3.7770      1601    242.278   1.4018 0.1205401
 site  2   2.9352      1599    239.343   9.8047 0.0000586
> operc.fit.1<-update(operc.fit,~.+inage:site)
> anova(operc.fit.1,test="F")
Analysis of Deviance Table
 
Gaussian model
 
Response: lingrow1
 
Terms added sequentially (first to last)
           Df Deviance Resid. Df Resid. Dev  F Value     Pr(F)
      NULL                  1636   1027.621
     inage 17 781.5660      1619    246.055 310.5557 0.0000000
     ygrow 18   3.7770      1601    242.278   1.4174 0.1133496
      site  2   2.9352      1599    239.343   9.9135 0.0000527
inage:site 32   7.3652      1567    231.978   1.5547 0.0251957
>

Using glm() to analyse an unbalanced design, it makes a difference what order the variables are added to the model. Try adding the terms in a different order and see how different the ANOVA is.


Back to S4P03