Binomial GLM in Splus

1999-03-23


Because this example is a designed experiment and both independent variables are factors, there is a binomial n > 1 for each design point. An observational regression experiment with a linear independent variable might have no two observations with the same x-value.

> remission<-data.frame(rem=c(18,27,14,25),sex=factor(c(1,1,2,2)),treat=factor(c
(1,2,1,2)),n=c(30,30,30,30))
> remission
  rem sex treat  n
1  18   1     1 30
2  27   1     2 30
3  14   2     1 30
4  25   2     2 30

Dependent Variable: Bound columns of "Success" and "Failure" counts

> fit1<-glm(cbind(rem,n-rem)~sex*treat,family = binomial(link = logit),
+ data=remission,subset=n>0)
> summary(fit1)
 
Call: glm(formula = cbind(rem, n - rem) ~ sex * treat, family = binomial(link =
logit), data = remission, subset = n > 0)
 
Coefficients:
                  Value Std. Error     t value
(Intercept)  1.01964905  0.2349443  4.33996088
        sex -0.28169579  0.2349443 -1.19898970
      treat  0.88368219  0.2349443  3.76124133
  sex:treat -0.01219754  0.2349443 -0.05191674
 
(Dispersion Parameter for Binomial family taken to be 1 )
 
    Null Deviance: 18.23268 on 3 degrees of freedom
 
Residual Deviance:  0 on 0 degrees of freedom
 
Number of Fisher Scoring Iterations: 3
 
Correlation of Coefficients:
          (Intercept)        sex      treat
      sex -0.1532238
    treat  0.3821937  -0.1419910
sex:treat -0.1419910   0.3821937 -0.1532238
> anova(fit1,test="Chisq")
Analysis of Deviance Table
 
Binomial model
 
Response: cbind(rem, n - rem)
 
Terms added sequentially (first to last)
          Df Deviance Resid. Df Resid. Dev   Pr(Chi)
     NULL                     3   18.23268
      sex  1  1.43362         2   16.79906 0.2311748
    treat  1 16.79637         1    0.00270 0.0000416
sex:treat  1  0.00270         0    0.00000 0.9585671
> fit2<-update(fit1,.~.-sex:treat)
> summary(fit2)
 
Call: glm(formula = cbind(rem, n - rem) ~ sex + treat, family = binomial(link =
logit), data = remission, subset = n > 0)
Deviance Residuals:
 
 -0.02064035 0.03350603 0.02026137 -0.02719909
 
Coefficients:
                 Value Std. Error   t value
(Intercept)  1.0179709  0.2323865  4.380508
        sex -0.2770531  0.2169754 -1.276887
      treat  0.8818647  0.2320043  3.801070
 
(Dispersion Parameter for Binomial family taken to be 1 )
 
    Null Deviance: 18.23268 on 3 degrees of freedom
 
Residual Deviance: 0.002699 on 1 degrees of freedom
 
Number of Fisher Scoring Iterations: 3
 
Correlation of Coefficients:
      (Intercept)        sex
  sex -0.1028273
treat  0.3668905  -0.0855042
> anova(fit2,test="Chisq")
Analysis of Deviance Table
 
Binomial model
 
Response: cbind(rem, n - rem)
 
Terms added sequentially (first to last)
      Df Deviance Resid. Df Resid. Dev   Pr(Chi)
 NULL                     3   18.23268
  sex  1  1.43362         2   16.79906 0.2311748
treat  1 16.79637         1    0.00270 0.0000416

Dependent Variable: Proportion of "Success" weighted by binomial n

This analysis corresponds exactly to the derivation in McCullagh & Nelder, where the mean is the binomial proportion.

> fit3<-glm(rem/n~sex*treat,family=binomial(link=logit), data = remission, subse
t = n > 0, weight=n)
> summary(fit3)
 
Call: glm(formula = rem/n ~ sex * treat, family = binomial(link = logit), data =
 
        remission, weights = n, subset = n > 0)
 
Coefficients:
                  Value Std. Error     t value
(Intercept)  1.01964905  0.2349448  4.33995087
        sex -0.28169579  0.2349448 -1.19898694
      treat  0.88368219  0.2349448  3.76123265
  sex:treat -0.01219754  0.2349448 -0.05191662
 
(Dispersion Parameter for Binomial family taken to be 1 )
 
    Null Deviance: 18.23268 on 3 degrees of freedom
 
Residual Deviance:  0 on 0 degrees of freedom
 
Number of Fisher Scoring Iterations: 5
 
Correlation of Coefficients:
          (Intercept)        sex      treat
      sex -0.1532273
    treat  0.3821965  -0.1419945
sex:treat -0.1419945   0.3821965 -0.1532273
> anova(fit3,test="Chisq")
Analysis of Deviance Table
 
Binomial model
 
Response: rem/n
 
Terms added sequentially (first to last)
          Df Deviance Resid. Df Resid. Dev   Pr(Chi)
     NULL                     3   18.23268
      sex  1  1.43362         2   16.79906 0.2311748
    treat  1 16.79637         1    0.00270 0.0000416
sex:treat  1  0.00270         0    0.00000 0.9585671
> fit4<-update(fit3,.~.-sex:treat)
> summary(fit4)
 
Call: glm(formula = rem/n ~ sex + treat, family = binomial(link = logit), data =
 remission, weights = n, subset = n > 0)
Deviance Residuals:
           1          2          3           4
 -0.02064035 0.03350603 0.02026137 -0.02719909
 
Coefficients:
                 Value Std. Error   t value
(Intercept)  1.0179709  0.2323868  4.380503
        sex -0.2770531  0.2169756 -1.276886
      treat  0.8818647  0.2320046  3.801066
 
(Dispersion Parameter for Binomial family taken to be 1 )
 
    Null Deviance: 18.23268 on 3 degrees of freedom
 
Residual Deviance: 0.002699 on 1 degrees of freedom
 
Number of Fisher Scoring Iterations: 5
 
Correlation of Coefficients:
      (Intercept)        sex
  sex -0.1028275
treat  0.3668920  -0.0855044
> anova(fit4,test="Chisq")
Analysis of Deviance Table
 
Binomial model
 
Response: rem/n
 
Terms added sequentially (first to last)
      Df Deviance Resid. Df Resid. Dev   Pr(Chi)
 NULL                     3   18.23268
  sex  1  1.43362         2   16.79906 0.2311748
treat  1 16.79637         1    0.00270 0.0000416

Expand the data frame so that there is one row per subject, scoring 1 for "success" and 0 for "failure"

> remission.expand <- data.frame(rem = c(rep(1, sum(rem)), rep(0, sum(n - rem))),
        sex = c(rep(sex, rem), rep(sex, n - rem)), treat = c(rep(treat, rem),
        rep(treat, n - rem)))
> remission.expand
   rem sex treat
 1   1   1     1
 2   1   1     1
 3   1   1     1
 4   1   1     1
 5   1   1     1
 6   1   1     1
 7   1   1     1
 8   1   1     1
 9   1   1     1
10   1   1     1
11   1   1     1
12   1   1     1
13   1   1     1
14   1   1     1
15   1   1     1
16   1   1     1
17   1   1     1
18   1   1     1
19   1   1     2
20   1   1     2
21   1   1     2
22   1   1     2
23   1   1     2
24   1   1     2
25   1   1     2
26   1   1     2
27   1   1     2
28   1   1     2
29   1   1     2
30   1   1     2
31   1   1     2
32   1   1     2
33   1   1     2
34   1   1     2
35   1   1     2
36   1   1     2
37   1   1     2
38   1   1     2
39   1   1     2
40   1   1     2
41   1   1     2
42   1   1     2
43   1   1     2
44   1   1     2
45   1   1     2
46   1   2     1
47   1   2     1
   rem sex treat
48   1   2     1
49   1   2     1
50   1   2     1
51   1   2     1
52   1   2     1
53   1   2     1
54   1   2     1
55   1   2     1
56   1   2     1
57   1   2     1
58   1   2     1
59   1   2     1
60   1   2     2
61   1   2     2
62   1   2     2
63   1   2     2
64   1   2     2
65   1   2     2
66   1   2     2
67   1   2     2
68   1   2     2
69   1   2     2
70   1   2     2
71   1   2     2
72   1   2     2
73   1   2     2
74   1   2     2
75   1   2     2
76   1   2     2
77   1   2     2
78   1   2     2
79   1   2     2
80   1   2     2
81   1   2     2
82   1   2     2
83   1   2     2
84   1   2     2
85   0   1     1
86   0   1     1
87   0   1     1
88   0   1     1
89   0   1     1
90   0   1     1
91   0   1     1
92   0   1     1
93   0   1     1
94   0   1     1
    rem sex treat
 95   0   1     1
 96   0   1     1
 97   0   1     2
 98   0   1     2
 99   0   1     2
100   0   2     1
101   0   2     1
102   0   2     1
103   0   2     1
104   0   2     1
105   0   2     1
106   0   2     1
107   0   2     1
108   0   2     1
109   0   2     1
110   0   2     1
111   0   2     1
112   0   2     1
113   0   2     1
114   0   2     1
115   0   2     1
116   0   2     2
117   0   2     2
118   0   2     2
119   0   2     2
120   0   2     2
> remission.expand$sex<-factor(remission.expand$sex)
> remission.expand$treat<-factor(remission.expand$treat)

Dependent Variable: Success/Fail binary score unweighted

Note that when we fit the saturated model we get the same estimates and tests as we got from the previous fits, but there are now 116 degress of freedom for the residual deviance. Because the observed scores are either 0 or 1 but, in this example, the fitted binomial probabilities are never 0 or 1, the residual deviance can't be zero.

> fit5<-glm(rem~sex*treat,data=remission.expand,family=binomial(link=logit))
> summary(fit5)
 
Call: glm(formula = rem ~ sex * treat, family = binomial(link = logit), data =
        remission.expand)
Deviance Residuals:
       Min        1Q    Median       3Q      Max
 -2.145962 -1.121257 0.4590455 1.010768 1.234617
 
Coefficients:
                  Value Std. Error     t value
(Intercept)  1.01964685  0.2347573  4.34340879
        sex -0.28169360  0.2347573 -1.19993549
      treat  0.88368000  0.2347573  3.76422822
  sex:treat -0.01219535  0.2347573 -0.05194875
 
(Dispersion Parameter for Binomial family taken to be 1 )
 
    Null Deviance: 146.6074 on 119 degrees of freedom
 
Residual Deviance: 128.3747 on 116 degrees of freedom
 
Number of Fisher Scoring Iterations: 4
 
Correlation of Coefficients:
          (Intercept)        sex      treat
      sex -0.1519319
    treat  0.3812090  -0.1406812
sex:treat -0.1406812   0.3812090 -0.1519319
> anova(fit5,test="Chisq")
Analysis of Deviance Table
 
Binomial model
 
Response: rem
 
Terms added sequentially (first to last)
          Df Deviance Resid. Df Resid. Dev   Pr(Chi)
     NULL                   119   146.6074
      sex  1  1.43362       118   145.1738 0.2311748
    treat  1 16.79637       117   128.3774 0.0000416
sex:treat  1  0.00270       116   128.3747 0.9585671
> fit6<-update(fit5,.~.-sex:treat)
> summary(fit6)
 
Call: glm(formula = rem ~ sex + treat, family = binomial(link = logit), data =
        remission.expand)
Deviance Residuals:
       Min        1Q   Median       3Q      Max
 -2.137429 -1.118173 0.463493 1.007725 1.237822
 
Coefficients:
                 Value Std. Error   t value
(Intercept)  1.0179706  0.2323177  4.381805
        sex -0.2770530  0.2169430 -1.277078
      treat  0.8818644  0.2319361  3.802188
 
(Dispersion Parameter for Binomial family taken to be 1 )
 
    Null Deviance: 146.6074 on 119 degrees of freedom
 
Residual Deviance: 128.3774 on 117 degrees of freedom
 
Number of Fisher Scoring Iterations: 4
 
Correlation of Coefficients:
      (Intercept)        sex
  sex -0.1027515
treat  0.3665306  -0.0854362
> anova(fit6,test="Chisq")
Analysis of Deviance Table
 
Binomial model
 
Response: rem
 
Terms added sequentially (first to last)
      Df Deviance Resid. Df Resid. Dev   Pr(Chi)
 NULL                   119   146.6074
  sex  1  1.43362       118   145.1738 0.2311748
treat  1 16.79637       117   128.3774 0.0000416

Dependent Variable: Success/Fail binary score, data grouped by unique rows, weighted by counts

This analysis gives exactly the same residual deviances as the previous analysis, but the residual from the saturated model has only 4 degrees of freedom.

> remission.group
  rem count sex treat
1   1    18   1     1
2   0    12   1     1
3   1    27   1     2
4   0     3   1     2
5   1    14   2     1
6   0    16   2     1
7   1    25   2     2
8   0     5   2     2
> fit7<-glm(rem~sex*treat,data=remission.group,family=binomial(link=logit),
+ weight=count)
> summary(fit7)
 
Call: glm(formula = rem ~ sex * treat, family = binomial(link = logit), data =
        remission.group, weights = count)
Deviance Residuals:
        1         2       3         4        5         6        7         8
 4.288324 -4.689454 2.38527 -3.716916 4.619515 -4.485028 3.019284 -4.232918
 
Coefficients:
                  Value Std. Error     t value
(Intercept)  1.01964685  0.2347573  4.34340879
        sex -0.28169360  0.2347573 -1.19993549
      treat  0.88368000  0.2347573  3.76422822
  sex:treat -0.01219535  0.2347573 -0.05194875
 
(Dispersion Parameter for Binomial family taken to be 1 )
 
    Null Deviance: 146.6074 on 7 degrees of freedom
 
Residual Deviance: 128.3747 on 4 degrees of freedom
 
Number of Fisher Scoring Iterations: 4
 
Correlation of Coefficients:
          (Intercept)        sex      treat
      sex -0.1519319
    treat  0.3812090  -0.1406812
sex:treat -0.1406812   0.3812090 -0.1519319
> anova(fit7,test="Chisq")
Analysis of Deviance Table
 
Binomial model
 
Response: rem
 
Terms added sequentially (first to last)
          Df Deviance Resid. Df Resid. Dev   Pr(Chi)
     NULL                     7   146.6074
      sex  1  1.43362         6   145.1738 0.2311748
    treat  1 16.79637         5   128.3774 0.0000416
sex:treat  1  0.00270         4   128.3747 0.9585671
> fit8<-update(fit7,.~.-sex:treat)
> summary(fit8)
 
Call: glm(formula = rem ~ sex + treat, family = binomial(link = logit), data =
        remission.group, weights = count)
Deviance Residuals:
        1        2       3         4        5        6        7         8
 4.275416 -4.70127 2.40838 -3.702135 4.631506 -4.47269 3.000915 -4.246047
 
Coefficients:
                 Value Std. Error   t value
(Intercept)  1.0179706  0.2323177  4.381805
        sex -0.2770530  0.2169430 -1.277078
      treat  0.8818644  0.2319361  3.802188
 
(Dispersion Parameter for Binomial family taken to be 1 )
 
    Null Deviance: 146.6074 on 7 degrees of freedom
 
Residual Deviance: 128.3774 on 5 degrees of freedom
 
Number of Fisher Scoring Iterations: 4
 
Correlation of Coefficients:
      (Intercept)        sex
  sex -0.1027515
treat  0.3665306  -0.0854362
> anova(fit8,test="Chisq")
Analysis of Deviance Table
 
Binomial model
 
Response: rem
 
Terms added sequentially (first to last)
      Df Deviance Resid. Df Resid. Dev   Pr(Chi)
 NULL                     7   146.6074
  sex  1  1.43362         6   145.1738 0.2311748
treat  1 16.79637         5   128.3774 0.0000416
>

Back to Statistics 743