Fitting the Kaplan-Meier Estimate and the Exponential Survival Model

2002-04-06

The first calculation uses the library functions survfit and survreg to plot the Kaplan-Meier estimate and perform a regression on tumour size (categorized into three size classes) assuming an exponential distribution of time to relapse.

> library(survival)
> cervical$csize <- cut(cervical$size,br=c(0,10,30,80),incl=T)
> plot(survfit(Surv(etime,event)~csize,data=cervical))
> cervfite <- survreg(Surv(etime, event) ~ csize, data = cervical, 
subset = etime > 0 & !is.na(etime) & !is.na(csize), dist="exp")
> summary(cervfite)
 
Call:
survreg(formula = Surv(etime, event) ~ csize, data = cervical, 
    subset = etime > 0, dist = "exp")
             Value Std. Error     z        p
(Intercept)  10.50      0.200 52.48 0.00e+00
csize(10,30] -1.10      0.278 -3.97 7.06e-05
csize(30,80] -2.56      0.334 -7.67 1.67e-14
 
Scale fixed at 1 
 
Exponential distribution
Loglik(model)= -693.1   Loglik(intercept only)= -717
	Chisq= 47.71 on 2 degrees of freedom, p= 4.4e-11 
Number of Newton-Raphson Iterations: 6 
n=844 (59 observations deleted due to missing)
 
> anova(cervfite)
      Df Deviance Resid. Df    -2*LL    P(>|Chi|)
NULL  NA       NA         1 1433.923           NA
csize  2  47.7124         3 1386.211 4.358964e-11

Here is a home-made function to plot the Kaplan-Meier curve and fit an exponential distribution.

> plotsurv2
function (eetime, eetype, KM = T, showexp = F, newplot = T, subset = NULL, 
    ...) 
{
    if (is.null(subset)) 
        subset <- TRUE
    etime <- eetime[subset][!is.na(eetime[subset])]
    etype <- eetype[subset][!is.na(eetime[subset])]
    tord <- order(etime)
    etime <- etime[tord]
    etype <- etype[tord]
    n <- length(etime)
    r <- sum(etype)
    ni <- n - cumsum(!etype)
    if (KM) 
        frac <- cumprod(1 - etype/(n:1))
    else frac <- 1 - cumsum(etype/ni)
    if (newplot) 
        plot(c(0, etime), c(1, frac), type = "l", col = "blue", 
            xlab = "time", ylab = "P(Survive > time)", ...)
    else lines(c(0, etime), c(1, frac), type = "l", col = "blue")
    if (showexp) {
        muhat <- (n/r) * mean(etime)
        tgr <- seq(0, max(etime), length = 60)
        lines(tgr, exp(-tgr/muhat))
        list(muhat = muhat, logL = -r * (log(muhat) + 1), r = r, 
            n = n)
    }
    else invisible()
}
> attach(cervical)
> fit0 <- plotsurv2(etime,event,T,T,F,!is.na(csize))
> fit1 <- plotsurv2(etime,event,T,T,F,as.numeric(csize)==1)
> fit2 <- plotsurv2(etime,event,T,T,F,as.numeric(csize)==2)
> fit3 <- plotsurv2(etime,event,T,T,F,as.numeric(csize)==3)
> fit1$mu
[1] 36177.56
> fit2$mu
[1] 12005.93
> fit3$mu
[1] 2792.429
> log(fit1$mu)
[1] 10.49619
> log(fit2$mu)-log(fit1$mu)
[1] -1.103039
> log(fit3$mu)-log(fit1$mu)
[1] -2.561527

Here is the likelihood ratio chi-square to test the equivalence of the three groups:

> chisq <- -2*(fit0$l-(fit1$l+fit2$l+fit3$l))
> chisq
[1] 47.7124

The following calculation applies Bartlett's correction to the likelihood ratio chi-square:

> Cinv <- 1+((1/fit1$r)+(1/fit2$r)+(1/fit3$r)-(1/fit0$r))/12
> Cinv
[1] 1.011110
> chisq/Cinv
[1] 47.18817
> 1-pchisq(chisq,2)
[1] 4.358969e-11
> 1-pchisq(chisq/Cinv,2)
[1] 5.665257e-11

Statistics 4P03/6P03