Statistics 2MA3 - Exercise #3 - Hints

Updated 2001-02-17


Plotting ROC Curves

Think of how we plotted the curve by hand. We sorted all the observations from both groups into one vector. Then, for each observation in turn, from smallest to largest, we found the proportion of the disease sample and the proportion of the control sample less than or equal to that observation, and plotted these two proportions against each other.

> plot.roc
function (sd, sc)
{
    sall <- sort(c(sd, sc))
    sens <- 0
    specc <- 0
    for (i in 1:length(sall)) {
        sens <- c(sens, mean(sd <= sall[i]))
        specc <- c(specc, mean(sc <= sall[i]))
    }
    plot(specc, sens, xlim = c(0, 1), ylim = c(0, 1), type = "l",
        xlab = "1-specificity", ylab = "sensitivity")
    abline(0, 1)
    invisible()
}

Note that mean(sd <= sall[i]) computes the proportion of times the condition is satisfied, which is just the sensitivity at cut-off sall[i]. In the same way, mean(sc <= sall[i]) gives 1-specificity at that cut-off. Note the use of sens <- c(sens, ...) to accumulate successive values of sensitivity in the vector sens. The invisible() command ensures that the only effect of the function is the plot and that no value is printed.
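
The two idioms can be tried on their own, without any plotting. The following is only a small sketch; the data values and the cut-off are made up for illustration and do not come from the exercise.

```r
# Hypothetical toy data: disease scores tend to be lower than control scores
sd <- c(1, 2, 3, 5)
sc <- c(2, 4, 6, 7)

# The mean of a logical vector is the proportion of TRUE values
cutoff <- 3
mean(sd <= cutoff)    # 0.75, the sensitivity at this cut-off
mean(sc <= cutoff)    # 0.25, which is 1-specificity at this cut-off

# Accumulate one value per sorted observation, starting from 0
sens <- 0
specc <- 0
for (x in sort(c(sd, sc))) {
    sens <- c(sens, mean(sd <= x))
    specc <- c(specc, mean(sc <= x))
}
cbind(specc, sens)    # the points that plot.roc would join up
```

Each row of the final matrix is one point on the curve, starting from (0, 0) and ending at (1, 1).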

We can improve on this code. The following version of plot.roc will work when there are missing values (NA) in the data, and it returns the area under the ROC curve, computed by the trapezoidal rule.

> plot.roc
function (sd, sc)
{
    sall <- sort(c(sd, sc))
    sens <- 0
    specc <- 0
    for (i in 1:length(sall)) {
        sens <- c(sens, mean(sd <= sall[i], na.rm = TRUE))
        specc <- c(specc, mean(sc <= sall[i], na.rm = TRUE))
    }
    plot(specc, sens, xlim = c(0, 1), ylim = c(0, 1), type = "l",
        xlab = "1-specificity", ylab = "sensitivity")
    abline(0, 1)
    npoints <- length(sens)
    sum(0.5 * (sens[-1] + sens[-npoints]) * (specc[-1] - specc[-npoints]))
}
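
The area calculation can be checked without drawing anything. The sketch below repeats the same steps in a helper called roc.area (a name made up here, not part of the exercise) and compares the result with a direct count over all disease-control pairs, which should agree because the trapezoidal area under this curve equals the proportion of pairs in which the disease value is the smaller one, with ties counted as one half. The data are hypothetical.

```r
# roc.area is a hypothetical helper: the same computation as above, minus the plot
roc.area <- function(sd, sc) {
    sall <- sort(c(sd, sc))    # sort() drops NA values by default
    sens <- c(0, sapply(sall, function(a) mean(sd <= a, na.rm = TRUE)))
    specc <- c(0, sapply(sall, function(a) mean(sc <= a, na.rm = TRUE)))
    npoints <- length(sens)
    # trapezoidal rule: average of adjacent heights times the width between them
    sum(0.5 * (sens[-1] + sens[-npoints]) * (specc[-1] - specc[-npoints]))
}

# Hypothetical toy data with one missing value in the disease sample
sd <- c(1, 2, NA, 3, 5)
sc <- c(2, 4, 6, 7)
area <- roc.area(sd, sc)

# Cross-check: proportion of (disease, control) pairs with disease < control,
# counting ties as one half
d <- sd[!is.na(sd)]
pairwise <- mean(outer(d, sc, "<")) + 0.5 * mean(outer(d, sc, "=="))
all.equal(area, pairwise)
```

An area near 1 means the two samples are well separated; an area near 0.5 means the curve lies close to the diagonal drawn by abline(0, 1).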
 

Choosing a Cut-off Point

Within each group, the misclassification probability is a normal tail area: the area above the cut-off a for the disease group, or the area below it for the control group. To find the total probability of misclassification, multiply each of these conditional rates by the probability of belonging to that group (disease or control), and add. To find the value of a that minimizes this expression, differentiate it with respect to a, set the derivative to zero and solve for a. Remember that F() is just the integral of the normal probability density function, so its derivative is the density itself.
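
A numerical check can be useful once the calculus is done. The sketch below minimizes the total misclassification probability with optimize() and then confirms that the derivative is close to zero at the minimum. Every number in it (the group probabilities, means and common standard deviation) is invented for illustration and is not taken from the exercise; the names misclass, a.hat and deriv.at.min are likewise made up here.

```r
# All parameter values below are hypothetical, for illustration only
p.d <- 0.2                # probability of belonging to the disease group
p.c <- 1 - p.d            # probability of belonging to the control group
mu.d <- 0                 # disease mean (disease values tend to be lower)
mu.c <- 2                 # control mean
sigma <- 1                # common standard deviation (an assumption)

# Total probability of misclassification at cut-off a:
# disease cases above a, plus controls below a, each weighted by group probability
misclass <- function(a) {
    p.d * (1 - pnorm(a, mu.d, sigma)) + p.c * pnorm(a, mu.c, sigma)
}

# Numerical minimization over a generous interval around the two means
best <- optimize(misclass, interval = c(mu.d - 5, mu.c + 5), tol = 1e-8)
a.hat <- best$minimum

# The derivative of misclass, using the fact that the derivative of pnorm is dnorm
deriv.at.min <- -p.d * dnorm(a.hat, mu.d, sigma) + p.c * dnorm(a.hat, mu.c, sigma)

a.hat           # the minimizing cut-off
deriv.at.min    # should be very close to zero
```

Changing p.d shows how the best cut-off moves away from the midpoint of the two means when one group is much more common than the other.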


Last modified 2001-02-17 15:39