Statistics 2MA3 Test #2 Solutions

2003-03-08

The marking scheme is indicated in red. Full Marks = 40.

Q1 [9]

Bayes' Theorem says that if a sample space is partitioned into k mutually exclusive and exhaustive events E₁, ..., E_k, and A is any event with P(A) > 0, then
P(E_i | A) = P(A | E_i) P(E_i)/{ P(A | E₁) P(E₁) + ... + P(A | E_k) P(E_k) }.
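As a minimal sketch of the formula above, the following R function turns prior probabilities P(E_i) and conditional probabilities P(A | E_i) into posterior probabilities P(E_i | A). The function name `bayes_posterior` and the numbers for a three-event partition are made up for illustration; they are not part of the test.

```r
# Hypothetical helper (not from the test): posterior probabilities P(E_i | A)
# from priors P(E_i) and conditional probabilities P(A | E_i).
bayes_posterior <- function(prior, lik) {
  # The priors must describe a partition, so they have to sum to 1.
  stopifnot(length(prior) == length(lik), abs(sum(prior) - 1) < 1e-12)
  joint <- lik * prior   # numerators P(A | E_i) P(E_i)
  joint / sum(joint)     # divide by the denominator, which equals P(A)
}

# Illustrative numbers only (assumed, not from the test).
prior <- c(0.5, 0.3, 0.2)   # P(E_1), P(E_2), P(E_3)
lik   <- c(0.1, 0.4, 0.7)   # P(A | E_1), P(A | E_2), P(A | E_3)
post  <- bayes_posterior(prior, lik)
print(round(post, 4))
```

The posterior probabilities always sum to 1, because the denominator is just the sum of the numerators.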
Give three interesting facts about the life and work of Thomas Bayes (1702-1761).
A parameter is a scalar or vector that indexes a family of probability distributions. A statistic is any function of the observations in a sample; it may not include any unknown parameters. The distribution of a statistic is called a sampling distribution; it describes how the statistic will vary from one sample to another.

Q2 [9]

The distribution of X is discrete uniform, with each possible value from 1 to 4 having the same probability 1/4. Hence
E[X] = 1*(1/4) + 2*(1/4) + 3*(1/4) + 4*(1/4) = 2.5
E[X²] = 1*(1/4) + 4*(1/4) + 9*(1/4) + 16*(1/4) = 7.5
Var[X] = E[X²] - (E[X])² = 7.5 - 6.25 = 1.25
Since Y is the sum of 9 independent realizations of X, E[Y] = 9*E[X] = 22.5 and Var[Y] = 9*Var[X] = 11.25. By the Central Limit Theorem the distribution of Y will be approximately normal so, approximately, P[Y > 26] = 1 - F((26.5 - 22.5)/sqrt(11.25)) = 1 - F(1.19257) = 0.117, where F denotes the standard normal cumulative distribution function. (Note that I have used a continuity correction: because the actual score must be an integer, I have approximated the probability of getting 26 or less by taking the area under the normal curve up to 26.5. You will get a slightly different answer if you omit the continuity correction.)
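The calculation above can be reproduced in R, and the normal approximation can be compared with the exact distribution of Y. The sketch below is an addition, not part of the original solution: it builds the exact distribution by repeated convolution, and all variable names are chosen for illustration.

```r
x <- 1:4
p <- rep(1/4, 4)
ex <- sum(x * p)              # E[X] = 2.5
vx <- sum(x^2 * p) - ex^2     # Var[X] = 1.25
n  <- 9
ey <- n * ex                  # E[Y] = 22.5
vy <- n * vx                  # Var[Y] = 11.25

# Normal approximation with continuity correction: P[Y > 26] = 1 - P[Y <= 26]
approx <- 1 - pnorm((26.5 - ey) / sqrt(vy))

# Exact distribution of Y by repeated convolution.
# dist[j] holds P(sum = j - 1); start with a sum of zero terms.
dist <- 1
for (i in seq_len(n)) {
  new <- numeric(length(dist) + 4)
  for (k in x) {
    idx <- seq_along(dist) + k        # adding the value k shifts the sum by k
    new[idx] <- new[idx] + dist * p[k]
  }
  dist <- new
}
values <- seq_along(dist) - 1         # possible sums 0, 1, ..., 36
exact <- sum(dist[values > 26])

print(c(approx = approx, exact = exact))
```

Comparing `approx` with `exact` shows how close the Central Limit Theorem approximation is for a sum of only 9 terms.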

Q3 [9]

> sens <- pnorm((3-3.5)/.6)
> spec <- 1-pnorm((3-4)/.5)
> prev <- 0.22
> c(sens=sens, spec=spec, prev=prev)
     sens      spec      prev 
0.2023284 0.9772499 0.2200000 
> pvp <- sens*prev/(sens*prev + (1-spec)*(1-prev))
> pvp
[1] 0.7149717

A subject is classified as a smoker if FEV < 3; since FEV ~ N(3.5, 0.6^2) for smokers, the sensitivity of the test is F((3 - 3.5)/0.6) = 0.202. Also, since FEV ~ N(4, 0.5^2) for nonsmokers, the specificity is 1 - F((3 - 4)/0.5) = 0.977. If the prevalence is 22%, Bayes' Theorem gives PV⁺ = sens*prev/(sens*prev + (1 - spec)*(1 - prev)) = 0.715.

Q4 [3]

Assume a Poisson distribution for the number X of accidents in a given week; if the mean really is 1.6, then the probability of getting 4 or more is easily computed as

P(X >= 4) = 1-exp(-1.6)*(1 + 1.6 + (1.6^2)/2 + (1.6^3)/6) = 0.0788

Since this is greater than 5%, we conclude that 4 accidents is not significantly high and hence there is no evidence that the mean rate has increased.
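The tail probability above can be checked in R. This is a minimal sketch using the built-in Poisson distribution function `ppois`; the variable names are chosen for illustration.

```r
lambda <- 1.6

# By hand, as in the solution: 1 - P(X <= 3)
p_by_hand <- 1 - exp(-lambda) * (1 + lambda + lambda^2 / 2 + lambda^3 / 6)

# Using the Poisson distribution function
p_ppois <- 1 - ppois(3, lambda)

# Equivalent upper-tail form: P(X > 3) = P(X >= 4)
p_upper <- ppois(3, lambda, lower.tail = FALSE)

print(c(by_hand = p_by_hand, ppois = p_ppois, upper = p_upper))
```

Note that the argument is 3, not 4: `ppois(3, lambda)` gives P(X <= 3), so its complement is P(X >= 4).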

Q5 [10]

> dhr <- c(4, 6, 5, 2, 5, -8, 1, -2, 8, 0, 12, 13, 1, 0, 7)
> n <- length(dhr)
> dbar <- mean(dhr)
> s <- sqrt(var(dhr))
> c(n=n, dbar=dbar, s=s)
        n      dbar         s 
15.000000  3.600000  5.395766 
> qt(.975,n-1)
[1] 2.144787
> dbar + c(-1,1)*qt(.975,n-1)*s/sqrt(n)
[1] 0.6119246 6.5880754
> qnorm(.975)
[1] 1.959964
> dbar + c(-1,1)*qnorm(.975)*s/sqrt(n)
[1] 0.869416 6.330584

Whether computed as (0.61, 6.59) with the t-distribution or, approximately, as (0.87, 6.33) with the normal distribution, the 95% confidence interval for the mean difference in heart rate excludes 0. Hence, at the 95% level of confidence (or 5% level of significance), there is evidence that the true mean difference between heart rate before treatment and heart rate after treatment is not zero. This analysis assumes that the subjects are independent and that the differences are normally distributed.
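As a cross-check on the t-based interval, R's built-in `t.test` gives the same confidence interval along with the test statistic and p-value for the hypothesis that the mean difference is zero. This sketch is an addition to the original solution.

```r
dhr <- c(4, 6, 5, 2, 5, -8, 1, -2, 8, 0, 12, 13, 1, 0, 7)

# One-sample t-test on the differences (equivalent to a paired t-test)
tt <- t.test(dhr, mu = 0, conf.level = 0.95)

ci <- as.numeric(tt$conf.int)   # 95% confidence interval for the mean difference
print(round(ci, 4))
print(tt$statistic)             # t = dbar / (s / sqrt(n))
print(tt$p.value)               # two-sided p-value
```

A 95% interval that excludes 0 corresponds to a two-sided p-value below 0.05, so the two ways of stating the conclusion agree.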
The zero differences carry no information one way or the other; it would bias the result to treat them as positive or negative, so it is best to omit them and say that we have 11 positive differences out of 13 non-zero differences. Under the hypothesis that the median difference is zero, the number of positive differences will be distributed Bin(13, 0.5). The probability of getting 11 or more positive differences out of 13 would then be 78/8192 + 13/8192 + 1/8192 = 92/8192 = 0.0112, which is less than 5%. Doubling this for a two-sided test gives 0.0225, still less than 5%, so there is reason to claim that the median difference between heart rate before and after treatment is not zero.
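The sign test calculation can be sketched in R as follows. It drops the zero differences, counts the positive ones, and computes the binomial tail probability both directly and with the built-in `binom.test`; it also computes the two-sided p-value for comparison. Variable names are chosen for illustration.

```r
dhr <- c(4, 6, 5, 2, 5, -8, 1, -2, 8, 0, 12, 13, 1, 0, 7)

nonzero <- dhr[dhr != 0]        # omit the zero differences
n_pos   <- sum(nonzero > 0)     # number of positive differences
n_nz    <- length(nonzero)      # number of non-zero differences

# P(at least n_pos positives) under Bin(n_nz, 0.5)
p_one <- sum(dbinom(n_pos:n_nz, n_nz, 0.5))

# Same one-sided probability from the exact binomial test
p_bt <- binom.test(n_pos, n_nz, p = 0.5, alternative = "greater")$p.value

# Two-sided version of the same test
p_two <- binom.test(n_pos, n_nz, p = 0.5)$p.value

print(c(n_pos = n_pos, n_nz = n_nz))
print(c(one_sided = p_one, binom_test = p_bt, two_sided = p_two))
```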