STATISTICS 3N03 - Test #2 Solutions

2002-10-31

Question 1 [3 + 3 + 3 marks]

(a) Probability is a measure of certainty on a scale of 0 to 1. The probability of an impossible event is 0, the probability of an inevitable event is 1.

(b) Siméon Denis Poisson derived the Poisson distribution in 1837 as a limiting case of the binomial distribution, and applied it to the decisions of juries in criminal and civil cases. It attracted little attention until von Bortkiewicz in 1898 applied it to a number of data sets, the most famous being the number of deaths by horse kicks in the Prussian army in different years. [Either example will be accepted.]

(c) John Wilder Tukey died on 26 July 2000. While his most important contribution may have been the Fast Fourier Transform, which is applied to many areas of science and engineering, including time series analysis, he developed many areas of statistics, particularly robust methods, analysis of variance and exploratory data analysis. His graphical innovations included the boxplot and the stem and leaf plot. He introduced many new terms, giving new definitions to common words like "hinge" and "stem and leaf". [Any two contributions will do, year of death is sufficient.]


Question 2 [4 marks]

This code will draw a "Normal Probability Plot" like the one given by qqnorm(x), with standard normal theoretical quantiles on the abscissa and the sample quantiles on the ordinate. If the data come from a normal distribution the points will fall close to a straight line (except for a few points at either end); the intercept of that line then gives a graphical estimate of the mean, while the slope gives a graphical estimate of the standard deviation. [Your answer does not have to be this detailed.]
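The code from the test paper is not reproduced in these solutions, so the following is only a sketch of how such a plot can be built by hand. The simulated sample x, its mean of 10 and standard deviation of 2, and the names theo, samp and fit are all assumptions made for illustration.

```r
# Hypothetical sample (an assumption, not the data from the test)
set.seed(1)
x <- rnorm(50, mean = 10, sd = 2)

n    <- length(x)
theo <- qnorm(ppoints(n))   # standard normal theoretical quantiles
samp <- sort(x)             # ordered sample values (sample quantiles)

# Theoretical quantiles on the abscissa, sample quantiles on the ordinate
plot(theo, samp,
     xlab = "Theoretical Quantiles", ylab = "Sample Quantiles",
     main = "Normal Probability Plot")

# A straight line fitted through the points:
# the intercept estimates the mean, the slope estimates the standard deviation
fit <- lm(samp ~ theo)
abline(fit)
coef(fit)
```

Comparing coef(fit) with mean(x) and sqrt(var(x)) shows how close the graphical estimates come for a given sample.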


Question 3 [9 marks]

Let A be the event that the pellet is acceptable, let G be the event that the process has not shifted. Assume that the distribution of weight is Normal. [You must state any assumptions.]

We compute P(A|G) = 0.8904, P(A|G') = 0.7799 and we are given that P(G) = 0.9, so P(G') = 0.1. Hence Bayes' Theorem gives the required probability as

P(G'|A')
= P(A'|G')*P(G')/(P(A'|G')*P(G') + P(A'|G)*P(G))
= (1-P(A|G'))*P(G')/((1-P(A|G'))*P(G') + (1-P(A|G))*P(G))
= 0.182

> prag <- pnorm((104-100)/2.5) - pnorm((96-100)/2.5)
> prag
[1] 0.8904014
> pras <- pnorm((104-102)/2.5) - pnorm((96-102)/2.5)
> pras
[1] 0.779947
> (1-pras)*0.1/(((1-pras)*0.1) + ((1-prag)*0.9))
[1] 0.1823985
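
The same calculation can be wrapped in two small functions, which makes it easier to see how the answer depends on the prior probability of a shift. This is only a sketch: the function names p_accept and p_shift_given_reject and their argument names are hypothetical, while the numerical defaults are the ones used in the transcript above.

```r
# Probability that a pellet falls inside the specification limits
# when the process mean is mu (Normal weights assumed, as in the solution)
p_accept <- function(mu, lower = 96, upper = 104, sigma = 2.5) {
  pnorm((upper - mu) / sigma) - pnorm((lower - mu) / sigma)
}

# Bayes' Theorem: probability that the process has shifted,
# given that a pellet was found unacceptable
p_shift_given_reject <- function(prior_shift = 0.1,
                                 mu_good = 100, mu_shift = 102) {
  num <- (1 - p_accept(mu_shift)) * prior_shift
  den <- num + (1 - p_accept(mu_good)) * (1 - prior_shift)
  num / den
}

p_shift_given_reject()      # reproduces the 0.182 obtained above
p_shift_given_reject(0.5)   # the same evidence with an even prior
```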

Question 4 [9 marks]

Assume that flaws occur one at a time, independently, at a constant average rate along the wire. Assume that coils are independent of each other. [You must state any assumptions.]

Let A be the event that a given coil is acceptable. If X is the number of flaws in a given coil, X ~ Pois(0.2) so p = P(A) = P(X=0) = exp(-0.2) = 0.81873.

(a) Let Y be the number of coils you test until the first unacceptable coil is found. Each coil is unacceptable with probability 1-p, so Y ~ Geom(1-p) and

P(Y=6) = (1-p)*p^5 = 0.0667.

(b) P(Y>=6) = p^5 = 0.3679, since the first five coils must all be acceptable.

(c) Let W be the number of acceptable coils out of 100. Since W ~ Binom(100, p), the normal approximation to the binomial (with continuity correction) gives

P(W>=80) = 0.731

[You do not need to do the exact binomial calculation under test conditions.]

> pac <- dpois(0,0.2)
> pac
[1] 0.8187308
> (1-pac)*pac^5
[1] 0.06668523
> pac^5
[1] 0.3678794
> 1 - pbinom(79, 100, pac)
[1] 0.7366162
> 1 - pnorm((79.5 - 100*pac)/sqrt(100*pac*(1-pac)))
[1] 0.7310519
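
As a cross-check on parts (a) and (b), the geometric probabilities can also be obtained from R's built-in functions. This is a sketch rather than part of the required answer; the name q is an assumption. Note that dgeom and pgeom count the number of failures before the first "success", so the "success" here is an unacceptable coil and Y = 6 corresponds to 5 acceptable coils first.

```r
pac <- dpois(0, 0.2)   # P(coil is acceptable), as above
q   <- 1 - pac         # P(coil is unacceptable): the geometric "success" probability

# (a) P(Y = 6): five acceptable coils, then the first unacceptable one
dgeom(5, q)

# (b) P(Y >= 6): more than four acceptable coils before the first unacceptable one
pgeom(4, q, lower.tail = FALSE)
```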


Question 5 [9 marks]

The outlier is the 17th observation (x17 = 22). With the outlier included, there are 21 observations. They appear to be discrete valued, so the sample mode is the most frequent observed value. The median is the 11th ordered observation, the lower hinge the 6th, the upper hinge the 16th. Mode = 2, Mean = 3.190, Lower Hinge = 2, Median = 2, Upper Hinge = 3, Standard deviation = 4.445.

With the outlier removed there are 20 observations. The median is the average of the 10th and 11th, the lower hinge the average of the 5th and 6th, the upper hinge the average of the 15th and 16th. Mode = 2, Mean = 2.25, Lower Hinge = 2, Median = 2, Upper Hinge = 2.5, Standard deviation = 1.118.

> stem(x)

  The decimal point is 1 digit(s) to the right of the |

  0 | 00222222222222234444
  0 | 
  1 | 
  1 | 
  2 | 2

> length(x)
[1] 21
> median(x)
[1] 2
> mean(x)
[1] 3.190476
> sqrt(var(x))
[1] 4.445436
>
> x <- x[-17]
> stem(x)

  The decimal point is at the |

  0 | 00
  1 | 
  2 | 0000000000000
  3 | 0
  4 | 0000

> median(x)
[1] 2
> mean(x)
[1] 2.25
> sqrt(var(x))
[1] 1.118034
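
The hinges and the mode quoted above can be checked with fivenum() and table(). The sketch below rebuilds the data in sorted order from the stem and leaf plots, which is an assumption: the original order of the observations is not shown, so the outlier is removed by value rather than by position. The names x_all, x_trim and sample_mode are hypothetical.

```r
# Data reconstructed (sorted) from the stem and leaf plots above
x_all  <- c(0, 0, rep(2, 13), 3, rep(4, 4), 22)
x_trim <- x_all[x_all != 22]          # outlier removed by value

# Tukey's five-number summary:
# minimum, lower hinge, median, upper hinge, maximum
fivenum(x_all)
fivenum(x_trim)

# Sample mode: the most frequently observed value
sample_mode <- function(v) as.numeric(names(which.max(table(v))))
sample_mode(x_all)
sample_mode(x_trim)
```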
