Statistics 743 Assignments 1998-99


A01 1998-10-15

Due: 1998-10-30

Given n independent observations from a gamma distribution with shape parameter alpha and unit scale parameter, find the marginal distributions of the sample mean xbar, the sample variance s2, and t = sqrt(n)*(xbar - alpha)/s, where s = sqrt(s2). Find the distributions by simulation and, where possible, analytically. Compare them with the corresponding distributions for normally-distributed data, plotting cumulative distribution functions to make the comparison.

Do this for n = 2, 3, 50 and alpha = 0.5, 1, 10.
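A sketch of one way to organize the simulations (the function simgam and its arguments are arbitrary names, not part of the assignment, and you will want to experiment with the number of replications):

simgam <- function(nrep, n, alpha)
{
        # nrep samples of size n from the gamma(alpha) distribution
        # with unit scale, one sample per row
        x <- matrix(rgamma(nrep * n, alpha), nrow = nrep)
        xbar <- apply(x, 1, mean)
        s2 <- apply(x, 1, var)
        # alpha is the true mean when the scale parameter is 1
        t <- sqrt(n) * (xbar - alpha)/sqrt(s2)
        list(xbar = xbar, s2 = s2, t = t)
}

# Empirical CDF of t for n = 2, alpha = 1, with the normal-theory
# t distribution on n - 1 = 1 degree of freedom for comparison
sim <- simgam(10000, 2, 1)
p <- ppoints(sim$t)
plot(sort(sim$t), p, type = "l")
lines(qt(p, 1), p, lty = 2)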

Derive and plot the joint probability density for xbar and s2 when n = 2 and alpha = 0.5, 1, 10. Compare these with the corresponding plots for normally-distributed data.

You should be able to derive all of these distributions exactly for n = 2 and alpha = 1, and some of them for n = 2 and any alpha. Other cases can only be studied by simulation.


A02 1998-11-11

Due: 1998-11-30

(a) In A01 you found the exact distribution of t when n = 2 and alpha = 1. Call this distribution "ext1" for "exponential-data t on 1 degree of freedom". Write Splus functions rext1, dext1, pext1 and qext1 and do what you can to verify that they will give accurate results in any application.
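A sketch of rext1, working directly from the definition of t with two unit-exponential observations (the argument name nrep is arbitrary); dext1, pext1 and qext1 should come from your exact derivation in A01:

rext1 <- function(nrep)
{
        # Two unit-exponential observations per replication
        x1 <- rexp(nrep)
        x2 <- rexp(nrep)
        xbar <- (x1 + x2)/2
        # With n = 2 the sample standard deviation is |x1 - x2|/sqrt(2)
        s <- abs(x1 - x2)/sqrt(2)
        # t = sqrt(n) * (xbar - alpha)/s with n = 2 and alpha = 1
        sqrt(2) * (xbar - 1)/s
}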

(b) Write an Splus function that computes maximum likelihood estimates for the two-parameter gamma distribution and returns the estimates, their standard errors, marginal confidence intervals and a joint confidence ellipse for the parameters. Let the parameters be the shape parameter and the distribution mean. Why is this a good choice of parameters?
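One possible setup for the likelihood (the names negllgam, theta, alpha and mu are arbitrary): with shape alpha and mean mu the scale parameter is mu/alpha, so the negative log-likelihood can be coded directly.

negllgam <- function(theta, x)
{
        alpha <- theta[1]
        mu <- theta[2]
        # log density of the gamma(shape = alpha, scale = mu/alpha)
        # distribution, summed over the data
        - sum(alpha * log(alpha/mu) + (alpha - 1) * log(x) -
                alpha * x/mu - lgamma(alpha))
}

The standard errors and the joint confidence ellipse then come from the inverse of the observed information matrix at the maximum.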


A03 1999-02-09

Due: 1999-03-02

A. Problems from Silvey

MVUE: 2.2, 2.3, 2.4, 2.8

MLE: 4.2, 4.3

BAYES: 10.6, 10.7

B. MLE without derivatives

Use the minimization function ms() in Splus to find maximum likelihood estimates for the same two-parameter gamma distribution you studied in A02. What are the advantages and disadvantages of this method?
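In outline, and imitating the betabin function in part C below (gamdat, alpha and mu are arbitrary names; the observations are assumed to be in a vector x):

gamdat <- data.frame(x = x)
fit <- ms( ~ -(alpha * log(alpha/mu) + (alpha - 1) * log(x) -
        alpha * x/mu - lgamma(alpha)), data = gamdat,
        start = data.frame(alpha = 1, mu = mean(x)))
fit$parameters

Note that ms() minimizes the sum of the formula over the rows of the data frame, so the formula is the negative log-likelihood contribution of a single observation.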

C. Bayesian methods

Repeat everything we did in class for the beta-binomial model, but now for the gamma-Poisson model: that is, a Poisson distribution with a gamma prior distribution for the mean.

This includes writing Splus code to graph the prior and posterior distributions and the likelihood function, and to compute point and interval estimates. Here is the code I used for the beta-binomial results I presented in class (Lecture #39 on 1999-01-26). Note the use of ms() to solve for the limits of the likelihood intervals, and the use of a data frame to return multiple values from the function.

> betabin
function(n, x, a = 1., b = 1.)
{
        # Grid of theta values for plotting
        th <- (0.:100.)/100.
        # Plot the Beta(a, b) prior, the Beta(a + x, b + n - x) posterior,
        # and the normalized likelihood (the posterior under a flat prior)
        matplot(th, matrix(c(dbeta(th, a, b), dbeta(th, a + x, b + n - x),
                dbeta(th, 1. + x, 1. + n - x)), ncol = 3.), type = "l")
        # Posterior mode and central 95% posterior interval
        bayesest <- (x + a - 1.)/(n + a + b - 2.)
        bayeslow <- qbeta(0.025, a + x, b + n - x)
        bayesupp <- qbeta(0.975, a + x, b + n - x)
        mle <- x/n
        chis1 <- qchisq(0.95, 1.)
        # Each likelihood-interval limit solves "likelihood ratio statistic
        # = 95% chi-square critical value"; ms() minimizes the squared
        # difference, started first below and then above the MLE
        thtmp <- ms( ~ (-2. * (x * log(the/mle) + (n - x) * log((1. - the)/
                (1. - mle))) - chis1)^2., start = data.frame(the = c(0.2)))
        likelow <- thtmp$parameters
        thtmp <- ms( ~ (-2. * (x * log(the/mle) + (n - x) * log((1. - the)/
                (1. - mle))) - chis1)^2., start = data.frame(the = c(0.8)))
        likeupp <- thtmp$parameters
        # Return the posterior P(theta <= 1/2), the exact one-sided P-value
        # and mid-P value for theta = 1/2, and the point and interval
        # estimates, as a one-row data frame
        data.frame(bayes = pbeta(0.5, a + x, b + n - x), pval = 1. - pbinom(
                x - 1., n, 0.5), midp = ((1. - pbinom(x - 1., n, 0.5)) + (
                1. - pbinom(x, n, 0.5)))/2., bayesest, bayeslow, bayesupp,
                mle, likelow, likeupp, row.names = c("Values:"))
}
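For the gamma-Poisson the conjugate update is just as simple: a gamma prior with shape a and rate b for the Poisson mean gives a gamma posterior with shape a + sum(x) and rate b + n. A skeleton to adapt (gampois and its argument names are arbitrary; Splus dgamma and qgamma use unit scale, so the rates are handled by rescaling):

gampois <- function(x, a = 1., b = 1.)
{
        n <- length(x)
        S <- sum(x)
        mu <- seq(0.01, 2. * (a + S)/(b + n), length = 200)
        # Prior gamma(a, rate b) and posterior gamma(a + S, rate b + n);
        # the density of gamma(shape, rate) is rate * dgamma(rate * mu, shape)
        matplot(mu, matrix(c(b * dgamma(b * mu, a), (b + n) * dgamma(
                (b + n) * mu, a + S)), ncol = 2), type = "l")
        c(postmean = (a + S)/(b + n),
          bayeslow = qgamma(0.025, a + S)/(b + n),
          bayesupp = qgamma(0.975, a + S)/(b + n))
}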


A04 1999-04-07

Due: 1999-04-30

A. Problems 2.1-2.5 from McCullagh & Nelder, pp. 44-45.

B. Overdispersion

Derive the probability density function for the over-dispersed Poisson. Plot the density with mean m = 2 and m = 5 and three different values of the dispersion parameter: f = 1, 1.5, 2. [Hint: I call this distribution the "Expanded Poisson".]

C. Stratified person-time data

Take the example from pp. 594-600 of Rosner, B.A. (1995) Fundamentals of Biostatistics, Fourth Edition. Duxbury. Reproduce all the tests and estimates he gives there, using GLM software (GLMStat, Splus, or SAS). Your results will not always agree exactly, because Rosner uses Pearson's chi-square and sometimes includes a continuity correction, but they will be very close. [Hint: Use a log-linear model with an offset. Rosner considers only the "Never users" and "Current users"; if you are interested, you might also consider what happens when you include the "Past users".]
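The model in the hint has the general form below; the variable names are hypothetical and depend on how you enter Rosner's tables:

# Poisson log-linear model for stratified person-time data: the log
# person-time enters as an offset, so the model is for rates
fit <- glm(cases ~ stratum + user + offset(log(persontime)),
        family = poisson, data = rosner)
summary(fit)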

D. Logistic regression for Bernoulli data

In the example of a binomial GLM in Splus from the lecture of 1999-03-23 there were n = 120 subjects and k = 4 distinct rows in the design matrix. The data could be analysed either as k binomial distributions or as n Bernoulli distributions. The results differed only with respect to the residual deviance. Fitting a saturated model to the k binomials gave a residual deviance of zero on 0 degrees of freedom, but fitting the same model to the n Bernoulli distributions, while giving the same tests and estimates, left a positive residual deviance on n - k degrees of freedom. Where does this residual deviance come from?

Show that if there are mi > 1 copies of the ith distinct row in the design matrix, i = 1, ..., k, and if the fitted saturated model (with k parameters) is such that every fitted binomial proportion lies strictly between 0 and 1, then the Pearson chi-square statistic after fitting the data as n Bernoulli distributions is exactly n on n - k degrees of freedom. Hence the (log-likelihood) residual deviance will be approximately n. What are the implications of this result?
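You can check the result numerically before proving it. A made-up example with k = 2 distinct rows of m = 10 subjects each:

# Bernoulli responses: 3 successes out of 10 in row A, 6 out of 10 in row B
y <- c(rep(1, 3), rep(0, 7), rep(1, 6), rep(0, 4))
grp <- factor(rep(c("A", "B"), c(10, 10)))
# Saturated model in the k = 2 distinct rows
fit <- glm(y ~ grp, family = binomial)
# Pearson chi-square statistic: exactly n = 20
sum(residuals(fit, type = "pearson")^2)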


A05 1999-04-07

Due: 1999-04-30

Exercises on the jackknife and bootstrap

Here are 50 observations from some distribution. We are interested in the coefficient of variation of this distribution (the ratio of its standard deviation to its mean) and require (a) a point estimate, bias-corrected if possible; (b) the standard error of the point estimate; and (c) a 95% confidence interval.

Try the following methods with (i) the 5 observations in the first row and (ii) all 50 observations.

(A) Maximum likelihood, assuming that the observations come from a gamma distribution. Do what you can to test the assumption of a gamma distribution.

(B) The moment estimate s/xbar, using the delta method to adjust for bias and approximate the standard error.

(C) The moment estimate s/xbar, using the delete-1 jackknife to adjust for bias and estimate the standard error.

(D) The moment estimate s/xbar, using the bootstrap to adjust for bias and approximate the standard error. When n = 5, compute also the ideal bias adjustment and standard error. [Skeletons for (C) and (D) appear after the data.]

0.105 3.390 1.034 0.248 1.321
2.252 0.924 1.250 1.266 3.535
2.870 0.751 0.802 0.649 4.531
1.567 2.953 1.299 0.932 0.355
2.873 1.524 2.244 4.264 0.450
2.390 0.816 1.495 3.594 1.746
2.155 1.684 0.681 0.228 2.133
3.371 0.168 3.205 0.842 5.956
1.995 0.815 1.588 0.873 1.333
3.803 1.326 5.137 1.508 2.051
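Skeletons for (C) and (D); the names cv, jack and boot are arbitrary, and you will want to experiment with the number of bootstrap replications:

# Moment estimate of the coefficient of variation
cv <- function(x) sqrt(var(x))/mean(x)

# (C) Delete-1 jackknife bias adjustment and standard error
jack <- function(x, stat = cv)
{
        n <- length(x)
        th <- stat(x)
        thi <- numeric(n)
        for(i in 1:n) thi[i] <- stat(x[-i])
        bias <- (n - 1) * (mean(thi) - th)
        se <- sqrt((n - 1) * mean((thi - mean(thi))^2))
        c(est = th - bias, bias = bias, se = se)
}

# (D) Bootstrap bias adjustment and standard error
boot <- function(x, stat = cv, nboot = 1000)
{
        th <- stat(x)
        thb <- numeric(nboot)
        for(b in 1:nboot) thb[b] <- stat(sample(x, length(x), replace = T))
        c(est = 2. * th - mean(thb), bias = mean(thb) - th,
          se = sqrt(var(thb)))
}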
