S3N03 Assignment #1

Statistics 3N03 - Assignment #1 Solutions

2001-10-04

Due: 2001-10-03 18:00

The following problems and data sets are taken from Montgomery & Runger, Applied Statistics and Probability for Engineers, 2nd edition. I have used R to prepare these solutions; you may use any software you like.

Here are the graphs, calculations and discussion I am looking for; you may think of others that are also helpful in interpreting the data.

Full marks = 70

2-23 (p 42) [6 for graphs, 6 for interpretation]

Follow the instructions in the text and also do a lag-1 scatter plot. Is there evidence of trend or autocorrelation?

Analysis of Pull-off Force for Connectors, in Serial Order of Testing

The time sequence plot suggests a constant mean pull-off force for the first 20 or so connectors, followed by a downward trend. This suggests that the connectors later in the sequence are weaker.

The lag plot (a plot of the time series against a lagged version of itself) indicates weak positive autocorrelation. That is, the points lie more or less along the diagonal but do not lie tightly along the diagonal. If there were a strong positive autocorrelation, two connectors tested one after the other would have similar values of pull-off force and the points on the lag plot with lag +1 or -1 would lie very close to the diagonal.
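As a rough sketch of how these two plots can be drawn in R: the series `x` below is simulated and is only a stand-in for a variable recorded in serial order (it is not the textbook data), and the names `x`, `lag1` and `r1` are chosen here for illustration.

```r
# Simulated stand-in for a series in serial order (NOT the textbook data)
set.seed(1)
x <- 200 + cumsum(rnorm(40, sd = 5))

n <- length(x)
# Pair each observation with the one before it: (x[t-1], x[t])
lag1 <- data.frame(previous = x[-n], current = x[-1])

op <- par(mfrow = c(1, 2))
plot(x, type = "b", xlab = "Order of testing", ylab = "Value",
     main = "Time sequence plot")
plot(lag1$previous, lag1$current, xlab = "x[t-1]", ylab = "x[t]",
     main = "Lag-1 scatter plot")
abline(0, 1, lty = 2)   # the diagonal referred to in the discussion
par(op)

# Lag-1 correlation as a numerical companion to the plot
r1 <- cor(lag1$previous, lag1$current)
```

Points hugging the dashed diagonal, and a value of `r1` near 1, would both indicate strong positive autocorrelation.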

The stem and leaf plot shows two modes. We can see the same effect in the time sequence plot; the high values of pull-off force are all close to 245 and most of the low values are close to 195.

  The decimal point is 1 digit(s) to the right of the |
 
  17 | 558
  18 | 357
  19 | 00445589
  20 | 1399
  21 | 00238
  22 | 005
  23 | 5678
  24 | 1555899
  25 | 158
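The display above can be read back into R, to the precision shown, as a check on the counts per stem. This is only a sketch: the values come out in sorted order, so the original order of testing is lost, and the name `force` is chosen here for illustration.

```r
# Values read off the stem-and-leaf display (stem = tens, leaf = units).
# Sorted order only: the order of testing cannot be recovered from the display.
force <- c(175, 175, 178,
           183, 185, 187,
           190, 190, 194, 194, 195, 195, 198, 199,
           201, 203, 209, 209,
           210, 210, 212, 213, 218,
           220, 220, 225,
           235, 236, 237, 238,
           241, 245, 245, 245, 248, 249, 249,
           251, 255, 258)

length(force)                          # 40 connectors
stem(force)                            # redraw the display
stem_counts <- table(force %/% 10)     # number of leaves on each stem
stem_counts
```

The two largest counts fall on stems 19 and 24, which are the two modes described in the text.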

2-31 (p 45) [6 for graphs and calculations, 6 for interpretation]

Follow the instructions in the text and also do a lag-1 scatter plot. Is there evidence of trend or autocorrelation?

Analysis of Viscosity from a Batch Chemical Process

The time sequence plot shows a fairly consistent variation with no indication of trend.

The lag plot shows a wide scatter of points, not concentrated on the diagonal, so there is no evidence of autocorrelation in these data.

The mean of the first 40 observations is 14.88, very close to the mean of the second 40, which is 14.92. The difference is small compared to the sample standard deviation (0.948 for the first 40, 1.02 for the second 40, and 0.98 overall). This supports the impression from the time sequence plot that there is no evidence of a change in the process after the first 40 observations.

> mean(viscosity[1:40])
[1] 14.875
> sqrt(var(viscosity[1:40]))
[1] 0.9483454
> mean(viscosity[41:80])
[1] 14.9225
> sqrt(var(viscosity[41:80]))
[1] 1.022939
> mean(viscosity)
[1] 14.89875
> sqrt(var(viscosity))
[1] 0.9803763
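To make "small compared to the sample standard deviation" concrete, the difference in means can be expressed in units of the overall standard deviation. The sketch below only re-uses the numbers printed above; the variable names are chosen here for illustration.

```r
# Summary values copied from the R output above
m1 <- 14.875             # mean, first 40 observations
m2 <- 14.9225            # mean, second 40 observations
s_all <- 0.9803763       # standard deviation, all 80 observations

d <- m2 - m1             # difference in means
ratio <- d / s_all       # difference in units of the overall standard deviation
ratio
```

The difference is about 0.05 standard deviations. Note also that `sd(x)` gives the same result as `sqrt(var(x))`.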

Crude Oil Data [13 for graphs, 13 for interpretation]

The attached data file gives measurements of trace elements (vanadium, iron, and beryllium, all in % ash) and hydrocarbons (saturated and aromatic, both in % area) in chemically analyzed samples of crude oil from three zones of sandstone (Wilhelm, Sub-Mulinia, Upper Mulinia). The data are listed in an arbitrary order within each zone.

Use scatterplot matrices to study relations between the variables and use histograms to assess normality. Use box plots to look for differences between the zones. State your conclusions. Why would time series plots and lag-1 plots be inappropriate for these data?

Chemical Analysis of Crude Oil from Three Different Zones of Sandstone

The scatterplot matrix reveals weak linear relationships between iron and saturated hydrocarbons, and between saturated hydrocarbons and aromatic hydrocarbons, but no other pairwise relationships are evident.

It might be worth drawing scatterplot matrices for the three zones separately.
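One possible way to do this in R is sketched below. The data frame `oil` is simulated and is only a stand-in for the attached data file (it does not contain the real measurements); the column names and the ten samples per zone are assumptions made for the sketch.

```r
# Simulated stand-in for the data file (NOT the real measurements);
# column names and group sizes are assumptions made for this sketch.
set.seed(2)
zones <- c("Wilhelm", "Sub-Mulinia", "Upper Mulinia")
oil <- data.frame(
  vanadium  = rexp(30, rate = 1 / 5),
  iron      = rexp(30, rate = 1 / 30),
  beryllium = rexp(30, rate = 1 / 0.3),
  saturated = rexp(30, rate = 1 / 6),
  aromatic  = rexp(30, rate = 1 / 8),
  zone      = factor(rep(zones, each = 10), levels = zones)
)

measured <- c("vanadium", "iron", "beryllium", "saturated", "aromatic")

# All zones together, with the zone shown by the plotting symbol
pairs(oil[measured], pch = as.integer(oil$zone), main = "All zones")

# One scatterplot matrix per zone
for (z in levels(oil$zone)) {
  pairs(oil[oil$zone == z, measured], main = z)
}
```

With the real data, the per-zone matrices would show whether a relationship seen in the pooled plot holds within each zone or is only a difference between zones.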

Considering the small sample, the histogram of vanadium content could have come from a normal distribution, but when we inspect the box plot that breaks down this distribution by zone, we realize that the left tail of the histogram comes mainly from Sub-Mulinia and Wilhelm, while the right tail is mostly Upper Mulinia. It would make sense to plot histograms for each zone separately, but the sample sizes within each zone would be very small and you wouldn't learn much more than the box plot shows.

The overall distribution of iron content is positively skewed. The Upper Mulinia zone has lower iron concentrations than the other zones.

The overall distribution of beryllium content is positively skewed. The Upper Mulinia zone has higher beryllium concentrations than the other zones, and it has two unusually high values (outliers on the box plot).

The overall distribution of saturated hydrocarbons is positively skewed. The Upper Mulinia zone has lower saturated hydrocarbons than the other zones.

The overall distribution of aromatic hydrocarbons is positively skewed. The Wilhelm zone has lower aromatic hydrocarbons than the other zones.
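The pairing of an overall histogram with a box plot by zone, used for each variable above, might be produced along these lines. Again the data are simulated stand-ins (not the real measurements), and the names `oil`, `iron`, `zone` and `zone_medians` are assumptions made for the sketch.

```r
# Simulated stand-in (NOT the real data); names are assumptions.
set.seed(3)
zones <- c("Wilhelm", "Sub-Mulinia", "Upper Mulinia")
oil <- data.frame(
  iron = rexp(30, rate = 1 / 30),
  zone = factor(rep(zones, each = 10), levels = zones)
)

op <- par(mfrow = c(1, 2))
hist(oil$iron, xlab = "Iron (% ash)", main = "All zones")
boxplot(iron ~ zone, data = oil, ylab = "Iron (% ash)", main = "By zone")
par(op)

# Numerical companion to the box plot: median by zone
zone_medians <- tapply(oil$iron, oil$zone, median)
zone_medians
```

The histogram addresses the shape of the overall distribution, while the box plot and the medians address differences between zones.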

Since the data are listed in an arbitrary order within each zone, there is no reason for adjacent observations to be more closely related to each other than observations further apart, so time series plots and lag plots would be inappropriate.

13-8 (p 640) [10 for graphs, 10 for interpretation]

Do graphical analyses using comparative box plots to compare crack growth rates between the three frequencies, between the three environments, and between the nine different combinations of frequency and environment. Repeat using the log of crack growth rate. State your conclusions. (The question asks for a test of hypothesis and an analysis of residuals but you are not expected to do those for this assignment.)

A Factorial Experiment to Examine the Effects of Loading Frequency and Environment on Fatigue Crack Growth

Loading frequency of 0.1 gives a much wider range of crack growth rates than the two higher frequencies; the highest frequency gives a consistently low growth rate.

This result appears strange; in the Saltwater and Water environments, the crack growth rate can be as low as it is in Air, or it can be much higher. There isn't much difference between the Saltwater and Water environments.

Looking at all 9 combinations of Environment and Loading Frequency separately explains the confusing result of the previous box plot. Crack growth rate is relatively low in Air at any loading frequency. In Water or Saltwater the growth rate is about the same as in Air at loading frequency 10, a bit higher at loading frequency 1, and much higher at loading frequency 0.1.

In the jargon of factorial designs, this is called an "interaction"; the effect of loading frequency is different in different environments.
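A sketch of how the nine combinations can be examined in R follows. The data frame `crack` is made up (it is not the textbook data and has no interaction built in), and the four replicates per cell and all variable names are assumptions made for the sketch.

```r
# Made-up stand-in for a 3 x 3 factorial layout (NOT the textbook data);
# 4 replicates per cell is an assumption.
crack <- expand.grid(rep = 1:4,
                     frequency = factor(c(0.1, 1, 10)),
                     environment = factor(c("Air", "Water", "Saltwater")))
set.seed(4)
crack$rate <- 2 + rexp(nrow(crack))

# Cell means: one row per frequency, one column per environment
cell_means <- with(crack, tapply(rate, list(frequency, environment), mean))
cell_means

# Box plots for the nine combinations
boxplot(rate ~ frequency:environment, data = crack, las = 2, xlab = "",
        ylab = "Crack growth rate")

# Interaction plot: one line per environment across the frequencies
with(crack, interaction.plot(frequency, environment, rate,
                             ylab = "Mean crack growth rate"))
```

In an interaction plot, roughly parallel lines suggest no interaction; lines that fan out, as the discussion above implies for the real data, indicate that the effect of frequency depends on the environment.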

Taking logarithms pulls in the high values and stretches out the low values, making positively skewed distributions more symmetric. Applied to the preceding box plots, it doesn't change the first two graphs very much, but the third one, with the 9 combinations of Environment and Loading Frequency shown separately, shows a more equal spread in the 9 distributions. (This will be important later; the standard analysis of a factorial design requires each treatment group to have the same variance.)
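A small numerical illustration of both effects, using made-up values chosen only so that the arithmetic is easy to follow:

```r
low  <- c(1, 2, 4, 8)   # made-up values, for illustration only
high <- 10 * low        # same pattern at ten times the level

# Raw scale: the spread grows with the level
sd(low)
sd(high)

# Log scale: the two groups have the same spread
sd(log(low))
sd(log(high))

# Skewness: the mean exceeds the median on the raw scale,
# but the two coincide on the log scale for these values
mean(low) - median(low)
mean(log(low)) - median(log(low))
```

Multiplying by 10 adds the constant `log(10)` on the log scale, which shifts the group without changing its spread. This is why groups whose spread increases with their level tend to show more equal spread after taking logs.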

Note that in R we can specify a logarithmic axis with original units. In other packages, we may have to define a new variable as the log of crack growth rate and the Y-axis will be linear in logarithmic units.

Note: In this course, log is always natural logarithm.
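The two approaches to the log scale can be compared side by side as sketched below. The values in `rate` are made up for illustration (they are not the textbook data), and the names `rate`, `env` and `log_rate` are assumptions.

```r
# Made-up positive values standing in for crack growth rates (NOT the textbook data)
rate <- c(2.1, 2.4, 2.9, 3.5, 6.0, 7.8, 9.6, 11.2)
env  <- factor(rep(c("Air", "Water"), each = 4))

op <- par(mfrow = c(1, 2))
# Logarithmic axis, with tick labels in the original units
boxplot(rate ~ env, log = "y", ylab = "Crack growth rate (log axis)")
# New variable on the log scale, plotted on a linear axis in log units
boxplot(log(rate) ~ env, ylab = "log(crack growth rate)")
par(op)

# In R, log() is the natural logarithm; log10() is base 10
log_rate <- log(rate)
log(exp(1))
```

The boxes have the same shape in both panels; only the axis labelling differs.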