Statistics 3N03/3J04 - Assignment #1

2003-09-21

Due: 2003-10-01 18:00


Do your graphs and calculations in R. Submit your work as a report, pasting the graphs into a word processor and adding comments and discussion. The first five problems are taken from Montgomery & Runger, Applied Statistics and Probability for Engineers, 3rd edition, and some of the data sets are available on the accompanying CD.

Question 1: 6-85 (p 217)

Hint: Review the notes on making comparative box plots. Remember that when you import data into an R data frame, all observations of the same quantity (in this case the measured speed of light in km/sec minus 299000) should go in one column, and any categorical variables (in this case, the trial number) should go in their own columns beside it. Categorical variables should be entered as factors so they don't get treated as numbers later. Here is a simple way to generate the column of indicators for trial; assume that you have called the data frame michelson.

michelson$trial <- factor(rep(1:5, rep(20, 5)))
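To see how the indicator column is built and how it feeds a comparative box plot, here is a minimal sketch. It assumes the measurements sit in a column called speed (a hypothetical name), and the values are simulated placeholders, not the textbook data.

```r
# rep(1:5, rep(20, 5)) repeats each of 1, 2, ..., 5 twenty times in turn,
# so the first 20 rows are trial 1, the next 20 are trial 2, and so on.
trial <- factor(rep(1:5, rep(20, 5)))
table(trial)   # counts per trial

# Placeholder data frame: 'speed' is a hypothetical column name and the
# values are simulated stand-ins, NOT the measurements from the textbook.
set.seed(1)
michelson <- data.frame(speed = round(rnorm(100, mean = 850, sd = 80)))
michelson$trial <- trial

# Comparative box plots: one box for each level of the factor.
boxplot(speed ~ trial, data = michelson,
        xlab = "Trial", ylab = "Speed (km/sec minus 299000)")
```

Because trial is a factor, the formula speed ~ trial gives one box per trial rather than treating the trial number as a numeric variable.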

Question 2: 6-60 (p 211)

R doesn't have a digidot plot, so do the stem-and-leaf and time series plots separately. Also do a lag-1 scatter plot. Is there evidence of a trend, a shift in the mean, or autocorrelation?
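The three displays can be sketched as follows. This assumes the observations are stored in time order in a vector called x (a hypothetical name); the values are simulated placeholders, not the textbook data.

```r
# 'x' is a hypothetical name for the observations in time order.
# Simulated placeholder values, NOT the data from the textbook.
set.seed(2)
x <- round(rnorm(50, mean = 100, sd = 10))

# Stem-and-leaf display (printed in the console, not a graphics window).
stem(x)

# Time series plot: observations against their order, joined by lines.
plot(x, type = "b", xlab = "Observation order", ylab = "x")

# Lag-1 scatter plot: each observation against the one just before it.
n <- length(x)
plot(x[-n], x[-1], xlab = "x[t-1]", ylab = "x[t]")
```

Dropping the last element with x[-n] and the first with x[-1] gives two vectors of length n - 1 that pair each observation with its predecessor.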

Question 3: Surface Finish - Example 12-12 (p 450)

Do an exploratory data analysis of the data in Table 12-14 (p 451).

Question 4: 12-64 (p 463)

Do graphical analyses using a scatterplot matrix to determine which variables affect thrust. Use different plotting symbols and colours for points with low ambient temperature. State your conclusions. (The question asks for linear regression and tests of hypothesis but of course you are not expected to do those for this assignment!)

Warning: The data are in Table 12-20 (p 456); the textbook has the data for 12-64 and 12-66 interchanged. You can get the file for Table 12-20 on the CD if you go to 12-66, but it is missing the "fuel flow rate". To simulate the real-life frustrations of working with data, I should have left you to discover this for yourself!

Hint: If the data are in a data frame called jet in R, then pairs(jet) will give a scatterplot matrix. If ambient temperature is in a column called ambtemp, what does pch = 1 + (jet$ambtemp < 90) do if you add it to the pairs call? How does it work?
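One way to experiment with the hint is sketched below. The data frame here is a stand-in: every column name other than ambtemp, and all the values, are invented placeholders rather than Table 12-20, and the colour choices are arbitrary.

```r
# Hypothetical stand-in for the jet data frame. Only the name 'ambtemp'
# comes from the hint; other names and ALL values are placeholders.
set.seed(3)
jet <- data.frame(thrust  = rnorm(30, mean = 4500, sd = 200),
                  ambtemp = runif(30, min = 80, max = 105),
                  fuel    = rnorm(30, mean = 30000, sd = 1500))

low  <- jet$ambtemp < 90              # logical vector, one entry per row
sym  <- 1 + low                       # try print(low) and print(sym)
colr <- ifelse(low, "blue", "red")    # arbitrary colours for the two groups

# Scatterplot matrix with per-point plotting symbols and colours.
pairs(jet, pch = sym, col = colr)
```

Printing the intermediate vectors low and sym is a good way to work out for yourself what the expression in the hint is doing.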

Question 5: 14-8 (p 520)

Do graphical analyses using comparative box plots to compare crack growth between the loading frequencies and between the environment conditions. Give "interaction plots" like the one in Figure 14-8 (p 516): plot the mean crack growth against environment condition, separately for each loading frequency, and plot the mean crack growth against loading frequency, separately for each environment condition. Repeat the graphs with crack growth on a log scale. State your conclusions. (The question asks for a two-factor analysis of variance but you will do that in Assignment #3.)

Hints: Enter the data as three columns in a data frame, putting the crack growth in the first column, a code for loading frequency in the second, and a code for environment condition in the third. To plot on a log scale, you can add the option log="y" to the plot or boxplot command to transform the Y-axis, or you can compute a new column of log-transformed crack growths. Will the boxplots look the same either way? Will the interaction plots look the same either way?
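The layout and plots described in these hints can be sketched as follows. The data frame name crack, its column names, the factor level labels and the number of replicates are all assumptions for illustration, and the values are simulated placeholders, not the textbook data. The sketch uses interaction.plot from the standard stats package as one possible way to draw the plots of means.

```r
# Hypothetical layout following the hint: response first, then the two
# factor codes. Level labels and values are placeholders, NOT textbook data.
set.seed(4)
crack <- data.frame(
  growth = exp(rnorm(36, mean = 1, sd = 0.3)),
  freq   = factor(rep(c("f1", "f2", "f3"), each = 12)),
  env    = factor(rep(rep(c("e1", "e2", "e3"), each = 4), times = 3))
)

# Two routes to a log scale: transform the axis, or transform the data.
crack$loggrowth <- log(crack$growth)
boxplot(growth ~ freq, data = crack, log = "y")   # log-scaled Y-axis
boxplot(loggrowth ~ freq, data = crack)           # log-transformed column

# Mean growth against environment, one trace per loading frequency.
with(crack, interaction.plot(env, freq, growth))
# The same with the roles of the two factors exchanged.
with(crack, interaction.plot(freq, env, growth))
```

Replacing growth by loggrowth in the interaction.plot calls gives the plots of means for the log-transformed column, which you can compare with the versions drawn on a log-scaled axis.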

Question 6: Environmental Data

The following air quality measurements were taken downwind of a coal-burning electrical generating station. Construct and interpret the following plots:

  1. box plot for sulphur concentration versus time of day;
  2. box plot for sulphur concentration versus date;
  3. time series plot.

Based on the plots you have constructed, do you think it is important to consider the time of day in an air-quality sampling regime? In your opinion, do the data provided tell the whole story regarding sulphur contamination in the air at this location, or do more data need to be collected? Justify your answer, and discuss what other factors might be considered.

Sulphur Concentration (ppm)

Date             12:00am    6:00am   12:00pm    6:00pm
July 4 1990           22        34        35        43
July 5 1990           30        18         9         4
July 6 1990           18        27        23        11
July 7 1990           17        13        10        11
July 8 1990           23        21        16         9
July 9 1990           16         8         9        14
July 10 1990          15         3         2         1
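The sulphur data can be entered in the long format recommended for the earlier questions, with one column for the measurement and one factor column for each of date and time of day. The sketch below is one way to do it; the names sulphur, conc, date and time are hypothetical choices, and the values are those in the table for July 4 to July 10, 1990.

```r
# Values from the table, read row by row (date by date), so that
# the vector is also in time order.
times <- c("12:00am", "6:00am", "12:00pm", "6:00pm")
dates <- paste("July", 4:10)

sulphur <- data.frame(
  date = factor(rep(dates, each = 4), levels = dates),
  time = factor(rep(times, times = 7), levels = times),
  conc = c(22, 34, 35, 43,
           30, 18,  9,  4,
           18, 27, 23, 11,
           17, 13, 10, 11,
           23, 21, 16,  9,
           16,  8,  9, 14,
           15,  3,  2,  1)
)

# 1. Box plot of concentration versus time of day.
boxplot(conc ~ time, data = sulphur,
        xlab = "Time of day", ylab = "Sulphur concentration (ppm)")

# 2. Box plot of concentration versus date.
boxplot(conc ~ date, data = sulphur,
        xlab = "Date (1990)", ylab = "Sulphur concentration (ppm)")

# 3. Time series plot: the rows are already in time order, 6 hours apart.
plot(sulphur$conc, type = "b",
     xlab = "Observation (6-hour steps from July 4, 12:00am)",
     ylab = "Sulphur concentration (ppm)")
```

Setting the factor levels explicitly keeps the boxes in chronological order; without it, R would sort the labels alphabetically.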

