Week 5 - Exercises with Excel Pivot Tables

1999-02-08

Here are some useful pivot tables and graphical displays for the Niagara River Pollution data. You should be able to figure out how to make the tables with the Pivot Table Wizard, but there are a few tricks to learn so don't wait too long before asking me for help if you get stuck. Any of these summaries would be suitable for your Assignment 1 report, or you might look at these and think of something better. Could you add vertical standard error bars to the graphs, for example?

I did these examples with Excel 98 for Macintosh. Your version of Excel may behave a bit differently.

I used the Excel Workbook that I gave you, but I changed the column headings to make them more verbose and descriptive. When I imported the file to Splus I changed them to be short and cryptic.

You can represent missing data in Excel as empty cells or as non-numbers such as "NA". Empty cells will give warnings when you take logs, while non-numbers may give warnings when you take averages or do other statistics.


Correction

1999-03-05 The original data file had the codes for stations "FE" and"NOTL" switched. I have corrected the data files and revised all the graphs and tables on this page to show the correct stations.


How many values of Dieldrin in Water were at the Detection Limit?

This is a one-way pivot table constructed using the same variable for both Rows and Data.

Count of Dieldrin in Water - Limit

Dieldrin in Water - Limit

Total

L

7

N

958

Grand Total

965


Plot the Annual Average of Dieldrin in Water

Average of Dieldrin in Water

Station

Years

Date

FE

NOTL

1986

0.304516129

0.300384615

1987

0.326458333

0.31212766

1988

0.293137255

0.27

1989

0.292708333

0.292790698

1990

0.280816327

0.26877551

1991

0.256041667

0.265869565

1992

0.199361702

0.220612245

1993

0.2

0.205116279

1994

0.176888889

0.17775

1995

0.161304348

0.162244898

1996

0.146666667

0.155


Plot the Monthly Average of Dieldrin in Water (all years combined)

Average of Dieldrin in Water

Station

Date

FE

NOTL

Jan

0.251219512

0.232820513

Feb

0.259459459

0.256666667

Mar

0.247619048

0.242368421

Apr

0.23804878

0.240952381

May

0.23175

0.2275

Jun

0.252

0.261

Jul

0.257647059

0.264705882

Aug

0.264047619

0.246756757

Sep

0.236428571

0.228421053

Oct

0.22

0.2295

Nov

0.253243243

0.240555556

Dec

0.247419355

0.245806452


Plot the Monthly Average of Dieldrin in Water (for 10 years)

Average of Dieldrin in Water

Station

Years

Date

FE

NOTL

1986

Apr

0.243333333

0.244

May

0.273333333

0.21

Jun

0.25

0.205

Jul

0.14

0.326666667

Aug

0.4075

Sep

0.295

0.31

Oct

0.258

0.296

Nov

0.4025

0.37

Dec

0.383333333

0.4

1987

Jan

0.35

0.245

Feb

0.405

0.37

Mar

0.305

0.345

Apr

0.336

0.314

May

0.295

0.285

Jun

0.3575

0.3125

Jul

0.33

0.3575

Aug

0.395

0.345

Sep

0.276

0.294

Oct

0.2475

0.2675

Nov

0.316666667

0.27

Dec

0.306666667

0.355

1988

Jan

0.34

0.28

Feb

0.3075

0.28

Mar

0.326

0.29

Apr

0.2625

0.2875

May

0.2575

0.23

Jun

0.3

0.306666667

Jul

0.306666667

0.24

Aug

0.3475

0.223333333

Sep

0.266

0.296666667

Oct

0.2575

0.2375

Nov

0.282

0.28

Dec

0.2675

0.254

1989

Jan

0.315

0.2875

Feb

0.3125

0.33

Mar

0.298

0.314

Apr

0.245

0.28

May

0.23

0.263333333

Jun

0.275

0.316

Jul

0.2925

0.276666667

Aug

0.276

0.326

Sep

0.4175

0.23

Oct

0.255

0.2575

Nov

0.2975

0.29

Dec

0.31

0.305

1990

Jan

0.29

0.2925

Feb

0.3

0.31

Mar

0.2725

0.242

Apr

0.24

0.25

May

0.268

0.258

Jun

0.325

0.3

Jul

0.353333333

0.31

Aug

0.276

0.26

Sep

0.2625

0.2375

Oct

0.264

0.2625

Nov

0.2675

0.26

Dec

0.28

0.263333333

1991

Jan

0.284

0.254

Feb

0.29

0.38

Mar

0.2825

0.2625

Apr

0.285

0.265

May

0.262

0.274

Jun

0.295

0.33

Jul

0.28

0.3025

Aug

0.2

0.268

Sep

0.21

0.26

Oct

0.206

0.228

Nov

0.233333333

0.2225

Dec

0.226666667

0.25

1992

Jan

0.2

0.242

Feb

0.1925

0.256666667

Mar

0.193333333

0.22

Apr

0.196

0.214

May

0.196666667

0.2125

Jun

0.19

0.2125

Jul

0.2325

0.238

Aug

0.205

0.2275

Sep

0.1525

0.2175

Oct

0.2025

0.196

Nov

0.213333333

0.203333333

Dec

0.2225

0.21

1993

Jan

0.1975

0.2175

Feb

0.215

0.195

Mar

0.208

0.21

Apr

0.2

0.208

May

0.193333333

0.195

Jun

0.216666667

0.252

Jul

0.235

0.22

Aug

0.23

0.223333333

Sep

0.182

0.194

Oct

0.18

0.18

Nov

0.186666667

0.17

Dec

0.146666667

0.145

1994

Jan

0.176666667

0.166666667

Feb

0.183333333

Mar

0.24

0.1825

Apr

0.2025

0.1825

May

0.1725

0.1475

Jun

0.135

0.226

Jul

0.185

0.19

Aug

0.156

0.176666667

Sep

0.1525

0.168

Oct

0.1625

0.17

Nov

0.16

0.145

Dec

0.196666667

0.1725

1995

Jan

0.165

0.15

Feb

0.24

0.185

Mar

0.16

0.176666667

Apr

0.1575

0.18

May

0.162

0.1825

Jun

0.17

0.17

Jul

0.123333333

0.1625

Aug

0.16

0.14

Sep

0.14

0.15

Oct

0.14

0.1525

Nov

0.15

0.15

Dec

0.156666667

0.156666667

1996

Jan

0.1575

0.163333333

Feb

0.1475

0.158

Mar

0.135

0.145


Plot a log-log scatterplot of Dieldrin in Solids against PCB in Solids

This exercise doesn't involve Pivot Tables but I included it because it is a very useful graph and you should know how to do it in Excel. Use different symbols and colours to distinguish between the two stations. Which station appears to have the more serious problem with detection limits?

Here is the same graph plotted in Splus. Which is easier for this example, Excel or Splus?

> plot(pcbsol, diesol, log = "xy", type = "n", xlab="PCB in Solids", ylab=
+ "Dieldrin in Solids")
> par(cex=.8)
> points(pcbsol[stn == "NOTL"], diesol[stn == "NOTL"], pch =15, col = 3)
> points(pcbsol[stn == "FE"], diesol[stn == "FE"], pch =17, col = 1)
> legend(1200, 25, c("NOTL", "FE"), marks = c(15, 17), col = c(3, 1), 
+ background = 4)


Working with Dates in Splus

The function dates() will convert the Julian dates stored in Excel to calendar dates, but you need to specify the origin (day zero) and the output format. I wrote a function my.dates() so I wouldn't have to write this out every time.

> my.dates
function(date)
{
        dates(date, origin = c(month = 12, day = 30, year = 1899), out.format
                 = "year-m-d")
}

The following commands give boxplots of dieldrin in water, by year, for each station. Compare these plots to the annual averages plotted above. What are the advantages and disadvantages of the box plots here?

What will you get if you use "6, 7" or "1, 7" instead of "1, 4" in the substring() call? Why is as.character() necessary here?

> boxplot(split(diewat[stn == "FE"], substring(as.character(my.dates(date[
   stn == "FE"])), 1, 4)))
> title("Dieldrin in Water - FE")

> boxplot(split(diewat[stn == "NOTL"], substring(as.character(my.dates(date[
   stn == "NOTL"])), 1, 4)))
> title("Dieldrin in Water - NOTL")

Back to S4P03