Frequently-Asked Questions - January


"What are Fuzzygrams and Stripe Plots?"

1998-01-30

A Fuzzygram is a histogram with the top of each bar made fuzzy to reflect our uncertainty as to its true height. The uncertainty comes from the Poisson distribution: if the count in a bin is c, the fuzziness in the bar should extend from c - 2 sqrt(c) to c + 2 sqrt(c). This is because, for the Poisson distribution, the variance is equal to the mean, so the best estimate of Var(c) is c and the standard deviation is estimated by sqrt(c).
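To see what these limits look like in practice, here is a minimal Python sketch. The function name fuzzy_limits and the example counts are made up for illustration. Note that for counts below 4 the lower limit c - 2 sqrt(c) is negative; the sketch leaves it as computed rather than truncating it at zero.

```python
import math

def fuzzy_limits(counts):
    """Return (lower, upper) = (c - 2*sqrt(c), c + 2*sqrt(c)) for each bin count c."""
    return [(c - 2 * math.sqrt(c), c + 2 * math.sqrt(c)) for c in counts]

# Hypothetical bin counts, chosen only to illustrate the calculation.
counts = [1, 4, 9, 16, 25]

for c, (lower, upper) in zip(counts, fuzzy_limits(counts)):
    print(f"count {c:2d}: fuzz extends from {lower:5.1f} to {upper:5.1f}")
```

The width of the fuzzy band, 4 sqrt(c), grows with the count, but it shrinks relative to the height of the bar.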

A stripe plot is like a dot plot but each point is represented by a vertical line instead of a dot. It is what you would get if you took a histogram with a very large number of very narrow bins, so each bin has either one point or no point.

The plots below show a histogram, a fuzzygram, a dot plot and a stripe plot, all plotted from the same data. Which gives the best impression of the underlying probability density?


"What aids am I allowed on the tests and on the exam?"

1998-01-28

For the mid-term tests:

For the final examination:

In past years I didn't allow tables for tests; instead, I put the values you might need right on the question paper. I now think it would be good if you became used to using your tables under exam conditions. Bring the tables you are most familiar with; these are likely the tables in Rosner, the course text. I apologize for not mentioning tables on the course outline; it was an oversight.


"How do I reference an individual cell in MINITAB?"

1998-01-22

MINITAB allows subscripts; C2(3), for example, refers to the cell in row 3 of column 2. Subscripts can be scalar variables, as in

LET K1 = 3
LET C2(K1) = 7

This allows you to write simple programs. You store commands in a file and run them with the EXECute command. See "Using MINITAB to recode the bilaterally-infected ears data" for an example.


"What does BASIC stand for?"

1998-01-21

BASIC is an acronym for Beginner's All-purpose Symbolic Instruction Code, a high-level programming language developed by John Kemeny and Thomas Kurtz at Dartmouth College in the mid-1960s. There have been many versions of BASIC, including GW BASIC, Quick BASIC and BASIC for the Commodore 64 (which you may have seen in primary school). A group at Dartmouth continues to develop and market what they call True BASIC, supporting several different platforms. Microsoft Visual BASIC is now the standard for developing Windows applications. I use True BASIC and Visual BASIC in my work.


"Why doesn't SpeedFill work for me in Quattro Pro?"

1998-01-20

SpeedFill will only fill into empty cells. Delete the contents of the cells you want to fill into (select the cells and push "Delete" or select "Clear" from the Edit Menu), then try again.


"What is the difference between mutually exclusive events A and B and independent events A and B?"

1998-01-17

If A and B are mutually exclusive, it means that if one happens the other can't. So, far from being independent, they are very dependent on each other (as long as each has a nonzero probability). In symbols, P(AB) = 0.

If A and B are independent, then whether or not A happens is completely unrelated to whether or not B happens, and vice versa. In symbols, P(A|B) = P(A). You should be able to show (using the definition of conditional probability) that if P(A|B) = P(A) then it follows that P(B|A) = P(B) and P(AB) = P(A)P(B).

Let H1 be the event of a head on the first toss of a coin, let T1 be the event of a tail on the first toss, let T2 be the event of a tail on the second toss, etc. Since H1 and T1 can't both happen on the same toss, they are mutually exclusive and so P(H1T1) = 0. If we can assume that the coin is tossed in such a way that the result on one toss can't affect the result on the next toss, then H1 and T2 are independent, so P(H1T2) = P(H1)P(T2).
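The coin example can be checked by listing the sample space. The sketch below assumes a fair coin tossed twice, with all four outcomes equally likely (an assumption that already builds in independence between tosses); the helper name prob is made up for illustration.

```python
from fractions import Fraction
from itertools import product

# Sample space for two tosses: ('H','H'), ('H','T'), ('T','H'), ('T','T').
# Assumption: fair coin, so each outcome has probability 1/4.
outcomes = list(product("HT", repeat=2))

def prob(event):
    """Probability of an event, given as a true/false function of an outcome."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def H1(o): return o[0] == "H"   # head on the first toss
def T1(o): return o[0] == "T"   # tail on the first toss
def T2(o): return o[1] == "T"   # tail on the second toss

p_h1_t1 = prob(lambda o: H1(o) and T1(o))   # mutually exclusive events
p_h1_t2 = prob(lambda o: H1(o) and T2(o))   # independent events

print("P(H1T1) =", p_h1_t1)
print("P(H1T2) =", p_h1_t2, " P(H1)P(T2) =", prob(H1) * prob(T2))
```

H1 and T1 never occur together, so their joint probability is zero, while the joint probability of H1 and T2 equals the product of the separate probabilities.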


"How do I get started on a complicated probability problem like 4.36-38 on page 99?"

1998-01-16

Begin by reading through the problem, identifying and assigning a symbol to each "event." Here are the events I found, in the order I found them.

H = Hypertension
M = Mortality
T = Treated adequately for Hypertension
K = Knows he/she has Hypertension
C = Complies with the Treatment

The next step is to go through the problem again and write down each given probability in terms of these events.

The statement "reduced their overall mortality by 20%" is, unfortunately, ambiguous. Writing Tc for the complement of T (that is, not treated adequately), it could mean that

P(M|HTc) - P(M|HT) = 0.2

but this chapter emphasizes Relative Risk and the correct interpretation turns out to be that

P(M|HT) / P(M|HTc) = 0.8.

The other probabilities are straightforward:

P(Kc|H) = 0.5, P(Tc|HK) = 0.5, P(Cc|HKT) = 0.5.

Now, express the probability you need to compute in terms of the events. In 4.36, for example, you have a Bin(10,p) distribution where p = P(CT|H). All that remains to do is play around with the conditional probabilities until you can express what you need to know in terms of what you already know.
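As a rough sketch of that last step for 4.36, the Python below builds p by multiplying conditional probabilities and then tabulates the Bin(10, p) distribution. It assumes that anyone who is treated and complying must also know about their hypertension, so that P(CT|H) = P(K|H) P(T|HK) P(C|HKT); that assumption and the variable names are mine, not part of the problem statement, and you still need to pick out the probabilities the question actually asks for.

```python
from math import comb

# Given in the problem as complements: P(Kc|H) = P(Tc|HK) = P(Cc|HKT) = 0.5
p_k = 1 - 0.5            # P(K|H)
p_t_given_k = 1 - 0.5    # P(T|HK)
p_c_given_kt = 1 - 0.5   # P(C|HKT)

# Assumption: treated-and-complying implies knowing, so CT is the same event as CTK.
p = p_k * p_t_given_k * p_c_given_kt   # P(CT|H)

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n = 10
pmf = [binom_pmf(k, n, p) for k in range(n + 1)]

print("p = P(CT|H) =", p)
for k, prob_k in enumerate(pmf):
    print(f"P(X = {k:2d}) = {prob_k:.6f}")
```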

For 4.38 you need to find P(M|H) using the probabilities above, then repeat the calculation taking P*(Kc|H) = 0.4, P*(Tc|HK) = 0.4 and P*(Cc|HKT) = 0.4, to get P*(M|H) and hence compute the relative risk P*(M|H) / P(M|H). I didn't want to come up with a whole new set of symbols, so I just used * to indicate probabilities computed under the new assumptions.

Some things to think about...

Of all the probabilities given in this problem, which ones depend on the nature of the disease and hence could not easily be changed? Which ones would change with a more effective medical treatment? Which ones could be changed by educating the physician? Which ones could be changed by educating the patient? If you were applying this problem in "real life," how would you estimate the probabilities?


"What's wrong with this formula?"

1998-01-14

The formula

IF ($C3>$A$2,0,$D2*(($A$2-$C3+1)/$C3)*$B$2/(1-$B$2))

was modelled after the one given in class last Wednesday and looks perfectly OK, but Excel gives an annoyingly unhelpful "Error in formula" message and won't let you do anything else until you fix it. The problem? You can't have a space between the IF and the (.
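For what it is worth, the formula looks like one step of the usual recurrence for binomial probabilities, P(X = k) = P(X = k-1) * ((n - k + 1) / k) * p / (1 - p). The Python sketch below mirrors it under my reading of the cell layout, which is an assumption: n in A2, p in B2, k running down column C, and the probabilities in column D starting from P(X = 0) = (1 - p)^n. The function name and the rows parameter are made up for illustration.

```python
from math import comb

def binomial_column(n, p, rows):
    """Mimic column D: probabilities for k = 0, 1, ..., rows - 1 (requires p < 1)."""
    probs = [(1 - p) ** n]                 # assumed starting cell: P(X = 0)
    for k in range(1, rows):
        if k > n:                          # the IF(... > ..., 0, ...) branch
            probs.append(0.0)
        else:                              # previous cell times the recurrence factor
            probs.append(probs[-1] * ((n - k + 1) / k) * p / (1 - p))
    return probs

# Hypothetical values for n and p, compared against the direct formula.
n, p = 5, 0.3
column = binomial_column(n, p, rows=8)
for k, value in enumerate(column):
    direct = comb(n, k) * p**k * (1 - p) ** (n - k) if k <= n else 0.0
    print(f"k = {k}: recurrence {value:.6f}, direct {direct:.6f}")
```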


"What exercises from Chapters 1-5 are good for self-study?"

1998-01-13

All the exercises in these chapters are good. Do as many as you have time for. If you find the first ones easy, jump ahead to the more advanced ones. If you aren't sure of your answers, come and check them against the solutions manual in my office, or ask me or your TA to look them over with you.


"What should I be reading in the next week or so?"

1998-01-10

I have been asked for reading suggestions for the next week or two.

I'm assuming that everyone knows descriptive statistics (Chapt. 2) and I don't plan to go through that material in detail. You need to know it well enough to use it, though. Work through it and tell me if there is anything you want me to discuss in class.

You should also have seen most of Chapt. 3, but I will be reviewing the material, perhaps approaching it a bit differently, and introducing some new terms (see, e.g., Sect. 3.7, Bayes' Rule and Screening Tests).

Right now, I am working through Discrete Distributions (all of Chapt. 4 plus the Hypergeometric from p. 372). Much of this will be new to you.

I am also reviewing properties of the Normal Distribution, which I think you have seen before (Chapt. 5, Sect. 5.1-5.5). Again, let me know if there is anything you want me to discuss in more detail.

