The ratio of upper limit to lower limit is just the chi-square 0.975 quantile divided by the 0.025 quantile, it does not depend on the sample variance; checking this ratio for several degrees of freedom gives 27 degrees of freedom as the answer, which means that 28 observations are needed.
> data.frame(df=20:30,ratio=qchisq(.975,20:30)/qchisq(.025,20:30)) df ratio 1 20 3.562757 2 21 3.450280 3 22 3.349084 4 23 3.257514 5 24 3.174228 6 25 3.098120 7 26 3.028276 8 27 2.963932 9 28 2.904442 10 29 2.849260 11 30 2.797920
(a) Note that I gave up trying to match the tic-marks exactly.
> xgr <- seq(0,10,len=100) > plot(xgr,dchisq(xgr,1),type="l",xlim=c(0,10),ylim=c(0,1.2), xlab="Value",ylab="Frequency",lwd=2,bty="l") Warning message: NaNs produced in: dchisq(x, df, log) > lines(xgr,dchisq(xgr,2),lwd=2,lty=2) > lines(xgr,dchisq(xgr,5),lwd=2,lty=3) > legend(6,1.1,c("= 1 df","= 2 df","= 5 df"),lwd=2,lty=1:3) > title("General shape of various chi-sq distributions with n df")
(b) Note that u = 0.9896377 from the pchisq(15,5) calculation below.
> xgr <- seq(0,20,len=100) > xgr15 <- seq(0,15,len=200) > plot(xgr,dchisq(xgr,5),xlab="Value",ylab="Frequency",type="l",bty="l") > abline(h=0) > lines(xgr15,dchisq(xgr15,5),type="h",col="gray") > lines(15,dchisq(15,5),type="h") > lines(xgr,dchisq(xgr,5)) > text(10,.1,"Area = u") > lines(c(6,8.5),c(.06,.095)) > text(15,0.02,"chi-sq(5,u)") > title("Graphic display of the percentiles of a chi-sq(5) distribution") > pchisq(15,5) [1] 0.9896377
(c)
> xgr <- seq(0,2,len=100) > plot(xgr[-1],df(xgr[-1],1,6),type="l",lwd=2,ylim=c(0,1.5),xlab="x",ylab="f(x)",bty="l") > abline(h=0) > lines(xgr,df(xgr,4,6),lwd=2) > text(0.3,1,"F(1,6)") > text(1,0.6,"F(4,6)") > title("Probability density for the F distribution")
> 1975*310/(1627*255) [1] 1.47571
The odds for a boy are 47.6% higher for nonsmokers, compared to smokers.
> (1975/(1975+1627))/(255/(255+310)) [1] 1.214875
The probability of having a boy is 21.5% higher for nonsmokers, compared to smokers.
> (310/(255+310))/(1627/(1975+1627)) [1] 1.214701
The probability that both parents smoke is 21.5% higher for parents of a girl, compared to parents of a boy.
> sexrat <- chisq.test(matrix(c(1975, 1627, 255, 310), ncol = 2),correct=F) > sexrat Pearson's Chi-squared test data: matrix(c(1975, 1627, 255, 310), ncol = 2) X-squared = 18.4645, df = 1, p-value = 1.731e-05 > sexrat$obs [,1] [,2] [1,] 1975 255 [2,] 1627 310 > sexrat$exp [,1] [,2] [1,] 1927.636 302.3638 [2,] 1674.364 262.6362
> bmi1 base twoyr 1 26.5 26.1 2 26.1 26.4 3 25.4 25.5 4 27.4 26.3 5 25.4 NA 6 25.4 25.7 7 25.8 26.1 8 26.3 26.2 9 26.5 26.5 10 26.1 26.4 11 26.4 26.2 12 25.9 26.4 13 25.5 25.6 14 25.7 24.9 15 25.4 25.9 16 27.0 26.5
> t.test(bmi1$base,bmi1$twoyr,pair=T) Paired t-test data: bmi1$base and bmi1$twoyr t = 0.3806, df = 14, p-value = 0.7092 alternative hypothesis: true difference in means is not equal to 0 95 percent confidence interval: -0.2163206 0.3096539 sample estimates: mean of the differences 0.04666667
> bmi1$base-bmi1$twoyr [1] 0.4 -0.3 -0.1 1.1 NA -0.3 -0.3 0.1 0.0 -0.3 0.2 -0.5 -0.1 0.8 [15] -0.5 0.5 > 2*pbinom(6,14,.5) [1] 0.7905273
> anova(lm(twoyr~base,data=bmi1)) Analysis of Variance Table Response: twoyr Df Sum Sq Mean Sq F value Pr(>F) base 1 1.1218 1.1218 8.4025 0.01244 * Residuals 13 1.7356 0.1335 --- Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 > summary(lm(twoyr~base,data=bmi1)) Call: lm(formula = twoyr ~ base, data = bmi1) Residuals: Min 1Q Median 3Q Max -0.96164 -0.15276 0.02683 0.22668 0.44428 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 13.7721 4.2356 3.252 0.00631 ** base 0.4704 0.1623 2.899 0.01244 * --- Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 Residual standard error: 0.3654 on 13 degrees of freedom Multiple R-Squared: 0.3926, Adjusted R-squared: 0.3459 F-statistic: 8.402 on 1 and 13 DF, p-value: 0.01244 > plot(rep(0,16),bmi1$base-bmi1$twoyr) > boxplot(bmi1$base-bmi1$twoyr) > title(ylab="base - twoyr") > hist(bmi1$base-bmi1$twoyr) > plot(bmi1$base,bmi1$twoyr) > abline(0,1) > abline(lm(twoyr~base,data=bmi1), lty=2)
|
|
|
|
> t.test(creek$pH[creek$loc=="Up"],creek$pH[creek$loc=="Down"],var=T)
Two Sample t-test
data: creek$pH[creek$loc == "Up"] and creek$pH[creek$loc == "Down"]
t = -1.133, df = 18, p-value = 0.2721
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-1.7325194 0.5185194
sample estimates:
mean of x mean of y
6.461 7.068
> anova(lm(pH~location,data=creek))
Analysis of Variance Table
Response: pH
Df Sum Sq Mean Sq F value Pr(>F)
location 1 1.8422 1.8422 1.2838 0.2721
Residuals 18 25.8303 1.4350
> F0 <- var(creek$pH[creek$loc=="Up"])/var(creek$pH[creek$loc=="Down"])
> F0
[1] 8.43619
> 2*(1-pf(F0,9,9))
[1] 0.003964524
> var(creek$pH[creek$loc=="Up"])
[1] 2.565877
> var(creek$pH[creek$loc=="Down"])
[1] 0.3041511
> boxplot(split(creek$pH,creek$location))
> var(creek$pH[creek$loc=="Up"][-3])
[1] 0.5049611
> F0a <- var(creek$pH[creek$loc=="Up"][-3])/var(creek$pH[creek$loc=="Down"])
> F0a
[1] 1.660231
> 2*(1-pf(F0a,8,9))
[1] 0.4654831
> anova(lm(pH~location,data=creek[-3,]))
Analysis of Variance Table
Response: pH
Df Sum Sq Mean Sq F value Pr(>F)
location 1 0.1022 0.1022 0.2564 0.6191
Residuals 17 6.7770 0.3986
All of the required answers are found in the output below. There is, of course, no evidence (P = 0.56 by the F- test in the ANOVA or by the t-test on the slope) of a relationship between % reticulytes and lymphocytes per mm2. This is confirmed by the R2 which is near 0, which means that linear prediction is not possible.
The mean square residual gives s2y.x = 490818.
> anemia retic lymph 1 3.6 1700 2 2.0 3078 3 0.3 1820 4 0.3 2706 5 0.2 2086 6 3.0 2299 7 0.0 676 8 1.0 2088 9 2.2 2013 > fitanemia<-lm(lymph~retic,anemia) > plot(lymph~retic,anemia) > abline(fitanemia) > anova(fitanemia) Analysis of Variance Table Response: lymph Df Sum Sq Mean Sq F value Pr(>F) retic 1 180750 180750 0.3683 0.5631 Residuals 7 3435727 490818 > summary(fitanemia) Call: lm(formula = lymph ~ retic, data = anemia) Residuals: Min 1Q Median 3Q Max -1218.82 -128.47 67.84 168.76 958.95 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 1894.8 348.5 5.437 0.000969 *** retic 112.1 184.7 0.607 0.563109 --- Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 Residual standard error: 700.6 on 7 degrees of freedom Multiple R-Squared: 0.04998, Adjusted R-squared: -0.08574 F-statistic: 0.3683 on 1 and 7 DF, p-value: 0.5631
There is some evidence (P = 0.018) that the mean bronchial reactivity varies between the three groups based on the ratio FEV1/FVC. Note that the graph suggests that Group A has greater variance than B or C but the sample sizes are too small to be sure.
> lung react group 1 20.8 A 2 4.1 A 3 30.0 A 4 24.7 A 5 13.8 A 6 7.5 B 7 7.5 B 8 11.9 B 9 4.5 B 10 3.1 B 11 8.0 B 12 4.7 B 13 28.1 B 14 10.3 B 15 10.0 B 16 5.1 B 17 2.2 B 18 9.2 C 19 2.0 C 20 2.5 C 21 6.1 C 22 7.5 C > boxplot(react~group,lung,xlab="Group",ylab="Bronchial Reactivity") > anova(lm(react~group,lung)) Analysis of Variance Table Response: react Df Sum Sq Mean Sq F value Pr(>F) group 2 503.55 251.77 4.9893 0.01813 * Residuals 19 958.80 50.46 --- Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
> toxic1 life poison antidote 1 0.31 I A 2 0.46 I A 3 0.36 II A 4 0.40 II A 5 0.22 III A 6 0.18 III A 7 0.82 I B 8 0.88 I B 9 0.92 II B 10 0.49 II B 11 0.30 III B 12 0.38 III B 13 0.43 I C 14 0.63 I C 15 0.44 II C 16 0.31 II C 17 0.23 III C 18 0.24 III C 19 0.45 I D 20 0.66 I D 21 0.56 II D 22 0.71 II D 23 0.30 III D 24 0.31 III D > anova(lm(life~poison*antidote, data=toxic1)) Analysis of Variance Table Response: life Df Sum Sq Mean Sq F value Pr(>F) poison 2 0.43641 0.21820 15.2103 0.0005124 *** antidote 3 0.33875 0.11292 7.8709 0.0036184 ** poison:antidote 6 0.08989 0.01498 1.0443 0.4445361 Residuals 12 0.17215 0.01435 --- Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 > mse <- anova(lm(life~poison*antidote, data=toxic1))$"Mean Sq"[4] > mse 0.01434583 > mse/c(qchisq(.995,12)/12, qchisq(.005,12)/12) [1] 0.006083142 0.056005165
|
> plot(piat$PIAT,piat$WRAT) > abline(lm(WRAT~PIAT,data=piat)) > summary(lm(WRAT~PIAT,data=piat)) Call: lm(formula = WRAT ~ PIAT, data = piat) Residuals: Min 1Q Median 3Q Max -4.4511 -0.9485 -0.4175 1.3157 4.3531 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 4.9099 3.3642 1.459 0.183 PIAT 1.1495 0.1131 10.161 7.53e-06 *** --- Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 Residual standard error: 2.785 on 8 degrees of freedom Multiple R-Squared: 0.9281, Adjusted R-squared: 0.9191 F-statistic: 103.2 on 1 and 8 DF, p-value: 7.535e-06 > anova(lm(WRAT~PIAT,data=piat)) Analysis of Variance Table Response: WRAT Df Sum Sq Mean Sq F value Pr(>F) PIAT 1 800.84 800.84 103.24 7.535e-06 *** Residuals 8 62.06 7.76 --- Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 > anova(lm(WRAT~PIAT+as.factor(PIAT),data=piat)) Analysis of Variance Table Response: WRAT Df Sum Sq Mean Sq F value Pr(>F) PIAT 1 800.84 800.84 78.1310 0.01256 * as.factor(PIAT) 6 41.56 6.93 0.6757 0.69970 Residuals 2 20.50 10.25 --- Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 > mse1 <- anova(lm(WRAT~PIAT,data=piat))$"Mean Sq"[2] > mse2 <- anova(lm(WRAT~PIAT+as.factor(PIAT),data=piat))$"Mean Sq"[3] > mse1 7.757136 > mse1/c(qchisq(.995,8)/8,qchisq(.005,8)/8) [1] 2.826564 46.159240 > mse2 10.25 > mse2/c(qchisq(.995,2)/2,qchisq(.005,2)/2) [1] 1.934576 2044.870718 > predict(lm(WRAT~PIAT,data=piat),data.frame(PIAT=30)) [1] 39.39432