MAT 540 Statistical Concepts For Research

Chapter 11

REGRESSION ANALYSIS I
SIMPLE LINEAR REGRESSION

11.1 The points on the line $y = 3 + 2x$ for $x = 1$ and $x = 4$ are (1, 5) and (4, 11), respectively. The intercept is 3 and the slope is 2.

11.2 (a) If $x = 41$, then $y = 12(41) - 75 = 417$.
(b) We must determine the smallest x value such that $y = 12x - 75 > 0$. Solving this inequality yields $x > 6.25$. So, we must sell at least 7 batteries in a month in order to make a profit.

11.3 (a) Predictor variable x = duration of training
Response variable y = performance in skilled job
(b) Predictor variable x = average number of cigarettes smoked daily
Response variable y = CO level in blood
(c) Predictor variable x = Humidity level in environment
Response variable y = Growth rate of fungus
(d) Predictor variable x = Expenditures in promoting product
Response variable y = Amount of product sales

11.4 The model is $Y = \beta_0 + \beta_1 x + e$, where $E(e) = 0$ and $sd(e) = \sigma$, so $\beta_0 = 4$, $\beta_1 = 3$, and $\sigma = 5$.

11.5 The model is $Y = \beta_0 + \beta_1 x + e$, where $E(e) = 0$ and $sd(e) = \sigma$, so $\beta_0 = 6$, $\beta_1 = 3$, and $\sigma = 3$.

11.6 $Y = \beta_0 + \beta_1 x + e = 1 + 3x + e$, where $E(e) = 0$ and $sd(e) = \sigma$.
(a) At $x = 4$, $E(Y) = 1 + 3(4) = 13$ and $sd(Y) = sd(e) = 2$.
(b) At $x = 2$, $E(Y) = 1 + 3(2) = 7$ and $sd(Y) = sd(e) = 2$.

11.7 $Y = \beta_0 + \beta_1 x + e = 2 - 3x + e$, where $E(e) = 0$ and $sd(e) = \sigma$.
(a) At $x = 1$, $E(Y) = 2 - 3(1) = -1$ and $sd(Y) = sd(e) = 4$.
(b) At $x = 2$, $E(Y) = 2 - 3(2) = -4$ and $sd(Y) = sd(e) = 4$.

11.8 The straight line for the means of the model $Y = 3 + 4x + e$ is $y = 3 + 4x$. The graph of the line is shown below.

11.9 The straight line for the means of the model $Y = 7 + 2x + e$ is $y = 7 + 2x$. The graph of the line is shown below.


11.10 (a) At $x = 3$, $E(Y) = \beta_0 + \beta_1(3) = 2 + 1(3) = 5$.
At $x = 6$, $E(Y) = \beta_0 + \beta_1(6) = 2 + 1(6) = 8$.
(b) No, only the mean is larger. By chance, the error e at $x = 6$, which has standard deviation 3, could be quite negative, and/or the error at $x = 3$ could be very large.

11.11 (a) At $x = 4$, $E(Y) = \beta_0 + \beta_1(4) = 4 + 3(4) = 16$.
At $x = 5$, $E(Y) = \beta_0 + \beta_1(5) = 4 + 3(5) = 19$.
(b) No, only the mean is larger. By chance, the error e at $x = 5$, which has standard deviation 4, could be quite negative, and/or the error at $x = 4$ could be very large.

11.12 (a) The scatter diagram is shown below.

[Figure: scatter diagram of y versus x, with the fitted line from part (d)]

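(b) The following table gives the computations needed to calculate $\bar{x}$, $\bar{y}$, $S_{xx}$, $S_{xy}$, and $S_{yy}$:

x | y | x - x̄ | y - ȳ | (x - x̄)(y - ȳ) | (x - x̄)² | (y - ȳ)²
2 | 1 | -2 | -4 | 8 | 4 | 16
5 | 6 | 1 | 1 | 1 | 1 | 1
6 | 10 | 2 | 5 | 10 | 4 | 25
3 | 3 | -1 | -2 | 2 | 1 | 4
4 | 5 | 0 | 0 | 0 | 0 | 0
Total: 20 | 25 | 0 | 0 | 21 | 10 | 46

So, we have
$$\bar{x} = \frac{\sum x}{n} = \frac{20}{5} = 4, \quad \bar{y} = \frac{\sum y}{n} = \frac{25}{5} = 5,$$
$$S_{xx} = \sum (x - \bar{x})^2 = 10, \quad S_{xy} = \sum (x - \bar{x})(y - \bar{y}) = 21, \quad S_{yy} = \sum (y - \bar{y})^2 = 46.$$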

(c) $\hat{\beta}_1 = \dfrac{S_{xy}}{S_{xx}} = \dfrac{21}{10} = 2.1$, $\quad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 5 - 2.1(4) = -3.4$
(d) The fitted line is $\hat{y} = -3.4 + 2.1x$, which is graphed above in (a).
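The hand computation in 11.12 can be cross-checked with a short Python sketch. It applies $\hat\beta_1 = S_{xy}/S_{xx}$ and $\hat\beta_0 = \bar y - \hat\beta_1 \bar x$ and uses only the standard library. The helper name `least_squares` is our own choice.

```python
from statistics import fmean


def least_squares(xs, ys):
    """Return (b0, b1, Sxx, Sxy, Syy) for the least squares line y = b0 + b1*x."""
    xbar, ybar = fmean(xs), fmean(ys)
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    syy = sum((y - ybar) ** 2 for y in ys)
    b1 = sxy / sxx          # slope estimate Sxy / Sxx
    b0 = ybar - b1 * xbar   # intercept estimate ybar - b1 * xbar
    return b0, b1, sxx, sxy, syy


# Data of Exercise 11.12
x = [2, 5, 6, 3, 4]
y = [1, 6, 10, 3, 5]
b0, b1, sxx, sxy, syy = least_squares(x, y)
print(f"Sxx={sxx:g}, Sxy={sxy:g}, Syy={syy:g}, b0={b0:.1f}, b1={b1:.1f}")
# prints: Sxx=10, Sxy=21, Syy=46, b0=-3.4, b1=2.1
```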

11.13 (a) The scatter diagram is shown below.

[Figure: scatter diagram of y versus x, with the fitted line from part (d)]

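(b) The following table gives the computations needed to calculate $\bar{x}$, $\bar{y}$, $S_{xx}$, $S_{xy}$, and $S_{yy}$:

x | y | x - x̄ | y - ȳ | (x - x̄)(y - ȳ) | (x - x̄)² | (y - ȳ)²
1 | 8 | -2 | 4 | -8 | 4 | 16
2 | 4 | -1 | 0 | 0 | 1 | 0
3 | 5 | 0 | 1 | 0 | 0 | 1
3 | 3 | 0 | -1 | 0 | 0 | 1
4 | 3 | 1 | -1 | -1 | 1 | 1
5 | 1 | 2 | -3 | -6 | 4 | 9
Total: 18 | 24 | 0 | 0 | -15 | 10 | 28

So, we have
$$\bar{x} = \frac{18}{6} = 3, \quad \bar{y} = \frac{24}{6} = 4, \quad S_{xx} = 10, \quad S_{xy} = -15, \quad S_{yy} = 28.$$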

(c) $\hat{\beta}_1 = \dfrac{S_{xy}}{S_{xx}} = \dfrac{-15}{10} = -1.5$, $\quad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 4 - (-1.5)(3) = 8.5$
(d) The fitted line is $\hat{y} = 8.5 - 1.5x$, which is graphed in part (a).

11.14 (a) The residuals and their sum are calculated in the following table:

x | y | ŷ = -3.4 + 2.1x | ê = y - ŷ | (y - ŷ)²
2 | 1 | 0.8 | 0.2 | 0.04
5 | 6 | 7.1 | -1.1 | 1.21
6 | 10 | 9.2 | 0.8 | 0.64
3 | 3 | 2.9 | 0.1 | 0.01
4 | 5 | 5 | 0 | 0
Total: | | | 0 | 1.9

So, $\sum (y - \hat{y}) = 0.0$.
(b) SSE = sum of squared residuals = 1.9. Also,
$$SSE = S_{yy} - \frac{S_{xy}^2}{S_{xx}} = 46 - \frac{(21)^2}{10} = 1.9$$
(check this using the calculations in Exercise 11.12).
(c) $s^2 = \dfrac{SSE}{n - 2} = \dfrac{1.9}{5 - 2} = 0.6333$
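The next Python sketch reproduces 11.14, taking the 11.12 line $\hat y = -3.4 + 2.1x$ as given. It computes the residuals and checks that they sum to zero. It also checks that SSE from the residuals equals the shortcut $S_{yy} - S_{xy}^2/S_{xx}$.

```python
from statistics import fmean

# Data of Exercise 11.12 and the fitted line y-hat = -3.4 + 2.1x
x = [2, 5, 6, 3, 4]
y = [1, 6, 10, 3, 5]
b0, b1 = -3.4, 2.1
n = len(x)

residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
sse = sum(e ** 2 for e in residuals)

# Shortcut formula SSE = Syy - Sxy^2 / Sxx, for comparison
xbar, ybar = fmean(x), fmean(y)
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
syy = sum((yi - ybar) ** 2 for yi in y)
sse_shortcut = syy - sxy ** 2 / sxx

s2 = sse / (n - 2)  # estimate of sigma^2 on n - 2 degrees of freedom
print([round(e, 2) for e in residuals], round(sse, 4), round(sse_shortcut, 4), round(s2, 4))
```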

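11.15 (a) The residuals and their sum are calculated in the following table:

x | y | ŷ = 8.5 - 1.5x | ê = y - ŷ | (y - ŷ)²
1 | 8 | 7 | 1 | 1
2 | 4 | 5.5 | -1.5 | 2.25
3 | 5 | 4 | 1 | 1
3 | 3 | 4 | -1 | 1
4 | 3 | 2.5 | 0.5 | 0.25
5 | 1 | 1 | 0 | 0
Total: | | | 0 | 5.50

So, $\sum (y - \hat{y}) = 0.0$.
(b) SSE = sum of squared residuals = 5.50. Also,
$$SSE = S_{yy} - \frac{S_{xy}^2}{S_{xx}} = 28 - \frac{(-15)^2}{10} = 5.50$$
(check this using the calculations in Exercise 11.13).
(c) $s^2 = \dfrac{SSE}{n - 2} = \dfrac{5.50}{6 - 2} = 1.375$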

11.16 (a) The following table gives the computations needed to calculate $\bar{x}$, $\bar{y}$, $S_{xx}$, $S_{xy}$, and $S_{yy}$:

x | y | x - x̄ | y - ȳ | (x - x̄)(y - ȳ) | (x - x̄)² | (y - ȳ)²
0 | 7 | -2 | 2 | -4 | 4 | 4
1 | 8 | -1 | 3 | -3 | 1 | 9
2 | 5 | 0 | 0 | 0 | 0 | 0
3 | 4 | 1 | -1 | -1 | 1 | 1
4 | 1 | 2 | -4 | -8 | 4 | 16
Total: 10 | 25 | 0 | 0 | -16 | 10 | 30

So, we have
$$\bar{x} = \frac{10}{5} = 2, \quad \bar{y} = \frac{25}{5} = 5, \quad S_{xx} = 10, \quad S_{xy} = -16, \quad S_{yy} = 30.$$

(b) $\hat{\beta}_1 = \dfrac{S_{xy}}{S_{xx}} = \dfrac{-16}{10} = -1.6$, $\quad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 5 - (-1.6)(2) = 8.2$
(c) The fitted line is $\hat{y} = 8.2 - 1.6x$.

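11.17 (a) The following table gives the computations needed to calculate $\bar{x}$, $\bar{y}$, $S_{xx}$, $S_{xy}$, and $S_{yy}$:

x | y | x - x̄ | y - ȳ | (x - x̄)(y - ȳ) | (x - x̄)² | (y - ȳ)²
1 | 4 | -3 | -2 | 6 | 9 | 4
2 | 3 | -2 | -3 | 6 | 4 | 9
4 | 6 | 0 | 0 | 0 | 0 | 0
6 | 8 | 2 | 2 | 4 | 4 | 4
7 | 9 | 3 | 3 | 9 | 9 | 9
Total: 20 | 30 | 0 | 0 | 25 | 26 | 26

So, we have
$$\bar{x} = \frac{20}{5} = 4, \quad \bar{y} = \frac{30}{5} = 6, \quad S_{xx} = 26, \quad S_{xy} = 25, \quad S_{yy} = 26.$$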

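(b) $\hat{\beta}_1 = \dfrac{S_{xy}}{S_{xx}} = \dfrac{25}{26} = 0.96$, $\quad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 6 - (0.96)(4) = 2.16$
(c) The fitted line is $\hat{y} = 2.16 + 0.96x$.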

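11.18 (a)
$$\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} = \frac{6191.04}{994.038} = 6.228, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 89.20 - (6.228)(38.046) = -147.750$$
The fitted line is $\hat{y} = -147.750 + 6.228x$.
(b) $SSE = S_{yy} - \dfrac{S_{xy}^2}{S_{xx}} = 76{,}293.2 - \dfrac{(6191.04)^2}{994.038} = 37{,}734.34$
(c) $s^2 = \dfrac{SSE}{n - 2} = \dfrac{37{,}734.34}{18} = 2096.352$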

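11.19 (a)
$$\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} = \frac{290.10}{358.55} = 0.8091, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 5.30 - (0.8091)(3.15) = 2.751$$
(b) $SSE = S_{yy} - \dfrac{S_{xy}^2}{S_{xx}} = 618.2 - \dfrac{(290.10)^2}{358.55} = 383.482$
(c) $s^2 = \dfrac{SSE}{n - 2} = \dfrac{383.482}{18} = 21.305$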

11.20 We first calculate the means and sums of squares and products:
$$\sum_{i=1}^{7} x_i = 889, \quad \sum_{i=1}^{7} y_i = 520, \quad \bar{x} = \frac{889}{7} = 127, \quad \bar{y} = \frac{520}{7} = 74.286,$$
$$\sum_{i=1}^{7} x_i^2 = 113{,}237, \quad \sum_{i=1}^{7} y_i^2 = 39{,}328, \quad \sum_{i=1}^{7} x_i y_i = 66{,}392.$$
Thus,
$$S_{xx} = \sum x_i^2 - \Big(\sum x_i\Big)^2 / 7 = 113{,}237 - (889)^2/7 = 334$$
$$S_{xy} = \sum x_i y_i - \Big(\sum x_i\Big)\Big(\sum y_i\Big) / 7 = 66{,}392 - (520)(889)/7 = 352$$
$$S_{yy} = \sum y_i^2 - \Big(\sum y_i\Big)^2 / 7 = 39{,}328 - (520)^2/7 = 699.43$$

(a) $\hat{\beta}_1 = \dfrac{S_{xy}}{S_{xx}} = \dfrac{352}{334} = 1.054$, $\quad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 74.286 - \left(\dfrac{352}{334}\right)(127) = -59.56$
(b) $SSE = S_{yy} - \dfrac{S_{xy}^2}{S_{xx}} = 699.43 - \dfrac{(352)^2}{334} = 328.46$
(c) $s^2 = \dfrac{SSE}{n - 2} = \dfrac{328.46}{5} = 65.69$
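When only the raw sums are available, as in 11.20, the computational formulas can be applied directly. The function `fit_from_sums` below is a hypothetical helper written for this illustration. It takes the sums quoted in the solution and returns the same quantities as the hand calculation.

```python
def fit_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Least squares quantities from raw sums, using the computational formulas."""
    sxx = sum_x2 - sum_x ** 2 / n
    syy = sum_y2 - sum_y ** 2 / n
    sxy = sum_xy - sum_x * sum_y / n
    b1 = sxy / sxx
    b0 = sum_y / n - b1 * sum_x / n   # ybar - b1 * xbar
    sse = syy - sxy ** 2 / sxx
    s2 = sse / (n - 2)
    return {"Sxx": sxx, "Syy": syy, "Sxy": sxy, "b0": b0, "b1": b1, "SSE": sse, "s2": s2}


# Sums quoted in Exercise 11.20 (n = 7)
res = fit_from_sums(7, 889, 520, 113_237, 39_328, 66_392)
for key, value in res.items():
    print(f"{key} = {value:.3f}")
```

Calling the same function with the roles of the two variables swapped reproduces the quantities used in 11.21.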

11.21 We first calculate the means and sums of squares and products:
$$\sum_{i=1}^{7} y_i = 889, \quad \sum_{i=1}^{7} x_i = 520, \quad \bar{y} = \frac{889}{7} = 127, \quad \bar{x} = \frac{520}{7} = 74.286,$$
$$\sum_{i=1}^{7} y_i^2 = 113{,}237, \quad \sum_{i=1}^{7} x_i^2 = 39{,}328, \quad \sum_{i=1}^{7} x_i y_i = 66{,}392.$$
Thus,
$$S_{xx} = \sum x_i^2 - \Big(\sum x_i\Big)^2 / 7 = 39{,}328 - (520)^2/7 = 699.43$$
$$S_{xy} = \sum x_i y_i - \Big(\sum x_i\Big)\Big(\sum y_i\Big) / 7 = 66{,}392 - (520)(889)/7 = 352$$
$$S_{yy} = \sum y_i^2 - \Big(\sum y_i\Big)^2 / 7 = 113{,}237 - (889)^2/7 = 334$$

(a) $\hat{\beta}_1 = \dfrac{S_{xy}}{S_{xx}} = \dfrac{352}{699.43} = 0.5033$, $\quad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 127 - (0.5033)(74.286) = 89.61$
(b) $SSE = S_{yy} - \dfrac{S_{xy}^2}{S_{xx}} = 334 - \dfrac{(352)^2}{699.43} = 156.85$
(c) $s^2 = \dfrac{SSE}{n - 2} = \dfrac{156.85}{5} = 31.37$

(d) No, the two should be inverses of each other (their graphs are reflections of each other across the line $y = x$). Hence, they will not have the same equation unless they are both identical to $y = x$.

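11.22 Since $\hat{\beta}_1 = \dfrac{S_{xy}}{S_{xx}}$ and $SSE = S_{yy} - \dfrac{S_{xy}^2}{S_{xx}}$, we have
(a) $SSE = S_{yy} - \dfrac{S_{xy}}{S_{xx}} S_{xy} = S_{yy} - \hat{\beta}_1 S_{xy}$
(b) $SSE = S_{yy} - \hat{\beta}_1 S_{xy} = S_{yy} - \hat{\beta}_1 \dfrac{S_{xy}}{S_{xx}} S_{xx} = S_{yy} - \hat{\beta}_1^2 S_{xx}$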

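11.23 At $x = \bar{x}$,
$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 \bar{x} = (\bar{y} - \hat{\beta}_1 \bar{x}) + \hat{\beta}_1 \bar{x} = \bar{y},$$
so the fitted line passes through the point $(\bar{x}, \bar{y})$.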


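11.24 (a) At $x = x_i$,
$$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i = \bar{y} - \hat{\beta}_1 \bar{x} + \hat{\beta}_1 x_i = \bar{y} + \hat{\beta}_1 (x_i - \bar{x}).$$
(b) The residual at $x_i$ is $y_i - \hat{y}_i$, or
$$\hat{e}_i = y_i - \hat{y}_i = (y_i - \bar{y}) - \hat{\beta}_1 (x_i - \bar{x}).$$
Summing the $\hat{e}_i$, we obtain
$$\sum \hat{e}_i = \sum (y_i - \bar{y}) - \hat{\beta}_1 \sum (x_i - \bar{x}) = 0 - \hat{\beta}_1 \cdot 0 = 0,$$
because the sums of deviations $\sum (y_i - \bar{y})$ and $\sum (x_i - \bar{x})$ are zero.
(c) Squaring the residual,
$$\hat{e}_i^2 = \left[(y_i - \bar{y}) - \hat{\beta}_1 (x_i - \bar{x})\right]^2 = (y_i - \bar{y})^2 - 2\hat{\beta}_1 (x_i - \bar{x})(y_i - \bar{y}) + \hat{\beta}_1^2 (x_i - \bar{x})^2.$$
Summing, we obtain
$$SSE = \sum \hat{e}_i^2 = S_{yy} - 2\hat{\beta}_1 S_{xy} + \hat{\beta}_1^2 S_{xx} = S_{yy} - 2\frac{S_{xy}^2}{S_{xx}} + \frac{S_{xy}^2}{S_{xx}} = S_{yy} - \frac{S_{xy}^2}{S_{xx}}.$$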
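The identities in 11.22 to 11.24 can be checked numerically on any data set. The sketch below uses the data of Exercise 11.13. It verifies that the fitted line passes through $(\bar x, \bar y)$ and that the residuals sum to zero. It also verifies that the four expressions for SSE agree.

```python
from statistics import fmean

# Data of Exercise 11.13
x = [1, 2, 3, 3, 4, 5]
y = [8, 4, 5, 3, 3, 1]
xbar, ybar = fmean(x), fmean(y)
sxx = sum((a - xbar) ** 2 for a in x)
sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
syy = sum((b - ybar) ** 2 for b in y)
b1 = sxy / sxx
b0 = ybar - b1 * xbar

resid = [b - (b0 + b1 * a) for a, b in zip(x, y)]
sse = sum(e * e for e in resid)

print(b0 + b1 * xbar, ybar)      # 11.23: line passes through (xbar, ybar)
print(sum(resid))                # 11.24(b): residuals sum to zero
print(sse,                       # SSE from the residuals
      syy - b1 * sxy,            # 11.22(a)
      syy - b1 ** 2 * sxx,       # 11.22(b)
      syy - sxy ** 2 / sxx)      # 11.24(c)
```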

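11.25 (a) The following table gives the computations needed to calculate $\bar{x}$, $\bar{y}$, $S_{xx}$, $S_{xy}$, and $S_{yy}$:

x | y | x - x̄ | y - ȳ | (x - x̄)(y - ȳ) | (x - x̄)² | (y - ȳ)²
1 | 5 | -2 | -6 | 12 | 4 | 36
2 | 11 | -1 | 0 | 0 | 1 | 0
3 | 9 | 0 | -2 | 0 | 0 | 4
4 | 14 | 1 | 3 | 3 | 1 | 9
5 | 16 | 2 | 5 | 10 | 4 | 25
Total: 15 | 55 | 0 | 0 | 25 | 10 | 74

So, we have
$$\bar{x} = \frac{15}{5} = 3, \quad \bar{y} = \frac{55}{5} = 11, \quad S_{xx} = 10, \quad S_{xy} = 25, \quad S_{yy} = 74.$$
Then
$$\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} = \frac{25}{10} = 2.5, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 11 - (2.5)(3) = 3.5,$$
$$SSE = S_{yy} - \frac{S_{xy}^2}{S_{xx}} = 74 - \frac{(25)^2}{10} = 11.5, \qquad s^2 = \frac{SSE}{n - 2} = \frac{11.5}{3} = 3.833.$$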

(b) We test the hypotheses $H_0: \beta_1 = 0$ versus $H_1: \beta_1 \neq 0$. Since $H_1$ is two-sided and $\alpha = 0.05$, the rejection region is $R: |T| > t_{0.025} = 3.182$ (for d.f. = 3). The value of the observed t is
$$t = \frac{\hat{\beta}_1 - 0}{s/\sqrt{S_{xx}}} = \frac{2.5}{\sqrt{3.833}/\sqrt{10}} = 4.038,$$
which lies in R. Hence, $H_0$ is rejected at $\alpha = 0.05$.
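The t statistic above follows the pattern $(\hat\beta_1 - \beta_{10})/(s/\sqrt{S_{xx}})$ for a null value $\beta_{10}$. The Python sketch below uses a hypothetical helper, `slope_t`, and takes its inputs from the summary quantities in the solutions. The critical value 3.182 is copied from the text rather than computed. The same helper covers Exercise 11.28, where the null value is 1.

```python
import math


def slope_t(b1, beta10, s, sxx):
    """t statistic for H0: beta1 = beta10, with SE(b1) = s / sqrt(Sxx)."""
    return (b1 - beta10) / (s / math.sqrt(sxx))


# Exercise 11.25(b): b1 = 2.5, s^2 = 11.5 / 3, Sxx = 10, H0: beta1 = 0
t_25 = slope_t(2.5, 0.0, math.sqrt(11.5 / 3), 10)
# Exercise 11.28: b1 = 0.28, s^2 = 0.012, Sxx = 10, H0: beta1 = 1
t_28 = slope_t(0.28, 1.0, math.sqrt(0.012), 10)

t_crit = 3.182  # t_{0.025} with 3 d.f., as quoted in the solutions
for name, t in (("11.25", t_25), ("11.28", t_28)):
    print(f"{name}: t = {t:.3f}, reject H0: {abs(t) > t_crit}")
```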
(c) The expected value is estimated by $3.5 + 2.5(3) = 11$. Since
$$s\sqrt{\frac{1}{n} + \frac{(3 - \bar{x})^2}{S_{xx}}} = \sqrt{3.833}\sqrt{\frac{1}{5} + \frac{(3 - 3)^2}{10}} = 0.8756$$
and the upper 0.05 point of the t distribution with d.f. = 3 is 2.353, the 90% confidence interval for the expected y value is given by
$$11 \pm 2.353(0.8756) \quad \text{or} \quad (8.9397, 13.0603).$$

11.26 A 90% confidence interval for $\beta_0$ (in Exercise 11.25) is given by
$$\hat{\beta}_0 \pm 2.353\, s\sqrt{\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}} = 3.5 \pm 2.353\sqrt{3.833}\sqrt{\frac{1}{5} + \frac{3^2}{10}} = 3.5 \pm 4.832$$
or (-1.332, 8.332).

11.27 Since $t_{0.025} = 3.182$ for d.f. = 3, a 95% confidence interval for $\beta_1$ is given by
$$\hat{\beta}_1 \pm 3.182 \frac{s}{\sqrt{S_{xx}}} = 2.5 \pm 3.182\sqrt{\frac{3.833}{10}} = 2.5 \pm 1.970$$
or (0.53, 4.47).
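The three intervals in 11.25(c), 11.26, and 11.27 share one structure: an estimate plus or minus $t \cdot s \cdot (\text{factor})$. The sketch below is a minimal illustration built on the 11.25 summary quantities. The function names are our own. The t values 2.353 and 3.182 are copied from the solutions, not computed.

```python
import math

# Summary quantities from Exercise 11.25
n, xbar, sxx = 5, 3.0, 10.0
b0, b1 = 3.5, 2.5
s = math.sqrt(11.5 / 3)


def ci_mean(x0, t):
    """CI for E(Y) at x0: (b0 + b1*x0) +/- t * s * sqrt(1/n + (x0 - xbar)^2 / Sxx)."""
    center = b0 + b1 * x0
    half = t * s * math.sqrt(1 / n + (x0 - xbar) ** 2 / sxx)
    return center - half, center + half


def ci_intercept(t):
    """CI for beta0: b0 +/- t * s * sqrt(1/n + xbar^2 / Sxx)."""
    half = t * s * math.sqrt(1 / n + xbar ** 2 / sxx)
    return b0 - half, b0 + half


def ci_slope(t):
    """CI for beta1: b1 +/- t * s / sqrt(Sxx)."""
    half = t * s / math.sqrt(sxx)
    return b1 - half, b1 + half


print(ci_mean(3, 2.353))    # 11.25(c), 90%
print(ci_intercept(2.353))  # 11.26, 90%
print(ci_slope(3.182))      # 11.27, 95%
```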

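11.28 (a) The following table gives the computations needed to calculate $\bar{x}$, $\bar{y}$, $S_{xx}$, $S_{xy}$, and $S_{yy}$:

x | y | x - x̄ | y - ȳ | (x - x̄)(y - ȳ) | (x - x̄)² | (y - ȳ)²
0 | 1.9 | -2 | -0.5 | 1.0 | 4 | 0.25
1 | 2.0 | -1 | -0.4 | 0.4 | 1 | 0.16
2 | 2.5 | 0 | 0.1 | 0 | 0 | 0.01
3 | 2.6 | 1 | 0.2 | 0.2 | 1 | 0.04
4 | 3.0 | 2 | 0.6 | 1.2 | 4 | 0.36
Total: 10 | 12 | 0 | 0 | 2.8 | 10 | 0.82

So, we have
$$\bar{x} = \frac{10}{5} = 2, \quad \bar{y} = \frac{12}{5} = 2.4, \quad S_{xx} = 10, \quad S_{xy} = 2.8, \quad S_{yy} = 0.82.$$
Then
$$\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} = \frac{2.8}{10} = 0.28, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 2.4 - (0.28)(2) = 1.84,$$
$$SSE = S_{yy} - \frac{S_{xy}^2}{S_{xx}} = 0.82 - \frac{(2.8)^2}{10} = 0.036, \qquad s^2 = \frac{SSE}{n - 2} = \frac{0.036}{3} = 0.012.$$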

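(b) We test the hypotheses $H_0: \beta_1 = 1$ versus $H_1: \beta_1 \neq 1$. Since $H_1$ is two-sided and $\alpha = 0.05$, the rejection region is $R: |T| > t_{0.025} = 3.182$ (for d.f. = 3). The value of the observed t is
$$t = \frac{\hat{\beta}_1 - 1}{s/\sqrt{S_{xx}}} = \frac{0.28 - 1}{\sqrt{0.012}/\sqrt{10}} = -20.785,$$
which lies in R. Hence, $H_0$ is rejected at $\alpha = 0.05$.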
(c) The expected value is estimated by $1.84 + 0.28(3.5) = 2.82$. Since
$$s\sqrt{\frac{1}{n} + \frac{(3.5 - \bar{x})^2}{S_{xx}}} = \sqrt{0.012}\sqrt{\frac{1}{5} + \frac{(3.5 - 2)^2}{10}} = 0.0714,$$
the 95% confidence interval for the expected y value is given by
$$2.82 \pm 3.182(0.0714) \quad \text{or} \quad (2.593, 3.047).$$
(d) A 90% confidence interval for $\beta_0$ is given by
$$\hat{\beta}_0 \pm 2.353\, s\sqrt{\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}} = 1.84 \pm 2.353\sqrt{0.012}\sqrt{\frac{1}{5} + \frac{2^2}{10}}$$
or (1.64, 2.04).

11.29 (a) & (b) Using Minitab, we have:

[Minitab scatterplot of y vs x]

Note that the equation of the least squares regression line is $\hat{y} = 0.8694x + 41.58$. This could have been computed by hand using the same calculations as in the exercises of the previous section.
(c) First, note that $t_{0.025} = 2.571$ for d.f. = 5. In order to determine the confidence interval for $\beta_1$, we need the following computations:

x | y | x - x̄ | y - ȳ | (x - x̄)(y - ȳ) | (x - x̄)² | (y - ȳ)²
283.5 | 288 | -12.2 | -10.6 | 129.32 | 148.84 | 112.36
290 | 291.2 | -5.7 | -7.4 | 42.18 | 32.49 | 54.76
270.5 | 276.2 | -25.2 | -22.4 | 564.48 | 635.04 | 501.76
300.8 | 307 | 5.1 | 8.4 | 42.84 | 26.01 | 70.56
310.2 | 311 | 14.5 | 12.4 | 179.80 | 210.25 | 153.76
294.6 | 299 | -1.1 | 0.4 | -0.44 | 1.21 | 0.16
320 | 318 | 24.3 | 19.4 | 471.42 | 590.49 | 376.36
Total: 2069.6 | 2090.4 | | | 1429.6 | 1644.3 | 1269.7

So, we have
$$\bar{x} = \frac{2069.6}{7} = 295.7, \quad \bar{y} = \frac{2090.4}{7} = 298.6, \quad S_{xx} = 1644.3, \quad S_{xy} = 1429.6, \quad S_{yy} = 1269.7.$$

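Thus, we have
$$SSE = S_{yy} - \frac{S_{xy}^2}{S_{xx}} = 26.766, \qquad s^2 = \frac{SSE}{n - 2} = \frac{26.766}{5} = 5.3532, \qquad s = 2.314.$$
Therefore, $\dfrac{s}{\sqrt{S_{xx}}} = \dfrac{2.314}{\sqrt{1644.3}} = 0.0571$. Also, $\hat{\beta}_1 = \dfrac{S_{xy}}{S_{xx}} = 0.8694$.
So, a 95% confidence interval for $\beta_1$ is given by
$$\hat{\beta}_1 \pm 2.571 \frac{s}{\sqrt{S_{xx}}} = 0.8694 \pm 2.571(0.0571) = 0.8694 \pm 0.1468$$
or (0.7226, 1.016).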

11.30 (a) For $x = 290$, using the fitted line we see that $\hat{y} = 0.8694(290) + 41.58 = 293.706$. So, the expected selling price is $293,706.
A 95% confidence interval (for $x = 290$) of the expected response 293.71 is given by
$$293.71 \pm 2.571(2.314)\sqrt{\frac{1}{7} + \frac{(290 - 295.7)^2}{1644.3}} = 293.71 \pm 2.3991$$
or (291.31, 296.11).

(b) A 95% prediction interval (for $x = 290$) is given by
$$293.71 \pm 2.571(2.314)\sqrt{1 + \frac{1}{7} + \frac{(290 - 295.7)^2}{1644.3}} = 293.71 \pm 6.415$$
or (287.30, 300.13).
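The next sketch shows the contrast between the confidence interval for the mean response and the prediction interval for a single new response. It starts from the (x, y) pairs in the 11.29(c) table and redoes 11.29 and 11.30 in Python. Only the "1 +" under the square root differs between the two intervals. The means here are not rounded, so the last digit can differ slightly from the hand computation, which used $\bar x = 295.7$ and $\bar y = 298.6$. The t value is copied from the solution.

```python
import math
from statistics import fmean

# (x, y) pairs listed in the table of Exercise 11.29(c)
x = [283.5, 290, 270.5, 300.8, 310.2, 294.6, 320]
y = [288, 291.2, 276.2, 307, 311, 299, 318]
n = len(x)

xbar, ybar = fmean(x), fmean(y)
sxx = sum((a - xbar) ** 2 for a in x)
sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
syy = sum((b - ybar) ** 2 for b in y)
b1 = sxy / sxx
b0 = ybar - b1 * xbar
s = math.sqrt((syy - sxy ** 2 / sxx) / (n - 2))

x0 = 290
t = 2.571  # t_{0.025} with 5 d.f., as quoted in the solution
center = b0 + b1 * x0
core = 1 / n + (x0 - xbar) ** 2 / sxx
hw_ci = t * s * math.sqrt(core)      # half-width for the mean response at x0
hw_pi = t * s * math.sqrt(1 + core)  # half-width for a single new response at x0
print(f"fit = {center:.2f}, CI +/- {hw_ci:.3f}, PI +/- {hw_pi:.3f}")
```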

11.31 (a) The expected value of HDI corresponding to $x = 22$ internet users per 100 is estimated as
$$\hat{\beta}_0 + \hat{\beta}_1 x = 0.4934 + 0.01745(22) = 0.8773.$$
Its estimated SE is
$$s\sqrt{\frac{1}{15} + \frac{(22 - 9.953)^2}{1173.46}} = 0.0298.$$
Since $t_{0.025} = 2.160$ for d.f. = 13, the 95% confidence interval is
$$0.8773 \pm 2.160(0.0298) = 0.8773 \pm 0.0644$$
or (0.8129, 0.9417).

The width of the confidence interval in Example 9 is 0.103, whereas the interval just computed has width 0.1288.

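(b) Now, we compute the 95% prediction interval for a single country corresponding to $x = 22$. As in (a), $\hat{\beta}_0 + \hat{\beta}_1 x = 0.8773$, but the estimated SE is now
$$s\sqrt{1 + \frac{1}{15} + \frac{(22 - 9.953)^2}{1173.46}} = 0.0683(1.091) = 0.0745,$$
where $s = \sqrt{SSE/13} = 0.0683$ with $SSE = S_{yy} - S_{xy}^2/S_{xx} = 0.41772 - (20.471)^2/1173.46 = 0.0606$ (the summary statistics used in Exercise 11.32). Since $t_{0.025} = 2.160$ for d.f. = 13, the 95% prediction interval is
$$0.8773 \pm 2.160(0.0745) = 0.8773 \pm 0.1609$$
or (0.7164, 1.0382).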

(c) No, it cannot establish causality.

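11.32 (a) With the roles of the variables reversed, internet usage x becomes the response and HDI y the predictor. Simply solving the Example 9 line $\hat{y} = 0.4934 + 0.01745x$ for x does not give the least squares line for predicting x from y. Instead, we apply the least squares formulas with x and y interchanged. Using $S_{xy} = 20.471$, $S_{yy} = 0.41772$, $\bar{x} = 9.953$, and $\bar{y} = 0.667$,
$$\text{slope} = \frac{S_{xy}}{S_{yy}} = \frac{20.471}{0.41772} = 49.007, \qquad \text{intercept} = \bar{x} - 49.007\,\bar{y} = 9.953 - 49.007(0.667) = -22.734,$$
so the fitted line is $\hat{x} = -22.734 + 49.007y$.

(b) Test the hypotheses $H_0: \beta_1 = 0$ versus $H_1: \beta_1 \neq 0$, where $\beta_1$ now denotes the slope of the regression of x on y. Using $\alpha = 0.05$ and $t_{0.025} = 2.160$ with d.f. = 13, we use the two-sided rejection region $R: |T| \geq 2.160$. Now, note that
$$SSE = S_{xx} - \frac{S_{xy}^2}{S_{yy}} = 1173.46 - \frac{(20.471)^2}{0.41772} = 170.248, \qquad s = \sqrt{\frac{SSE}{n - 2}} = \sqrt{\frac{170.248}{13}} = 3.619.$$
The test statistic is
$$T = \frac{49.007}{s/\sqrt{S_{yy}}} = \frac{49.007}{3.619/\sqrt{0.41772}} = 8.75.$$
Since this value lies in R, we reject $H_0$ at this level.

(c) The expected value of internet usage corresponding to $y = 0.650$ is estimated as
$$\hat{x} = \bar{x} + 49.007(0.650 - \bar{y}) = 9.953 + 49.007(0.650 - 0.667) = 9.120.$$
Its estimated SE is
$$s\sqrt{\frac{1}{15} + \frac{(0.650 - 0.667)^2}{0.41772}} = 3.619(0.2595) = 0.9393.$$
Since $t_{0.025} = 2.160$ for d.f. = 13, the 95% confidence interval is
$$9.120 \pm 2.160(0.9393) = 9.120 \pm 2.029$$
or (7.091, 11.149).

(d) Now, we compute the 95% prediction interval for a single country with HDI 0.650. As in (c), $\hat{x} = 9.120$, but the estimated SE is now
$$s\sqrt{1 + \frac{1}{15} + \frac{(0.650 - 0.667)^2}{0.41772}} = 3.7389.$$
Since $t_{0.025} = 2.160$ for d.f. = 13, the 95% prediction interval is
$$9.120 \pm 2.160(3.7389) = 9.120 \pm 8.076$$
or (1.044, 17.196).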
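Reversing the roles of the variables only swaps $S_{xx}$ and $S_{yy}$ in the formulas. The sketch below works from the summary statistics quoted in 11.31 and 11.32 and assumes $\bar y = 0.667$ as used there. It also computes the slope t statistic for the original regression of y on x. In simple regression both t statistics equal $r\sqrt{n-2}/\sqrt{1-r^2}$, so they should match.

```python
import math

# Summary statistics quoted in 11.31-11.32 (n = 15):
# x = internet users per 100, y = HDI
n, xbar, ybar = 15, 9.953, 0.667
sxx, syy, sxy = 1173.46, 0.41772, 20.471

# Regression of x on y: Sxx and Syy trade places
slope = sxy / syy
intercept = xbar - slope * ybar
sse = sxx - sxy ** 2 / syy
s = math.sqrt(sse / (n - 2))
t_xy = slope / (s / math.sqrt(syy))

# Regression of y on x, for comparison
b1 = sxy / sxx
s_yx = math.sqrt((syy - sxy ** 2 / sxx) / (n - 2))
t_yx = b1 / (s_yx / math.sqrt(sxx))

print(round(slope, 3), round(intercept, 3), round(s, 3), round(t_xy, 2), round(t_yx, 2))
```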

11.33 (a) The model is $Y = \beta_0 + \beta_1 x + e$, and the fit suggested by the data is $\hat{y} = 994 + 0.10373x$ with $\hat{\sigma} = sd(e) = 299.4$.
Note that $r^2$ is only 0.302. This means that only 30.2% of the variability in the data is explained by the model (refer to Exercise 11.44).

(b) The t-ratio on the computer output is the t statistic for testing that the coefficient is zero. Since the t-ratio for the x term is 3.48 with p-value 0.002, we reject $H_0: \beta_1 = 0$ at $\alpha = 0.05$.

11.34 (a) At $x = 5000$, the predicted mean response is given by $\hat{y} = 994 + 0.10373(5000) = 1512.7$.

(b) Since $t_{0.05} = 1.701$ for d.f. = 28, a 90% confidence interval for the mean response at $x = 5000$ is given by
$$\hat{\beta}_0 + \hat{\beta}_1 x \pm 1.701\, s\sqrt{\frac{1}{n} + \frac{(x - \bar{x})^2}{S_{xx}}} = 1512.7 \pm 1.701\sqrt{89{,}636}\sqrt{\frac{1}{30} + \frac{(5000 - 8354)^2}{97{,}599{,}296}}$$
or (1316.4, 1709.0).


11.35 (a) The model is $Y = \beta_0 + \beta_1 x + e$, and the fit suggested by the data is $\hat{y} = 0.3381 + 0.83099x$ with $\hat{\sigma} = sd(e) = 0.1208$.
(b) Since the t-ratio for the x term is 9.55 with p-value less than 0.0001, we reject $H_0: \beta_1 = 0$ at $\alpha = 0.05$. As such, the x term is needed in the model.

11.36 (a) At $x = 3$, the predicted mean response is $\hat{y} = 0.3381 + 0.83099(3) = 2.8311$.
(b) Since $t_{0.05} = 1.714$ for d.f. = 23, a 90% confidence interval for the mean response at $x = 3$ is given by
$$\hat{y} \pm 1.714\, s\sqrt{\frac{1}{n} + \frac{(3 - \bar{x})^2}{S_{xx}}} = 2.8311 \pm 1.714\sqrt{0.0146}\sqrt{\frac{1}{25} + \frac{(3 - 1.793)^2}{1.848}} = 2.8311 \pm 0.1885$$
or (2.6426, 3.0196).
(c) This 90% confidence interval (for the mean response at $x = 2$) is given by
$$[0.338 + 0.831(2)] \pm (1.714)\sqrt{0.0146}\sqrt{\frac{1}{25} + \frac{(2 - 1.793)^2}{1.848}}$$
or (1.95, 2.05).

11.37 (a) Using Minitab, we find that the fitted line plot is as follows:

[Minitab regression plot of length versus age: Y = 26.3101 + 0.537657X, R-Sq = 27.7%]

(b) Enter the data into a Minitab worksheet. The output is as follows:

Regression Analysis

The regression equation is
length = 26.3 + 0.538 age

Predictor Coef StDev T P
Constant 26.3101 0.7356 35.77 0.000
age 0.5377 0.2105 2.55 0.021

S = 1.722 R-Sq = 27.7% R-Sq(adj) = 23.5%

Analysis of Variance

Source DF SS MS F P
Regression 1 19.353 19.353 6.52 0.021
Residual Error 17 50.437 2.967
Total 18 69.789

In the age row, the p-value of 0.021 is for the two-sided test of $H_0: \beta_1 = 0$. Since 0.021 < 0.05, we reject $H_0$ at the 5% level in favor of a linear relationship between the two variables.

(c) & (d) One can proceed by hand as in the other exercises and examples. We, however, choose to use Minitab to obtain the following two 90% intervals corresponding to age x = 4: a confidence interval (CI) for the mean length and a prediction interval (PI) for a single length.

Predicted Values
Fit StDev Fit 90.0% CI 90.0% PI
28.461 0.453 ( 27.673, 29.249) ( 25.362, 31.559)

11.38 $r^2 = \dfrac{S_{xy}^2}{S_{xx} S_{yy}} = \dfrac{(6191.04)^2}{(994.038)(76293.2)} = 0.505$

11.39 $r^2 = \dfrac{S_{xy}^2}{S_{xx} S_{yy}} = \dfrac{(290.10)^2}{(358.55)(618.2)} = 0.380$

11.40 $r^2 = \dfrac{S_{xy}^2}{S_{xx} S_{yy}} = \dfrac{(20.471)^2}{(1173.46)(0.41772)} = 0.855$

11.41 (a) and (b): The $r^2$ value is the same as in Exercise 11.40.

11.42 By Exercise 11.25, we have $S_{xx} = 10$, $S_{yy} = 74$, $S_{xy} = 25$. So,
(a) $r^2 = \dfrac{S_{xy}^2}{S_{xx} S_{yy}} = \dfrac{(25)^2}{(10)(74)} = 0.845$
(b) $r = \dfrac{25}{\sqrt{(10)(74)}} = 0.919$
(c) $SSE = 74 - \dfrac{(25)^2}{10} = 11.5$
(d) $s^2 = \dfrac{SSE}{n - 2} = \dfrac{11.5}{3} = 3.833$


11.43 By Exercise 11.28, we have $S_{xx} = 10$, $S_{yy} = 0.82$, $S_{xy} = 2.8$. So,
(a) Proportion of variance explained $= r^2 = \dfrac{S_{xy}^2}{S_{xx} S_{yy}} = 0.956$
(b) $r = \dfrac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} = 0.978$
11.44 Proportion explained $= r^2 = 0.302$

11.45 Proportion explained $= r^2 = 0.799$

11.46 (a) $r = 0.649$ and $r^2 = 0.421$
(b) $r = 0.279$ and $r^2 = 0.078$
(c) $r = 0.733$ and $r^2 = 0.537$
(d) The pattern is quite different for male and female wolves. The scatter
diagram is as follows:

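11.47 (a) Recall $\hat{\beta}_1 = \dfrac{S_{xy}}{S_{xx}}$, so multiplying r by $\dfrac{\sqrt{S_{xx}}}{\sqrt{S_{xx}}}$ gives
$$r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} = \frac{S_{xy}}{S_{xx}} \cdot \frac{\sqrt{S_{xx}}}{\sqrt{S_{yy}}} = \hat{\beta}_1 \sqrt{\frac{S_{xx}}{S_{yy}}}.$$
(b)
$$SSE = S_{yy} - \frac{S_{xy}^2}{S_{xx}} = S_{yy}\left(1 - \frac{S_{xy}^2}{S_{xx} S_{yy}}\right) = S_{yy}(1 - r^2).$$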
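Both identities in 11.47 can be checked numerically. The sketch below uses the 11.25 summary quantities, which are also used in 11.42. It prints r and $r^2$ and confirms $r = \hat\beta_1\sqrt{S_{xx}/S_{yy}}$ and $SSE = S_{yy}(1 - r^2)$ up to floating-point rounding.

```python
import math

# Summary quantities from Exercise 11.25 (also used in 11.42)
sxx, syy, sxy = 10.0, 74.0, 25.0

r = sxy / math.sqrt(sxx * syy)
b1 = sxy / sxx
sse = syy - sxy ** 2 / sxx

print(round(r, 3), round(r ** 2, 3))               # correlation, proportion explained
print(math.isclose(r, b1 * math.sqrt(sxx / syy)))  # 11.47(a)
print(math.isclose(sse, syy * (1 - r ** 2)))       # 11.47(b)
```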

11.48 SS due to regression $= \dfrac{S_{xy}^2}{S_{xx}} = \left(\dfrac{S_{xy}}{S_{xx}}\right)^2 S_{xx} = \hat{\beta}_1^2 S_{xx}$.

11.49 The product x = (leaf length)(leaf width) is the area of a rectangle that contains the leaf. This rectangle's area should be larger than the leaf's area, so the slope should be less than one.

11.50 (a) & (c) The fitted line is given by $\hat{y} = 3.966 + 3.144x$. The scatter diagram and fitted line are illustrated below:

[Figure: scatter diagram of y versus x with the fitted line]

(b) Observe that
$$\bar{x} = \frac{23}{9} = 2.556, \quad \bar{y} = \frac{108}{9} = 12, \quad S_{xx} = 16.2222, \quad S_{xy} = 51, \quad S_{yy} = 176.$$

(d) The predicted value is $3.966 + 3.144(3) = 13.398$.

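11.51 (a) Note that
$$\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} = 3.144, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 12 - 3.144(2.556) = 3.964.$$
So, $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x = 3.964 + 3.144x$.
The residuals and their squares are calculated in the following table:

x | y | ŷ | ê = y - ŷ | (y - ŷ)²
1 | 8 | 7.108 | 0.892 | 0.7957
1 | 6 | 7.108 | -1.108 | 1.2277
1 | 7 | 7.108 | -0.108 | 0.0117
2 | 10 | 10.252 | -0.252 | 0.0635
3 | 15 | 13.396 | 1.604 | 2.5728
3 | 12 | 13.396 | -1.396 | 1.9488
3 | 13 | 13.396 | -0.396 | 0.1568
4 | 19 | 16.540 | 2.460 | 6.0516
5 | 18 | 19.684 | -1.684 | 2.8359
Total: | | | | 15.664

So, SSE = 15.664.
(b) $SSE = S_{yy} - \dfrac{S_{xy}^2}{S_{xx}} = 176 - \dfrac{(51)^2}{16.2222} = 15.664$
(c) $s^2 = \dfrac{SSE}{n - 2} = \dfrac{15.664}{9 - 2} = 2.238$, so that $s = 1.496$.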
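The next sketch recomputes 11.50 and 11.51 directly from the (x, y) pairs in the residual table above. It gives $S_{xx} = 146/9 \approx 16.2222$, $S_{xy} = 51$, and an SSE of about 15.664, computed both from the residuals and from the shortcut formula. The sketch uses the unrounded least squares coefficients, so individual residuals differ from the table in the third decimal.

```python
from statistics import fmean

# (x, y) pairs of Exercises 11.50-11.51, read from the residual table
x = [1, 1, 1, 2, 3, 3, 3, 4, 5]
y = [8, 6, 7, 10, 15, 12, 13, 19, 18]
n = len(x)

xbar, ybar = fmean(x), fmean(y)
sxx = sum((a - xbar) ** 2 for a in x)
sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
syy = sum((b - ybar) ** 2 for b in y)
b1 = sxy / sxx
b0 = ybar - b1 * xbar

resid = [b - (b0 + b1 * a) for a, b in zip(x, y)]
sse = sum(e * e for e in resid)
s2 = sse / (n - 2)
print(f"Sxx={sxx:.4f} Sxy={sxy:.3f} Syy={syy:.3f} b0={b0:.3f} b1={b1:.3f}")
print(f"SSE={sse:.3f} (shortcut {syy - sxy ** 2 / sxx:.3f}), s^2={s2:.3f}")
```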

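11.52 (a) Since $t_{0.025} = 2.365$ for d.f. = 7, a 95% confidence interval for $\beta_1$ is given by
$$\hat{\beta}_1 \pm 2.365 \frac{s}{\sqrt{S_{xx}}} = 3.144 \pm 2.365\left(\frac{1.496}{\sqrt{16.2222}}\right) = 3.144 \pm 0.878$$
or (2.266, 4.022).
(b) This 90% confidence interval ($t_{0.05} = 1.895$ for d.f. = 7) is given by
$$[3.964 + 3.144(4)] \pm 1.895(1.496)\sqrt{\frac{1}{9} + \frac{(4 - 2.556)^2}{16.2222}} = 16.540 \pm 1.388$$
or (15.152, 17.928).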

11.53 (a)
$$\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} = \frac{-12.4}{5.6} = -2.214, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 54.8 + (2.214)(8.3) = 73.18$$
The fitted line is then given by $\hat{y} = 73.18 - 2.214x$.
Also,
$$SSE = S_{yy} - \frac{S_{xy}^2}{S_{xx}} = 38.7 - \frac{(-12.4)^2}{5.6} = 11.24, \qquad s^2 = \frac{SSE}{n - 2} = \frac{11.24}{13} = 0.8648.$$

(b) We test the hypotheses $H_0: \beta_1 = -2$ versus $H_1: \beta_1 < -2$. Since $H_1$ is left-sided and $\alpha = 0.05$, the rejection region is $R: T < -t_{0.05} = -1.771$ (for d.f. = 13). The value of the observed t is
$$t = \frac{\hat{\beta}_1 - (-2)}{s/\sqrt{S_{xx}}} = \frac{-2.214 + 2}{\sqrt{0.8648}/\sqrt{5.6}} = -0.5446,$$
which does not lie in R. Hence, $H_0$ is not rejected at $\alpha = 0.05$.
(c) A 95% confidence interval (for $x = 10$) of the expected response 51.04 is given by
$$51.04 \pm 2.160(0.93)\sqrt{\frac{1}{15} + \frac{(10 - 8.3)^2}{5.6}}$$
or (49.51, 52.57).
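The next sketch reproduces the one-sided test and the interval for 11.53 from the quoted summary quantities. The critical values -1.771 and 2.160 are copied from the solution. The slope is not rounded to -2.214 here, so the t statistic comes out near -0.545 rather than exactly -0.5446.

```python
import math

# Summary quantities quoted in Exercise 11.53 (n = 15)
n, xbar, ybar = 15, 8.3, 54.8
sxx, syy, sxy = 5.6, 38.7, -12.4

b1 = sxy / sxx
b0 = ybar - b1 * xbar
s = math.sqrt((syy - sxy ** 2 / sxx) / (n - 2))

# Left-sided test of H0: beta1 = -2 versus H1: beta1 < -2
t = (b1 - (-2)) / (s / math.sqrt(sxx))
t_crit = -1.771  # -t_{0.05} with 13 d.f., as quoted in the solution
reject = t < t_crit

# 95% confidence interval for the mean response at x0 = 10
x0, t_025 = 10, 2.160
center = b0 + b1 * x0
half = t_025 * s * math.sqrt(1 / n + (x0 - xbar) ** 2 / sxx)
print(f"b0={b0:.2f} b1={b1:.3f} t={t:.4f} reject={reject} CI=({center - half:.2f}, {center + half:.2f})")
```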

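11.54 (a) The decomposition of the total y-variability is given by
$$\underbrace{S_{yy}}_{\text{Total}} = \underbrace{\frac{S_{xy}^2}{S_{xx}}}_{\text{Explained by linear relation}} + \underbrace{SSE}_{\text{Error}}, \qquad 38.7 = 27.46 + 11.24.$$
(b) Proportion explained $= r^2 = \dfrac{27.46}{38.7} = 0.71$
(c) $r = \dfrac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} = \dfrac{-12.4}{\sqrt{(5.6)(38.7)}} = -0.84$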

11.55 (a) The scatter diagram is given by

[Figure: scatter diagram of rent y versus size x]

We first calculate the means and sums of squares and products:
$$\sum_{i=1}^{8} x_i = 7880, \quad \sum_{i=1}^{8} y_i = 6845, \quad \bar{x} = \frac{7880}{8} = 985, \quad \bar{y} = \frac{6845}{8} = 855.625,$$
$$\sum_{i=1}^{8} x_i^2 = 7{,}797{,}438, \quad \sum_{i=1}^{8} y_i^2 = 5{,}914{,}875, \quad \sum_{i=1}^{8} x_i y_i = 6{,}785{,}540.$$
Thus,
$$S_{xx} = \sum x_i^2 - \Big(\sum x_i\Big)^2 / 8 = 35{,}638$$
$$S_{xy} = \sum x_i y_i - \Big(\sum x_i\Big)\Big(\sum y_i\Big) / 8 = 43{,}215$$
$$S_{yy} = \sum y_i^2 - \Big(\sum y_i\Big)^2 / 8 = 58{,}121.88$$
Consequently, we have
$$\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} = \frac{43{,}215}{35{,}638} = 1.2126, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 855.625 - (1.2126)(985) = -338.80.$$
Furthermore,
$$SSE = S_{yy} - \frac{S_{xy}^2}{S_{xx}} = 5718.93, \qquad s^2 = \frac{SSE}{n - 2} = \frac{5718.93}{6} = 953.16,$$
so that $s = 30.87$.

(b) We test the hypotheses $H_0: \beta_1 = 0$ versus $H_1: \beta_1 > 0$. Since $H_1$ is right-sided and $\alpha = 0.05$, the rejection region is $R: T > t_{0.05} = 1.943$ (for d.f. = 6). The value of the observed t is
$$t = \frac{\hat{\beta}_1 - 0}{s/\sqrt{S_{xx}}} = \frac{1.2126}{30.87/\sqrt{35{,}638}} = 7.41,$$
which lies in R. Hence, $H_0$ is rejected at $\alpha = 0.05$. This implies that the mean rent y increases with size x.

(c) The expected increase is $\beta_1$. A 95% confidence interval is calculated as
$$\hat{\beta}_1 \pm 2.447\frac{s}{\sqrt{S_{xx}}} = 1.2126 \pm 2.447\left(\frac{30.87}{\sqrt{35{,}638}}\right) = 1.2126 \pm 0.4001$$
or (0.812, 1.613).
(d) For a specific apartment of size $x = 1025$, a 95% prediction interval is given by
$$[855.625 + 1.2126(1025 - 985)] \pm 2.447(30.87)\sqrt{1 + \frac{1}{8} + \frac{(1025 - 985)^2}{35{,}638}} = 904.13 \pm 81.70$$
or (822.43, 985.83).
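All of 11.55 can be recomputed from the eight sums quoted in the solution, as in the Python sketch below. It reproduces $S_{xy} = 43{,}215$, the slope 1.2126, the t statistic, and both intervals. The critical value 2.447 ($t_{0.025}$, 6 d.f.) is copied from the solution.

```python
import math

# Sums quoted in Exercise 11.55 (n = 8)
n = 8
sum_x, sum_y = 7880, 6845
sum_x2, sum_y2, sum_xy = 7_797_438, 5_914_875, 6_785_540

sxx = sum_x2 - sum_x ** 2 / n
syy = sum_y2 - sum_y ** 2 / n
sxy = sum_xy - sum_x * sum_y / n
xbar, ybar = sum_x / n, sum_y / n
b1 = sxy / sxx
b0 = ybar - b1 * xbar
s = math.sqrt((syy - sxy ** 2 / sxx) / (n - 2))
se_b1 = s / math.sqrt(sxx)

t = b1 / se_b1                                       # H0: beta1 = 0 vs H1: beta1 > 0
ci_b1 = (b1 - 2.447 * se_b1, b1 + 2.447 * se_b1)     # 95% CI for beta1

x0 = 1025
fit = b0 + b1 * x0
hw = 2.447 * s * math.sqrt(1 + 1 / n + (x0 - xbar) ** 2 / sxx)  # prediction half-width
print(sxx, sxy, round(syy, 2))
print(round(b1, 4), round(b0, 2), round(s, 2), round(t, 2))
print(ci_b1, (fit - hw, fit + hw))
```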

11.56 (a) $r = \dfrac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} = \dfrac{43{,}215}{\sqrt{(35{,}638)(58{,}121.88)}} = 0.9495$
(b) Proportion of variability explained is $r^2 = 0.9016$.

11.57 (a) & (b) The scatter diagram and fitted line are shown below:

[Figure: scatter diagram of y versus x with the fitted line]

$$\bar{x} = \frac{40}{9} \approx 4.4, \quad \bar{y} = \frac{100.7}{9} = 11.19, \quad S_{xx} = 50.58, \quad S_{xy} = -77.85, \quad S_{yy} = 136.41$$
As such, we have
$$\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} = -1.558, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 18.114.$$
So, the fitted line is $\hat{y} = 18.114 - 1.558x$.
(c) Note that $SSE = S_{yy} - \dfrac{S_{xy}^2}{S_{xx}} = 16.59$, so that $s = \sqrt{\dfrac{SSE}{7}} = 1.539$.
Since $t_{0.025} = 2.365$ for d.f. = 7, a 95% confidence interval for $\beta_1$ is
$$\hat{\beta}_1 \pm 2.365\frac{s}{\sqrt{S_{xx}}} = -1.558 \pm 2.365\left(\frac{1.539}{\sqrt{50.58}}\right) = -1.558 \pm 0.512$$
or (-2.07, -1.046).

11.58 (a) The predicted value is $18.114 - 1.558(5) = 10.324$.
A 95% confidence interval for the expected value at $x = 5$ is then given by
$$10.324 \pm 2.365(1.539)\sqrt{\frac{1}{9} + \frac{(5 - 4.4)^2}{50.58}} = 10.324 \pm 1.252$$
or (9.072, 11.576).
(b) The predicted value for a single car at $x = 1$ is $18.114 - 1.558(1) = 16.556$.
A 90% prediction interval for this value is then given by
$$16.556 \pm 1.895(1.539)\sqrt{1 + \frac{1}{9} + \frac{(1 - 4.4)^2}{50.58}} = 16.556 \pm 3.376$$
or (13.180, 19.932).
(c) No. There are no data over that region. The linear relation cannot hold for cars that old, because the selling price never goes below zero.

11.59 $r = \dfrac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} = \dfrac{-77.85}{\sqrt{(50.58)(136.41)}} = -0.9455$. Since $r^2 = 0.8939$ is the proportion of variance explained by the linear regression of y on x, the fit appears to be adequate.

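11.60 (a) Observe that $\bar{x} = 106/20 = 5.3$, $\bar{y} = 63/20 = 3.15$, and
$$S_{xy} = 624 - \frac{(106)(63)}{20} = 290.1, \quad S_{xx} = 1180 - \frac{(106)^2}{20} = 618.20, \quad S_{yy} = 557 - \frac{(63)^2}{20} = 358.55.$$
So,
$$\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} = \frac{290.1}{618.20} = 0.469, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 3.15 - (0.469)(5.3) = 0.6643.$$
(b) $r = \dfrac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} = \dfrac{290.1}{\sqrt{(618.20)(358.55)}} = 0.616$
(c) $r^2 = 0.380$, so the proportion of y-variability explained by the straight-line fit is about 38%.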

11.61 (a) $\hat{\beta}_0 = 1.071$, $\hat{\beta}_1 = 2.7408$
(b) SSE = 63.65
(c) Estimated S.E. of $\hat{\beta}_0$ is 2.751; estimated S.E. of $\hat{\beta}_1$ is 0.4411.
(d) For testing $H_0: \beta_0 = 0$, $t = 0.39$.
For testing $H_0: \beta_1 = 0$, $t = 6.21$.
(e) $r^2 = 0.828$
(f) The decomposition of the total sum of squares is
$$\underbrace{S_{yy}}_{\text{Total}} = \underbrace{\frac{S_{xy}^2}{S_{xx}}}_{\text{Explained by linear relation}} + \underbrace{SSE}_{\text{Error}}, \qquad 370.90 = 307.25 + 63.65.$$

11.62 The Minitab output is as follows:

(a) The scatter diagram is illustrated below:

(b) From the output, the least squares regression line is $\hat{y} = -53.17 + 1.0349x$. According to the large p-value for the test of zero intercept, the model could be re-fit without an intercept term. However, the unusual observation noted in the output could make the current analysis misleading (see Exercise 11.63). We proceed as if the intercept term is needed.

(c) For d.f. = 17, $t_{0.05} = 1.740$. Therefore, the null hypothesis $H_0: \beta_1 = 0$ will be rejected at $\alpha = 0.05$ in favor of $H_1: \beta_1 > 0$ if the observed t value is in the rejection region $R: T \geq 1.740$. According to the output, the observed t is given by $t = \dfrac{1.0349}{0.2945} = 3.51$, which lies in R. Hence, $H_0$ is rejected. Furthermore, since the p-value of 0.003 is very small, the data strongly support $\beta_1 > 0$, which, in turn, indicates that the expected value of weight increases with body length.

11.63 The Minitab output is on the next page.
(a) From the output, the fitted line is $\hat{y} = -87.17 + 1.2765x$.
(b) For d.f. = 16, $t_{0.05} = 1.746$. Therefore, the null hypothesis $H_0: \beta_1 = 0$ will be rejected at $\alpha = 0.05$ in favor of $H_1: \beta_1 > 0$ if the observed t value is in the rejection region $R: T \geq 1.746$. According to the output, the observed t is given by $t = \dfrac{1.2765}{0.2156} = 5.92$, which lies in R. Hence, $H_0$ is rejected. Furthermore, since the p-value is very small, the data strongly support $\beta_1 > 0$, which, in turn, indicates that the expected value of weight increases with body length.
(c) The intercept term is now needed, since its p-value is 0.003. The estimated slope has increased.

11.64 (a) From the data, we calculate:
$$\bar{x} = 154.7, \quad \bar{y} = 616.1, \quad S_{xx} = 11{,}262.2, \quad S_{xy} = 41{,}914.6, \quad S_{yy} = 225{,}897.8.$$
So, the fitted line is $\hat{y} = 40.35 + 3.722x$.
(b) The residual sum of squares is SSE = 69,904, and the estimate of $\sigma$ is $s = \sqrt{\dfrac{SSE}{18}} = 62.3$. Since $t_{0.025} = 2.101$ for d.f. = 18, the 95% confidence interval for $\beta_1$ is given by
$$\hat{\beta}_1 \pm 2.101\frac{s}{\sqrt{S_{xx}}} = 3.722 \pm 2.101\left(\frac{62.3}{\sqrt{11{,}262.2}}\right)$$
or (2.489, 4.955).
(c) The predicted value of y for $x = 150$ is $40.35 + 3.722(150) \approx 599$.

The corresponding 95% prediction interval for a single CLEP score at CQT = 150 is given by
$$599 \pm 2.101(62.3)\sqrt{1 + \frac{1}{20} + \frac{(150 - 154.7)^2}{11{,}262.2}} = 599 \pm 134$$
or (465, 733).
(d) The calculations are similar to those in part (c), so we present only the final results:
At $x = 175$, the prediction interval is $692 \pm 136$ or (556, 828).
At $x = 195$, the prediction interval is $766 \pm 143$ or (623, 909).

11.65 (a) From the data, we calculate:
$$\bar{x} = \frac{\sum x}{15} = 16.66, \quad \bar{y} = \frac{\sum y}{15} = 80.04, \quad S_{xx} = 40.476, \quad S_{xy} = 133.804, \quad S_{yy} = 629.836.$$
So,
$$\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} = \frac{133.804}{40.476} = 3.306, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 80.04 - (3.306)(16.66) = 24.96.$$
So, the fitted line is $\hat{y} = 24.96 + 3.306x$.
(b) The residual sum of squares is SSE = 187.5, and the estimate of $\sigma$ is $s = \sqrt{\dfrac{SSE}{13}} = 3.798$. Since $t_{0.025} = 2.160$ for d.f. = 13, the 95% confidence interval for $\beta_1$ is given by
$$\hat{\beta}_1 \pm 2.160\frac{s}{\sqrt{S_{xx}}} = 3.306 \pm 2.160\left(\frac{3.798}{\sqrt{40.476}}\right)$$
or (2.02, 4.60).

(c) The predicted temperature (°F) for $x = 15$ is $24.96 + 3.306(15) \approx 74.55$.

11.66 The scatter diagram and fitted line are shown below.

Enter the data into columns C4 and C5 of a Minitab worksheet. The output is as
follows:

11.67 (a) The data for salmon growth are entered into columns C1 through C4 of a Minitab worksheet, and the data for all salmon are stacked into C5 and C6. From the output (seen below), the freshwater growth of salmon is not an effective predictor of their marine growth.

(b) The data for male salmon growth are entered into columns C1 and C2. From the output (seen below), the freshwater growth of a male salmon is not an effective predictor of its marine growth.

(c) The data for female salmon growth are entered into columns C3 and C4. From the output (seen on the next page), the freshwater growth of a female salmon is not an effective predictor of its marine growth.

11.68 (a) Using Minitab, the proportion of variation in speed explained by the regression is R-Sq = 88.6%.

(b) The Minitab output is as follows:

Regression Analysis: speed versus height

The regression equation is speed = 33.5 + 0.193 height

Predictor Coef SE Coef T P
Constant 33.453 7.095 4.71 0.001
height 0.19291 0.02184 8.83 0.000

S = 5.85513 R-Sq = 88.6% R-Sq(adj) = 87.5%

Analysis of Variance

Source DF SS MS F P
Regression 1 2675.8 2675.8 78.05 0.000
Residual Error 10 342.8 34.3
Total 11 3018.7

Unusual Observations

Obs height speed Fit SE Fit Residual St Resid
3 415 100.00 113.51 2.75 -13.51 -2.61R
R denotes an observation with a large standardized residual.

Since the regression equation is $\hat{y} = 33.5 + 0.193x$, the predicted top speed at $x = 325$ is $\hat{y} = 33.5 + 0.193(325) = 96.225$.

(c) Using the same line, the predicted top speed at $x = 480$ is 126.14. Since x = 480 is quite far from the x-values of the observed data points, this is an extrapolation, and the predictive power of the regression line is severely diminished.
