
STA200B Tutorial No. 4: Multiple Linear Regression Analysis


Submission Instructions:
- SAS Output: 8 marks (1b, 1d, 1e, 1f)
- Handwritten part: 59 marks
- Total: 67 marks

1. Open the data file WHEAT4.xlsx. The Shipment column represents the total U.S. wheat exports
per month in thousands of bushels, the Exchrate column represents a weighted average of US
exchange rates, and the Price column represents the global price per bushel of red winter
wheat.
a) Construct a multiple linear regression model using Shipment as dependent variable
and the other two as independent variables. State the population model equation.
$Y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \varepsilon_i$
where y is total exports, x1 is exchange rate and x2 is price [2 marks]
b) Estimate the model parameters using the method of least squares and state the
estimated model equation. (You can use SAS; you don't need to do it by hand.)
$\hat{\beta} = \begin{bmatrix} 3361.93 \\ 1.86915 \\ -2413.84 \end{bmatrix}$
$\hat{y} = 3361.93 + 1.86915\,x_1 - 2413.84\,x_2$
[2 marks]
c) Interpret the parameter estimates in words.
If the price of wheat was 0 and the exchange rate was 0, we would estimate total exports to
be 3 361 930 bushels (remember that y is measured in thousands of bushels).
If the weighted exchange rate increased by one unit while the price stayed the same, we would
expect wheat exports to increase by 1869.15 bushels.
If the price of wheat increased by one unit while the exchange rate stayed the same, we would
expect wheat exports to decrease by 2 413 840 bushels. [6 marks]
d) Conduct a test of the null hypothesis $H_0: \beta_1 = \beta_2 = 0$ vs. $H_A$: they are not both 0, and
interpret your finding. You don't have to calculate anything by hand here; use SAS
output.
1) $H_0: \beta_1 = \beta_2 = 0$
   $H_A$: at least one $\beta_j \neq 0$, $j = 1, 2$
2) $\alpha = 0.05$
3) $F = \dfrac{SS_{Model}/(p-1)}{SS_{Residual}/(n-p)} \sim F(p-1,\ n-p)$
4) Reject $H_0$ if $F_{obs} > f_{\alpha,\ p-1,\ n-p} = f_{0.05,\ 3-1,\ 135-3} \approx 3.06$
5) From SAS, $F_{obs} = 6.37 > 3.06$, so we reject $H_0$.
6) We conclude that the model is significant: there is a linear relationship between exports
and at least one of the independent variables.
[4 marks]
e) Conduct a test of the null hypothesis $H_0: \beta_1 = 0$ vs. $H_A: \beta_1 \neq 0$ and interpret your finding.

You must calculate the t statistic by hand. You can use the following information given
to you:
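$y^T y = 1166791487$, $\hat{\beta}^T X^T y = 1082678565$, $\hat{\beta}_1 = 1.869$

$(X^T X)^{-1} = \begin{bmatrix} 0.6292 & 0.003893 & 0.6488 \\ 0.003893 & 0.00002799 & 0.002792 \\ 0.6488 & 0.002792 & 1.1245 \end{bmatrix}$
(The signs of the off-diagonal entries are not legible in this copy; only the diagonal entry $C_{11} = 0.00002799$ is needed below.)
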
1) $H_0: \beta_1 = 0$
   $H_A: \beta_1 \neq 0$
2) $\alpha = 0.05$
3) $t = \dfrac{\hat{\beta}_j - \beta_{j0}}{\sqrt{\dfrac{y^T y - \hat{\beta}^T X^T y}{n-p}\, C_{jj}}} \sim t(n-p)$
4) Reject $H_0$ if $|t_{obs}| > t_{\alpha/2,\ n-p} = t_{0.025,\ 135-3} \approx 1.98$
5) $t_{obs} = \dfrac{1.869 - 0}{\sqrt{(1166791487 - 1082678565)(0.00002799)/(135-3)}} = \dfrac{1.869}{\sqrt{17.834}} = 0.4426 < 1.98$, thus we do not reject $H_0$.
6) We conclude that there is no significant linear relationship between exchange rate and wheat exports.
[4 marks]
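
The hand calculation of the t statistic can be checked with a few lines of Python. This is only a sketch using the numbers quoted in the question; the variable names are illustrative, and the critical value 1.98 is the tabulated figure used in the solution rather than something computed here.

```python
import math

# Quantities quoted in the question; n = 135 and p = 3 come from the solution.
yty = 1166791487.0        # y'y
bxy = 1082678565.0        # beta_hat' X'y
beta1_hat = 1.869         # estimated coefficient of the exchange rate
c11 = 0.00002799          # diagonal entry of (X'X)^{-1} belonging to beta_1
n, p = 135, 3
t_crit = 1.98             # tabulated t_{0.025, 132}, taken from the solution

ss_residual = yty - bxy                   # residual sum of squares
sigma2_hat = ss_residual / (n - p)        # estimate of the error variance
se_beta1 = math.sqrt(sigma2_hat * c11)    # standard error of beta1_hat
t_obs = (beta1_hat - 0.0) / se_beta1      # test statistic under H0: beta_1 = 0

reject_h0 = abs(t_obs) > t_crit
print(f"se = {se_beta1:.3f}, t_obs = {t_obs:.4f}, reject H0: {reject_h0}")
```

Note that 17.834 in the hand working is the estimated variance of the coefficient, so its square root (about 4.22) is the standard error that goes in the denominator.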

f) Give a 95% confidence interval for the true value of $\beta_1$. Calculate it by hand using the
values given above.

$\hat{\beta}_1 \pm t_{\alpha/2,\ n-p}\sqrt{\dfrac{y^T y - \hat{\beta}^T X^T y}{n-p}\, C_{11}}$
$= 1.869 \pm t_{0.025,\ 132}\sqrt{17.834}$
$= 1.869 \pm 1.98\sqrt{17.834}$
$= (-6.49,\ 10.23)$
[3 marks]
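
As a cross-check on the interval, the same formula can be wrapped in a small helper. The function name `beta_confidence_interval` is hypothetical, and the critical value 1.98 is again the tabulated figure, not one computed by the code.

```python
import math

def beta_confidence_interval(beta_hat, sigma2_hat, c_jj, t_crit):
    """Return (lower, upper) for beta_hat +/- t_crit * sqrt(sigma2_hat * c_jj)."""
    half_width = t_crit * math.sqrt(sigma2_hat * c_jj)
    return beta_hat - half_width, beta_hat + half_width

# Error variance estimate from the sums of squares given in part e).
sigma2_hat = (1166791487.0 - 1082678565.0) / (135 - 3)

lower, upper = beta_confidence_interval(1.869, sigma2_hat, 0.00002799, t_crit=1.98)
print(f"95% CI for beta_1: ({lower:.2f}, {upper:.2f})")
```

The interval contains 0, which agrees with the decision not to reject $H_0: \beta_1 = 0$ at the 5% level.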
g) Predict the total wheat exports in a month where the average exchange rate is 100 and
the price of red wheat is 0.4.

$\hat{y} = 3361.93 + 1.86915\,x_1 - 2413.84\,x_2$
$= 3361.93 + 1.86915(100) - 2413.84(0.4)$
$= 2583.31$


Total wheat exports in such a month would be estimated at 2583.31 thousand bushels
or 2 583 310 bushels.

[2 marks]
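
The prediction is a direct substitution into the fitted equation, sketched below. The function name `predict_shipment` is made up for illustration; the coefficients are the estimates from part b).

```python
def predict_shipment(exchrate, price):
    """Predicted monthly wheat exports, in thousands of bushels."""
    return 3361.93 + 1.86915 * exchrate - 2413.84 * price

y_hat = predict_shipment(exchrate=100, price=0.4)
print(f"Predicted exports: {y_hat:.2f} thousand bushels")
```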

2. Open the data file STEEL.xlsx. The Hardness column gives the hardness of a batch of steel (y)
while the Copper column gives the copper content of the steel (as a percent) and the Temp
column gives the Annealing Temperature at which the steel was processed in degrees Celsius.
You do not need to submit SAS output for this question, although you are welcome to check
your answers using SAS.
a) Construct a multiple linear regression model with Hardness as the dependent variable
and Copper and Temp as independent variables. State the population regression
function.
$Y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \varepsilon_i$
where y is hardness of steel, x1 is copper content and x2 is
annealing temperature [2 marks]

b) Use matrices to find the least squares parameter estimates for this model by hand.
$y = \begin{bmatrix} 78.9 \\ 55.2 \\ 80.9 \\ 57.4 \\ 85.3 \\ 60.7 \end{bmatrix}, \quad X = \begin{bmatrix} 1 & 0.02 & 540 \\ 1 & 0.02 & 650 \\ 1 & 0.1 & 540 \\ 1 & 0.1 & 650 \\ 1 & 0.18 & 540 \\ 1 & 0.18 & 650 \end{bmatrix}$

$\hat{\beta} = (X^T X)^{-1} X^T y$

$= \begin{bmatrix} 20.0628 & -3.9063 & -0.03278 \\ -3.9063 & 39.0625 & 0 \\ -0.03278 & 0 & 5.5096 \times 10^{-5} \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 & 1 & 1 & 1 \\ 0.02 & 0.02 & 0.1 & 0.1 & 0.18 & 0.18 \\ 540 & 650 & 540 & 650 & 540 & 650 \end{bmatrix} \begin{bmatrix} 78.9 \\ 55.2 \\ 80.9 \\ 57.4 \\ 85.3 \\ 60.7 \end{bmatrix}$

$= \begin{bmatrix} 2.28347 & -1.32233 & 1.97097 & -1.63483 & 1.65847 & -1.94733 \\ -3.12505 & -3.12505 & -0.00005 & -0.00005 & 3.12495 & 3.12495 \\ -0.0030282 & 0.0030324 & -0.0030282 & 0.0030324 & -0.0030282 & 0.0030324 \end{bmatrix} \begin{bmatrix} 78.9 \\ 55.2 \\ 80.9 \\ 57.4 \\ 85.3 \\ 60.7 \end{bmatrix}$

$= \begin{bmatrix} 196.05 \\ 37.17 \\ -0.2167 \end{bmatrix}$
[4 marks]
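
For comparison with the hand calculation, here is a minimal sketch that solves the normal equations $X^T X \hat{\beta} = X^T y$ in full floating-point precision using only the Python standard library. The helper names (`transpose`, `matmul`, `solve`) are illustrative, and the Gaussian elimination routine is a swapped-in technique for this sketch, not the explicit inverse used in the tutorial's solution.

```python
def transpose(m):
    return [list(row) for row in zip(*m)]

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + [b[i]] for i in range(n)]    # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):                     # back substitution
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

# Steel data in the observation order used in the solution.
copper = [0.02, 0.02, 0.10, 0.10, 0.18, 0.18]
temp = [540, 650, 540, 650, 540, 650]
y = [78.9, 55.2, 80.9, 57.4, 85.3, 60.7]

X = [[1.0, c, t] for c, t in zip(copper, temp)]      # design matrix with intercept
Xt = transpose(X)
XtX = matmul(Xt, X)
Xty = [row[0] for row in matmul(Xt, [[v] for v in y])]

beta_hat = solve(XtX, Xty)
print([round(b, 4) for b in beta_hat])
```

Without intermediate rounding the estimates come out near 195.47, 37.19 and -0.2176, so the hand-computed intercept of 196.05 is off by roughly 0.6.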

c) Interpret the parameter estimates in words.
$\hat{\beta}_0 = 196.05$ means that a batch of steel produced with no copper content at 0 degrees
Celsius would be expected to have a hardness of 196.05 (this does not make sense, since
steel cannot be made at 0 degrees Celsius).
$\hat{\beta}_1 = 37.17$ means that for each one-unit increase in copper content, with annealing
temperature held fixed, the hardness is expected to increase by 37.17 units. It might make more
sense to divide by 100 and say that for each 0.01-unit increase in copper content, the hardness
is expected to increase by 0.3717 units (since copper content is a proportion between 0 and 1,
it can't increase by one unit).
$\hat{\beta}_2 = -0.2167$ means that for each one-degree increase in annealing temperature, with
copper content held fixed, the hardness is expected to decrease by 0.2167 units.
[6 marks]
d) Use your fitted regression equation to obtain the predicted hardness ($\hat{y}$) for all
observations in the sample.
$\hat{y}_1 = 196.05 + 37.17\,x_{11} - 0.2167\,x_{21}$
$= 196.05 + 37.17(0.02) - 0.2167(540)$
$= 79.78$
By the same method we get:
$\hat{y}_2 = 55.94$
$\hat{y}_3 = 82.75$
$\hat{y}_4 = 58.91$
$\hat{y}_5 = 85.72$
$\hat{y}_6 = 61.89$
[6 marks]
e) Calculate the residuals of all observations in the sample, and use these to calculate
$SS_{Residual}$ and $\hat{\sigma}^2$ (or, if you prefer, you can calculate $SS_{Residual}$ and $\hat{\sigma}^2$ using matrix
operations).
$e_1 = y_1 - \hat{y}_1 = 78.9 - 79.78 = -0.88$
By the same method we get:
$e_2 = -0.74$
$e_3 = -1.85$
$e_4 = -1.51$
$e_5 = -0.42$
$e_6 = -1.19$
$SS_{Residual} = \sum_{i=1}^{n} e_i^2 = 0.7744 + 0.5476 + 3.4225 + 2.2801 + 0.1764 + 1.4161 = 8.6171$
$\hat{\sigma}^2 = SS_{Residual}/(n-p) = 8.6171/(6-3) = 2.87$

OR...

$SS_{Residual} = y^T y - \hat{\beta}^T X^T y$
$= \begin{bmatrix} 78.9 & 55.2 & 80.9 & 57.4 & 85.3 & 60.7 \end{bmatrix} \begin{bmatrix} 78.9 \\ 55.2 \\ 80.9 \\ 57.4 \\ 85.3 \\ 60.7 \end{bmatrix} - \begin{bmatrix} 196.05 & 37.17 & -0.2167 \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 & 1 & 1 & 1 \\ 0.02 & 0.02 & 0.1 & 0.1 & 0.18 & 0.18 \\ 540 & 650 & 540 & 650 & 540 & 650 \end{bmatrix} \begin{bmatrix} 78.9 \\ 55.2 \\ 80.9 \\ 57.4 \\ 85.3 \\ 60.7 \end{bmatrix}$
$= 30072.4 - 30526.62 = -454.22$
$\hat{\sigma}^2 = SS_{Residual}/(n-p) = -454.22/(6-3) = -151.41$

This answer doesn't make sense: it has to be positive.
This is my mistake, since I didn't realize the rounding error would create such a big problem.
The take-home lesson is that it's better to let SAS do your parameter estimation rather than doing it yourself.
If you have to do it yourself, don't round anything off.

[6 marks]
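
The two routes to $SS_{Residual}$ can be reproduced side by side. The sketch below assumes the hand-rounded coefficients from part b) and rounds the fitted values to two decimals, as in the hand working; all variable names are illustrative.

```python
copper = [0.02, 0.02, 0.10, 0.10, 0.18, 0.18]
temp = [540, 650, 540, 650, 540, 650]
y = [78.9, 55.2, 80.9, 57.4, 85.3, 60.7]
b0, b1, b2 = 196.05, 37.17, -0.2167      # hand-rounded estimates from part b)

# Route 1: fitted values (rounded to 2 d.p. as in the hand working), then residuals.
fitted = [round(b0 + b1 * c + b2 * t, 2) for c, t in zip(copper, temp)]
residuals = [yi - fi for yi, fi in zip(y, fitted)]
ss_res_direct = sum(e * e for e in residuals)

# Route 2: y'y - beta_hat' X'y with the same rounded coefficients.
yty = sum(yi * yi for yi in y)
xty = [
    sum(y),                                    # sum of y_i
    sum(c * yi for c, yi in zip(copper, y)),   # sum of x_1i * y_i
    sum(t * yi for t, yi in zip(temp, y)),     # sum of x_2i * y_i
]
ss_res_matrix = yty - (b0 * xty[0] + b1 * xty[1] + b2 * xty[2])

print(f"sum of squared residuals: {ss_res_direct:.4f}")
print(f"y'y - b'X'y:              {ss_res_matrix:.2f}")
```

The identity $SS_{Residual} = y^T y - \hat{\beta}^T X^T y$ only holds when $\hat{\beta}$ satisfies the normal equations exactly. Here the entries of $X^T y$ are large (the temperature entry is about 245 000), so rounding errors of a few ten-thousandths in the coefficients are magnified into an error of several hundred.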

f) Given that the matrix $(X^T X)^{-1}$ is as follows, give a 95% confidence interval for $\beta_1$.
$(X^T X)^{-1} = \begin{bmatrix} 20.0628 & -3.9063 & -0.03278 \\ -3.9063 & 39.0625 & 0 \\ -0.03278 & 0 & 5.5096 \times 10^{-5} \end{bmatrix}$


If you used the matrix approach above, you wouldn't be able to do this part, since you
would have to take the square root of a negative number. So everyone gets free marks on this
question, as long as you attempted it.

However, the correct method (using the other estimate of 2.87) would be:
$\hat{\beta}_1 \pm t_{\alpha/2,\ n-p}\sqrt{\hat{\sigma}^2 C_{11}}$
$= 37.17 \pm t_{0.025,\ 6-3}\sqrt{2.87(39.0625)}$
$= 37.17 \pm 3.182(10.588)$
$= (3.48,\ 70.86)$

Note that these values are very far from the correct Confidence Interval from SAS,
because of our rounding errors.
[3 marks]

g) Using the above answers, give a 95% confidence interval for the average hardness of
steel with a copper content of 0.14% and an annealing temperature of 600 degrees
Celsius.

Again, if you got a negative answer for e) you can't do this, so full marks are given to
everyone who attempted the question. However, if you use the 2.87 value above:

$\hat{y}^* = 196.05 + 37.17(0.14) - 0.2167(600) = 71.23$

$x^{*T}(X^T X)^{-1}x^* = \begin{bmatrix} 1 & 0.14 & 600 \end{bmatrix} \begin{bmatrix} 20.0628 & -3.9063 & -0.03278 \\ -3.9063 & 39.0625 & 0 \\ -0.03278 & 0 & 5.5096 \times 10^{-5} \end{bmatrix} \begin{bmatrix} 1 \\ 0.14 \\ 600 \end{bmatrix}$
$= \begin{bmatrix} -0.152082 & 1.56245 & 0.0002776 \end{bmatrix} \begin{bmatrix} 1 \\ 0.14 \\ 600 \end{bmatrix}$
$= 0.233221$

$\hat{y}^* \pm t_{0.025,\ 6-3}\sqrt{\hat{\sigma}^2\, x^{*T}(X^T X)^{-1}x^*}$
$= 71.23 \pm 3.182\sqrt{2.87(0.233221)}$
$= (68.63,\ 73.83)$

Again, because of our rounding errors this is far from the true answer given by SAS.
[3 marks]

h) Give a 95% prediction interval for the hardness of a particular batch of steel with a
copper content of 0.14% and an annealing temperature of 600 degrees Celsius.
$\hat{y}^* = 196.05 + 37.17(0.14) - 0.2167(600) = 71.23$
$\hat{y}^* \pm t_{0.025,\ 6-3}\sqrt{\hat{\sigma}^2\,(1 + x^{*T}(X^T X)^{-1}x^*)}$
$= 71.23 \pm 3.182\sqrt{2.87(1.233221)}$
$= (65.24,\ 77.22)$

Again, because of our rounding errors this is far from the true answer given by SAS.
[3 marks]
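
The confidence interval for the mean response in g) and the prediction interval in h) differ only by the extra 1 under the square root, which the following sketch makes explicit. It assumes the matrix entries quoted in part f), with the two intercept covariances entered as negative values (which is what inverting $X^T X$ for this design gives), plus the hand values $\hat{\sigma}^2 = 2.87$ and $t_{0.025,3} = 3.182$. Variable names are illustrative.

```python
import math

# (X'X)^{-1} for the steel design; the intercept covariances are negative.
C = [
    [20.0628, -3.9063, -0.03278],
    [-3.9063, 39.0625, 0.0],
    [-0.03278, 0.0, 5.5096e-5],
]
x_star = [1.0, 0.14, 600.0]             # intercept, copper content, temperature
beta_hat = [196.05, 37.17, -0.2167]     # hand-rounded estimates from part b)
sigma2_hat = 2.87                       # residual-based estimate from part e)
t_crit = 3.182                          # tabulated t_{0.025, 3}

# Point estimate, rounded to 2 d.p. as in the hand working.
y_star = round(sum(b * x for b, x in zip(beta_hat, x_star)), 2)

# Quadratic form x*' (X'X)^{-1} x*.
Cx = [sum(cij * xj for cij, xj in zip(row, x_star)) for row in C]
h = sum(xi * v for xi, v in zip(x_star, Cx))

half_mean = t_crit * math.sqrt(sigma2_hat * h)         # mean response
half_pred = t_crit * math.sqrt(sigma2_hat * (1 + h))   # single new batch

print(f"h = {h:.6f}")
print(f"95% CI for the mean: ({y_star - half_mean:.2f}, {y_star + half_mean:.2f})")
print(f"95% prediction interval: ({y_star - half_pred:.2f}, {y_star + half_pred:.2f})")
```

The prediction interval is always wider than the confidence interval at the same point, because it also accounts for the variation of an individual batch around the mean.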
i) Calculate the $r^2$ and $\bar{r}^2$ (adjusted $r^2$) values for this model and comment on the goodness of fit.
Our $y^T y$ value is 30072.4, so $SS_y = y^T y - n\bar{y}^2 = 30072.4 - 6(69.73333)^2 = 895.97$, and our
estimated $SS_{Residual}$ value is 8.6171 (although this is a bad estimate).
Thus our $r^2$ statistic is:
$r^2 = 1 - \dfrac{SS_{Residual}}{SS_y} = 1 - \dfrac{8.6171}{30072.4 - 6(69.73333)^2} = 1 - \dfrac{8.6171}{895.97} = 0.9904$
Note: the true $r^2$ value is 0.9985, so this is not such a bad estimate in this case.
$\bar{r}^2 = 1 - \dfrac{n-1}{n-p}(1 - r^2) = 1 - \dfrac{6-1}{6-3}(1 - 0.9904) = 1 - \dfrac{5}{3}(0.0096) = 0.9840$

Either way the model is a very good fit to the data!
[3 marks]
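
The adjustment formula $\bar{r}^2 = 1 - \frac{n-1}{n-p}(1 - r^2)$ is easy to evaluate for any $r^2$. As a small sketch, the hypothetical helper below applies it to the true $r^2$ of 0.9985 quoted above, with $n = 6$ observations and $p = 3$ estimated coefficients.

```python
def adjusted_r2(r2, n, p):
    """Adjusted r^2 for n observations and p estimated coefficients (intercept included)."""
    return 1 - (n - 1) / (n - p) * (1 - r2)

adj = adjusted_r2(0.9985, n=6, p=3)
print(f"adjusted r^2 = {adj:.4f}")
```

With so few observations the penalty factor $(n-1)/(n-p) = 5/3$ is noticeable, but the adjusted value stays very close to 1.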
