Regression Models
To accompany
Quantitative Analysis for Management, Tenth Edition,
by Render, Stair, and Hanna
Power Point slides created by Jeff Heyl
Learning Objectives
After completing this chapter, students will be able to:
1. Identify variables and use them in a regression model
2. Develop simple linear regression equations from sample data and interpret the slope and intercept
3. Compute the coefficient of determination and the coefficient of correlation and interpret their meanings
4. Interpret the F-test in a linear regression model
5. List the assumptions used in regression and use residual plots to identify problems
2009 Prentice-Hall, Inc.
Learning Objectives
After completing this chapter, students will be able to:
6. Develop a multiple regression model and use it to predict
7. Use dummy variables to model categorical data
8. Determine which variables should be included in a multiple regression model
9. Transform a nonlinear function into a linear one for use in regression
10. Understand and avoid common mistakes made in the use of regression analysis
Chapter Outline
4.1 Introduction
4.2 Scatter Diagrams
4.3 Simple Linear Regression
4.4 Measuring the Fit of the Regression Model
4.5 Using Computer Software for Regression
4.6 Assumptions of the Regression Model
Chapter Outline
4.7
4.8
4.9
4.10
4.11
4.12
Introduction
Regression analysis is a very valuable
variables
Predict the value of one variable based on
another variable
Examples
Determining best location for a new store
Studying the effectiveness of advertising
dollars in increasing sales volume
Introduction
The variable to be predicted is called the dependent variable (or response variable)
The variable used to predict it is called the independent variable (or predictor or explanatory variable)
Scatter Diagram
Triple A Construction
Triple A Construction renovates old homes
They have found that the dollar volume of
Table 4.1
SALES ($100,000s)    LOCAL PAYROLL ($100,000,000s)
6                    3
8                    4
9                    6
5                    4
4.5                  2
9.5                  5
Triple A Construction
Figure 4.1. Sales ($100,000) plotted against payroll ($100 million)
\[ Y = \beta_0 + \beta_1 X + \epsilon \]
where
\( Y \) = dependent variable (response)
\( X \) = independent variable (predictor or explanatory)
\( \beta_0 \) = intercept (value of Y when X = 0)
\( \beta_1 \) = slope of the regression line
\( \epsilon \) = random error
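\[ \hat{Y} = b_0 + b_1 X \]
where
\( \hat{Y} \) = predicted value of Y
\( X \) = independent variable
\( b_0 \) = estimate of \( \beta_0 \), based on sample results
\( b_1 \) = estimate of \( \beta_1 \), based on sample results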
Triple A Construction
Triple A Construction is trying to predict sales
Y = Sales
X = Area payroll
The line chosen in Figure 4.1 is the one that
Triple A Construction
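For the simple linear regression model, the intercept and slope are calculated from the sample data as follows:
\[ \bar{X} = \frac{\sum X}{n} = \text{average (mean) of } X \text{ values} \]
\[ \bar{Y} = \frac{\sum Y}{n} = \text{average (mean) of } Y \text{ values} \]
\[ b_1 = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2} \]
\[ b_0 = \bar{Y} - b_1 \bar{X} \]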
Triple A Construction
Regression calculations
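Y      X    \( (X - \bar{X})^2 \)    \( (X - \bar{X})(Y - \bar{Y}) \)
6      3    (3 - 4)^2 = 1            (3 - 4)(6 - 7) = 1
8      4    (4 - 4)^2 = 0            (4 - 4)(8 - 7) = 0
9      6    (6 - 4)^2 = 4            (6 - 4)(9 - 7) = 4
5      4    (4 - 4)^2 = 0            (4 - 4)(5 - 7) = 0
4.5    2    (2 - 4)^2 = 4            (2 - 4)(4.5 - 7) = 5
9.5    5    (5 - 4)^2 = 1            (5 - 4)(9.5 - 7) = 2.5
\( \sum Y = 42 \), \( \sum X = 24 \), \( \sum (X - \bar{X})^2 = 10 \), \( \sum (X - \bar{X})(Y - \bar{Y}) = 12.5 \)
\( \bar{Y} = 42/6 = 7 \), \( \bar{X} = 24/6 = 4 \)
Table 4.2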
Triple A Construction
Regression calculations
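\[ \bar{X} = \frac{\sum X}{6} = \frac{24}{6} = 4 \]
\[ \bar{Y} = \frac{\sum Y}{6} = \frac{42}{6} = 7 \]
\[ b_1 = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2} = \frac{12.5}{10} = 1.25 \]
\[ b_0 = \bar{Y} - b_1 \bar{X} = 7 - (1.25)(4) = 2 \]
Therefore \( \hat{Y} = 2 + 1.25X \)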
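The hand calculation above can be checked with a few lines of code. The following is a minimal sketch in plain Python; the function name `fit_simple_regression` is an illustrative choice, not something from the slides, and the data are the six observations in Table 4.2.

```python
# Triple A Construction data from Table 4.2
payroll = [3, 4, 6, 4, 2, 5]          # X, in $100 millions
sales = [6, 8, 9, 5, 4.5, 9.5]        # Y, in $100,000s


def fit_simple_regression(x, y):
    """Return (b0, b1) for the least-squares line Y-hat = b0 + b1 * X."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Numerator and denominator of the slope formula
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = fit_simple_regression(payroll, sales)
print(b0, b1)  # 2.0 1.25
```

The result matches the slide: intercept 2 and slope 1.25.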
Triple A Construction
sales = 2 + 1.25(payroll)
For a payroll of 6 ($600 million):
\( \hat{Y} = 2 + 1.25(6) = 9.5 \), or $950,000 in sales
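Because the two variables are recorded in different units, a prediction involves a unit conversion on each side. The sketch below wraps the fitted equation in a hypothetical helper, `predict_sales_dollars`, that accepts payroll in dollars and returns sales in dollars; the helper name and the unit constants are illustrative assumptions based on the table headings.

```python
PAYROLL_UNIT = 100_000_000   # X is measured in $100 millions
SALES_UNIT = 100_000         # Y is measured in $100,000s


def predict_sales_dollars(payroll_dollars, b0=2.0, b1=1.25):
    """Predict sales in dollars from payroll in dollars using Y-hat = b0 + b1 * X."""
    x = payroll_dollars / PAYROLL_UNIT
    y_hat = b0 + b1 * x
    return y_hat * SALES_UNIT


print(predict_sales_dollars(600_000_000))  # 950000.0
```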
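Sum of squares total
\[ SST = \sum (Y - \bar{Y})^2 \]
Sum of the squared error
\[ SSE = \sum e^2 = \sum (Y - \hat{Y})^2 \]
Sum of squares due to regression
\[ SSR = \sum (\hat{Y} - \bar{Y})^2 \]
An important relationship
\[ SST = SSR + SSE \]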
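Y      X    \( (Y - \bar{Y})^2 \)    \( \hat{Y} \)          \( (Y - \hat{Y})^2 \)    \( (\hat{Y} - \bar{Y})^2 \)
6      3    (6 - 7)^2 = 1            2 + 1.25(3) = 5.75     0.0625                   1.563
8      4    (8 - 7)^2 = 1            2 + 1.25(4) = 7.00     1                        0
9      6    (9 - 7)^2 = 4            2 + 1.25(6) = 9.50     0.25                     6.25
5      4    (5 - 7)^2 = 4            2 + 1.25(4) = 7.00     4                        0
4.5    2    (4.5 - 7)^2 = 6.25       2 + 1.25(2) = 4.50     0                        6.25
9.5    5    (9.5 - 7)^2 = 6.25       2 + 1.25(5) = 8.25     1.5625                   1.563
\( \bar{Y} = 7 \)
\( \sum (Y - \bar{Y})^2 = 22.5 \), so SST = 22.5
\( \sum (Y - \hat{Y})^2 = 6.875 \), so SSE = 6.875
\( \sum (\hat{Y} - \bar{Y})^2 = 15.625 \), so SSR = 15.625
Table 4.3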
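The three sums of squares in Table 4.3 can be reproduced directly from their definitions. This is a small sketch in plain Python that assumes the fitted line \( \hat{Y} = 2 + 1.25X \) from the earlier slides; the variable names are illustrative.

```python
payroll = [3, 4, 6, 4, 2, 5]          # X
sales = [6, 8, 9, 5, 4.5, 9.5]        # Y
b0, b1 = 2.0, 1.25                    # fitted coefficients from the slides

y_bar = sum(sales) / len(sales)
y_hat = [b0 + b1 * x for x in payroll]            # predicted values

sst = sum((y - y_bar) ** 2 for y in sales)                 # total variability
sse = sum((y - yh) ** 2 for y, yh in zip(sales, y_hat))    # unexplained (error)
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)               # explained by regression

print(sst, sse, ssr)   # 22.5 6.875 15.625
print(sst == ssr + sse)  # True for this data set
```

The last line checks the relationship SST = SSR + SSE; with other data a small tolerance is safer than an exact equality test because of floating-point rounding.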
For Triple A Construction
Sum of squares total: SST = 22.5
Sum of the squared error: SSE = 6.875
Sum of squares due to regression: SSR = 15.625
An important relationship: SST = SSR + SSE, and here 22.5 = 15.625 + 6.875
Figure 4.2. Regression line \( \hat{Y} = 2 + 1.25X \) on the plot of sales ($100,000) against payroll ($100 million), with the deviations \( Y - \hat{Y} \), \( \hat{Y} - \bar{Y} \), and \( Y - \bar{Y} \) marked
Coefficient of Determination
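The proportion of the variability in Y explained by the regression equation is called the coefficient of determination
\[ r^2 = \frac{SSR}{SST} \]
For Triple A Construction
\[ r^2 = \frac{15.625}{22.5} = 0.6944 \]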
Correlation Coefficient
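The correlation coefficient is an expression of the strength of the linear relationship
For Triple A Construction (positive, because the slope is positive)
\[ r = \sqrt{0.6944} = 0.8333 \]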
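As a cross-check, r can also be computed directly from the data rather than as the square root of \( r^2 \). The sketch below uses the standard product-moment formula \( r = \sum (X - \bar{X})(Y - \bar{Y}) / \sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2} \), which is not shown on the slides and is brought in here as an assumption; it has the advantage of carrying the correct sign automatically.

```python
import math

payroll = [3, 4, 6, 4, 2, 5]          # X
sales = [6, 8, 9, 5, 4.5, 9.5]        # Y


def correlation(x, y):
    """Product-moment correlation coefficient between x and y."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


r = correlation(payroll, sales)
print(round(r, 4), round(r ** 2, 4))  # 0.8333 0.6944
```

Squaring r recovers the coefficient of determination, 0.6944, in agreement with SSR/SST.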
Correlation Coefficient
Figure 4.3. Scatter plots of Y against X illustrating correlation patterns. Surviving panel titles: (b) Positive Correlation: 0 < r < 1; (c) No Correlation: r = 0
Program 4.1A
Program 4.1B
Program 4.1C
Program 4.1D
Residual Plots
Figure 4.4A. Plot of error against X
Residual Plots
Nonconstant error variance
Errors increase as X increases, violating the constant variance assumption
Figure 4.4B. Plot of error against X
Residual Plots
Nonlinear relationship
Errors consistently increasing and then consistently decreasing suggest that the relationship is not linear
Figure 4.4C. Plot of error against X
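A residual plot starts from the errors themselves. The sketch below computes the errors \( e = Y - \hat{Y} \) for the Triple A Construction data, assuming the fitted line \( \hat{Y} = 2 + 1.25X \), and lists them in order of X; plotting these pairs with any charting tool would give a residual plot of the kind described above. With only six observations, any pattern should be read cautiously.

```python
payroll = [3, 4, 6, 4, 2, 5]          # X
sales = [6, 8, 9, 5, 4.5, 9.5]        # Y
b0, b1 = 2.0, 1.25                    # fitted coefficients from the slides

# Error (residual) for each observation: actual minus predicted
residuals = [y - (b0 + b1 * x) for x, y in zip(payroll, sales)]

# List the errors in order of X, as they would appear left to right on the plot
for x, e in sorted(zip(payroll, residuals)):
    print(f"X = {x}: error = {e:+.2f}")

# For a least-squares line with an intercept, the errors sum to zero
print(sum(residuals))
```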
\[ s^2 = MSE = \frac{SSE}{n - k - 1} \]
where
n = number of observations in the sample
k = number of independent variables
For Triple A Construction
\[ s^2 = MSE = \frac{SSE}{n - k - 1} = \frac{6.8750}{6 - 1 - 1} = \frac{6.8750}{4} = 1.7188 \]
\[ Y = \beta_0 + \beta_1 X + \epsilon \]
\[ MSR = \frac{SSR}{k} \]
where
k = number of independent variables in the model
The F statistic is
\[ F = \frac{MSR}{MSE} \]
\[ H_0: \beta_1 = 0 \]
\[ H_1: \beta_1 \neq 0 \]
methods
Triple A Construction
Step 1.
\( H_0: \beta_1 = 0 \) (no linear relationship between X and Y)
\( H_1: \beta_1 \neq 0 \) (linear relationship exists between X and Y)
Step 2.
Select \( \alpha = 0.05 \)
Step 3.
Calculate the value of the test statistic
\[ MSR = \frac{SSR}{k} = \frac{15.6250}{1} = 15.6250 \]
\[ F = \frac{MSR}{MSE} = \frac{15.6250}{1.7188} = 9.09 \]
Triple A Construction
Step 4.
Reject the null hypothesis if the test statistic is greater than the F-value in Appendix D
\( df_1 = k = 1 \)
\( df_2 = n - k - 1 = 6 - 1 - 1 = 4 \)
The value of F associated with a 5% level of significance and with degrees of freedom 1 and 4 is found in Appendix D
\( F_{0.05,1,4} = 7.71 \)
\( F_{calculated} = 9.09 \)
Reject \( H_0 \) because 9.09 > 7.71
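The four steps can be traced in code. This sketch uses plain Python and takes the critical value 7.71 as given from the slide rather than computing it from the F distribution; the function name `f_test` is illustrative.

```python
def f_test(ssr, sse, n, k, f_critical):
    """Return (MSR, MSE, F, reject_h0) for the regression F-test."""
    msr = ssr / k                  # mean square due to regression, df1 = k
    mse = sse / (n - k - 1)        # mean squared error, df2 = n - k - 1
    f_stat = msr / mse
    return msr, mse, f_stat, f_stat > f_critical


# Triple A Construction: SSR = 15.625, SSE = 6.875, n = 6, k = 1
# Critical value F(0.05, 1, 4) = 7.71 is taken from the slide (Appendix D)
msr, mse, f_stat, reject = f_test(15.625, 6.875, n=6, k=1, f_critical=7.71)
print(round(msr, 4), round(mse, 4), round(f_stat, 2), reject)
# 15.625 1.7188 9.09 True
```

Statistical libraries can supply the critical value or the p-value directly; that step is left out here to keep the example free of external dependencies.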
Triple A Construction
We can conclude there is a statistically significant relationship between X and Y
The \( r^2 \) value of 0.69 means about 69% of the variability in sales (Y) is explained by local payroll (X)
Figure 4.5. F distribution with the 0.05 rejection region beginning at F = 7.71; the calculated value of 9.09 falls inside it
\( r^2 \) is the coefficient of determination
The F-test determines whether or not there is a relationship.
Good regression models have a low significance level for the F-test and a high \( r^2 \) value.
Coefficient Hypotheses
Statistical tests of significance can be performed on the coefficients.
The null hypothesis is that the coefficient of X (i.e., the slope of the line) is 0, which means X is not useful in predicting Y.
P-values are the observed significance level and can be used to test the null hypothesis.
P-values less than 5% lead to rejecting the null hypothesis.
              DF           SS     MS                       F          SIGNIFICANCE
Regression    k            SSR    MSR = SSR/k              MSR/MSE    P(F > MSR/MSE)
Residual      n - k - 1    SSE    MSE = SSE/(n - k - 1)
Total         n - 1        SST
Table 4.4
Program 4.1D
(partial)
\[ Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k + \epsilon \]
where
\( Y \) = dependent variable (response variable)
\( X_i \) = ith independent variable (predictor or explanatory variable)
\( \beta_0 \) = intercept (value of Y when all \( X_i = 0 \))
\( \beta_i \) = coefficient of the ith independent variable
\( k \) = number of independent variables
\( \epsilon \) = random error
\[ \hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_k X_k \]
where
\( \hat{Y} \) = predicted value of Y
\( b_0 \) = sample intercept (and is an estimate of \( \beta_0 \))
\( b_i \) = sample coefficient of the ith variable (and is an estimate of \( \beta_i \))
\[ \hat{Y} = b_0 + b_1 X_1 + b_2 X_2 \]
where
\( \hat{Y} \) = predicted value of dependent variable (selling price)
\( b_0 \) = Y intercept
\( X_1 \) and \( X_2 \) = value of the two independent variables (square footage and age), respectively
\( b_1 \) and \( b_2 \) = slopes for \( X_1 \) and \( X_2 \), respectively
She selects a sample of houses that have sold
Table 4.5
SELLING PRICE ($)    SQUARE FOOTAGE    AGE    CONDITION
95,000               1,926             30     Good
119,000              2,069             40     Excellent
124,800              1,720             30     Excellent
135,000              1,396             15     Good
142,000              1,706             32     Mint
145,000              1,847             38     Mint
159,000              1,950             27     Mint
165,000              2,323             30     Excellent
182,000              2,285             26     Mint
183,000              3,752             35     Good
200,000              2,300             18     Good
211,000              2,525             17     Good
215,000              3,800             40     Excellent
219,000              1,740             12     Mint
Program 4.2
\[ \hat{Y} = 146631 + 44 X_1 - 2899 X_2 \]
regression models
The p-value for the F-test and \( r^2 \) are interpreted the same
The hypothesis is different because there is more than one independent variable
The F-test is investigating whether all the coefficients are equal to 0
Program 4.3
Low p-values indicate each variable is significant
Model Building
The best model is a statistically significant
Model Building
The formula for \( r^2 \)
\[ r^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST} \]
\[ \text{Adjusted } r^2 = 1 - \frac{SSE/(n - k - 1)}{SST/(n - 1)} \]
Model Building
It is tempting to keep adding variables to a model to try to increase \( r^2 \)
The adjusted \( r^2 \) will decrease if additional independent variables are not beneficial.
As the number of variables (k) increases, n - k - 1 decreases.
This causes SSE/(n - k - 1) to increase, which in turn decreases the adjusted \( r^2 \) unless the extra variable causes a significant decrease in SSE
The reduction in error (and SSE) must be sufficient to offset the change in k
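To make the penalty for extra variables concrete, the sketch below evaluates both formulas for the Triple A Construction model (SSE = 6.875, SST = 22.5, n = 6, k = 1). The second call is a purely hypothetical scenario, invented for illustration, in which a second variable is added and SSE falls only slightly, to 6.5.

```python
def r_squared(sse, sst):
    """Coefficient of determination: 1 - SSE/SST."""
    return 1 - sse / sst


def adjusted_r_squared(sse, sst, n, k):
    """Adjusted r^2: 1 - [SSE/(n - k - 1)] / [SST/(n - 1)]."""
    return 1 - (sse / (n - k - 1)) / (sst / (n - 1))


# Triple A Construction with one independent variable
print(round(r_squared(6.875, 22.5), 4))                    # 0.6944
print(round(adjusted_r_squared(6.875, 22.5, n=6, k=1), 4)) # 0.6181

# Hypothetical: a second variable lowers SSE only from 6.875 to 6.5
print(round(r_squared(6.5, 22.5), 4))                      # 0.7111
print(round(adjusted_r_squared(6.5, 22.5, n=6, k=2), 4))   # 0.5185
```

In the hypothetical case \( r^2 \) rises while the adjusted \( r^2 \) falls, which is the situation the slide warns about.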
Model Building
In general, if a new variable increases the
Nonlinear Regression
In some situations, variables are not linear
Transformations may be used to turn a
Figure. Two scatter plots with the panel titles "Linear relationship" and "Nonlinear relationship"
Colonel Motors
The engineers want to use regression analysis to
MPG    WEIGHT (1,000 LBS.)    MPG    WEIGHT (1,000 LBS.)
12     4.58                   20     3.18
13     4.66                   23     2.68
15     4.02                   24     2.65
18     2.53                   33     1.70
19     3.09                   36     1.95
19     3.11                   42     1.92
Table 4.6
Colonel Motors
Linear model
\[ \hat{Y} = b_0 + b_1 X_1 \]
Figure 4.6A. MPG plotted against weight (1,000 lbs.) with the linear model
Colonel Motors
Program 4.4
Colonel Motors
Nonlinear model
\[ MPG = b_0 + b_1(\text{weight}) + b_2(\text{weight})^2 \]
Figure 4.6B. MPG plotted against weight (1,000 lbs.) with the nonlinear model
Colonel Motors
The nonlinear model is a quadratic model
The easiest way to work with this model is to define a new variable
\[ X_2 = (\text{weight})^2 \]
This gives us a linear model that can be solved with multiple regression
\[ \hat{Y} = b_0 + b_1 X_1 + b_2 X_2 \]
Colonel Motors
\[ \hat{Y} = 79.8 - 30.2 X_1 + 3.4 X_2 \]
Program 4.5
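The transformation can be illustrated on the Table 4.6 data. The sketch below creates the new variable \( X_2 = (\text{weight})^2 \) and then fits the two-predictor linear model using closed-form least-squares expressions written in terms of deviations from the means. Those closed-form expressions and the helper names `cross` and `fit_two_predictors` are assumptions for illustration; the slides rely on spreadsheet output (Program 4.5) instead.

```python
# Table 4.6: weight in 1,000 lbs. and miles per gallon
weight = [4.58, 4.66, 4.02, 2.53, 3.09, 3.11,
          3.18, 2.68, 2.65, 1.70, 1.95, 1.92]
mpg = [12, 13, 15, 18, 19, 19, 20, 23, 24, 33, 36, 42]


def mean(values):
    return sum(values) / len(values)


def cross(u, v):
    """Sum of products of deviations from the means of u and v."""
    u_bar, v_bar = mean(u), mean(v)
    return sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))


def fit_two_predictors(x1, x2, y):
    """Least-squares fit of Y-hat = b0 + b1 * X1 + b2 * X2."""
    s11, s22, s12 = cross(x1, x1), cross(x2, x2), cross(x1, x2)
    s1y, s2y = cross(x1, y), cross(x2, y)
    det = s11 * s22 - s12 ** 2
    b1 = (s1y * s22 - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = mean(y) - b1 * mean(x1) - b2 * mean(x2)
    return b0, b1, b2


x1 = weight
x2 = [w ** 2 for w in weight]        # the transformed variable
b0, b1, b2 = fit_two_predictors(x1, x2, mpg)
print(round(b0, 1), round(b1, 1), round(b2, 1))
```

The fitted values should agree closely with the equation on the slide: a negative coefficient on weight and a positive coefficient on weight squared, so that MPG falls with weight at a decreasing rate over the range of the data.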