t - TEST FOR CORRELATION
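Hypotheses
$H_0$: $\rho = 0$ (No Correlation)
$H_1$: $\rho \neq 0$ (Correlation)
Test Statistic
$$t = \frac{r - \rho}{\sqrt{\dfrac{1 - r^2}{n - 2}}}$$
where
$$r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \sum_{i=1}^{n} (Y_i - \bar{Y})^2}}$$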
HYPOTHESIS TESTING
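$$Z = \frac{z - \mu_z}{\sigma_z} = \frac{\sqrt{n - 3}}{2} \ln\left[\frac{(1 + r)(1 - \rho)}{(1 - r)(1 + \rho)}\right] \sim N(0, 1)$$
where
$$z = \frac{1}{2} \ln\left(\frac{1 + r}{1 - r}\right), \qquad \mu_z = \frac{1}{2} \ln\left(\frac{1 + \rho}{1 - \rho}\right), \qquad \sigma_z = \frac{1}{\sqrt{n - 3}}$$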
Null hypothesis: $\rho = 0$ (two variables are not associated)
Alternative hypothesis: $\rho \neq 0$ (two variables are associated)
Level of significance: $\alpha = 0.05$
Test statistic: $Z = z\sqrt{n - 3}$, where
$$z = \frac{1}{2} \ln\left(\frac{1 + r}{1 - r}\right)$$
Decision: if the null hypothesis is rejected, there is a relationship between the two variables.
EXAMPLE
A COEFFICIENT OF CORRELATION BASED ON A SAMPLE OF SIZE 18 WAS COMPUTED TO BE 0.32. CAN WE CONCLUDE, AT SIGNIFICANCE LEVELS OF A) 0.05 AND B) 0.01, THAT THE TWO VARIABLES ARE ASSOCIATED?
Null hypothesis: $\rho = 0$ (two variables are not associated)
Alternative hypothesis: $\rho > 0$ (one-tail test)
Alternative hypothesis: $\rho \neq 0$ (two-tail test)
EXAMPLE
A COEFFICIENT OF CORRELATION BASED ON A SAMPLE OF SIZE 24 WAS COMPUTED TO BE 0.75. CAN WE CONCLUDE, AT SIGNIFICANCE LEVELS OF A) 0.05 AND B) 0.01, THAT THE POPULATION CORRELATION COEFFICIENT DIFFERS FROM 0.60?
Null hypothesis: $\rho = 0.60$
Alternative hypothesis: $\rho > 0.60$ (one-tail test)
Alternative hypothesis: $\rho \neq 0.60$ (two-tail test)
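As a rough check on the two examples, the sketch below applies Fisher's z transformation and compares Z with standard normal critical values. It is not part of the original slides: the function names `fisher_z_statistic` and `critical_value` are illustrative, and taking critical values from `statistics.NormalDist` instead of a printed table is an assumption.

```python
import math
from statistics import NormalDist

def fisher_z_statistic(r, n, rho0=0.0):
    """Z statistic for H0: rho = rho0, using Fisher's z transformation."""
    z = math.atanh(r)        # 0.5 * ln((1 + r) / (1 - r))
    mu_z = math.atanh(rho0)  # the same transformation applied to rho0
    return (z - mu_z) * math.sqrt(n - 3)

def critical_value(alpha, two_tail):
    """Standard normal critical value for a one-tail or two-tail test."""
    tail_area = alpha / 2 if two_tail else alpha
    return NormalDist().inv_cdf(1 - tail_area)

# Example 1: r = 0.32, n = 18, H0: rho = 0
z1 = fisher_z_statistic(0.32, 18)
# Example 2: r = 0.75, n = 24, H0: rho = 0.60
z2 = fisher_z_statistic(0.75, 24, rho0=0.60)

for label, z_stat in (("example 1", z1), ("example 2", z2)):
    for alpha in (0.05, 0.01):
        for two_tail in (False, True):
            observed = abs(z_stat) if two_tail else z_stat
            reject = observed > critical_value(alpha, two_tail)
            kind = "two-tail" if two_tail else "one-tail"
            print(f"{label}: Z = {z_stat:.3f}, alpha = {alpha}, {kind}, reject H0: {reject}")
```

Under these assumptions both Z values come out near 1.28, below the one-tail 5% critical value of about 1.645, so neither null hypothesis would be rejected at either level.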
CONFIDENCE INTERVAL FOR $\rho$
$$z - \frac{z_{\alpha/2}}{\sqrt{n - 3}} < \mu_z < z + \frac{z_{\alpha/2}}{\sqrt{n - 3}}$$
EXAMPLE:
IF r = 0.7 FOR THE MATHEMATICS AND STATISTICS GRADES OF 30 STUDENTS, CONSTRUCT A 95% CONFIDENCE INTERVAL FOR THE POPULATION CORRELATION COEFFICIENT.
r = 0.70, n = 30, and $z_{0.025} = 1.96$
The z that corresponds to r = 0.70 from the table is 0.867.
95% confidence interval for the population correlation coefficient:
$$z - \frac{z_{\alpha/2}}{\sqrt{n - 3}} < \mu_z < z + \frac{z_{\alpha/2}}{\sqrt{n - 3}}$$
$$0.867 - \frac{1.96}{\sqrt{27}} < \mu_z < 0.867 + \frac{1.96}{\sqrt{27}}$$
$$0.490 < \mu_z < 1.244$$
Converting these limits back from z to r gives
$$0.45 < \rho < 0.85$$
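The interval calculation can be sketched in Python as follows. This is an illustration, not the slides' own procedure: it replaces the table lookup with `math.atanh` and `math.tanh` (Fisher's transformation and its inverse), and the function name `fisher_confidence_interval` is hypothetical.

```python
import math
from statistics import NormalDist

def fisher_confidence_interval(r, n, confidence=0.95):
    """Confidence interval for rho based on Fisher's z transformation."""
    z = math.atanh(r)                                   # z corresponding to r
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z_crit / math.sqrt(n - 3)
    # Transform the limits for mu_z back to the correlation scale.
    return math.tanh(z - half_width), math.tanh(z + half_width)

lower, upper = fisher_confidence_interval(0.70, 30)
print(f"{lower:.2f} < rho < {upper:.2f}")
```

The same function can be reused for the exercises by changing `r`, `n`, and `confidence`.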
Construct a 95% confidence interval for the population correlation coefficient when
a) r = 0.72, n = 30
b) r = 0.35, n = 40
c) r = -0.87, n = 35
d) r = 0.16, n = 42
Construct a 99% confidence interval for the population correlation coefficient when
a) r = 0.72, n = 30
b) r = 0.35, n = 40
c) r = -0.87, n = 35
d) r = 0.16, n = 42
STRENGTH VS. SIGNIFICANCE OF THE
CORRELATION:
The significance, given by the P-value, depends on the statistical evidence. When the P-value is small, there is evidence that a correlation exists.
The strength, given by the r value, is meaningful only if it is supported by statistical significance.
$R^2 = 12.70\%$ means that the variables in the model explain about 12.70% of the total variation in the dependent variable, age.
SAMPLE OF OBSERVATIONS FROM VARIOUS r VALUES
[Figure: five scatter plots of Y against X, for r = -1, r = -.6, r = 0, r = .6, and r = 1]
EXAMPLE: PRODUCE STORES
Is there any evidence of a linear relationship between Annual Sales of a store and its Square Footage at the .05 level of significance?
From Excel Printout:
Regression Statistics
Multiple R          0.9705572    (r)
R Square            0.94198129
Adjusted R Square   0.93037754
Standard Error      611.751517
Observations        7
$H_0$: $\rho = 0$ (No association)
$H_1$: $\rho \neq 0$ (Association)
$\alpha = .05$
df = 7 - 2 = 5
Store   Square Feet   Annual Sales ($000)
1       1,726         3,681
2       1,542         3,395
3       2,816         6,653
4       5,555         9,543
5       1,292         3,318
6       2,208         5,563
7       1,313         3,760
EXAMPLE: PRODUCE STORES SOLUTION
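Test Statistic:
$$t = \frac{r - \rho}{\sqrt{\dfrac{1 - r^2}{n - 2}}} = \frac{.9706}{\sqrt{\dfrac{1 - .9420}{5}}} = 9.0099$$
Critical Value(s): $t = \pm 2.5706$ (rejection region of .025 in each tail)
Decision: Reject $H_0$
Conclusion: There is evidence of a linear relationship at the 5% level of significance.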
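The test above can be reproduced from the raw produce-store data with a short Python sketch. The helper names `pearson_r` and `correlation_t` are illustrative, and the critical value 2.5706 is simply copied from the example rather than computed.

```python
import math

# Produce-store data from the example: square feet (X) and annual sales in $000 (Y).
square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
annual_sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]

def pearson_r(x, y):
    """Sample correlation coefficient r."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_t(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r / math.sqrt((1 - r ** 2) / (n - 2))

r = pearson_r(square_feet, annual_sales)
t = correlation_t(r, len(square_feet))
T_CRITICAL = 2.5706  # two-tail critical value for df = 5, alpha = .05 (from the example)
reject_h0 = abs(t) > T_CRITICAL
print(f"r = {r:.4f}, t = {t:.4f}, reject H0: {reject_h0}")
```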
LINEAR REGRESSION EQUATION
Simple Regression Equation (Fitted Regression Line, Predicted Value):
$$\hat{Y}_i = b_0 + b_1 X_i$$
$b_0$ and $b_1$ are obtained by finding the values of $b_0$ and $b_1$ that minimize the sum of the squared residuals (Least Squares Method):
$$\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} e_i^2$$
$b_0$ provides an estimate of $\beta_0$
$b_1$ provides an estimate of $\beta_1$
LEAST SQUARES METHOD
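$$b_0 = \bar{Y} - b_1 \bar{X}$$
$$b_1 = \frac{\sum xy - \dfrac{(\sum x)(\sum y)}{n}}{\sum x^2 - \dfrac{(\sum x)^2}{n}}$$
$$\bar{Y} = \frac{\sum y}{n}, \qquad \bar{X} = \frac{\sum x}{n}$$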
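A minimal Python sketch of these least squares formulas, applied to the produce-store data used in the examples, is shown below. The function name `least_squares` is illustrative and not from the slides.

```python
# Produce-store data from the example: square feet (X) and annual sales in $000 (Y).
square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
annual_sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]

def least_squares(x, y):
    """Return (b0, b1) from the computational least squares formulas."""
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    b0 = sum_y / n - b1 * sum_x / n  # b0 = Y-bar - b1 * X-bar
    return b0, b1

b0, b1 = least_squares(square_feet, annual_sales)
print(f"Y-hat = {b0:.3f} + {b1:.3f} X")
```

The result should agree with the Excel coefficients quoted in the example (intercept about 1636.415, slope about 1.487).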
[Figure: an observed value $Y_i$ plotted against X, together with the population line $\mu_{Y|X} = \beta_0 + \beta_1 X_i$ and the fitted line $\hat{Y}_i = b_0 + b_1 X_i$. The observed value satisfies $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i = b_0 + b_1 X_i + e_i$.]
INTERPRETATION OF THE SLOPE AND INTERCEPT
$\beta_0 = \mu_{Y|X=0}$ is the average value of Y when the value of X is zero.
$\beta_1 = \dfrac{\Delta \mu_{Y|X}}{\Delta X}$ measures the change in the average value of Y as a result of a one-unit change in X.
LINEAR REGRESSION EQUATION: EXAMPLE
You wish to examine the linear dependency of the annual sales of produce stores on their sizes in square footage. Sample data for 7 stores were obtained. Find the equation of the straight line that fits the data best.
Store   Square Feet   Annual Sales ($1000)
1       1,726         3,681
2       1,542         3,395
3       2,816         6,653
4       5,555         9,543
5       1,292         3,318
6       2,208         5,563
7       1,313         3,760
SCATTER DIAGRAM: EXAMPLE
[Figure: Excel scatter plot of Annual Sales ($000) against Square Feet]
SIMPLE LINEAR REGRESSION EQUATION: EXAMPLE
$$\hat{Y}_i = b_0 + b_1 X_i = 1636.415 + 1.487 X_i$$
From Excel Printout:
               Coefficients
Intercept      1636.414726
X Variable 1   1.486633657
GRAPH OF THE SIMPLE LINEAR REGRESSION EQUATION: EXAMPLE
[Figure: scatter plot of Annual Sales ($000) against Square Feet with the fitted regression line]
INTERPRETATION OF RESULTS: EXAMPLE
$$\hat{Y}_i = 1636.415 + 1.487 X_i$$
The slope of 1.487 means that for each increase of one unit in X, we predict the average of Y to increase by an estimated 1.487 units.
Because sales are recorded in thousands of dollars, the equation estimates that for each increase of 1 square foot in the size of the store, the expected annual sales increase by $1,487.
TOPICS
Measures of Variation
Coefficient of Determination
Coefficient of Correlation
MEASURES OF VARIATION:
THE SUM OF SQUARES
SST = SSR + SSE
Total Sample Variability = Explained Variability + Unexplained Variability
These measures are used to examine the ability of the independent variable to predict the dependent variable.
SST = Total Sum of Squares
Measures the variation of the $Y_i$ values around their mean, $\bar{Y}$
SSR = Regression Sum of Squares
Explained variation attributable to the relationship between X and Y, between predicted value and mean value
SSE = Error Sum of Squares
Variation attributable to factors other than the relationship between X and Y, between observed value and predicted value
$$SST = \sum (Y_i - \bar{Y})^2 = \sum Y_i^2 - \frac{(\sum Y_i)^2}{n}$$
$$SSR = \sum (\hat{Y}_i - \bar{Y})^2 = b_0 \sum Y_i + b_1 \sum X_i Y_i - \frac{(\sum Y_i)^2}{n}$$
$$SSE = \sum (Y_i - \hat{Y}_i)^2 = \sum Y_i^2 - b_0 \sum Y_i - b_1 \sum X_i Y_i$$
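As a sketch of how these sums of squares behave on real numbers, the Python below computes SST, SSR, and SSE for the produce-store data from their definitions and confirms that SST = SSR + SSE. Variable names are illustrative; only the data come from the slides.

```python
# Produce-store data from the example: square feet (X) and annual sales in $000 (Y).
square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
annual_sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]

n = len(square_feet)
x_bar = sum(square_feet) / n
y_bar = sum(annual_sales) / n

# Least squares slope and intercept.
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(square_feet, annual_sales))
      / sum((x - x_bar) ** 2 for x in square_feet))
b0 = y_bar - b1 * x_bar
predicted = [b0 + b1 * x for x in square_feet]

sst = sum((y - y_bar) ** 2 for y in annual_sales)                       # total
ssr = sum((y_hat - y_bar) ** 2 for y_hat in predicted)                  # explained
sse = sum((y - y_hat) ** 2 for y, y_hat in zip(annual_sales, predicted))  # unexplained
r_squared = ssr / sst

print(f"SST = {sst:.2f}, SSR = {ssr:.2f}, SSE = {sse:.2f}, r^2 = {r_squared:.4f}")
```

The values can be compared with the SS column of the Excel ANOVA output for the produce stores.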
[Figure: for an observation at $X_i$, the total deviation $Y_i - \bar{Y}$ (SST) is split into the unexplained part $Y_i - \hat{Y}_i$ (SSE) and the explained part $\hat{Y}_i - \bar{Y}$ (SSR).]
MEASURES OF VARIATION
THE SUM OF SQUARES: EXAMPLE
Excel Output for Produce Stores
ANOVA
             df   SS            MS           F            Significance F
Regression   1    30380456.12   30380456.1   81.1790902   0.000281201
Residual     5    1871199.595   374239.919
Total        6    32251655.71
The df column gives the degrees of freedom: regression (explained) df, error (residual) df, and total df. The SS column gives SSR (Regression), SSE (Residual), and SST (Total).
THE COEFFICIENT OF DETERMINATION
Measures the proportion of variation in Y that is
explained by the independent variable X in the
regression model
$$r^2 = \frac{SSR}{SST} = \frac{\text{Regression Sum of Squares}}{\text{Total Sum of Squares}}$$
COEFFICIENTS OF DETERMINATION ($r^2$) AND CORRELATION ($r$)
[Figure: four scatter plots of Y against X, each with the fitted line $\hat{Y}_i = b_0 + b_1 X_i$: $r^2 = 1$, $r = +1$; $r^2 = 1$, $r = -1$; $r^2 = .81$, $r = +0.9$; $r^2 = 0$, $r = 0$]
TOPICS
Standard Error of Estimate
Assumptions of Simple Linear Regression
Model
Residual Analysis
STANDARD ERROR OF ESTIMATE
The standard deviation of the variation of
observations around the regression equation
$$S_{YX} = \sqrt{\frac{SSE}{n - 2}} = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{n - 2}}$$
Finding the Standard Error of Estimate
$$S_{YX} = \sqrt{\frac{\sum Y^2 - b_0 \sum Y - b_1 \sum XY}{n - 2}}$$
X = values of the independent variable
Y = values of the dependent variable
$b_0$ = Y-intercept
$b_1$ = slope of the estimating equation
n = number of data points
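A short Python sketch of this shortcut formula, using the produce-store data from the earlier examples, follows. The variable names are illustrative assumptions; the expected value to compare against is the Standard Error line of the Excel printout (611.751517).

```python
import math

# Produce-store data from the example: square feet (X) and annual sales in $000 (Y).
square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
annual_sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]

n = len(square_feet)
sum_x = sum(square_feet)
sum_y = sum(annual_sales)
sum_xy = sum(x * y for x, y in zip(square_feet, annual_sales))
sum_x2 = sum(x ** 2 for x in square_feet)
sum_y2 = sum(y ** 2 for y in annual_sales)

# Least squares estimates.
b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
b0 = sum_y / n - b1 * sum_x / n

# Shortcut formula for the standard error of estimate.
s_yx = math.sqrt((sum_y2 - b0 * sum_y - b1 * sum_xy) / (n - 2))
print(f"S_YX = {s_yx:.4f}")
```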
INFERENCE ABOUT THE SLOPE:
t - TEST
t Test for a Population Slope
Is there a linear dependency of Y on X ?
Null and Alternative Hypotheses
$H_0$: $\beta_1 = 0$ (No Linear Dependency)
$H_1$: $\beta_1 \neq 0$ (Linear Dependency)
Test Statistic
$$t = \frac{b_1 - \beta_1}{S_{b_1}}, \qquad \text{where } S_{b_1} = \frac{S_{YX}}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2}}$$
d.f. = n - 2
EXAMPLE: PRODUCE STORE
Data for 7 Stores:
Store   Square Feet   Annual Sales ($000)
1       1,726         3,681
2       1,542         3,395
3       2,816         6,653
4       5,555         9,543
5       1,292         3,318
6       2,208         5,563
7       1,313         3,760
Estimated Regression Equation:
$$\hat{Y}_i = 1636.415 + 1.487 X_i$$
The slope of this model is 1.487. Does Square Footage Affect Annual Sales?
INFERENCES ABOUT THE SLOPE:
T TEST EXAMPLE
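$H_0$: $\beta_1 = 0$
$H_1$: $\beta_1 \neq 0$
$\alpha = .05$
df = 7 - 2 = 5
Critical Value(s): $t = \pm 2.5706$ (rejection region of .025 in each tail)
Test Statistic: $t = b_1 / S_{b_1} = 9.0099$ (the square root of F = 81.179 in the ANOVA output)
Decision: Reject $H_0$
Conclusion: There is evidence that square footage affects annual sales.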
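The slope test can be sketched in Python from the raw produce-store data. This is an illustration under the formulas given for $S_{YX}$ and $S_{b_1}$; the variable names are hypothetical and the critical value 2.5706 is copied from the example rather than computed.

```python
import math

# Produce-store data from the example: square feet (X) and annual sales in $000 (Y).
square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
annual_sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]

n = len(square_feet)
x_bar = sum(square_feet) / n
y_bar = sum(annual_sales) / n
sxx = sum((x - x_bar) ** 2 for x in square_feet)

b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(square_feet, annual_sales)) / sxx
b0 = y_bar - b1 * x_bar

sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(square_feet, annual_sales))
s_yx = math.sqrt(sse / (n - 2))   # standard error of estimate
s_b1 = s_yx / math.sqrt(sxx)      # standard error of the slope
t_stat = (b1 - 0) / s_b1          # H0: beta_1 = 0

T_CRITICAL = 2.5706  # two-tail critical value for df = 5, alpha = .05 (from the example)
reject_h0 = abs(t_stat) > T_CRITICAL
print(f"S_b1 = {s_b1:.4f}, t = {t_stat:.4f}, reject H0: {reject_h0}")
```

In simple regression this t statistic equals the t statistic for the correlation coefficient, which is why both come out near 9.01.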
INFERENCES ABOUT THE SLOPE:
F TEST
F Test for a Population Slope
Is there a linear dependency of Y on X ?
Null and Alternative Hypotheses
$H_0$: $\beta_1 = 0$ (No Linear Dependency)
$H_1$: $\beta_1 \neq 0$ (Linear Dependency)
Test Statistic
$$F = \frac{SSR / 1}{SSE / (n - 2)}$$
Numerator d.f. = 1, denominator d.f. = n - 2
INFERENCES ABOUT THE SLOPE:
CONFIDENCE INTERVAL EXAMPLE
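Confidence Interval Estimate of the Slope:
$$b_1 \pm t_{n-2} S_{b_1}$$
Confidence Interval Estimate for the Mean of Y at a Particular $X_i$:
$$\hat{Y}_i \pm t_{n-2} S_{YX} \sqrt{\frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}}$$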
PREDICTION OF INDIVIDUAL VALUES
Prediction Interval for Individual Response $Y_i$ at a Particular $X_i$
$$\hat{Y}_i \pm t_{n-2} S_{YX} \sqrt{1 + \frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}}$$
The addition of 1 under the square root increases the width of the interval from that for the mean of Y.
EXAMPLE: PRODUCE STORES
Data for 7 Stores:
Store   Square Feet   Annual Sales ($000)
1       1,726         3,681
2       1,542         3,395
3       2,816         6,653
4       5,555         9,543
5       1,292         3,318
6       2,208         5,563
7       1,313         3,760
Regression Equation Obtained:
$$\hat{Y}_i = 1636.415 + 1.487 X_i$$
Consider a store with 2000 square feet.
ESTIMATION OF MEAN VALUES: EXAMPLE
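Find the 95% confidence interval for the average annual sales for stores of 2,000 square feet.
Predicted Sales: $\hat{Y}_i = 4610.45$ ($000)
With $\bar{X} = 2350.29$, $S_{YX} = 611.75$, and $t_{n-2} = t_5 = 2.5706$, the Confidence Interval Estimate for $\mu_{Y|X=X_i}$ is
$$\hat{Y}_i \pm t_{n-2} S_{YX} \sqrt{\frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}} = 4610.45 \pm 612.66$$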
PREDICTION INTERVAL FOR Y : EXAMPLE
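Find the 95% prediction interval for annual sales of one particular store of 2,000 square feet.
Predicted Sales: $\hat{Y}_i = 4610.45$ ($000)
With $\bar{X} = 2350.29$, $S_{YX} = 611.75$, and $t_{n-2} = t_5 = 2.5706$, the prediction interval is
$$\hat{Y}_i \pm t_{n-2} S_{YX} \sqrt{1 + \frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}} = 4610.45 \pm 1687.68$$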
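Both intervals for a 2,000 square foot store can be sketched in Python as below. The helper name `interval_half_widths` is illustrative, and the critical value 2.5706 is copied from the example. Because the sketch uses unrounded coefficients, its predicted value (about 4609.68) differs slightly from the 4610.45 quoted on the slide, and the half-widths agree with the slide only to within rounding.

```python
import math

# Produce-store data from the example: square feet (X) and annual sales in $000 (Y).
square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
annual_sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]
T_CRITICAL = 2.5706  # t for df = 5 and 95% confidence, as quoted in the example

n = len(square_feet)
x_bar = sum(square_feet) / n
y_bar = sum(annual_sales) / n
sxx = sum((x - x_bar) ** 2 for x in square_feet)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(square_feet, annual_sales)) / sxx
b0 = y_bar - b1 * x_bar
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(square_feet, annual_sales))
s_yx = math.sqrt(sse / (n - 2))

def interval_half_widths(x0):
    """Half-widths of the interval for the mean of Y and for an individual Y at x0."""
    leverage = 1 / n + (x0 - x_bar) ** 2 / sxx
    mean_half_width = T_CRITICAL * s_yx * math.sqrt(leverage)
    prediction_half_width = T_CRITICAL * s_yx * math.sqrt(1 + leverage)
    return mean_half_width, prediction_half_width

y_hat = b0 + b1 * 2000
mean_hw, pred_hw = interval_half_widths(2000)
print(f"mean of Y:    {y_hat:.2f} +/- {mean_hw:.2f}")
print(f"individual Y: {y_hat:.2f} +/- {pred_hw:.2f}")
```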
MULTIPLE REGRESSION
TOPICS
The Multiple Regression Model
Residual Analysis
Coefficient of Multiple Determination
THE MULTIPLE REGRESSION MODEL
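The relationship between 1 dependent and 2 or more independent variables is a linear function:
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i$$
where $Y_i$ is the dependent (response) variable, $X_{1i}, \ldots, X_{ki}$ are the independent (explanatory) variables, $\beta_0$ is the population Y-intercept, $\beta_1, \ldots, \beta_k$ are the population slopes, and $\varepsilon_i$ is the random error.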
MULTIPLE REGRESSION EQUATION
The coefficients of the multiple regression model are estimated using sample data.
Multiple regression equation with k independent variables:
$$\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + \cdots + b_k X_{ki}$$
where $\hat{Y}_i$ is the estimated (or predicted) value of Y, $b_0$ is the estimated intercept, and $b_1, \ldots, b_k$ are the estimated slope coefficients.
Example with two independent variables:
$$\hat{Y} = b_0 + b_1 X_1 + b_2 X_2$$
[Figure: three-dimensional plot with axes Y, $X_1$, and $X_2$]
INTERPRETATION OF ESTIMATED
COEFFICIENTS
Slope ($b_i$)
The average value of Y is estimated to change by $b_i$ for each 1-unit increase in $X_i$, holding all other variables constant.
Example: If $b_1 = -2$, then fuel oil usage (Y) is expected to decrease by an estimated 2 gallons for each 1-degree increase in temperature ($X_1$), given the inches of insulation ($X_2$).
Y-Intercept ($b_0$)
The estimated average value of Y when all $X_i = 0$.
MULTIPLE REGRESSION MODEL: EXAMPLE
Develop a model for estimating heating oil used for a single family home in the month of January based on average temperature and amount of insulation in inches.
Oil (Gal)   Temp (°F)   Insulation
275.30      40          3
363.80      27          3
164.30      40          10
40.80       73          6
94.30       64          6
230.90      34          6
366.70      9           6
300.60      8           10
237.80      23          10
121.40      63          3
31.40       65          10
203.50      41          6
441.10      21          3
323.00      38          3
52.50       58          10
$$\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + \cdots + b_k X_{ki}$$
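One way to obtain the estimated coefficients for the heating oil data is to solve the normal equations $(X'X)b = X'Y$. The sketch below does this in plain Python with a small Gaussian elimination routine; the slides themselves rely on Excel, so the functions `solve_linear_system` and `fit_multiple_regression` are illustrative assumptions rather than the method used in the original.

```python
# Heating oil data from the example: gallons used, temperature (deg F), insulation (inches).
oil = [275.30, 363.80, 164.30, 40.80, 94.30, 230.90, 366.70, 300.60,
       237.80, 121.40, 31.40, 203.50, 441.10, 323.00, 52.50]
temp = [40, 27, 40, 73, 64, 34, 9, 8, 23, 63, 65, 41, 21, 38, 58]
insulation = [3, 3, 10, 6, 6, 6, 6, 10, 10, 3, 10, 6, 3, 3, 10]

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    size = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * size
    for i in reversed(range(size)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, size))
        x[i] = (m[i][size] - tail) / m[i][i]
    return x

def fit_multiple_regression(y, *xs):
    """Return [b0, b1, ..., bk] from the normal equations (X'X) b = X'y."""
    rows = [[1.0] + [float(x[i]) for x in xs] for i in range(len(y))]
    p = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(p)] for a in range(p)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(p)]
    return solve_linear_system(xtx, xty)

b0, b1, b2 = fit_multiple_regression(oil, temp, insulation)
print(f"Y-hat = {b0:.3f} + ({b1:.3f}) X1 + ({b2:.3f}) X2")
```

The coefficients can be compared with the Excel output shown for the heating oil example (intercept about 562.151, temperature about -5.437, insulation about -20.012).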
STANDARD ERROR OF ESTIMATE FOR MULTIPLE
REGRESSION
The standard error of estimate of the dependent variable Y on the independent variables:
$$s_e = \sqrt{\frac{\sum (Y - \hat{Y})^2}{n - k - 1}}$$
COEFFICIENT OF MULTIPLE DETERMINATION
Proportion of total variation in Y explained by all X variables taken together:
$$r^2_{Y \cdot 12 \cdots k} = \frac{SSR}{SST} = \frac{\text{Explained Variation}}{\text{Total Variation}}$$
Never decreases when a new X variable is added to the model
COEFFICIENT OF MULTIPLE DETERMINATION
Regression Statistics
Multiple R          0.72213
R Square            0.52148
Adjusted R Square   0.44172
Standard Error      47.46341
Observations        15

ANOVA
             df   SS          MS          F         Significance F
Regression   2    29460.027   14730.013   6.53861   0.01201
Residual     12   27033.306   2252.776
Total        14   56493.333

              Coefficients   Standard Error   t Stat     P-value   Lower 95%   Upper 95%
Intercept     306.52619      114.25389        2.68285    0.01993   57.58835    555.46404
Price         -24.97509      10.83213         -2.30565   0.03979   -48.57626   -1.37392
Advertising   74.13096       25.96732         2.85478    0.01449   17.55303    130.70888

$$R^2 = \frac{SSR}{SST} = \frac{29460.0}{56493.3} = .52148$$
52.1% of the variation in pie sales is explained by the variation in price and advertising.
ADJUSTED COEFFICIENT OF MULTIPLE
DETERMINATION
Adding additional variables cannot increase the SSE or decrease the $r^2$. To account for this, the adjusted coefficient of determination is given by
$$r^2_{adj} = 1 - \left[\left(1 - r^2_{Y \cdot 12 \cdots k}\right) \frac{n - 1}{n - k - 1}\right]$$
Proportion of variation in Y explained by all X variables, adjusted for the number of X variables used and the sample size
Penalizes excessive use of independent variables
Smaller than $r^2_{Y \cdot 12 \cdots k}$
Useful in comparing among models having different explanatory variables
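The adjustment is a one-line calculation. The sketch below applies it to the pie sales output (R Square 0.52148 with n = 15 observations and k = 2 independent variables); the function name `adjusted_r_squared` is illustrative.

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted coefficient of multiple determination for n observations and k X variables."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

# Pie sales example: R Square = 0.52148, n = 15, k = 2 (price and advertising).
pie_adjusted = adjusted_r_squared(0.52148, 15, 2)
print(f"adjusted r^2 = {pie_adjusted:.5f}")
```

The result can be checked against the Adjusted R Square line of the Excel output.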
ADJUSTED $R^2$
From the Excel regression output for pie sales (R Square 0.52148, Adjusted R Square 0.44172, Observations 15):
$$r^2_{adj} = .44172$$
44.2% of the variation in pie sales is explained by the variation in price and advertising, taking into account the sample size and number of independent variables.
COEFFICIENT OF MULTIPLE DETERMINATION
Excel Output
Regression Statistics
Multiple R          0.982654757
R Square            0.965610371
Adjusted R Square   0.959878766
Standard Error      26.01378323
Observations        15
$$r^2_{Y \cdot 12} = \frac{SSR}{SST}$$
Adjusted $r^2$ reflects the number of explanatory variables and the sample size, and is smaller than $r^2$.
INTERPRETATION OF COEFFICIENT OF
MULTIPLE DETERMINATION
$$r^2_{Y \cdot 12} = \frac{SSR}{SST} = .9656$$
96.56% of the total variation in heating oil use can be explained by temperature and amount of insulation.
$$r^2_{adj} = .9599$$
95.99% of the total variation in heating oil use can be explained by temperature and amount of insulation after adjusting for the number of explanatory variables and the sample size.
USING THE REGRESSION EQUATION TO
MAKE PREDICTIONS
Predict the amount of heating oil used for a home if the average temperature is 30°F and the insulation is 6 inches.
$$\hat{Y}_i = 562.151 - 5.437 X_{1i} - 20.012 X_{2i} = 562.151 - 5.437(30) - 20.012(6) = 278.97$$
The predicted heating oil used is 278.97 gallons.
RESIDUAL PLOTS: EXAMPLE
[Figure: Insulation Residual Plot, residuals against insulation (0 to 12 inches). No discernible pattern.]
[Figure: Temperature Residual Plot, residuals (-60 to 60) against temperature (0 to 80). Maybe some non-linear relationship.]
TESTING FOR OVERALL SIGNIFICANCE
Shows if there is a Linear Relationship between all of
the X Variables together and Y
Use F test Statistic
Hypotheses:
$H_0$: $\beta_1 = \beta_2 = \cdots = \beta_k = 0$ (No linear relationship)
$H_1$: At least one $\beta_i \neq 0$ (At least one independent variable affects Y)
The Null Hypothesis is a Very Strong Statement
The Null Hypothesis is Almost Always Rejected
Test Statistic:
$$F = \frac{MSR}{MSE} = \frac{SSR(\text{all}) / k}{MSE(\text{all})}$$
where F has k numerator and (n - k - 1) denominator degrees of freedom
TEST FOR OVERALL SIGNIFICANCE
EXCEL OUTPUT: EXAMPLE
ANOVA
             df   SS         MS         F          Significance F
Regression   2    228014.6   114007.3   168.4712   1.65411E-09
Residual     12   8120.603   676.7169
Total        14   236135.2
Regression df = k = 2, the number of explanatory variables. Total df = n - 1 = 14.
Test Statistic: $F = \dfrac{MSR}{MSE} = 168.4712$. The p-value is given under Significance F.
TEST FOR OVERALL SIGNIFICANCE
EXAMPLE SOLUTION
$H_0$: $\beta_1 = \beta_2 = \cdots = \beta_k = 0$
$H_1$: At least one $\beta_i \neq 0$
$\alpha = .05$
df = 2 and 12
Critical Value: F = 3.89 (rejection region of $\alpha = 0.05$ in the upper tail)
Test Statistic: F = 168.47 (Excel Output)
Decision: Reject at $\alpha = 0.05$
Conclusion: There is evidence that at least one independent variable affects Y.
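The F statistic in this solution can be recomputed from the sums of squares in the ANOVA table. The sketch below is illustrative: the function name `overall_f_statistic` is an assumption, and the critical value 3.89 is copied from the example rather than computed.

```python
def overall_f_statistic(ssr, sse, n, k):
    """F = MSR / MSE with k and n - k - 1 degrees of freedom."""
    msr = ssr / k
    mse = sse / (n - k - 1)
    return msr / mse

# Heating oil ANOVA output: SSR = 228014.6, SSE = 8120.603, n = 15, k = 2.
f_stat = overall_f_statistic(228014.6, 8120.603, n=15, k=2)
F_CRITICAL = 3.89  # upper-tail critical value for df = 2 and 12 at alpha = .05 (from the example)
reject_h0 = f_stat > F_CRITICAL
print(f"F = {f_stat:.2f}, reject H0: {reject_h0}")
```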
TEST FOR SIGNIFICANCE:
INDIVIDUAL VARIABLES
Shows if there is a linear relationship between the variable $X_i$ and Y
Use t Test Statistic
Hypotheses:
$H_0$: $\beta_i = 0$ (No linear relationship)
$H_1$: $\beta_i \neq 0$ (Linear relationship between $X_i$ and Y)
t TEST STATISTIC OUTPUT: EXAMPLE
               Coefficients    Standard Error   t Stat
Intercept      562.1510092     21.09310433      26.65093769
X Variable 1   -5.436580588    0.336216167      -16.16989642
X Variable 2   -20.01232067    2.342505227      -8.543127434
t Test Statistic for $X_1$ (Temperature): -16.16989642
t Test Statistic for $X_2$ (Insulation): -8.543127434
$$t = \frac{b_i}{S_{b_i}}$$
T TEST : EXAMPLE SOLUTION
Does temperature have a significant effect on monthly consumption of heating oil? Test at $\alpha = 0.05$.
$H_0$: $\beta_1 = 0$
$H_1$: $\beta_1 \neq 0$
df = 12
Critical Values: $t = \pm 2.1788$ (rejection region of .025 in each tail)
Test Statistic: t = -16.1699
Decision: Reject $H_0$ at $\alpha = 0.05$
Conclusion: There is evidence of a significant effect of temperature on oil consumption.
CONFIDENCE INTERVAL ESTIMATE
FOR THE SLOPE
Confidence interval for the population slope $\beta_i$:
$$b_i \pm t_{n-k-1} S_{b_i}$$
Example: Form a 95% confidence interval for the effect of changes in price ($X_1$) on pie sales, holding constant the effects of advertising:
$-24.975 \pm (2.1788)(10.832)$, so the interval is (-48.576, -1.374).
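The arithmetic of this interval can be sketched in a few lines of Python, using the Price coefficient and standard error from the pie sales output. The function name `slope_confidence_interval` is illustrative, and the critical value 2.1788 (df = 12) is copied from the example.

```python
def slope_confidence_interval(b, s_b, t_critical):
    """Confidence interval b +/- t * S_b for a population slope."""
    half_width = t_critical * s_b
    return b - half_width, b + half_width

# Pie sales output: b1 = -24.97509, S_b1 = 10.83213, t for df = 12 at 95% is 2.1788.
lower, upper = slope_confidence_interval(-24.97509, 10.83213, 2.1788)
print(f"({lower:.3f}, {upper:.3f})")
```

Because the interval does not contain zero, it is consistent with rejecting $H_0$: $\beta_1 = 0$ at the 5% level.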
DURBIN-WATSON TEST FOR AR(1) AUTOCORRELATION
$$d = \frac{\sum_{t=2}^{T} (e_t - e_{t-1})^2}{\sum_{t=1}^{T} e_t^2}$$
In large samples: $d \to 2 - 2\rho$
It can be shown that in large samples d tends to $2 - 2\rho$, where $\rho$ is the parameter in the AR(1) relationship $u_t = \rho u_{t-1} + \varepsilon_t$.
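A minimal Python sketch of the d statistic follows. The function name `durbin_watson` and the two toy residual series are made up for illustration; they are not data from the slides.

```python
def durbin_watson(residuals):
    """Durbin-Watson d: sum of squared successive differences over sum of squares."""
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2
                    for t in range(1, len(residuals)))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

# Hypothetical residuals that stay on one side for long runs (positive autocorrelation).
persistent = [1.0, 1.0, 1.0, 1.0, -1.0, -1.0, -1.0, -1.0]
# Hypothetical residuals that flip sign every period (negative autocorrelation).
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]

d_persistent = durbin_watson(persistent)
d_alternating = durbin_watson(alternating)
print(d_persistent, d_alternating)
```

The persistent series gives a d well below 2 and the alternating series a d well above 2, in line with $d \to 2 - 2\rho$.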
No autocorrelation: $d \to 2$
If there is no autocorrelation, $\rho$ is 0 and d should be distributed randomly around 2.
Severe positive autocorrelation: $d \to 0$
If there is severe positive autocorrelation, $\rho$ will be near 1 and d will be near 0.
Severe negative autocorrelation: $d \to 4$
Likewise, if there is severe negative autocorrelation, $\rho$ will be near -1 and d will be near 4.
[Figure: number line for d running from 0 to 4. Values near 0 indicate positive autocorrelation, values near 2 no autocorrelation, and values near 4 negative autocorrelation.]
Thus d behaves as illustrated graphically in the figure.
To perform the Durbin-Watson test, we define critical values of d, one on each side of 2. The null hypothesis is $H_0$: $\rho = 0$ (no autocorrelation). If d lies between these values, we do not reject the null hypothesis.
The critical values, at any significance level, depend on the number of observations in the sample and the number of explanatory variables.
Unfortunately, they also depend on the actual data for the explanatory variables in the sample, and thus vary from sample to sample.
However, Durbin and Watson determined upper and lower bounds, $d_U$ and $d_L$, for the critical values, and these are presented in standard tables.
If d is less than $d_L$, it must also be less than the critical value of d for positive autocorrelation, and so we would reject the null hypothesis and conclude that there is positive autocorrelation.
If d is above $d_U$, it must also be above the critical value of d, and so we would not reject the null hypothesis. (Of course, if it were above 2, we should consider testing for negative autocorrelation instead.)
If d lies between $d_L$ and $d_U$, we cannot tell whether it is above or below the critical value and so the test is indeterminate.
Here are $d_L$ and $d_U$ for 45 observations and two explanatory variables, at the 5% significance level: $d_L = 1.43$ and $d_U = 1.62$ (n = 45, k = 3, 5% level, where k counts the intercept along with the two explanatory variables).
There are similar bounds for the critical value in the case of negative autocorrelation. They are not given in the standard tables because negative autocorrelation is uncommon, but it is easy to calculate them because they are located symmetrically to the right of 2: $4 - d_U = 2.38$ and $4 - d_L = 2.57$ (n = 45, k = 3, 5% level).
So if d < 1.43, we reject the null hypothesis and conclude that there is positive autocorrelation.
If 1.43 < d < 1.62, the test is indeterminate and we do not come to any conclusion.
If 1.62 < d < 2.38, we do not reject the null hypothesis of no autocorrelation.
If 2.38 < d < 2.57, we do not come to any conclusion.
If d > 2.57, we conclude that there is significant negative autocorrelation.
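The five decision regions can be collected into one small function. This is a sketch only: the name `durbin_watson_decision` and its return strings are hypothetical, and how the exact boundary values are assigned is an arbitrary choice here. The bounds 1.43 and 1.62 are the 5% values quoted for n = 45, k = 3.

```python
def durbin_watson_decision(d, d_lower, d_upper):
    """Classify d using the bounds d_L and d_U and their mirror images around 2."""
    if d < d_lower:
        return "positive autocorrelation"
    if d < d_upper:
        return "indeterminate"
    if d <= 4 - d_upper:
        return "no autocorrelation"
    if d <= 4 - d_lower:
        return "indeterminate"
    return "negative autocorrelation"

# Illustrative d values checked against the 5% bounds for n = 45, k = 3.
for d in (1.0, 1.5, 2.0, 2.5, 3.0):
    print(d, durbin_watson_decision(d, 1.43, 1.62))
```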
Here are the bounds for the critical values for the 1% test, again with 45 observations and two explanatory variables: $d_L = 1.24$, $d_U = 1.42$, $4 - d_U = 2.58$, and $4 - d_L = 2.76$ (n = 45, k = 3, 1% level).
Here is a plot of the residuals from a logarithmic regression of expenditure on housing services on income and the relative price of housing services. The residuals exhibit strong positive autocorrelation.
[Figure: residuals plotted by year from 1959 to 2003, ranging between -0.04 and 0.04]
============================================================
Dependent Variable: LGHOUS
Method: Least Squares
Sample: 1959 2003
Included observations: 45
============================================================
Variable Coefficient Std. Error t-Statistic Prob.
============================================================
C 0.005625 0.167903 0.033501 0.9734
LGDPI 1.031918 0.006649 155.1976 0.0000
LGPRHOUS -0.483421 0.041780 -11.57056 0.0000
============================================================
R-squared 0.998583 Mean dependent var 6.359334
Adjusted R-squared 0.998515 S.D. dependent var 0.437527
S.E. of regression 0.016859 Akaike info criter-5.263574
Sum squared resid 0.011937 Schwarz criterion -5.143130
Log likelihood 121.4304 F-statistic 14797.05
Durbin-Watson stat 0.633113 Prob(F-statistic) 0.000000
============================================================
The d statistic is very low, below $d_L$ for the 1% significance test (1.24), so we would reject the null hypothesis of no autocorrelation. For reference, $d_L = 1.24$ and $d_U = 1.42$ (n = 45, k = 3, 1% level).