
Econometrics

OLS Regression with Dummy Variables


Interpretation of Regression Coefficients with a Binary Regressor

Female is a dummy variable, which indicates whether a person is female:

    Female_i = 1 if female, 0 if male

Consider a regression of wage on a constant and Female:

    Wage_i = β0 + δ0 Female_i + u_i

For a male (Female_i = 0), the regression model is:

    Wage_i = β0 + u_i

so β0 is the average wage for male workers.

For a female (Female_i = 1), the regression model is:

    Wage_i = β0 + δ0 + u_i

so β0 + δ0 is the average wage for female workers, and δ0 is the difference in average wages between female and male workers (how much more females earn relative to males).
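As a quick check of this interpretation, here is a minimal Stata sketch (assuming a dataset in memory with variables wage and female; the names are illustrative) comparing the regression estimates to the group means:

    regress wage female, robust
    * _cons estimates beta0, the male average wage
    * the coefficient on female estimates delta0, the female-male gap
    tabstat wage, by(female) statistics(mean)   // the group means should match
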
Regression for Females

-> sex = 0

Linear regression                         Number of obs =     12
                                          F(1, 10)      =   2.21
                                          Prob > F      = 0.1683
                                          R-squared     = 0.2868
                                          Root MSE      = .38277

         |             Robust
     gpa |   Coef.   Std. Err.     t    P>|t|    [95% Conf. Interval]
---------+------------------------------------------------------------
   hsgpa |    .869        .585   1.49   0.168       -.435       2.173
   _cons |    .244        2.26   0.11   0.916      -4.785       5.273

Regression for Males

-> sex = 1

Linear regression                         Number of obs =     22
                                          F(1, 20)      =  15.29
                                          Prob > F      = 0.0009
                                          R-squared     = 0.2236
                                          Root MSE      = .35021

         |             Robust
     gpa |   Coef.   Std. Err.     t    P>|t|    [95% Conf. Interval]
---------+------------------------------------------------------------
   hsgpa |    .710        .182   3.91   0.001        .331       1.089
   _cons |    .691        .711   0.97   0.342       -.791       2.173

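The two sets of estimates above are the kind of output Stata's by prefix produces (the "-> sex = 0" line is its group marker); a sketch, assuming variables gpa, hsgpa, and sex:

    bysort sex: regress gpa hsgpa, robust
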
GPA Example: Regression with Dummy Variables

regress gpa sex, robust

Linear regression                         Number of obs =     34
                                          F(1, 32)      =   1.41
                                          Prob > F      = 0.2440
                                          R-squared     = 0.0443
                                          Root MSE      = .40364

         |             Robust
     gpa |    Coef.   Std. Err.      t    P>|t|    [95% Conf. Interval]
---------+------------------------------------------------------------
     sex |   -.1764      .1486    -1.19   0.244      -.4792       .1264
   _cons |    3.540      .1231    28.75   0.000      3.2889      3.7904

Estimated equation (standard errors in parentheses):

    GPA-hat = 3.540 - 0.1764 SEX,   R² = 0.04
              (0.12)  (0.15)

Interpretation of Regression Coefficients with a Binary Regressor

Consider a regression of wage on a constant, Female, and education:

    Wage_i = β0 + δ0 Female_i + β1 Educ_i + u_i

For a male, the regression model is:

    Wage_i = β0 + β1 Educ_i + u_i

so β0 is the intercept for male workers.

For a female, the regression model is:

    Wage_i = (β0 + δ0) + β1 Educ_i + u_i

so β0 + δ0 is the intercept for female workers, and δ0 is the shift in the intercept.

Example: Wage Discrimination
Consider a regression model:

    Wage_i = β0 + δ0 Female_i + β1 Educ_i + β2 Exper_i + u_i

Estimated equation (standard errors in parentheses):

    Wage-hat = -1.57 - 1.81 Female + 0.572 Educ + 0.025 Exper
               (0.72)  (0.26)        (0.049)      (0.012)

δ0-hat = -1.81: holding education and experience fixed, women are predicted to earn $1.81 less than men.

In a simple regression of wage on Female alone:

    Wage-hat = 7.10 - 2.51 Female
               (0.21)  (0.30)

Using Dummy Variables for Multiple Categories
4 groups: married men (MM), married women (MF), single men (SM), and single women (SF).

Regression model (estimated, standard errors in parentheses):

    Log(Wage)-hat = 0.321 + 0.213 MM - 0.198 MF - 0.110 SF + 0.079 Educ + 0.027 Exper
                    (0.100) (0.055)    (0.058)    (0.056)    (0.007)      (0.005)

Which one is the base category?
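A hedged sketch of how such group dummies might be constructed in Stata, assuming underlying indicators female and married (variable names are illustrative):

    gen mm = (married == 1) & (female == 0)
    gen mf = (married == 1) & (female == 1)
    gen sf = (married == 0) & (female == 1)
    * single men (SM) get no dummy, so they are the omitted base category
    regress lwage mm mf sf educ exper, robust
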
Example: Effects of Physical Attractiveness on Wage
3 groups: below average (BA), above average (AA), and average (A).

Regression model for men (estimated, standard errors in parentheses):

    Log(Wage)-hat = 0.321 - 0.164 BA + 0.016 AA + other factors
                    (0.100) (0.046)    (0.033)

Which one is the base category?

Regression model for women:

    Log(Wage)-hat = 0.200 - 0.124 BA + 0.035 AA + other factors
                    (0.100) (0.066)    (0.049)

Outline

Last Time: What is a dummy variable?
- How to interpret coefficients in a regression with dummy variable(s)?
- Can we show the coefficient on a dummy variable on a graph?

Today: Interaction terms and heteroskedasticity
- Why do we need interaction terms?
- 3 types of interaction terms
- What are the consequences of and the solution for heteroskedasticity?



Interaction Terms Involving a Dummy Variable

Consider a regression model (estimated, standard errors in parentheses):

    Log(Wage)-hat = 0.321 - 0.110 Female + 0.213 Married - 0.301 Female×Married + ...
                    (0.100) (0.056)        (0.055)         (0.072)
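A sketch of two equivalent ways to include such an interaction in Stata (variable names assumed):

    * by hand:
    gen femXmar = female*married
    regress lwage female married femXmar, robust

    * or with factor-variable notation, which expands to the same regressors:
    regress lwage i.female##i.married, robust
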
Interactions Between Independent Variables: Test Score Example

- Perhaps the effect of class size reduction is bigger in districts where many students are still learning English,
- i.e., smaller classes help more if there are many English learners, who need individual attention.
- That is, ΔTestScore/ΔSTR might depend on PctEL.
- More generally, ΔY/ΔX1 might depend on X2.
- How do we model such interactions between X1 and X2?
- We first consider binary Xs, then continuous Xs.

(a) Interactions Between 2 Binary Variables

    Y_i = β0 + β1 D1_i + β2 D2_i + u_i

- D1_i, D2_i are binary.
- β1 is the effect of changing D1 = 0 to D1 = 1. In this specification, this effect doesn't depend on the value of D2.
- To allow the effect of changing D1 to depend on D2, include the interaction term D1_i×D2_i as a regressor:

    Y_i = β0 + β1 D1_i + β2 D2_i + β3 (D1_i×D2_i) + u_i

Interpreting the Coefficients

    Y_i = β0 + β1 D1_i + β2 D2_i + β3 (D1_i×D2_i) + u_i

General rule: compare the various cases:

    E(Y_i | D1_i = 0) = β0 + β2 D2                      (1)
    E(Y_i | D1_i = 1) = β0 + β1 + β2 D2 + β3 D2         (2)

Subtract (1) from (2):

    E(Y_i | D1_i = 1) - E(Y_i | D1_i = 0) = β1 + β3 D2

- The effect of a change in D1 depends on D2 (what we wanted).
- β3 is the difference in the effect of a change in D1 on Y between those who have D2 = 1 and those who have D2 = 0.
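After estimating the interacted model, each case-specific effect can be read off directly with Stata's lincom command; a sketch with hypothetical binary variables d1 and d2:

    regress y i.d1##i.d2, robust
    lincom 1.d1                  // effect of d1 when d2 = 0 (beta1)
    lincom 1.d1 + 1.d1#1.d2      // effect of d1 when d2 = 1 (beta1 + beta3)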

Example: ln(Wage) vs. gender and completion of a college degree

    Y_i = β0 + βF DF_i + βC DC_i + u_i

βF is the effect of being female on wages; βC is the effect of a college education on wages. This regression does not allow the effect of obtaining a college degree to depend on gender.

    Y_i = β0 + βF DF_i + βC DC_i + βFC (DF_i×DC_i) + u_i

If βFC is statistically different from zero, then the effect of education on earnings is gender specific: βFC shows by how much the wage differential between those with a college degree and those without is larger for females relative to males.

Example: TestScore, STR, English learners

Let

    HiSTR = 1 if STR ≥ 20, 0 if STR < 20
    HiEL  = 1 if PctEL ≥ 10, 0 if PctEL < 10

Estimated equation (standard errors in parentheses):

    TestScore-hat = 664.1 - 18.2 HiEL - 1.9 HiSTR - 3.5 (HiSTR×HiEL)
                    (1.4)   (2.3)       (1.9)       (3.1)

- Effect of HiSTR when HiEL = 0 is -1.9.
- Effect of HiSTR when HiEL = 1 is -1.9 - 3.5 = -5.4.
- Class size reduction is estimated to have a bigger effect when the percent of English learners is large.
- This interaction isn't statistically significant: t = -3.5/3.1 ≈ -1.13.
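A sketch of how these binaries and their interaction could be created and estimated in Stata (assuming variables str and pctel; the names are illustrative):

    gen histr = (str >= 20)
    gen hiel  = (pctel >= 10)
    regress testscr i.histr##i.hiel, robust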

(b) Interaction Between Continuous and Binary Variables

    Y_i = β0 + β1 D_i + β2 X_i + u_i

- D_i is binary, X is continuous.
- The effect of X on Y (holding constant D) is β2, which does not depend on D.
- To allow the effect of X to depend on D, include the interaction term D_i×X_i as a regressor:

    Y_i = β0 + β1 D_i + β2 X_i + β3 (D_i×X_i) + u_i

(b) Interaction Between Continuous and Binary Variables: 2 Regression Lines

    Y_i = β0 + β1 D_i + β2 X_i + β3 (D_i×X_i) + u_i

- For observations with D_i = 0 (the D = 0 group):

    Y_i = β0 + β2 X_i + u_i                          (the D = 0 regression line)

- For observations with D_i = 1 (the D = 1 group):

    Y_i = β0 + β1 + β2 X_i + β3 X_i + u_i
        = (β0 + β1) + (β2 + β3) X_i + u_i            (the D = 1 regression line)

- The 2 regression lines have both different intercepts and different slopes.

Interaction Between Continuous and Binary Variables: 2 Regression Lines

(figure: the D = 0 and D = 1 regression lines, with different intercepts and different slopes)

Interpreting the Coefficients

    Y_i = β0 + β1 D_i + β2 X_i + β3 (D_i×X_i) + u_i

General rule: compare the various cases:

    Y = β0 + β1 D + β2 X + β3 (D×X)                        (1)

Now change X:

    Y + ΔY = β0 + β1 D + β2 (X + ΔX) + β3 [D×(X + ΔX)]     (2)

Subtract (1) from (2):

    ΔY = β2 ΔX + β3 D ΔX,  so  ΔY/ΔX = β2 + β3 D

- The effect of X depends on D (what we wanted).
- β3 is the increment to the effect of X when D = 1.

Example: TestScore, STR, HiEL (= 1 if PctEL ≥ 10)

    TestScore-hat = 682.2 - 0.97 STR + 5.6 HiEL - 1.28 (STR×HiEL)
                    (11.9)  (0.59)     (19.5)     (0.97)

- When HiEL = 0:  TestScore-hat = 682.2 - 0.97 STR
- When HiEL = 1:  TestScore-hat = 682.2 - 0.97 STR + 5.6 - 1.28 STR
                                = 687.8 - 2.25 STR
- Two regression lines: one for each HiEL group.
- Class size reduction is estimated to have a larger effect when the percent of English learners is large.

Example: Testing Hypotheses

    TestScore-hat = 682.2 - 0.97 STR + 5.6 HiEL - 1.28 (STR×HiEL)
                    (11.9)  (0.59)     (19.5)     (0.97)

- The two regression lines have the same slope ⇔ the coefficient on STR×HiEL is zero: t = -1.28/0.97 = -1.32.
- The two regression lines have the same intercept ⇔ the coefficient on HiEL is zero: t = 5.6/19.5 = 0.29.
- The two regression lines are the same ⇔ the population coefficients on both HiEL and STR×HiEL are zero: F = 89.94 (p-value < .001)!
- We reject the joint hypothesis but neither individual hypothesis (how can this be?).
- The regressors are highly correlated ⇒ large standard errors on individual coefficients.
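These t- and F-tests can be reproduced with Stata's test command after the regression; a sketch, assuming the interaction is generated as strXhiel:

    gen strXhiel = str*hiel
    regress testscr str hiel strXhiel, robust
    test strXhiel          // same slope?
    test hiel              // same intercept?
    test hiel strXhiel     // same line? (the joint F-test)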

(c) Interaction Between 2 Continuous Variables

    Y_i = β0 + β1 X1_i + β2 X2_i + u_i

- X1, X2 are continuous.
- As specified, the effect of X1 doesn't depend on X2.
- As specified, the effect of X2 doesn't depend on X1.
- To allow the effect of X1 to depend on X2, include the interaction term X1_i×X2_i as a regressor:

    Y_i = β0 + β1 X1_i + β2 X2_i + β3 (X1_i×X2_i) + u_i

Interpreting the Coefficients

    Y_i = β0 + β1 X1_i + β2 X2_i + β3 (X1_i×X2_i) + u_i

General rule: compare the various cases:

    Y = β0 + β1 X1 + β2 X2 + β3 (X1×X2)                          (1)

Now change X1:

    Y + ΔY = β0 + β1 (X1 + ΔX1) + β2 X2 + β3 [(X1 + ΔX1)×X2]     (2)

Subtract (1) from (2):

    ΔY = β1 ΔX1 + β3 X2 ΔX1,  or  ΔY/ΔX1 = β1 + β3 X2

- The effect of X1 depends on X2 (what we wanted).
- β3 is the increment to the effect of X1 from a unit change in X2.

Example: TestScore, STR, PctEL

    TestScore-hat = 686.3 - 1.12 STR - 0.67 PctEL + .0012 (STR×PctEL)
                    (11.8)  (0.59)     (0.37)       (0.019)

The estimated effect of class size reduction is nonlinear because the size of the effect itself depends on PctEL:

    ΔTestScore/ΔSTR = -1.12 + .0012 PctEL

    PctEL     ΔTestScore/ΔSTR
      0            -1.12
     20%     -1.12 + .0012×20 = -1.10
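With factor-variable notation, Stata's margins command evaluates ΔTestScore/ΔSTR at chosen values of PctEL directly; a sketch (variable names assumed):

    regress testscr c.str##c.pctel, robust
    margins, dydx(str) at(pctel = (0 20))   // effect of STR at PctEL = 0 and PctEL = 20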

Example: Hypothesis Tests

    TestScore-hat = 686.3 - 1.12 STR - 0.67 PctEL + .0012 (STR×PctEL)
                    (11.8)  (0.59)     (0.37)       (0.019)

- Does the population coefficient on STR×PctEL = 0?
  t = .0012/.019 = .06 ⇒ can't reject the null at the 5% level.
- Does the population coefficient on STR = 0?
  t = -1.12/0.59 = -1.90 ⇒ can't reject the null at the 5% level.
- Are the coefficients on both STR and STR×PctEL = 0?
  F = 3.89 (p-value = .021) ⇒ reject the null at the 5% level (!)
- (Why? High but imperfect multicollinearity.)

Heteroskedasticity and Homoskedasticity
- What?
- Consequences of homoskedasticity
- Implication for computing standard errors

Homoskedasticity

If var(u|X_i) is constant, that is, if the variance of the conditional distribution of u given X does not depend on X, then u is said to be homoskedastic. Otherwise, u is heteroskedastic.

Example: Earnings of male and female college graduates

    Earnings_i = β0 + β1 Male_i + u_i

Homoskedasticity requires that Var(u_i) does not depend on Male_i.

    For women:  Earnings_i = β0 + u_i
    For men:    Earnings_i = β0 + β1 + u_i

Homoskedasticity: the variance of earnings is the same for men and for women.

- Equal group variances = homoskedasticity
- Unequal group variances = heteroskedasticity
Homoskedasticity in a picture:

- E(u|X = x) = 0 (u satisfies Least Squares Assumption #1)
- The variance of u does not depend on x: u is homoskedastic.

Heteroskedasticity in a picture:

- E(u|X = x) = 0 (u satisfies Least Squares Assumption #1)
- The variance of u does depend on x: u is heteroskedastic.

A real-data example from labor economics: average
hourly earnings vs. years of education (Data source:
Current Population Survey)
Heteroskedastic or homoskedastic?
So far we have (without saying so) assumed that
u might be heteroskedastic.
Heteroskedasticity and homoskedasticity concern var(u|X=x).
Because we have not explicitly assumed homoskedastic errors,
we have implicitly allowed for heteroskedasticity.

The OLS estimators remain unbiased, consistent and
asymptotically Normal even when the errors are heteroskedastic.

What if the errors are in fact homoskedastic?

- If Assumptions 1-4 hold and the errors are homoskedastic, the OLS estimators are efficient (have the lowest variance) among all linear estimators (Gauss-Markov theorem).
- The formula for the variance of β1-hat and the OLS standard error simplifies: if var(u_i | X_i = x) = σ²_u, then

      var(β1-hat) = σ²_u / Σ_i (X_i - X̄)²

  and SE(β1-hat) is the square root of the estimated var(β1-hat).

- Note: var(β1-hat) is inversely proportional to var(X): more spread in X means more information about β1-hat.

We now have two formulas for the standard error of β1-hat:

- Homoskedasticity-only standard errors, which are valid only if the errors are homoskedastic.
- Heteroskedasticity-robust standard errors, which are valid whether or not the errors are heteroskedastic.
- The main advantage of the homoskedasticity-only standard errors is that the formula is simpler; the disadvantage is that the formula is correct only if the errors are homoskedastic.

Practical implications

- The homoskedasticity-only formula for the standard error of β1-hat and the heteroskedasticity-robust formula differ, so in general you get different standard errors using the different formulas.
- Homoskedasticity-only standard errors are the default setting in regression software, and sometimes the only setting (e.g., Excel). To get heteroskedasticity-robust standard errors, you must override the default.
- If you don't override the default and there is in fact heteroskedasticity, your standard errors (and hence your t-statistics and confidence intervals) will be wrong: typically, homoskedasticity-only SEs are too small.
Consequences of Heteroskedasticity

- Homoskedasticity-only standard errors (what Stata reports by default) are valid only if the errors are homoskedastic.
- Heteroskedasticity-robust standard errors (what Stata reports when we add the robust option) are valid whether or not the errors are heteroskedastic.

Heteroskedasticity-Robust Standard Errors in Stata

regress testscr str, robust

Regression with robust standard errors         Number of obs =    420
                                               F(1, 418)     =  19.26
                                               Prob > F      = 0.0000
                                               R-squared     = 0.0512
                                               Root MSE      = 18.581

         |             Robust
 testscr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
---------+-----------------------------------------------------------------
     str |  -2.279808   .5194892    -4.39   0.000    -3.300945   -1.258671
   _cons |    698.933   10.36436    67.44   0.000     678.5602    719.3057

- If you use the ", robust" option, Stata computes heteroskedasticity-robust standard errors.
- Otherwise, Stata computes homoskedasticity-only standard errors.
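A minimal sketch comparing the two kinds of standard errors side by side (assuming the test score data are in memory):

    regress testscr str              // homoskedasticity-only SEs
    estimates store ols
    regress testscr str, robust      // heteroskedasticity-robust SEs
    estimates store rob
    estimates table ols rob, se      // same coefficients, different SEs
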
The bottom line:

- Heteroskedasticity-robust standard errors are correct whether the errors are heteroskedastic or homoskedastic.
- If the errors are heteroskedastic and you use the homoskedasticity-only standard errors, your standard errors will be wrong.
- So, you should always use heteroskedasticity-robust standard errors.
Evaluating the Results of Regression Analysis

Testing for Heteroskedasticity

1. Visual evidence: does u-hat exhibit any systematic pattern?

    regress y x1 x2 x3
    predict uhat, residuals     // Stata saves the residuals from the estimated model in the variable uhat
    gen uhatsq = uhat*uhat
    scatter uhatsq x1
    scatter uhatsq x2
    scatter uhatsq x3
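A built-in shortcut for the same visual check is the residual-versus-fitted plot, which works after any regress command:

    regress y x1 x2 x3
    rvfplot, yline(0)     // a fan or funnel shape suggests heteroskedasticity
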
Evaluating the Results of Regression Analysis

2. White Test for Heteroskedasticity

- Regress the squared residuals from the OLS regression on the independent variables in the regression, their squares, and their interaction terms.
- Calculate the R-squared from this auxiliary regression.
- n × R-squared ~ χ²(q), where q is the number of regressors in the auxiliary regression (excluding the constant).

    regress y x1 x2 x3
    imtest, white

Stata provides the p-value for the H0 of homoskedasticity (a low p-value provides evidence for rejecting the null hypothesis).
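For intuition, here is a hedged sketch of the White test done by hand with two regressors (so the auxiliary regression has q = 5 regressors: x1, x2, their squares, and their cross-product):

    regress y x1 x2
    predict uhat, residuals
    gen uhatsq = uhat^2
    regress uhatsq c.x1##c.x2 c.x1#c.x1 c.x2#c.x2
    display "n*R2 = " e(N)*e(r2) ",  p-value = " chi2tail(5, e(N)*e(r2))
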
Evaluating the Results of Regression Analysis

Testing for Normality of the Error Terms

1. Visual evidence:

    regress y x1 x2 x3
    predict uhat, residuals
    histogram uhat, normal     // Stata builds the histogram of the residuals and overlays a normal density function on the same graph

Evaluating the Results of Regression Analysis

(figure: histogram of the residuals with an overlaid normal density; x-axis: Residuals, from about -1000 to 2000; y-axis: Density)
Evaluating the Results of Regression Analysis

2. Jarque-Bera Test for Normality

H0: the error terms are normal.

Test statistic:

    JB = (n/6) × [S² + (K - 3)²/4],

where S is the skewness and K is the kurtosis of the residuals; under H0, JB is distributed χ²(2).

The 5% critical value is 5.99; if JB > 5.99, reject the null of normality.

In Stata, there is no command to calculate this test statistic directly:

    summarize uhat, detail

then calculate the JB test statistic manually.
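A minimal sketch of the manual calculation, using the results that summarize, detail leaves behind in r():

    quietly summarize uhat, detail
    display "JB = " (r(N)/6)*(r(skewness)^2 + (r(kurtosis)-3)^2/4)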


summarize uhat, detail

                          Residuals
-------------------------------------------------------------
      Percentiles      Smallest
 1%     -661.5541     -835.7715
 5%     -507.4577     -799.6147
10%     -416.4949     -723.7985      Obs            935
25%     -249.1033     -721.2892      Sum of Wgt.    935

50%     -42.96934                    Mean        1.09e-07
                        Largest      Std. Dev.   372.7708
75%      197.4006       1544.31
90%      459.8176      1788.275      Variance    138958.1
95%      625.2347      2005.186      Skewness    1.210522
99%       1168.44      2225.102      Kurtosis    6.533908

JB = (n/6) × [skewness² + (kurtosis - 3)²/4]
   = (935/6) × [1.210522² + (6.533908 - 3)²/4] ≈ 715 > 5.99

Reject the null hypothesis of normality.