6. Regression Analysis
Outline
- Introduction
- Simple Linear Regression
- Multiple Regression
- Logistic Regression
Introduction
Regression predicts a variable by using another variable or set of variables.
- Y = DV: the predicted, or criterion, variable
- X = IV: the predictor, or causal, variable
- Simple Linear Regression: one IV and one DV
- Multiple Linear Regression: multiple IVs and one DV
- Logistic Regression: dichotomous DV
Model
Y = \alpha + \beta X + \varepsilon
- Y is the dependent variable, or criterion variable, or effect
- X is the independent variable, or predictor variable, or cause
- \alpha is the population regression constant, or Y-intercept of the regression line
- \beta is the population regression coefficient, or slope of the regression line: the change in Y as X changes by one unit
- \varepsilon is the residual, or error, in the regression equation
Model
Y = a + bX + e
- a is an estimator of \alpha, the sample regression constant
- b is an estimator of \beta, the sample regression coefficient
- X and Y are sample values of the respective variables
- e is the sample residual
Example
A researcher is interested in predicting work performance using the conscientiousness personality dimension.
Population level: Performance = \alpha + \beta(Conscientiousness) + \varepsilon
Sample level: Performance = a + b(Conscientiousness) + e
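A minimal sketch of the sample-level model in R, using hypothetical conscientiousness and performance scores (not data from the book):

```r
# Hypothetical data: conscientiousness scores and performance ratings
conscientiousness <- c(3.1, 4.2, 2.8, 3.9, 4.5, 3.3, 2.5, 4.0)
performance <- c(5.2, 6.8, 4.9, 6.1, 7.2, 5.5, 4.3, 6.5)

fit <- lm(performance ~ conscientiousness)  # estimate a and b by OLS
coef(fit)      # a = intercept, b = slope (sample estimates of alpha and beta)
summary(fit)   # full output with standard errors and tests
```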
The points of the joint distribution of (x, y) do not fall exactly on a straight line. The regression line has a slope b, which is the estimator of the population slope \beta. The regression line also has a Y-intercept a, which is an estimator of \alpha. The OLS logic of plotting the line is explained in the subsequent section. Since this is a sample plot, we write a and b instead of \alpha and \beta.
Ordinary Least Squares (OLS)
Ordinary Least Squares (OLS) is the estimation method used to estimate the parameter values \alpha and \beta.
Best Fit
The OLS estimate of beta (\beta), that is, the OLS estimator b, is chosen such that the sum of squared differences between the actual Y values and the predicted Y values is the smallest:

\sum e^2 = \sum (Y_i - \hat{Y}_i)^2 = \min
Cov_{xy} = \frac{\sum (x - \bar{x})(y - \bar{y})}{n - 1}

S_X^2 = \frac{\sum_{i=1}^{n} (X - \bar{X})^2}{n - 1}

b = \frac{Cov_{xy}}{S_X^2}

b_{YX} = \frac{Cov_{xy}}{S_X^2}
       = \frac{\sum_{i=1}^{n} (X - \bar{X})(Y - \bar{Y}) / (n - 1)}{\sum_{i=1}^{n} (X - \bar{X})^2 / (n - 1)}
       = \frac{\sum_{i=1}^{n} (X - \bar{X})(Y - \bar{Y})}{\sum_{i=1}^{n} (X - \bar{X})^2}
       = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2}

where x_i and y_i are deviation scores.
Compute a and b using a = \bar{Y} - b\bar{X} and Examples 6.1 and 6.2.
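These computations can be sketched in R with hypothetical numbers (not the data of Examples 6.1 and 6.2):

```r
# b = Cov(X, Y) / Var(X);  a = mean(Y) - b * mean(X)
x <- c(2, 4, 5, 7, 8)          # hypothetical X values
y <- c(3, 6, 7, 9, 11)         # hypothetical Y values

b <- cov(x, y) / var(x)        # sample regression coefficient
a <- mean(y) - b * mean(x)     # sample regression constant
c(a = a, b = b)

coef(lm(y ~ x))                # lm() returns the same estimates
```

Both cov() and var() use the n - 1 denominator, matching the formulas above.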
Gauss-Markov Theorem
When the errors have expectation zero, are uncorrelated, and have equal variance, the ordinary least squares (OLS) estimator is the best linear unbiased estimator (BLUE) of the population regression coefficient.
Standardized Regression Coefficients
When X and Y are converted into standardized forms and the regression is carried out, the regression coefficients obtained are called standardized regression coefficients. For standardized variables, the covariance of (X, Y) is the correlation of (X, Y), so b is equal to the correlation r_{YX}.
Accuracy of Prediction

S_{Y \cdot X} = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{n - 2}}

r^2 = \frac{SS_{regression}}{SS_{Total}} = 1 - \frac{SS_{residuals}}{SS_{Total}}

SS_{regression} = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 \qquad SS_{residual} = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2

S_{Y \cdot X} = S_Y \sqrt{(1 - r^2) \frac{n - 1}{n - 2}}

Compute accuracy of prediction using Example 6.2 from the book.
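These accuracy measures can be computed directly in R; the data here are hypothetical, not Example 6.2:

```r
x <- c(2, 4, 5, 7, 8)
y <- c(3, 6, 7, 9, 11)
fit  <- lm(y ~ x)
yhat <- fitted(fit)
n    <- length(y)

ss_total <- sum((y - mean(y))^2)
ss_reg   <- sum((yhat - mean(y))^2)
ss_res   <- sum((y - yhat)^2)

r2  <- ss_reg / ss_total        # proportion of variance explained
syx <- sqrt(ss_res / (n - 2))   # standard error of estimate
c(r2 = r2, syx = syx)

# summary(fit)$r.squared and summary(fit)$sigma give the same two values
```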
Hypothesis Testing
H_0: \beta = 0
H_A: \beta \neq 0

SS_{Total} = SS_{regression} + SS_{residual}

\sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 + \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2

F = \frac{MSE_{regression}}{MSE_{residual}}
Testing Hypothesis

Source                    SS                                       df           MSE                                 F
Regression (Explained)    \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2   k = 1        SS_{regression} / df_{regression}   MSE_{regression} / MSE_{residual}
Residual (Unexplained)    \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2       n - k - 1    SS_{residual} / df_{residual}
Total                     \sum_{i=1}^{n} (Y_i - \bar{Y})^2         n - 1

t = \frac{b - \beta_1}{s_b} = \frac{b}{S_{Y \cdot X} / (S_X \sqrt{n - 1})} = \frac{b \, S_X \sqrt{n - 1}}{S_{Y \cdot X}}
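A worked connection between the two tests: with a single predictor, the t test for the slope and the overall F test are equivalent, since squaring the t statistic gives the F ratio:

```latex
t^2 = \left( \frac{b}{s_b} \right)^2 = \frac{MSE_{regression}}{MSE_{residual}} = F
```

So with k = 1 the ANOVA table and the t test on b must lead to the same decision about H_0: \beta = 0.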
Multiple Regression Model
Y = a + b_1 X_1 + b_2 X_2 + \dots + b_k X_k + e
- a is a sample estimate of the population intercept \alpha
- b_1, b_2, ..., b_k are sample estimates of the population regression coefficients \beta_1, \beta_2, ..., \beta_k respectively
- e is a sample error term
Matrix Equations
R^2 = R_{yi} B_i
- R^2 is the percentage of variance explained by the regression equation
- R_{yi} is the row matrix of correlations between the k IVs and the DV
- B_i is the column matrix of regression coefficients for the same k IVs

B_i = R_{ii}^{-1} R_{iy}
where
- B_i is the column vector of standardized regression coefficients
- R_{ii}^{-1} is the inverse of the matrix of correlations among the IVs
- R_{iy} is a column matrix of correlations between the DV and the IVs
Computing a and b_i
b_i = B_i \frac{S_Y}{S_i}
where
- b_i is the unstandardized regression coefficient associated with variable i
- B_i is the standardized regression coefficient associated with variable i
- S_Y is the standard deviation of the dependent variable
- S_i is the standard deviation of the ith independent variable

a = \bar{Y} - \sum_{i=1}^{k} b_i \bar{X}_i

C_{jj} = \mathrm{diag}\left[ (X'X)^{-1} \right]

SE(b_i) = \sqrt{C_{jj}} \, S_{Y \cdot X}

t_i = \frac{b_i}{SE(b_i)}
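The matrix equations above can be sketched in R; the correlation values here are hypothetical, for k = 2 IVs:

```r
# Hypothetical correlation matrices for two IVs and one DV
R_ii <- matrix(c(1.0, 0.3,
                 0.3, 1.0), nrow = 2)   # correlations among the IVs
R_iy <- matrix(c(0.5, 0.4), nrow = 2)   # IV-DV correlations (column matrix)

B_i <- solve(R_ii) %*% R_iy             # B_i = R_ii^{-1} R_iy
R2  <- t(R_iy) %*% B_i                  # R^2 = R_yi B_i (row x column)
list(B = B_i, R2 = R2)
```

Multiplying each B_i by S_Y / S_i would then give the unstandardized b_i, as in the formula above.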
R Code for Multiple Regression
mr <- read.csv("/Users/macbook/Desktop/MR.csv", header = T)  # read the data
attach(mr)                                                   # attach the data
t <- lm(EP ~ EI + c + St + GI)                               # run multiple regression
summary(t)                                                   # summary output
# Output
Call:
lm(formula = EP ~ EI + c + St + GI)

Residuals:
    Min      1Q  Median      3Q     Max
-1.6955 -0.7213 -0.1629  0.8084  1.6715

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.02380    2.62148   0.391  0.70026
EI           0.14885    0.04002   3.720  0.00135 **
C            0.24610    0.05037   4.886 8.94e-05 ***
St          -0.19680    0.09996  -1.969  0.06299 .
GI           0.23006    0.05476   4.201  0.00044 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.061 on 20 degrees of freedom
Multiple R-squared: 0.7552, Adjusted R-squared: 0.7062
F-statistic: 15.42 on 4 and 20 DF, p-value: 6.612e-06
Types of Multiple Regression
- Standard Multiple Regression: assesses the unique contribution of each IV to the DV
- Sequential Multiple Regression (Hierarchical Multiple Regression): IVs entered one after another in an order specified by the researchers
- Statistical or Stepwise Regression: IVs entered one after another in an order determined by a statistical criterion; statistical selection is done in three different ways: (i) forward selection, (ii) backward elimination, (iii) stepwise regression
Additional Model Selection Criteria
Model selection: which of the IVs are to be kept in the final model.
- R^2 and adjusted R^2
- Akaike Information Criterion (AIC):

AIC = e^{2k/n} \frac{\sum_{i=1}^{n} e_i^2}{n} = e^{2k/n} \frac{RSS}{n}
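A brief R sketch of AIC-based comparison with simulated data. Note that R's built-in AIC() uses the log-likelihood form -2 log L + 2k, which for Gaussian linear models ranks candidate models the same way as the e^{2k/n} RSS/n form above:

```r
set.seed(10)
x1 <- rnorm(30)
x2 <- rnorm(30)
y  <- 2 + 0.8 * x1 + rnorm(30)   # x2 is irrelevant by construction

AIC(lm(y ~ x1))        # smaller AIC -> preferred model
AIC(lm(y ~ x1 + x2))   # the extra parameter is penalized
```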
With measurement error in the predictor, where the observed X_i equals the true value plus error (X_i = X_i^* + e_i):

Y_i = \alpha + \beta X_i^* + \varepsilon_i
Y_i = \alpha + \beta (X_i - e_i) + \varepsilon_i
Y_i = \alpha + \beta X_i - \beta e_i + \varepsilon_i
Simple Mediation Model
One causal variable (X), one consequent variable (Y), and one mediator variable (M).
- Direct effect: the effect of the causal variable (X) on the consequent variable (Y) without any other variable in between.
- Indirect effect: the effect of the causal variable on the consequent variable through the mediating variable.
[Path diagram: X (Optimism) → Y (Happiness) with direct effect c, and M (Self-esteem) as the mediator]
Use R Code 6.9 from the book for mediational analysis.
Moderated Variable Regression
A criterion (Y), a predictor (X), and a moderator (M) variable are modeled in a hierarchical regression equation:

Y = a + b_1 X_i + b_2 M_i + e_i
Y = a + b_1 X_i + b_2 M_i + b_3 X_i M_i + e_i

The slope for the predictor changes as the level of M changes. The null hypothesis \beta_3 = 0 is tested.
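The hierarchical test can be sketched in R with simulated data (the coefficients below are made up for illustration):

```r
set.seed(1)
X <- rnorm(100)
M <- rnorm(100)
Y <- 1 + 0.5 * X + 0.3 * M + 0.4 * X * M + rnorm(100)

step1 <- lm(Y ~ X + M)         # step 1: main effects only
step2 <- lm(Y ~ X + M + X:M)   # step 2: add the interaction (moderation) term
anova(step1, step2)            # F test of H0: beta3 = 0
summary(step2)
```

A significant increment in R^2 from step 1 to step 2 indicates moderation.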
Logistic Regression
Logistic regression is useful for making predictions when the dependent variable is dichotomous and the independent variables are continuous. The DV, that is, Y, is either 0 or 1.

Y = \frac{e^u}{1 + e^u}

\ln \frac{Y}{1 - Y} = a + b_1 X_i

\pi(x) = \frac{e^{a + b_1 X_i}}{1 + e^{a + b_1 X_i}}

where \pi(x) = P(Y = 1 \mid X) and 1 - \pi(x) = P(Y = 0 \mid X).
Logistic Regression

\pi(x_i)^{y_i} \left[ 1 - \pi(x_i) \right]^{1 - y_i}

l(\beta) = \prod_{i=1}^{n} \pi(x_i)^{y_i} \left[ 1 - \pi(x_i) \right]^{1 - y_i}

L(\beta) = \ln[l(\beta)] = \sum_{i=1}^{n} \left\{ y_i \ln \pi(x_i) + (1 - y_i) \ln[1 - \pi(x_i)] \right\}
Testing Significance

G = \chi^2 = -2 \ln \left[ \frac{\text{likelihood (constant only)}}{\text{likelihood (with variables)}} \right]
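A minimal R sketch with simulated data: glm() fits the logistic model, and G is obtained from the null and residual deviances:

```r
set.seed(2)
x <- rnorm(200)
p <- exp(-0.5 + 1.2 * x) / (1 + exp(-0.5 + 1.2 * x))
y <- rbinom(200, 1, p)                 # dichotomous DV

fit <- glm(y ~ x, family = binomial)   # logistic regression
summary(fit)

G <- fit$null.deviance - fit$deviance  # G = -2 ln(L0 / L1)
pchisq(G, df = 1, lower.tail = FALSE)  # p-value for the model
exp(coef(fit))                          # odds ratios
```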
Testing Significance and Related Topics
- Wald statistics
- R and R^2: Cox and Snell R^2, McFadden R^2, and Nagelkerke R^2
- Information criteria: AIC and BIC
- The odds ratio
- Classification
Use R Code 6.11 from the book to carry out logistic regression.