
Chapter 6
Regression Analysis
Outline

Introduction
Simple Linear Regression
Multiple Regression
Logistic Regression
Introduction
Predicting a variable by using another variable or set of variables
Y = DV. Predicted. Criterion variable
X = IV. Predictor variable. Causal variable
Simple Linear Regression: One IV and one DV
Multiple Linear Regression: Multiple IVs and one DV
Logistic Regression: Dichotomous DV

Model

Y = \alpha + \beta X + \varepsilon

Y is the dependent variable or criterion variable or effect
X is the independent variable or predictor variable or cause
\alpha is the population regression constant or Y-intercept of the regression line
\beta is the population regression coefficient or slope of the regression line, i.e., the change in Y as X changes by one unit
\varepsilon is the residual or error in the regression equation
Model

Y = a + bX + e

a is an estimator of \alpha, the sample regression constant
b is an estimator of \beta, the sample regression coefficient
X and Y are sample values of the respective variables
e is the sample residual

Example
A researcher is interested in predicting work performance by using the conscientiousness personality dimension.
Population level:
Performance = \alpha + \beta(Conscientiousness) + \varepsilon
Sample level:
Performance = a + b(Conscientiousness) + e
The joint distribution of (x, y) points does not fall on the straight line. The regression line has a slope b, which is the estimator of the population slope \beta. The regression line also has a Y-intercept a, which is an estimator of \alpha. The OLS logic of plotting the line is explained in the subsequent section. Since this is a sample plot, we should have written a and b instead of \alpha and \beta.
Ordinary Least Squares (OLS)
Ordinary Least Squares (OLS) is the estimation method used to estimate the parameter values \alpha and \beta.
Best Fit
The OLS estimator b of \beta is estimated such that the sum of squared differences between the actual and predicted Y values is the smallest:

\sum e^2 = \min
Cov_{xy} = \frac{\sum (x - \bar{x})(y - \bar{y})}{n - 1}

S_X^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}

b = \frac{Cov_{xy}}{S_X^2}

b_{YX} = \frac{Cov_{xy}}{S_X^2} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y}) / (n - 1)}{\sum_{i=1}^{n} (X_i - \bar{X})^2 / (n - 1)} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2} = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2}

a = \bar{Y} - b\bar{X}

Compute a and b using Examples 6.1 and 6.2.
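To make these formulas concrete, here is a minimal R sketch on made-up data (the book's Examples 6.1 and 6.2 are not reproduced here); x and y are hypothetical scores.

# Hypothetical data; R's cov() and var() use the (n - 1) denominator,
# matching the formulas above
x <- c(2, 4, 5, 7, 9)       # e.g., conscientiousness scores
y <- c(3, 6, 6, 8, 11)      # e.g., performance scores

b <- cov(x, y) / var(x)     # b = Cov_xy / S_X^2
a <- mean(y) - b * mean(x)  # a = Ybar - b * Xbar
c(a = a, b = b)

coef(lm(y ~ x))             # cross-check against R's built-in estimator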
Gauss–Markov Theorem
When the errors have expectation zero, are uncorrelated, and have equal variance, the ordinary least squares (OLS) estimator gives the best linear unbiased estimator (BLUE) of the population regression coefficient.
Standardized Regression Coefficients
When X and Y are converted into standardized form and the regression is carried out, the regression coefficients obtained are called standardized regression coefficients.
For standardized variables, the covariance(X, Y) is the correlation(X, Y), so b is equal to the correlation r_{YX}.
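A minimal sketch of this equivalence, reusing the hypothetical x and y defined above: the slope from a regression on standardized variables equals cor(x, y).

# Standardize both variables, then regress; the slope equals r_xy
zx <- as.numeric(scale(x))
zy <- as.numeric(scale(y))
coef(lm(zy ~ zx))[2]   # standardized regression coefficient
cor(x, y)              # the correlation: same value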
Accuracy of Prediction

S_{Y \cdot X} = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{df}}, \quad df = n - 2

r^2 = \frac{SS_{regression}}{SS_{Total}} = 1 - \frac{SS_{residual}}{SS_{Total}}

SS_{regression} = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 \qquad SS_{residual} = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2

S_{Y \cdot X} = S_Y \sqrt{(1 - r^2)\frac{n - 1}{n - 2}}

Compute accuracy of prediction using Example 6.2 from the book.
Hypothesis Testing

H_0: \beta = 0
H_A: \beta \neq 0

SS_{Total} = SS_{regression} + SS_{residual}

\sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 + \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2

F = \frac{MSE_{regression}}{MSE_{residual}}
Testing Hypothesis

Source                   SS                             df          MSE                FRegression (Explained)   \sum (\hat{Y}_i - \bar{Y})^2   k = 1       SS_reg / df_reg    MSE_reg / MSE_res
Residual (Unexplained)   \sum (Y_i - \hat{Y}_i)^2       n - k - 1   SS_res / df_res
Total                    \sum (Y_i - \bar{Y})^2         n - 1

Compute the F value using Examples 6.1 and 6.2.
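In R, anova() on a fitted lm object prints this table directly; a sketch continuing the hypothetical data above:

fit <- lm(y ~ x)   # x and y from the earlier sketch
anova(fit)         # SS, df, Mean Sq, and F for x (regression) and Residuals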


Testing Significance of b

s_b = \frac{S_{Y \cdot X}}{S_X \sqrt{n - 1}}

t = \frac{b - \beta}{s_b} = \frac{b}{s_b} = \frac{b (S_X) \sqrt{n - 1}}{S_{Y \cdot X}}

(with \beta = 0 under the null hypothesis)
Confidence Interval for b

CI_{(1 - \alpha)} = b \pm (t_{\alpha/2}) \frac{S_{Y \cdot X}}{S_X \sqrt{n - 1}}

Use R Code 6.2 to carry out regression analysis for Examples 6.1 and 6.2.
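The book's R Code 6.2 is not reproduced here; as a minimal stand-in sketch, summary() and confint() on the fitted model give the t test for b and its confidence interval:

summary(fit)$coefficients    # estimate, SE, t value, and p value for a and b
confint(fit, level = 0.95)   # CI_(1 - alpha) for a and b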
Multiple Regression
Predict a dependent variable (DV) using multiple predictor variables (IVs).
For example, employee performance can be predicted by conscientiousness, organizational commitment, work LOC, agreeableness, job satisfaction, job characteristics, and intelligence.
Equation

Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + \varepsilon

Y is a dependent variable
\alpha is a population intercept
\beta_1, \beta_2, …, \beta_k are population regression coefficients associated respectively with the independent variables X_1, X_2, …, X_k
\varepsilon is the error associated with the population regression equation

\hat{Y} = a + b_1 X_1 + b_2 X_2 + \cdots + b_k X_k
Y = a + b_1 X_1 + b_2 X_2 + \cdots + b_k X_k + e

a is a sample estimate of the population intercept
b_1, b_2, …, b_k are sample estimates of the population regression coefficients \beta_1, \beta_2, …, \beta_k respectively
e is a sample error term
Matrix Equations

R^2 = R_{yi} B_i

R^2 is the percentage of variance explained by the regression equation
R_{yi} is the row matrix of correlations between the k IVs and the DV
B_i is the column matrix of regression coefficients for the same k IVs

B_i = R_{ii}^{-1} R_{iy}

where
B_i is the column vector of standardized regression coefficients
R_{ii}^{-1} is the inverse of the matrix of correlations among the IVs
R_{iy} is the column matrix of correlations between the DV and the IVs
Computing a and b_i

b_i = B_i \frac{S_Y}{S_i}

where
b_i is the unstandardized regression coefficient associated with variable i
B_i is the standardized regression coefficient associated with variable i
S_Y is the standard deviation of the dependent variable
S_i is the standard deviation of the ith independent variable

a = \bar{Y} - \sum_{i=1}^{k} b_i \bar{X}_i

Solve Example 6.4 using these equations.
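A small sketch of these matrix equations with two hypothetical IVs; the correlation values in Rii and Riy are invented for illustration.

# Rii: correlations among the IVs; Riy: correlations of the IVs with the DV
Rii <- matrix(c(1.0, 0.3,
                0.3, 1.0), nrow = 2)
Riy <- c(0.5, 0.4)

Bi <- solve(Rii) %*% Riy   # B_i = Rii^-1 %*% Riy (standardized coefficients)
R2 <- t(Riy) %*% Bi        # R^2 = R_yi %*% B_i
Bi
R2
# Unstandardized: b_i = B_i * (S_Y / S_i); then a = Ybar - sum(b_i * Xbar_i)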


Testing Significance of Each Predictor Variable

H_0: \beta_i = 0 \qquad H_A: \beta_i \neq 0

C_{jj} = diag\left[ (X'X)^{-1} \right]

SE(b_i) = \sqrt{\hat{\sigma}^2 C_{jj}}

t_i = \frac{b_i}{SE(b_i)}
R Code for Multiple Regression
mr <- read.csv("/Users/macbook/Desktop/MR.csv", header = T)  # read data matrix
attach(mr)                      # attach data matrix
t <- lm(EP ~ EI + C + St + GI)  # run multiple regression
summary(t)                      # summary output

# Output
Call:
lm(formula = EP ~ EI + C + St + GI)
Residuals:
    Min      1Q  Median      3Q     Max
-1.6955 -0.7213 -0.1629  0.8084  1.6715

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.02380    2.62148   0.391  0.70026
EI           0.14885    0.04002   3.720  0.00135 **
C            0.24610    0.05037   4.886 8.94e-05 ***
St          -0.19680    0.09996  -1.969  0.06299 .
GI           0.23006    0.05476   4.201  0.00044 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.061 on 20 degrees of freedom
Multiple R-squared: 0.7552, Adjusted R-squared: 0.7062
F-statistic: 15.42 on 4 and 20 DF, p-value: 6.612e-06
Types of Multiple Regression
Standard Multiple Regression: unique contribution of each IV to the DV
Sequential Multiple Regression (Hierarchical Multiple Regression): IVs entered one after another in an order specified by the researchers
Statistical or Stepwise Regression: IVs entered one after another in an order determined by a statistical criterion
Statistical selection works in three different ways:
(i) Forward selection
(ii) Backward elimination
(iii) Stepwise regression
Additional Model Selection Criteria
Model selection: which of the IVs to keep in the final model
The R^2 and adjusted R^2

Akaike Information Criterion (AIC): AIC = e^{2k/n} \frac{\sum_{i=1}^{n} e_i^2}{n} = e^{2k/n} \frac{RSS}{n}

Schwarz Information Criterion (SIC): SIC = n^{k/n} \frac{\sum_{i=1}^{n} e_i^2}{n} = n^{k/n} \frac{RSS}{n}

Mallows's C_p criterion: C_p = \frac{RSS_p}{\hat{\sigma}^2} - (n - 2p)
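A sketch of criterion-based selection in R. Note that R's AIC() and BIC() use the log-likelihood form rather than the RSS form shown above, but for least-squares models they rank candidates equivalently; the quadratic term is an invented second candidate model.

m1 <- lm(y ~ x)           # smaller candidate model (hypothetical data above)
m2 <- lm(y ~ x + I(x^2))  # larger candidate model with an extra term
AIC(m1, m2)               # Akaike information criterion
BIC(m1, m2)               # Schwarz/Bayesian information criterion
step(m2, trace = 0)       # stepwise selection by AIC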
Issues in Multiple Regression
Sample size
Multivariate outliers
Singularity and multicollinearity
Heteroscedasticity
Independence of errors
Model specification errors:
(a) omitting a relevant variable, (b) including an unnecessary variable, (c) assuming an incorrect functional form, and (d) errors in the measurement of X and Y.
Multicollinearity
Problem: perfect multicollinearity leads to singularity of the correlation matrix, that is, R_{ii}^{-1} does not exist.
Detecting multicollinearity (see the sketch after this list):
Insignificant t-values in spite of high R^2
Correlations among the predictors, tolerance, and VIF
Determinant of X'X
Theil's multicollinearity effect
Other methods
Solutions:
Drop the variable; ridge regression; PCA; transforming the variable; and adding new data
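A sketch of tolerance and VIF, assuming the car package is installed and reusing the mr data from the earlier R code:

library(car)                                    # provides vif()
fit.mr <- lm(EP ~ EI + C + St + GI, data = mr)
vif(fit.mr)       # VIF above about 10 is a common warning sign
1 / vif(fit.mr)   # tolerance (low values indicate multicollinearity)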
Heteroscedasticity
Problem: the variance of the errors is not uniform (constant) across levels of the IVs.
Detecting heteroscedasticity (use R Code 6.8 from the book; a sketch follows this list):
Graphical methods
Glejser test
Park test
Goldfeld–Quandt (GQ) test
Breusch–Pagan–Godfrey test
White's test for heteroscedasticity
Other tests
Dealing with heteroscedastic data:
Use WLS when \sigma_i^2 is known; use White's heteroscedasticity-consistent variances and standard errors; transform the variable
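The book's R Code 6.8 is not reproduced here; as a stand-in, a sketch of the Breusch–Pagan test and White-style robust standard errors, assuming the lmtest and sandwich packages are installed and reusing fit.mr from the previous sketch:

library(lmtest)     # bptest(), coeftest()
library(sandwich)   # vcovHC()

bptest(fit.mr)      # Breusch-Pagan test; H0: homoscedastic errors
# White's heteroscedasticity-consistent standard errors:
coeftest(fit.mr, vcov = vcovHC(fit.mr, type = "HC0"))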
Model Specification Errors
(i) Omitting a relevant variable from the regression model
(ii) Including unnecessary variable(s) in the model
Nested and non-nested models
The J test
(iii) Incorrect functional form
Errors in Measurement
The dependent variable is measured with error and the independent variable is measured accurately:

Y_i = \alpha + \beta X_i + (\varepsilon_i + e_i)

The explanatory variable is measured with error and the DV is measured accurately:

Y_i = \alpha + \beta X_i^* + \varepsilon_i
Y_i = \alpha + \beta (X_i - e_i) + \varepsilon_i
Y_i = \alpha + \beta X_i + (\varepsilon_i - \beta e_i)
Simple Mediation Model
One causal variable (X), one consequent variable (Y), and one mediator variable (M).
The direct effect is the effect of the causal variable (X) on the consequent variable (Y) without any other variable in between.
The indirect effect is the effect of the causal variable on the consequent variable through the mediating variable.



[Path diagram: X (Optimism) → M (Self-esteem) → Y (Happiness), with direct effect c from X to Y.]

Use R Code 6.9 for mediational analysis.
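The book's R Code 6.9 is not reproduced here; a minimal stand-in sketch of the regressions behind the diagram, with a hypothetical data frame dat containing Optimism, SelfEsteem, and Happiness:

m.total  <- lm(Happiness ~ Optimism, data = dat)               # total effect (c)
m.a      <- lm(SelfEsteem ~ Optimism, data = dat)              # path a: X -> M
m.direct <- lm(Happiness ~ Optimism + SelfEsteem, data = dat)  # paths c' and b

a.path <- coef(m.a)["Optimism"]
b.path <- coef(m.direct)["SelfEsteem"]
a.path * b.path   # indirect effect of X on Y through M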
Moderated Variable Regression
A criterion (Y), a predictor (X), and a moderator (M) variable are modeled in a hierarchical regression equation:

Y = a + b_1 X_i + b_2 M_i + e_i

Y = a + b_1 X_i + b_2 M_i + b_3 X_i M_i + e_i

The slope for the predictor changes as the level of M changes.
The null hypothesis H_0: \beta_3 = 0 is tested (a sketch follows below).
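A minimal sketch of the hierarchical test, with X, M, and Y assumed to live in a hypothetical data frame dat:

step1 <- lm(Y ~ X + M, data = dat)        # main effects only
step2 <- lm(Y ~ X + M + X:M, data = dat)  # adds the X*M product term
anova(step1, step2)                       # tests H0: beta_3 = 0
summary(step2)                            # b3 is the moderation effect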
Logistic Regression
Logistic regression is useful for making predictions when the dependent variable is dichotomous and the independent variables are continuous.
The DV, that is, Y, is either 0 or 1.

Y = \frac{e^u}{1 + e^u}

\ln\left( \frac{Y}{1 - Y} \right) = a + b_1 X_i

\pi(x) = \frac{e^{a + b_1 X_i}}{1 + e^{a + b_1 X_i}}

\pi(x) = P(Y = 1 \mid X) and 1 - \pi(x) = P(Y = 0 \mid X)
Logistic Regression

\pi(x_i)^{y_i} \left[ 1 - \pi(x_i) \right]^{1 - y_i}

l(\beta) = \prod_{i=1}^{n} \pi(x_i)^{y_i} \left[ 1 - \pi(x_i) \right]^{1 - y_i}

L(\beta) = \ln[l(\beta)] = \sum_{i=1}^{n} \left\{ y_i \ln \pi(x_i) + (1 - y_i) \ln\left[ 1 - \pi(x_i) \right] \right\}

Testing Significance

G = \chi^2 = -2 \ln \left[ \frac{\text{likelihood (constant only)}}{\text{likelihood (with variables)}} \right]
Testing Significance and Related Topics
Wald statistics
R and R^2: Cox and Snell R^2, McFadden R^2, and Nagelkerke R^2
Information criteria: AIC and BIC
The odds ratio
Classification
Use R Code 6.11 from the book to carry out logistic regression; a minimal sketch follows.
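The book's R Code 6.11 is not reproduced here; a minimal stand-in sketch using glm(), with a hypothetical 0/1 outcome hired and predictor score in a data frame dat:

fit.log <- glm(hired ~ score, data = dat, family = binomial)
summary(fit.log)      # Wald z tests for the coefficients
exp(coef(fit.log))    # odds ratios
AIC(fit.log)          # information criteria
BIC(fit.log)
# G = -2 ln(L_constant-only / L_with-variables)
#   = null deviance - residual deviance:
with(fit.log, null.deviance - deviance)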