Professional Documents
Culture Documents
Multiple Regression
and Correlation
Analysis
1 of 56
Learning Objectives
EXAMPLE Multiple
Regression Analysis
Interpretation
Y' 140.3187 - 12.3168X1 - 4.10131X 2 8.526762X 3
The age of furnace variable shows a direct
relationship. With an older furnace, the cost to heat
the home increases. Specifically, for each additional
year old the furnace is, we expect the cost to increase
$8.53 per month.
10
11
12
13
14
15
3.
4.
16
17
18
..
Y
1 1
2
2
k
k
19
H 0 : 1 2 ... k 0
H1 : Not all ' s equal 0
The test statistic is the F distribution with k (number
of independent variables) and
n-(k+1) degrees of freedom, where n is the sample
size
April 25, 2015
20
21
22
23
24
25
26
27
Evaluating the
Assumptions of Multiple Regression
1.
2.
3.
4.
5.
28
Correlation Matrix
A correlation matrix is a matrix showing the coefficients
29
Multicollinearity
Multicollinearity exists when independent variables (Xs)
are correlated
Correlated independent variables make it difficult to
make inferences about the individual regression
coefficients (slopes) and their individual effects on the
dependent variable (Y)
However, correlated independent variables do not affect a
multiple regression equations ability to predict the
dependent variable (Y)
30
Effect of Multicollinearity
Not a Problem: Multicollinearity does not affect a
multiple regression equations ability to predict the
dependent variable
A Problem: Multicollinearity may show unexpected
results in evaluating the relationship between each
independent variable and the dependent variable
(a.k.a. partial correlation analysis)
31
32
Analysis of Residuals
The best regression line passes through the centre of the
data in a scatter plot
You would find a good number of the observations above
the regression line (these residuals would have a positive
sign), and a good number of the observations below the
line (these residuals would have a negative sign)
Further, the observations would be scattered above and
below the line over the entire range of the independent
variable
April 25, 2015
33
34
35
Distribution of Residuals
Ideally, the residuals should follow a normal probability distribution
To evaluate this assumption, we can organize the residuals into a
frequency distribution
Another graph that helps to evaluate the assumption of normality
distributed residuals is called a normal probability plot
36
37
Independent Observations
The fifth assumption about regression and
correlation analysis is that successive
residuals should be independent
When successive residuals are correlated we
refer to this condition as autocorrelation
Autocorrelation frequently occurs when the
data are collected over a period of
time(i.e.Time series data)
April 25, 2015
38
39
40
41
For the house with an attached garage, a 1 is substituted for X4 in the regression
equation. The estimated heating cost is $244.00, found by:
=169.28-10.39(-3.4)-2.73(7.5)+59.87(1)=244.0
Dr. Mageed Hussain
42
H0: 4 = 0
H1: 4 0
April 25, 2015
43
44