Regression, Correlation and Index Numbers
Chapters 12, 13, and 16
Statistics for Management
Levin and Rubin

Regression
• In statistics, regression analysis is a method for explaining phenomena and predicting future events. In regression analysis, the coefficient of correlation r between random variables X and Y is a quantitative index of the association between these two variables. In its squared form, as the coefficient of determination r², it indicates the proportion of variance in the criterion variable Y that is accounted for by variation in the predictor variable X.

REGRESSION
• Regression Analysis: the study of the relationship between variables; one of the most commonly used tools for business analysis.
1. Simple Regression:
   – a single explanatory variable
   – a straight-line relationship. Form: y = mx + b
2. Multiple Regression: includes any number of explanatory variables.
   – Non-linear: implies curved relationships, for example logarithmic relationships

Types of Variable
• Dependent variable: the single variable being explained/predicted by the regression model (response variable)
• Independent variable: the explanatory variable(s) used to predict the dependent variable (predictor variable)

Data Type
• Cross-sectional: data gathered from the same time period
• Time series: data observed over equally spaced points in time
Simple Regression Model
• Best fit using the least-squares method
• Can be used to explain or forecast
• y = a + bx + e (Note: same form as y = mx + b)
  – Coefficients: a and b
  – a is the y-intercept
  – b is the slope of the line
  – The error (residual) e is the difference between the actual data point and the forecasted value of the dependent variable y given the explanatory variable x. A minimal computational sketch follows below.
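A minimal sketch, in Python with hypothetical data, of how the least-squares coefficients a and b and the residuals e can be computed:

```python
# Least-squares fit of y = a + b*x (hypothetical data, for illustration only).
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # explanatory variable
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])   # dependent (response) variable

# Slope and intercept from the normal equations.
b = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
a = y.mean() - b * x.mean()

e = y - (a + b * x)  # residuals: actual minus forecasted values
print(f"a = {a:.3f}, b = {b:.3f}")
```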

CORRELATION
• The Coefficient of Determination: a measure of the extent, or strength, of the association that exists between two variables.
  – Denoted by r²
  – Its two extreme values are 0 and 1.
• Correlation describes the strength of a linear relationship.
  – It ranges between −1 and +1:
    −1 = strongest negative
    +1 = strongest positive
     0 = no apparent relationship exists
Correlation
The correlation between two random variables, X and Y, is a measure of the degree of linear association between the two variables.

The population correlation, denoted by ρ, can take on any value from −1 to 1.

ρ = −1      indicates a perfect negative linear relationship
−1 < ρ < 0  indicates a negative linear relationship
ρ = 0       indicates no linear relationship
0 < ρ < 1   indicates a positive linear relationship
ρ = 1       indicates a perfect positive linear relationship

The absolute value of ρ indicates the strength or exactness of the relationship.
Illustrations of Correlation

[Six scatter plots illustrating ρ = −1, ρ = 0, ρ = 1, ρ = −.8, ρ = 0, and ρ = .8]
Covariance and Correlation
The covariance of two random variables X and Y:

Cov(X, Y) = E[(X − µ_X)(Y − µ_Y)]

where µ_X and µ_Y are the population means of X and Y, respectively.

The population correlation coefficient:

ρ = Cov(X, Y) / (σ_X σ_Y)

The sample correlation coefficient*:

r = SS_XY / √(SS_X SS_Y)

*Note: If ρ < 0, b1 < 0; if ρ = 0, b1 = 0; if ρ > 0, b1 > 0.
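A minimal sketch, in Python with hypothetical data, of the sample correlation coefficient computed from the sums of squares:

```python
# Sample correlation r = SS_XY / sqrt(SS_X * SS_Y) (hypothetical data).
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

ss_xy = ((x - x.mean()) * (y - y.mean())).sum()
ss_x = ((x - x.mean()) ** 2).sum()
ss_y = ((y - y.mean()) ** 2).sum()

r = ss_xy / np.sqrt(ss_x * ss_y)
print(f"r = {r:.4f}")  # agrees with np.corrcoef(x, y)[0, 1]
```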
Hypothesis Tests about the
Regression Relationship
[Three scatter plots: a constant Y, unsystematic variation, and a nonlinear relationship, none of which shows a linear relationship]

A hypothesis test for the existence of a linear relationship between X and Y:

H0: β1 = 0
H1: β1 ≠ 0

Test statistic for the existence of a linear relationship between X and Y:

t(n−2) = b1 / s(b1)

where b1 is the least-squares estimate of the regression slope and s(b1) is the standard error of b1. When the null hypothesis is true, the statistic has a t distribution with n − 2 degrees of freedom.
Hypothesis Tests for the
Regression Slope
Example 10-1:

H0: β1 = 0
H1: β1 ≠ 0

t(n−2) = b1 / s(b1) = 1.25533 / 0.04972 = 25.25

t(0.005, 23) = 2.807 < 25.25

H0 is rejected at the 1% level, and we may conclude that there is a relationship between charges and miles traveled.

Example 10-4:

H0: β1 = 1
H1: β1 ≠ 1

t(n−2) = (b1 − 1) / s(b1) = (1.24 − 1) / 0.21 = 1.14

t(0.05, 58) = 1.671 > 1.14

H0 is not rejected at the 10% level. We may not conclude that the beta coefficient is different from 1.
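A minimal sketch of the Example 10-1 test in Python, assuming n = 25 (so that n − 2 = 23 degrees of freedom, as in the example); scipy is used only to look up the critical value:

```python
# t test for H0: beta1 = 0, using the Example 10-1 numbers.
from scipy import stats

b1 = 1.25533    # least-squares slope estimate
s_b1 = 0.04972  # standard error of b1
n = 25          # assumed sample size, giving df = n - 2 = 23

t_stat = b1 / s_b1                         # test statistic: 25.25
t_crit = stats.t.ppf(1 - 0.005, df=n - 2)  # two-tailed 1% critical value: 2.807
print(t_stat > t_crit)                     # True: reject H0
```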
How Good is the
Regression?
The coefficient of determination, r², is a descriptive measure of the strength of the regression relationship, a measure of how well the regression line fits the data.

(y − ȳ) = (y − ŷ) + (ŷ − ȳ)
Total Deviation = Unexplained Deviation (Error) + Explained Deviation (Regression)

Σ(y − ȳ)² = Σ(y − ŷ)² + Σ(ŷ − ȳ)²

SST = SSE + SSR

r² = SSR / SST = 1 − SSE / SST = percentage of total variation explained by the regression

[Plot showing the total, unexplained, and explained deviations of a sample point from the mean ȳ and the regression line ŷ]
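A minimal sketch, in Python with hypothetical data, of the variance decomposition and r²:

```python
# SST = SSE + SSR and r^2 = SSR / SST (hypothetical data).
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

b = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
a = y.mean() - b * x.mean()
y_hat = a + b * x

sst = ((y - y.mean()) ** 2).sum()      # total deviation
sse = ((y - y_hat) ** 2).sum()         # unexplained deviation (error)
ssr = ((y_hat - y.mean()) ** 2).sum()  # explained deviation (regression)

print(np.isclose(sst, sse + ssr))      # True
print(f"r^2 = {ssr / sst:.4f}")
```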
The Coefficient of
Determination
[Three plots showing SST split into SSE and SSR for r² = 0, r² = 0.50, and r² = 0.90]

Example 10-1:

r² = SSR / SST = 64527736.8 / 66855898 = 0.96518
Problem 12-37:
Realtors are often interested in seeing how appraised
value of a home varies according to the size of the
home. Some data on area (in thousands of square feet)
and appraised value (in thousands of dollars) for a
sample of 11 homes follow.

Area 1.1 1.5 1.6 1.6 1.4 1.3 1.1 1.7 1.9 1.5 1.3
Value 75 95 110 102 95 87 82 115 122 98 90

1. Calculate the correlation.
2. Estimate the least-squares regression to predict appraised value from size. (A computational sketch follows below.)
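A minimal sketch of the computation in Python. The intercept and slope agree with the Regression Estimation table below; the correlation value is computed here, since the original output slide is an image:

```python
# Problem 12-37: correlation and least-squares line for the 11 homes.
import numpy as np

area = np.array([1.1, 1.5, 1.6, 1.6, 1.4, 1.3, 1.1, 1.7, 1.9, 1.5, 1.3])
value = np.array([75, 95, 110, 102, 95, 87, 82, 115, 122, 98, 90], dtype=float)

r = np.corrcoef(area, value)[0, 1]   # 1. correlation: about 0.973
b, a = np.polyfit(area, value, 1)    # 2. slope and intercept
print(f"r = {r:.3f}")
print(f"value = {a:.2f} + {b:.2f} * area")  # about 15.97 + 55.96 * area
```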
Correlation

[Correlation output for area and appraised value]
Regression Table

[Regression output table]
Regression Estimation

            Coefficients    Standard Error   t Stat     P-value
Intercept   15.97005988     6.518914         2.449804   0.036769
Area        55.95808383     4.424401         12.64761   4.91E-07
Multiple Regression &
Correlation
• In multiple regression analysis, a set of predictor variables X1, X2, ... is used to explain variability of the criterion variable Y. The multivariate counterpart of the coefficient of determination r² is the coefficient of multiple determination, R². The square root of the coefficient of multiple determination is the coefficient of multiple correlation, R.

Multiple Regression
• The dependent variable (Y) may be a function of more than one explanatory variable (X1, X2, ..., Xn).
  – Weekly household consumption depends on income, wealth, family size, etc.
• The two-variable model can be extended to a multiple/multivariate regression model.
• The smallest multivariate regression model has at least 2 explanatory variables, i.e., it is a 3-variable model.
• The betas (β) are known as partial regression or partial slope coefficients.

Multiple Regression in
Practice
• The value of the outcome variable depends on several explanatory variables.
• F-test: judges whether the explanatory variables in the model adequately describe the outcome variable.
• t-test: applies to each individual explanatory variable. A significant t indicates that the explanatory variable has an effect on the outcome variable while controlling for the other X's. (A sketch applying both tests follows the data and output slides below.)

Interpreting the coefficients
• Each slope is a partial slope, indicating the change in Y given a 1-unit change in the respective explanatory variable, keeping the other explanatory variables constant.
• One cannot decide the relative importance of the different explanatory variables based on the partial slopes alone.
Data
Medical Exp   Family Earning   Education   Sick Member   Sick Days
100.00        20000.00         11.00       3.00          23.00
800.00        23000.00         2.00        1.00          15.00
140.00        24000.00         14.00       3.00          32.00
300.00        48000.00         50.00       2.00          25.00
350.00        61500.00         18.00       2.00          37.00
60.00         58000.00         25.00       2.00          13.00
200.00        23000.00         .00         3.00          41.00
350.00        26800.00         15.00       2.00          37.00
50.00         42000.00         33.00       1.00          30.00
300.00        30000.00         8.00        2.00          60.00
140.00        28800.00         7.00        2.00          22.00
50.00         33000.00         3.00        2.00          46.00
150.00        36000.00         .00         2.00          60.00
100.00        36000.00         21.00       1.00          8.00
100.00        18200.00         5.00        1.00          15.00
200.00        32900.00         16.00       3.00          25.00
280.00        19200.00         .00         2.00          18.00
1220.00       85000.00         58.00       3.00          40.00
350.00        42000.00         12.00       2.00          12.00
275.00        58000.00         11.00       2.00          18.00
100.00        60000.00         9.00        1.00          15.00
50.00         42000.00         27.00       1.00          6.00
50.00         6000.00          8.00        1.00          20.00
1800.00       93000.00         40.00       2.00          30.00
600.00        60000.00         8.00        4.00          70.00
120.00        28800.00         .00         1.00          15.00
Regression Analysis

[Correlation matrix output for Medical Exp and the explanatory variables]

Regression Analysis

[ANOVA table output for the fitted multiple regression]
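A minimal sketch, using statsmodels, of the multiple regression suggested by the output slides, assuming Medical Exp is the outcome variable; the variable names are mine, and only the first six rows of the data table are typed in for brevity (extend with the remaining rows):

```python
# Multiple regression of medical expenditure on the other variables.
import numpy as np
import statsmodels.api as sm

medical_exp = np.array([100, 800, 140, 300, 350, 60], dtype=float)
earning = np.array([20000, 23000, 24000, 48000, 61500, 58000], dtype=float)
education = np.array([11, 2, 14, 50, 18, 25], dtype=float)
sick_member = np.array([3, 1, 3, 2, 2, 2], dtype=float)
sick_days = np.array([23, 15, 32, 25, 37, 13], dtype=float)

X = sm.add_constant(np.column_stack([earning, education, sick_member, sick_days]))
model = sm.OLS(medical_exp, X).fit()

print(model.fvalue)   # F-test: do the X's jointly describe the outcome?
print(model.tvalues)  # t-tests: each partial slope, controlling for the other X's
```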
Index Numbers

• Price Relatives
• Aggregate Price Indexes
• Computing an Aggregate Price Index
from Price Relatives
• Some Important Price Indexes
• Deflating a Series by Price Indexes
• Price Indexes: Other Considerations
• Quantity Indexes

Price Relatives
• Price relatives are helpful in
understanding and interpreting
changing economic and business
conditions over time.

Price Relatives
• A price relative shows how the current
price per unit for a given item compares
to a base period price per unit for the
same item.
• A price relative expresses the unit price
in each period as a percentage of the unit
price in the base period.
• A base period is a given starting point in
time.
Price relative in period t = (Price in period t / Base period price) × 100
Example: Price Relatives

The prices Besco paid for newspaper and television ads in 1992 and 1997 are shown below. Using 1992 as the base year, compute a 1997 price index for newspaper and television ad prices.

              1992       1997
Newspaper     $14,794    $29,412
Television     11,469     23,904
Example: Besco Products

Newspaper:  I1997 = (29,412 / 14,794)(100) = 199
Television: I1997 = (23,904 / 11,469)(100) = 208

Television advertising cost increased at a greater rate.
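A minimal sketch of the same calculation in Python:

```python
# Price relative: period-t price as a percentage of the base-period price.
def price_relative(price_t: float, base_price: float) -> float:
    return price_t / base_price * 100

print(round(price_relative(29_412, 14_794)))  # newspaper: 199
print(round(price_relative(23_904, 11_469)))  # television: 208
```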

Aggregate Price Indexes

• An aggregate price index is developed for the specific purpose of measuring the combined change of a group of items.
• An unweighted aggregate price index in period t, denoted by I_t, is given by

I_t = (∑ P_it / ∑ P_i0) × 100

where
P_it = unit price for item i in period t
P_i0 = unit price for item i in the base period
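A minimal sketch of the unweighted aggregate index in Python, with a hypothetical three-item basket:

```python
# Unweighted aggregate price index: I_t = (sum P_it / sum P_i0) * 100.
def aggregate_index(prices_t: list[float], prices_0: list[float]) -> float:
    return sum(prices_t) / sum(prices_0) * 100

p0 = [2.50, 8.00, 0.75]  # base-period unit prices (hypothetical)
pt = [3.00, 9.50, 0.90]  # period-t unit prices (hypothetical)
print(f"I_t = {aggregate_index(pt, p0):.1f}")  # (13.40 / 11.25) * 100 = 119.1
```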
