"I have made this longer than usual, because I lack the time to make it short." - Pascal. I had the time! A lot of work has gone into simplifying these slides and stripping them down to the essence. This means that the slides will require careful reading. They are designed to be self-contained but efficient! I hope you enjoy reading them. Read them three times:
1. Before class (write questions in the margin)
2. During class
3. After class
Chapter I. Slide 1
I. Introduction to Regression and Least Squares
a. Conditional Prediction
b. Hedonic Pricing and Flat-Panel TVs
c. Linear Prediction
d. Least Squares
e. Intuition behind Least Squares
f. Relationship between b and r
g. Decomposing the Variance and R2
a. Conditional Prediction
Available data
b. Hedonic Regression
A hedonic regression is a method that relates market prices to product characteristics. The hedonic regression provides a measure of what the market will bear. Contrast this with conjoint analysis, which measures only the demand side: what consumers are willing to pay. Closely related to the Value Equivalence Line. Prediction problem: predict what the market will bear for a product that doesn't necessarily exist, e.g. value a property with no market transaction.
Problem:
- Predict market price based on observed characteristics. For example, we are considering introducing a new model with a 58-inch diagonal. We'd like to know what the market will bear for a TV of average quality.
Solution:
- Look at existing products for which we know the price and some observed characteristics.
- Build a prediction rule that predicts price as a function of the observed characteristics.
b. Flat-Panel TV Data
What characteristics do we use? We must select factors useful for prediction, and we have to develop a specific quantitative measure of each factor (a variable).
- Many factors or variables affect the price of a flat-panel TV:
  - Screen size
  - Technology (plasma/LCD/LED)
  - Brand name (?)
- It is easy to quantify price and size, but what about other variables such as aesthetics?
b. Flat-Panel TV Data
Another way to think of the data is as a scatter plot. In other words, view the data as a collection of points (Xi, Yi) in the plane.
[Scatter plot: Price ($500 to $4000) vs. Size (35 to 65 inches) for the flat-panel TV data]
c. Linear Prediction
The R command c() puts two or more numbers into a list called a vector. Here the list holds the intercept and the slope!
[Scatter plot: Price ($) vs. Size (inches), with a line overlaid using an intercept and slope supplied via c()]
c. Linear Prediction
Let Y denote the dependent variable and X the independent or predictor variable. Recall that the equation of a line is: Y = b0 + b1X, where b0 is the intercept and b1 is the slope.
- The intercept is in the units of Y ($)
- The slope is in units of Y per unit of X ($/inch)
c. Linear Prediction
[Diagram: the line Y = b0 + b1X; the intercept b0 is the height at X = 0, and the slope b1 is the rise in Y per one-unit increase in X]
c. Linear Prediction
We can now predict the price of a flat-panel TV when we know only the size. We simply read the value off the line that we drew. For example, given a flat-panel TV with size = 58 inches:
Predicted price = -1400 + 55(58) = $1790
NOTE: the unit conversion from inches to dollars is done for us by the slope coefficient (b1)
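The prediction rule above can be sketched in a few lines of code (Python here for illustration; the intercept -1400 and slope 55 are the illustrative values from the slide, not a definitive fit):

```python
def predict_price(size_inches, b0=-1400.0, b1=55.0):
    """Predict flat-panel TV price ($) from screen size (inches).

    The slope b1 does the unit conversion: each inch adds $55.
    """
    return b0 + b1 * size_inches

print(predict_price(58))  # -1400 + 55*58 = 1790.0
```

Reading a price off the line is just one multiplication and one addition, which is why linear prediction rules are so convenient.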
c. Linear Fitting & Prediction in R
We can fit a line to the data using the R function lm(Y ~ X), where Y is the dependent variable and X is the independent variable.
[Scatter plot: Price vs. Size comparing the least squares line from R with the earlier eyeball line]
c. Linear Fitting & Prediction in R
Let's look at the R prediction results when size = 58:
[Scatter plot: Price vs. Size with the fitted line and the prediction at size = 58 marked]
d. Least Squares
A strategy for estimating the slope and intercept parameters.
Data: we observe the data recorded as pairs (Xi, Yi), i = 1, ..., N.
Problem: choose a fitted line, i.e., choose (b0, b1).
A reasonable way to fit a line is to minimize the amount by which the fitted value differs from the actual value. This difference is called the residual.
d. Least Squares: Fitted Values & Residuals
What does "fitted value" mean?
[Plot: observed points (Xi, Yi) scattered around the fitted line; the height of the line at Xi is the fitted value Ŷi]
The dots are the observed values and the line represents our fitted values, given by: Ŷi = b0 + b1Xi
d. Least Squares: Fitted Values & Residuals
What about the residual for the ith observation?
[Plot: the residual is the vertical distance from the observed point Yi to the fitted value Ŷi on the line]
ei = Yi − Ŷi = residual
[Plot: points above the line have positive residuals, points below have negative residuals; the lengths of the red and green segments are the residuals]
The fitted value and the residual decompose the observation Yi into two parts:
Yi = Ŷi + (Yi − Ŷi) = Ŷi + ei
d. The Least Squares Criterion
Ideally we want to minimize the size of all the residuals. We must therefore trade off moving closer to some points against moving away from others. The line-fitting process:
i. Compute the residuals
ii. Square the residuals and add them up
iii. Pick the line (intercept and slope) that minimizes the sum of squared residuals, Σ ei², i = 1, ..., N
b1 = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)²
b0 = Ȳ − b1X̄
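As a sanity check on the least squares formulas for the slope and intercept, here is a minimal sketch in Python (the four-point dataset is made up so that the true line y = 3 + 2x is recovered exactly):

```python
def least_squares(xs, ys):
    """Least squares fit via the textbook formulas:
    b1 = sum((Xi - Xbar)(Yi - Ybar)) / sum((Xi - Xbar)^2),  b0 = Ybar - b1*Xbar
    """
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1

# made-up data lying exactly on y = 3 + 2x, so the fit recovers it
b0, b1 = least_squares([1, 2, 3, 4], [5, 7, 9, 11])
print(b0, b1)  # 3.0 2.0
```

With real data the points will not sit exactly on a line, but the same two formulas still produce the best-fitting slope and intercept.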
d. The Least Squares Criterion
Let's put the data into Excel:
The equations for b0 and b1 show us how to process the data (X1, Y1), (X2, Y2), ..., (XN, YN) to generate guesses for the intercept and slope. The formulas use sample quantities that we are familiar with and have used in the past to calculate sample covariances (numerator) and sample variances (denominator). Can we develop an intuition for the formulas that is based on prediction?
Ŷ − Ȳ = b1(X − X̄)
If X is above the mean, then b1 tells us how much to scale this deviation above the mean to produce a forecast of Y relative to the mean of Y.
e. Intuition Behind the Least Squares Formula
We can think of least squares as a two-part process:
i. Plot the point of means
ii. Find the line rotating through that point which has the smallest sum of squared residuals
There are many possible lines that pass through the point of means. The least squares approach says that one of them is best. Can we understand this formula by applying a fundamental intuition derived from prediction?
rxy = sXY / (sX sY)
Remember: the correlation coefficient is a unitless measure of the linear association between two variables.
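To see that r is unitless, here is a quick Python sketch (the size/price numbers are made up): rescaling X, say from inches to centimeters, leaves the correlation unchanged because the scale factor cancels between sXY and sX.

```python
def corr(xs, ys):
    """Sample correlation rxy = sXY / (sX * sY)."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / (n - 1)
    sx = (sum((x - xbar) ** 2 for x in xs) / (n - 1)) ** 0.5
    sy = (sum((y - ybar) ** 2 for y in ys) / (n - 1)) ** 0.5
    return sxy / (sx * sy)

sizes = [35, 42, 50, 58, 65]           # inches (made-up data)
prices = [600, 900, 1500, 1900, 2400]  # dollars (made-up data)
r1 = corr(sizes, prices)
r2 = corr([s * 2.54 for s in sizes], prices)  # inches -> cm
print(abs(r1 - r2) < 1e-9)  # True: rescaling X leaves r unchanged
```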
[Four scatter plots illustrating correlations of r = 0, r = .5, r = .75, and r = 1]
e. Intuition Behind the Least Squares Formula What is the intuition for the LS formula for the slope, b1? Does there seem to be any relationship between e and Size in the flat-panel TV dataset?
[Scatter plot: residuals e vs. Size, showing no apparent relationship]
e. Intuition Behind the Least Squares Formula
Does it make sense that e and Size should not be correlated? YES! Suppose we decide to ignore the least squares fit and make a bad choice for b1 (the green line):
[Scatter plot: Price vs. Size with the bad (green) fitted line overlaid]
e. Intuition Behind the Least Squares Formula
Let's look at the relationship between e and Size for the bad fit:
We tend to underestimate (+e) the value of TVs smaller than the average size and overestimate (−e) the value of flat-panel TVs larger than average.
[Scatter plot: residuals from the bad fit vs. Size, showing a clear relationship between e and Size]
e. Intuition Behind the Least Squares Formula
As long as the correlation between e and X is non-zero, we could always adjust our prediction rule to do better. That is, we need to exploit all of the predictive power in the X values and put it into Ŷ, leaving no "X-ness" in the residuals. In summary, the fitted and residual values decompose each observation as:
Y = Ŷ + e
where Ŷ is made from X, with corr(Ŷ, X) = 1, and e is unrelated to X, with corr(e, X) = 0.
e. Intuition Behind the Least Squares Formula
The fundamental properties of optimal prediction:
- The optimal prediction of Y for an average or representative X should be the average or representative value of Y.
- Prediction errors should be unrelated to the information used to formulate the predictions.
For linear models:
- If X = X̄, then Ŷ = Ȳ
- corr(Y − Ŷ, X) = corr(e, X) = 0
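These properties can be verified numerically. A minimal Python sketch with made-up data: after a least squares fit, the residuals sum to zero and are uncorrelated with X.

```python
xs = [1.0, 2.0, 3.0, 4.0, 5.0]      # made-up data
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

# least squares fit via the slope/intercept formulas
n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n
b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / \
     sum((x - xbar) ** 2 for x in xs)
b0 = ybar - b1 * xbar

# residuals e = Y - Yhat
e = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

print(abs(sum(e)) < 1e-9)                                     # True: residuals average to zero
print(abs(sum((x - xbar) * ei for x, ei in zip(xs, e))) < 1e-9)  # True: cov(e, X) = 0
```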
f. The Relationship Between b and r
b1 can be written as the ratio of the sample covariance to the sample variance of X:
b1 = sXY / sX²
This is close to, but not the same as, the sample correlation. Recall that b1 must convert the units of X into the units of Y and is expressed in units of Y per unit of X, whereas the sample correlation coefficient is unitless. The relationship is given by:
b1 = sXY / sX² = rxy · (sY / sX)
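The identity b1 = rxy · (sY / sX) can be checked numerically; here is a Python sketch with made-up size/price data:

```python
xs = [35, 42, 50, 58, 65]             # inches (made-up data)
ys = [600, 900, 1500, 1900, 2400]     # dollars (made-up data)

n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / (n - 1)  # sample covariance
sx2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)                      # sample variance of X
sy2 = sum((y - ybar) ** 2 for y in ys) / (n - 1)                      # sample variance of Y

b1 = sxy / sx2                          # slope as covariance / variance
r = sxy / (sx2 ** 0.5 * sy2 ** 0.5)     # sample correlation

# rescaling the unitless r by sy/sx recovers the slope in $/inch
print(abs(b1 - r * (sy2 ** 0.5 / sx2 ** 0.5)) < 1e-9)  # True
```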
f. The Relationship between b and r
Returning to the flat-panel TV price data, we can use commands in R to compute the least squares estimate of the slope using only simple statistics:
[R output: the sample covariance sxy and the sample variance sx², whose ratio gives the slope estimate]
f. The Relationship between b and r
Now let's use the relationship between the regression coefficient and the sample correlation to compute the slope estimate:
[R output: the slope computed as rxy · (sy / sx)]
g. Decomposing the Variance
The least squares residuals satisfy:
- Σ ei = 0 (which implies ē = 0 as well!)
- corr(e, X) = 0
- corr(Ŷ, e) = 0
We can now use these facts to decompose the total variance of Y:
Var(Y) = Var(Ŷ) + Var(e) + 2cov(Ŷ, e) = Var(Ŷ) + Var(e)
What is true for the average square (the variance) is also true for the total sum of squares:
Σ(Yi − Ȳ)² = Σ(Ŷi − Ȳ)² + Σ ei²
Total SS (SST) = Regression SS (SSR) + Error SS (SSE)
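The decomposition SST = SSR + SSE can be verified with a short Python sketch (made-up data):

```python
xs = [35, 42, 50, 58, 65]             # made-up size/price data
ys = [600, 900, 1500, 1900, 2400]

# least squares fit
n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n
b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / \
     sum((x - xbar) ** 2 for x in xs)
b0 = ybar - b1 * xbar
yhat = [b0 + b1 * x for x in xs]

sst = sum((y - ybar) ** 2 for y in ys)                    # total SS
ssr = sum((yh - ybar) ** 2 for yh in yhat)                # regression SS
sse = sum((y - yh) ** 2 for y, yh in zip(ys, yhat))       # error SS

print(abs(sst - (ssr + ssr * 0 + sse)) < 1e-4)  # True: SST = SSR + SSE
```

The cross term 2·Σ(Ŷi − Ȳ)ei vanishes because the residuals are uncorrelated with the fitted values, which is exactly why the sum splits cleanly into two parts.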
g. Decomposing the Variance: The ANOVA Table
Let's find these quantities on the R printout:
[R anova() output with the SSR and SSE entries highlighted]
R² = SSR/SST = 1 − SSE/SST. Two interpretations:
i. The percentage of the variation in Y explained by X
ii. The square of the correlation (hence "R squared")
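Both interpretations agree numerically: SSR/SST equals the squared sample correlation. A Python sketch with made-up data:

```python
xs = [35, 42, 50, 58, 65]             # made-up size/price data
ys = [600, 900, 1500, 1900, 2400]

n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
sxx = sum((x - xbar) ** 2 for x in xs)
syy = sum((y - ybar) ** 2 for y in ys)

# fitted values from the least squares line
b1 = sxy / sxx
b0 = ybar - b1 * xbar
yhat = [b0 + b1 * x for x in xs]

r2_var = sum((yh - ybar) ** 2 for yh in yhat) / syy   # SSR / SST
r2_corr = (sxy / (sxx * syy) ** 0.5) ** 2             # squared correlation

print(abs(r2_var - r2_corr) < 1e-9)  # True: the two definitions agree
```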
g. Misuse of R²
R squared is often misused. Some analysts establish arbitrary benchmarks for "high" values; for example, most regard values over .8 as high. High values of R² are often associated with claims that the model is:
1. adequate, valid, or correctly specified
2. highly accurate for prediction
Unfortunately, neither claim is correct. More later!
Appendix: Derivation of the LS Slope
We have satisfied ourselves that corr(e, X) = 0. Now let us use this intuition to derive the least squares formula for b1.
Note that corr(e, X) = 0 is equivalent to cov(e, X) = 0 (why?):
Σ(Xi − X̄)(Yi − b0 − b1Xi) = 0
Substituting b0 = Ȳ − b1X̄:
Σ(Xi − X̄)(Yi − Ȳ − b1(Xi − X̄)) = 0
or
Σ(Xi − X̄)(Yi − Ȳ) = b1 Σ(Xi − X̄)²
so that
b1 = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)²
Glossary of Symbols
- X: independent or explanatory variable
- Y: dependent variable
- b0: least squares estimate of the intercept
- b1: least squares estimate of the regression slope
- ei: least squares residual
- Ŷ: fitted value
- s: sample standard deviation (subscript tells you of which variable)
- s²: sample variance (subscript tells you of which variable)
- r: sample correlation coefficient
- SST: total sum of squares
- SSR: regression sum of squares
- SSE: error sum of squares
- R²: coefficient of determination, a goodness-of-fit measure
Important Equations
b1 = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)², b0 = Ȳ − b1X̄
Ŷ − Ȳ = b1(X − X̄) (the least squares line passes through the point of means)
b1 = sxy / sx² = rxy · (sy / sx) (relationship between the slope coefficient and the correlation)
R² = SSR/SST = 1 − SSE/SST (alternative definitions of R-squared)
Glossary of R commands
- abline(c(intercept,slope)): adds a line through the current plot with the given intercept and slope.
- attach(A): the data frame A is attached to the R search path, so objects in A can be accessed simply by giving their names.
- anova(): computes analysis of variance (or deviance) tables for one or more fitted model objects.
- c(): combines values into a vector or list.
- cov(x,y), cor(x,y): compute the covariance or correlation of variables x and y.
- data(A): allows access to the data frame A in the data library.
- head(A): returns the first parts of a data frame A.
- library(PERregress): loads the R package containing datasets and customized functions for our class.
- lm(Y~X): fits a linear model; Y is the dependent variable and X the independent variable.
Glossary of R commands (continued)
- plot(X,Y): plots X against Y.
- predict(): predictions from the results of various model-fitting functions.
- A=read.csv(file_name): reads a file in .csv format and creates data frame A.
- str(object): shows the structure of an object, e.g. str(dataframe).
- summary(): generic function used to produce summaries of the results of various model-fitting functions.
- var(X), mean(X), sd(X): compute the variance, mean, or standard deviation of a variable named X.