
Introduction to Rossi's MGMT 264B Slides

"I have made this longer than usual, because I lack the time to make it short." - Pascal

I had the time! A lot of work has gone into simplifying and stripping the material down to its essence. This means that the slides will require careful reading. They are designed to be self-contained but efficient! I hope you enjoy reading them. Read them three times:
1. Before class (write questions in the margin)
2. During class
3. After class

Chapter I. Slide 1

Introduction to Rossi's MGMT 264B Slides


At the end of each chapter, there are symbol and R command glossaries, and a list of important equations.

These symbols are used throughout the slides:

- Note page available
- Cool move (but difficult)
- Requires further thought


Chapter I. Slide 2

I. Introduction to Regression and Least Squares
a. Conditional Prediction
b. Hedonic Pricing and Flat-Panel TVs
c. Linear Prediction
d. Least Squares
e. Intuition behind Least Squares
f. Relationship between b and r
g. Decomposing the Variance and R²

Chapter I. Slide 3

a. Conditional Prediction

The basic problem:

Available data → Formulate a model to predict a variable of interest → Use the prediction to make a business decision

Chapter I. Slide 4

a. Examples of Conditional Prediction

1. Pricing Product Characteristics (Hedonic Pricing):
- Predict market value for various product characteristics
- Decision: optimal product configuration

2. Optimal portfolio choice:
- Predict the future joint distribution of asset returns
- Decision: construct the optimal portfolio (choose weights)

3. Determination of promotional strategy:
- Predict the sales volume response to a price reduction
- Decision: what is the optimal promotional strategy?

Chapter I. Slide 5

b. Hedonic Regression

A hedonic regression is a method that relates market prices to product characteristics. The hedonic regression provides a measure of what the market will bear. Contrast this with conjoint analysis, which measures only the demand side: what consumers are willing to pay. It is closely related to the Value Equivalence Line.

Prediction problem: predict what the market will bear for a product that doesn't necessarily exist, e.g. value a property with no market transaction.

Chapter I. Slide 6

b. Example: Predicting Flat-Panel TV Prices

Problem:
- Predict the market price based on observed characteristics. For example, we are considering introducing a new model with a 58-inch diagonal. We'd like to know what the market will bear for a TV of average quality.

Solution:
- Look at existing products for which we know the price and some observed characteristics.
- Build a prediction rule that predicts price as a function of the observed characteristics.

Chapter I. Slide 7

b. Flat-Panel TV Data

What characteristics do we use? We must select factors useful for prediction, and we have to develop a specific quantitative measure of each factor (a variable).

Many factors or variables affect the price of a flat-panel TV:
- Screen size
- Technology (plasma/LCD/LED)
- Brand name (?)

It is easy to quantify price and size, but what about other variables such as aesthetics?

For simplicity, let's focus only on screen size.

Chapter I. Slide 8

b. R and course package, PERregress


Where to get R? http://cran.stat.ucla.edu/
R is a free package with versions for Windows, Mac OS, and Linux. Visit the CRAN site nearest you (see above, or Google "CRAN"), then download and install R.

Next you need to install the package made specifically for this course. This is easy to do. The package contains all of the datasets needed for the course as well as some customized functions. Start up R and select the Packages and Data menu item. Select a mirror site near you (such as UCLA) and then install the package PERregress.
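If you prefer typing commands to using the menus, a console equivalent is sketched below; this assumes PERregress can be installed from the repository you select (for a course-specific package the download location may differ).

  install.packages("PERregress")   # one-time install from the selected repository
  library(PERregress)              # load the package at the start of each R session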
Chapter I. Slide 9

b. Install course package, PERregress



R must be version 2.12.1 or higher!


Chapter I. Slide 10

b. Alternatively, install RStudio

See the Videos section of the CCLE web site.


Chapter I. Slide 11

b. Flat-Panel TV Data

R stores the data as a table or spreadsheet, called a data frame.
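A minimal sketch of loading and inspecting the data; the dataset name hdtv is hypothetical (the actual name in PERregress may differ), but the columns Price and Size are the ones used throughout this chapter.

  library(PERregress)   # course package containing the datasets
  data(hdtv)            # hypothetical name for the flat-panel TV data frame
  head(hdtv)            # first few rows: Price, Size, ...
  str(hdtv)             # variable names and types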

Chapter I. Slide 12

b. Flat-Panel TV Data

Another way to think of the data is as a scatter plot. In other words, view the data as a collection of points (Xi, Yi) in the plane.

[Scatter plot of Price ($500-$4000) against Size (35-65 inches)]

Chapter I. Slide 13

c. Linear Prediction

Let's plot a line fit by the eyeball method.

The R command c() puts two or more numbers into a list called a vector. Here the list contains the intercept and the slope.

[Scatter plot of Price against Size with the eyeball line overlaid]

There appears to be a linear relationship between price and size.
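A sketch of the plotting commands, using the assumed data frame and column names from the earlier sketch (the eyeball intercept of -1400 and slope of 55 are the values quoted two slides ahead):

  plot(hdtv$Size, hdtv$Price, xlab = "Size", ylab = "Price")   # scatter plot of the data
  abline(c(-1400, 55))                                          # eyeball line: intercept -1400, slope 55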

Chapter I. Slide 14

c. Linear Prediction

Let Y denote the dependent variable and X the independent or predictor variable. Recall that the equation of a line is:

Y = b0 + b1 X

where:
b0 is the intercept
b1 is the slope

- The intercept is in the units of Y ($)
- The slope is in units of Y per unit of X ($/inch)
Chapter I. Slide 15

c. Linear Prediction

[Diagram of the line Y = b0 + b1 X: b0 is the height where the line crosses the Y axis; b1 is the rise in Y for a one-unit increase in X]

Our eyeball line has b0 = -1400 and b1 = 55.

Chapter I. Slide 16

c. Linear Prediction

We can now predict the price of a flat-panel TV when we know only the size. We simply read the value off the line that we drew.

For example, given a flat-panel TV with a size of 58 inches:

Predicted price = -1400 + 55(58) = $1790
NOTE: the unit conversion from inches to dollars is done for us by the slope coefficient (b1)

Chapter I. Slide 17

c. Linear Fitting & Prediction in R

We can fit a line to the data using the R function lm().

[R output: lm() call with Price as the dependent variable and Size as the independent variable]
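A sketch of the fitting step, again assuming the data frame is named hdtv with columns Price and Size; the fit object created here is reused in later sketches.

  fit = lm(Price ~ Size, data = hdtv)   # least squares fit of Price on Size
  fit                                   # prints the estimated intercept and slope
  summary(fit)                          # fuller output: coefficients, R-squared, ...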

Chapter I. Slide 18

c. Linear Fitting & Prediction in R

R chooses a different line from ours. The R fit is:

Price = -1408.93 + 57.13 Size

[Scatter plot of Price against Size with both the R (least squares) line and the eyeball line overlaid]

Chapter I. Slide 19

c. Linear Fitting & Prediction in R

Let's look at the R prediction results when size = 58:

Predicted price = -1408.93 + 57.13 (58) = $1904.61

[Scatter plot of Price against Size with the fitted line and the prediction at Size = 58 marked]
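A sketch of getting the same prediction with predict(), using the hypothetical fit object and data frame from the earlier sketches:

  newtv = data.frame(Size = 58)     # characteristics of the proposed model
  predict(fit, newdata = newtv)     # predicted price, about $1905
  coef(fit)[1] + coef(fit)[2] * 58  # the same prediction done by hand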

Chapter I. Slide 20

c. Linear Fitting & Prediction in R

Natural questions to ask at this point:


- How does R select a best fitting line?
- Can we say anything about the accuracy of the predictions?
- What does all that other R output mean?

Chapter I. Slide 21

d. Least Squares

A strategy for estimating the slope and intercept parameters.

Data: We observe the data recorded as the pairs (Xi, Yi), i = 1, ..., N.
Problem: Choose a fitted line, i.e. choose (b0, b1).

A reasonable way to fit a line is to minimize the amount by which the fitted value differs from the actual value. This difference is called the residual.

Chapter I. Slide 22

d. Least Squares: Fitted Values & Residuals

What does "fitted value" mean?

[Diagram: scatter of points with the fitted line; at Xi, the observed value Yi and the fitted value Ŷi are marked]

The dots are the observed values and the line represents our fitted values, given by:

Ŷi = b0 + b1 Xi
Chapter I. Slide 23

d. Least Squares: Fitted Values & Residuals

What about the residual for the i-th observation?

[Diagram: the vertical distance at Xi between the observed value Yi and the fitted value Ŷi]

ei = Yi − Ŷi = Residual
Chapter I. Slide 24

d. Least Squares: Fitted Values & Residuals

[Diagram: points above the fitted line have positive residuals and points below have negative residuals; the lengths of the red and green vertical segments are the residuals]

The fitted value and the residual decompose the observation Yi into two parts:

Yi = Ŷi + (Yi − Ŷi) = Ŷi + ei
Chapter I. Slide 25

d. The Least Squares Criterion

Ideally we want to minimize the size of all residuals simultaneously. We must therefore trade off moving closer to some points against moving away from others.

The line fitting process:
i. Compute the residuals
ii. Square the residuals and add them up
iii. Pick the best fitting line (intercept and slope)

Least Squares: choose b0 and b1 to minimize

Σ ei²   (the sum of squared residuals, i = 1, ..., N)
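A small sketch of the criterion itself: computing the sum of squared residuals for any candidate intercept and slope (data frame and column names as assumed above):

  sse = function(b0, b1) sum((hdtv$Price - (b0 + b1 * hdtv$Size))^2)   # sum of squared residuals
  sse(-1400, 55)         # the eyeball line
  sse(-1408.93, 57.13)   # the least squares line: the smallest attainable value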
Chapter I. Slide 26

d. The Least Squares Criterion

Click on the chart in the slideshow to activate the applet.


Chapter I. Slide 27

d. The Least Squares Criterion


What are the formulas which do the job?

Least Squares Solution:

b1 = Σ (Xi − X̄)(Yi − Ȳ) / Σ (Xi − X̄)² = sXY / sX²

b0 = Ȳ − b1 X̄

where sXY is the sample covariance of (X, Y) and sX² is the sample variance of X.

Chapter I. Slide 28

d. The Least Squares Criterion

Let's put the data into Excel:

[Excel worksheet with a small example data set and the sums needed for the slope formula]

Now simply calculate 10.75/5 to determine the slope: b1 = 2.15.

Chapter I. Slide 29

e. Intuition Behind the Least Squares Formula

The equations for b0 and b1 show us how to process the data (X1, Y1), (X2, Y2), ..., (XN, YN) to generate estimates of the intercept and slope. The formulas use familiar sample quantities: the sample covariance (in the numerator) and the sample variance (in the denominator). Can we develop an intuition for the formulas that is based on prediction?

Chapter I. Slide 30

e. Intuition Behind the Least Squares Formula

The intercept:

The formula for b0 ensures that the fitted regression line passes through the point of means (X̄, Ȳ). If you put in the average value of X, the least squares prediction is the average value of Y. If we substitute for the intercept using the LS formula, we obtain:

Ŷ − Ȳ = b1 (X − X̄)

If X is above its mean, then b1 tells us how much to scale this deviation to produce a forecast of Y relative to the mean of Y.
Chapter I. Slide 31

e. Intuition Behind the Least Squares Formula

We can think of least squares as a two-part process:
i. Plot the point of means
ii. Rotate a line through that point until the sum of squared residuals is smallest

There are many possible lines that pass through the point of means. The least squares approach says that one of these lines is best. Can we understand this formula by applying a fundamental intuition derived from prediction?

Chapter I. Slide 32

Quick Review: Correlation


An intuition for the slope formula can be developed, but it is more complicated. Let's first review the basic concept of correlation. The sample correlation coefficient between two variables is defined as:

rXY = Σ (Xi − X̄)(Yi − Ȳ) / √[ Σ (Xi − X̄)² · Σ (Yi − Ȳ)² ] = sXY / (sX sY)

Remember: the correlation coefficient is a unitless measure of linear association between two variables.
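A quick R check of this definition, with the usual caveat that the data frame and column names are the ones assumed in the earlier sketches:

  cor(hdtv$Size, hdtv$Price)                                      # sample correlation
  cov(hdtv$Size, hdtv$Price) / (sd(hdtv$Size) * sd(hdtv$Price))   # same number, from the definition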

Chapter I. Slide 33

Quick Review: Correlation


Examples of samples with varying degrees of correlation:

[Four scatter plots showing samples with r = 0, r = 0.5, r = 0.75, and r = 1]
Chapter I. Slide 34

e. Intuition Behind the Least Squares Formula

What is the intuition behind the LS formula for the slope, b1? Does there seem to be any relationship between e and Size in the flat-panel TV dataset?

[Scatter plot of the least squares residuals e against Size: no apparent relationship]

Chapter I. Slide 35

e. Intuition Behind the Least Squares Formula

Does it make sense that e and Size should not be correlated? YES!

Suppose we decide to ignore the least squares fit and make a bad choice for b1 (the green line):

Our bad choice: Price = -2543 + 80 · Size
Least squares fit: Price = -1408.93 + 57.13 · Size

[Scatter plot of Price against Size with both lines overlaid]

Chapter I. Slide 36

e. Intuition Behind the Least Squares Formula

Let's look at the relationship between e and Size for the bad fit:

We tend to underestimate (+e) the value of TVs smaller than the average size and overestimate (−e) the value of flat-panel TVs larger than average.

[Scatter plot of the bad-fit residuals against Size, showing a clear systematic pattern]
Clearly, we have left something on the table in terms of prediction!
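A sketch of this check in R, comparing the residuals of the least squares fit with those of the bad line (data frame and fit object as assumed earlier; the bad intercept and slope are the ones quoted above):

  e_ls  = resid(fit)                              # least squares residuals
  e_bad = hdtv$Price - (-2543 + 80 * hdtv$Size)   # residuals from the bad line
  cor(hdtv$Size, e_ls)    # essentially zero: no predictive power left in Size
  cor(hdtv$Size, e_bad)   # clearly non-zero: the bad line leaves something on the table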


Chapter I. Slide 37

e. Intuition Behind the Least Squares Formula

As long as the correlation between e and X is non-zero, we could always adjust our prediction rule to do better. That is, we need to exploit all of the predictive power available in the X values and put it into Ŷ, leaving no "X-ness" in the residuals.

In summary, each observation is decomposed using the fitted and residual values:

Y = Ŷ + e

Ŷ is made from X: corr(Ŷ, X) = 1
e is unrelated to X: corr(e, X) = 0
Chapter I. Slide 38

e. Intuition Behind the Least Squares Formula

The Fundamental Properties of Optimal Prediction:
- The optimal prediction of Y for an average or representative X should be the average or representative value of Y
- Prediction errors should be unrelated to the information used to formulate the predictions

For linear models:
- If X = X̄, then Ŷ = Ȳ
- corr(Y − Ŷ, X) = corr(e, X) = 0
Chapter I. Slide 39

f. The Relationship Between b and r

b1 can be written as the ratio of the sample covariance to the sample variance of X:

b1 = sXY / sX²

This is close to, but not the same as, the sample correlation. Recall that b1 must convert the units of X into the units of Y and is expressed in units of Y per unit of X, whereas the sample correlation coefficient is unitless. The relationship is given by:

b1 = sXY / sX² = rXY · (sY / sX)
Chapter I. Slide 40

f. The Relationship between b and r

Returning to the flat-panel TV price data, we can use commands in R to compute the least squares estimate of the slope using only simple statistics.

[R output: the sample covariance sXY, the sample variance sX², and their ratio]
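A sketch of that computation, with the same assumed data frame and column names as before:

  b1 = cov(hdtv$Size, hdtv$Price) / var(hdtv$Size)   # slope = sXY / sX²
  b0 = mean(hdtv$Price) - b1 * mean(hdtv$Size)       # intercept from the point of means
  c(b0, b1)                                          # should match the lm() coefficients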

Chapter I. Slide 41

f. The Relationship between b and r

Now let's use the relationship between the regression coefficient and the sample correlation to compute the slope estimate.

[R output: the sample correlation rXY scaled by sY/sX]
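The corresponding sketch via the correlation (same assumptions about names):

  cor(hdtv$Size, hdtv$Price) * sd(hdtv$Price) / sd(hdtv$Size)   # rXY * (sY / sX) = b1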

Chapter I. Slide 42

g. Decomposing the Variance: The ANOVA Table

We now know that:
- Σ ei = 0 (which implies ē = 0 as well)
- corr(e, X) = 0
- corr(Ŷ, e) = 0

We can now use these facts to decompose the total variance of Y:

Var(Y) = Var(Ŷ) + Var(e) + 2 cov(Ŷ, e)

or, since cov(Ŷ, e) = 0,

Var(Y) = Var(Ŷ) + Var(e)

What is true for the average square (the variance) is also true for the sums of squares:

Σ (Yi − Ȳ)²  =  Σ (Ŷi − Ȳ)²  +  Σ ei²

Total Sum of Squares (SST) = Regression SS (SSR) + Error SS (SSE)

Chapter I. Slide 43

g. Decomposing the Variance: The ANOVA Table

Let's find this on the R printout.

[R anova() output with the Regression SS (SSR) and Error SS (SSE) highlighted]
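A sketch of producing that printout, using the fit object assumed in the earlier sketches:

  anova(fit)                    # ANOVA table: the Size row gives SSR, the Residuals row gives SSE
  sum(anova(fit)[, "Sum Sq"])   # adding the two sums of squares recovers SST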

Chapter I. Slide 44

g. A Goodness of Fit Measure: R²

We have a good fit if:
- SSR is large
- SSE is small
- The fit would be perfect if SST = SSR

To summarize how close SSR is to SST, we define the coefficient of determination:

R² = SSR / SST = 1 − SSE / SST

Two interpretations:
i. The percentage of variation in Y explained by X
ii. The square of the correlation between X and Y (hence "R squared")
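A sketch verifying both interpretations in R (same assumed objects as before):

  ss = anova(fit)[, "Sum Sq"]    # c(SSR, SSE) for the simple regression
  ss[1] / sum(ss)                # R-squared computed as SSR / SST
  summary(fit)$r.squared         # R-squared as reported by R
  cor(hdtv$Size, hdtv$Price)^2   # the squared sample correlation: the same number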
Chapter I. Slide 45

g. A Goodness of Fit Measure: R²

R² on the R printout:

[R summary() output with the R-squared value highlighted]

Chapter I. Slide 46

g. Misuse of R²

R² is often misused. Some establish arbitrary benchmarks for "high" values; for example, most regard values over .8 as high. High values of R² are often associated with claims that the model is:
1. adequate, correctly specified, or valid
2. highly accurate for prediction

Unfortunately, neither claim is correct. More later!

Chapter I. Slide 47

Appendix: Derivation of LS Slope

We have satisfied ourselves that corr(e, X) = 0. Now let us use this intuition to derive the least squares formula for b1.

Verification: realize that corr(e, X) = 0 is equivalent to cov(e, X) = 0 (why?)

cov(e, X) = cov(X, e) = 1/(N − 1) Σ (Xi − X̄) ei

Substitute in for e, using ei = Yi − b0 − b1 Xi and b0 = Ȳ − b1 X̄:

Σ (Xi − X̄)(Yi − b0 − b1 Xi) = Σ (Xi − X̄)(Yi − [Ȳ − b1 X̄] − b1 Xi) = 0

Chapter I. Slide 48

Appendix: Derivation of LS Slope


Verification (continued):

Σ (Xi − X̄)(Yi − [Ȳ − b1 X̄] − b1 Xi) = 0

or

Σ (Xi − X̄)(Yi − Ȳ − b1 (Xi − X̄)) = 0

or

Σ [(Xi − X̄)(Yi − Ȳ) − b1 (Xi − X̄)²] = 0

Solving for b1:

b1 = Σ (Xi − X̄)(Yi − Ȳ) / Σ (Xi − X̄)²
Chapter I. Slide 49

Glossary of Symbols
"X= independent or explanatory variable Y= dependent variable "" b0 - least squares estimate of intercept b1 - least squares estimate of regression slope ei - least squares residual " - fitted value s - sample standard dev (subscript tells you of what var) s2 - sample variance (subscript tells you of what var) r - sample correlation coefficient SST - total sum of squares SSR - regression sum of squares SSE - error sum of squares R2 - coefficient of determination, goodness of fit measure

Chapter I. Slide 50

Important Equations "i = b0 + b1Xi ei = Yi "i = Residual


fitted value residual

b1

" =

i=1

(Xi ! X)(Yi ! Y) s XY = 2 N 2 sX (X ! X) " i=1 i

least squares formulae for slope and intercept

b0 = Y ! b1 X

Chapter I. Slide 51

Important Equations

Ŷ − Ȳ = b1 (X − X̄)                  (the least squares line passes through the point of means)

b1 = sXY / sX² = rXY · (sY / sX)     (relationship between the slope coefficient and the correlation)

R² = SSR / SST = 1 − SSE / SST       (alternative definitions of R-squared)

Chapter I. Slide 52

Glossary of R commands
- abline(c(intercept, slope)): Adds a line with the given intercept and slope to the current plot.
- attach(A): Attaches the data frame A to the R search path, so objects in A can be accessed simply by giving their names.
- anova(): Computes analysis of variance (or deviance) tables for one or more fitted model objects.
- c(): Combines values into a vector or list.
- cov(x,y), cor(x,y): Compute the covariance or correlation of variables x and y.
- data(A): Allows access to data frame A in the data library.
- head(A): Returns the first rows of a data frame A.
- library(PERregress): Loads the R package containing the datasets and customized functions for our class.
- lm(Y~X): Fits a linear model. Y is the dependent variable and X is the independent variable.
Chapter I. Slide 53

Glossary of R commands

- plot(X,Y): Plots Y against X.
- predict(): Predictions from the results of various model fitting functions.
- A=read.csv(file_name): Reads a file in .csv format and creates data frame A.
- str(object): Shows the structure of an object, e.g. str(dataframe).
- summary(): Generic function used to produce summaries of the results of various model fitting functions.
- var(X), mean(X), sd(X): Compute the variance, mean, and standard deviation of a variable named X.

Chapter I. Slide 54
