"I have made this longer than usual, because I lack the time to make it short." - Pascal. I had the time! A lot of work has gone into simplifying these slides and stripping them down to the essence. This means that the slides will require careful reading. They are designed to be self-contained but efficient! I hope you enjoy reading them. Read them three times:
1. Before class (write questions in the margin)
2. During class
3. After class
Chapter I. Slide 1
I. Introduction to Regression and Least Squares
a. Conditional Prediction
b. Hedonic Pricing and Flat-Panel TVs
c. Linear Prediction
d. Least Squares
e. Intuition behind Least Squares
f. Relationship between b and r
g. Decomposing the Variance and R2
a. Conditional Prediction
Available data
b. Hedonic Regression
A hedonic regression is a method that relates market prices to product characteristics. The hedonic regression provides a measure of what the market will bear. Contrast this with conjoint analysis, which measures only the demand side: what consumers are willing to pay. Closely related to the Value Equivalence Line. Prediction problem: predict what the market will bear for a product that doesn't necessarily exist, e.g. value a property with no market transaction.
Problem:
- Predict market price based on observed characteristics. For example, we are considering introducing a new model with a 58-inch diagonal. We'd like to know what the market will bear for a TV of average quality.
Solution:
- Look at existing products for which we know the price and some observed characteristics.
- Build a prediction rule that predicts price as a function of the observed characteristics.
b. Flat-Panel TV Data
What characteristics do we use? We must select factors useful for prediction, and we have to develop a specific quantitative measure of each factor (a variable).
- Many factors or variables affect the price of a flat-panel TV:
  - Screen size
  - Technology (plasma/LCD/LED)
  - Brand name (?)
- It is easy to quantify price and size, but what about other variables such as aesthetics?
b. Flat-Panel TV Data
Another way to think of the data is as a scatter plot. In other words, view the data as a collection of points (Xi, Yi) in the plane.
[Scatter plot: Price ($500 to $4000) vs. Size (35 to 65 inches) for the flat-panel TV data]
c. Linear Prediction
The R command c() puts two or more numbers into a list called a vector. Here the list holds the intercept and the slope!
[Scatter plot: Price ($) vs. Size (inches), with a line overlaid using an intercept and slope supplied via c()]
c. Linear Prediction
Let Y denote the dependent variable and X the independent or predictor variable. Recall that the equation of a line is: Y = b0 + b1X, where b0 is the intercept and b1 is the slope.
- The intercept is in the units of Y ($)
- The slope is in units of Y per unit of X ($/inch)
c. Linear Prediction
[Diagram: the line Y = b0 + b1X; the intercept b0 is the height at X = 0, and the slope b1 is the rise in Y per one-unit increase in X]
c. Linear Prediction
We can now predict the price of a flat-panel TV when we know only the size. We simply read the value off the line that we drew. For example, given a flat-panel TV with size = 58 inches:
Predicted price = -1400 + 55(58) = $1790
NOTE: the unit conversion from inches to dollars is done for us by the slope coefficient (b1)
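The prediction rule above can be sketched in a few lines of code (Python here for illustration; the intercept -1400 and slope 55 are the illustrative values from the slide, not a definitive fit):

```python
def predict_price(size_inches, b0=-1400.0, b1=55.0):
    """Predict flat-panel TV price ($) from screen size (inches).

    The slope b1 does the unit conversion: each inch adds $55.
    """
    return b0 + b1 * size_inches

print(predict_price(58))  # -1400 + 55*58 = 1790.0
```

Reading a price off the line is just one multiplication and one addition, which is why linear prediction rules are so convenient.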
c. Linear Fitting & Prediction in R
We can fit a line to the data using the R function lm(Y ~ X), where Y is the dependent variable and X is the independent variable.
[Scatter plot: Price vs. Size comparing the least squares line from R with the earlier eyeball line]
c. Linear Fitting & Prediction in R
Let's look at the R prediction results when size = 58:
[Scatter plot: Price vs. Size with the fitted line and the prediction at size = 58 marked]
d. Least Squares
A strategy for estimating the slope and intercept parameters.
Data: we observe the data recorded as pairs (Xi, Yi), i = 1, ..., N.
Problem: choose a fitted line, i.e., choose (b0, b1).
A reasonable way to fit a line is to minimize the amount by which the fitted value differs from the actual value. This difference is called the residual.
d. Least Squares: Fitted Values & Residuals
What does "fitted value" mean?
[Plot: observed points (Xi, Yi) scattered around the fitted line; the height of the line at Xi is the fitted value Ŷi]
The dots are the observed values and the line represents our fitted values, given by: Ŷi = b0 + b1Xi
d. Least Squares: Fitted Values & Residuals
What about the residual for the ith observation?
[Plot: the residual is the vertical distance from the observed point Yi to the fitted value Ŷi on the line]
ei = Yi − Ŷi = residual
[Plot: points above the line have positive residuals, points below have negative residuals; the lengths of the red and green segments are the residuals]
The fitted value and the residual decompose the observation Yi into two parts:
Yi = Ŷi + (Yi − Ŷi) = Ŷi + ei
d. The Least Squares Criterion
Ideally we want to minimize the size of all the residuals. We must therefore trade off moving closer to some points against moving away from others. The line-fitting process:
i. Compute the residuals
ii. Square the residuals and add them up
iii. Pick the line (intercept and slope) that minimizes the sum of squared residuals, Σ ei², i = 1, ..., N
b1 = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)²
b0 = Ȳ − b1X̄
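As a sanity check on the least squares formulas for the slope and intercept, here is a minimal sketch in Python (the four-point dataset is made up so that the true line y = 3 + 2x is recovered exactly):

```python
def least_squares(xs, ys):
    """Least squares fit via the textbook formulas:
    b1 = sum((Xi - Xbar)(Yi - Ybar)) / sum((Xi - Xbar)^2),  b0 = Ybar - b1*Xbar
    """
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1

# made-up data lying exactly on y = 3 + 2x, so the fit recovers it
b0, b1 = least_squares([1, 2, 3, 4], [5, 7, 9, 11])
print(b0, b1)  # 3.0 2.0
```

With real data the points will not sit exactly on a line, but the same two formulas still produce the best-fitting slope and intercept.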
d. The Least Squares Criterion
Let's put the data into Excel:
The equations for b0 and b1 show us how to process the data (X1, Y1), (X2, Y2), ..., (XN, YN) to generate guesses for the intercept and slope. The formulas use sample quantities that we are familiar with and have used in the past to calculate sample covariances (numerator) and sample variances (denominator). Can we develop an intuition for the formulas that is based on prediction?
Ŷ − Ȳ = b1(X − X̄)
If X is above the mean, then b1 tells us how much to scale this deviation above the mean to produce a forecast of Y relative to the mean of Y.
e. Intuition Behind the Least Squares Formula
We can think of least squares as a two-part process:
i. Plot the point of means
ii. Find the line rotating through that point which has the smallest sum of squared residuals
There are many possible lines that pass through the point of means. The least squares approach says that one of them is best. Can we understand this formula by applying a fundamental intuition derived from prediction?
rxy = sXY / (sX sY)
Remember: the correlation coefficient is a unitless measure of the linear association between two variables.
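To see that r is unitless, here is a quick Python sketch (the size/price numbers are made up): rescaling X, say from inches to centimeters, leaves the correlation unchanged because the scale factor cancels between sXY and sX.

```python
def corr(xs, ys):
    """Sample correlation rxy = sXY / (sX * sY)."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / (n - 1)
    sx = (sum((x - xbar) ** 2 for x in xs) / (n - 1)) ** 0.5
    sy = (sum((y - ybar) ** 2 for y in ys) / (n - 1)) ** 0.5
    return sxy / (sx * sy)

sizes = [35, 42, 50, 58, 65]           # inches (made-up data)
prices = [600, 900, 1500, 1900, 2400]  # dollars (made-up data)
r1 = corr(sizes, prices)
r2 = corr([s * 2.54 for s in sizes], prices)  # inches -> cm
print(abs(r1 - r2) < 1e-9)  # True: rescaling X leaves r unchanged
```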
[Four scatter plots illustrating correlations of r = 0, r = .5, r = .75, and r = 1]
e. Intuition Behind the Least Squares Formula What is the intuition for the LS formula for the slope, b1? Does there seem to be any relationship between e and Size in the flat-panel TV dataset?
[Scatter plot: residuals e vs. Size, showing no apparent relationship]
e. Intuition Behind the Least Squares Formula
Does it make sense that e and Size should not be correlated? YES! Suppose we decide to ignore the least squares fit and make a bad choice for b1 (the green line):
[Scatter plot: Price vs. Size with the bad (green) fitted line overlaid]
e. Intuition Behind the Least Squares Formula
Let's look at the relationship between e and Size for the bad fit:
We tend to underestimate (+e) the value of TVs smaller than the average size and overestimate (−e) the value of flat-panel TVs larger than average.
[Scatter plot: residuals from the bad fit vs. Size, showing a clear relationship between e and Size]
e. Intuition Behind the Least Squares Formula
As long as the correlation between e and X is non-zero, we could always adjust our prediction rule to do better. That is, we need to exploit all of the predictive power in the X values and put it into Ŷ, leaving no "X-ness" in the residuals. In summary, the fitted and residual values decompose each observation as:
Y = Ŷ + e
where Ŷ is made from X, with corr(Ŷ, X) = 1, and e is unrelated to X, with corr(e, X) = 0.
e. Intuition Behind the Least Squares Formula
The fundamental properties of optimal prediction:
- The optimal prediction of Y for an average or representative X should be the average or representative value of Y.
- Prediction errors should be unrelated to the information used to formulate the predictions.
For linear models:
- If X = X̄, then Ŷ = Ȳ
- corr(Y − Ŷ, X) = corr(e, X) = 0
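These properties can be verified numerically. A minimal Python sketch with made-up data: after a least squares fit, the residuals sum to zero and are uncorrelated with X.

```python
xs = [1.0, 2.0, 3.0, 4.0, 5.0]      # made-up data
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

# least squares fit via the slope/intercept formulas
n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n
b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / \
     sum((x - xbar) ** 2 for x in xs)
b0 = ybar - b1 * xbar

# residuals e = Y - Yhat
e = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

print(abs(sum(e)) < 1e-9)                                     # True: residuals average to zero
print(abs(sum((x - xbar) * ei for x, ei in zip(xs, e))) < 1e-9)  # True: cov(e, X) = 0
```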
f. The Relationship Between b and r
b1 can be written as the ratio of the sample covariance to the sample variance of X:
b1 = sXY / sX²
This is close to, but not the same as, the sample correlation. Recall that b1 must convert the units of X into the units of Y and is expressed in units of Y per unit of X, whereas the sample correlation coefficient is unitless. The relationship is given by:
b1 = sXY / sX² = rxy · (sY / sX)
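The identity b1 = rxy · (sY / sX) can be checked numerically; here is a Python sketch with made-up size/price data:

```python
xs = [35, 42, 50, 58, 65]             # inches (made-up data)
ys = [600, 900, 1500, 1900, 2400]     # dollars (made-up data)

n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / (n - 1)  # sample covariance
sx2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)                      # sample variance of X
sy2 = sum((y - ybar) ** 2 for y in ys) / (n - 1)                      # sample variance of Y

b1 = sxy / sx2                          # slope as covariance / variance
r = sxy / (sx2 ** 0.5 * sy2 ** 0.5)     # sample correlation

# rescaling the unitless r by sy/sx recovers the slope in $/inch
print(abs(b1 - r * (sy2 ** 0.5 / sx2 ** 0.5)) < 1e-9)  # True
```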
f. The Relationship between b and r
Returning to the flat-panel TV price data, we can use commands in R to compute the least squares estimate of the slope using only simple statistics:
[R output: the sample covariance sxy and the sample variance sx², whose ratio gives the slope estimate]
f. The Relationship between b and r
Now let's use the relationship between the regression coefficient and the sample correlation to compute the slope estimate:
[R output: the slope computed as rxy · (sy / sx)]
g. Decomposing the Variance
The least squares residuals satisfy:
- Σ ei = 0 (which implies ē = 0 as well!)
- corr(e, X) = 0
- corr(Ŷ, e) = 0
We can now use these facts to decompose the total variance of Y:
Var(Y) = Var(Ŷ) + Var(e) + 2cov(Ŷ, e) = Var(Ŷ) + Var(e)
What is true for the average square (the variance) is also true for the total sum of squares:
Σ(Yi − Ȳ)² = Σ(Ŷi − Ȳ)² + Σ ei²
Total SS (SST) = Regression SS (SSR) + Error SS (SSE)
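The decomposition SST = SSR + SSE can be verified with a short Python sketch (made-up data):

```python
xs = [35, 42, 50, 58, 65]             # made-up size/price data
ys = [600, 900, 1500, 1900, 2400]

# least squares fit
n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n
b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / \
     sum((x - xbar) ** 2 for x in xs)
b0 = ybar - b1 * xbar
yhat = [b0 + b1 * x for x in xs]

sst = sum((y - ybar) ** 2 for y in ys)                    # total SS
ssr = sum((yh - ybar) ** 2 for yh in yhat)                # regression SS
sse = sum((y - yh) ** 2 for y, yh in zip(ys, yhat))       # error SS

print(abs(sst - (ssr + ssr * 0 + sse)) < 1e-4)  # True: SST = SSR + SSE
```

The cross term 2·Σ(Ŷi − Ȳ)ei vanishes because the residuals are uncorrelated with the fitted values, which is exactly why the sum splits cleanly into two parts.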
g. Decomposing the Variance: The ANOVA Table
Let's find these quantities on the R printout:
[R anova() output with the SSR and SSE entries highlighted]
R² = SSR/SST = 1 − SSE/SST. Two interpretations:
i. The percentage of the variation in Y explained by X
ii. The square of the correlation (hence "R squared")
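Both interpretations agree numerically: SSR/SST equals the squared sample correlation. A Python sketch with made-up data:

```python
xs = [35, 42, 50, 58, 65]             # made-up size/price data
ys = [600, 900, 1500, 1900, 2400]

n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
sxx = sum((x - xbar) ** 2 for x in xs)
syy = sum((y - ybar) ** 2 for y in ys)

# fitted values from the least squares line
b1 = sxy / sxx
b0 = ybar - b1 * xbar
yhat = [b0 + b1 * x for x in xs]

r2_var = sum((yh - ybar) ** 2 for yh in yhat) / syy   # SSR / SST
r2_corr = (sxy / (sxx * syy) ** 0.5) ** 2             # squared correlation

print(abs(r2_var - r2_corr) < 1e-9)  # True: the two definitions agree
```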
g. Misuse of R²
R squared is often misused. Some analysts establish arbitrary benchmarks for "high" values; for example, most regard values over .8 as high. High values of R² are often associated with claims that the model is:
1. adequate, valid, or correctly specified
2. highly accurate for prediction
Unfortunately, neither claim is correct. More later!
Appendix: Derivation of the LS Slope
We have satisfied ourselves that corr(e, X) = 0. Now let us use this intuition to derive the least squares formula for b1.
Note that corr(e, X) = 0 is equivalent to cov(e, X) = 0 (why?):
Σ(Xi − X̄)(Yi − b0 − b1Xi) = 0
Substituting b0 = Ȳ − b1X̄:
Σ(Xi − X̄)(Yi − Ȳ − b1(Xi − X̄)) = 0
or
Σ(Xi − X̄)(Yi − Ȳ) = b1 Σ(Xi − X̄)²
so that
b1 = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)²
Glossary of Symbols
- X: independent or explanatory variable
- Y: dependent variable
- b0: least squares estimate of the intercept
- b1: least squares estimate of the regression slope
- ei: least squares residual
- Ŷ: fitted value
- s: sample standard deviation (subscript tells you of which variable)
- s²: sample variance (subscript tells you of which variable)
- r: sample correlation coefficient
- SST: total sum of squares
- SSR: regression sum of squares
- SSE: error sum of squares
- R²: coefficient of determination, a goodness-of-fit measure
Important Equations
b1 = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)², b0 = Ȳ − b1X̄
Ŷ − Ȳ = b1(X − X̄) (the least squares line passes through the point of means)
b1 = sxy / sx² = rxy · (sy / sx) (relationship between the slope coefficient and the correlation)
R² = SSR/SST = 1 − SSE/SST (alternative definitions of R-squared)
Glossary of R commands
- abline(c(intercept,slope)): adds a line through the current plot with the given intercept and slope.
- attach(A): the data frame A is attached to the R search path, so objects in A can be accessed simply by giving their names.
- anova(): computes analysis of variance (or deviance) tables for one or more fitted model objects.
- c(): combines values into a vector or list.
- cov(x,y), cor(x,y): compute the covariance or correlation of variables x and y.
- data(A): allows access to the data frame A in the data library.
- head(A): returns the first parts of a data frame A.
- library(PERregress): loads the R package containing datasets and customized functions for our class.
- lm(Y~X): fits a linear model; Y is the dependent variable and X the independent variable.
Glossary of R commands (continued)
- plot(X,Y): plots X against Y.
- predict(): predictions from the results of various model-fitting functions.
- A=read.csv(file_name): reads a file in .csv format and creates data frame A.
- str(object): shows the structure of an object, e.g. str(dataframe).
- summary(): generic function used to produce summaries of the results of various model-fitting functions.
- var(X), mean(X), sd(X): compute the variance, mean, or standard deviation of a variable named X.