Lecture 2
Estimation
Assumptions
Properties of OLS
Interpretation
Goodness of Fit
Dependent Variable
Yi = β0 + β1·Xi + εi
Fitted Line (the model's prediction)
E(Y|Xi) = β0 + β1·Xi
Leftover term: the stochastic error, εi
Estimated Model
Yi = b0 + b1·Xi + ei
Ŷi = b0 + b1·Xi
Residual: ei = Yi − Ŷi
PRF vs SRF
OLS
OLS is the most basic and most commonly used regression technique.
Given Yi = β0 + β1·Xi + εi,
we wish to estimate Ŷ = b0 + b1·X.
OLS estimates β0 and β1 such that the sum of squared
residuals (RSS) is minimized.
Residual
The residual is ei = Yi − Ŷi.
OLS minimizes the sum of squared residuals (RSS); that is,
OLS chooses b0 and b1 to minimize
Σ ei², for i = 1, 2, ..., n
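The minimization above has a closed-form solution. A minimal Python sketch (the data are illustrative, not from the lecture):

```python
# Simple OLS for Y = b0 + b1*X, fit by minimizing the residual sum of
# squares (RSS). The slope and intercept formulas below are the standard
# closed-form solution to that minimization problem.

def ols_fit(x, y):
    """Return (b0, b1) minimizing sum((y_i - b0 - b1*x_i)**2)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # b1 = sum((x_i - x_bar)(y_i - y_bar)) / sum((x_i - x_bar)^2)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) \
         / sum((xi - x_bar) ** 2 for xi in x)
    b0 = y_bar - b1 * x_bar
    return b0, b1

def residuals(x, y, b0, b1):
    """e_i = Y_i - Yhat_i for each observation."""
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Example: data generated exactly from Y = 2 + 3X, so OLS recovers it
# and every residual is zero.
x = [1.0, 2.0, 3.0, 4.0]
y = [5.0, 8.0, 11.0, 14.0]
b0, b1 = ols_fit(x, y)   # b0 -> 2.0, b1 -> 3.0
```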
Goodness of Fit
The best-fitting line may still not fit well, so it is desirable to
have some measure of how good the fit is. We want to know how well the
regression line explains the movement of the dependent variable.
R² provides such a measure: the share of the movement in the
dependent variable that can be explained by the regression model.
Goodness of Fit: R²
Once a regression equation is estimated, we wish to determine the
quality of the estimated equation, i.e., its goodness of fit. To do
so, we use the Total Sum of Squares (TSS), the Explained Sum of
Squares (ESS), and the Residual Sum of Squares (RSS):
TSS = Σ(Yi − Ȳ)²,  ESS = Σ(Ŷi − Ȳ)²,  RSS = Σ ei²
Decomposition of Variance: TSS = ESS + RSS
R² = ESS/TSS = 1 − RSS/TSS
R² and Adjusted R²
Adjusted R² = 1 − (1 − R²)·(n − 1)/(n − k − 1), where k is the number of regressors.
Example estimates: b0 = −299.59, b1 = 0.722, adjusted R² = 99.8%.
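The fit statistics follow directly from TSS and RSS. A minimal Python sketch (the data and fitted values here are illustrative, not the lecture's example):

```python
# Goodness of fit for a fitted regression: decompose the variation in Y
# into TSS = ESS + RSS, then compute R^2 and adjusted R^2.

def r_squared(y, y_hat, k=1):
    """Return (r2, adj_r2) given actuals, fitted values, and k regressors."""
    n = len(y)
    y_bar = sum(y) / n
    tss = sum((yi - y_bar) ** 2 for yi in y)               # total SS
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # residual SS
    r2 = 1 - rss / tss                                     # = ESS / TSS
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)          # penalizes extra k
    return r2, adj_r2

# Hypothetical data with a near-perfect fit:
y = [1.0, 2.0, 3.0, 4.0]
y_hat = [1.1, 1.9, 3.1, 3.9]
r2, adj_r2 = r_squared(y, y_hat)   # r2 = 0.992, adj_r2 = 0.988
```

Note that adjusted R² is never larger than R²: it discounts the fit for each additional regressor.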
Are We Finished?
Does the SRF represent the PRF?
Do the coefficients reflect the parameters?
OLS Assumptions
1. The regression model is: (a) linear in the coefficients, and
(b) correctly specified with the right independent
variables
2. No explanatory variable is a perfect linear function of any
other explanatory variable(s) (no perfect multicollinearity)
3. No explanatory variable is correlated with the error term
4. No serial correlation
5. Zero population mean of error term
6. Homoskedasticity of error term
7. Normally distributed error term
Linear Regression
Linear in Parameters
WAGEit = β0 + β1·EDUCit + β2·TENUREit + β3·UNIONit + εit
Linearity
Other example: a model that is nonlinear in X but linear in the parameters,
Yi = β0 + β1·f(Xi) + εi
Transform: Xi* = f(Xi)
Thus: Yi = β0 + β1·Xi* + εi
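Because the model stays linear in the parameters, ordinary OLS on the transformed regressor works unchanged. A sketch, assuming the transform is X* = X² purely for illustration:

```python
# "Linear in parameters" means OLS still applies after transforming the
# regressor. Here we assume f(X) = X^2 for illustration; the data are
# generated exactly from Y = 1 + 2*X^2.

def ols_fit(x, y):
    """Closed-form simple OLS: returns (b0, b1)."""
    n = len(x)
    xb, yb = sum(x) / n, sum(y) / n
    b1 = sum((xi - xb) * (yi - yb) for xi, yi in zip(x, y)) \
         / sum((xi - xb) ** 2 for xi in x)
    return yb - b1 * xb, b1

x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 9.0, 19.0, 33.0]          # Y = 1 + 2*X^2
x_star = [xi ** 2 for xi in x]      # transformed regressor X* = X^2
b0, b1 = ols_fit(x_star, y)         # linear OLS on X* recovers 1 and 2
```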
No perfect multicollinearity. Perfect multicollinearity arises when:
- two regressors are really the same variable, or
- one (or more) has zero variance, or
- two independent variables sum to equal a third, or
- a constant has been added to or subtracted from one of the variables.
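In each of these cases the regressors are perfectly collinear, which can be detected because the moment matrix of the (demeaned) regressors is singular. A minimal sketch with hypothetical data:

```python
# Perfect multicollinearity: if one regressor is an exact linear function
# of another, the OLS coefficients are not identified. For two regressors
# this shows up as a zero determinant of the 2x2 moment matrix.

def demean(v):
    m = sum(v) / len(v)
    return [x - m for x in v]

def moment_det(x1, x2):
    """det of [[S11, S12], [S12, S22]]; zero under perfect collinearity."""
    d1, d2 = demean(x1), demean(x2)
    s11 = sum(a * a for a in d1)
    s22 = sum(a * a for a in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    return s11 * s22 - s12 ** 2

x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [2.0, 4.0, 6.0, 8.0]   # x2 = 2*x1: really the same variable
moment_det(x1, x2)           # -> 0.0, so OLS cannot separate x1 and x2
```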
No serial correlation
Homoskedasticity
Statistical Inference
Population: the entire group of items that
interests us.
Sample: the part of the population that we
actually observe.
Statistical inference: using the sample to draw conclusions about the
characteristics of the population from which the sample came.
We use samples because it is often not practical or possible to
consider the entire population.
But each time we use a different sample, we will obtain different
estimates!
Sampling Distributions
A sample statistic, such as the sample mean or a regression
coefficient, is a random variable that depends on which particular
observations happen to be selected for the random sample.
Sampling error is the difference between the
value of one particular sample mean and the
average of the means of all possible samples of
this size; this error is not due to a poorly designed
experiment or sloppy procedure. It is the
inevitable result of the fact that the observations
in our sample are chosen by chance.
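This sampling variability is easy to see by simulation. A sketch with an assumed population (normal with mean 10, chosen only for illustration):

```python
# Each random sample gives a different estimate: draw many samples from
# the same population and watch the sample mean vary around the true
# population mean. The variation is sampling error, due purely to chance.
import random

random.seed(0)
population_mean = 10.0
means = []
for _ in range(200):                      # 200 independent samples
    sample = [random.gauss(population_mean, 2.0) for _ in range(25)]
    means.append(sum(sample) / len(sample))

grand_mean = sum(means) / len(means)      # close to 10.0 on average,
spread = max(means) - min(means)          # but individual means differ
```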
BLUE
Best Linear Unbiased Estimator: under the classical assumptions above, the OLS estimators are BLUE (Gauss-Markov theorem).