
Problem Set 2. Answer Key.

Introduction to Econometrics - Fall 2009

1. (a) We are told that $Y_i = \beta_0 + \beta_1 X_i + u_i$, where $Y_i$ is the score and $X_i$ is the amount of time. Consequently, $u_i$ represents the impact on the score of all other influences. These could include how nervous a student is, how much she has studied, her particular skills, her background, etc.

(b) The assignment of exam times is random. In other words, it is independent of every factor captured by $u_i$. Since $X_i$ and $u_i$ are independent, $E(u_i \mid X_i) = E(u_i)$. Moreover, $E(u_i) = 0$, because $u_i$ represents deviations from the linear relationship that holds on average in the population (the population regression line).

(c) Regarding $(X_i, Y_i)$ being i.i.d., all observations come from the same distribution and are drawn independently of each other. With respect to outliers, note that exam scores and the amount of time are both bounded. Consequently, there can be no large outliers.

(d) The estimated regression is $\hat{Y}_i = 49 + 0.24 X_i$.

i. The estimated regression's prediction for the average student given 90 minutes is $49 + 0.24 \times 90 = 70.6$ points, whereas for a student given 120 minutes it is $49 + 0.24 \times 120 = 77.8$ points. We cannot get a prediction for 150 minutes, since that value lies outside the sample.

ii. If a student is given 10 additional minutes, the predicted average score increases by $0.24 \times 10 = 2.4$ points.
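The arithmetic in part (d) can be checked with a short sketch. It is written in Python rather than Stata purely for illustration; the function name `predict_score` is hypothetical, and only the coefficients 49 and 0.24 come from the answer key.

```python
# Fitted line from part (d): predicted score = 49 + 0.24 * minutes.
INTERCEPT = 49.0
SLOPE = 0.24

def predict_score(minutes):
    """Predicted average score for a student given `minutes` of exam time."""
    return INTERCEPT + SLOPE * minutes

pred_90 = predict_score(90)
pred_120 = predict_score(120)
# The effect of 10 extra minutes depends only on the slope.
gain_10 = SLOPE * 10

print(round(pred_90, 1), round(pred_120, 1), round(gain_10, 1))
```

The function will happily return a number for 150 minutes as well; the caveat in the answer is statistical (extrapolation beyond the sample), not computational.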

2. (a) Compute $R^2 = ESS/TSS = 0.74$.

(b) Compute the sum of squared residuals: $SSR = TSS - ESS = 111$.

(c) Compute the standard error of the regression: $SER = \sqrt{\frac{1}{n-2} SSR} = 0.75$; see equation 4.19 in the text.

3. (a) When we condition on $inc$ in computing an expectation, $\sqrt{inc}$ becomes a constant. So

$E(u \mid inc) = E(\sqrt{inc}\, e \mid inc) = \sqrt{inc}\, E(e \mid inc) = \sqrt{inc} \cdot 0 = 0$

because $E(e \mid inc) = E(e) = 0$.

(b) Again, when we condition on $inc$ in computing a variance, $\sqrt{inc}$ becomes a constant. So

$Var(u \mid inc) = Var(\sqrt{inc}\, e \mid inc) = (\sqrt{inc})^2\, Var(e \mid inc) = \sigma_e^2\, inc$

since $Var(e \mid inc) = \sigma_e^2$.

(c) Families with low incomes do not have much discretion about spending;
typically, a low-income family must spend on food, clothing, housing, and other necessities. Higher income people have more discretion, and some might choose more consumption while others more saving. This discretion suggests wider variability in saving among higher income families.
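The variance result in part (b) can be illustrated with a small simulation. This is only a sketch in Python: it assumes the error has the form u = sqrt(inc) * e with e independent of inc, and it further assumes, purely for illustration, that e is normal with standard deviation 2. Neither the distribution nor the income levels used here come from the problem.

```python
import random

random.seed(0)            # arbitrary seed, for reproducibility only
SIGMA_E = 2.0             # assumed standard deviation of e (illustrative)
N_DRAWS = 50_000          # draws per income level

def simulated_var_u(inc):
    """Sample variance of u = sqrt(inc) * e at a fixed income level."""
    draws = [inc ** 0.5 * random.gauss(0.0, SIGMA_E) for _ in range(N_DRAWS)]
    mean = sum(draws) / N_DRAWS
    return sum((d - mean) ** 2 for d in draws) / N_DRAWS

results = {inc: simulated_var_u(inc) for inc in (1, 4, 9)}
for inc, var_u in results.items():
    # Theory: Var(u | inc) = sigma_e^2 * inc
    print(inc, round(var_u, 2), SIGMA_E ** 2 * inc)
```

The simulated variances should grow roughly in proportion to income, which is the pattern part (c) describes for savings.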

-> female = 0

    Variable |   Obs        Mean    Std. Dev.   Min   Max
    ---------+--------------------------------------------
        dist |  1726    1.732793    2.166001      0    16
       yrsed |  1726    13.83372    1.822767     12    18

-> female = 1

    Variable |   Obs        Mean    Std. Dev.   Min   Max
    ---------+--------------------------------------------
        dist |  2070    1.718357    2.107142      0    16
       yrsed |  2070     13.8256    1.807032     12    18

b. (Covariance) Find the Covariance between "distance from 4 year college" and "education attained" separately for males and females. Type in STATA: by female:correlate X Y,covariance
Solution:
. by female: correlate dist yrsed, covariance

-> female = 0 (obs=1726)

             |      dist      yrsed
    ---------+----------------------
        dist |   4.69156
       yrsed |  -.393327    3.32248

-> female = 1 (obs=2070)

             |      dist      yrsed
    ---------+----------------------
        dist |   4.44005
       yrsed |  -.284907    3.26537

c. (Statistics by hand) Let yrsed be the dependent variable and dist be the predictor. Using your results from part a and part b, calculate by hand the coefficient on distance from 4 year college variable. (Note that the sample variance of "dist" is its covariance with itself which you also get from part b.) Also find the intercept value. (Use the formulas around equation 4.7 from the textbook pg. 119 ).
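Solution: For female = 0, $s_{xy} = -0.393327$ and $s_x^2 = 4.69156$, so

$\hat{\beta}_1 = \frac{s_{xy}}{s_x^2} = \frac{-0.393327}{4.69156} = -0.083837$

$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X} = 13.83372 - (-0.083837)(1.732793) = 13.978992$

For female = 1, $s_{xy} = -0.284907$ and $s_x^2 = 4.44005$, so

$\hat{\beta}_1 = \frac{s_{xy}}{s_x^2} = \frac{-0.284907}{4.44005} = -0.0641676$

$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X} = 13.8256 - (-0.0641676)(1.718357) = 13.935863$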
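The by-hand calculation in part c can be reproduced with a few lines of Python. This is a sketch, not part of the Stata workflow: the helper name `ols_from_moments` and the dictionary layout are hypothetical, while the numbers are the means, variances and covariances reported in parts a and b.

```python
# Summary statistics copied from parts a and b, keyed by the `female` dummy.
groups = {
    0: {"s_xy": -0.393327, "s_xx": 4.69156, "x_bar": 1.732793, "y_bar": 13.83372},
    1: {"s_xy": -0.284907, "s_xx": 4.44005, "x_bar": 1.718357, "y_bar": 13.8256},
}

def ols_from_moments(s_xy, s_xx, x_bar, y_bar):
    """Slope = s_xy / s_x^2; intercept = y_bar - slope * x_bar."""
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return slope, intercept

estimates = {g: ols_from_moments(**m) for g, m in groups.items()}
for g, (slope, intercept) in estimates.items():
    print(g, round(slope, 6), round(intercept, 5))
```

Because the inputs are rounded summary statistics, the results should agree with the regression output only up to rounding in the last digits.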

d. (Statistics) We expect "yrsed" would increase if a person lives closer to a college. Measure this effect separately for males and females. (Type in STATA: by female:regress Y X ). Check that your answers in part c. and d. are the same.
Solution:
. by female: regress yrsed dist

-> female = 0

      Source |         SS     df           MS      Number of obs =    1726
    ---------+-------------------------------      F(1, 1724)    =   17.28
       Model | 56.8824751      1   56.8824751      Prob > F      =  0.0000
    Residual | 5674.39505   1724   3.29141244      R-squared     =  0.0099
    ---------+-------------------------------      Adj R-squared =  0.0094
       Total | 5731.27752   1725   3.32247972      Root MSE      =  1.8142

       yrsed |      Coef.   Std. Err.        t    P>|t|    [95% Conf. Interval]
    ---------+------------------------------------------------------------------
        dist |   -.083837   .0201668     -4.16    0.000   -.1233911    -.044283
       _cons |   13.97899   .0559295    249.94    0.000    13.86929    14.08869

-> female = 1

      Source |         SS     df           MS      Number of obs =    2070
    ---------+-------------------------------      F(1, 2068)    =   11.64
       Model | 37.8250449      1   37.8250449      Prob > F      =  0.0007
    Residual | 6718.21795   2068   3.24865471      R-squared     =  0.0056
    ---------+-------------------------------      Adj R-squared =  0.0051
       Total |   6756.043   2069   3.26536636      Root MSE      =  1.8024

       yrsed |      Coef.   Std. Err.        t    P>|t|    [95% Conf. Interval]
    ---------+------------------------------------------------------------------
        dist |  -.0641676   .0188052     -3.41    0.001   -.1010466   -.0272885
       _cons |   13.93587   .0511233    272.59    0.000    13.83561    14.03613
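Several entries in the regression output are functions of one another, which gives a quick way to sanity-check a table like the one for female = 0. The Python sketch below recomputes them from the sums of squares, the coefficient and its standard error; the variable names are illustrative, and the inputs are the rounded values printed by Stata.

```python
import math

# Values copied from the female = 0 regression output.
model_ss, resid_ss, total_ss = 56.8824751, 5674.39505, 5731.27752
resid_df = 1724                      # n - 2 = 1726 - 2
coef, std_err = -0.083837, 0.0201668

r_squared = model_ss / total_ss              # ESS / TSS
root_mse = math.sqrt(resid_ss / resid_df)    # SER = sqrt(SSR / (n - 2))
t_stat = coef / std_err                      # t-statistic on dist
f_stat = model_ss / (resid_ss / resid_df)    # with one regressor, F = t^2

print(round(r_squared, 4), round(root_mse, 4), round(t_stat, 2), round(f_stat, 2))
```

The recomputed values should match the reported R-squared, Root MSE, t and F up to rounding.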

e. (Confidence intervals) From the output of part d, write the 95% confidence interval for the coefficient on distance from 4 year college. Are the coefficients significant for male and female? (Hint: check if 0 is included in the 95% interval.)

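Solution: For female = 0 (males),
95% confidence interval for $\beta_1$ is (-.1233911, -.044283)
95% confidence interval for $\beta_0$ is (13.86929, 14.08869)

For female = 1 (females),
95% confidence interval for $\beta_1$ is (-.1010466, -.0272885)
95% confidence interval for $\beta_0$ is (13.83561, 14.03613)

Neither interval for $\beta_1$ contains 0, so the coefficient on dist is significant at the 5% level for both males and females.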
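The intervals for the slope can be approximated directly from the coefficients and standard errors in part d. The Python sketch below uses the normal critical value (about 1.96) instead of the t critical value Stata uses, which is an assumption that is harmless here because the degrees of freedom are large; the endpoints therefore differ from Stata's in the fifth decimal place. The names `conf_interval` and `dist_estimates` are illustrative.

```python
from statistics import NormalDist

Z_CRIT = NormalDist().inv_cdf(0.975)  # normal approximation, about 1.96

# (coefficient, standard error) for dist, copied from part d.
dist_estimates = {0: (-0.083837, 0.0201668), 1: (-0.0641676, 0.0188052)}

def conf_interval(coef, se, crit=Z_CRIT):
    """Two-sided interval coef +/- crit * se."""
    return coef - crit * se, coef + crit * se

intervals = {g: conf_interval(*est) for g, est in dist_estimates.items()}
# A coefficient is significant at the 5% level if 0 lies outside its interval.
significant = {g: not (lo <= 0.0 <= hi) for g, (lo, hi) in intervals.items()}

for g in intervals:
    print(g, intervals[g], significant[g])
```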

f. (Hypothesis testing) Test whether the coefficient on distance from a 4-year college differs by sex. See section 3.4 in the text (Comparing Means from Different Populations). i. Define your null and alternative hypotheses.
Solution: $H_0: \beta_1^{male} - \beta_1^{female} = 0$ versus $H_1: \beta_1^{male} - \beta_1^{female} \neq 0$

ii. Find the variance of the difference, $Var(\hat{\beta}_1^{male} - \hat{\beta}_1^{female})$. Note that these beta estimates are independent, so this is just the sum of the individual variances, which you can get from the standard errors output by Stata.
Solution: Because the two estimates are independent,

$Var(\hat{\beta}_1^{male} - \hat{\beta}_1^{female}) = Var(\hat{\beta}_1^{male}) + Var(\hat{\beta}_1^{female})$

$\Rightarrow Var(\hat{\beta}_1^{male} - \hat{\beta}_1^{female}) = 0.0201668^2 + 0.0188052^2 = 0.00076033$

iii. Find the t-stat.

Solution:

$t = \frac{\hat{\beta}_1^{male} - \hat{\beta}_1^{female}}{SE(\hat{\beta}_1^{male} - \hat{\beta}_1^{female})} = \frac{-0.083837 - (-0.0641676)}{\sqrt{0.00076033}} = -0.7133$

$\Rightarrow |t| = 0.7133 < 1.96$

iv. Do you reject the null hypothesis?

Solution: Since $|t| = 0.7133 < 1.96$, you cannot reject the null at the 5% level.
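Steps ii to iv can be strung together in a few lines. The Python sketch below is illustrative only: the variable names are hypothetical, the inputs are the dist coefficients and standard errors from part d (female = 0 is taken as male), and 1.96 is the usual two-sided 5% normal critical value.

```python
import math

beta_male, se_male = -0.083837, 0.0201668        # female = 0
beta_female, se_female = -0.0641676, 0.0188052   # female = 1

# Independent estimates: the variance of the difference is the sum of variances.
var_diff = se_male ** 2 + se_female ** 2
se_diff = math.sqrt(var_diff)
t_stat = (beta_male - beta_female) / se_diff
reject_null = abs(t_stat) > 1.96

print(round(var_diff, 8), round(t_stat, 4), reject_null)
```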

6. Computer Question II.


In this question we will do some small-scale simulation. One of the advantages of doing simulations is that we can create a model where we know the true values of the parameters. In this case, we will generate a sample of size 2000 from $X_i \sim N(\mu, \sigma^2)$ with $\mu = 2$ and $\sigma^2 = 4$. Then, we will test the null $\mu = 2$ versus $\mu \neq 2$ for each of these samples. We know that under the null hypothesis

$t = \frac{X_i - \mu}{\sqrt{\sigma^2}} \sim N(0, 1)$

which means that $\Pr(|t| > 1.96) = 0.05$.

a. (Generate Data) Type in STATA: set obs 2000. This sets the number of observations to 2000. Now let's create our "fake" dataset. Type in STATA: gen X = 2*invnorm(uniform()) + 2. This should create a vector of 2000 random numbers that come from $N(2, 4)$. Next write: gen tstat = (X-2)/2. This creates a vector of t-statistics.
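For readers without Stata at hand, the same experiment can be sketched in Python. This is an analogue, not a translation that reproduces Stata's random numbers: the seed is arbitrary, `random.gauss` stands in for `invnorm(uniform())`, and the resulting numbers will differ from those in the solutions.

```python
import random

random.seed(12345)        # arbitrary seed, chosen only for reproducibility
N_OBS = 2000
MU, SIGMA = 2.0, 2.0      # N(2, 4): mean 2, variance 4, standard deviation 2

# Analogue of: gen X = 2*invnorm(uniform()) + 2
x = [SIGMA * random.gauss(0.0, 1.0) + MU for _ in range(N_OBS)]
# Analogue of: gen tstat = (X-2)/2
tstat = [(xi - MU) / SIGMA for xi in x]

# Analogue of: sum tstat, and count if abs(tstat) > 1.96
mean_t = sum(tstat) / N_OBS
var_t = sum((t - mean_t) ** 2 for t in tstat) / (N_OBS - 1)
n_reject = sum(abs(t) > 1.96 for t in tstat)

print(mean_t, var_t, n_reject)
```

The mean should be close to 0, the variance close to 1, and the count of large t-statistics close to 100.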

b. (Statistics) If our theory is correct, the vector of t-statistics that you got should have a normal distribution with mean 0 and variance 1. To check whether this is the case type: sum tstat in STATA. Show your results and comment. Moreover, we know that $\Pr(|t| > 1.96) = 0.05$. Since we have 2000 observations, this means that for approximately 100 observations (5% of 2000) the absolute value of our t-statistic should be greater than 1.96. Type in STATA: count if abs(tstat) > 1.96. Use a histogram to show your results (type in STATA: histogram tstat).
Solution:
. sum tstat

    Variable |   Obs        Mean    Std. Dev.         Min         Max
    ---------+--------------------------------------------------------
       tstat |  2000    .0087739    .9968548    -3.190681    3.166658

. count if abs(tstat) > 1.96
  107
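Whether a count of 107 is "approximately 100" can be judged against the binomial distribution of the count: each of the 2000 independent t-statistics exceeds 1.96 in absolute value with probability 0.05. The short Python sketch below computes the expected count and its standard deviation; the variable names are illustrative.

```python
import math

N_OBS = 2000
P_REJECT = 0.05           # Pr(|t| > 1.96) under the null
observed = 107            # count reported by Stata

expected = N_OBS * P_REJECT
std_dev = math.sqrt(N_OBS * P_REJECT * (1 - P_REJECT))  # binomial std. dev.
z_score = (observed - expected) / std_dev

print(expected, round(std_dev, 2), round(z_score, 2))
```

The observed count is less than one standard deviation above its expected value, so it is fully consistent with the theory.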

c. (Statistics II) Suppose instead that you want to test the null $\mu = 6$. In this case the t-statistic is $\frac{X_i - 6}{2}$. Now, since the null hypothesis is FALSE, the distribution of the t-statistic is not standard normal. Generate a new t-statistic as: gen Newtstat = (X-6)/2. Compute: sum Newtstat and then: count if abs(Newtstat) > 1.96. Save your results and comment. Is the distribution of Newtstat centered at zero with variance 1? What is $\Pr(|Newtstat| > 1.96)$? What's the implication of this in terms of the number of times you will reject or not the null hypothesis? Use a histogram to show your results (type in STATA: histogram Newtstat). What do you think would happen if you wanted to test the null $\mu = 10$?
Solution:
. sum Newtstat

    Variable |   Obs        Mean    Std. Dev.         Min         Max
    ---------+--------------------------------------------------------
    Newtstat |  2000   -1.991226    .9968548    -5.190681    1.166658

. count if abs(Newtstat) > 1.96
  1004

Therefore we would reject the null hypothesis 1004/2000 = 50.2% of the time. With the null $\mu = 10$, the probability of rejecting increases.
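The rejection frequency in part c can be compared with its theoretical value. When the true mean is 2 but the null is some other value, the t-statistic is still normal with variance 1, only shifted away from zero. The Python sketch below computes the implied rejection probability; the function name `rejection_prob` is hypothetical, and the setup assumes the known standard deviation of 2 used throughout this question.

```python
from statistics import NormalDist

Z = NormalDist()          # standard normal distribution
TRUE_MU, SIGMA, CRIT = 2.0, 2.0, 1.96

def rejection_prob(null_mu):
    """Pr(|(X - null_mu)/SIGMA| > CRIT) when X ~ N(TRUE_MU, SIGMA^2)."""
    shift = (TRUE_MU - null_mu) / SIGMA   # mean of the t-statistic
    return Z.cdf(-CRIT - shift) + 1.0 - Z.cdf(CRIT - shift)

for null_mu in (2, 6, 10):
    print(null_mu, round(rejection_prob(null_mu), 4))
```

For the null of 6 the theoretical probability is a little above one half, in line with the simulated 50.2%, and for the null of 10 it is close to one.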

d. (Confidence intervals) Write the 95% confidence interval for the mean of X, type: ci X . Is 2 included in the interval? What proportion of the class should arrive at the same conclusion? Everyone?
Solution:
. ci X

    Variable |   Obs        Mean    Std. Err.    [95% Conf. Interval]
    ---------+--------------------------------------------------------
           X |  2000    2.017548    .0445807    1.930118    2.104977
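The question about the rest of the class can be explored by simulation: a 95% interval contains the true mean in roughly 95% of independent samples, so not everyone should find 2 inside their interval. The Python sketch below is an illustration under stated assumptions: the class size of 300 is hypothetical, the seed is arbitrary, and the normal critical value 1.96 is used in place of Stata's t critical value.

```python
import math
import random

random.seed(2009)               # arbitrary seed, for reproducibility only
N_OBS, N_STUDENTS = 2000, 300   # N_STUDENTS is a hypothetical class size
MU, SIGMA, CRIT = 2.0, 2.0, 1.96

def interval_covers_mu():
    """Draw one student's sample and check whether the 95% interval contains MU."""
    x = [random.gauss(MU, SIGMA) for _ in range(N_OBS)]
    mean = sum(x) / N_OBS
    var = sum((xi - mean) ** 2 for xi in x) / (N_OBS - 1)
    half_width = CRIT * math.sqrt(var / N_OBS)
    return mean - half_width <= MU <= mean + half_width

coverage = sum(interval_covers_mu() for _ in range(N_STUDENTS)) / N_STUDENTS
print(coverage)
```

The printed share should be near 0.95, with the remaining students obtaining an interval that misses 2 purely by chance.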
