Starting Microfit
Loading Data
Before it can do anything, Microfit requires some data to work with. You can either:
(i) enter data via the keyboard
(ii) load the data from a file already in Microfit format (as we are going to do)
(iii) use data imported from a spreadsheet like Excel (see Exercise 1 and the
Appendix)
To load the data, click on the drop-down menu File and choose Open.
The data we are going to work with consist of earnings data (taken from Dougherty,
with which you should now be fairly familiar). This is stored on the share drive [U
drive] as earnings.fit. Note this is just a sub-sample of the data; the whole dataset is in
fact in the Excel data file Educational Attainment.
Now, since the version of Microfit we have got does not allow sharing, we will have to
copy the file from the share drive to the desktop to be able to use it.
Note you will need the software installed on the computer to be able to open and read
a .fit file [the two rooms in which I know the software is installed are SSB 226 and
SSB 119, so it can be used only in these rooms].
By clicking on Open, the screen shown in Figure 2 should appear:
This screen constitutes what Microfit calls the Command Editor. From this window
you can carry out a number of operations, some of which we will be using. These are:
Graphing variables
Creating the constant term
Transforming variables (like transforming a variable into logs)
Obtaining basic descriptive statistics (like correlation between variables)
However, before we explore this further, there are two other windows which we
might find useful to view at this stage.
Just below the File, Edit, Options etc. row there is a row of buttons, something like:
Each gives access to a different window. We will only be using the first four of these,
as the rest are meant for more advanced work outside the scope of this module.
We will be using
Process takes us to the Command Editor window we're currently looking at.
You might want to confirm this by clicking the Process button, which should
leave the window unchanged.
Data (click this now) pops up a sort of spreadsheet which you can scroll through
and in which the individual data can be edited. Cancel the window.
You can maximise the window and scroll through the data. We have observations on
540 individuals, and for each individual we have hourly earnings (E), years of
schooling (S) and years of experience (X) among the variables listed for that
individual. Close the window.
Graphs
Suppose we are interested in viewing our data. Consider Earnings and Schooling. In
the Command Editor (you need to clear the editor of any previous command) type
SCATTER E S
Creating a constant
One important step before performing any regression analysis is to define the constant
term. To do this, select the option
at the bottom of the
Command Editor. A small window will prompt you to supply the name of the
constant. I tend to write CONSTANT. Click OK. To see what has happened, click the
Variables button. Observe that CONSTANT has been described as the intercept
term. Close the window. By clicking the Data button you will see the addition of
CONSTANT in a new column, with every value equal to 1 (why?). Clicking on
CLOSE will get you back to the Command Editor.
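Why a column of 1s? A sketch outside Microfit may help: the intercept is just the coefficient on a regressor that equals 1 for every observation. The Python/numpy snippet below (hypothetical mini-sample, not our earnings file) builds the same design matrix Microfit builds internally:

```python
import numpy as np

# Hypothetical mini-sample of hourly earnings (E) and schooling (S)
E = np.array([10.0, 15.0, 12.0, 20.0])
S = np.array([12.0, 16.0, 14.0, 18.0])

# The CONSTANT Microfit creates is a regressor equal to 1 for every
# observation, so its coefficient plays the role of the intercept b1.
constant = np.ones_like(E)
X = np.column_stack([constant, S])   # design matrix [CONSTANT, S]

print(X.shape)    # (4, 2)
print(X[:, 0])    # the first column is all ones
```

This is why the Data window shows CONSTANT as a column of 1s.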
Transforming Variables
We can also transform variables. During the lectures we looked at different functional
forms. We saw earnings in natural logs as ln E. So to transform hourly earnings
(variable E) into log E, in the Command Editor type
LE = LOG(E)
Note the brackets and the equals sign. You can observe the newly created variable
by clicking Data. Clicking on CLOSE will get you back to the Command Editor.
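For intuition, Microfit's LOG is the natural logarithm, so the new variable can be mimicked outside Microfit. A small Python/numpy sketch with made-up earnings values (not our sample):

```python
import numpy as np

# Hypothetical hourly earnings; LOG in Microfit is the natural log,
# which is numpy's np.log.
E = np.array([5.0, 10.0, 20.0])
LE = np.log(E)     # equivalent of the Microfit line  LE = LOG(E)

# Each doubling of E raises LE by the same amount, log(2) ~ 0.693:
print(LE[1] - LE[0])
print(LE[2] - LE[1])
```

This equal-step property is what makes the log form convenient for interpreting proportional changes.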
Generating descriptive statistics
Recall during the lectures we discussed the possibility that regressors could be
collinear (we may have high multicollinearity). Therefore, whenever we are
conducting regression analysis a good practice is to check the correlation between
the variables.
Note there are other commands which Microfit performs which you may want to
explore in your own time.
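What such a correlation check computes can be sketched outside Microfit. The Python/numpy snippet below uses hypothetical values for S and X (not our data); a correlation close to 1 or -1 between regressors would warn of multicollinearity:

```python
import numpy as np

# Hypothetical schooling (S) and experience (X) values, chosen to move
# in opposite directions so they are strongly (negatively) correlated.
S = np.array([12.0, 14.0, 16.0, 18.0, 20.0])
X = np.array([10.0, 9.0, 7.0, 6.0, 4.0])

# Pairwise correlation coefficient between the two regressors
r = np.corrcoef(S, X)[0, 1]
print(round(r, 3))
```

Here r is close to -1, the kind of value that would make it hard to separate the individual effects of S and X in a regression.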
Regression Analysis
We are now ready to perform some regression analysis. The model we will first
estimate is the multiple earnings regression:
Ei = β1 + β2 Si + β3 Xi + ui
with hourly earnings E as dependent variable, years of schooling S and years of
experience X as our explanatory variables.
Remember the estimated regression is written as:
Êi = b1 + b2 Si + b3 Xi
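What estimation actually does can be sketched outside Microfit: b1, b2, b3 are the least-squares solution. The Python/numpy snippet below (not Microfit syntax; the data are simulated, not our earnings sample, and the "true" coefficients are made up) shows this:

```python
import numpy as np

# Simulate hypothetical data from a known earnings equation, then
# recover the coefficients by ordinary least squares.
rng = np.random.default_rng(0)
n = 540
S = rng.uniform(8, 20, n)            # years of schooling
Xp = rng.uniform(0, 30, n)           # years of experience
u = rng.normal(0, 5, n)              # disturbance term
E = -10 + 2.5 * S + 0.5 * Xp + u     # "true" model used to simulate E

X = np.column_stack([np.ones(n), S, Xp])    # [CONSTANT, S, X]
b, *_ = np.linalg.lstsq(X, E, rcond=None)   # least-squares estimates
print(b)   # roughly [-10, 2.5, 0.5]
```

The estimates b are close to the coefficients used in the simulation, which is exactly what Microfit reports in its coefficient table.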
How to estimate this regression in Microfit?
Click on the Single button at the top and the Estimate/Predict window will appear
(Figure 7):
In the editor that appears, type the regression specification, listing the dependent
variable first:
E CONSTANT S X
Diagnostic Tests
*******************************************************************************
* Test Statistics       * LM Version                * F Version                *
*******************************************************************************
* A:Serial Correlation  * CHSQ(1) = .66516[.415]    * F(1, 536) = .66105[.417] *
* B:Functional Form     * CHSQ(1) = 3.1644[.075]    * F(1, 536) = 3.1595[.076] *
* C:Normality           * CHSQ(2) = 7921.3[.000]    * Not applicable           *
* D:Heteroscedasticity  * CHSQ(1) = 17.7646[.000]   * F(1, 538) = 18.3009[.000]*
*******************************************************************************
A: Lagrange multiplier test of residual serial correlation
B: Ramsey's RESET test using the square of the fitted values
C: Based on a test of skewness and kurtosis of residuals
D: Based on the regression of squared residuals on squared fitted values
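Diagnostic D can be reproduced by hand: regress the squared residuals on the squared fitted values and compare n times the R-squared of that auxiliary regression with a chi-squared(1) critical value. A Python/numpy sketch on simulated data (not our sample), where heteroscedasticity is built in deliberately:

```python
import numpy as np

# Simulate a regression whose error spread grows with S, then run the
# LM heteroscedasticity diagnostic: squared residuals on squared fits.
rng = np.random.default_rng(1)
n = 540
S = rng.uniform(8, 20, n)
E = 5 + 2.0 * S + rng.normal(0, 1, n) * S   # error spread grows with S

X = np.column_stack([np.ones(n), S])
b, *_ = np.linalg.lstsq(X, E, rcond=None)
fitted = X @ b
resid = E - fitted

# Auxiliary regression: resid^2 on a constant and fitted^2
Z = np.column_stack([np.ones(n), fitted ** 2])
g, *_ = np.linalg.lstsq(Z, resid ** 2, rcond=None)
e2_hat = Z @ g
r2_aux = 1 - np.sum((resid ** 2 - e2_hat) ** 2) / np.sum(
    (resid ** 2 - np.mean(resid ** 2)) ** 2)
lm = n * r2_aux
print(lm)   # compare with the 5% chi-squared(1) critical value, 3.84
```

With heteroscedasticity built into the simulation, the LM statistic comes out far above 3.84, mirroring the rejection we see in the Microfit output above.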
One important step here is to save this result. Just as the data needs to be saved, you
need to save your regression results for later use. In the results window, look at the
icons running across its top; if you hover the pointer over these, some text pops up to
tell you what they are for. The first allows you to print. The next allows you to save to
a new results file. Select this and type a name for your file (I chose
earnings-multiple, in correspondence with the data used and to note that these are
the results from running a multiple regression model; this result is also saved for later
use). Saving the results in this way allows me to access them using Microsoft Word,
for instance. In addition, I can use the copy selected text to clipboard icon to
paste them into a Word file. If the copying gives something which is all over the
place, then choose the Courier New font at a size of 8 or 9, which should do the trick.
Comments on regression output. The first part of the table gives estimates of the
regression coefficients with standard errors, t-ratios and p-values, along with the
R-squared and the F-ratio. The second part contains a variety of residual diagnostic
tests, such as the Breusch-Godfrey serial correlation test and the Koenker-Bassett
test for heteroscedasticity.
Interpretation. The way to interpret the regression is in the same manner as I taught
you in class. For instance, the coefficient on S, that is 2.67, can be interpreted as
saying an extra year of schooling would induce a $2.67 increase in hourly earnings
(E), holding years of experience (X) constant. You can interpret X in the same way.
See the Lecture notes.
Testing. Two tests I have shown you in class are the test of the individual
significance of a regression coefficient and the test of the overall significance of the
regression. The former uses the t-test and the latter uses the F-test. Since these are
reported, we only need to interpret them. Remember, unlike in class, we can use the
p-values to come to a conclusion as to whether to reject a hypothesis or not. However,
it is important to recall what you are actually testing. The p-values of 0.000 on both
estimated regression coefficients for S and X suggest these are significant, in that
they individually have a significant impact on hourly earnings. The F-test reports an
F-ratio of 66.73 and a p-value of 0.000. This means that our regression is significant,
which implies that S and X have a jointly significant influence on E.
Goodness of Fit. Next we need to comment on the overall fit of the regression, for
which the R-squared can be used. The reported R-squared is 0.199, suggesting that
19.9% of the variation in hourly earnings is explained by the regressors, i.e. S and X.
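The R-squared itself is simple to compute from the residuals. A short Python sketch with made-up numbers (not our regression output), showing the "explained share of variation" reading directly:

```python
import numpy as np

# Hypothetical dependent variable and fitted values from some regression
y = np.array([4.0, 7.0, 6.0, 9.0, 8.0])
fitted = np.array([5.0, 6.5, 6.0, 8.5, 8.0])   # made-up fitted values

rss = np.sum((y - fitted) ** 2)        # residual sum of squares
tss = np.sum((y - y.mean()) ** 2)      # total sum of squares
r2 = 1 - rss / tss                     # share of variation explained
print(round(r2, 3))   # -> 0.899
```

Here about 89.9% of the variation in y is explained; in our earnings regression the corresponding figure is only 19.9%.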
Testing for heteroscedasticity. The test to be used is reported in the second part of
the regression output. This is the Koenker-Bassett test (see Gujarati p. 415 for a
discussion). The table reports a chi-square and an F-version of the test; we can use
either or both. The basic point to remember is what we are testing. The null
hypothesis is that there is no heteroscedasticity (there is homoscedasticity) against
the alternative that there is heteroscedasticity. Remember, all tests of
heteroscedasticity formulate similar hypotheses. Based on the reported p-values of
0.000 we can conclude that our regression does suffer from heteroscedasticity. There
are various ways to deal with this, which we are now going to explore.
Dealing with heteroscedasticity. In class we discussed three ways to deal with
heteroscedasticity: correction based on a plausible pattern of heteroscedasticity; log
transformation; and White's heteroscedasticity-consistent standard errors. Since
Microfit does not perform White's correction we won't discuss it here. We consider
the first two solutions.
Correction based on a plausible pattern of heteroscedasticity. The solution we
discussed in class is that when the variance of the error term is unknown, we can
assert a certain plausible pattern that it may follow. Typically heteroscedasticity
could be due to Var(u) being a function of one of the explanatory variables. To find
which variable the error term may be related to, we can plot first E on S and then E
on X. Actually the scatter graph we plotted earlier can give us some initial clues:
although it is not clear-cut, observations do not appear evenly spread, which suggests
that the variable causing heteroscedasticity could be S. To be clearer about this we
could plot the scatter of E on X. Try this. What do you find? Not much uneven
spread appears, at least compared to the plot of E on S. However, this is not sufficient
to tell us that S is the culprit variable. For that we need to go a step further and plot
the residuals (estimates of the errors) against these two variables separately. To do
that we need to save our residuals after having run our regression. Unfortunately, if
we have closed the results window then we need to re-run our regression. If not, then
after closing the Results window a window displaying the Post Regression Menu will
appear. This offers various options. By default Move to hypothesis testing is chosen.
[Figures: scatter plots of the RESIDUALS against S and against X]
This somewhat confirms what we suspected earlier: S may be the culprit variable
accounting for the non-constant variance. We now apply the first correction tool.
Recall from class that the correction amounts to dividing the initial regression
through by the culprit variable S, which gives
E/S = β1 (1/S) + β2 + β3 (X/S) + u/S
Before running this new regression we have to create some variables. So move to the
Process window and create the variables
ES = E/S; IS = 1/S; XS = X/S
Then we can run this regression by typing
ES CONSTANT IS XS
The abridged regression output shows the heteroscedasticity still hasn't been
completely corrected at the 5 and 10% levels, but we would not reject the null of
homoscedasticity at the 1% level. Now, maybe the form we assumed is not the
correct one. There are other potential forms, like dividing the regression by S^2 when
Var(u) is proportional to S^4. Try this and you will find that it does correct for
heteroscedasticity, as the p-value becomes 0.105. Note that to interpret these
transformed regressions we need to multiply through by the term we divided the
regression by in the first place. You could also try other transformations as discussed
in Gujarati.
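The logic of this "divide through by the culprit variable" correction can be sketched outside Microfit. The Python/numpy snippet below simulates hypothetical data (not our sample) where Var(u) is proportional to S^2, then estimates the transformed regression of ES on CONSTANT and IS; note how the roles of the coefficients swap, with the old slope on S becoming the new intercept:

```python
import numpy as np

# Simulated regression with Var(u) proportional to S^2
rng = np.random.default_rng(2)
n = 540
S = rng.uniform(8, 20, n)
u = 0.5 * rng.normal(0, 1, n) * S     # error spread proportional to S
E = -5 + 2.5 * S + u                  # "true" intercept -5, slope 2.5

# Transformed variables, as created in the Process window
ES = E / S      # new dependent variable
IS = 1 / S      # carries the old intercept

# Regress ES on [CONSTANT, IS]: the new error u/S has constant variance
Z = np.column_stack([np.ones(n), IS])
b, *_ = np.linalg.lstsq(Z, ES, rcond=None)
print(b)   # new intercept is the old slope (about 2.5)
```

The estimate of the new intercept recovers the original slope on S, which is why interpreting the transformed regression requires multiplying back through by S.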
Correction based on log transformation. Another solution is to apply a log
transformation. We created LE, defined as the log of earnings, earlier. We can do the
same for the other variables in the regression by transforming them into logs, and
run the regression
ln Ei = β1 + β2 ln Si + β3 ln Xi + ui
Type LE CONSTANT LS LX
The output below should appear. Observe heteroscedasticity is now only present at
the 10% level.
Ordinary Least Squares Estimation
*******************************************************************************
Dependent variable is LE
540 observations used for estimation from 1 to 540
*******************************************************************************
Regressor        Coefficient     Standard Error    T-Ratio[Prob]
CONSTANT         -2.2837         .42027            -5.4338[.000]
LS               1.5344          .12577            12.2000[.000]
LX               .39107          .070432           5.5524[.000]
*******************************************************************************
R-Squared            .22531      R-Bar-Squared        .22243
S.E. of Regression   .52378      F-stat. F(2, 537)    78.0920[.000]
*******************************************************************************
Diagnostic Tests
*******************************************************************************
* Test Statistics       * LM Version               * F Version                *
*******************************************************************************
* D:Heteroscedasticity  * CHSQ(1) = 3.3709[.066]   * F(1, 538) = 3.3795[.067] *
*******************************************************************************
Dummy variables. Suppose we now add the dummy variable MALE to the
regression:
Ei = β1 + β2 Si + β3 Xi + β4 MALEi + ui
So type
E CONSTANT S X MALE
which produces the regression output shown below. The MALE variable is
significant, as shown by its p-value. I leave the interpretation to you (you can go
back to your lecture handout on Dummy Variables).
Ordinary Least Squares Estimation
*******************************************************************************
Dependent variable is E
540 observations used for estimation from 1 to 540
*******************************************************************************
Regressor        Coefficient     Standard Error    T-Ratio[Prob]
CONSTANT         -26.7957        4.3942            -6.0980[.000]
S                2.5876          .22587            11.4561[.000]
X                .46792          .13577            3.4465[.001]
MALE             6.3785          1.1093            5.7499[.000]
*******************************************************************************
R-Squared        .24559          R-Bar-Squared     .24137
*******************************************************************************
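The role of a dummy like MALE can be sketched outside Microfit. The Python/numpy snippet below simulates hypothetical data (not our sample) in which men earn a fixed premium, and shows the dummy coefficient recovering that intercept shift:

```python
import numpy as np

# Simulate earnings with an intercept shift for the MALE dummy
rng = np.random.default_rng(3)
n = 540
S = rng.uniform(8, 20, n)
male = rng.integers(0, 2, n).astype(float)    # 1 = male, 0 = female
E = -10 + 2.5 * S + 6.0 * male + rng.normal(0, 3, n)

X = np.column_stack([np.ones(n), S, male])
b, *_ = np.linalg.lstsq(X, E, rcond=None)

# b[2] estimates the earnings gap between men and women with the same
# schooling: an intercept shift, not a change in slope.
print(b[2])
```

The estimate is close to the premium built into the simulation, which is how the coefficient on MALE in the output above should be read.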
Suppose instead we add AGE to the original regression. Type
E CONSTANT S X AGE
which produces
Ordinary Least Squares Estimation
*******************************************************************************
Dependent variable is E
540 observations used for estimation from 1 to 540
*******************************************************************************
Regressor        Coefficient     Standard Error    T-Ratio[Prob]
CONSTANT         -19.2621        10.7854           -1.7859[.075]
S                2.6883          .23279            11.5479[.000]
X                .62709          .14426            4.3470[.000]
AGE              -.20706         .26432            -.78339[.434]
*******************************************************************************
R-Squared        .19998          R-Bar-Squared     .19550
*******************************************************************************
AGE does not contribute much to the regression, so suppose we want to drop it. We
can use the Wald test for the deletion of this variable. To perform this test, close the
Results window (after having saved your results). This will bring up the Post
Regression Menu. Choose Move to hypothesis testing and then choose option 5,
Variable deletion test. Press OK. A window asking us to list the variable(s) we want
to delete opens up. On the right-hand side you will see a drop-down selection of
variables. Select AGE as the variable to be dropped (the variable whose omission we
want to test) and press OK. The following output is obtained:
Variable Deletion Test (OLS case)
*******************************************************************************
Dependent variable is E
List of the variables deleted from the regression: AGE
540 observations used for estimation from 1 to 540
*******************************************************************************
Regressor        Coefficient     Standard Error    T-Ratio[Prob]
CONSTANT         -26.9316        4.5234            -5.9538[.000]
S                2.6740          .23200            11.5261[.000]
X                .59410          .13792            4.3074[.000]
*******************************************************************************
Joint test of zero restrictions on the coefficients of deleted variables:
Lagrange Multiplier Statistic    CHSQ(1)   = .61757[.432]
Likelihood Ratio Statistic       CHSQ(1)   = .61792[.432]
F Statistic                      F(1, 536) = .61370[.434]
*******************************************************************************
Recall we are specifying the null that the coefficient on the deleted variable is equal
to zero, which cannot be rejected by the Wald test. This is in fact similar to testing
with the t-test, but it is more appropriate to use the Wald test as it provides a careful
testing procedure for removing a variable. Moreover, when we want to drop two
variables at once, as taught in class, we need to use the Wald deletion test; in that
case we would specify the two variables we want dropped.
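The F version of the deletion test compares the residual sums of squares with and without the candidate variable. A Python/numpy sketch on simulated data (not our sample), with AGE constructed to be genuinely irrelevant:

```python
import numpy as np

# Simulate a regression in which AGE truly has a zero coefficient
rng = np.random.default_rng(4)
n = 540
S = rng.uniform(8, 20, n)
Xp = rng.uniform(0, 30, n)
age = rng.uniform(20, 60, n)                 # irrelevant by construction
E = -10 + 2.5 * S + 0.5 * Xp + rng.normal(0, 5, n)

def rss(Xmat, y):
    """Residual sum of squares from an OLS fit of y on Xmat."""
    b, *_ = np.linalg.lstsq(Xmat, y, rcond=None)
    return np.sum((y - Xmat @ b) ** 2)

X_u = np.column_stack([np.ones(n), S, Xp, age])   # unrestricted model
X_r = np.column_stack([np.ones(n), S, Xp])        # AGE deleted
m, k = 1, X_u.shape[1]            # 1 restriction, 4 estimated parameters
F = ((rss(X_r, E) - rss(X_u, E)) / m) / (rss(X_u, E) / (n - k))
print(F)   # compare with the 5% F(1, 536) critical value, about 3.84
```

Because the restricted model can never fit better than the unrestricted one, the numerator is non-negative; when the deleted variable is irrelevant, F stays small and deletion is not rejected.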
Testing for linear restrictions [using the Wald test].
Suppose we suspect that in the above regression experience and age play the same
role. So we could impose a theoretical restriction on the regression by testing whether
the coefficients on X and AGE are equal. To do that, go back to the Estimate/Predict
window and re-run the regression, or close the results window from the previous test,
which will bring you back to the Hypothesis Test Menu. Here choose option 7, Wald
test of linear restrictions, and press OK. A new window appears. Observe the line
"Coefficients A1 to A4 are assigned to the above regressors respectively". This means
Microfit is looking at our regression in the following form:
Ei = A1 + A2 Si + A3 Xi + A4 AGEi + ui
The two coefficients we want to equate are those on X and AGE, that is A3 and A4.
So type A3 = A4, which should yield
Wald test of restriction(s) imposed on parameters
*******************************************************************************
Based on OLS regression of E on:
CONSTANT  S  X  AGE
540 observations used for estimation from 1 to 540
*******************************************************************************
Coefficients A1 to A4 are assigned to the above regressors respectively.
List of restriction(s) for the Wald test:
A3=A4
*******************************************************************************
Wald Statistic    CHSQ(1) = 6.1611[.013]
*******************************************************************************
The Wald statistic has a p-value of 0.013, so the restriction is rejected and we
conclude that the coefficients on X and AGE are not equal. This is not surprising, as
the two variables do not really play the same role. This tells you a lot about how you
need to actually check whether something is correct in statistics by testing for it,
rather than basing your assertion on supposition.
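The Wald statistic for one linear restriction such as A3 = A4 can be written as Rb = r with R = [0, 0, 1, -1] and r = 0. A Python/numpy sketch on simulated data (not our sample), with unequal coefficients built in so the restriction should be rejected:

```python
import numpy as np

# Simulate a regression where the coefficients on X and AGE differ
rng = np.random.default_rng(5)
n = 540
S = rng.uniform(8, 20, n)
Xp = rng.uniform(0, 30, n)
age = rng.uniform(20, 60, n)
E = -10 + 2.5 * S + 0.6 * Xp - 0.2 * age + rng.normal(0, 5, n)

X = np.column_stack([np.ones(n), S, Xp, age])
XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ E                     # OLS coefficients A1..A4
resid = E - X @ b
s2 = resid @ resid / (n - X.shape[1])     # error-variance estimate

R = np.array([0.0, 0.0, 1.0, -1.0])       # restriction A3 - A4 = 0
# Wald statistic: (Rb)^2 / Var(Rb), chi-squared(1) under the null
w = (R @ b) ** 2 / ((R @ XtX_inv @ R) * s2)
print(w)   # compare with the 5% chi-squared(1) critical value, 3.84
```

With coefficients of 0.6 and -0.2 in the simulation, the statistic comes out far above 3.84, mirroring the rejection in the Microfit output.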
Before you exit make sure you save your modified data and save your output
files to be accessed later.
Exercises
1) Re-run the above earnings regression, but instead of gender introduce the
dummy variable Ethnicity, which has 3 categories, into your regression.
2) Go to the coursework folder on the share drive and try playing with the data on
Educational Attainment, which you are going to use for the project. You will
find data on a number of variables including S, A, SM and SF, which stand for
years of schooling, intelligence (measured by IQ score), mother's schooling and
father's schooling respectively. Try to estimate the following regression for
starters:
Si = β1 + β2 Ai + β3 SMi + β4 SFi + ui
(i)
(ii)
Appendix: inputting data from a spreadsheet
In the cell number of observations enter 540, as we have data on 540 individuals,
and choose the number of variables to be 10. Then press OK.
Then the following screen will appear:
This is asking for some further description of the data. Our data is arranged with the
variable names in the first row/line (with optional descriptions), followed by the data,
and data only in the columns, starting with the first column. So our choice for the
rows is the second option, variable names with optional descriptions (usually
selected by default), and for the columns select data only (the second option). Press
OK. If you successfully input the data into Microfit then the following screen should
in principle follow:
screen should in principle follow:
At this point you need to press GO so that Microfit completes the reading of the file.
When this is done you will find that the buttons in the toolbar, namely Process, Single
and Multi, will become activated, implying that Microfit has completed reading the
data and we can use it for transformations or to run regressions.
Another step you need to take is to define the constant in the normal way in the
PROCESS editor.
After that, an important step is to SAVE your data. To do this select SAVE or SAVE
AS from the drop-down menu FILE (note the file name will end with .fit, meaning it
is being saved as a Microfit data file).
Make sure you save your data by giving it an appropriate name and saving it in a
directory or on a disk (personal directory, USB stick or floppy) if you want to use that
data later. In fact, it is good practice to keep saving your data each time you modify it
through transformations, for example after having defined the constant.
From there on, it is just as if you were using an already created Microfit data file.