
Introductory Econometrics Microfit Tutorial Session

Econometric packages, by performing most of the calculations necessary for estimating and testing econometric models, make the mechanics of applied econometric research quite straightforward. The difficulties arise elsewhere: in the formulation of interesting, acceptable models and in the collection of data. Today you will see how the econometric package Microfit removes the burden of computation from econometrics by taking you from some preliminary data analysis through model formulation, estimation and simple diagnostic procedures. You can go at your own pace.
Objectives

- To explore Microfit's ability to allow graphical exploration of the data
- To learn how simple transformations of existing data may be undertaken
- To use Microfit's capabilities to estimate the parameters of regression models, along with various measures of fit and testing
- To interpret and comment upon the results obtained

Starting Microfit
- Log in to the network as usual [username + password]
- To start Microfit, choose Start > Programs > Microfit and select the icon Microfit for Windows.lnk

The screen that will appear will be similar to Figure 1.

Figure 1. Microfit's initial screen

Loading Data
Before it can do anything, Microfit requires some data to work with. You can enter data:
(i) via the keyboard;
(ii) by loading the data from a file already in Microfit format (as we are going to do); or
(iii) by importing data from a spreadsheet such as Excel (see Exercise 1 and the Appendix).
To load the data, click on the drop-down File menu and choose Open.
The data we are going to work with consist of earnings data (taken from Dougherty, with which you should by now be fairly familiar). This is stored on the share drive [U drive] as earnings.fit. Note this is just a sub-sample of the data; the full dataset is in the Excel data file Educational Attainment.
Since the version of Microfit we have does not allow sharing, we will have to copy the file from the share drive to the desktop to be able to use it.
Note that you will need the software installed on the computer to be able to open and read a .fit file [the software is installed in SSB 226 and SSB 119, so the file can be used only in these rooms].
By clicking on Open, the screen shown in Figure 2 should appear:

Figure 2. Microfit following file load

This screen constitutes what Microfit calls the Command Editor. From this window you can carry out a number of operations, some of which we will be using. These are:

- Listing the data which has been loaded
- Graphing variables
- Creating the constant term
- Transforming variables (like transforming a variable into logs)
- Obtaining basic descriptive statistics (like correlations between variables)

However, before we explore this further, there are two other windows which we might find useful to view at this stage.
Just below the File, Edit, Options etc. row there is a row of buttons. Each gives access to a different window. We will only be using the first four of these, as the rest are meant for more advanced work outside the scope of this module. We will be using the Single button to run our regression later.

Let us look at the other three command options:

- Process takes us to the Command Editor window we're currently looking at. You might want to confirm this by clicking the Process button, which should leave the window unchanged.
- Variables (click this now) is quite useful as it pops up a window showing a description of our variables. In our case we are looking at earnings, schooling and experience. Figure 3 shows this. Click on Close to remove this window.

Figure 3. Description of Variables

- Data (click this now) pops up a sort of spreadsheet which you can scroll through and in which the individual data can be edited. Cancel the window.

You should now be looking at the Command Editor window again.

Preliminary Data Analysis


We are now going to explore how the Command Editor can be used to: (i) examine data; (ii) generate graphs; (iii) create a constant term; (iv) transform existing variables; and (v) generate descriptive statistics.
Listing the data
One way to check on the data we have, other than by choosing the Data command, is to type in the Command Editor:
LIST
Note that such commands do not have to be in capital letters.
You should see a window of 10 variables appear, as shown in Figure 4:

Figure 4. Listed data

You can maximise the window and scroll through the data. We have observations on 540 individuals, and for each individual the variables listed include hourly earnings (E), years of schooling (S) and years of experience (X). Close the window.
Graphs
Suppose we are interested in viewing our data. Consider Earnings and Schooling. In
the Command Editor (you need to clear the editor of any previous command) type
SCATTER E S

Here we are commanding Microfit to provide a scatter of hourly earnings on years of schooling. You should get the graph shown in Figure 5. (What sort of relationship does the graph suggest exists between E and S? Also, are the observations spread evenly?) Close the window.

Figure 5. Scatter of Earnings on Schooling
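We can produce a scatter of any pair of variables in the same way. For example, to plot hourly earnings against years of experience (a plot we will return to when checking for heteroscedasticity), type:

SCATTER E X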

Creating a constant
One important step before performing any regression analysis is to define the constant
term. To do this, select the option
at the bottom of the
Command Editor. A small window will prompt you to supply the name of the
constant. I tend to write CONSTANT. Click OK. To see what has happened click the
Variables button. Observe that CONSTANT has been described as the intercept
term. Close the window. By clicking the Data button you will see the addition of
CONSTANT in a new column with values 1. (Why?). Clicking on CLOSE will get
you back to the Command Editor.
Transforming Variables
We can also transform variables. During the lectures we looked at different functional forms. We saw earnings in natural logs as ln E. So to transform hourly earnings (variable E) into log E, type in the Command Editor:
LE = LOG(E)
Note the brackets and the equals sign. You can observe the newly created variable by clicking Data. Clicking on CLOSE will get you back to the Command Editor.
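Several transformations can be entered on one line by separating them with semicolons, and the usual arithmetic operators are available; for example, later on we will create the ratio variables ES = E/S; IS = 1/S; XS = X/S in exactly this way.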
Generating descriptive statistics
Recall that during the lectures we discussed the possibility that regressors could be collinear (we may have high multicollinearity). Therefore, whenever we are conducting regression analysis it is good practice to check the correlation between variables. In our particular dataset it is worthwhile to check if schooling and experience are correlated. To do this, type in the Command Editor:
COR E S X
This line commands Microfit to find the correlations between E and S, E and X, and S and X. A first window describing the variables (giving the mean, standard deviation, minimum and maximum values) will appear. Close this window. A second window will appear which produces the correlation matrix. (Is there any collinearity between S and X?)

Figure 6. Correlation Matrix

Note there are other commands which Microfit performs which you may want to explore in your own time.

Regression Analysis
We are now ready to perform some regression analysis. The model we will first
estimate is the multiple earnings regression:

Ei = β1 + β2 Si + β3 Xi + ui
with hourly earnings E as dependent variable, years of schooling S and years of
experience X as our explanatory variables.
Remember the estimated regression is written as:

Êi = b1 + b2 Si + b3 Xi
How to estimate this regression in Microfit?
Click on the Single button at the top and the Estimate/Predict window will appear
(Figure 7):

Figure 7. The Estimate/Predict window

Notice that along the top should appear the legend
Linear Regression Ordinary Least Squares
This is exactly how we want to estimate our earnings function: by using the OLS method.
In this window you will find listed the Start of period and End of period, which represent our range of observations, from 1 to 540. In running a regression we may choose a sub-sample rather than the whole sample of individuals; in our case we will stick to all individuals in the sample. This is followed by Variables. Looking at the listed variables you will find those which were part of the original data and those which we created (like LE and the constant term, though the latter is not a variable strictly speaking).
The space provided below the instructions is where we need to type our command to run the regression. Leaving spaces in between, type:

E CONSTANT S X

This line commands Microfit to run the regression of E (hourly earnings) on S (schooling in years) and X (years of experience). When you are satisfied you have entered the correct command, click the button to run the regression, which gives the following result:

Table 1. OLS regression for E on S and X


Ordinary Least Squares Estimation
*******************************************************************************
Dependent variable is E
540 observations used for estimation from 1 to 540
*******************************************************************************
Regressor                  Coefficient     Standard Error     T-Ratio[Prob]
CONSTANT                      -26.9316             4.5234     -5.9538[.000]
S                               2.6740             .23200     11.5261[.000]
X                               .59410             .13792      4.3074[.000]
*******************************************************************************
R-Squared                       .19906   R-Bar-Squared                 .19608
S.E. of Regression             13.0920   F-stat. F(2,537)       66.7311[.000]
Mean of Dependent Variable     19.7192   S.D. of Dependent Variable   14.6015
Residual Sum of Squares        92041.6   Equation Log-likelihood      -2153.6
Akaike Info. Criterion         -2156.6   Schwarz Bayesian Criterion   -2163.0
DW-statistic                    2.0691
*******************************************************************************

Diagnostic Tests
*******************************************************************************
* Test Statistics      *  LM Version              *  F Version               *
*******************************************************************************
* A:Serial Correlation * CHSQ(1) =  .66516[.415]  * F(1,536) =  .66105[.417] *
* B:Functional Form    * CHSQ(1) =  3.1644[.075]  * F(1,536) =  3.1595[.076] *
* C:Normality          * CHSQ(2) =  7921.3[.000]  * Not applicable           *
* D:Heteroscedasticity * CHSQ(1) = 17.7646[.000]  * F(1,538) = 18.3009[.000] *
*******************************************************************************
A:Lagrange multiplier test of residual serial correlation
B:Ramsey's RESET test using the square of the fitted values
C:Based on a test of skewness and kurtosis of residuals
D:Based on the regression of squared residuals on squared fitted values

One important step here is to save this result. Just as the data needs to be saved, you need to save your regression results for later use. In the results window, look at the icons running across its top. If you hover the pointer over these, some text pops up to tell you what they are for. The first allows you to print. The next allows you to save to a new results file. Select this and type a name for your file (I choose earnings-multiple: earnings in correspondence to the data used, with multiple added to note that these are the results from a multiple regression model; this result is saved for later use). Saving the results in this way allows me to access them using Microsoft Word, for instance. In addition, I can use the copy selected text to clipboard icon to paste the output into a Word file. If the copied output looks all over the place, choose Courier New and a font size of 8 or 9, which should do the trick.
Comments on regression output. The first part of the table gives estimates of the regression coefficients with standard errors, t-ratios and p-values, the R-squared and the F-ratio. The second part contains a variety of residual diagnostic tests, such as the Breusch-Godfrey serial correlation test and the Koenker-Bassett test for heteroscedasticity.
Interpretation. The way to interpret the regression is in the same manner as I taught you in class. For instance, the coefficient on S, that is 2.67, can be interpreted as saying an extra year of schooling would induce a $2.67 increase in hourly earnings (E), holding years of experience (X) constant. You can interpret X in the same way. See the Lecture notes.
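Written out with the estimates from Table 1, the fitted equation is:

Êi = -26.93 + 2.67 Si + 0.59 Xi

so, for example, an individual with 12 years of schooling and 5 years of experience has predicted hourly earnings of -26.93 + 2.67(12) + 0.59(5) ≈ $8 per hour.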
Testing. Two tests I have shown you in class are the test of the individual significance of a regression coefficient and the test for the overall significance of the regression. The former uses the t-test and the latter uses the F-test. Since these are reported, we only need to interpret them. Remember, unlike in class, we can use the p-values to come to a conclusion as to whether to reject a hypothesis or not. However, it is important to recall what you are actually testing. The p-values of 0.000 on both estimated regression coefficients for S and X suggest these are significant, in that they individually have a significant impact on hourly earnings. The F-test reports an F-ratio of 66.73 and a p-value of 0.000. This means that our regression is significant: S and X have a jointly significant influence on E.
Goodness of Fit. Next we need to comment on the overall fit of the regression. The R-squared can be used. The reported R-squared is 0.199, suggesting that 19.9% of the variation in hourly earnings is being explained by the regressors, i.e. S and X.
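As a quick consistency check, recall R² = 1 − RSS/TSS, so TSS = RSS/(1 − R²) = 92041.6/0.80094 ≈ 114,917; this matches the total variation implied by the reported S.D. of the dependent variable, (540 − 1) × 14.6015² ≈ 114,917.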
Testing for heteroscedasticity. The test to be used is reported in the second part of the regression output. This is the Koenker-Bassett test (see Gujarati p. 415 for a discussion). The table reports a chi-square and an F-version of the test; we can use either of the two. The basic point to remember is what we are testing. The null hypothesis is that there is no heteroscedasticity (there is homoscedasticity) against the alternative that there is heteroscedasticity. Remember, all tests of heteroscedasticity formulate similar hypotheses. Based on the reported p-values of 0.000 we can conclude that our regression does suffer from heteroscedasticity. There are various ways to deal with heteroscedasticity, which we are now going to explore.
Dealing with heteroscedasticity. In class we discussed three ways to deal with heteroscedasticity: correction based on a plausible pattern of heteroscedasticity; log transformation; and White's heteroscedasticity-consistent standard errors. Since Microfit does not perform White's correction we won't discuss it here. We consider the first two solutions.
Correction based on a plausible pattern of heteroscedasticity. The solution we discussed in class is that when the variance of the error term is unknown we can assert a certain plausible pattern that it may follow. Typically, heteroscedasticity could be due to Var(u) being a function of one of the explanatory variables. To find which variable the error variance may be related to, we can plot first E on S and then E on X. Actually, the scatter graph we plotted earlier can give us some initial clues. Although it is not clear-cut, the observations do not appear evenly spread. This is suggestive that the variable causing heteroscedasticity could be S. To be clearer about this we could plot the scatter of E on X. Try this. What do you find? Not much uneven spread appears, at least compared to the plot of E on S. However, this is not sufficient to tell us that S is the culprit variable. For that we need to go a step further and plot the residuals (estimates of the errors) against these two variables separately. To do that we need to save our residuals after having run our regression. Unfortunately, if we have closed the results window then we need to re-run our regression. If not, then after closing the Results window a window displaying the Post Regression Menu will appear. This offers various options. By default Move to hypothesis testing is chosen.

We need the option just under it: List/Plot/Save residuals. Select it by pointing and clicking the radio button. Press OK. Another window with Display/Save residuals should appear. Choose option 6: Save residuals. A small box will appear after you select this option, prompting you to give the residuals a name. I call it residuals, nothing fancy!! After you press OK a variable called residuals will have been created. Close all subsequent windows, which will bring you back to the Estimate/Predict window. Since we need to graph the relationships between the residuals and the explanatory variables individually, go to the Process window. To get the graphs we want, type Scatter residuals S. This will give a scatter graph of the residuals on S. What can you say about the relationship? [Does the spread appear constant or non-constant? A constant spread is suggestive of homoscedasticity and non-constancy is suggestive of heteroscedasticity]. Next, try Scatter residuals X and ask the same question. I provide these graphs below.
Scatter plot of RESIDUALS on S

Scatter plot of RESIDUALS on X
This somewhat confirms what we suspected earlier that S may be the culprit variable
accounting for the non-constant variance. We now apply the first correction tool.
Recall from class that the correction amounts to dividing the initial regression through by the culprit variable S, giving:

E/S = β1 (1/S) + β2 + β3 (X/S) + u/S

Before running this new regression we have to create some variables. So move to the Process window and create the variables:

ES = E/S; IS = 1/S; XS = X/S

Then we can run this regression by typing:

ES CONSTANT IS XS
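Note how the coefficients map back to the original model: in this transformed regression the CONSTANT estimates β2 (the original coefficient on S), the coefficient on IS estimates β1 (the original intercept), and the coefficient on XS estimates β3. Keep this in mind when reading the output below.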

Ordinary Least Squares Estimation
*******************************************************************************
Dependent variable is ES
540 observations used for estimation from 1 to 540
*******************************************************************************
Regressor                  Coefficient     Standard Error     T-Ratio[Prob]
CONSTANT                        2.2533             .20734     10.8677[.000]
IS                            -21.7809             3.5856     -6.0745[.000]
XS                              .62285             .11465      5.4326[.000]
*******************************************************************************
R-Squared                      .069713   R-Bar-Squared                .066248
S.E. of Regression              .88069   F-stat. F(2,537)       20.1206[.000]

Diagnostic Tests
*******************************************************************************
* D:Heteroscedasticity * CHSQ(1) =  5.9369[.015]  * F(1,538) =  5.9807[.015] *
*******************************************************************************

The abridged regression output shows the heteroscedasticity still hasn't been completely corrected: we reject the null of homoscedasticity at the 5 and 10% levels, though not at the 1% level. Maybe the form we assumed is not the correct one. There are other potential forms, like dividing the regression by S², which assumes Var(u) is proportional to S⁴. Try this and you will find that it does correct for heteroscedasticity, as the p-value becomes 0.105. Note that to interpret these transformed regressions we need to multiply through by the term we divided the regression by in the first place. You could also try other transformations as discussed in Gujarati.
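As a sketch of the S² correction (the variable names ES2, IS2 and XS2 are my own suggestions, and I assume the Process editor accepts S*S for S²): dividing the original equation through by S² gives

E/S² = β1 (1/S²) + β2 (1/S) + β3 (X/S²) + u/S²

so you could create the variables

ES2 = E/(S*S); IS2 = 1/(S*S); XS2 = X/(S*S)

and run the regression ES2 IS2 IS XS2 (there is no constant term in this transformed equation; IS = 1/S, created earlier, now carries the coefficient β2).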
Correction based on log transformation. Another solution is to apply a log transformation. We created LE, defined as the log of earnings, earlier. We can do the same for the other variables in the regression by transforming them into logs and running the regression:

Ln Ei = β1 + β2 Ln Si + β3 Ln Xi + ui
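Before typing the regression command, create the log variables LS and LX (the names used in the command below) in the Process window, in the same way as we created LE:

LS = LOG(S); LX = LOG(X)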
Type LE CONSTANT LS LX


The output below should appear. Observe that heteroscedasticity is now present only at the 10% level, not at the 5% level.
Ordinary Least Squares Estimation
*******************************************************************************
Dependent variable is LE
540 observations used for estimation from 1 to 540
*******************************************************************************
Regressor                  Coefficient     Standard Error     T-Ratio[Prob]
CONSTANT                       -2.2837             .42027     -5.4338[.000]
LS                              1.5344             .12577     12.2000[.000]
LX                              .39107            .070432      5.5524[.000]
*******************************************************************************
R-Squared                       .22531   R-Bar-Squared                 .22243
S.E. of Regression              .52378   F-stat. F(2,537)       78.0920[.000]

Diagnostic Tests
*******************************************************************************
* D:Heteroscedasticity * CHSQ(1) =  3.3709[.066]  * F(1,538) =  3.3795[.067] *
*******************************************************************************

Remember, the interpretation of this regression is also different, as the unit of measurement has changed.
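One convenient feature of the log-log form: the slope coefficients are elasticities. The estimate of 1.53 on LS says that a 1% increase in years of schooling is associated with roughly a 1.53% increase in hourly earnings, holding experience constant.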
Dummy Variables
In class I showed you how to create dummy variables to answer questions like whether gender has a differential impact on earnings. We modify our regression to include the dummy variable MALE, defined as 1 for a male employee and 0 for a female employee. Looking at the data you will see that we have columns for two variables, MALE and FEMALE, so we could just as well have added the FEMALE variable instead. But here we add the MALE variable:

Ei = β1 + β2 Si + β3 Xi + β4 MALEi + ui
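Recall how the dummy works: for a female employee (MALE = 0) the intercept is β1, while for a male employee (MALE = 1) it is β1 + β4, so β4 measures the male-female earnings differential, holding S and X constant.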
So type
E CONSTANT S X MALE
which produces the regression output shown below. The MALE variable is significant, as shown by its p-value. I leave the interpretation to you (you can go back to your lecture handout on Dummy Variables).
Ordinary Least Squares Estimation
*******************************************************************************
Dependent variable is E
540 observations used for estimation from 1 to 540
*******************************************************************************
Regressor                  Coefficient     Standard Error     T-Ratio[Prob]
CONSTANT                      -26.7957             4.3942     -6.0980[.000]
S                               2.5876             .22587     11.4561[.000]
X                               .46792             .13577      3.4465[.001]
MALE                            6.3785             1.1093      5.7499[.000]
*******************************************************************************
R-Squared                       .24559   R-Bar-Squared                 .24137
*******************************************************************************

Omitting variables from the regression [using the Wald Test]

Suppose we want to drop variables from our regression. We can use the Wald test to do so. To demonstrate the usefulness of this technique we first need to introduce an additional variable into our regression: AGE. So type:

E CONSTANT S X AGE

which produces
Ordinary Least Squares Estimation
*******************************************************************************
Dependent variable is E
540 observations used for estimation from 1 to 540
*******************************************************************************
Regressor                  Coefficient     Standard Error     T-Ratio[Prob]
CONSTANT                      -19.2621            10.7854     -1.7859[.075]
S                               2.6883             .23279     11.5479[.000]
X                               .62709             .14426      4.3470[.000]
AGE                            -.20706             .26432     -.78339[.434]
*******************************************************************************
R-Squared                       .19998   R-Bar-Squared                 .19550
*******************************************************************************

AGE does not contribute much to the regression, so suppose we want to drop it. We can use the Wald test for the deletion of this variable. To perform this test, close the Results window (after having saved your results). This will bring up the Post Regression Menu. Choose Move to hypothesis testing and then choose option 5, Variable deletion test. Press OK. A window asking us to list the variable(s) we want to delete opens up. On the right-hand side you will see a drop-down selection of variables. Select AGE as the variable to be dropped, i.e. the variable whose omission we want to test, and press OK. The following output is obtained:
Variable Deletion Test (OLS case)
*******************************************************************************
Dependent variable is E
List of the variables deleted from the regression: AGE
540 observations used for estimation from 1 to 540
*******************************************************************************
Regressor                  Coefficient     Standard Error     T-Ratio[Prob]
CONSTANT                      -26.9316             4.5234     -5.9538[.000]
S                               2.6740             .23200     11.5261[.000]
X                               .59410             .13792      4.3074[.000]
*******************************************************************************
Joint test of zero restrictions on the coefficients of deleted variables:
Lagrange Multiplier Statistic   CHSQ(1)  =   .61757[.432]
Likelihood Ratio Statistic      CHSQ(1)  =   .61792[.432]
F Statistic                     F(1,536) =   .61370[.434]
*******************************************************************************

Recall we are specifying the null hypothesis that the coefficient on the deleted variable equals zero, and this null cannot be rejected by the Wald test. This is in fact similar to testing with the t-test, but the Wald deletion test provides a more careful procedure for removing variables. Moreover, when we want to drop two variables at once, as taught in class, we need the Wald deletion test; in that case I would specify the two variables I want dropped.
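In fact, when a single variable is deleted, the F statistic is simply the square of that variable's t-ratio from the unrestricted regression: (-0.78339)² = 0.61370, exactly the F statistic reported above, so the two approaches necessarily agree here.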
Testing for linear restrictions [using the Wald Test]
Suppose we suspect that in the above regression age and experience play a similar role, i.e. have the same effect on earnings. We could impose a theoretical restriction on the regression by testing whether the coefficients on X and AGE are equal. To do that, go back to the Estimate/Predict window and re-run the regression, or close the results window from the previous test, which will bring you back to the Hypothesis Test Menu. Here choose option 7: Wald test of linear restrictions and press OK. A new window appears. Observe the line "Coefficients A1 to A4 are assigned to the above regressors respectively". This means Microfit is looking at our regression in the following form:

Ei = A1 + A2 Si + A3 Xi + A4 AGEi + ui

The two coefficients we want to equate are those on X and AGE, i.e. A3 and A4. So type A3 = A4, which should yield:
Wald test of restriction(s) imposed on parameters
*******************************************************************************
Based on OLS regression of E on: CONSTANT  S  X  AGE
540 observations used for estimation from 1 to 540
*******************************************************************************
Coefficients A1 to A4 are assigned to the above regressors respectively.
List of restriction(s) for the Wald test:
A3=A4
*******************************************************************************
Wald Statistic   CHSQ(1) =   6.1611[.013]
*******************************************************************************

The Wald statistic has a p-value of 0.013, so the restriction can be rejected and we conclude that the coefficients on X and AGE are not equal. This is not surprising, as the two variables do not really play the same role!! This tells you a lot about how you need to actually check whether something is correct in statistics by testing for it, rather than basing your assertion on supposition.
Before you exit make sure you save your modified data and save your output
files to be accessed later.
Exercises
1) Re-run the above earnings regression but, instead of gender, introduce the dummy variable Ethnicity, which has 3 categories, in your regression.
2) Go to the coursework folder on the share drive and try playing with the data on Educational Attainment, which you are going to use for the project. You will find data on a number of variables including S, A, SM and SF, which stand for years of schooling, intelligence (measured by IQ score), mother's schooling and father's schooling respectively. Try to estimate the following regression for starters:

Si = β1 + β2 Ai + β3 SMi + β4 SFi + ui

(i) Comment on the estimated regression coefficients and their individual significance.
(ii) Comment on the overall significance of the regression.


Appendix: Loading Data into Microfit from an Excel file


This is the best way to load data into Microfit, especially if you are conducting
research where you would have to collect data from a particular source and inputted it
into Excel (rather than someone having collected the data and inputted it in Microfit
format for you!!).
In our case we will use the Excel file Earnings-MRA from the share drive. Copy the data either via the keyboard or using the mouse. Note, when you are copying the data, make sure you select only the data and do not select cells outside the provided data. This is due to the way Microfit reads the data from each cell.
After having copied the data, open Microfit and from the drop-down EDIT menu select the option PASTE DATA. Having done that, a screen as shown below will appear, prompting you to provide some information on the data: whether it is undated, annual, half-yearly, quarterly or monthly. Since our data is on individuals we choose UNDATED.

By selecting undated, the screen will change to the screen shown below:


In the "number of observations" cell enter 540, as we have data on 540 individuals, and choose the number of variables to be 10. Then press OK.
The following screen will then appear:

This is asking for some further description of the data. Our data is arranged with the variable names in the first row/line (with optional descriptions), followed by the data, and with data only in the columns, starting with the first column. So for the rows we choose the second option, "variable names with optional descriptions" (usually selected by default), and for the columns we select "data only" (the second option). Press OK. If you successfully input the data into Microfit then the following screen should in principle follow:

At this point you need to press GO so that Microfit completes the reading of the file. When this is done you will find that the buttons in the toolbar, namely Process, Single and Multi, will become activated, implying that Microfit has finished reading the data and we can use it for transformations or for running regressions.
Another step you need to take is to define the constant in the normal way in the PROCESS editor.
After that, an important step is to SAVE your data. To do this select SAVE or SAVE AS from the drop-down FILE menu (note the file name will end with .fit, meaning it is being saved as a Microfit data file).
Make sure you save your data by giving it an appropriate name and saving it in a directory or on a disk (personal directory, USB stick or floppy) if you want to use the data later. In fact it is good practice to keep saving each time you modify the data through transformations, like after having defined the constant.
From there on, it is just as if you were using an already created Microfit data file.

