
Using Stata

THE OPENING DISPLAY

Command: this is where Stata commands are typed

Results: output from commands, and error messages, appear here

Review: a listing of commands recently executed

Variables: names of variables in the data and their labels (if created)

Pull-down menus

Current path

EXITING STATA

File> Exit. Alternatively, simply type exit in the Command window and press Enter.

A working directory:

File > Change Working Directory > go to the folder where the data are saved.

cd followed by the folder path (copy and paste the data address)

The new path is shown.

If you are working in a computer lab, you may want to keep your data on a storage device such as a

flash drive:

cd followed by the path copied from the flash drive
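
As a minimal sketch of the steps above, the commands below show the current directory, move to a data folder, and list the data files there. The path E:\statadata is only a placeholder (an assumption, not from the handout); replace it with your own folder.

```stata
* Display the current working directory
pwd

* Change to the folder that holds the data files
* (hypothetical path; replace it with your own folder or flash drive)
cd "E:\statadata"

* List the Stata data files found in the new working directory
dir *.dta
```

Quotation marks around the path are needed when it contains spaces.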

OPENING STATA DATA FILES

The use command:

With Stata started, change your working directory to the folder where you have stored the Stata data

files. In the Command window type

use filename and press Enter.

If you have a data file already open, and have changed it in some way, Stata will reply with

an error message. You can either save the previous data file [more on this below], or enter

clear in the Command window.

The clear command will clear what is in Stata's memory. If you want to open the data file and

clear memory, enter

use filename, clear
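
The behaviour described above can be tried with the auto dataset that ships with Stata. This is only an illustrative sketch: auto and the file name auto_changed are assumptions chosen for the example, not files used in this handout.

```stata
* Load a dataset that is installed with Stata
sysuse auto, clear

* Change the data in memory
generate price2 = price^2

* Opening another file now, without clear, would stop with an error
* because the changed data would be lost. Either save them first ...
save auto_changed, replace

* ... or discard whatever is in memory while opening the next file
sysuse auto, clear
```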

Using the toolbar to open Stata data file:

File > Open

Double click on a Stata data file:

*.dta file

Import an Excel file into Stata:

File > Import > Excel spreadsheet

Alternatively, open the Data Editor, copy the data from Excel, and paste them into Stata.

Import an Excel file from the command line:

First save your Excel file as a CSV file, then use the command

insheet using filename.csv
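
In more recent Stata releases the import commands can read these files directly. The sketch below assumes such a release and uses placeholder file names; it is an alternative to insheet, not the method this handout prescribes.

```stata
* Read an Excel workbook directly; firstrow treats row 1 as variable names
* (filename.xlsx is a placeholder)
import excel using "filename.xlsx", firstrow clear

* Read a CSV file with the newer replacement for insheet
* (filename.csv is a placeholder)
import delimited using "filename.csv", clear
```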

Using files on the internet:

Files can be loaded from a web site.

The Stata data files are stored at http://stata.com/data/s4poe. For example, to load

cps_small.dta, after saving previous data and/or clearing memory, enter in the Command

window

use http://stata.com/data/s4poe/cps_small

Once the data are loaded onto your machine, you can save them using File> Save as and filling in

the resulting dialog box.

Example data in Stata

File > Example datasets

THE VARIABLES WINDOW

On the Stata pull-down menu select Data> Labels> Label Variable. OR

label variable wage "earnings per hour"

Rename variable:

Data > Variables Manager

rename price p1

Label Variable:

label variable gdp "growth rate"

label values race lblname (lblname is a value label created with label define)
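
A short sketch of renaming and labelling, run on Stata's built-in auto dataset so that it can be executed as written. The label name originlbl and the label texts are made up for illustration.

```stata
sysuse auto, clear

* Rename a variable, then give it a descriptive variable label
rename price p1
label variable p1 "price in dollars"

* Value labels take two steps: define the mapping, then attach it
* (originlbl is a hypothetical label name; foreign is coded 0/1 in auto)
label define originlbl 0 "domestic" 1 "foreign"
label values foreign originlbl

* Check the result
describe p1 foreign
```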

DESCRIBING DATA AND OBTAINING SUMMARY STATISTICS

The pull-down menu is Statistics > Summaries, tables, and tests > Summary and descriptive

statistics > Summary statistics.

describe

summarize

THE STATA HELP SYSTEM

Help> Search

help summarize

If you wish to summarize the data using the dialog box, enter db summarize

STATA COMMAND SYNTAX

command [varlist] [if] [in] [weight] [, options]

Syntax of summarize:

summarize age, detail

summarize hour if wage >= 10

summarize wage in 1/50

summarize wage in 1/50, detail
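
Each element of the syntax can be tried in turn. The sketch below uses Stata's built-in auto dataset rather than the wage data referred to above, so the variable names differ from the handout.

```stata
sysuse auto, clear

* command + varlist
summarize price mpg

* [if]: restrict to observations satisfying a condition
summarize price if mpg >= 25

* [in]: restrict to a range of observation numbers
summarize price in 1/50

* [, options]: request the detailed output
summarize price in 1/50, detail

* The elements can be combined
summarize price mpg if foreign == 1, detail
```

Stata commands are case sensitive, so summarize must be typed in lowercase.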

Learning syntax using the review window:

Select Statistics > Summaries, tables, and tests > Summary and descriptive statistics > Summary statistics from

the pull-down menu. The command generated by the dialog box then appears in the Review window.

SAVING YOUR WORK

One option is to highlight the output in the Results window, right-click and copy it, then paste it into a

document. While you may be using Times New Roman font for standard text, use Courier

New for Stata output. You may have to reduce the font size to 8 or 9 to make it fit.

Using a log file:

In addition to having results in the Results window in Stata, it is a very good idea to have all

results written to an output file, which Stata calls a log file.

To begin, click on the Log Begin/Close/Suspend/Resume icon on the Stata toolbar. OR

File> Log> Begin.

summarize wage, detail

Again click on the Log Begin/Close/Suspend/Resume icon used to open the log file. In the

resulting dialog box select Close log file.

Using Stata commands for log files:

log using gdp             opens a new log file, gdp.smcl.

log using gdp, replace    opens the log file and replaces one by the same name if it exists.

log using gdp, append     opens an existing log file and adds new results at the end.

log close                 closes a log file that is open.
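
Put together, a logging session might look like the sketch below. The auto dataset is an assumption used only so the example can run as written.

```stata
* Close any log that is still open (capture suppresses the error if none is)
capture log close

* Start gdp.smcl, overwriting an old copy if there is one
log using gdp, replace

sysuse auto, clear
summarize price, detail

log close

* Reopen the same log and add further results at the end
log using gdp, append
describe
log close
```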

Viewing a log file:

File> Log> View

You can print the entire log file by clicking the printer icon. Alternatively, you can highlight

parts of the smcl file and right-click. Use one of the Copy options and then paste the result

into a document.

Translating a log file to a text file:

File> Log> Translate.

To translate the *.smcl log file to a text file in the current directory, enter

translate gdp.smcl gdp.txt

If the text file already exists, and you wish to write over it, use

translate gdp.smcl gdp.txt, replace

To print directly from the Command window, enter

print gdp.smcl

USING STATA DO-FILES

These are files containing lists of commands that will be executed as a batch.

Right-click in the Review window and then Select All. After all commands are selected

right-click again and choose Send to Do-file Editor.

The Do-file Editor is opened. To save this file click on File> Save as.

log using gdp, replace    the replace option deletes any old version of the log file.

use consumption, clear    the clear option deletes any data in memory.
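
A do-file built around these two lines might look like the following sketch. The file name example.do is hypothetical, and consumption.dta is assumed to be in the working directory.

```stata
* example.do: a minimal do-file sketch (hypothetical file name)

* Close any log left open by an earlier run
capture log close

* Start a fresh log and load the data
log using gdp, replace
use consumption, clear

* Commands to be executed as a batch
describe
summarize

log close
```

Once saved, the file can be run by typing do example in the Command window.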

CREATING AND MANAGING VARIABLES

Creating (generating) new variables:

Data> Create or change variables> Create new variable

Alternatively, in the Command window, enter db generate to open the dialog box.

Using the expression builder:

Data> Create or change variables> Create new variable, and then click Create, opening

Expression builder.

gen lgdp = log(gdp)

generate gdp2 = gdp^2

gen logx = log(x)

Using arithmetic operators:

+ addition

- subtraction (or create negative of value, or negation)

* multiplication

/ division

^ raise to a power

gen gdpcons = gdp*cons

Dummy variable interaction term:

Lin-log and reciprocal form regression models:
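
The two headings above are given without commands. One plausible reading, sketched on Stata's built-in auto dataset, is that generate creates the interaction, log, and reciprocal terms before the regression is run. The variable names foreign_mpg, lweight, and invweight are made up for illustration.

```stata
sysuse auto, clear

* Dummy variable interaction: dummy (foreign) times a continuous variable
generate foreign_mpg = foreign*mpg
regress price mpg foreign foreign_mpg

* Lin-log model: y regressed on the natural log of x
generate lweight = ln(weight)
regress price lweight

* Reciprocal model: y regressed on 1/x
generate invweight = 1/weight
regress price invweight
```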

Sort variable:

sort x

Ordering variables:

order var1 var2 var3

Replacing values in variable:

replace var1 = 1 if var1 >= 15

Drop or keep a variable:

Data > Variable utilities > Keep or Drop.

Keep deletes all variables from the data file except the ones selected.

drop gdp

keep gdp

Drop observations:

drop if gdp > 200

drop in 420
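
The data-management commands in this section can be tried together on Stata's built-in auto dataset. This is an illustrative sketch; the cut-off values are arbitrary.

```stata
sysuse auto, clear

* Sort observations by price, and move three variables to the front
sort price
order make price mpg

* Recode: cap mpg at 30
replace mpg = 30 if mpg >= 30

* Drop observations that meet a condition. Stata treats missing values as
* larger than any number, so they are excluded from the condition explicitly
drop if price > 10000 & !missing(price)

* Drop a single observation by its position
drop in 1

* Keep only the listed variables; all others are deleted
keep make price mpg
describe
```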

USING STATA GRAPHICS

Graphics> Histogram

histogram gdp, title(Histogram of gdp data)

Scatter diagrams

Graphics> Twoway graph (scatter, line, etc.)

twoway (scatter wage educ)

scatter p1 q1

Fitted line scatter plot:

twoway (scatter p1 q1) (lfit p1 q1)

Pie charts:

Graphics > Pie chart

graph pie var1 var2
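
The graph commands above can be tried on Stata's built-in auto dataset. This is a sketch with substituted variable names. The pie chart uses the over() form, which draws one slice per category, rather than the one-slice-per-variable form shown above.

```stata
sysuse auto, clear

* Histogram with a title
histogram price, title("Histogram of price")

* Scatter diagram: first variable on the y axis, second on the x axis
twoway (scatter price mpg)

* The same scatter with a fitted least squares line overlaid
twoway (scatter price mpg) (lfit price mpg)

* Pie chart with one slice per category of foreign
graph pie, over(foreign)
```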

T-Test for mean value of a variable:

ttest gdp == 30

T-test to compare two means:

ttest var1 == var2

Frequency and cumulative frequency:

tab var

Pearsons chi-squared and Fisher exact test:

Statistics > Summaries, tables, and tests > Tables > Two-way tables, then tick Pearson's chi-squared or Fisher's

exact test.

tabulate var1 var2, chi2

tabulate var1 var2, exact

Pearsons correlation coefficient:

Statistics > Summaries, tables, and tests > Summary and descriptive statistics > Correlations and covariances,

then specify two or more variables.

correlate var1 var2
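
The tests and tables above can be tried on Stata's built-in auto dataset. This is a sketch; the hypothesised mean of 20 is arbitrary. The by() form of ttest compares the means of two groups, whereas ttest var1 == var2 compares two variables measured on the same observations.

```stata
sysuse auto, clear

* One-sample t test: is the mean of mpg equal to 20?
ttest mpg == 20

* Two-group comparison of means, with groups defined by foreign
ttest mpg, by(foreign)

* Frequencies, percentages, and cumulative percentages
tabulate foreign

* Two-way table with Pearson's chi-squared and Fisher's exact test
tabulate rep78 foreign, chi2 exact

* Pearson correlation coefficients
correlate price mpg weight
```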

Regression analysis:

regress depvar indepvars

Compare q1 and guesq1:

predict guesq1

list q1 guesq1

Fitted values and residuals:

Statistics> Postestimation > Predictions, residuals, etc.

predict yhat

predict ehat, residuals
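
A complete sequence, sketched on Stata's built-in auto dataset (the variable names are substituted for illustration): estimate the regression first, then obtain fitted values and residuals with predict.

```stata
sysuse auto, clear

* Regression: dependent variable first, then the regressors
regress price mpg

* Fitted values (the default for predict after regress)
predict yhat

* Residuals
predict ehat, residuals

* Compare actual values, fitted values, and residuals
list price yhat ehat in 1/5
```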

Plotting the fitted regression line:

Graphics> Twoway graph> click on Accept

Computing an elasticity:

Postestimation > Marginal effects or elasticities.

mfx compute, eyex at(mean)
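
In more recent Stata releases the margins command covers the same ground as mfx. The sketch below assumes such a release and uses the built-in auto dataset; it is an alternative, not the command this handout uses.

```stata
sysuse auto, clear
regress price mpg

* Elasticity of price with respect to mpg, evaluated at the sample means
margins, eyex(mpg) atmeans
```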

USING STATA TO OBTAIN PREDICTED VALUES:

In the Data Editor, scroll down past the last observation, enter the new value of the independent variable in a new row, and press

Enter. Now use the command

predict yhat1

Analysing the residuals:

histogram ehat

Multiple regression:

Omitted variable bias:

reg testscr str el_pct

avplots or avplot el_pct

Logit and Probit regression:

logit depvar indepvars

probit depvar indepvars
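
As a sketch, both models can be fitted to the 0/1 variable foreign in Stata's built-in auto dataset. The names p_logit and p_probit are made up for illustration.

```stata
sysuse auto, clear

* Logit: the dependent variable must be binary
logit foreign mpg weight
* After logit, predict returns the predicted probability by default
predict p_logit

* Probit with the same specification
probit foreign mpg weight
predict p_probit

* The two sets of predicted probabilities can be compared
summarize p_logit p_probit
```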

Time series data analysis:

tsset time

tsline oil gas coal

scatter oil gas coal time

twoway (scatter oil time)

Estimating and checking the linear relationship:

reg oil time

predict yhat

predict ehat, residuals

Construct a residual histogram:

histogram ehat

Plot the fitted least squares line and the data scatter:

twoway (scatter yield time) (lfit yield time)

Plot the residuals against time:

twoway connected ehat time, yline(0)
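
The whole time series sequence can be run on simulated data. In the sketch below, the series oil is generated artificially as a linear trend plus noise, purely so the commands can run; it is not the dataset used in this handout.

```stata
* Simulated data: 50 periods, linear trend plus random noise (an assumption)
clear
set obs 50
set seed 12345
generate time = _n
generate oil = 10 + 0.5*time + rnormal()

* Declare the time variable, then plot the series against it
tsset time
tsline oil

* Fit the linear trend and save fitted values and residuals
regress oil time
predict yhat
predict ehat, residuals

* Residual histogram, fitted line with scatter, and residuals over time
histogram ehat
twoway (scatter oil time) (lfit oil time)
twoway connected ehat time, yline(0)
```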

Residual diagnostic plots from menu bar:

Statistics > Linear models and related > Regression diagnostics > Residual-versus-predictor plot. In the

dialog box that opens, select time as the independent variable, click on Plot and select bar, then click on

Y axis and select Reference lines. Add a reference line at y = 0, click Accept, and then click

OK.

Diagnostics:

2.2 Checking Normality of Residuals

predict r, resid

kdensity r, normal

2.3 Checking Homoscedasticity of Residuals

A commonly used graphical method is to plot the residuals versus fitted

(predicted) values.

rvfplot, yline(0)

Breusch-Pagan test:

estat hettest

2.4 Checking for Multicollinearity

estat vif
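
The diagnostic commands are all run after a regression. The sketch below runs them in sequence on Stata's built-in auto dataset; the model itself is arbitrary and chosen only for illustration.

```stata
sysuse auto, clear
regress price mpg weight

* Normality: kernel density of the residuals with a normal curve overlaid
predict r, resid
kdensity r, normal

* Homoscedasticity: residuals versus fitted values, then Breusch-Pagan test
rvfplot, yline(0)
estat hettest

* Multicollinearity: variance inflation factors
estat vif
```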

Preserve and restore
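
preserve takes a temporary copy of the data in memory and restore brings that copy back, which is useful before commands such as drop or keep. A minimal sketch on Stata's built-in auto dataset:

```stata
sysuse auto, clear

* Take a temporary copy of the data in memory
preserve

* Work on a subset
keep if foreign == 1
summarize price

* Bring back the full dataset
restore
count
```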
