Session 2

Reading a Raw Data File

Data formats
Variable Names
Reading a raw data file into SPSS
Adding Labels
Variable Labels
Value Labels
Missing Values
Looking at your data using value labels
Changing data values using Value Labels
Default options in SPSS
Practical session 2
Additional Exercise: Reading more than 1 record of data per case

SESSION 2: Reading a Raw Data File

Data formats

In the previous exercise you retrieved and created SPSS Data Files.
These are special files that only SPSS can read or create. In many
instances you may want SPSS to read a raw data file that has been
created by yourself using a word processor or spreadsheet, or by a
commercial data processing company typing in numbers from a survey
questionnaire. Raw data files, also called ASCII files, can be arranged in
several ways. For instance, if you have only collected information for a
few variables for each person, the data could be written to the data file so
that a new line is started for each person. You could also decide that
each variable will occupy the same column in the data file.

Column numbers    1   2   3   4   5   6   7   8   9
Variable names    id      age         sex v1  v2  v3
                  1       2   2       M   4   2   1
                  2       4   0       F   2   3   1
                  3       2   7       M   3   3   2
                  4       3   5       M   2   2   4
                  5       2   4       F   1   2   2
Filename: example.dat
Table 2.1

This data file (shaded in Table 2.1 above) is in fixed format. Each
variable is in its own column(s) and together they take up a total of 9
columns. The maximum that SPSS can read is 1024 columns. However,
it is more normal to go on to the next line after column 80, which is the
width of the screen. Once again, each variable must be in the same
location for each case. So if the variable V101 is in column 5 on the
second record of data for person 1, then it must also be in that location for
the next and subsequent cases.

With 300 variables we could have 6 records per individual, e.g.

CASE1.1 . V001, V002 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - V80, V81
CASE1.2 . V82, - - - - - - - - - V101 - - - - - - - - - - - - - - - - - - - - - - - - - - -
CASE1.3 .- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
CASE1.4 .- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
CASE1.5 .- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
CASE1.6 .- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - V300
CASE2.1 . V001, V002 - - - - - - - - - - - - - - - - - - - - - - - - - - - - V80, V81
CASE2.2 . V82,- - - - - - - - - - V101 - - - - - - - - - - - - - - - - - - - - - - - - - - -
...etc.

Variable names
You assign a name to each of the variables in your data and the following
conventions must be followed:
- The length of the name cannot exceed 64 bytes (which typically means
  64 characters). This is a change from earlier versions of SPSS, where
  the maximum allowed was 8 characters.
- The name must begin with a letter. The remaining characters can be
  any letter, any digit, a period, or the symbols @, #, _, or $.
- Variable names cannot end with a period.
- Variable names that end with an underscore should be avoided (to
  avoid conflict with variables automatically created by some
  procedures).
- Blanks and special characters (for example, !, ?, ', and *) cannot be
  used.
- Each variable name must be unique; duplication is not allowed.
- Reserved keywords cannot be used as variable names. Reserved
  keywords are: ALL, AND, BY, EQ, GE, GT, LE, LT, NE, NOT, OR, TO,
  WITH.
- Variable names can be defined with any mixture of upper- and
  lowercase characters, and case is preserved for display purposes.

For more information, see the 'Variable names, rules' page of the Help
system.
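
The naming rules above can be checked mechanically before you start defining variables. The following is only a sketch in Python, not part of SPSS: the helper names is_valid_name and find_duplicates are hypothetical, and it assumes that only unaccented letters are used and that reserved keywords and duplicates are compared without regard to case.

```python
import re

# Reserved keywords listed in the rules above.
RESERVED = {"ALL", "AND", "BY", "EQ", "GE", "GT", "LE", "LT",
            "NE", "NOT", "OR", "TO", "WITH"}

# First character a letter; the rest letters, digits, period, @, #, _ or $.
# Assumption: only ASCII letters are considered here.
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9.@#_$]*")


def is_valid_name(name):
    """Return True if `name` passes the rules listed above (hypothetical helper)."""
    if len(name.encode("utf-8")) > 64:      # the length limit is in bytes
        return False
    if not NAME_PATTERN.fullmatch(name):    # first letter and allowed characters
        return False
    if name.endswith("."):                  # must not end with a period
        return False
    if name.upper() in RESERVED:            # assumption: case-insensitive match
        return False
    return True


def find_duplicates(names):
    """Return names that occur more than once, ignoring case (assumption)."""
    seen, duplicates = set(), []
    for name in names:
        key = name.upper()
        if key in seen:
            duplicates.append(name)
        seen.add(key)
    return duplicates


for candidate in ["age", "var1", "1st", "with", "total.", "my var"]:
    print(candidate, is_valid_name(candidate))
```

Names ending in an underscore are accepted by this sketch, since the rules only advise against them.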

Reading a raw data file into SPSS
Start SPSS in the normal way and then select...
File
Read Text Data

The Open File dialogue box appears for you to select the ASCII data file
you wish to read into SPSS.

Raw data is usually in a file with either .txt or .dat as the suffix. The data
we want is H:\My Documents\spss data\Example.dat, so we need to
change the type of file SPSS is looking for from .txt to .dat (see Figure
2.1).


Figure 2.1

An alternative to the above is to select

File
Open
Data

and then change the type of file from .sav to .dat.

We select the file H:\My Documents\spss data\Example.dat, and click
Open. This starts the Text Import Wizard, which guides you through 6
stages.

In the first step (Figure 2.2) we are asked if our text file has a predefined
format. In this case we answer No, and click on Next.


Figure 2.2

The second step (Figure 2.3) asks for the form of our data: Delimited
(that is, variables are separated by spaces or commas, for example) or
Fixed Width. Our data is fixed width (or fixed format), so we change the
selection.

Figure 2.3
Some data files have the variable names listed at the beginning of the file;
often the user has to edit the file to remove them before reading in the
data. However, SPSS asks whether this is the case, and can adjust the
file so that only the data is read in. In this case, there are no variable
names included, so we select No.

Click the Next button to go to the third step (Figure 2.4).


Figure 2.4

We now need to declare what line our data begins on (in this data set, line
1), and how many lines of data, or records, there are for each case. Here,
we have only one record per case.

We want to import, or read in, all the data we have, but you can choose to
read in only a sample, either a set number of cases starting at the
beginning, or a certain percentage, randomly chosen.

For our data, we want to keep all the default entries at this step (data
begins on line 1, 1 line per case, import all cases), so click on Next.


Figure 2.5

We use the fourth step (Figure 2.5) to define which column each variable
lies in. SPSS may already have made some sensible decisions (which
you can change), usually when there is a space in a column in every line
of the data. The variables we want to read in are aligned as follows:

id begins in column 1 and ends in column 1;
age begins in column 3 and ends in column 4;
sex begins in column 6 and ends in column 6;
var1 begins in column 7 and ends in column 7;
var2 begins in column 8 and ends in column 8;
var3 begins in column 9 and ends in column 9.

We need to place a vertical Variable Break Line before the start of each
variable. We do this by clicking the cursor in between the columns where
we want the Variable Break Line (Figure 2.6).


Figure 2.6
Where there are spaces (for example, columns 2 and 5 are blank), we can
place the Variable Break Line before or after the space; either way, the
space will be ignored.
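
To see what the break lines amount to, the column positions listed above can be written down as a small layout table and applied to each line of the file. The sketch below uses Python rather than SPSS; the names LAYOUT and parse_line are made up for illustration, and the five data lines are typed in from Table 2.1 instead of being read from Example.dat.

```python
# The five lines of Example.dat as laid out in Table 2.1.
RAW_LINES = [
    "1 22 M421",
    "2 40 F231",
    "3 27 M332",
    "4 35 M224",
    "5 24 F122",
]

# (name, first column, last column, type); columns are 1-based and inclusive,
# matching the column positions listed above.
LAYOUT = [
    ("id",   1, 1, int),
    ("age",  3, 4, int),
    ("sex",  6, 6, str),
    ("var1", 7, 7, int),
    ("var2", 8, 8, int),
    ("var3", 9, 9, int),
]


def parse_line(line, layout):
    """Slice one fixed-width record into a dict of variable values."""
    case = {}
    for name, start, end, convert in layout:
        field = line[start - 1:end]      # convert 1-based columns to a slice
        case[name] = convert(field.strip())
    return case


cases = [parse_line(line, LAYOUT) for line in RAW_LINES]
for case in cases:
    print(case)
```

Because the layout never mentions columns 2 and 5, the blanks there are simply skipped, which mirrors the way the wizard ignores them.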

At this stage, if you have more than one record per case, you have to
repeat step 4 for each line of data. You move between the lines of data
using the drop down menu in Figure 2.7, which only appears when you
have 2 or more records per case. After Practical session 2, there is an
additional exercise which will take you through the procedure for reading
in a data set with 2 records per case.


Figure 2.7

When you have defined the position of all your variables (on all records),
click on Next to go to step 5 (Figure 2.8)


Figure 2.8

At this stage, you can accept the default names and variable types that
SPSS assigns, and change them if required later in the Variable View
window. However, we can change them here.

Notice that id, age and var1 to var3 are all numeric variables, while sex,
entered as M and F, is an alphanumeric or string variable. Let us
assume that variables var1, var2 and var3 record the respondent's views
about whether the government should do more about crime, health and
education, respectively.

Click on the head of the first column in the Data Preview, labelled V1, to
select the variable (see the position of the cursor in Figure 2.8), and the
pre-assigned choices for the Variable name and Data format will appear.
Change the name to id and ensure that the Data format is set to Numeric
(see Figure 2.9).


Figure 2.9

Move across the columns, changing the Variable names and, where
necessary, the Data format.

TIP: Notice that Do Not Import is a choice for Data format. Choosing
this enables you to ignore a variable which you are not interested in. If
there are several variables in adjacent columns in your ASCII data which
you want to ignore, you don't need to separate them with Variable Break
Lines in step 4; you can leave them as one large variable, and then
choose Do Not Import in step 5 for this one variable.

To move to the sixth and final step, click on Next.


Figure 2.10

Step 6 (Figure 2.10) simply asks whether you wish to save the choices
you have made to read in your data. You are given two ways to save: as a
TextWizard Predefined Format (.tpf) file, which can only be used in the
Text Import Wizard, or by pasting the commands into an SPSS syntax
(.sps) file, which can be used in any version of SPSS for Windows; you
need to save this file separately. You can say Yes to either, both or
neither of these questions. Using Paste means that the commands won't be
executed until you select them in the syntax file and choose Run.

Clicking on Finish reads the data into the Data Editor (Figure 2.11).


Figure 2.11
Adding Labels

After SPSS has read the data for each of the variables into the Data
Editor Window, labels can be defined to give meaningful descriptions for
them. Obviously, a variable named sex does not really need a label, but
one named crime clearly does so that anyone reading the SPSS output
can understand what information is stored as crime.

There are two kinds of label that can be applied to each variable,
variable labels and value labels. Variable labels expand on the variable
name, e.g. they tell you what question was asked. Value labels tell you
what the code given to each response means. So "Sex of respondent"
would be the variable label for sex and "Male" and "Female" would be the
value labels given to the codes M and F respectively.
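
One way to picture the two kinds of label is as two lookup tables: one keyed by variable name, the other keyed by variable name and then by code. The sketch below is in Python and is only an illustration of the idea, not of how SPSS stores labels; the helper describe is hypothetical.

```python
# Variable label and value labels for sex, taken from the paragraph above.
variable_labels = {"sex": "Sex of respondent"}
value_labels = {"sex": {"M": "Male", "F": "Female"}}


def describe(variable, code):
    """Return 'variable label: value label' (hypothetical helper).

    Falls back to the bare name or code when no label has been defined.
    """
    var_label = variable_labels.get(variable, variable)
    val_label = value_labels.get(variable, {}).get(code, code)
    return f"{var_label}: {val_label}"


print(describe("sex", "M"))
print(describe("sex", "F"))
print(describe("age", 22))
```

The stored data is unchanged by either kind of label; only the way it is displayed changes.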

Variable Labels

To label a variable, go to the Variable View and find the variable in the list.
A Variable Label for any variable can be typed into the cell in the Label
column, e.g. for crime in Figure 2.12:
Figure 2.12

Value Labels

The Values column will initially have the entry None. To add Value
Labels, select the Values cell, and click on the button on the right to
open the Value Labels dialogue box (see Figure 2.13):

Figure 2.13

Each value is typed into the Value box, followed by its label in the Value
Label box. Then, the Add button is clicked. Any errors can be corrected
by highlighting the label in the big box, making it the current entry,
modifying it, and clicking on Change. Highlighting a label and clicking
Remove will delete the label.


Figure 2.14

When you have finished adding the Value Labels, click on OK.

If you have several variables which share the same value labels, these
can be copied (along with a range of other properties, such as the format
etc.) from one variable to another. In this data, crime, health and
education all follow the same coding scheme.
Select
Data
Copy Data Properties.
This starts the 5 step wizard. Firstly, declare whether the properties you
wish to copy belong to a variable in another (external) file, or to a variable
in the current active dataset:


Figure 2.15
Choose the variable whose properties you wish to copy:

Figure 2.16

Then select the variable(s) which are to have the copied properties (hold
down the Ctrl key to select more than one variable).

Figure 2.17

Declare which of the properties you wish to copy:

Figure 2.18
The final step allows you to either execute the command, or paste it into a
syntax file.
After doing this for health and education, all three of the variables will
have the 5 value labels we defined earlier:

Figure 2.19

Missing Values

Codes representing information not collected or not applicable (e.g. the
code 99 for age, meaning 'No response') need to be specified as missing.
This will cause SPSS to omit respondents with these values from
calculations (it would not be correct to calculate the average age of the
sample including 99 as a valid value since that value does not mean that
the respondent is 99 years old, but that no information on age was
collected for that individual).
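
The effect on an average is easy to demonstrate. The sketch below is in Python rather than SPSS, and the six ages are hypothetical values chosen for illustration; only the use of 99 as the 'No response' code comes from the text above.

```python
from statistics import mean

# Hypothetical ages for six respondents; 99 is the 'No response' code.
ages = [57, 38, 25, 99, 71, 31]
MISSING_CODES = {99}

valid_ages = [age for age in ages if age not in MISSING_CODES]

naive_mean = mean(ages)           # wrongly treats 99 as a real age
correct_mean = mean(valid_ages)   # excludes the missing-value code

print(f"Including 99 as a valid age: {naive_mean:.1f}")
print(f"Excluding missing values:    {correct_mean:.1f}")
```

Declaring 99 as missing in SPSS plays the role of the filter here, so that procedures work from the valid ages only.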

To add Missing Values for a variable, select the Missing cell for the
variable in the Variable View, and click on the button. The Missing
Values dialogue box appears (see Figure 2.20).


Figure 2.20

You can either specify up to three separate missing values or a range of
values, e.g.
- The values missing may be 0 for 'Not applicable', 8 for 'Don't know'
  and 9 for 'No response'.
- You may wish to use the missing values to exclude all those over 65
  from an analysis and you could specify that 65 through to the highest
  value is missing.

Looking at your data using value labels

If you select

View
Value Labels

or click on the button (2nd from the right on the toolbar), the screen
changes to look like Figure 2.21:


Figure 2.21

The crime column isn't wide enough to be able to read the labels properly,
and the sex column is also cramped. To increase the width, we can
increase the value in the Columns cell in Variable View. Alternatively, we
can move the cursor over the line between the column headings crime and
health until it changes shape and looks like Figure 2.22:


Figure 2.22

Holding down the left mouse button while you move the mouse to the right
or the left will stretch or shrink the column width.

Changing data values using Value Labels

While you are viewing the data using the Value Labels view, you can
change data values in a different way from typing the new value into the
Data View window.

Let us suppose that the 5th case in our small data set shouldn't have been
coded 1 (Strongly agree) for crime, but 2 (Agree). Click on that cell, and a
down arrow appears which gives a drop down menu with the value labels
(see Figure 2.23).


Figure 2.23

You can change the data entry to Agree, and when the Value Label view is
removed, the cell will now have the value 2.

Default options in SPSS
Some of the default options in SPSS are a little unusual, so we are going
to check how they are set and, if appropriate, change them. Go to the
Edit menu:

Edit
Options

Go to the General tab. On this display, find the Variable Lists sub-box
and, if it is set to Display labels, click on the Display names item.


Figure 2.24

Then click on the Pivot Tables tab. If the Adjust column widths for sub-box
is set to Labels only, click on the Labels and data item (see
Figure 2.25).


Figure 2.25

And because I like to see all the information about the variables in my
output (in the first instance), I also changed the items in the Output
Labels tab as follows.


Figure 2.26

Practical session 2

In this exercise you will be using SPSS for Windows to read a raw data file
and define VARIABLE LABELS, VALUE LABELS and MISSING VALUES
for each variable.

You will be using a very small set of data taken from the 1987 British
Social Attitudes Survey. This survey is carried out annually by The Social
and Community Planning Research Unit. We have extracted the responses of
25 people to five questions from the survey. The data from these
questions will be put into five SPSS variables. At the end of this document
is a 'coding sheet' which shows details about the questions. Figure 2.27
shows what the ASCII file H:\My Documents\spss data\Sample.dat
looks like.




 1 5732   5
 2 3822   1
 2 2534   1
 2 5122   1
 2 3123   2
 2 9999   5
 2 7133   1
 1 3832   3
 2 3732   4
 2 2334   1
 1 3522   4
 1 7133   3
 1 3322   3
 2 4422   1
 2 3922   3
 1 3932   2
 1 3132   3
 2 5122   3
 1 5122   3
 2 5132   1
 1 6833   8
 2 2932   3
 1 2121   3
 1 4422   2
 1 4522   3

Figure 2.27

Reading the raw data file; defining variable labels, value labels and
missing values

Using the coding sheet on the last page of this handout, get SPSS to
read the five variables in the raw data file, sample.dat.

Add variable labels, value labels and missing values for all the variables.

Obtaining Frequencies

Finally, select

Analyze
Descriptive Statistics
Frequencies.

and select all variables. Click on OK.

Check that all your variables have been labelled correctly and have missing
values by looking through the output. If you wish, print the output.
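
As a rough cross-check on the Frequencies output for the first variable, the counts can be reproduced by hand. The sketch below is in Python, not SPSS; the sex codes are typed in from Figure 2.27 and the labels come from the coding sheet, while the table layout is only an approximation of what SPSS prints.

```python
from collections import Counter

# Sex codes (column 2) for the 25 cases shown in Figure 2.27.
rsex = [1, 2, 2, 2, 2, 2, 2, 1, 2, 2, 1, 1, 1,
        2, 2, 1, 1, 2, 1, 2, 1, 2, 1, 1, 1]

# Value labels from the coding sheet.
RSEX_LABELS = {1: "Male", 2: "Female"}

counts = Counter(rsex)
total = len(rsex)

print(f"{'Value':<10}{'Frequency':>10}{'Percent':>10}")
for code in sorted(counts):
    label = RSEX_LABELS.get(code, str(code))
    percent = 100 * counts[code] / total
    print(f"{label:<10}{counts[code]:>10}{percent:>10.1f}")
print(f"{'Total':<10}{total:>10}{100.0:>10.1f}")
```

If the counts in your SPSS output differ from these, the most likely cause is a break line in the wrong column.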

Saving your files

With the Data Editor Window active, click on

File
Save As
or use the disk button.

Save your SPSS data file to your networked drive. Give it the name
mysample.sav. Generally SPSS data files have the suffix .sav. These
files must be distinguished from raw data files, which have the suffix .dat.

With the output window active, select ...

File
Save As
or use the disk button.

Save your output file with the name mysample.spo.

Exit SPSS

Select
File
Exit

Additional Exercise - Reading more than 1 record of data per case

The small ASCII data set we read into SPSS in this session has been
rearranged so that the data for each case is over 2 lines. The new ASCII
data file is in H:\My Documents\spss data\Example2.dat, and is shown
below:
11 22 M
12 421
21 40 F
22 231
31 27 M
32 332
41 35 M
42 224
51 24 F
52 122

id is in column 1 on every line
record (the line or record number) is in column 2 on every line

In some raw data sets you won't always have a variable for the record
number, and the case identification variable might not be on each record.

age is on record 1 in columns 4-5
sex (a string variable) is on record 1 in column 7

var1, var2 and var3 are on record 2 in columns 4, 5 and 6
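
The layout just described can be sketched in a few lines of Python to show how two records combine into one case. This is an illustration only, not what SPSS does internally: the helpers column and read_cases are hypothetical, the ten data lines are typed in from the listing above, and the check that both records carry the same id is an added safeguard.

```python
# The ten lines of Example2.dat shown above: two records per case.
RAW_LINES = [
    "11 22 M", "12 421",
    "21 40 F", "22 231",
    "31 27 M", "32 332",
    "41 35 M", "42 224",
    "51 24 F", "52 122",
]

LINES_PER_CASE = 2


def column(line, start, end=None):
    """Return the text in 1-based columns start..end (inclusive)."""
    return line[start - 1:end or start]


def read_cases(lines):
    """Combine each pair of records into one case (a sketch, not SPSS itself)."""
    cases = []
    for i in range(0, len(lines), LINES_PER_CASE):
        rec1, rec2 = lines[i], lines[i + 1]
        # The id in column 1 should agree on both records of a case.
        if column(rec1, 1) != column(rec2, 1):
            raise ValueError(f"records {i + 1} and {i + 2} have different ids")
        cases.append({
            "id": int(column(rec1, 1)),
            "age": int(column(rec1, 4, 5)),
            "sex": column(rec1, 7),
            "var1": int(column(rec2, 4)),
            "var2": int(column(rec2, 5)),
            "var3": int(column(rec2, 6)),
        })
    return cases


cases = read_cases(RAW_LINES)
for case in cases:
    print(case)
```

The record number in column 2 is never read, just as it is set to Do Not Import in the wizard.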

To start reading in the data, select

File
Read Text Data

to start the Text Import Wizard.

Change the file type to .dat.
Choose H:\My Documents\spss data\Example2.dat, and click Open.

Step 1
No to the 'predefined format' question.
Click Next.

Step 2
Variables are arranged as Fixed width.
No to the 'variable names included' question.
Click Next.

Step 3
First case begins on line 1.
Change the number of lines representing a case to 2.
Import all cases.
Click Next.

Step 4
With 1 line of 2 showing in the Line within case box, add Variable Break
Lines for record 1 to separate the variables id (col. 1), record (col. 2),
age (cols. 4-5), sex (col. 7) (see Figure 2.28).


Figure 2.28

Change the Line within case selection to 2 line of 2 (see Figure 2.29).


Figure 2.29

Add Variable Break Lines for record 2 to separate variables id (col. 1),
record (col. 2), var1 (col. 4), var2 (col. 5), var3 (col. 6) (see Figure 2.30).


Figure 2.30

Click Next.

Step 5
In the Data preview, click on the V1 column. Change Variable name to
id. Data format is numeric.
Select the V2 column. This is the record number, which we don't need to
read in (it was only used to keep track of the line number in the ASCII
data). Change Data format to Do Not Import.
Select the V3 column. Variable name is age. Data format is numeric.
Select the V4 column. Variable name is sex. Data format is string.
Select the V5 column. This is a copy of the variable id. We don't need to
read it in again, so change Data format to Do Not Import.
Select the V6 column. Like V2, this is just the record number; choose Do
Not Import.
Select the V7, V8 and V9 columns in turn. The Variable names are var1,
var2 and var3 respectively; the Data format is numeric.
Click Next.

Step 6
(Save the formatting and/or paste the syntax if you wish)
Click Finish.


The data that now appears in your Data Editor window should be the
same as when we read in H:\My Documents\spss data\Example.dat,
which had only 1 record per case (see Figure 2.11).


British Social Attitudes Survey, 1987: Coding Sheet for the subset of
variables in SAMPLE.DAT

      Variable label                Variable   Columns in   Value labels             Codes
                                    name       data file
Q1    Respondent's sex              RSEX       2            Male                     1
                                                            Female                   2
Q2    Respondent's age              RAGE       4-5          (Code is age in years)
                                                            No response              99
Q3    Which income group would      SRINC      6            High income              1
      you place yourself?                                   Middle income            2
                                                            Low income               3
                                                            No response              9
Q4    How well are you managing     HINCDIFF   7            Very well                1
      on your income?                                       Quite well               2
                                                            Not very well            3
                                                            Not at all well          4
                                                            Don't know               8
                                                            No response              9
Q5    Respondent's social class     RRGCLASS   11           Professional             1
                                                            Intermediate             2
                                                            Skilled                  3
                                                            Semi-skilled             4
                                                            Unskilled                5
                                                            Unable to classify       8
                                                            Not applicable           0



Treat the responses Don't know, No response, Not applicable, Unable to classify as
missing values for all of the variables.
