
SW388R6
Data Analysis and Computers I

Simple Linear Regression

Slide 1

What is Regression Analysis?

Different types of multiple regression:
  Standard multiple regression
  Hierarchical multiple regression
  Stepwise multiple regression
Level of Measurement Requirements

Collinearity Diagnostics (Multicollinearity)
Sample Size Requirements

Regression Analysis

Slide 2

Regression analysis is the generic term for several statistical tests for evaluating the relationship between interval level dependent and independent variables.
When we are considering the relationship between
one dependent variable and one independent
variable, we use Simple Linear Regression.
When we are considering the relationship between
one dependent variable and more than one
independent variable, we use Multiple Regression.
SPSS uses the same procedure for both Simple Linear
Regression and Multiple Regression, which adds some
complications to our interpretation.


Purpose of multiple regression

Slide 4

The purpose of multiple regression is to analyze the relationship between metric or dichotomous independent variables and a metric dependent variable.
If there is a relationship, using the information in the
independent variables will improve our accuracy in
predicting values for the dependent variable.

Types of multiple regression

Slide 5

There are three types of multiple regression, each of which is designed to answer a different question:
Standard multiple regression is used to evaluate
the relationships between a set of independent
variables and a dependent variable.
Hierarchical, or sequential, regression is used to
examine the relationships between a set of
independent variables and a dependent variable,
after controlling for the effects of some other
independent variables on the dependent variable.
Stepwise, or statistical, regression is used to
identify the subset of independent variables that
has the strongest relationship to a dependent
variable.

Standard multiple regression

Slide 6

In standard multiple regression, all of the independent variables are entered into the regression equation at the same time.
Multiple R and R² measure the strength of the relationship between the set of independent variables and the dependent variable. An F test is used to determine if the relationship can be generalized to the population represented by the sample.
A t-test is used to evaluate the individual relationship between each independent variable and the dependent variable.

Hierarchical multiple regression

Slide 7

In hierarchical multiple regression, the independent variables are entered in two stages.
In the first stage, the independent variables that we want to control for are entered into the regression. In the second stage, the independent variables whose relationship we want to examine are entered after the controls.
A statistical test of the change in R² from the first stage is used to evaluate the importance of the variables entered in the second stage.

Stepwise multiple regression

Slide 8

Stepwise regression is designed to find the most parsimonious set of predictors that are most effective in predicting the dependent variable.
Variables are added to the regression equation one at a time, using the statistical criterion of maximizing the R² of the included variables.
When none of the possible additions can make a statistically significant improvement in R², the analysis stops.

Purpose of Simple Linear Regression - 1

Slide 9

The purpose of simple linear regression analysis is to answer three questions that have been identified as requirements for understanding the relationship between an independent and a dependent variable:
Is there a relationship between the two variables?
How strong is the relationship (e.g. trivial, weak, or strong; how much does it reduce error)?
What is the direction of the relationship (are high scores predictive of high or low scores)?

Purpose of Simple Linear Regression - 2

Slide 10

The question of the existence of a relationship between the variables is answered by the hypothesis test in regression analysis.
The strength of the relationship is based on interpretation of the correlation coefficient, r (as trivial, small, medium, or large) and/or the coefficient of determination, r² (as the proportion by which error was reduced or accuracy was improved).
The question of the direction of the relationship is based on the interpretation of the sign of the b coefficient or the beta coefficient.

Simple Linear Regression: Examples

Slide 11

There is a relationship between undergraduate GPAs and graduate GPAs.
GRE scores are a useful predictor of graduate GPAs.
For social work students, the relationship between GPA and future income enables us to predict future earnings based on academic performance.

Simple Linear Regression - 1

Slide 12

When we studied measures of central tendency, we showed that the best measure of central tendency for an interval level variable (assuming it is not badly skewed) was the mean.
When we used the mean as the estimated score for all cases in the distribution, the total error computed for all of the cases was smaller than the error would be for any other value used for the estimate.
Error was measured as the deviation or difference between the mean and the score for each case, squared and summed.

Simple Linear Regression - 2

Slide 13

Simple linear regression tests the existence of a relationship between an independent and a dependent variable by determining whether or not there is a statistically significant reduction in total error if we use the scores on the independent variable to estimate the scores on the dependent variable.
Regression analysis finds the equation or formula for the straight line that minimizes the total error.
The regression equation takes the algebraic form for a straight line: y = a + bx, where y is the dependent variable, x is the independent variable, b is the slope of the line, and a is the point at which the line crosses the y axis.

The Regression Equation

Slide 14

The regression equation is the algebraic formula for the regression line, which states the mathematical relationship between the independent and the dependent variable.
We can use the regression line to estimate the value of the dependent variable for any value of the independent variable.
The stronger the relationship between the independent and dependent variables, the closer these estimates will come to the actual score that each case had on the dependent variable.

Components of the Regression Equation

Slide 15

The regression equation has two components.
The first component is a number called the y-intercept that defines where the regression line crosses the vertical y axis.
The second component is called the slope of the line, and is a number that multiplies the value of the independent variable.
These two elements are combined in the general form for the regression equation:
the estimated score on the dependent variable = the y-intercept + the slope × the score on the independent variable

The Standard Form of the Regression Equation

Slide 16

The standard form for the regression equation or formula is:
Y = a + bX
where
Y is the estimated score for the dependent variable
X is the score for the independent variable
b is the slope of the regression line, or the multiplier of X
a is the intercept, or the point where the regression line crosses the vertical y-axis
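The least-squares values of a and b can be computed directly from data. A minimal sketch in Python (the function name and toy data are illustrative, not from the slides):

```python
# Estimating the intercept a and slope b of the least-squares line Y = a + bX.

def fit_simple_regression(x, y):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    # b is the sum of cross-products of deviations divided by the
    # sum of squared deviations of x.
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
        sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx  # the line passes through the point of means
    return a, b

# Points lying exactly on y = 1.0 + 0.5x (the line used later, on slide 17).
a, b = fit_simple_regression([1, 2, 3, 4], [1.5, 2.0, 2.5, 3.0])
```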

Depicting the Regression Equation

Slide 17

The regression equation includes both the y-intercept and the slope of the line. The y-intercept is 1.0 and the slope is 0.5:
y = 1.0 + 0.5x

[Plot of the line y = 1.0 + 0.5x, with both axes running from 0.0 to 5.0.]

The slope is the multiplier of x. It is the amount of change in y for a change of one unit in x.
The y-intercept is the point on the vertical y-axis where the regression line crosses the axis, i.e. 1.0.
If x changes one unit from 2.0 to 3.0, depicted by the blue arrow, y will change by 0.5 units, from 2.0 to 2.5, as depicted by the red arrow.

Deriving the Regression Equation - 1

Slide 18

In this plot, none of the points fall on the regression line:
y = 0.8 + 0.6x

[Scatterplot with the regression line y = 0.8 + 0.6x and the differences between the points and the line drawn as red lines.]

The difference between the actual value for the dependent variable and the predicted value for each point is shown by the red lines. These differences are called the residuals, and represent the errors between the actual and predicted values.

Deriving the Regression Equation - 2

Slide 19

The regression equation is computed to minimize the total amount of error in predicting values for the dependent variable. The method for deriving the equation is called the "method of least squares," meaning that the regression line minimizes the sum of the squared residuals, or errors between actual and predicted values.

[Scatterplot with the least-squares regression line y = 0.8 + 0.6x.]
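The least-squares property can be checked numerically: for any data set, the sum of squared residuals around the fitted line is no larger than the sum of squared deviations around the mean. A small sketch with made-up scores (not the GSS data):

```python
# Comparing total error around the mean with total error around the
# least-squares line, for assumed illustrative data.

def sum_sq(values):
    return sum(v * v for v in values)

x = [1, 2, 3, 4, 5]
y = [1.3, 2.1, 2.4, 3.4, 3.8]   # made-up scores for illustration
n = len(x)
mx = sum(x) / n
my = sum(y) / n

# Least-squares slope and intercept (same formulas as the regression equation).
b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
    sum((xi - mx) ** 2 for xi in x)
a = my - b * mx

total_ss = sum_sq([yi - my for yi in y])                            # error using the mean
residual_ss = sum_sq([yi - (a + b * xi) for xi, yi in zip(x, y)])   # error using the line
```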

Interpreting the Regression Equation: the Intercept

Slide 20

The intercept is the point on the vertical axis where the regression line crosses the axis. It is the predicted value for the dependent variable when the independent variable has a value of zero.
This may or may not be useful information depending on the context of the problem.

Interpreting the Regression Equation: the Slope

Slide 21

The slope is interpreted as the amount of change in the predicted value of the dependent variable associated with a one unit change in the value of the independent variable.
If the slope has a negative sign, the direction of the relationship is negative or inverse, meaning that the scores on the two variables move in opposite directions.
If the slope has a positive sign, the direction of the relationship is positive or direct, meaning that the scores on the two variables move in the same direction.

Interpreting the Regression Equation: the Slope Equals 0

Slide 22

If there is no relationship between two variables, the slope of the regression line is zero and the regression line is parallel to the horizontal axis.
A slope of zero means that the predicted value of the dependent variable will not change, no matter what value of the independent variable is used.
If there is no relationship, using the regression equation to predict values of the dependent variable is no improvement over using the mean of the dependent variable.

Simple Linear Regression: Hypotheses

Slide 23

The hypothesis tested in simple linear regression is based on the slope or angle of the regression line.
Hypotheses:
Null: the slope of the regression line as measured by the b coefficient = 0, i.e. there is no relationship.
Research: the slope of the regression line as measured by the b coefficient ≠ 0, i.e. there is a relationship.
The b coefficient is tested with a two-tailed t-test.
Decision:
Reject the null hypothesis if p ≤ alpha.

Simple Linear Regression: Level of Measurement

Slide 24

Dependent variable is interval level (ordinal with caution).
Independent variable is interval level (ordinal with caution) or dichotomous.

Simple Linear Regression: Sample Size Requirements - 1

Slide 25

In previous semesters, the rule of thumb for required sample size that I have used was a minimum of 5 cases for each independent variable included in the analysis, and preferably 15 cases for each independent variable. This rule was based on the text Multivariate Data Analysis by Hair, Black, Babin, Anderson, and Tatham.
Since attempting to incorporate more material on power analysis, I find that rule to be inadequate because we are unlikely to achieve statistical significance in all but the simplest problems that contain very strong relationships.

Simple Linear Regression: Sample Size Requirements - 2

Slide 26

In Using Multivariate Statistics, Tabachnick and Fidell recommend that the required number of cases should be the larger of (the number of independent variables × 8) + 50, or the number of independent variables + 105.
Following this rule, simple linear regression with one independent variable would require a sample of 106 cases.
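The rule of thumb above can be written as a one-line helper (the function name is an assumption for illustration, not part of the slides or SPSS):

```python
# Tabachnick & Fidell rule of thumb for required cases, as stated on the slide:
# the larger of (8 * m + 50) and (m + 105) for m independent variables.

def required_cases(n_independent_vars):
    m = n_independent_vars
    return max(8 * m + 50, m + 105)
```

For simple linear regression (one independent variable), this returns max(58, 106) = 106, matching the slide.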

SAMPLE SIZE

Slide 27

Descriptive Statistics

                                        Mean   Std. Deviation     N
HOW OFTEN R ATTENDS RELIGIOUS SERVICES  3.15        2.653        113
STRENGTH OF AFFILIATION                 2.12        1.084        113
HOW OFTEN DOES R PRAY                   2.90        1.575        113

The minimum ratio of valid cases to independent variables for multiple regression is 5 to 1. With 113 valid cases and 2 independent variables, the ratio for this analysis is 56.5 to 1, which satisfies the minimum requirement.
In addition, the ratio of 56.5 to 1 satisfies the preferred ratio of 15 to 1.

Visualizing Regression Analysis - 1

Slide 28

While we will base our problem solving on numeric statistical results computed by SPSS, we can use a scatterplot to demonstrate regression graphically.
We will use the variable "highest year of school completed" [educ] as the independent variable and "occupational prestige score" [prestg80] as the dependent variable from the GSS2000R data set to demonstrate the relationship graphically.

Visualizing Regression Analysis - 2

Slide 29

A scatterplot of prestg80 by educ produced by SPSS.
The dependent variable is plotted on the y-axis, or the vertical axis. The independent variable is plotted on the x-axis, or the horizontal axis.
The dots in the body of the chart represent the cases in the distribution.

Visualizing Regression Analysis - 3

Slide 30

I have drawn a green horizontal line through the mean of prestg80 (44.17).
The differences between the mean line and the dots (shown as pink lines) are the deviations.
The sum of the squared deviations is the measure of total error when the mean is used as the estimated score for each case.
NOTE: the plots were created in SPSS by adding features to the default plot.

Visualizing Regression Analysis - 4

Slide 31

A regression line and the regression equation are added in red to the scatterplot.
The pink deviations from the mean have been replaced with the orange deviations from the regression line. Deviations between cases and the regression line are called residuals.

Visualizing Regression Analysis - 5

Slide 32

The existence of a relationship between the variables is supported when the sum of the squared orange residuals is significantly less than the sum of the squared pink deviations.
Recall that both deviations and residuals can be referred to as errors. If there is a relationship, we can characterize it as a reduction in error.

Visualizing Regression Analysis - 6

Slide 33

While it is difficult for us to square and sum deviations and residuals, SPSS regression output provides us with the answer.
The squared sum of the pink deviations from the mean is the Total Sum of Squares in the ANOVA table (49104.91).
The squared sum of the orange residuals from the regression line is the Residual Sum of Squares in the ANOVA table (37086.80).

Visualizing Regression Analysis - 7

Slide 34

The difference between the Total Sum of Squares and the Residual Sum of Squares is the Regression Sum of Squares.
The Regression Sum of Squares is the amount of error that can be eliminated by using the regression equation to estimate values of prestg80 instead of the mean of prestg80.
The Regression Sum of Squares in the ANOVA table is 12018.11.

Visualizing Regression Analysis - 8

Slide 35

We can compute the proportion of error that was reduced by the regression by dividing the Regression Sum of Squares by the Total Sum of Squares:
12018.11 ÷ 49104.91 = 0.245
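The same arithmetic in Python, using the sums of squares reported in the ANOVA table:

```python
# R-squared as the Regression Sum of Squares divided by the Total Sum of
# Squares, with the values from the SPSS ANOVA table on the slides.
regression_ss = 12018.11
total_ss = 49104.91
r_squared = regression_ss / total_ss   # proportion of error reduced
```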

Visualizing Regression Analysis - 9

Slide 36

The reduction in error that we computed (0.245) is equal to the R Square that SPSS provides in the Model Summary table.
R² is the coefficient of determination, which is usually characterized as:
the proportion of variance in the dependent variable explained by the independent variable, or
the reduction in error (or increase in accuracy).
In multiple regression, the symbol for the coefficient of determination is R². In simple linear regression, the symbol is r².

Visualizing Regression Analysis - 10

Slide 37

The correlation coefficient, Multiple R, is the positive square root of R Square. This can be misleading in Simple Linear Regression, where the correlation for the relationship between the two variables, r, can have a negative sign for an inverse relationship. Aside from the direction of the relationship, the value of Multiple R will be the same as the value for r in Simple Linear Regression.

Visualizing Regression Analysis - 11

Slide 38

The ANOVA table tests the null hypothesis that R² = 0, i.e. that the reduction in error associated with the regression is zero.
The test of this hypothesis is reported for Multiple Regression as a test of an overall relationship between the dependent variable and the independent variables.
In Simple Linear Regression, we usually report the hypothesis test that the slope = 0, though we would reach the same conclusion no matter which test we report.

Visualizing Regression Analysis - 12

Slide 39

The test of the null hypothesis that the slope of the regression line (b coefficient) = 0 is reported in the Coefficients table.
Note that the significance of the t-test is the same as the significance of the F-test. Furthermore, in simple linear regression, the value of the F-statistic (81.662) is the same as the square of the t-statistic (9.037).
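We can check the F = t² relationship with the rounded statistics from the SPSS output; because both values are rounded, the match is only approximate:

```python
# Rounded statistics from the SPSS output on the slides.
t_statistic = 9.037
f_statistic = 81.662

# In simple linear regression, F equals t squared (up to rounding).
difference = abs(t_statistic ** 2 - f_statistic)
```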

Visualizing Regression Analysis - 13

Slide 40

We can depict the hypothesis test visually. The null hypothesis for simple linear regression is that the slope of the regression line is zero. The slope of the green mean line is zero, so under the null hypothesis the red regression line would be equal to the green line.
In this example, the red regression line is obviously different from the green mean line, which is verified by the value of the slope in the regression equation (2.36) and the t-test of B.

Visualizing Regression Analysis - 14

Slide 41

The regression equation is based on the Unstandardized Coefficients (B) in the table of Coefficients.
The B coefficient labeled (Constant) is the intercept. The B coefficient for the variable educ is the slope of the regression line.
The regression equation for the relationship between prestg80 and educ is:
prestg80 = 12.928 + 2.359 × educ

Visualizing Regression Analysis - 15

Slide 42

The Standardized Coefficients (Beta) in the table of Coefficients are the regression coefficients for the relationship between the standardized dependent variable (z-scores) and the standardized independent variable (z-scores).
Since standardizing variables removes the unit of measurement from the coefficients, we can compare the Beta coefficients to interpret the relative importance of each independent variable in Multiple Regression.
In Simple Linear Regression, Beta will be equal to r, the correlation coefficient. Multiple R, r, and Beta all have the same numeric value, though Multiple R will be positive even when r and Beta are negative.

Visualizing Regression Analysis - 16

Slide 43

The sign of the Beta coefficient, as well as the sign of the B coefficient, tells us the direction of the relationship.
If the coefficients are positive, the relationship is characterized as direct or positive, meaning that higher values of the dependent variable are associated with higher values of the independent variables.
If the coefficients are negative, the relationship is characterized as inverse or negative, meaning that lower values of the dependent variable are associated with higher values of the independent variables.

Visualizing Regression Analysis - 17

Slide 44

The regression line represents the estimated value of prestg80 for every value of educ.
To obtain the estimate, we draw a line perpendicular to the value on the x-axis to the point where it intersects the regression line. We then draw a line from the intersection point to the y-axis. The intersection point on the y-axis is the estimated value for the dependent variable.

Visualizing Regression Analysis - 18

Slide 45

If we draw a vertical line from the educ value of 5 to the regression line, and then a horizontal line to the vertical axis, we see that the estimated value for prestg80 is about 25.
We can compute the exact value by substituting into the regression equation:
prestg80 = 12.93 + 2.36 × 5 = 24.73

Visualizing Regression Analysis - 19

Slide 46

If we draw a vertical line from the educ value of 15 to the regression line, and then a horizontal line to the vertical axis, we see that the estimated value for prestg80 is about 50.
We can compute the exact value by substituting into the regression equation:
prestg80 = 12.93 + 2.36 × 15 = 48.33
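Both estimates come from substituting into the rounded regression equation; a quick sketch (the function name is illustrative, not from the slides):

```python
# Predicted occupational prestige from years of education, using the
# rounded regression equation prestg80 = 12.93 + 2.36 * educ.

def predict_prestige(educ):
    return 12.93 + 2.36 * educ
```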

Sample homework problem: Simple linear regression

Slide 47

Based on information from the data set GSS2000R, is the following statement true, false, or an incorrect application of a statistic? Assume that the assumptions of linear regression are satisfied. Use .05 for alpha.

Simple linear regression revealed a strong, positive relationship between "highest academic degree" [degree] and "occupational prestige score" [prestg80] (β = 0.546, t(250) = 10.30, p < .001). Survey respondents who had higher academic degrees had more prestigious occupations. The accuracy of predicting scores for the dependent variable "occupational prestige score" will improve by approximately 30% if the prediction is based on scores for the independent variable "highest academic degree" (r² = 0.298).

o True
o True with caution
o False
o Incorrect application of a statistic

This is the general framework for the problems in the homework assignment on simple linear regression. The description is similar to findings one might state in a research article.

Sample homework problem: Data set and alpha

Slide 48

Based on information from the data set GSS2000R, is the following statement true, false, or an incorrect application of a statistic? Assume that the assumptions of linear regression are satisfied. Use .05 for alpha.

The first paragraph identifies:
the data set to use, e.g. GSS2000R.Sav
the alpha level for the hypothesis test

Simple linear regression revealed a strong, positive relationship between "highest academic degree" [degree] and "occupational prestige score" [prestg80] (β = 0.546, t(250) = 10.30, p < .001). Survey respondents who had higher academic degrees had more prestigious occupations. The accuracy of predicting scores for the dependent variable "occupational prestige score" will improve by approximately 30% if the prediction is based on scores for the independent variable "highest academic degree" (r² = 0.298).

o True
o True with caution
o False
o Incorrect application of a statistic

Sample homework problem: Specifications for the test - 1

Slide 49

Based on information from the data set GSS2000R, is the following statement true, false, or an incorrect application of a statistic? Assume that the assumptions of linear regression are satisfied. Use .05 for alpha.

Simple linear regression revealed a strong, positive relationship between "highest academic degree" [degree] and "occupational prestige score" [prestg80] (β = 0.546, t(250) = 10.30, p < .001). Survey respondents who had higher academic degrees had more prestigious occupations. The accuracy of predicting scores for the dependent variable "occupational prestige score" will improve by approximately 30% if the prediction is based on scores for the independent variable "highest academic degree" (r² = 0.298).

The second paragraph states the finding that we want to verify with a simple linear regression. The finding identifies:
the independent variable
the dependent variable
the strength of the relationship
the direction of the relationship

o True
o True with caution
o False
o Incorrect application of a statistic

Sample homework problem: Specifications for the test - 2

Slide 50

Based on information from the data set GSS2000R, is the following statement true, false, or an incorrect application of a statistic? Assume that the assumptions of linear regression are satisfied. Use .05 for alpha.

The second paragraph also states additional statements that can be included in findings:
an interpretative statement about the direction of the relationship
the proportional reduction in error (PRE) interpretation of the coefficient of determination, r²

Simple linear regression revealed a strong, positive relationship between "highest academic degree" [degree] and "occupational prestige score" [prestg80] (β = 0.546, t(250) = 10.30, p < .001). Survey respondents who had higher academic degrees had more prestigious occupations. The accuracy of predicting scores for the dependent variable "occupational prestige score" will improve by approximately 30% if the prediction is based on scores for the independent variable "highest academic degree" (r² = 0.298).

o True
o True with caution
o False
o Incorrect application of a statistic

Sample homework problem: Simple linear regression

Slide 51

Based on information from the data set GSS2000R, is the following statement true, false, or an incorrect application of a statistic? Assume that the assumptions of linear regression are satisfied. Use .05 for alpha.

Simple linear regression revealed a strong, positive relationship between "highest academic degree" [degree] and "occupational prestige score" [prestg80] (β = 0.546, t(250) = 10.30, p < .001). Survey respondents who had higher academic degrees had more prestigious occupations. The accuracy of predicting scores for the dependent variable "occupational prestige score" will improve by approximately 30% if the prediction is based on scores for the independent variable "highest academic degree" (r² = 0.298).

o True
o True with caution
o False
o Incorrect application of a statistic

The answer to the problem will be True if all parts of the finding in the problem statement are correct.
The answer will be True with caution if the analysis supports the finding in the problem statement, but one or both of the variables is ordinal level.
The answer will be False if any part of the finding in the problem statement is not correct.
The answer will be Incorrect application of a statistic if the level of measurement or sample size requirement is violated.

Solving the problem with SPSS: Level of measurement

Slide 52

Simple linear regression requires that the dependent variable be interval and the independent variable be interval or dichotomous. "Occupational prestige score" [prestg80] is interval level, satisfying the requirement for the dependent variable. "Highest academic degree" [degree] is ordinal level. However, we will follow the common convention of using ordinal variables with interval level statistics, adding a caution to any true findings.

Solving the problem with SPSS: Simple linear regression - 1

Slide 53

Before we can address the other issues involved in solving the problem, we need to generate the SPSS output.
Select Regression > Linear from the Analyze menu.

Solving the problem with SPSS: Simple linear regression - 2

Slide 54

The problem states that: Simple linear regression revealed a strong, positive relationship between "highest academic degree" [degree] and "occupational prestige score" [prestg80] (β = 0.546, t(250) = 10.30, p < .001).
We first enter the independent and dependent variables in the dialog box. Unless the problem statement clearly specifies which variable is having an effect on the other, we treat the variable mentioned first as the independent variable and the one mentioned second as the dependent variable.
First, move the dependent variable prestg80 to the Dependent list box.
Second, move the independent variable degree to the Independent(s) list box.
Third, click on the Statistics button to add the additional statistics.

Solving the problem with SPSS: Simple linear regression - 3

Slide 55

First, in addition to the SPSS defaults, we add the check box for Descriptives statistics.
Second, click on the Continue button to close the dialog box.

Solving the problem with SPSS: Simple linear regression - 4

Slide 56

When we return to the Linear Regression dialog box, we click on OK to obtain the output.

Solving the problem with SPSS: Sample size

Slide 57

Using the rule of thumb from Tabachnick and Fidell that the required number of cases should be the larger of (the number of independent variables × 8) + 50, or the number of independent variables + 105, simple linear regression requires 106 cases. With 252 valid cases, the sample size requirement is satisfied.
NOTE: this sample size requirement is much larger than what I have used in the past. Including the issue of power analysis indicates that previous guidelines would be substantially under-powered.

Solving the problem with SPSS: Interpreting the relationship - 1

Slide 58

The first sentence in the finding states that: Simple linear regression revealed a strong, positive relationship between "highest academic degree" [degree] and "occupational prestige score" [prestg80] (β = 0.546, t(250) = 10.30, p < .001).
From the table of Coefficients, we see that the Beta (β) of 0.546 stated in the finding is correct, as is the value for the t-test of the b coefficient (10.303) and the probability of the t-statistic (< .001).

SW388R6
Data Analysis
and Computers I
Slide 59

Solving the problem with SPSS: Interpreting the relationship - 2

The first sentence in the finding states that:
Simple linear regression revealed a strong, positive relationship between "highest academic degree" [degree] and "occupational prestige score" [prestg80] (β = 0.546, t(250) = 10.30, p < .001).

SPSS does not provide the degrees of freedom for the t-test. However, it is easily calculated as the number of cases in the sample minus the number of predictors minus 1, or 252 - 1 - 1 = 250 for this problem.

Since the probability of the test statistic (t = 10.30, p < .001) was less than or equal to alpha (.05), the relationship between "highest academic degree" [degree] and "occupational prestige score" [prestg80] was statistically significant.
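The degrees-of-freedom calculation described above is simple enough to verify directly (a sketch, not SPSS output):

```python
def t_test_degrees_of_freedom(n_cases, n_predictors):
    """Degrees of freedom for the t-test of a regression coefficient:
    the number of cases minus the number of predictors minus 1."""
    return n_cases - n_predictors - 1

# 252 valid cases and one predictor in simple linear regression:
print(t_test_degrees_of_freedom(252, 1))  # 250
```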

Slide 60

Solving the problem with SPSS: Interpreting the relationship - 3

The relationship was correctly characterized as strong. Using Cohen's criteria for characterizing the strength of relationships, the correlation coefficient (r = 0.546) was correctly interpreted as a large or strong relationship.

Cohen's criteria:
r < .1 = Trivial
.1 ≤ r < .3 = Small
.3 ≤ r < .5 = Medium or moderate
r ≥ .5 = Large
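Cohen's criteria can be expressed as a small helper function (an illustrative sketch, not something SPSS computes for you):

```python
def cohen_strength(r):
    """Classify a correlation coefficient by Cohen's criteria,
    using its absolute value so the direction does not matter."""
    r = abs(r)
    if r < 0.1:
        return "trivial"
    elif r < 0.3:
        return "small"
    elif r < 0.5:
        return "medium or moderate"
    else:
        return "large"

print(cohen_strength(0.546))  # large
```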

In multiple regression, the Multiple R will always be positive because it represents the strength of the relationship for whatever number of independent variables is included.

The r for individual relationships has the same value as Multiple R in simple linear regression, but may be positive or negative depending on the direction of the relationship.

Slide 61

Solving the problem with SPSS: Interpreting the relationship - 4

The relationship was correctly characterized as positive. The sign of the Beta coefficient, as well as the B coefficient, is positive or direct, implying that the numeric values of both variables move in the same direction: high with high and low with low.

The first sentence in the finding, which states that simple linear regression revealed a strong, positive relationship between "highest academic degree" [degree] and "occupational prestige score" [prestg80] (β = 0.546, t(250) = 10.30, p < .001), is correct.

Slide 62

Solving the problem with SPSS: Interpreting the relationship - 5

The second sentence in the finding states that:
Survey respondents who had higher academic degrees had more prestigious occupations.

The sign of beta (β = 0.546) was positive, supporting the statement about the direction of the relationship. Since the sign of the beta coefficient (β = 0.546) was positive, the relationship between the variables is direct. Higher scores for the independent variable "highest academic degree" [degree] are associated with higher scores on the dependent variable "occupational prestige score" [prestg80].

The statement that survey respondents who had higher academic degrees had more prestigious occupations is correct.
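The direction of a relationship comes from the sign of the b coefficient, which matches the sign of the covariance between the two variables. A minimal sketch with hypothetical toy data (the values below are invented for illustration, not taken from the GSS dataset):

```python
def slope_sign(x, y):
    """Return the direction of the b (and Beta) coefficient, which
    shares the sign of the covariance between x and y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return "positive" if cov > 0 else "negative"

# Hypothetical data: higher degree codes paired with higher prestige scores
print(slope_sign([0, 1, 2, 3, 4], [30, 40, 45, 55, 60]))  # positive
```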

Slide 63

Solving the problem with SPSS: Interpreting the relationship - 6

The third sentence in the finding states that:
The accuracy of predicting scores for the dependent variable "occupational prestige score" will improve by approximately 30% if the prediction is based on scores for the independent variable "highest academic degree" (r² = 0.298).

Using the proportional reduction in error interpretation of the coefficient of determination, r², the statement that "the accuracy of predicting scores for the dependent variable "occupational prestige score" will improve by approximately 30% if the prediction is based on scores for the independent variable "highest academic degree" (r² = 0.298)" is correct.

This statement is also true, so the answer to the question is True with caution. The caution is needed because the independent variable degree is ordinal level.
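The approximately 30% figure is just the square of the correlation coefficient reported earlier in the finding:

```python
r = 0.546                      # Multiple R / Beta from the SPSS output
r_squared = round(r ** 2, 3)   # coefficient of determination
print(r_squared)               # 0.298, i.e. roughly a 30% proportional reduction in error
```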

Slide 64

Logic for simple linear regression: Level of measurement

Measurement level of independent variable?
- Nominal: inappropriate application of a statistic
- Interval/ordinal/dichotomous: continue

Measurement level of dependent variable?
- Nominal/dichotomous: inappropriate application of a statistic
- Interval/ordinal: continue

Strictly speaking, the test requires an interval level variable. We will allow ordinal level variables with a caution.

Slide 65

Logic for simple linear regression: Sample size requirement

Compute linear regression including descriptive statistics.

The sample size requirement is the larger of:
- the number of independent variables x 8 + 50
- the number of independent variables + 105

Valid cases satisfies computed requirement?
- No: inappropriate application of a statistic
- Yes: continue

Slide 66

Logic for simple linear regression: Significant, non-trivial relationship

There are other assumptions that we will assume we satisfy for this week's assignment.

Probability for t-test of B coefficient less than or equal to alpha?
- No: False
- Yes: continue

Effect size (Multiple R) is not trivial by Cohen's scale, i.e. equal to or larger than 0.10?
- No: False
- Yes: continue

In simple linear regression, r and Beta have the same numeric value as Multiple R, but may have a different sign. They are also measures of effect size.
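The two checks on this slide can be combined into a small sketch of the decision logic (illustrative only):

```python
def significant_nontrivial(p_value, multiple_r, alpha=0.05):
    """The finding survives this step only when the t-test is
    significant AND the effect size is non-trivial on Cohen's
    scale (Multiple R of at least 0.10)."""
    if p_value > alpha:
        return False           # not statistically significant
    if abs(multiple_r) < 0.10:
        return False           # trivial effect size
    return True

print(significant_nontrivial(p_value=0.001, multiple_r=0.546))  # True
```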

Slide 67

Logic for simple linear regression: Strength of relationship

Strength of relationship (effect size) correctly interpreted based on Multiple R?
- No: False
- Yes: continue

Slide 68

Logic for simple linear regression: Direction of the relationship

Direction of relationship correctly interpreted based on B or Beta coefficient?
- No: False
- Yes: continue

Slide 69

Logic for simple linear regression: Proportional reduction in error

Reduction in error correctly interpreted based on Multiple R?
- No: False
- Yes: continue

The statistics in the SPSS output match all of the statistics cited in the problem?
- No: False
- Yes: True (add caution if dependent or independent variable is ordinal)
