
SW388R6
Data Analysis and Computers I

Simple Linear Regression

Slide 1

What is Regression Analysis?

Different types of multiple regression:
  Standard multiple regression
  Hierarchical multiple regression
  Stepwise multiple regression
Level of Measurement Requirements

Collinearity Diagnostics (Multicollinearity)
Sample Size Requirements

Regression Analysis

Slide 2

Regression analysis is the generic term for several statistical tests for evaluating the relationship between interval level dependent and independent variables.
When we are considering the relationship between
one dependent variable and one independent
variable, we use Simple Linear Regression.
When we are considering the relationship between
one dependent variable and more than one
independent variable, we use Multiple Regression.
SPSS uses the same procedure for both Simple Linear
Regression and Multiple Regression, which adds some
complications to our interpretation.


Purpose of multiple regression

Slide 4

The purpose of multiple regression is to analyze the relationship between metric or dichotomous independent variables and a metric dependent variable.
If there is a relationship, using the information in the
independent variables will improve our accuracy in
predicting values for the dependent variable.

Types of multiple regression

Slide 5

There are three types of multiple regression, each of which is designed to answer a different question:
Standard multiple regression is used to evaluate
the relationships between a set of independent
variables and a dependent variable.
Hierarchical, or sequential, regression is used to
examine the relationships between a set of
independent variables and a dependent variable,
after controlling for the effects of some other
independent variables on the dependent variable.
Stepwise, or statistical, regression is used to
identify the subset of independent variables that
has the strongest relationship to a dependent
variable.

Standard multiple regression

Slide 6

In standard multiple regression, all of the independent variables are entered into the regression equation at the same time.
Multiple R and R² measure the strength of the relationship between the set of independent variables and the dependent variable. An F test is used to determine if the relationship can be generalized to the population represented by the sample.
A t-test is used to evaluate the individual relationship between each independent variable and the dependent variable.

Hierarchical multiple regression

Slide 7

In hierarchical multiple regression, the independent variables are entered in two stages.
In the first stage, the independent variables that we want to control for are entered into the regression. In the second stage, the independent variables whose relationship we want to examine are entered after the controls.
A statistical test of the change in R² from the first stage is used to evaluate the importance of the variables entered in the second stage.

Stepwise multiple regression

Slide 8

Stepwise regression is designed to find the most parsimonious set of predictors that are most effective in predicting the dependent variable.
Variables are added to the regression equation one at a time, using the statistical criterion of maximizing the R² of the included variables.
When none of the possible additions can make a statistically significant improvement in R², the analysis stops.

Purpose of Simple Linear Regression - 1

Slide 9

The purpose of simple linear regression analysis is to answer three questions that have been identified as requirements for understanding the relationship between an independent and a dependent variable:
Is there a relationship between the two variables?
How strong is the relationship (e.g. trivial, weak, or strong; how much does it reduce error)?
What is the direction of the relationship (are high scores predictive of high or low scores)?

Purpose of Simple Linear Regression - 2

Slide 10

The question of the existence of a relationship between the variables is answered by the hypothesis test in regression analysis.
The strength of the relationship is based on interpretation of the correlation coefficient, r (as trivial, small, medium, or large) and/or the coefficient of determination, r² (as the proportion by which error was reduced or accuracy was improved).
The question of the direction of the relationship is based on the interpretation of the sign of the b coefficient or the beta coefficient.

Simple Linear Regression: Examples

Slide 11

There is a relationship between undergraduate GPAs and graduate GPAs.
GRE scores are a useful predictor of graduate GPAs.
For social work students, the relationship between GPA and future income enables us to predict future earnings based on academic performance.

Simple Linear Regression - 1

Slide 12

When we studied measures of central tendency, we showed that the best measure of central tendency for an interval level variable (assuming it is not badly skewed) was the mean.
When we used the mean as the estimated score for all cases in the distribution, the total error computed for all of the cases was smaller than the error would be for any other value used for the estimate.
Error was measured as the deviation or difference between the mean and the score for each case, squared and summed.

Simple Linear Regression - 2

Slide 13

Simple linear regression tests the existence of a relationship between an independent and a dependent variable by determining whether or not there is a statistically significant reduction in total error if we use the scores on the independent variable to estimate the scores on the dependent variable.
Regression analysis finds the equation or formula for the straight line that minimizes the total error.
The regression equation takes the algebraic form for a straight line: y = a + bx, where y is the dependent variable, x is the independent variable, b is the slope of the line, and a is the point at which the line crosses the y axis.

The Regression Equation

Slide 14

The regression equation is the algebraic formula for the regression line, which states the mathematical relationship between the independent and the dependent variable.
We can use the regression line to estimate the value of the dependent variable for any value of the independent variable.
The stronger the relationship between the independent and dependent variables, the closer these estimates will come to the actual score that each case had on the dependent variable.

Components of the Regression Equation

Slide 15

The regression equation has two components.
The first component is a number called the y-intercept that defines where the regression line crosses the vertical y axis.
The second component is called the slope of the line, and is a number that multiplies the value of the independent variable.
These two elements are combined in the general form for the regression equation:
the estimated score on the dependent variable = the y-intercept + the slope × the score on the independent variable

The Standard Form of the Regression Equation

Slide 16

The standard form for the regression equation or formula is:
Y = a + bX
where
Y is the estimated score for the dependent variable
X is the score for the independent variable
b is the slope of the regression line, or the multiplier of X
a is the intercept, or the point where the regression line crosses the vertical y-axis
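The least-squares values of a and b can be computed directly from data. A minimal sketch in Python (the function name and toy data are illustrative, not from the slides):

```python
# Estimating the intercept a and slope b of the least-squares line Y = a + bX.

def fit_simple_regression(x, y):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    # b is the sum of cross-products of deviations divided by the
    # sum of squared deviations of x.
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
        sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx  # the line passes through the point of means
    return a, b

# Points lying exactly on y = 1.0 + 0.5x (the line used later, on slide 17).
a, b = fit_simple_regression([1, 2, 3, 4], [1.5, 2.0, 2.5, 3.0])
```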

Depicting the Regression Equation

Slide 17

The regression equation includes both the y-intercept and the slope of the line. The y-intercept is 1.0 and the slope is 0.5:
y = 1.0 + 0.5x

[Plot of the line y = 1.0 + 0.5x, with both axes running from 0.0 to 5.0.]

The slope is the multiplier of x. It is the amount of change in y for a change of one unit in x.
The y-intercept is the point on the vertical y-axis where the regression line crosses the axis, i.e. 1.0.
If x changes one unit from 2.0 to 3.0, depicted by the blue arrow, y will change by 0.5 units, from 2.0 to 2.5, as depicted by the red arrow.

Deriving the Regression Equation - 1

Slide 18

In this plot, none of the points fall on the regression line:
y = 0.8 + 0.6x

[Scatterplot with the regression line y = 0.8 + 0.6x and the differences between the points and the line drawn as red lines.]

The difference between the actual value for the dependent variable and the predicted value for each point is shown by the red lines. These differences are called the residuals, and represent the errors between the actual and predicted values.

Deriving the Regression Equation - 2

Slide 19

The regression equation is computed to minimize the total amount of error in predicting values for the dependent variable. The method for deriving the equation is called the "method of least squares," meaning that the regression line minimizes the sum of the squared residuals, or errors between actual and predicted values.

[Scatterplot with the least-squares regression line y = 0.8 + 0.6x.]
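The least-squares property can be checked numerically: for any data set, the sum of squared residuals around the fitted line is no larger than the sum of squared deviations around the mean. A small sketch with made-up scores (not the GSS data):

```python
# Comparing total error around the mean with total error around the
# least-squares line, for assumed illustrative data.

def sum_sq(values):
    return sum(v * v for v in values)

x = [1, 2, 3, 4, 5]
y = [1.3, 2.1, 2.4, 3.4, 3.8]   # made-up scores for illustration
n = len(x)
mx = sum(x) / n
my = sum(y) / n

# Least-squares slope and intercept (same formulas as the regression equation).
b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
    sum((xi - mx) ** 2 for xi in x)
a = my - b * mx

total_ss = sum_sq([yi - my for yi in y])                            # error using the mean
residual_ss = sum_sq([yi - (a + b * xi) for xi, yi in zip(x, y)])   # error using the line
```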

Interpreting the Regression Equation: the Intercept

Slide 20

The intercept is the point on the vertical axis where the regression line crosses the axis. It is the predicted value for the dependent variable when the independent variable has a value of zero.
This may or may not be useful information depending on the context of the problem.

Interpreting the Regression Equation: the Slope

Slide 21

The slope is interpreted as the amount of change in the predicted value of the dependent variable associated with a one unit change in the value of the independent variable.
If the slope has a negative sign, the direction of the relationship is negative or inverse, meaning that the scores on the two variables move in opposite directions.
If the slope has a positive sign, the direction of the relationship is positive or direct, meaning that the scores on the two variables move in the same direction.

Interpreting the Regression Equation: the Slope Equals 0

Slide 22

If there is no relationship between two variables, the slope of the regression line is zero and the regression line is parallel to the horizontal axis.
A slope of zero means that the predicted value of the dependent variable will not change, no matter what value of the independent variable is used.
If there is no relationship, using the regression equation to predict values of the dependent variable is no improvement over using the mean of the dependent variable.

Simple Linear Regression: Hypotheses

Slide 23

The hypothesis tested in simple linear regression is based on the slope or angle of the regression line.
Hypotheses:
Null: the slope of the regression line as measured by the b coefficient = 0, i.e. there is no relationship.
Research: the slope of the regression line as measured by the b coefficient ≠ 0, i.e. there is a relationship.
The b coefficient is tested with a two-tailed t-test.
Decision:
Reject the null hypothesis if p ≤ alpha.

Simple Linear Regression: Level of Measurement

Slide 24

Dependent variable is interval level (ordinal with caution).
Independent variable is interval level (ordinal with caution) or dichotomous.

Simple Linear Regression: Sample Size Requirements - 1

Slide 25

In previous semesters, the rule of thumb for required sample size that I have used was a minimum of 5 cases for each independent variable included in the analysis, and preferably 15 cases for each independent variable. This rule was based on the text Multivariate Data Analysis by Hair, Black, Babin, Anderson, and Tatham.
Since attempting to incorporate more material on power analysis, I find that rule to be inadequate because we are unlikely to achieve statistical significance in all but the simplest problems that contain very strong relationships.

Simple Linear Regression: Sample Size Requirements - 2

Slide 26

In Using Multivariate Statistics, Tabachnick and Fidell recommend that the required number of cases should be the larger of (the number of independent variables × 8) + 50, or the number of independent variables + 105.
Following this rule, simple linear regression with one independent variable would require a sample of 106 cases.
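The rule of thumb above can be written as a one-line helper (the function name is an assumption for illustration, not part of the slides or SPSS):

```python
# Tabachnick & Fidell rule of thumb for required cases, as stated on the slide:
# the larger of (8 * m + 50) and (m + 105) for m independent variables.

def required_cases(n_independent_vars):
    m = n_independent_vars
    return max(8 * m + 50, m + 105)
```

For simple linear regression (one independent variable), this returns max(58, 106) = 106, matching the slide.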

SAMPLE SIZE

Slide 27

Descriptive Statistics

                                        Mean   Std. Deviation     N
HOW OFTEN R ATTENDS RELIGIOUS SERVICES  3.15        2.653        113
STRENGTH OF AFFILIATION                 2.12        1.084        113
HOW OFTEN DOES R PRAY                   2.90        1.575        113

The minimum ratio of valid cases to independent variables for multiple regression is 5 to 1. With 113 valid cases and 2 independent variables, the ratio for this analysis is 56.5 to 1, which satisfies the minimum requirement.
In addition, the ratio of 56.5 to 1 satisfies the preferred ratio of 15 to 1.

Visualizing Regression Analysis - 1

Slide 28

While we will base our problem solving on numeric statistical results computed by SPSS, we can use a scatterplot to demonstrate regression graphically.
We will use the variable "highest year of school completed" [educ] as the independent variable and "occupational prestige score" [prestg80] as the dependent variable from the GSS2000R data set to demonstrate the relationship graphically.

Visualizing Regression Analysis - 2

Slide 29

A scatterplot of prestg80 by educ produced by SPSS.
The dependent variable is plotted on the y-axis, or the vertical axis. The independent variable is plotted on the x-axis, or the horizontal axis.
The dots in the body of the chart represent the cases in the distribution.

Visualizing Regression Analysis - 3

Slide 30

I have drawn a green horizontal line through the mean of prestg80 (44.17).
The differences between the mean line and the dots (shown as pink lines) are the deviations.
The sum of the squared deviations is the measure of total error when the mean is used as the estimated score for each case.
NOTE: the plots were created in SPSS by adding features to the default plot.

Visualizing Regression Analysis - 4

Slide 31

A regression line and the regression equation are added in red to the scatterplot.
The pink deviations from the mean have been replaced with the orange deviations from the regression line. Deviations between cases and the regression line are called residuals.

Visualizing Regression Analysis - 5

Slide 32

The existence of a relationship between the variables is supported when the sum of the squared orange residuals is significantly less than the sum of the squared pink deviations.
Recall that both deviations and residuals can be referred to as errors. If there is a relationship, we can characterize it as a reduction in error.

Visualizing Regression Analysis - 6

Slide 33

While it is difficult for us to square and sum deviations and residuals, SPSS regression output provides us with the answer.
The squared sum of the pink deviations from the mean is the Total Sum of Squares in the ANOVA table (49104.91).
The squared sum of the orange residuals from the regression line is the Residual Sum of Squares in the ANOVA table (37086.80).

Visualizing Regression Analysis - 7

Slide 34

The difference between the Total Sum of Squares and the Residual Sum of Squares is the Regression Sum of Squares.
The Regression Sum of Squares is the amount of error that can be eliminated by using the regression equation to estimate values of prestg80 instead of the mean of prestg80.
The Regression Sum of Squares in the ANOVA table is 12018.11.

Visualizing Regression Analysis - 8

Slide 35

We can compute the proportion of error that was reduced by the regression by dividing the Regression Sum of Squares by the Total Sum of Squares:
12018.11 ÷ 49104.91 = 0.245
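The same arithmetic in Python, using the sums of squares reported in the ANOVA table:

```python
# R-squared as the Regression Sum of Squares divided by the Total Sum of
# Squares, with the values from the SPSS ANOVA table on the slides.
regression_ss = 12018.11
total_ss = 49104.91
r_squared = regression_ss / total_ss   # proportion of error reduced
```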

Visualizing Regression Analysis - 9

Slide 36

The reduction in error that we computed (0.245) is equal to the R Square that SPSS provides in the Model Summary table.
R² is the coefficient of determination, which is usually characterized as:
the proportion of variance in the dependent variable explained by the independent variable, or
the reduction in error (or increase in accuracy).
In multiple regression, the symbol for the coefficient of determination is R². In simple linear regression, the symbol is r².

Visualizing Regression Analysis - 10

Slide 37

The correlation coefficient, Multiple R, is the positive square root of R Square. This can be misleading in Simple Linear Regression, where the correlation for the relationship between the two variables, r, can have a negative sign for an inverse relationship. Aside from the direction of the relationship, the value of Multiple R will be the same as the value for r in Simple Linear Regression.

Visualizing Regression Analysis - 11

Slide 38

The ANOVA table tests the null hypothesis that R² = 0, i.e. that the reduction in error associated with the regression is zero.
The test of this hypothesis is reported for Multiple Regression as a test of an overall relationship between the dependent variable and the independent variables.
In Simple Linear Regression, we usually report the hypothesis test that the slope = 0, though we would reach the same conclusion no matter which test we report.

Visualizing Regression Analysis - 12

Slide 39

The test of the null hypothesis that the slope of the regression line (b coefficient) = 0 is reported in the Coefficients table.
Note that the significance of the t-test is the same as the significance of the F-test. Furthermore, in simple linear regression, the value of the F-statistic (81.662) is the same as the square of the t-statistic (9.037).
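We can check the F = t² relationship with the rounded statistics from the SPSS output; because both values are rounded, the match is only approximate:

```python
# Rounded statistics from the SPSS output on the slides.
t_statistic = 9.037
f_statistic = 81.662

# In simple linear regression, F equals t squared (up to rounding).
difference = abs(t_statistic ** 2 - f_statistic)
```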

Visualizing Regression Analysis - 13

Slide 40

We can depict the hypothesis test visually. The null hypothesis for simple linear regression is that the slope of the regression line is zero. The slope of the green mean line is zero, so under the null hypothesis the red regression line would be equal to the green line.
In this example, the red regression line is obviously different from the green mean line, which is verified by the value of the slope in the regression equation (2.36) and the t-test of B.

Visualizing Regression Analysis - 14

Slide 41

The regression equation is based on the Unstandardized Coefficients (B) in the table of Coefficients.
The B coefficient labeled (Constant) is the intercept. The B coefficient for the variable educ is the slope of the regression line.
The regression equation for the relationship between prestg80 and educ is:
prestg80 = 12.928 + 2.359 × educ

Visualizing Regression Analysis - 15

Slide 42

The Standardized Coefficients (Beta) in the table of Coefficients are the regression coefficients for the relationship between the standardized dependent variable (z-scores) and the standardized independent variable (z-scores).
Since standardizing variables removes the unit of measurement from the coefficients, we can compare the Beta coefficients to interpret the relative importance of each independent variable in Multiple Regression.
In Simple Linear Regression, Beta will be equal to r, the correlation coefficient. Multiple R, r, and Beta all have the same numeric value, though Multiple R will be positive even when r and Beta are negative.

Visualizing Regression Analysis - 16

Slide 43

The sign of the Beta coefficient, as well as the sign of the B coefficient, tells us the direction of the relationship.
If the coefficients are positive, the relationship is characterized as direct or positive, meaning that higher values of the dependent variable are associated with higher values of the independent variables.
If the coefficients are negative, the relationship is characterized as inverse or negative, meaning that lower values of the dependent variable are associated with higher values of the independent variables.

Visualizing Regression Analysis - 17

Slide 44

The regression line represents the estimated value of prestg80 for every value of educ.
To obtain the estimate, we draw a line perpendicular to the value on the x-axis to the point where it intersects the regression line. We then draw a line from the intersection point to the y-axis. The intersection point on the y-axis is the estimated value for the dependent variable.

Visualizing Regression Analysis - 18

Slide 45

If we draw a vertical line from the educ value of 5 to the regression line, and then a horizontal line to the vertical axis, we see that the estimated value for prestg80 is about 25.
We can compute the exact value by substituting into the regression equation:
prestg80 = 12.93 + 2.36 × 5 = 24.73

Visualizing Regression Analysis - 19

Slide 46

If we draw a vertical line from the educ value of 15 to the regression line, and then a horizontal line to the vertical axis, we see that the estimated value for prestg80 is about 50.
We can compute the exact value by substituting into the regression equation:
prestg80 = 12.93 + 2.36 × 15 = 48.33
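Both estimates come from substituting into the rounded regression equation; a quick sketch (the function name is illustrative, not from the slides):

```python
# Predicted occupational prestige from years of education, using the
# rounded regression equation prestg80 = 12.93 + 2.36 * educ.

def predict_prestige(educ):
    return 12.93 + 2.36 * educ
```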

Sample homework problem: Simple linear regression

Slide 47

Based on information from the data set GSS2000R, is the following statement true, false, or an incorrect application of a statistic? Assume that the assumptions of linear regression are satisfied. Use .05 for alpha.

Simple linear regression revealed a strong, positive relationship between "highest academic degree" [degree] and "occupational prestige score" [prestg80] (β = 0.546, t(250) = 10.30, p < .001). Survey respondents who had higher academic degrees had more prestigious occupations. The accuracy of predicting scores for the dependent variable "occupational prestige score" will improve by approximately 30% if the prediction is based on scores for the independent variable "highest academic degree" (r² = 0.298).

o True
o True with caution
o False
o Incorrect application of a statistic

This is the general framework for the problems in the homework assignment on simple linear regression. The description is similar to findings one might state in a research article.

Sample homework problem: Data set and alpha

Slide 48

Based on information from the data set GSS2000R, is the following statement true, false, or an incorrect application of a statistic? Assume that the assumptions of linear regression are satisfied. Use .05 for alpha.

The first paragraph identifies:
the data set to use, e.g. GSS2000R.Sav
the alpha level for the hypothesis test

Simple linear regression revealed a strong, positive relationship between "highest academic degree" [degree] and "occupational prestige score" [prestg80] (β = 0.546, t(250) = 10.30, p < .001). Survey respondents who had higher academic degrees had more prestigious occupations. The accuracy of predicting scores for the dependent variable "occupational prestige score" will improve by approximately 30% if the prediction is based on scores for the independent variable "highest academic degree" (r² = 0.298).

o True
o True with caution
o False
o Incorrect application of a statistic

Sample homework problem: Specifications for the test - 1

Slide 49

Based on information from the data set GSS2000R, is the following statement true, false, or an incorrect application of a statistic? Assume that the assumptions of linear regression are satisfied. Use .05 for alpha.

Simple linear regression revealed a strong, positive relationship between "highest academic degree" [degree] and "occupational prestige score" [prestg80] (β = 0.546, t(250) = 10.30, p < .001). Survey respondents who had higher academic degrees had more prestigious occupations. The accuracy of predicting scores for the dependent variable "occupational prestige score" will improve by approximately 30% if the prediction is based on scores for the independent variable "highest academic degree" (r² = 0.298).

The second paragraph states the finding that we want to verify with a simple linear regression. The finding identifies:
the independent variable
the dependent variable
the strength of the relationship
the direction of the relationship

o True
o True with caution
o False
o Incorrect application of a statistic

Sample homework problem: Specifications for the test - 2

Slide 50

Based on information from the data set GSS2000R, is the following statement true, false, or an incorrect application of a statistic? Assume that the assumptions of linear regression are satisfied. Use .05 for alpha.

The second paragraph also states additional statements that can be included in findings:
an interpretative statement about the direction of the relationship
the proportional reduction in error (PRE) interpretation of the coefficient of determination, r²

Simple linear regression revealed a strong, positive relationship between "highest academic degree" [degree] and "occupational prestige score" [prestg80] (β = 0.546, t(250) = 10.30, p < .001). Survey respondents who had higher academic degrees had more prestigious occupations. The accuracy of predicting scores for the dependent variable "occupational prestige score" will improve by approximately 30% if the prediction is based on scores for the independent variable "highest academic degree" (r² = 0.298).

o True
o True with caution
o False
o Incorrect application of a statistic

Sample homework problem: Simple linear regression

Slide 51

Based on information from the data set GSS2000R, is the following statement true, false, or an incorrect application of a statistic? Assume that the assumptions of linear regression are satisfied. Use .05 for alpha.

Simple linear regression revealed a strong, positive relationship between "highest academic degree" [degree] and "occupational prestige score" [prestg80] (β = 0.546, t(250) = 10.30, p < .001). Survey respondents who had higher academic degrees had more prestigious occupations. The accuracy of predicting scores for the dependent variable "occupational prestige score" will improve by approximately 30% if the prediction is based on scores for the independent variable "highest academic degree" (r² = 0.298).

o True
o True with caution
o False
o Incorrect application of a statistic

The answer to the problem will be True if all parts of the finding in the problem statement are correct.
The answer will be True with caution if the analysis supports the finding in the problem statement, but one or both of the variables is ordinal level.
The answer will be False if any part of the finding in the problem statement is not correct.
The answer will be Incorrect application of a statistic if the level of measurement or sample size requirement is violated.

Solving the problem with SPSS: Level of measurement

Slide 52

Simple linear regression requires that the dependent variable be interval and the independent variable be interval or dichotomous. "Occupational prestige score" [prestg80] is interval level, satisfying the requirement for the dependent variable. "Highest academic degree" [degree] is ordinal level. However, we will follow the common convention of using ordinal variables with interval level statistics, adding a caution to any true findings.

Solving the problem with SPSS: Simple linear regression - 1

Slide 53

Before we can address the other issues involved in solving the problem, we need to generate the SPSS output.
Select Regression > Linear from the Analyze menu.

Solving the problem with SPSS: Simple linear regression - 2

Slide 54

The problem states that: Simple linear regression revealed a strong, positive relationship between "highest academic degree" [degree] and "occupational prestige score" [prestg80] (β = 0.546, t(250) = 10.30, p < .001).
We first enter the independent and dependent variables in the dialog box. Unless the problem statement clearly specifies which variable is having an effect on the other, we treat the variable mentioned first as the independent variable and the one mentioned second as the dependent variable.
First, move the dependent variable prestg80 to the Dependent list box.
Second, move the independent variable degree to the Independent(s) list box.
Third, click on the Statistics button to add the additional statistics.

Solving the problem with SPSS: Simple linear regression - 3

Slide 55

First, in addition to the SPSS defaults, we add the check box for Descriptives statistics.
Second, click on the Continue button to close the dialog box.

Solving the problem with SPSS: Simple linear regression - 4

Slide 56

When we return to the Linear Regression dialog box, we click on OK to obtain the output.

Solving the problem with SPSS: Sample size

Slide 57

Using the rule of thumb from Tabachnick and Fidell that the required number of cases should be the larger of (the number of independent variables × 8) + 50, or the number of independent variables + 105, simple linear regression requires 106 cases. With 252 valid cases, the sample size requirement is satisfied.
NOTE: this sample size requirement is much larger than what I have used in the past. Including the issue of power analysis indicates that previous guidelines would be substantially under-powered.

Solving the problem with SPSS: Interpreting the relationship - 1

Slide 58

The first sentence in the finding states that: Simple linear regression revealed a strong, positive relationship between "highest academic degree" [degree] and "occupational prestige score" [prestg80] (β = 0.546, t(250) = 10.30, p < .001).
From the table of Coefficients, we see that the Beta (β) of 0.546 stated in the finding is correct, as is the value for the t-test of the b coefficient (10.303) and the probability of the t-statistic (< .001).

SW388R6
Data Analysis
and Computers I
Slide 59

Solving the problem with SPSS: Interpreting the relationship - 2

The first sentence in the finding states that:
Simple linear regression revealed a strong, positive relationship between "highest academic degree" [degree] and "occupational prestige score" [prestg80] (β = 0.546, t(250) = 10.30, p < .001).

SPSS does not provide the degrees of freedom for the t-test. However, it is easily calculated as the number of cases in the sample minus the number of predictors minus 1, or 252 - 1 - 1 = 250 for this problem.

Since the probability of the test statistic (t = 10.30, p < .001) was less than or equal to alpha (.05), the relationship between "highest academic degree" [degree] and "occupational prestige score" [prestg80] was statistically significant.
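The degrees-of-freedom calculation described above is simple enough to verify directly (a sketch, not SPSS output):

```python
def t_test_degrees_of_freedom(n_cases, n_predictors):
    """Degrees of freedom for the t-test of a regression coefficient:
    the number of cases minus the number of predictors minus 1."""
    return n_cases - n_predictors - 1

# 252 valid cases and one predictor in simple linear regression:
print(t_test_degrees_of_freedom(252, 1))  # 250
```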

Slide 60

Solving the problem with SPSS: Interpreting the relationship - 3

The relationship was correctly characterized as strong. Using Cohen's criteria for characterizing the strength of relationships, the correlation coefficient (r = 0.546) was correctly interpreted as a large or strong relationship.

Cohen's criteria:
r < .1 = Trivial
.1 ≤ r < .3 = Small
.3 ≤ r < .5 = Medium or moderate
r ≥ .5 = Large
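Cohen's criteria can be expressed as a small helper function (an illustrative sketch, not something SPSS computes for you):

```python
def cohen_strength(r):
    """Classify a correlation coefficient by Cohen's criteria,
    using its absolute value so the direction does not matter."""
    r = abs(r)
    if r < 0.1:
        return "trivial"
    elif r < 0.3:
        return "small"
    elif r < 0.5:
        return "medium or moderate"
    else:
        return "large"

print(cohen_strength(0.546))  # large
```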

In multiple regression, the Multiple R will always be positive because it represents the strength of the relationship for whatever number of independent variables is included.

The r for individual relationships has the same value as Multiple R in simple linear regression, but may be positive or negative depending on the direction of the relationship.

Slide 61

Solving the problem with SPSS: Interpreting the relationship - 4

The relationship was correctly characterized as positive. The sign of the Beta coefficient, as well as the B coefficient, is positive or direct, implying that the numeric values of both variables move in the same direction: high with high and low with low.

The first sentence in the finding, which states that simple linear regression revealed a strong, positive relationship between "highest academic degree" [degree] and "occupational prestige score" [prestg80] (β = 0.546, t(250) = 10.30, p < .001), is correct.

Slide 62

Solving the problem with SPSS: Interpreting the relationship - 5

The second sentence in the finding states that:
Survey respondents who had higher academic degrees had more prestigious occupations.

The sign of beta (β = 0.546) was positive, supporting the statement about the direction of the relationship. Since the sign of the beta coefficient (β = 0.546) was positive, the relationship between the variables is direct. Higher scores for the independent variable "highest academic degree" [degree] are associated with higher scores on the dependent variable "occupational prestige score" [prestg80].

The statement that survey respondents who had higher academic degrees had more prestigious occupations is correct.
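The direction of a relationship comes from the sign of the b coefficient, which matches the sign of the covariance between the two variables. A minimal sketch with hypothetical toy data (the values below are invented for illustration, not taken from the GSS dataset):

```python
def slope_sign(x, y):
    """Return the direction of the b (and Beta) coefficient, which
    shares the sign of the covariance between x and y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return "positive" if cov > 0 else "negative"

# Hypothetical data: higher degree codes paired with higher prestige scores
print(slope_sign([0, 1, 2, 3, 4], [30, 40, 45, 55, 60]))  # positive
```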

Slide 63

Solving the problem with SPSS: Interpreting the relationship - 6

The third sentence in the finding states that:
The accuracy of predicting scores for the dependent variable "occupational prestige score" will improve by approximately 30% if the prediction is based on scores for the independent variable "highest academic degree" (r² = 0.298).

Using the proportional reduction in error interpretation of the coefficient of determination, r², the statement that "the accuracy of predicting scores for the dependent variable "occupational prestige score" will improve by approximately 30% if the prediction is based on scores for the independent variable "highest academic degree" (r² = 0.298)" is correct.

This statement is also true, so the answer to the question is True with caution. The caution is needed because the independent variable degree is ordinal level.
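The approximately 30% figure is just the square of the correlation coefficient reported earlier in the finding:

```python
r = 0.546                      # Multiple R / Beta from the SPSS output
r_squared = round(r ** 2, 3)   # coefficient of determination
print(r_squared)               # 0.298, i.e. roughly a 30% proportional reduction in error
```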

Slide 64

Logic for simple linear regression: Level of measurement

Measurement level of independent variable?
- Nominal: inappropriate application of a statistic
- Interval/ordinal/dichotomous: continue

Measurement level of dependent variable?
- Nominal/dichotomous: inappropriate application of a statistic
- Interval/ordinal: continue

Strictly speaking, the test requires an interval level variable. We will allow ordinal level variables with a caution.

Slide 65

Logic for simple linear regression: Sample size requirement

Compute linear regression including descriptive statistics.

The sample size requirement is the larger of:
- the number of independent variables x 8 + 50
- the number of independent variables + 105

Valid cases satisfies computed requirement?
- No: inappropriate application of a statistic
- Yes: continue

Slide 66

Logic for simple linear regression: Significant, non-trivial relationship

There are other assumptions that we will assume we satisfy for this week's assignment.

Probability for t-test of B coefficient less than or equal to alpha?
- No: False
- Yes: continue

Effect size (Multiple R) is not trivial by Cohen's scale, i.e. equal to or larger than 0.10?
- No: False
- Yes: continue

In simple linear regression, r and Beta have the same numeric value as Multiple R, but may have a different sign. They are also measures of effect size.
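The two checks on this slide can be combined into a small sketch of the decision logic (illustrative only):

```python
def significant_nontrivial(p_value, multiple_r, alpha=0.05):
    """The finding survives this step only when the t-test is
    significant AND the effect size is non-trivial on Cohen's
    scale (Multiple R of at least 0.10)."""
    if p_value > alpha:
        return False           # not statistically significant
    if abs(multiple_r) < 0.10:
        return False           # trivial effect size
    return True

print(significant_nontrivial(p_value=0.001, multiple_r=0.546))  # True
```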

Slide 67

Logic for simple linear regression: Strength of relationship

Strength of relationship (effect size) correctly interpreted based on Multiple R?
- No: False
- Yes: continue

Slide 68

Logic for simple linear regression: Direction of the relationship

Direction of relationship correctly interpreted based on B or Beta coefficient?
- No: False
- Yes: continue

Slide 69

Logic for simple linear regression: Proportional reduction in error

Reduction in error correctly interpreted based on Multiple R?
- No: False
- Yes: continue

The statistics in the SPSS output match all of the statistics cited in the problem?
- No: False
- Yes: True (add caution if dependent or independent variable is ordinal)
