
Slide 16.1

Chapter 16
Linear and multiple linear regression

Mayers, Statistics and SPSS in Psychology PowerPoints on the Web, 1st edition Pearson Education Limited 2013

Slide 16.2

Overview

What the test measures

Theory and rationale

Working examples

Demonstrations in SPSS

How to interpret output

How to write up the results

See Chapter 16


Slide 16.3

What does linear regression do?

Investigates relationships

Examines amount of variance in outcome scores (dependent variable)

Numerical outcome = scores or some kind of count

Income, exam scores, and quality of life perception scores

Outcome scores can vary as a result of several factors that can be explained by one or more predictors (independent variables)

Income may vary according to people's qualifications

Exam scores might differ according to revision undertaken

Quality of life scores might fluctuate due to perceptions of physical health

Or scores may simply vary due to random factors

The extent that the outcome scores vary is called variance



Slide 16.4

What does linear regression do? (Continued)

Linear regression helps determine how much variation is explained

By factors that we have accounted for

And how much is unexplained

Random factors or those which we have not accounted for

Linear regression is described in terms of a model

We use it to predict outcome scores

From predictor variable values or conditions

e.g. qualifications, revision, and perceived physical health

We tend not to use 'dependent variable' and 'independent variable' to describe variables

Instead we use outcome (dependent variable)

And predictor (independent variable)



Slide 16.5

Types of linear regression

Simple linear regression


Examines variance in outcome explained by one predictor
For example
Outcome variable: quality of life scores
Predictor: perceived physical health
Multiple linear regression
Examines variance in outcome explained by several predictors
More realistic application of linear regression?
Quality of life may be explained by a whole series of factors
Outcome variable: quality of life scores
Predictors: perceived physical health, income, job satisfaction, relationship satisfaction, and depression status
We look at both types here

Slide 16.6

Research example

Centre for Healthy Independent Living and Learning (CHILL)

Investigating what influences quality of life perceptions

Use questionnaire to capture those perceptions

Questions scored according to quality of life ratings

CHILL also measure other factors

Two studies

Quality of life (outcome) vs. perceived physical health (predictor variable)

Simple linear regression

Quality of life vs. perceived physical health, income, job satisfaction, relationship satisfaction, and current level of depression

Multiple linear regression



Slide 16.7

Simple linear regression: How it works

Line of best fit (regression line)

Data points are drawn on a scatter plot (graph)

We can draw a line through these points

One that approximates the average of those points

Line is described in terms of the gradient

And where it crosses the Y axis (the intercept)

Correlation measures strength of relationship

But the gradient determines whether the predictor significantly contributes to variance in outcome scores

If gradient significantly greater than 0

Regression model describes how each score varies from line


Slide 16.8

Line of best fit


Slide 16.9

Line of best fit (Continued)

Use line of best fit to predict outcome score from predictor score

e.g. take line from where physical health scores = 40

Up to blue line and draw across to Y axis

Gives an outcome of roughly 35

Figure 16.1 Scatterplot: quality of life perceptions vs. perceptions of physical health

Slide 16.10

Line of best fit (Continued)

But we can also use equation to plot outcome score more precisely

Yi = β0 + β1X1 + εi

Y = outcome variable score

i = specific outcome score for participant (or case) i

β0 = constant (where the line crosses the Y axis)

β1 = gradient of the line

X = predictor variable score

ε = error


Slide 16.11

Line of best fit (Continued)

If we knew that the gradient (β1) = 0.776 and the intercept (β0) = 3.863

Then: Y = 3.863 + (0.776 × 40) = 34.903

However, the error part of that equation is missing!

Strictly speaking, we have just calculated the predicted score Ŷi = β0 + β1X1

We would hope that the line of best fit has little error (residual)

We can estimate this by finding the difference between actual and predicted scores (Yi − Ŷi)

The smaller that outcome, the better

We will see how to find the gradient and intercept later

We need a series of measures to examine the success of our model
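As an aside, that arithmetic can be checked in a few lines of Python (a minimal sketch; the observed score of 36 is hypothetical, used only to illustrate a residual):

b0 = 3.863                # intercept (constant), from the slide
b1 = 0.776                # gradient for physical health, from the slide
y_hat = b0 + b1 * 40      # predicted quality of life when physical health = 40
print(y_hat)              # 34.903
y_obs = 36                # hypothetical actual score for that case
residual = y_obs - y_hat  # the residual; the smaller, the better the fit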


Slide 16.12

Components in regression model

Variance and correlation play a central role in linear regression

Variance (R²)

Actual variation of scores either side of the mean

Sum of squared differences between each case score and the mean score

Correlation (r)

Standardised relationship between outcome and predictor

In simple linear regression, we are only concerned with correlation between the two variables in the model
In multiple linear regression, correlation becomes more complex

Use semi-partial correlation to examine each additional variable

As we will see later


Slide 16.13

Components in regression model (Continued)

Regression model
Success of model
Depends on how closely predicted values match actual outcome
Remember the regression equation we saw earlier
Difference between predicted and actual outcome = error (residual)
Measured by F ratio
If F ratio is significant, the model is better at predicting outcome than some random method
Gradient of slope
Measured by beta values (β1, or B in SPSS)
Indicates how outcome values change for each unit change in predictor
And if the gradient is significantly greater than 0
Predictor significantly contributes to variance in outcome scores

Slide 16.14

Putting it all together

Ultimately a simple linear regression model has three components:

How much variance is explained (R²)

Whether the model is significantly better at predicting outcome than random methods (determined by F ratio)

Whether gradient is significantly > 0
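For readers who want to see those three components outside SPSS, here is a minimal Python sketch using statsmodels; the ten pairs of scores are made up for illustration and are not the CHILL data:

import numpy as np
import statsmodels.api as sm

# hypothetical physical health (predictor) and quality of life (outcome) scores
health = np.array([20, 25, 30, 32, 38, 40, 45, 50, 55, 60])
qol = np.array([18, 27, 25, 30, 33, 35, 36, 44, 46, 50])

X = sm.add_constant(health)            # adds the constant (intercept) term
model = sm.OLS(qol, X).fit()

print(model.rsquared)                  # how much variance is explained (R squared)
print(model.fvalue, model.f_pvalue)    # F ratio: is the model better than a random method?
print(model.params, model.pvalues)     # constant and gradient, and whether the gradient differs significantly from 0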


Slide 16.15

Assumptions and restrictions

There are only a few in simple linear regression

Far fewer than for multiple linear regression

Outcome must be continuous numerical score

Preferably interval

But ordinal scores are frequently used

Categorical outcomes measured using logistic regression

Outcome variable scores should be normally distributed

Predictor variable can be continuous or categorical

If categorical, this must be dichotomous (two groups)

And must be coded as 0 and 1 in SPSS value labels


Slide 16.16

Categorical predictors

If there are three or more groups

Predictor must be converted to dichotomous variables

For example, if the predictor represented ethnicity: British, Asian, and African

Would need to set up three new variables

British (0 = yes, 1 = no)

Asian (0 = yes, 1 = no)

African (0 = yes, 1 = no)

However, you would now have a multiple regression!

For which there are several additional assumptions and restrictions
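A minimal Python sketch of that kind of dummy coding, assuming a hypothetical ethnicity column; note that pandas codes group membership as 1 = yes, 0 = no, the reverse of the 0 = yes coding shown above:

import pandas as pd

df = pd.DataFrame({"ethnicity": ["British", "Asian", "African", "British", "Asian"]})
dummies = pd.get_dummies(df["ethnicity"], dtype=int)                            # one 0/1 column per group
with_reference = pd.get_dummies(df["ethnicity"], drop_first=True, dtype=int)    # drop one group as the reference category
print(dummies)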


Slide 16.17

Measuring model outcomes

In Chapter 16 you can see how to calculate outcomes manually

Maths, formulae, etc.

Very much encourage you to do this

Meanwhile, we will now see how to do this in SPSS

Simple linear regression data (Centre for Healthy Independent Living and Learning's first research question):

Outcome variable = quality of life scores

Predictor variable = perceived physical health scores

For both variables, higher scores indicate better outcomes


Slide 16.18

Simple linear regression in SPSS

Need reasonable normal distribution

Examine via Kolmogorov–Smirnov or Shapiro–Wilk test

But we have done that to death now

So we will leave it for today

Using SPSS data set quality of life and health

Select Analyze → Regression → Linear → transfer Quality of life to Dependent: → transfer Physical health scores to Independent(s) → click OK


Slide 16.19

SPSS output

Model summary

Figure 16.7 Linear regression: model summary

Total variance explained by model: R² = .591

59.1% of variance in quality of life scores explained by physical health perceptions

We could use adjusted R²

But that is generally reserved for multiple linear regression


Slide 16.20

SPSS output (Continued)

Significance of model

Figure 16.8 Significance of model

Model significantly predicts the outcome variable

F (1, 8) = 11.572, p = .009

It is significantly better at predicting the outcome than some random method


Slide 16.21

SPSS output (Continued)

Model parameters

Figure 16.9 Model parameters

Intercept (β0) shown as Constant in SPSS: 3.863

Gradient for Physical health scores = 0.776
For every unit that physical health scores increase
Quality of life scores increase by 0.776 of a point
Gradient significantly greater than 0: t = 3.402, p = .009
But in simple linear regression it always will be, if the model is also significant
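A quick arithmetic check of that last point, using the slide's own figures (with one predictor the gradient's t statistic squares to the model's F ratio):

t = 3.402
print(t ** 2)   # 11.574, which matches F(1, 8) = 11.572 apart from rounding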

Slide 16.22

Simple linear regression results

Produce your own table

Table 16.2 Linear regression analysis of quality of life scores

Write something like this:


Table 16.2 confirms that changes in perceived physical health scores were significantly able to predict variance in quality of life scores. The linear regression model explained 59.1% of the overall variance in quality of life scores (R² = .591), which was found to significantly predict outcome, F (1, 8) = 11.572, p = .009.


Slide 16.23

Multiple linear regression

We still investigate how variance can be explained

But now we have several predictors


Outcome variable: quality of life scores
Predictors: perceived physical health, income, job satisfaction, relationship satisfaction, and depression status

Model has several regression lines

Each with its own gradient

Regression equation now a little more complex

But constant will change each time another predictor added

Yi = β0 + β1X1 + β2X2 + … + βnXn + εi

Gradient for each predictor

β1X1 for the first predictor, β2X2 for the second predictor


Slide 16.24

Semi-partial correlation

Correlation

Standardised relationship between two variables

Partial correlation

Relationship between two variables

After controlling for a third variable (held constant for both of the original variables)

Semi-partial correlation

Third variable held constant for only one of the original variables

Simple linear regression focuses on standard correlation

Multiple linear regression is all about semi-partial correlation
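A minimal Python sketch of the idea, with made-up scores (qol, health, and mood are hypothetical arrays, not the CHILL data): the control variable is residualised out of the predictor only, and the outcome is then correlated with what is left.

import numpy as np
from scipy import stats

qol    = np.array([18, 27, 25, 30, 33, 35, 36, 44, 46, 50])   # outcome
health = np.array([20, 25, 30, 32, 38, 40, 45, 50, 55, 60])   # predictor
mood   = np.array([10, 14, 12, 16, 18, 17, 19, 22, 24, 25])   # third variable

# hold mood constant for physical health only: take the residuals of health regressed on mood
slope, intercept, r, p, se = stats.linregress(mood, health)
health_resid = health - (intercept + slope * mood)

# semi-partial (part) correlation: outcome vs. the residualised predictor
sp_r, sp_p = stats.pearsonr(qol, health_resid)
print(sp_r, sp_p)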


Slide 16.25

Semi-partial correlation (Continued)

We can illustrate with correlation for simple linear regression

Using SPSS file quality of life and health

Select Analyze → Correlate → Bivariate → select Quality of life scores and Physical health scores → click on arrow by Variables → tick boxes for Pearson and Two-tailed → click OK

Figure 16.10 Correlation between quality of life and physical health

Strong correlation: quality of life scores vs. physical health scores

r = .769, p = .005

Slide 16.26

Semi-partial correlation (Continued)

Correlation also equates to the R figure in regression earlier

Recall the other outcome

Significant model: F (1, 8) = 11.572, p = .009

Predictor contributed to variance: t = 3.402, p = .009

But what if we suspected that quality of life scores had more to do with mood than physical health perceptions?

Despite strength of observed relationship

We can use semi-partial correlation to explore that

Examine relationship: quality of life vs. physical health perceptions

But hold mood scores constant for physical health scores


Slide 16.27

Semi-partial correlation (Continued)

Using the SPSS file quality of life and health

Select Analyze → Regression → Linear → (in new window) select Quality of life scores → click arrow by Dependent → select Physical health scores and Mood → click arrow by Independent(s) → click Statistics → (in new window) tick Estimates, Model fit, and Part and partial correlations → click Continue → click OK

Figure 16.11 Semi-partial correlation (and regression coefficients)

Semi-partial correlation of quality of life scores vs. physical health scores is considerably weaker than the initial correlation: r = .151

Slide 16.28

Semi-partial correlation (Continued)

Now let's look at the regression outcome:

Figure 16.13 Significance of model

Still a significant model, but

Figure 16.14 Regression coefficients

Physical health no longer contributes to variance: t = 0.858, p = .419


Despite strong relationship between predictor and outcome
But, mood scores do contribute: t = 2.488, p = .042

Slide 16.29

Multiple linear regression components

Variance
How much variance is explained
But using adjusted R²
Adjusts for number of predictors and sample size
Success of multiple regression model
How closely predicted values match actual outcome
Difference represented by error (or residuals)
Illustrated by significance of F ratio
Gradients
Each beta value (β1, or B in SPSS) measures unit change in outcome score for every unit change in predictor
But it only significantly contributes to variance in outcome if that gradient is significantly > 0

Slide 16.30

Assumptions and restrictions

Reasonable correlation across outcome variable

Check for outliers
Extreme scores are furthest from the line of best fit
If too far from the line they may be outliers
Too many can undermine strength of the model → Type II errors
Examine in SPSS via standardised residuals
Range of values converted to z-scores
Need to be related to some standardised cut-off points

Table 16.3 z-score limits



Slide 16.31

Assumptions and restrictions (Continued)

Ratio of cases to predictors

Avoid too many predictor variables

Mixed opinion about what is too many!

Some books say you need at least 10× as many participants as predictors

Others are more prescriptive, e.g. Tabachnick and Fidell (2007)

At least 8× as many participants as predictors, plus 50

i.e. N ≥ 50 + 8m

m = number of predictors, N = sample size
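That rule of thumb is easy to code; a small sketch, using the sample size and predictor count from the study that follows:

def enough_cases(n_cases, n_predictors):
    # Tabachnick and Fidell (2007): N >= 50 + 8m
    return n_cases >= 50 + 8 * n_predictors

print(enough_cases(98, 5))   # True: 50 + (8 x 5) = 90, and 98 >= 90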


Slide 16.32

Assumptions and restrictions (Continued)

Correlation

At least moderate (r = .30 or higher)

Linearity

Scores on the outcome variable must relate to the predictor in a linear (straight line) fashion

Particularly important if correlation is weak

A nonlinear (curved or quadratic) relationship may be masking a weak correlation

Examine with a scatterplot

Might expect quality of life scores to increase as income increases

This would be a linear relationship


Slide 16.33

Linear relationship

Figure 16.15 Scatterplot: quality of life perceptions vs. income (£000s)

Clear linear trend: cluster of data points runs from bottom left to top right

Slide 16.34

Nonlinear (quadratic) relationship

But maybe money does not buy happiness!

Figure 16.16 Scatterplot: quality of life perceptions vs. income (£000s)



Slide 16.35

Assumptions and restrictions (Continued)

Multicollinearity

On the other hand, correlation must not be too high

If two predictors perfectly correlated

They are no longer independent

Too much multicollinearity can increase Type II errors

Measured in SPSS via collinearity option (see later)

Correlations above r = 0.8 should be avoided


Slide 16.36

Assumptions and restrictions (Continued)

Independent errors

Residuals (error terms) should not be correlated to each other

We can measure this via the Durbin–Watson test

Ask for that outcome when we set up SPSS

DW produces a statistic between 0 and 4

Score of 2 = no correlation

Score < 2 → positive correlation

Score > 2 → negative correlation

We should reject anything < 1 or > 3

Anything close to 2 is good


Slide 16.37

Methods of entering data

Several ways to enter, and examine, predictors

Although there are only two broad types

Forced entry

All predictors entered simultaneously

Hierarchical methods

Each predictor entered one at a time

This method should only be used when there is a good rationale to do so

Most common forced entry type: Enter

Most common hierarchical method: Stepwise

Generally best to use Enter method (model testing)

But Stepwise is good for model building

See Chapter 16 for an overview of all methods



Slide 16.38

Measuring model outcomes

The maths and formulae for this are complex


Attempt this if you dare (see end of Chapter 16)
Meanwhile, we will now see how to do this in SPSS
The data set (CHILL's second research question):
Outcome variable = quality of life scores
Higher scores represent better quality of life
Predictor variables:
1. Physical health
2. Income (£000s)
3. Job satisfaction
4. Relationship satisfaction
5. Depression: 1 = yes; 0 = no
Higher scores are better for predictors 1–4

Slide 16.39

Running multiple regression in SPSS

We will only look at the Enter method here


Refer to Chapter 16 to see how to do Stepwise method
Using SPSS data set Quality of life
Select Analyze → Regression → Linear → transfer Quality of life to Dependent: → transfer Depressed, Relationship satisfaction, Job satisfaction, Income, and Physical health to Independent(s) → select Enter in pull-down options for Method → click Statistics → tick box for Estimates under Regression Coefficients → tick boxes for Durbin–Watson and Casewise diagnostics under Residuals → tick radio button for Outliers outside → set Standard deviations to 2 → tick boxes for Model Fit, Part and partial correlations and Collinearity diagnostics → click Continue → click OK
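For comparison, a Python sketch of the same kind of analysis with statsmodels; the data here are simulated and the column names are hypothetical, so only the shape of the output, not the numbers, matches the CHILL study:

import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
n = 98
df = pd.DataFrame({
    "physical_health": rng.normal(50, 10, n),
    "income": rng.normal(30, 8, n),
    "job_satisfaction": rng.normal(60, 12, n),
    "relationship_satisfaction": rng.normal(55, 15, n),
    "depressed": rng.integers(0, 2, n),          # 1 = yes, 0 = no
})
df["quality_of_life"] = (0.5 * df["physical_health"] + 0.5 * df["relationship_satisfaction"]
                         - 8 * df["depressed"] + rng.normal(0, 5, n))

X = sm.add_constant(df.drop(columns="quality_of_life"))
results = sm.OLS(df["quality_of_life"], X).fit()

print(results.rsquared_adj)                      # adjusted R squared
print(results.fvalue, results.f_pvalue)          # model F ratio and its significance
print(results.params)                            # constant (beta0) and B for each predictor
print(results.pvalues)                           # which gradients differ significantly from 0
print(durbin_watson(results.resid))              # independent errors: aim for about 2
vif = [variance_inflation_factor(X.values, i) for i in range(1, X.shape[1])]
print(vif)                                       # multicollinearity: VIF should not exceed 10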


Slide 16.40

Checking assumptions

Is sample large enough?


Tabachnick & Fidell: N ≥ 50 + 8m
So, 5 predictors → 50 + (8 × 5) = 90; our sample = 98
Outliers
We refer to Casewise diagnostics output

Figure 16.22 Casewise diagnostics


No more than 5% of z-scores should be > 1.96

Sample = 98, so 5% = 5 cases: we had 4 (so that's OK)
No more than 1% should be > 2.58
In this sample 1% = 1 case: we had none
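Continuing the Python sketch from slide 16.39, a rough equivalent of the casewise-diagnostics check (this approximates SPSS's standardised residuals by z-scoring the raw residuals):

z = (results.resid - results.resid.mean()) / results.resid.std()
print((z.abs() > 1.96).mean())   # proportion of cases beyond 1.96; should be below .05
print((z.abs() > 2.58).mean())   # proportion beyond 2.58; should be below .01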

Slide 16.41

Checking assumptions (Continued)

Correlation

Figure 16.23 Correlation

Needs to be generally between .30 and .80

We have achieved that pretty well

Linearity

As correlation is at least moderate, this is not a concern

But we will check linearity for job satisfaction

Just to see how it's done!



Slide 16.42

Linearity

Look at Chapter 16 for procedure

Relationship appears linear

Figure 16.25 Scatterplot: quality of life scores vs. job satisfaction (with line of best fit)

Slide 16.43

Multicollinearity

Collinearity diagnostics

Figure 16.26 Collinearity diagnostics

No predictor should be highly correlated with any dimension


Ideally, each predictor should be located on a different dimension
There may be a minor problem with Physical health: variance proportion = .92
Slightly above the ideal maximum
Depressed and Relationship satisfaction located on Dimension 6
Not ideal, but it is the only case
Overall, while not perfect, this is quite satisfactory

Slide 16.44

Linearity and multicollinearity

Collinearity statistics

Figure 16.27 Collinearity statistics

Tolerance values should not be too close to 0


Scores below .1 are of serious concern
Scores below .2 might cause some concern
No problems there
VIF figure should not be above 10
OK with that too
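As a reminder of how the two columns relate (continuing the earlier Python sketch), tolerance is simply the reciprocal of VIF:

tolerance = [1 / v for v in vif]   # below .2 may cause some concern; below .1 is serious
print(tolerance)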

Slide 16.45

Checking assumptions

Independent errors
There should be no correlation between the residuals
Durbin–Watson outcome tells us this

Figure 16.28 Correlation between residuals

This outcome measured on a scale of 0 to 4


2 = no correlation
Avoid < 1 and > 3
So 1.906 is fine

Slide 16.46

Checking model outcome

Explained variance

Figure 16.29 Explained variance

79.7% of variance in quality of life scores explained by variations in predictors (in this sample: R² = .797)

But we should also report adjusted R² (.786)

Adjusts variance for number of predictors and sample size

R is the multiple correlation in multiple linear regression
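A quick check of that adjustment using the slide's own figures (R² = .797, n = 98, k = 5 predictors):

r2, n, k = 0.797, 98, 5
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
print(round(adj_r2, 3))   # 0.786, matching the SPSS output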



Slide 16.47

Checking model outcome (Continued)

Significance of model

Figure 16.30 Significance of the model

We have a highly significant model

F (5, 92) = 72.310, p < .001

The regression model is significantly better at predicting the outcome than a random method


Slide 16.48

Checking model outcome (Continued)

Regression parameters and predictor contribution

Figure 16.31 Regression parameters and predictor contribution

Constant = 6.100

This would be β0 in the regression equation


Slide 16.49

Checking model outcome (Continued)

Regression parameters and predictor contribution

We also need to report the regression line gradients for each predictor

Those gradients that are significantly greater than 0 are shown in red font below

Depressed: B = -8.607, t = -2.490, p = .015

Relationship satisfaction: B = 0.520, t = 5.823, p < .001

Job satisfaction: B = 0.286, t = 1.279, p = .075

Income: B = 0.202, t = 2.490, p = .204

Physical health: B = 0.544, t = 5.165, p < .001

If a gradient is significantly greater than 0

The predictor significantly contributes to outcome variance


Slide 16.50

Interpreting the gradients

Only the significant predictors need interpretation

Depressed

Quality of life scores decrease (worsen) by 8.607 between categorisation of not depressed (SPSS value code 0) and depressed (1)

Relationship satisfaction

For every unit improvement in relationship satisfaction scores, quality of life scores increase (improve) by 0.520

Physical health

For every unit improvement in physical health scores, quality of life scores increase (improve) by 0.544

But see Chapter 16 for greater detail on all predictors
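Putting the reported coefficients back into the regression equation gives a predicted score for any case; a small sketch (the predictor values below are hypothetical, and the sign of the Depressed coefficient follows the interpretation above):

constant = 6.100
b = {"physical_health": 0.544, "income": 0.202, "job_satisfaction": 0.286,
     "relationship_satisfaction": 0.520, "depressed": -8.607}
case = {"physical_health": 50, "income": 30, "job_satisfaction": 60,
        "relationship_satisfaction": 55, "depressed": 1}
y_hat = constant + sum(b[k] * case[k] for k in b)
print(round(y_hat, 3))   # 76.513 for this hypothetical case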

Slide 16.51

Writing-up results

Produce your own table

Table 16.6 Multiple linear regression analysis of quality of life scores (n = 98)


Slide 16.52

Writing-up results (Continued)

Write something like this


A multiple linear regression was undertaken to examine variance in quality of life scores. Five predictors were loaded into the model using the Enter method. The model was able to explain 79.7% of the sample outcome variance (Adj. R² = .786), which was found to significantly predict outcome, F (5, 92) = 72.310, p < .001. Three of the predictor variables significantly contributed to the model. Being depressed was related to poorer quality of life (β = -8.607, t = -2.490, p = .015), while increased relationship satisfaction (β = 0.520, t = 5.823, p < .001) and better physical health (β = 0.544, t = 5.165, p < .001) were significantly associated with improved quality of life scores. Two other predictor variables, job satisfaction and income, did not significantly contribute to variance.


Slide 16.53

Stepwise regression

We will not run through that here

But do read Chapter 16 to see how it is done

Interpretation of output is very different

You may need to know this some time

So do learn it


Slide 16.54

Summary

We have been learning about linear regression

Explores how much variance in outcome variable can be explained


by a series of predictors

Simple linear regression: one predictor

Multiple linear regression: several predictors

Builds a model to best predict outcome

Outcome variable must be numerical

Predictor variable can be categorical or continuous

But categorical variable must be dichotomous


Slide 16.55

Summary (Continued)

Outcome is described in three stages

How much variance is explained?

R² for simple linear regression

Adjusted R² for multiple linear regression

Whether the model is significantly better at predicting outcome than random methods

Described via F ratio

Significance of gradient

For the single predictor in simple linear regression

If the gradient (assessed via a t-test) is significantly > 0, it significantly contributes to outcome

With multiple linear regression

Determines which gradients significantly contribute to outcome variance
