
Relationships Among Variables
Correlation and Regression

KNES 510
Research Methods in Kinesiology

Correlation
Correlation is a statistical technique used to
determine the relationship between two or more
variables
We use two different techniques to determine
score relationships:
1. graphing technique
2. mathematical technique called correlation

Graphs of the Relationship Between Variables

Types of Relationships
The scattergram can indicate a positive
relationship, a negative relationship, or a
zero relationship
What are the characteristics of positive,
negative, and zero relationships?
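To make the three patterns concrete, here is a minimal Python sketch using made-up scores (not from the slides). It classifies a relationship by the sign of the sum of cross-products of deviations, Σ(X - X̄)(Y - Ȳ): positive when high X scores pair with high Y scores, negative when high X scores pair with low Y scores, and zero when there is no linear trend. The function names are hypothetical, chosen only for illustration.

```python
def cross_product_sum(x, y):
    """Sum of (X - mean X)(Y - mean Y); its sign gives the direction of the relationship."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))


def classify(x, y, tol=1e-9):
    """Label a scattergram as 'positive', 'negative', or 'zero' (tol absorbs rounding error)."""
    s = cross_product_sum(x, y)
    if s > tol:
        return "positive"
    if s < -tol:
        return "negative"
    return "zero"


# Hypothetical scores for five subjects (not from the slides)
x = [1, 2, 3, 4, 5]
datasets = {
    "positive": [2, 4, 5, 4, 6],  # Y tends to rise as X rises
    "negative": [9, 7, 6, 4, 2],  # Y tends to fall as X rises
    "zero": [3, 1, 5, 1, 3],      # no linear trend at all
}

for label, y in datasets.items():
    print(label, "->", classify(x, y))
```

With real data a "zero" relationship rarely gives an exact zero; it shows up as a cross-product sum (and an r) close to 0.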

Mathematical Technique: The Correlation Coefficient (r)
The correlation coefficient, r,* represents
the relationship between the z-scores of
the subjects on two different variables
(usually designated X and Y)
This can be stated mathematically as the
mean of the z-score products for all
subjects
*A more complete name for this statistic is Pearson's product-moment correlation coefficient

Formula for the Correlation Coefficient
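The correlation coefficient can be calculated as follows:

$$r = \frac{\sum Z_X Z_Y}{N}$$

where Z_X and Z_Y are each subject's z-scores on X and Y, and N is the number of subjects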
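The formula can be checked numerically. The sketch below, with hypothetical data and function names, converts each variable to z-scores using the population standard deviation (dividing by N, to match the N in the denominator of the formula) and averages the products. If the z-scores were computed with N - 1 instead, the sum of products would be divided by N - 1 and r would come out the same.

```python
import math


def z_scores(values):
    """z-scores using the population SD (divide by N), matching the N in the formula."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]


def pearson_r(x, y):
    """Pearson r as the mean of the z-score products."""
    if len(x) != len(y):
        raise ValueError("X and Y must have one score per subject")
    zx = z_scores(x)
    zy = z_scores(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(x)


# Hypothetical scores for five subjects (not from the slides)
x = [2, 4, 6, 8, 10]
y = [1, 3, 7, 9, 10]
print(f"r = {pearson_r(x, y):.3f}")
```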

The values of the coefficient will always range from +1.00 to -1.00
A correlation coefficient near 0.00 indicates no linear relationship

SPSS Bivariate Correlation Output

Correlations
                          X       Y
X    Pearson Correlation  1       .947
     Sig. (2-tailed)              .053
     N                    4       4
Y    Pearson Correlation  .947    1
     Sig. (2-tailed)      .053
     N                    4       4
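SPSS's Sig. (2-tailed) value for a Pearson r comes from the usual t test of the null hypothesis that the population correlation is zero, t = r√(N - 2)/√(1 - r²) with N - 2 degrees of freedom. The sketch below assumes that test and, to stay within Python's standard library, integrates the t density numerically with Simpson's rule rather than calling a statistics package. Function names are illustrative.

```python
import math


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1 + t * t / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=10_000):
    """P(|T| >= |t|), using Simpson's rule on the density from 0 to |t|."""
    t = abs(t)
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # area under the density between 0 and |t|
    return 1 - 2 * area


def r_significance(r, n):
    """t statistic and two-tailed p for H0: population correlation = 0."""
    if abs(r) >= 1:
        raise ValueError("|r| must be below 1 for this test")
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1 - r * r)
    return t, two_tailed_p(t, df)


t, p = r_significance(0.947, 4)
print(f"t = {t:.3f}, p = {p:.3f}")
```

With r = .947 and N = 4 this gives p ≈ .053, matching the table: with only four subjects, even a correlation this large is not significant at the .05 level. The fat-free mass example (r = .981, N = 7) gives p well below .001, which SPSS displays as .000.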

Interpreting the Correlation Coefficient
Because the relationship between two
sets of data is seldom perfect, the
majority of correlation coefficients are
fractions (0.92, -0.80, and the like)
When interpreting correlation coefficients, it is sometimes difficult to determine what is high, low, and average

The Correlation Coefficient and Cause-and-Effect
There is a high correlation between a
person's shoe size and their math skills in
grades K through 6
Is this an example of cause-and-effect?
Can we predict math skill from shoe size in students in grades K through 6?


Coefficient of Determination
The coefficient of determination is the proportion of the variability in one measure that is explained by the other measure
The coefficient of determination is the square of the correlation coefficient (r²)
For example, if the correlation coefficient between two variables is r = 0.90, the coefficient of determination is (0.90)² = 0.81, so 81% of the variance in one variable is explained by the other
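"Explained" variability can be made concrete: for a least-squares regression line, r² equals 1 - SS_residual/SS_total, the proportion of the total variability in Y that the line accounts for. A sketch with hypothetical data (not from the slides):

```python
# Hypothetical scores (not from the slides)
x = [2, 4, 6, 8, 10]
y = [1, 3, 7, 9, 10]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n
sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
sxx = sum((a - mean_x) ** 2 for a in x)
syy = sum((b - mean_y) ** 2 for b in y)

r = sxy / (sxx * syy) ** 0.5

# Least-squares line Y' = intercept + slope * X
slope = sxy / sxx
intercept = mean_y - slope * mean_x
predicted = [intercept + slope * a for a in x]

ss_total = syy  # total variability in Y
ss_residual = sum((b - p) ** 2 for b, p in zip(y, predicted))  # variability left unexplained
explained = 1 - ss_residual / ss_total

print(f"r^2 = {r ** 2:.4f}, proportion explained = {explained:.4f}")
print(f"Slide example: (0.90)^2 = {0.90 ** 2:.2f}")
```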

Regression
When two variables are related (correlated), it is possible to predict a person's score on one variable (Y) by knowing their score on the second variable (X)


This scatterplot illustrates that there is a strong, positive relationship between fat-free body mass and daily energy expenditure
Correlations
                                                Fat-Free    Energy Expenditure
                                                Mass (kg)   (kcal)
Fat-Free Mass (kg)         Pearson Correlation  1           .981**
                           Sig. (2-tailed)                  .000
                           N                    7           7
Energy Expenditure (kcal)  Pearson Correlation  .981**      1
                           Sig. (2-tailed)      .000
                           N                    7           7

**. Correlation is significant at the 0.01 level (2-tailed).



Regression Line (Line of Best Fit)


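The regression line is the line that best describes the trend in the data
This line is as close as possible to the data points: it minimizes the sum of the squared vertical distances between the points and the line (the least-squares criterion)
The equation for this line is:
Y' = bX + C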

Fitting a Regression Line


Simple Prediction
Tests have been developed to predict VO2 max from the time it takes a person to run 1.5 miles
A person's VO2 max can thus be predicted from their 1.5-mile run time because a prediction or regression equation has been developed


The simple linear prediction or regression equation takes the following form:
Y' = a + bX
Y' = predicted value
a = intercept of the regression line (Y intercept)
b = slope of the regression line (change in Y' for each one-unit change in X)
X = score on the predictor variable
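The slope and intercept of the least-squares line (the intercept is written C on the earlier regression-line slide) can be computed from deviation sums: b = Σ(X - X̄)(Y - Ȳ) / Σ(X - X̄)² and a = Ȳ - bX̄. In this sketch the run-time and VO2 max numbers are invented for illustration; they are not a published prediction equation, and the function names are hypothetical.

```python
def fit_line(x, y):
    """Least-squares intercept a and slope b for Y' = a + bX."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx            # slope: change in Y' per one-unit change in X
    a = mean_y - b * mean_x  # intercept: value of Y' when X = 0
    return a, b


def predict(a, b, x):
    """Predicted score Y' for a predictor score X."""
    return a + b * x


# Invented data for illustration: 1.5-mile run time (min) and VO2 max (ml/kg/min)
run_time = [9.0, 10.5, 11.0, 12.5, 14.0]
vo2_max = [58.0, 52.0, 50.0, 45.0, 40.0]

a, b = fit_line(run_time, vo2_max)
print(f"Y' = {a:.2f} + ({b:.2f})X")
print(f"Predicted VO2 max for a 12-minute run: {predict(a, b, 12.0):.1f}")
```

The negative slope reflects that, in these made-up data, faster (shorter) run times go with higher VO2 max values.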

Determining Error in Prediction


Unless two variables are perfectly related (r = -1.00 or +1.00), there will always be error associated with a prediction equation
We find the standard deviation of this error, the standard error of prediction (s_yx), using the following formula:

$$s_{yx} = s_y\sqrt{1 - r^2}$$

where s_y is the standard deviation of the Y scores and r is the correlation between X and Y
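The formula for s_yx can be checked against the residuals directly. In this sketch (hypothetical data and names), the standard deviation of the residuals from the least-squares line, computed with N in the denominator, equals s_y√(1 - r²) when s_y also uses N. SPSS's "Std. Error of the Estimate" divides the residual sum of squares by N - 2, so it runs slightly larger than this value in small samples.

```python
import math


def mean(values):
    return sum(values) / len(values)


def pop_sd(values):
    """Standard deviation with N in the denominator."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))


def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def standard_error_of_prediction(x, y):
    """s_yx = s_y * sqrt(1 - r^2), with s_y computed using N."""
    return pop_sd(y) * math.sqrt(1 - pearson_r(x, y) ** 2)


# Hypothetical scores (not from the slides)
x = [2, 4, 6, 8, 10]
y = [1, 3, 7, 9, 10]

# Residuals (Y - Y') from the least-squares line
mx, my = mean(x), mean(y)
slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
intercept = my - slope * mx
residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]

from_formula = standard_error_of_prediction(x, y)
from_residuals = pop_sd(residuals)
print(f"formula: {from_formula:.4f}, SD of residuals: {from_residuals:.4f}")
```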


Prediction and Residuals


A predicted score Y' ± s_yx yields a range of scores within which a person's true score on the predicted variable is likely to lie
Because the standard error of prediction can be interpreted as the standard deviation of the residuals, what are the odds that a person's true score lies within Y' ± s_yx?
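Assuming the residuals are approximately normally distributed, the odds follow from the standard normal curve: about 68% of true scores fall within Y' ± 1 s_yx, and about 95% within Y' ± 1.96 s_yx. A short check using Python's `math.erf`; the helper name is hypothetical:

```python
import math


def prob_within(k):
    """P(-k < Z < k) for a standard normal Z, i.e. the coverage of Y' +/- k * s_yx."""
    return math.erf(k / math.sqrt(2))


for k in (1, 1.96, 2):
    print(f"Y' +/- {k} s_yx covers about {prob_within(k):.1%} of true scores")
```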


The standard error of prediction for percent body fat estimated using the skinfold method is 3.5%
If a person's percent body fat is estimated at 12%, between what two values does their true body fat lie (95% probability)?
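Under the same normality assumption, a 95% range is Y' ± 1.96 s_yx. For the skinfold example that is 12% ± 1.96 × 3.5% = 12% ± 6.86%, or roughly 5.1% to 18.9% body fat (5% to 19% if the rounder ±2 s_yx is used). A sketch with a hypothetical helper name:

```python
def prediction_interval(predicted, see, z=1.96):
    """Range predicted +/- z * SEE; z = 1.96 covers about 95% if residuals are normal."""
    half_width = z * see
    return predicted - half_width, predicted + half_width


low, high = prediction_interval(12.0, 3.5)
print(f"True body fat likely between {low:.2f}% and {high:.2f}%")
```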


Which of the following will more precisely predict job performance?

A: r = 0.168

B: r = 0.686


Sample SPSS Output


Here is the SPSS output for regressing
Work Simulation Job Performance
(Dependent Variable) against Supervisor
Ratings (Independent Variable)

Coefficients(a)
                     Unstandardized         Standardized
                     Coefficients           Coefficients
Model 1              B        Std. Error    Beta      t         Sig.
(Constant)           -1.156   .675                    -1.712    .089
Supervisor Ratings   .033     .016          .168      2.053     .042

a. Dependent Variable: Work Simulation Job Performance
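Each t value in the Coefficients table is the unstandardized coefficient divided by its standard error (t = B / Std. Error), and Sig. is the two-tailed p for that t. The sketch below recomputes t from the displayed values; small discrepancies are expected because SPSS rounds B and the standard error to three decimals in the display.

```python
# Displayed (rounded) values from the Coefficients table for Supervisor Ratings
rows = {
    "(Constant)": {"B": -1.156, "SE": 0.675, "t_spss": -1.712},
    "Supervisor Ratings": {"B": 0.033, "SE": 0.016, "t_spss": 2.053},
}

recomputed = {}
for name, row in rows.items():
    recomputed[name] = row["B"] / row["SE"]  # t = B / Std. Error
    print(f"{name}: B / SE = {recomputed[name]:.3f} (SPSS t = {row['t_spss']})")
```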


This information can be used to create a prediction (regression) equation for predicting the work performance of future applicants from supervisor ratings
Y' = -1.156 + 0.033X
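The equation can be applied directly. In this sketch the supervisor ratings are hypothetical inputs, since the slides do not give the rating scale; the coefficients are the unstandardized B values from the SPSS table, including the negative constant. The function name is illustrative.

```python
def predict_performance(supervisor_rating, a=-1.156, b=0.033):
    """Y' = a + bX using the unstandardized B values from the SPSS output."""
    return a + b * supervisor_rating


# Hypothetical supervisor ratings for three applicants
for rating in (40, 60, 80):
    print(rating, round(predict_performance(rating), 3))
```

Dropping the minus sign on the constant would shift every prediction up by 2 × 1.156 = 2.312 points.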


Work Simulation Job Performance may also be predicted from Arm Strength
Here is the SPSS output:

Coefficients(a)
                     Unstandardized         Standardized
                     Coefficients           Coefficients
Model 1              B        Std. Error    Beta      t         Sig.
(Constant)           -4.095   .392                    -10.454   .000
Arm Strength (lbs)   .055     .005          .686      11.353    .000

a. Dependent Variable: Work Simulation Job Performance


This information can be used to create a prediction (regression) equation for predicting the work performance of future applicants from arm strength
Y' = -4.095 + 0.055X


We now have two regression equations for predicting Work Simulation Job Performance
Which is the better equation for accurate prediction?
To determine this, we must examine the standard error of prediction for each equation

Standard error of prediction using Supervisor Ratings:

Model Summary
Model   R        R Square   Adjusted R Square   Std. Error of the Estimate
1       .168(a)  .028       .022                1.66078
a. Predictors: (Constant), Supervisor Ratings

Standard error of prediction using Arm Strength:

Model Summary
Model   R        R Square   Adjusted R Square   Std. Error of the Estimate
1       .686(a)  .471       .467                1.22582
a. Predictors: (Constant), Arm Strength (lbs)

Which is the better equation?
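A quick comparison of the two Model Summary tables: the sketch below squares each R to get the proportion of variance explained and converts each Std. Error of the Estimate into an approximate 95% band (±1.96 × SEE, assuming normally distributed residuals). The model with the smaller standard error of prediction gives the narrower band. In a one-predictor model R is the absolute value of r, which is also why the standardized Beta in each Coefficients table matches R here.

```python
# Values copied from the two Model Summary tables
models = {
    "Supervisor Ratings": {"R": 0.168, "SEE": 1.66078},
    "Arm Strength (lbs)": {"R": 0.686, "SEE": 1.22582},
}

for name, m in models.items():
    r_squared = m["R"] ** 2       # proportion of variance explained
    half_width = 1.96 * m["SEE"]  # approx. 95% band, assuming normal residuals
    print(f"{name}: R^2 = {r_squared:.3f}, Y' +/- {half_width:.2f}")

best = min(models, key=lambda name: models[name]["SEE"])
print("Smaller standard error of prediction:", best)
```

The arm-strength equation explains about 47% of the variance versus about 3% for supervisor ratings, and its approximate 95% band is about ±2.40 points rather than ±3.26.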



Multiple Prediction
A prediction formula using a single
measure X is usually not very accurate for
predicting a person's score on measure Y
Multiple correlation-regression
techniques allow us to predict score Y
using several X scores


The general form of a two-predictor multiple regression equation is:
$$Y' = a + b_1X_1 + b_2X_2$$
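With two predictors, the least-squares coefficients solve two normal equations in deviation sums. This sketch (hypothetical names and noise-free made-up data, so the known coefficients should be recovered) solves them in closed form; with more predictors one would normally use a matrix solver or a statistics package instead.

```python
def fit_two_predictors(x1, x2, y):
    """Least-squares a, b1, b2 for Y' = a + b1*X1 + b2*X2 (closed form for two predictors)."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((u - m1) ** 2 for u in x1)
    s22 = sum((v - m2) ** 2 for v in x2)
    s12 = sum((u - m1) * (v - m2) for u, v in zip(x1, x2))
    s1y = sum((u - m1) * (w - my) for u, w in zip(x1, y))
    s2y = sum((v - m2) * (w - my) for v, w in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    if det == 0:
        raise ValueError("predictors are perfectly collinear")
    # Cramer's rule on: s11*b1 + s12*b2 = s1y and s12*b1 + s22*b2 = s2y
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    a = my - b1 * m1 - b2 * m2
    return a, b1, b2


# Made-up predictor scores; Y is generated exactly from Y = 2 + 3*X1 - 1.5*X2
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [2 + 3 * u - 1.5 * v for u, v in zip(x1, x2)]

a, b1, b2 = fit_two_predictors(x1, x2, y)
print(f"Y' = {a:.3f} + {b1:.3f}X1 + ({b2:.3f})X2")
```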


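An example of multiple correlation-regression is the prediction of percent body fat from multiple skinfold measurements; the equation below predicts body density (DB), from which percent body fat is then estimated
$$D_B\,(\text{g/cc}) = 1.0994921 - 0.0009929\,(\Sigma\text{3SKF}) + 0.0000023\,(\Sigma\text{3SKF})^2 - 0.0001392\,(\text{age})$$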
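The density equation can be evaluated directly. This sketch assumes ΣSKF is the sum of the three skinfolds in millimetres and age is in years (the slide does not state units). Converting density to percent fat needs a separate equation; the Siri (1961) equation, %fat = 495/DB - 450, is used here as an assumption because the slide does not name one. The inputs and function names are hypothetical.

```python
def body_density(sum_3_skinfolds_mm, age_years):
    """Body density (g/cc) from the three-skinfold equation on this slide."""
    s = sum_3_skinfolds_mm
    return 1.0994921 - 0.0009929 * s + 0.0000023 * s ** 2 - 0.0001392 * age_years


def percent_fat_siri(density):
    """Assumption: Siri (1961) conversion, %fat = 495 / DB - 450 (not given on the slide)."""
    return 495 / density - 450


db = body_density(60, 25)  # hypothetical: 60 mm skinfold sum, 25 years old
bf = percent_fat_siri(db)
print(f"DB = {db:.5f} g/cc, estimated body fat = {bf:.1f}%")
```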


Next Class
Chapters 9 & 11
Mock Proposals in class!

