
Correlation Analysis

Session 10

Introduction

• Is there an association between two or more variables? If yes, what is the form and degree of that relationship?
• Is the relationship strong or significant enough to be useful to arrive at a desirable conclusion?
• Can the relationship be used for predictive purposes, that is, to predict the most likely value of a dependent variable corresponding to the given value of an independent variable or variables?

Definition

• Correlation exists between two variables when one of them is related to the other in some way.

Assumptions

1) The sample of paired data (x,y) is a random sample.
2) The pairs of (x,y) data have a bivariate normal distribution.

Methods of Correlation Analysis

In this chapter, the following methods of finding the correlation coefficient between two variables x and y are discussed:

• Scatter Diagram method
• Karl Pearson's Coefficient of Correlation method
• Spearman's Rank Correlation method
• Method of Least-squares

[Figure: how the strength of the association between two variables is represented by the coefficient of correlation. The scale runs from -1.00 (perfect negative correlation) through -0.50 and 0 (no correlation) to +0.50 and +1.00 (perfect positive correlation), with strong, moderate, and weak correlation marked on each side of zero.]

Definition

• A scatterplot (or scatter diagram) is a graph in which the paired (x,y) sample data are plotted with a horizontal x axis and a vertical y axis. Each individual (x,y) pair is plotted as a single point.

Scatter Diagram of Paired Data


Positive Linear Correlation

[Figure: scatter plots of y against x showing (a) positive, (b) strong positive, and (c) perfect positive correlation]

Negative Linear Correlation

[Figure: scatter plots of y against x showing (d) negative, (e) strong negative, and (f) perfect negative correlation]

No Linear Correlation

[Figure: scatter plots of y against x showing (g) no correlation and (h) nonlinear correlation]

Karl Pearson's Correlation Coefficient

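Definition: Karl Pearson [for classified data]

• Deviations, where A is the assumed mean chosen for the respective variable:

$$d_x = X_i - A, \qquad d_y = Y_i - A$$

• Correlation coefficient r:

$$r = \frac{\sum f d_x d_y - \dfrac{(\sum f d_x)(\sum f d_y)}{N}}{(SD_x)(SD_y)}$$

$$SD_x = \sqrt{\sum f d_x^2 - \frac{(\sum f d_x)^2}{N}}, \qquad SD_y = \sqrt{\sum f d_y^2 - \frac{(\sum f d_y)^2}{N}}$$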

Example 1

Find the coefficient of correlation between height (X) and weight (Y) from the following data. Also, obtain the two regression lines.

Height   Weight
61       62
65       55
68       70
62       60
60       53

Answer 1

r = 0.65

Regression line of X on Y: X - 63.2 = 0.32(Y - 60)
Regression line of Y on X: Y - 60 = 1.33(X - 63.2)
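The figures in Answer 1 can be re-checked with a short script. This is a minimal sketch, assuming the data pair up row by row as (61, 62), (65, 55), (68, 70), (62, 60), (60, 53); that pairing is inferred from the means 63.2 and 60 that appear in the answer. The variable names are illustrative.

```python
import math

# Paired data from Example 1 (pairing inferred from the stated means 63.2 and 60)
heights = [61, 65, 68, 62, 60]  # X
weights = [62, 55, 70, 60, 53]  # Y

n = len(heights)
mean_x = sum(heights) / n
mean_y = sum(weights) / n

# Sums of squares and cross-products of deviations from the means
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(heights, weights))
sxx = sum((x - mean_x) ** 2 for x in heights)
syy = sum((y - mean_y) ** 2 for y in weights)

r = sxy / math.sqrt(sxx * syy)
b_xy = sxy / syy  # slope of the regression line of X on Y
b_yx = sxy / sxx  # slope of the regression line of Y on X

print(round(r, 2), round(b_xy, 2), round(b_yx, 2))  # 0.65 0.32 1.33
```

Both regression lines pass through the point of means (63.2, 60), which is why those two numbers appear in each equation.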

Example 2

Given the two regression lines

• 4x - 5y + 33 = 0
• 20x - 9y - 107 = 0

and the variance of x being 9, calculate:

1) Mean x and Mean y
2) Correlation coefficient of x and y
3) SD of y

Answer 2

• Mean X = 13, Mean Y = 17
• r = 0.6
• Variance of y = 16 (so SD of y = 4)
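One way to arrive at these answers is sketched below. It assumes the lines are 4x - 5y + 33 = 0 and 20x - 9y - 107 = 0, reads the first as the regression of y on x and the second as the regression of x on y (the other assignment would give a product of slopes above 1), and uses the standard relations r² = b_yx · b_xy and b_yx = r · SD_y / SD_x. Variable names are illustrative.

```python
import math

# Regression lines written as a*x + b*y + c = 0
a1, b1, c1 = 4.0, -5.0, 33.0     # 4x - 5y + 33 = 0
a2, b2, c2 = 20.0, -9.0, -107.0  # 20x - 9y - 107 = 0
var_x = 9.0

# Both lines pass through (mean x, mean y): solve the 2x2 system by Cramer's rule
det = a1 * b2 - a2 * b1
mean_x = (b1 * c2 - b2 * c1) / det
mean_y = (a2 * c1 - a1 * c2) / det

# Slopes under the assumed reading of the two lines
b_yx = -a1 / b1  # line 1 as y on x: y = 0.8x + 6.6
b_xy = -b2 / a2  # line 2 as x on y: x = 0.45y + 5.35

r = math.sqrt(b_yx * b_xy)          # positive root, since both slopes are positive
sd_y = b_yx * math.sqrt(var_x) / r  # from b_yx = r * sd_y / sd_x

print(mean_x, mean_y, round(r, 2), round(sd_y, 2), round(sd_y ** 2, 2))
```

The script reproduces Mean X = 13, Mean Y = 17, r = 0.6, and a variance of y of 16.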

Rounding the Linear Correlation Coefficient r

• Round to three decimal places
• Use calculator or computer if possible

Properties of the Linear Correlation Coefficient r

1. -1 ≤ r ≤ 1
2. The value of r does not change if all values of either variable are converted to a different scale.
3. The value of r is not affected by the choice of x and y. Interchange x and y and the value of r will not change.
4. r measures the strength of a linear relationship.
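Properties 1 to 3 can be checked numerically. The sketch below uses a small made-up data set (hypothetical, not from the slides) and an illustrative helper `pearson_r`; the rescaling uses a positive factor, since multiplying one variable by a negative factor would flip the sign of r.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Illustrative data (hypothetical)
x = [1.0, 2.0, 4.0, 5.0, 7.0]
y = [2.0, 3.0, 3.5, 6.0, 8.0]

r = pearson_r(x, y)
r_rescaled = pearson_r([2.54 * v + 10 for v in x], y)  # new scale and origin for x
r_swapped = pearson_r(y, x)                            # x and y interchanged

print(round(r, 3), round(r_rescaled, 3), round(r_swapped, 3))
```

All three printed values agree, and each lies between -1 and 1.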

Interpreting the Linear Correlation Coefficient

• r = +1: perfect positive correlation
• r = -1: perfect negative correlation
• r = 0: uncorrelated
• Standard Error: $\text{S.E.} = (1 - r^2)/\sqrt{N}$, where N = number of pairs of observations
• Probable Error = 0.6745 × S.E.
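As a small illustration, the standard error and probable error can be computed as below. The sketch assumes the usual textbook form S.E. = (1 - r²)/√N; the function names and the sample values r = 0.6, N = 25 are hypothetical and are not taken from the slides.

```python
import math

def standard_error(r, n):
    """Standard error of r for n pairs, assuming S.E. = (1 - r^2) / sqrt(n)."""
    return (1 - r ** 2) / math.sqrt(n)

def probable_error(r, n):
    """Probable error of r: 0.6745 times the standard error."""
    return 0.6745 * standard_error(r, n)

# Hypothetical values for illustration
se = standard_error(0.6, 25)  # (1 - 0.36) / 5
pe = probable_error(0.6, 25)

print(round(se, 3), round(pe, 4))  # 0.128 0.0863
```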

Spearman's Rank Correlation Coefficient

This method is applied to measure the association between two variables when only ordinal (or rank) data are available. In other words, it is applied in situations in which a quantitative measure of certain qualitative factors such as judgment, brand personalities, TV programmes, leadership, colour, or taste cannot be fixed, but individual observations can be arranged in a definite order (also called rank).

The ranking is decided by using a set of ordinal rank numbers, with 1 for the individual observation ranked first, either in terms of quantity or quality, and n for the individual observation ranked last in a group of n pairs of observations.

Mathematically, Spearman's rank correlation coefficient is defined as:

$$R = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$

where
R = rank correlation coefficient
R1 = rank of observations with respect to the first variable
R2 = rank of observations with respect to the second variable
d = R1 - R2, difference in a pair of ranks
n = number of paired observations or individuals being ranked

The number 6 is placed in the formula as a scaling device; it ensures that the possible range of R is from -1 to 1. While using this method we may come across three types of cases.

Advantages

• This method is easy to understand and its application is simpler than Pearson's method.
• This method is useful for correlation analysis when variables are expressed in qualitative terms like beauty, intelligence, honesty, efficiency, and so on.
• This method is appropriate to measure the association between two variables if the data type is at least ordinal scaled (ranked).
• The sample data of values of two variables is converted into ranks, either in ascending or descending order, for calculating the degree of correlation between the two variables.

Disadvantages

• Values of both variables are assumed to be normally distributed and to describe a linear relationship rather than a nonlinear relationship.
• A large computational time is required when the number of pairs of values of the two variables exceeds 30.
• This method cannot be applied to measure the association between two variables given as grouped data.

Rank Order Correlation

Hits   Rank   HR   Rank    D    D²
1      10     3    8       2    4
2      9      4    7       2    4
3      8      5    6       2    4
4      7      1    10     -3    9
5      6      7    4       2    4
6      5      6    5       0    0
7      4      2    9      -5    25
8      3      10   1       2    4
9      2      9    2       0    0
10     1      8    3      -2    4

Rank Order Correlation, cont.

$$\rho = 1 - \frac{6\sum D^2}{N(N^2 - 1)}$$

With N = 10 and ΣD² = 58:

Rho = 1 - [6(58) / 10(10² - 1)]
Rho = 1 - [348 / 10(100 - 1)]
Rho = 1 - [348 / 990]
Rho = 1 - 0.352
Rho = 0.648
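The same calculation can be sketched in a few lines of code. The helper name `spearman_rho` is illustrative, and the shortcut formula it uses assumes there are no tied ranks, which holds for this example.

```python
def spearman_rho(ranks_1, ranks_2):
    """Spearman's rho from two rank lists, assuming no tied ranks."""
    n = len(ranks_1)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_1, ranks_2))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Ranks from the Hits / HR table
hits_rank = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
hr_rank = [8, 7, 6, 10, 4, 5, 9, 1, 2, 3]

sum_d2 = sum((a - b) ** 2 for a, b in zip(hits_rank, hr_rank))
rho = spearman_rho(hits_rank, hr_rank)

print(sum_d2, round(rho, 3))  # 58 0.648
```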

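Pearson's r

Hits (x)   HR (y)   xy
1          3        3
2          4        8
3          5        15
4          1        4
5          7        35
6          6        36
7          2        14
8          10       80
9          9        81
10         8        80

$$r = \frac{\sum xy/n - (\sum x/n)(\sum y/n)}{(SD_x)(SD_y)}$$

Σx/n = 5.5, Σy/n = 5.5
Σxy = 356, so Σxy/n = 35.6
SDx = SDy = √8.25 ≈ 2.87 (dividing by n, to match the Σxy/n form of the formula)

r = [35.6 - (5.5)(5.5)] / [(2.87)(2.87)]
r = (35.6 - 30.25) / 8.25
r = 5.35 / 8.25
r = 0.648

Because Hits and HR are both untied rankings of the values 1 to 10, Pearson's r equals the rank-order Rho of 0.648.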
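The arithmetic for this table can be re-checked with the sketch below. It assumes population standard deviations (dividing by n), so that the denominator is consistent with the Σxy/n form of the numerator; variable names are illustrative.

```python
import math

hits = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]  # x
hr = [3, 4, 5, 1, 7, 6, 2, 10, 9, 8]    # y
n = len(hits)

mean_x = sum(hits) / n
mean_y = sum(hr) / n
mean_xy = sum(x * y for x, y in zip(hits, hr)) / n

# Population standard deviations (divide by n)
sd_x = math.sqrt(sum(x * x for x in hits) / n - mean_x ** 2)
sd_y = math.sqrt(sum(y * y for y in hr) / n - mean_y ** 2)

r = (mean_xy - mean_x * mean_y) / (sd_x * sd_y)

print(round(mean_xy, 2), round(sd_x, 2), round(r, 3))  # 35.6 2.87 0.648
```

Recomputing from the table values gives Σxy/n = 35.6 and r ≈ 0.648. Mixing a sample standard deviation (dividing by n - 1) with a covariance divided by n would understate r.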

Example 3

Compute the correlation coefficient:

Age of      Age of husbands
wives       15-25  25-35  35-45  45-55  55-65  65-75  Total
15-25         1      1      -      -      -      -      2
25-35         2     12      1      -      -      -     15
35-45         -      4     10      1      -      -     15
45-55         -      -      3      6      1      -     10
55-65         -      -      -      2      4      2      8
65-75         -      -      -      -      1      2      3
Total         3     17     14      9      6      4     53

Sol. 3

Take X as the column variable and Y as the row variable, with step deviations dx and dy measured from the 35-45 class in units of the class width (10).

Y        dy |  X: 15-25  25-35  35-45  45-55  55-65  65-75 |   f   fdy   fdy²  fdxdy
            | dx:   -2     -1      0     +1     +2     +3  |
15-25    -2 |        1      1      -      -      -      -  |   2    -4     8      6
25-35    -1 |        2     12      1      -      -      -  |  15   -15    15     16
35-45     0 |        -      4     10      1      -      -  |  15     0     0      0
45-55    +1 |        -      -      3      6      1      -  |  10   +10    10      8
55-65    +2 |        -      -      -      2      4      2  |   8   +16    32     32
65-75    +3 |        -      -      -      -      1      2  |   3    +9    27     24
f           |        3     17     14      9      6      4  |  53   +16    92     86
fdx         |       -6    -17      0      9     12     12  | +10
fdx²        |       12     17      0      9     24     36  |  98
fdxdy       |        8     14      0     10     24     30  |  86

$$r = \frac{86 - \dfrac{(10)(16)}{53}}{\sqrt{98 - \dfrac{(10)^2}{53}}\;\sqrt{92 - \dfrac{(16)^2}{53}}} = 0.907$$
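The step-deviation working for the bivariate frequency table can be sketched as follows. The cell frequencies are a reconstruction from the row and column totals on the slide (rows taken as Y classes, columns as X classes), and the step deviations assume an origin at the 35-45 class with class width 10. Variable names are illustrative.

```python
import math

# Step deviations for the six age classes 15-25 ... 65-75 (origin at 35-45)
d = [-2, -1, 0, 1, 2, 3]

# Cell frequencies: rows follow the Y classes, columns follow the X classes
freq = [
    [1, 1, 0, 0, 0, 0],
    [2, 12, 1, 0, 0, 0],
    [0, 4, 10, 1, 0, 0],
    [0, 0, 3, 6, 1, 0],
    [0, 0, 0, 2, 4, 2],
    [0, 0, 0, 0, 1, 2],
]

N = sum(sum(row) for row in freq)
row_tot = [sum(row) for row in freq]        # f for each Y class
col_tot = [sum(col) for col in zip(*freq)]  # f for each X class

s_fdx = sum(f * dx for f, dx in zip(col_tot, d))
s_fdy = sum(f * dy for f, dy in zip(row_tot, d))
s_fdx2 = sum(f * dx * dx for f, dx in zip(col_tot, d))
s_fdy2 = sum(f * dy * dy for f, dy in zip(row_tot, d))
s_fdxdy = sum(freq[i][j] * d[i] * d[j] for i in range(6) for j in range(6))

num = s_fdxdy - s_fdx * s_fdy / N
den = math.sqrt(s_fdx2 - s_fdx ** 2 / N) * math.sqrt(s_fdy2 - s_fdy ** 2 / N)
r = num / den

print(N, s_fdx, s_fdy, s_fdx2, s_fdy2, s_fdxdy, round(r, 3))
# 53 10 16 98 92 86 0.907
```

Because r does not change when x and y are interchanged, the result is the same whichever of the two ages is placed on the rows.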

Is there a significant linear correlation?


Data from the Garbage Project

x Plastic (lb)   0.27   1.41   2.19   2.83   2.19   1.81   0.85   3.05
y Household      2      3      3      6      4      2      1      5


r = 0.842, r² = 0.71
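These two values can be reproduced from the eight data pairs. The sketch below also computes the statistic t = r√((n - 2)/(1 - r²)), which is one common way to judge significance; that test is an assumption here, since the slide does not show which method was used. For 6 degrees of freedom the two-tailed 5% critical value of t is about 2.447.

```python
import math

plastic = [0.27, 1.41, 2.19, 2.83, 2.19, 1.81, 0.85, 3.05]  # x, in lb
household = [2, 3, 3, 6, 4, 2, 1, 5]                        # y

n = len(plastic)
mean_x = sum(plastic) / n
mean_y = sum(household) / n

sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(plastic, household))
sxx = sum((x - mean_x) ** 2 for x in plastic)
syy = sum((y - mean_y) ** 2 for y in household)

r = sxy / math.sqrt(sxx * syy)
r_squared = r ** 2

# Assumed significance check: t statistic with n - 2 degrees of freedom
t_stat = r * math.sqrt((n - 2) / (1 - r_squared))

print(round(r, 3), round(r_squared, 2))  # 0.842 0.71
```

Here t comes out at roughly 3.8, above 2.447, so under this assumed test the linear correlation would be judged significant at the 5% level.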

Correlation Analysis Vs. Regression Analysis

Correlation means the relationship between two or more variables; correlation analysis measures the direction and degree of the linear relationship. Regression analysis aims at establishing the functional relationship between the variables.

Correlation does not imply causation

"Correlation does not imply causation" is a phrase used in the sciences and statistics to emphasize that correlation between two variables does not imply there is a cause-and-effect relationship between the two. The opposite claim, "correlation proves causation", is a logical fallacy by which two events that occur together are claimed to have a cause-and-effect relationship. For example:

• A occurs in correlation with B.
• Therefore, A causes B.

This is a logical fallacy because there are at least four other possibilities:

1. B may be the cause of A, or
2. some unknown third factor is actually the cause of the relationship between A and B, or
3. the "relationship" is so complex it can be labeled coincidental (i.e., two events occurring at the same time that have no simple relationship to each other besides the fact that they are occurring at the same time).
4. B may be the cause of A at the same time as A is the cause of B (contradicting that the only relationship between A and B is that A causes B). This describes a self-reinforcing system.
