
Unit 2 Lecture 6 Correlation

Meaning
In the various types of analysis discussed so far, we have confined ourselves to series in which the items take different values of a single variable. Such series are analysed and compared with the help of measures of central tendency, dispersion and skewness. However, there can also be series in which each item takes the values of two or more variables. For example, if the heights and weights of a group of persons are measured, each member of the group has two values, one relating to height and the other to weight. If chest measurements are taken as well, each member has three values relating to three different variables. In such cases we can still calculate averages, dispersion, skewness and so on, but it often appears that the values of the variables so obtained are inter-related. The term correlation (or covariance) indicates the relationship between two such variables, in which a change in the value of one variable is accompanied by a change in the value of the other.

Definition
"Correlation analysis attempts to determine the degree of relationship between variables." "Correlation means that between two series or groups of data there exists some causal connection." This makes it clear that the term correlation refers to the study of the relationship between two or more variables.

Utility
The utility of the study of correlation is immense in the physical as well as the social sciences.
(i) It reduces the range of uncertainty associated with decision making.
(ii) It helps in understanding economic behaviour.
(iii) It helps in locating the variables on which other variables depend.
(iv) It helps in identifying the factors which can stabilize a disturbed economic situation.
(v) It also helps us to estimate the likely change in a variable for a given change in a related variable.

Is Correlation Cause and Effect Relationship?


Though the term correlation is used in the sense of mutual dependence of two or more variables, this need not always be so. Even a high degree of correlation between two variables does not necessarily indicate a cause-and-effect relationship between them. A correlation between two variables may arise for several other reasons, viz.:
(i) Due to a random or chance factor.
(ii) On account of spurious correlation between the two variables.
(iii) The related variables might be mutually affecting each other, so that neither of them can be designated as the cause or the effect.
(iv) The correlated variables could both be affected by a third variable or by more than one variable.

Types of Correlation
Correlation may be:
(a) Positive or Negative
(b) Simple, Multiple or Partial
(c) Linear or Non-linear

(i) Positive and Negative Correlation


Correlation can be either positive or negative. When the values of two variables move in the same direction, i.e. when an increase in the value of one variable is associated with an increase in the value of the other, and a decrease in the value of one is associated with a decrease in the value of the other, correlation is said to be 'positive'.

If, on the other hand, the values of two variables move in opposite directions, so that with an increase in the value of one variable the value of the other decreases, and with a decrease in the value of one the value of the other increases, correlation is said to be 'negative'.

(ii) Simple, Multiple and Partial Correlation

In simple correlation we study only two variables, say price and demand. In multiple correlation we study together the relationship between three or more factors, like productivity, rainfall and the use of fertilizers. In partial correlation, though more than two factors are involved, correlation is studied only between two of them and the other factors are assumed to be constant.

(iii) Linear and Non-linear Correlation

The correlation between two variables is said to be linear if, corresponding to a unit change in the value of one variable, there is a constant change in the value of the other variable, i.e. the relationship is of the form

$$ y = a + bx $$

If a = 0, the relation becomes y = bx. In such a case the values of the variables are in a constant ratio. The correlation between two variables is said to be non-linear or curvilinear if, corresponding to a unit change in the value of one variable, the other variable does not change at a constant rate but at a 'fluctuating' rate.
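The distinction can be checked numerically. The sketch below lists the change in y for each unit step in x. The coefficients a = 2, b = 3 and the curve y = x² are hypothetical choices made only for illustration.

```python
def unit_changes(f, xs):
    """Change in y = f(x) for each step between consecutive x values."""
    return [f(b) - f(a) for a, b in zip(xs, xs[1:])]

xs = [1, 2, 3, 4, 5]
linear = unit_changes(lambda x: 2 + 3 * x, xs)  # y = a + bx with a = 2, b = 3
curved = unit_changes(lambda x: x ** 2, xs)     # a curvilinear relation
print(linear)  # [3, 3, 3, 3]  constant change: linear
print(curved)  # [3, 5, 7, 9]  changing rate: non-linear
```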

Methods of Studying Correlation

The various methods by which correlation studies are made are:
(i) Scatter Diagram
(ii) Correlation Graph
(iii) Coefficient of Correlation
(iv) Coefficient of Correlation by Rank Difference
(v) Coefficient of Concurrent Deviation
(vi) Method of Least Squares

Scatter Diagram
Diagrams and graphs can be drawn to get an idea of the relationship between two or more variables. Suppose that in a set of data each item takes one value of each of the two variables height and weight, with height represented by 'x' and weight by 'y'.

Further, if the x variable is plotted on the horizontal scale and the y variable on the vertical scale, only one point is plotted for each pair of values. Its position on the horizontal scale gives the height and its position on the vertical scale gives the weight. In this way the whole data of, say, 100 people can be plotted. If these points show some trend, either upward or downward, the two variables are said to be correlated. However, if the plotted points do not show any trend, the two variables are not correlated. If the trend of the points is upward, rising from the bottom left towards the top right, correlation is positive. If, on the other hand, the tendency is reversed, so that the points show a downward trend from the top left to the bottom right, the correlation is negative.

Fig.: Positive Correlation

Fig.: Negative Correlation

Fig.: No Correlation

Merits
(i) It is very easy to draw a scatter diagram.
(ii) It can be easily understood and interpreted.
(iii) Values of extreme items do not affect this method.

Demerits
(i) It gives only a visual picture of the relationship between the two variables.

(ii) It does not give an idea of the precise degree of relationship and is not amenable to mathematical treatment.

Correlation Graph
Graphs can also be used to study the correlation between two series.

Graphs disclose whether there is any relationship between the two variables and, if there is, whether it is positive or negative. If, in the graph, the two curves representing the two variables show a similar tendency, it is an indication of positive correlation. If, on the other hand, the two curves move in different directions, the correlation is negative.

Fig.: Correlation graph of supply and price (in Rs) for the years 70 to 79

Coefficient of Correlation
Purpose of Calculation
The coefficient of correlation is calculated to study the extent or degree of correlation between two variables. The presence of correlation between two variables does not mean that their relationship is functional or constant: if the value of one variable is known, it is not always possible to obtain the exact value of the other. This can be done only where there is a perfect linear relationship between the two variables.

Perfect Correlation

If the relationship between two variables is such that, with an increase in the value of one, the value of the other increases or decreases in a fixed proportion, the correlation between them is said to be perfect. If the two series move in the same direction and the variations in their values are always proportionate, the correlation is perfectly positive. If, however, they move in reverse directions and the variations in their values are always proportionate, the two series are said to be perfectly negatively correlated. It is also possible that there is no relationship between the variations of the two series, in which case there is said to be no correlation between them.

Fig.: Perfect Positive Correlation

Fig.: Perfect Negative Correlation

The value of the coefficient of correlation always lies between the limits -1 and +1. When there is perfect positive correlation its value is +1, and when there is perfect negative correlation its value is -1.

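Calculation of Coefficient of Correlation

Karl Pearson has given a formula for calculating the coefficient of correlation. According to him:

$$ r = \frac{\sum xy}{N\,\sigma_x\,\sigma_y} \qquad \text{(i)} $$

where x and y are the deviations of the items from their respective arithmetic means, σx and σy are the standard deviations of the two series, and N is the number of pairs of observations.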

It is clear that if $\sum xy$ is positive, the coefficient of correlation would also be positive. If, on the other hand, $\sum xy$ is negative, the coefficient of correlation would also be negative.

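Direct Method
In this method x and y denote the actual values of the two variables:

$$ r = \frac{N\sum xy - (\sum x)(\sum y)}{\sqrt{N\sum x^2 - (\sum x)^2}\,\sqrt{N\sum y^2 - (\sum y)^2}} \qquad \text{(1)} $$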
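Formula (i) and the direct formula (1) give the same value of r. The Python sketch below implements both and applies them to the data of the example that follows. The function names are illustrative, not standard, and standard deviations are taken with divisor N, as in formula (i).

```python
from math import sqrt

def pearson_from_deviations(xs, ys):
    """Formula (i): r = sum(x*y) / (N * sigma_x * sigma_y), where x and y
    are deviations from the respective arithmetic means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dev_x = [v - mean_x for v in xs]
    dev_y = [v - mean_y for v in ys]
    sigma_x = sqrt(sum(d * d for d in dev_x) / n)  # standard deviation, divisor N
    sigma_y = sqrt(sum(d * d for d in dev_y) / n)
    return sum(a * b for a, b in zip(dev_x, dev_y)) / (n * sigma_x * sigma_y)

def pearson_direct(xs, ys):
    """Formula (1): works on the actual values, so no deviations are needed."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(v * v for v in xs)
    sum_y2 = sum(v * v for v in ys)
    sum_xy = sum(a * b for a, b in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    term_x = n * sum_x2 - sum_x ** 2
    term_y = n * sum_y2 - sum_y ** 2
    return numerator, term_x, term_y, numerator / (sqrt(term_x) * sqrt(term_y))

# Data of the example below
x = [65, 66, 67, 67, 68, 69, 70, 72]
y = [67, 68, 65, 68, 72, 72, 69, 71]
numerator, term_x, term_y, r_direct = pearson_direct(x, y)
r_dev = pearson_from_deviations(x, y)
print(numerator, term_x, term_y)            # 192 288 352
print(round(r_direct, 2), round(r_dev, 2))  # 0.6 0.6
```

The direct method avoids fractional deviations. Formula (i) needs the means first, which is why it becomes tedious when the means are not whole numbers.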

Example: Calculate the coefficient of correlation between the values of x and y from the data given below.
Values of x: 65, 66, 67, 67, 68, 69, 70, 72
Values of y: 67, 68, 65, 68, 72, 72, 69, 71

Solution: Calculation of Coefficient of Correlation

x | y | x² | y² | xy
65 | 67 | 4225 | 4489 | 4355
66 | 68 | 4356 | 4624 | 4488
67 | 65 | 4489 | 4225 | 4355
67 | 68 | 4489 | 4624 | 4556
68 | 72 | 4624 | 5184 | 4896
69 | 72 | 4761 | 5184 | 4968
70 | 69 | 4900 | 4761 | 4830
72 | 71 | 5184 | 5041 | 5112
Σx = 544 | Σy = 552 | Σx² = 37028 | Σy² = 38132 | Σxy = 37560

$$ r = \frac{N\sum xy - (\sum x)(\sum y)}{\sqrt{N\sum x^2 - (\sum x)^2}\,\sqrt{N\sum y^2 - (\sum y)^2}} = \frac{8 \times 37560 - 544 \times 552}{\sqrt{8 \times 37028 - (544)^2}\,\sqrt{8 \times 38132 - (552)^2}} $$

$$ = \frac{192}{\sqrt{288}\,\sqrt{352}} = \frac{192}{16.97 \times 18.76} = \frac{192}{318.4} = 0.60 $$

Short-cut Method
When the coefficient of correlation is calculated from deviations, the actual means of the two series have to be calculated first. The deviations of the items of each series from their respective means are then found. If the means are fractions such as 180.09 or 0.284, calculating the deviations and then the values of x², y² and xy becomes very tedious. In such situations it is advisable to assume a mean and modify the formula for the calculation of the coefficient of correlation.

The modified formula for the calculation of the coefficient of correlation, when the deviations of x and y are taken from assumed mean values, is:

$$ r = \frac{N\sum dx\,dy - (\sum dx)(\sum dy)}{\sqrt{N\sum dx^2 - (\sum dx)^2}\,\sqrt{N\sum dy^2 - (\sum dy)^2}} $$

where dx = x - A and dy = y - B, A and B being the assumed means of the x-series and the y-series respectively. Dividing the numerator and the denominator by N, this formula can also be written as:

$$ r = \frac{\sum dx\,dy - \frac{(\sum dx)(\sum dy)}{N}}{\sqrt{\sum dx^2 - \frac{(\sum dx)^2}{N}}\,\sqrt{\sum dy^2 - \frac{(\sum dy)^2}{N}}} $$

If the deviations of x and y are taken from their respective arithmetic means, then Σdx = 0 and Σdy = 0, and the formula reduces to:

$$ r = \frac{\sum dx\,dy}{\sqrt{\sum dx^2 \sum dy^2}} $$

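Example: Calculate the coefficient of correlation between the values of x and y from the following data.
Values of x: 15, 16, 17, 18, 19, 20
Values of y: 80, 75, 60, 40, 30, 20

Solution: Take 17 as the assumed mean of x and 40 as the assumed mean of y, and divide the deviations of y by 5 (a change of scale does not affect the value of r). Here N = 6.

x | dx = x - 17 | dx² | y | dy = (y - 40)/5 | dy² | dx·dy
15 | -2 | 4 | 80 | 8 | 64 | -16
16 | -1 | 1 | 75 | 7 | 49 | -7
17 | 0 | 0 | 60 | 4 | 16 | 0
18 | +1 | 1 | 40 | 0 | 0 | 0
19 | +2 | 4 | 30 | -2 | 4 | -4
20 | +3 | 9 | 20 | -4 | 16 | -12
Total | Σdx = +3 | Σdx² = 19 | | Σdy = +13 | Σdy² = 149 | Σdxdy = -39

$$ r = \frac{\sum dx\,dy - \frac{(\sum dx)(\sum dy)}{N}}{\sqrt{\sum dx^2 - \frac{(\sum dx)^2}{N}}\,\sqrt{\sum dy^2 - \frac{(\sum dy)^2}{N}}} = \frac{-39 - \frac{(3)(13)}{6}}{\sqrt{19 - \frac{(3)^2}{6}}\,\sqrt{149 - \frac{(13)^2}{6}}} $$

$$ = \frac{-45.5}{\sqrt{17.5}\,\sqrt{120.83}} = \frac{-45.5}{4.18 \times 10.99} = \frac{-45.5}{45.94} = -0.99 $$

There is a very high degree of negative correlation: as x increases, y decreases.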
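A minimal Python sketch of the short-cut method is shown below. The function name and the `scale_y` parameter are illustrative assumptions. Applied to the worked example with A = 17, B = 40 and the y-deviations divided by 5, it reproduces r ≈ -0.99. Taking the deviations from any other origin, e.g. A = B = 0, gives the same result, because r is unaffected by a change of origin or scale.

```python
from math import sqrt

def pearson_assumed_mean(xs, ys, a, b, scale_y=1):
    """Short-cut method: dx = x - A and dy = (y - B) / scale_y, taken from
    assumed means A and B. A change of origin or scale leaves r unchanged."""
    n = len(xs)
    dx = [x - a for x in xs]
    dy = [(y - b) / scale_y for y in ys]
    sum_dx, sum_dy = sum(dx), sum(dy)
    sum_dx2 = sum(d * d for d in dx)
    sum_dy2 = sum(d * d for d in dy)
    sum_dxdy = sum(p * q for p, q in zip(dx, dy))
    numerator = sum_dxdy - sum_dx * sum_dy / n
    denominator = sqrt(sum_dx2 - sum_dx ** 2 / n) * sqrt(sum_dy2 - sum_dy ** 2 / n)
    return numerator / denominator

x = [15, 16, 17, 18, 19, 20]
y = [80, 75, 60, 40, 30, 20]
r_table = pearson_assumed_mean(x, y, a=17, b=40, scale_y=5)  # as in the worked example
r_other = pearson_assumed_mean(x, y, a=0, b=0)               # any other origin
print(round(r_table, 2), round(r_other, 2))  # -0.99 -0.99
```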

Probable Error of the Coefficient of Correlation and its Interpretation

After calculating the coefficient of correlation, the next step is to find out the extent to which it is dependable. For this purpose the probable error of the coefficient of correlation is calculated. If the probable error is added to and subtracted from the coefficient of correlation, it gives two limits within which the coefficient of correlation may be expected to lie. That is, if another set of samples were selected at random from the same universe, the coefficient of correlation between the two variables in the new sample would be expected to fall within these limits with a probability of about one half. (The factor 0.6745 is the normal deviate that encloses the middle 50 per cent of a normal distribution.) The formula for the probable error is:

$$ \text{P.E.} = 0.6745\,\frac{1 - r^2}{\sqrt{n}} $$

where r stands for the coefficient of correlation and n for the number of pairs of observations.
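The probable error and the resulting limits r ± P.E. can be computed as in the sketch below. Purely for illustration, it is applied to r = 0.60 with n = 8 pairs from the direct-method example. The function name is hypothetical.

```python
from math import sqrt

def probable_error(r, n):
    """P.E. = 0.6745 * (1 - r**2) / sqrt(n) for n pairs of observations."""
    return 0.6745 * (1 - r ** 2) / sqrt(n)

r, n = 0.60, 8  # illustrative values taken from the direct-method example
pe = probable_error(r, n)
lower, upper = r - pe, r + pe
print(f"P.E. = {pe:.4f}; limits {lower:.4f} to {upper:.4f}")
```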

Merits and Limitations of Coefficient of Correlation

The chief merit of this coefficient is that it not only gives an idea of the degree of covariance between the two series but also indicates the direction of the relationship.

Limitations
(i) It assumes a linear relationship between the variables even though such a relationship may not exist.
(ii) It is liable to be misinterpreted, as a high degree of correlation does not necessarily mean a very close relationship between the variables.
(iii) It is tedious to calculate.
(iv) It is unduly affected by the values of extreme items.

Rank Correlation Coefficient

If it is desired to study the association between two attributes which are incapable of measurement, such as honesty, intelligence or morality, Karl Pearson's coefficient of correlation cannot be calculated, as these attributes cannot be assigned definite values. In such situations the correlation between the attributes can be studied by ranking the individuals on each attribute and applying Spearman's Rank Correlation formula:

$$ r = 1 - \frac{6\sum d^2}{n^3 - n} \qquad \text{or} \qquad r = 1 - \frac{6\sum d^2}{n(n^2 - 1)} $$

where r denotes Spearman's rank correlation coefficient, d denotes the difference between the ranks of the same individual in the two attributes, and n denotes the number of pairs.

Example: Ten students were given tests in English and Mathematics. Their marks are given below:
Student No. | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
Marks in English | 78 | 40 | 50 | 55 | 52 | 49 | 60 | 54 | 59 | 58
Marks in Mathematics | 70 | 60 | 60 | 75 | 69 | 55 | 70 | 65 | 65 | 61

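Determine Spearman's Rank Correlation Coefficient.

Solution (the highest marks get rank 1):

Serial No. | Marks in English | Rank (R1) | Marks in Mathematics | Rank (R2) | d = R1 - R2 | d²
1 | 78 | 1 | 70 | 2.5 | -1.5 | 2.25
2 | 40 | 10 | 60 | 8.5 | 1.5 | 2.25
3 | 50 | 8 | 60 | 8.5 | -0.5 | 0.25
4 | 55 | 5 | 75 | 1 | 4 | 16.00
5 | 52 | 7 | 69 | 4 | 3 | 9.00
6 | 49 | 9 | 55 | 10 | -1 | 1.00
7 | 60 | 2 | 70 | 2.5 | -0.5 | 0.25
8 | 54 | 6 | 65 | 5.5 | 0.5 | 0.25
9 | 59 | 3 | 65 | 5.5 | -2.5 | 6.25
10 | 58 | 4 | 61 | 7 | -3 | 9.00
Σd² = 46.5

$$ r = 1 - \frac{6\sum d^2}{n^3 - n} = 1 - \frac{6 \times 46.5}{10^3 - 10} = 1 - \frac{279}{990} = \frac{711}{990} = 0.72 $$

Three pairs of students have equal marks in Mathematics (70, 65 and 60), each group containing m = 2 tied items. Adjusted for equal ranks, the coefficient is:

$$ R = 1 - \frac{6\left[\sum d^2 + \frac{1}{12}(m^3 - m) + \frac{1}{12}(m^3 - m) + \frac{1}{12}(m^3 - m)\right]}{N^3 - N} = 1 - \frac{6\left[46.5 + \frac{1}{12}(2^3 - 2) + \frac{1}{12}(2^3 - 2) + \frac{1}{12}(2^3 - 2)\right]}{10^3 - 10} $$

$$ = 1 - \frac{6(46.5 + 0.5 + 0.5 + 0.5)}{990} = 1 - \frac{288}{990} = 1 - 0.291 = 0.71 $$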
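The calculation above can be reproduced with a short Python sketch that starts from the ranks in the solution table. The function name `spearman` and its `tie_groups` parameter are illustrative. Each entry of `tie_groups` is the size m of one group of equal ranks and adds (m³ - m)/12 to Σd².

```python
def spearman(rank_1, rank_2, tie_groups=()):
    """1 - 6*(sum(d^2) + correction)/(n^3 - n); with no ties this is the plain formula."""
    n = len(rank_1)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_1, rank_2))
    correction = sum((m ** 3 - m) / 12 for m in tie_groups)
    return sum_d2, 1 - 6 * (sum_d2 + correction) / (n ** 3 - n)

english = [1, 10, 8, 5, 7, 9, 2, 6, 3, 4]          # ranks from the solution table
maths = [2.5, 8.5, 8.5, 1, 4, 10, 2.5, 5.5, 5.5, 7]

sum_d2, r_plain = spearman(english, maths)
_, r_adjusted = spearman(english, maths, tie_groups=[2, 2, 2])  # ties at 70, 65, 60
print(sum_d2, round(r_plain, 2), round(r_adjusted, 2))  # 46.5 0.72 0.71
```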


Rank Correlation
Equal Ranks
In some cases it may be necessary to rank two or more individuals or entries as equal. In such cases it is customary to give each of them the average of the ranks they would otherwise occupy. Thus, if two individuals are ranked equal at fifth place, each is given the rank (5 + 6)/2 = 5.5, while if three are ranked equal at fifth place, each is given the rank (5 + 6 + 7)/3 = 6.

Further, where equal ranks are assigned to some entries, an adjustment is made in the above formula for the rank coefficient of correlation. The adjustment consists of adding

$$ \frac{1}{12}\left(m^3 - m\right) $$

to the value of Σd², where m stands for the number of items whose ranks are common. If there is more than one such group of items with common ranks, this value is added once for each group. The formula can thus be written as:

$$ R = 1 - \frac{6\left[\sum d^2 + \frac{1}{12}(m_1^3 - m_1) + \frac{1}{12}(m_2^3 - m_2) + \cdots\right]}{N^3 - N} $$

Example

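Obtain the rank correlation coefficient between the variables X and Y from the following pairs of observed values:
X | 50 | 55 | 65 | 50 | 55 | 60 | 50 | 65 | 70 | 75
Y | 110 | 110 | 115 | 125 | 140 | 115 | 130 | 120 | 115 | 160

Solution (the smallest value gets rank 1):

X | R1 | Y | R2 | d = R1 - R2 | d²
50 | 2 | 110 | 1.5 | 0.5 | 0.25
55 | 4.5 | 110 | 1.5 | 3.0 | 9.00
65 | 7.5 | 115 | 4.0 | 3.5 | 12.25
50 | 2 | 125 | 7.0 | -5.0 | 25.00
55 | 4.5 | 140 | 9.0 | -4.5 | 20.25
60 | 6.0 | 115 | 4.0 | 2.0 | 4.00
50 | 2 | 130 | 8.0 | -6.0 | 36.00
65 | 7.5 | 120 | 6.0 | 1.5 | 2.25
70 | 9.0 | 115 | 4.0 | 5.0 | 25.00
75 | 10.0 | 160 | 10.0 | 0.0 | 0.00
Σd² = 134

Tied values: X: 50 (3 times), 55 (2 times), 65 (2 times); Y: 110 (2 times), 115 (3 times).

$$ R = 1 - \frac{6\left[\sum d^2 + \frac{1}{12}(m_1^3 - m_1) + \frac{1}{12}(m_2^3 - m_2) + \cdots + \frac{1}{12}(m_5^3 - m_5)\right]}{N^3 - N} $$

$$ = 1 - \frac{6\left[134 + \frac{1}{12}(3^3 - 3) + \frac{1}{12}(2^3 - 2) + \frac{1}{12}(2^3 - 2) + \frac{1}{12}(2^3 - 2) + \frac{1}{12}(3^3 - 3)\right]}{10^3 - 10} $$

$$ = 1 - \frac{6(134 + 2 + 0.5 + 0.5 + 0.5 + 2)}{990} = 1 - \frac{6 \times 139.5}{990} = 1 - \frac{837}{990} = 1 - 0.845 = 0.155 $$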
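The equal-ranks procedure can be sketched in Python, starting from the raw values. The function names are hypothetical. Ranks are assigned in ascending order (smallest value = rank 1), as in the table above. Tied values share the mean of the ranks they occupy, and the correction (m³ - m)/12 is added for every group of equal values in either series.

```python
from collections import Counter

def average_ranks(values):
    """Ascending ranks; tied values get the mean of the ranks they occupy."""
    ordered = sorted(values)
    # first position of v is index+1; a group of m ties spans first..first+m-1
    return [ordered.index(v) + 1 + (ordered.count(v) - 1) / 2 for v in values]

def tie_correction(values):
    """Sum of (m**3 - m) / 12 over every group of m equal values."""
    return sum((m ** 3 - m) / 12 for m in Counter(values).values() if m > 1)

def spearman_with_ties(xs, ys):
    n = len(xs)
    rx, ry = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    correction = tie_correction(xs) + tie_correction(ys)
    return 1 - 6 * (sum_d2 + correction) / (n ** 3 - n)

x = [50, 55, 65, 50, 55, 60, 50, 65, 70, 75]
y = [110, 110, 115, 125, 140, 115, 130, 120, 115, 160]
print(average_ranks(x))                    # matches the R1 column
print(round(spearman_with_ties(x, y), 3))  # 0.155
```

The same `average_ranks` function gives 5.5 to two items tied at fifth place and 6 to three items tied at fifth place.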
