
Correlation Analysis

Introduction
So far, we have confined ourselves to series in which each item takes a value of a single variable. There are, however, series in which each item takes values of two or more variables, for example the heights and weights of a group of persons.

If, besides heights and weights, the chest measurements were also taken, each member of the group would assume three values relating to three different variables. In such situations, sometimes the values of the various variables are inter-related. Similarly, if data are collected about the prices of a commodity and the quantities sold at different prices, two series would be obtained. In these two series we are again likely to find some relationship.

Such relationships can be found in many other types of series also. The term correlation (or co-variation) indicates the relationship between two (or more) such variables in which a change in the value of one variable is accompanied by a change in the value of the other.

Definition
If two or more quantities vary in sympathy so that movements in the one tend to be accompanied by corresponding movements in the other(s) then they are said to be correlated.

Utility
The study of correlation reduces the range of uncertainty associated with decision making. Correlation analysis is very helpful in understanding economic behavior. Correlation study helps in identifying such factors which can stabilize a disturbed economic situation.

Correlation study helps to estimate the likely change in a variable with a particular amount of change in a related variable. (Here we take help of regression analysis.) Inter-relationship studies between different variables are very helpful tools in promoting research and opening new frontiers of knowledge.

There can be correlation between two variables due to any one or more of the following reasons:
1. Cause-effect relationship between the variables.
2. Both the correlated variables are being affected by a third variable.
3. Related variables might be mutually affecting each other so that neither of them could be designated as a cause or effect.
4. Correlation may be due to random or chance factors.
5. There might be a situation of nonsense or spurious correlation between the two variables.

Types of Correlation
1. Positive or negative correlation
2. Simple, multiple or partial correlation
3. Linear or non-linear correlation

Positive or Negative Correlation


There may be positive, negative, or nil (zero) correlation between two variables.

Simple, Multiple or Partial Correlation


In simple correlation we study only two variables, say price and demand. In multiple correlation we study together the relationship between three or more factors, such as production, rainfall and use of fertilizers. In partial correlation more than two factors are involved, but the correlation is studied between only two of them while the other factors are assumed to be constant.

Linear or Non-Linear Correlation


Correlation may also be classified according to whether the relationship is linear or non-linear (curvilinear).

Methods of Studying Correlation


1. Scatter Diagram
2. Correlation Graph
3. Coefficient of Correlation
4. Coefficient of Correlation by Rank Differences
5. Coefficient of Concurrent Deviation

Scatter Diagram
It indicates the scatter of various points, when the two variables are represented on the two axes. These points are not in any mathematical relationship and as such they only indicate the trend of the data.

[Figure: three scatter diagrams, titled Positive Correlation, Negative Correlation and No Correlation]

Correlation Graph
A graph can also be used to study correlation. It discloses whether there is any relationship and, if there is, whether it is positive or negative.

[Figure: three correlation graphs, titled Positive Correlation, Negative Correlation and No Correlation, each plotting Price per quintal (in Rs.) and Supply ('000 quintal) against the year]

[Figure: two scatter diagrams, titled Perfect Positive Correlation and Perfect Negative Correlation]

Karl Pearson's Coefficient of Correlation

$$ r = \frac{\sum dx\,dy}{n\,\sigma_x\,\sigma_y} $$

where
dx = deviations of x series from mean of x series
dy = deviations of y series from mean of y series
n = number of pairs of observations
σx = standard deviation of x series
σy = standard deviation of y series

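When deviations are taken from assumed mean

$$ r = \frac{\sum dx\,dy - \frac{(\sum dx)(\sum dy)}{n}}{\sqrt{\sum dx^2 - \frac{(\sum dx)^2}{n}}\,\sqrt{\sum dy^2 - \frac{(\sum dy)^2}{n}}} $$

When deviations are taken from actual mean

$$ r = \frac{\sum dx\,dy}{\sqrt{\sum dx^2 \cdot \sum dy^2}} $$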

Find the coefficient of correlation between the sales and expenses of the following 10 firms (figures in '000 Rs.):

Firm       1    2    3    4    5    6    7    8    9    10
Sales      50   50   55   60   65   65   65   60   60   50
Expenses   11   13   14   16   16   15   15   14   13   13

DIRECT METHOD

        X     Y     dx    dy    dxdy   dx^2   dy^2
        50    11    -8    -3    24     64     9
        50    13    -8    -1    8      64     1
        55    14    -3    0     0      9      0
        60    16    2     2     4      4      4
        65    16    7     2     14     49     4
        65    15    7     1     7      49     1
        65    15    7     1     7      49     1
        60    14    2     0     0      4      0
        60    13    2     -1    -2     4      1
        50    13    -8    -1    8      64     1
Mean    58    14
Total                           70     360    22

σx = 6, σy = 1.48, r = 0.787
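The direct method can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `pearson_direct` is made up for the example, and it assumes population standard deviations (dividing by n), which is what reproduces σx = 6 for the sales figures.

```python
from math import sqrt

def pearson_direct(xs, ys):
    """r = sum(dx*dy) / (n * sd_x * sd_y), deviations taken from the actual means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    # Population standard deviations (divide by n, not n - 1).
    sd_x = sqrt(sum(d * d for d in dx) / n)
    sd_y = sqrt(sum(d * d for d in dy) / n)
    return sum(a * b for a, b in zip(dx, dy)) / (n * sd_x * sd_y)

sales = [50, 50, 55, 60, 65, 65, 65, 60, 60, 50]
expenses = [11, 13, 14, 16, 16, 15, 15, 14, 13, 13]

print(round(pearson_direct(sales, expenses), 3))  # 0.787
```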

SHORT-CUT METHOD

        X     Y     XY     X^2     Y^2
        50    11    550    2500    121
        50    13    650    2500    169
        55    14    770    3025    196
        60    16    960    3600    256
        65    16    1040   4225    256
        65    15    975    4225    225
        65    15    975    4225    225
        60    14    840    3600    196
        60    13    780    3600    169
        50    13    650    2500    169
Total   580   140   8190   34000   1982
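The short-cut method works from the column totals alone. The sketch below assumes the usual raw-sums form of the formula, r = (nΣXY − ΣXΣY) / sqrt[(nΣX² − (ΣX)²)(nΣY² − (ΣY)²)], which is consistent with the totals in the table; the function name `pearson_raw_sums` is illustrative.

```python
from math import sqrt

def pearson_raw_sums(xs, ys):
    """Pearson's r from raw sums, without computing any deviations."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

sales = [50, 50, 55, 60, 65, 65, 65, 60, 60, 50]
expenses = [11, 13, 14, 16, 16, 15, 15, 14, 13, 13]

# Column totals of the XY column and the final coefficient.
print(sum(x * y for x, y in zip(sales, expenses)))   # 8190
print(round(pearson_raw_sums(sales, expenses), 3))   # 0.787
```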

METHOD OF DEVIATIONS FROM ACTUAL MEAN

        X     Y     dx    dy    dxdy   dx^2   dy^2
        50    11    -8    -3    24     64     9
        50    13    -8    -1    8      64     1
        55    14    -3    0     0      9      0
        60    16    2     2     4      4      4
        65    16    7     2     14     49     4
        65    15    7     1     7      49     1
        65    15    7     1     7      49     1
        60    14    2     0     0      4      0
        60    13    2     -1    -2     4      1
        50    13    -8    -1    8      64     1
Mean    58    14
Total                           70     360    22

METHOD OF DEVIATIONS FROM ASSUMED MEAN

Ax = 57, Ay = 13

        X     Y     dx    dy    dxdy   dx^2   dy^2
        50    11    -7    -2    14     49     4
        50    13    -7    0     0      49     0
        55    14    -2    1     -2     4      1
        60    16    3     3     9      9      9
        65    16    8     3     24     64     9
        65    15    8     2     16     64     4
        65    15    8     2     16     64     4
        60    14    3     1     3      9      1
        60    13    3     0     0      9      0
        50    13    -7    0     0      49     0
Total               10    10    80     370    32
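As a hedged check of the assumed-mean method, the sketch below subtracts the correction terms (Σdx)(Σdy)/n, (Σdx)²/n and (Σdy)²/n, which is the standard form of the formula for deviations from an assumed mean. The function name and the second pair of assumed means (60, 15) are chosen only for illustration; the point is that any choice of assumed means gives the same r.

```python
from math import sqrt

def pearson_assumed_mean(xs, ys, ax, ay):
    """Pearson's r using deviations from assumed means ax and ay."""
    n = len(xs)
    dx = [x - ax for x in xs]
    dy = [y - ay for y in ys]
    sum_dx, sum_dy = sum(dx), sum(dy)
    # Correction terms remove the effect of the assumed means being off-centre.
    numerator = sum(a * b for a, b in zip(dx, dy)) - sum_dx * sum_dy / n
    var_x = sum(a * a for a in dx) - sum_dx ** 2 / n
    var_y = sum(b * b for b in dy) - sum_dy ** 2 / n
    return numerator / sqrt(var_x * var_y)

sales = [50, 50, 55, 60, 65, 65, 65, 60, 60, 50]
expenses = [11, 13, 14, 16, 16, 15, 15, 14, 13, 13]

print(round(pearson_assumed_mean(sales, expenses, 57, 13), 3))  # 0.787
print(round(pearson_assumed_mean(sales, expenses, 60, 15), 3))  # 0.787
```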

Mathematical Properties of Coefficient of Correlation


It lies between -1 and +1. It cannot exceed unity. It is not affected by change of scale or origin.

The following are the monthly figures of the advertising expenditure and sales of a firm. It is generally found that advertising expenditure has its impact on sales after two months. Allowing for this time lag, calculate the coefficient of correlation.

Month   Ad. Exp.   Sales
Jan     50         1200
Feb     60         1500
Mar     70         1600
Apr     90         2000
May     120        2200
Jun     150        2500
Jul     140        2400
Aug     160        2600
Sep     170        2800
Oct     190        2900
Nov     200        3100
Dec     250        3900

Solution:

Adjustment required: because advertising affects sales after two months, each month's advertising expenditure is paired with the sales figure two months later (January expenditure with March sales, and so on up to October expenditure with December sales). This leaves 10 pairs of observations.

DIRECT METHOD

              X        Y         dx     dy      dxdy     (dx)^2   (dy)^2
              50       1600      -70    -1000   70000    4900     1000000
              60       2000      -60    -600    36000    3600     360000
              70       2200      -50    -400    20000    2500     160000
              90       2500      -30    -100    3000     900      10000
              120      2400      0      -200    0        0        40000
              150      2600      30     0       0        900      0
              140      2800      20     200     4000     400      40000
              160      2900      40     300     12000    1600     90000
              170      3100      50     500     25000    2500     250000
              190      3900      70     1300    91000    4900     1690000
Total:        1200     26000                    261000   22200    3640000
Average:      120      2600                              2220     364000
SD:           47.117   603.324
Correlation:  0.918 ANS
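The time-lag adjustment amounts to shifting one series against the other before correlating. The following sketch is an illustration only; `apply_lag` and `pearson` are hypothetical helper names, and the lag is assumed to be a whole number of months.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson's r using deviations from the actual means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def apply_lag(cause, effect, lag):
    """Pair cause[i] with effect[i + lag]; unmatched values at the ends are dropped."""
    return cause[:len(cause) - lag], effect[lag:]

ad_exp = [50, 60, 70, 90, 120, 150, 140, 160, 170, 190, 200, 250]
sales = [1200, 1500, 1600, 2000, 2200, 2500, 2400, 2600, 2800, 2900, 3100, 3900]

x, y = apply_lag(ad_exp, sales, 2)   # Jan expenditure is paired with Mar sales
print(len(x), x[0], y[0])            # 10 50 1600
print(round(pearson(x, y), 3))       # 0.918
```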

Change of Scale

X and Y below are the lagged values divided by 10 and 100 respectively.

              X       Y       dx    dy    dxdy   (dx)^2   (dy)^2
              5       16      -7    -10   70     49       100
              6       20      -6    -6    36     36       36
              7       22      -5    -4    20     25       16
              9       25      -3    -1    3      9        1
              12      24      0     -2    0      0        4
              15      26      3     0     0      9        0
              14      28      2     2     4      4        4
              16      29      4     3     12     16       9
              17      31      5     5     25     25       25
              19      39      7     13    91     49       169
Total:        120     260                 261    222      364
Average:      12      26                         22.2     36.4
SD:           4.712   6.033
Correlation:  0.918 ANS

Change of Origin

X and Y below are the lagged values reduced by 50 and 1600 respectively.

              X        Y         dx     dy      dxdy     (dx)^2   (dy)^2
              0        0         -70    -1000   70000    4900     1000000
              10       400       -60    -600    36000    3600     360000
              20       600       -50    -400    20000    2500     160000
              40       900       -30    -100    3000     900      10000
              70       800       0      -200    0        0        40000
              100      1000      30     0       0        900      0
              90       1200      20     200     4000     400      40000
              110      1300      40     300     12000    1600     90000
              120      1500      50     500     25000    2500     250000
              140      2300      70     1300    91000    4900     1690000
Total:        700      10000                    261000   22200    3640000
Average:      70       1000                              2220     364000
SD:           47.117   603.324
Correlation:  0.918 ANS

Change of Scale & Origin

X and Y below are the lagged values reduced by 50 and 1600 and then divided by 10 and 100 respectively.

              X       Y       dx    dy    dxdy   (dx)^2   (dy)^2
              0       0       -7    -10   70     49       100
              1       4       -6    -6    36     36       36
              2       6       -5    -4    20     25       16
              4       9       -3    -1    3      9        1
              7       8       0     -2    0      0        4
              10      10      3     0     0      9        0
              9       12      2     2     4      4        4
              11      13      4     3     12     16       9
              12      15      5     5     25     25       25
              14      23      7     13    91     49       169
Total:        70      100                 261    222      364
Average:      7       10                         22.2     36.4
SD:           4.712   6.033
Correlation:  0.918 ANS
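The invariance of r under a change of scale or origin can be checked numerically. This is a small sketch using the ten lagged pairs; the helper name `pearson` is illustrative, and the shift and scale constants (50, 1600, 10, 100) are the ones implied by the tables.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson's r using deviations from the actual means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

x = [50, 60, 70, 90, 120, 150, 140, 160, 170, 190]
y = [1600, 2000, 2200, 2500, 2400, 2600, 2800, 2900, 3100, 3900]

r_original = pearson(x, y)
r_scaled = pearson([v / 10 for v in x], [v / 100 for v in y])      # change of scale
r_shifted = pearson([v - 50 for v in x], [v - 1600 for v in y])    # change of origin
r_both = pearson([(v - 50) / 10 for v in x], [(v - 1600) / 100 for v in y])

for r in (r_original, r_scaled, r_shifted, r_both):
    print(round(r, 3))  # 0.918 each time
```

The scale factors here are positive. Dividing only one of the series by a negative number would reverse the sign of r while leaving its magnitude unchanged.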

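In case of grouped data

$$ r = \frac{N\sum f\,dx\,dy - (\sum f\,dx)(\sum f\,dy)}{\sqrt{N\sum f\,dx^2 - (\sum f\,dx)^2}\,\sqrt{N\sum f\,dy^2 - (\sum f\,dy)^2}} $$

where dx and dy are deviations from assumed mean, f is the frequency and N is the total frequency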

Age of Wives      Age of Husbands (X-Series)
(Y-Series)        20-30   30-40   40-50   50-60   60-70   Total
15-25             5       9       3       -       -       17
25-35             -       10      25      2       -       37
35-45             -       1       12      2       -       15
45-55             -       -       4       16      5       25
55-65             -       -       -       4       2       6
Total             5       20      44      24      7       100

Solution: deviations are taken from the assumed means Ax = 45 and Ay = 40, using the class mid-points x and y. Each cell shows the frequency f, with f·dx·dy in brackets.

X class             20-30      30-40      40-50      50-60      60-70
x                   25         35         45         55         65
dx                  -20        -10        0          10         20

Y class  y    dy                                                           f     fdy    fdy^2
15-25    20   -20   5 (2000)   9 (1800)   3 (0)      -          -          17    -340   6800
25-35    30   -10   -          10 (1000)  25 (0)     2 (-200)   -          37    -370   3700
35-45    40   0     -          1 (0)      12 (0)     2 (0)      -          15    0      0
45-55    50   10    -          -          4 (0)      16 (1600)  5 (1000)   25    250    2500
55-65    60   20    -          -          -          4 (800)    2 (800)    6     120    2400
Total (f)           5          20         44         24         7          100   -340   15400
fdx                 -100       -200       0          240        140        80
fdx^2               2000       2000       0          2400       2800       9200

Σfdxdy = 8800 (the sum of the bracketed values)

r = 0.795
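The grouped-data calculation can be reproduced from the bivariate frequency table. The sketch below is an illustration under stated assumptions: class mid-points stand in for the individual ages, the assumed means are 45 and 40, and the function name `grouped_pearson` is hypothetical.

```python
from math import sqrt

def grouped_pearson(freq, x_mid, y_mid, ax, ay):
    """Pearson's r for a two-way frequency table; freq[i][j] is row i (y class), column j (x class)."""
    dx = [x - ax for x in x_mid]
    dy = [y - ay for y in y_mid]
    n = sum(sum(row) for row in freq)
    col_totals = [sum(row[j] for row in freq) for j in range(len(dx))]
    row_totals = [sum(row) for row in freq]
    s_fdx = sum(f * d for f, d in zip(col_totals, dx))
    s_fdy = sum(f * d for f, d in zip(row_totals, dy))
    s_fdx2 = sum(f * d * d for f, d in zip(col_totals, dx))
    s_fdy2 = sum(f * d * d for f, d in zip(row_totals, dy))
    s_fdxdy = sum(freq[i][j] * dx[j] * dy[i]
                  for i in range(len(dy)) for j in range(len(dx)))
    numerator = n * s_fdxdy - s_fdx * s_fdy
    denominator = sqrt((n * s_fdx2 - s_fdx ** 2) * (n * s_fdy2 - s_fdy ** 2))
    return numerator / denominator

# Rows: wives 15-25 ... 55-65; columns: husbands 20-30 ... 60-70 (empty cells as 0).
freq = [
    [5, 9, 3, 0, 0],
    [0, 10, 25, 2, 0],
    [0, 1, 12, 2, 0],
    [0, 0, 4, 16, 5],
    [0, 0, 0, 4, 2],
]
husband_mid = [25, 35, 45, 55, 65]
wife_mid = [20, 30, 40, 50, 60]

r = grouped_pearson(freq, husband_mid, wife_mid, ax=45, ay=40)
print(round(r, 3))  # 0.795
```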

Coefficient of Correlation by the Method of Least Squares

This method will be discussed while studying regression analysis.

Spearman's Rank Correlation Coefficient

$$ R = 1 - \frac{6\sum D^2}{N(N^2 - 1)} $$

where
D = difference between ranks of the two series
N = number of pairs of observations

Calculate Spearman's rank coefficient of correlation:

Price of Tea      120   134   150   115   110   140   142   100
Price of Coffee   75    88    95    70    60    80    81    50

Solution:

X     Y    Rx   Ry   D    D^2
120   75   4    4    0    0
134   88   5    7    -2   4
150   95   8    8    0    0
115   70   3    3    0    0
110   60   2    2    0    0
140   80   6    5    1    1
142   81   7    6    1    1
100   50   1    1    0    0

ΣD^2 = 6
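The ranking and the final substitution into R = 1 − 6ΣD²/[N(N² − 1)] can be sketched as follows. The helper names are illustrative, and this simple ranking assumes no repeated values, which holds for the tea and coffee prices.

```python
def ranks(values):
    """Rank 1 for the smallest value; assumes no repeated values."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def spearman(xs, ys):
    """Spearman's rank coefficient for series without tied values."""
    n = len(xs)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n * n - 1))

tea = [120, 134, 150, 115, 110, 140, 142, 100]
coffee = [75, 88, 95, 70, 60, 80, 81, 50]

print(ranks(tea))                       # [4, 5, 8, 3, 2, 6, 7, 1]
print(round(spearman(tea, coffee), 3))  # 0.929
```

With ΣD² = 6 and N = 8 this is 1 − 36/504, roughly 0.929.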

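In case of equal ranks

$$ R = 1 - \frac{6\left[\sum D^2 + \frac{1}{12}\sum (m^3 - m)\right]}{N(N^2 - 1)} $$

where m = number of observations with equal ranks; the term (m^3 - m)/12 is added once for every group of equal values in either series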

The following data relate to the marks obtained by 10 students of a class in Statistics and Accounting. Calculate Spearman's rank coefficient of correlation:

Statistics   30   38   28   27   28   23   30   33   28   35
Accounting   29   27   22   29   20   29   18   21   27   22

Solution:

X     Y     Rx    Ry    D      D^2
30    29    6.5   9     -2.5   6.25
38    27    10    6.5   3.5    12.25
28    22    4     4.5   -0.5   0.25
27    29    2     9     -7     49
28    20    4     2     2      4
23    29    1     9     -8     64
30    18    6.5   1     5.5    30.25
33    21    8     3     5      25
28    27    4     6.5   -2.5   6.25
35    22    9     4.5   4.5    20.25
TOTAL                          217.5

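Solution:

In X the value 28 occurs three times and 30 twice; in Y the values 22 and 27 each occur twice and 29 three times. The correction term is therefore (24 + 6 + 6 + 6 + 24)/12 = 5.5, and

$$ R = 1 - \frac{6(217.5 + 5.5)}{10(10^2 - 1)} = 1 - \frac{1338}{990} = -0.352 $$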
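The tie-corrected calculation for the marks data can be sketched in Python. The function names are illustrative, not from the slides; tied values receive the average of the ranks they occupy, and a correction of (m³ − m)/12 is added for each group of m equal values.

```python
from collections import Counter

def average_ranks(values):
    """Rank 1 for the smallest value; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1   # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            result[order[k]] = shared
        i = j + 1
    return result

def tie_correction(values):
    """Sum of (m^3 - m) / 12 over every group of m equal values."""
    return sum((m ** 3 - m) / 12 for m in Counter(values).values() if m > 1)

def spearman_with_ties(xs, ys):
    n = len(xs)
    d_squared = sum((a - b) ** 2 for a, b in zip(average_ranks(xs), average_ranks(ys)))
    correction = tie_correction(xs) + tie_correction(ys)
    return 1 - 6 * (d_squared + correction) / (n * (n * n - 1))

statistics_marks = [30, 38, 28, 27, 28, 23, 30, 33, 28, 35]
accounting_marks = [29, 27, 22, 29, 20, 29, 18, 21, 27, 22]

print(average_ranks(statistics_marks)[:3])   # [6.5, 10.0, 4.0]
print(round(spearman_with_ties(statistics_marks, accounting_marks), 3))  # -0.352
```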

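Coefficient of Concurrent Deviation

$$ r_c = \pm\sqrt{\pm\frac{2c - N}{N}} $$

where
c = number of concurrent deviations
N = number of deviations (one less than the number of pairs of observations)

The sign of (2c - N) is used both inside and outside the square root.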

Calculate the coefficient of concurrent deviations for the following data:

Supply   65   40   35   75   63   80   35   20   80   60   50
Demand   60   55   50   56   30   70   40   35   80   75   80

Solution:

X     Y     Dx   Dy   DxDy
65    60
40    55    -    -    +
35    50    -    -    +
75    56    +    +    +
63    30    -    -    +
80    70    +    +    +
35    40    -    -    +
20    35    -    -    +
80    80    +    +    +
60    75    -    -    +
50    80    -    +    -
TOTAL (number of + signs in DxDy)   9
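Counting concurrent deviations is easy to mechanise. The sketch below is illustrative: the function names are made up, and it assumes that a pair of changes counts as concurrent only when both series move in the same direction (a period with no change in either series is treated as not concurrent, a convention that does not arise in this data).

```python
from math import sqrt

def sign(value):
    """Return +1, -1 or 0 according to the sign of value."""
    return (value > 0) - (value < 0)

def concurrent_deviation(xs, ys):
    """Return (c, N, r_c) for two series of equal length."""
    dx = [sign(b - a) for a, b in zip(xs, xs[1:])]
    dy = [sign(b - a) for a, b in zip(ys, ys[1:])]
    n = len(dx)                                    # one less than the number of pairs
    c = sum(1 for a, b in zip(dx, dy) if a * b > 0)
    inner = (2 * c - n) / n
    # The sign of (2c - N) is applied inside and outside the root.
    r_c = sqrt(inner) if inner >= 0 else -sqrt(-inner)
    return c, n, r_c

supply = [65, 40, 35, 75, 63, 80, 35, 20, 80, 60, 50]
demand = [60, 55, 50, 56, 30, 70, 40, 35, 80, 75, 80]

c, n, r_c = concurrent_deviation(supply, demand)
print(c, n, round(r_c, 3))  # 9 10 0.894
```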

From the data given below find:
i. Karl Pearson's coefficient of correlation
ii. Spearman's rank correlation coefficient
iii. Coefficient of concurrent deviation

X   57   16   24   65   17   18   9    40   33   48
Y   19   7    9    20   4    15   6    24   14   13

Karl Pearson's Coefficient:

        X      Y      dx      dy     dxdy     dx^2      dy^2
        57     19     24.3    5.9    143.37   590.49    34.81
        16     7      -16.7   -6.1   101.87   278.89    37.21
        24     9      -8.7    -4.1   35.67    75.69     16.81
        65     20     32.3    6.9    222.87   1043.29   47.61
        17     4      -15.7   -9.1   142.87   246.49    82.81
        18     15     -14.7   1.9    -27.93   216.09    3.61
        9      6      -23.7   -7.1   168.27   561.69    50.41
        40     24     7.3     10.9   79.57    53.29     118.81
        33     14     0.3     0.9    0.27     0.09      0.81
        48     13     15.3    -0.1   -1.53    234.09    0.01
Mean    32.7   13.1
Total                                865.3    3300.1    392.9

r = 0.760

Spearman's Coefficient:

X     Y     Rx   Ry   D    D^2
57    19    9    8    1    1
16    7     2    3    -1   1
24    9     5    4    1    1
65    20    10   9    1    1
17    4     3    1    2    4
18    15    4    7    -3   9
9     6     1    2    -1   1
40    24    7    10   -3   9
33    14    6    6    0    0
48    13    8    5    3    9

ΣD^2 = 36

r = 0.782

Concurrent Deviation:

X     Y     Dx   Dy   DxDy
57    19
16    7     -    -    +
24    9     +    +    +
65    20    +    +    +
17    4     -    -    +
18    15    +    +    +
9     6     -    -    +
40    24    +    +    +
33    14    -    -    +
48    13    +    -    -
TOTAL (number of + signs in DxDy)   8

r = 0.882
