
Correlation-Regression

It deals with the association between two or more variables.
Correlation analysis deals with covariation between two or more variables.
Types
1. Positive or negative
2. Simple or multiple
3. Linear or non-linear

Methods of Measuring Correlation

1. Graphic Method
2. Diagrammatic Method - Scatter Diagram
3. Algebraic Method
a. Karl Pearson's Coefficient of Correlation
b. Spearman's Rank Coefficient of Correlation
c. Coefficient of Concurrent Deviations
d. Least Squares Method

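Karl Pearson's Coefficient of Correlation

$$\gamma = \frac{\sum dx\,dy}{\sqrt{\sum dx^2 \cdot \sum dy^2}} = \frac{\sum dx\,dy}{N\,\sigma_x\,\sigma_y}$$

$dx = x - \bar{x}$
$dy = y - \bar{y}$
$\sum dx\,dy$ = sum of products of deviations from the respective arithmetic means of both series
$\sigma_x$, $\sigma_y$ = standard deviations of the x and y series; N = number of paired items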
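The formula above can be turned into a short calculation. The sketch below is a minimal illustration and not part of the original notes: the function name `pearson_r` is an assumption, and the data are the twelve (X, Y) pairs of Q7 in these notes.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's coefficient from deviations about the actual means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    dx = [x - x_bar for x in xs]  # dx = x - xbar
    dy = [y - y_bar for y in ys]  # dy = y - ybar
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    sum_dx2 = sum(a * a for a in dx)
    sum_dy2 = sum(b * b for b in dy)
    return sum_dxdy / sqrt(sum_dx2 * sum_dy2)


# The twelve pairs from Q7 of these notes.
x = [65, 63, 67, 64, 68, 62, 70, 66, 68, 67, 69, 71]
y = [68, 66, 68, 65, 69, 66, 68, 65, 71, 67, 68, 70]
r = pearson_r(x, y)
print(round(r, 2))
```

The result agrees with the hand calculation of Q7 (about 0.70) up to rounding of the intermediate deviations.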

Karl Pearson's Coefficient of Correlation

After calculating deviations from an assumed (working) mean $A_x$ & $A_y$, with $dx = x - A_x$ and $dy = y - A_y$:

$$\gamma = \frac{N\sum dx\,dy - (\sum dx)(\sum dy)}{\sqrt{\left[N\sum dx^2 - (\sum dx)^2\right]\left[N\sum dy^2 - (\sum dy)^2\right]}}$$

$\sum dx\,dy$ = total of products of deviations from the assumed means of the x and y series
$\sum dx$ = total of deviations of the x series
$\sum dy$ = total of deviations of the y series
$\sum dx^2$ = total of squared deviations of the x series
$\sum dy^2$ = total of squared deviations of the y series
N = number of items (number of paired items)

Karl Pearson's Coefficient of Correlation

After calculating deviations from an assumed (working) mean $A_x$ & $A_y$, the formula can also be written as

$$\gamma = \frac{\sum dx\,dy - \dfrac{(\sum dx)(\sum dy)}{N}}{\sqrt{\left[\sum dx^2 - \dfrac{(\sum dx)^2}{N}\right]\left[\sum dy^2 - \dfrac{(\sum dy)^2}{N}\right]}}$$
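As a rough check of the assumed-mean form, the sketch below computes the coefficient for the husband and wife ages of Q8 in these notes. The function name and the working means 30 and 25 are assumptions chosen for illustration; any other working means should give the same value.

```python
from math import sqrt


def pearson_r_assumed_mean(xs, ys, ax, ay):
    """Pearson's coefficient using deviations from assumed means ax, ay."""
    n = len(xs)
    dx = [x - ax for x in xs]
    dy = [y - ay for y in ys]
    sum_dx, sum_dy = sum(dx), sum(dy)
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    sum_dx2 = sum(a * a for a in dx)
    sum_dy2 = sum(b * b for b in dy)
    numerator = n * sum_dxdy - sum_dx * sum_dy
    denominator = sqrt((n * sum_dx2 - sum_dx ** 2) * (n * sum_dy2 - sum_dy ** 2))
    return numerator / denominator


husband = [23, 27, 28, 29, 30, 31, 33, 35, 36, 39]
wife = [18, 22, 23, 24, 25, 26, 28, 29, 30, 32]

r_working = pearson_r_assumed_mean(husband, wife, ax=30, ay=25)  # assumed working means
r_origin = pearson_r_assumed_mean(husband, wife, ax=0, ay=0)     # a different choice
print(round(r_working, 4), round(r_origin, 4))
```

Both calls return the same coefficient, which is the point of the shortcut: the working means only simplify the arithmetic.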

Assumptions of Karl Pearson's Coefficient of Correlation

1. A linear relationship exists between the variables
Properties of Karl Pearson's Coefficient of Correlation
1. Its value lies between +1 & -1
2. Zero means no linear correlation
3. $\gamma = \pm\sqrt{b_{xy} \times b_{yx}}$, where $b_{xy}$ and $b_{yx}$ are the regression coefficients and $\gamma$ takes their common sign
Merit
Convenient for accurate interpretation, as it gives both the degree & the direction of the relationship between two variables

Limitations
1. Assumes a linear relationship, even though the actual relationship may not be linear
2. The method & process of calculation are difficult & time consuming
3. Affected by extreme values in the distribution

Probable Error of Karl Pearson's Coefficient of Correlation

$$\text{Probable Error of } \gamma = 0.6745 \times \frac{1 - \gamma^2}{\sqrt{N}}$$
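A minimal sketch of the probable error formula follows. The function name is an assumption, and the inputs (a coefficient of 0.70 from N = 12 pairs) are taken from the Q7 example in these notes purely as an illustration.

```python
from math import sqrt


def probable_error(r, n):
    """Probable error of a correlation coefficient r based on n pairs."""
    return 0.6745 * (1 - r ** 2) / sqrt(n)


pe = probable_error(0.70, 12)
print(round(pe, 4))
```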

Q7. Calculate the coefficient of correlation for the following data.

X   65  63  67  64  68  62  70  66  68  67  69  71
Y   68  66  68  65  69  66  68  65  71  67  68  70

Ans.

$$\gamma = \frac{\sum dx\,dy}{\sqrt{\sum dx^2 \cdot \sum dy^2}} = \frac{\sum dx\,dy}{N\,\sigma_x\,\sigma_y}$$

No.      X      Y    dx = x - xbar     dx²    dy = y - ybar     dy²    dx.dy
1       65     68        -1.67        2.78         0.42        0.17    -0.69
2       63     66        -3.67       13.44        -1.58        2.51     5.81
3       67     68         0.33        0.11         0.42        0.17     0.14
4       64     65        -2.67        7.11        -2.58        6.67     6.89
5       68     69         1.33        1.78         1.42        2.01     1.89
6       62     66        -4.67       21.78        -1.58        2.51     7.39
7       70     68         3.33       11.11         0.42        0.17     1.39
8       66     65        -0.67        0.44        -2.58        6.67     1.72
9       68     71         1.33        1.78         3.42       11.67     4.56
10      67     67         0.33        0.11        -0.58        0.34    -0.19
11      69     68         2.33        5.44         0.42        0.17     0.97
12      71     70         4.33       18.78         2.42        5.84    10.47
Sum    800    811                    84.67                    38.92    40.33
Mean 66.67  67.58

$\sum dx^2 \times \sum dy^2 = 3294.9$
$\sqrt{\sum dx^2 \times \sum dy^2} = 57.40$
Coefficient of correlation $= \dfrac{\sum dx\,dy}{\sqrt{\sum dx^2 \cdot \sum dy^2}} = \dfrac{40.33}{57.40} = 0.70$

Q8. The following information gives the ages of husbands & wives. Find the correlation coefficient.

Husband   23  27  28  29  30  31  33  35  36  39
Wife      18  22  23  24  25  26  28  29  30  32

Ans. $\gamma \approx 0.99$ (0.9955 to four decimal places)

No.      X      Y    dx = x - xbar     dx²    dy = y - ybar     dy²    dx.dy
1       23     18        -8.10       65.61        -7.70       59.29    62.37
2       27     22        -4.10       16.81        -3.70       13.69    15.17
3       28     23        -3.10        9.61        -2.70        7.29     8.37
4       29     24        -2.10        4.41        -1.70        2.89     3.57
5       30     25        -1.10        1.21        -0.70        0.49     0.77
6       31     26        -0.10        0.01         0.30        0.09    -0.03
7       33     28         1.90        3.61         2.30        5.29     4.37
8       35     29         3.90       15.21         3.30       10.89    12.87
9       36     30         4.90       24.01         4.30       18.49    21.07
10      39     32         7.90       62.41         6.30       39.69    49.77
Sum    311    257                    202.9                    158.1    178.3
Mean 31.10  25.70

$\sum dx^2 \times \sum dy^2 = 32078.49$
$\sqrt{\sum dx^2 \times \sum dy^2} = 179.10$
Coefficient of correlation $= \dfrac{\sum dx\,dy}{\sqrt{\sum dx^2 \cdot \sum dy^2}} = \dfrac{178.3}{179.10} = 0.9955$, i.e. 1.00 to two decimal places

Rank Correlation: sometimes variables are not quantitative in nature but can be arranged in serial order, especially when dealing with attributes like honesty, beauty, character, morality etc.
To deal with such situations, Charles Edward Spearman developed in 1904 a formula for obtaining the correlation coefficient between the ranks of n individuals in two attributes under study, or between the ranks given by two or three judges.

Rank Coefficient of Correlation

$$\rho = 1 - \frac{6\sum d^2}{N^3 - N} = 1 - \frac{6\sum d^2}{N(N^2 - 1)}$$

$\sum d^2$ = total of squared differences between the two ranks of each item
N = number of items
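The rank formula can be sketched as follows. This is an illustrative sketch that assumes no tied ranks. The function names and the two five-item rank lists are hypothetical, while the totals 200, 214 and 60 are the surviving $\sum d^2$ values of Q9 in these notes.

```python
def spearman_rho(ranks_a, ranks_b):
    """Spearman's rank coefficient for two lists of ranks without ties."""
    n = len(ranks_a)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * sum_d2 / (n ** 3 - n)


def rho_from_sum_d2(sum_d2, n):
    """Same formula when only the total of squared differences is known."""
    return 1 - 6 * sum_d2 / (n ** 3 - n)


# Hypothetical ranks given by two judges to five items.
judge_1 = [1, 2, 3, 4, 5]
judge_2 = [2, 1, 4, 3, 5]
rho_example = spearman_rho(judge_1, judge_2)

# Totals of squared rank differences reported in Q9 (N = 10 competitors).
rho_q9 = [rho_from_sum_d2(total, 10) for total in (200, 214, 60)]
print(round(rho_example, 2), [round(v, 3) for v in rho_q9])
```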

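Q9. Ten competitors in a cooking competition are ranked by three judges, P, Q and R. Using the rank correlation method, find out which pair of judges has the nearest approach.

[Rank table for the ten competitors: the individual ranks and most of the rank differences (Rp-Rq, Rq-Rr, Rp-Rr) did not survive extraction. Only the totals below are recoverable.]

Pair of judges          P & Q      Q & R      P & R
Sum of d²                 200        214         60
6 x Sum of d²            1200       1284        360
N³ - N                    990        990        990
6 x Sum of d²/(N³ - N)   1.21      1.297     0.3636
(rho)                   -0.21     -0.297      0.636

Judges P and R have the highest rank correlation (rho = 0.636), so this pair has the nearest approach.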

Regression analysis is the process of developing a statistical model that is used to predict the value of a dependent variable from an independent variable.
Application
Advertising v/s sales revenue
First used by Sir Francis Galton in 1877 for the study of the height of sons w.r.t. the height of fathers

Regression literally means going back, reverting to the former condition, or return.
It refers to the functional relationship between x & y and gives estimates of the value of the dependent variable y for given values of the independent variable x.
Example: the relationship between the income of employees and their savings.
Regression coefficients can be used to calculate the correlation coefficient: $\gamma = \pm\sqrt{b_{xy} \times b_{yx}}$

Types of Regression
1. Simple & Multiple Regression
2. Total or Partial
3. Linear / Non-linear
Methods of Regression Analysis
1. Scatter Diagram
2. Regression Equations
3. Regression Lines
Regression of y on x: y = a + bx
Regression of x on y: x = a + by

Regression Coefficients

Coefficient of regression of x on y:
$$b_{xy} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (y - \bar{y})^2} = \frac{\sum dx\,dy}{\sum dy^2}$$
Coefficient of regression of y on x:
$$b_{yx} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \frac{\sum dx\,dy}{\sum dx^2}$$

Q2. From the data given below find
1. the two regression coefficients
2. the two regression equations
3. the coefficient of correlation between marks in Economics & Statistics
4. the most likely marks in Statistics when marks in Economics are 30
Let marks in Economics be x and marks in Statistics be y.

Marks in Eco    25  28  35  32  31  36  29  38  34  32
Marks in Stat   43  46  49  41  36  32  31  30  33  39

Marks in Eco (x)   Marks in Stat (y)   dx = x - 32   dy = y - 38    dx²    dy²   dx.dy
      25                  43                -7             5         49     25     -35
      28                  46                -4             8         16     64     -32
      35                  49                 3            11          9    121      33
      32                  41                 0             3          0      9       0
      31                  36                -1            -2          1      4       2
      36                  32                 4            -6         16     36     -24
      29                  31                -3            -7          9     49      21
      38                  30                 6            -8         36     64     -48
      34                  33                 2            -5          4     25     -10
      32                  39                 0             1          0      1       0
Sum  320                 380                 0             0        140    398     -93
Mean  32                  38

Regression Coefficients

Coefficient of regression of x on y:
$$b_{xy} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (y - \bar{y})^2} = \frac{\sum dx\,dy}{\sum dy^2} = \frac{-93}{398} = -0.2337$$
Coefficient of regression of y on x:
$$b_{yx} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \frac{\sum dx\,dy}{\sum dx^2} = \frac{-93}{140} = -0.6643$$

Regression of x on y
$x - \bar{x} = b_{xy}(y - \bar{y})$
x - 32 = -0.2337(y - 38)
x - 32 = -0.2337y + 0.2337 * 38
x - 32 = -0.2337y + 8.8806
x = -0.2337y + 32 + 8.8806
x = -0.2337y + 40.8806

Correlation Coefficient $\gamma = \pm\sqrt{b_{xy} \times b_{yx}}$

$\sqrt{(-0.2337) \times (-0.6643)} = \sqrt{0.1552} = 0.394$
Since $b_{yx}$ & $b_{xy}$ are both negative, $\gamma = -0.394$
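The sign rule can be made explicit in a few lines. This is only a sketch with a hypothetical function name; the two coefficients are the Q2 values -93/398 and -93/140 from these notes.

```python
from math import copysign, sqrt


def r_from_regression_coefficients(b_xy, b_yx):
    """Correlation coefficient as the signed geometric mean of b_xy and b_yx."""
    if b_xy * b_yx < 0:
        # Both coefficients share the sign of the sum of dx*dy, so this cannot occur
        # for coefficients computed from the same data.
        raise ValueError("regression coefficients must have the same sign")
    return copysign(sqrt(b_xy * b_yx), b_yx)


gamma = r_from_regression_coefficients(-93 / 398, -93 / 140)
print(round(gamma, 3))
```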

Regression of y on x
$y - \bar{y} = b_{yx}(x - \bar{x})$
y - 38 = -0.6643(x - 32)
y - 38 = -0.6643x + 0.6643 * 32
y = -0.6643x + 38 + 0.6643 * 32
y = -0.6643x + 38 + 21.2576
y = -0.6643x + 59.2576

In order to estimate the most likely marks in Statistics (y) when marks in Economics (x) are 30, we use the line of regression of y on x.
The required estimate is given by
y = -0.6643 * 30 + 59.2576 = -19.929 + 59.2576 = 39.3286
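The whole Q2 calculation can be reproduced with a short script. The sketch below is illustrative only; the function and variable names are assumptions, and the data are the Economics and Statistics marks given in Q2.

```python
def regression_coefficients(xs, ys):
    """Return (x_bar, y_bar, b_xy, b_yx) from deviations about the means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sum_dxdy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sum_dx2 = sum((x - x_bar) ** 2 for x in xs)
    sum_dy2 = sum((y - y_bar) ** 2 for y in ys)
    return x_bar, y_bar, sum_dxdy / sum_dy2, sum_dxdy / sum_dx2


eco = [25, 28, 35, 32, 31, 36, 29, 38, 34, 32]
stat = [43, 46, 49, 41, 36, 32, 31, 30, 33, 39]
x_bar, y_bar, b_xy, b_yx = regression_coefficients(eco, stat)


def predict_stat(eco_marks):
    """Regression of y on x: y - y_bar = b_yx * (x - x_bar)."""
    return y_bar + b_yx * (eco_marks - x_bar)


def predict_eco(stat_marks):
    """Regression of x on y: x - x_bar = b_xy * (y - y_bar)."""
    return x_bar + b_xy * (stat_marks - y_bar)


estimate = predict_stat(30)
print(round(b_xy, 4), round(b_yx, 4), round(estimate, 2))
```

The estimate differs from the hand result only in the last decimals, because the script does not round the slope to four places before multiplying.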

Sum of Squares - x & y

$$SS_{xy} = \sum (x - \bar{x})(y - \bar{y}) = \sum xy - \frac{(\sum x)(\sum y)}{n}$$
Sum of Squares - x & x
$$SS_{xx} = \sum (x - \bar{x})^2 = \sum x^2 - \frac{(\sum x)^2}{n}$$

Sales & advertising expenses in Rs. 1000. Develop a regression model.

advt (x)    sales (y)
   92          930
   94          900
   97         1020
   98          990
  100         1100
  102         1050
  104         1150
  105         1120
  105         1130
  107         1200
  107         1250
  110         1220

$$b = \frac{SS_{xy}}{SS_{xx}}$$
$y = a + bx$
$\sum y = \sum a + b\sum x$
$\sum y = n\,a + b\sum x$
$n\,a = \sum y - b\sum x$
$$a = \frac{\sum y - b\sum x}{n} = \frac{\sum y}{n} - \frac{b\sum x}{n}$$

Here x = advt, y = sales, ȳ = 13060 / 12 = 1088.33, ŷ = predicted value (fit) and y - ŷ = residual.

    x      y       x²        xy     y - ȳ     (y - ȳ)²         ŷ    y - ŷ   (y - ŷ)²     ŷ - ȳ    (ŷ - ȳ)²
   92    930     8464     85560   -158.33     25069.44     902.4     27.6     761.76   -185.93    34571.20
   94    900     8836     84600   -188.33     35469.44    940.54   -40.54    1643.49   -147.79    21842.87
   97   1020     9409     98940    -68.33      4669.44    997.75    22.25     495.06    -90.58     8205.34
   98    990     9604     97020    -98.33      9669.44   1016.82   -26.82     719.31    -71.51     5114.16
  100   1100    10000    110000     11.67       136.11   1054.96    45.04    2028.60    -33.37     1113.78
  102   1050    10404    107100    -38.33      1469.44    1093.1    -43.1    1857.61      4.77       22.72
  104   1150    10816    119600     61.67      3802.78   1131.24    18.76     351.94     42.91     1840.98
  105   1120    11025    117600     31.67      1002.78   1150.31   -30.31     918.70     61.98     3841.11
  105   1130    11025    118650     41.67      1736.11   1150.31   -20.31     412.50     61.98     3841.11
  107   1200    11449    128400    111.67     12469.44   1188.45    11.55     133.40    100.12    10023.35
  107   1250    11449    133750    161.67     26136.11   1188.45    61.55    3788.40    100.12    10023.35
  110   1220    12100    134200    131.67     17336.11   1245.66   -25.66     658.44    157.33    24751.68
 1221  13060   124581   1335420      0.00   138966.667  13059.99     0.01   13769.21     -0.01    125191.6

The last row gives the column totals: Σx, Σy, Σx², Σxy, Σ(y - ȳ), Σ(y - ȳ)², Σŷ, Σ(y - ŷ), Σ(y - ŷ)², Σ(ŷ - ȳ), Σ(ŷ - ȳ)².

$$\bar{x} = \frac{\sum x}{n} = \frac{1221}{12} = 101.75$$
$$SS_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n} = 1335420 - \frac{1221 \times 13060}{12} = 6565$$
$$SS_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 124581 - \frac{(1221)^2}{12} = 344.25$$

$$b = \frac{SS_{xy}}{SS_{xx}} = \frac{6565}{344.25} = 19.0704$$
$y = a + bx$
$\sum y = \sum a + b\sum x$
$\sum y = n\,a + b\sum x$
$n\,a = \sum y - b\sum x$
$$a = \frac{\sum y - b\sum x}{n} = \frac{\sum y}{n} - \frac{b\sum x}{n} = \frac{13060}{12} - \frac{19.0704 \times 1221}{12} = -852.08$$

Equation for the simple regression line

y = a + bx
y = -852.08 + 19.0704x
for the regression of y on x
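The slope and intercept can be checked with the computational sum-of-squares formulas. The sketch below is a minimal illustration; `fit_line` is a hypothetical helper name, and the data are the advertising and sales figures from these notes.

```python
def fit_line(xs, ys):
    """Least squares line y = a + b*x via SSxy and SSxx."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    ss_xy = sum_xy - sum_x * sum_y / n
    ss_xx = sum_x2 - sum_x ** 2 / n
    b = ss_xy / ss_xx
    a = sum_y / n - b * sum_x / n
    return a, b


advt = [92, 94, 97, 98, 100, 102, 104, 105, 105, 107, 107, 110]
sales = [930, 900, 1020, 990, 1100, 1050, 1150, 1120, 1130, 1200, 1250, 1220]
a, b = fit_line(advt, sales)
fit_at_100 = a + b * 100  # fitted sales for advertising of 100
print(round(a, 2), round(b, 4), round(fit_at_100, 2))
```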

For Testing the Fit

$y_i$ = value of y recorded in the given data
$\bar{y}$ = mean (average) of y
$\hat{y}$ = predicted value from the regression line
Deviation = $(y_i - \bar{y})$ = difference of the actual value of y from the mean
Residual = $(y_i - \hat{y})$ = gap (error, difference) between the actual value of y & the predicted value calculated from the regression line
Deviation of predicted value from mean = $(\hat{y} - \bar{y})$
a = intercept on the y-axis
b = slope of the regression line

Total sum of squares: $SST = \sum (y_i - \bar{y})^2$
Regression sum of squares: $SSR = \sum (\hat{y} - \bar{y})^2$
Error sum of squares: $SSE = \sum (y_i - \hat{y})^2$

$$\text{Coefficient of determination} = r^2 = \frac{SSR}{SST}$$
$$\text{Standard error of estimate} = S_{yx} = \sqrt{\frac{SSE}{n - 2}}$$

In order to determine whether a significant linear relationship exists between the independent variable x and the dependent variable y, we test whether the population slope $\beta$ is zero:
$$t = \frac{b - \beta}{S_b}$$
$$S_b = \text{standard error of } b = \frac{S_{yx}}{\sqrt{SS_{xx}}}$$

H0: Slope of the regression line is zero
H1: Slope of the regression line is not zero

$$S_b = \text{standard error of } b = \frac{S_{yx}}{\sqrt{SS_{xx}}}$$

$$S_{yx} = \text{standard error of estimate} = \sqrt{\frac{SSE}{n - 2}} = \sqrt{\frac{\sum (y_i - \hat{y})^2}{n - 2}} = \sqrt{\frac{13769.21}{12 - 2}} = \sqrt{1376.92} = 37.1068$$
$$SS_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 124581 - \frac{(1221)^2}{12} = 344.25$$

$$S_b = \text{standard error of } b = \frac{S_{yx}}{\sqrt{SS_{xx}}}$$
$$t = \frac{b - \beta}{S_b} = \frac{19.07 - 0}{37.1068 / \sqrt{344.25}} = 9.53$$
As the calculated value of t is more than the table value of t for 12 - 2 = 10 degrees of freedom (2.228 for a two-tailed test at the 5% level), the null hypothesis is rejected.
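The test of the slope can be reproduced from the raw data. The following is a sketch under stated assumptions: the variable names are hypothetical, and the critical value 2.228 (two-tailed, 5% level, 10 degrees of freedom) is hard-coded from a standard t table rather than computed.

```python
from math import sqrt

advt = [92, 94, 97, 98, 100, 102, 104, 105, 105, 107, 107, 110]
sales = [930, 900, 1020, 990, 1100, 1050, 1150, 1120, 1130, 1200, 1250, 1220]

n = len(advt)
sum_x, sum_y = sum(advt), sum(sales)
ss_xx = sum(x * x for x in advt) - sum_x ** 2 / n
ss_xy = sum(x * y for x, y in zip(advt, sales)) - sum_x * sum_y / n
b = ss_xy / ss_xx                 # slope
a = sum_y / n - b * sum_x / n     # intercept

# Error sum of squares from the residuals y - y_hat.
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(advt, sales))
s_yx = sqrt(sse / (n - 2))        # standard error of estimate
s_b = s_yx / sqrt(ss_xx)          # standard error of the slope
t_stat = (b - 0) / s_b            # H0: population slope is zero

T_CRITICAL = 2.228  # assumption: two-tailed 5% table value for 10 degrees of freedom
reject_h0 = abs(t_stat) > T_CRITICAL
print(round(s_yx, 2), round(s_b, 2), reject_h0)
```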

Coefficient of Determination Definition

The Coefficient of Determination, also known as R Squared, is interpreted as the goodness of fit of a regression.
The higher the coefficient of determination, the larger the share of the variance of the dependent variable that is explained by the independent variable.
The coefficient of determination is an overall measure of the usefulness of a regression.
For example, suppose $r^2$ is 0.95. This means that 95% of the variation in the dependent variable is explained by the independent variable. That is a good regression.

The Coefficient of Determination can be calculated as the regression sum of squares, SSR, divided by the total sum of squares, SST:
$$\text{Coefficient of Determination} = r^2 = \frac{SSR}{SST}$$
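For the advertising and sales data in these notes, the ratio can be evaluated as below. This is an illustrative sketch with hypothetical variable names; it also checks that SST splits into SSR plus SSE for the least squares line.

```python
advt = [92, 94, 97, 98, 100, 102, 104, 105, 105, 107, 107, 110]
sales = [930, 900, 1020, 990, 1100, 1050, 1150, 1120, 1130, 1200, 1250, 1220]

n = len(advt)
x_bar = sum(advt) / n
y_bar = sum(sales) / n

# Least squares slope and intercept.
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(advt, sales)) / sum(
    (x - x_bar) ** 2 for x in advt
)
a = y_bar - b * x_bar
fits = [a + b * x for x in advt]  # predicted values y_hat

sst = sum((y - y_bar) ** 2 for y in sales)               # total sum of squares
ssr = sum((f - y_bar) ** 2 for f in fits)                # regression sum of squares
sse = sum((y - f) ** 2 for y, f in zip(sales, fits))     # error sum of squares

r_squared = ssr / sst
print(round(r_squared, 2))
```

A value of about 0.90 would mean that roughly 90% of the variation in sales is explained by advertising expenses in this data set.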
