The correlation coefficient of X and Y is

r_{xy} = \frac{\operatorname{cov}(X,Y)}{\sigma_x \sigma_y} = \frac{\sum_{i=1}^{n} (x_i - \bar{X})(y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{X})^2 \, \sum_{i=1}^{n} (y_i - \bar{Y})^2}}

An equivalent computational form is

r_{xy} = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{X}\bar{Y}}{\sqrt{\left(\sum_{i=1}^{n} x_i^2 - n\bar{X}^2\right)\left(\sum_{i=1}^{n} y_i^2 - n\bar{Y}^2\right)}}
Result: -1≤rxy≤1.
When rxy=±1, we say that X and Y are perfectly
(positively or negatively) correlated. When
rxy=0, we say that X and Y are uncorrelated.
Notice that rxy=ryx.
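As a quick numerical check, the two equivalent formulas above can be computed side by side; a minimal sketch in Python (the data values here are illustrative, not from the text):

```python
import math

# Illustrative data (not from the text)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]
n = len(x)

xbar = sum(x) / n
ybar = sum(y) / n

# Definition form: sum of cross-deviations over the product of deviation norms
num = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
den = math.sqrt(sum((xi - xbar) ** 2 for xi in x) *
                sum((yi - ybar) ** 2 for yi in y))
r_def = num / den

# Computational form: raw sums corrected by n * mean products
num2 = sum(xi * yi for xi, yi in zip(x, y)) - n * xbar * ybar
den2 = math.sqrt((sum(xi ** 2 for xi in x) - n * xbar ** 2) *
                 (sum(yi ** 2 for yi in y) - n * ybar ** 2))
r_comp = num2 / den2

print(r_def, r_comp)  # both print 0.8: the two forms agree
```

For this data both forms give r_xy = 0.8, and swapping x and y leaves the value unchanged, consistent with r_xy = r_yx.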
Method of Least Squares and Regression
Lines
We assume that for two random variables X and
Y
Y=α+βX+e
where e is a random variable, called the error term.
The above equation is called the linear
regression equation of Y on X. The rv Y is called
the dependent variable and the rv X is called the
independent variable. Here α and β are (usually
unknown) parameters and called the regression
coefficients. The coefficient α is the intercept
term and β is the slope coefficient. Since α and
β are usually unknown, we estimate these
coefficients on the basis of a given set of
observations.
Let (x1,y1),…,(xn,yn) be n paired observations on
two variables X and Y. For estimating the
regression coefficients α and β, we use the
method of least squares.
Let us write
yi=α+βxi+ei , i=1,2,…,n
where ei is the error term corresponding to i-th
observation. In the method of least squares, we
estimate α and β by minimizing the error sum
of squares
\sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \alpha - \beta x_i)^2
with respect to α and β.
To minimize the error sum of squares, we
differentiate it with respect to α and β and set
the derivatives equal to zero. This yields the
following set of equations:
\sum_{i=1}^{n} y_i = n\alpha + \beta \sum_{i=1}^{n} x_i

\sum_{i=1}^{n} x_i y_i = \alpha \sum_{i=1}^{n} x_i + \beta \sum_{i=1}^{n} x_i^2
These equations are called the normal equations.
The resulting estimators, obtained by solving the
above normal equations, are denoted by a and
b_yx and are called the least squares estimators
of α and β respectively.
Result: The least squares estimators a and b_yx of
the regression coefficients α and β are given by
(r is the same as r_xy)

b_{yx} = r \, \frac{\sigma_y}{\sigma_x}, \qquad a = \bar{Y} - b_{yx} \bar{X}
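A minimal sketch verifying this result numerically: the slope obtained by solving the normal equations directly agrees with r·σ_y/σ_x, and the intercept with Ȳ − b_yx·X̄ (the data values are illustrative only):

```python
import math

# Illustrative data (not from the text)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

# Solve the normal equations directly:
#   sum(y)  = n*a + b*sum(x)
#   sum(xy) = a*sum(x) + b*sum(x^2)
Sx, Sy = sum(x), sum(y)
Sxx = sum(xi * xi for xi in x)
Sxy = sum(xi * yi for xi, yi in zip(x, y))
b = (n * Sxy - Sx * Sy) / (n * Sxx - Sx * Sx)
a = (Sy - b * Sx) / n

# Check against b = r * sigma_y / sigma_x and a = Ybar - b * Xbar
sx = math.sqrt(sum((xi - xbar) ** 2 for xi in x) / n)
sy = math.sqrt(sum((yi - ybar) ** 2 for yi in y) / n)
r = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (n * sx * sy)

print(b, r * sy / sx)       # both 0.8
print(a, ybar - b * xbar)   # both 0.6
```

For this data, b_yx = 0.8 and a = 0.6 by both routes, as the result requires.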
[Figure: scatter plot of Series1; x-axis 0–25, y-axis 0–100]
For r = 1 or −1, the two regression lines coincide
(the angle θ between them is 0), and for r = 0 the
two lines are perpendicular to each other.
Result: The point of intersection of the two lines of
regression is (\bar{X}, \bar{Y}).
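This result can be checked numerically: fit the line of Y on X and the line of X on Y by least squares and evaluate each at the corresponding mean. A small sketch with illustrative data:

```python
# Illustrative data (not from the text)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

def ls_line(u, v):
    """Least-squares intercept and slope for regressing v on u."""
    ub, vb = sum(u) / len(u), sum(v) / len(v)
    slope = (sum((ui - ub) * (vi - vb) for ui, vi in zip(u, v)) /
             sum((ui - ub) ** 2 for ui in u))
    return vb - slope * ub, slope

a_yx, b_yx = ls_line(x, y)  # line of Y on X: y = a_yx + b_yx * x
a_xy, b_xy = ls_line(y, x)  # line of X on Y: x = a_xy + b_xy * y

# Both lines pass through (Xbar, Ybar):
print(a_yx + b_yx * xbar, ybar)  # equal
print(a_xy + b_xy * ybar, xbar)  # equal
```

The line of Y on X evaluated at X̄ gives Ȳ, and the line of X on Y evaluated at Ȳ gives X̄, so both lines pass through (X̄, Ȳ).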
Multiple regression: Suppose we have more
than one independent variable, say X_1, X_2,
…, X_p. The multiple regression equation of Y
on X_1, …, X_p is given by

Y = \alpha + \beta_1 X_1 + \dots + \beta_p X_p + e
For a given set of observations, we can fit the
multiple regression equation by using the
method of least squares.
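The normal equations generalize directly: with a design matrix X whose rows are (1, x_1i, …, x_pi), the least squares coefficients solve (XᵀX)β = Xᵀy. A sketch with p = 2 and illustrative data (Y is built from assumed coefficients 1.0, 2.0, 0.5 plus small errors, so the fit should recover roughly those values):

```python
# Fit Y = alpha + beta1*X1 + beta2*X2 by least squares via the normal equations.
# Illustrative data; Y is constructed from known coefficients plus small errors.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
err = [0.1, -0.1, 0.05, -0.05, 0.0, 0.0]
y = [1.0 + 2.0 * a + 0.5 * b + e for a, b, e in zip(x1, x2, err)]

# Design matrix rows: [1, x1, x2]
X = [[1.0, a, b] for a, b in zip(x1, x2)]
p = 3

# Normal equations: (X^T X) beta = X^T y
XtX = [[sum(X[i][r] * X[i][c] for i in range(len(X))) for c in range(p)]
       for r in range(p)]
Xty = [sum(X[i][r] * y[i] for i in range(len(X))) for r in range(p)]

# Solve the 3x3 system by Gaussian elimination with partial pivoting
M = [row[:] + [rhs] for row, rhs in zip(XtX, Xty)]
for col in range(p):
    piv = max(range(col, p), key=lambda r: abs(M[r][col]))
    M[col], M[piv] = M[piv], M[col]
    for r in range(col + 1, p):
        f = M[r][col] / M[col][col]
        for c in range(col, p + 1):
            M[r][c] -= f * M[col][c]
beta = [0.0] * p
for r in range(p - 1, -1, -1):
    beta[r] = (M[r][p] - sum(M[r][c] * beta[c] for c in range(r + 1, p))) / M[r][r]

print(beta)  # close to [1.0, 2.0, 0.5]
```

Because the errors are small, the fitted coefficients land close to the values used to generate Y.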
Spearman’s Rank Correlation Coefficient:
Ex: The rankings given by two judges to the
works of six artists are as follows:
Judge 1: 6 4 2 5 3 1
Judge 2: 2 5 4 1 3 6
Find the correlation between the rankings of two
judges.
If we have pairs of rankings for different units,
then, to find the correlation between the two
rankings, we use the rank correlation coefficient.
The formula for the rank correlation coefficient
is given by
r = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}

where d_i is the difference between the two ranks
assigned to the i-th unit.
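Applying the formula to the judges example above:

```python
# Rankings from the worked example in the text
judge1 = [6, 4, 2, 5, 3, 1]
judge2 = [2, 5, 4, 1, 3, 6]
n = len(judge1)

# d_i is the difference between the two ranks of the i-th artist
d2 = sum((a - b) ** 2 for a, b in zip(judge1, judge2))  # sum of d_i^2
r = 1 - 6 * d2 / (n * (n * n - 1))
print(d2, r)  # 62, approximately -0.771
```

Here Σd_i² = 62 and n = 6, so r = 1 − 372/210 ≈ −0.771: the two judges' rankings are strongly negatively correlated.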