
Curve Fitting, Regression and Correlation
By: Prof. M S Kirkire

Curve fitting
Curve fitting looks for a relationship between two or more variables.
The relationship is expressed in mathematical form by determining an equation connecting the variables.
Example: temperature (x) and pressure (y).
A sample of n observations gives the temperatures $x_1, x_2, \ldots, x_n$ and the corresponding pressures $y_1, y_2, \ldots, y_n$.

Steps in curve fitting


Plot the points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ on a rectangular coordinate system.
The resulting set of points is called a scatter diagram.
The data can often be approximated by a smooth curve; such a curve is called an approximating curve.

Possible relationships between the variables
Linear relationship: $y = a + bx$
Nonlinear relationship, for example parabolic or quadratic: $y = a + bx + cx^2$
No relationship

Regression
The main purpose of curve fitting is to estimate one of the variables (the dependent variable) from the other (the independent variable).
The process of estimation is called regression.
When y is to be estimated from x by means of some equation, the equation is called a regression equation of y on x and the corresponding curve a regression curve of y on x.

Method of least squares


Generally, more than one curve will appear to fit a set of data.
To avoid individual judgment in constructing lines, parabolas, or other approximating curves, it is necessary to define what is meant by a best-fitting line, a best-fitting parabola, and so on.
Consider a possible definition. For a given value of x, say $x_1$, there will be a difference between $y_1$ and the corresponding value determined from the curve C.
This difference is denoted by $d_1$ and is referred to as a deviation, error, or residual.
Similarly, the values $x_2, \ldots, x_n$ give the deviations $d_2, \ldots, d_n$.

A measure of the goodness of fit of the curve C to the set of data is provided by the quantity
$d_1^2 + d_2^2 + \cdots + d_n^2$
Best-fitting curve:
Of all curves approximating a given set of data points, the curve having the property that
$d_1^2 + d_2^2 + \cdots + d_n^2$ = a minimum
is the best-fitting curve.
A curve having this property is called a least squares regression curve. A line having this property is called a least squares line, and a parabola having it is called a least squares parabola.
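The criterion can be tried out numerically. The sketch below is only an illustration: the function name `sum_squared_deviations`, the sample data, and the two candidate lines are assumptions, not taken from the slides. It computes $d_1^2 + \cdots + d_n^2$ for each candidate and keeps the one with the smaller value.

```python
def sum_squared_deviations(xs, ys, curve):
    """Return d1^2 + d2^2 + ... + dn^2 for a candidate curve y = curve(x)."""
    return sum((y - curve(x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical sample data (not from the slides).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]


def line_1(x):
    """Candidate line y = 0 + 2x."""
    return 2.0 * x


def line_2(x):
    """Candidate line y = 1 + 1.5x."""
    return 1.0 + 1.5 * x


score_1 = sum_squared_deviations(xs, ys, line_1)
score_2 = sum_squared_deviations(xs, ys, line_2)

# The candidate with the smaller sum of squared deviations fits better.
best = "line_1" if score_1 < score_2 else "line_2"
print(score_1, score_2, best)
```

Comparing two hand-picked candidates only ranks them; finding the true minimum over all lines requires solving for the coefficients.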

The least squares line


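The least squares line has the equation
$y = a + bx$
The constants a and b are determined by solving the simultaneous equations
$\sum y = an + b \sum x$
$\sum xy = a \sum x + b \sum x^2$
Solving these for a and b gives
$a = \frac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n \sum x^2 - (\sum x)^2}$
$b = \frac{n \sum xy - (\sum x)(\sum y)}{n \sum x^2 - (\sum x)^2}$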
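As a minimal sketch, the two simultaneous equations can be solved directly from the sums $\sum x$, $\sum y$, $\sum x^2$ and $\sum xy$. The function name `least_squares_line` and the sample data are illustrative assumptions.

```python
def least_squares_line(xs, ys):
    """Solve  sum(y)  = a*n      + b*sum(x)
              sum(xy) = a*sum(x) + b*sum(x^2)   for (a, b)."""
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx
    if det == 0:
        raise ValueError("all x values are identical; the line is not determined")
    a = (sy * sxx - sx * sxy) / det
    b = (n * sxy - sx * sy) / det
    return a, b


# Hypothetical data lying exactly on y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
a, b = least_squares_line(xs, ys)
print(a, b)
```

Because the sample points lie exactly on a line, the fit should recover that line's intercept and slope.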

The least squares line in terms of sample variances and covariance
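A common way to write this form, assuming the usual definitions of the sample means $\bar{x}$, $\bar{y}$, the sample variance $s_x^2$ and the sample covariance $s_{xy}$, is $y - \bar{y} = \frac{s_{xy}}{s_x^2}(x - \bar{x})$. The sketch below computes the slope and intercept that way. The function name `line_from_moments` and the data are illustrative, and the divisor n is an assumption (using n - 1 in both moments gives the same slope).

```python
def line_from_moments(xs, ys):
    """Return (a, b) for y = a + b*x using b = s_xy / s_x^2 and a = y_bar - b*x_bar."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample variance of x and sample covariance of x and y (divisor n assumed).
    s_xx = sum((x - x_bar) ** 2 for x in xs) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
    if s_xx == 0:
        raise ValueError("x has zero variance; the slope is not determined")
    b = s_xy / s_xx
    a = y_bar - b * x_bar  # the line passes through (x_bar, y_bar)
    return a, b


# Hypothetical sample data.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 2.0, 4.0]
a, b = line_from_moments(xs, ys)
print(a, b)
```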


The least squares parabola


$y = a + bx + cx^2$
The constants a, b, c can be determined from the following equations:
$\sum y = na + b \sum x + c \sum x^2$
$\sum xy = a \sum x + b \sum x^2 + c \sum x^3$
$\sum x^2 y = a \sum x^2 + b \sum x^3 + c \sum x^4$
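The three equations for a, b, c form a 3x3 linear system in the power sums of x. A minimal sketch, assuming plain Python and a small hand-written Gaussian elimination (the helper names `solve_linear_system` and `least_squares_parabola` and the data are illustrative):

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * unknowns = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("the equations are singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):  # back substitution
        tail = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - tail) / aug[row][row]
    return solution


def least_squares_parabola(xs, ys):
    """Return [a, b, c] for y = a + b*x + c*x^2."""
    def s(p):
        return sum(x ** p for x in xs)            # sum of x^p

    def sy(p):
        return sum((x ** p) * y for x, y in zip(xs, ys))  # sum of x^p * y

    matrix = [[len(xs), s(1), s(2)],
              [s(1), s(2), s(3)],
              [s(2), s(3), s(4)]]
    rhs = [sy(0), sy(1), sy(2)]
    return solve_linear_system(matrix, rhs)


# Hypothetical data lying exactly on y = 1 + 2x + 3x^2.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [1 + 2 * x + 3 * x * x for x in xs]
a, b, c = least_squares_parabola(xs, ys)
print(a, b, c)
```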


Multiple regression
If there is a linear relationship between a dependent variable z and two independent variables x and y, the equation
$z = a + bx + cy$
is called a regression equation of z on x and y.
To find the least squares regression plane, determine a, b, c from
$\sum z = na + b \sum x + c \sum y$
$\sum xz = a \sum x + b \sum x^2 + c \sum xy$
$\sum yz = a \sum y + b \sum xy + c \sum y^2$
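The plane's coefficients come from the same kind of 3x3 system. The sketch below solves it with Cramer's rule, which is adequate for three unknowns. The names `det3` and `least_squares_plane` and the sample data are illustrative assumptions.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def least_squares_plane(xs, ys, zs):
    """Return (a, b, c) for z = a + b*x + c*y."""
    n = len(xs)
    sx, sy, sz = sum(xs), sum(ys), sum(zs)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxz = sum(x * z for x, z in zip(xs, zs))
    syz = sum(y * z for y, z in zip(ys, zs))
    matrix = [[n, sx, sy], [sx, sxx, sxy], [sy, sxy, syy]]
    rhs = [sz, sxz, syz]
    d = det3(matrix)
    if d == 0:
        raise ValueError("the (x, y) points do not determine a unique plane")
    coeffs = []
    for col in range(3):
        # Cramer's rule: replace one column by the right-hand side.
        replaced = [row[:] for row in matrix]
        for r in range(3):
            replaced[r][col] = rhs[r]
        coeffs.append(det3(replaced) / d)
    return tuple(coeffs)


# Hypothetical data lying exactly on z = 1 + 2x + 3y.
xs = [0.0, 1.0, 0.0, 1.0, 2.0]
ys = [0.0, 0.0, 1.0, 1.0, 1.0]
zs = [1 + 2 * x + 3 * y for x, y in zip(xs, ys)]
a, b, c = least_squares_plane(xs, ys, zs)
print(a, b, c)
```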


Standard error of estimate


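The standard error of estimate of y on x is denoted $s_{y.x}$. It measures the scatter of the data about the regression curve.
In the case of a regression line, the estimated value is
$y_{est} = a + bx$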
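A minimal sketch of the computation, assuming the common definition $s_{y.x} = \sqrt{\sum (y - y_{est})^2 / n}$ (some texts divide by n - 2 instead of n). The function name, the data, and the line coefficients passed in are hypothetical.

```python
import math


def standard_error_of_estimate(xs, ys, a, b):
    """Root mean squared deviation of y from y_est = a + b*x (divisor n assumed)."""
    n = len(xs)
    squared = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(squared / n)


# Hypothetical data and hypothetical line coefficients a = 0, b = 2.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]
s_yx = standard_error_of_estimate(xs, ys, a=0.0, b=2.0)
print(s_yx)
```

In practice a and b would be the least squares coefficients for the same data, so that the value reflects the scatter about the fitted line.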


The linear correlation coefficient
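Assuming the usual product-moment definition $r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$, a minimal sketch of the calculation follows. The function name `linear_correlation` and the data are illustrative.

```python
import math


def linear_correlation(xs, ys):
    """Product-moment correlation coefficient r, a value between -1 and 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when x or y is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical sample data.
xs = [0.0, 1.0, 2.0, 3.0]
r_scattered = linear_correlation(xs, [1.0, 3.0, 2.0, 4.0])
r_exact_line = linear_correlation(xs, [1.0, 3.0, 5.0, 7.0])  # points on y = 1 + 2x
print(r_scattered, r_exact_line)
```

Points lying exactly on a rising line give r = 1, while scattered points give a value of smaller magnitude.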


Generalized Correlation Coefficient


Rank Correlation
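Assuming this refers to Spearman's rank correlation, $r_{rank} = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}$, where d is the difference between the ranks of corresponding x and y values, a minimal sketch is given below. It assumes there are no tied values. The function names and the data are illustrative.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n (largest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def rank_correlation(xs, ys):
    """Spearman's formula 1 - 6*sum(d^2) / (n*(n^2 - 1)), valid without ties."""
    n = len(xs)
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Hypothetical sample data.
xs = [10, 20, 30, 40, 50]
ys = [1, 3, 2, 5, 4]
r_rank = rank_correlation(xs, ys)
print(r_rank)
```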


Probability interpretation of regression


Given the joint density function or probability function $f(x, y)$ of two random variables X and Y, a curve with equation $y = g(x)$ having the property that
$E\{[Y - g(X)]^2\}$ = a minimum
is called a least squares regression curve of Y on X.
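A standard result is that, when X and Y have finite variance, the minimizing curve is the conditional mean, $g(x) = E(Y \mid X = x)$. The sketch below checks this on a small discrete example. The joint probability function `joint`, the helper names, and the competing line are hypothetical.

```python
# Hypothetical joint probability function f(x, y) on four points (sums to 1).
joint = {(0, 1): 0.2, (0, 3): 0.2, (1, 2): 0.1, (1, 6): 0.5}


def expected_squared_error(joint, g):
    """E{[Y - g(X)]^2} for a discrete joint probability function."""
    return sum(p * (y - g(x)) ** 2 for (x, y), p in joint.items())


def conditional_mean(joint):
    """Return {x: E(Y | X = x)} for every x with positive probability."""
    totals, weighted = {}, {}
    for (x, y), p in joint.items():
        totals[x] = totals.get(x, 0.0) + p
        weighted[x] = weighted.get(x, 0.0) + p * y
    return {x: weighted[x] / totals[x] for x in totals}


table = conditional_mean(joint)
best_error = expected_squared_error(joint, lambda x: table[x])

# Any other curve, for example the line y = 2 + 3x, does no better.
other_error = expected_squared_error(joint, lambda x: 2 + 3 * x)
print(table, best_error, other_error)
```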

Probability Interpretation of Correlation
