
Dr. Siti Mariam binti Abdul Rahman
Faculty of Mechanical Engineering
Office: T1-A14-01C
e-mail: mariam4528@salam.uitm.edu.my
 To fit curves to data using available techniques.
 To assess the reliability of the answers obtained.
 To choose the preferred method for any particular problem.
 To study different techniques for fitting curves or approximating functions to a set of discrete data, and to manipulate these approximating functions.
 Least-squares regression: get the ‘best’ straight line to fit through a set of uncertain data points.
 Interpolation: estimate intermediate values between precise data points by deriving polynomials in equation form. Three methods are investigated:
(a) Newton’s interpolating polynomial,
(b) the Lagrange interpolating polynomial, and
(c) spline interpolation (fitting the data in a piecewise fashion).
 Typical data
◦ is discrete, but we are interested in the intermediate values
◦ these intermediate values therefore need to be estimated
 Curve fitting:
◦ finding a curve (an approximation) that best fits a series of discrete data points
◦ the curve is an estimate of the trend of the dependent variable
◦ the curve can then be used to determine intermediate estimates of the data
 How to draw the curve?
◦ define the function of the curve – it can be linear or non-linear
 Approaches for curve fitting:
1. Least-squares regression
 Data has significant error or noise
 The curve does not pass through all the data points – it represents the general trend of the data
2. Interpolation
 Data is known to be precise
 The curve passes through every data point
 Regression?
◦ modelling of the relationship between dependent and independent variables
◦ finding a curve that represents the best approximation of a series of data points
◦ the curve is an estimate of the trend of the dependent variable

 How to find the curve?
◦ by deriving the function of the curve
◦ the function can be linear, polynomial or exponential
 Given n data points (x_1, y_1), (x_2, y_2), …, (x_n, y_n), best-fit y = f(x) to the data set. The best fit is generally based on minimizing the sum of the squares of the residuals, S_r.
 Regression model:

  y_p = f(x)

 Residual at any point:

  e_i = y_i - f(x_i)

 Sum of the squares of the residuals:

  S_r = \sum_{i=1}^{n} \left( y_i - f(x_i) \right)^2
 Fit a straight line to a set of n data points (x_1, y_1), (x_2, y_2), …, (x_n, y_n).
 The equation of the line (regression model) is given by

  y = a_0 + a_1 x + e

 a_1 – slope
 a_0 – intercept
 e – error, or residual, between the model and the measurement

• Ideally, if all the residuals were zero, we would have found an equation in which all the points lie on the model.
• Thus, minimizing the residuals is the objective when obtaining the regression coefficients.
 The most popular way to minimize the residuals is the least-squares method, where the estimates of the constants of the model are chosen such that the sum of the squared residuals, S_r, is minimized.

 The ‘best’ straight line would be the one that minimizes the total error. Several criteria may be used, e.g.

  \min \sum_{i=1}^{n} e_i = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)

or

  \min \sum_{i=1}^{n} |e_i| = \sum_{i=1}^{n} |y_i - a_0 - a_1 x_i|

where n = total number of points.
 These are inadequate criteria → no unique model.

 Examples of some criteria for “best fit” that are inadequate for regression:
a) minimizing the sum of the residuals,
b) minimizing the sum of the absolute values of the residuals, and
c) minimizing the maximum error of any individual point.

 However, a more practical criterion for the least-squares approach is to minimize the sum of the squares of the residuals, that is

  S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)^2

 Best strategy! → Yields a unique line for a given set of data.
 Using the regression model:

  y = a_0 + a_1 x

 the slope and intercept producing the best fit can be found using:

  a_1 = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left( \sum x_i \right)^2}

  a_0 = \frac{\sum y_i - a_1 \sum x_i}{n} = \bar{y} - a_1 \bar{x}
Fit the best straight line to the following set of x and y values using the method of least squares.

  x: 0  1  2  3  4  5  6
  y: 2  5  9  15 17 24 25

Solution:

  x_i    y_i    x_i^2    x_i y_i
  0      2      0        0
  1      5      1        5
  2      9      4        18
  3      15     9        45
  4      17     16       68
  5      24     25       120
  6      25     36       150
  Σ 21   97     91       406
Knowing the linear equation and using the known values:

  a_1 = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left( \sum x_i \right)^2}
      = \frac{7(406) - (21)(97)}{7(91) - 21^2}
      = \frac{2842 - 2037}{637 - 441}
      = \frac{805}{196}
      = 4.1071

  a_0 = \frac{\sum y_i - a_1 \sum x_i}{n}
      = \frac{97 - 4.1071(21)}{7}
      = \frac{97 - 86.2491}{7}
      = \frac{10.7509}{7}
      = 1.5357

The least-squares fit is given by: y = 1.5357 + 4.1071x
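As a quick check (assuming the hypothetical linear_least_squares sketch shown earlier), these coefficients can be reproduced in a few lines:

```python
x = [0, 1, 2, 3, 4, 5, 6]
y = [2, 5, 9, 15, 17, 24, 25]
a0, a1 = linear_least_squares(x, y)
print(a0, a1)  # expected: approximately 1.5357 and 4.1071
```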


 For a straight line, the sum of the squares of the estimate residuals is:

  S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)^2

• Quantifies the spread of the data around the regression line
• Used to quantify the ‘goodness’ of a fit

 Standard error of the estimate:

  S_y = \sqrt{\frac{S_t}{n - 1}},   where   S_t = \sum (y_i - \bar{y})^2

  S_{y/x} = \sqrt{\frac{S_r}{n - 2}},   where   S_r = \sum (y_i - a_0 - a_1 x_i)^2
 The standard error of the estimate, S_{y/x}, quantifies the spread of the data around the regression line.
 Regression data show
◦ the spread of the data around the mean of the dependent data, S_y
◦ the spread of the data around the best-fit line, S_{y/x}
◦ The reduction in spread represents the improvement due to linear regression.
 The coefficient of determination, r^2, indicates how well the regression line represents the real data.
 It represents the fraction of the original uncertainty in the data explained by the line of best fit.
 r^2 is the difference between the sum of the squares of the data residuals, S_t, and the sum of the squares of the estimate residuals, S_r, normalized by the sum of the squares of the data residuals:

  r^2 = \frac{S_t - S_r}{S_t},   where   S_t = \sum (y_i - \bar{y})^2   and   S_r = \sum (y_i - a_0 - a_1 x_i)^2

 S_t - S_r quantifies the improvement (or error reduction) due to describing the data in terms of a straight line rather than an average value.
 For a perfect fit, S_r = 0 and r^2 = 1.
 If r^2 = 0 → S_t = S_r, and there is no improvement over simply picking the mean.
 If r^2 < 0, the model is worse than simply picking the mean!
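A sketch of these goodness-of-fit statistics in Python (fit_statistics is a hypothetical helper; on the data of Example 1 it should reproduce the numbers in the example below):

```python
from math import sqrt

def fit_statistics(x, y, a0, a1):
    """St, Sr, Sy, Sy/x and r^2 for the straight-line fit y = a0 + a1*x."""
    n = len(x)
    ybar = sum(y) / n
    St = sum((yi - ybar) ** 2 for yi in y)                      # spread about the mean
    Sr = sum((yi - a0 - a1 * xi) ** 2 for xi, yi in zip(x, y))  # spread about the line
    Sy = sqrt(St / (n - 1))    # standard deviation of the data
    Syx = sqrt(Sr / (n - 2))   # standard error of the estimate
    r2 = (St - Sr) / St        # coefficient of determination
    return St, Sr, Sy, Syx, r2
```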
 Determine the coefficient of determination for the linear regression line obtained in Example 1.

 Solution: using the fitted line F_est = 1.5357 + 4.1071 x_i on the data of Example 1,

  S_t = \sum (y_i - \bar{y})^2 = 480.8571

  S_r = \sum (y_i - a_0 - a_1 x_i)^2 = 8.5357

  S_y = \sqrt{\frac{480.8571}{7 - 1}} = 8.9523

  S_{y/x} = \sqrt{\frac{8.5357}{7 - 2}} = 1.3066

  r^2 = \frac{480.8571 - 8.5357}{480.8571} = 0.9822

 98.22% of the original uncertainty has been explained by the linear model.
 How good is the model?
 Check for adequacy:
◦ standard error of the estimate, S_{y/x}
◦ coefficient of determination, r^2
◦ plot the graph and check visually
Examples of functions that can be linearized are:
1. Exponential function: y = \alpha_1 e^{\beta_1 x}, where \alpha_1 and \beta_1 are constant coefficients.
   Linearized: \ln y = \ln \alpha_1 + \beta_1 x
2. Power function: y = \alpha_2 x^{\beta_2}, where \alpha_2 and \beta_2 are constant coefficients.
   Linearized: \log y = \log \alpha_2 + \beta_2 \log x
3. Saturation-growth-rate function: y = \alpha_3 \frac{x}{\beta_3 + x}
   Linearized: \frac{1}{y} = \frac{1}{\alpha_3} + \frac{\beta_3}{\alpha_3} \frac{1}{x}
 These transformations put the equations into linear form so that simple linear regression can be used.
 Example of nonlinear transformations: in their transformed forms, these models can be fitted with linear regression to evaluate the constant coefficients.
 Use a power model to fit the following set of data.

  y = \alpha_2 x^{\beta_2}   →   \log y = \log \alpha_2 + \beta_2 \log x

  x_i    y_i    log x_i    log y_i
  1      0.5    0.000      -0.301
  2      1.7    0.301      0.226
  3      3.4    0.477      0.534
  4      5.7    0.602      0.753
  5      8.4    0.699      0.922

 Use the linearized plot to determine the coefficients of the power equation.
◦ Linear regression gives log y = 1.75 log x – 0.300
◦ Hence \beta_2 = 1.75 and \alpha_2 = 10^{-0.300} ≈ 0.5, so the power model is y = 0.5 x^{1.75}.
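The same fit can be reproduced in code by transforming the data to log space and reusing the hypothetical linear_least_squares sketch from earlier:

```python
from math import log10

x = [1, 2, 3, 4, 5]
y = [0.5, 1.7, 3.4, 5.7, 8.4]

# Fit a straight line in log-log space, then back-transform the intercept.
b0, b1 = linear_least_squares([log10(v) for v in x],
                              [log10(v) for v in y])
alpha2, beta2 = 10 ** b0, b1
print(alpha2, beta2)  # roughly 0.5 and 1.75, i.e. y = 0.5 * x**1.75
```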
 Not all equations can be easily transformed!
◦ Alternative method → nonlinear regression
 The linear least-squares regression procedure can be readily extended to fit data to a higher-order polynomial.
 Again, the idea is to minimize the sum of the squares of the estimate residuals.
 The figure shows the same data fit with:
a) a first-order polynomial
b) a second-order polynomial
 For second-order polynomial regression:

  y = a_0 + a_1 x + a_2 x^2 + e
 For a second-order polynomial, the best fit would mean minimizing:

  S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i - a_2 x_i^2)^2

 In general, this would mean minimizing:

  S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i - a_2 x_i^2 - \cdots - a_m x_i^m)^2

 The standard error for fitting an mth-order polynomial to n data points is:

  S_{y/x} = \sqrt{\frac{S_r}{n - (m + 1)}}

because the mth-order polynomial has (m + 1) coefficients.
 The coefficient of determination, r^2, is still found using:

  r^2 = \frac{S_t - S_r}{S_t}
 To find the constants of the polynomial model, we partially differentiate S_r with respect to each of the unknown coefficients and set the results equal to zero:

  \frac{\partial S_r}{\partial a_0} = -2 \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i - a_2 x_i^2 - \cdots - a_m x_i^m) = 0

  \frac{\partial S_r}{\partial a_1} = -2 \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i - a_2 x_i^2 - \cdots - a_m x_i^m) x_i = 0

  \frac{\partial S_r}{\partial a_2} = -2 \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i - a_2 x_i^2 - \cdots - a_m x_i^m) x_i^2 = 0

  \vdots

  \frac{\partial S_r}{\partial a_m} = -2 \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i - a_2 x_i^2 - \cdots - a_m x_i^m) x_i^m = 0
 In general, these equations can be written in matrix form as

  \begin{bmatrix}
  n            & \sum x_i       & \sum x_i^2     & \cdots & \sum x_i^m     \\
  \sum x_i     & \sum x_i^2     & \sum x_i^3     & \cdots & \sum x_i^{m+1} \\
  \sum x_i^2   & \sum x_i^3     & \sum x_i^4     & \cdots & \sum x_i^{m+2} \\
  \vdots       & \vdots         & \vdots         & \ddots & \vdots         \\
  \sum x_i^m   & \sum x_i^{m+1} & \sum x_i^{m+2} & \cdots & \sum x_i^{2m}
  \end{bmatrix}
  \begin{bmatrix} a_0 \\ a_1 \\ a_2 \\ \vdots \\ a_m \end{bmatrix}
  =
  \begin{bmatrix} \sum y_i \\ \sum x_i y_i \\ \sum x_i^2 y_i \\ \vdots \\ \sum x_i^m y_i \end{bmatrix}

 The above equations are then solved for a_0, a_1, …, a_m.
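These normal equations can be built and solved directly. A minimal NumPy sketch (poly_least_squares is a hypothetical name):

```python
import numpy as np

def poly_least_squares(x, y, m):
    """Fit an m-th order polynomial by solving the normal equations above.

    Returns the coefficients [a0, a1, ..., am].
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    # Element (i, j) of the coefficient matrix is sum of x**(i + j).
    A = np.array([[np.sum(x ** (i + j)) for j in range(m + 1)]
                  for i in range(m + 1)])
    # Element i of the right-hand side is sum of (x**i) * y.
    b = np.array([np.sum(x ** i * y) for i in range(m + 1)])
    return np.linalg.solve(A, b)
```

Note that the normal equations can become ill-conditioned for large m; in practice NumPy's np.polyfit is the more robust choice.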


 Fit a second-order polynomial to the data, given:

  m = 2;   n = 6;   \bar{x} = 2.5;   \bar{y} = 25.433
  \sum x_i = 15;   \sum y_i = 152.6;   \sum x_i^2 = 55;   \sum x_i^3 = 225;   \sum x_i^4 = 979
  \sum x_i y_i = 585.6;   \sum x_i^2 y_i = 2488.8

 Substituting these values into the matrix form of the normal equations gives

  \begin{bmatrix} 6 & 15 & 55 \\ 15 & 55 & 225 \\ 55 & 225 & 979 \end{bmatrix}
  \begin{bmatrix} a_0 \\ a_1 \\ a_2 \end{bmatrix}
  =
  \begin{bmatrix} 152.6 \\ 585.6 \\ 2488.8 \end{bmatrix}

 Solving gives a_0 = 2.47857, a_1 = 2.35929 and a_2 = 1.86071, so

  y = 2.47857 + 2.35929x + 1.86071x^2
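For verification, this 3×3 system can be solved with NumPy (a sketch, not part of the slides):

```python
import numpy as np

A = np.array([[ 6.0,  15.0,  55.0],
              [15.0,  55.0, 225.0],
              [55.0, 225.0, 979.0]])
b = np.array([152.6, 585.6, 2488.8])
print(np.linalg.solve(A, b))  # approximately [2.47857, 2.35929, 1.86071]
```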
 S_r: sum of the squares of the estimate residuals

  S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i - a_2 x_i^2)^2

 S_t: sum of the squares of the data residuals

  S_t = \sum (y_i - \bar{y})^2

 S_{y/x}: standard error of the estimate

  S_{y/x} = \sqrt{\frac{S_r}{n - (m + 1)}}

 r^2: coefficient of determination

  r^2 = \frac{S_t - S_r}{S_t}
