Sum of the squares of the residuals:

S_r = \sum_{i=1}^{n} [y_i - f(x_i)]^2
Fit a straight line to a set of n data points (x1, y1), (x2, y2), …, (xn, yn).
Equation of the line (regression model) is given by

y = a_0 + a_1 x + e

where:
a_1: slope
a_0: intercept
e: error, or residual, between the model and the measurement
• Ideally, if all the residuals are zero, one has found an equation in which all the points lie on the model.
• Thus, minimizing the residuals is the objective when obtaining the regression coefficients.
The most popular method to minimize the residuals is the least-squares method, where the estimates of the constants of the model are chosen such that the sum of the squared residuals, S_r, is minimized.
The ‘best’ straight line would be the one that minimizes the total error. Several criteria may be used. Examples of some criteria for “best fit” that are inadequate for regression:

min \sum_{i=1}^{n} e_i = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)

min \sum_{i=1}^{n} |e_i| = \sum_{i=1}^{n} |y_i - a_0 - a_1 x_i|

(n = total number of points)

These are inadequate criteria: they do not yield a unique model.

Minimizing the sum of the squared residuals, S_r = \sum_{i=1}^{n} e_i^2, is the best strategy! It yields a unique line for a given set of data.
Using the regression model:

y = a_0 + a_1 x

the slope and intercept producing the best fit can be found using:

a_1 = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left( \sum x_i \right)^2}

a_0 = \frac{\sum y_i - a_1 \sum x_i}{n} = \bar{y} - a_1 \bar{x}
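These formulas translate directly into code; a minimal sketch in Python (the function name `linear_fit` is ours, not from the source):

```python
# Minimal sketch of the least-squares slope/intercept formulas above;
# the function name linear_fit is ours, not from the source.
def linear_fit(x, y):
    """Return (a0, a1) for the model y = a0 + a1*x by least squares."""
    n = len(x)
    sx = sum(x)                                 # sum of x_i
    sy = sum(y)                                 # sum of y_i
    sxy = sum(xi * yi for xi, yi in zip(x, y))  # sum of x_i * y_i
    sxx = sum(xi * xi for xi in x)              # sum of x_i^2
    a1 = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    a0 = (sy - a1 * sx) / n                     # equivalently y_bar - a1 * x_bar
    return a0, a1
```

For data that lie exactly on a line, the formulas recover the line exactly; for noisy data they return the unique least-squares fit.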
Fit the best straight line to the following set of x and y values
using the method of least-squares.
x | 0 | 1 | 2 | 3 | 4 | 5 | 6
y | 2 | 5 | 9 | 15 | 17 | 24 | 25

Solution:

  x_i   y_i   x_i^2   x_i y_i
   0      2      0        0
   1      5      1        5
   2      9      4       18
   3     15      9       45
   4     17     16       68
   5     24     25      120
   6     25     36      150
Σ 21     97     91      406
Knowing the linear equation and using the known values:

a_1 = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left( \sum x_i \right)^2} = \frac{7(406) - (21)(97)}{7(91) - (21)^2} = \frac{2842 - 2037}{637 - 441} = \frac{805}{196} = 4.1071

a_0 = \frac{\sum y_i - a_1 \sum x_i}{n} = \frac{97 - 4.1071(21)}{7} = \frac{97 - 86.2491}{7} = \frac{10.7509}{7} = 1.5357

The least-squares fit is given by: y = 1.5357 + 4.1071x
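The arithmetic above can be cross-checked numerically, for instance with NumPy's `polyfit` (degree 1 performs an ordinary least-squares line fit):

```python
import numpy as np

x = np.array([0, 1, 2, 3, 4, 5, 6], dtype=float)
y = np.array([2, 5, 9, 15, 17, 24, 25], dtype=float)

# polyfit returns coefficients highest power first: [a1, a0]
a1, a0 = np.polyfit(x, y, 1)
print(a0, a1)  # a0 ≈ 1.5357, a1 ≈ 4.1071
```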
For a straight line, the sum of the squares of the estimate residuals is:

S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)^2

The coefficient of determination r^2 compares this with the total sum of squares about the mean, S_t:

S_t = \sum (y_i - \bar{y})^2

r^2 = \frac{S_t - S_r}{S_t}
S_t - S_r quantifies the improvement (or error reduction) due to describing the data in terms of a straight line rather than an average value.
r^2 represents the fraction of the original uncertainty explained by the model.
For a perfect fit, S_r = 0 and r^2 = 1.
If r^2 = 0, then S_t = S_r and there is no improvement over simply picking the mean.
If r^2 < 0, the model is worse than simply picking the mean!
Determine the coefficient of correlation for the linear regression line obtained in Example 1.
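A sketch of this computation using the fit from Example 1 (variable names are ours):

```python
# Sketch of the r^2 computation for the Example 1 data.
x = [0, 1, 2, 3, 4, 5, 6]
y = [2, 5, 9, 15, 17, 24, 25]
a0, a1 = 1.5357, 4.1071  # least-squares fit from Example 1

y_bar = sum(y) / len(y)
St = sum((yi - y_bar) ** 2 for yi in y)                       # spread about the mean
Sr = sum((yi - (a0 + a1 * xi)) ** 2 for xi, yi in zip(x, y))  # spread about the line
r2 = (St - Sr) / St
print(r2)  # ≈ 0.98: the line explains about 98% of the original uncertainty
```

The coefficient of correlation r is then the square root of r2.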
2. Power function:

y = \alpha_2 x^{\beta_2}

where \alpha_2 and \beta_2 are constant coefficients.

Linearized:

\log y = \log \alpha_2 + \beta_2 \log x

3. Saturation-growth-rate:

y = \alpha_3 \frac{x}{\beta_3 + x}

Linearized:

\frac{1}{y} = \frac{1}{\alpha_3} + \frac{\beta_3}{\alpha_3} \cdot \frac{1}{x}

These transformations convert the equations into linear form so that simple linear regression can be used.
Example of nonlinear transformation: in their transformed forms, these models can be fit with linear regression to evaluate the constant coefficients.
Use a power model to fit the following set of data.
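Since the exercise's data set is not reproduced here, the following sketch uses made-up illustrative values to show the procedure: take logarithms, fit a straight line, then back-transform the intercept:

```python
import math

# Made-up illustrative data (not the original exercise's data set).
x = [1, 2, 3, 4, 5]
y = [0.5, 1.7, 3.4, 5.7, 8.4]

# Linearize: log y = log(alpha) + beta * log x, then fit a straight line.
lx = [math.log10(xi) for xi in x]
ly = [math.log10(yi) for yi in y]

n = len(lx)
sx, sy = sum(lx), sum(ly)
sxy = sum(u * v for u, v in zip(lx, ly))
sxx = sum(u * u for u in lx)

beta = (n * sxy - sx * sy) / (n * sxx - sx ** 2)  # exponent of the power model
alpha = 10 ** ((sy - beta * sx) / n)              # back-transform the intercept
```

The fitted model is then y = alpha * x**beta.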
y = a_0 + a_1 x + a_2 x^2 + e

For a second-order polynomial, the best fit would mean minimizing:

S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i - a_2 x_i^2)^2

In general, for an m-th order polynomial, this would mean minimizing:

S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i - a_2 x_i^2 - \cdots - a_m x_i^m)^2
The standard error for fitting an m-th order polynomial to n data points is:

S_{y/x} = \sqrt{\frac{S_r}{n - (m + 1)}}

because the m-th order polynomial has (m + 1) coefficients.
The coefficient of determination r^2 is still found using:

r^2 = \frac{S_t - S_r}{S_t}
To find the constants of the polynomial model, we partially differentiate S_r with respect to each of the unknown coefficients and set the derivatives equal to zero.
\frac{\partial S_r}{\partial a_0} = -2 \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i - a_2 x_i^2 - \cdots - a_m x_i^m) = 0

\frac{\partial S_r}{\partial a_1} = -2 \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i - a_2 x_i^2 - \cdots - a_m x_i^m) \, x_i = 0

\frac{\partial S_r}{\partial a_2} = -2 \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i - a_2 x_i^2 - \cdots - a_m x_i^m) \, x_i^2 = 0

\vdots

\frac{\partial S_r}{\partial a_m} = -2 \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i - a_2 x_i^2 - \cdots - a_m x_i^m) \, x_i^m = 0
In general, these equations in matrix form are given by

\begin{bmatrix}
n & \sum x_i & \sum x_i^2 & \cdots & \sum x_i^m \\
\sum x_i & \sum x_i^2 & \sum x_i^3 & \cdots & \sum x_i^{m+1} \\
\sum x_i^2 & \sum x_i^3 & \sum x_i^4 & \cdots & \sum x_i^{m+2} \\
\vdots & & & & \vdots \\
\sum x_i^m & \sum x_i^{m+1} & \sum x_i^{m+2} & \cdots & \sum x_i^{2m}
\end{bmatrix}
\begin{bmatrix} a_0 \\ a_1 \\ a_2 \\ \vdots \\ a_m \end{bmatrix}
=
\begin{bmatrix} \sum y_i \\ \sum x_i y_i \\ \sum x_i^2 y_i \\ \vdots \\ \sum x_i^m y_i \end{bmatrix}
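This system of normal equations can be assembled and solved directly; a minimal sketch (the helper name `poly_fit_normal` is ours):

```python
import numpy as np

# Sketch: assemble and solve the polynomial normal equations.
# The helper name poly_fit_normal is ours, not from the source.
def poly_fit_normal(x, y, m):
    """Return [a0, a1, ..., am] for the least-squares m-th order polynomial fit."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # A[k, j] = sum(x_i^(k+j)),  b[k] = sum(x_i^k * y_i)
    A = np.array([[np.sum(x ** (k + j)) for j in range(m + 1)]
                  for k in range(m + 1)])
    b = np.array([np.sum((x ** k) * y) for k in range(m + 1)])
    return np.linalg.solve(A, b)
```

For data generated by an exact polynomial, the solver recovers the coefficients exactly (up to round-off); in practice, libraries often use `numpy.polyfit` or QR-based least squares instead, since the normal equations can be ill-conditioned for large m.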
For this example: m = 2; n = 6; \bar{x} = 2.5; \bar{y} = 25.433;
\sum x_i = 15; \sum y_i = 152.6; \sum x_i^2 = 55; \sum x_i^3 = 225; \sum x_i^4 = 979;
\sum x_i y_i = 585.6; \sum x_i^2 y_i = 2488.8

Substituting these sums into the matrix form gives:

\begin{bmatrix} 6 & 15 & 55 \\ 15 & 55 & 225 \\ 55 & 225 & 979 \end{bmatrix}
\begin{bmatrix} a_0 \\ a_1 \\ a_2 \end{bmatrix}
=
\begin{bmatrix} 152.6 \\ 585.6 \\ 2488.8 \end{bmatrix}
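Solving this 3-by-3 system numerically, for instance with NumPy, gives the quadratic fit y = a_0 + a_1 x + a_2 x^2:

```python
import numpy as np

# The 3x3 normal-equation system assembled from the sums above (m = 2, n = 6).
A = np.array([[ 6.0,  15.0,  55.0],
              [15.0,  55.0, 225.0],
              [55.0, 225.0, 979.0]])
b = np.array([152.6, 585.6, 2488.8])

a0, a1, a2 = np.linalg.solve(A, b)
print(a0, a1, a2)  # a0 ≈ 2.4786, a1 ≈ 2.3593, a2 ≈ 1.8607
```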
S_t: sum of the squares of the data residuals about the mean:

S_t = \sum (y_i - \bar{y})^2

S_{y/x}: standard error of the estimate:

S_{y/x} = \sqrt{\frac{S_r}{n - (m + 1)}}

r^2 = \frac{S_t - S_r}{S_t}