
Linear and nonlinear regression techniques using Excel Solver tool

Spreadsheets by S.K. Dentel 10/06.


These worksheets describe the use of regression for data analysis.
It is assumed that you already know how to use the Excel Regression tool.
Here we will cover:
1) Linear regression using Excel Solver
2) Nonlinear regression using Excel Solver
We will not worry about derivations at this point.
Why are we doing linear regression by two different methods? This is for two reasons:
First, I want you to learn how to use the Solver tool because it is useful for lots of situations when you
cannot get an analytical solution for an unknown.
Second, if we first use the Solver to duplicate another method, we can be sure that it works
before applying it to more difficult problems like nonlinear regression.

So the data below (in yellow) are what we want to analyze. First we graph them to check for outliers or other anomalies.
The graph is shown below, and it has some noise, but nothing that's totally unreasonable.
Now we want to compare these data to models, which means equations that we think will fit the data. We would like to see
which model equation fits the data best. This equation can then be used to predict or upscale the adsorption process,
from the 1-liter experiment to the pond, and to any carbon dose.
Here are three possibilities:
1. Linear equation:      q = mc + b
2. Langmuir equation:    q = qmax KL c / (1 + KL c)
3. Freundlich equation:  q = KF c^(1/n)

Carbon dose    C         C0-C = X    X/M = q
mg/L           mg/L      mg/L        mg/g
0 (control)    0.110
1              0.059     0.051       51.0
2.5            0.015     0.095       38.0
5              0.01      0.100       20.0
10             0.008     0.102       10.2
25             0.001     0.109       4.36
100            0.0006    0.109       1.09
               0                     0.00

[Figure: q (mg/g) versus C (mg/L) for the data above]


1. Now let's try fitting the adsorption data to the linear model, using Solver.
First, we'll do a linear regression on q vs. C. I know, the data don't look linear, but this is to show you
how to do a regression using the SOLVE function. Since you already know how to use Excel's
linear regression capability (if not, go to the last worksheet), you'll be able to check your results.
So our function is q = mC+b. We want the best values for m and b. Here are the steps:
1. Put estimated values for m and b in a couple of cells (shown here in orange).
We'll use these in the above equation for our estimated q values.
2. Put the estimated q's in a new column. We'll call this "q-hat".
3. We'll now create a number that indicates how good our estimated q values are. In another column, compute
q [the measured value] minus q-hat, and SQUARE this. Sum these up: it's the "residual sum of squares" or RSS.
4. Note that if all of our estimates were exactly equal to the data, this RSS would be zero. The better
the fit, the lower RSS is. So if we can readjust m and b to get a better fit, the RSS will go down. The best
values of m and b will minimize the RSS. There is a tool that does this automatically, called SOLVER.
If it's not under the TOOLS menu, you can get it by going to Tools/Add-Ins and adding it.
5. In Solver, set the target cell as the cell where the RSS is calculated. Have it find the MIN value,*
by changing the cells that have the values for m and b. When you hit SOLVE, m and b will be changed
iteratively until the RSS is minimized. The linear regression is done!
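The five steps above can also be sketched outside Excel. The Python below is only an illustration and is not part of the original worksheets: `minimize_by_search` is a hypothetical stand-in for Solver (a simple trial-and-error search, not the algorithm Solver actually uses), and the starting guesses of 500 and 5 are arbitrary assumptions. The q values are recomputed as 1000*(C0-C)/dose, so the last one is 1.094 rather than the rounded 1.09 shown in the table.

```python
# Data from the worksheet: control row excluded, (C = 0, q = 0) row included.
C0 = 0.110                                       # mg/L, control concentration
dose = [1, 2.5, 5, 10, 25, 100]                  # mg/L of carbon
C = [0.059, 0.015, 0.01, 0.008, 0.001, 0.0006]   # mg/L remaining
q = [1000 * (C0 - c) / d for c, d in zip(C, dose)]  # mg/g, q = X/M
C.append(0.0)
q.append(0.0)


def rss(params):
    """Residual sum of squares for the linear model q-hat = m*C + b."""
    m, b = params
    return sum((qi - (m * ci + b)) ** 2 for ci, qi in zip(C, q))


def minimize_by_search(objective, guess, rel_step=0.25, tol=1e-9, max_iter=100000):
    """Nudge one parameter at a time, keep any change that lowers the
    objective, and halve the step sizes when no nudge helps."""
    params = list(guess)
    steps = [rel_step * abs(p) if p else rel_step for p in params]
    best = objective(params)
    for _ in range(max_iter):
        improved = False
        for i in range(len(params)):
            for sign in (1, -1):
                trial = list(params)
                trial[i] += sign * steps[i]
                value = objective(trial)
                if value < best:
                    params, best, improved = trial, value, True
                    break
        if not improved:
            steps = [s / 2 for s in steps]
            if all(s < tol * (1 + abs(p)) for s, p in zip(steps, params)):
                break
    return params, best


# Steps 1 and 5: start from estimated m and b, then minimize the RSS.
(m, b), best_rss = minimize_by_search(rss, [500.0, 5.0])
print(f"m = {m:.3f}, b = {b:.4f}, RSS = {best_rss:.4f}")
```

The printed m and b should land close to the values Solver reports on the next sheet; small differences in the last digits are expected because both are iterative searches.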
Values:
m =
b =

Carbon dose    C         C0-C = X    X/M = q    q-hat        (q-q-hat)^2
mg/L           mg/L      mg/L        mg/g       mg/g         values:
0 (control)    0.110                            (exclude)    (exclude)
1              0.059     0.051       51.0
2.5            0.015     0.095       38.0
5              0.01      0.1         20.0
10             0.008     0.102       10.2
25             0.001     0.109       4.36
100            0.0006    0.1094      1.09
               0                     0.00
                                                RSS = Sum:

Do this yourself by filling in the equations in the empty cells of the table above,
then use SOLVE to minimize RSS.
When done, go to the next sheet.

*Note that when you use "minimum" rather than "equals," you are not really using Solver to Solve,
only to get the lowest value possible. It's still a trial-and-error approach, done by numerical iteration.


Example use of linear and nonlinear regression techniques to find best-fit model
Below is what you should have obtained.
Is this the right result? We can check it using Excel's built-in linear regression capability.
Choose Data/Data Analysis/Regression, choose the q values for the Y-range,
the C values for the X-range. If you did it right, the m and b values will be the same.
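Because a straight-line fit has explicit least-squares formulas, the same check can be sketched without Excel. This is an illustrative Python sketch, not part of the original worksheets; it assumes q is recomputed as 1000*(C0-C)/dose, so the last value is 1.094 rather than the rounded 1.09 shown in the table.

```python
# Data from the worksheet: control row excluded, (C = 0, q = 0) row included.
C0 = 0.110
dose = [1, 2.5, 5, 10, 25, 100]
C = [0.059, 0.015, 0.01, 0.008, 0.001, 0.0006]
q = [1000 * (C0 - c) / d for c, d in zip(C, dose)]
C.append(0.0)
q.append(0.0)

n = len(C)
c_bar = sum(C) / n
q_bar = sum(q) / n

# Explicit least-squares solution for the line q = m*C + b.
s_cq = sum((ci - c_bar) * (qi - q_bar) for ci, qi in zip(C, q))
s_cc = sum((ci - c_bar) ** 2 for ci in C)
m = s_cq / s_cc
b = q_bar - m * c_bar

print(f"m = {m:.4f}, b = {b:.5f}")
```

These should agree with the "X Variable 1" and "Intercept" coefficients in the Summary Output, and with the Solver values to several digits.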
It is always a good policy to GRAPH the model line against the data. This is done in the graph on this sheet.

[Figure: q (mg/g) versus C (mg/L), data and fitted line]

Of course, the built-in linear regression also gives an r value and lots of other information. We can calculate all of that

Values:
m = 839.30751
b = 6.5849737

Carbon dose    C         C0-C = X    X/M = q    q-hat         (q-q-hat)^2    (q-q-bar)^2
mg/L           mg/L      mg/L        mg/g       mg/g          values:        values:
0 (control)    0.110
1              0.059     0.051       51.0       56.1041168    26.052009
2.5            0.015     0.095       38.0       19.1745864    354.3962
5              0.01      0.1         20.0       14.9780488    25.219994
10             0.008     0.102       10.2       13.2994338    9.6064898
25             0.001     0.109       4.36       7.42428121    9.3898193
100            0.0006    0.1094      1.09       7.08855821    35.934728
               0                     0.00       6.5849737     43.361879
                                     q-bar:                   503.96112
                                                              Sum=RSS        Sum=TSS

Exercise: add the equations to compute q-bar, the squared differences between q and q-bar,
and then the sum of these, which is the TSS.
After you have this column and its sum, compute r^2 below. See if it agrees with
the "R Square" value in "Summary Output" below.

r^2 = 1-RSS/TSS =

If it does NOT agree, check your cell entries for correctness. The cell entry for q-bar, the right-most
column and its sum for TSS, and the formula for r^2 should all be the same as on the next sheet.
When done, go to the next sheet.

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.886283
R Square             0.785498
Adjusted R Square    0.742597
Standard Error       10.03953
Observations         7

ANOVA
              df    SS          MS          F             Significance F
Regression    1     1845.483    1845.483    18.3097707    0.0078711
Residual      5     503.9611    100.7922
Total         6     2349.444

                Coefficients    Standard Error    t Stat      P-value       Lower 95%    Upper 95%    Lower 95.0%    Upper 95.0%
Intercept       6.584967        4.612779          1.427549    0.21277735    -5.27254     18.44247     -5.27254       18.44247
X Variable 1    839.308         196.1462          4.278992    0.00787106    335.09896    1343.517     335.099        1343.517
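The q-bar, TSS, and r^2 cells asked for in the exercise can be cross-checked with a short calculation. This Python sketch is illustrative only; it uses the Solver values of m and b reported on this sheet and assumes q is recomputed as 1000*(C0-C)/dose (so the last value is 1.094, not the rounded 1.09).

```python
# Data from the worksheet: control row excluded, (C = 0, q = 0) row included.
C0 = 0.110
dose = [1, 2.5, 5, 10, 25, 100]
C = [0.059, 0.015, 0.01, 0.008, 0.001, 0.0006]
q = [1000 * (C0 - c) / d for c, d in zip(C, dose)]
C.append(0.0)
q.append(0.0)

m, b = 839.30751, 6.5849737          # Solver values from this sheet

q_hat = [m * ci + b for ci in C]     # model estimates
q_bar = sum(q) / len(q)              # average of the measured q values

rss = sum((qi - qh) ** 2 for qi, qh in zip(q, q_hat))   # residual sum of squares
tss = sum((qi - q_bar) ** 2 for qi in q)                # total sum of squares
r_squared = 1 - rss / tss

print(f"q-bar = {q_bar:.3f}, RSS = {rss:.4f}, TSS = {tss:.3f}, r^2 = {r_squared:.6f}")
```

The RSS and TSS should match the "Residual" and "Total" SS entries in the ANOVA table, and r^2 should match "R Square".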

Why did we go to the trouble of using SOLVE to do a linear regression, when Excel has a
built-in capability to do this?
Linear regressions are easy, because there are explicit solutions for m, b, and r. For other cases,
it's not so straightforward. But now that you see how a linear regression works, you can use the same approach, and SOLVER, for ANY equation. If the equation is anything other than the equation of a line, it's known as
NON-linear regression. The method even works for complex models that use more than one
equation.
So let's try a Langmuir isotherm, q = qmax KL c / (1 + KL c).
This is easy. Change the labels for the two fitting parameters to "q max" and "KL."
Rewrite the equation for "q-hat" to use these parameters as in the Langmuir equation.
Use "SOLVE" exactly as before, minimizing the RSS by adjusting q max and KL.
You'll get the best values for these two parameters, and the r^2 value (r^2 = 1-RSS/TSS).
You can also solve by maximizing the cell with the r^2 value - you'll get the same result.
Once again: ALWAYS plot the data and the model on the same graph for visual examination!
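As a rough illustration of the "maximize the r^2 cell" variant, here is a Python sketch that is not part of the original worksheets. `maximize_by_search` is a hypothetical stand-in for Solver (a simple trial-and-error search, not Solver's own algorithm), the starting guesses of 60 and 30 are assumptions, and q is recomputed as 1000*(C0-C)/dose.

```python
# Data from the worksheet: control row excluded, (C = 0, q = 0) row included.
C0 = 0.110
dose = [1, 2.5, 5, 10, 25, 100]
C = [0.059, 0.015, 0.01, 0.008, 0.001, 0.0006]
q = [1000 * (C0 - c) / d for c, d in zip(C, dose)]
C.append(0.0)
q.append(0.0)

q_bar = sum(q) / len(q)
tss = sum((qi - q_bar) ** 2 for qi in q)


def langmuir(c, qmax, KL):
    """Langmuir isotherm: q = qmax*KL*c / (1 + KL*c)."""
    return qmax * KL * c / (1 + KL * c)


def r_squared(params):
    rss = sum((qi - langmuir(ci, *params)) ** 2 for ci, qi in zip(C, q))
    return 1 - rss / tss


def maximize_by_search(score, guess, rel_step=0.25, tol=1e-9, max_iter=100000):
    """Nudge one parameter at a time, keep any change that raises the
    score, and halve the step sizes when no nudge helps."""
    params = list(guess)
    steps = [rel_step * abs(p) if p else rel_step for p in params]
    best = score(params)
    for _ in range(max_iter):
        improved = False
        for i in range(len(params)):
            for sign in (1, -1):
                trial = list(params)
                trial[i] += sign * steps[i]
                value = score(trial)
                if value > best:
                    params, best, improved = trial, value, True
                    break
        if not improved:
            steps = [s / 2 for s in steps]
            if all(s < tol * (1 + abs(p)) for s, p in zip(steps, params)):
                break
    return params, best


(qmax, KL), best_r2 = maximize_by_search(r_squared, [60.0, 30.0])
print(f"qmax = {qmax:.3f}, KL = {KL:.3f}, r^2 = {best_r2:.6f}")
```

Since TSS does not depend on the fitting parameters, maximizing r^2 and minimizing RSS pick out the same qmax and KL.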
Values:
qmax = 73.26604
KL = 42.60169
(A smoother plot is constructed further down on this sheet)

Carbon dose    C         C0-C = X    X/M = q    q-hat       (q-q-hat)^2    (q-q-bar)^2
mg/L           mg/L      mg/L        mg/g       mg/g        values:        values:
0 (control)    0.110
1              0.059     0.051       51.0       52.41332    1.997466       1101.728
2.5            0.015     0.095       38.0       28.56506    89.0181        407.7284
5              0.01      0.1         20.0       21.88794    3.564314       4.806117
10             0.008     0.102       10.2       18.62306    70.94802       57.87732
25             0.001     0.109       4.36       2.99372     1.866722       180.841
100            0.0006    0.1094      1.09       1.826078    0.535938       279.3482
               0                     0.00       0           0              317.1147
                                     q-bar: 17.8            167.9306       2349.444
                                                            Sum=RSS        Sum=TSS

r^2 = 1-RSS/TSS = 0.928523

[Figure: q (mg/g) versus C (mg/L), data and Langmuir model values]

For a smoother model curve, create a column with many more values for the x-axis,
and compute the y-value for each of them:
C        q-hat
0        0
0.002    5.752391
0.004    10.66726
0.006    14.91509
0.008    18.62306
0.01     21.88794
0.012    24.78466
0.014    27.37218
0.016    29.6975
0.018    31.79854
0.02     33.70627
0.022    35.44618
0.024    37.03949
0.026    38.50398
0.028    39.85467
0.03     41.10432
0.032    42.26386
0.034    43.34269
0.036    44.34897
0.038    45.28978
0.04     46.17129
0.042    46.99895
0.044    47.77754
0.046    48.51131
0.048    49.204
0.05     49.85899
0.052    50.47927
0.054    51.06752
0.056    51.62616
0.058    52.15737
0.06     52.66313
0.062    53.14522
0.064    53.60527
0.066    54.04474
0.068    54.465
0.07     54.86728
0.072    55.2527
0.074    55.62231
0.076    55.97705
0.078    56.31781
0.08     56.6454
0.082    56.96056
0.084    57.264
0.086    57.55634
0.088    57.8382
0.09     58.11012
0.092    58.37262
0.094    58.62618
0.096    58.87126
0.098    59.10826
0.1      59.33758

[Figure: q (mg/g) versus C (mg/L), data and smooth Langmuir model curve]
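The smoother-curve column can be generated the same way outside Excel. A minimal Python sketch, not part of the original worksheets, using the fitted qmax and KL reported on this sheet and the same 0.002 mg/L spacing:

```python
qmax, KL = 73.26604, 42.60169   # fitted Langmuir values from this sheet


def langmuir(c):
    """Langmuir isotherm: q = qmax*KL*c / (1 + KL*c)."""
    return qmax * KL * c / (1 + KL * c)


step = 0.002   # mg/L spacing between points on the x-axis
curve = [(round(i * step, 3), langmuir(i * step)) for i in range(51)]  # C = 0 to 0.1

for c, q_hat in curve[:5]:
    print(f"{c:<7} {q_hat:.5f}")
```

Plotting `curve` as a line on the same axes as the measured points gives the smooth model curve.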

So that's it. In general, your steps for optimizing the fit of ANY equation or model
to data go like this:
1. Tabulate and graph the data.
2. Choose the equation you want to use, and place estimated values for the equation parameters in cells.
3. Add a column to your table with computed y values ("y-hat") using the x's, the equation, and the fitting parameters.
4. Add a column with computed squares of residuals (y - y-hat)^2, and sum these to get the residual sum of squares, RSS.
5. Add a cell to compute the average y (y-bar), and use it in a new column to compute (y - y-bar)^2.
6. Sum these to get the total sum of squares, TSS, and use it with the RSS to compute r^2 = 1 - RSS/TSS.
7. Use SOLVE to find the parameter values that maximize r^2 or minimize the RSS. (This is why it's called a LEAST SQUARES method.)
8. You're not done until you PLOT the model equation along with the data for visual comparison.
If you do the above steps for more than one model equation (such as a comparison of the Langmuir and
Freundlich equations), the model with the highest r^2 is the best fit. (This is NOT the case if using the
more conventional method of plotting linearized equations such as reciprocal or log-log plots
to get the fitting parameters, because this method gives r^2 values for the linearized plot, not the original
plot, so they're not comparisons on the same basis.)
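The steps above can be sketched as one reusable routine. This Python is an illustration rather than part of the original worksheets: `fit` and `minimize_by_search` are hypothetical names, the search is a simple trial-and-error stand-in for Solver, and the starting guesses are assumptions. Step 8 (plotting) is left out, and q is recomputed as 1000*(C0-C)/dose.

```python
# Step 1: tabulate the data (control row excluded, (0, 0) row included).
C0 = 0.110
dose = [1, 2.5, 5, 10, 25, 100]
C = [0.059, 0.015, 0.01, 0.008, 0.001, 0.0006]
q = [1000 * (C0 - c) / d for c, d in zip(C, dose)]
C.append(0.0)
q.append(0.0)


# Step 2: candidate model equations.
def langmuir(c, qmax, KL):
    return qmax * KL * c / (1 + KL * c)


def freundlich(c, KF, inv_n):
    return KF * c ** inv_n if c > 0 else 0.0


def minimize_by_search(objective, guess, rel_step=0.25, tol=1e-9, max_iter=100000):
    """Nudge one parameter at a time, keep any change that lowers the
    objective, and halve the step sizes when no nudge helps."""
    params = list(guess)
    steps = [rel_step * abs(p) if p else rel_step for p in params]
    best = objective(params)
    for _ in range(max_iter):
        improved = False
        for i in range(len(params)):
            for sign in (1, -1):
                trial = list(params)
                trial[i] += sign * steps[i]
                value = objective(trial)
                if value < best:
                    params, best, improved = trial, value, True
                    break
        if not improved:
            steps = [s / 2 for s in steps]
            if all(s < tol * (1 + abs(p)) for s, p in zip(steps, params)):
                break
    return params, best


def fit(model, guess):
    """Steps 3-7: y-hat, RSS, y-bar, TSS, r^2, and the least-squares search."""
    def rss(params):
        return sum((qi - model(ci, *params)) ** 2 for ci, qi in zip(C, q))

    params, best_rss = minimize_by_search(rss, guess)
    q_bar = sum(q) / len(q)
    tss = sum((qi - q_bar) ** 2 for qi in q)
    return params, 1 - best_rss / tss


results = {
    "Langmuir": fit(langmuir, [60.0, 30.0]),
    "Freundlich": fit(freundlich, [200.0, 0.5]),
}
best_model = max(results, key=lambda name: results[name][1])
for name, (params, r2) in results.items():
    print(name, [round(p, 4) for p in params], round(r2, 4))
print("Highest r^2:", best_model)
```

Because both models are fitted against the original (untransformed) q values, their r^2 values are on the same basis and can be compared directly.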

Below is a fit for the Freundlich isotherm, and a plot showing it and the Langmuir.
The Langmuir is a better fit.
Values:
1/n = 0.535729
KF = 243.3178

Carbon dose    C         C0-C = X    X/M = q    q-hat           (q-q-hat)^2    (q-q-bar)^2
mg/L           mg/L      mg/L        mg/g       mg/g            values:        values:
0 (control)    0.110
1              0.059     0.051       51.0       53.417428052    5.843958       1101.728
2.5            0.015     0.095       38.0       25.647917053    152.574        407.7284
5              0.01      0.1         20.0       20.640243689    0.409912       4.806117
10             0.008     0.102       10.2       18.314592872    65.84662       57.87732
25             0.001     0.109       4.36       6.0115343475    2.727566       180.841
100            0.0006    0.1094      1.09       4.5722968212    12.09855       279.3482
               0                     0.00       0               0              317.1147
                                     q-bar: 17.8                239.5006       2349.444
                                                                Sum=RSS        Sum=TSS

r^2 = 1-RSS/TSS = 0.898061

Smoother model curve:
C        q-hat
0        0
0.002    8.7147709726
0.004    12.63358549
0.006    15.69870771
0.008    18.314592872
0.01     20.640243689
0.012    22.758023884
0.014    24.717236793
0.016    26.55020717
0.018    28.279506659
0.02     29.921644987
0.022    31.489136234
0.024    32.991737961
0.026    34.437235423
0.028    35.831959908
0.03     37.181143802
0.032    38.489171215
0.034    39.759759468
0.036    40.996093428
0.038    42.200926875
0.04     43.376660285
0.042    44.525401416
0.044    45.649013138
0.046    46.749151637
0.048    47.827297275
0.05     48.884779747
0.052    49.922798789
0.054    50.942441358
0.056    51.944695988
0.058    52.930464878
0.06     53.90057413
0.062    54.855782476
0.064    55.796788752
0.066    56.724238344
0.068    57.63872876
0.07     58.540814478
0.072    59.431011178
0.074    60.309799456
0.076    61.177628089
0.078    62.034916917
0.08     62.882059403
0.082    63.719424901
0.084    64.547360686
0.086    65.366193763
0.088    66.176232493
0.09     66.977768055
0.092    67.771075758
0.094    68.556416236
0.096    69.334036521
0.098    70.104171021
0.1      70.867042408

[Figure: q (mg/g) versus C (mg/L), showing Data, Langmuir, and Freundlich]


The "Langmuir-Freundlich" isotherm has the form

q = qmax (KL c)^n / (1 + (KL c)^n)

For the data, evaluate this model.
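As a starting point for this exercise, the model can be written as a function with three fitting parameters. The Python sketch below is illustrative only: it assumes the exponent n applies to (KL c) in both the numerator and the denominator, and it only checks that setting n = 1 reproduces the Langmuir q-hat from the earlier sheet. It does not carry out the fit.

```python
def langmuir_freundlich(c, qmax, KL, n):
    """Langmuir-Freundlich isotherm: q = qmax*(KL*c)^n / (1 + (KL*c)^n)."""
    term = (KL * c) ** n
    return qmax * term / (1 + term)


# With n = 1 the model reduces to the Langmuir equation, so the Langmuir
# values fitted earlier (qmax = 73.26604, KL = 42.60169) give the same q-hat.
q_hat_check = langmuir_freundlich(0.059, 73.26604, 42.60169, 1.0)
print(round(q_hat_check, 4))
```

To evaluate the model against the data, use this function as the q-hat equation, add an estimated value for n alongside qmax and KL, and minimize the RSS as before.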
