Ch04le 1

Outline/Harris,2e
Chapter 4 - Statistics
We have already seen in Chapter 3 that all laboratory measurements have errors.
The two types of errors are
o Determinate errors
o Random errors
Determinate errors: systematic errors that cause a measurement to always be too high or always too low, and which can be traced to an identifiable source.
Examples include
o Use of an uncalibrated or faulty tool or instrument
o Use of wrong values, such as molar mass, conversion factor, etc.
A good way to detect the existence of determinate errors is to analyze the same material by different methods.
Random errors: errors that are random in nature. They occur even when a calibrated instrument is correctly used to its most sensitive degree of measurement. For example, using the analytical balance (sensitivity of 0.1 mg), you see variations in the last digit when re-weighing the same object. In this chapter we will focus our attention on ways to evaluate random errors.
o All measurements have some random error. No measurement is free of experimental error. Statistics allows us to accept conclusions that have a high probability of being correct and to reject conclusions that have a low probability.
o Statistics applies only to random errors; the analyst should eliminate all determinate errors before making sensitive measurements.
Random errors follow a Gaussian distribution of values about the central measurement.
A Gaussian distribution is characterized by the mean value and the standard deviation.
Mean value or average is a measure of central tendency:
$$x_{\mathrm{mean}} = \frac{\sum_i x_i}{n}$$
where $i$ indexes each individual measurement $x_i$, $\sum$ means the summation, and $n$ is the number of measurements in that set of data.
Standard deviation is the measure of the width of the distribution about the central value:
$$s = \sqrt{\frac{\sum_i (x_i - x_{\mathrm{mean}})^2}{n - 1}}$$
The standard deviation defined above is for a limited or small set of data; for a large set of data the standard deviation is indicated by $\sigma$ and is defined as
$$\sigma = \sqrt{\frac{\sum_i (x_i - x_{\mathrm{mean}})^2}{n}}$$
As the size of the data set increases, $(n - 1) \to n$, so $s \to \sigma$.
Ordinarily analytical chemists will use the first value (s) for the standard deviation, since we will typically deal with a small population or small data set.
The larger the value of s, the broader is the Gaussian curve.
The relative standard deviation is the standard deviation divided by the mean value, that is, s / xmean.
The relative standard deviation may be expressed in % (parts per hundred) or ppt (parts per thousand):
Relative standard deviation (%) = s x 100 / xmean
Relative standard deviation (ppt) = s x 1000 / xmean

Other important terms
Median: the middle value in an ordered set (ascending or descending); when n is an even number, it is the average of the 2 middle values.
Range: the difference between the highest and lowest values in the set of data. It may be stated as (High - Low) or as that value.
For example, in a set of data where 25.11 is the highest and 24.85 the lowest, we could describe the range as (25.11 - 24.85) or 0.26.
Find the mean, median, standard deviation,
relative standard deviation and the range of the
following set of student data acquired in the
analysis of chloride in a sample:
xi
18.56%
18.65%
18.49%
18.54%
18.70%
18.53%
The sum of the individual values = 111.47, and there were 6 measurements.
The mean (xmean) = 111.47 / 6 = 18.578 = 18.58
The quantity (xi - xmean) is calculated next:
xi        (xi - xmean)    (xi - xmean)^2
18.56%    -0.02           0.0004
18.65%     0.07           0.0049
18.49%    -0.09           0.0081
18.54%    -0.04           0.0016
18.70%     0.12           0.0144
18.53%    -0.05           0.0025
The sum of the deviations squared, $\sum_i (x_i - x_{\mathrm{mean}})^2$, = 0.0319
0.0319 / (6 - 1) = 0.00638
$s = \sqrt{0.00638} = 0.0799$ = 0.08, or 0.080 to use the author's method.
Note that s is reported to the same number of decimal places as the data.
The relative standard deviation = s x 100 / xmean = 0.0799 x 100 / 18.578 = 0.430
This could be reported as 0.43% or 4.3 ppt.
To find the median, first arrange the data in order (ascending or descending); I choose descending:
xi    18.70   18.65   18.56   18.54   18.53   18.49
#     1       2       3       4       5       6
Since n = 6, the median is between ordered values #3 and #4, so the median = (18.56 + 18.54) / 2 = 18.55
The range = (18.70 - 18.49) = 0.21
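The hand calculation above can be cross-checked with a few lines of Python. This is only a sketch using the standard-library `statistics` module; the variable names are illustrative and not part of the lecture. Note that `statistics.stdev` divides by (n - 1), which matches the definition of s used here.

```python
import statistics

# Chloride results (%) from the worked example
data = [18.56, 18.65, 18.49, 18.54, 18.70, 18.53]

mean = statistics.mean(data)
median = statistics.median(data)      # average of the two middle values when n is even
s = statistics.stdev(data)            # sample standard deviation, divides by (n - 1)
rsd_percent = 100 * s / mean          # relative standard deviation in %
data_range = max(data) - min(data)    # highest minus lowest value

print(f"mean   = {mean:.3f}")
print(f"median = {median:.2f}")
print(f"s      = {s:.4f}")
print(f"RSD    = {rsd_percent:.2f} %  ({10 * rsd_percent:.1f} ppt)")
print(f"range  = {data_range:.2f}")
```

Because the code does not round the mean before taking deviations, the last digit of s may differ slightly from a hand calculation that uses 18.58.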
For the ideal Gaussian distribution, 68.3% of the measurements lie within $\pm 1\sigma$ (one standard deviation) of the mean value, 95.5% within $\pm 2\sigma$, and 99.7% within $\pm 3\sigma$.
This means that for real data of a small population we can expect only about 4.5% of the measurements to fall outside the $\pm 2s$ limits and only about 0.3% outside the $\pm 3s$ limits from the mean value.
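The percentages quoted for the ideal Gaussian distribution can be reproduced from the error function. The sketch below assumes the standard result that the fraction of a Gaussian population within k standard deviations of the mean is erf(k / sqrt(2)); the function name is illustrative.

```python
import math

def fraction_within(k):
    """Fraction of an ideal Gaussian population within +/- k standard deviations."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    inside = 100 * fraction_within(k)
    print(f"within +/-{k} sigma: {inside:.2f} %   outside: {100 - inside:.2f} %")
```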
Student's t Test
Student's t test was developed by W. S. Gosset, who used the pseudonym "Student" to publish this statistical test in 1908. It is used to express confidence intervals for a set of data and to statistically compare the results of different experiments.
The true mean is denoted as $\mu$. From a small number of data points it is not possible to determine either $\mu$ or $\sigma$. Instead, we have xmean and s. We would like to be able to state the probability that the true value is within some quantity of xmean. The confidence interval does this in the form
$$\mu = x_{\mathrm{mean}} \pm \frac{t\,s}{\sqrt{n}}$$
and may be stated at a certain probability such as 90%, 95%, or 99%. The values of t for various degrees of freedom and confidence levels are shown in Table 4-2, page 78 of your textbook.
Let's go back to the % chloride data (18.56%, 18.65%, 18.49%, 18.54%, 18.70%, 18.53%; xmean = 18.58, s = 0.08) and calculate the 50%, 90%, 95% and 99% confidence intervals for the results.
At the 50% CI, $\mu = 18.58 \pm (0.727)(0.080)/\sqrt{6} = 18.58 \pm 0.024 = 18.58 \pm 0.02$. Note that the value for t is at the intersection of the 50% column and the row for number of degrees of freedom = 5 (that is, n - 1).
Now repeating the calculation with the appropriate values of t:
At the 90% CI, $\mu = 18.58 \pm (2.015)(0.080)/\sqrt{6} = 18.58 \pm 0.066 = 18.58 \pm 0.07$
At the 95% CI, $\mu = 18.58 \pm (2.571)(0.080)/\sqrt{6} = 18.58 \pm 0.084 = 18.58 \pm 0.08$
At the 99% CI, $\mu = 18.58 \pm (4.032)(0.080)/\sqrt{6} = 18.58 \pm 0.132 = 18.58 \pm 0.13$
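The four confidence intervals can be generated in one loop. This sketch takes s = 0.080 (s carried to one extra decimal place) and hard-codes the t values quoted above for 5 degrees of freedom; the dictionary and function names are illustrative only.

```python
import math

# Student's t for 5 degrees of freedom, as quoted in the text for each confidence level (%)
T_5DF = {50: 0.727, 90: 2.015, 95: 2.571, 99: 4.032}

def ci_half_width(t, s, n):
    """Tolerance quantity t*s/sqrt(n) of the confidence interval."""
    return t * s / math.sqrt(n)

mean, s, n = 18.58, 0.080, 6
for level, t in T_5DF.items():
    print(f"{level}% CI: {mean:.2f} +/- {ci_half_width(t, s, n):.2f}")
```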
Note that the tolerance quantity ($\pm t\,s/\sqrt{n}$) becomes larger as we increase the percent probability that we desire to include. Another way of looking at it is that at the 50% CI there is a 50% probability that the true value ($\mu$) lies outside the $\pm 0.02$ limits, whereas at the 99% CI there is only a 1% probability that $\mu$ lies outside the $\pm 0.13$ limits.
Also note that the tolerance quantity ($\pm t\,s/\sqrt{n}$) is reported to the same number of decimal places as the mean value, though I carried an extra place through the calculation and rounded after the final step.
From the equation $\mu = x_{\mathrm{mean}} \pm t\,s/\sqrt{n}$ we see that the size of ($\pm t\,s/\sqrt{n}$) is inversely proportional to $\sqrt{n}$ (and t itself decreases as n increases); thus, one way to increase the probability that an xmean value is close to the true value is to increase the number of results, assuming that xmean and s are not affected by the multiple runs.
Problem: For n = 3, xmean and s were found to be 15.78 and 0.30 respectively. Calculate the 95% confidence interval.
For n = 3, (n - 1) = 2; t(95%, 2) = 4.303
$\mu = 15.78 \pm (4.303)(0.30)/\sqrt{3} = 15.78 \pm 0.745 = 15.78 \pm 0.75$
Relative uncertainty = (0.75 / 15.78) x 100 = 4.75%
Repeat the previous calculation for n = 7 with the same xmean and s values:
For n = 7, (n - 1) = 6; t(95%, 6) = 2.447
$\mu = 15.78 \pm (2.447)(0.30)/\sqrt{7} = 15.78 \pm 0.277 = 15.78 \pm 0.28$
Relative uncertainty = (0.28 / 15.78) x 100 = 1.77%
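The effect of n on the interval can be seen by running both cases side by side. In this sketch (the function name is illustrative) the half-width is rounded to two decimal places before the relative uncertainty is taken, which mirrors the order of rounding in the hand calculation.

```python
import math

def ci_95(mean, s, n, t):
    """Return (half_width, relative uncertainty in %) for a confidence interval."""
    half_width = round(t * s / math.sqrt(n), 2)   # round as in the hand calculation
    return half_width, 100 * half_width / mean

# (n, t at 95% for n - 1 degrees of freedom), t values as quoted in the text
for n, t in ((3, 4.303), (7, 2.447)):
    half_width, rel = ci_95(15.78, 0.30, n, t)
    print(f"n = {n}: 15.78 +/- {half_width:.2f}  (relative uncertainty {rel:.2f} %)")
```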

The t test is also valuable for comparing two different sets of data to determine whether they are the same or different; stated statistically, is there a significant difference between the two sets of data?
Example: As the director of a research laboratory you are paid to decide if there is a significant difference between the mean values of two sets of data obtained by two different scientists, a senior scientist and one recently hired.
Data of Senior Scientist: xmean = 24.66% with s = 0.06% for n = 5
Data of the New Kid: xmean = 24.55% with s = 0.10% for n = 7
What we need to do here is to compare the difference of the two mean values, (x1mean - x2mean), with a tolerance quantity of the form ($t\,s/\sqrt{n}$). Because there are two different standard deviations, we need to calculate the pooled standard deviation, spool, which is defined as
$$s_{\mathrm{pool}} = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$$
$$s_{\mathrm{pool}} = \sqrt{\frac{(5 - 1)(0.06)^2 + (7 - 1)(0.10)^2}{5 + 7 - 2}} = \sqrt{\frac{(4)(0.0036) + (6)(0.010)}{10}} = \sqrt{\frac{0.0144 + 0.060}{10}} = \sqrt{0.00744}$$
spool = 0.086 = 0.09
Note that the value of spool will always fall between the two individual values of s; it is like a weighted average value.
Test: is $|x_{1,\mathrm{mean}} - x_{2,\mathrm{mean}}| > t\,s_{\mathrm{pool}}\sqrt{(n_1 + n_2)/(n_1 n_2)}$ ?
We will use the value of t(95%) for 7 + 5 - 2 = 10 degrees of freedom; according to Table 4-2, t(95%, 10) = 2.228.
Substituting, is $|24.66 - 24.55| > (2.228)(0.086)\sqrt{12/35}$ ?
0.11 > (0.192)(0.586) ?
0.11 > 0.113 ?
No, there is no significant difference between the mean values of the two scientists at the 95% confidence level, although the difference is close to the limit.
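The comparison of two means can be sketched in Python as follows. The code computes spool directly from the two standard deviations and writes the test in the equivalent form t_calc = |x1mean - x2mean| / spool x sqrt(n1 n2 / (n1 + n2)), which is then compared with the tabulated t. The function names are illustrative, and the tabulated value 2.228 is the one quoted above for 10 degrees of freedom.

```python
import math

def pooled_std(s1, n1, s2, n2):
    """Pooled standard deviation of two data sets."""
    return math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))

def t_calculated(mean1, s1, n1, mean2, s2, n2):
    """Calculated t for the difference between two means."""
    sp = pooled_std(s1, n1, s2, n2)
    return abs(mean1 - mean2) / sp * math.sqrt(n1 * n2 / (n1 + n2))

sp = pooled_std(0.06, 5, 0.10, 7)
t_calc = t_calculated(24.66, 0.06, 5, 24.55, 0.10, 7)
t_table = 2.228                      # 95% confidence, 10 degrees of freedom

print(f"s_pool = {sp:.3f}, t_calc = {t_calc:.2f}, t_table = {t_table}")
print("significant difference" if t_calc > t_table else "no significant difference")
```

Comparing t_calc with t_table is the same decision as comparing the difference of the means with the tolerance quantity; it only rearranges the inequality.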
Testing for a significant difference between the true value ($\mu$) and the mean value (xmean) of a single set of data is very similar to the previous test, except that the set's own s and n are used in place of the pooled quantities.
If $|\mu - x_{\mathrm{mean}}| > t\,s/\sqrt{n}$, there is a significant difference between the true value and the mean.
F Test for Differences in Precision
In addition to comparing a mean value to the true value and comparing two mean values, it is often valuable to compare the precisions of two different sets of data. Your textbook does not discuss this test, so I will briefly explain it and apply it to a typical problem.
The variance v is defined as the standard deviation squared, that is, $v = s^2$. Variances are calculated for both sets of data. The larger variance is placed in the numerator of a term known as Fc, defined as $F_c = v_{\mathrm{larger}} / v_{\mathrm{smaller}}$. The value of Fc is then compared to the tabulated value Ft at a specified confidence level, generally 95%.
Values of Ft for the Comparison of Variances at the 95% Confidence Level
(columns: number of observations in the numerator; rows: number of observations in the denominator)
Denominator      3       4       5       6       7      10       ∞
3            19.00   19.16   19.25   19.30   19.33   19.38   19.50
4             9.55    9.28    9.12    9.01    8.94    8.81    8.53
5             6.94    6.59    6.39    6.26    6.16    6.00    5.63
6             5.79    5.41    5.19    5.05    4.95    4.78    4.36
7             5.14    4.76    4.53    4.39    4.28    4.10    3.67
10            4.26    3.86    3.63    3.48    3.37    3.18    2.71
∞             2.99    2.60    2.37    2.21    2.09    1.88    1.00
Problem: Were there significant differences between the precisions of the two scientists in the last problem above?
Data of Senior Scientist: xmean = 24.66%, s = 0.06% for n = 5
Data of the New Kid: xmean = 24.55%, s = 0.10% for n = 7
For the new kid, v = (0.10)^2 = 0.010; for the senior scientist, v = (0.06)^2 = 0.0036.
Fc = (0.010 / 0.0036) = 2.78. From the Ft table (7 observations in the numerator, 5 in the denominator), Ft = 6.16.
Since Fc < Ft, there is no significant difference between the precisions of the two scientists.
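A minimal sketch of the F test for this problem is shown below. The function name is illustrative, and the tabulated value 6.16 is simply the one read from the Ft table for 7 observations in the numerator and 5 in the denominator.

```python
def f_calculated(s1, s2):
    """Ratio of the larger variance to the smaller variance."""
    v1, v2 = s1**2, s2**2
    return max(v1, v2) / min(v1, v2)

f_calc = f_calculated(0.06, 0.10)   # senior scientist s, new kid s
f_table = 6.16                      # 95% level, numerator n = 7, denominator n = 5

print(f"Fc = {f_calc:.2f}, Ft = {f_table}")
print("precisions differ" if f_calc > f_table else "no significant difference in precision")
```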
Conclusions on the Differences between Mean Values and Precisions of the Two Scientists
1) The first test allowed us to test for significant differences in the mean values obtained by the two scientists. Since the difference in the 2 mean values was less than the tolerance quantity, there is no significant difference between the mean values of the two scientists at the 95% confidence level.
2) The second test (F test) allowed us to test for differences in the precision of the two scientists. Since the calculated value Fc < Ft, there is no significant difference between the precision of the 2 scientists at the 95% confidence level.
Rejection of Suspect Data
1) The Q-Test
Occasionally in a set of data there is one value that appears not to belong with the rest of the set. If the experimenter is aware of some mistake or malfunction, he or she does not need to employ one of these tests to reject that result. If no known error has occurred (so that the suspect result appears to be random), the analyst is then faced with whether to retain or reject this suspect value. He or she needs some sound basis for the decision, not just eyeballing it. Your textbook describes one such test, the Q-test. After I have discussed the Q-test, I will then discuss two additional less rigorous, but useful, tests for rejection of suspect data.
Problem: Given the following set of data for the determination of % Acidic Substance in a Cleansing Agent, may the suspect result be rejected, or must it be retained by the criteria of the Q-test?
% Acid: 10.19%, 10.08%, 10.52%, 10.13%
Calculate the mean values both retaining and rejecting the suspect value (which is the 10.52 result).
xmean (retaining) = 10.23%
xmean (rejecting) = 10.13%
Clearly the suspect value unduly influences the mean value. To employ the Q-test we need the range and the difference between the suspect value and the value nearest it.
Range = (10.52 - 10.08) = 0.44
Difference between the suspect value and its nearest value = (10.52 - 10.19) = 0.33
Qcal = (xsuspect - xnearest) / (Range) = 0.33 / 0.44 = 0.75
Referring to Table 4-4, textbook page 82, Qtable = 0.76 for n = 4 at the 90% confidence level.
Since Qcal < Qtable (0.75 < 0.76), we must retain the suspect value at the 90% confidence level.
(Not in your textbook, but Qtable at the 96% confidence level has a value of 0.85 for n = 4 (a); by this criterion, the suspect value of 10.52% would also be retained.)
(a) Skoog and West, Fundamentals of Analytical Chemistry, 4e, c1982, CBS College Publishing, p. 62.
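The Q-test arithmetic can be sketched as a small function. The name `q_calculated` is illustrative; the sketch assumes the suspect value is either the highest or the lowest result, and the Qtable value of 0.76 is the 90% value for n = 4 quoted above.

```python
def q_calculated(values):
    """Return (suspect, Qcal) for whichever extreme value has the larger gap."""
    ordered = sorted(values)
    data_range = ordered[-1] - ordered[0]
    gap_low = ordered[1] - ordered[0]      # lowest value vs. its nearest neighbour
    gap_high = ordered[-1] - ordered[-2]   # highest value vs. its nearest neighbour
    if gap_high >= gap_low:
        return ordered[-1], gap_high / data_range
    return ordered[0], gap_low / data_range

acid = [10.19, 10.08, 10.52, 10.13]
suspect, q_calc = q_calculated(acid)
q_table = 0.76                             # n = 4, 90% confidence level

print(f"suspect = {suspect}, Qcal = {q_calc:.2f}, Qtable = {q_table}")
print("reject" if q_calc > q_table else "retain")
```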
2) The 4d and 2.5d Rules
Although less rigorous, these rules may also be used to decide whether to retain or reject a suspect value. In order to use them, one needs to calculate the average deviation, which is defined as
$$\text{average deviation} = \frac{\sum_i |x_i - x_{\mathrm{mean}}|}{n}$$
% Acid    Note           di = (xi - xmean)    |di|
10.19                     0.06                0.06
10.08                    -0.05                0.05
10.52     suspect (?)     0.39                0.39
10.13                     0                   0
xmean (rejecting the suspect) = 10.13
sum of |di| (suspect excluded) = 0.11
avg d = 0.11 / 3 = 0.037
2.5 x avg d = 0.093
4 x avg d = 0.147
Since 4 x avg d (0.147) is less than the deviation of the suspect value from the mean (0.39), we could reject the suspect value. The 2.5d rule is applied identically except that the multiplier is 2.5 instead of 4; 2.5d equals 0.093 or 0.09 in this problem. Clearly the 2.5d rule allows easier rejection than the 4d rule. The suspect value (deviation 0.39) could be rejected by both of these criteria.
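The 4d and 2.5d rules can be sketched as one function with the multiplier as a parameter. The function name is illustrative. Unlike the hand calculation, the code does not round the mean to 10.13 before taking deviations, so the average deviation comes out near 0.038 rather than 0.037; the decision is unchanged.

```python
def average_deviation_rule(values, suspect, multiplier):
    """Apply the 4d or 2.5d rule; mean and average deviation exclude the suspect value."""
    kept = [v for v in values if v != suspect]   # assumes the suspect value occurs once
    mean = sum(kept) / len(kept)
    avg_dev = sum(abs(v - mean) for v in kept) / len(kept)
    suspect_dev = abs(suspect - mean)
    return suspect_dev > multiplier * avg_dev, avg_dev, suspect_dev

acid = [10.19, 10.08, 10.52, 10.13]
for multiplier in (4, 2.5):
    reject, avg_dev, suspect_dev = average_deviation_rule(acid, 10.52, multiplier)
    limit = multiplier * avg_dev
    print(f"{multiplier}d = {limit:.3f}, suspect deviation = {suspect_dev:.2f}, "
          f"{'reject' if reject else 'retain'}")
```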
In the analysis of your laboratory results, you may use any of the above tests in an attempt to reject one suspect result; if you meet the criterion for rejection, reject the suspect value and state that basis in your laboratory report.
Corrections to Errors in Earlier Slides
The Q-test and 4d/2.5d slides above already incorporate the corrections. The corrected statement is the last one in the Q-test discussion: at the 96% confidence level (Qtable = 0.85 for n = 4) the suspect value of 10.52% would also be retained. I could not find a less restrictive Q-table (confidence level less than 90%). If such a table exists, say at the 50% confidence level, its Qtable would be less than the 0.76 value at the 90% confidence level used in this problem.
