
Outline/Harris,2e

Chapter 4 - Statistics
We have already seen in Chapter 3 that all laboratory measurements have errors.
The Two Types of Errors are

Determinate Errors

Random Errors
Determinate Errors: systematic errors that cause a measurement to always be too high or too low and that can be traced to an identifiable source.
Examples include
o Use of an uncalibrated or faulty tool or instrument
o Use of wrong values, such as molar mass, conversion factor, etc.
A good way to detect the existence of determinate errors is to use different methods of analyzing the same material.
Random Errors: errors that are random in nature. They occur even when a calibrated instrument is correctly used to its most sensitive degree of measurement. For example, using the analytical balance (sensitivity of 0.1 mg), you see variations in the last digit when re-weighing the same object. In this chapter we will focus our attention on ways to evaluate random errors.
o All measurements have some random errors. No measurement is free of experimental error. Statistics allows us to accept conclusions that have a high probability of being correct and to reject conclusions that have a low probability.
o Statistics applies only to random errors; the analyst should eliminate all determinate errors before making sensitive measurements.
Random errors follow a Gaussian distribution of values about the central measurement.
A Gaussian distribution is characterized by the mean value and a standard deviation.

Mean value, or average, is a measure of central tendency:

$x_{mean} = \frac{\sum_i x_i}{n}$

where $x_i$ represents each individual measurement, $\sum$ means the summation, and $n$ is the number of measurements in that set of data.
Standard deviation is the measure of the width of the distribution about the central value.

$s = \sqrt{\frac{\sum_i (x_i - x_{mean})^2}{n - 1}}$

The standard deviation defined above is for a limited or small set of data; for a large set of data the standard deviation is indicated by $\sigma$ and is defined as

$\sigma = \sqrt{\frac{\sum_i (x_i - x_{mean})^2}{n}}$

As the size of the data set increases, $(n - 1) \to n$, so $s \to \sigma$.

Ordinarily analytical chemists will use the first value (s) for the standard deviation since we will typically deal with a small population or small data set.
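The difference between the two definitions can be sketched in a few lines of Python. This is only an illustration: the function names and the four readings are hypothetical, not taken from the textbook. For the same data the two results differ by the factor sqrt(n / (n - 1)), which approaches 1 as n grows.

```python
import math

def sample_std(values):
    """s: divide the sum of squared deviations by (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))

def population_std(values):
    """sigma: divide the sum of squared deviations by n."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / n)

# Hypothetical readings, used only to illustrate the two formulas.
readings = [10.1, 10.3, 9.9, 10.2]

s = sample_std(readings)
sigma = population_std(readings)
print(f"s = {s:.4f}, sigma = {sigma:.4f}, ratio = {s / sigma:.4f}")
```

For a small set such as this one, s is noticeably larger than sigma, which is why the (n - 1) form is the cautious choice for typical laboratory data.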
The larger the value of s, the broader is the Gaussian curve.
The relative standard deviation is the standard deviation divided by the mean value, that is, s / xmean.

The relative standard deviation may be expressed in % (parts per hundred) or ppt (parts per thousand):

Relative standard deviation (%) = s × 100 / xmean

Relative standard deviation (ppt) = s × 1000 / xmean


Other important terms
Median: the middle value in an ordered set (ascending or descending); when n is an even number, it is the average of the 2 middle values.

Range: the difference between the highest and lowest values in the set of data. May be stated as (High − Low) or as that value.
For example, in a set of data where 25.11 is the highest and 24.85 the lowest, we could describe the range as (25.11 − 24.85) or 0.26.
Find the mean, median, standard deviation,
relative standard deviation and the range of the
following set of student data acquired in the
analysis of chloride in a sample:
xi
18.56%
18.65%
18.49%
18.54%
18.70%
18.53%
The sum of the individual values = 111.47; there were 6 measurements.

The mean (xmean) = 111.47 / 6 = 18.578 = 18.58

The quantity (xi − xmean) is calculated next:

xi        (xi − xmean)   (xi − xmean)²
18.56%    -0.02          0.0004
18.65%     0.07          0.0049
18.49%    -0.09          0.0081
18.54%    -0.04          0.0016
18.70%     0.12          0.0144
18.53%    -0.05          0.0025

The sum of the deviations squared, $\sum_i (x_i - x_{mean})^2$, = 0.0319

0.0319 / (6 − 1) = 0.00638

$s = \sqrt{0.00638} = 0.0799$ = 0.08, or 0.080 to use the author's method.
Note that s is reported to the same number of decimal places as the data.

The relative standard deviation = s / xmean = 0.0799 × 100 / 18.578 = 0.430; this could be reported as 0.43% or 4.3 ppt.
To find the median, first arrange the data in order (ascending or descending); I chose descending:

Order   1      2      3      4      5      6
xi      18.70  18.65  18.56  18.54  18.53  18.49

Since n = 6, the median lies between ordered values #3 and #4, so the median = (18.56 + 18.54) / 2 = 18.55

The range = (18.70 − 18.49) = 0.21
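The hand calculation for the chloride data can be cross-checked with Python's standard `statistics` module. This is a minimal sketch; the variable names are my own. Note that `statistics.stdev` uses the (n − 1) form and works from the unrounded mean, so its result can differ from the hand value in the fourth decimal place.

```python
import statistics

# % chloride results from the student data set above.
chloride = [18.56, 18.65, 18.49, 18.54, 18.70, 18.53]

mean = statistics.mean(chloride)
median = statistics.median(chloride)      # average of the two middle values for even n
s = statistics.stdev(chloride)            # sample standard deviation, (n - 1) denominator
rsd_percent = 100 * s / mean              # relative standard deviation in %
data_range = max(chloride) - min(chloride)

print(f"mean   = {mean:.2f}")
print(f"median = {median:.2f}")
print(f"s      = {s:.2f}")
print(f"RSD    = {rsd_percent:.2f} %")
print(f"range  = {data_range:.2f}")
```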
For the ideal Gaussian distribution, 68.3% of the measurements lie within ±1σ (standard deviation) of the mean value, 95.5% within ±2σ, and 99.7% within ±3σ.

This means that for real data of a small population we can expect only about 4.5% to fall outside the ±2s limits and only about 0.3% outside the ±3s limits from the mean value.
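These percentages follow from the area under the Gaussian curve. As a rough check, assuming an ideal Gaussian population, the fraction within ±kσ equals erf(k / √2), which Python's `math.erf` can evaluate. The function name below is my own.

```python
import math

def fraction_within(k):
    """Fraction of an ideal Gaussian population within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within +/-{k} sigma: {100 * fraction_within(k):.2f} %")
```

The results agree with the percentages quoted above to within rounding in the last digit.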
Student's t test

The Student's t test is a test developed by W. S. Gosset, who used the pseudonym "Student" to publish this statistical test in 1908. It is used to express confidence intervals for a set of data and to statistically compare the results of different experiments.
The true mean is denoted as μ. From a small number of data points it is not possible to determine either μ or σ. Instead, we have xmean and s. We would like to be able to state the probability that the true value is within some quantity of xmean. The confidence interval does this in the form

$\mu = x_{mean} \pm \frac{t s}{\sqrt{n}}$

and may be stated at a certain probability such as 90%, 95%, or 99%, etc. The values of t for various degrees of freedom and confidence levels are shown in Table 4-2, page 78 of your textbook.
Let's go back to the % chloride data and calculate the 50%, 90%, 95% and 99% confidence intervals for the results.

xmean = 18.58    s = 0.08    n = 6
xi
18.56%
18.65%
18.49%
18.54%
18.70%
18.53%

At the 50% CI, $\mu = 18.58 \pm (0.727)(0.080)/\sqrt{6} = 18.58 \pm 0.024 = 18.58 \pm 0.02$. Note that the value for t is at the intersection of the 50% column and the row for number of degrees of freedom = 5 (that is, n − 1).
Now repeating the calculation with the appropriate values of t:

At the 90% CI, $\mu = 18.58 \pm (2.015)(0.080)/\sqrt{6} = 18.58 \pm 0.066 = 18.58 \pm 0.07$.
At the 95% CI, $\mu = 18.58 \pm (2.571)(0.080)/\sqrt{6} = 18.58 \pm 0.084 = 18.58 \pm 0.08$.
At the 99% CI, $\mu = 18.58 \pm (4.032)(0.080)/\sqrt{6} = 18.58 \pm 0.132 = 18.58 \pm 0.13$.
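The four confidence intervals can be reproduced with a short script. This is a sketch only: the t values are simply copied from the calculations above (5 degrees of freedom), and the function and dictionary names are my own.

```python
import math

# Student's t values for 5 degrees of freedom, as used in the calculations above.
T_VALUES_DF5 = {50: 0.727, 90: 2.015, 95: 2.571, 99: 4.032}

def half_width(t, s, n):
    """Return t * s / sqrt(n), the +/- part of the confidence interval."""
    return t * s / math.sqrt(n)

x_mean, s, n = 18.58, 0.080, 6

for level, t in T_VALUES_DF5.items():
    print(f"{level}% CI: {x_mean:.2f} +/- {half_width(t, s, n):.2f}")
```

Rounding only in the final print mirrors the practice of carrying an extra digit through the calculation.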
Note that the tolerance quantity (± t s / √n) becomes larger as we increase the percent probability that we desire to include. Or, another way of looking at it is that at the 50% CI there is a 50% probability that the true value (μ) lies outside the ±0.02 limits, whereas at the 99% CI there is a 1% probability that μ lies outside the ±0.13 limits.

Also note that the tolerance quantity (± t s / √n) is reported to the same number of decimal places as the mean value, though I carried an extra place through the calculation and rounded after the final step.
From the equation $\mu = x_{mean} \pm t s / \sqrt{n}$ we see that the size of the tolerance quantity (± t s / √n) is inversely proportional to √n; thus, one way to narrow the interval about xmean in which the true value is expected to lie is to increase the number of results, assuming that xmean and s are not affected by the multiple runs.
Problem: For n = 3, xmean and s were found to be 15.78 and 0.30, respectively. Calculate the 95% confidence interval.
For n = 3, (n − 1) = 2; t(95%, 2) = 4.303
$\mu = 15.78 \pm (4.303)(0.30)/\sqrt{3} = 15.78 \pm 0.745 = 15.78 \pm 0.75$
Relative uncertainty = (0.75 / 15.78) × 100 = 4.75%
Repeat the previous calculation for n = 7 with the same xmean and s values:
For n = 7, (n − 1) = 6; t(95%, 6) = 2.447
$\mu = 15.78 \pm (2.447)(0.30)/\sqrt{7} = 15.78 \pm 0.277 = 15.78 \pm 0.28$

Relative uncertainty = (0.28 / 15.78) × 100 = 1.77%
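The effect of n on the relative uncertainty can be sketched as follows. The t values are the two quoted above and the function name is my own. Because the script does not round the tolerance quantity before dividing, its percentages differ slightly from the hand values (about 4.72% and 1.76% rather than 4.75% and 1.77%).

```python
import math

def relative_uncertainty_percent(x_mean, s, n, t):
    """100 * (t * s / sqrt(n)) / x_mean, with no intermediate rounding."""
    return 100 * t * s / math.sqrt(n) / x_mean

# (n, t at 95% for n - 1 degrees of freedom), as quoted in the two problems above.
cases = [(3, 4.303), (7, 2.447)]

for n, t in cases:
    rel = relative_uncertainty_percent(15.78, 0.30, n, t)
    print(f"n = {n}: relative uncertainty = {rel:.2f} %")
```

Going from 3 to 7 results shrinks the interval by more than the factor √(7/3) alone, because t also falls as the degrees of freedom increase.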


The t test is also valuable for comparing two different sets of data to determine whether they are the same or different; stated statistically, are there significant differences between the two sets of data?
Example: As the director of a research laboratory you are paid to decide if there is a significant difference between the mean values of two sets of data obtained by two different scientists, a senior scientist and one recently hired.

Data of Senior Scientist: xmean = 24.66% with s = 0.06% for n = 5

Data of the New Kid: xmean = 24.55% with s = 0.10% for n = 7
What we need to do here is compare the difference between the two mean values, |x1mean − x2mean|, to a tolerance quantity of the same form as (t s / √n). Because there are two different standard deviations, we need to calculate the pooled standard deviation, spool, which is defined as

$s_{pool} = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$

$s_{pool} = \sqrt{\frac{(5 - 1)(0.06)^2 + (7 - 1)(0.10)^2}{5 + 7 - 2}} = \sqrt{\frac{(4)(0.0036) + (6)(0.010)}{10}}$

$s_{pool} = \sqrt{\frac{0.0144 + 0.060}{10}} = \sqrt{0.00744} = 0.0863 = 0.09$

Note that the value of spool will always fall between the two individual values of s; it is like a weighted average value.

Test: is $|x_{1,mean} - x_{2,mean}| > t \, s_{pool} \sqrt{\frac{n_1 + n_2}{n_1 n_2}}$ ?

We will use the value of t at 95% for 7 + 5 − 2 = 10 degrees of freedom; according to Table 4-2, t(95%, 10) = 2.228.
Substituting: is $|24.66 - 24.55| > (2.228)(0.0863)\sqrt{12/35}$ ?
0.11 > (0.1923)(0.5855) ?
0.11 > 0.113 ?
No, there is no significant difference between the mean values of the two scientists.
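The comparison of the two scientists' means can be sketched in code. This is a minimal sketch assuming the usual pooled form of the test, in which the difference between the means is compared with t × spool × √((n1 + n2) / (n1 n2)). The function names are my own, and the tabulated t value must be supplied by the user.

```python
import math

def pooled_std(s1, n1, s2, n2):
    """Pooled standard deviation of two data sets."""
    return math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))

def means_differ(x1, s1, n1, x2, s2, n2, t_table):
    """Return (significant?, tolerance) for the difference between two means."""
    s_pool = pooled_std(s1, n1, s2, n2)
    tolerance = t_table * s_pool * math.sqrt((n1 + n2) / (n1 * n2))
    return abs(x1 - x2) > tolerance, tolerance

# Senior scientist vs. new kid; t(95%, 10 degrees of freedom) = 2.228 from the table.
significant, tolerance = means_differ(24.66, 0.06, 5, 24.55, 0.10, 7, t_table=2.228)
print(f"s_pool    = {pooled_std(0.06, 5, 0.10, 7):.4f}")
print(f"tolerance = {tolerance:.3f}")
print(f"significant difference: {significant}")
```

The difference of 0.11 falls just under the tolerance of about 0.113, so this is a close call rather than a comfortable one.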
The testing for significant differences between the true value (μ) and the mean value (xmean) of a set of data is very similar to the previous test, except that only one data set, with its own s and n, is involved.
If $|\mu - x_{mean}| > t s / \sqrt{n}$, there is a significant difference between the true value and the mean.
F test for Differences in Precisions
In addition to comparing a mean value to the true value and
two mean values, it is often valuable to compare the
precisions of two different sets of data. Your textbook
does not discuss this test, so I will briefly explain it and
apply it to a typical problem.

The variance v is defined as the standard deviation squared, that is, v = s². Variances are calculated for both sets of data. The larger variance is placed in the numerator of a term known as Fc, defined as Fc = v(larger) / v(smaller). The value of Fc is then compared to the tabulated values of Ft at a specified confidence level, generally 95%.
Values of Ft for the Comparison of Variances at the 95% Confidence Level

Number of observations,   Number of observations, numerator
denominator                  3       4       5       6       7      10       ∞
 3                       19.00   19.16   19.25   19.30   19.33   19.38   19.50
 4                        9.55    9.28    9.12    9.01    8.94    8.81    8.53
 5                        6.94    6.59    6.39    6.26    6.16    6.00    5.63
 6                        5.79    5.41    5.19    5.05    4.95    4.78    4.36
 7                        5.14    4.76    4.53    4.39    4.28    4.10    3.67
10                        4.26    3.86    3.63    3.48    3.37    3.18    2.71
 ∞                        2.99    2.60    2.37    2.21    2.09    1.88    1.00
Problem: Were there significant differences between the precisions of the two scientists in the last problem above?
Data of Senior Scientist: xmean = 24.66%, s = 0.06% for n = 5
Data of the New Kid: xmean = 24.55%, s = 0.10% for n = 7

For the new kid, v = (0.10)² = 0.010; for the senior scientist, v = (0.06)² = 0.0036.
Fc = (0.010 / 0.0036) = 2.78. From the Ft table (numerator n = 7, denominator n = 5), Ft = 6.16.
Since Fc < Ft, there are no significant differences between the precisions of the two scientists.
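The F test reduces to a ratio and a table lookup, as the short sketch below shows. The function name is my own, and the tabulated value 6.16 is taken directly from the Ft table above rather than computed.

```python
def f_calculated(s_a, s_b):
    """Ratio of variances with the larger variance in the numerator."""
    v_a, v_b = s_a**2, s_b**2
    return max(v_a, v_b) / min(v_a, v_b)

# Tabulated Ft at 95%: numerator n = 7 (new kid), denominator n = 5 (senior scientist).
F_TABLE_95 = 6.16

f_c = f_calculated(0.06, 0.10)
print(f"Fc = {f_c:.2f}")
print(f"significant difference in precision: {f_c > F_TABLE_95}")
```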
Conclusions of the Differences between Mean
Values and Precisions of the Two Scientists

1) The first test allowed us to test for significant differences in the mean values obtained by the two scientists. Since the difference in the 2 mean values was less than the tolerance quantity, there is no significant difference between the mean values of the two scientists at the 95% confidence level.
2) The second test (F-test) allowed us to test for differences in the precision of the two scientists. Since the calculated value of Fc < Ft, there is no significant difference between the precision of the 2 scientists at the 95% confidence level.
Rejection of Suspect Data

1) The Q-Test
Occasionally in a set of data there is one value that appears not to belong with the rest of the set. If the experimenter is aware of some mistake or malfunction, she/he does not need to employ one of these tests to reject that result. If no known error has occurred (so that the suspect result appears to be random), the analyst is then faced with whether to retain or reject this suspect value. He/she needs some sound basis for the decision, not just eyeballing it. Your textbook describes one such test, the Q-test. After I have discussed the Q-test, I will then discuss two additional less rigorous, but useful, tests for rejection of suspect data.
Problem: Given the following set of data for the determination of % Acidic Substance in a Cleansing Agent, may the suspect result be rejected, or must it be retained by the criteria of the Q-test?

% Acid: 10.19%   10.08%   10.52%   10.13%

Calculate the mean values both retaining and rejecting the suspect value (which is the 10.52 result).
xmean (retaining) = 10.23%
xmean (rejecting) = 10.13%
Clearly the suspect value unduly influences the mean value. To employ the Q-test we need the range and the difference between the suspect value and the value nearest it.
Range = (10.52 − 10.08) = 0.44
Difference between the suspect value and its nearest value = (10.52 − 10.19) = 0.33
Qcal = (xsuspect − xnearest) / (Range) = 0.33 / 0.44 = 0.75
Since Qcal < Qtable (0.75 < 0.76) we must retain the suspect value at the 90% confidence level.
Referring to Table 4-4, textbook page 82, Qt = 0.76 for n = 4 at the 90% Confidence Level. Thus we must retain the suspect value by this criterion.

(Not in your textbook, but Qtable at the 96% confidence level has a value of 0.85 for n = 4 (a); by this criterion, the suspect value of 10.52% would also be retained.)
(a) Skoog and West, Fundamentals of Analytical Chemistry, 4e, c1982, CBS College Publishing, p. 62.
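The Q-test arithmetic can be sketched as follows. The function is my own illustration; it assumes the suspect value is whichever end of the ordered data lies farther from its nearest neighbor, and the tabulated Q of 0.76 (n = 4, 90% confidence) is the value quoted above.

```python
def q_calculated(values):
    """Gap between the suspect end value and its nearest neighbor, divided by the range."""
    ordered = sorted(values)
    data_range = ordered[-1] - ordered[0]
    gap_low = ordered[1] - ordered[0]      # gap if the lowest value is the suspect
    gap_high = ordered[-1] - ordered[-2]   # gap if the highest value is the suspect
    return max(gap_low, gap_high) / data_range

acid = [10.19, 10.08, 10.52, 10.13]   # % acid results from the problem
Q_TABLE_90_N4 = 0.76                  # tabulated Q for n = 4 at 90% confidence

q = q_calculated(acid)
print(f"Qcal = {q:.2f}")
print(f"reject suspect value: {q > Q_TABLE_90_N4}")
```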
2) The 4d and 2.5d Rules

Although less rigorous, this test may also be used to decide whether to retain or reject a suspect value. In order to use it, one needs to calculate the average deviation of the retained values (the suspect value is left out of both the mean and the sum), which is defined as

$\text{average deviation} = \frac{\sum_i |x_i - x_{mean}|}{n}$

where n is the number of retained values.
% Acid   Note   di = (xi − xmean)   |di|
10.19            0.06               0.06
10.08           -0.05               0.05
10.52    (?)     0.39               (suspect; not included in the sum)
10.13            0.00               0.00

xmean (rejecting the suspect value) = 10.13
sum of |di| = 0.11
avg d = 0.11 / 3 = 0.037
2.5 × avg d = 0.092
4 × avg d = 0.147

Since the deviation of the suspect value from the mean (0.39) is greater than 4 × avg d (0.147), we could reject the suspect value. The 2.5d rule is applied identically except that the multiplier is 2.5 instead of 4; 2.5d equals 0.092 or 0.09 in this problem. Clearly the 2.5d rule allows easier rejection than the 4d rule. The suspect value (deviation 0.39) could be rejected by both of these criteria.
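Both rules can be sketched with one small function, where the multiplier (4 or 2.5) is a parameter. The function name is my own. Note that the script keeps the unrounded mean of the retained values (10.1333...), so its limits come out slightly larger than the hand values based on the rounded mean of 10.13; the conclusion is unchanged.

```python
def d_rule(values, suspect, multiplier):
    """Return (reject?, limit) for the 4d or 2.5d rule."""
    retained = list(values)
    retained.remove(suspect)               # the suspect is excluded from mean and avg d
    mean = sum(retained) / len(retained)
    avg_dev = sum(abs(x - mean) for x in retained) / len(retained)
    limit = multiplier * avg_dev
    return abs(suspect - mean) > limit, limit

acid = [10.19, 10.08, 10.52, 10.13]   # % acid results from the problem

for multiplier in (4, 2.5):
    reject, limit = d_rule(acid, 10.52, multiplier)
    print(f"{multiplier}d rule: limit = {limit:.3f}, reject = {reject}")
```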
In the analysis of your laboratory results, you may use any of the above tests in an attempt to reject one suspect result; if you meet the criterion for rejection, reject the suspect value and state that basis in your laboratory report.
Note on the Q-test: the statement that the suspect value would also be retained at the 96% confidence level is a correction to an earlier version of these slides. I could not find a less restrictive Q table (confidence level less than 90%). If such a table exists, say at the 50% CL, its Qtable would be less than the 0.76 value at the 90% CL used in this problem.