The observed error e_obs is related to the actual or true device under test error e_uut by the equation e_obs = e_uut + e_std. Errors (such as e_uut and e_std) are statistical quantities represented by random variables and characterized by probability density functions. These distributions represent the relative likelihood of any specific error (e_uut and e_std) or measurement observation (e_obs). The observed error e_obs is only an approximation of reality, where e_obs is what is obtained via the measurement process. At any time, the precise value of e_uut is always unknown due to the possibility of measurement process errors described by the measurement uncertainty.
2011 NCSL International Workshop and Symposium
Incorrect (false) accept decisions are made where the condition exists that:

|e_uut| > L and |e_obs| <= L
(The UUT is actually OUT of tolerance and the UUT is observed to be IN tolerance.)

Likewise, incorrect (false) reject decisions are made where the condition exists that:

|e_uut| <= L and |e_obs| > L
(The UUT is actually IN tolerance and the UUT is observed to be OUT of tolerance.)
Integration over the entire joint probability region will yield a value of 1, as would be expected.
In the ideal case, if the measurement uncertainty were zero, the probability of measurement errors
occurring would be zero. The measurement observations would then reflect the behavior of
the UUT perfectly, and the distribution of possible measurement results would be limited to the
distribution of actual UUT errors. That is, the distribution of e_obs would equal the distribution
of e_uut. In practice, the PFA integrals can be evaluated numerically with tools such as MathCad,
Excel, or commercially available risk analysis software. However, the collection, management, and
logistics of risk analysis can be quite cumbersome and time consuming, especially for
multifunction instrumentation.
To comply with Z540.3 (5.3b), the PFA must not exceed 2 %. However, computing an actual
numeric value for PFA is not necessarily required in order to comply with the 2 % rule. To
understand how this is possible, the boundary conditions of PFA can be investigated by varying
the TUR and EOPR over a wide range of values and observing the resultant PFA. This is best
illustrated by a three-dimensional surface plot, where the x- and y-axes represent TUR and EOPR,
and the height of the surface on the z-axis represents PFA. See Figures 7 and 8.
[Figure: Possible Measurement Results of a 10 V Source (UUT). Six calibration scenarios (#0 through #5) are plotted against voltage (9.80 V to 10.15 V), showing the measured (observed) value, the UUT actual (true) value, the upper tolerance limit (+L), the lower tolerance limit (-L), and the UUT nominal voltage, with false accept and false reject outcomes indicated. The errors e_uut, e_obs, and e_std are related by e_obs = e_uut + e_std.]
Figure 7. Topographical Contour Map of False Accept Risk as a Function of TUR and EOPR
Figure 8. Surface Plot of False Accept Risk as a Function of TUR and EOPR
This surface plot combines both aspects affecting false accept risk into one visual representation
that can further illustrate the relationship between the variables TUR and EOPR. One curious
observation is that the PFA can never be greater than 13.6 % for any combination of TUR and
EOPR. That is, all calibration processes result in a PFA of less than 13.6 %, regardless of how
low the TUR is and how low the EOPR is. No calibration scenario can be proposed which will
result in a PFA greater than 13.6 %. This maximum value of 13.6 % PFA results when the TUR
is approximately 0.3 to 1 and the EOPR is 41 %. Any change, higher or lower, for either the
TUR or EOPR will result in a PFA lower than 13.6 %.
One particularly useful observation is that, for all values of EOPR, the PFA never exceeds 2 %
when the TUR is above 4.6 to 1. That is, regardless of what the actual EOPR might be for a
UUT, the PFA has an upper boundary condition of 2 % as long as the TUR is greater than 4.6 to
1. Notice in Figure 8 that the darkest blue region of the PFA surface is always below 2 %. Even
if the TUR axis in the above graph is extended to infinity, the darkest blue PFA region would
continue to fall below the 2 % threshold. Calibration laboratory managers will find this an
efficient risk mitigation technique for compliance with Z540.3. This fact can eliminate the
burden of collecting, analyzing, and managing EOPR data in circumstances where the TUR has
been evaluated and shown to be greater than 4.6 to 1.
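These two boundary observations can be reproduced numerically. The sketch below is an illustrative model, not the paper's own code: it assumes symmetric tolerance limits of +/-L, normally distributed UUT and measurement errors, EOPR interpreted as P(|e_uut| <= L), and TUR defined as L divided by the 95 % expanded uncertainty (taken as 2*sigma_std).

```python
from scipy import integrate, stats

def pfa(tur, eopr, L=1.0):
    """False accept probability: P(|e_uut| > L and |e_obs| <= L).

    Assumes e_uut ~ N(0, sigma_uut) with sigma_uut chosen so that
    P(|e_uut| <= L) = eopr, and e_obs = e_uut + e_std with
    e_std ~ N(0, sigma_std), where sigma_std = L / (2 * tur).
    """
    sigma_uut = L / stats.norm.ppf(0.5 + eopr / 2.0)
    sigma_std = L / (2.0 * tur)

    def integrand(t):
        # density of a true error t, times P(observed in tolerance | t)
        p_in = (stats.norm.cdf((L - t) / sigma_std)
                - stats.norm.cdf((-L - t) / sigma_std))
        return stats.norm.pdf(t, scale=sigma_uut) * p_in

    # integrate over true errors beyond +L (truncated tail), double by symmetry
    upper = L + 8.0 * (sigma_uut + sigma_std)
    val, _ = integrate.quad(integrand, L, upper)
    return 2.0 * val

# Near the worst-case corner described in the text (TUR ~0.3, EOPR ~41 %),
# this model lands close to the quoted 13.6 % maximum.
print(pfa(0.3, 0.41))
```

Scanning this function over a grid of TUR and EOPR values reproduces the qualitative shape of Figures 7 and 8 under these assumptions.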
This concept can be further illustrated by rotating the perspective (viewing angle) of the surface
plot in Figure 8. This allows the two-dimensional maximum outer envelope to be easily
viewed. With this perspective, PFA can be plotted only as a function of TUR (Figure 9). In this
instance, the worst-case EOPR is used, whereby the maximum PFA is produced for each TUR.
Figure 9. Worst Case False Accept Risk vs. TUR
[Figure 9 plots Probability of False Accept (Risk), 0 % to 16 %, against Test Uncertainty Ratio (TUR), 0 to 7: Max Risk vs TUR (assumes worst-case EOPR for a given TUR). Annotation: False accept risk is always below 2 % for TUR >= 4.6 to 1.]
There is no EOPR which would yield a PFA above 2 % for TURs greater than 4.6 to 1.
Therefore, whatever the EOPR is for items exhibiting TUR above 4.6 to 1, it is adequate and
need not be investigated to ensure that PFA is less than 2 %. The left-hand side of the graph in
Figure 9 might not appear intuitive at first. Why would the PFA suddenly decrease as the TUR
drops below 0.3 to 1 and approaches zero? While a full explanation is beyond the scope of this
paper, the answer lies in the number of items rejected (falsely or otherwise) when extremely low
TUR exists.
Another benefit of examining the boundary conditions of the surface plot can be realized by
noting that the PFA is always below 2 % where the true EOPR is greater than 95 %. This is true
regardless of how low the TUR is. Even cases with extremely low TURs (even below 1:1) will
always produce a PFA less than 2 % where the true EOPR exceeds 95 %. Again, if the
perspective of the PFA surface plot in Figure 8 is properly rotated, a two dimensional outer-
envelope is produced whereby PFA can be plotted only as a function of EOPR (Figure 10). In
this case, the worst-case TUR is used, maximizing the PFA. This results in a graph of PFA that is
a function only of the EOPR, with each instantaneous point computed at the TUR which
maximizes the PFA. In other words, a worst-case TUR has been assumed at each and every
point on the curve below. This curve represents the absolute worst possible PFA for any given
EOPR and knowledge of the TUR is not required.
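The outer envelope of Figure 10 can be sketched with the same simplified model by maximizing PFA over TUR at each EOPR. As before, the normal-error assumption and the TUR definition (L divided by a 95 % expanded uncertainty of 2*sigma_std) are modeling choices, not the paper's exact formulation.

```python
import numpy as np
from scipy import integrate, stats

def pfa(tur, eopr, L=1.0):
    """Simplified model: normal errors, EOPR = P(|e_uut| <= L),
    TUR = L / (2 * sigma_std)."""
    sigma_uut = L / stats.norm.ppf(0.5 + eopr / 2.0)
    sigma_std = L / (2.0 * tur)

    def integrand(t):
        p_in = (stats.norm.cdf((L - t) / sigma_std)
                - stats.norm.cdf((-L - t) / sigma_std))
        return stats.norm.pdf(t, scale=sigma_uut) * p_in

    upper = L + 8.0 * (sigma_uut + sigma_std)
    return 2.0 * integrate.quad(integrand, L, upper)[0]

def worst_pfa(eopr, turs=np.linspace(0.05, 10.0, 200)):
    """Envelope value: maximum PFA over a grid of TURs for a given true EOPR."""
    return max(pfa(t, eopr) for t in turs)
```

Under these assumptions the envelope drops below 2 % as the true EOPR climbs past the mid-90 % range, consistent with the 95 % boundary described in the text.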
Figure 10. Worst Case False Accept Risk vs. EOPR
[Figure 10 plots Probability of False Accept (Risk), 0 % to 16 %, against True End Of Period Reliability (EOPR), 0 % to 100 %: Max Risk vs EOPR (assumes worst-case TUR for a given EOPR). Annotation: False accept risk is always below 2 % for true EOPR >= 95 %.]
Figure 11. EOPR Data: Three Cases (Poor, Moderate, Excellent)
As was the case of low TUR, a similar phenomenon is noted on the left-hand side of the graph in
Figure 10; the maximum PFA decreases for true EOPR values below 41 %. As EOPR
approaches zero on the left side, most of the UUT values lie far outside of the tolerance limits.
When the values are not in close proximity to the tolerance limits, the risk of falsely accepting an
item is low. Likewise on the right-hand side of the graph, where the EOPR is very good (near
100 %), the false accept risk is low. Both ends of the graph represent areas of low PFA because
most of the UUT values have historically been found to lie far away from the tolerance limits,
either significantly Out-of-tolerance (left side) or significantly In-tolerance (right side). The
PFA is highest in the middle of the graph, where EOPR is only moderately poor and much of the
data is found near the tolerance limits. Refer to Figure 11.
8. True vs. Observed EOPR
Until now, this discussion has been limited to the concept of true EOPR. This caveat deserves
further attention. The idea of a true EOPR implies that an immaculate value for reliability exists,
which has not been influenced by any non-ideal factors. However, as with all empirical data,
this is not the case. In the calibration laboratory, reliability data is collected from a history of
real-world observations or measurements. These observations of UUTs are made using actual
equipment, often expensive reference standards with very low uncertainty, under controlled
conditions. But even the best available standards have some finite uncertainty, and the UUT
itself often contributes noise and other undesirable effects. All of this uncertainty impinges on
the integrity of compliance decisions, manifesting in observed reliability data that is not a true
reflection of reality. The measurement uncertainty contaminates the EOPR data to some
degree, calling into question its validity. The observed EOPR is never a completely accurate
representation of the true EOPR.
[Figure 11 depicts EOPR data (UUT as-received condition) plotted against voltage (9.75 V to 10.20 V) with tolerance limits +L and -L. Region 1: EOPR = 20 %. Region 2: EOPR = 50 %. Region 3: EOPR = 100 %. Risk is low where the data lies far from the limits (well out of tolerance or well in tolerance) and high where the data clusters near the limits.]
The difference between the observed and true EOPR becomes more pronounced as the
measurement uncertainty increases (i.e., as TUR drops). A low TUR can result in a significant
deviation between what is observed and what is true regarding the reliability data. This concept
has been presented in some depth previously [23, 29, 30, 31]. The reported or observed EOPR
from calibration history includes all influences from the measurement process. In this case, the
standard deviation of the observed distribution is given by

sigma_obs = sqrt(sigma_uut^2 + sigma_std^2)

where sigma_uut and sigma_std are derived from statistically independent events. The corrected or true standard
deviation can be approximated by removing the effect of measurement uncertainty and solving
for sigma_uut:

sigma_uut = sqrt(sigma_obs^2 - sigma_std^2)

correcting for the degree to which the observed data have been misrepresented by the measurement process.
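Under the same assumptions used throughout (normal distributions, symmetric limits +/-L, TUR = L/(2*sigma_std)), this correction can be sketched as follows; the function name and defaults are illustrative, not from the paper.

```python
import math
from scipy import stats

def true_eopr(observed_eopr, tur, L=1.0):
    """Estimate the true EOPR by removing the measurement-process spread
    from the observed one: sigma_uut = sqrt(sigma_obs^2 - sigma_std^2),
    then convert the narrowed spread back to an in-tolerance probability."""
    sigma_obs = L / stats.norm.ppf(0.5 + observed_eopr / 2.0)
    sigma_std = L / (2.0 * tur)
    if sigma_std >= sigma_obs:
        raise ValueError("observed spread is smaller than the measurement "
                         "uncertainty; the model assumptions do not hold")
    sigma_uut = math.sqrt(sigma_obs ** 2 - sigma_std ** 2)
    return 2.0 * stats.norm.cdf(L / sigma_uut) - 1.0
```

Because sigma_uut is never larger than sigma_obs, the corrected (true) EOPR is always at least as high as the observed EOPR, and the two converge as TUR grows.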
[Figure: Max False Accept Risk vs EOPR (assumes worst-case TUR for a given EOPR), plotting Probability of False Accept (Risk), 0 % to 16 %, against End of Period Reliability, 0 % to 100 %, with separate curves for observed and true EOPR. Annotation: False accept risk is always below 2 % for observed EOPR >= 89 %.]
Measurement uncertainty always hinders the quest for accurate data; it never helps. It should
be noted that the true value of a single data point may indeed be higher or lower than the
measured value, due to uncertainty. For any single instance, it is never known whether the
measurement uncertainty contributed a positive or a negative error. Therefore, it is not
possible to remove the effect of measurement uncertainty from a single measurement result. But
EOPR data is a historical collection of many Pass/Fail compliance decisions which can be
represented by a normal probability distribution with a standard deviation sigma_obs. Sometimes the
measurement uncertainty can be statistically removed from this distribution as a whole. The
observed standard deviation implied by the EOPR data is

sigma_obs = L / F^-1((1 + EOPR_obs) / 2)

where F^-1 represents the inverse normal distribution.
EOPR is a numerical quantity arrived at by statistical means applied to empirical data,
analogous to a Type A estimate in the language of the GUM [33]. The data comes from repeated
measurements made over time rather than from accepting manufacturers' claims at face value
(the latter being analogous to Type B or heuristic estimates). However, the influence of the
measurement process is always present. This method of removing measurement uncertainty from
the EOPR data is a best estimate of the true reality, or reliability, which is sought through
measurement.
There are many other factors that affect EOPR. For items which are routinely re-submitted to the
calibration lab, shortening or lengthening the calibration interval will affect EOPR. Laboratories
which are presently ISO-17025 compliant may not currently have the mechanisms in place to
determine and manage calibration intervals or adjustment policies. Such a policy has a direct
influence on the probability of false acceptance for the population of instruments. A
laboratory's adjustment policy can also directly affect the false accept risk to individual
instruments and to the population at large. Laboratories which require the technician to adjust or
align an instrument exhibiting an error which exceeds some established percentage of its
allowable tolerance can control false accept risk at the bench level and at the program level. An
adjustment policy affects the false accept risk to the population of instruments by virtue of
affecting the In-tolerance probability or EOPR of all similar instruments.
To summarize, the 2 % PFA maximum boundary condition, formed by either 4.6 to 1 TUR or
89 % observed EOPR, can greatly reduce the effort and labor required for the modern calibration
laboratory in managing false accept risk for a significant portion of the M&TE submitted for
calibration. For some labs, obtaining EOPR data might be the most burdensome task, while
TUR could be the most difficult parameter for other labs to produce. The PFA boundary
condition can be leveraged from either perspective, providing benefit to practically all
laboratories. However, there will still be instances where the TUR is lower than 4.6 to 1 and the
observed EOPR is less than 89 %. In these instances, it is still possible for the PFA to be less
than 2 %, but a full PFA calculation is required to show that the 2 % requirement has not been
exceeded. Alternatively, other techniques can be employed to ensure that the PFA is held below
2 %.
9. Guardbanding
During a compliance decision, it is sometimes helpful to establish acceptance limits A at the
time-of-test that are more stringent (tighter) than the manufacturer's tolerance limits L.
Acceptance limits are often called guardband limits or test limits. These time-of-test constraints
are only imposed during the calibration process when making compliance decisions in order to
reduce the risk of a false acceptance. It is only necessary to implement acceptance limits A,
which differ from the tolerance limits L, when the false accept risk is higher than desired or as
part of a program to keep risk below a specified level. Acceptance limits may be chosen to
mitigate risk at the bench level or program level. PFA calculations may be used to establish
acceptance limits based on the mandated risk requirements. In most instances where guardbands
are applied, the tolerance limits are temporarily tightened, or reduced, to create the
acceptance limits needed to meet a PFA goal. The subject of guardbanding is extensive, and
novel approaches exist for computing and establishing acceptance limits to mitigate risk, even
where EOPR data is not available [25]. However, in the simplified case of no guardbanding, the
acceptance limits A are set equal to the tolerance limits L (A = L).
The Z540.3 Handbook references six possible methods to achieve compliance with the 2 % rule
[16]. One particularly useful method employing a guardbanding technique is described in
Method 6 of the Handbook [16, 25]. Strictly speaking, this method does not require EOPR data
to be available because the method relies on using worst-case EOPR, computed for a specified
TUR value. Using this approach, a guardband multiplier is computed as a function of TUR. The
acceptance limits are expressed as follows:

A = +/-(L - M x U95)

where U95 is the 95 % expanded uncertainty of the measurement process and M is a guardband
multiplier. The multiplier M was previously calculated by Dobbert [25] by fitting a line through
the data points that mitigate risk to a level of 2 % and is given by the following simplified
formula:

M = 1.04 - e^(0.38 ln(TUR) - 0.54)

It can be seen that the line is a good fit over the range of TUR values of practical interest. The
intent was to keep the equation simple for ease of use but cover the appropriate range of TUR
values that make physical sense. It has been shown in this paper that the PFA never exceeds 2 %
for TUR values greater than 4.6 to 1. To verify this, we can set TUR = 4.6 in the formula above,
which yields an M of approximately zero; for TUR values greater than 4.6 to 1, the multiplier M
becomes negative. This
implies that a calibration lab could actually increase the acceptance limits A beyond the UUT
tolerances L and still be compliant with the 2 % rule. While not a normal operating procedure
for most calibration laboratories, setting guard band limits outside the UUT tolerance limits is
conceivably possible while maintaining compliance with the program level risk requirement of
Z540.3. In fact, laboratory policies often require items to be adjusted back to nominal for
observed errors greater than a specified portion of their allowable tolerance limit L.
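As a sketch, the Method 6 multiplier and acceptance limits can be written out as below. The formula used is the simplified fit attributed to Dobbert [25]; readers should confirm it against that reference before relying on it.

```python
import math

def guardband_multiplier(tur):
    """Simplified 2 % PFA guardband fit: M = 1.04 - exp(0.38*ln(TUR) - 0.54)."""
    return 1.04 - math.exp(0.38 * math.log(tur) - 0.54)

def acceptance_limit(L, u95, tur):
    """Acceptance limit A = L - M*U95; once M goes negative (TUR above
    roughly 4.6), A actually falls outside the tolerance limit L."""
    return L - guardband_multiplier(tur) * u95

# M crosses zero near TUR = 4.6, matching the 2 % PFA boundary in the text
print(guardband_multiplier(4.6))
```

Evaluating the multiplier at a few TURs shows the behavior discussed above: positive (tightened limits) below 4.6 to 1, approximately zero at 4.6 to 1, and negative (relaxed limits) beyond it.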
Figure 13. Guardband Multiplier for Acceptable Risk Limits as a Function of TUR
10. Conclusion and Summary
Organizations must determine if risk is to be controlled for individual workpieces at the bench
level, or mitigated for the population of items at the program level [7]. Where ISO-17025 was not
specific in methodologies to account for the uncertainty of measurement, Z540.3 provides for
detailed treatment of the uncertainty during compliance testing and much more. Z540.3 points to
ISO-17025 as suitable for the core competency of calibration labs, but it also levies the
requirements of section 5.3. For the calibration program, it places an upper limit on the
probability associated with the risk of incorrectly accepting measurements as In-tolerance during
compliance decisions.
ISO-17025 contains no specific provision for utilizing a TUR as a method to account for the
uncertainty, while Z540.3 does allow the limited use of a 4 to 1 TUR. However, under Z540.3, a
4 to 1 TUR is provided only as a secondary alternative to the preferred method of limiting PFA
to less than 2 % and, even then, only when calculating PFA is impracticable. However, the
indiscriminate blanket use of 4 to 1 TUR, in lieu of calculating and limiting the PFA, is not
acceptable for compliance with Z540.3. It must first be demonstrated that the more rigorous,
primary method of limiting false accept risk to <2 % is not practicable. From this perspective,
laboratories which have operated in compliance with the 4 to 1 Test Accuracy Ratio (TAR)
requirement of ANSI/NCSL Z540-1 [6] cannot simply transition to Z540.3 by defaulting to the 4
to 1 TUR provision in Z540.3. Moreover, the definition differences in the Z540-1 TAR
[7] Bayesian analysis can be performed to determine the risk to an individual workpiece, using both the measured value on the bench and program-level EOPR data, to yield the most robust estimate of false accept risk. Such discussions are deferred to other publications [32].
[Figure 13 plots the Guardband Multiplier (M), -175 % to 75 %, against Test Uncertainty Ratio (TUR), 0 to 14: Guardband Multiplier vs TUR, with curves for 1 % PFA, 2 % PFA, 3 % PFA, and 5 % PFA, and the Dobbert 2008 line.]
requirement and the Z540.3 TUR requirement would also pose significant challenges to the
transition process. Compliance with the 2 % rule in Z540.3 must be accomplished by calculating
PFA and/or limiting its probability to less than 2 %.
Computation of PFA requires the use of double integral calculus formulas of joint probability
density functions, the solutions of which are not trivial. The input variables to these formulas
can be reduced to EOPR and TUR. While expanded uncertainties (and possibly TURs) may be
well documented for most labs which are already in compliance with ISO-17025, it is EOPR that
is of pivotal importance to organizations considering the adoption of Z540.3. For the purposes
of complying with the 2 % rule in Z540.3, properly calculating PFA requires the availability of
EOPR data.
While some calibration laboratories currently maintain historical EOPR data, these metrics are
almost universally retained at the equipment-level (i.e., did the M&TE instrument as a whole
Pass or Fail?). For multi-parameter, multi-test-point instruments, a single failure of one test-
point represents a Fail at the instrument-level. Historical data at many laboratories simply
indicates whether the instrument Passed or Failed, not which specific test-point was Out-of-
tolerance. Therefore, EOPR history data or metrics are not readily available at the test-point-
level for the vast majority of calibration laboratories. Instrument-level reliability data can,
however, be used as a conservative estimate of EOPR for PFA evaluation. Even for laboratories
which maintain a comprehensive database of all measurement data ever acquired for all M&TE
calibrated at the test-point-level, these results are stored in a plethora of different manners and
systems, including handwritten, hardcopy paper datasheets. Obtaining and/or extracting this data
for the computation of PFA represents an enormous logistical challenge with a correspondingly
high cost. Moreover, if laboratories do not already possess this EOPR history, many years of
calibration events may be required to obtain the reliability data necessary for the calculation of
PFA. Therefore, efficient mitigation strategies are required.
There are six methods listed in the Z540.3 Handbook for complying with the 2 % false accept
risk requirement [16]. It should be noted that this paper has specifically focused on some efficient
approaches to this objective. This does not, in any way, negate the use of other methods nor
does it imply that the ones discussed here are necessarily the best methods for any particular
laboratory or program. It presents some efficient methods for taking the uncertainty into
account and mitigating risk, even where a numeric value for the measurement uncertainty or
EOPR might be unknown. Basic strategies for handling risk without rigorous computation of
PFA are:
- Analyze EOPR data. This will most likely be done at the instrument-level, as opposed to the test-point-level, depending on data collection methods. If the observed EOPR data meets the required level of 89 %, then the 2 % PFA rule has been satisfied.
- If this is not the case, then further analysis is needed. TUR must be determined at each test point. If the analysis reveals that the TUR is greater than 4.6 to 1, no further action is necessary and the 2 % PFA rule has been met.
- If neither the EOPR nor the TUR threshold is met, a Method #6 guardband can be applied.
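The screening order above can be sketched as a small helper function. The 89 % and 4.6 to 1 thresholds are the ones quoted in this paper; everything else about the function (its name, signature, and return strings) is illustrative.

```python
def z540_screen(observed_eopr=None, tur=None):
    """Apply the cheap checks first; fall back to guardbanding or a full
    PFA computation only when neither threshold is met."""
    if observed_eopr is not None and observed_eopr >= 0.89:
        return "2 % PFA rule satisfied (observed EOPR >= 89 %)"
    if tur is not None and tur > 4.6:
        return "2 % PFA rule satisfied (TUR > 4.6 to 1)"
    return "apply a Method 6 guardband or compute PFA directly"
```

Either input alone can establish compliance, which is the labor-saving point: a lab that tracks only TUR, or only instrument-level EOPR, can still screen most items without a full risk calculation.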
Compliance with the 2 % rule can be accomplished by calculating PFA, or by limiting its
probability to less than 2 % using the methods presented above. If these methods are not
sufficient, other strategies must be used. Beyond guardbanding, alternative methods of
mitigating PFA are available [16].
- Determine if the acceptable tolerance can be expanded and still meet the customer requirement. This is sometimes known as a Limited Calibration to de-rated specifications.
- Determine if the measurement uncertainty of the calibration process can be improved, perhaps by changing calibration equipment, calibration method, etc.
- Shortening the calibration interval may improve EOPR and reduce risk over time.
However, a poorly specified UUT, which is being tested for compliance to overly optimistic
specifications, presents a serious problem for the calibration lab and for the end-user of M&TE.
In some circumstances, shortening the calibration interval, implementing conservative
guardbands, and applying rigorous laboratory adjustment policies cannot render a dog into a gem.
Occasionally, no amount of effort or action on the part of the calibration laboratory can force a
UUT to comply with unrealistic expectations of accuracy and performance. Contacting the
manufacturer with this evidence may result in the issuance of revised/amended specifications of
a more realistic nature.
Assumptions, approximations, estimations, and uncertainty have always been part of metrology
and will continue to be present for all of time; there are few guarantees or absolutes. Practically
no process can guarantee that instruments will provide the desired accuracy, or will function
within their assigned tolerances during any particular application or use. However, through a
well managed calibration process, confidence can be attained that an instrument will perform as
expected and within limits. Quantification of this confidence can be realized via analysis of
uncertainty, EOPR, and false accept risk. Reducing the number of assumptions and improving
the estimations involved during calibration can provide higher levels of confidence, reduced risk,
and improved quality. Historical reliance on several key assumptions has limited the
effectiveness of time-honored practices to manage risk. Z540.3 provides the organizational
framework to advance the state of metrology beyond these conventional limitations, and efficient
methods exist for compliance with this standard.
Acknowledgement
The authors would like to thank the many people who contributed to an understanding of the
subject matter presented here. Specifically, the contributions of Perry King (Bionetics), Scott
Mimbs (NASA), and Jim Wachter (Millennium Engineering and Integration) at Kennedy Space
Center were invaluable. Several graphics were generated using PTC's MathCad 14. Where
numerical methods were more appropriate, Microsoft Excel