Professional Documents
Culture Documents
1. Introduction
Statistical tolerance intervals are important in many industrial applications
ranging from engineering to the pharmaceutical industry. See, for example, Hahn
and Chandra (1981) and Hahn and Meeker (1991). The goal of a tolerance
interval is to contain at least a specied proportion of the population, , with a
specied degree of condence, 1. More specically, let X be a random variable
with cumulative distribution function F . An interval (L(X), U (X)) is said to be a
-content, (1)-condence tolerance interval for F (called a (, 1) tolerance
interval for short) if
P {[F (U (X)) F (L(X))] } = 1 .
(1.1)
906
mainly due to the diculty in deriving explicit expression for the tolerance intervals in the discrete case. Zacks (1970) proposed a criterion to select tolerance
limits for monotone likelihood ratio families of discrete distributions. The most
widely used tolerance intervals to date for Poisson and Binomial distributions
were proposed by Hahn and Chandra (1981). The intervals are constructed by a
two-step procedure. See Hahn and Meeker (1991) for a survey of these intervals.
Although tolerance intervals are useful and important, their properties, such
as their coverage probability, have not been studied as much as those of condence
intervals. As we shall see in Section 2, the tolerance intervals given in Hahn
and Chandra (1981) tend to be very conservative in terms of their coverage
probability. Techniques for the construction of tolerance intervals in the literature
often vary from distribution to distribution.
In this paper, we introduce a unied analytical approach using the Edgeworth
expansions for the construction of tolerance intervals for the discrete distributions
in exponential families with quadratic variance functions. We show that these
tolerance intervals enjoy desirable probability matching properties and outperform existing tolerance intervals in the literature. The most satisfactory aspects
of our results are the constancy of the phenomena, and uniformity in the nal
resolutions of these problems. Edgeworth expansions have also been used very
successfully for the construction of condence intervals in discrete distributions.
See Hall (1982), Brown, Cai and DasGupta (2002, 2003), and Cai (2005). Construction of tolerance interval is closely related to the construction of condence
interval for quantiles. A one-sided tolerance bound is equivalent to a one-sided
condence bound on a quantile of the distribution. See Hahn and Meeker (1991).
Therefore, the proposed method can be also employed for the quantile estimation
problem.
The paper is organized as follows. We begin in Section 2 by briey reviewing the existing tolerance intervals for Binomial and Poisson distributions and
showing that they have serious deciencies in terms of coverage probability. The
serious deciency of these intervals calls for better alternatives. After Section 3.1,
in which basic notations and denitions of natural exponential family are summarized, the rst-order and second-order probability matching tolerance intervals
are introduced. As in the case of condence intervals, the coverage probability
of the tolerance intervals for the lattice distributions such as Binomial and Poisson distributions contains two components: oscillation and systematic bias. The
oscillation in the coverage probability, which is due to the lattice structure of
the distributions, is unavoidable for any non-randomized procedures. The systematic bias, which is large for many existing tolerance intervals, can be nearly
eliminated. We show that our new tolerance intervals have better coverage properties in the sense that they have nearly vanishing systematic bias in all the
distributions under consideration.
907
In Section 4, two-sided tolerance intervals are constructed by using onesided upper and lower probability matching tolerance bounds. In addition to the
coverage properties, parsimony in expected length of the two-sided intervals is
also discussed. Section Appendix is an appendix containing detailed technical
derivations of the tolerance intervals. The derivations are based on the two-term
Edgeworth and Cornish-Fisher expansions.
2. Tolerance Intervals: Existing Methods
As mentioned in the introduction, we construct tolerance intervals for discrete distributions in the exponential families. In this section we review the existing tolerance intervals for two important discrete distributions, the Binomial
and Poisson distributions. These tolerance intervals will be used for comparison
with the new intervals constructed in the present paper.
The most widely used method for constructing tolerance intervals for the
Binomial and Poisson distributions was proposed by Hahn and Chandra (1981).
Suppose x is the observed value of a random variable X having a Binomial
distribution B(n, ) or a Poisson distribution P oi(n), and that one wishes to
construct a tolerance interval based on x. The method introduced by Hahn and
Chandra (1981) for constructing a (, 1 ) tolerance interval (L(x), U (x)) has
two steps.
(i) Construct a two-sided (1 )-level condence interval (l, u) for , where l
and u depends on x.
(ii) Find the minimum number U (x) and the maximum number L(x) such that
p(X U (x)| = )
1+
2
1+
.
2
908
where F(a;r1 ,r2 ) denotes the 1 a quantile of the F distribution with r1 and r2
degrees of freedom. For the Poisson distribution, the suggested (1) condence
intervals in Hahn and Meeker (1991) are
( )1/2
(l, u) = z/2
,
n
)
(
2(1/2;2x+2)
2(/2;2x)
, 0.5
,
(l, u) = 0.5
n
n
(2.3)
(2.4)
where 2(a;r1 ) is a quantile of the chi-square distribution with r1 degrees of freedom. The in (2.1) and (2.3) denotes the sample mean. The condence bounds
for one-sided tolerance intervals in both Binomial and Poisson distributions are
given analogously.
Figure 1 presents the coverage probabilities of both the two-sided and onesided tolerance intervals for the Binomial and Poisson distributions. It can easily
be seen from the plots that these tolerance intervals are too conservative with
higher or lower coverage probability than the nominal level for both distributions.
As in the case of condence intervals, the coverage probability of the tolerance intervals contains two components: oscillation and systematic bias. The
oscillation in the coverage probability, due to the lattice structure of the Binomial and Poisson distributions, is unavoidable for any non-randomized procedures. However, the systematic biases for the existing tolerance intervals are
signicantly larger than we expected. This is partly due to the poor behavior of
the condence intervals used in the construction of the tolerance intervals. Note
that (2.1) and (2.2) are the Wald and Clopper-Pearson intervals for the binomial
proportion. It is known that these condence intervals have poor performances.
See, for example, Agresti and Coull (1998) and Brown et al. (2002). It is thus
possible to improve the performance of the tolerance intervals by using better
condence intervals, like those presented in Brown et al. (2002) for the binomial
distribution.
However, we do not take this approach here. The goal of this paper is to
provide a unied analytical approach to the construction of desirable tolerance
intervals for exponential families with certain optimality properties. The results
show that the Edgeworth expansion approach is a powerful tool for solving this
problem.
In Section 3 we introduce new tolerance intervals using the Edgeworth expansion. These intervals have better coverage properties in the sense that they
have nearly vanishing systematic bias. Figure 3 presents the coverage probabilities of the proposed two-sided tolerance intervals for the Binomial and Poisson
909
910
The cumulants are given as Kr = (r) (). Let 3 and 4 denote the skewness and
kurtosis. In the subclass with a quadratic variance function (QVF), the variance
00 () depends on only through the mean and, indeed,
2 V () = d0 + d1 + d2 2
(3.1)
(3.2)
The notation is used in the statements of theorems for both the discrete and
the continuous cases, although for all the discrete cases happens to be equal
to 1. Note that d/d = 00 () = 2 , so
K3 = (3) () =
dV d
dK3 d
= 2+6d2 4 .
d d
d d
911
Hence,
3 =
K3
K4
= (d1 + 2d2 ) 1 and 4 = 4 = 2 + 6d2 .
3
(3.3)
Here are the important facts about the Binomial, Negative Binomial, and
Poisson distributions.
Binomial, B(1, p): = log(p/q), () = log(1 + e ), and h(x) = 1. Also = p,
V () = pq = 2 . Thus d0 = 0, d1 = 1, d2 = 1,
3 =
1 2
,
((1 ))1/2
and 4 =
1 6 + 62
.
(1 )
Negative Binomial, NB(1, p), the number of successes before the rst failure:
= log p, () = log(1 e ), and h(x) = 1, where p is the probability of
success. Here = p/q, and V () = p/q 2 = + 2 , so d0 = 0, d1 = 1, d2 = 1,
3 =
1 + 2
,
((1 + ))1/2
and 4 =
1 + 6 + 62
.
(1 + )
1
1/2
and 4 =
1
.
(3.4)
where the rst O(n1/2 ) term, S1 n1/2 , and the rst O(n1 ) term, S2 n1 , are the
rst and second order smooth terms, respectively, and Osc1 n1/2 and Osc2 n1
are the oscillatory terms. (The oscillatory terms vanish in the case of continuous
912
distributions.) The smooth terms capture the systematic bias in the coverage
probability. See Bhattacharya and Rao (1976) and Hall (1982) for details on
Edgeworth expansions.
We call a tolerance interval rst-order probability matching if the rst order smooth term S1 n1/2 vanishes, and call the interval second-order probability
matching if both S1 n1/2 and S2 n1 vanish. Note that the oscillatory terms are
unavoidable for any nonrandomized procedures in the case of lattice distributions. See Ghosh (1994) and Ghosh (2001) for general discussions on probability
matching condence sets.
Motivated by the discussion given at the end of this section, we consider an
approximate -content, (1 )-condence lower tolerance bound of the form
d1 X
d2 X 2
L(X) = X + a b n(d0 +
+
) + c,
(3.5)
n
n2
where d0 , d1 and d2 are the constants in (3.1), and a, b, and c are constants
depending on and such that
L(X) < L(Y ) if X < Y.
(3.6)
Remark. The quantity a in (3.5) re-centers the tolerance interval and, we see
later, a is important to the performance of the tolerance interval. The quantity
c in (3.5) plays the role of a boundary correction. The parameter c is to be
adjusted such that the coecients S1 and S2 of the smooth terms n1/2 and
n1 in (3.4) can be zero. The eect of c can be signicant when is near the
boundaries.
We use the Edgeworth expansion to choose the constants a, b, and c so
that the resulting tolerance intervals are rst-order and second-order probability matching. The rst step in the derivation is to invert the constraint
1 F (L(X)) to a constraint on X of the form X u(, ). Then the
coverage probability of the tolerance interval can be expanded using the Edgeworth expansion. The optimal choice of the values a, b, and c can then be solved
by setting the smooth terms in the expansion to zero. The algebra involved here
is more tedious than for deriving the probability matching condence interval.
The detailed proof is given in the Appendix.
Theorem 1. The tolerance interval given in (3.5) is rst-order probability matching for the three discrete distributions in the NEF-QVF if
1 2
a = [(z1
1)(1 + 2d2
) + (1 + 3z z1 + 2z2 )(d1 + 2d2
)],
6
b = z + z1 ,
(3.7)
(3.8)
913
and c = 0, where
= X/n and
= d0 + d1
+ d2
2 . The tolerance interval
(3.5) is second-order probability matching with a and b given as in (3.7) and (3.8),
and c given by
c=
{
1
3
(1 + 18d0 d2 + 2(8 + 9d1 )d2
+ 2d22
2 )z1
36(z + z1 )
2
+24d2 (d0 +
(d1 + d2
))z1
z + z1 [1 + 2d2
(20 + 5d2
+ 24d2
z2 )
d2 X 2
d1 X
X + a + b n(d0 +
+
),
(3.10)
n
n2
d1 X
d2 X 2
X + a + b n(d0 +
+
) + c,
(3.11)
n
n2
respectively, with the same a, b, and c as the lower tolerance intervals.
For all three distributions, b = z + z1 . It is useful to give the expressions
of the constants a and c individually for each of the three distributions.
1
)(z + z1 )(2z + z1 ), and
1. Binomial: a = (1 2
6
c=
1
1
2
2
(13z2 + 11z z1 + z1
+ 5)(
2 ) + (2z2 + z z1 z1
+ 7).
18
36
1
1
2
2. Poisson: a = (z + z1 )(2z + z1 ) and c =
(7 z1
+ z z1 + 2z2 ).
6
36
1
1
)(z + z1 )(2z + z1 ) and c =
(13z2 +
3. Negative Binomial: a = (1 + 2
6
18
1
2
2
11z z1 + z1
+ 5)(
+
2 ) + (2z2 + z z1 z1
+ 7).
36
We consider the discrete NEF-QVF family because the most important discrete distributions are there. Our results can be generalized to the discrete NEF
family with variance functions of the form V () = d0 + d1 + d2 2 + d3 3 + d4 4 ,
because there exists a general solution for V () = 0.
Figure 2 plots the coverage probabilities of the rst-order and second-order
probability matching (0.9, 0.95) lower tolerance intervals for n = 50. It is clear
from Figure 2 that for the three discrete distributions, the rst and second order
probability matching tolerance intervals have nearly vanishing systematic bias,
914
and the second order probability matching interval has noticeably smaller systematic bias than the rst order probability matching interval for the Negative
Binomial distribution. Since there are oscillatory terms for the discrete distributions in the Edgeworth expansion, we mainly evaluate the performances of the
tolerance intervals in terms of the smooth terms. Here, we evaluate the performance of a tolerance interval by checking if their coverage probabilities can
approximate the nominal level well. The coverage probability of the proposed
one-sided tolerance interval can be closer to the nominal level than that of the
existing tolerance intervals, comparing Figure 2 with Figure 1.
915
The motivation for considering the form (3.5) is briey described as follows.
Let X1 , . . . , Xn be a sample from a normal distribution N (, 2 ). Wald and
Wolfowitz (1946) introduced the -content, (1 )-condence tolerance interval
1
n1
+
tS, X
tS],
(3.12)
[X
2n1,
2n1,
and S are the sample mean and sample standard deviation, respectively,
where X
2
n1, is the quantile of the chi-squared distribution with n 1 degrees of
freedom, and t is the solution of the equation
1 +t
n
1
2
ex /2 dx = .
1
2
t
n
To make a better analogy between the NEF-QVF families and normal cases,
we
the tolerance interval in (3.12) in terms of X =
n rst attempt to rewrite
2 ). Note that (3.12) implies
X
under
N
(n,
n
i=1 i
n1
n1
tS) , (X
tS) ),
1 P (, (X +
2
n1,
2n1,
where , denotes the cdf of the N (, 2 ) distribution. Since
1
n1
(n + n(X
)
tS)
=
t nS), (3.13)
, (X
n,
n
2n1,
2n1,
and replacing by the lower or upper limits of a -condence condence interval
+ z(1)/2 S/n) for , we have the tolerance interval
z(1)/2 S/n, X
(X
1
1
n1
n1
t+(1 )z(1)/2 ) ns, X +(
t+(1 )z(1)/2 ) ns]
[X (
2
2
n
n
n1,
n1,
under the N (n, n 2 ) distribution.
For the NEF-QVF families, by the Central Limit Theorem, and adopting
the method in the normal case by identifying d0 + d1 X/n + d2 X 2 /(n2 ) as s2 ,
an approximate -content, (1 )-condence tolerance lower
bound and upper
1/ n)z(1)/2 ) n(d0 + d1 X/n + d2 X 2 /n2 ). More generally, we consider tolerance bounds of the form
d1 X
d2 X 2
L(X) = X + a b n(d0 +
+
)+c
n
n2
d1 X
d2 X 2
+
)+c
U (X) = X + a + b n(d0 +
n
n2
with suitably chosen constants a, b, and c.
916
(4.1)
917
918
length of the proposed tolerance intervals is less than that of the existing tolerance intervals. Thus, based on both coverage probability and expected length,
the tolerance intervals derived from our analytical approach outperform existing
tolerance intervals.
Acknowledgement
The research of Tony Cai was supported in part by NSF Grant DMS-0604954.
Appendix. Proof of Theorem 1
We begin by introducing notation and a technical lemma. All three discrete
distributions under consideration are lattice distributions with the maximal span
of one. Lemma 1 below gives the Edgeworth expansion and Cornish-Fisher expansion for these distributions. The rst part is from Brown et al. (2003). For
details on the Edgeworth expansion and Cornish-Fisher expansion, see Esseen
(1945), Petrov (1975), Bhattacharya and Rao (1976), and Hall (1982).
Let X1 , . . . , Xn be iid observations from a discrete distribution in the NEFQVF family. Denote the mean of X1 by and the standard deviation by . Let
3 = K3 / 3 and 4 = K4 / 4 be the skewness and kurtosis of X1 , respectively. Set
)/, where X
= X/n. Let Fn (z) = P (Zn z)
X = n1 Xi and Zn = n1/2 (X
be the cdf of Zn and let fn,, = inf{x : P (X x) 1 } be the 1 quantile
of the distribution of X.
Lemma 1. Suppose z = z0 + c1 n1/2 + c2 n1 + O(n3/2 ), where z0 , c1 and c2
are constants. Then the two-term Edgeworth expansion for Fn (z) is
Fn (z) = (z0 ) + p1 (z)(z0 )n1/2 + p2 (z)(z0 )n1 + Osc1 n1/2 + Osc2 n1
+O(n3/2 ),
(A.1)
where Osc1 and Osc2 are bounded oscillatory functions of and z, and
1
p1 (z) = c1 + 3 (1 z02 ),
6
1 2 1 3
1
p2 (z) = c2 z0 c1 + (z0 3z0 )3 c1 4 (z03 3z0 )
2
6
24
1 2 5
3
3 (z0 10z0 + 15z0 ),
72
1
p3 (z) = c1 + 3 (z02 3).
6
(A.2)
(A.3)
(A.4)
919
920
1
1
1
(d1 + 2d2 )a + b2 d21 + c
2
8
2
}
1
1
1
2
2
2
2
+ (1+2d2 )(d1 +2d2 )(z1 1) b d0 d2 (d1 4d0 d2 )z1 (n 2 )1/2
12
2
8
2 1/2
Dn =(n )
+O(n1 ).
Note that (1 b2 d2 n1 )1 = 1 + b2 d2 n1 + O(n2 ). Using this and the above
expansion for Dn , we have
{
1
1
2
(1 + 2d2 )(z1
zn = (b z1 ) +
1) (d1 + 2d2 )z1 b
6
2
}
1
+( d1 + d2 )b2 a 1 n1/2
2
{
1
1
3
3
+
d2 (3z1 z1
) 2 + (d2 + d22 2 )(2z1
5z1 )
4
9
1
1
1 3
z1 ) + (b z1 )b2 d2 2 + [ a(d1 + 2d2 ) + b2 d21
+ (z1
72
2
8
1 2
1
1
2
b d0 d2 + c + (1 + 2d2 )(d1 + 2d2 )(z1 1)
2
2
12 }
1 2
2
]b 2 n1 + O(n3/2 )
(d1 4d0 d2 )z1
8
(b z1 ) + c1 n1/2 + c2 n1 + O(n3/2 ).
(A.10)
It then follows from the Edgeworth expansion (A.1) for P (Zn zn ) given in
Lemma 1 that b needs to be chosen as b = z + z1 in order for the coverage
probability of the tolerance interval to be close to the nominal level 1 . With
this choice of b, and using the notation in (3.4) for the Edgeworth expansion of
P (Zn zn ), the coecients for the smooth terms are
1
S1 = [c1 + 3 (1 z2 )](z ),
6
{
1
1
1
S2 = c2 z c21 + (z3 3z )3 c1 4 (z3 3z )
2
6
24
}
1 2 5
3
3 (z 10z + 15z ) (z ).
72
(A.11)
(A.12)
921
This leads to
1 2
1)(1 + 2d2 ) + 3z (z + z1 )(d1 + 2d2 ) + 3 (1 z2 )]
a = [(z1
6
1 2
= [(z1
1)(1 + 2d2 ) + (1 + 3z z1 + 2z2 )(d1 + 2d2 )].
(A.13)
6
However, is unknown. We replace by
in a and set c = 0. It is straightforward to verify that there is no rst-order eect by replacing with
in (A.13),
and that
d1 X
d2 X 2
X + a b n(d0 +
+
),
(A.14)
n
n2
with a and b given in (3.7) and (3.8), is rst-order probability matching lower
bound.
Second-order probability matching interval: To make the interval secondorder probability matching, we need both S1 0 and S2 0. We can nd the
value of c from (A.10), (A.11) and (A.12). However, a was assumed to be a
constant not depending on X in the original derivation of zn . While in (A.14), a
is a function of X and this has a second-order eect. We thus need to consider
tolerance bound of the form (3.5) with a given in (3.7) and b given in (3.8), and
2
to redo the analysis to nd the optimal c. Set h1 = (1/6)(d2 ((z1
1) + 3(z +
2
2
z1 )z ))+2d2 /2(1z ) and h2 = (1/6)[(z1 1)+3d1 z (z +z1 )+d1 (1z2 )].
Then (3.5) can be rewritten as
n(d0 +
d2 X 2
d1 X
+
) + c.
n
n2
Dn
{
d2
= d0 n + n1 fn,, (nd1 + d2 fn,, ) + 1 (z + z1 )2 h2 d1 + c
4
d0 d2 (z + z1 )2 + 4h1 d0 + [4d0 h21 + 2fn,, (h1 d1 h2 d2 )
+(h22 d2
4h21 cn2
}1/2
.(A.16)
922
References
Agresti, A. and Coull, B. (1998). Approximate is better than exact for interval estimation of
binomial proportions. Amer. Statist. 52, 119-126.
Bhattacharya, R. N. and Rao, R. R. (1976). Normal Approximation And Asymptotic Expansions.
Wiley, New York.
Brown, L. D. (1986). Fundamentals of Statistical Exponential Families with Applications in
Statistical Decision Theory. Lecture Notes-Monograph Series, Institute of Mathematical
Statistics, Hayward.
Brown, L. D., Cai, T. and DasGupta, A. (2002). Condence intervals for a binomial proportion
and Edgeworth expansions. Ann. Statist. 30, 160-201.
Brown, L. D., Cai, T. and DasGupta, A. (2003). Interval estimation in exponential families.
Statistica Sinica 13, 19-49.
Cai, T. (2005). One-sided condence intervals in discrete distributions. J. Statist. Plann. Inference 131, 63-88.
923
Easterling, R. G. and Weeks, D. L. (1970). An accuracy criterion for Bayesian tolerance intervals.
J. Roy. Statist. Soc. Ser. B 32, 236-240.
Esseen, C. G. (1945). Fourier analysis of distribution functions: a mathematical study of the
Laplace-Gaussian law. Acta Math. 77, 1-125.
Ghosh, J. K. (1994). Higher Order Asymptotics. NSF-CBMS Regional Conference Series, Institute of Mathematical Statistics, Hayward.
Ghosh, M. (2001). Comment on Interval estimation for a binomial proportion by L. Brown,
T. Cai and A. DasGupta. Statist. Sci. 16, 124-125.
Hahn, G. J. and Chandra, R. (1981). Tolerance intervals for Poisson and binomial variables. J.
Quality Tech. 13, 100-110.
Hahn, G. J. and Meeker, W. Q. (1991). Statistical Intervals: A Guide for Practitioners. Wiley
Series.
Hall, P. (1982). Improving the normal approximation when constructing one-sided condence
intervals for binomial or Poisson parameters. Biometrika 69, 647-52.
Kocherlakota, S. and Balakrishnan, N. (1986). Tolerance limits which control percentages in
both tails: sampling from mixtures of normal distributions. Biometrical J. 28, 209-217.
Krishnamoorthy, K. and Mathew, T.(2004). One-sided tolerance limits in balanced and unbalanced one-way random models based on generalized condence intervals. Technometrics
46, 44-52.
Morris, C. N. (1982). Natural exponential families with quadratic variance functions. Ann.
Statist. 10, 65-80.
Mukerjee, R. and Reid, N. (2001). Second-order probability matching priors for a parametric
function with application to Bayesian tolerance limits. Biometrika 8, 587-592.
Petrov, V. V. (1975). Sums of Independent Random Variables. Springer, New York.
Vangel, M. G. (1992). New methods for one-sided tolerance limits for a one-way balanced
random-eects ANOVA model. Technometrics 34, 176-185.
Wald, A. and Wolfowitz, J. (1946). Tolerance limits for normal distribution. Ann. Math. Statist.
17, 208-215.
Wilks, S. S. (1941). Determination of sample sizes for setting tolerance limits. Ann. Math.
Statist. 12, 91-96.
Wilks, S. S. (1942). Statistical prediction with special reference to the problem of tolerance
limits. Ann. Math. Statist. 13, 400-409.
Zacks, S. (1970). Uniformly most accurate upper tolerance limits for monotone likelihood ratio
families of discrete distributions. J. Amer. Statist. Assoc. 65, 307-316.
Department of Statistics, The Wharton School, University of Pennsylvania, Philadelphia, PA
19104, U.S.A.
E-mail: tcai@wharton.upenn.edu
Institute of Statistics, National Chiao Tung University, Hsinchu, Taiwan.
E-mail: wang@stat.nctu.edu.tw
(Received October 2007; accepted April 2008)