Professional Documents
Culture Documents
Lognormal Distributions
1 INTRODUCTION
(We take 6 without loss of generality, since - has the same distribu-
tion as An alternative notation replaces y and 6 by the expected value
and standard deviation a of Z = - 8). The two sets of parameters are
, related by the equations
208 LOGNORMAL DISTRIBUTIONS
It can be seen that a change in the value of the parameter affects only
the location of the distribution. It does not affect the variance or the shape
(or any property depending only on differences between values of the
variable and its expected value). It is convenient to assign a particular value
for ease of algebra, with the understanding that many of the results so
obtained can be transferred to the more general distribution. In many
applications 8 is "known" to be zero (so that = or X is a
"positive random variable"). This important case has been given the name
two-parameter distribution (parameters y, S or a). For this
distribution (14.1) becomes
and becomes
log X -
=
The general case (with 8 not necessarily zero) can be called the
parameter lognormal distribution (parameters y, 6 , 8 or a ,
HISTORICAL REMARKS 209
2 HISTORICAL REMARKS
then
n
log
1
and if the independent random variables log are such that a central limit
type of result applies, then the standardized distribution of log would tend
to a unit normal distribution as n tends to infinity. The limiting distribution
of would then be (two-parameter) lognormal. In an accompanying paper
(1879) obtained expressions for the mean, median, mode, variance,
and certain percentiles of the distribution.
Subsequent to this, little material was published relating to the
distribution until 1903 when Kapteyn again considered its genesis on the lines
described above. [Fechner (1897) mentioned the use of the distribution in the
description of psychophysical phenomena but gave little emphasis to this
topic.] Kapteyn and van Uven (1916) gave a graphical method (based on the
use of sample quantiles) for estimating parameters, and in the following years
there was a considerable increase in published information on the lognormal
and related distributions. (1917) obtained formulas for the higher
moments (in a study of the distribution of ages at first marriage), while van
Uven considered transformations to normality from a more general
point of view; (1919) obtained approximate formulas for the standard
deviations of estimators obtained by the method of moments. Estimation
from percentile points was described by Davies (1925, 1929) and tables to
LOGNORMAL DISTRIBUTIONS
log = log +
1
leading to
Soon after Gibrat's work, Gaddum (1933) and Bliss found that
distributions of critical dose (dose just causing reaction) for a number of
drugs could be represented with adequate accuracy by a (two-parameter)
lognormal distribution. On the basis of these observations, a highly developed
method of statistical analysis of "quantal" (all-or-none) response data has
been elaborated. (The term analysis" has been given to such analy-
ses, although is often understood to apply to a special transformed
value, + 5, where is an observed proportion.)
Lognormal distributions have also been found to be applicable to distribu-
tions of particle size in naturally occurring aggregates [Hatch Hatch
MOMENTS A N D OTHER PROPERTIES
+ =
In particular
= - +
= - + + 3w2
The shape factors are
and
Neither of these depends on Note that a, and a, > 3-that is, the
distributions are positively skewed and are leptokurtic.
Equations and may be regarded as parametric equations of
a curve in the plane. This curve is called the lognormal line and is
MOMENTS A N D OTHER PROPERTIES 213
shown in Figure 12.1 of Chapter 12, where it separates the regions for
Johnson and distributions. The coefficient of variation is - It
also does not depend on The distribution of X is unimodal; the mode is at
Mode ( X ) = (14.10)
From (14.4) it is clear that the value such that = a , is
related to the corresponding percentile of the unit normal distribution by
the formula:
= +
In particular
Median (X) = = (since = 0).
Comparison of and shows that
Median ( X ) Mode ( X )
and
Mode ( X ) Median ( X )
= w + 2 = (Coefficient of + 3. (14.14)
Coefficient of variation
In hydrological literature special attention has been devoted to relations
between Median and [Burges, Lettenmeier, and Bates
Charbeneau Burges and Hoshi
The standardized deuiate is, for example,
and they have suggested using this as a basis for deciding whether a
lognormal distribution is appropriate.
From Table 14.1 we see that as a tends to zero (or to infinity), the
standardized lognormal distribution tends to a unit normal distribution, and
For a small,
It is to be expected that
Wise (1952) has shown that the probability density function of the
parameter distribution has two points of inflection at
and so
m
ity
e
Hence
Setting = we obtain
5.0
0 1 2 3 4 5 6 7 8 9 1 0
The entropy is
= (u2
I
4 ESTIMATION
4.1 Known
If the value of the parameter is known (it is often possible to take equal
to zero), then estimation of the parameters and (or and presents
few difficulties beyond those already discussed in connection with the normal
distribution. In fact, by using values of = - the problem is
reduced to that of estimation of parameters of a normal distribution. Many
specialized problems, such as those relating to truncated distributions or
censored samples, or the use of order statistics, may be reduced to corre-
sponding problems (already discussed in Chapter for the normal distribu-
tion. Maximum likelihood estimation is exactly equivalent to maximum likeli-
hood estimation for normal distributions, so that the maximum likelihood
estimators for and respectively, are
other is that where the original values are not available, but the data are
in groups of equal width. While the results can immediately be written down
in the form of grouped observations from a normal distribution, the groups
appropriate to the transformed variables will not be of equal width. In such
circumstances methods of the kind described in Chapter 12, Section 2, might
be applied.
Sometimes it may be desired to estimate not but rather the
expected value + and the variance - of the vari-
able X. Since and = - - are jointly complete and
sufficient statistics for and if the UMVU estimator of
= + is
respectively, where
Unfortunately, the series in (14.29) converges slowly (except for very small
values of Finney recommends using the approximation
Value of
Efficiency
Efficiency of
Figure 14.3 The Efficiencyof and of in Large Samples
4.2 Unknown
Estimation problems present considerable difficulty when 0 is not known. As
might be expected, estimation of 0 is particularly inaccurate. This parameter
is a "threshold value," below which the cumulative distribution function is
ESTIMATION 223
zero and above which it is positive. Such values are often-one might say
"usually" to estimate.
However, estimation of parameters is often not as important as estimation
of probabilities-in particular, of cumulative distribution functions.
Table 14.3 shows the median and upper and lower 10% and 25% points of
lognormal distributions with widely differing values of 0, but with and a so
chosen that each distribution is standardized has zero expected value
and unit standard deviation; see also Table 14.1).
There can be considerable variation in 0 with little effect on the per-
centiles and little effect on values of the cumulative distribution functions for
fixed values of X. Insensitivity to variation in is most marked for large
negative values of These observations and Table 14.3 should correct any
feeling of depression caused by the inaccuracy of estimation of 0. Of course
there can be situations where the accuracy of estimation of 0 itself is a
primary consideration. In these cases special techniques, both of experimen-
tation and analysis, may be needed. These will be briefly described at the end
of this section.
Maximum likelihood estimation of 0, and a might be expected, to be
attained by the following tedious, but straightforward, procedure. Since for
given the likelihood function is maximized by taking
= -
1
[cf.
224 LOGNORMAL DISTRIBUTIONS
(note the factor n 2 in the last formula). If is known, their results give
= 4.1-not much less than if 8 is not known-but n is now
only about 2.1.
Tiku (1968) has suggested an approximate linearization of the maximum
likelihood equations based on the approximate formula
- = + (14.36)
. . .,
The constant c is chosen to be a large positive number (although its value
can be problem dependent). (1991) points out that a necessary
condition for the maximum likelihood equations to "degenerate" so that they
cannot be solved is that
From - = whence
to zero.
Inserting the value of from we obtain
The values of A and B are easily determined from (14.38) and but
Dahiya and Guttman provide tables to facilitate the process. These
authors also consider construction of shortest intervals for the
median [Of course, if it is known that = confidence rvals for
are easily constructed using normal theory results (see Chapter
Since
log x = log
i=l
and
ESTIMATION 227
where is the rth order statistic among . ., and is the rth order
statistic among n mutually independent unit normal variables, or by
-
= S 2 with =
n-
The actual estimating equations for are
and
228 LOGNORMAL DISTRIBUTIONS
whence
can be used.
This method is easy to apply but liable to be inaccurate because of
sampling variation in [However, (1981) found it to be quite
reliable.] Even when is known, estimation of population variance by sample
variance is relatively inaccurate (see Figure 14.3). Approximate formulas for
ESTIMATION 229
= + exp
whence
and
230 LOGNORMAL DISTRIBUTIONS
which is very unlikely to be the case if the distribution has substantial positive
skewness.
If the sample median is replaced by the sample arithmetic mean
and equation by
= + exp ---
I
This is then combined with the first two maximum likelihood equations
(14.33).
There are natural modifications of this method, to allow for cases when
there are several equal (or indistinguishable, as in grouped data) minimum
values. Using a variant of this an initial value of (for use in some
iterative process) may be chosen as minus some arbitrary (usually rather
small) value.
Cohen, and Ding's moment estimation method
uses the equations (in an obvious notation)
The quantities
- = i = .., n +
in the form
where
The integral I has no closed form solution and must be numerically evalu-
ated. These authors use two-parameter lognormal distributions for
independent variables = with independent "noninformative prior"
distributions for and in analysis of environmental data.
The reliability function for two parameter lognormal distributions is given
-
a
Here
.
+
t -
+ I
(1989) obtains a very similar result directly assuming the joint prior:
a
Comparison with the maximum likelihood estimator
log t -
exp 2 +a I 1 + .
1
+ - - (14.62)
where
= + +
Time
Figure 14.4 Geometrical Method of Estimating Lognormal Parameters
TABLES AND GRAPHS
with
+ - - .
Sequential estimation and testing have been discussed by Zacks (1966) and
Tomlinson
Issues of testing normality versus lognormality has received attention in
the literature for the last 20 years. See Kotz Rademaker, and
and more recently and Hwang (1991).
Aitchison and Brown (1957) give values of the coefficient of variation, a,,
(a, - 3), the ratios [from of mean to median and mean to
mode and the probability that X does not exceed for
6 APPLICATIONS
1 - ru)
( r t h moment if not truncated) , (14.65)
-
"No censoring.
CENSORING, TRUNCATED LOGNORMAL AND RELATED DISTRIBUTIONS 243
where = = - with
Cohen suggests selecting trial values for 6, solving the first two equa-
tions with = for and using standard Newton-Raphson procedure,
and then substituting these values into the third equation. If the third
equation is not satisfied for any value of in the permissible interval
he recommends application of a modified maximum likelihood
method (which has proved satisfactory in many applications) in which the
third equation is replaced by
= - +
where = + and is the rth order statistic among the X's
for some r 1.
The discretized form of the (truncated) lognormal distribution has been
found to offer a competitive alternative to logarithmic series distributions in
some practical situations (see Chapter 7). Quensel has described the
logarithmic Gram-Charlier distribution in which log X has a Gram-Charlier
distribution.
Kumazawa and Numakunai (1981) introduced a hybrid distribu-
tion defined by the cdf,
Figure 14.5 Frequency Curves of the Normal, Lognormal, and Hybrid Lognormal Distribution,
and Respectively
if either 7 or 8.
=-
P
c - - - - p},
Then
If the values and that maximize are found, then the set
will be the solution of equations (14.721, and as long as
- 6/2 and + these may be taken as the maximum likelihood
estimates.
To locate the greatest value of the likelihood, Lambert (1970) recommends
that "the function log be plotted over a range of values of and
Examination of such a plot enables one either to locate a region in which the
greatest value of log lies or to observe in which direction to shift to
find this value; in this case the function was recalculated over a suitable new
region. Once a region containing the greatest value is found, changing the
intervals in and at which log is calculated enables one to locate the
greatest value with any desired accuracy and so to obtain and Lambert
concludes that "although techniques have been developed for estimating,
from the likelihood, the parameters of the four-parameter lognormal
CONVOLUTION OF NORMAL AND LOGNORMAL DISTRIBUTIONS 247
1 - (log -
, 0, (14.73)
log -5
248 LOGNORMAL DISTRIBUTIONS
= log
BIBLIOGRAPHY 249
BIBLIOGRAPHY
Krige, D. G. (1960). On the departure of one value distributions from the lognormal
model in South African gold mines, Journal of the South African Institute for
Mining and Metallurgy, 61, 231-244.
Krige, D. G. (1971). Geostatistical case studies of the advantages of
Kriging with mean for a base metal mine and a gold mine, Journal of
the International Association for Mathematical Geology, 14,
W. C. (1936). Application of logarithmic moments to size frequency
distributions of sediments, Journal of Sedimentary Petrology, 6, 35-47.
Kumazawa, S., and Numakunai, T. (1981). A new theoretical analysis of occupational
dose distributions indicating the effect of dose limits, Health Physics,
Lambert, J . A. (1964). Estimation of parameters in the three-parameter lognormal
distribution, Australian Journal of Statistics, 6, 29-32.
Lambert, J. A. Estimation of parameters in the four parameter lognormal
distribution, Australian Journal of Statistics, 12, 33-43.
Land, C. E. (1972). An evaluation of approximate confidence interval estimation
methods for lognormal means, Technometrics,14, 145-158.
Laurent, A. G. (1963). The lognormal distribution and the translation method:
description and estimation problems, Journal of the American Statistical Associa-
tion, 58, (Correction, Ibid., 58, 1163.)
Lawrence, R. J. The lognormal as event-time distribution, Chapter 8 in
Lognormal Distributions: Theory and Applications, E. L. Crow and K. Shimizu
(editors), pp. 211-228, New York: Marcel Dekker.
Lawrence, R. J. Applications in economics and business, Chapter 9 in
Lognormal Distributions: Theory and Applications, E. L. Crow and K. Shimizu
(editors), pp. 229-266, New York: Marcel Dekker.
Leipnik, R. B. (1991). Lognormal random variables, Journal of the Australian Mathe-
matical Society, Series B, 32, 327-347.
Likes, J. (1980). Variance of the M W E for lognormal variance, Technometrics, 22,
253-258.
D. (1879). The law of the geometric mean, Proceedings of the Royal Society
of London,
S. (1976). A comparison of the lognormal and transition models for
wastage, The Statistician, 25, 281-294.
Masuyama, M. (1984). A measure of biochemical individual variability,
Joumal, 26, 327-346.
Mehran, F. (1973). Variance of the M W E for the lognormal mean, Journal of the
American Statistical Association, 68, 726-727.
Michelini, C. (1972). Convergence pattern of the scoring method in estimating
parameters of a lognormal function, Journal of the American Statistical Associa-
tion, 67, 319-323.
Miller, R. L., and Goldberg, E. D. The normal distribution in geochemistry,
Geochimica et 8, 53-62.
Morrison, J. (1958). The lognormal distribution in quality control, Applied Statistics, 7,
160-172.
BIBLIOGRAPHY 255
Reid, D. D. (1981). The Poisson lognormal distribution and its use as a model of
plankton aggregation, Statistical Distributions in Work, C.
G. P. Patil, and B. A. Baldessari (editors), 303-316, Dordrecht:
Rendu, J.-M. M. (1988). Applications in geology, Chapter 14 in Lognormal Distribu-
tions: Theory and Applications, E. L. Crow and K. Shimizu (editors), pp. 357-366,
New York: Marcel Dekker.
Rohn, W. B. (1959). Reliability prediction for complex systems, Proceedings of the
Fifth National Symposium on Reliability and Quality Control in Electronics,
381-388.
Rona, R. J., and D. G. (1977). National study of health and growth:
Standards of attained height, weight and triceps in English children 5 to
years old, Annals of Human Biology, 4, 501-523.
Royston, P. (1991). Constructing time-specific reference ranges, Statistics in Medicine,
10, 675-690.
Royston, P. (1992). Estimation, reference ranges and goodness of fit for the
parameter log-normal distribution, Statistics in Medicine, 11, 897-912.
Rukhin, A. L. (1986). Improved estimation in lognormal models, Journal of the
American Statistical Association, 81, 1041-1049.
Sargan, J. D. (1957). The distribution of wealth, 25, 568-590.
P. E. (1950). The distribution of incubation periods of infectious diseases,
American Journal of Hygiene, 51, 310-318.
Schwartz, S. C., and Yeh, Y. (1982). On the distribution function and moments of
power-sums with log-normal components, Bell System Technical Journal, 61,
1441-1462.
Severo, N. C., and Olds, E. G. (1956). A comparison of tests on the mean of a
logarithmico-normal distribution, Annals of Mathematical Statistics, 27, 670-686.
Shaban, S. A. Poisson-lognormal distributions, Chapter 7 in Lognormal
Distributions: and Applications, E. L. Crow and K. Shimizu (editors), pp.
195-210, New York: Marcel Dekker.
Shaban, S. A. Applications in industry, Chapter 10 in Lognormal Distribu-
tions: Theory and Applications, E. L. Crow and K. Shimizu (editors), pp. 267-286,
New York: Marcel Dekker.
Shimizu, K. (1986). Estimation in lognormal distribution, Proceedings of the Second
Japan-China Symposium on Statistics, 237-240, Kyushi University, Fukanoka,
Japan.
Shimizu, K., and Iwase, K. Uniformly minimum variance unbiased estimation
in lognormal and related distributions, Communications in and
Methods, 10, 1127-1147.
Sichel, H. S. (1947). Experimental and theoretical investigations of the bias error in
mine sampling with special reference to narrow gold reefs, Transactions of the
Institute of Mining and Metallurgy, 56, 403-473.
Sichel, H. S. Application of statistical techniques to the evaluation of mineral
deposits, Bulletin of the International Statistical Institute, 42, 245-268.
S. K. A note on Bayes estimators and robustness of lognormal
parameters, Journal of the Indian Society of Agricultural Statistics, 49-53.
BIBLIOGRAPHY 257