NUCLEAR INSTRUMENTS & METHODS IN PHYSICS RESEARCH Section A

ELSEVIER
Abstract
Bayes' theorem offers a natural way to unfold experimental distributions in order to get the best estimates of the true ones. The weak point of the Bayes approach, namely the need for knowledge of the initial distribution, can be overcome by an iterative procedure. Since the method proposed here does not make use of continuous variables, but simply of cells in the spaces of the true and of the measured quantities, it can be applied in multidimensional problems.
This paper presents a different approach, based on Bayes' theorem, recognized by statisticians as the most powerful tool for making statistical inferences. The main advantages with respect to other unfolding methods are:
- it is theoretically well grounded;
- it can be applied to multidimensional problems;
- it can use cells of different sizes for the distribution of the true and the experimental values;
- the domain of definition of the experimental values may differ from that of the true values;
- it can take into account any kind of smearing and migration from the true values to the observed ones;
- it gives the best results (in terms of its ability to reproduce the true distribution) if one makes a realistic guess about the distribution that the true values follow, but, in case of total ignorance, satisfactory results are obtained even starting from a uniform distribution;
- it can take different sources of background into account;
- it does not require matrix inversion;
- it provides the correlation matrix of the results;
- it can be implemented in a short, simple and fast program, which deals directly with distributions and not with individual events.

2. Bayes' theorem

To stay close to the application of interest, let us state Bayes' theorem in terms of several independent causes (C_i, i = 1, 2, ..., n_C) which can produce one effect (E). Let us assume we know the initial probability of the causes P(C_i) and the conditional probability of the ith cause to produce the effect, P(E|C_i). The Bayes formula is then

    P(C_i|E) = P(E|C_i) P(C_i) / Σ_{l=1}^{n_C} P(E|C_l) P(C_l).    (1)

This can be read as follows: if we observe a single event (effect), the probability that it has been due to the ith cause is proportional to the probability of the cause times the probability of the cause to produce the effect.

For example, if we consider DIS events, the effect E can be the observation of an event in a cell of the measured quantities {ΔQ²_meas, Δx_meas}. The causes C_i are then all the possible cells of the true values {ΔQ²_true, Δx_true}_i.

One immediately sees that P(C_i|E) depends on the initial probability of the causes. This gives a first impression that this formula is sterile. In reality the Bayes formula has the power to increase the knowledge of P(C_i) with the increasing number of observations. If one has no a priori prejudice on P(C_i), the process of inference can be started from a uniform distribution.

The final distribution depends also on P(E|C_i). These probabilities must be calculated or estimated with Monte Carlo methods. One has to notice that, in contrast to P(C_i), these probabilities are not updated by the observations. So if there are ambiguities concerning the choice of P(E|C_i), one has to try them all in order to evaluate their systematic effects on the results.

3. Unfolding an experimental distribution

If one observes n(E) events with effect E, the expected number of events assignable to each of the causes is

    n̂(C_i) = n(E) P(C_i|E).    (2)

As the outcome of a measurement one has several possible effects E_j (j = 1, 2, ..., n_E) for a given cause C_i. For each of them the Bayes formula (1) holds, and P(C_i|E_j) can be evaluated. For simplicity we will refer to the conditional probabilities P(E_j|C_i) as the smearing matrix S, even if they describe cell-to-cell migration. Let us write Eq. (1) again in the case of n_E possible effects ¹, indicating the initial probability of the causes with P_0(C_i):

    P(C_i|E_j) = P(E_j|C_i) P_0(C_i) / Σ_{l=1}^{n_C} P(E_j|C_l) P_0(C_l).    (3)

One has to note that:
- Σ_{i=1}^{n_C} P_0(C_i) = 1, as usual. Notice that if the probability of a cause is initially set to zero it can never change, i.e. if a cause does not exist it cannot be invented;
- Σ_{i=1}^{n_C} P(C_i|E_j) = 1: this normalization condition, mathematically trivial since it comes directly from Eq. (3), tells that each effect must come from one or more of the causes under examination. This means that if the observables also contain a non-negligible amount of background, this needs to be included among the causes;
- 0 ≤ ε_i ≡ Σ_{j=1}^{n_E} P(E_j|C_i) ≤ 1: there is no need for each cause to produce at least one of the effects taken under consideration. ε_i gives the efficiency of detecting the cause C_i in any of the possible effects.

After N_obs experimental observations one obtains a distribution of frequencies n(E) = {n(E_1), n(E_2), ..., n(E_{n_E})}. The expected number of events to be assigned to each of the causes, and due only to the ob-

¹ The broadening of the distribution due to the smearing suggests a choice of n_E larger than n_C. We would like to remark that there is no need to reject events where a measured quantity has a value outside the range allowed for the physical quantity. For example, for the case of DIS events, also cells with x_meas > 1 or Q²_meas < 0 give information about the true distribution.
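Eq. (3) lends itself to a short numerical sketch. The following is a hypothetical Python/numpy illustration (the paper's own implementation is a Fortran program, not reproduced here); the rows of `smear` are the causes, the columns the effects, and a background source would simply be an extra row, as noted above.

```python
import numpy as np

def bayes_matrix(smear, p0):
    """P(C_i|E_j) from Eq. (3).

    smear[i, j] = P(E_j|C_i): rows are causes, columns are effects.
    Rows need not sum to one; the row sums are the efficiencies eps_i.
    p0[i] = P_0(C_i): initial (prior) probabilities of the causes.
    """
    joint = smear * p0[:, None]      # P(E_j|C_i) P_0(C_i)
    norm = joint.sum(axis=0)         # sum over causes, for each effect
    # divide column by column, leaving zero where no cause feeds the effect
    return np.divide(joint, norm, out=np.zeros_like(joint), where=norm > 0)

# Toy smearing matrix: 3 causes, 4 effects (a hypothetical example,
# not the paper's S1 or S2). Row sums below one mean inefficiency.
smear = np.array([[0.5, 0.3, 0.1, 0.0],
                  [0.1, 0.4, 0.3, 0.1],
                  [0.0, 0.1, 0.4, 0.4]])
p0 = np.full(3, 1 / 3)               # uniform prior (total ignorance)
theta = bayes_matrix(smear, p0)      # theta[i, j] = P(C_i|E_j)

print(theta.sum(axis=0))             # each column sums to 1 (normalization)
print(smear.sum(axis=1))             # efficiencies eps_i = [0.9 0.9 0.9]
```

Note the asymmetry stated in the text: a column of `theta` sums to one (each effect must come from some cause), while a row of `smear` sums to the efficiency ε_i, which may be below one.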
G. D’Agostini/Nucl. Instr. and Meth. in Phys. Res. A 362 (1995) 487-498 489
served events, can be calculated applying Eq. (2) to each effect:

    n̂(C_i)|_obs = Σ_{j=1}^{n_E} n(E_j) P(C_i|E_j).

Taking into account the inefficiency ², the best estimate of the true number of events is then

    n̂(C_i) = (1/ε_i) Σ_{j=1}^{n_E} n(E_j) P(C_i|E_j),    ε_i ≠ 0.    (4)

From these unfolded events we can estimate the true total number of events, the final probabilities of the causes and the overall efficiency:

    N̂_true = Σ_{i=1}^{n_C} n̂(C_i),
    P̂(C_i) ≡ P(C_i|n(E)) = n̂(C_i)/N̂_true,
    ε̂ = N_obs/N̂_true.

The procedure can then be iterated:
1) choose the initial distribution P_0(C);
2) calculate n̂(C) and P̂(C) from Eqs. (3) and (4);
3) make a χ² comparison between the new and the previous numbers of events;
4) replace P_0(C) by P̂(C) and start again; if, after the second iteration, the value of χ² is "small enough", stop the iteration; otherwise go to step 2.
Some criteria about the optimum number of iterations will be discussed later.

² If ε_i = 0 then n̂(C_i) will be set to zero, since the experiment is not sensitive to the cause C_i.

Fig. 1. (a) Visualization of the smearing matrix S_1 (see text): in the abscissa and the ordinate are the true and the measured quantities, respectively; the box area is proportional to the migration probability. (b-e) Smeared distributions of f_{1-4} (see text) obtained from 10000 generated events. (f-i) True distributions of f_{1-4} (solid lines) compared with the results of a 3-step unfolding (dashed lines). (j-m) Unfolded distributions of f_{1-4} after the first, second and third step (solid, dotted and dashed lines).

4. Estimation of the uncertainties

After the iterative procedure described above has converged, one obtains the unfolded distribution n̂(C). As far as the evaluation of the uncertainties is concerned, one cannot simply take the square root of these numbers. Even if the number of cells is large and the uncertainty on P(E_j|C_i) is negligible, the quantities which are distributed according to a Poisson distribution are the observed numbers n(E_j) and not n̂(C_i), since the latter get contributions from several n(E_j). Moreover, it is clear that the uncertainties on n̂(C_i) have some degree of correlation, since the observed number of events n(E_j) is shared between all the causes from which the events can be originated.

To see all the sources of uncertainties and of correlations on n̂(C) in detail, let us rewrite Eq. (4), making use of Eq. (3), as

    n̂(C_k) = Σ_{j=1}^{n_E} M_kj n(E_j),
    M_kj = P(E_j|C_k) P_0(C_k) / [ ε_k Σ_{l=1}^{n_C} P(E_j|C_l) P_0(C_l) ].

The uncertainty on n̂(C) then gets contributions from two sources:
- n(E_j): the observed numbers are in general correlated (Cov(n(E_i), n(E_j)) ≠ 0 for i ≠ j). If the relative frequency in each cell is sufficiently small, due to the large number of bins or to low efficiency, the numbers in each of the cells can be approximated by an independent Poisson distribution, giving the contribution V_kl(n(E)) = Σ_{j=1}^{n_E} M_kj M_lj n(E_j);
- P(E_j|C_i): these terms are usually estimated by Monte Carlo. They are affected by statistical and systematic errors. The latter come from the assumptions made in the simulation and have to be treated with appropriate methods. The statistical errors come instead from the limited number of simulated events. Like the uncertainties on n(E_j), they also induce correlations between the results. In fact, the total number of events generated for each of the cause-cells C_i is shared between the effect-cells E_j and their distribution is multinomial. If the migration effect is not very strong, i.e. a cell C_i is observed only in a small number of cells E_j, one cannot neglect the covariance
between the terms P(E_{j1}|C_u) and P(E_{j2}|C_u). We can instead reasonably neglect the covariance between two terms related to different cause-cells.

Under these hypotheses the contribution to V from the uncertainty of M is

    V_kl(M) = Σ_{i,j=1}^{n_E} n(E_i) n(E_j) Cov(M_ki, M_lj),

where

    Cov(M_ki, M_lj) = Σ_{u=1}^{n_C} Σ_{r,s=1}^{n_E} (∂M_ki/∂P(E_r|C_u)) (∂M_lj/∂P(E_s|C_u)) Cov[P(E_r|C_u), P(E_s|C_u)],

with

    ∂M_ki/∂P(E_r|C_u) = M_ki [ δ_ku δ_ri / P(E_i|C_u) − δ_ku/ε_u − δ_ri M_ui ε_u / P(E_i|C_u) ]

and

    Cov[P(E_r|C_u), P(E_s|C_u)] = (1/n_u) P(E_r|C_u) [1 − P(E_r|C_u)]    (r = s),
    Cov[P(E_r|C_u), P(E_s|C_u)] = −(1/n_u) P(E_r|C_u) P(E_s|C_u)    (r ≠ s).

In the last expression n_u represents the number of events generated in the cell C_u in order to evaluate the smearing function.

The sum of the two contributions gives the elements of the covariance matrix of the unfolded numbers:

    V_kl = V_kl(n(E)) + V_kl(M).

5. Treatment of background

The unfolding based on Bayes' theorem can take into account in a natural way the presence of background, simply adding it to the possible causes responsible for the observables. Even several sources of background can be treated. For example, in case of a single contribution, one adds to the physical cells an extra cause C_{n_C+1}, with initial probability P_0(C_{n_C+1}). The conditional probabilities P(E_j|C_{n_C+1}) will be just the unnormalized shape (in the sense that ε_{n_C+1} may be smaller than unity) of the background distributions. The result of the unfolding will then provide the number of events to be assigned to the background.

In principle this method could also be used to disentangle the true distributions of several physics processes contributing to the same distribution of the observable. The problem will not be further discussed in the rest of the paper, but this method is likely to work only in very simple cases, and other more sophisticated methods - like neural network algorithms - should yield better results.

6. Results

6.1. The program

The above method has been implemented in a short self-contained Fortran code available on request from the author together with examples.

The user can provide either the smearing matrix or, more directly, the number of MC events produced in a cell of the true quantities together with the number of events which fall in each cell of the measured quantities. Only in the latter case is it possible to take into account the uncertainties due to the limited MC statistics. It is interesting to remark that there is no need to generate the MC events according to a realistic physical distribution of the variable under study, and in fact it can be more efficient to have several runs in different regions and to merge the results at the end, or to use a uniform distribution in order to populate well all the kinematical regions.

The covariance matrix of n̂(C) calculated by the program allows the user to redefine the cell sizes (which may be not all equal) a posteriori in order to reduce the correlation of the results. On the other hand, the presence of common sources of uncertainty makes it impossible to redefine the cells so as to have uncorrelated results. The full covariance matrix should be used to exploit at best the results in further analysis.

6.2. Unfolding a distribution not affected by statistical fluctuations

The ideal performances of the method have been studied with samples equivalent to having infinite statistics. This was done calculating the expected number of events from the true distribution and the smearing matrix, namely

    n(E_j) = Σ_i P(E_j|C_i) P_true(C_i) N_true,

or, in a more compact way,

    n(E) = S^T P_true N_true.

Figs. 1a and 2a show the smearing matrices, called hereafter S_1 and S_2, respectively, used for the tests. The abscissa and the ordinate represent the true and measured quantities, respectively, and the box area is proportional to the probability P(E_j|C_i). In order to test the unfolding performances, nontrivial smearing matrices have been chosen: in S_1 the causes have all different efficiencies; S_2 shows an unusual anticorrelation and also long range migrations; both are almost flat in some points; the domains of the true and measured values are different. The number of effect cells has been chosen larger than the number of causes, as the smearing makes the distributions usually broader. The true value distributions are shown with solid lines in Figs. 1f-1i, repeated also in Fig. 2. We will refer to these test distributions as f_1, f_2, f_3 and f_4.

[Fig. 2: visualization of the smearing matrix S_2 and of the corresponding smeared, true and unfolded distributions, with panels laid out as in Fig. 1.]
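The infinite-statistics closure test of this section can be sketched in a few lines. This is a hypothetical Python/numpy illustration (the smearing matrix and true distribution below are stand-ins, not the S_1, S_2 and f_{1-4} of the figures): the effect counts are built exactly from Eq. above, and each iteration applies Eqs. (3) and (4), feeding the updated probabilities back as the prior of the next step.

```python
import numpy as np

# Hypothetical smearing matrix (3 causes x 4 effects); here the rows sum
# to one, i.e. full efficiency, to keep the example minimal.
smear = np.array([[0.6, 0.3, 0.1, 0.0],
                  [0.2, 0.5, 0.2, 0.1],
                  [0.0, 0.2, 0.3, 0.5]])
p_true = np.array([0.2, 0.5, 0.3])
N_true = 10000.0

# Expected effect counts, no fluctuations: n(E_j) = sum_i P(E_j|C_i) P_true(C_i) N_true
n_E = N_true * (smear * p_true[:, None]).sum(axis=0)

eff = smear.sum(axis=1)                  # efficiencies eps_i (all 1 here)
p = np.full(3, 1 / 3)                    # start from a uniform prior
for _ in range(1000):                    # iterate Eqs. (3) and (4)
    joint = smear * p[:, None]
    theta = joint / joint.sum(axis=0)    # P(C_i|E_j), Eq. (3)
    n_hat = (theta @ n_E) / eff          # unfolded numbers, Eq. (4)
    p = n_hat / n_hat.sum()              # updated probabilities -> next prior

print(np.round(p, 3))                    # approaches p_true = [0.2, 0.5, 0.3]
```

With fluctuation-free data the iteration recovers the true distribution, as described in the text; with statistical fluctuations in n(E), such a long iteration would instead amplify them, which is the subject of the following section.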
G. D ‘Agostini/Nucl. Instr. and Meth. in Phys. Rex A 362 (199.5) 487-498 493
Monte Carlo samples of smeared distributions obtained from 10000 generated events (those without random fluctuations do not look much different) are shown in Figs. 1b-1e and 2b-2e for the two smearing matrices. In all cases the initial distribution of the true values has been assumed to be uniform.

Already after the first iteration the unfolded distribution is close to the true one (as shown, in the case of limited statistics, by the solid lines of Figs. 1j-1m and 2j-2m). The agreement increases constantly with the number of iterations and eventually, for all the test distributions and smearing matrices, the true distribution is recovered. This means that MS → 1. Obviously, this cannot be a rigorous result, since both matrices have all the elements defined not negative, and hence there must be a correct combination of the zeros in the rows of M with the zeros of the columns of S in order to give the null off-diagonal elements of the product matrix. An easy case where this limit cannot hold is when two causes have the same probabilities to produce the effects, i.e. P(E_j|C_i) = P(E_j|C_k) ∀j. In this case the result will be that n̂(C_i) = n̂(C_k) independently of the true probabilities of C_i and C_k. This is in fact the best result that the hypothesis allows. An extreme case is when the elements of the smearing matrix are all equal. One finds then that the final probabilities are equal to the initial ones, since under this condition the observations do not increase the knowledge at all.

6.3. Simulation with limited statistics

To make a more realistic evaluation of the performances of the method, the role of observed distributions has been played by Monte Carlo events simulated according to the true distribution and the smearing matrix. For each of the configurations 10000 events have been generated. The observed distributions are shown in Figs. 1b-1e and 2b-2e. As the smearing matrices are rather severe, there is no similarity at all with the true distributions.

After the experience gained with the sample having no statistical fluctuations, the first results are a bit surprising. After a few iterations the unfolded distribution becomes very close to the true one, but if one performs a very large number of iterations, it converges toward a distribution which shows strong fluctuations around the true one. The reason is simple. As stated before, an infinite iteration loop yields an unfolding matrix which is - in some sense - the inverse of the smearing matrix. So one gets exactly the same problems as discussed in the introduction about the matrix inversion method. The reason is that each of the bins in the true value distribution acts as an independent degree of freedom, and after an infinite number of iterations one reaches a very fluctuating solution - a kind of amplification of the statistical fluctuations - similar to the result of a fit of a large number n of points with a polynomial of order n − 1.

Figs. 1j-1m and 2j-2m show the result after the 1st (solid line), 2nd (dotted line) and 3rd step (dashed line). The latter is compared to the true distribution (solid line) in Figs. 1f-1i and 2f-2i. The agreement is qualitatively good also in the case of difficult situations, like f_3 with S_2. An example of 2-D unfolding is shown in Fig. 3, where the true distribution, the smeared one, and the results of the first 4 steps are shown.

Fig. 3. Example of a two-dimensional unfolding: true distribution (a), smeared distribution (b) and results after the first 4 steps (c-f).

In order to quantify the goodness of the results, Figs.
[Figs. 4 and 5: unfolding of 100 independent MC data sets with a 3-step procedure, for S_1 and S_2 respectively; upper panels show the unfolded distributions, lower panels the normalized differences (see text).]
4a-4d and Figs. 5a-5d show, respectively for S_1 and S_2, the results obtained unfolding 100 independent MC data sets, each of 10000 generated events, and where a 3-step procedure has been used. Figs. 4e-4h and Figs. 5e-5h give for each of the true bins the difference between unfolded and true numbers of events, divided by the standard deviation calculated in each unfolding. This gives an idea of the bias of the method and of the goodness of
Fig. 6. Same as Fig. 4, but with a 1000-step unfolding.
Fig. 7. Same as Fig. 5, but with a 1000-step unfolding.
the uncertainty estimation. Only some cases show quantitatively satisfactory results. Figs. 6 and 7 show what happens after 1000 steps. Very large fluctuations appear around the true value, as discussed above.

In order to avoid the problem of wild fluctuations with the increasing number of steps, some possible ways out may be considered:
- reduce the number of degrees of freedom, putting some constraints between the probabilities of the true values, for example imposing that they follow a particular function. This is exactly equivalent to making a maximum-likelihood fit to the data (it is in fact known that the maximum-likelihood principle can be derived from Bayes' theorem). It implies we know the expression of the function a priori, and it is the best way to proceed if one is just interested in finding the parameters of a particular model;
- find a criterion concerning the optimum number of iterations, which may depend on the kind of problem: playing with the distributions f_{1-4} one can realize that in most of the cases a good agreement is reached after a few iterations. Some particularly difficult cases for which this is not the case have to be attributed to the smearing matrices chosen;
- smooth the results of the unfolding before feeding them into the next step as "initial probability".

The third possibility has been chosen, as it turns out to produce stable results and moreover to be consistent with the spirit of Bayesian inference. We remind the reader that in this frame knowledge is achieved by making use of Bayes' theorem, initial hypotheses and empirical observations. The hypothesis that most of the physical distributions of interest, in particular structure functions, are smooth is well proven by experience. For this reason, using the result of the first iteration, with all its fluctuations, as initial probability for the second step is from the Bayesian point of view even wrong in principle, since one is "telling" the unfolding that the physical distribution can be of that kind, with all those wild fluctuations. It is preferable instead to feed into the program, as the initial distribution of the next step, a continuous and smooth function, whose shape is already influenced by the observations.

One has to notice that there is no reason to worry that the smoothing procedure produces biased results or hides strong peaks if they are significantly present in the data, since the smoothed distribution used as initial probability for the second iteration is nothing but a hypothesis more realistic than the first one, and can only give better results than the uniform distribution.

For the test distributions a rough smoothing has been performed for all of them by a polynomial fit of 3rd degree. ³ Superior results, with respect to the previous

³ The smoothing has not been put inside the unfolding program and must be done by the user, who knows the topology of the cells in the space of the physics quantities. It is recommended, whenever it is possible, to choose the smoothing function which reflects at best the physics case. This can be of particular relevance in multidimensional problems, and moreover it can provide a very effective procedure for a simultaneous unfolding and fitting.
Fig. 8. Same as Fig. 4, but with a 20-step unfolding and smoothing the probability distribution between the steps.

Fig. 9. Same as Fig. 5, but with a 20-step unfolding and smoothing the probability distribution between the steps.

case, are achieved after a few steps, and the convergence is obtained between 3 steps (as in the case of f_1 and f_4 with S_1) and 15 steps (f_2 with S_2). Figs. 8 and 9 show the result of 100 data samples after a 20-step unfolding with intermediate smoothing. No oscillations are present and the results do not change with increasing the number of
steps, indicating that the procedure has converged. Moreover, the differences between the unfolded and the true numbers, normalized to the calculated standard deviations, show that the latter are well enough estimated.

7. Conclusions

The iterative use of Bayes' theorem provides a promising method to unfold multidimensional distributions. With the smoothing of the resulting distribution between the steps a fast convergence is reached, and, for normal applications, the results are stable with respect to variations of the initial probabilities and of the smoothing procedures. The covariance matrix of the result, which can also take into account the uncertainties due to the limited Monte Carlo statistics used to evaluate the smearing matrix, is provided. A Monte Carlo study has shown that the estimated standard deviations turn out to be close to those calculated from the dispersion of the data around the mean values, and that the method does not bias the results.

Acknowledgements

I am grateful to several colleagues of the ZEUS Collaboration for the discussions we had on the subject. In particular, I would like to thank Lutz Feld for criticisms on the proposed method and for a careful check of the program.

Note added

After the appearance of the preprint of this paper, I have received two claims of similar methods used in the analysis of high energy physics data: Stefan Kluth pointed out the use of formula (4) - although stated differently, not justified by Bayes' theorem and not taking into account the inefficiency - together with the iteration procedure in an OPAL publication [2]; François Le Diberder, in order to perform a 4-dimensional unfolding of ALEPH data [3], has used a method which seems identical to the one here proposed (unfortunately no detailed information is available in the paper) and which also makes use of the smoothing between the iterations. In both cases there was no attempt to calculate directly the uncertainties, and they were estimated from the dispersion of the results in subsamples of the data sets.

References

[1] V. Blobel, Proc. 1984 CERN School of Computing, Aiguablava, Catalonia, Spain, 9-12 September 1984 (CERN 1985) p. 88.
[2] OPAL Collaboration, P.D. Acton et al., Z. Phys. C 59 (1993).
[3] ALEPH Collaboration, D. Buskulic et al., Phys. Lett. B 307 (1993) 209.