
Dirichlet Processes: A Gentle Tutorial

Khalid El-Arini
November 28, 2005
Motivation

- We are given a data set, and are told that it was generated from a mixture of Gaussians.
- Unfortunately, no one has any idea how many Gaussians produced the data.
What can we do?

- We can guess the number of clusters, do EM for Gaussian Mixture Models, look at the results, and then try again…
- We can do hierarchical agglomerative clustering, and cut the tree at a visually appealing level…
- We want to cluster the data in a statistically principled manner, without resorting to hacks.
The Dirichlet Distribution

- Let θ = (θ_1, …, θ_m) be a probability vector. We write: θ ~ Dirichlet(α_1, …, α_m).
- The Dirichlet is a distribution over possible parameter vectors for a multinomial distribution, and is in fact the conjugate prior for the multinomial.
- The Beta distribution is the special case of a Dirichlet for 2 dimensions.
- Samples from the distribution lie in the (m-1)-dimensional simplex.
- Thus, it is in fact a "distribution over distributions."
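As a quick sanity check of these properties, here is a small NumPy sketch (the particular α vector is an arbitrary choice for illustration) showing that Dirichlet samples are themselves probability vectors lying in the simplex:

```python
import numpy as np

rng = np.random.default_rng(0)

# Each Dirichlet sample is itself a probability vector (a multinomial
# parameter vector), which is the sense in which the Dirichlet is a
# "distribution over distributions."
alpha = np.array([2.0, 3.0, 5.0])
samples = rng.dirichlet(alpha, size=1000)

# Every sample lies in the simplex: non-negative entries summing to one.
assert np.all(samples >= 0)
assert np.allclose(samples.sum(axis=1), 1.0)

# The mean of Dirichlet(alpha) is alpha / alpha.sum().
print("empirical mean:", samples.mean(axis=0))
```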
The Dirichlet Process

- A Dirichlet Process (DP) is also a distribution over distributions.
- We write: G ~ DP(α, G0)
  – G0 is a base distribution
  – α is a positive scaling parameter
- G has the same support as G0.
The Dirichlet Process (cont.)

- Consider a Gaussian base distribution G0.
- G ~ DP(α, G0)

[Figure: a discrete draw G from the DP, with point masses located at samples from the Gaussian G0]
The Dirichlet Process (cont.)

- G ~ DP(α, G0)
- G0 is continuous, so the probability that any two samples are equal is precisely zero.
- However, G is a discrete distribution, made up of a countably infinite number of point masses. [Blackwell]
  – Therefore, there is always a non-zero probability of two samples colliding.
The Dirichlet Process (cont.)

- The value of α determines how close G is to G0: the larger α is, the more closely a draw G ~ DP(α, G0) resembles G0.

[Figure: draws G ~ DP(α, G0) for increasing values of α]
Sampling from a Dirichlet Process

  G ~ DP(α, G0)
  X_n | G ~ G    for n ∈ {1, …, N}    (i.i.d. given G)

- Marginalizing out G introduces dependencies between the X_n variables.
Sampling from a Dirichlet Process (cont.)

- Assume we view these variables in a specific order, and are interested in the behavior of X_n given the previous n-1 observations.
- Let there be K unique values x_1*, …, x_K* among the first n-1 variables, where x_k* has been observed n_k times. Then:

  P(X_n = x | X_1, …, X_{n-1}) = ( α G0(x) + Σ_{k=1}^K n_k δ(x, x_k*) ) / (α + n - 1)
Exchangeability

- Notice that the above formulation of the joint distribution does not depend on the order in which we consider the variables.
- We can use De Finetti's Theorem to show that the variables are exchangeable under a Dirichlet Process model. This means we can consider them in any order.
Chinese Restaurant Process

- Let there be K unique values among the first n-1 variables. We can rewrite the conditional as:

  P(X_n = x | X_1, …, X_{n-1}) ∝ α G0(x) + Σ_{k=1}^K n_k δ(x, x_k*)
Chinese Restaurant Process (cont.)

- Consider a restaurant with infinitely many tables, where the X_n's represent the patrons of the restaurant. From the above conditional probability distribution, we can see that a customer is more likely to sit at a table if there are already many people sitting there. However, with probability proportional to α, the customer will sit at a new table.
- This is also known as the "clustering effect," and can be seen in the setting of social clubs. [Aldous]
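The table-seating dynamics above can be simulated directly. The sketch below is a minimal CRP simulator (the function name and parameters are illustrative, not from the slides); it tracks only table assignments, ignoring the dishes drawn from G0:

```python
import numpy as np

def chinese_restaurant_process(n_customers, alpha, rng):
    """Seat customers one by one; return each customer's table index.

    A customer joins an existing table with probability proportional to its
    occupancy, and opens a new table with probability proportional to alpha.
    """
    tables = []    # tables[k] = current number of customers at table k
    seating = []
    for _ in range(n_customers):
        # Unnormalized seating probabilities: the occupancy of each
        # existing table, plus alpha for a brand-new table.
        weights = np.array(tables + [alpha], dtype=float)
        k = rng.choice(len(weights), p=weights / weights.sum())
        if k == len(tables):
            tables.append(1)       # open a new table
        else:
            tables[k] += 1
        seating.append(int(k))
    return seating, tables

rng = np.random.default_rng(0)
seating, tables = chinese_restaurant_process(100, alpha=1.0, rng=rng)
print("occupied tables:", len(tables))
```

Note how the "rich get richer" dynamics keep the number of occupied tables far below the number of customers.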
Dirichlet Process Mixture Models

[Figure: graphical model for a DP mixture — G ~ DP(α, G0), θ_n | G ~ G, and each observation y_n is drawn from a distribution parameterized by θ_n]

- If the θ_n were drawn directly from, e.g., a Gaussian, no two values would be the same; but since they are drawn from a distribution that was itself drawn from a Dirichlet Process, we expect a clustering of the θ_n.
- The number of unique values for the θ_n is the number of mixture components.
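A minimal illustration of this clustering, using the marginalized (Blackwell–MacQueen) predictive with a standard Gaussian as an assumed base distribution G0 (all specific constants here are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, N = 1.0, 500

# Blackwell-MacQueen predictive: theta_n is a fresh draw from the base
# distribution G0 = N(0, 1) with probability alpha / (alpha + n), and
# otherwise repeats a uniformly chosen previous theta (so frequently
# repeated values are the most likely to be repeated again).
thetas = []
for n in range(N):
    if rng.random() < alpha / (alpha + n):
        thetas.append(rng.normal(0.0, 1.0))        # new value from G0
    else:
        thetas.append(thetas[rng.integers(n)])     # copy an earlier value

# Despite 500 draws, only a handful of unique values (mixture components)
# appear -- the expected number grows roughly like alpha * log(N).
print("N =", N, "  unique thetas =", len(set(thetas)))
```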
Stick Breaking

- So far, we've just mentioned properties of a distribution G drawn from a Dirichlet Process.
- In 1994, Sethuraman developed a constructive way of forming G, known as "stick breaking."
- Who cares? Now we can perform variational inference. [Blei and Jordan]
Stick Breaking (cont.)

  β_k ~ Beta(1, α)                         for k = 1, 2, …
  π_k = β_k ∏_{j=1}^{k-1} (1 - β_j)
  θ_k* ~ G0
  G = Σ_{k=1}^{∞} π_k δ_{θ_k*}
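A truncated version of this construction can be sketched in a few lines of NumPy (the Gaussian base distribution and the truncation level are assumptions for illustration, not part of the definition):

```python
import numpy as np

def stick_breaking(alpha, n_atoms, rng):
    """Truncated stick-breaking draw of G ~ DP(alpha, G0) with G0 = N(0, 1).

    Returns atom locations (samples from G0) and their weights pi_k.
    """
    betas = rng.beta(1.0, alpha, size=n_atoms)
    # pi_k = beta_k * prod_{j<k} (1 - beta_j): break off a beta_k fraction
    # of whatever stick length remains after the first k-1 breaks.
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - betas)[:-1]])
    weights = betas * remaining
    atoms = rng.normal(0.0, 1.0, size=n_atoms)
    return atoms, weights

rng = np.random.default_rng(0)
atoms, weights = stick_breaking(alpha=1.0, n_atoms=1000, rng=rng)
# G is discrete: a (truncated) countable collection of point masses
# whose weights sum to essentially the whole stick.
print("total mass kept by the truncation:", weights.sum())
```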
Formal Definition

- Let α be a positive, real-valued scalar.
- Let G0 be a non-atomic probability distribution over support set A.
- We say G ~ DP(α, G0) if for all natural numbers k and all k-partitions {A_1, …, A_k} of A:

  (G(A_1), …, G(A_k)) ~ Dirichlet(α G0(A_1), …, α G0(A_k))
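One can check this definition empirically using the truncated stick-breaking construction: for the 2-partition A_1 = (-∞, 0], A_2 = (0, ∞) under G0 = N(0, 1) and α = 2, the definition says G(A_1) ~ Beta(α G0(A_1), α G0(A_2)) = Beta(1, 1), i.e. Uniform(0, 1). A rough Monte Carlo sketch (all constants are arbitrary choices for this check):

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, n_atoms, n_draws = 2.0, 500, 2000

masses = []
for _ in range(n_draws):
    # One truncated stick-breaking draw of G ~ DP(alpha, N(0, 1)).
    betas = rng.beta(1.0, alpha, size=n_atoms)
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - betas)[:-1]])
    weights = betas * remaining
    atoms = rng.normal(0.0, 1.0, size=n_atoms)
    # Total mass G assigns to A1 = (-inf, 0]; note G0(A1) = 0.5.
    masses.append(weights[atoms <= 0.0].sum())

masses = np.array(masses)
# Beta(1, 1) is Uniform(0, 1): mean 1/2, variance 1/12.
print("mean:", masses.mean(), "  variance:", masses.var())
```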

Conclusion

- We now have a statistically principled mechanism for solving our original problem.
- This was intended as a general and fairly shallow overview of Dirichlet Processes.
Acknowledgements

- Much thanks goes to David Blei for helping me understand the little I know about Dirichlet Processes.
- Some material for this presentation was inspired by slides from Teg Grenager and Zoubin Ghahramani.
References

- Blei, David M. and Michael I. Jordan. "Variational inference for Dirichlet process mixtures." Bayesian Analysis 1(1), 2006.
- Ghahramani, Zoubin. "Non-parametric Bayesian Methods." UAI Tutorial, July 2005.
- Grenager, Teg. "Chinese Restaurants and Stick Breaking: An Introduction to the Dirichlet Process."
- Blackwell, David and James B. MacQueen. "Ferguson Distributions via Pólya Urn Schemes." The Annals of Statistics 1(2), 1973, 353-355.
- Ferguson, Thomas S. "A Bayesian Analysis of Some Nonparametric Problems." The Annals of Statistics 1(2), 1973, 209-230.
