
Dirichlet Processes: A Gentle Tutorial

Khalid El-Arini
November 28, 2005
Motivation

- We are given a data set, and are told that it was generated from a mixture of Gaussians.
- Unfortunately, no one has any idea how many Gaussians produced the data.
What can we do?

- We can guess the number of clusters, do EM for Gaussian Mixture Models, look at the results, and then try again…
- We can do hierarchical agglomerative clustering, and cut the tree at a visually appealing level…
- We want to cluster the data in a statistically principled manner, without resorting to hacks.
The Dirichlet Distribution

- Let θ = (θ_1, …, θ_m) be a probability vector. We write: θ ~ Dirichlet(α_1, …, α_m).
- The Dirichlet is a distribution over possible parameter vectors for a multinomial distribution, and is in fact the conjugate prior for the multinomial.
- The Beta distribution is the special case of a Dirichlet for 2 dimensions.
- Samples from the distribution lie in the (m-1)-dimensional simplex.
- Thus, it is in fact a "distribution over distributions."
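As a quick sanity check of these properties, here is a small NumPy sketch (the particular α vector is an arbitrary choice for illustration) showing that Dirichlet samples are themselves probability vectors lying in the simplex:

```python
import numpy as np

rng = np.random.default_rng(0)

# Each Dirichlet sample is itself a probability vector (a multinomial
# parameter vector), which is the sense in which the Dirichlet is a
# "distribution over distributions."
alpha = np.array([2.0, 3.0, 5.0])
samples = rng.dirichlet(alpha, size=1000)

# Every sample lies in the simplex: non-negative entries summing to one.
assert np.all(samples >= 0)
assert np.allclose(samples.sum(axis=1), 1.0)

# The mean of Dirichlet(alpha) is alpha / alpha.sum().
print("empirical mean:", samples.mean(axis=0))
```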
The Dirichlet Process

- A Dirichlet Process (DP) is also a distribution over distributions.
- We write: G ~ DP(α, G0)
  – G0 is a base distribution
  – α is a positive scaling parameter
- G has the same support as G0.
The Dirichlet Process (cont.)

- Consider a Gaussian base distribution G0.
- G ~ DP(α, G0)

[Figure: a discrete draw G from the DP, with point masses located at samples from the Gaussian G0]
The Dirichlet Process (cont.)

- G ~ DP(α, G0)
- G0 is continuous, so the probability that any two samples are equal is precisely zero.
- However, G is a discrete distribution, made up of a countably infinite number of point masses. [Blackwell]
  – Therefore, there is always a non-zero probability of two samples colliding.
The Dirichlet Process (cont.)

- The value of α determines how close G is to G0: the larger α is, the more closely a draw G ~ DP(α, G0) resembles G0.

[Figure: draws G ~ DP(α, G0) for increasing values of α]
Sampling from a Dirichlet Process

  G ~ DP(α, G0)
  X_n | G ~ G    for n ∈ {1, …, N}    (i.i.d. given G)

- Marginalizing out G introduces dependencies between the X_n variables.
Sampling from a Dirichlet Process (cont.)

- Assume we view these variables in a specific order, and are interested in the behavior of X_n given the previous n-1 observations.
- Let there be K unique values x_1*, …, x_K* among the first n-1 variables, where x_k* has been observed n_k times. Then:

  P(X_n = x | X_1, …, X_{n-1}) = ( α G0(x) + Σ_{k=1}^K n_k δ(x, x_k*) ) / (α + n - 1)
Exchangeability

- Notice that the above formulation of the joint distribution does not depend on the order in which we consider the variables.
- We can use De Finetti's Theorem to show that the variables are exchangeable under a Dirichlet Process model. This means we can consider them in any order.
Chinese Restaurant Process

- Let there be K unique values among the first n-1 variables. We can rewrite the conditional as:

  P(X_n = x | X_1, …, X_{n-1}) ∝ α G0(x) + Σ_{k=1}^K n_k δ(x, x_k*)
Chinese Restaurant Process (cont.)

- Consider a restaurant with infinitely many tables, where the X_n's represent the patrons of the restaurant. From the above conditional probability distribution, we can see that a customer is more likely to sit at a table if there are already many people sitting there. However, with probability proportional to α, the customer will sit at a new table.
- This is also known as the "clustering effect," and can be seen in the setting of social clubs. [Aldous]
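The table-seating dynamics above can be simulated directly. The sketch below is a minimal CRP simulator (the function name and parameters are illustrative, not from the slides); it tracks only table assignments, ignoring the dishes drawn from G0:

```python
import numpy as np

def chinese_restaurant_process(n_customers, alpha, rng):
    """Seat customers one by one; return each customer's table index.

    A customer joins an existing table with probability proportional to its
    occupancy, and opens a new table with probability proportional to alpha.
    """
    tables = []    # tables[k] = current number of customers at table k
    seating = []
    for _ in range(n_customers):
        # Unnormalized seating probabilities: the occupancy of each
        # existing table, plus alpha for a brand-new table.
        weights = np.array(tables + [alpha], dtype=float)
        k = rng.choice(len(weights), p=weights / weights.sum())
        if k == len(tables):
            tables.append(1)       # open a new table
        else:
            tables[k] += 1
        seating.append(int(k))
    return seating, tables

rng = np.random.default_rng(0)
seating, tables = chinese_restaurant_process(100, alpha=1.0, rng=rng)
print("occupied tables:", len(tables))
```

Note how the "rich get richer" dynamics keep the number of occupied tables far below the number of customers.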
Dirichlet Process Mixture Models

[Figure: graphical model for a DP mixture — G ~ DP(α, G0), θ_n | G ~ G, and each observation y_n is drawn from a distribution parameterized by θ_n]

- If the θ_n were drawn directly from, e.g., a Gaussian, no two values would be the same; but since they are drawn from a distribution that was itself drawn from a Dirichlet Process, we expect a clustering of the θ_n.
- The number of unique values for the θ_n is the number of mixture components.
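A minimal illustration of this clustering, using the marginalized (Blackwell–MacQueen) predictive with a standard Gaussian as an assumed base distribution G0 (all specific constants here are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, N = 1.0, 500

# Blackwell-MacQueen predictive: theta_n is a fresh draw from the base
# distribution G0 = N(0, 1) with probability alpha / (alpha + n), and
# otherwise repeats a uniformly chosen previous theta (so frequently
# repeated values are the most likely to be repeated again).
thetas = []
for n in range(N):
    if rng.random() < alpha / (alpha + n):
        thetas.append(rng.normal(0.0, 1.0))        # new value from G0
    else:
        thetas.append(thetas[rng.integers(n)])     # copy an earlier value

# Despite 500 draws, only a handful of unique values (mixture components)
# appear -- the expected number grows roughly like alpha * log(N).
print("N =", N, "  unique thetas =", len(set(thetas)))
```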
Stick Breaking

- So far, we've just mentioned properties of a distribution G drawn from a Dirichlet Process.
- In 1994, Sethuraman developed a constructive way of forming G, known as "stick breaking."
- Who cares? Now we can perform variational inference. [Blei and Jordan]
Stick Breaking (cont.)

  β_k ~ Beta(1, α)                         for k = 1, 2, …
  π_k = β_k ∏_{j=1}^{k-1} (1 - β_j)
  θ_k* ~ G0
  G = Σ_{k=1}^{∞} π_k δ_{θ_k*}
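A truncated version of this construction can be sketched in a few lines of NumPy (the Gaussian base distribution and the truncation level are assumptions for illustration, not part of the definition):

```python
import numpy as np

def stick_breaking(alpha, n_atoms, rng):
    """Truncated stick-breaking draw of G ~ DP(alpha, G0) with G0 = N(0, 1).

    Returns atom locations (samples from G0) and their weights pi_k.
    """
    betas = rng.beta(1.0, alpha, size=n_atoms)
    # pi_k = beta_k * prod_{j<k} (1 - beta_j): break off a beta_k fraction
    # of whatever stick length remains after the first k-1 breaks.
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - betas)[:-1]])
    weights = betas * remaining
    atoms = rng.normal(0.0, 1.0, size=n_atoms)
    return atoms, weights

rng = np.random.default_rng(0)
atoms, weights = stick_breaking(alpha=1.0, n_atoms=1000, rng=rng)
# G is discrete: a (truncated) countable collection of point masses
# whose weights sum to essentially the whole stick.
print("total mass kept by the truncation:", weights.sum())
```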
Formal Definition

- Let α be a positive, real-valued scalar.
- Let G0 be a non-atomic probability distribution over support set A.
- We say G ~ DP(α, G0) if for all natural numbers k and all k-partitions {A_1, …, A_k} of A:

  (G(A_1), …, G(A_k)) ~ Dirichlet(α G0(A_1), …, α G0(A_k))
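One can check this definition empirically using the truncated stick-breaking construction: for the 2-partition A_1 = (-∞, 0], A_2 = (0, ∞) under G0 = N(0, 1) and α = 2, the definition says G(A_1) ~ Beta(α G0(A_1), α G0(A_2)) = Beta(1, 1), i.e. Uniform(0, 1). A rough Monte Carlo sketch (all constants are arbitrary choices for this check):

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, n_atoms, n_draws = 2.0, 500, 2000

masses = []
for _ in range(n_draws):
    # One truncated stick-breaking draw of G ~ DP(alpha, N(0, 1)).
    betas = rng.beta(1.0, alpha, size=n_atoms)
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - betas)[:-1]])
    weights = betas * remaining
    atoms = rng.normal(0.0, 1.0, size=n_atoms)
    # Total mass G assigns to A1 = (-inf, 0]; note G0(A1) = 0.5.
    masses.append(weights[atoms <= 0.0].sum())

masses = np.array(masses)
# Beta(1, 1) is Uniform(0, 1): mean 1/2, variance 1/12.
print("mean:", masses.mean(), "  variance:", masses.var())
```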

Conclusion

- We now have a statistically principled mechanism for solving our original problem.
- This was intended as a general and fairly shallow overview of Dirichlet Processes.
Acknowledgements

- Much thanks goes to David Blei for helping me understand the little I know about Dirichlet Processes.
- Some material for this presentation was inspired by slides from Teg Grenager and Zoubin Ghahramani.
References

- Blei, David M. and Michael I. Jordan. "Variational inference for Dirichlet process mixtures." Bayesian Analysis 1(1), 2006.
- Ghahramani, Zoubin. "Non-parametric Bayesian Methods." UAI Tutorial, July 2005.
- Grenager, Teg. "Chinese Restaurants and Stick Breaking: An Introduction to the Dirichlet Process."
- Blackwell, David and James B. MacQueen. "Ferguson Distributions via Pólya Urn Schemes." The Annals of Statistics 1(2), 1973, 353-355.
- Ferguson, Thomas S. "A Bayesian Analysis of Some Nonparametric Problems." The Annals of Statistics 1(2), 1973, 209-230.
