9. Hierarchical Clustering
• Often clusters are not simply disjoint: a cluster may have
subclusters, which in turn have sub-subclusters, and so on
• Consider a sequence of partitions of the n samples
into c clusters
• The first is a partition into n clusters, each one
containing exactly one sample
• The second is a partition into n-1 clusters, the third into
n-2, and so on, until the n-th, in which there is only one
cluster containing all of the samples
• At the level k in the sequence, c = n-k+1.
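The level/cluster-count relationship above can be checked with a small sketch; the merge order and variable names here are illustrative, since a real algorithm would merge the closest pair rather than an arbitrary one:

```python
# Sketch of the sequence of partitions in agglomerative clustering.
n = 5
# Level 1: n singleton clusters, one per sample.
partition = [[i] for i in range(n)]

for k in range(1, n + 1):
    c = len(partition)
    assert c == n - k + 1  # at level k, c = n - k + 1
    if c > 1:
        # Merge any two clusters (a real algorithm picks the closest pair).
        a = partition.pop()
        b = partition.pop()
        partition.append(a + b)

print(partition)  # level n: a single cluster containing all samples
```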
$$d_{avg}(D_i, D_j) = \frac{1}{n_i n_j} \sum_{x \in D_i} \sum_{x' \in D_j} \|x - x'\|$$

$$d_{mean}(D_i, D_j) = \|m_i - m_j\|$$
which behave quite similarly when the clusters are
hyperspherical and well separated.
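The two measures can be computed directly from their definitions; the function names `d_avg` and `d_mean` below are illustrative:

```python
import math

def d_avg(Di, Dj):
    # Average pairwise distance: (1 / (n_i * n_j)) * sum over all ||x - x'||.
    return sum(math.dist(x, xp) for x in Di for xp in Dj) / (len(Di) * len(Dj))

def d_mean(Di, Dj):
    # Distance between the cluster means: ||m_i - m_j||.
    mi = [sum(coord) / len(Di) for coord in zip(*Di)]
    mj = [sum(coord) / len(Dj) for coord in zip(*Dj)]
    return math.dist(mi, mj)

# Two well-separated clusters on a line: both measures agree here.
Di = [(0.0, 0.0), (1.0, 0.0)]
Dj = [(5.0, 0.0), (6.0, 0.0)]
print(d_avg(Di, Dj))   # 5.0
print(d_mean(Di, Dj))  # 5.0
```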
Nearest-neighbor algorithm
• When dmin is used, the algorithm is called the
nearest neighbor algorithm
• If it is terminated when the distance between the
nearest clusters exceeds an arbitrary threshold,
it is called the single-linkage algorithm
• If data points are considered nodes of a graph
with edges forming a path between the nodes in
the same subset Di, the merging of Di and Dj
corresponds to adding an edge between the
nearest pair of nodes in Di and Dj
• The resulting graph never has any closed loops,
so it is a tree; if all subsets are eventually
linked, we have a spanning tree
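The graph view above can be sketched as follows: each merge of the two closest clusters (under d_min) records an edge between their nearest pair of points, and the collected edges form a tree. This is a minimal illustration with assumed names, not an efficient implementation (a practical version would avoid recomputing all pairwise distances at every step):

```python
import math

def single_linkage_edges(points, threshold=float("inf")):
    """Merge clusters by d_min, collecting one tree edge per merge."""
    clusters = [[i] for i in range(len(points))]
    edges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # d_min: distance between the nearest pair of points
                for i in clusters[a]:
                    for j in clusters[b]:
                        d = math.dist(points[i], points[j])
                        if best is None or d < best[0]:
                            best = (d, a, b, i, j)
        d, a, b, i, j = best
        if d > threshold:          # single-linkage stopping rule
            break
        edges.append((i, j))       # edge between the nearest pair of nodes
        clusters[a] += clusters[b]
        del clusters[b]
    return edges

pts = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
print(single_linkage_edges(pts))  # [(0, 1), (1, 2)] -- a spanning tree
```

With no threshold the loop runs until one cluster remains, so the n-1 recorded edges span all n points; with a finite threshold it stops early and the edges form a forest instead.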
Pattern Classification, Chapter 10
Problem!