Professional Documents
Culture Documents
K-Means Clustering
By
Subhasis Dasgupta
Asst Professor
Praxis Business School, Kolkata
Concept of Clustering
If there are N data points available and they are to be divided into K
number of groups, the best way to do this is to put similar data points
within a group and dissimilar points into other groups
There are many algorithms available to achieve this goal but partitioning
algorithms are the most preferred ones
In partitioning algorithms, the entire dataset is partitioned among K
groups to make sure that objects within a cluster are similar to each
other based on some objective function
In this section, we will focus on the very basic partitioning algorithm, i.e.
K-Means clustering
K-Means Clustering
The k-means algorithm is an algorithm to cluster n objects based on
attributes into k partitions, where
k<n
It assumes that the object attributes form a vector space
It is an algorithm for partitioning (or clustering) N data points into K disjoint
subsets Sj so as to minimize the within cluster sum-of-squares value.
Mathematically, within sum of square is calculated as:
where xn is a vector representing the the nth data point and uj is the
geometric centroid of the data points in Sj
Variable
1
Variable
2
1.0
1.0
1.5
2.0
3.0
4.0
5.0
7.0
3.5
5.0
4.5
5.0
3.5
4.5
Individua
l
Variable
1
Variable
2
1.0
1.0
1.5
2.0
3.0
4.0
5.0
7.0
3.5
5.0
4.5
3.5
Individua
l
Centroid
1 Dist
Centroid
2 Dist
0.0
7.21
2 (1.5,2.0)
1.12
6.10
3.61
3.61
7.21
0.0
4.72
2.5
5.31
2.06
4.30
2.93
Individua
l
Centroid
1 Dist
Centroid
2 Dist
1.57
7.21
2 (1.5,2.0)
0.47
6.10
2.04
1.78
5.64
1.84
3.15
0.73
Individua
l
Centroid
1 Dist
Centroid
2 Dist
0.56
7.21
2 (1.5,2.0)
0.56
6.10
3.05
1.42
6.66
2.20
4.16
0.42
Cluster centers