1. Improving the accuracy of the K-means clustering algorithm
Kasun Ranga Wijeweera
(krw19870829@gmail.com)
2. This presentation is based on the following research paper:
K. A. Abdul Nazeer, M. P. Sebastian, "Improving the Accuracy and Efficiency of the k-means Clustering Algorithm", Proceedings of the World Congress on Engineering 2009 Vol I, WCE 2009, July 1 – 3, 2009, London, U.K.
5. Algorithm k-means
1. Randomly choose K data items from X as initial centroids.
2. Repeat
   Assign each data point to the cluster which has the closest centroid.
   Calculate new cluster centroids.
   Until the convergence criterion is met.
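The loop above can be sketched in plain Python. This is a minimal illustration, not the paper's implementation: the function name kmeans, the parameters initial_centroids, max_iter and seed, the use of Euclidean distance, and the convergence test (stop when no assignment changes) are all assumptions, since the slide does not define them.

```python
import math
import random


def kmeans(points, k, initial_centroids=None, max_iter=100, seed=0):
    """Basic k-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # Step 1: pick K data items as initial centroids unless some are supplied.
    centroids = [tuple(c) for c in (initial_centroids or rng.sample(points, k))]
    labels = None
    for _ in range(max_iter):
        # Assign each point to the cluster with the closest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Assumed convergence criterion: no assignment changed.
        if new_labels == labels:
            break
        labels = new_labels
        # Recompute each centroid as the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels
```

Passing initial_centroids skips the random choice in step 1, so the same loop can be reused with centroids picked by another method.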
7. Algorithm selection of initial centroids
1. Set m = 1;
2. Compute the distance between each data point and all other data points in the set X;
3. Find the closest pair of data points in X and form a data point set A[m] (1 <= m <= K) which contains these two data points. Delete these two data points from X;
4. Find the data point in X that is closest to the data point set A[m]. Add it to A[m] and delete it from X;
5. Repeat step 4 until the number of data points in A[m] reaches 0.75*(n/K), where n is the original number of data points in X;
8. Algorithm selection of initial centroids continued…
6. If m < K then set m = m + 1, find another pair of data points in X between which the distance is the shortest, form another data point set A[m] and delete them from X. Go to step 4;
7. For each data point set A[m] (1 <= m <= K), find the arithmetic mean of the vectors of the data points in A[m]. These means will be the initial centroids.
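Steps 1 to 7 can be sketched as follows. This is an illustrative reading of the slides, not the authors' code: the function name initial_centroids and the parameter fraction are hypothetical, Euclidean distance is assumed, and "closest to the data point set" is interpreted as the smallest distance to any member of A[m], which the slides do not specify. Distances are recomputed on demand rather than stored as in step 2, and the sketch assumes enough points remain in X to form a pair for every set.

```python
import math
from itertools import combinations


def initial_centroids(points, k, fraction=0.75):
    """Sketch of the centroid-selection steps for a list of numeric tuples."""
    remaining = list(points)              # working copy of X
    target = fraction * len(points) / k   # 0.75 * (n / K), from step 5
    centroids = []
    for _ in range(k):
        # Steps 3 and 6: seed A[m] with the closest pair left in X.
        a, b = min(combinations(remaining, 2), key=lambda pq: math.dist(*pq))
        subset = [a, b]
        remaining.remove(a)
        remaining.remove(b)
        # Steps 4 and 5: grow A[m] until it holds 0.75 * (n / K) points.
        while len(subset) < target and remaining:
            # Assumption: distance to the set = distance to its nearest member.
            nearest = min(remaining,
                          key=lambda p: min(math.dist(p, q) for q in subset))
            subset.append(nearest)
            remaining.remove(nearest)
        # Step 7: the mean of A[m] is one initial centroid.
        centroids.append(tuple(sum(c) / len(subset) for c in zip(*subset)))
    return centroids
```

The returned list can be used in place of the randomly chosen centroids in step 1 of the k-means algorithm.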