Improving the Accuracy of the K-means Clustering Algorithm
           Kasun Ranga Wijeweera
          (krw19870829@gmail.com)
This presentation is based on the following research paper:

K. A. Abdul Nazeer, M. P. Sebastian, "Improving the Accuracy and Efficiency of the k-means Clustering Algorithm," Proceedings of the World Congress on Engineering 2009, Vol. I, WCE 2009, July 1–3, 2009, London, U.K.
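Consider a set of n data points,
X = {x1, x2, ..., xn},
and a set of k clusters,
C1, C2, ..., Ck.
The goal: assign each data point in X to one of the k clusters so that every point is as close as possible to the centroid of its own cluster.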
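Written as a formula, the usual k-means objective is the within-cluster sum of squared distances. Here c_j is notation introduced for illustration: the centroid (arithmetic mean) of cluster C_j.

```latex
J = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - c_j \rVert^2,
\qquad
c_j = \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i
```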
Algorithm: k-means
1. Randomly choose k data items from X as initial centroids.
2. Repeat
      Assign each data point to the cluster with the closest centroid.
      Calculate the new cluster centroids.
   Until the convergence criterion is met.
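The two steps above can be sketched in Python as follows. This is a minimal, standard-library-only illustration, not code from the paper: points are assumed to be tuples of floats, distance is assumed to be Euclidean, the convergence criterion is assumed to be "no assignment changes between iterations", and the names `kmeans`, `closest`, `mean` and the parameters `init`, `max_iter`, `seed` are hypothetical.

```python
import math
import random


def mean(points):
    """Component-wise arithmetic mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def closest(point, centroids):
    """Index of the centroid nearest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))


def kmeans(X, k, init=None, max_iter=100, seed=None):
    """Lloyd-style k-means on a list of equal-length tuples X."""
    # Step 1: k distinct data items chosen at random, unless `init` is given.
    centroids = list(init) if init is not None else random.Random(seed).sample(X, k)
    labels = None
    for _ in range(max_iter):
        # Assign each data point to the cluster with the closest centroid.
        new_labels = [closest(x, centroids) for x in X]
        # Assumed convergence criterion: no assignment changed.
        if new_labels == labels:
            break
        labels = new_labels
        # Recompute each centroid as the mean of its members;
        # an empty cluster keeps its previous centroid.
        for j in range(k):
            members = [x for x, lab in zip(X, labels) if lab == j]
            if members:
                centroids[j] = mean(members)
    return centroids, labels
```

For example, with X = [(0.0,), (1.0,), (10.0,), (11.0,)] and init=[(0.0,), (10.0,)], the loop converges to the centroids (0.5,) and (10.5,). The optional `init` argument lets a different initialization replace the random choice in step 1.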
K-means can get stuck in a local optimum: the final clusters depend on the randomly chosen initial centroids.
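A small one-dimensional example makes the local-optimum problem concrete. The data points, the helper names `lloyd_1d` and `sse`, and both sets of starting centroids are made up for illustration. Starting with one centroid per pair of points, the loop finds the three obvious clusters; starting with two centroids inside the first pair, it converges to a much worse partition and never leaves it.

```python
def lloyd_1d(xs, centroids, max_iter=100):
    """Plain k-means (Lloyd) iterations on 1-D data from given initial centroids."""
    centroids = list(centroids)
    for _ in range(max_iter):
        # Assign each point to the index of its nearest centroid.
        labels = [min(range(len(centroids)), key=lambda j: abs(x - centroids[j])) for x in xs]
        # Recompute centroids; an empty cluster keeps its old centroid.
        new = []
        for j, c in enumerate(centroids):
            members = [x for x, lab in zip(xs, labels) if lab == j]
            new.append(sum(members) / len(members) if members else c)
        if new == centroids:  # no centroid moved: converged
            break
        centroids = new
    return centroids


def sse(xs, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(min((x - c) ** 2 for c in centroids) for x in xs)


xs = [0, 1, 10, 11, 20, 21]  # three well-separated pairs
good = lloyd_1d(xs, [0, 10, 20])
bad = lloyd_1d(xs, [0, 1, 15])
print(good, sse(xs, good))  # -> [0.5, 10.5, 20.5] 1.5
print(bad, sse(xs, bad))    # -> [0.0, 1.0, 15.5] 101.0
```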
Algorithm: selection of initial centroids
(n is the number of data points in X and k is the number of clusters.)
1. Set m = 1;
2. Compute the distance between each data point and every
   other data point in X;
3. Find the closest pair of data points in X and form a
   data point set A[m] (1 <= m <= k) containing these two
   data points. Delete these two data points from X;
4. Find the data point in X that is closest to the data
   point set A[m]. Add it to A[m] and delete it from X;
5. Repeat step 4 until the number of data points in A[m]
   reaches 0.75*(n/k);
Algorithm: selection of initial centroids (continued)
6. If m < k, set m = m + 1, find the pair of data points
   remaining in X with the shortest distance between them,
   form a new data point set A[m] from them and delete
   them from X. Go to step 4;
7. For each data point set A[m] (1 <= m <= k), compute the
   arithmetic mean of the vectors of the data points in A[m].
   These means are the initial centroids.
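The seven steps above can be sketched in Python as follows. This is an illustrative reading of the slides, not the authors' code: the function name `initial_centroids` is hypothetical, points are assumed to be tuples of floats with Euclidean distance, and the distance from a point to a set A[m] is assumed to be the minimum distance to any member of A[m], since the slides do not define it. Step 5 is read as "stop once A[m] holds at least 0.75*(n/k) points", so a pair that already meets the limit is not grown.

```python
import math


def initial_centroids(X, k):
    """Initial centroids following steps 1-7 on the slides (illustrative reading)."""
    n = len(X)
    remaining = list(range(n))  # indices of the points still in X
    # Step 2: distance between every pair of data points.
    D = [[math.dist(X[i], X[j]) for j in range(n)] for i in range(n)]
    limit = 0.75 * (n / k)
    sets = []
    # Steps 1 and 6: build one data point set A[m] for m = 1, ..., k.
    for _ in range(k):
        if len(remaining) < 2:
            break  # not enough points left to form another pair
        # Steps 3 and 6: the closest remaining pair starts A[m]; delete it from X.
        pairs = ((a, b) for idx, a in enumerate(remaining) for b in remaining[idx + 1:])
        i, j = min(pairs, key=lambda p: D[p[0]][p[1]])
        A = [i, j]
        remaining.remove(i)
        remaining.remove(j)
        # Steps 4-5: add the remaining point closest to A[m] (assumed: minimum
        # distance to any member) until A[m] holds at least 0.75 * (n / k) points.
        while len(A) < limit and remaining:
            p = min(remaining, key=lambda r: min(D[r][a] for a in A))
            A.append(p)
            remaining.remove(p)
        sets.append(A)
    # Step 7: the arithmetic mean of each A[m] is one initial centroid.
    return [tuple(sum(c) / len(A) for c in zip(*(X[a] for a in A))) for A in sets]
```

The returned centroids would then replace the random choice in step 1 of k-means. Building the full distance matrix makes this sketch O(n²) in time and memory, which is acceptable for small examples but not for large data sets.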
Any questions?
Thanks for your attention!
