SlideShare una empresa de Scribd logo
1 de 24
K-means clustering
   algorithm



           Kasun Ranga Wijeweera
          (krw19870829@gmail.com)
What is Clustering?
• Organizing data into classes such that there is

   • high intra-class similarity
   • low inter-class similarity
• Finding the class labels and the number of classes
directly from the data (in contrast to classification).
• More informally, finding natural groupings among
objects.
What is a natural grouping among these objects?
What is a natural grouping among these objects?




                         Clustering is subjective




Simpson's Family   School Employees     Females     Males
Defining Distance Measures
Definition: Let O1 and O2 be two objects from the universe
of possible objects. The distance (dissimilarity) between
O1 and O2 is a real number denoted by D(O1,O2)


                      Kasun Kiosn




     0.23                   3                   342.7
Consider a Set of Data Points,




And a Set of Clusters,
The Goal,
Algorithm k-means
1. Randomly choose K data items from X as initial
centroids.
2. Repeat
    Assign each data point to the cluster which has
   the closest centroid.
    Calculate new cluster centroids.
   Until the convergence criteria is met.
The data points
Initialization
#Runs = 1
#Runs = 2
#Runs = 3
K-means gets stuck in a local
         optima
The data points
Initialization
#Runs = 1
#Runs = 2
#Runs = 3
#Runs = 4
Applications of K-means Method

    • Optical Character Recognition

    • Biometrics
    • Diagnostic Systems
    • Military Applications
Comments on the K-Means Method
• Strength
  – Relatively efficient: O(tkn), where n is # objects, k is # clusters,
    and t is # iterations. Normally, k, t << n.
  – Often terminates at a local optimum. The global optimum may
    be found using techniques such as: deterministic annealing and
    genetic algorithms
• Weakness
  – Applicable only when mean is defined, then what about
    categorical data?
  – Need to specify k, the number of clusters, in advance
  – Unable to handle noisy data and outliers
  – Not suitable to discover clusters with non-convex shapes
Any Questions ?
Thanks for your attention !

Más contenido relacionado

La actualidad más candente

K Means Clustering Algorithm | K Means Clustering Example | Machine Learning ...
K Means Clustering Algorithm | K Means Clustering Example | Machine Learning ...K Means Clustering Algorithm | K Means Clustering Example | Machine Learning ...
K Means Clustering Algorithm | K Means Clustering Example | Machine Learning ...
Simplilearn
 
K means Clustering
K means ClusteringK means Clustering
K means Clustering
Edureka!
 

La actualidad más candente (20)

K mean-clustering
K mean-clusteringK mean-clustering
K mean-clustering
 
K Means Clustering Algorithm | K Means Clustering Example | Machine Learning ...
K Means Clustering Algorithm | K Means Clustering Example | Machine Learning ...K Means Clustering Algorithm | K Means Clustering Example | Machine Learning ...
K Means Clustering Algorithm | K Means Clustering Example | Machine Learning ...
 
K means
K meansK means
K means
 
Clustering, k-means clustering
Clustering, k-means clusteringClustering, k-means clustering
Clustering, k-means clustering
 
K - Nearest neighbor ( KNN )
K - Nearest neighbor  ( KNN )K - Nearest neighbor  ( KNN )
K - Nearest neighbor ( KNN )
 
Types of clustering and different types of clustering algorithms
Types of clustering and different types of clustering algorithmsTypes of clustering and different types of clustering algorithms
Types of clustering and different types of clustering algorithms
 
K Means Clustering Algorithm | K Means Example in Python | Machine Learning A...
K Means Clustering Algorithm | K Means Example in Python | Machine Learning A...K Means Clustering Algorithm | K Means Example in Python | Machine Learning A...
K Means Clustering Algorithm | K Means Example in Python | Machine Learning A...
 
K means Clustering
K means ClusteringK means Clustering
K means Clustering
 
K-means clustering algorithm
K-means clustering algorithmK-means clustering algorithm
K-means clustering algorithm
 
Fuzzy Clustering(C-means, K-means)
Fuzzy Clustering(C-means, K-means)Fuzzy Clustering(C-means, K-means)
Fuzzy Clustering(C-means, K-means)
 
Hierarchical Clustering
Hierarchical ClusteringHierarchical Clustering
Hierarchical Clustering
 
K means clustering
K means clusteringK means clustering
K means clustering
 
Clustering in data Mining (Data Mining)
Clustering in data Mining (Data Mining)Clustering in data Mining (Data Mining)
Clustering in data Mining (Data Mining)
 
Classification Based Machine Learning Algorithms
Classification Based Machine Learning AlgorithmsClassification Based Machine Learning Algorithms
Classification Based Machine Learning Algorithms
 
Knn 160904075605-converted
Knn 160904075605-convertedKnn 160904075605-converted
Knn 160904075605-converted
 
K-means Clustering
K-means ClusteringK-means Clustering
K-means Clustering
 
Clustering
ClusteringClustering
Clustering
 
K Nearest Neighbor Algorithm
K Nearest Neighbor AlgorithmK Nearest Neighbor Algorithm
K Nearest Neighbor Algorithm
 
Clustering
ClusteringClustering
Clustering
 
Unsupervised learning clustering
Unsupervised learning clusteringUnsupervised learning clustering
Unsupervised learning clustering
 

Destacado

Chap8 basic cluster_analysis
Chap8 basic cluster_analysisChap8 basic cluster_analysis
Chap8 basic cluster_analysis
guru_prasadg
 
Phase Diagrams and Phase Rule
Phase Diagrams and Phase RulePhase Diagrams and Phase Rule
Phase Diagrams and Phase Rule
Ruchi Pandey
 

Destacado (20)

K means clustering algorithm
K means clustering algorithmK means clustering algorithm
K means clustering algorithm
 
Introduction to Data Engineering
Introduction to Data EngineeringIntroduction to Data Engineering
Introduction to Data Engineering
 
Enhance The K Means Algorithm On Spatial Dataset
Enhance The K Means Algorithm On Spatial DatasetEnhance The K Means Algorithm On Spatial Dataset
Enhance The K Means Algorithm On Spatial Dataset
 
Chap8 basic cluster_analysis
Chap8 basic cluster_analysisChap8 basic cluster_analysis
Chap8 basic cluster_analysis
 
Data Mining: clustering and analysis
Data Mining: clustering and analysisData Mining: clustering and analysis
Data Mining: clustering and analysis
 
Clustering: A Survey
Clustering: A SurveyClustering: A Survey
Clustering: A Survey
 
Association Analysis
Association AnalysisAssociation Analysis
Association Analysis
 
Belief Networks & Bayesian Classification
Belief Networks & Bayesian ClassificationBelief Networks & Bayesian Classification
Belief Networks & Bayesian Classification
 
05 k-means clustering
05 k-means clustering05 k-means clustering
05 k-means clustering
 
MOLECULAR DOCKING
MOLECULAR DOCKINGMOLECULAR DOCKING
MOLECULAR DOCKING
 
phase rule & phase diagram
phase rule & phase diagramphase rule & phase diagram
phase rule & phase diagram
 
Phase rule
Phase rulePhase rule
Phase rule
 
Bayesian Networks - A Brief Introduction
Bayesian Networks - A Brief IntroductionBayesian Networks - A Brief Introduction
Bayesian Networks - A Brief Introduction
 
The phase rule
The phase ruleThe phase rule
The phase rule
 
Intro to MATLAB and K-mean algorithm
Intro to MATLAB and K-mean algorithmIntro to MATLAB and K-mean algorithm
Intro to MATLAB and K-mean algorithm
 
Bayesian Belief Networks for dummies
Bayesian Belief Networks for dummiesBayesian Belief Networks for dummies
Bayesian Belief Networks for dummies
 
Clustering in Data Mining
Clustering in Data MiningClustering in Data Mining
Clustering in Data Mining
 
Coacervation Phase Separation Techniques
Coacervation Phase Separation TechniquesCoacervation Phase Separation Techniques
Coacervation Phase Separation Techniques
 
Clustering training
Clustering trainingClustering training
Clustering training
 
Phase Diagrams and Phase Rule
Phase Diagrams and Phase RulePhase Diagrams and Phase Rule
Phase Diagrams and Phase Rule
 

Similar a K means Clustering Algorithm

Slide-TIF311-DM-10-11.ppt
Slide-TIF311-DM-10-11.pptSlide-TIF311-DM-10-11.ppt
Slide-TIF311-DM-10-11.ppt
ImXaib
 
Clustering_Overview.pptx
Clustering_Overview.pptxClustering_Overview.pptx
Clustering_Overview.pptx
nyomans1
 
CSA 3702 machine learning module 3
CSA 3702 machine learning module 3CSA 3702 machine learning module 3
CSA 3702 machine learning module 3
Nandhini S
 
2_9_asset-v1-ColumbiaX+CSMM.101x+2T2017+type@asset+block@AI_edx_ml_unsupervis...
2_9_asset-v1-ColumbiaX+CSMM.101x+2T2017+type@asset+block@AI_edx_ml_unsupervis...2_9_asset-v1-ColumbiaX+CSMM.101x+2T2017+type@asset+block@AI_edx_ml_unsupervis...
2_9_asset-v1-ColumbiaX+CSMM.101x+2T2017+type@asset+block@AI_edx_ml_unsupervis...
MostafaHazemMostafaa
 

Similar a K means Clustering Algorithm (20)

Unit3
Unit3Unit3
Unit3
 
Clustering.pdf
Clustering.pdfClustering.pdf
Clustering.pdf
 
26-Clustering MTech-2017.ppt
26-Clustering MTech-2017.ppt26-Clustering MTech-2017.ppt
26-Clustering MTech-2017.ppt
 
Slide-TIF311-DM-10-11.ppt
Slide-TIF311-DM-10-11.pptSlide-TIF311-DM-10-11.ppt
Slide-TIF311-DM-10-11.ppt
 
Slide-TIF311-DM-10-11.ppt
Slide-TIF311-DM-10-11.pptSlide-TIF311-DM-10-11.ppt
Slide-TIF311-DM-10-11.ppt
 
Clustering_Overview.pptx
Clustering_Overview.pptxClustering_Overview.pptx
Clustering_Overview.pptx
 
Unsupervised learning Modi.pptx
Unsupervised learning Modi.pptxUnsupervised learning Modi.pptx
Unsupervised learning Modi.pptx
 
Clustering
ClusteringClustering
Clustering
 
machine learning - Clustering in R
machine learning - Clustering in Rmachine learning - Clustering in R
machine learning - Clustering in R
 
CSA 3702 machine learning module 3
CSA 3702 machine learning module 3CSA 3702 machine learning module 3
CSA 3702 machine learning module 3
 
Advanced database and data mining & clustering concepts
Advanced database and data mining & clustering conceptsAdvanced database and data mining & clustering concepts
Advanced database and data mining & clustering concepts
 
K means Clustering - algorithm to cluster n objects
K means Clustering - algorithm to cluster n objectsK means Clustering - algorithm to cluster n objects
K means Clustering - algorithm to cluster n objects
 
cluster analysis
cluster analysiscluster analysis
cluster analysis
 
Unsupervised Learning in Machine Learning
Unsupervised Learning in Machine LearningUnsupervised Learning in Machine Learning
Unsupervised Learning in Machine Learning
 
Clustering.pdf
Clustering.pdfClustering.pdf
Clustering.pdf
 
Introduction to data mining and machine learning
Introduction to data mining and machine learningIntroduction to data mining and machine learning
Introduction to data mining and machine learning
 
2_9_asset-v1-ColumbiaX+CSMM.101x+2T2017+type@asset+block@AI_edx_ml_unsupervis...
2_9_asset-v1-ColumbiaX+CSMM.101x+2T2017+type@asset+block@AI_edx_ml_unsupervis...2_9_asset-v1-ColumbiaX+CSMM.101x+2T2017+type@asset+block@AI_edx_ml_unsupervis...
2_9_asset-v1-ColumbiaX+CSMM.101x+2T2017+type@asset+block@AI_edx_ml_unsupervis...
 
Large Scale Data Clustering: an overview
Large Scale Data Clustering: an overviewLarge Scale Data Clustering: an overview
Large Scale Data Clustering: an overview
 
Mathematics online: some common algorithms
Mathematics online: some common algorithmsMathematics online: some common algorithms
Mathematics online: some common algorithms
 
k-mean-clustering.pdf
k-mean-clustering.pdfk-mean-clustering.pdf
k-mean-clustering.pdf
 

Más de Kasun Ranga Wijeweera

Más de Kasun Ranga Wijeweera (20)

Decorator Design Pattern in C#
Decorator Design Pattern in C#Decorator Design Pattern in C#
Decorator Design Pattern in C#
 
Singleton Design Pattern in C#
Singleton Design Pattern in C#Singleton Design Pattern in C#
Singleton Design Pattern in C#
 
Introduction to Design Patterns
Introduction to Design PatternsIntroduction to Design Patterns
Introduction to Design Patterns
 
Algorithms for Convex Partitioning of a Polygon
Algorithms for Convex Partitioning of a PolygonAlgorithms for Convex Partitioning of a Polygon
Algorithms for Convex Partitioning of a Polygon
 
Geometric Transformations II
Geometric Transformations IIGeometric Transformations II
Geometric Transformations II
 
Geometric Transformations I
Geometric Transformations IGeometric Transformations I
Geometric Transformations I
 
Introduction to Polygons
Introduction to PolygonsIntroduction to Polygons
Introduction to Polygons
 
Bresenham Line Drawing Algorithm
Bresenham Line Drawing AlgorithmBresenham Line Drawing Algorithm
Bresenham Line Drawing Algorithm
 
Digital Differential Analyzer Line Drawing Algorithm
Digital Differential Analyzer Line Drawing AlgorithmDigital Differential Analyzer Line Drawing Algorithm
Digital Differential Analyzer Line Drawing Algorithm
 
Loops in Visual Basic: Exercises
Loops in Visual Basic: ExercisesLoops in Visual Basic: Exercises
Loops in Visual Basic: Exercises
 
Conditional Logic: Exercises
Conditional Logic: ExercisesConditional Logic: Exercises
Conditional Logic: Exercises
 
Getting Started with Visual Basic Programming
Getting Started with Visual Basic ProgrammingGetting Started with Visual Basic Programming
Getting Started with Visual Basic Programming
 
CheckBoxes and RadioButtons
CheckBoxes and RadioButtonsCheckBoxes and RadioButtons
CheckBoxes and RadioButtons
 
Variables in Visual Basic Programming
Variables in Visual Basic ProgrammingVariables in Visual Basic Programming
Variables in Visual Basic Programming
 
Loops in Visual Basic Programming
Loops in Visual Basic ProgrammingLoops in Visual Basic Programming
Loops in Visual Basic Programming
 
Conditional Logic in Visual Basic Programming
Conditional Logic in Visual Basic ProgrammingConditional Logic in Visual Basic Programming
Conditional Logic in Visual Basic Programming
 
Assignment for Variables
Assignment for VariablesAssignment for Variables
Assignment for Variables
 
Assignment for Factory Method Design Pattern in C# [ANSWERS]
Assignment for Factory Method Design Pattern in C# [ANSWERS]Assignment for Factory Method Design Pattern in C# [ANSWERS]
Assignment for Factory Method Design Pattern in C# [ANSWERS]
 
Assignment for Events
Assignment for EventsAssignment for Events
Assignment for Events
 
Mastering Arrays Assignment
Mastering Arrays AssignmentMastering Arrays Assignment
Mastering Arrays Assignment
 

Último

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 

Último (20)

ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 

K means Clustering Algorithm

  • 1. K-means clustering algorithm Kasun Ranga Wijeweera (krw19870829@gmail.com)
  • 2. What is Clustering? • Organizing data into classes such that there is • high intra-class similarity • low inter-class similarity • Finding the class labels and the number of classes directly from the data (in contrast to classification). • More informally, finding natural groupings among objects.
  • 3. What is a natural grouping among these objects?
  • 4. What is a natural grouping among these objects? Clustering is subjective Simpson's Family School Employees Females Males
  • 5. Defining Distance Measures Definition: Let O1 and O2 be two objects from the universe of possible objects. The distance (dissimilarity) between O1 and O2 is a real number denoted by D(O1,O2) Kasun Kiosn 0.23 3 342.7
  • 6. Consider a Set of Data Points, And a Set of Clusters,
  • 8. Algorithm k-means 1. Randomly choose K data items from X as initial centroids. 2. Repeat  Assign each data point to the cluster which has the closest centroid.  Calculate new cluster centroids. Until the convergence criteria is met.
  • 14. K-means gets stuck in a local optima
  • 21. Applications of K-means Method • Optical Character Recognition • Biometrics • Diagnostic Systems • Military Applications
  • 22. Comments on the K-Means Method • Strength – Relatively efficient: O(tkn), where n is # objects, k is # clusters, and t is # iterations. Normally, k, t << n. – Often terminates at a local optimum. The global optimum may be found using techniques such as: deterministic annealing and genetic algorithms • Weakness – Applicable only when mean is defined, then what about categorical data? – Need to specify k, the number of clusters, in advance – Unable to handle noisy data and outliers – Not suitable to discover clusters with non-convex shapes
  • 24. Thanks for your attention !