SlideShare una empresa de Scribd logo
1 de 23
Cluster Analysis: Basic Concepts and Algorithms
What is Cluster Analysis? Finding groups of objects such that the objects in a group will be similar (or related) to one another and different from (or unrelated to) the objects in other groups
Applications of Cluster Analysis  Understanding Group genes and proteins that have similar functionality, or group stocks with similar price fluctuations Summarization Reduce the size of large data sets
Types of Clustering  A clustering is a set of clusters       Important distinction between hierarchical and partitional sets of clusters   Partitional Clustering   A division data objects into non-overlapping subsets (clusters) such that each data object is in exactly one subset   Hierarchical clustering   A set of nested clusters organized as a hierarchical tree
Clustering Algorithms K-means  Hierarchical clustering  Graph based clustering 
K-means Clustering  Partitional clustering approach  Each cluster is associated with a centroid (center point)  Each point is assigned to the cluster with the closest centroid Number of clusters, K, must be specified  The basic algorithm is very simple
K-means Clustering – Details  Initial centroids are often chosen randomly.  Clusters produced vary from one run to another.  The centroid is (typically) the mean of the points in the cluster.  ‘Closeness’ is measured by Euclidean distance, cosine similarity, correlation, etc.  K-means will converge for common similarity measures mentioned above
K-means Clustering – Details  Most of the convergence happens in the first few iterations.  Often the stopping condition is changed to ‘Until relatively few points change clusters’  Complexity is O( n * K * I * d )  n = number of points, K = number of clusters,  I = number of iterations, d = number of attributes
Two different K-means Clusterings  Sub-optimal Clustering  Optimal Clustering 
Problems with Selecting Initial Points  If there are K ‘real’ clusters then the chance of selecting one centroid from each cluster is small.  Chance is relatively small when K is large  If clusters are the same size, n, then For example, if K = 10, then probability = 10!/1010 = 0.00036  Sometimes the initial centroids will readjust themselves in ‘right’ way, and sometimes they don’t  Consider an example of five pairs of clusters
Solutions to Initial Centroids Problem Multiple runs  Helps, but probability is not on your side   Sample and use hierarchical clustering to determine initial centroids  Select more than k initial centroids and then select among these initial centroids Select most widely separated   Bisecting K-means  Not as susceptible to initialization issues
Evaluating K-means Clusters  Most common measure is Sum of Squared Error (SSE)  For each point, the error is the distance to the nearest cluster  To get SSE, we square these errors and sum them.   x is a data point in cluster Ciand mi is the representative point for cluster Ci can show that micorresponds to the center (mean) of the cluster
Evaluating K-means Clusters  Given two clusters, we can choose the one with the smaller error  One easy way to reduce SSE is to increase K, the number of clusters  A good clustering with smaller K can have a lower SSE than a poor clustering with higher K
Limitations of K-means  K-means has problems when clusters are of differing  Sizes  Densities  Non-globular shapes   K-means has problems when the data contains outliers.   The number of clusters (K) is difficult to determine.
Hierarchical Clustering   Produces a set of nested clusters organized as a hierarchical tree  Can be visualized as a dendrogram A tree like diagram that records the sequences of merges or splits
Strengths of Hierarchical Clustering  Do not have to assume any particular number of clusters  Any desired number of clusters can be obtained by ‘cutting’ the dendogram at the proper level   They may correspond to meaningful taxonomies  Example in biological sciences (e.g., animal kingdom, phylogeny reconstruction, …)
Hierarchical Clustering  Two main types of hierarchical clustering  Agglomerative:  Start with the points as individual clusters  At each step, merge the closest pair of clusters until only one cluster (or k clusters) left  Divisive:  Start with one, all-inclusive cluster  At each step, split a cluster until each cluster contains a point (or there are k clusters)
Agglomerative Clustering Algorithm  More popular hierarchical clustering technique  Basic algorithm is straightforward  Compute the proximity matrix  Let each data point be a cluster Repeat   Merge the two closest clusters   Update the proximity matrix  Until only a single cluster remains
Hierarchical Clustering: Group Average  Compromise between Single and Complete Link   Strengths  Less susceptible to noise and outliers   Limitations  Biased towards globular clusters
Hierarchical Clustering: Time and Space requirements  O(N2) space since it uses the proximity matrix.  N is the number of points.   O(N3) time in many cases  There are N steps and at each step the size, N2, proximity matrix must be updated and searched  Complexity can be reduced to O(N2 log(N) ) time for some approaches
Hierarchical Clustering: Problems and Limitations  Once a decision is made to combine two clusters, it cannot be undone No objective function is directly minimized  Different schemes have problems with one or more of the following:  Sensitivity to noise and outliers (MIN)  Difficulty handling different sized clusters and non-convex shapes (Group average, MAX)  Breaking large clusters (MAX)
conclusion The purpose of clustering in data mining and its types are discussed. The k-means and hierarchical algorithm are explained in detail and their pros and cons are analyzed.
Visit more self help tutorials Pick a tutorial of your choice and browse through it at your own pace. The tutorials section is free, self-guiding and will not involve any additional support. Visit us at www.dataminingtools.net

Más contenido relacionado

La actualidad más candente

DBSCAN : A Clustering Algorithm
DBSCAN : A Clustering AlgorithmDBSCAN : A Clustering Algorithm
DBSCAN : A Clustering AlgorithmPınar Yahşi
 
Chap8 basic cluster_analysis
Chap8 basic cluster_analysisChap8 basic cluster_analysis
Chap8 basic cluster_analysisguru_prasadg
 
Data Science - Part VII - Cluster Analysis
Data Science - Part VII -  Cluster AnalysisData Science - Part VII -  Cluster Analysis
Data Science - Part VII - Cluster AnalysisDerek Kane
 
K-means clustering algorithm
K-means clustering algorithmK-means clustering algorithm
K-means clustering algorithmVinit Dantkale
 
K Means Clustering Algorithm | K Means Clustering Example | Machine Learning ...
K Means Clustering Algorithm | K Means Clustering Example | Machine Learning ...K Means Clustering Algorithm | K Means Clustering Example | Machine Learning ...
K Means Clustering Algorithm | K Means Clustering Example | Machine Learning ...Simplilearn
 
Density Based Clustering
Density Based ClusteringDensity Based Clustering
Density Based ClusteringSSA KPI
 
K Nearest Neighbor Presentation
K Nearest Neighbor PresentationK Nearest Neighbor Presentation
K Nearest Neighbor PresentationDessy Amirudin
 
Clustering, k-means clustering
Clustering, k-means clusteringClustering, k-means clustering
Clustering, k-means clusteringMegha Sharma
 
Linear models and multiclass classification
Linear models and multiclass classificationLinear models and multiclass classification
Linear models and multiclass classificationNdSv94
 
K nearest neighbor
K nearest neighborK nearest neighbor
K nearest neighborUjjawal
 
Text clustering
Text clusteringText clustering
Text clusteringKU Leuven
 
Feature selection
Feature selectionFeature selection
Feature selectiondkpawar
 
Optics ordering points to identify the clustering structure
Optics ordering points to identify the clustering structureOptics ordering points to identify the clustering structure
Optics ordering points to identify the clustering structureRajesh Piryani
 

La actualidad más candente (20)

DBSCAN : A Clustering Algorithm
DBSCAN : A Clustering AlgorithmDBSCAN : A Clustering Algorithm
DBSCAN : A Clustering Algorithm
 
Chap8 basic cluster_analysis
Chap8 basic cluster_analysisChap8 basic cluster_analysis
Chap8 basic cluster_analysis
 
Data Science - Part VII - Cluster Analysis
Data Science - Part VII -  Cluster AnalysisData Science - Part VII -  Cluster Analysis
Data Science - Part VII - Cluster Analysis
 
Cluster Analysis
Cluster AnalysisCluster Analysis
Cluster Analysis
 
K-means clustering algorithm
K-means clustering algorithmK-means clustering algorithm
K-means clustering algorithm
 
K Nearest Neighbor Algorithm
K Nearest Neighbor AlgorithmK Nearest Neighbor Algorithm
K Nearest Neighbor Algorithm
 
Presentation on K-Means Clustering
Presentation on K-Means ClusteringPresentation on K-Means Clustering
Presentation on K-Means Clustering
 
K Means Clustering Algorithm | K Means Clustering Example | Machine Learning ...
K Means Clustering Algorithm | K Means Clustering Example | Machine Learning ...K Means Clustering Algorithm | K Means Clustering Example | Machine Learning ...
K Means Clustering Algorithm | K Means Clustering Example | Machine Learning ...
 
Density Based Clustering
Density Based ClusteringDensity Based Clustering
Density Based Clustering
 
Principal Component Analysis
Principal Component AnalysisPrincipal Component Analysis
Principal Component Analysis
 
K Nearest Neighbor Presentation
K Nearest Neighbor PresentationK Nearest Neighbor Presentation
K Nearest Neighbor Presentation
 
Dbscan algorithom
Dbscan algorithomDbscan algorithom
Dbscan algorithom
 
Decision Tree Learning
Decision Tree LearningDecision Tree Learning
Decision Tree Learning
 
Clustering, k-means clustering
Clustering, k-means clusteringClustering, k-means clustering
Clustering, k-means clustering
 
Linear models and multiclass classification
Linear models and multiclass classificationLinear models and multiclass classification
Linear models and multiclass classification
 
K nearest neighbor
K nearest neighborK nearest neighbor
K nearest neighbor
 
Text clustering
Text clusteringText clustering
Text clustering
 
Feature selection
Feature selectionFeature selection
Feature selection
 
Optics ordering points to identify the clustering structure
Optics ordering points to identify the clustering structureOptics ordering points to identify the clustering structure
Optics ordering points to identify the clustering structure
 
cluster analysis
cluster analysiscluster analysis
cluster analysis
 

Similar a Cluster Analysis

15857 cse422 unsupervised-learning
15857 cse422 unsupervised-learning15857 cse422 unsupervised-learning
15857 cse422 unsupervised-learningAnil Yadav
 
Chapter 10. Cluster Analysis Basic Concepts and Methods.ppt
Chapter 10. Cluster Analysis Basic Concepts and Methods.pptChapter 10. Cluster Analysis Basic Concepts and Methods.ppt
Chapter 10. Cluster Analysis Basic Concepts and Methods.pptSubrata Kumer Paul
 
Data mining concepts and techniques Chapter 10
Data mining concepts and techniques Chapter 10Data mining concepts and techniques Chapter 10
Data mining concepts and techniques Chapter 10mqasimsheikh5
 
Capter10 cluster basic
Capter10 cluster basicCapter10 cluster basic
Capter10 cluster basicHouw Liong The
 
Capter10 cluster basic : Han & Kamber
Capter10 cluster basic : Han & KamberCapter10 cluster basic : Han & Kamber
Capter10 cluster basic : Han & KamberHouw Liong The
 
10 clusbasic
10 clusbasic10 clusbasic
10 clusbasicengrasi
 
iiit delhi unsupervised pdf.pdf
iiit delhi unsupervised pdf.pdfiiit delhi unsupervised pdf.pdf
iiit delhi unsupervised pdf.pdfVIKASGUPTA127897
 
data mining cocepts and techniques chapter
data mining cocepts and techniques chapterdata mining cocepts and techniques chapter
data mining cocepts and techniques chapterNaveenKumar5162
 
data mining cocepts and techniques chapter
data mining cocepts and techniques chapterdata mining cocepts and techniques chapter
data mining cocepts and techniques chapterNaveenKumar5162
 
MODULE 4_ CLUSTERING.pptx
MODULE 4_ CLUSTERING.pptxMODULE 4_ CLUSTERING.pptx
MODULE 4_ CLUSTERING.pptxnikshaikh786
 
machine learning - Clustering in R
machine learning - Clustering in Rmachine learning - Clustering in R
machine learning - Clustering in RSudhakar Chavan
 
Slide-TIF311-DM-10-11.ppt
Slide-TIF311-DM-10-11.pptSlide-TIF311-DM-10-11.ppt
Slide-TIF311-DM-10-11.pptSandinoBerutu1
 
Slide-TIF311-DM-10-11.ppt
Slide-TIF311-DM-10-11.pptSlide-TIF311-DM-10-11.ppt
Slide-TIF311-DM-10-11.pptImXaib
 
26-Clustering MTech-2017.ppt
26-Clustering MTech-2017.ppt26-Clustering MTech-2017.ppt
26-Clustering MTech-2017.pptvikassingh569137
 
CC282 Unsupervised Learning (Clustering) Lecture 7 slides for ...
CC282 Unsupervised Learning (Clustering) Lecture 7 slides for ...CC282 Unsupervised Learning (Clustering) Lecture 7 slides for ...
CC282 Unsupervised Learning (Clustering) Lecture 7 slides for ...butest
 

Similar a Cluster Analysis (20)

15857 cse422 unsupervised-learning
15857 cse422 unsupervised-learning15857 cse422 unsupervised-learning
15857 cse422 unsupervised-learning
 
Chapter 10. Cluster Analysis Basic Concepts and Methods.ppt
Chapter 10. Cluster Analysis Basic Concepts and Methods.pptChapter 10. Cluster Analysis Basic Concepts and Methods.ppt
Chapter 10. Cluster Analysis Basic Concepts and Methods.ppt
 
Data mining concepts and techniques Chapter 10
Data mining concepts and techniques Chapter 10Data mining concepts and techniques Chapter 10
Data mining concepts and techniques Chapter 10
 
10 clusbasic
10 clusbasic10 clusbasic
10 clusbasic
 
Clustering
ClusteringClustering
Clustering
 
Clustering
ClusteringClustering
Clustering
 
Capter10 cluster basic
Capter10 cluster basicCapter10 cluster basic
Capter10 cluster basic
 
Capter10 cluster basic : Han & Kamber
Capter10 cluster basic : Han & KamberCapter10 cluster basic : Han & Kamber
Capter10 cluster basic : Han & Kamber
 
CLUSTERING
CLUSTERINGCLUSTERING
CLUSTERING
 
My8clst
My8clstMy8clst
My8clst
 
10 clusbasic
10 clusbasic10 clusbasic
10 clusbasic
 
iiit delhi unsupervised pdf.pdf
iiit delhi unsupervised pdf.pdfiiit delhi unsupervised pdf.pdf
iiit delhi unsupervised pdf.pdf
 
data mining cocepts and techniques chapter
data mining cocepts and techniques chapterdata mining cocepts and techniques chapter
data mining cocepts and techniques chapter
 
data mining cocepts and techniques chapter
data mining cocepts and techniques chapterdata mining cocepts and techniques chapter
data mining cocepts and techniques chapter
 
MODULE 4_ CLUSTERING.pptx
MODULE 4_ CLUSTERING.pptxMODULE 4_ CLUSTERING.pptx
MODULE 4_ CLUSTERING.pptx
 
machine learning - Clustering in R
machine learning - Clustering in Rmachine learning - Clustering in R
machine learning - Clustering in R
 
Slide-TIF311-DM-10-11.ppt
Slide-TIF311-DM-10-11.pptSlide-TIF311-DM-10-11.ppt
Slide-TIF311-DM-10-11.ppt
 
Slide-TIF311-DM-10-11.ppt
Slide-TIF311-DM-10-11.pptSlide-TIF311-DM-10-11.ppt
Slide-TIF311-DM-10-11.ppt
 
26-Clustering MTech-2017.ppt
26-Clustering MTech-2017.ppt26-Clustering MTech-2017.ppt
26-Clustering MTech-2017.ppt
 
CC282 Unsupervised Learning (Clustering) Lecture 7 slides for ...
CC282 Unsupervised Learning (Clustering) Lecture 7 slides for ...CC282 Unsupervised Learning (Clustering) Lecture 7 slides for ...
CC282 Unsupervised Learning (Clustering) Lecture 7 slides for ...
 

Más de guest0edcaf

Textmining Retrieval And Clustering
Textmining Retrieval And ClusteringTextmining Retrieval And Clustering
Textmining Retrieval And Clusteringguest0edcaf
 
Textmining Predictive Models
Textmining Predictive ModelsTextmining Predictive Models
Textmining Predictive Modelsguest0edcaf
 
Textmining Introduction
Textmining IntroductionTextmining Introduction
Textmining Introductionguest0edcaf
 
Textmining Information Extraction
Textmining Information ExtractionTextmining Information Extraction
Textmining Information Extractionguest0edcaf
 
Association Analysis
Association AnalysisAssociation Analysis
Association Analysisguest0edcaf
 
Anomaly Detection
Anomaly DetectionAnomaly Detection
Anomaly Detectionguest0edcaf
 

Más de guest0edcaf (6)

Textmining Retrieval And Clustering
Textmining Retrieval And ClusteringTextmining Retrieval And Clustering
Textmining Retrieval And Clustering
 
Textmining Predictive Models
Textmining Predictive ModelsTextmining Predictive Models
Textmining Predictive Models
 
Textmining Introduction
Textmining IntroductionTextmining Introduction
Textmining Introduction
 
Textmining Information Extraction
Textmining Information ExtractionTextmining Information Extraction
Textmining Information Extraction
 
Association Analysis
Association AnalysisAssociation Analysis
Association Analysis
 
Anomaly Detection
Anomaly DetectionAnomaly Detection
Anomaly Detection
 

Último

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 

Último (20)

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 

Cluster Analysis

  • 1. Cluster Analysis: Basic Concepts and Algorithms
  • 2. What is Cluster Analysis? Finding groups of objects such that the objects in a group will be similar (or related) to one another and different from (or unrelated to) the objects in other groups
  • 3. Applications of Cluster Analysis  Understanding Group genes and proteins that have similar functionality, or group stocks with similar price fluctuations Summarization Reduce the size of large data sets
  • 4. Types of Clustering  A clustering is a set of clusters Important distinction between hierarchical and partitional sets of clusters  Partitional Clustering A division data objects into non-overlapping subsets (clusters) such that each data object is in exactly one subset  Hierarchical clustering A set of nested clusters organized as a hierarchical tree
  • 5. Clustering Algorithms K-means  Hierarchical clustering  Graph based clustering 
  • 6. K-means Clustering  Partitional clustering approach Each cluster is associated with a centroid (center point) Each point is assigned to the cluster with the closest centroid Number of clusters, K, must be specified The basic algorithm is very simple
  • 7. K-means Clustering – Details  Initial centroids are often chosen randomly. Clusters produced vary from one run to another. The centroid is (typically) the mean of the points in the cluster. ‘Closeness’ is measured by Euclidean distance, cosine similarity, correlation, etc. K-means will converge for common similarity measures mentioned above
  • 8. K-means Clustering – Details  Most of the convergence happens in the first few iterations. Often the stopping condition is changed to ‘Until relatively few points change clusters’ Complexity is O( n * K * I * d ) n = number of points, K = number of clusters,  I = number of iterations, d = number of attributes
  • 9. Two different K-means Clusterings  Sub-optimal Clustering  Optimal Clustering 
  • 10. Problems with Selecting Initial Points  If there are K ‘real’ clusters then the chance of selecting one centroid from each cluster is small. Chance is relatively small when K is large If clusters are the same size, n, then For example, if K = 10, then probability = 10!/1010 = 0.00036 Sometimes the initial centroids will readjust themselves in ‘right’ way, and sometimes they don’t Consider an example of five pairs of clusters
  • 11. Solutions to Initial Centroids Problem Multiple runs Helps, but probability is not on your side  Sample and use hierarchical clustering to determine initial centroids  Select more than k initial centroids and then select among these initial centroids Select most widely separated  Bisecting K-means Not as susceptible to initialization issues
  • 12. Evaluating K-means Clusters  Most common measure is Sum of Squared Error (SSE) For each point, the error is the distance to the nearest cluster To get SSE, we square these errors and sum them.  x is a data point in cluster Ciand mi is the representative point for cluster Ci can show that micorresponds to the center (mean) of the cluster
  • 13. Evaluating K-means Clusters  Given two clusters, we can choose the one with the smaller error One easy way to reduce SSE is to increase K, the number of clusters A good clustering with smaller K can have a lower SSE than a poor clustering with higher K
  • 14. Limitations of K-means  K-means has problems when clusters are of differing Sizes Densities Non-globular shapes  K-means has problems when the data contains outliers.  The number of clusters (K) is difficult to determine.
  • 15. Hierarchical Clustering   Produces a set of nested clusters organized as a hierarchical tree Can be visualized as a dendrogram A tree like diagram that records the sequences of merges or splits
  • 16. Strengths of Hierarchical Clustering  Do not have to assume any particular number of clusters Any desired number of clusters can be obtained by ‘cutting’ the dendogram at the proper level  They may correspond to meaningful taxonomies Example in biological sciences (e.g., animal kingdom, phylogeny reconstruction, …)
  • 17. Hierarchical Clustering  Two main types of hierarchical clustering Agglomerative: Start with the points as individual clusters At each step, merge the closest pair of clusters until only one cluster (or k clusters) left Divisive: Start with one, all-inclusive cluster At each step, split a cluster until each cluster contains a point (or there are k clusters)
  • 18. Agglomerative Clustering Algorithm  More popular hierarchical clustering technique Basic algorithm is straightforward Compute the proximity matrix Let each data point be a cluster Repeat  Merge the two closest clusters  Update the proximity matrix Until only a single cluster remains
  • 19. Hierarchical Clustering: Group Average  Compromise between Single and Complete Link  Strengths Less susceptible to noise and outliers  Limitations Biased towards globular clusters
  • 20. Hierarchical Clustering: Time and Space requirements  O(N2) space since it uses the proximity matrix. N is the number of points.  O(N3) time in many cases There are N steps and at each step the size, N2, proximity matrix must be updated and searched Complexity can be reduced to O(N2 log(N) ) time for some approaches
  • 21. Hierarchical Clustering: Problems and Limitations  Once a decision is made to combine two clusters, it cannot be undone No objective function is directly minimized  Different schemes have problems with one or more of the following: Sensitivity to noise and outliers (MIN) Difficulty handling different sized clusters and non-convex shapes (Group average, MAX) Breaking large clusters (MAX)
  • 22. conclusion The purpose of clustering in data mining and its types are discussed. The k-means and hierarchical algorithm are explained in detail and their pros and cons are analyzed.
  • 23. Visit more self help tutorials Pick a tutorial of your choice and browse through it at your own pace. The tutorials section is free, self-guiding and will not involve any additional support. Visit us at www.dataminingtools.net