SlideShare una empresa de Scribd logo
1 de 19
Cluster Analysis Advanced Quantitative Data Analysis
Supervised and Unsupervised Learning Logistic regression and Fisher’s LDA and QDA are examples of supervised learning. This means that there is a ‘training set’ which contains known classifications into groups that can be used to derive a classification rule. This can be then evaluated on a ‘test set’, or this can be done repeatedly using cross validation.
Unsupervised Learning Unsupervised learning means (in this instance) that we are trying to discover a division of objects into classes without any training set of known classes, without knowing in advance what the classes are, or even how many classes there are. ‘Cluster analysis’, or simply ‘clustering’ is a collection of methods for unsupervised class discovery
Distance Measures It turns out that the most crucial decision to make in choosing a clustering method is defining what it means for two vectors to be close or far. There are other components to the choice, but these are all secondary Often the distance measure is implicit in the choice of method, but a wise decision maker knows what he/she is choosing.
A true distance, or metric, is a function defined on pairs of objects that satisfies a number of properties: D(x,y) = D(y,x)  D(x,y) ≥ 0 D(x,y) = 0  x = y  D(x,y) + D(y,z) ≥ D(x,z) (triangle inequality) The classic example of a metric is Euclidean distance. If x = (x1,x2,…xp), and y=(y1,y2,…yp) , are vectors, the Euclidean distance is [(x1-y1)2+(xp-yp)2]
Euclidean Distance y = (y1,y2) D(x,y) |x2-y2| |x1-y1| x = (x1,x2)
Mahalanobis Distance Mahalanobis distance is a kind of weighted Euclidean distance It produces distance contours of the same shape as a data distribution It is often more appropriate than Euclidean distance
Agglomerative Hierarchical Clustering We start with all data items as individuals In step 1, we join the two closest individuals In each subsequent step, we join the two closest individuals or clusters This requires defining the distance between two groups as a number that can be compared to the distance between individuals We can use the R command hclust
Group Distances Complete link clustering defines the distance between two groups as the maximum distance between any element of one group and any of the other Single link clustering defines the distance between two groups as the minimum distance between any element of one group and any of the other Average link clustering defines the distance between two groups as the mean distance between elements of one group and elements of the other
A graphical example Single linkage Complete linkage Average linkage
iris.d<- dist(iris[,1:4]) iris.hc<- hclust(iris.d) plot(iris.hc)
Divisive Clustering Divisive clustering begins with the whole data set as a cluster, and considers dividing it into k clusters. Usually this is done to optimize some criterion such as the ratio of the within cluster variation to the between cluster variation The choice of k is important
K-means is a widely used divisive algorithm (R command kmeans) Its major weakness is that it uses Euclidean distance
iris.km <- kmeans(iris[,1:4],3) plot(prcomp(iris[,1:4])$x,col=iris.km$cluster) table(iris.km$cluster,iris[,5])     setosa versicolor virginica   1  0     48         14          2  0      2         36          3 50      0          0
heatmap(as.matrix(swiss))
Multidimensional Scaling mds<- isoMDS(dist(USArrests)) plot(mds$points, type = "n ") text(mds$points, labels = as.character(rownames(USArrests)))

Más contenido relacionado

La actualidad más candente

Marketing analytics - clustering Types
Marketing analytics - clustering TypesMarketing analytics - clustering Types
Marketing analytics - clustering TypesSuryakumar Thangarasu
 
1.8 discretization
1.8 discretization1.8 discretization
1.8 discretizationKrish_ver2
 
Handling noisy data
Handling noisy dataHandling noisy data
Handling noisy dataVivek Gandhi
 
CLUSTER ANALYSIS ALGORITHMS.pptx
CLUSTER ANALYSIS ALGORITHMS.pptxCLUSTER ANALYSIS ALGORITHMS.pptx
CLUSTER ANALYSIS ALGORITHMS.pptxShwetapadmaBabu1
 
Program_Cluster_Analysis
Program_Cluster_AnalysisProgram_Cluster_Analysis
Program_Cluster_AnalysisSammya Sengupta
 
Textmining Retrieval And Clustering
Textmining Retrieval And ClusteringTextmining Retrieval And Clustering
Textmining Retrieval And Clusteringguest0edcaf
 
1.7 data reduction
1.7 data reduction1.7 data reduction
1.7 data reductionKrish_ver2
 
Clustering Methods with R
Clustering Methods with RClustering Methods with R
Clustering Methods with RAkira Murakami
 
K Means Clustering Algorithm | K Means Clustering Example | Machine Learning ...
K Means Clustering Algorithm | K Means Clustering Example | Machine Learning ...K Means Clustering Algorithm | K Means Clustering Example | Machine Learning ...
K Means Clustering Algorithm | K Means Clustering Example | Machine Learning ...Simplilearn
 
CLIQUE Automatic subspace clustering of high dimensional data for data mining...
CLIQUE Automatic subspace clustering of high dimensional data for data mining...CLIQUE Automatic subspace clustering of high dimensional data for data mining...
CLIQUE Automatic subspace clustering of high dimensional data for data mining...Raed Aldahdooh
 
Mean shift and Hierarchical clustering
Mean shift and Hierarchical clustering Mean shift and Hierarchical clustering
Mean shift and Hierarchical clustering Yan Xu
 
Discretization and concept hierarchy(os)
Discretization and concept hierarchy(os)Discretization and concept hierarchy(os)
Discretization and concept hierarchy(os)snegacmr
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessingAmuthamca
 
K mean-clustering algorithm
K mean-clustering algorithmK mean-clustering algorithm
K mean-clustering algorithmparry prabhu
 
Cluster spss week7
Cluster spss week7Cluster spss week7
Cluster spss week7Birat Sharma
 

La actualidad más candente (20)

Linear discriminant analysis
Linear discriminant analysisLinear discriminant analysis
Linear discriminant analysis
 
Marketing analytics - clustering Types
Marketing analytics - clustering TypesMarketing analytics - clustering Types
Marketing analytics - clustering Types
 
Cluster analysis
Cluster analysisCluster analysis
Cluster analysis
 
1.8 discretization
1.8 discretization1.8 discretization
1.8 discretization
 
Handling noisy data
Handling noisy dataHandling noisy data
Handling noisy data
 
CLUSTER ANALYSIS ALGORITHMS.pptx
CLUSTER ANALYSIS ALGORITHMS.pptxCLUSTER ANALYSIS ALGORITHMS.pptx
CLUSTER ANALYSIS ALGORITHMS.pptx
 
Program_Cluster_Analysis
Program_Cluster_AnalysisProgram_Cluster_Analysis
Program_Cluster_Analysis
 
Textmining Retrieval And Clustering
Textmining Retrieval And ClusteringTextmining Retrieval And Clustering
Textmining Retrieval And Clustering
 
08 clustering
08 clustering08 clustering
08 clustering
 
1.7 data reduction
1.7 data reduction1.7 data reduction
1.7 data reduction
 
Clustering Methods with R
Clustering Methods with RClustering Methods with R
Clustering Methods with R
 
K Means Clustering Algorithm | K Means Clustering Example | Machine Learning ...
K Means Clustering Algorithm | K Means Clustering Example | Machine Learning ...K Means Clustering Algorithm | K Means Clustering Example | Machine Learning ...
K Means Clustering Algorithm | K Means Clustering Example | Machine Learning ...
 
Lect4
Lect4Lect4
Lect4
 
CLIQUE Automatic subspace clustering of high dimensional data for data mining...
CLIQUE Automatic subspace clustering of high dimensional data for data mining...CLIQUE Automatic subspace clustering of high dimensional data for data mining...
CLIQUE Automatic subspace clustering of high dimensional data for data mining...
 
Mean shift and Hierarchical clustering
Mean shift and Hierarchical clustering Mean shift and Hierarchical clustering
Mean shift and Hierarchical clustering
 
Hierachical clustering
Hierachical clusteringHierachical clustering
Hierachical clustering
 
Discretization and concept hierarchy(os)
Discretization and concept hierarchy(os)Discretization and concept hierarchy(os)
Discretization and concept hierarchy(os)
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessing
 
K mean-clustering algorithm
K mean-clustering algorithmK mean-clustering algorithm
K mean-clustering algorithm
 
Cluster spss week7
Cluster spss week7Cluster spss week7
Cluster spss week7
 

Similar a Clustering

An_Accelerated_Nearest_Neighbor_Search_Method_for_the_K-Means_Clustering_Algo...
An_Accelerated_Nearest_Neighbor_Search_Method_for_the_K-Means_Clustering_Algo...An_Accelerated_Nearest_Neighbor_Search_Method_for_the_K-Means_Clustering_Algo...
An_Accelerated_Nearest_Neighbor_Search_Method_for_the_K-Means_Clustering_Algo...Adam Fausett
 
Data mining classifiers.
Data mining classifiers.Data mining classifiers.
Data mining classifiers.ShwetaPatil174
 
iiit delhi unsupervised pdf.pdf
iiit delhi unsupervised pdf.pdfiiit delhi unsupervised pdf.pdf
iiit delhi unsupervised pdf.pdfVIKASGUPTA127897
 
Cluster Analysis
Cluster AnalysisCluster Analysis
Cluster AnalysisSSA KPI
 
DM UNIT_4 PPT for btech final year students
DM UNIT_4 PPT for btech final year studentsDM UNIT_4 PPT for btech final year students
DM UNIT_4 PPT for btech final year studentssriharipatilin
 
Classifiers
ClassifiersClassifiers
ClassifiersAyurdata
 
similarities-knn-1.ppt
similarities-knn-1.pptsimilarities-knn-1.ppt
similarities-knn-1.pptsatvikpatil5
 
Data mining concepts and techniques Chapter 10
Data mining concepts and techniques Chapter 10Data mining concepts and techniques Chapter 10
Data mining concepts and techniques Chapter 10mqasimsheikh5
 
AI-Lec20 Clustering I - Kmean.pptx
AI-Lec20 Clustering I - Kmean.pptxAI-Lec20 Clustering I - Kmean.pptx
AI-Lec20 Clustering I - Kmean.pptxSyed Ejaz
 
8.clustering algorithm.k means.em algorithm
8.clustering algorithm.k means.em algorithm8.clustering algorithm.k means.em algorithm
8.clustering algorithm.k means.em algorithmLaura Petrosanu
 
2.6 support vector machines and associative classifiers revised
2.6 support vector machines and associative classifiers revised2.6 support vector machines and associative classifiers revised
2.6 support vector machines and associative classifiers revisedKrish_ver2
 
K means clustering
K means clusteringK means clustering
K means clusteringkeshav goyal
 
Clustering Algorithms.pdf
Clustering Algorithms.pdfClustering Algorithms.pdf
Clustering Algorithms.pdfLibya Thomas
 

Similar a Clustering (20)

An_Accelerated_Nearest_Neighbor_Search_Method_for_the_K-Means_Clustering_Algo...
An_Accelerated_Nearest_Neighbor_Search_Method_for_the_K-Means_Clustering_Algo...An_Accelerated_Nearest_Neighbor_Search_Method_for_the_K-Means_Clustering_Algo...
An_Accelerated_Nearest_Neighbor_Search_Method_for_the_K-Means_Clustering_Algo...
 
Clustering
ClusteringClustering
Clustering
 
Data mining classifiers.
Data mining classifiers.Data mining classifiers.
Data mining classifiers.
 
PR07.pdf
PR07.pdfPR07.pdf
PR07.pdf
 
K mean-clustering
K mean-clusteringK mean-clustering
K mean-clustering
 
iiit delhi unsupervised pdf.pdf
iiit delhi unsupervised pdf.pdfiiit delhi unsupervised pdf.pdf
iiit delhi unsupervised pdf.pdf
 
similarities-knn.pptx
similarities-knn.pptxsimilarities-knn.pptx
similarities-knn.pptx
 
Cluster Analysis
Cluster AnalysisCluster Analysis
Cluster Analysis
 
DM UNIT_4 PPT for btech final year students
DM UNIT_4 PPT for btech final year studentsDM UNIT_4 PPT for btech final year students
DM UNIT_4 PPT for btech final year students
 
Cs345 cl
Cs345 clCs345 cl
Cs345 cl
 
Classifiers
ClassifiersClassifiers
Classifiers
 
similarities-knn-1.ppt
similarities-knn-1.pptsimilarities-knn-1.ppt
similarities-knn-1.ppt
 
Clustering
ClusteringClustering
Clustering
 
Data mining concepts and techniques Chapter 10
Data mining concepts and techniques Chapter 10Data mining concepts and techniques Chapter 10
Data mining concepts and techniques Chapter 10
 
AI-Lec20 Clustering I - Kmean.pptx
AI-Lec20 Clustering I - Kmean.pptxAI-Lec20 Clustering I - Kmean.pptx
AI-Lec20 Clustering I - Kmean.pptx
 
8.clustering algorithm.k means.em algorithm
8.clustering algorithm.k means.em algorithm8.clustering algorithm.k means.em algorithm
8.clustering algorithm.k means.em algorithm
 
2.6 support vector machines and associative classifiers revised
2.6 support vector machines and associative classifiers revised2.6 support vector machines and associative classifiers revised
2.6 support vector machines and associative classifiers revised
 
Cs501 cluster analysis
Cs501 cluster analysisCs501 cluster analysis
Cs501 cluster analysis
 
K means clustering
K means clusteringK means clustering
K means clustering
 
Clustering Algorithms.pdf
Clustering Algorithms.pdfClustering Algorithms.pdf
Clustering Algorithms.pdf
 

Más de Alberto Labarga

El Salto Communities - EditorsLab 2017
El Salto Communities - EditorsLab 2017El Salto Communities - EditorsLab 2017
El Salto Communities - EditorsLab 2017Alberto Labarga
 
Shokesu - Premio Nobel de Literatura a Bob Dylan
Shokesu - Premio Nobel de Literatura a Bob DylanShokesu - Premio Nobel de Literatura a Bob Dylan
Shokesu - Premio Nobel de Literatura a Bob DylanAlberto Labarga
 
Genome visualization challenges
Genome visualization challengesGenome visualization challenges
Genome visualization challengesAlberto Labarga
 
SocialLearning: descubriendo contenidos educativos de manera colaborativa
SocialLearning: descubriendo contenidos educativos de manera colaborativaSocialLearning: descubriendo contenidos educativos de manera colaborativa
SocialLearning: descubriendo contenidos educativos de manera colaborativaAlberto Labarga
 
Hacksanfermin 2015 :: Dropcoin Street
Hacksanfermin 2015 :: Dropcoin StreetHacksanfermin 2015 :: Dropcoin Street
Hacksanfermin 2015 :: Dropcoin StreetAlberto Labarga
 
hacksanfermin 2015 :: Parking inteligente
hacksanfermin 2015 :: Parking inteligentehacksanfermin 2015 :: Parking inteligente
hacksanfermin 2015 :: Parking inteligenteAlberto Labarga
 
Vidas Contadas :: Visualizar 2015
Vidas Contadas :: Visualizar 2015Vidas Contadas :: Visualizar 2015
Vidas Contadas :: Visualizar 2015Alberto Labarga
 
Periodismo de datos y visualización de datos abiertos #siglibre9
Periodismo de datos y visualización de datos abiertos #siglibre9Periodismo de datos y visualización de datos abiertos #siglibre9
Periodismo de datos y visualización de datos abiertos #siglibre9Alberto Labarga
 
Arduino: Control de motores
Arduino: Control de motoresArduino: Control de motores
Arduino: Control de motoresAlberto Labarga
 
Entrada/salida analógica con Arduino
Entrada/salida analógica con ArduinoEntrada/salida analógica con Arduino
Entrada/salida analógica con ArduinoAlberto Labarga
 
Práctica con Arduino: Simon Dice
Práctica con Arduino: Simon DicePráctica con Arduino: Simon Dice
Práctica con Arduino: Simon DiceAlberto Labarga
 
Entrada/Salida digital con Arduino
Entrada/Salida digital con ArduinoEntrada/Salida digital con Arduino
Entrada/Salida digital con ArduinoAlberto Labarga
 
Presentación Laboratorio de Fabricación Digital UPNA 2014
Presentación Laboratorio de Fabricación Digital UPNA 2014Presentación Laboratorio de Fabricación Digital UPNA 2014
Presentación Laboratorio de Fabricación Digital UPNA 2014Alberto Labarga
 
Conceptos de electrónica - Laboratorio de Fabricación Digital UPNA 2014
Conceptos de electrónica - Laboratorio de Fabricación Digital UPNA 2014Conceptos de electrónica - Laboratorio de Fabricación Digital UPNA 2014
Conceptos de electrónica - Laboratorio de Fabricación Digital UPNA 2014Alberto Labarga
 
Introducción a la plataforma Arduino - Laboratorio de Fabricación Digital UPN...
Introducción a la plataforma Arduino - Laboratorio de Fabricación Digital UPN...Introducción a la plataforma Arduino - Laboratorio de Fabricación Digital UPN...
Introducción a la plataforma Arduino - Laboratorio de Fabricación Digital UPN...Alberto Labarga
 
Introducción a la impresión 3D
Introducción a la impresión 3DIntroducción a la impresión 3D
Introducción a la impresión 3DAlberto Labarga
 

Más de Alberto Labarga (20)

El Salto Communities - EditorsLab 2017
El Salto Communities - EditorsLab 2017El Salto Communities - EditorsLab 2017
El Salto Communities - EditorsLab 2017
 
Shokesu - Premio Nobel de Literatura a Bob Dylan
Shokesu - Premio Nobel de Literatura a Bob DylanShokesu - Premio Nobel de Literatura a Bob Dylan
Shokesu - Premio Nobel de Literatura a Bob Dylan
 
Genome visualization challenges
Genome visualization challengesGenome visualization challenges
Genome visualization challenges
 
SocialLearning: descubriendo contenidos educativos de manera colaborativa
SocialLearning: descubriendo contenidos educativos de manera colaborativaSocialLearning: descubriendo contenidos educativos de manera colaborativa
SocialLearning: descubriendo contenidos educativos de manera colaborativa
 
Hacksanfermin 2015 :: Dropcoin Street
Hacksanfermin 2015 :: Dropcoin StreetHacksanfermin 2015 :: Dropcoin Street
Hacksanfermin 2015 :: Dropcoin Street
 
hacksanfermin 2015 :: Parking inteligente
hacksanfermin 2015 :: Parking inteligentehacksanfermin 2015 :: Parking inteligente
hacksanfermin 2015 :: Parking inteligente
 
jpd5 big data
jpd5 big datajpd5 big data
jpd5 big data
 
Vidas Contadas :: Visualizar 2015
Vidas Contadas :: Visualizar 2015Vidas Contadas :: Visualizar 2015
Vidas Contadas :: Visualizar 2015
 
Periodismo de datos y visualización de datos abiertos #siglibre9
Periodismo de datos y visualización de datos abiertos #siglibre9Periodismo de datos y visualización de datos abiertos #siglibre9
Periodismo de datos y visualización de datos abiertos #siglibre9
 
myHealthHackmedicine
myHealthHackmedicinemyHealthHackmedicine
myHealthHackmedicine
 
Big Data y Salud
Big Data y SaludBig Data y Salud
Big Data y Salud
 
Arduino: Control de motores
Arduino: Control de motoresArduino: Control de motores
Arduino: Control de motores
 
Entrada/salida analógica con Arduino
Entrada/salida analógica con ArduinoEntrada/salida analógica con Arduino
Entrada/salida analógica con Arduino
 
Práctica con Arduino: Simon Dice
Práctica con Arduino: Simon DicePráctica con Arduino: Simon Dice
Práctica con Arduino: Simon Dice
 
Entrada/Salida digital con Arduino
Entrada/Salida digital con ArduinoEntrada/Salida digital con Arduino
Entrada/Salida digital con Arduino
 
Presentación Laboratorio de Fabricación Digital UPNA 2014
Presentación Laboratorio de Fabricación Digital UPNA 2014Presentación Laboratorio de Fabricación Digital UPNA 2014
Presentación Laboratorio de Fabricación Digital UPNA 2014
 
Conceptos de electrónica - Laboratorio de Fabricación Digital UPNA 2014
Conceptos de electrónica - Laboratorio de Fabricación Digital UPNA 2014Conceptos de electrónica - Laboratorio de Fabricación Digital UPNA 2014
Conceptos de electrónica - Laboratorio de Fabricación Digital UPNA 2014
 
Introducción a la plataforma Arduino - Laboratorio de Fabricación Digital UPN...
Introducción a la plataforma Arduino - Laboratorio de Fabricación Digital UPN...Introducción a la plataforma Arduino - Laboratorio de Fabricación Digital UPN...
Introducción a la plataforma Arduino - Laboratorio de Fabricación Digital UPN...
 
Introducción a la impresión 3D
Introducción a la impresión 3DIntroducción a la impresión 3D
Introducción a la impresión 3D
 
Vidas Contadas
Vidas ContadasVidas Contadas
Vidas Contadas
 

Último

microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
 
SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...
SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...
SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...KokoStevan
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeThiyagu K
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsTechSoup
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdfQucHHunhnh
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.pptRamjanShidvankar
 
Making and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdfMaking and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdfChris Hunter
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactdawncurless
 
Unit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptxUnit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptxVishalSingh1417
 
Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfAyushMahapatra5
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104misteraugie
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhikauryashika82
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...christianmathematics
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactPECB
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfAdmir Softic
 
Gardella_PRCampaignConclusion Pitch Letter
Gardella_PRCampaignConclusion Pitch LetterGardella_PRCampaignConclusion Pitch Letter
Gardella_PRCampaignConclusion Pitch LetterMateoGardella
 
psychiatric nursing HISTORY COLLECTION .docx
psychiatric  nursing HISTORY  COLLECTION  .docxpsychiatric  nursing HISTORY  COLLECTION  .docx
psychiatric nursing HISTORY COLLECTION .docxPoojaSen20
 

Último (20)

microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...
SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...
SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 
Making and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdfMaking and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdf
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
 
Unit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptxUnit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptx
 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 
Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdf
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
Gardella_PRCampaignConclusion Pitch Letter
Gardella_PRCampaignConclusion Pitch LetterGardella_PRCampaignConclusion Pitch Letter
Gardella_PRCampaignConclusion Pitch Letter
 
psychiatric nursing HISTORY COLLECTION .docx
psychiatric  nursing HISTORY  COLLECTION  .docxpsychiatric  nursing HISTORY  COLLECTION  .docx
psychiatric nursing HISTORY COLLECTION .docx
 

Clustering

  • 1. Cluster Analysis Advanced Quantitative Data Analysis
  • 2. Supervised and Unsupervised Learning Logistic regression and Fisher’s LDA and QDA are examples of supervised learning. This means that there is a ‘training set’ which contains known classifications into groups that can be used to derive a classification rule. This can be then evaluated on a ‘test set’, or this can be done repeatedly using cross validation.
  • 3. Unsupervised Learning Unsupervised learning means (in this instance) that we are trying to discover a division of objects into classes without any training set of known classes, without knowing in advance what the classes are, or even how many classes there are. ‘Cluster analysis’, or simply ‘clustering’ is a collection of methods for unsupervised class discovery
  • 4. Distance Measures It turns out that the most crucial decision to make in choosing a clustering method is defining what it means for two vectors to be close or far. There are other components to the choice, but these are all secondary Often the distance measure is implicit in the choice of method, but a wise decision maker knows what he/she is choosing.
  • 5. A true distance, or metric, is a function defined on pairs of objects that satisfies a number of properties: D(x,y) = D(y,x) D(x,y) ≥ 0 D(x,y) = 0  x = y D(x,y) + D(y,z) ≥ D(x,z) (triangle inequality) The classic example of a metric is Euclidean distance. If x = (x1,x2,…xp), and y=(y1,y2,…yp) , are vectors, the Euclidean distance is [(x1-y1)2+(xp-yp)2]
  • 6. Euclidean Distance y = (y1,y2) D(x,y) |x2-y2| |x1-y1| x = (x1,x2)
  • 7. Mahalanobis Distance Mahalanobis distance is a kind of weighted Euclidean distance It produces distance contours of the same shape as a data distribution It is often more appropriate than Euclidean distance
  • 8.
  • 9.
  • 10.
  • 11. Agglomerative Hierarchical Clustering We start with all data items as individuals In step 1, we join the two closest individuals In each subsequent step, we join the two closest individuals or clusters This requires defining the distance between two groups as a number that can be compared to the distance between individuals We can use the R command hclust
  • 12. Group Distances Complete link clustering defines the distance between two groups as the maximum distance between any element of one group and any of the other Single link clustering defines the distance between two groups as the minimum distance between any element of one group and any of the other Average link clustering defines the distance between two groups as the mean distance between elements of one group and elements of the other
  • 13. A graphical example Single linkage Complete linkage Average linkage
  • 14. iris.d<- dist(iris[,1:4]) iris.hc<- hclust(iris.d) plot(iris.hc)
  • 15. Divisive Clustering Divisive clustering begins with the whole data set as a cluster, and considers dividing it into k clusters. Usually this is done to optimize some criterion such as the ratio of the within cluster variation to the between cluster variation The choice of k is important
  • 16. K-means is a widely used divisive algorithm (R command kmeans) Its major weakness is that it uses Euclidean distance
  • 17. iris.km <- kmeans(iris[,1:4],3) plot(prcomp(iris[,1:4])$x,col=iris.km$cluster) table(iris.km$cluster,iris[,5]) setosa versicolor virginica 1 0 48 14 2 0 2 36 3 50 0 0
  • 19. Multidimensional Scaling mds<- isoMDS(dist(USArrests)) plot(mds$points, type = "n ") text(mds$points, labels = as.character(rownames(USArrests)))