SlideShare una empresa de Scribd logo
1 de 12
Cluster	
  Forests	
  
Presented	
  by:	
  Romit	
  Singhai	
  
	
  
References	
  
Cluster	
  Forests	
  
Donghui	
  Yan	
  
	
  Department	
  of	
  Sta=s=cs	
  
University	
  of	
  California,	
  Berkeley	
  
Aiyou	
  Chen	
  (Google)	
  
Michael	
  I.	
  Jordan	
  (U.C.	
  Berkeley)	
  
Overview	
  
•  Clustering	
  aims	
  to	
  par==on	
  a	
  set	
  of	
  data	
  such	
  
that	
  points	
  are	
  “similar”	
  within	
  the	
  same	
  
cluster	
  while	
  “dissimilar”	
  across	
  clusters.	
  
v One	
  of	
  the	
  fundamental	
  task	
  in	
  machine	
  learning	
  
and	
  paOern	
  classifica=on	
  
v Applicable	
  in	
  wide	
  scien=fic	
  and	
  business	
  domains	
  
Challenges	
  
•  Modern	
  data	
  	
  has	
  addi=onal	
  challenges	
  
v High	
  dimensionality	
  
v Huge	
  number	
  of	
  observa=ons	
  
v Increasingly	
  complex	
  
Mo=va=on	
  
•  Ensemble	
  to	
  achieve	
  best	
  performance	
  
•  Can	
  we	
  develop	
  a	
  clustering	
  analogy	
  to	
  RF?	
  
•  Unifying	
  view	
  of	
  clustering	
  and	
  classifica=on	
  
General	
  Approach	
  
Cluster	
  ensemble	
  methods	
  generally	
  consist	
  of	
  
two	
  stages	
  
v Genera=on	
  of	
  clustering	
  instances	
  
v Aggrega=on	
  of	
  mul=ple	
  clustering	
  instances	
  
	
  
Algorithmic	
  descrip=on	
  of	
  CF	
  
Experiments	
  
Dataset	
   #	
  Features	
   #	
  Classes	
  
Soybean	
   35	
   4	
  
ImageSeg	
   19	
   7	
  
SPECT	
   22	
   2	
  
Heart	
   13	
   2	
  
Wine	
   13	
   3	
  
WDBC	
   30	
   2	
  
Robot	
   164	
   5	
  
Madelon	
   500	
   2	
  
Performance	
  Metrics	
  
•  Propor=on	
  of	
  pairs	
  of	
  points	
  with	
  “correct”	
  co-­‐
cluster	
  membership	
  
Pr	
  =	
  (#	
  correctly	
  clustered	
  pairs/Total	
  #	
  pairs)	
  X	
  100	
  
%	
  
•  Clustering	
  accuracy	
  	
  
Pc	
  =	
  (#	
  points	
  with	
  “correct”	
  cluster	
  membership/
Total	
  #	
  points)	
  X	
  100	
  %	
  
	
  
•  Assume	
  availability	
  of	
  “true”	
  labels	
  for	
  the	
  
datasets	
  
Results	
  under	
  Pr	
  
Experiments	
  on	
  eight	
  UC	
  Irvine	
  datasets	
  
Results	
  under	
  Pc	
  
Conclusions	
  
•  CF	
  is	
  cluster	
  ensemble	
  method	
  that	
  
incorporates	
  model	
  selec=on	
  
•  Good	
  empirical	
  performance	
  

Más contenido relacionado

Similar a Cluster Forest

Introduction to Data Mining
Introduction to Data MiningIntroduction to Data Mining
Introduction to Data MiningKai Koenig
 
A new development in the hierarchical clustering of repertory grid data
A new development in the hierarchical clustering of repertory grid dataA new development in the hierarchical clustering of repertory grid data
A new development in the hierarchical clustering of repertory grid dataMark Heckmann
 
Data mining with Weka
Data mining with WekaData mining with Weka
Data mining with WekaAlbanLevy
 
Presentation
PresentationPresentation
Presentationbutest
 
To bag, or to boost? A question of balance
To bag, or to boost? A question of balanceTo bag, or to boost? A question of balance
To bag, or to boost? A question of balanceAlex Henderson
 
Barga Data Science lecture 5
Barga Data Science lecture 5Barga Data Science lecture 5
Barga Data Science lecture 5Roger Barga
 
教師なし画像特徴表現学習の動向 {Un, Self} supervised representation learning (CVPR 2018 完全読破...
教師なし画像特徴表現学習の動向 {Un, Self} supervised representation learning (CVPR 2018 完全読破...教師なし画像特徴表現学習の動向 {Un, Self} supervised representation learning (CVPR 2018 完全読破...
教師なし画像特徴表現学習の動向 {Un, Self} supervised representation learning (CVPR 2018 完全読破...cvpaper. challenge
 
Two strategies for large-scale multi-label classification on the YouTube-8M d...
Two strategies for large-scale multi-label classification on the YouTube-8M d...Two strategies for large-scale multi-label classification on the YouTube-8M d...
Two strategies for large-scale multi-label classification on the YouTube-8M d...Dalei Li
 
Winning Kaggle 101: Introduction to Stacking
Winning Kaggle 101: Introduction to StackingWinning Kaggle 101: Introduction to Stacking
Winning Kaggle 101: Introduction to StackingTed Xiao
 
Unsupervised learning clustering
Unsupervised learning clusteringUnsupervised learning clustering
Unsupervised learning clusteringArshad Farhad
 
Ensemble Learning Featuring the Netflix Prize Competition and ...
Ensemble Learning Featuring the Netflix Prize Competition and ...Ensemble Learning Featuring the Netflix Prize Competition and ...
Ensemble Learning Featuring the Netflix Prize Competition and ...butest
 
powerpoint feb
powerpoint febpowerpoint feb
powerpoint febimu409
 
A scalable collaborative filtering framework based on co clustering
A scalable collaborative filtering framework based on co clusteringA scalable collaborative filtering framework based on co clustering
A scalable collaborative filtering framework based on co clusteringAllenWu
 
Intro to machine learning
Intro to machine learningIntro to machine learning
Intro to machine learningAkshay Kanchan
 
Seminar Slides
Seminar SlidesSeminar Slides
Seminar Slidespannicle
 
32_Nov07_MachineLear..
32_Nov07_MachineLear..32_Nov07_MachineLear..
32_Nov07_MachineLear..butest
 
Supervised and unsupervised learning
Supervised and unsupervised learningSupervised and unsupervised learning
Supervised and unsupervised learningAmAn Singh
 

Similar a Cluster Forest (20)

Introduction to Data Mining
Introduction to Data MiningIntroduction to Data Mining
Introduction to Data Mining
 
Exposé Ontology
Exposé OntologyExposé Ontology
Exposé Ontology
 
A new development in the hierarchical clustering of repertory grid data
A new development in the hierarchical clustering of repertory grid dataA new development in the hierarchical clustering of repertory grid data
A new development in the hierarchical clustering of repertory grid data
 
Mattar_PhD_Thesis
Mattar_PhD_ThesisMattar_PhD_Thesis
Mattar_PhD_Thesis
 
Data mining with Weka
Data mining with WekaData mining with Weka
Data mining with Weka
 
Presentation
PresentationPresentation
Presentation
 
To bag, or to boost? A question of balance
To bag, or to boost? A question of balanceTo bag, or to boost? A question of balance
To bag, or to boost? A question of balance
 
Barga Data Science lecture 5
Barga Data Science lecture 5Barga Data Science lecture 5
Barga Data Science lecture 5
 
教師なし画像特徴表現学習の動向 {Un, Self} supervised representation learning (CVPR 2018 完全読破...
教師なし画像特徴表現学習の動向 {Un, Self} supervised representation learning (CVPR 2018 完全読破...教師なし画像特徴表現学習の動向 {Un, Self} supervised representation learning (CVPR 2018 完全読破...
教師なし画像特徴表現学習の動向 {Un, Self} supervised representation learning (CVPR 2018 完全読破...
 
Two strategies for large-scale multi-label classification on the YouTube-8M d...
Two strategies for large-scale multi-label classification on the YouTube-8M d...Two strategies for large-scale multi-label classification on the YouTube-8M d...
Two strategies for large-scale multi-label classification on the YouTube-8M d...
 
Winning Kaggle 101: Introduction to Stacking
Winning Kaggle 101: Introduction to StackingWinning Kaggle 101: Introduction to Stacking
Winning Kaggle 101: Introduction to Stacking
 
Unsupervised learning clustering
Unsupervised learning clusteringUnsupervised learning clustering
Unsupervised learning clustering
 
Ensemble Learning Featuring the Netflix Prize Competition and ...
Ensemble Learning Featuring the Netflix Prize Competition and ...Ensemble Learning Featuring the Netflix Prize Competition and ...
Ensemble Learning Featuring the Netflix Prize Competition and ...
 
powerpoint feb
powerpoint febpowerpoint feb
powerpoint feb
 
A scalable collaborative filtering framework based on co clustering
A scalable collaborative filtering framework based on co clusteringA scalable collaborative filtering framework based on co clustering
A scalable collaborative filtering framework based on co clustering
 
Intro to machine learning
Intro to machine learningIntro to machine learning
Intro to machine learning
 
Seminar Slides
Seminar SlidesSeminar Slides
Seminar Slides
 
32_Nov07_MachineLear..
32_Nov07_MachineLear..32_Nov07_MachineLear..
32_Nov07_MachineLear..
 
Supervised and unsupervised learning
Supervised and unsupervised learningSupervised and unsupervised learning
Supervised and unsupervised learning
 
Big Data Challenges and Solutions
Big Data Challenges and SolutionsBig Data Challenges and Solutions
Big Data Challenges and Solutions
 

Último

basics of data science with application areas.pdf
basics of data science with application areas.pdfbasics of data science with application areas.pdf
basics of data science with application areas.pdfvyankatesh1
 
一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理
一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理
一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理pyhepag
 
一比一原版阿德莱德大学毕业证成绩单如何办理
一比一原版阿德莱德大学毕业证成绩单如何办理一比一原版阿德莱德大学毕业证成绩单如何办理
一比一原版阿德莱德大学毕业证成绩单如何办理pyhepag
 
Exploratory Data Analysis - Dilip S.pptx
Exploratory Data Analysis - Dilip S.pptxExploratory Data Analysis - Dilip S.pptx
Exploratory Data Analysis - Dilip S.pptxDilipVasan
 
Atlantic Grupa Case Study (Mintec Data AI)
Atlantic Grupa Case Study (Mintec Data AI)Atlantic Grupa Case Study (Mintec Data AI)
Atlantic Grupa Case Study (Mintec Data AI)Jon Hansen
 
一比一原版西悉尼大学毕业证成绩单如何办理
一比一原版西悉尼大学毕业证成绩单如何办理一比一原版西悉尼大学毕业证成绩单如何办理
一比一原版西悉尼大学毕业证成绩单如何办理pyhepag
 
一比一原版麦考瑞大学毕业证成绩单如何办理
一比一原版麦考瑞大学毕业证成绩单如何办理一比一原版麦考瑞大学毕业证成绩单如何办理
一比一原版麦考瑞大学毕业证成绩单如何办理cyebo
 
How I opened a fake bank account and didn't go to prison
How I opened a fake bank account and didn't go to prisonHow I opened a fake bank account and didn't go to prison
How I opened a fake bank account and didn't go to prisonPayment Village
 
Pre-ProductionImproveddsfjgndflghtgg.pptx
Pre-ProductionImproveddsfjgndflghtgg.pptxPre-ProductionImproveddsfjgndflghtgg.pptx
Pre-ProductionImproveddsfjgndflghtgg.pptxStephen266013
 
Generative AI for Trailblazers_ Unlock the Future of AI.pdf
Generative AI for Trailblazers_ Unlock the Future of AI.pdfGenerative AI for Trailblazers_ Unlock the Future of AI.pdf
Generative AI for Trailblazers_ Unlock the Future of AI.pdfEmmanuel Dauda
 
Fuzzy Sets decision making under information of uncertainty
Fuzzy Sets decision making under information of uncertaintyFuzzy Sets decision making under information of uncertainty
Fuzzy Sets decision making under information of uncertaintyRafigAliyev2
 
Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...
Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...
Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...ssuserf63bd7
 
The Significance of Transliteration Enhancing
The Significance of Transliteration EnhancingThe Significance of Transliteration Enhancing
The Significance of Transliteration Enhancingmohamed Elzalabany
 
Artificial_General_Intelligence__storm_gen_article.pdf
Artificial_General_Intelligence__storm_gen_article.pdfArtificial_General_Intelligence__storm_gen_article.pdf
Artificial_General_Intelligence__storm_gen_article.pdfscitechtalktv
 
社内勉強会資料  Mamba - A new era or ephemeral
社内勉強会資料   Mamba - A new era or ephemeral社内勉強会資料   Mamba - A new era or ephemeral
社内勉強会資料  Mamba - A new era or ephemeralNABLAS株式会社
 
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...Valters Lauzums
 
Supply chain analytics to combat the effects of Ukraine-Russia-conflict
Supply chain analytics to combat the effects of Ukraine-Russia-conflictSupply chain analytics to combat the effects of Ukraine-Russia-conflict
Supply chain analytics to combat the effects of Ukraine-Russia-conflictJack Cole
 

Último (20)

basics of data science with application areas.pdf
basics of data science with application areas.pdfbasics of data science with application areas.pdf
basics of data science with application areas.pdf
 
一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理
一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理
一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理
 
一比一原版阿德莱德大学毕业证成绩单如何办理
一比一原版阿德莱德大学毕业证成绩单如何办理一比一原版阿德莱德大学毕业证成绩单如何办理
一比一原版阿德莱德大学毕业证成绩单如何办理
 
Exploratory Data Analysis - Dilip S.pptx
Exploratory Data Analysis - Dilip S.pptxExploratory Data Analysis - Dilip S.pptx
Exploratory Data Analysis - Dilip S.pptx
 
Atlantic Grupa Case Study (Mintec Data AI)
Atlantic Grupa Case Study (Mintec Data AI)Atlantic Grupa Case Study (Mintec Data AI)
Atlantic Grupa Case Study (Mintec Data AI)
 
一比一原版西悉尼大学毕业证成绩单如何办理
一比一原版西悉尼大学毕业证成绩单如何办理一比一原版西悉尼大学毕业证成绩单如何办理
一比一原版西悉尼大学毕业证成绩单如何办理
 
一比一原版麦考瑞大学毕业证成绩单如何办理
一比一原版麦考瑞大学毕业证成绩单如何办理一比一原版麦考瑞大学毕业证成绩单如何办理
一比一原版麦考瑞大学毕业证成绩单如何办理
 
Slip-and-fall Injuries: Top Workers' Comp Claims
Slip-and-fall Injuries: Top Workers' Comp ClaimsSlip-and-fall Injuries: Top Workers' Comp Claims
Slip-and-fall Injuries: Top Workers' Comp Claims
 
How I opened a fake bank account and didn't go to prison
How I opened a fake bank account and didn't go to prisonHow I opened a fake bank account and didn't go to prison
How I opened a fake bank account and didn't go to prison
 
Pre-ProductionImproveddsfjgndflghtgg.pptx
Pre-ProductionImproveddsfjgndflghtgg.pptxPre-ProductionImproveddsfjgndflghtgg.pptx
Pre-ProductionImproveddsfjgndflghtgg.pptx
 
Abortion pills in Dammam Saudi Arabia// +966572737505 // buy cytotec
Abortion pills in Dammam Saudi Arabia// +966572737505 // buy cytotecAbortion pills in Dammam Saudi Arabia// +966572737505 // buy cytotec
Abortion pills in Dammam Saudi Arabia// +966572737505 // buy cytotec
 
Generative AI for Trailblazers_ Unlock the Future of AI.pdf
Generative AI for Trailblazers_ Unlock the Future of AI.pdfGenerative AI for Trailblazers_ Unlock the Future of AI.pdf
Generative AI for Trailblazers_ Unlock the Future of AI.pdf
 
Fuzzy Sets decision making under information of uncertainty
Fuzzy Sets decision making under information of uncertaintyFuzzy Sets decision making under information of uncertainty
Fuzzy Sets decision making under information of uncertainty
 
Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...
Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...
Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...
 
The Significance of Transliteration Enhancing
The Significance of Transliteration EnhancingThe Significance of Transliteration Enhancing
The Significance of Transliteration Enhancing
 
Artificial_General_Intelligence__storm_gen_article.pdf
Artificial_General_Intelligence__storm_gen_article.pdfArtificial_General_Intelligence__storm_gen_article.pdf
Artificial_General_Intelligence__storm_gen_article.pdf
 
Machine Learning for Accident Severity Prediction
Machine Learning for Accident Severity PredictionMachine Learning for Accident Severity Prediction
Machine Learning for Accident Severity Prediction
 
社内勉強会資料  Mamba - A new era or ephemeral
社内勉強会資料   Mamba - A new era or ephemeral社内勉強会資料   Mamba - A new era or ephemeral
社内勉強会資料  Mamba - A new era or ephemeral
 
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
 
Supply chain analytics to combat the effects of Ukraine-Russia-conflict
Supply chain analytics to combat the effects of Ukraine-Russia-conflictSupply chain analytics to combat the effects of Ukraine-Russia-conflict
Supply chain analytics to combat the effects of Ukraine-Russia-conflict
 

Cluster Forest

  • 1. Cluster  Forests   Presented  by:  Romit  Singhai    
  • 2. References   Cluster  Forests   Donghui  Yan    Department  of  Sta=s=cs   University  of  California,  Berkeley   Aiyou  Chen  (Google)   Michael  I.  Jordan  (U.C.  Berkeley)  
  • 3. Overview   •  Clustering  aims  to  par==on  a  set  of  data  such   that  points  are  “similar”  within  the  same   cluster  while  “dissimilar”  across  clusters.   v One  of  the  fundamental  task  in  machine  learning   and  paOern  classifica=on   v Applicable  in  wide  scien=fic  and  business  domains  
  • 4. Challenges   •  Modern  data    has  addi=onal  challenges   v High  dimensionality   v Huge  number  of  observa=ons   v Increasingly  complex  
  • 5. Mo=va=on   •  Ensemble  to  achieve  best  performance   •  Can  we  develop  a  clustering  analogy  to  RF?   •  Unifying  view  of  clustering  and  classifica=on  
  • 6. General  Approach   Cluster  ensemble  methods  generally  consist  of   two  stages   v Genera=on  of  clustering  instances   v Aggrega=on  of  mul=ple  clustering  instances    
  • 8. Experiments   Dataset   #  Features   #  Classes   Soybean   35   4   ImageSeg   19   7   SPECT   22   2   Heart   13   2   Wine   13   3   WDBC   30   2   Robot   164   5   Madelon   500   2  
  • 9. Performance  Metrics   •  Propor=on  of  pairs  of  points  with  “correct”  co-­‐ cluster  membership   Pr  =  (#  correctly  clustered  pairs/Total  #  pairs)  X  100   %   •  Clustering  accuracy     Pc  =  (#  points  with  “correct”  cluster  membership/ Total  #  points)  X  100  %     •  Assume  availability  of  “true”  labels  for  the   datasets  
  • 10. Results  under  Pr   Experiments  on  eight  UC  Irvine  datasets  
  • 12. Conclusions   •  CF  is  cluster  ensemble  method  that   incorporates  model  selec=on   •  Good  empirical  performance