Beijing, China August 17, 2009 A Framework for Multi-objective Clustering and Its Application to Co-location Mining Rachsuda Jianthapthaksin, Christoph F. Eick and Ricardo Vilalta University of Houston, Texas, USA
Talk Outline
1. What is unique about this work with respect to clustering?
2. Multi-objective Clustering (MOC)—Objectives and an Architecture
3. Clustering with Plug-in Fitness Functions
4. Filling the Repository with Clusters
5. Creating Final Clusterings
6. Related Work
7. Co-location Mining Case Study
8. Conclusion and Future Work
1. What is unique about this work with respect to clustering? Clustering algorithms that support plug-in fitness functions are used. Clustering algorithms are run multiple times to create clusters. Clusters are stored in a repository that is updated on the fly; cluster generation is separated from creating the final clustering. The final clustering is created from the clusters in the repository based on user preferences. Our approach seeks alternative, overlapping clusters.
2. Multi-Objective Clustering (MOC) The particular problem investigated in this work:
Task: Find sets of clusters that are good with respect to two or more objectives.
Texas Multi-Objective Clustering Dataset: (longitude, latitude, <concentrations>+)
MOC Approach Clustering algorithms are run multiple times, maximizing different subsets of objectives that are captured in compound fitness functions. A repository is used to store promising candidates; only clusters that satisfy two or more objectives are considered as candidates. After a sufficient number of clusters has been created, final clusterings are generated based on user preferences.
An Architecture for MOC
[Figure: a spatial dataset feeds a Goal-driven Fitness Function Generator (producing Q'), a Clustering Algorithm (producing clusters X), a Storage Unit (the repository M), and a Cluster Summarization Unit (producing M').]
Steps in multi-run clustering:
S1: Generate a compound fitness function.
S2: Run a clustering algorithm.
S3: Update the cluster repository M.
S4: Summarize the clusters discovered (M').
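The four steps above can be sketched as a loop. Everything in this sketch (the function names, the random choice of Q', the summarization by total reward) is an illustrative assumption, not the authors' implementation:

```python
import random

def multi_run_clustering(dataset, objectives, cluster_fn, n_runs, seed=0):
    rng = random.Random(seed)
    repository = []                                    # the storage unit M
    for _ in range(n_runs):
        # S1: select a subset Q' of the objectives and build a compound fitness.
        q_subset = rng.sample(objectives, k=2)
        fitness = lambda c, qs=q_subset: sum(q(c) for q in qs)
        # S2: run the clustering algorithm with the plug-in fitness function.
        clusters = cluster_fn(dataset, fitness)
        # S3: update the cluster repository M.
        repository.extend(clusters)
    # S4: summarize the discovered clusters M' (here: rank by total reward).
    repository.sort(key=lambda c: sum(q(c) for q in objectives), reverse=True)
    return repository
```

Any clustering algorithm that accepts a plug-in fitness function can be passed as `cluster_fn`, which is the point of the architecture: generation and summarization only communicate through the repository.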
3. Clustering with Plug-in Fitness Functions Motivation: Finding subgroups in geo-referenced datasets has many applications. However, in many applications the subgroups to be searched for do not share the characteristics considered by traditional clustering algorithms, such as cluster compactness and separation. Domain or task knowledge frequently imposes additional requirements concerning what constitutes a “good” subgroup. Consequently, it is desirable to develop clustering algorithms that provide plug-in fitness functions that allow domain experts to express desirable characteristics of subgroups they are looking for.
Current Suite of Spatial Clustering Algorithms
Representative-based: SCEC, SRIDHCR, SPAM, CLEVER
Grid-based: SCMRG
Agglomerative: MOSAIC
Density-based: SCDE, DCONTOUR (not really plug-in, but some fitness functions can be simulated)
Remark: All algorithms partition a dataset into clusters by maximizing a reward-based, plug-in fitness function.
4. Filling the Repository with Clusters Plug-in reward functions Rewardq(x) are used to assess to which extent an objective q is satisfied for a cluster x. User-defined thresholds θq are used to determine if an objective q is satisfied by a cluster x (Rewardq(x) > θq). Only clusters that satisfy two or more objectives are stored in the repository. Only non-dominated clusters are stored in the repository. Dominance relations only apply to pairs of clusters that have a certain degree of agreement (overlap) of at least θsim.
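The admission rule can be sketched as follows; the dictionary-based representation of per-objective rewards and thresholds is an assumption for illustration:

```python
def satisfied_objectives(rewards, thresholds):
    """Objectives q that cluster x satisfies: Reward_q(x) > theta_q."""
    return {q for q, r in rewards.items() if r > thresholds[q]}

def admit_to_repository(rewards, thresholds):
    # Only clusters satisfying two or more objectives become candidates.
    return len(satisfied_objectives(rewards, thresholds)) >= 2
```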
Dominance and Multi-Objective Clusters Dominance between clusters x and y with respect to multiple objectives Q. Dominance Constraint with Respect to the Repository
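A minimal sketch of the overlap-gated dominance test, using the usual Pareto definition (x dominates y if it is at least as good on every objective and strictly better on at least one); the exact overlap measure is assumed:

```python
def dominates(rewards_x, rewards_y, overlap, theta_sim):
    """x dominates y w.r.t. the objectives Q, but only if the two
    clusters agree (overlap) by at least theta_sim; otherwise the
    dominance relation does not apply to the pair."""
    if overlap < theta_sim:
        return False
    at_least_as_good = all(rewards_x[q] >= rewards_y[q] for q in rewards_x)
    strictly_better = any(rewards_x[q] > rewards_y[q] for q in rewards_x)
    return at_least_as_good and strictly_better
```

Gating dominance on overlap is what lets the repository keep alternative clusters in different parts of the dataset, even when one has uniformly lower rewards.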
Compound Fitness Functions The goal-driven fitness function generator selects a subset Q'⊆Q of the objectives Q and creates a compound fitness function over Q', relying on a penalty function approach [Baeck et al. 2000]: CmpReward(x) = (Σq∈Q' Rewardq(x)) × Penalty(Q', x)
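A sketch of the compound fitness function; the multiplicative penalty that halves the reward for each unmet objective is an assumed stand-in for the penalty function of [Baeck et al. 2000]:

```python
def compound_reward(rewards, q_subset, thresholds, penalty_factor=0.5):
    """CmpReward(x) = (sum over q in Q' of Reward_q(x)) * Penalty(Q', x).
    Here Penalty multiplies by penalty_factor once per objective in Q'
    that x fails to satisfy (an illustrative choice of penalty)."""
    total = sum(rewards[q] for q in q_subset)
    unmet = sum(1 for q in q_subset if rewards[q] <= thresholds[q])
    return total * (penalty_factor ** unmet)
```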
Updating the Cluster Repository M := clusters in the repository; X := "new" clusters generated by a single run of the clustering algorithm.
5. Creating a Final Clustering Final clusterings are subsets of the clusters in the repository M. Inputs: the user provides her own individual objective function RewardU, a reward threshold θU, and a cluster similarity threshold θrem that indicates how much cluster overlap she is willing to tolerate.
Goal: Find X⊆M that maximizes Σx∈X RewardU(x), subject to:
1. ∀x∈X ∀x'∈X (x≠x' → Similarity(x,x') < θrem)
2. ∀x∈X (RewardU(x) > θU)
Our paper introduces the MO-Dominance-guided Cluster Reduction algorithm (MO-DCR) to create the final clustering.
MO-Dominance-guided Cluster Reduction (MO-DCR) algorithm
The algorithm loops over the following two steps until M is empty:
1. Include the dominant clusters D, the highest-reward clusters in M'.
2. Remove D and the clusters they dominate within θrem-proximity from M.
Remark: A dominates B iff RewardU(A) > RewardU(B) and Similarity(A,B) > θrem.
[Figure: dominance graphs over clusters A-F, e.g. sim(A,B) = 0.8 with θrem = 0.5.]
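The two-step loop can be sketched as a greedy procedure; the cluster representation and the similarity callback below are illustrative assumptions:

```python
def mo_dcr(clusters, reward_u, similarity, theta_rem):
    """Loop until M is empty: pick the highest-reward (dominant) cluster,
    keep it, then drop it and every cluster it dominates within
    theta_rem-proximity; repeat on what remains."""
    remaining = list(clusters)
    final = []
    while remaining:
        dominant = max(remaining, key=reward_u)
        final.append(dominant)
        remaining = [c for c in remaining
                     if c != dominant and similarity(dominant, c) < theta_rem]
    return final
```

By construction, every pair of clusters in the result has similarity below θrem, which is exactly constraint 1 of the final-clustering problem.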
6. Related Work Multi-objective clustering based on evolutionary algorithms (MOEA): VIENNA [Handl and Knowles 2004], MOCLE [Faceli et al. 2007]. Moreover, MOC relies on cluster repositories that store individual clusters, not clusterings, and on summarization algorithms to create the final clustering.
7. Case Study: Co-location Mining Goal: Finding regional co-location patterns where high concentrations of Arsenic are co-located with high concentrations of other factors in Texas. Remark: Each binary co-location is treated as a single objective. Dataset: TWDB has monitored water quality and collected data for 105,814 wells in Texas over the last 25 years. We use a subset of the Arsenic_10_avg data set: longitude and latitude, Arsenic (As), Molybdenum (Mo), Vanadium (V), Boron (B), Fluoride (F-), Chloride (Cl-), Sulfate (SO42-) and Total Dissolved Solids (TDS).
Objective Functions Used
Q = {q{As,Mo}, q{As,V}, q{As,B}, q{As,F-}, q{As,Cl-}, q{As,SO42-}, q{As,TDS}}; each run maximizes a subset Q'⊆Q.
RewardB(x) = φ(B,x)·|x|, the interestingness of pattern B in region x weighted by the region's size.
Steps of the Experiment
MOC steps 1-3: run on the spatial dataset with fitness functions for the seven objectives q{As,Mo}, q{As,V}, q{As,B}, q{As,F-}, q{As,Cl-}, q{As,SO42-}, q{As,TDS}, producing regions M; the reward thresholds used were θq{As,Mo}=13, θq{As,V}=15, θq{As,B}=10, θq{As,F-}=25, θq{As,Cl-}=7, θq{As,SO42-}=6, θq{As,TDS}=8.
MOC step 4: answer user queries, returning regions M' (⊆M) with their associated co-location patterns.
Experimental Results MOC is able to identify:
Multi-objective clusters.
Alternative clusters, e.g. the Rank 1 regions of (a) and the Rank 2 regions of (b).
Nested clusters, e.g. in (b) the Rank 3-5 regions are sub-regions of the Rank 1 region.
In particular, MOC discriminates among companion elements such as Vanadium (Rank 3 region), or Chloride, Sulfate and Total Dissolved Solids (Rank 4 region).
Fig. 7.6 The top 5 regions and patterns with respect to two queries: query1={As,Mo} and query2={As,B} are shown in Figure (a) and (b), respectively.
8. Conclusion and Future Work Building blocks for future multi-objective clustering systems were provided in this work; namely: A dominance relation for problems in which only a subset of the objectives can be satisfied was introduced. Clustering algorithms with plug-in fitness functions and the capability to create compound fitness functions are used extensively in our approach. Initially, a repository of potentially useful clusters is generated based on a large set of objectives; individualized, specific clusterings are then generated based on user preferences. The approach is highly generic and incorporates specific domain needs in the form of single-objective fitness functions. The approach was evaluated in a case study and turned out to be more suitable than the single-objective clustering approach that was used for the same application in a previous paper [ACM-GIS 2008].
Challenges in Multi-objective Clustering (MOC) Find clusters that are individually good with respect to multiple objectives in an automated fashion. Provide search-engine-style capabilities to summarize final clusterings obtained from multiple runs of clustering algorithms.
Traditional Clustering Algorithms & Fitness Functions
No fitness function: DBSCAN, hierarchical clustering
Implicit fitness function: CHAMELEON
Fixed fitness function: PAM, K-Means
Provides plug-in fitness function: our work
Traditional clustering algorithms consider only domain-independent and task-independent characteristics to form a solution; different domain tasks require different fitness functions.
[Slide: pseudo-code of the MO-DCR algorithm.]
Challenges: Cluster Summarization
[Figure: original clusters A, B, C with eliminated clusters marked X; a typical output is contrasted with the DCR output.]
Interestingness of a Pattern
Interestingness of a pattern B (e.g. B = {C, D, E}) for an object o: i(B,o). [Equation on slide.]
Interestingness of a pattern B for a region c. [Equation on slide.]
Remark: Purity (i(B,o) > 0) measures the percentage of objects that exhibit pattern B in region c.
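Purity can be sketched as follows; the stand-in definition of i(B,o) > 0 (the object exceeds the dataset mean on every attribute of B) is an assumption for illustration, not the paper's formula:

```python
def exhibits(obj, pattern, means):
    """Illustrative stand-in for i(B, o) > 0: the object exceeds the
    dataset mean on every attribute of pattern B (assumed definition)."""
    return all(obj[a] > means[a] for a in pattern)

def purity(region, pattern, means):
    # Purity: fraction of objects in region c that exhibit pattern B.
    return sum(exhibits(o, pattern, means) for o in region) / len(region)
```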
Characteristics of the Top 5 Regions
Table 7.7 Top 5 Regions Ranked by Reward of the Query {As, Mo}
Table 7.8 Top 5 Regions Ranked by Reward of the Query {As, B}
Representative-based Clustering
[Figure: a dataset over Attribute1 and Attribute2 partitioned into four clusters around representatives 1-4.]
Objective: Find a set of objects OR such that the clustering X obtained by using the objects in OR as representatives minimizes q(X).
Properties: cluster shapes are convex polygons.
Popular algorithms: K-means, K-medoids.
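Assignment of objects to their nearest representative, which induces the convex (Voronoi-cell) cluster shapes noted above, can be sketched as:

```python
import math

def assign_to_representatives(points, reps):
    """Partition points by nearest representative; the induced Voronoi
    cells are convex, which is why cluster shapes are convex polygons."""
    clusters = [[] for _ in reps]
    for p in points:
        nearest = min(range(len(reps)),
                      key=lambda j: math.dist(p, reps[j]))
        clusters[nearest].append(p)
    return clusters
```

A representative-based algorithm then searches over candidate representative sets, scoring each induced partition X with the plug-in fitness q(X).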

Más contenido relacionado

La actualidad más candente

GENERALIZED LEGENDRE POLYNOMIALS FOR SUPPORT VECTOR MACHINES (SVMS) CLASSIFIC...
GENERALIZED LEGENDRE POLYNOMIALS FOR SUPPORT VECTOR MACHINES (SVMS) CLASSIFIC...GENERALIZED LEGENDRE POLYNOMIALS FOR SUPPORT VECTOR MACHINES (SVMS) CLASSIFIC...
GENERALIZED LEGENDRE POLYNOMIALS FOR SUPPORT VECTOR MACHINES (SVMS) CLASSIFIC...IJNSA Journal
 
Machine-learning scoring functions for molecular docking
Machine-learning scoring functions for molecular dockingMachine-learning scoring functions for molecular docking
Machine-learning scoring functions for molecular dockingPedro Ballester
 
Svm map reduce_slides
Svm map reduce_slidesSvm map reduce_slides
Svm map reduce_slidesSara Asher
 
Low Power High-Performance Computing on the BeagleBoard Platform
Low Power High-Performance Computing on the BeagleBoard PlatformLow Power High-Performance Computing on the BeagleBoard Platform
Low Power High-Performance Computing on the BeagleBoard Platforma3labdsp
 
A HYBRID CLUSTERING ALGORITHM FOR DATA MINING
A HYBRID CLUSTERING ALGORITHM FOR DATA MININGA HYBRID CLUSTERING ALGORITHM FOR DATA MINING
A HYBRID CLUSTERING ALGORITHM FOR DATA MININGcscpconf
 
ゆるふわ強化学習入門
ゆるふわ強化学習入門ゆるふわ強化学習入門
ゆるふわ強化学習入門Ryo Iwaki
 
Neural Networks: Model Building Through Linear Regression
Neural Networks: Model Building Through Linear RegressionNeural Networks: Model Building Through Linear Regression
Neural Networks: Model Building Through Linear RegressionMostafa G. M. Mostafa
 
Recent advances on low-rank and sparse decomposition for moving object detection
Recent advances on low-rank and sparse decomposition for moving object detectionRecent advances on low-rank and sparse decomposition for moving object detection
Recent advances on low-rank and sparse decomposition for moving object detectionActiveEon
 
Training and Inference for Deep Gaussian Processes
Training and Inference for Deep Gaussian ProcessesTraining and Inference for Deep Gaussian Processes
Training and Inference for Deep Gaussian ProcessesKeyon Vafa
 
increasing the action gap - new operators for reinforcement learning
increasing the action gap - new operators for reinforcement learningincreasing the action gap - new operators for reinforcement learning
increasing the action gap - new operators for reinforcement learningRyo Iwaki
 
CUDA and Caffe for deep learning
CUDA and Caffe for deep learningCUDA and Caffe for deep learning
CUDA and Caffe for deep learningAmgad Muhammad
 
Parallel k nn on gpu architecture using opencl
Parallel k nn on gpu architecture using openclParallel k nn on gpu architecture using opencl
Parallel k nn on gpu architecture using opencleSAT Publishing House
 
Neural Networks: Least Mean Square (LSM) Algorithm
Neural Networks: Least Mean Square (LSM) AlgorithmNeural Networks: Least Mean Square (LSM) Algorithm
Neural Networks: Least Mean Square (LSM) AlgorithmMostafa G. M. Mostafa
 
Recent Developments in Spark MLlib and Beyond
Recent Developments in Spark MLlib and BeyondRecent Developments in Spark MLlib and Beyond
Recent Developments in Spark MLlib and BeyondXiangrui Meng
 

La actualidad más candente (20)

GENERALIZED LEGENDRE POLYNOMIALS FOR SUPPORT VECTOR MACHINES (SVMS) CLASSIFIC...
GENERALIZED LEGENDRE POLYNOMIALS FOR SUPPORT VECTOR MACHINES (SVMS) CLASSIFIC...GENERALIZED LEGENDRE POLYNOMIALS FOR SUPPORT VECTOR MACHINES (SVMS) CLASSIFIC...
GENERALIZED LEGENDRE POLYNOMIALS FOR SUPPORT VECTOR MACHINES (SVMS) CLASSIFIC...
 
Machine-learning scoring functions for molecular docking
Machine-learning scoring functions for molecular dockingMachine-learning scoring functions for molecular docking
Machine-learning scoring functions for molecular docking
 
Svm map reduce_slides
Svm map reduce_slidesSvm map reduce_slides
Svm map reduce_slides
 
Low Power High-Performance Computing on the BeagleBoard Platform
Low Power High-Performance Computing on the BeagleBoard PlatformLow Power High-Performance Computing on the BeagleBoard Platform
Low Power High-Performance Computing on the BeagleBoard Platform
 
11slide
11slide11slide
11slide
 
A HYBRID CLUSTERING ALGORITHM FOR DATA MINING
A HYBRID CLUSTERING ALGORITHM FOR DATA MININGA HYBRID CLUSTERING ALGORITHM FOR DATA MINING
A HYBRID CLUSTERING ALGORITHM FOR DATA MINING
 
SciVisHalosFinalPaper
SciVisHalosFinalPaperSciVisHalosFinalPaper
SciVisHalosFinalPaper
 
ゆるふわ強化学習入門
ゆるふわ強化学習入門ゆるふわ強化学習入門
ゆるふわ強化学習入門
 
JavaYDL2
JavaYDL2JavaYDL2
JavaYDL2
 
Neural Networks: Model Building Through Linear Regression
Neural Networks: Model Building Through Linear RegressionNeural Networks: Model Building Through Linear Regression
Neural Networks: Model Building Through Linear Regression
 
nn network
nn networknn network
nn network
 
Recent advances on low-rank and sparse decomposition for moving object detection
Recent advances on low-rank and sparse decomposition for moving object detectionRecent advances on low-rank and sparse decomposition for moving object detection
Recent advances on low-rank and sparse decomposition for moving object detection
 
08slide
08slide08slide
08slide
 
Training and Inference for Deep Gaussian Processes
Training and Inference for Deep Gaussian ProcessesTraining and Inference for Deep Gaussian Processes
Training and Inference for Deep Gaussian Processes
 
increasing the action gap - new operators for reinforcement learning
increasing the action gap - new operators for reinforcement learningincreasing the action gap - new operators for reinforcement learning
increasing the action gap - new operators for reinforcement learning
 
CUDA and Caffe for deep learning
CUDA and Caffe for deep learningCUDA and Caffe for deep learning
CUDA and Caffe for deep learning
 
B045060813
B045060813B045060813
B045060813
 
Parallel k nn on gpu architecture using opencl
Parallel k nn on gpu architecture using openclParallel k nn on gpu architecture using opencl
Parallel k nn on gpu architecture using opencl
 
Neural Networks: Least Mean Square (LSM) Algorithm
Neural Networks: Least Mean Square (LSM) AlgorithmNeural Networks: Least Mean Square (LSM) Algorithm
Neural Networks: Least Mean Square (LSM) Algorithm
 
Recent Developments in Spark MLlib and Beyond
Recent Developments in Spark MLlib and BeyondRecent Developments in Spark MLlib and Beyond
Recent Developments in Spark MLlib and Beyond
 

Destacado

Tutorial - Support vector machines
Tutorial - Support vector machinesTutorial - Support vector machines
Tutorial - Support vector machinesbutest
 
MikroBasic
MikroBasicMikroBasic
MikroBasicbutest
 
Machine Learning and Statistical Analysis
Machine Learning and Statistical AnalysisMachine Learning and Statistical Analysis
Machine Learning and Statistical Analysisbutest
 
MGT-350 Russell.docx - Cameron School of Business - University of ...
MGT-350 Russell.docx - Cameron School of Business - University of ...MGT-350 Russell.docx - Cameron School of Business - University of ...
MGT-350 Russell.docx - Cameron School of Business - University of ...butest
 
презентацііія2222
презентацііія2222презентацііія2222
презентацііія2222guested712f1
 
Selasa, 28 April 2015
Selasa, 28 April 2015Selasa, 28 April 2015
Selasa, 28 April 2015suarakarya
 
lec21.ppt
lec21.pptlec21.ppt
lec21.pptbutest
 

Destacado (8)

Tutorial - Support vector machines
Tutorial - Support vector machinesTutorial - Support vector machines
Tutorial - Support vector machines
 
MikroBasic
MikroBasicMikroBasic
MikroBasic
 
ppt
pptppt
ppt
 
Machine Learning and Statistical Analysis
Machine Learning and Statistical AnalysisMachine Learning and Statistical Analysis
Machine Learning and Statistical Analysis
 
MGT-350 Russell.docx - Cameron School of Business - University of ...
MGT-350 Russell.docx - Cameron School of Business - University of ...MGT-350 Russell.docx - Cameron School of Business - University of ...
MGT-350 Russell.docx - Cameron School of Business - University of ...
 
презентацііія2222
презентацііія2222презентацііія2222
презентацііія2222
 
Selasa, 28 April 2015
Selasa, 28 April 2015Selasa, 28 April 2015
Selasa, 28 April 2015
 
lec21.ppt
lec21.pptlec21.ppt
lec21.ppt
 

Similar a Mining Regional Knowledge in Spatial Dataset

Mining Regional Knowledge in Spatial Dataset
Mining Regional Knowledge in Spatial DatasetMining Regional Knowledge in Spatial Dataset
Mining Regional Knowledge in Spatial Datasetbutest
 
Data Mining Concepts and Techniques, Chapter 10. Cluster Analysis: Basic Conc...
Data Mining Concepts and Techniques, Chapter 10. Cluster Analysis: Basic Conc...Data Mining Concepts and Techniques, Chapter 10. Cluster Analysis: Basic Conc...
Data Mining Concepts and Techniques, Chapter 10. Cluster Analysis: Basic Conc...Salah Amean
 
A scalable collaborative filtering framework based on co clustering
A scalable collaborative filtering framework based on co clusteringA scalable collaborative filtering framework based on co clustering
A scalable collaborative filtering framework based on co clusteringAllenWu
 
StackNet Meta-Modelling framework
StackNet Meta-Modelling frameworkStackNet Meta-Modelling framework
StackNet Meta-Modelling frameworkSri Ambati
 
A survey on Efficient Enhanced K-Means Clustering Algorithm
 A survey on Efficient Enhanced K-Means Clustering Algorithm A survey on Efficient Enhanced K-Means Clustering Algorithm
A survey on Efficient Enhanced K-Means Clustering Algorithmijsrd.com
 
Integrated Model Discovery and Self-Adaptation of Robots
Integrated Model Discovery and Self-Adaptation of RobotsIntegrated Model Discovery and Self-Adaptation of Robots
Integrated Model Discovery and Self-Adaptation of RobotsPooyan Jamshidi
 
Identifier namespaces in mathematical notation
Identifier namespaces in mathematical notationIdentifier namespaces in mathematical notation
Identifier namespaces in mathematical notationAlexey Grigorev
 
Chapter 10. Cluster Analysis Basic Concepts and Methods.ppt
Chapter 10. Cluster Analysis Basic Concepts and Methods.pptChapter 10. Cluster Analysis Basic Concepts and Methods.ppt
Chapter 10. Cluster Analysis Basic Concepts and Methods.pptSubrata Kumer Paul
 
10 clusbasic
10 clusbasic10 clusbasic
10 clusbasicengrasi
 
multiarmed bandit.ppt
multiarmed bandit.pptmultiarmed bandit.ppt
multiarmed bandit.pptLPrashanthi
 
New Approach for K-mean and K-medoids Algorithm
New Approach for K-mean and K-medoids AlgorithmNew Approach for K-mean and K-medoids Algorithm
New Approach for K-mean and K-medoids AlgorithmEditor IJCATR
 
COMPARATIVE PERFORMANCE ANALYSIS OF RNSC AND MCL ALGORITHMS ON POWER-LAW DIST...
COMPARATIVE PERFORMANCE ANALYSIS OF RNSC AND MCL ALGORITHMS ON POWER-LAW DIST...COMPARATIVE PERFORMANCE ANALYSIS OF RNSC AND MCL ALGORITHMS ON POWER-LAW DIST...
COMPARATIVE PERFORMANCE ANALYSIS OF RNSC AND MCL ALGORITHMS ON POWER-LAW DIST...acijjournal
 
Improve the Performance of Clustering Using Combination of Multiple Clusterin...
Improve the Performance of Clustering Using Combination of Multiple Clusterin...Improve the Performance of Clustering Using Combination of Multiple Clusterin...
Improve the Performance of Clustering Using Combination of Multiple Clusterin...ijdmtaiir
 
Data mining concepts and techniques Chapter 10
Data mining concepts and techniques Chapter 10Data mining concepts and techniques Chapter 10
Data mining concepts and techniques Chapter 10mqasimsheikh5
 
A PSO-Based Subtractive Data Clustering Algorithm
A PSO-Based Subtractive Data Clustering AlgorithmA PSO-Based Subtractive Data Clustering Algorithm
A PSO-Based Subtractive Data Clustering AlgorithmIJORCS
 
data mining cocepts and techniques chapter
data mining cocepts and techniques chapterdata mining cocepts and techniques chapter
data mining cocepts and techniques chapterNaveenKumar5162
 

Similar a Mining Regional Knowledge in Spatial Dataset (20)

Mining Regional Knowledge in Spatial Dataset
Mining Regional Knowledge in Spatial DatasetMining Regional Knowledge in Spatial Dataset
Mining Regional Knowledge in Spatial Dataset
 
Neural nw k means
Neural nw k meansNeural nw k means
Neural nw k means
 
Data Mining Concepts and Techniques, Chapter 10. Cluster Analysis: Basic Conc...
Data Mining Concepts and Techniques, Chapter 10. Cluster Analysis: Basic Conc...Data Mining Concepts and Techniques, Chapter 10. Cluster Analysis: Basic Conc...
Data Mining Concepts and Techniques, Chapter 10. Cluster Analysis: Basic Conc...
 
A scalable collaborative filtering framework based on co clustering
A scalable collaborative filtering framework based on co clusteringA scalable collaborative filtering framework based on co clustering
A scalable collaborative filtering framework based on co clustering
 
StackNet Meta-Modelling framework
StackNet Meta-Modelling frameworkStackNet Meta-Modelling framework
StackNet Meta-Modelling framework
 
A survey on Efficient Enhanced K-Means Clustering Algorithm
 A survey on Efficient Enhanced K-Means Clustering Algorithm A survey on Efficient Enhanced K-Means Clustering Algorithm
A survey on Efficient Enhanced K-Means Clustering Algorithm
 
Integrated Model Discovery and Self-Adaptation of Robots
Integrated Model Discovery and Self-Adaptation of RobotsIntegrated Model Discovery and Self-Adaptation of Robots
Integrated Model Discovery and Self-Adaptation of Robots
 
Identifier namespaces in mathematical notation
Identifier namespaces in mathematical notationIdentifier namespaces in mathematical notation
Identifier namespaces in mathematical notation
 
Chapter 10. Cluster Analysis Basic Concepts and Methods.ppt
Chapter 10. Cluster Analysis Basic Concepts and Methods.pptChapter 10. Cluster Analysis Basic Concepts and Methods.ppt
Chapter 10. Cluster Analysis Basic Concepts and Methods.ppt
 
10 clusbasic
10 clusbasic10 clusbasic
10 clusbasic
 
multiarmed bandit.ppt
multiarmed bandit.pptmultiarmed bandit.ppt
multiarmed bandit.ppt
 
New Approach for K-mean and K-medoids Algorithm
New Approach for K-mean and K-medoids AlgorithmNew Approach for K-mean and K-medoids Algorithm
New Approach for K-mean and K-medoids Algorithm
 
My8clst
My8clstMy8clst
My8clst
 
COMPARATIVE PERFORMANCE ANALYSIS OF RNSC AND MCL ALGORITHMS ON POWER-LAW DIST...
COMPARATIVE PERFORMANCE ANALYSIS OF RNSC AND MCL ALGORITHMS ON POWER-LAW DIST...COMPARATIVE PERFORMANCE ANALYSIS OF RNSC AND MCL ALGORITHMS ON POWER-LAW DIST...
COMPARATIVE PERFORMANCE ANALYSIS OF RNSC AND MCL ALGORITHMS ON POWER-LAW DIST...
 
Improve the Performance of Clustering Using Combination of Multiple Clusterin...
Improve the Performance of Clustering Using Combination of Multiple Clusterin...Improve the Performance of Clustering Using Combination of Multiple Clusterin...
Improve the Performance of Clustering Using Combination of Multiple Clusterin...
 
Data mining concepts and techniques Chapter 10
Data mining concepts and techniques Chapter 10Data mining concepts and techniques Chapter 10
Data mining concepts and techniques Chapter 10
 
ClusetrigBasic.ppt
ClusetrigBasic.pptClusetrigBasic.ppt
ClusetrigBasic.ppt
 
Master's Thesis Presentation
Master's Thesis PresentationMaster's Thesis Presentation
Master's Thesis Presentation
 
A PSO-Based Subtractive Data Clustering Algorithm
A PSO-Based Subtractive Data Clustering AlgorithmA PSO-Based Subtractive Data Clustering Algorithm
A PSO-Based Subtractive Data Clustering Algorithm
 
data mining cocepts and techniques chapter
data mining cocepts and techniques chapterdata mining cocepts and techniques chapter
data mining cocepts and techniques chapter
 

Más de butest

EL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBEEL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBEbutest
 
1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同butest
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALbutest
 
Timeline: The Life of Michael Jackson
Timeline: The Life of Michael JacksonTimeline: The Life of Michael Jackson
Timeline: The Life of Michael Jacksonbutest
 
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...butest
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALbutest
 
Com 380, Summer II
Com 380, Summer IICom 380, Summer II
Com 380, Summer IIbutest
 
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet JazzThe MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazzbutest
 
MICHAEL JACKSON.doc
MICHAEL JACKSON.docMICHAEL JACKSON.doc
MICHAEL JACKSON.docbutest
 
Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1butest
 
Facebook
Facebook Facebook
Facebook butest
 
Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...butest
 
Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...butest
 
NEWS ANNOUNCEMENT
NEWS ANNOUNCEMENTNEWS ANNOUNCEMENT
NEWS ANNOUNCEMENTbutest
 
C-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.docC-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.docbutest
 
MAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.docMAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.docbutest
 
Mac OS X Guide.doc
Mac OS X Guide.docMac OS X Guide.doc
Mac OS X Guide.docbutest
 
WEB DESIGN!
WEB DESIGN!WEB DESIGN!
WEB DESIGN!butest
 

Más de butest (20)

EL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBEEL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBE
 
1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIAL
 
Timeline: The Life of Michael Jackson
Timeline: The Life of Michael JacksonTimeline: The Life of Michael Jackson
Timeline: The Life of Michael Jackson
 
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIAL
 
Com 380, Summer II
Com 380, Summer IICom 380, Summer II
Com 380, Summer II
 
PPT
PPTPPT
PPT
 
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet JazzThe MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
 
MICHAEL JACKSON.doc
MICHAEL JACKSON.docMICHAEL JACKSON.doc
MICHAEL JACKSON.doc
 
Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1
 
Facebook
Facebook Facebook
Facebook
 
Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...
 
Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...
 
NEWS ANNOUNCEMENT
NEWS ANNOUNCEMENTNEWS ANNOUNCEMENT
NEWS ANNOUNCEMENT
 
C-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.docC-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.doc
 
MAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.docMAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.doc
 
Mac OS X Guide.doc
Mac OS X Guide.docMac OS X Guide.doc
Mac OS X Guide.doc
 
hier
hierhier
hier
 
WEB DESIGN!
WEB DESIGN!WEB DESIGN!
WEB DESIGN!
 

Mining Regional Knowledge in Spatial Dataset

  • 1. Beijing, China August 17, 2009 A Framework for Multi-objective Clustering and Its Application to Co-location Mining RachsudaJianthapthaksin, Christoph F. Eick and Ricardo Vilalta University of Houston, Texas, USA
  • 2. Talk Outline What is unique about this work with respect to clustering? Multi-objective Clustering (MOC)—Objectives and an Architecture Clustering with Plug-in Fitness Functions Filling the Repository with Clusters Creating Final Clusterings Related Work Co-location Mining Case Study Conclusion and Future Work
  • 3. 1. What is unique about this work with respect to clustering? Clustering algorithms that support plug-in fitness function are used. Clustering algorithms are run multiple times to create clusters. Clusters are stored in a repository that is updated on the fly; cluster generation is separated from creating the final clustering. The final clustering is created from the clusters in the repository based on user preferences. Our approach needs to seeks for alternative, overlapping clusters.
  • 4.
  • 5. Task: Find sets of clusters that a good with respect to two or more objectivesTexas Multi-Objective Clustering Dataset: (longitude,latitude,<concentrations>+)
  • 6. Survey MOC Approach Clustering algorithms are run multiple times maximizing different subsets of objectives that are captured in compound fitness functions. Uses a repository to store promising candidates. Only clusters that satisfying two or more objectives are considered as candidates. After a sufficient number of clusters has been created, final clustering are generated based on user-preferences. 5
  • 7. An Architecture for MOC [diagram: the goal-driven fitness function generator (S1) supplies a compound fitness function for Q’ to the clustering algorithm, which clusters a spatial dataset into X (S2); the storage unit updates the repository M with X (S3); the cluster summarization unit produces the final clusters M’ (S4)] Steps in multi-run clustering: S1: Generate a compound fitness function. S2: Run a clustering algorithm. S3: Update the cluster repository M. S4: Summarize the discovered clusters into M’.
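The four steps S1-S4 can be sketched as the following loop. This is a minimal illustration under stated assumptions, not the authors' implementation; all function names (`multi_run_clustering`, the plugged-in callables) are hypothetical.

```python
import itertools
import random

def multi_run_clustering(dataset, objectives, cluster_algo, update_repository,
                         summarize, n_runs=10, seed=0):
    """Sketch of the MOC loop: repeatedly cluster the dataset with a
    compound fitness function built from a pair of objectives (S1-S2),
    update the repository M (S3), then summarize it into M' (S4)."""
    rng = random.Random(seed)
    repository = []                                      # M: promising clusters
    pairs = list(itertools.combinations(objectives, 2))  # candidate subsets Q'
    for _ in range(n_runs):
        q_subset = rng.choice(pairs)                     # S1: pick objectives Q'
        fitness = lambda c: sum(q(c) for q in q_subset)  # naive compound fitness
        clustering = cluster_algo(dataset, fitness)      # S2: generate clusters X
        update_repository(repository, clustering)        # S3: update M
    return summarize(repository)                         # S4: final clustering M'
```

The clustering algorithm, repository update, and summarization unit are passed in as callables, mirroring the architecture's separation of components.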
  • 8. 3. Clustering with Plug-in Fitness Functions Motivation: Finding subgroups in geo-referenced datasets has many applications. However, in many applications the subgroups to be searched for do not share the characteristics considered by traditional clustering algorithms, such as cluster compactness and separation. Domain or task knowledge frequently imposes additional requirements concerning what constitutes a “good” subgroup. Consequently, it is desirable to develop clustering algorithms that provide plug-in fitness functions that allow domain experts to express desirable characteristics of subgroups they are looking for.
  • 9. Current Suite of Spatial Clustering Algorithms Representative-based: SCEC, SRIDHCR, SPAM, CLEVER Grid-based: SCMRG Agglomerative: MOSAIC Density-based: SCDE, DCONTOUR (not really plug-in, but some fitness functions can be simulated) Remark: All algorithms partition a dataset into clusters by maximizing a reward-based, plug-in fitness function.
  • 10. 4. Filling the Repository with Clusters Plug-in reward functions Rewardq(x) are used to assess to what extent an objective q is satisfied by a cluster x. User-defined thresholds θq are used to determine whether an objective q is satisfied by a cluster x (Rewardq(x) &gt; θq). Only clusters that satisfy 2 or more objectives are stored in the repository. Only non-dominated clusters are stored in the repository. Dominance relations only apply to pairs of clusters that have a certain degree of agreement (overlap) θsim.
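The admission test above can be written as a small predicate. A sketch only, assuming reward functions are supplied as a dict keyed by objective name; these helper names are illustrative, not the authors' code.

```python
def satisfies(cluster, reward_fns, thresholds):
    """Return the set of objectives q with Reward_q(cluster) > theta_q."""
    return {q for q, reward in reward_fns.items()
            if reward(cluster) > thresholds[q]}

def is_candidate(cluster, reward_fns, thresholds, min_objectives=2):
    """A cluster is a repository candidate only if it satisfies
    at least `min_objectives` (here 2) of the objectives."""
    return len(satisfies(cluster, reward_fns, thresholds)) >= min_objectives
```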
  • 11. Dominance and Multi-Objective Clusters Dominance between clusters x and y with respect to multiple objectives Q. Dominance Constraint with Respect to the Repository
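A hedged sketch of such a dominance check: x dominates y when the two clusters overlap enough (similarity above θsim) and x is at least as good on every objective and strictly better on at least one. The exact definition in the paper may differ in detail; clusters are represented here as sets of object ids with Jaccard overlap as similarity, an illustrative assumption.

```python
def jaccard(x, y):
    """Overlap between two clusters represented as sets of object ids."""
    return len(x & y) / len(x | y)

def dominates(x, y, reward_fns, similarity, theta_sim):
    """Pareto-style dominance restricted to sufficiently overlapping clusters."""
    if similarity(x, y) <= theta_sim:
        return False                 # dominance only applies to similar clusters
    rx = [r(x) for r in reward_fns]
    ry = [r(y) for r in reward_fns]
    return (all(a >= b for a, b in zip(rx, ry)) and
            any(a > b for a, b in zip(rx, ry)))
```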
  • 12. Compound Fitness Functions The goal-driven fitness function generator selects a subset Q’ ⊆ Q of the objectives Q and creates a compound fitness function for Q’, relying on a penalty function approach [Baeck et al. 2000]. CmpReward(x) = (Σq∈Q’ Rewardq(x)) * Penalty(Q’, x)
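A sketch of such a compound fitness function follows. The particular Penalty choice, a bonus factor when the cluster satisfies every objective in Q', is an assumption made for illustration (the slide only states that clusters satisfying all objectives get more reward); the factory name `make_compound_fitness` is hypothetical.

```python
def make_compound_fitness(q_subset, thresholds, bonus=2.0):
    """Build CmpReward(x) = (sum of Reward_q(x) over Q') * Penalty(Q', x).
    Penalty is modeled here as 1.0 normally and `bonus` when every
    objective in Q' is satisfied, steering the search toward
    multi-objective clusters."""
    def penalty(x):
        all_satisfied = all(q(x) > thresholds[q] for q in q_subset)
        return bonus if all_satisfied else 1.0

    def cmp_reward(x):
        return sum(q(x) for q in q_subset) * penalty(x)

    return cmp_reward
```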
  • 13. Updating the Cluster Repository M := clusters in the repository X := “new” clusters generated by a single run of the clustering algorithm
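One way the update step could work, as a sketch: each new cluster in X must be a candidate (two or more objectives satisfied) and must not be dominated by a repository cluster; clusters it dominates are evicted. The paper's procedure may differ in detail, and the predicate parameters are assumptions.

```python
def update_repository(M, X, is_candidate, dominates):
    """Merge new clusters X into repository M, keeping only
    non-dominated candidate clusters (illustrative sketch)."""
    for x in X:
        if not is_candidate(x):
            continue                                  # must satisfy >= 2 objectives
        if any(dominates(m, x) for m in M):
            continue                                  # x is dominated: discard it
        M[:] = [m for m in M if not dominates(x, m)]  # evict clusters x dominates
        M.append(x)
    return M
```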
  • 14. 5. Creating a Final Clustering Final clusterings are subsets of the clusters in the repository M. Inputs: The user provides her own individual objective function RewardU, a reward threshold θU, and a cluster similarity threshold θrem that indicates how much cluster overlap she is willing to tolerate. Goal: Find X ⊆ M that maximizes the total reward Σx∈X RewardU(x), subject to: 1. ∀x∈X ∀x’∈X (x ≠ x’ → Similarity(x, x’) &lt; θrem) 2. ∀x∈X (RewardU(x) &gt; θU) Our paper introduces the MO-Dominance-guided Cluster Reduction algorithm (MO-DCR) to create the final clustering.
  • 15. MO-Dominance-guided Cluster Reduction (MO-DCR) algorithm The algorithm loops over the following 2 steps until M is empty: 1. Include the dominant clusters D, which are the highest-reward clusters in M’. 2. Remove D and their dominated clusters in the θrem-proximity from M. [dominance-graph example: A dominates B and C (e.g. sim(A,B) = 0.8 with θrem = 0.5), so B and C are removed; D dominates E and F, so E and F are removed] Remark: A→B ⇔ RewardU(A) &gt; RewardU(B) ∧ Similarity(A,B) &gt; θrem
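The two-step loop can be sketched as follows, with clusters as sets of object ids and Jaccard overlap as the similarity measure. This is an illustrative sketch of the reduction idea, not the authors' code; in particular, picking one dominant cluster per iteration is a simplifying assumption.

```python
def mo_dcr(M, reward, similarity, theta_rem):
    """MO-DCR sketch: repeatedly take the highest-reward cluster as
    dominant, then drop all remaining clusters within theta_rem-proximity
    (overlap >= theta_rem), until the repository is exhausted."""
    remaining = sorted(M, key=reward, reverse=True)
    final = []
    while remaining:
        dominant = remaining.pop(0)          # highest-reward cluster left
        final.append(dominant)
        remaining = [c for c in remaining
                     if similarity(dominant, c) < theta_rem]
    return final
```

In the slide's example this reproduces the outcome of keeping A and D: B and C fall in A's proximity, E and F in D's.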
  • 16.
  • 17. 7. Case Study: Co-location Mining Goal: Finding regional co-location patterns where high concentrations of Arsenic are co-located with a lot of other factors in Texas. Remark: Each binary co-location is treated as a single objective. Dataset: TWDB has monitored water quality and collected data for 105,814 wells in Texas over the last 25 years. We use a subset of the Arsenic_10_avg data set: longitude and latitude, Arsenic (As), Molybdenum (Mo), Vanadium (V), Boron (B), Fluoride (F-), Chloride (Cl-), Sulfate (SO42-) and Total Dissolved Solids (TDS).
  • 18. Objective Functions Used Q = {q{As,Mo}, q{As,V}, q{As,B}, q{As,F-}, q{As,Cl-}, q{As,SO42-}, q{As,TDS}} RewardB(x) = i(B,x)·|x|, where i(B,x) is the interestingness of pattern B in region x
  • 19.
  • 20. Experimental Results MOC is able to identify: Multi-objective clusters Alternative clusters, e.g. the Rank 1 regions of (a) and the Rank 2 regions of (b) Nested clusters, e.g. in (b) the Rank 3-5 regions are sub-regions of the Rank 1 region. In particular, MOC discriminates among companion elements such as Vanadium (Rank 3 region), or Chloride, Sulfate and Total Dissolved Solids (Rank 4 region). (a) (b) Fig. 7.6 The top 5 regions and patterns with respect to two queries: query1={As,Mo} and query2={As,B} are shown in Figure (a) and (b), respectively.
  • 21. 8. Conclusion and Future Work Building blocks for future Multi-Objective Clustering systems were provided in this work; namely: A dominance relation for problems in which only a subset of the objectives can be satisfied was introduced. Clustering algorithms with plug-in fitness functions and the capability to create compound fitness functions are extensively used in our approach. Initially, a repository of potentially useful clusters is generated based on a large set of objectives. Individualized, specific clusterings are then generated based on user preferences. The approach is highly generic and incorporates specific domain needs in the form of single-objective fitness functions. The approach was evaluated in a case study and turned out to be more suitable than a single-objective clustering approach that was used for the same application in a previous paper [ACM-GIS 2008].
  • 22. Challenges in Multi-objective Clustering (MOC) Find clusters that are individually good with respect to multiple objectives in an automated fashion. Provide search engine style capabilities to summarize final clustering obtained from multiple runs of clustering algorithms.
  • 23. Traditional Clustering Algorithms &amp; Fitness Functions No fitness function: DBSCAN, hierarchical clustering. Implicit fitness function: K-means. Fixed fitness function: PAM. Provides plug-in fitness function: CHAMELEON, our work. Traditional clustering algorithms consider only domain-independent and task-independent characteristics to form a solution. Different domain tasks require different fitness functions.
  • 25. Challenges in Cluster Summarization [figure: original clusters A, B, C; X marks eliminated clusters; the typical output eliminates too many clusters, while the DCR output retains C, which does not overlap A]
  • 26. Interestingness of a Pattern Interestingness of a pattern B (e.g. B = {C, D, E}) for an object o; interestingness of a pattern B for a region c. Remark: Purity, the fraction of objects o with i(B,o) &gt; 0, measures the percentage of objects that exhibit pattern B in region c.
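Given a per-object interestingness function i(B,o), the purity remark can be computed as follows. A sketch only: the slide's actual formulas for i(B,o) and the region-level interestingness are not reproduced here, so `i_B` is assumed to be supplied.

```python
def purity(region, i_B):
    """Fraction of objects o in region c with i(B, o) > 0, i.e. the
    percentage of objects that exhibit pattern B in the region."""
    if not region:
        return 0.0
    return sum(1 for o in region if i_B(o) > 0) / len(region)
```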
  • 27. Characteristics of the Top 5 Regions Table 7.7 Top 5 Regions Ranked by Reward of the Query {As,Mo} Table 7.8 Top 5 Regions Ranked by Reward of the Query {As,B}
  • 28. Representative-based Clustering [figure: four clusters (1-4) in the Attribute1/Attribute2 plane, each induced by a representative] Objective: Find a set of objects OR such that the clustering X obtained by using the objects in OR as representatives minimizes q(X). Properties: Cluster shapes are convex polygons. Popular algorithms: K-means, K-medoids.
  • 29. 5. CLEVER (ClustEring using representatiVEs and Randomized hill climbing) Is a representative-based (sometimes called prototype-based) clustering algorithm. Uses a variable number of clusters and larger neighborhood sizes to battle premature termination, and randomized hill climbing with adaptive sampling to reduce complexity. Searches for the optimal number of clusters.
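A generic randomized-hill-climbing search over representative sets, in the spirit of the description above, could be sketched as follows. This is not the CLEVER algorithm itself: the neighbor moves (add/remove/replace a representative), sample size, and stall-based termination are illustrative assumptions, and adaptive sampling is omitted.

```python
import random

def randomized_hill_climbing(dataset, fitness, n_neighbors=10, max_stall=5, seed=0):
    """Maintain a variable-size set of representatives, sample neighboring
    solutions, and move to the best neighbor when it improves fitness."""
    rng = random.Random(seed)
    reps = rng.sample(dataset, k=min(2, len(dataset)))   # initial representatives
    best = fitness(reps)
    stall = 0
    while stall < max_stall:
        neighbors = []
        for _ in range(n_neighbors):                     # sample the neighborhood
            cand = list(reps)
            move = rng.choice(["add", "remove", "replace"])
            if move == "add" and len(cand) < len(dataset):
                cand.append(rng.choice([p for p in dataset if p not in cand]))
            elif move == "remove" and len(cand) > 1:
                cand.pop(rng.randrange(len(cand)))
            elif move == "replace":
                cand[rng.randrange(len(cand))] = rng.choice(dataset)
            neighbors.append(cand)
        champion = max(neighbors, key=fitness)
        if fitness(champion) > best:                     # climb only on improvement
            reps, best, stall = champion, fitness(champion), 0
        else:
            stall += 1                                   # count non-improving rounds
    return reps, best
```

Because the number of representatives can grow or shrink, the search also explores different numbers of clusters, matching the "variable number of clusters" property on the slide.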

Editor's notes
  1. A radical departure from traditional clustering, MOC relies on clustering algorithms that support plug-in fitness functions, and on multi-run clustering with a repository to generate a large number of clusters. It generates clusters satisfying two or more objectives (not at the clustering level), and supports different domain-specific tasks assigned by users. It is an incremental approach that collects and refines clusters on the fly; the search for alternative clusters takes into consideration what clusters have already been generated, rewarding novelty. It provides search-engine-type capabilities to users, enabling them to query a large set of clusters with respect to different objectives and thresholds to generate final clusterings.
  2. The architecture of the MOC system that we propose is depicted in the figure given; it consists of 4 main components: a clustering algorithm, storage unit, goal-driven fitness function generator and cluster summarization unit. Steps in MOC include: first, the goal-driven fitness function generator selects a new compound fitness function for the clustering algorithm, which generates a new clustering X in the second step. Third, the storage unit updates its repository M using the clusters in X. The algorithm iterates over these three steps until a large number of clusters has been obtained. Later, in the fourth step, the cluster summarization unit produces final clusters based on user preferences which are subsets of the clusters in M.
  3. In general, M should only store non-dominated clusters, and algorithms that update M should not violate this constraint
  4. In particular, in step 1 the goal-driven fitness function generator selects pairs of single-objective fitness functions (q, q’) with q, q’ ∈ Q to create a compound fitness function for Q’, considering all combinations of two fitness functions in Q. In general, the compound fitness function for Q’ is the sum of the rewards for all objectives q ∈ Q’; however, it gives more reward to clusters xi that satisfy all objectives, in order to motivate the clustering algorithm to seek multi-objective clusters.
  5. The algorithm loops over the following 2 steps until there is no cluster to be removed in the second step: 1. Compute the dominant clusters, which are the highest-quality clusters, and 2. remove their dominated clusters in the θrem-proximity. We illustrate the MO-DCR algorithm using dominance graphs. In the first iteration A is a dominant cluster, and its dominated clusters B and C, which are very similar to A (in close proximity), are removed. In the second iteration D is found to be a dominant cluster, and its dominated clusters E and F are removed. Consequently, the final clustering consists of two clusters: A and D.
  6. Related work includes multi-objective clustering based on evolutionary algorithms and scatter tabu search. Similar to MOEA, our approach evaluates cluster quality by using a combination of fitness functions. In contrast, we selectively store and refine clusters on the fly to maintain efficiency of storage management and cluster comparison.
  7. We demonstrate the benefits of the proposed multi-objective clustering framework in a real-world water pollution case study. The goal is to find regional co-location patterns of high concentrations of Arsenic with other factors in Texas. TWDB has monitored water quality and collected data for 105,814 wells in Texas over the last 25 years. For this experiment we used a water well dataset called Arsenic_10_avg that contains 3 spatial attributes and 8 other non-spatial attributes of chemical concentrations, including Arsenic concentrations.
  8. Challenges in multi-objective clustering include: Find clusters that are individually good with respect to multiple fitness functions in an automated fashion. Provide search-engine-style capabilities to summarize the final clustering obtained from multiple runs of clustering algorithms.
  9. We can categorize clustering algorithms into 4 types based on the use of fitness functions. First, clustering algorithms that do not use a fitness function, for example, DBSCAN and hierarchical clustering. Second, algorithms that use a fitness function implicitly, for example, k-means. Third, algorithms that use a fixed fitness function, for example, PAM. And fourth, algorithms that provide a plug-in fitness function, for example, CHAMELEON and MOSAIC. Using traditional clustering algorithms for region discovery faces significant limitations: 1) As we just mentioned, traditional clustering algorithms consider only domain-independent and task-independent characteristics; for example, the k-means algorithm considers cluster compactness to form a solution. However, region discovery, such as hotspot discovery of high arsenic concentration, requires the clustering algorithm to find clusters based on task-specific characteristics. Therefore, clustering with plug-in fitness functions is desirable because it can capture what groupings are of particular interest to domain experts. 2) To the best of our knowledge, there is no single fitness function that effectively serves different domain tasks. Therefore, it is necessary to design task-specific families of fitness functions.
  10. Step 4 provides search-engine-type capabilities to users, enabling them to query a large set of clusters with respect to different objectives and thresholds to generate final clusterings. The goal of the MO-Dominance-guided Cluster Reduction algorithm (MO-DCR) is to return a clustering that is good with respect to multiple objectives selected by a user, and to remove clusters that are highly overlapping.
  11. The DCR algorithm is one of the effective algorithms that generate a final clustering. Normally we could straightforwardly produce the result by removing all dominated clusters. However, this would remove too many clusters; for example, cluster A dominates and overlaps B, and B dominates and overlaps C. DCR does not remove C because it does not spatially overlap with A. Finally, MRC performs the tasks of parameter selection, finding alternative clusters and summarizing clusters in a highly automated fashion.