Classification vs. Prediction
Classification:
– predicts categorical (discrete and unordered) class
labels
– constructs a model from the training set and the
values (class labels) of a classifying attribute, then
uses the model to classify new data
Prediction:
– models continuous-valued functions to predict
unknown or missing values
Chapter 4 - Classification
Cont...
Prediction is similar to classification
– First, construct a model
– Second, use the model to predict unknown values
The major method for prediction is regression
– Linear and multiple regression
– Non-linear regression
Prediction is different from classification
– Classification predicts categorical class labels
– Prediction models continuous-valued functions
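As a minimal sketch of prediction by regression, the following fits a simple linear regression y = a + b*x by ordinary least squares and then predicts a continuous value for an unseen input. The data points and the helper name `fit_line` are illustrative assumptions, not taken from the slides.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx                 # slope
    a = mean_y - b * mean_x       # intercept
    return a, b

# Hypothetical training data: years of experience vs. salary (in thousands)
years = [1, 2, 3, 4, 5]
salary = [30, 35, 40, 45, 50]

a, b = fit_line(years, salary)    # step 1: construct the model
predicted = a + b * 6             # step 2: predict an unknown value
```

The two steps mirror the slide: the model is constructed first, then used to predict a continuous value rather than a class label.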
Recall- Data Mining Models and Tasks
Classification
• The classification process involves two steps
1. Model construction:
• Describes a set of predetermined classes using a training
data set
• The training data is a set of tuples, where each tuple/sample is
assumed to belong to a predefined class, as determined by the
class label attribute
• The model is represented as classification rules, decision trees, or
mathematical formulae
2. Model usage:
• Uses the model to classify future or unknown
objects
• Or to explain some scenario, with a measured accuracy, after testing
Classification: A Two-Step Process (cont.)
• Model construction: describing a set of predetermined
classes
– Each tuple/sample is assumed to belong to a predefined class,
as determined by the class label attribute
– The set of tuples used for model construction is the training set
– The model is represented as classification rules, decision trees,
or mathematical formulae
• Model usage: classifying future or unknown objects
– Estimate the accuracy of the model
  - The known label of each test sample is compared with the
    class predicted by the model
  - The accuracy rate is the percentage of test set samples that are
    correctly classified by the model
  - The test set must be independent of the training set; otherwise
    over-fitting goes undetected and the accuracy estimate is too optimistic
Classification Process (1): Model Construction
[Figure: Training Data -> Classification Algorithms -> Classifier (Model)]

Training Data
NAME  RANK            YEARS  TENURED
Mike  Assistant Prof  3      no
Mary  Assistant Prof  7      yes
Bill  Professor       2      yes
Jim   Associate Prof  7      yes
Dave  Assistant Prof  6      no
Anne  Associate Prof  3      no

Classifier (Model)
IF rank = ‘professor’
OR years > 6
THEN tenured = ‘yes’
Classification Process (2): Use the Model in Prediction
[Figure: Testing Data and Unseen Data -> Classifier -> Tenured?]

Testing Data
NAME     RANK            YEARS  TENURED
Tom      Assistant Prof  2      no
Merlisa  Associate Prof  7      no
George   Professor       5      yes
Joseph   Assistant Prof  7      yes

Unseen Data
(Jeff, Professor, 4)
Tenured?
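The two slides above can be traced in a few lines of code. This sketch encodes the rule from the model construction slide, estimates its accuracy on the testing data, and then classifies the unseen tuple for Jeff. The function name `classify` is an assumption made for illustration.

```python
def classify(rank, years):
    """Rule from the model: professor OR more than 6 years -> tenured."""
    return "yes" if rank == "Professor" or years > 6 else "no"

# Testing data from the slide: (name, rank, years, known label)
test_set = [
    ("Tom", "Assistant Prof", 2, "no"),
    ("Merlisa", "Associate Prof", 7, "no"),
    ("George", "Professor", 5, "yes"),
    ("Joseph", "Assistant Prof", 7, "yes"),
]

# Compare the known label of each test sample with the predicted class
correct = sum(classify(rank, years) == label
              for _, rank, years, label in test_set)
accuracy = correct / len(test_set)

# Classify the unseen tuple (Jeff, Professor, 4)
jeff = classify("Professor", 4)
```

Merlisa is the only misclassified test sample (7 years but not tenured), so the rule is correct on 3 of the 4 test tuples, and Jeff is predicted to be tenured.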
Supervised vs. Unsupervised Learning
• Supervised learning (classification)
– Supervision: the training data (observations, measurements,
etc.) are accompanied by labels indicating the class of the
observations
– New data is classified based on the training set
• Unsupervised learning (clustering)
– The class labels of the training data are unknown
– Given a set of measurements, observations, etc., the aim is
to establish the existence of classes or clusters in the data
– Usually, classification follows clustering
Issues regarding classification and prediction: Data
Preparation
Data cleaning
– Preprocess data in order to reduce noise and
handle missing values
Relevance analysis (feature selection)
– Remove the irrelevant or redundant attributes
Data transformation
– Generalize and/or normalize data
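A minimal sketch of two of these preparation steps, assuming a single numeric attribute: missing values are filled with the mean of the known values (one simple way to handle them, not the only one), and the result is min-max normalized to [0, 1]. The attribute values and function names are hypothetical.

```python
def fill_missing_with_mean(values):
    """Replace None entries with the mean of the known values."""
    known = [v for v in values if v is not None]
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in values]

def min_max_normalize(values):
    """Rescale values linearly to [0, 1]; assumes they are not all equal."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

incomes = [20, None, 40, 60]      # hypothetical attribute with a missing value
cleaned = fill_missing_with_mean(incomes)
normalized = min_max_normalize(cleaned)
```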
Issues regarding classification and prediction:
Evaluating Classification Methods
• Predictive accuracy
– Measures how accurately the classifier predicts the class label of objects
• Speed
– This refers to the computational costs involved in generating and using the
given classifier or predictor
– time to construct the model
– time to use the model
• Scalability
– This refers to the ability to construct the classifier or predictor efficiently
given large amounts of data
• Robustness
– This is the ability of the classifier or predictor to make correct predictions
given noisy data or data with missing values
• Interpretability
– This refers to the level of understanding and insight that is provided by the
classifier or predictor
Classification: Technical Definition
• Given a collection of records (training set)
– Each record contains a set of attributes; one of the
attributes is the class.
• Find a model for the class attribute as a function
of the values of the other attributes.
• Goal: previously unseen records should be
assigned a class as accurately as possible.
– A test set is used to determine the accuracy of the
model. Usually, the given data set is divided into
training and test sets, with the training set used to build
the model and the test set used to validate it.
Thus Classification
• Classification is a data mining (machine learning) technique
used to predict group membership for data instances.
• Given a collection of records (training set), each record
contains a set of attributes; one of the attributes is the class.
– Find a model for the class attribute as a function of the values of the other
attributes.
• Goal: previously unseen records should be assigned a class
as accurately as possible. A test set is used to determine the
accuracy of the model.
– Usually, the given data set is divided into training and test sets, with the
training set used to build the model and the test set used to validate it.
• For example, one may use classification to predict whether the weather
on a particular day will be “sunny”, “rainy” or “cloudy”.
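Dividing the given data set into training and test sets can be sketched as follows. The 70/30 split, the fixed seed, and the function name `train_test_split` are assumptions chosen for illustration; the slides do not prescribe a ratio.

```python
import random

def train_test_split(records, test_fraction=0.3, seed=0):
    """Shuffle a copy of the records and split it into (training, test)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)   # seeded, so the split is repeatable
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

records = list(range(10))                   # stand-in for 10 labelled records
train, test = train_test_split(records)
```

Because no record appears in both lists, the test set stays independent of the training set.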
Illustrating Classification Task
[Figure: Training Set -> Learning algorithm -> Learn Model (induction) -> Model -> Apply Model (deduction) -> Test Set]

Training Set
Tid  Attrib1  Attrib2  Attrib3  Class
1    Yes      Large    125K     No
2    No       Medium   100K     No
3    No       Small    70K      No
4    Yes      Medium   120K     No
5    No       Large    95K      Yes
6    No       Medium   60K      No
7    Yes      Large    220K     No
8    No       Small    85K      Yes
9    No       Medium   75K      No
10   No       Small    90K      Yes

Test Set
Tid  Attrib1  Attrib2  Attrib3  Class
11   No       Small    55K      ?
12   Yes      Medium   80K      ?
13   Yes      Large    110K     ?
14   No       Small    95K      ?
15   No       Large    67K      ?
Classification methods
• Goal: predict class Ci = f(x1, x2, ..., xn)
• There are various classification methods. Popular
classification techniques include the following.
– Decision tree classifier: divides the decision space into
piecewise constant regions
– Rule-based and association-based classifiers
– K-Nearest Neighbour: classifies based on a similarity
measurement
– Neural networks: partition by non-linear boundaries
– Bayesian network: a probabilistic model
– Support vector machine: can also solve non-linearly separable
problems
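To make one of the listed methods concrete, here is a minimal K-Nearest Neighbour sketch: a query point receives the majority class among its k closest training points. Euclidean distance, k = 3, and the toy data are assumptions; the slides only say that the method classifies by similarity.

```python
import math
from collections import Counter

def knn_predict(training, query, k=3):
    """training: list of (feature_vector, label) pairs.
    Returns the majority label among the k points nearest to query."""
    nearest = sorted(training, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical two-class training data
training = [
    ((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.0), "A"),
    ((6.0, 6.0), "B"), ((7.0, 7.5), "B"), ((6.5, 8.0), "B"),
]
near_a = knn_predict(training, (1.2, 1.3))
near_b = knn_predict(training, (6.8, 7.0))
```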
Simple classification using decision tree
Decision tree classifier
• A decision tree performs classification by constructing a tree
from training instances, with leaves carrying class labels.
– The tree is traversed for each test instance to find a leaf,
and the class of the leaf is the predicted class. This is
directed knowledge discovery in the sense that there is a
specific field whose value we want to predict.
• A widely used learning method. It has been applied to classify:
– medical patients by disease,
– equipment malfunctions by cause,
– loan applicants by likelihood of payment,
– accidents by severity.
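Tree traversal can be sketched with the data from the "Illustrating Classification Task" slide. The tree below is a hand-built, hypothetical one that happens to fit that training set; it was not produced by an induction algorithm, and the split points (for example Attrib3 >= 80K) are assumptions.

```python
def predict(record):
    """Traverse a small hand-built tree from the root to a leaf."""
    attrib1, attrib2, attrib3 = record
    if attrib1 == "Yes":            # root test
        return "No"
    if attrib2 == "Medium":         # second-level test
        return "No"
    return "Yes" if attrib3 >= 80 else "No"   # leaf chosen by Attrib3 (in K)

# Training set from the slide: ((Attrib1, Attrib2, Attrib3 in K), Class)
training = [
    (("Yes", "Large", 125), "No"), (("No", "Medium", 100), "No"),
    (("No", "Small", 70), "No"), (("Yes", "Medium", 120), "No"),
    (("No", "Large", 95), "Yes"), (("No", "Medium", 60), "No"),
    (("Yes", "Large", 220), "No"), (("No", "Small", 85), "Yes"),
    (("No", "Medium", 75), "No"), (("No", "Small", 90), "Yes"),
]
# Test set from the slide (Tid 11 to 15), class unknown
test_set = [("No", "Small", 55), ("Yes", "Medium", 80), ("Yes", "Large", 110),
            ("No", "Small", 95), ("No", "Large", 67)]

training_accuracy = sum(predict(x) == y for x, y in training) / len(training)
predictions = [predict(x) for x in test_set]
```

Each path from the root to a leaf reads as an if-then rule, which is why such trees are easy to interpret. Fitting the training set perfectly says nothing about accuracy on unseen records.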
Pros and Cons of decision trees
Pros
• Reasonable training time
• Fast application
• Easy to interpret
• Easy to implement
• Can handle a large number of
features
Cons
• Cannot handle complicated
relationships between features
• Simple decision boundaries
• Problems with lots of missing
data
Why decision tree induction in data mining?
• Relatively faster learning speed (than other classification
methods)
• Convertible to simple and easy to understand
classification if-then-else rules
• Comparable classification accuracy with other methods
• Does not require any prior knowledge of data
distribution, works well on noisy data.
Chapter 5 - Cluster Analysis
What is Cluster Analysis?
• Cluster: a collection of data objects
– Similar to one another within the same cluster
– Dissimilar to the objects in other clusters
• Cluster analysis
– Grouping a set of data objects into clusters
• Clustering is unsupervised classification: no predefined classes
• Typical applications
– As a stand-alone tool to get insight into data distribution
– As a preprocessing step for other algorithms
Clustering cont…
Given a set of points, with a
notion of distance between
points, group the points into
some number of clusters, so
that members of a cluster are
in some sense as close to
each other as possible.
While data points in the
same cluster are similar, those
in separate clusters are
dissimilar to one another.
[Figure: scatter plot of points (x) forming several clusters]
• Clustering is a data mining (machine learning) technique that
finds similarities between data according to the characteristics
found in the data & groups similar data objects into one cluster
Cont…
• Thus, cluster analysis means
– Finding groups of objects such that the objects in a group are
similar (or related) to one another and different from (or unrelated
to) the objects in other groups
[Figure: intra-cluster distances are minimized; inter-cluster distances are maximized]
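The two quantities in the figure can be computed directly. This sketch uses the average pairwise Euclidean distance as one possible definition of intra-cluster and inter-cluster distance; other definitions exist, and the points are hypothetical.

```python
import math
from itertools import combinations

def mean_intra_distance(cluster):
    """Average distance between all pairs of points within one cluster."""
    pairs = list(combinations(cluster, 2))
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)

def mean_inter_distance(cluster_a, cluster_b):
    """Average distance between points taken from two different clusters."""
    total = sum(math.dist(p, q) for p in cluster_a for q in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))

c1 = [(0, 0), (0, 1), (1, 0)]
c2 = [(10, 10), (10, 11), (11, 10)]

intra = mean_intra_distance(c1)       # small for a tight cluster
inter = mean_inter_distance(c1, c2)   # large for well-separated clusters
```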
Quality: What Is Good Clustering?
• A good clustering method will produce high quality clusters
with
– high intra-class similarity
– low inter-class similarity
• The quality of a clustering result depends on both the
similarity measure used by the method and its
implementation
– Key requirement of clustering: Need a good measure of similarity
between instances.
• The quality of a clustering method is also measured by its
ability to discover some or all of the hidden patterns in the
given datasets
Requirements of Clustering in Data Mining
• Scalability
– Highly scalable algorithms are needed for clustering large databases such as data warehouses (DW)
• Ability to deal with different types of attributes
– Clustering may also be performed on binary, categorical and ordinal data
• Discovery of clusters with arbitrary shape
– Most algorithms tend to find spherical clusters
• Minimal requirements for domain knowledge to determine input parameters
– Clustering results are quite sensitive to the input parameters
– Parameters are often difficult to determine
• Ability to deal with noise and outliers
• Insensitivity to the order of input records
• High dimensionality
– A data warehouse can contain many dimensions
• Incorporation of user-specified constraints
• Interpretability and usability
Example: Clustering Application
• Text/Document Clustering:
– Goal: To find groups of documents that are similar
to each other based on the important terms
appearing in them.
– Approach:
  - Identify frequently occurring terms in each
    document.
  - Form a similarity measure based on the frequencies
    of different terms and use it to cluster documents.
– Gain: Information Retrieval can utilize the clusters
to relate a new document or search term to clustered
documents.
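The approach above can be sketched with raw term frequencies and cosine similarity. Cosine similarity is one common choice of similarity measure and is an assumption here, as are the three sample documents; the slides do not name a specific measure.

```python
import math
from collections import Counter

def term_frequencies(text):
    """Count how often each lower-cased word occurs in a document."""
    return Counter(text.lower().split())

def cosine_similarity(tf_a, tf_b):
    """Cosine of the angle between two term-frequency vectors."""
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    return dot / (norm_a * norm_b)

docs = {
    "d1": "data mining finds patterns in data",
    "d2": "mining data for hidden patterns",
    "d3": "the football match ended in a draw",
}
tf = {name: term_frequencies(text) for name, text in docs.items()}
sim_12 = cosine_similarity(tf["d1"], tf["d2"])   # share several terms
sim_13 = cosine_similarity(tf["d1"], tf["d3"])   # share only "in"
```

A clustering method would place d1 and d2 together because their similarity is much higher than that of d1 and d3. In practice, common words such as "in" are usually removed or down-weighted first.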
Cont…
Cluster analysis can be applied for
– Understanding
Group related documents for browsing,
group genes and proteins that have similar
functionality, or group stocks with similar
price fluctuations
– Summarization
Reduce the size of large data sets
What is not Cluster Analysis?
• Supervised classification
– Has class label information
• Simple segmentation
– Dividing students into different registration groups
alphabetically, by last name
• Results of a query
– Groupings are the result of an external specification
Types of Clusters
• Major types: well-separated clusters and center-based
clusters
• Well-separated clusters:
– A cluster is a set of points such that any point in a cluster is
closer (or more similar) to every other point in the cluster than
to any point not in the cluster.
[Figure: 3 well-separated clusters]
Types of Clusters: Center-Based
• Center-based
– A cluster is a set of objects such that an object in a cluster is
closer (more similar) to the “center” of its cluster than to the
center of any other cluster
– The center of a cluster is often a centroid, the average of all
the points in the cluster, or a medoid, the most “representative”
point of a cluster
[Figure: 4 center-based clusters]
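The difference between the two kinds of center can be shown on a small hypothetical cluster. The medoid is taken here to be the member with the smallest total Euclidean distance to the other members, which is one usual reading of "most representative".

```python
import math

def centroid(points):
    """Component-wise average of all points in the cluster."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def medoid(points):
    """Cluster member with the smallest total distance to all members."""
    return min(points, key=lambda p: sum(math.dist(p, q) for q in points))

cluster = [(1, 1), (2, 2), (3, 3), (10, 2)]   # (10, 2) is an outlier
c = centroid(cluster)
m = medoid(cluster)
```

The centroid (4.0, 2.0) is pulled toward the outlier and is not a member of the cluster, while the medoid (2, 2) is always an actual data point.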
Type of data in clustering analysis
• Variables come in different data types
• Each type needs its own distance computation for
cluster analysis
• Some of the data types are:
– Interval-scaled variables
– Binary variables
– Nominal and ordinal variables
– Variables of mixed types
Interval-valued variables
• These are variables whose values are continuous
measurements, such as height, weight, and age
• Because the measurement unit affects the distance between objects,
the data must be preprocessed to remove the effect of the unit
• This preprocessing is called standardization
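One common form of standardization is the z-score, sketched below; using the population standard deviation is an assumption, and some texts use the mean absolute deviation instead. The example records the same hypothetical heights in centimetres and in metres.

```python
import statistics

def standardize(values):
    """Convert values to z-scores: (value - mean) / standard deviation."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)   # population standard deviation
    return [(v - mean) / std for v in values]

heights_cm = [150, 160, 170, 180, 190]
heights_m = [1.5, 1.6, 1.7, 1.8, 1.9]

z_cm = standardize(heights_cm)
z_m = standardize(heights_m)
```

Both lists give the same z-scores (up to floating-point rounding), so distances computed on standardized values no longer depend on the unit of measurement.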
Binary Variables
• A binary variable has only two possible values (1 or 0,
yes or no, etc.)
– For example: smoker, educated, Ethiopian, IsFemale
Nominal Variables
• A generalization of the binary variable that can take more than two states,
e.g., red, yellow, blue, green
Ordinal Variables
• An ordinal variable can be discrete or continuous
• Order is important, e.g., rank
Variables of Mixed Types
• A database may contain different types of variables
– symmetric binary, asymmetric binary,
nominal, ordinal, interval
• One may
– use a weighted formula to combine their effects
– or preprocess the data so that it fits the requirements
of the technique
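A weighted formula of this kind can be sketched as a weighted average of per-attribute dissimilarities, each scaled to [0, 1]. The schema format, attribute names, and equal default weights are assumptions for illustration, and asymmetric binary variables are not handled separately in this sketch.

```python
def mixed_dissimilarity(a, b, schema, weights=None):
    """Weighted average of per-attribute dissimilarities, each in [0, 1].

    schema holds one (kind, info) pair per attribute:
      ("interval", (min, max))  -> |a - b| / (max - min)
      ("nominal", None)         -> 0 if equal else 1 (also used for binary)
      ("ordinal", ordered_list) -> rank difference / (number of levels - 1)
    """
    weights = weights or [1.0] * len(schema)
    total = 0.0
    for x, y, (kind, info), w in zip(a, b, schema, weights):
        if kind == "interval":
            lo, hi = info
            d = abs(x - y) / (hi - lo)
        elif kind == "ordinal":
            d = abs(info.index(x) - info.index(y)) / (len(info) - 1)
        else:  # nominal or symmetric binary
            d = 0.0 if x == y else 1.0
        total += w * d
    return total / sum(weights)

schema = [
    ("interval", (0, 100)),                            # age
    ("nominal", None),                                 # smoker (binary)
    ("nominal", None),                                 # favourite colour
    ("ordinal", ["assistant", "associate", "full"]),   # rank
]
p = (30, "yes", "red", "assistant")
q = (50, "yes", "blue", "full")
d_pq = mixed_dissimilarity(p, q, schema)
d_pp = mixed_dissimilarity(p, p, schema)
```

Here the four attribute dissimilarities are 0.2, 0, 1 and 1, so with equal weights the combined dissimilarity of p and q is 0.55.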
Major Clustering Approaches
• Partitioning clustering approach:
– Construct various partitions and then evaluate them by some criterion, e.g.,
minimizing the sum of squared errors
– Typical methods:
  distance-based: K-means clustering
  model-based: expectation maximization (EM) clustering
• Hierarchical clustering approach:
– Create a hierarchical decomposition of the set of data (or objects) using some
criterion
– Typical methods:
  agglomerative vs. divisive
  single link vs. complete link
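The partitioning approach can be illustrated with a minimal K-means sketch, together with the sum of squared errors used as the evaluation criterion. The initial centers are supplied by hand and the iteration count is fixed; both are simplifying assumptions, since practical implementations choose starting centers randomly and stop on convergence.

```python
import math

def k_means(points, centers, iterations=10):
    """Alternate the assignment and centroid-update steps."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Recompute each center; keep the old one if its cluster is empty
        centers = [
            tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else old
            for cl, old in zip(clusters, centers)
        ]
    return centers, clusters

def sse(centers, clusters):
    """Sum of squared distances from each point to its cluster center."""
    return sum(math.dist(p, c) ** 2
               for c, cl in zip(centers, clusters) for p in cl)

points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
centers, clusters = k_means(points, centers=[(1, 1), (9, 8)])
error = sse(centers, clusters)
```

On this hypothetical data the algorithm separates the three points near (1, 1) from the three near (8, 8); a different partition of the same points would give a larger sum of squared errors.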

Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...apidays
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 

Recently uploaded (20)

Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 

Data mining chapter04and5-best

  • 1. Classification vs. Prediction
Classification:
– predicts categorical (discrete and unordered) class labels
– constructs a model from the training set and the values (class labels) of a classifying attribute, then uses that model to classify new data
Prediction:
– models continuous-valued functions to predict unknown or missing values

Chapter 4 - Classification (cont.)
Prediction is similar to classification
– First, construct a model
– Second, use the model to predict an unknown value
The major method for prediction is regression
– Linear and multiple regression
– Non-linear regression
Prediction is different from classification
– Classification predicts a categorical class label
– Prediction models continuous-valued functions

Recall: Data Mining Models and Tasks

Classification
• The classification process involves two steps
1. Model construction:
– describes a set of predetermined classes using the training data set
– The training data is a set of tuples; each tuple/sample is assumed to belong to a predefined class, as determined by the class label attribute
– The model is represented as classification rules, decision trees, or mathematical formulae
2. Model usage:
– uses the model to classify future or unknown objects
– or to explain some scenario with some accuracy after testing

Classification: A Two-Step Process (cont.)
• Model construction: describing a set of predetermined classes
– Each tuple/sample is assumed to belong to a predefined class, as determined by the class label attribute
– The set of tuples used for model construction is the training set
– The model is represented as classification rules, decision trees, or mathematical formulae
• Model usage: classifying future or unknown objects
– Estimate the accuracy of the model
– The known label of each test sample is compared with the class assigned by the model
– The accuracy rate is the percentage of test set samples that are correctly classified by the model
– The test set must be independent of the training set; otherwise the accuracy estimate is over-optimistic and over-fitting goes undetected
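The two prediction steps named above (construct a model, then use it to predict an unknown value) can be sketched with simple linear regression. This is a minimal illustration, not taken from the slides: the function name `fit_line` and the years/salary figures are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a + b * x; returns the pair (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# Hypothetical training data: years of experience vs. salary (in thousands).
years = [1, 2, 3, 4, 5]
salary = [30, 35, 40, 45, 50]

# Step 1: construct the model from the training data.
a, b = fit_line(years, salary)

# Step 2: use the model to predict a continuous value for an unseen input.
predicted = a + b * 6
print(a, b, predicted)
```

Unlike a classifier, the output here is a number on a continuous scale rather than one of a fixed set of class labels.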
  • 2. Classification Process (1): Model Construction
Training Data
NAME     RANK            YEARS  TENURED
Mike     Assistant Prof  3      no
Mary     Assistant Prof  7      yes
Bill     Professor       2      yes
Jim      Associate Prof  7      yes
Dave     Assistant Prof  6      no
Anne     Associate Prof  3      no
Classifier (model) produced by the classification algorithm:
IF rank = 'professor' OR years > 6 THEN tenured = 'yes'

Classification Process (2): Use the Model in Prediction
Testing Data
NAME     RANK            YEARS  TENURED
Tom      Assistant Prof  2      no
Merlisa  Associate Prof  7      no
George   Professor       5      yes
Joseph   Assistant Prof  7      yes
Unseen Data: (Jeff, Professor, 4). Tenured?

Supervised vs. Unsupervised Learning
• Supervised learning (classification)
– Supervision: the training data (observations, measurements, etc.) are accompanied by labels indicating the class of the observations
– New data is classified based on the training set
• Unsupervised learning (clustering)
– The class labels of the training data are unknown
– Given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data
– Usually classification follows after clustering

Issues regarding classification and prediction: Data Preparation
• Data cleaning
– Preprocess data in order to reduce noise and handle missing values
• Relevance analysis (feature selection)
– Remove irrelevant or redundant attributes
• Data transformation
– Generalize and/or normalize data

Issues regarding classification and prediction: Evaluating Classification Methods
• Predictive accuracy
– How accurately the classifier predicts the class label of an object
• Speed
– The computational cost of generating and using the given classifier or predictor
– time to construct the model
– time to use the model
• Scalability
– The ability to construct the classifier or predictor efficiently given large amounts of data
• Robustness
– The ability of the classifier or predictor to make correct predictions given noisy data or data with missing values
• Interpretability
– The level of understanding and insight provided by the classifier or predictor

Classification: Technical Definition
• Given a collection of records (training set)
– Each record contains a set of attributes; one of the attributes is the class.
• Find a model for the class attribute as a function of the values of the other attributes.
• Goal: previously unseen records should be assigned a class as accurately as possible.
– A test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test sets, with the training set used to build the model and the test set used to validate it.
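The tenure example from the Model Construction and Model Usage slides above can be sketched as follows. The rule and the data rows come from the slides; the function name `classify` and the helper `accuracy` are illustrative choices, and the rank comparison assumes the capitalised spelling used in the tables.

```python
def classify(rank, years):
    """Rule from the slide: IF rank = 'professor' OR years > 6 THEN tenured = 'yes'."""
    return "yes" if rank == "Professor" or years > 6 else "no"


def accuracy(rows):
    """Fraction of rows whose known label matches the rule's output."""
    correct = sum(classify(rank, years) == label for _, rank, years, label in rows)
    return correct / len(rows)


# (name, rank, years, tenured) rows copied from the slides.
training_set = [
    ("Mike", "Assistant Prof", 3, "no"),
    ("Mary", "Assistant Prof", 7, "yes"),
    ("Bill", "Professor", 2, "yes"),
    ("Jim", "Associate Prof", 7, "yes"),
    ("Dave", "Assistant Prof", 6, "no"),
    ("Anne", "Associate Prof", 3, "no"),
]
test_set = [
    ("Tom", "Assistant Prof", 2, "no"),
    ("Merlisa", "Associate Prof", 7, "no"),
    ("George", "Professor", 5, "yes"),
    ("Joseph", "Assistant Prof", 7, "yes"),
]

train_accuracy = accuracy(training_set)  # how well the rule fits the data it was built from
test_accuracy = accuracy(test_set)       # estimate on independent data
jeff = classify("Professor", 4)          # the unseen record (Jeff, Professor, 4)
print(train_accuracy, test_accuracy, jeff)
```

The rule fits every training row but misclassifies Merlisa in the test set, which is why accuracy is estimated on a test set kept separate from the training data.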
  • 3. Thus Classification
• Classification is a data mining (machine learning) technique used to predict group membership for data instances.
• Given a collection of records (training set), each record contains a set of attributes; one of the attributes is the class.
– Find a model for the class attribute as a function of the values of the other attributes.
• Goal: previously unseen records should be assigned a class as accurately as possible. A test set is used to determine the accuracy of the model.
– Usually, the given data set is divided into training and test sets, with the training set used to build the model and the test set used to validate it.
• For example, one may use classification to predict whether the weather on a particular day will be “sunny”, “rainy” or “cloudy”.

Illustrating Classification Task
[Figure: the training set is fed to a learning algorithm, which learns a model (induction); the model is then applied to the test set (deduction).]
Training Set
Tid  Attrib1  Attrib2  Attrib3  Class
1    Yes      Large    125K     No
2    No       Medium   100K     No
3    No       Small    70K      No
4    Yes      Medium   120K     No
5    No       Large    95K      Yes
6    No       Medium   60K      No
7    Yes      Large    220K     No
8    No       Small    85K      Yes
9    No       Medium   75K      No
10   No       Small    90K      Yes
Test Set
Tid  Attrib1  Attrib2  Attrib3  Class
11   No       Small    55K      ?
12   Yes      Medium   80K      ?
13   Yes      Large    110K     ?
14   No       Small    95K      ?
15   No       Large    67K      ?

Classification methods
• Goal: predict class Ci = f(x1, x2, ..., xn)
• There are various classification methods. Popular classification techniques include the following.
– Decision tree classifier: divides the decision space into piecewise constant regions
– Rule based
– Association based classifier
– K-Nearest Neighbour: classifies based on a similarity measurement
– Neural networks: partition by non-linear boundaries
– Bayesian network: a probabilistic model
– Support vector machine: can solve problems that are not linearly separable

Simple classification using decision tree
Decision tree classifier
• A decision tree performs classification by constructing a tree from the training instances, with leaves carrying class labels.
– The tree is traversed for each test instance to find a leaf, and the class of the leaf is the predicted class. This is directed knowledge discovery in the sense that there is a specific field whose value we want to predict.
• It is a widely used learning method. It has been applied to classify:
– medical patients by disease
– equipment malfunctions by cause
– loan applicants by likelihood of payment
– accidents by severity

Pros and Cons of decision trees
Cons
- Cannot handle complicated relationships between features
- Simple decision boundaries
- Problems with lots of missing data
Pros
• Reasonable training time
• Fast application
• Easy to interpret
• Easy to implement
• Can handle a large number of features

Why decision tree induction in data mining?
• Relatively fast learning speed (compared with other classification methods)
• Convertible to simple and easy-to-understand if-then-else classification rules
• Classification accuracy comparable with other methods
• Does not require any prior knowledge of the data distribution; works well on noisy data
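The traversal described above (walk from the root to a leaf and return the leaf's class) can be sketched on the Tid 1-15 records from the Illustrating Classification Task slide. The tree below is an assumption: it is hand-built to be consistent with the ten training records, not the output of any induction algorithm named in the slides, and the dictionary layout and the 80K threshold are illustrative choices.

```python
# Hand-built (hypothetical) tree: inner nodes are dicts, leaves are class labels.
income_split = {"attr": "Attrib3", "threshold": 80, "below": "No", "at_or_above": "Yes"}
tree = {
    "attr": "Attrib1",
    "branches": {
        "Yes": "No",
        "No": {
            "attr": "Attrib2",
            "branches": {"Medium": "No", "Small": income_split, "Large": income_split},
        },
    },
}


def predict(node, record):
    """Traverse the tree for one record until a leaf (a class label) is reached."""
    while isinstance(node, dict):
        value = record[node["attr"]]
        if "threshold" in node:  # numeric test on Attrib3 (in thousands)
            node = node["below"] if value < node["threshold"] else node["at_or_above"]
        else:                    # categorical test: follow the matching branch
            node = node["branches"][value]
    return node


def rec(a1, a2, a3):
    return {"Attrib1": a1, "Attrib2": a2, "Attrib3": a3}


# Training records (Tid 1-10) with their known classes, copied from the slide.
training = [
    (rec("Yes", "Large", 125), "No"), (rec("No", "Medium", 100), "No"),
    (rec("No", "Small", 70), "No"), (rec("Yes", "Medium", 120), "No"),
    (rec("No", "Large", 95), "Yes"), (rec("No", "Medium", 60), "No"),
    (rec("Yes", "Large", 220), "No"), (rec("No", "Small", 85), "Yes"),
    (rec("No", "Medium", 75), "No"), (rec("No", "Small", 90), "Yes"),
]
# Test records (Tid 11-15), whose class is unknown in the slide.
test = [
    rec("No", "Small", 55), rec("Yes", "Medium", 80), rec("Yes", "Large", 110),
    rec("No", "Small", 95), rec("No", "Large", 67),
]

fits_training = all(predict(tree, r) == label for r, label in training)
test_predictions = [predict(tree, r) for r in test]
print(fits_training, test_predictions)
```

Each root-to-leaf path reads as one if-then rule, for example: IF Attrib1 = No AND Attrib2 = Small AND Attrib3 >= 80K THEN Class = Yes.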
  • 4. Chapter 5 - Cluster Analysis
What is Cluster Analysis?
• Cluster: a collection of data objects
– Similar to one another within the same cluster
– Dissimilar to the objects in other clusters
• Cluster analysis
– Grouping a set of data objects into clusters
• Clustering is unsupervised classification: there are no predefined classes
• Typical applications
– As a stand-alone tool to get insight into the data distribution
– As a preprocessing step for other algorithms

Clustering (cont.)
Given a set of points, with a notion of distance between points, group the points into some number of clusters, so that members of a cluster are in some sense as close to each other as possible. While data points in the same cluster are similar, those in separate clusters are dissimilar to one another.
[Figure: scatter plot of points forming groups]
• Clustering is a data mining (machine learning) technique that finds similarities between data according to the characteristics found in the data and groups similar data objects into one cluster

Cont.
• Thus Cluster Analysis
– Finding groups of objects such that the objects in a group are similar (or related) to one another and different from (or unrelated to) the objects in other groups
– Inter-cluster distances are maximized
– Intra-cluster distances are minimized

Quality: What Is Good Clustering?
• A good clustering method will produce high-quality clusters with
– high intra-class similarity
– low inter-class similarity
• The quality of a clustering result depends on both the similarity measure used by the method and its implementation
– Key requirement of clustering: a good measure of similarity between instances
• The quality of a clustering method is also measured by its ability to discover some or all of the hidden patterns in the given datasets
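The quality criterion above (small distances inside a cluster, large distances between clusters) can be checked numerically. The sketch below is illustrative only: the two clusters are made-up 2-D points, Euclidean distance is an assumed similarity measure, and the helper names `mean_intra` and `mean_inter` are hypothetical.

```python
from itertools import combinations, product
from math import dist  # Euclidean distance, available from Python 3.8

# Hypothetical clustering of six 2-D points into two groups.
clusters = {
    "A": [(1, 1), (1, 2), (2, 1)],
    "B": [(8, 8), (8, 9), (9, 8)],
}


def mean_intra(clusters):
    """Average distance between pairs of points that share a cluster."""
    pairs = [pair for pts in clusters.values() for pair in combinations(pts, 2)]
    return sum(dist(p, q) for p, q in pairs) / len(pairs)


def mean_inter(clusters):
    """Average distance between pairs of points from different clusters."""
    pairs = [
        pair
        for a, b in combinations(clusters.values(), 2)
        for pair in product(a, b)
    ]
    return sum(dist(p, q) for p, q in pairs) / len(pairs)


intra = mean_intra(clusters)
inter = mean_inter(clusters)
print(f"mean intra-cluster distance: {intra:.2f}")
print(f"mean inter-cluster distance: {inter:.2f}")
```

A clustering in which the intra-cluster figure is much smaller than the inter-cluster figure matches the slide's notion of high intra-class similarity and low inter-class similarity.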
  • 5. Requirements of Clustering in Data Mining
• Scalability
– Highly scalable algorithms are needed for clustering large databases such as a data warehouse (DW)
• Ability to deal with different types of attributes
– Clustering may also be performed on binary, categorical and ordinal data
• Discovery of clusters with arbitrary shape
– Most algorithms tend to find spherical clusters
• Minimal requirements for domain knowledge to determine input parameters
– Clustering results are quite sensitive to the input parameters
– Parameters are often difficult to determine
• Ability to deal with noise and outliers
• Insensitivity to the order of input records
• High dimensionality
– A DW can contain several dimensions
• Incorporation of user-specified constraints
• Interpretability and usability

Example: Clustering Application
• Text/Document Clustering:
– Goal: to find groups of documents that are similar to each other based on the important terms appearing in them.
– Approach:
– Identify frequently occurring terms in each document.
– Form a similarity measure based on the frequencies of different terms and use it to cluster documents.
– Gain: information retrieval can use the clusters to relate a new document or search term to the clustered documents.

Cont.
Applications of Cluster Analysis
– Understanding: group related documents for browsing, group genes and proteins that have similar functionality, or group stocks with similar price fluctuations
– Summarization: reduce the size of large data sets

What is not Cluster Analysis?
• Supervised classification
– Has class label information
• Simple segmentation
– Dividing students into different registration groups alphabetically, by last name
• Results of a query
– Groupings are the result of an external specification

Types of Clusters
• Major types: well-separated clusters and center-based clusters
• Well-Separated Clusters:
– A cluster is a set of points such that any point in a cluster is closer (or more similar) to every other point in the cluster than to any point not in the cluster.
[Figure: 3 well-separated clusters]

Types of Clusters: Center-Based
• Center-based
– A cluster is a set of objects such that an object in a cluster is closer (more similar) to the “center” of its cluster than to the center of any other cluster
– The center of a cluster is often a centroid, the average of all the points in the cluster, or a medoid, the most “representative” point of a cluster
[Figure: 4 center-based clusters]
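The two kinds of cluster center mentioned above can be compared on a small example. This is a sketch under assumptions: the five points are made up, Euclidean distance is assumed, and "most representative" is read as the cluster member with the smallest total distance to the other members, which is one common definition of a medoid.

```python
from math import dist  # Euclidean distance, available from Python 3.8

# Hypothetical cluster: four nearby points and one outlier.
points = [(1, 1), (2, 1), (1, 2), (2, 2), (10, 10)]


def centroid(points):
    """Coordinate-wise average; usually not one of the original points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def medoid(points):
    """The member with the smallest total distance to all other members."""
    return min(points, key=lambda p: sum(dist(p, q) for q in points))


center_avg = centroid(points)
center_rep = medoid(points)
print(center_avg, center_rep)
```

The outlier pulls the centroid away from the dense group, while the medoid stays on an actual data point inside it, which is why medoids are often preferred when a cluster contains noise.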
  • 6. Types of data in cluster analysis
• Variables have different data types
• The differences call for an appropriate distance computation for cluster analysis
• Some of the types of data are:
– Interval-scaled variables
– Binary variables
– Nominal and ordinal variables
– Mixed types

Interval-scaled variables
• These are variables whose values are measured on a continuous scale, such as height, weight and age
• Because the unit of measurement affects the distance between objects, preprocessing is needed to remove the effect of the unit
• This is called standardization

Binary Variables
• A binary variable is a variable that has only two possible values (1 or 0, yes or no, etc.)
– For example: smoker, educated, Ethiopian, IsFemale

Nominal Variables
• A generalization of the binary variable in that it can take more than two states, e.g., red, yellow, blue, green

Ordinal Variables
• An ordinal variable can be discrete or continuous
• Order is important, e.g., rank

Variables of Mixed Types
• A database may contain different types of variables
– symmetric binary, asymmetric binary, nominal, ordinal, interval
• One may
– use a weighted formula to combine their effects
– or preprocess the data so that it fits the technique's requirements

Major Clustering Approaches
• Partitioning clustering approach:
– Construct various partitions and then evaluate them by some criterion, e.g., minimizing the sum of squared errors
– Typical methods:
– distance-based: K-means clustering
– model-based: expectation maximization (EM) clustering
• Hierarchical clustering approach:
– Create a hierarchical decomposition of the set of data (or objects) using some criterion
– Typical methods: agglomerative vs. divisive; single link vs. complete link
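The partitioning approach above, with K-means and the sum-of-squared-errors criterion, can be sketched as follows. This is a minimal illustration rather than a production implementation: the points and the starting centroids are made up, the starting centroids are fixed so the run is repeatable, and the function name `kmeans` is an illustrative choice.

```python
from math import dist  # Euclidean distance, available from Python 3.8


def kmeans(points, centroids, max_iter=100):
    """Alternate assignment and centroid update until the centroids stop moving."""
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    # Sum of squared errors: squared distance of every point to its centroid.
    sse = sum(
        dist(p, centroids[i]) ** 2
        for i, cluster in enumerate(clusters)
        for p in cluster
    )
    return centroids, clusters, sse


# Hypothetical data and starting centroids.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
centroids, clusters, sse = kmeans(points, [(1, 1), (9, 8)])
print(centroids, sse)
```

K-means only finds a local minimum of the sum of squared errors, so in practice the result depends on the starting centroids; interval-scaled variables would normally be standardized first so that no single unit of measurement dominates the distance.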