SlideShare una empresa de Scribd logo
1 de 32
P a g e | 1 
Project Submitted By: 
Name VIDYA NAND JHA 
Stream IT, 3rd Year 
Section B 
Group II 
Class Roll 083 
University Roll 10900211092 
University Reg. NO 11090110208 
MOB 9748062585 
[Type text]
P a g e | 2 
ACKNOWLEDGEMENT 
I would like to express my special thanks of 
gratitude to Mr. CHANDAN BANERJEE, as well as 
our mentor who gave me the golden opportunity to 
do this wonderful seminar on the topic (Data 
Clustering).I came to know about so many new 
things. I am really thankful to them. 
Secondly i would also like to thank my friends who 
helped me a lot in finalizing this seminar within the 
limited time frame. 
.
P a g e | 3 
Table of contents 
ABSTRACT 04 
INTRODUCTION 05 
TYPES OF CLUSTERING 11 
APPLICATION 30 
CONCLUSION 31 
REFERENCES 32
P a g e | 4 
ABSTRACT 
We know that fast retrieval of relevant 
information from the database has been a 
significant issue. Different techniques has been 
developed for this purpose this one of them is 
Data Clustering. 
In this paper data clustering is discussed along 
with different clustering method and analysis, 
Like k-means, Hierarchical, Model-based and 
Density-based clustering. 
We also discussed application of data clustering 
like Data Mining.
P a g e | 5 
INTRODUCTION 
Cluster: Cluster is just an ordered list of object 
with similar characteristics. A cluster is 
comprised of a number of similar objects 
collected or grouped together. Everitt (1974) 
documents some of the following definition of 
cluster. 
1. A cluster is a set of entities which are alike 
and entities from different clusters are not 
alike. 
2. A cluster is an aggregation of point in the 
test space such that the distance between two 
points in the cluster is less than the distance 
between point in the cluster and any point not 
in it.
P a g e | 6 
Data Clustering: The process of grouping a set 
of physical or abstract objects into classes of 
similar objects is called clustering. 
We can also say that data clustering is method 
in which we make cluster of similar object that 
are similar in characteristics 
Some example of clustering 
1. Clustering images based on their perceptual 
similarities. 
2. Clustering webpage based on their content. 
3. Clustering web search results. 
4. Clustering user in social network site based 
on their properties.
P a g e | 7 
5. Image segmentation 
Clustering pixel 
Choice of similarities measure is very important 
for clustering.Similarity is inversely related to 
distance. Different way exits to measure 
distances .
P a g e | 8 
Clustering of objects 
In the above figure we may calculate distance 
through Euclidean method. 
That is Euclidean distance :
P a g e | 9 
Kernelized(non-linear) distance : 
Collection of objects. 
In above figure Kernelized(non-linear) distance 
seems more reasonable.
P a g e | 10 
WHAT IS GOOD CLUSTERING? 
A good clustering is one that achieves 
1) High within-cluster similarity 
2) Low inter-cluster similarity
P a g e | 11 
2. TYPES OF CLUSTERING METHODS 
There are many clustering methods 
available, and each of them may give a 
different grouping of a dataset. The 
choice of a particular method will 
depend on the type of output desired. 
The known performance of method with 
particular types of data, the hardware 
and software facilities available and the 
size of the dataset. In general, clustering 
methods may be divided into two 
categories based on the cluster structure 
which they produce. The non-hierarchical 
methods divide a dataset of 
N objects into M clusters, with or 
without overlap.
P a g e | 12 
2.1 PARTITIONING 
What is K-Means Clustering? 
Simply speaking it is an algorithm to classify or 
to group your objects based on 
attributes/features into K number of group. K is 
positive integer number. The grouping is done 
by minimizing the sum of squares of distances 
between data and the corresponding cluster 
centroid. Thus, the purpose of K-mean 
clustering is to classify the data, 
Example: Suppose we have 4 objects as your 
training data point and each object have 2 
attributes. Each attribute represents coordinate 
of the object 
.
P a g e | 13 
Object Feature 1 (X): 
weight index 
Feature 2 (Y): 
pH 
Medicine A 1 1 
Medicine B 2 1 
Medicine C 4 3 
Medicine D 5 4 
Thus, we also know before hand that these 
objects belong to two groups of medicine 
(cluster 1 and cluster 2). 
The problem now is to determine which 
medicines belong to cluster 1 and which 
medicines belong to the other cluster. Each 
medicine represents one point with two 
components coordinate.
P a g e | 14 
Numerical Example (manual calculation) 
The basic step of k-means clustering is simple. 
In the beginning, we determine number of 
cluster K and we assume the centroid or center 
of these clusters. We can take any random 
objects as the initial centroids or the first K 
objects can also serve as the initial centroids. 
Then the K means algorithm will do the three 
steps below until convergence.
P a g e | 15 
Suppose we have several objects (4 types of 
medicines) and each object have two attributes 
or features as
P a g e | 16 
shown in table below. Our goal is to group 
these objects into K=2 group of medicine based 
on the two 
features (pH and weight index). 
Objects Feature 1 (X): 
weight index 
Feature 2 (Y): 
pH 
Medicine A 1 1 
Medicine B 2 1 
Medicine C 4 3 
Medicine D 5 4 
Each medicine represents one point with two 
features (X, Y) that we can represent it as 
coordinate in a feature space as shown in the 
figure .
P a g e | 17 
1.Initial value of centroids: Suppose we use 
medicine A and medicine B as the first 
centroids. Let 1 c and 2 c denote the coordinate 
of the centroids, then c1 = (1,1) and 2 c = (2,1). 
2. Objects-Centroids distance: we calculate the 
distance between cluster centroid to each 
object. Let ususe Euclidean distance, then we 
have distance matrix at iteration 0 is
P a g e | 18 
c1= (1, 1), group-1 c2= (1, 
1), group-2 
Each column in the distance matrix symbolizes 
the object. The first row of the distance matrix 
Corresponds to the distance of each object to the 
first centroid and the second row is the distance 
of 
each object to the second centroid. For example, 
distance from medicine C = (4, 3) to the first 
centroid 1 c 1 = (1, 1) is =3.61, and its 
distance to the second centroid c 2= (2, 1) is 
=2.83 etc. 
3. Objects clustering: We assign each object 
based on the minimum distance. Thus, medicine 
A is
P a g e | 19 
assigned to group 1, medicine B to group 2, 
medicine C to group 2 and medicine D to group 
2. The element of Group matrix below is 1 if 
and only if the object is assigned to that group 
4. Iteration-1, determine centroids: Knowing the 
members of each group, now we compute the 
new centroid of each group based on these new 
memberships. Group 1 only has one member 
thus the centroid remains in c1 = (1, 1). 
Group 2 now has three members, thus the 
centroid is the average 
Co-ordinate among the 
three members:
P a g e | 20 
. 
This is figure shows that c1 = (1, 1) and
P a g e | 21 
5.Iteration-1, Objects-Centroids distances: The 
next step is to compute the distance of all 
objects to 
the new centroids. Similar to step 2, we have 
distance matrix at iteration 1 is 
6. Iteration-1, Objects clustering: Similar to step 
3, we assign each object based on the minimum
P a g e | 22 
distance. Based on the new distance matrix, we 
move the medicine B to Group 1 while all the 
other objects remain.The Group matrix is show 
7. Iteration 2, determine centroids: Now we 
repeat step 4 to calculate the new centroids 
coordinate based on the clustering of previous 
iteration. Group1 and group 2 both has two 
members, thus the new centroids are 
.
P a g e | 23 
8. Iteration-2, Objects-Centroids distances: 
Repeat step 2 again, we have new distance 
matrix at 
iteration 2 as
P a g e | 24 
9. Iteration-2, Objects clustering: Again, we 
assign each object based on the minimum 
distance 
We obtain result thatG2 = G1 . Comparing the 
grouping of last iteration and this iteration 
reveals that the objects does not move group 
anymore. Thus, the computation of the k-mean 
clustering has reached its stability and no more
P a g e | 25 
iteration is needed. We get the final grouping 
as the results 
Object Feature 1 (X): 
weight 
index 
Feature2(y) 
PH 
Group 
result 
Medicine A 1 1 1 
Medicine B 2 1 1 
Medicine C 4 3 2 
Medicine D 5 4 2 
If the number of data is less than the number of 
cluster then we assign each data as the 
centroid of the cluster. Each centroid will have 
a cluster number. 
2.2HIERARCHICAL : 
Hierarchical clustering is an agglomerative (top 
down) clustering method. As its name suggests, 
the idea of this method is to build a hierarchy
P a g e | 26 
of clusters, showing relations between the 
individual members and merging clusters of 
data based on similarity. 
In the first step of clustering, the algorithm will 
look for the two most similar data points and 
merge them to create a new "pseudo-datapoint", 
which represents the average of the 
two merged datapoints. Each iterative step 
takes the next two closest datapoints (or 
pseduo-datapoints) and merges them. This 
proces is generally continued until there is one 
large cluster containing all the original 
datapoints. 
An example of agglomerative clustering using 
colours. 
The example below demonstrates the 
agglomerative clustering using colours. In the
P a g e | 27 
first stages, the red and orange are combined 
to make reddish-orange. Green and blue are 
similar to each other and are merged to form 
aqua. Next, the reddish-orange and yellow are 
the closest colours, and create a light orange. 
Now we have three remaining colours: light 
orange, the original purple and aqua. The 
hierarchical clustering algorithm is only 
interested in finding the closest relationship at 
each stage regardless of the degree of 
similarity. So in this case, the light orange and 
purple are more similar than the any other 
combination. 
We are now left with the orange-purple 
mixture and aqua. They are now the most 
similar samples left and
P a g e | 28 
so form a final cluster. The resulting tree shows 
the overall similarity of the original samples . 
.
P a g e | 29 
3. APPLICATION 
Data clustering has immense number of 
applications in every field of life. One has to 
cluster a lot of thing on the basis of similarity 
either consciously or unconsciously. So the 
history of data clustering is old as the history of 
mankind. 
In computer field also, use of data clustering 
has its own value. Specially in the field of 
information retrieval data clustering plays an 
important role. Some of the applications are 
listed below.
P a g e | 30 
4. CONCLUSION 
This paper, we try to give the basic concept of 
clustering by first providing the definition and 
clustering and then the definition of some 
related terms. We give some examples to 
elaborate the concept. Then we give different 
approaches to data clustering and also 
discussed some algorithms to implement that 
approaches. The partitioning method and 
hierarchical method of clustering were 
explained. The applications of clustering are 
also discussed with the examples of medical 
images database, data mining using data 
clustering and finally the case study of windows 
NT. So we try to prove the importance of 
clustering in every area of computer science. 
We also try to prove that clustering is not 
something really typical to databases but it has 
its applications in the fields like networking.
P a g e | 31 
5. REFERENCES 
http://members.tripod.com/asim_saeed/paper 
.htm 
http://www-users. 
cs.umn.edu/~kumar/dmbook/dmslides/c 
hap8_basic_cluster_analysis.pdf. 
http://people.revoledu.com/kardi/tutorial/kMe 
an/index.html
P a g e | 32

Más contenido relacionado

La actualidad más candente

K Means Clustering Algorithm | K Means Clustering Example | Machine Learning ...
K Means Clustering Algorithm | K Means Clustering Example | Machine Learning ...K Means Clustering Algorithm | K Means Clustering Example | Machine Learning ...
K Means Clustering Algorithm | K Means Clustering Example | Machine Learning ...Simplilearn
 
Survey on Unsupervised Learning in Datamining
Survey on Unsupervised Learning in DataminingSurvey on Unsupervised Learning in Datamining
Survey on Unsupervised Learning in DataminingIOSR Journals
 
K-MEDOIDS CLUSTERING USING PARTITIONING AROUND MEDOIDS FOR PERFORMING FACE R...
K-MEDOIDS CLUSTERING  USING PARTITIONING AROUND MEDOIDS FOR PERFORMING FACE R...K-MEDOIDS CLUSTERING  USING PARTITIONING AROUND MEDOIDS FOR PERFORMING FACE R...
K-MEDOIDS CLUSTERING USING PARTITIONING AROUND MEDOIDS FOR PERFORMING FACE R...ijscmc
 
Pattern recognition binoy k means clustering
Pattern recognition binoy  k means clusteringPattern recognition binoy  k means clustering
Pattern recognition binoy k means clustering108kaushik
 
Customer Segmentation using Clustering
Customer Segmentation using ClusteringCustomer Segmentation using Clustering
Customer Segmentation using ClusteringDessy Amirudin
 
A COMPARATIVE STUDY ON DISTANCE MEASURING APPROACHES FOR CLUSTERING
A COMPARATIVE STUDY ON DISTANCE MEASURING APPROACHES FOR CLUSTERINGA COMPARATIVE STUDY ON DISTANCE MEASURING APPROACHES FOR CLUSTERING
A COMPARATIVE STUDY ON DISTANCE MEASURING APPROACHES FOR CLUSTERINGIJORCS
 
Statistical Clustering
Statistical ClusteringStatistical Clustering
Statistical Clusteringtim_hare
 
An_Accelerated_Nearest_Neighbor_Search_Method_for_the_K-Means_Clustering_Algo...
An_Accelerated_Nearest_Neighbor_Search_Method_for_the_K-Means_Clustering_Algo...An_Accelerated_Nearest_Neighbor_Search_Method_for_the_K-Means_Clustering_Algo...
An_Accelerated_Nearest_Neighbor_Search_Method_for_the_K-Means_Clustering_Algo...Adam Fausett
 
Unsupervised learning clustering
Unsupervised learning clusteringUnsupervised learning clustering
Unsupervised learning clusteringDr Nisha Arora
 
K means Clustering
K means ClusteringK means Clustering
K means ClusteringEdureka!
 
Optimising Data Using K-Means Clustering Algorithm
Optimising Data Using K-Means Clustering AlgorithmOptimising Data Using K-Means Clustering Algorithm
Optimising Data Using K-Means Clustering AlgorithmIJERA Editor
 
8.clustering algorithm.k means.em algorithm
8.clustering algorithm.k means.em algorithm8.clustering algorithm.k means.em algorithm
8.clustering algorithm.k means.em algorithmLaura Petrosanu
 

La actualidad más candente (20)

K Means Clustering Algorithm | K Means Clustering Example | Machine Learning ...
K Means Clustering Algorithm | K Means Clustering Example | Machine Learning ...K Means Clustering Algorithm | K Means Clustering Example | Machine Learning ...
K Means Clustering Algorithm | K Means Clustering Example | Machine Learning ...
 
K means report
K means reportK means report
K means report
 
Survey on Unsupervised Learning in Datamining
Survey on Unsupervised Learning in DataminingSurvey on Unsupervised Learning in Datamining
Survey on Unsupervised Learning in Datamining
 
K-MEDOIDS CLUSTERING USING PARTITIONING AROUND MEDOIDS FOR PERFORMING FACE R...
K-MEDOIDS CLUSTERING  USING PARTITIONING AROUND MEDOIDS FOR PERFORMING FACE R...K-MEDOIDS CLUSTERING  USING PARTITIONING AROUND MEDOIDS FOR PERFORMING FACE R...
K-MEDOIDS CLUSTERING USING PARTITIONING AROUND MEDOIDS FOR PERFORMING FACE R...
 
Pattern recognition binoy k means clustering
Pattern recognition binoy  k means clusteringPattern recognition binoy  k means clustering
Pattern recognition binoy k means clustering
 
Customer Segmentation using Clustering
Customer Segmentation using ClusteringCustomer Segmentation using Clustering
Customer Segmentation using Clustering
 
Ip3514921495
Ip3514921495Ip3514921495
Ip3514921495
 
A COMPARATIVE STUDY ON DISTANCE MEASURING APPROACHES FOR CLUSTERING
A COMPARATIVE STUDY ON DISTANCE MEASURING APPROACHES FOR CLUSTERINGA COMPARATIVE STUDY ON DISTANCE MEASURING APPROACHES FOR CLUSTERING
A COMPARATIVE STUDY ON DISTANCE MEASURING APPROACHES FOR CLUSTERING
 
Statistical Clustering
Statistical ClusteringStatistical Clustering
Statistical Clustering
 
Cluster analysis
Cluster analysisCluster analysis
Cluster analysis
 
Clustering
ClusteringClustering
Clustering
 
An_Accelerated_Nearest_Neighbor_Search_Method_for_the_K-Means_Clustering_Algo...
An_Accelerated_Nearest_Neighbor_Search_Method_for_the_K-Means_Clustering_Algo...An_Accelerated_Nearest_Neighbor_Search_Method_for_the_K-Means_Clustering_Algo...
An_Accelerated_Nearest_Neighbor_Search_Method_for_the_K-Means_Clustering_Algo...
 
Cluster analysis
Cluster analysisCluster analysis
Cluster analysis
 
Unsupervised learning clustering
Unsupervised learning clusteringUnsupervised learning clustering
Unsupervised learning clustering
 
K means Clustering
K means ClusteringK means Clustering
K means Clustering
 
Optimising Data Using K-Means Clustering Algorithm
Optimising Data Using K-Means Clustering AlgorithmOptimising Data Using K-Means Clustering Algorithm
Optimising Data Using K-Means Clustering Algorithm
 
Hierarchical Clustering
Hierarchical ClusteringHierarchical Clustering
Hierarchical Clustering
 
47 292-298
47 292-29847 292-298
47 292-298
 
8.clustering algorithm.k means.em algorithm
8.clustering algorithm.k means.em algorithm8.clustering algorithm.k means.em algorithm
8.clustering algorithm.k means.em algorithm
 
Cluster analysis
Cluster analysisCluster analysis
Cluster analysis
 

Destacado (14)

Geo Siberia-2016
Geo Siberia-2016Geo Siberia-2016
Geo Siberia-2016
 
Prestige high fields
Prestige high fieldsPrestige high fields
Prestige high fields
 
Prestige song of the south
Prestige song of the south Prestige song of the south
Prestige song of the south
 
Adriana Servín Loyola...biografia
Adriana Servín Loyola...biografiaAdriana Servín Loyola...biografia
Adriana Servín Loyola...biografia
 
Prestige sunrise park
Prestige sunrise parkPrestige sunrise park
Prestige sunrise park
 
Prestige augusta golf village
Prestige augusta golf village Prestige augusta golf village
Prestige augusta golf village
 
Prestige Lakeside Habitat
Prestige Lakeside HabitatPrestige Lakeside Habitat
Prestige Lakeside Habitat
 
Prestige noth point
Prestige noth point Prestige noth point
Prestige noth point
 
R A Hills - Personal Profile
R A Hills - Personal ProfileR A Hills - Personal Profile
R A Hills - Personal Profile
 
Estrucctura del proyecto
Estrucctura del proyectoEstrucctura del proyecto
Estrucctura del proyecto
 
Prestige royal gardens
Prestige royal gardens Prestige royal gardens
Prestige royal gardens
 
Gestão do Conhecimento
Gestão do Conhecimento Gestão do Conhecimento
Gestão do Conhecimento
 
cephalosporin
cephalosporincephalosporin
cephalosporin
 
Dev Ops on AWS with PowerShell (PowerShell Conference Asia 2016)
Dev Ops on AWS with PowerShell (PowerShell Conference Asia 2016)Dev Ops on AWS with PowerShell (PowerShell Conference Asia 2016)
Dev Ops on AWS with PowerShell (PowerShell Conference Asia 2016)
 

Similar a Data Clusterng

AI-Lec20 Clustering I - Kmean.pptx
AI-Lec20 Clustering I - Kmean.pptxAI-Lec20 Clustering I - Kmean.pptx
AI-Lec20 Clustering I - Kmean.pptxSyed Ejaz
 
Unsupervised learning Algorithms and Assumptions
Unsupervised learning Algorithms and AssumptionsUnsupervised learning Algorithms and Assumptions
Unsupervised learning Algorithms and Assumptionsrefedey275
 
k-mean-clustering.ppt
k-mean-clustering.pptk-mean-clustering.ppt
k-mean-clustering.pptRanimeLoutar
 
k-mean-Clustering impact on AI using DSS
k-mean-Clustering impact on AI using DSSk-mean-Clustering impact on AI using DSS
k-mean-Clustering impact on AI using DSSMarkNaguibElAbd
 
K mean-clustering
K mean-clusteringK mean-clustering
K mean-clusteringPVP College
 
K mean-clustering algorithm
K mean-clustering algorithmK mean-clustering algorithm
K mean-clustering algorithmparry prabhu
 
A Comprehensive Overview of Clustering Algorithms in Pattern Recognition
A Comprehensive Overview of Clustering Algorithms in Pattern  RecognitionA Comprehensive Overview of Clustering Algorithms in Pattern  Recognition
A Comprehensive Overview of Clustering Algorithms in Pattern RecognitionIOSR Journals
 
Premeditated Initial Points for K-Means Clustering
Premeditated Initial Points for K-Means ClusteringPremeditated Initial Points for K-Means Clustering
Premeditated Initial Points for K-Means ClusteringIJCSIS Research Publications
 
K means Clustering - algorithm to cluster n objects
K means Clustering - algorithm to cluster n objectsK means Clustering - algorithm to cluster n objects
K means Clustering - algorithm to cluster n objectsVoidVampire
 
Lecture_3_k-mean-clustering.ppt
Lecture_3_k-mean-clustering.pptLecture_3_k-mean-clustering.ppt
Lecture_3_k-mean-clustering.pptSyedNahin1
 
The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)theijes
 
L4 cluster analysis NWU 4.3 Graphics Course
L4 cluster analysis NWU 4.3 Graphics CourseL4 cluster analysis NWU 4.3 Graphics Course
L4 cluster analysis NWU 4.3 Graphics CourseMohaiminur Rahman
 
Parametric Comparison of K-means and Adaptive K-means Clustering Performance ...
Parametric Comparison of K-means and Adaptive K-means Clustering Performance ...Parametric Comparison of K-means and Adaptive K-means Clustering Performance ...
Parametric Comparison of K-means and Adaptive K-means Clustering Performance ...IJECEIAES
 
CSA 3702 machine learning module 3
CSA 3702 machine learning module 3CSA 3702 machine learning module 3
CSA 3702 machine learning module 3Nandhini S
 
MK-Prototypes: A Novel Algorithm for Clustering Mixed Type Data
MK-Prototypes: A Novel Algorithm for Clustering Mixed Type  Data MK-Prototypes: A Novel Algorithm for Clustering Mixed Type  Data
MK-Prototypes: A Novel Algorithm for Clustering Mixed Type Data IJMER
 
A Mathematical Programming Approach for Selection of Variables in Cluster Ana...
A Mathematical Programming Approach for Selection of Variables in Cluster Ana...A Mathematical Programming Approach for Selection of Variables in Cluster Ana...
A Mathematical Programming Approach for Selection of Variables in Cluster Ana...IJRES Journal
 

Similar a Data Clusterng (20)

AI-Lec20 Clustering I - Kmean.pptx
AI-Lec20 Clustering I - Kmean.pptxAI-Lec20 Clustering I - Kmean.pptx
AI-Lec20 Clustering I - Kmean.pptx
 
K mean-clustering
K mean-clusteringK mean-clustering
K mean-clustering
 
Unsupervised learning Algorithms and Assumptions
Unsupervised learning Algorithms and AssumptionsUnsupervised learning Algorithms and Assumptions
Unsupervised learning Algorithms and Assumptions
 
k-mean-clustering.ppt
k-mean-clustering.pptk-mean-clustering.ppt
k-mean-clustering.ppt
 
k-mean-Clustering impact on AI using DSS
k-mean-Clustering impact on AI using DSSk-mean-Clustering impact on AI using DSS
k-mean-Clustering impact on AI using DSS
 
K mean-clustering
K mean-clusteringK mean-clustering
K mean-clustering
 
K mean-clustering algorithm
K mean-clustering algorithmK mean-clustering algorithm
K mean-clustering algorithm
 
Neural nw k means
Neural nw k meansNeural nw k means
Neural nw k means
 
k-mean-clustering.pdf
k-mean-clustering.pdfk-mean-clustering.pdf
k-mean-clustering.pdf
 
A Comprehensive Overview of Clustering Algorithms in Pattern Recognition
A Comprehensive Overview of Clustering Algorithms in Pattern  RecognitionA Comprehensive Overview of Clustering Algorithms in Pattern  Recognition
A Comprehensive Overview of Clustering Algorithms in Pattern Recognition
 
Premeditated Initial Points for K-Means Clustering
Premeditated Initial Points for K-Means ClusteringPremeditated Initial Points for K-Means Clustering
Premeditated Initial Points for K-Means Clustering
 
K means Clustering - algorithm to cluster n objects
K means Clustering - algorithm to cluster n objectsK means Clustering - algorithm to cluster n objects
K means Clustering - algorithm to cluster n objects
 
Lecture_3_k-mean-clustering.ppt
Lecture_3_k-mean-clustering.pptLecture_3_k-mean-clustering.ppt
Lecture_3_k-mean-clustering.ppt
 
The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)
 
L4 cluster analysis NWU 4.3 Graphics Course
L4 cluster analysis NWU 4.3 Graphics CourseL4 cluster analysis NWU 4.3 Graphics Course
L4 cluster analysis NWU 4.3 Graphics Course
 
Parametric Comparison of K-means and Adaptive K-means Clustering Performance ...
Parametric Comparison of K-means and Adaptive K-means Clustering Performance ...Parametric Comparison of K-means and Adaptive K-means Clustering Performance ...
Parametric Comparison of K-means and Adaptive K-means Clustering Performance ...
 
Master's Thesis Presentation
Master's Thesis PresentationMaster's Thesis Presentation
Master's Thesis Presentation
 
CSA 3702 machine learning module 3
CSA 3702 machine learning module 3CSA 3702 machine learning module 3
CSA 3702 machine learning module 3
 
MK-Prototypes: A Novel Algorithm for Clustering Mixed Type Data
MK-Prototypes: A Novel Algorithm for Clustering Mixed Type  Data MK-Prototypes: A Novel Algorithm for Clustering Mixed Type  Data
MK-Prototypes: A Novel Algorithm for Clustering Mixed Type Data
 
A Mathematical Programming Approach for Selection of Variables in Cluster Ana...
A Mathematical Programming Approach for Selection of Variables in Cluster Ana...A Mathematical Programming Approach for Selection of Variables in Cluster Ana...
A Mathematical Programming Approach for Selection of Variables in Cluster Ana...
 

Último

UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)Dr SOUNDIRARAJ N
 
Main Memory Management in Operating System
Main Memory Management in Operating SystemMain Memory Management in Operating System
Main Memory Management in Operating SystemRashmi Bhat
 
Correctly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleCorrectly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleAlluxio, Inc.
 
Unit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfg
Unit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfgUnit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfg
Unit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfgsaravananr517913
 
11. Properties of Liquid Fuels in Energy Engineering.pdf
11. Properties of Liquid Fuels in Energy Engineering.pdf11. Properties of Liquid Fuels in Energy Engineering.pdf
11. Properties of Liquid Fuels in Energy Engineering.pdfHafizMudaserAhmad
 
Class 1 | NFPA 72 | Overview Fire Alarm System
Class 1 | NFPA 72 | Overview Fire Alarm SystemClass 1 | NFPA 72 | Overview Fire Alarm System
Class 1 | NFPA 72 | Overview Fire Alarm Systemirfanmechengr
 
home automation using Arduino by Aditya Prasad
home automation using Arduino by Aditya Prasadhome automation using Arduino by Aditya Prasad
home automation using Arduino by Aditya Prasadaditya806802
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024Mark Billinghurst
 
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AIabhishek36461
 
Earthing details of Electrical Substation
Earthing details of Electrical SubstationEarthing details of Electrical Substation
Earthing details of Electrical Substationstephanwindworld
 
Arduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptArduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptSAURABHKUMAR892774
 
System Simulation and Modelling with types and Event Scheduling
System Simulation and Modelling with types and Event SchedulingSystem Simulation and Modelling with types and Event Scheduling
System Simulation and Modelling with types and Event SchedulingBootNeck1
 
Ch10-Global Supply Chain - Cadena de Suministro.pdf
Ch10-Global Supply Chain - Cadena de Suministro.pdfCh10-Global Supply Chain - Cadena de Suministro.pdf
Ch10-Global Supply Chain - Cadena de Suministro.pdfChristianCDAM
 
Input Output Management in Operating System
Input Output Management in Operating SystemInput Output Management in Operating System
Input Output Management in Operating SystemRashmi Bhat
 
Risk Management in Engineering Construction Project
Risk Management in Engineering Construction ProjectRisk Management in Engineering Construction Project
Risk Management in Engineering Construction ProjectErbil Polytechnic University
 
BSNL Internship Training presentation.pptx
BSNL Internship Training presentation.pptxBSNL Internship Training presentation.pptx
BSNL Internship Training presentation.pptxNiranjanYadav41
 
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTIONTHE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTIONjhunlian
 
National Level Hackathon Participation Certificate.pdf
National Level Hackathon Participation Certificate.pdfNational Level Hackathon Participation Certificate.pdf
National Level Hackathon Participation Certificate.pdfRajuKanojiya4
 

Último (20)

UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
 
Main Memory Management in Operating System
Main Memory Management in Operating SystemMain Memory Management in Operating System
Main Memory Management in Operating System
 
Correctly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleCorrectly Loading Incremental Data at Scale
Correctly Loading Incremental Data at Scale
 
Unit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfg
Unit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfgUnit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfg
Unit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfg
 
11. Properties of Liquid Fuels in Energy Engineering.pdf
11. Properties of Liquid Fuels in Energy Engineering.pdf11. Properties of Liquid Fuels in Energy Engineering.pdf
11. Properties of Liquid Fuels in Energy Engineering.pdf
 
Class 1 | NFPA 72 | Overview Fire Alarm System
Class 1 | NFPA 72 | Overview Fire Alarm SystemClass 1 | NFPA 72 | Overview Fire Alarm System
Class 1 | NFPA 72 | Overview Fire Alarm System
 
home automation using Arduino by Aditya Prasad
home automation using Arduino by Aditya Prasadhome automation using Arduino by Aditya Prasad
home automation using Arduino by Aditya Prasad
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024
 
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AI
 
Earthing details of Electrical Substation
Earthing details of Electrical SubstationEarthing details of Electrical Substation
Earthing details of Electrical Substation
 
Design and analysis of solar grass cutter.pdf
Design and analysis of solar grass cutter.pdfDesign and analysis of solar grass cutter.pdf
Design and analysis of solar grass cutter.pdf
 
Arduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptArduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.ppt
 
System Simulation and Modelling with types and Event Scheduling
System Simulation and Modelling with types and Event SchedulingSystem Simulation and Modelling with types and Event Scheduling
System Simulation and Modelling with types and Event Scheduling
 
Ch10-Global Supply Chain - Cadena de Suministro.pdf
Ch10-Global Supply Chain - Cadena de Suministro.pdfCh10-Global Supply Chain - Cadena de Suministro.pdf
Ch10-Global Supply Chain - Cadena de Suministro.pdf
 
Input Output Management in Operating System
Input Output Management in Operating SystemInput Output Management in Operating System
Input Output Management in Operating System
 
Risk Management in Engineering Construction Project
Risk Management in Engineering Construction ProjectRisk Management in Engineering Construction Project
Risk Management in Engineering Construction Project
 
BSNL Internship Training presentation.pptx
BSNL Internship Training presentation.pptxBSNL Internship Training presentation.pptx
BSNL Internship Training presentation.pptx
 
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTIONTHE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
 
Designing pile caps according to ACI 318-19.pptx
Designing pile caps according to ACI 318-19.pptxDesigning pile caps according to ACI 318-19.pptx
Designing pile caps according to ACI 318-19.pptx
 
National Level Hackathon Participation Certificate.pdf
National Level Hackathon Participation Certificate.pdfNational Level Hackathon Participation Certificate.pdf
National Level Hackathon Participation Certificate.pdf
 

Data Clusterng

  • 1. P a g e | 1 Project Submitted By: Name VIDYA NAND JHA Stream IT, 3rd Year Section B Group II Class Roll 083 University Roll 10900211092 University Reg. NO 11090110208 MOB 9748062585 [Type text]
  • 2. P a g e | 2 ACKNOWLEDGEMENT I would like to express my special thanks of gratitude to Mr. CHANDAN BANERJEE, as well as our mentor who gave me the golden opportunity to do this wonderful seminar on the topic (Data Clustering).I came to know about so many new things. I am really thankful to them. Secondly i would also like to thank my friends who helped me a lot in finalizing this seminar within the limited time frame. .
  • 3. P a g e | 3 Table of contents ABSTRACT 04 INTRODUCTION 05 TYPES OF CLUSTERING 11 APPLICATION 30 CONCLUSION 31 REFERENCES 32
  • 4. P a g e | 4 ABSTRACT We know that fast retrieval of relevant information from the database has been a significant issue. Different techniques has been developed for this purpose this one of them is Data Clustering. In this paper data clustering is discussed along with different clustering method and analysis, Like k-means, Hierarchical, Model-based and Density-based clustering. We also discussed application of data clustering like Data Mining.
  • 5. P a g e | 5 INTRODUCTION Cluster: Cluster is just an ordered list of object with similar characteristics. A cluster is comprised of a number of similar objects collected or grouped together. Everitt (1974) documents some of the following definition of cluster. 1. A cluster is a set of entities which are alike and entities from different clusters are not alike. 2. A cluster is an aggregation of point in the test space such that the distance between two points in the cluster is less than the distance between point in the cluster and any point not in it.
  • 6. P a g e | 6 Data Clustering: The process of grouping a set of physical or abstract objects into classes of similar objects is called clustering. We can also say that data clustering is method in which we make cluster of similar object that are similar in characteristics Some example of clustering 1. Clustering images based on their perceptual similarities. 2. Clustering webpage based on their content. 3. Clustering web search results. 4. Clustering user in social network site based on their properties.
  • 7. P a g e | 7 5. Image segmentation Clustering pixel Choice of similarities measure is very important for clustering.Similarity is inversely related to distance. Different way exits to measure distances .
  • 8. P a g e | 8 Clustering of objects In the above figure we may calculate distance through Euclidean method. That is Euclidean distance :
  • 9. P a g e | 9 Kernelized(non-linear) distance : Collection of objects. In above figure Kernelized(non-linear) distance seems more reasonable.
  • 10. P a g e | 10 WHAT IS GOOD CLUSTERING? A good clustering is one that achieves 1) High within-cluster similarity 2) Low inter-cluster similarity
  • 11. P a g e | 11 2. TYPES OF CLUSTERING METHODS There are many clustering methods available, and each of them may give a different grouping of a dataset. The choice of a particular method will depend on the type of output desired. The known performance of method with particular types of data, the hardware and software facilities available and the size of the dataset. In general, clustering methods may be divided into two categories based on the cluster structure which they produce. The non-hierarchical methods divide a dataset of N objects into M clusters, with or without overlap.
  • 12. P a g e | 12 2.1 PARTITIONING What is K-Means Clustering? Simply speaking it is an algorithm to classify or to group your objects based on attributes/features into K number of group. K is positive integer number. The grouping is done by minimizing the sum of squares of distances between data and the corresponding cluster centroid. Thus, the purpose of K-mean clustering is to classify the data, Example: Suppose we have 4 objects as your training data point and each object have 2 attributes. Each attribute represents coordinate of the object .
  • 13. P a g e | 13 Object Feature 1 (X): weight index Feature 2 (Y): pH Medicine A 1 1 Medicine B 2 1 Medicine C 4 3 Medicine D 5 4 Thus, we also know before hand that these objects belong to two groups of medicine (cluster 1 and cluster 2). The problem now is to determine which medicines belong to cluster 1 and which medicines belong to the other cluster. Each medicine represents one point with two components coordinate.
  • 14. P a g e | 14 Numerical Example (manual calculation) The basic step of k-means clustering is simple. In the beginning, we determine number of cluster K and we assume the centroid or center of these clusters. We can take any random objects as the initial centroids or the first K objects can also serve as the initial centroids. Then the K means algorithm will do the three steps below until convergence.
  • 15. P a g e | 15 Suppose we have several objects (4 types of medicines) and each object have two attributes or features as
  • 16. P a g e | 16 shown in table below. Our goal is to group these objects into K=2 group of medicine based on the two features (pH and weight index). Objects Feature 1 (X): weight index Feature 2 (Y): pH Medicine A 1 1 Medicine B 2 1 Medicine C 4 3 Medicine D 5 4 Each medicine represents one point with two features (X, Y) that we can represent it as coordinate in a feature space as shown in the figure .
  • 17. P a g e | 17 1.Initial value of centroids: Suppose we use medicine A and medicine B as the first centroids. Let 1 c and 2 c denote the coordinate of the centroids, then c1 = (1,1) and 2 c = (2,1). 2. Objects-Centroids distance: we calculate the distance between cluster centroid to each object. Let ususe Euclidean distance, then we have distance matrix at iteration 0 is
  • 18. P a g e | 18 c1= (1, 1), group-1 c2= (1, 1), group-2 Each column in the distance matrix symbolizes the object. The first row of the distance matrix Corresponds to the distance of each object to the first centroid and the second row is the distance of each object to the second centroid. For example, distance from medicine C = (4, 3) to the first centroid 1 c 1 = (1, 1) is =3.61, and its distance to the second centroid c 2= (2, 1) is =2.83 etc. 3. Objects clustering: We assign each object based on the minimum distance. Thus, medicine A is
  • 19. P a g e | 19 assigned to group 1, medicine B to group 2, medicine C to group 2 and medicine D to group 2. The element of Group matrix below is 1 if and only if the object is assigned to that group 4. Iteration-1, determine centroids: Knowing the members of each group, now we compute the new centroid of each group based on these new memberships. Group 1 only has one member thus the centroid remains in c1 = (1, 1). Group 2 now has three members, thus the centroid is the average Co-ordinate among the three members:
  • 20. P a g e | 20 . This is figure shows that c1 = (1, 1) and
  • 21. P a g e | 21 5.Iteration-1, Objects-Centroids distances: The next step is to compute the distance of all objects to the new centroids. Similar to step 2, we have distance matrix at iteration 1 is 6. Iteration-1, Objects clustering: Similar to step 3, we assign each object based on the minimum
  • 22. P a g e | 22 distance. Based on the new distance matrix, we move the medicine B to Group 1 while all the other objects remain.The Group matrix is show 7. Iteration 2, determine centroids: Now we repeat step 4 to calculate the new centroids coordinate based on the clustering of previous iteration. Group1 and group 2 both has two members, thus the new centroids are .
  • 23. P a g e | 23 8. Iteration-2, Objects-Centroids distances: Repeat step 2 again, we have new distance matrix at iteration 2 as
  • 24. P a g e | 24 9. Iteration-2, Objects clustering: Again, we assign each object based on the minimum distance We obtain result thatG2 = G1 . Comparing the grouping of last iteration and this iteration reveals that the objects does not move group anymore. Thus, the computation of the k-mean clustering has reached its stability and no more
  • 25. P a g e | 25 iteration is needed. We get the final grouping as the results Object Feature 1 (X): weight index Feature2(y) PH Group result Medicine A 1 1 1 Medicine B 2 1 1 Medicine C 4 3 2 Medicine D 5 4 2 If the number of data is less than the number of cluster then we assign each data as the centroid of the cluster. Each centroid will have a cluster number. 2.2HIERARCHICAL : Hierarchical clustering is an agglomerative (top down) clustering method. As its name suggests, the idea of this method is to build a hierarchy
  • 26. P a g e | 26 of clusters, showing relations between the individual members and merging clusters of data based on similarity. In the first step of clustering, the algorithm will look for the two most similar data points and merge them to create a new "pseudo-datapoint", which represents the average of the two merged datapoints. Each iterative step takes the next two closest datapoints (or pseduo-datapoints) and merges them. This proces is generally continued until there is one large cluster containing all the original datapoints. An example of agglomerative clustering using colours. The example below demonstrates the agglomerative clustering using colours. In the
  • 27. P a g e | 27 first stages, the red and orange are combined to make reddish-orange. Green and blue are similar to each other and are merged to form aqua. Next, the reddish-orange and yellow are the closest colours, and create a light orange. Now we have three remaining colours: light orange, the original purple and aqua. The hierarchical clustering algorithm is only interested in finding the closest relationship at each stage regardless of the degree of similarity. So in this case, the light orange and purple are more similar than the any other combination. We are now left with the orange-purple mixture and aqua. They are now the most similar samples left and
  • 28. P a g e | 28 so form a final cluster. The resulting tree shows the overall similarity of the original samples . .
  • 29. P a g e | 29 3. APPLICATION Data clustering has immense number of applications in every field of life. One has to cluster a lot of thing on the basis of similarity either consciously or unconsciously. So the history of data clustering is old as the history of mankind. In computer field also, use of data clustering has its own value. Specially in the field of information retrieval data clustering plays an important role. Some of the applications are listed below.
  • 30. P a g e | 30 4. CONCLUSION This paper, we try to give the basic concept of clustering by first providing the definition and clustering and then the definition of some related terms. We give some examples to elaborate the concept. Then we give different approaches to data clustering and also discussed some algorithms to implement that approaches. The partitioning method and hierarchical method of clustering were explained. The applications of clustering are also discussed with the examples of medical images database, data mining using data clustering and finally the case study of windows NT. So we try to prove the importance of clustering in every area of computer science. We also try to prove that clustering is not something really typical to databases but it has its applications in the fields like networking.
  • 31. P a g e | 31 5. REFERENCES http://members.tripod.com/asim_saeed/paper .htm http://www-users. cs.umn.edu/~kumar/dmbook/dmslides/c hap8_basic_cluster_analysis.pdf. http://people.revoledu.com/kardi/tutorial/kMe an/index.html
  • 32. P a g e | 32