ACEEE Int. J. on Network Security, Vol. 02, No. 03, July 2011



                An Iterative Improved k-means Clustering

                     Madhuri A. Dalal 1, Nareshkumar D. Harale 2, Umesh L. Kulkarni 3

  1 Student of Computer, M.G.M's College of Engineering, Kamothe, Navi Mumbai, India, madhuri_dalal25@yahoo.co.in
  2 Faculty of Computer, M.G.M's College of Engineering, Kamothe, Navi Mumbai, India, nareshkumar.d.harale@gmail.com
  3 Faculty of Information, Konkan Gyanpeeth College of Engineering, Karjat, India, kumeshl@rediffmail.com

Abstract: Clustering is an unsupervised data mining (machine learning) technique used to place data elements into related groups without advance knowledge of the group definitions. One of the most popular and widely studied clustering methods that minimize the clustering error for points in Euclidean space is K-means clustering. However, the k-means method converges to one of many local minima, and it is known that the final results depend on the initial starting points (means). In this research paper, we introduce and test an improved algorithm to start the k-means with good starting points (means). The good initial starting points allow k-means to converge to a better local minimum, and the number of iterations over the full dataset is also decreased. Experimental results show that good initial starting points lead to a good solution while reducing the number of iterations needed to form the clusters.

Keywords: data mining, clustering, k-means clustering, clustering algorithms.

                     I. INTRODUCTION

    Clustering is a data mining technique that separates your data into groups whose members belong together. This is similar to assigning animals and plants into families where the members are alike. Clustering does not require prior knowledge of the groups that are formed or of the members who must belong to them [17, 5]. Representing the data by fewer clusters necessarily loses certain fine details, but achieves simplification: it models the data by its clusters. Data modeling puts clustering in a historical perspective rooted in mathematics, statistics, and numerical analysis. From a machine learning perspective, clusters correspond to hidden patterns, the search for clusters is unsupervised learning, and the resulting system represents a data concept. From a practical perspective, clustering plays an outstanding role in data mining applications such as scientific data exploration, information retrieval and text mining, spatial database applications, Web analysis, CRM, marketing, medical diagnostics, computational biology, and many others [1, 17]. Data clustering is under vigorous development and is applied to many application areas including business, biology, medicine, chemistry, data mining and knowledge discovery [4], [7], data compression and vector quantization [5], pattern recognition and pattern classification [8], neural networks, artificial intelligence, and statistics. Owing to the huge amounts of data collected in databases, cluster analysis has recently become a highly active topic in data mining research [2]. The research has focused on finding efficient and effective cluster analysis in large databases. Classifying objects according to similarities is the base for much of science. Organizing objects into sensible groupings is one of the most fundamental modes of understanding and learning. Cluster analysis is the study of algorithms for grouping or classifying objects [8]. So a cluster is comprised of a number of similar objects collected or grouped together [5][7].
There are two goals of clustering algorithms:
    (1) determining good clusters, and
    (2) doing so efficiently.
Clustering is particularly applied when there is a need to partition the instances into natural groups, but predicting the class of objects is almost impossible. There are a large number of approaches to the clustering problem, including optimization based models that employ mathematical programming for developing efficient and meaningful clustering schemes. It has been widely emphasized that clustering and optimization may help each other in several aspects, leading to better methods and algorithms with increased accuracy and efficiency. Exact and heuristic mathematical programming based clustering algorithms have been proposed in recent years. However, most of these algorithms suffer from scalability as the size and the dimension of the data set increase. Another important discussion in clustering is the definition of the best partitioning of a data set, which is difficult to predict since it is a relative and subjective topic. Different models may result in different solutions subject to the selected clustering criteria and the developed clustering model [4][6][13]. For cluster analysis to work efficiently and effectively, as many works in the literature have presented, there are the following typical requirements of clustering in data mining:
1. Scalability:
That is to say, an efficient and effective clustering method should not only work well on small data sets, but also on large databases containing millions of objects.
2. Ability to deal with different types of attributes:
An efficient and effective clustering method is required to cluster various types of data, not only numerical, but also binary, categorical, and ordinal data, or mixtures of these data types.
3. Discovery of clusters with arbitrary shape:
A cluster could be of any shape. It is important to develop algorithms that can detect clusters of arbitrary shape.
4. Minimal requirements for domain knowledge to determine input parameters:
Some algorithms require users to input certain parameters for cluster analysis, but these parameters are hard to determine.
5. Ability to deal with noisy data:
A good clustering algorithm is required to be independent of the influence of noise.
6. Insensitivity to the order of input records:
It is important to develop algorithms that are insensitive to the order of input.
7. High dimensionality:
The ability to handle high-dimensional data is important for a good algorithm.
Several clustering algorithms have been proposed. These algorithms can be broadly classified into hierarchical and partitioning clustering algorithms [8]. Hierarchical algorithms decompose a database D of n objects into several levels of nested partitionings (clusterings), represented by a dendrogram (tree). There are two types of hierarchical algorithms: an agglomerative algorithm builds the tree from the leaf nodes up, whereas a divisive algorithm builds the tree from the top down. Partitioning algorithms construct a single partition of a database D of n objects into a set of k clusters, such that the objects in a cluster are more similar to each other than to objects in different clusters. The k-means clustering algorithm is the most commonly used partitioning algorithm [8][12] because it can be easily implemented and converges quickly to a local minimum. However, this local minimum depends on the initial starting means. In this paper, we introduce an efficient improved method to obtain good initial starting means, so that the final result is better than that obtained with randomly selected initial starting means. How to get good initial starting means becomes an important operational objective [1][14]. This paper is organized as follows: Section 2 introduces k-means clustering and our proposed improved iterative k-means clustering method. Section 3 presents experimental results comparing both algorithms. Section 4 concludes the paper.
                 II. K-MEANS CLUSTERING
A. k-means Clustering:
    There are many algorithms for clustering datasets. The k-means clustering is the most popular method used to divide n patterns {x1, ..., xn} in d-dimensional space into k clusters [8]. The result is a set of k centers, each of which is located at the centroid of the partitioned dataset. This algorithm can be summarized in the following steps:
1. Initialization: Select a set of k starting points {mj}, j = 1, 2, ..., k. The selection may be done in a random manner or according to some heuristic.
2. Distance calculation: For each pattern xi, 1 <= i <= n, compute its Euclidean distance to each cluster centroid mj, 1 <= j <= k, then find the closest cluster centroid mj and assign the object xi to it.
3. Centroid recalculation: For each cluster j, 1 <= j <= k, recompute the cluster centroid mj as the average of the data points assigned to it.
4. Convergence condition: Repeat steps 2 and 3 until convergence.
Choosing a proper number of clusters k is a domain dependent problem. To resolve this, some researchers have proposed methods that perform k-clustering for various numbers of clusters and employ certain criteria for selecting the most suitable value of k [15] and [16]. Several variants of the k-means algorithm have been proposed. Their purpose is to improve efficiency or find better clusters. Improved efficiency is usually accomplished by either reducing the number of iterations to reach final convergence or reducing the total number of distance calculations. The k-means algorithm randomly selects k initial cluster centers from the original dataset. Then, the algorithm will converge to the actual cluster centers after several iterations. Therefore, choosing a good set of initial cluster centers is very important for the algorithm. However, it is difficult to select a good set of initial cluster centers randomly.
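For concreteness, a minimal NumPy sketch of the baseline procedure in steps 1-4 above follows; the function names, tolerance, and iteration cap are illustrative choices and are not taken from the paper.

```python
import numpy as np

def kmeans(data, k, initial_centers, max_iter=100, tol=1e-9):
    """Basic k-means loop (steps 2-4 above); returns centers, labels and iteration count."""
    centers = np.asarray(initial_centers, dtype=float).copy()
    for iteration in range(1, max_iter + 1):
        # Step 2: assign each pattern to its closest centroid (Euclidean distance).
        distances = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Step 3: recompute each centroid as the mean of the points assigned to it.
        new_centers = np.array([
            data[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        # Step 4: stop when the centroids no longer move.
        if np.allclose(new_centers, centers, atol=tol):
            centers = new_centers
            break
        centers = new_centers
    return centers, labels, iteration

def random_init(data, k, seed=None):
    """Step 1 of the original k-means: pick k distinct data points at random as starting means."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(data), size=k, replace=False)
    return data[idx]
```

A call such as kmeans(X, 3, random_init(X, 3)) reproduces the randomly initialized baseline that the proposed method is compared against in Section III.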

B. Iterative Improved k-means Clustering:
    In this section we describe our algorithm, which produces good starting points for the k-means algorithm instead of selecting them randomly. This leads to better clusters in the final result than those of the original k-means. In the model in this study, it is assumed that the number of desired clusters k is known a priori, since the determination of the number of clusters constitutes another subject of research in the clustering literature. However, typically k will be small. The goal of the model is to find the optimal partitioning of the data set into K exclusive clusters, given a data set of n data items in m dimensions, i.e. a set of n points in Rm. The parameter dij denotes the distance between two data points i and j in Rm and can be calculated by any desired norm on Rm, such as the Euclidean distance, which is the straight-line distance between two points. The Euclidean distance between points p and q is the length of the line segment joining them. In Cartesian coordinates, if p = (p1, p2, ..., pn) and q = (q1, q2, ..., qn) are two points in Euclidean n-space, then the distance from p to q is given by:

       d(p, q) = sqrt( (q1 - p1)^2 + (q2 - p2)^2 + ... + (qn - pn)^2 )

Consider a data set D = {d(j) = (d1(j), d2(j), ..., dm(j)) | j = 1, ..., n} in Rm and let K be the predefined number of clusters. Below is the outline of a precise cluster centers initialization method.
Step 1: Divide D into K parts as D = U(k=1..K) Sk, with Sk1 ∩ Sk2 = ∅ for k1 ≠ k2, according to data patterns.
Step 2: Calculate the new ck centers as the optimal solution of
       min z = Σ(d(j) ∈ Sk) || x - d(j) ||                     ..........(1)
where x = (x1, ..., xm) ∈ Rm and || · || denotes the 2-norm.
Step 3: Decide the membership of the patterns in each one of the K clusters according to the minimum distance from the cluster center criterion.
Step 4: Repeat steps 2 and 3 till there is no change in the cluster centers.
Step 1 is rather flexible. We can accomplish it by visual inspection or any other method. Normally this heuristic partition is better than random sampling. We can also use the sub-algorithm below to accomplish step 1 for any data patterns.
C. Sub-algorithm:
    1) Compute dmin and dmax by
       dmin = min(j) || d(j) ||
       dmax = max(j) || d(j) ||
    2) For k = 0, 1, ..., K - 1, calculate the new ck centers as:
       ck = dmin + ((k/K) + 1/(2K)) (dmax - dmin)               ..........(2)
Step 2 of the sub-algorithm has been modified for a one-dimensional dataset as follows:
       ck = dmin + ((k/(2k)) + (5.5/(4k))) (dmax - dmin)        ..........(3)
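The sub-algorithm can be sketched as follows. Formula (2) yields a scalar target between dmin and dmax for each k; since the paper does not spell out how such a scalar target becomes a center for multi-dimensional data, this sketch returns (as an assumption) the data point whose norm is closest to each target.

```python
import numpy as np

def subalgorithm_init(data, K):
    """Sketch of the sub-algorithm: spread K initial centers between dmin and dmax
    using formula (2), c_k = dmin + ((k/K) + 1/(2K)) * (dmax - dmin), k = 0..K-1.
    For multi-dimensional data each c_k is a target norm; here (by assumption)
    we return the data point whose norm is closest to that target."""
    norms = np.linalg.norm(data, axis=1)            # ||d(j)|| for every data point
    dmin, dmax = norms.min(), norms.max()
    centers = []
    for k in range(K):
        target = dmin + ((k / K) + 1.0 / (2 * K)) * (dmax - dmin)   # formula (2)
        centers.append(data[np.argmin(np.abs(norms - target))])
    return np.array(centers)
```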
Theoretically, formula (2) provides us with an ideal cluster center. Nevertheless, the process of finding the optimal solution to problem (1) is too expensive if the problem is computed by a normal optimization algorithm. Fortunately, we can establish a simple recursive method to handle problem (1). We propose a novel iterative method which is proved to be highly efficient by our simulation results. We obtain the iterative formula (5) (the expression is given as an image in the original), where

       qjk = || x(k) - d(j) ||

The above iteration process requires an initial point as its input. We can simply use the average of the coordinates of the data points as the initial point, for k = 1, 2, 3, ..., K.
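The update formula itself is not reproduced above. A standard fixed-point iteration for problem (1) that is consistent with the definition qjk = || x(k) - d(j) || is the Weiszfeld-type step sketched below; treat its exact form as an assumption rather than the paper's verbatim formula (5).

```python
import numpy as np

def center_update(points, x0, max_iter=50, eps=1e-12):
    """Weiszfeld-style fixed-point iteration for min_x sum_j ||x - d(j)|| (problem (1)).
    x0 is the starting point, e.g. the average of the coordinates of the points."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        q = np.linalg.norm(points - x, axis=1)   # q_jk = ||x(k) - d(j)||
        q = np.maximum(q, eps)                   # guard against division by zero
        weights = 1.0 / q
        x_new = (weights[:, None] * points).sum(axis=0) / weights.sum()
        if np.linalg.norm(x_new - x) < 1e-9:     # stop once the center is stable
            return x_new
        x = x_new
    return x
```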
D. Iterative improved k-means clustering Algorithm:
    1. Divide D into K parts as D = U(k=1..K) Sk, with Sk1 ∩ Sk2 = ∅, according to data patterns (call the sub-algorithm).
    2. Decide the membership of the patterns in each one of the K clusters according to the minimum distance from the cluster center criterion.
    3. Calculate the new centers by the iterative formula (5).
    4. Repeat steps 2 and 3 till there is no change in the cluster centers.
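Putting the pieces together, an end-to-end sketch of the algorithm in steps 1-4 might look as follows; it reuses the illustrative helpers subalgorithm_init and center_update from the sketches above, so only the wiring between them is new here.

```python
import numpy as np

def improved_kmeans(data, K, max_iter=100, tol=1e-9):
    """Iterative improved k-means: sub-algorithm initialization, then alternate
    membership assignment and center updates until the centers stop changing."""
    centers = subalgorithm_init(data, K)                       # step 1 (sub-algorithm)
    for iteration in range(1, max_iter + 1):
        # step 2: assign each pattern to the nearest current center
        labels = np.linalg.norm(
            data[:, None, :] - centers[None, :, :], axis=2).argmin(axis=1)
        # step 3: recompute each center with the iterative update for problem (1)
        new_centers = np.array([
            center_update(data[labels == k], data[labels == k].mean(axis=0))
            if np.any(labels == k) else centers[k]
            for k in range(K)
        ])
        # step 4: stop when the centers no longer change
        if np.allclose(new_centers, centers, atol=tol):
            return new_centers, labels, iteration
        centers = new_centers
    return centers, labels, max_iter
```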
              III. EXPERIMENTAL RESULTS
    We have evaluated our proposed algorithm on Fisher's iris dataset, the Pima Indian medical diabetes dataset, and the soya bean plant dataset, considering one-dimensional as well as multi-dimensional data. We compared our results with those of the original k-means algorithm in terms of the number of iterations required by both algorithms. We give a brief description of the datasets used in our algorithm evaluation; Table I shows some characteristics of these datasets [9][10][11].

               TABLE I: CHARACTERISTICS OF DATASETS
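The paper does not list the code used for this comparison. A plausible harness, reusing the sketches above and scikit-learn's copy of the iris data as a stand-in for the Fisher file cited in [9], would simply count the iterations reported by each routine; the count for the randomly initialized baseline naturally varies with the seed.

```python
from sklearn.datasets import load_iris   # stand-in for the Fisher iris file [9]

X = load_iris().data                     # 150 x 4 numeric patterns
K = 3

_, _, iters_random = kmeans(X, K, random_init(X, K, seed=0))
_, _, iters_improved = improved_kmeans(X, K)

print("k-means (random init) iterations:", iters_random)
print("improved k-means iterations:     ", iters_improved)
```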




A. One dimensional Dataset:
    To gain some idea of the numerical behavior of the improved k-means algorithm and to compare it with the original k-means algorithm of randomly chosen initial starting points, we first solve a problem in detail with the original and the improved k-means algorithm separately, using the same dataset. The cardinality of each data set is given by the column labeled N in Table I. The total number of iterations required for the entire solution of each dataset is displayed in the two columns labeled k-means and improved k-means under iterations. The following table shows the number of iterations taken by k-means and improved k-means for k = 3, considering the 1-D dataset.

      TABLE II: COMPARISON OF K-MEANS AND IMPROVED K-MEANS FOR K=3

Results for k-means and improved k-means can be plotted graphically for all datasets as shown below.

 Fig. 1: Iteration comparison between k-means and improved k-means for k=3
B. Multi dimensional Dataset:

      TABLE III: COMPARISON OF K-MEANS AND IMPROVED K-MEANS FOR K=3

Results for k-means and improved k-means can be plotted graphically for all datasets as shown below.

 Fig. 2: Iteration comparison between k-means and improved k-means for k=3

      TABLE IV: COMPARISON OF K-MEANS AND IMPROVED K-MEANS FOR K=4

Results for k-means and improved k-means can be plotted graphically for all datasets as shown below.

 Fig. 3: Iteration comparison between k-means and improved k-means for k=4

It is seen from our experimental results that the number of iterations required by improved k-means is smaller than that required by the original k-means algorithm, and the margin between the total number of iterations required by k-means and improved k-means is large.
                      IV. CONCLUSION
    This paper presented an iterative improved k-means clustering algorithm that makes k-means more efficient and produces good quality clusters. We analyzed the solutions of two algorithms, namely the original k-means and our proposed method, the iterative improved k-means clustering algorithm. Our idea depends on the good selection of the starting points for the k-means. This is based on the optimization formulation and a novel iterative method. It can be applied to many different kinds of clustering problems or combined with other data mining techniques to obtain more promising results. The experimental results using the proposed algorithm with different datasets are very promising. It is seen that the iterations required by the iterative improved k-means algorithm are fewer than those of the original k-means algorithm. Our experimental results demonstrated that the proposed algorithm produces better results than the k-means algorithm.

                     ACKNOWLEDGMENTS
    I thank Prof. Umesh L. Kulkarni and Prof. Naresh D. Harale for their assistance with the implementation of the algorithms and for many useful and interesting discussions. I also thank the UCI machine learning repository for providing the original datasets.

                        REFERENCES
[1] Bradley P.S., Fayyad U.M., "Refining Initial Points for K-Means Clustering", Proc. of the 15th International Conference on Machine Learning (ICML98), J. Shavlik (ed.), Morgan Kaufmann, San Francisco, 1998, pp. 91-99.
[2] Zhijie Xu, Laisheng Wang, Jiancheng Luo, Jianqin Zhang, "Modified clustering algorithm for data mining", 2005, pp. 741-744.
[3] Sugar C. A., James G. M., "Finding the number of clusters in a dataset: an information-theoretic approach", J. Amer. Stat. Assoc. 98 (2003) 750-763.
[4] Su M.-C., Chou C.-H., "A modified version of the K-means algorithm with a distance based on cluster symmetry", IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(6) (2001) 674-680.
[5] M. N. Murty, A. K. Jain, P. J. Flynn, "Data clustering: a review", ACM Computing Surveys 31(3) (1999) 264-322.
[6] Shehroz S. Khan, Amir Ahmad, "Cluster center initialization algorithm for K-means clustering", Pattern Recognition Letters 25 (2004) 1293-1302.
[7] A. K. Jain, R. C. Dubes, "Algorithms for Clustering Data", Prentice Hall, Englewood Cliffs, NJ (1988).
[8] Wei Li, "Modified k-means clustering algorithm", Congress on Image and Signal Processing, 2008, pp. 618-621.
[9] Fisher's Iris dataset, http://www.math.uah.edu/stat/data/Fisher.txt
[10] Pima-Indians medical diabetes dataset, http://archive.ics.uci.edu/ml/machine-learning-databases/pima-indians-diabetes/pima-indians-diabetes.data
[11] Soyabean plant small dataset, http://archive.ics.uci.edu/ml/machine-learning-databases/soybean/
[12] Fahim A. M., Salem A. M., Torkey F. A., Ramadan M., "An efficient enhanced k-means clustering algorithm", Journal of Zhejiang University Science A, 2006, vol. 7(10), pp. 1626-1633.
[13] Joaquín Pérez Ortega, Ma. Del Rocío Boone Rojas, María J. Somodevilla García, "Research issues on K-means Algorithm: An Experimental Trial Using Matlab".
[14] Meila M., Heckerman D., "An experimental comparison of several clustering methods", Microsoft Research Report MSR-TR-98-06, Redmond, WA (1998).
[15] Pham D. T., Dimov S. S., Nguyen C. D., "Selection of k in K-means clustering", Mechanical Engineering Science, 2004, vol. 219, pp. 103-119.
[16] Ray S., Turi R. H., "Determination of number of clusters in k-means clustering and application in colour image segmentation", in Proceedings of the 4th International Conference on Advances in Pattern Recognition and Digital Techniques, 1999, pp. 137-143.
[17] J. M. Pena, J. A. Lozano, P. Larranaga, "An empirical comparison of four initialization methods for the K-means algorithm", Pattern Recognition Letters 20 (1999) 1027-1040.
© 2011 ACEEE
DOI: 01.IJNS.02.03.183

 
Low Energy Routing for WSN’s
Low Energy Routing for WSN’sLow Energy Routing for WSN’s
Low Energy Routing for WSN’s
 
Permutation of Pixels within the Shares of Visual Cryptography using KBRP for...
Permutation of Pixels within the Shares of Visual Cryptography using KBRP for...Permutation of Pixels within the Shares of Visual Cryptography using KBRP for...
Permutation of Pixels within the Shares of Visual Cryptography using KBRP for...
 
Rotman Lens Performance Analysis
Rotman Lens Performance AnalysisRotman Lens Performance Analysis
Rotman Lens Performance Analysis
 
Band Clustering for the Lossless Compression of AVIRIS Hyperspectral Images
Band Clustering for the Lossless Compression of AVIRIS Hyperspectral ImagesBand Clustering for the Lossless Compression of AVIRIS Hyperspectral Images
Band Clustering for the Lossless Compression of AVIRIS Hyperspectral Images
 
Microelectronic Circuit Analogous to Hydrogen Bonding Network in Active Site ...
Microelectronic Circuit Analogous to Hydrogen Bonding Network in Active Site ...Microelectronic Circuit Analogous to Hydrogen Bonding Network in Active Site ...
Microelectronic Circuit Analogous to Hydrogen Bonding Network in Active Site ...
 
Texture Unit based Monocular Real-world Scene Classification using SOM and KN...
Texture Unit based Monocular Real-world Scene Classification using SOM and KN...Texture Unit based Monocular Real-world Scene Classification using SOM and KN...
Texture Unit based Monocular Real-world Scene Classification using SOM and KN...
 

Último

Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesBoston Institute of Analytics
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 

Último (20)

Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 

An Iterative Improved k-means Clustering

Clustering separates data into groups whose members belong together, much as animals and plants are assigned to families whose members are alike. It requires no prior knowledge of the groups to be formed or of the members that must belong to them [17][5]. Representing the data by fewer clusters necessarily loses certain fine details, but it achieves simplification: the data are modelled by their clusters. Data modelling puts clustering in a historical perspective rooted in mathematics, statistics, and numerical analysis. From a machine-learning perspective, clusters correspond to hidden patterns, the search for clusters is unsupervised learning, and the resulting system represents a data concept. From a practical perspective, clustering plays an outstanding role in data mining applications such as scientific data exploration, information retrieval and text mining, spatial database applications, Web analysis, CRM, marketing, medical diagnostics, and computational biology [1][17]. Data clustering is under vigorous development and is applied in many areas, including business, biology, medicine, chemistry, data mining and knowledge discovery [4][7], data compression and vector quantization [5], pattern recognition and pattern classification [8], neural networks, artificial intelligence, and statistics. Owing to the huge amounts of data collected in databases, cluster analysis has recently become a highly active topic in data mining research [2], and the research has focused on efficient and effective cluster analysis in large databases. Classifying objects according to similarities is the basis for much of science.

Another important issue in clustering is the definition of the best partitioning of a data set, which is difficult to predict because it is a relative and subjective matter: different models may yield different solutions depending on the selected clustering criteria and the clustering model that is developed [4][6][13]. For cluster analysis to work efficiently and effectively, the literature identifies the following typical requirements of clustering in data mining:
1. Scalability: an efficient and effective clustering method should work well not only on small data sets but also on large databases containing millions of objects.
2. Ability to deal with different types of attributes: the method should be able to cluster not only numerical data but also binary, categorical, and ordinal data, or mixtures of these types.
3. Discovery of clusters with arbitrary shape: a cluster can have any shape, so it is important to develop algorithms that can detect clusters of arbitrary shape.
4. Minimal requirements for domain knowledge to determine input parameters: some algorithms require users to supply parameters that are hard to determine.
5. Ability to deal with noisy data: a good clustering algorithm should be insensitive to the influence of noise.
6. Insensitivity to the order of input records: the result should not depend on the order in which the input records are presented.
7. High dimensionality: the ability to handle high-dimensional data is important for a good algorithm.
Several clustering algorithms have been proposed. They can be broadly classified into hierarchical and partitioning algorithms [8]. Hierarchical algorithms decompose a database D of n objects into several levels of nested partitionings (clusterings), represented by a dendrogram (tree); agglomerative algorithms build the tree from the leaf nodes up, whereas divisive algorithms build it from the top down. Partitioning algorithms construct a single partition of a database D of n objects into a set of k clusters such that the objects in a cluster are more similar to each other than to objects in different clusters. The k-means algorithm is the most commonly used partitioning algorithm [8][12] because it is easy to implement and converges quickly to a local minimum. However, this local minimum depends on the initial starting means. In this paper we introduce an efficient improved method to obtain good initial starting means, so that the final result is better than that obtained with randomly selected starting means; obtaining good initial starting means is therefore an important operational objective [1][14]. The paper is organized as follows: Section II introduces k-means clustering and our proposed improved iterative k-means clustering method, Section III presents experimental results comparing both algorithms, and Section IV concludes the paper.

II. K-MEANS CLUSTERING

A. k-means Clustering

There are many algorithms for clustering datasets. k-means clustering is the most popular method used to divide n patterns {x1, ..., xn} in d-dimensional space into k clusters [8]. The result is a set of k centers, each located at the centroid of its part of the partitioned dataset. The algorithm can be summarized in the following steps (a brief sketch follows the list):
1. Initialization: select a set of k starting points {mj}, j = 1, 2, ..., k, either at random or according to some heuristic.
2. Distance calculation: for each pattern xi, 1 <= i <= n, compute its Euclidean distance to each cluster centroid mj, 1 <= j <= k, find the closest centroid, and assign xi to it.
3. Centroid recalculation: for each cluster j, 1 <= j <= k, recompute the centroid mj as the average of the data points assigned to it.
4. Convergence condition: repeat steps 2 and 3 until convergence.
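To make the four steps concrete, a minimal Python sketch of standard k-means is given below. The random seeding, array layout, function name, and convergence test are illustrative choices of this sketch, not prescriptions of the method itself.

```python
import numpy as np

def kmeans(data, k, max_iter=100, seed=0):
    """Standard k-means: random initial centers, then alternate
    assignment (step 2) and centroid recalculation (step 3)."""
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    # Step 1: choose k distinct data points as the starting means.
    centers = data[rng.choice(n, size=k, replace=False)]
    labels = np.full(n, -1)
    for iteration in range(1, max_iter + 1):
        # Step 2: assign each pattern to the nearest center (Euclidean distance).
        dists = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        # Step 3: recompute each center as the mean of the points assigned to it.
        centers = np.array([data[new_labels == j].mean(axis=0)
                            if np.any(new_labels == j) else centers[j]
                            for j in range(k)])
        # Step 4: converged when no pattern changes cluster.
        if np.array_equal(new_labels, labels):
            return centers, labels, iteration
        labels = new_labels
    return centers, labels, max_iter

if __name__ == "__main__":
    # Example: three well-separated clusters in the plane.
    rng = np.random.default_rng(1)
    pts = np.vstack([rng.normal(c, 0.3, size=(50, 2)) for c in (0.0, 3.0, 6.0)])
    centers, labels, iters = kmeans(pts, k=3)
    print(f"converged in {iters} iterations")
```

Returning the iteration count alongside the centers makes it easy to measure how strongly the starting points influence the speed of convergence, which is exactly the quantity compared in Section III.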
Choosing a proper number of clusters k is a domain-dependent problem. To address it, some researchers have proposed performing k-clustering for various numbers of clusters and employing certain criteria to select the most suitable value of k [15][16]. Several variants of the k-means algorithm have also been proposed, with the aim of improving efficiency or finding better clusters; improved efficiency is usually achieved either by reducing the number of iterations needed to reach final convergence or by reducing the total number of distance calculations. The k-means algorithm selects the k initial cluster centers at random from the original dataset and converges to the final cluster centers after several iterations, so choosing a good set of initial cluster centers is very important, but it is difficult to make a good choice at random.

B. Iterative Improved k-means Clustering

In this section we describe an algorithm that produces good starting points for k-means instead of selecting them randomly, which leads to better clusters in the final result than the original k-means. In this model it is assumed that the number of desired clusters K is known a priori, since determining the number of clusters constitutes another subject of research in the clustering literature; typically K is small. The goal is to find the optimal partitioning of a data set of n items in m dimensions, i.e. a set of n points in Rm, into K exclusive clusters. The parameter dij denotes the distance between two data points i and j in Rm and can be computed with any desired norm on Rm, such as the Euclidean distance, which is the straight-line distance between two points: in Cartesian coordinates, if p = (p1, p2, ..., pm) and q = (q1, q2, ..., qm) are two points in Euclidean m-space, the distance from p to q is

    d(p, q) = sqrt( (q1 - p1)^2 + (q2 - p2)^2 + ... + (qm - pm)^2 ).

Consider a data set D = { d(j) = (d1(j), d2(j), ..., dm(j)) | j = 1, ..., n } in Rm and let K be the predefined number of clusters. The outline of the cluster-center initialization method is as follows:
Step 1: Divide D into K parts, D = union of Sk for k = 1, ..., K with Sk1 and Sk2 disjoint for k1 != k2, according to the data patterns.
Step 2: Calculate each new center ck as the optimal solution of
    min over x of  z = sum over d(j) in Sk of || x - d(j) ||,  x = (x1, ..., xm) in Rm,   (1)
where || . || denotes the 2-norm.
Step 3: Decide the membership of the patterns in each of the K clusters according to the minimum-distance-from-cluster-center criterion.
Step 4: Repeat steps 2 and 3 until there is no change in the cluster centers.
Step 1 is rather flexible: it can be carried out by visual inspection or by any other method, and this heuristic partition is normally better than random sampling. The sub-algorithm below can also be used to accomplish step 1 for arbitrary data patterns.

C. Sub-algorithm
1) Compute dmin and dmax as dmin = min over j of || d(j) || and dmax = max over j of || d(j) ||.
2) For k = 0, 1, ..., K - 1, calculate the new centers ck as
    ck = dmin + ( k/K + 1/(2K) ) (dmax - dmin).   (2)
For a one-dimensional dataset, step 2 of the sub-algorithm is modified to
    ck = dmin + ( k/(2K) + 5.5/(4K) ) (dmax - dmin).   (3)
Theoretically this provides an ideal cluster center. Nevertheless, finding the optimal solution of problem (1) is too expensive with a general-purpose optimization algorithm. Fortunately, a simple recursive method can handle problem (1): we use an iterative update of the center, with qjk = || x(k) - d(j) || at iteration k, which our simulation results show to be highly efficient. The iteration requires an initial point as input; the average of the coordinates of the data points can simply be used, for k = 1, 2, ..., K (see the sketch below).
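A minimal sketch of the initialization is given below, assuming a NumPy representation of the data. The seeding follows equation (2); because the iterative formula itself is not reproduced above, the center update is written as a Weiszfeld-style fixed-point iteration for problem (1), which is consistent with the definition of qjk but should be read as an assumption of the sketch rather than as a transcription of the paper's formula.

```python
import numpy as np

def seed_centers(data, K):
    """Sub-algorithm seeding: spread K targets over the range of data norms,
    c_k = d_min + (k/K + 1/(2K)) * (d_max - d_min), k = 0..K-1 (equation (2)).
    Each center is realized as the data point whose norm is closest to c_k,
    an illustrative choice so that the centers are actual patterns."""
    norms = np.linalg.norm(data, axis=1)
    d_min, d_max = norms.min(), norms.max()
    targets = [d_min + (k / K + 1.0 / (2 * K)) * (d_max - d_min) for k in range(K)]
    return np.array([data[np.abs(norms - t).argmin()] for t in targets])

def weiszfeld_center(points, n_iter=100, eps=1e-9):
    """Assumed fixed-point iteration for problem (1): minimize
    sum_j ||x - d(j)|| over the points of one part S_k, starting from the
    coordinate-wise average of the points, as suggested in the text."""
    x = points.mean(axis=0)
    for _ in range(n_iter):
        q = np.maximum(np.linalg.norm(points - x, axis=1), eps)  # q_j = ||x - d(j)||
        x_new = (points / q[:, None]).sum(axis=0) / (1.0 / q).sum()
        if np.linalg.norm(x_new - x) < eps:
            break
        x = x_new
    return x

if __name__ == "__main__":
    # Example: seed three centers, split the data by nearest seed (step 3),
    # then recompute each center with the fixed-point update (step 2).
    rng = np.random.default_rng(0)
    data = rng.normal(size=(90, 2)) + np.repeat([[0, 0], [3, 3], [6, 6]], 30, axis=0)
    centers = seed_centers(data, K=3)
    labels = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2).argmin(axis=1)
    centers = np.array([weiszfeld_center(data[labels == k]) for k in range(3)])
    print(np.round(centers, 2))
```

In the improved algorithm, the seeded centers (or the parts Sk they induce) replace the random starting means, and the fixed-point update plays the role of the center-recalculation step.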
D. Iterative Improved k-means Clustering Algorithm
1. Divide D into K parts, D = union of Sk for k = 1, ..., K with the parts pairwise disjoint, according to the data patterns (call the sub-algorithm).
2. Decide the membership of the patterns in each of the K clusters according to the minimum-distance-from-cluster-center criterion.
3. Calculate the new centers by the iterative formula.
4. Repeat steps 2 and 3 until there is no change in the cluster centers.

III. EXPERIMENTAL RESULTS

We evaluated the proposed algorithm on Fisher's iris dataset, the Pima Indians medical diabetes dataset, and the soybean plant dataset, considering one-dimensional as well as multi-dimensional data, and we compared the results with those of the original k-means algorithm in terms of the number of iterations needed by each algorithm. Table I shows some characteristics of these datasets [9][10][11].
(Table I: Characteristics of datasets.)

A. One-dimensional Dataset
To gain some idea of the numerical behaviour of the improved k-means algorithm and to compare it with the original k-means algorithm, which chooses its initial starting points randomly, we first solve a problem in detail with the original and the improved k-means algorithm on the same dataset. The cardinality of each dataset is given in the column labelled N in Table I, and the total number of iterations required for the entire solution of each dataset is displayed in the two columns labelled k-means and improved k-means. Table II shows the number of iterations taken by k-means and improved k-means for k = 3 on the one-dimensional datasets; an illustrative comparison of this kind is sketched after the tables below.
(Table II: Comparison of k-means and improved k-means for k = 3, one-dimensional data.)
(Fig. 1: Iteration comparison between k-means and improved k-means for k = 3.)

B. Multi-dimensional Dataset
The results for k-means and improved k-means on the multi-dimensional datasets are summarized in the tables and plotted for all datasets in the figures below.
(Table III: Comparison of k-means and improved k-means for k = 3, multi-dimensional data.)
(Fig. 2: Iteration comparison between k-means and improved k-means for k = 3.)
(Table IV: Comparison of k-means and improved k-means for k = 4.)
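The sketch below illustrates the kind of iteration-count comparison reported in the tables: it runs the same assignment/update loop on Fisher's iris data, once from random starting centers and once from centers seeded along the data norms as in equation (2). Loading the iris data through scikit-learn and using the plain centroid update in both runs are simplifications made for this sketch, not the exact experimental setup of the paper.

```python
import numpy as np
from sklearn.datasets import load_iris

def lloyd(data, centers, max_iter=100):
    """Assignment / centroid-update loop from given initial centers;
    returns the final centers and the number of iterations used."""
    labels = np.full(data.shape[0], -1)
    for it in range(1, max_iter + 1):
        d = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
        new_labels = d.argmin(axis=1)
        centers = np.array([data[new_labels == j].mean(axis=0)
                            if np.any(new_labels == j) else centers[j]
                            for j in range(centers.shape[0])])
        if np.array_equal(new_labels, labels):
            return centers, it
        labels = new_labels
    return centers, max_iter

def norm_seeded_centers(data, K):
    """Seed centers along the range of data norms, as in equation (2)."""
    norms = np.linalg.norm(data, axis=1)
    d_min, d_max = norms.min(), norms.max()
    targets = [d_min + (k / K + 1 / (2 * K)) * (d_max - d_min) for k in range(K)]
    return np.array([data[np.abs(norms - t).argmin()] for t in targets])

if __name__ == "__main__":
    data = load_iris().data                      # 150 x 4 iris measurements
    K = 3
    rng = np.random.default_rng(0)
    random_init = data[rng.choice(len(data), K, replace=False)]
    _, it_random = lloyd(data, random_init)
    _, it_seeded = lloyd(data, norm_seeded_centers(data, K))
    print(f"random initial centers : {it_random} iterations")
    print(f"seeded initial centers : {it_seeded} iterations")
```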
The results for k-means and improved k-means for k = 4 are plotted for all datasets in the figure below.
(Fig. 3: Iteration comparison between k-means and improved k-means for k = 4.)
The experimental results show that the number of iterations required by the improved k-means is smaller than that required by the original k-means algorithm, and the margin between the total numbers of iterations required by the two algorithms is large.

IV. CONCLUSION

This paper presented an iterative improved k-means clustering algorithm that makes k-means more efficient and produces good-quality clusters. We analyzed the solutions of two algorithms, the original k-means and the proposed iterative improved k-means clustering algorithm. The idea rests on a good selection of the starting points for k-means, based on an optimization formulation and a novel iterative method, and it can be applied to many different kinds of clustering problems or combined with other data mining techniques to obtain more promising results. The experimental results obtained with the proposed algorithm on different datasets are very promising: the iterations required by the iterative improved k-means algorithm are fewer than those required by the original k-means algorithm, and the proposed algorithm produces better results than k-means.

ACKNOWLEDGMENTS

I thank Prof. Umesh L. Kulkarni and Prof. Naresh D. Harale for their assistance with the implementation of the algorithms and for many useful and interesting discussions. I also thank the UCI machine learning repository for providing the original datasets.

REFERENCES

[1] Bradley P. S., Fayyad U. M., "Refining Initial Points for K-Means Clustering", Proc. of the 15th International Conference on Machine Learning (ICML98), J. Shavlik (ed.), Morgan Kaufmann, San Francisco, 1998, pp. 91-99.
[2] Zhijie Xu, Laisheng Wang, Jiancheng Luo, Jianqin Zhang, "Modified clustering algorithm for data mining", 2005, pp. 741-744.
[3] Sugar C. A., James G. M., "Finding the number of clusters in a dataset: an information theoretic approach", J. Amer. Stat. Assoc. 98 (2003) 750-763.
[4] Su M.-C., Chou C.-H., "A modified version of the K-means algorithm with a distance based on cluster symmetry", IEEE Transactions on Pattern Analysis and Machine Intelligence 23(6) (2001) 674-680.
[5] M. N. Murty, A. K. Jain, P. J. Flynn, "Data clustering: a review", ACM Computing Surveys 31(3) (1999) 264-322.
[6] Shehroz S. Khan, Amir Ahmad, "Cluster center initialization algorithm for K-means clustering", Pattern Recognition Letters 25 (2004) 1293-1302.
[7] A. K. Jain, R. C. Dubes, "Algorithms for Clustering Data", Prentice Hall, Englewood Cliffs, NJ (1988).
[8] Wei Li, "Modified k-means clustering algorithm", Congress on Image and Signal Processing, 2008, pp. 618-621.
[9] Fisher's iris dataset: http://www.math.uah.edu/stat/data/Fisher.txt
[10] Pima Indians medical diabetes dataset: http://archive.ics.uci.edu/ml/machine-learning-databases/pima-indians-diabetes/pima-indians-diabetes.data
[11] Soybean plant (small) dataset: http://archive.ics.uci.edu/ml/machine-learning-databases/soybean/
[12] Fahim A. M., Salem A. M., Torkey F. A., Ramadan M., "An efficient enhanced k-means clustering algorithm", Journal of Zhejiang University Science A, 2006, vol. 7(10), pp. 1626-1633.
[13] Joaquín Pérez Ortega, Ma. del Rocío Boone Rojas, María J. Somodevilla García, "Research issues on K-means algorithm: an experimental trial using Matlab".
[14] Meila M., Heckerman D., "An experimental comparison of several clustering methods", Microsoft Research Report MSR-TR-98-06, Redmond, WA (1998).
[15] Pham D. T., Dimov S. S., Nguyen C. D., "Selection of k in K-means clustering", Mechanical Engineering Science, 2004, vol. 219, pp. 103-119.
[16] Ray S., Turi R. H., "Determination of number of clusters in k-means clustering and application in colour image segmentation", Proceedings of the 4th International Conference on Advances in Pattern Recognition and Digital Techniques, 1999, pp. 137-143.
[17] J. M. Pena, J. A. Lozano, P. Larranaga, "An empirical comparison of four initialization methods for the K-means algorithm", Pattern Recognition Letters 20 (1999) 1027-1040.
Mechanical Engineering Science, 2004, algorithm vol. 219. pp.103-119. [16] Ray S. and Turi R. H., “Determination of number of clusters ACKNOWLEDGMENTS in k- means clustering and application in colour image segmentation.”, in Proceedings of the 4th International Conference I thank Prof. Umesh L. Kulkarni and Prof. Naresh. D. Harale on Advances in Pattern Recognition and Digital Techniques, 1999, for their assistance with the implementation of the algorithms pp. 137-143, and for many useful and interesting discussions. I also thank [17] J M Pena, J A Lozano, P Larranaga, “An empirical comparison UCI machine learning repository for providing the original of four initialia-zation methods for the K-means algorithm.” Pattern Recognition Lett. 20(1999) 1027-1040 dataset. 48 © 2011 ACEEE DOI: 01.IJNS.02.03.183