International Journal of Research in Computer Science
eISSN 2249-8265 Volume 3 Issue 2 (2013) pp. 1-9
www.ijorcs.org, A Unit of White Globe Publications
doi: 10.7815/ijorcs.32.2013.060


A PSO-BASED SUBTRACTIVE DATA CLUSTERING ALGORITHM

Mariam El-Tarabily1, Rehab Abdel-Kader2, Mahmoud Marie3, Gamal Abdel-Azeem4

1,2,4 Electrical Engineering Department, Faculty of Engineering - Port-Said, Port-Said University, EGYPT
E-mail: 1mariammokhtar75@hotmail.com, 2r.abdelkader@eng.psu.edu.eg, 4gamalagag@hotmail.com

3 Computers and Systems Engineering Department, Faculty of Engineering, Al-Azhar University, Cairo, EGYPT
E-mail: mahmoudim@hotmail.com

Abstract: There is a tremendous proliferation in the amount of information available on the largest shared information source, the World Wide Web. Fast and high-quality clustering algorithms play an important role in helping users to effectively navigate, summarize, and organize the information. Recent studies have shown that partitional clustering algorithms such as the k-means algorithm are the most popular algorithms for clustering large datasets. The major problem with partitional clustering algorithms is that they are sensitive to the selection of the initial partitions and are prone to premature convergence to local optima. Subtractive clustering is a fast, one-pass algorithm for estimating the number of clusters and the cluster centers for any given set of data. The cluster estimates can be used to initialize iterative optimization-based clustering methods and model identification methods. In this paper, we present a hybrid Particle Swarm Optimization and Subtractive clustering algorithm, Subtractive + (PSO), that performs fast clustering. For comparison purposes, we applied the Subtractive + (PSO) clustering algorithm, PSO, and the Subtractive clustering algorithm to three different datasets. The results illustrate that the Subtractive + (PSO) clustering algorithm generates the most compact clustering results of the compared algorithms.

Keywords: Data Clustering, Subtractive Clustering, Particle Swarm Optimization, Subtractive Algorithm, Hybrid Algorithm.

I. INTRODUCTION

Clustering is one of the most extensively studied research topics due to its numerous important applications in machine learning, image segmentation, information retrieval, and pattern recognition. Clustering involves dividing a set of objects into a specified number of clusters [14]. The motivation behind clustering a set of data is to find the inherent structure in the data and expose this structure as a set of groups. The data objects within each group should exhibit a large degree of similarity, while the similarity among different clusters should be minimized [3, 9, 18]. There are two major clustering techniques: "Partitioning" and "Hierarchical" [2, 9]. In hierarchical clustering, the output is a tree showing a sequence of clusterings, with each clustering being a partition of the data set. Partitioning clustering [1] algorithms, on the other hand, partition the data set into a specified number of clusters. These algorithms try to minimize a certain criterion (e.g. a square error function) and can therefore be treated as optimization problems.

In recent years, it has been recognized that the partitional clustering technique is well suited for clustering large datasets due to its relatively low computational requirements. The time complexity of the partitioning technique is almost linear, which makes it a widely used technique. The best-known partitioning clustering algorithm is the K-means algorithm and its variants [10].

The Subtractive clustering method, as proposed by Chiu [13], is a relatively simple and effective approach to the approximate estimation of cluster centers on the basis of a density measure in which the data points are considered candidates for cluster centers. This method can obtain the initial cluster centers that are required by more sophisticated clustering algorithms. It can also be used as a quick stand-alone method for approximate clustering.

The Particle Swarm Optimization (PSO) algorithm is a population-based stochastic optimization technique that can be used to find an optimal, or near optimal, solution to a numerical and qualitative problem [4, 11, 17]. Several attempts have been proposed in the literature to apply PSO to the data clustering problem [6, 18, 19, 20, 21]. The major drawback is that the number of clusters is initially unknown, and the clustering result is sensitive to the selection of the initial cluster centroids and may converge to local optima. Therefore, the initial selection of the cluster centroids decides the processing of PSO and the partition result of the dataset as well. The same initial cluster centroids in a dataset will always generate the same cluster results. However, if good initial clustering centroids can be obtained using any other technique, the PSO would work well in refining the clustering centroids to find the optimal clustering centers. The Subtractive clustering algorithm can be used to generate the number of clusters and good initial cluster centroids for the PSO.



In this paper, we present a hybrid Subtractive + (PSO) clustering algorithm that performs fast clustering. Experimental results indicate that the Subtractive + (PSO) clustering algorithm can find the optimal solution after nearly 50 iterations, in comparison with the ordinary PSO algorithm. The remainder of this paper is organized as follows: Section 2 reviews related work on data clustering using PSO. Section 3 provides a general overview of the data clustering problem and the basic PSO algorithm. The proposed hybrid Subtractive + (PSO) clustering algorithm is described in Section 4. Section 5 provides the detailed experimental setup and results for comparing the performance of the Subtractive + (PSO) clustering algorithm with the Subtractive algorithm and PSO, along with a discussion of the experimental results. Conclusions are drawn in Section 6.

II. RELATED WORK

The best-known partitioning algorithm is the K-means algorithm [2, 6, 7, 16] and its variants. The main drawbacks of the K-means algorithm are that the cluster result is sensitive to the selection of the initial cluster centroids and may converge to local optima, and that it generally requires prior knowledge of the probable number of clusters for a data collection. In recent years scientists have proposed several approaches [3] inspired by biological collective behaviors to solve the clustering problem, such as Genetic Algorithms (GA) [8], Particle Swarm Optimization (PSO), Ant clustering, and Self-Organizing Maps (SOM) [9]. In [6] the authors presented a hybrid PSO+K-means document clustering algorithm that performed fast document clustering. The results indicated that the PSO+K-means algorithm can generate the best results in just 50 iterations in comparison with the K-means algorithm and the PSO algorithm. Reference [24] proposed a Discrete PSO with the crossover and mutation operators of the Genetic Algorithm for document clustering. The proposed system markedly increased the success of the clustering problem; it tried to avoid the stagnation behavior of the particles, but it could not always avoid that behavior. In [26] the authors investigated the application of EPSO to cluster data vectors. The EPSO algorithm was compared against the PSO clustering algorithm, showing that EPSO converges more slowly to a lower quantization error, while PSO converges faster to a larger quantization error. Reference [27] presented a new approach to particle swarm optimization (PSO) using digital pheromones to coordinate swarms within an n-dimensional design space to improve search efficiency and reliability. In [28] a hybrid fuzzy clustering method based on FCM and fuzzy PSO (FPSO) is proposed which makes use of the merits of both algorithms. Experimental results show that the proposed method is efficient and can reveal encouraging results.

III. DATA CLUSTERING PROBLEM

In most clustering algorithms, the dataset to be clustered is represented as a set of vectors X = {x_1, x_2, ..., x_n}, where the vector x_i corresponds to a single object and is called the feature vector. The feature vector should include proper features to represent the object.

The similarity metric: Since similarity is fundamental to the definition of a cluster, a measure of the similarity between two data points from the same feature space is essential to most clustering procedures. Because of the variety of feature types and scales, the distance measure must be chosen carefully. Over the years, two prominent ways have been proposed to compute the similarity between data points m_p and m_j. The most popular metric for continuous features is the Euclidean distance, given by:

$d(m_p, m_j) = \sqrt{\sum_{k=1}^{d_m} (m_{pk} - m_{jk})^2}$    (1)

which is a special case of the Minkowski metric [5], given by:

$D_n(m_p, m_j) = \left( \sum_{i=1}^{d_m} |m_{i,p} - m_{i,j}|^n \right)^{1/n}$    (2)

where m_p and m_j are two data vectors; d_m denotes the dimension number of the vector space; and m_pk and m_jk stand for the weight values of m_p and m_j in dimension k.

The second commonly used similarity measure in clustering is the cosine correlation measure [15], given by:

$\cos(m_p, m_j) = \frac{m_p^t m_j}{|m_p| |m_j|}$    (3)

where m_p^t m_j denotes the dot-product of the two data vectors and |·| indicates the length of a vector. Both similarity metrics are widely used in the clustering literature.
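To make the two metrics concrete, here is a minimal Python sketch (our illustration, not part of the original paper) of equations (1) and (3) using NumPy:

```python
import numpy as np

def euclidean_distance(m_p, m_j):
    # Equation (1): Euclidean distance between two feature vectors
    return np.sqrt(np.sum((m_p - m_j) ** 2))

def cosine_similarity(m_p, m_j):
    # Equation (3): cosine correlation between two feature vectors
    return np.dot(m_p, m_j) / (np.linalg.norm(m_p) * np.linalg.norm(m_j))

# Example on two 4-dimensional feature vectors (values are arbitrary)
p = np.array([5.1, 3.5, 1.4, 0.2])
q = np.array([4.9, 3.0, 1.4, 0.2])
print(euclidean_distance(p, q), cosine_similarity(p, q))
```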
A. Subtractive Clustering Algorithm

In Subtractive clustering, data points are considered as candidates for the cluster centers [25]. In this method the computational complexity is linearly proportional to the number of data points and independent of the dimension of the problem under consideration.


Consider a collection of n data points {x_1, ..., x_n} in an M-dimensional space. Since each data point is a candidate for a cluster center, a density measure at data point x_i is defined as:

$D_i = \sum_{j=1}^{n} \exp\left( - \frac{\| x_i - x_j \|^2}{(r_a/2)^2} \right)$    (4)

where r_a is a positive constant. Hence, a data point will have a high density value if it has many neighboring data points. The radius r_a defines a neighborhood for a data point; data points outside the radius r_a contribute only slightly to the density measure.

After calculating the density measure of all data points, the data point with the highest density measure is selected as the first cluster center. Let x_c1 be the point selected and D_c1 its corresponding density measure. The density measure D_i for each data point x_i is then recalculated as follows:

$D_i = D_i - D_{c1} \exp\left( - \frac{\| x_i - x_{c1} \|^2}{(r_b/2)^2} \right)$    (5)

where r_b is a positive constant. Therefore, data points close to the first cluster center x_c1 will have a significantly reduced density measure and are unlikely to be selected as the next cluster center. The constant r_b defines a neighborhood that has a measurable reduction in the density measure. The constant r_b is normally larger than r_a to prevent closely spaced cluster centers; generally r_b is equal to 1.5 r_a, as suggested in [25].

After the density measure for each data point is recalculated, the next cluster center x_c2 is selected and the density measures for all data points are recalculated again. This iterative process is repeated until a sufficient number of cluster centers is generated.

When applying subtractive clustering to a set of input-output data, each of the cluster centers represents a prototype that exhibits certain characteristics of the system to be modeled. These cluster centers can reasonably be used as the initial clustering centers for the PSO algorithm.
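The center-selection loop of equations (4) and (5) can be sketched as follows. This is an illustrative implementation, not the authors' code; the default radius r_a and the fixed number of centers used as a stopping rule are our assumptions:

```python
import numpy as np

def subtractive_clustering(X, r_a=0.5, num_centers=3):
    """Sketch of subtractive clustering, equations (4) and (5).

    X: (n, M) data matrix. Returns the selected cluster centers.
    Stopping after a fixed number of centers is a simplification; the
    paper iterates until a sufficient number of centers is generated.
    """
    r_b = 1.5 * r_a  # as suggested in [25]
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    # Equation (4): density measure for every data point
    D = np.sum(np.exp(-sq_dists / (r_a / 2) ** 2), axis=1)
    centers = []
    for _ in range(num_centers):
        c = np.argmax(D)  # the highest-density point becomes a center
        centers.append(X[c])
        # Equation (5): suppress the density near the new center
        D = D - D[c] * np.exp(-np.sum((X - X[c]) ** 2, axis=1) / (r_b / 2) ** 2)
    return np.array(centers)
```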
B. PSO Algorithm

PSO was originally developed by Eberhart and Kennedy in 1995, based on the phenomenon of collective intelligence inspired by the social behavior of bird flocking or fish schooling [11]. In the PSO algorithm, the birds in a flock are symbolically represented by particles. These particles can be considered as simple agents "flying" through a problem space. A particle's location in the multidimensional problem space represents one solution for the problem. When a particle moves to a new location, a different problem solution is generated. The fitness function is evaluated for each particle in the swarm and is compared to the fitness of the best previous position for that particle, p_best, and to the fitness of the global best particle among all particles in the swarm, g_best. After finding these two best values, the particles evolve by updating their velocities and positions according to the following equations:

$v_{id} = w \cdot v_{id} + c_1 \cdot rand_1 \cdot (p_{best} - x_{id}) + c_2 \cdot rand_2 \cdot (g_{best} - x_{id})$    (6)

$x_{id} = x_{id} + v_{id}$    (7)

where d denotes the dimension of the problem space, and rand_1 and rand_2 are random values in the range (0, 1). The random values rand_1 and rand_2 are used for the sake of completeness, that is, to make sure that particles explore a wide search space before converging around the optimal solution. c_1 and c_2 are constants known as acceleration coefficients; their values control the weight balance of p_best and g_best in deciding the particle's next movement. w denotes the inertia weight factor. An improvement over the original PSO is that w is not kept constant during execution; rather, starting from a maximal value, it is linearly decremented down to a minimal value as the number of iterations increases [4]. It is initially set to 0.9, decreasing to 0.4 according to:

$w = (0.9 - 0.4) \cdot \frac{MAXITER - ITERATION}{MAXITER} + 0.4$    (8)

where MAXITER is the maximum number of iterations and ITERATION represents the current iteration number. The inertia weight factor w provides the necessary diversity to the swarm by changing the momentum of the particles, avoiding their stagnation at local optima. The empirical research conducted by Eberhart and Shi shows improvement of search efficiency through gradually decreasing the value of the inertia weight factor from a high value during the search.

Equation (6) indicates that each particle records its current coordinate x_id and its velocity v_id, which indicates the speed of its movement along the dimensions of the problem space. For every generation, the particle's new location is computed by adding the particle's current velocity, the v-vector, to its location, the x-vector.

The best fitness values are updated at each generation based on:

$P_i(t+1) = \begin{cases} P_i(t), & f(X_i(t+1)) \geq f(P_i(t)) \\ X_i(t+1), & f(X_i(t+1)) < f(P_i(t)) \end{cases}$    (9)
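A minimal sketch of one PSO generation, implementing equations (6)-(8); the array layout and the default coefficient values are assumptions for illustration:

```python
import numpy as np

def pso_step(x, v, pbest, gbest, iteration, max_iter, c1=1.49, c2=1.49):
    """One velocity/position update per equations (6)-(8).

    x, v: (num_particles, d) positions and velocities;
    pbest: (num_particles, d) personal best positions;
    gbest: (d,) global best position (broadcast over the swarm).
    """
    # Equation (8): inertia weight decreasing linearly from 0.9 to 0.4
    w = (0.9 - 0.4) * (max_iter - iteration) / max_iter + 0.4
    r1 = np.random.rand(*x.shape)
    r2 = np.random.rand(*x.shape)
    # Equation (6): velocity update
    v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
    # Equation (7): position update
    x = x + v
    return x, v
```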



It is possible to view the clustering problem as an optimization problem that locates the optimal centroids of the clusters rather than finding an optimal partition [18, 20, 21, 22]. This view offers us a chance to apply the PSO optimization algorithm to the clustering solution. The PSO clustering algorithm performs a globalized search in the entire solution space [4, 17]. Utilizing the PSO algorithm's optimization ability, given enough time the proposed hybrid Subtractive + (PSO) clustering algorithm can yield more compact clustering results than the traditional PSO clustering algorithm. However, in order to cluster large datasets, PSO requires many more iterations (generally more than 500) to converge to the optima than the hybrid Subtractive + (PSO) clustering algorithm does. Although the PSO algorithm is inherently parallel and can be implemented using parallel hardware, such as a computer cluster, the computation requirement for clustering extremely large datasets is still high. In terms of execution time, the hybrid Subtractive + (PSO) clustering algorithm is the most efficient for large datasets [1].

IV. HYBRID SUBTRACTIVE + (PSO) CLUSTERING ALGORITHM

In the hybrid Subtractive + (PSO) clustering algorithm, the multidimensional vector space is modeled as a problem space. Each vector can be represented as a dot in the problem space, so the whole dataset can be represented as a multidimensional space containing a large number of dots. The hybrid Subtractive + (PSO) clustering algorithm includes two modules, the Subtractive clustering module and the PSO module. At the initial stage, the Subtractive clustering module is executed to search for the clusters' centroid locations and the suggested number of clusters. This information is transferred to the PSO module for refining and generating the final optimal clustering solution.

The Subtractive clustering module: The Subtractive clustering module predicts the optimal number of clusters and finds the optimal initial cluster centroids for the next phase.

The PSO clustering module: In the PSO clustering algorithm, one particle in the swarm represents one possible solution for clustering the dataset. Each particle maintains a matrix X_i = (C_1, C_2, ..., C_i, ..., C_k), where C_i represents the i-th cluster centroid vector and k represents the total number of clusters. According to its own experience and those of its neighbors, the particle adjusts the centroid vector positions in the vector space at each generation. The average distance of data objects to the cluster centroid is used as the fitness value to evaluate the solution represented by each particle. The fitness value is measured by the equation below:

$f = \frac{1}{N_c} \sum_{i=1}^{N_c} \frac{\sum_{j=1}^{p_i} d(o_i, m_{ij})}{p_i}$    (10)

where m_ij denotes the j-th data vector belonging to cluster i; o_i is the centroid vector of the i-th cluster; d(o_i, m_ij) is the distance between data point m_ij and the cluster centroid o_i; p_i stands for the number of data points belonging to cluster C_i; and N_c stands for the number of clusters.
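The fitness of equation (10) can be sketched as below; this illustrative snippet assumes each data point is assigned to its nearest centroid, and it skips empty clusters:

```python
import numpy as np

def fitness(centroids, X):
    """Equation (10): mean, over clusters, of the average distance
    between each data point and the centroid it belongs to."""
    # Assign every data point to its closest centroid
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=-1)
    labels = np.argmin(dists, axis=1)
    per_cluster = []
    for i in range(len(centroids)):
        members = dists[labels == i, i]
        if members.size:  # skip empty clusters in this sketch
            per_cluster.append(members.mean())
    return np.mean(per_cluster)
```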
In the hybrid Subtractive + (PSO) clustering algorithm, the Subtractive algorithm is used at the initial stage to help discover the vicinity of the optimal solution by suggesting good initial cluster centers and the number of clusters. The result from the Subtractive algorithm is used as the initial seed of the PSO algorithm, which is applied to refine and generate the final result using the global search capability of PSO. The flow chart of the hybrid Subtractive + (PSO) algorithm is depicted graphically in Figure 1.

V. EXPERIMENTAL STUDIES

The main purpose of this paper is to compare the quality of the PSO and hybrid Subtractive + (PSO) clustering algorithms, where the quality of the clustering is measured according to the intra-cluster distances, i.e. the distances between the data vectors and the cluster centroid within a cluster; the objective is to minimize the intra-cluster distances.

Clustering Problem: We used three different data collections to compare the performance of the PSO and hybrid Subtractive + (PSO) clustering algorithms. These datasets were downloaded from the UCI Machine Learning Repository [23]. A description of the test datasets is given in Table 1.

Table 1: Summary of datasets

Dataset   Number of Instances   Number of Attributes   Number of Classes
Iris      150                   4                      3
Wine      178                   13                     3
Yeast     1484                  8                      10

In order to reduce the impact of the length variations of different data, each data vector is normalized so that it is of unit length.
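The unit-length normalization mentioned above amounts to dividing each data vector by its Euclidean norm; a small sketch, assuming the dataset is held as a NumPy matrix with one vector per row:

```python
import numpy as np

def normalize_rows(X):
    """Scale every data vector to unit length to reduce the
    impact of length variations between data points."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.where(norms == 0, 1.0, norms)  # guard against zero vectors
```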





Figure 1 summarizes the algorithm as the following flowchart, reproduced here as a step list:

Subtractive clustering module:
1. Calculate the density function for each data point x_i using equation (4).
2. Choose the cluster center which has the highest density function.
3. Recalculate the density function for each data point x_i using equation (5).
4. If a sufficient number of cluster centers has not yet been generated, go back to step 2.

PSO clustering module:
5. Inherit the cluster centroid vectors and the number of clusters into the particles as an initial seed.
6. Assign each data point vector in the dataset to the closest centroid vector for each particle.
7. Calculate the fitness value based on equation (10).
8. Update the particle velocities and positions using equations (6) and (7) and generate the next solutions.
9. Stop when the maximum number of iterations is exceeded or the average change in the centroid vectors is less than a predefined threshold; otherwise go back to step 6.

Figure 1: The flowchart of the hybrid Subtractive + (PSO) algorithm
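Putting the two modules of Figure 1 together, the overall flow can be sketched as below. It reuses the illustrative subtractive_clustering, pso_step, and fitness sketches from the previous sections; the way the swarm is seeded (jittered copies of the subtractive centers) and the parameter defaults are our assumptions:

```python
import numpy as np

def hybrid_subtractive_pso(X, num_particles=15, max_iter=200, r_a=0.5, k=3):
    """Sketch of the hybrid Subtractive + (PSO) flow of Figure 1."""
    seed = subtractive_clustering(X, r_a=r_a, num_centers=k)  # Subtractive module
    d = seed.size
    # Each particle holds k centroids, flattened; seed all particles near
    # the subtractive result (a small jitter keeps the particles distinct).
    x = seed.flatten() + 0.01 * np.random.randn(num_particles, d)
    v = np.zeros_like(x)
    pbest = x.copy()
    pbest_f = np.array([fitness(p.reshape(k, -1), X) for p in x])
    gbest = pbest[np.argmin(pbest_f)]
    for it in range(max_iter):  # PSO module
        x, v = pso_step(x, v, pbest, gbest, it, max_iter)
        f = np.array([fitness(p.reshape(k, -1), X) for p in x])
        improved = f < pbest_f          # personal best update, equation (9)
        pbest[improved] = x[improved]
        pbest_f[improved] = f[improved]
        gbest = pbest[np.argmin(pbest_f)]
    return gbest.reshape(k, -1)
```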

Experimental Setting: In this section we present the experimental results of the hybrid Subtractive + (PSO) clustering algorithm. For the sake of comparison, we also include the results of the Subtractive and PSO clustering algorithms. In our case the Euclidean distance measure is used as the similarity metric in each algorithm. The performance of the clustering algorithm is improved by seeding the initial swarm with the result of the Subtractive algorithm. We set c_1 = c_2 = 1.49 and the inertia weight w according to equation (8); these values were chosen based on the results reported in [17]. We chose the number of particles as a function of the number of classes: 15 particles for the Iris plants database, 15 particles for the Wine database, and 50 particles for the Yeast database. All these values were chosen to ensure good convergence.
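For reference, the settings described above can be gathered in one place. The dictionary below is a hypothetical configuration record (the key names are ours; the values are taken from this section and from the results discussion):

```python
# PSO settings used in the experiments (values from the paper; names are ours)
PSO_SETTINGS = {
    "c1": 1.49,                   # acceleration coefficients, per [17]
    "c2": 1.49,
    "w": "0.9 -> 0.4, eq. (8)",   # linearly decreasing inertia weight
    "iterations": 200,            # per experiment (see Results and Discussion)
    "particles": {"Iris": 15, "Wine": 15, "Yeast": 50},
}
```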



Results and Discussion: The fitness equation (10) is used not only in the PSO algorithm for the fitness value calculation, but also in the evaluation of cluster quality. It indicates the average distance between a data point and the cluster centroid to which it belongs; the smaller the value, the more compact the clustering solution. Table 2 presents the experimental results of the Subtractive, PSO, and Subtractive + (PSO) clustering algorithms, respectively. For an easy comparison, the PSO and hybrid Subtractive + (PSO) clustering algorithms run 200 iterations in each experiment. All reported results are averages over more than ten simulations. To illustrate the convergence behavior of the different clustering algorithms, the clustering fitness values at each iteration were recorded when the two algorithms were applied to the datasets separately. As shown in Table 2, the Subtractive + (PSO) clustering approach generates the clustering result with the lowest fitness value for all three datasets using the Euclidean similarity metric; the results of the Subtractive + (PSO) approach improve on the results of the PSO approach.

Table 2: Performance comparison of Subtractive, PSO, and Subtractive + (PSO)

                          Fitness value
Dataset   Subtractive   PSO     Subtractive + (PSO)
Iris      6.12          6.891   3.861
Wine      2.28          2.13    1.64
Yeast     1.30          1.289   0.192

Figure 2 shows the cluster centers suggested by the Subtractive clustering algorithm; the cluster centers appear in black in the figure. These centers are used as the initial seed of the PSO algorithm. Figures 3, 4 and 5 illustrate the convergence behavior of the two algorithms on the three datasets using the Euclidean distance as the similarity metric.
[Figure omitted: three scatter plots, one per dataset (Iris, Wine, Yeast), showing the data points together with the suggested cluster centers.]

Figure 2: Suggested cluster centers by the Subtractive clustering algorithm





[Figure omitted: gbest fitness value versus iteration for the PSO algorithm and the hybrid Subtractive + PSO algorithm.]

Figure 3: Algorithm convergence for the Iris database



[Figure omitted: gbest fitness value versus iteration for the PSO algorithm and the hybrid Subtractive + PSO algorithm.]

Figure 4: Algorithm convergence for the Wine database







[Figure omitted: gbest fitness value versus iteration for the PSO algorithm and the hybrid Subtractive + PSO algorithm.]

Figure 5: Algorithm convergence for the Yeast database



In all the previous figures for the hybrid Subtractive PSO algorithm, g_best is the fitness of the global best particle among all particles in the swarm, the particle that inherited the cluster centroid vectors from the Subtractive algorithm. We notice that the Subtractive + PSO algorithm has a good start and converges quickly to a lower fitness value. As shown in Figure 3, the fitness value of the Subtractive + PSO algorithm starts at 6.1, drops sharply from 6.1 to 3.8 within 25 iterations, and settles at 3.68. The PSO algorithm starts at 7.4; the reduction of the fitness value in PSO is not as sharp as in Subtractive + PSO, and it becomes smooth after 55 iterations. The same happens in Figures 4 and 5. The Subtractive + PSO algorithm shows good improvement for the large dataset, as shown in Figure 5. This indicates that upon termination the Subtractive + PSO yields minimal fitness values. Therefore, the proposed algorithm is an efficient and effective solution to the data clustering problem.

VI. CONCLUSION

This paper investigated the application of the Subtractive + PSO algorithm, a hybrid of the PSO and Subtractive algorithms, to cluster data vectors. The Subtractive clustering module is executed to search for the clusters' centroid locations and the suggested number of clusters. This information is transferred to the PSO module for refining and generating the final optimal clustering solution. The general PSO algorithm can conduct a globalized search for the optimal clustering, but it requires a larger number of iterations. Subtractive clustering helps the PSO start with good initial cluster centroids, so that it converges faster and to a smaller fitness value, which means a more compact result. The algorithm includes two modules, the Subtractive module and the PSO module. The Subtractive module is executed at the initial stage to discover good initial cluster centroids. The result from the Subtractive module is used as the initial seed of the PSO module, which discovers the optimal solution by a global search while avoiding high computation cost. The PSO algorithm is then applied to refine and generate the final result. Experimental results illustrate that this hybrid Subtractive + PSO algorithm can generate better clustering results than the ordinary PSO.

VII. REFERENCES

[1] Khaled S. Al-Sultana, M. Maroof Khan, "Computational experience on four algorithms for the hard clustering problem". Pattern Recognition Letters, Vol. 17, No. 3, pp. 295-308, 1996. doi: 10.1016/0167-8655(95)00122-0
[2] Michael R. Anderberg, "Cluster Analysis for Applications". Academic Press Inc., New York, 1973.
[3] Pavel Berkhin, "Survey of clustering data mining techniques". Accrue Software Research Paper, pp. 25-71, 2002. doi: 10.1007/3-540-28349-8_2



[4] A. Carlisle, G. Dozier, "An Off-The-Shelf PSO". In Proceedings of the Particle Swarm Optimization Workshop, 2001, pp. 1-6.
[5] Krzysztof J. Cios, Witold Pedrycz, Roman W. Swiniarski, "Data Mining - Methods for Knowledge Discovery". Kluwer Academic Publishers, 1998. doi: 10.1007/978-1-4615-5589-6
[6] X. Cui, P. Palathingal, T.E. Potok, "Document Clustering using Particle Swarm Optimization". IEEE Swarm Intelligence Symposium 2005, Pasadena, California, pp. 185-191. doi: 10.1109/SIS.2005.1501621
[7] Eberhart, R.C., Shi, Y., "Comparing Inertia Weights and Constriction Factors in Particle Swarm Optimization". Congress on Evolutionary Computing, Vol. 1, 2000, pp. 84-88. doi: 10.1109/CEC.2000.870279
[8] Everitt, B., "Cluster Analysis". 2nd Edition, Halsted Press, New York, 1980.
[9] A. K. Jain, M. N. Murty, P. J. Flynn, "Data Clustering: A Review". ACM Computing Surveys, Vol. 31, No. 3, pp. 264-323, 1999. doi: 10.1145/331499.331504
[10] J. A. Hartigan, "Clustering Algorithms". John Wiley and Sons, Inc., New York, 1975.
[11] Eberhart, R.C., Shi, Y., Kennedy, J., "Swarm Intelligence". Morgan Kaufmann, New York, 2001.
[12] Mahamed G. Omran, Ayed Salman, Andries P. Engelbrecht, "Image classification using particle swarm optimization". Proceedings of the 4th Asia-Pacific Conference on Simulated Evolution and Learning 2002, Singapore, pp. 370-374. doi: 10.1142/9789812561794_0019
[13] S. L. Chiu, "Fuzzy model identification based on cluster estimation". Journal of Intelligent and Fuzzy Systems, Vol. 2, No. 3, 1994.
[14] Salton, G., Buckley, C., "Term-weighting approaches in automatic text retrieval". Information Processing and Management, Vol. 24, No. 5, pp. 513-523, 1988. doi: 10.1016/0306-4573(88)90021-0
[15] Song Liangtu, Zhang Xiaoming, "Web Text Feature Extraction with Particle Swarm Optimization". IJCSNS International Journal of Computer Science and Network Security, Vol. 7, No. 6, 2007.
[16] Selim, Shokri Z., "K-means type algorithms: A generalized convergence theorem and characterization of local optimality". IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 6, No. 1, pp. 81-87, 1984. doi: 10.1109/TPAMI.1984.4767478
[17] Yuhui Shi, Russell C. Eberhart, "Parameter Selection in Particle Swarm Optimization". The 7th Annual Conference on Evolutionary Programming, San Diego, pp. 591-600, 1998. doi: 10.1007/BFb0040810
[18] Michael Steinbach, George Karypis, Vipin Kumar, "A Comparison of Document Clustering Techniques". Text Mining Workshop, KDD, 2000.
[19] Razan Alwee, Siti Mariyam, Firdaus Aziz, K.H. Chey, Haza Nuzly, "The Impact of Social Network Structure in Particle Swarm Optimization for Classification Problems". International Journal of Soft Computing, Vol. 4, No. 4, 2009, pp. 151-156.
[20] D.W. van der Merwe, A.P. Engelbrecht, "Data clustering using particle swarm optimization". Proceedings of the IEEE Congress on Evolutionary Computation 2003, Canberra, Australia, pp. 215-220. doi: 10.1109/CEC.2003.1299577
[21] Sherin M. Youssef, Mohamed Rizk, Mohamed El-Sherif, "Dynamically Adaptive Data Clustering Using Intelligent Swarm-like Agents". International Journal of Mathematics and Computers in Simulation, Vol. 1, No. 2, 2007.
[22] Rehab F. Abdel-Kader, "Genetically Improved PSO Algorithm for Efficient Data Clustering". Proceedings of the Second International Conference on Machine Learning and Computing 2010, pp. 71-75. doi: 10.1109/ICMLC.2010.19
[23] UCI Repository of Machine Learning Databases. http://www.ics.uci.edu/~mlearn/MLRepository.html
[24] K. Premalatha, A.M. Natarajan, "Discrete PSO with GA Operators for Document Clustering". International Journal of Recent Trends in Engineering, Vol. 1, No. 1, 2009.
[25] JunYing Chen, Zheng Qin, Ji Jia, "A Weighted Mean Subtractive Clustering Algorithm". Information Technology Journal, No. 7, pp. 356-360, 2008. doi: 10.3923/itj.2008.356.360
[26] Neveen I. Ghali, Nahed El-dessouki, Mervat A. N., Lamiaa Bakraw, "Exponential Particle Swarm Optimization Approach for Improving Data Clustering". International Journal of Electrical & Electronics Engineering, Vol. 3, Issue 4, May 2009.
[27] Vijay Kalivarapu, Jung-Leng Foo, Eliot Winer, "Improving solution characteristics of particle swarm optimization using digital pheromones". Structural and Multidisciplinary Optimization, Vol. 37, No. 4, pp. 415-427, 2009. doi: 10.1007/s00158-008-0240-9
[28] H. Izakian, A. Abraham, V. Snásel, "Fuzzy Clustering using Hybrid Fuzzy c-means and Fuzzy Particle Swarm Optimization". World Congress on Nature & Biologically Inspired Computing (NaBIC 2009), pp. 1690-1694, 2009. doi: 10.1109/NABIC.2009.5393618


How to cite
Mariam El-Tarabily, Rehab Abdel-Kader, Mahmoud Marie, Gamal Abdel-Azeem, "A PSO-Based Subtractive Data Clustering Algorithm". International Journal of Research in Computer Science, 3 (2): pp. 1-9, March 2013. doi: 10.7815/ijorcs.32.2013.060




 
Extended pso algorithm for improvement problems k means clustering algorithm
Extended pso algorithm for improvement problems k means clustering algorithmExtended pso algorithm for improvement problems k means clustering algorithm
Extended pso algorithm for improvement problems k means clustering algorithmIJMIT JOURNAL
 
Improve the Performance of Clustering Using Combination of Multiple Clusterin...
Improve the Performance of Clustering Using Combination of Multiple Clusterin...Improve the Performance of Clustering Using Combination of Multiple Clusterin...
Improve the Performance of Clustering Using Combination of Multiple Clusterin...ijdmtaiir
 
The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)theijes
 
84cc04ff77007e457df6aa2b814d2346bf1b
84cc04ff77007e457df6aa2b814d2346bf1b84cc04ff77007e457df6aa2b814d2346bf1b
84cc04ff77007e457df6aa2b814d2346bf1bPRAWEEN KUMAR
 
A h k clustering algorithm for high dimensional data using ensemble learning
A h k clustering algorithm for high dimensional data using ensemble learningA h k clustering algorithm for high dimensional data using ensemble learning
A h k clustering algorithm for high dimensional data using ensemble learningijitcs
 
FAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETS
FAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETSFAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETS
FAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETScsandit
 
Ba2419551957
Ba2419551957Ba2419551957
Ba2419551957IJMER
 
A SURVEY ON OPTIMIZATION APPROACHES TO TEXT DOCUMENT CLUSTERING
A SURVEY ON OPTIMIZATION APPROACHES TO TEXT DOCUMENT CLUSTERINGA SURVEY ON OPTIMIZATION APPROACHES TO TEXT DOCUMENT CLUSTERING
A SURVEY ON OPTIMIZATION APPROACHES TO TEXT DOCUMENT CLUSTERINGijcsa
 
Clustering Using Shared Reference Points Algorithm Based On a Sound Data Model
Clustering Using Shared Reference Points Algorithm Based On a Sound Data ModelClustering Using Shared Reference Points Algorithm Based On a Sound Data Model
Clustering Using Shared Reference Points Algorithm Based On a Sound Data ModelWaqas Tariq
 
An Adaptive Masker for the Differential Evolution Algorithm
An Adaptive Masker for the Differential Evolution AlgorithmAn Adaptive Masker for the Differential Evolution Algorithm
An Adaptive Masker for the Differential Evolution AlgorithmIOSR Journals
 
Survey on classification algorithms for data mining (comparison and evaluation)
Survey on classification algorithms for data mining (comparison and evaluation)Survey on classification algorithms for data mining (comparison and evaluation)
Survey on classification algorithms for data mining (comparison and evaluation)Alexander Decker
 

Similar a A PSO-Based Subtractive Data Clustering Algorithm (20)

Comparison Between Clustering Algorithms for Microarray Data Analysis
Comparison Between Clustering Algorithms for Microarray Data AnalysisComparison Between Clustering Algorithms for Microarray Data Analysis
Comparison Between Clustering Algorithms for Microarray Data Analysis
 
D0931621
D0931621D0931621
D0931621
 
50120140505013
5012014050501350120140505013
50120140505013
 
Extended pso algorithm for improvement problems k means clustering algorithm
Extended pso algorithm for improvement problems k means clustering algorithmExtended pso algorithm for improvement problems k means clustering algorithm
Extended pso algorithm for improvement problems k means clustering algorithm
 
A fuzzy clustering algorithm for high dimensional streaming data
A fuzzy clustering algorithm for high dimensional streaming dataA fuzzy clustering algorithm for high dimensional streaming data
A fuzzy clustering algorithm for high dimensional streaming data
 
Particle Swarm Optimization based K-Prototype Clustering Algorithm
Particle Swarm Optimization based K-Prototype Clustering Algorithm Particle Swarm Optimization based K-Prototype Clustering Algorithm
Particle Swarm Optimization based K-Prototype Clustering Algorithm
 
I017235662
I017235662I017235662
I017235662
 
A Comparative Study Of Various Clustering Algorithms In Data Mining
A Comparative Study Of Various Clustering Algorithms In Data MiningA Comparative Study Of Various Clustering Algorithms In Data Mining
A Comparative Study Of Various Clustering Algorithms In Data Mining
 
Dynamic approach to k means clustering algorithm-2
Dynamic approach to k means clustering algorithm-2Dynamic approach to k means clustering algorithm-2
Dynamic approach to k means clustering algorithm-2
 
Extended pso algorithm for improvement problems k means clustering algorithm
Extended pso algorithm for improvement problems k means clustering algorithmExtended pso algorithm for improvement problems k means clustering algorithm
Extended pso algorithm for improvement problems k means clustering algorithm
 
Improve the Performance of Clustering Using Combination of Multiple Clusterin...
Improve the Performance of Clustering Using Combination of Multiple Clusterin...Improve the Performance of Clustering Using Combination of Multiple Clusterin...
Improve the Performance of Clustering Using Combination of Multiple Clusterin...
 
The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)
 
84cc04ff77007e457df6aa2b814d2346bf1b
84cc04ff77007e457df6aa2b814d2346bf1b84cc04ff77007e457df6aa2b814d2346bf1b
84cc04ff77007e457df6aa2b814d2346bf1b
 
A h k clustering algorithm for high dimensional data using ensemble learning
A h k clustering algorithm for high dimensional data using ensemble learningA h k clustering algorithm for high dimensional data using ensemble learning
A h k clustering algorithm for high dimensional data using ensemble learning
 
FAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETS
FAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETSFAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETS
FAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETS
 
Ba2419551957
Ba2419551957Ba2419551957
Ba2419551957
 
A SURVEY ON OPTIMIZATION APPROACHES TO TEXT DOCUMENT CLUSTERING
A SURVEY ON OPTIMIZATION APPROACHES TO TEXT DOCUMENT CLUSTERINGA SURVEY ON OPTIMIZATION APPROACHES TO TEXT DOCUMENT CLUSTERING
A SURVEY ON OPTIMIZATION APPROACHES TO TEXT DOCUMENT CLUSTERING
 
Clustering Using Shared Reference Points Algorithm Based On a Sound Data Model
Clustering Using Shared Reference Points Algorithm Based On a Sound Data ModelClustering Using Shared Reference Points Algorithm Based On a Sound Data Model
Clustering Using Shared Reference Points Algorithm Based On a Sound Data Model
 
An Adaptive Masker for the Differential Evolution Algorithm
An Adaptive Masker for the Differential Evolution AlgorithmAn Adaptive Masker for the Differential Evolution Algorithm
An Adaptive Masker for the Differential Evolution Algorithm
 
Survey on classification algorithms for data mining (comparison and evaluation)
Survey on classification algorithms for data mining (comparison and evaluation)Survey on classification algorithms for data mining (comparison and evaluation)
Survey on classification algorithms for data mining (comparison and evaluation)
 

Más de IJORCS

Help the Genetic Algorithm to Minimize the Urban Traffic on Intersections
Help the Genetic Algorithm to Minimize the Urban Traffic on IntersectionsHelp the Genetic Algorithm to Minimize the Urban Traffic on Intersections
Help the Genetic Algorithm to Minimize the Urban Traffic on IntersectionsIJORCS
 
Call for Papers - IJORCS, Volume 4 Issue 4
Call for Papers - IJORCS, Volume 4 Issue 4Call for Papers - IJORCS, Volume 4 Issue 4
Call for Papers - IJORCS, Volume 4 Issue 4IJORCS
 
Real-Time Multiple License Plate Recognition System
Real-Time Multiple License Plate Recognition SystemReal-Time Multiple License Plate Recognition System
Real-Time Multiple License Plate Recognition SystemIJORCS
 
FPGA Implementation of FIR Filter using Various Algorithms: A Retrospective
FPGA Implementation of FIR Filter using Various Algorithms: A RetrospectiveFPGA Implementation of FIR Filter using Various Algorithms: A Retrospective
FPGA Implementation of FIR Filter using Various Algorithms: A RetrospectiveIJORCS
 
Using Virtualization Technique to Increase Security and Reduce Energy Consump...
Using Virtualization Technique to Increase Security and Reduce Energy Consump...Using Virtualization Technique to Increase Security and Reduce Energy Consump...
Using Virtualization Technique to Increase Security and Reduce Energy Consump...IJORCS
 
Algebraic Fault Attack on the SHA-256 Compression Function
Algebraic Fault Attack on the SHA-256 Compression FunctionAlgebraic Fault Attack on the SHA-256 Compression Function
Algebraic Fault Attack on the SHA-256 Compression FunctionIJORCS
 
Enhancement of DES Algorithm with Multi State Logic
Enhancement of DES Algorithm with Multi State LogicEnhancement of DES Algorithm with Multi State Logic
Enhancement of DES Algorithm with Multi State LogicIJORCS
 
Hybrid Simulated Annealing and Nelder-Mead Algorithm for Solving Large-Scale ...
Hybrid Simulated Annealing and Nelder-Mead Algorithm for Solving Large-Scale ...Hybrid Simulated Annealing and Nelder-Mead Algorithm for Solving Large-Scale ...
Hybrid Simulated Annealing and Nelder-Mead Algorithm for Solving Large-Scale ...IJORCS
 
CFP. IJORCS, Volume 4 - Issue2
CFP. IJORCS, Volume 4 - Issue2CFP. IJORCS, Volume 4 - Issue2
CFP. IJORCS, Volume 4 - Issue2IJORCS
 
Call for Papers - IJORCS - Vol 4, Issue 1
Call for Papers - IJORCS - Vol 4, Issue 1Call for Papers - IJORCS - Vol 4, Issue 1
Call for Papers - IJORCS - Vol 4, Issue 1IJORCS
 
Voice Recognition System using Template Matching
Voice Recognition System using Template MatchingVoice Recognition System using Template Matching
Voice Recognition System using Template MatchingIJORCS
 
Channel Aware Mac Protocol for Maximizing Throughput and Fairness
Channel Aware Mac Protocol for Maximizing Throughput and FairnessChannel Aware Mac Protocol for Maximizing Throughput and Fairness
Channel Aware Mac Protocol for Maximizing Throughput and FairnessIJORCS
 
A Review and Analysis on Mobile Application Development Processes using Agile...
A Review and Analysis on Mobile Application Development Processes using Agile...A Review and Analysis on Mobile Application Development Processes using Agile...
A Review and Analysis on Mobile Application Development Processes using Agile...IJORCS
 
Congestion Prediction and Adaptive Rate Adjustment Technique for Wireless Sen...
Congestion Prediction and Adaptive Rate Adjustment Technique for Wireless Sen...Congestion Prediction and Adaptive Rate Adjustment Technique for Wireless Sen...
Congestion Prediction and Adaptive Rate Adjustment Technique for Wireless Sen...IJORCS
 
A Study of Routing Techniques in Intermittently Connected MANETs
A Study of Routing Techniques in Intermittently Connected MANETsA Study of Routing Techniques in Intermittently Connected MANETs
A Study of Routing Techniques in Intermittently Connected MANETsIJORCS
 
Improving the Efficiency of Spectral Subtraction Method by Combining it with ...
Improving the Efficiency of Spectral Subtraction Method by Combining it with ...Improving the Efficiency of Spectral Subtraction Method by Combining it with ...
Improving the Efficiency of Spectral Subtraction Method by Combining it with ...IJORCS
 
An Adaptive Load Sharing Algorithm for Heterogeneous Distributed System
An Adaptive Load Sharing Algorithm for Heterogeneous Distributed SystemAn Adaptive Load Sharing Algorithm for Heterogeneous Distributed System
An Adaptive Load Sharing Algorithm for Heterogeneous Distributed SystemIJORCS
 
The Design of Cognitive Social Simulation Framework using Statistical Methodo...
The Design of Cognitive Social Simulation Framework using Statistical Methodo...The Design of Cognitive Social Simulation Framework using Statistical Methodo...
The Design of Cognitive Social Simulation Framework using Statistical Methodo...IJORCS
 
An Enhanced Framework for Improving Spatio-Temporal Queries for Global Positi...
An Enhanced Framework for Improving Spatio-Temporal Queries for Global Positi...An Enhanced Framework for Improving Spatio-Temporal Queries for Global Positi...
An Enhanced Framework for Improving Spatio-Temporal Queries for Global Positi...IJORCS
 
Call for papers, IJORCS, Volume 3 - Issue 3
Call for papers, IJORCS, Volume 3 - Issue 3Call for papers, IJORCS, Volume 3 - Issue 3
Call for papers, IJORCS, Volume 3 - Issue 3IJORCS
 

Más de IJORCS (20)

Help the Genetic Algorithm to Minimize the Urban Traffic on Intersections
Help the Genetic Algorithm to Minimize the Urban Traffic on IntersectionsHelp the Genetic Algorithm to Minimize the Urban Traffic on Intersections
Help the Genetic Algorithm to Minimize the Urban Traffic on Intersections
 
Call for Papers - IJORCS, Volume 4 Issue 4
Call for Papers - IJORCS, Volume 4 Issue 4Call for Papers - IJORCS, Volume 4 Issue 4
Call for Papers - IJORCS, Volume 4 Issue 4
 
Real-Time Multiple License Plate Recognition System
Real-Time Multiple License Plate Recognition SystemReal-Time Multiple License Plate Recognition System
Real-Time Multiple License Plate Recognition System
 
FPGA Implementation of FIR Filter using Various Algorithms: A Retrospective
FPGA Implementation of FIR Filter using Various Algorithms: A RetrospectiveFPGA Implementation of FIR Filter using Various Algorithms: A Retrospective
FPGA Implementation of FIR Filter using Various Algorithms: A Retrospective
 
Using Virtualization Technique to Increase Security and Reduce Energy Consump...
Using Virtualization Technique to Increase Security and Reduce Energy Consump...Using Virtualization Technique to Increase Security and Reduce Energy Consump...
Using Virtualization Technique to Increase Security and Reduce Energy Consump...
 
Algebraic Fault Attack on the SHA-256 Compression Function
Algebraic Fault Attack on the SHA-256 Compression FunctionAlgebraic Fault Attack on the SHA-256 Compression Function
Algebraic Fault Attack on the SHA-256 Compression Function
 
Enhancement of DES Algorithm with Multi State Logic
Enhancement of DES Algorithm with Multi State LogicEnhancement of DES Algorithm with Multi State Logic
Enhancement of DES Algorithm with Multi State Logic
 
Hybrid Simulated Annealing and Nelder-Mead Algorithm for Solving Large-Scale ...
Hybrid Simulated Annealing and Nelder-Mead Algorithm for Solving Large-Scale ...Hybrid Simulated Annealing and Nelder-Mead Algorithm for Solving Large-Scale ...
Hybrid Simulated Annealing and Nelder-Mead Algorithm for Solving Large-Scale ...
 
CFP. IJORCS, Volume 4 - Issue2
CFP. IJORCS, Volume 4 - Issue2CFP. IJORCS, Volume 4 - Issue2
CFP. IJORCS, Volume 4 - Issue2
 
Call for Papers - IJORCS - Vol 4, Issue 1
Call for Papers - IJORCS - Vol 4, Issue 1Call for Papers - IJORCS - Vol 4, Issue 1
Call for Papers - IJORCS - Vol 4, Issue 1
 
Voice Recognition System using Template Matching
Voice Recognition System using Template MatchingVoice Recognition System using Template Matching
Voice Recognition System using Template Matching
 
Channel Aware Mac Protocol for Maximizing Throughput and Fairness
Channel Aware Mac Protocol for Maximizing Throughput and FairnessChannel Aware Mac Protocol for Maximizing Throughput and Fairness
Channel Aware Mac Protocol for Maximizing Throughput and Fairness
 
A Review and Analysis on Mobile Application Development Processes using Agile...
A Review and Analysis on Mobile Application Development Processes using Agile...A Review and Analysis on Mobile Application Development Processes using Agile...
A Review and Analysis on Mobile Application Development Processes using Agile...
 
Congestion Prediction and Adaptive Rate Adjustment Technique for Wireless Sen...
Congestion Prediction and Adaptive Rate Adjustment Technique for Wireless Sen...Congestion Prediction and Adaptive Rate Adjustment Technique for Wireless Sen...
Congestion Prediction and Adaptive Rate Adjustment Technique for Wireless Sen...
 
A Study of Routing Techniques in Intermittently Connected MANETs
A Study of Routing Techniques in Intermittently Connected MANETsA Study of Routing Techniques in Intermittently Connected MANETs
A Study of Routing Techniques in Intermittently Connected MANETs
 
Improving the Efficiency of Spectral Subtraction Method by Combining it with ...
Improving the Efficiency of Spectral Subtraction Method by Combining it with ...Improving the Efficiency of Spectral Subtraction Method by Combining it with ...
Improving the Efficiency of Spectral Subtraction Method by Combining it with ...
 
An Adaptive Load Sharing Algorithm for Heterogeneous Distributed System
An Adaptive Load Sharing Algorithm for Heterogeneous Distributed SystemAn Adaptive Load Sharing Algorithm for Heterogeneous Distributed System
An Adaptive Load Sharing Algorithm for Heterogeneous Distributed System
 
The Design of Cognitive Social Simulation Framework using Statistical Methodo...
The Design of Cognitive Social Simulation Framework using Statistical Methodo...The Design of Cognitive Social Simulation Framework using Statistical Methodo...
The Design of Cognitive Social Simulation Framework using Statistical Methodo...
 
An Enhanced Framework for Improving Spatio-Temporal Queries for Global Positi...
An Enhanced Framework for Improving Spatio-Temporal Queries for Global Positi...An Enhanced Framework for Improving Spatio-Temporal Queries for Global Positi...
An Enhanced Framework for Improving Spatio-Temporal Queries for Global Positi...
 
Call for papers, IJORCS, Volume 3 - Issue 3
Call for papers, IJORCS, Volume 3 - Issue 3Call for papers, IJORCS, Volume 3 - Issue 3
Call for papers, IJORCS, Volume 3 - Issue 3
 

Último

The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 

Último (20)

The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 

A PSO-Based Subtractive Data Clustering Algorithm

used as a quick stand-alone method for approximate clustering. The results illustrate that the Subtractive + (PSO) clustering algorithm can generate the most compact clustering results as compared to the other algorithms.

Keywords: Data Clustering, Subtractive Clustering, Particle Swarm Optimization, Subtractive Algorithm, Hybrid Algorithm.

I. INTRODUCTION

Clustering is one of the most extensively studied research topics due to its numerous important applications in machine learning, image segmentation, information retrieval, and pattern recognition. Clustering involves dividing a set of objects into a specified number of clusters [14]. The motivation behind clustering a set of data is to find the inherent structure in the data and expose this structure as a set of groups. The data objects within each group should exhibit a large degree of similarity, while the similarity among different clusters should be minimized [3, 9, 18].

Particle Swarm Optimization (PSO) is a population-based stochastic optimization technique that can be used to find an optimal, or near-optimal, solution to numerical and qualitative problems [4, 11, 17]. Several attempts have been made in the literature to apply PSO to the data clustering problem [6, 18, 19, 20, 21]. The major drawback is that the number of clusters is initially unknown, and the clustering result is sensitive to the selection of the initial cluster centroids and may converge to local optima. The initial selection of the cluster centroids therefore determines the processing of PSO and the resulting partition of the dataset: the same initial cluster centroids will always generate the same clustering result. However, if good initial cluster centroids can be obtained using some other technique, PSO works well in refining them to find the optimal cluster centers. The Subtractive clustering algorithm can be used to generate both the number of clusters and good initial cluster centroids for the PSO.
In this paper, we present a hybrid Subtractive + (PSO) clustering algorithm that performs fast clustering. Experimental results indicate that the Subtractive + (PSO) clustering algorithm can find the optimal solution after nearly 50 iterations, in comparison with the ordinary PSO algorithm. The remainder of this paper is organized as follows: Section 2 reviews related work in data clustering using PSO. Section 3 provides a general overview of the data clustering problem and the basic PSO algorithm. The proposed hybrid Subtractive + (PSO) clustering algorithm is described in Section 4. Section 5 presents the detailed experimental setup and results comparing the performance of the Subtractive + (PSO) clustering algorithm with the Subtractive algorithm and PSO, together with a discussion of the results. Conclusions are drawn in Section 6.

II. RELATED WORK

The best-known partitioning algorithm is the K-means algorithm [2, 6, 7, 16] and its variants. The main drawbacks of the K-means algorithm are that the clustering result is sensitive to the selection of the initial cluster centroids and may converge to a local optimum, and that it generally requires prior knowledge of the probable number of clusters in a data collection.

In recent years, researchers have proposed several approaches [3] inspired by collective biological behavior to solve the clustering problem, such as the Genetic Algorithm (GA) [8], Particle Swarm Optimization (PSO), Ant clustering, and Self-Organizing Maps (SOM) [9]. In [6], the authors presented a hybrid PSO+K-means document clustering algorithm that performed fast document clustering; the results indicated that the PSO+K-means algorithm can generate the best results in just 50 iterations, in comparison with the K-means and PSO algorithms. Reference [24] proposed a Discrete PSO with the crossover and mutation operators of the Genetic Algorithm for document clustering. The proposed system markedly increased the success of the clustering process; it tried to avoid the stagnation behavior of the particles, but it could not always do so. In [26], the authors investigated the application of EPSO to cluster data vectors. The EPSO algorithm was compared against the PSO clustering algorithm, which showed that EPSO converges more slowly to a lower quantization error, while PSO converges faster but to a larger quantization error. Reference [27] presented a new approach to particle swarm optimization using digital pheromones to coordinate swarms within an n-dimensional design space and improve search efficiency and reliability. In [28], a hybrid fuzzy clustering method based on FCM and fuzzy PSO (FPSO) was proposed that makes use of the merits of both algorithms; experimental results show that the method is efficient and reveals encouraging results.

III. DATA CLUSTERING PROBLEM

In most clustering algorithms, the dataset to be clustered is represented as a set of vectors X = {x_1, x_2, ..., x_n}, where the vector x_i corresponds to a single object and is called the feature vector. The feature vector should include proper features to represent the object.
The similarity metric: since similarity is fundamental to the definition of a cluster, a measure of the similarity between two items from the same feature space is essential to most clustering procedures. Because of the variety of feature types and scales, the distance measure must be chosen carefully. Over the years, two prominent ways have been proposed to compute the similarity between data m_p and m_j. The most popular metric for continuous features is the Euclidean distance, given by

    d(m_p, m_j) = \sqrt{ \sum_{k=1}^{d_m} (m_{pk} - m_{jk})^2 }        (1)

which is a special case of the Minkowski metric [5], given by

    D_n(m_p, m_j) = ( \sum_{i=1}^{d_m} |m_{i,p} - m_{i,j}|^n )^{1/n}   (2)

where m_p and m_j are two data vectors, d_m denotes the dimension of the vector space, and m_{pk} and m_{jk} stand for the weight values of m_p and m_j in dimension k.

The second commonly used similarity measure in clustering is the cosine correlation measure [15], given by

    cos(m_p, m_j) = (m_p^T m_j) / ( |m_p| |m_j| )                      (3)

where m_p^T m_j denotes the dot product of the two data vectors and |.| indicates the length of a vector. Both similarity metrics are widely used in the clustering literature.
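As a concrete illustration, the following minimal Python sketch implements Equations (1)-(3); the function names and the sample vectors are our own, not part of the paper.

    import numpy as np

    def euclidean_distance(m_p, m_j):
        """Euclidean distance between two feature vectors, Eq. (1)."""
        return np.sqrt(np.sum((m_p - m_j) ** 2))

    def minkowski_distance(m_p, m_j, n=2):
        """Minkowski metric of order n, Eq. (2); n = 2 recovers Eq. (1)."""
        return np.sum(np.abs(m_p - m_j) ** n) ** (1.0 / n)

    def cosine_similarity(m_p, m_j):
        """Cosine correlation measure, Eq. (3)."""
        return np.dot(m_p, m_j) / (np.linalg.norm(m_p) * np.linalg.norm(m_j))

    # Example: two 4-dimensional feature vectors (Iris-like measurements)
    a = np.array([5.1, 3.5, 1.4, 0.2])
    b = np.array([6.2, 2.9, 4.3, 1.3])
    print(euclidean_distance(a, b), minkowski_distance(a, b, 3), cosine_similarity(a, b))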
A. Subtractive Clustering Algorithm

In Subtractive clustering, the data points are considered as candidates for the cluster centers [25]. In this method, the computational complexity is linearly proportional to the number of data points and independent of the dimension of the problem under consideration. Consider a collection of n data points {x_1, ..., x_n} in an M-dimensional space. Since each data point is a candidate for a cluster center, a density measure at data point x_i is defined as

    D_i = \sum_{j=1}^{n} exp( - ||x_i - x_j||^2 / (r_a / 2)^2 )        (4)

where r_a is a positive constant. Hence, a data point will have a high density value if it has many neighboring data points. The radius r_a defines a neighborhood for a data point; data points outside this radius contribute only slightly to the density measure.

After calculating the density measure of all data points, the data point with the highest density measure is selected as the first cluster center. Let x_{c1} be the point selected and D_{c1} its corresponding density measure. The density measure D_i for each data point x_i is then recalculated as follows:

    D_i = D_i - D_{c1} exp( - ||x_i - x_{c1}||^2 / (r_b / 2)^2 )       (5)

where r_b is a positive constant. Therefore, data points close to the first cluster center x_{c1} will have significantly reduced density measures and are unlikely to be selected as the next cluster center. The constant r_b defines a neighborhood within which the reduction in the density measure is significant; r_b is normally larger than r_a to prevent closely spaced cluster centers, and generally r_b = 1.5 r_a, as suggested in [25].

After the density measure for each data point is recalculated, the next cluster center x_{c2} is selected and the density measures of all data points are recalculated again. This iterative process is repeated until a sufficient number of cluster centers has been generated. When subtractive clustering is applied to a set of input-output data, each cluster center represents a prototype that exhibits certain characteristics of the system to be modeled. These cluster centers can reasonably be used as the initial cluster centers for the PSO algorithm.
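The density-update loop above translates directly into code. The sketch below is a minimal Python version of Equations (4) and (5); the stopping rule (stop when the highest remaining density falls below a fraction of the first center's density) is a common heuristic of our own choosing, since the paper only asks for "a sufficient number" of centers, and the default radius assumes the data have been scaled to comparable ranges.

    import numpy as np

    def subtractive_clustering(X, r_a=0.5, min_density_ratio=0.15):
        """Estimate cluster centers with the subtractive method (Eqs. 4-5).

        X: (n, M) data matrix; r_a: neighborhood radius, with r_b = 1.5 * r_a
        as suggested in [25]. Uses O(n^2) memory, so intended for small datasets.
        """
        r_b = 1.5 * r_a
        # Pairwise squared distances ||x_i - x_j||^2
        sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
        # Eq. (4): density measure of every point
        D = np.sum(np.exp(-sq / (r_a / 2) ** 2), axis=1)

        centers, first_density = [], None
        while True:
            c = int(np.argmax(D))
            if first_density is None:
                first_density = D[c]            # density of the first center
            elif D[c] < min_density_ratio * first_density:
                break                           # assumed stopping criterion
            centers.append(X[c])
            # Eq. (5): suppress the density around the newly chosen center
            D = D - D[c] * np.exp(-sq[:, c] / (r_b / 2) ** 2)
        return np.array(centers)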
B. PSO Algorithm

PSO was originally developed by Eberhart and Kennedy in 1995, based on the phenomenon of collective intelligence inspired by the social behavior of bird flocking or fish schooling [11]. In the PSO algorithm, the birds in a flock are symbolically represented by particles. These particles can be considered as simple agents "flying" through a problem space. A particle's location in the multidimensional problem space represents one solution to the problem; when a particle moves to a new location, a different problem solution is generated. The fitness function is evaluated for each particle in the swarm and is compared to the fitness of the best previous position for that particle, p_best, and to the fitness of the global best particle among all particles in the swarm, g_best. After finding these two best values, the particles evolve by updating their velocities and positions according to the following equations:

    v_{id} = w v_{id} + c_1 rand_1 (p_best - x_{id}) + c_2 rand_2 (g_best - x_{id})   (6)

    x_{id} = x_{id} + v_{id}                                                          (7)

where d denotes the dimension of the problem space, and rand_1, rand_2 are random values in the range (0, 1). The random values are used to make sure that particles explore a wide search space before converging around the optimal solution. c_1 and c_2 are constants known as the acceleration coefficients; their values control the weight balance of p_best and g_best in deciding the particle's next movement. w denotes the inertia weight factor. An improvement over the original PSO is that w is not kept constant during execution; rather, starting from a maximal value, it is linearly decremented as the number of iterations increases, down to a minimal value [4]. It is initially set to 0.9 and decreases to 0.4 according to:

    w = (w - 0.4) (MAXITER - ITERATION) / MAXITER + 0.4                               (8)

where MAXITER is the maximum number of iterations and ITERATION represents the current iteration number. The inertia weight factor w provides the necessary diversity to the swarm by changing the momentum of the particles, avoiding their stagnation at local optima. The empirical research conducted by Eberhart and Shi shows an improvement in search efficiency when the value of the inertia weight factor is gradually decreased from a high value during the search.

Equation (6) indicates that each particle records its current coordinate x_{id} and its velocity v_{id}, which indicates the speed of its movement along the dimensions of the problem space. For every generation, the particle's new location is computed by adding the particle's current velocity, the v-vector, to its location, the x-vector.

The best fitness values are updated at each generation, based on

    P_i(t+1) = P_i(t)      if f(X_i(t+1)) <= f(X_i(t))
             = X_i(t+1)    if f(X_i(t+1)) >  f(X_i(t))                                (9)
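A minimal Python sketch of these update rules may help; it applies Equations (6) and (7) component-wise and reads Equation (8) as the usual non-recursive linear schedule from 0.9 down to 0.4. The function names, the vectorized layout, and the seeded random generator are our own choices.

    import numpy as np

    rng = np.random.default_rng(0)

    def pso_step(x, v, p_best, g_best, w, c1=1.49, c2=1.49):
        """One PSO update, Eqs. (6) and (7), applied component-wise.

        x, v, p_best: (num_particles, dim) arrays; g_best: (dim,) array.
        rand1/rand2 are drawn independently for every component.
        """
        rand1 = rng.random(x.shape)
        rand2 = rng.random(x.shape)
        v = w * v + c1 * rand1 * (p_best - x) + c2 * rand2 * (g_best - x)  # Eq. (6)
        x = x + v                                                          # Eq. (7)
        return x, v

    def inertia(iteration, max_iter, w_max=0.9, w_min=0.4):
        """Linearly decreasing inertia weight, Eq. (8)."""
        return (w_max - w_min) * (max_iter - iteration) / max_iter + w_min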
It is possible to view the clustering problem as an optimization problem that locates the optimal centroids of the clusters rather than finding an optimal partition [18, 20, 21, 22]. This view offers us a chance to apply the PSO optimization algorithm to the clustering problem. The PSO clustering algorithm performs a globalized search in the entire solution space [4, 17]. Utilizing the PSO algorithm's optimization ability, and given enough time, the proposed hybrid Subtractive + (PSO) clustering algorithm can yield more compact clustering results than the traditional PSO clustering algorithm. However, in order to cluster large datasets, PSO requires many more iterations (generally more than 500) to converge to the optimum than the hybrid Subtractive + (PSO) clustering algorithm does. Although the PSO algorithm is inherently parallel and can be implemented using parallel hardware, such as a computer cluster, the computational requirement for clustering extremely large datasets is still high. In terms of execution time, the hybrid Subtractive + (PSO) clustering algorithm is the most efficient for large datasets [1].

IV. HYBRID SUBTRACTIVE + (PSO) CLUSTERING ALGORITHM

In the hybrid Subtractive + (PSO) clustering algorithm, the multidimensional vector space is modeled as a problem space. Each vector can be represented as a dot in the problem space, so the whole dataset can be represented as a multidimensional space containing a large number of dots. The hybrid Subtractive + (PSO) clustering algorithm includes two modules, the Subtractive clustering module and the PSO module. At the initial stage, the Subtractive clustering module is executed to search for the clusters' centroid locations and the suggested number of clusters. This information is transferred to the PSO module for refining and generating the final optimal clustering solution.

The Subtractive clustering module: the Subtractive clustering module predicts the optimal number of clusters and finds the optimal initial cluster centroids for the next phase.

The PSO clustering module: in the PSO clustering algorithm, the whole dataset is represented as a multidimensional space with a large number of dots. One particle in the swarm represents one possible solution for clustering the dataset. Each particle maintains a matrix X_i = (C_1, C_2, ..., C_i, ..., C_k), where C_i represents the i-th cluster centroid vector and k represents the total number of clusters. According to its own experience and those of its neighbors, each particle adjusts its centroid vector positions in the vector space at each generation. The average distance of the data objects to the cluster centroid is used as the fitness value to evaluate the solution represented by each particle. The fitness value is measured by the equation below:

    f = ( \sum_{i=1}^{N_c} ( \sum_{j=1}^{p_i} d(o_i, m_{ij}) / p_i ) ) / N_c    (10)

where m_{ij} denotes the j-th data vector belonging to cluster i; o_i is the centroid vector of the i-th cluster; d(o_i, m_{ij}) is the distance between data point m_{ij} and the cluster centroid o_i; p_i stands for the number of data points belonging to cluster c_i; and N_c stands for the number of clusters.
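The fitness of Equation (10) is straightforward to compute once each data point has been assigned to its closest centroid. The sketch below is our own minimal Python reading of it, using the Euclidean distance of Equation (1) as in the experiments; skipping empty clusters is an implementation choice the paper does not specify.

    import numpy as np

    def fitness(particle_centroids, X):
        """Fitness of one particle, Eq. (10): the mean, over clusters, of the
        average distance between a centroid and the points assigned to it.

        particle_centroids: (k, M); X: (n, M).
        """
        # Distance of every point to every centroid
        dists = np.linalg.norm(X[:, None, :] - particle_centroids[None, :, :], axis=-1)
        labels = np.argmin(dists, axis=1)       # closest-centroid assignment
        per_cluster = []
        for i in range(len(particle_centroids)):
            members = dists[labels == i, i]
            if members.size:                    # skip empty clusters
                per_cluster.append(members.mean())
        return float(np.mean(per_cluster))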
collections to compare the performance of the PSO The Subtractive clustering module: The Subtractive and hybrid Subtractive + (PSO) clustering algorithms. clustering module predicts the optimal number of These datasets are downloaded from Machine clusters and finds the optimal initial cluster centroids Learning Repository site [23]. A description of the test for the next phase. datasets is given in Table 1. In order to reduce the impact of the length variations of different data, each The PSO clustering module: In the PSO clustering data vector is normalized so that it is of unit length. algorithm, the whole dataset can be represented as a Table 1: Summary of datasets multiple dimension space with a large number of dots particle maintains a matrix X i= (C1 , C2 , …, Ci , .., Ck ), in space. One particle in the swarm represents one Number of Number of Number of where Ci represents the ith cluster centroid vector and possible solution for clustering the dataset. Each Instances Attributes classes Iris 150 4 3 Wine 178 13 3 k represent the total number of clusters. According to Yeast 1484 8 10 its own experience and those of its neighbors, the particle adjusts the centroid vector position in the www.ijorcs.org
V. EXPERIMENTAL STUDIES

The main purpose of this paper is to compare the quality of the PSO and hybrid Subtractive + (PSO) clustering algorithms, where the quality of the clustering is measured according to the intra-cluster distances, i.e. the distances between the data vectors and the cluster centroid within a cluster; the objective is to minimize these intra-cluster distances.

Clustering Problem: We used three different data collections to compare the performance of the PSO and hybrid Subtractive + (PSO) clustering algorithms. These datasets were downloaded from the UCI Machine Learning Repository [23]. A description of the test datasets is given in Table 1. In order to reduce the impact of the length variations of different data, each data vector is normalized so that it is of unit length.

Table 1: Summary of datasets

             Number of Instances   Number of Attributes   Number of Classes
    Iris            150                     4                     3
    Wine            178                    13                     3
    Yeast          1484                     8                    10

Experimental Setting: In this section we present the experimental results of the hybrid Subtractive + (PSO) clustering algorithm. For the sake of comparison, we also include the results of the Subtractive and PSO clustering algorithms. In our case the Euclidean distance measure is used as the similarity metric in each algorithm. The performance of the clustering algorithm can be improved by seeding the initial swarm with the result of the Subtractive algorithm. We set c_1 = c_2 = 1.49, and the inertia weight w follows Equation (8); these values were chosen based on the results reported in [17]. The number of particles was chosen as a function of the number of classes: 15 particles for the Iris plants database, 15 particles for the Wine database, and 50 particles for the Yeast database. All these values were chosen to ensure good convergence.

Results and Discussion: The fitness equation (10) is used not only in the PSO algorithm for the fitness value calculation, but also in the evaluation of cluster quality. It indicates the value of the average distance between a data point and the cluster centroid to which it belongs; the smaller the value, the more compact the clustering solution is.
Table 2 presents the experimental results of the Subtractive, PSO, and Subtractive + (PSO) clustering algorithms. For an easy comparison, the PSO and hybrid Subtractive + (PSO) clustering algorithms run 200 iterations in each experiment, and all reported results are averages over more than ten simulations. To illustrate the convergence behavior of the different clustering algorithms, the clustering fitness values at each iteration were recorded when the two algorithms were applied to each dataset separately. As shown in Table 2, the Subtractive + (PSO) clustering approach generates the clustering result with the lowest fitness value for all three datasets using the Euclidean similarity metric; the results of the Subtractive + (PSO) approach are a clear improvement over those of the PSO approach.

Table 2: Performance comparison of Subtractive, PSO, and Subtractive + (PSO) (fitness values)

             Subtractive     PSO     Subtractive + (PSO)
    Iris        6.12        6.891          3.861
    Wine        2.28        2.13           1.64
    Yeast       1.30        1.289          0.192

Figure 2 shows the cluster centers suggested by the Subtractive clustering algorithm; the cluster centers appear in black in the figure. These centers are used as the initial seed of the PSO algorithm. Figures 3, 4 and 5 illustrate the convergence behavior of the two algorithms on the three datasets using the Euclidean distance as the similarity metric.

Figure 2: Suggested cluster centers by the Subtractive clustering algorithm (scatter plots of the Iris, Wine, and Yeast data points with the suggested cluster centers marked).
Figure 3: Algorithm convergence for the Iris database (gbest fitness versus iteration for the PSO and hybrid Subtractive + (PSO) algorithms).

Figure 4: Algorithm convergence for the Wine database (gbest fitness versus iteration for the PSO and hybrid Subtractive + (PSO) algorithms).
Figure 5: Algorithm convergence for the Yeast database (gbest fitness versus iteration for the PSO and hybrid Subtractive + (PSO) algorithms).

In all the preceding figures for the hybrid Subtractive + (PSO) algorithm, g_best is the fitness of the global best particle among all particles in the swarm, where the particles inherited the cluster centroid vectors from the Subtractive algorithm. We notice that the Subtractive + (PSO) algorithm has a good start and converges quickly to a lower fitness value. As shown in Figure 3, the fitness value of the Subtractive + (PSO) algorithm starts at 6.1, drops sharply to 3.8 within 25 iterations, and settles at 3.68. The PSO algorithm starts at 7.4; the reduction of its fitness value is not as sharp as in Subtractive + (PSO) and only becomes smooth after 55 iterations. The same behavior appears in Figures 4 and 5. The Subtractive + (PSO) algorithm shows a particularly good improvement for the large dataset, as shown in Figure 5. This indicates that upon termination the Subtractive + (PSO) algorithm yields minimal fitness values. Therefore, the proposed algorithm is an efficient and effective solution to the data clustering problem.

VI. CONCLUSION

This paper investigated the application of the Subtractive + (PSO) algorithm, a hybrid of the PSO and Subtractive algorithms, to cluster data vectors. The Subtractive clustering module is executed to search for the cluster centroid locations and the suggested number of clusters; this information is transferred to the PSO module for refining and generating the final optimal clustering solution. The general PSO algorithm can conduct a globalized search for the optimal clustering, but it requires many iterations; subtractive clustering helps the PSO start from good initial cluster centroids and converge faster to a smaller fitness value, which means a more compact result. The algorithm includes two modules, the Subtractive module and the PSO module. The Subtractive module is executed at the initial stage to discover good initial cluster centroids; its result is used as the initial seed of the PSO module, which discovers the optimal solution by a global search while at the same time avoiding high computational cost. The PSO module is then applied for refining and generating the final result. Experimental results illustrate that this hybrid Subtractive + (PSO) algorithm generates better clustering results than the ordinary PSO.

VII. REFERENCES

[1] Khaled S. Al-Sultana, M. Maroof Khan, "Computational experience on four algorithms for the hard clustering problem". Pattern Recognition Letters, Vol. 17, No. 3, pp. 295-308, 1996. doi: 10.1016/0167-8655(95)00122-0
[2] Michael R. Anderberg, "Cluster Analysis for Applications". Academic Press Inc., New York, 1973.
[3] Pavel Berkhin, "Survey of clustering data mining techniques". Accrue Software Research Paper, pp. 25-71, 2002. doi: 10.1007/3-540-28349-8_2
VII. REFERENCES

[1] Khaled S. Al-Sultana, M. Maroof Khan, "Computational experience on four algorithms for the hard clustering problem". Pattern Recognition Letters, Vol. 17, No. 3, pp. 295-308, 1996. doi: 10.1016/0167-8655(95)00122-0
[2] Michael R. Anderberg, "Cluster Analysis for Applications". Academic Press Inc., New York, 1973.
[3] Pavel Berkhin, "Survey of clustering data mining techniques". Accrue Software Research Paper, pp. 25-71, 2002. doi: 10.1007/3-540-28349-8_2
[4] A. Carlisle, G. Dozier, "An Off-The-Shelf PSO". In Proceedings of the Particle Swarm Optimization Workshop, 2001, pp. 1-6.
[5] Krzysztof J. Cios, Witold Pedrycz, Roman W. Swiniarski, "Data Mining – Methods for Knowledge Discovery". Kluwer Academic Publishers, 1998. doi: 10.1007/978-1-4615-5589-6
[6] X. Cui, P. Palathingal, T.E. Potok, "Document Clustering using Particle Swarm Optimization". IEEE Swarm Intelligence Symposium 2005, Pasadena, California, pp. 185-191. doi: 10.1109/SIS.2005.1501621
[7] R.C. Eberhart, Y. Shi, "Comparing Inertia Weights and Constriction Factors in Particle Swarm Optimization". Congress on Evolutionary Computation, Vol. 1, 2000, pp. 84-88. doi: 10.1109/CEC.2000.870279
[8] B. Everitt, "Cluster Analysis". 2nd Edition, Halsted Press, New York, 1980.
[9] A. K. Jain, M. N. Murty, P. J. Flynn, "Data Clustering: A Review". ACM Computing Surveys, Vol. 31, No. 3, pp. 264-323, 1999. doi: 10.1145/331499.331504
[10] J. A. Hartigan, "Clustering Algorithms". John Wiley and Sons, Inc., New York, 1975.
[11] R.C. Eberhart, Y. Shi, J. Kennedy, "Swarm Intelligence". Morgan Kaufmann, New York, 2001.
[12] Mahamed G. Omran, Ayed Salman, Andries P. Engelbrecht, "Image classification using particle swarm optimization". Proceedings of the 4th Asia-Pacific Conference on Simulated Evolution and Learning 2002, Singapore, pp. 370-374. doi: 10.1142/9789812561794_0019
[13] S. L. Chiu, "Fuzzy model identification based on cluster estimation". Journal of Intelligent and Fuzzy Systems, Vol. 2, No. 3, 1994.
[14] G. Salton, C. Buckley, "Term-weighting approaches in automatic text retrieval". Information Processing and Management, Vol. 24, No. 5, pp. 513-523, 1988. doi: 10.1016/0306-4573(88)90021-0
[15] Song Liangtu, Zhang Xiaoming, "Web Text Feature Extraction with Particle Swarm Optimization". IJCSNS International Journal of Computer Science and Network Security, Vol. 7, No. 6, 2007.
[16] Shokri Z. Selim, "K-means type algorithms: A generalized convergence theorem and characterization of local optimality". IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 6, No. 1, pp. 81-87, 1984. doi: 10.1109/TPAMI.1984.4767478
[17] Yuhui Shi, Russell C. Eberhart, "Parameter Selection in Particle Swarm Optimization". The 7th Annual Conference on Evolutionary Programming, San Diego, pp. 591-600, 1998. doi: 10.1007/BFb0040810
[18] Michael Steinbach, George Karypis, Vipin Kumar, "A Comparison of Document Clustering Techniques". Text Mining Workshop, KDD, 2000.
[19] Razan Alwee, Siti Mariyam, Firdaus Aziz, K.H. Chey, Haza Nuzly, "The Impact of Social Network Structure in Particle Swarm Optimization for Classification Problems". International Journal of Soft Computing, Vol. 4, No. 4, 2009, pp. 151-156.
[20] D. M. Van, A.P. Engelbrecht, "Data clustering using particle swarm optimization". Proceedings of IEEE Congress on Evolutionary Computation 2003, Canberra, Australia, pp. 215-220. doi: 10.1109/CEC.2003.1299577
[21] Sherin M. Youssef, Mohamed Rizk, Mohamed El-Sherif, "Dynamically Adaptive Data Clustering Using Intelligent Swarm-like Agents". International Journal of Mathematics and Computers in Simulation, Vol. 1, No. 2, 2007.
[22] Rehab F. Abdel-Kader, "Genetically Improved PSO Algorithm for Efficient Data Clustering". Proceedings of the Second International Conference on Machine Learning and Computing 2010, pp. 71-75. doi: 10.1109/ICMLC.2010.19
[23] UCI Repository of Machine Learning Databases. http://www.ics.uci.edu/~mlearn/MLRepository.html
[24] K. Premalatha, A.M. Natarajan, "Discrete PSO with GA Operators for Document Clustering". International Journal of Recent Trends in Engineering, Vol. 1, No. 1, 2009.
[25] JunYing Chen, Zheng Qin, Ji Jia, "A Weighted Mean Subtractive Clustering Algorithm". Information Technology Journal, No. 7, pp. 356-360, 2008. doi: 10.3923/itj.2008.356.360
[26] Neveen I. Ghali, Nahed El-dessouki, Mervat A. N., Lamiaa Bakraw, "Exponential Particle Swarm Optimization Approach for Improving Data Clustering". International Journal of Electrical & Electronics Engineering, Vol. 3, Issue 4, May 2009.
[27] Vijay Kalivarapu, Jung-Leng Foo, Eliot Winer, "Improving solution characteristics of particle swarm optimization using digital pheromones". Structural and Multidisciplinary Optimization, Vol. 37, No. 4, pp. 415-427, 2009. doi: 10.1007/s00158-008-0240-9
[28] H. Izakian, A. Abraham, V. Snásel, "Fuzzy Clustering using Hybrid Fuzzy c-means and Fuzzy Particle Swarm Optimization". World Congress on Nature & Biologically Inspired Computing (NaBIC 2009), pp. 1690-1694, 2009. doi: 10.1109/NABIC.2009.5393618

How to cite
Mariam El-Tarabily, Rehab Abdel-Kader, Mahmoud Marie, Gamal Abdel-Azeem, "A PSO-Based Subtractive Data Clustering Algorithm". International Journal of Research in Computer Science, 3 (2): pp. 1-9, March 2013. doi: 10.7815/ijorcs.32.2013.060