J.Shalini, R.Jayasree, K.Vaishnavi / International Journal of Engineering Research and Applications (IJERA)
ISSN: 2248-9622, www.ijera.com, Vol. 3, Issue 3, May-Jun 2013, pp. 884-888
Clustering on High Dimensional Data Using Locally Linear Embedding (LLE) Techniques

J.Shalini, R.Jayasree, K.Vaishnavi
Assistant Professors, Department of Computer Science
Abstract
Clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar, in some sense, to each other than to those in other groups (clusters). The dimension of such data can be reduced using dimension reduction techniques. Recently, a new nonlinear method for reducing the dimensionality of such data, called Locally Linear Embedding (LLE), has been introduced. In this work, LLE is combined with K-means clustering into a coherent framework to adaptively select the most discriminant subspace: K-means clustering is used to generate class labels, and LLE is used to perform the subspace selection.
Keywords - Clustering, High Dimensional Data, Locally Linear Embedding, K-means Clustering, Principal Component Analysis
I. INTRODUCTION
High dimensional datasets present many mathematical challenges, as well as some opportunities, and are bound to give rise to new theoretical developments [23]. The problem is especially severe when large databases with many features are searched for patterns without any filtering of important features based on prior knowledge. The growing importance of knowledge discovery and data mining methods in practical applications has made the feature selection/extraction problem increasingly important.
II. BACKGROUND STUDY
Developing effective clustering methods for high dimensional datasets is a challenging problem due to the curse of dimensionality. There are many approaches to address this problem. The simplest are dimension reduction techniques [6] such as principal component analysis (PCA) (Duda et al., 2000; Jolliffe, 2002) and random projections (Dasgupta, 2000). In the PCA approach, dimension reduction is carried out as a preprocessing step and is decoupled from the clustering process: once the subspace dimensions are selected, they stay fixed during the clustering process. This is the main drawback of PCA [11][27]. An extension of this approach is the Locally Linear Embedding technique.
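To make the decoupling concrete, here is a minimal sketch (ours, in Python with scikit-learn, not part of the original paper): the PCA subspace is fit once as preprocessing and remains fixed while K-means runs.

```python
# Minimal sketch: PCA as a fixed preprocessing step, decoupled from clustering.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

X = load_iris().data                                  # 150 samples x 4 features
X_low = PCA(n_components=2).fit_transform(X)          # subspace chosen once, stays fixed
labels = KMeans(n_clusters=3, n_init=10).fit_predict(X_low)  # clustering cannot revise it
```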
III. PROPOSED WORK OF LLE
The proposed work is carried out by collecting a large volume of high dimensional unsupervised gene datasets and applying dimension reduction techniques to them. Locally linear embedding (LLE) is an unsupervised learning algorithm that computes low dimensional, neighborhood-preserving embeddings of high dimensional data. It generates highly nonlinear embeddings and does not involve local minima. The work is implemented in MATLAB 7.0. The reduction is done with the combination of LLE and k-means clustering.
A. LLE
LLE stands for Locally Linear Embedding; it is one of the dimension reduction techniques [23]. The LLE algorithm of Roweis and Saul (2000) maps high-dimensional inputs into a low dimensional "description".
B. STEPS OF LLE WORKING
(1) Assign neighbors to each data point Xi (for example by using the K nearest neighbors).
(2) Compute the weights Wij that best linearly reconstruct Xi from its neighbors, solving the constrained least-squares problem in Eq. 1.
(3) Compute the low-dimensional embedding vectors Yi best reconstructed by the weights Wij, using the smallest eigenmodes of a sparse symmetric matrix. Although the weights Wij and vectors Yi are computed by standard methods in linear algebra, the constraint that each point is reconstructed only from its neighbors is what makes the resulting embedding nonlinear.
Fig 1. Clusters
Reconstruction errors are measured by the cost function

\( \varepsilon(W) = \sum_i \big| X_i - \sum_j W_{ij} X_j \big|^2 . \)  (1)

The weights Wij summarize the contribution of the jth data point to the ith reconstruction. To compute the weights Wij, we minimize this cost function subject to two constraints: first, that each data point Xi is reconstructed only from its neighbors (Wij = 0 if Xj is not a neighbor of Xi); second, that the rows of the weight matrix sum to one (\( \sum_j W_{ij} = 1 \)). The optimal weights Wij subject to these constraints are found by solving a least-squares problem.
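A minimal sketch of this constrained least-squares step (our Python/NumPy illustration; the paper's MATLAB code is not shown): for each point, the optimal weights solve the local Gram system and are then rescaled to sum to one.

```python
import numpy as np

def lle_weights(X, neighbors, reg=1e-3):
    """Reconstruction weights: row i best rebuilds X[i] from its K neighbors.

    X         : (n, D) data matrix
    neighbors : (n, K) integer array of neighbor indices for each point
    """
    n = X.shape[0]
    W = np.zeros((n, n))
    for i in range(n):
        Z = X[neighbors[i]] - X[i]                 # shift neighbors to the origin
        C = Z @ Z.T                                # local K x K Gram matrix
        C += reg * np.trace(C) * np.eye(len(C))    # regularize in case K > D
        w = np.linalg.solve(C, np.ones(len(C)))    # solve C w = 1
        W[i, neighbors[i]] = w / w.sum()           # enforce the sum-to-one constraint
    return W
```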
The embedding cost function, like the previous one, is based on locally linear reconstruction errors, but here we fix the weights Wij while optimizing the coordinates Yi:

\( \Phi(Y) = \sum_i \big| Y_i - \sum_j W_{ij} Y_j \big|^2 . \)  (2)

The embedding cost in Eq. 2 defines a quadratic form in the vectors Yi. Subject to constraints that make the problem well-posed, it can be minimized by solving a sparse N × N eigenvalue problem [39]. In the experiments of this work, data points were reconstructed from their K nearest neighbors, as measured by Euclidean distance.
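The eigenvalue step can be sketched in the same vein (again our illustration, continuing the function above): form M = (I - W)^T (I - W) and keep its bottom d eigenvectors, discarding the constant one with eigenvalue zero.

```python
import numpy as np

def lle_embed(W, d):
    """Embedding coordinates Y that minimize the cost in Eq. 2 for fixed W."""
    n = W.shape[0]
    I = np.eye(n)
    M = (I - W).T @ (I - W)              # embedding cost equals trace(Y^T M Y)
    _, eigvecs = np.linalg.eigh(M)       # eigenvalues returned in ascending order
    return eigvecs[:, 1:d + 1]           # drop the constant bottom eigenvector
```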
C. LLE ALGORITHM
1. Compute the neighbors of each data point Xi.
2. Compute the weights Wij that best reconstruct each data point Xi from its neighbors, minimizing the cost in Eq. 1 by constrained linear fits.
3. Compute the vectors Yi best reconstructed by the weights Wij, minimizing the quadratic form in Eq. 2 by its bottom nonzero eigenvectors.
IV. LLE + K-MEANS
Roweis and Saul (2000) proposed [39] an unsupervised learning algorithm for low dimensional manifolds, whose goal is to recover the nonlinear structure of high dimensional data. The idea behind their locally linear embedding algorithm is that nearby points in the high dimensional space remain nearby, and similarly co-located, in the low dimensional embedding. Starting from this intuition, each data point xi (i = 1, ..., n) is approximated by a weighted linear combination of its neighbors (from the nature of these local and linear reconstructions the algorithm derives its name). In its base formulation, the LLE algorithm finds the linear coefficients wij by minimizing the reconstruction error

\( \varepsilon(W) = \sum_i \big\| x_i - \sum_j w_{ij} x_j \big\|^2 , \)

in which \( \|\cdot\| \) is the Euclidean norm. For each xi the weights wij are enforced to sum to one, and they are equal to zero if a point xj does not belong to the set of k neighbors of xi. The same weights that characterize the local geometry in the original data space are assumed to reconstruct the data points in the low dimensional embedding. The new coordinates yi are then estimated by minimizing the cost function

\( \Phi(Y) = \sum_i \big\| y_i - \sum_j w_{ij} y_j \big\|^2 , \)

in which each point yi is reconstructed by its k neighbors, as in the first step of LLE. LLE is an unsupervised, non-iterative method, which avoids the local minima problems plaguing many competing methods (e.g., those based on the EM algorithm).
This formulation is applied in this proposal to five available data sets, namely Iris, Wine, Zoo, Cancer, and Drosophila. Consider a set of input data vectors X = (x1, ··· , xn) in a high dimensional space. For simplicity, the data is centered in the preprocessing step, so that \( \sum_i x_i = 0 \). Standard K-means clustering then minimizes the clustering objective function [11]

\( J = \sum_{k=1}^{K} \sum_{i=1}^{n} H_{ik} \, \| x_i - m_k \|^2 , \)

where the matrix H is the cluster indicator: Hik = 1 if xi belongs to the k-th cluster and Hik = 0 otherwise, and mk is the centroid of the k-th cluster. Suppose the data consist of real-valued vectors xi, each of dimensionality D, sampled from some smooth underlying manifold. LLE characterizes the local geometry of the patches of this manifold by linear coefficients that reconstruct each data point from its neighbors. In the simplest formulation of LLE, one identifies K nearest neighbors per data point, as measured by Euclidean distance.
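For illustration, the combined pipeline can be reproduced with scikit-learn in a few lines (the paper's implementation is MATLAB 7.0; this Python sketch is ours): LLE produces the neighborhood-preserving embedding, and K-means then clusters in that subspace.

```python
from sklearn.datasets import load_wine
from sklearn.manifold import LocallyLinearEmbedding
from sklearn.cluster import KMeans

X, y = load_wine(return_X_y=True)                        # 178 samples x 13 features
embed = LocallyLinearEmbedding(n_neighbors=10, n_components=2)
Y = embed.fit_transform(X)                               # step 1: LLE subspace
labels = KMeans(n_clusters=3, n_init=10).fit_predict(Y)  # step 2: cluster in it
```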
V. EXPERIMENT AND RESULT
This section describes the experiments that evaluate the performance of LLE and LLE+K-means in dimension reduction, compared with LDA+K-means, which is widely used on gene data sets.
VI. DATASET DESCRIPTION
The proposed work has been implemented and tested on five publicly available data sets, namely Iris, Breast Cancer, Drosophila, Wine, and Zoo. The descriptions of these five datasets are as follows.
• Zoo describes animals by hair, feathers, eggs, milk, airborne, backbone, fins, tail, etc.
• Iris describes flowers by sepal length, sepal width, petal length, and petal width; it is a multivariate data set with real-valued attributes.
• Drosophila describes concise genomic, proteomic, transcriptomic, genetic and functional information on all known and predicted human genes.
• Breast Cancer describes class (no-recurrence-events, recurrence-events), age, menopause, tumour size, inv-nodes, node-caps, and breast-quad (left-up, left-low, right-up, right-low, central).
• Wine describes attributes including Alcohol, Malic acid, Ash, Alkalinity of ash, Magnesium, Total phenols, Flavanoids, Nonflavanoid phenols, and Proanthocyanins.
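Of these, Iris and Wine ship with scikit-learn (a sketch follows; note that scikit-learn's bundled breast-cancer set is the 569-sample Wisconsin diagnostic data, not the 286-sample set in Table 1, so that set, like Zoo and Drosophila, must be fetched separately):

```python
from sklearn.datasets import load_iris, load_wine

iris = load_iris()    # 150 samples, 4 features, 3 classes (matches Table 1)
wine = load_wine()    # 178 samples, 13 features, 3 classes (matches Table 1)
# Zoo, Drosophila, and the 286-sample Breast Cancer set are not bundled with
# scikit-learn and can be downloaded from the UCI machine learning repository.
```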
Table 1: Data set Description

Datasets        #Samples   #Dimensions   #Classes
Iris            150        4             3
Wine            178        13            3
Zoo             101        18            7
Breast Cancer   286        9             4
Drosophila      163        4             3

Table 2: Clustering accuracy on the UCI datasets. The results are obtained by averaging 5 trials of LLE+K-means.
Fig 2: K-means clustering with Iris data set
Fig 3: LDA+K-means with Iris data set
Fig 4: LLE+K-means clustering with Iris data set
Fig 5: K-means clustering with Wine data set
Fig 6: LDA+K-means clustering with Wine data set
Fig 7: LLE+K-means clustering with Wine data set
VII. RESULT ANALYSIS
All the above datasets have labels. We view the labels of the datasets as objective knowledge of their structure and use accuracy as the clustering performance measure. Accuracy discovers the one-to-one relationship between clusters and classes and measures the extent to which each cluster contains data points from the corresponding class; it has been widely used as a performance measure for clustering analysis.
Accuracy can be described as

\( \text{Accuracy} = \frac{1}{n} \, \max \sum_{(C_k,\, L_m)} T(C_k, L_m) , \)

where n is the number of data points, Ck denotes the k-th cluster, Lm is the m-th class, and T(Ck, Lm) is the number of data points that belong to class m and are assigned to cluster k. Accuracy is thus computed as the maximum sum of T(Ck, Lm) over all pairings of clusters and classes, where the pairs have no overlaps. On the five datasets from the data repository, the LLE-Km algorithm is compared with the standard K-means algorithm. This work also compares it with a PCA-based clustering algorithm. The results show that LLE-Km clustering is viable and competitive. The subspace clustering is able to perform both subspace selection and data reduction. Note that LLE-Km is also able to discover clusters in the low dimensional subspace, overcoming the curse of dimensionality.
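This maximum-overlap matching can be computed exactly with the Hungarian algorithm; the following sketch (ours, using SciPy) implements the accuracy measure defined above.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_accuracy(labels_true, labels_pred):
    """Accuracy = (1/n) * max over one-to-one pairings of sum T(C_k, L_m)."""
    classes = np.unique(labels_true)
    clusters = np.unique(labels_pred)
    # T[k, m] = number of points assigned to cluster k that belong to class m
    T = np.array([[np.sum((labels_pred == c) & (labels_true == m))
                   for m in classes] for c in clusters])
    rows, cols = linear_sum_assignment(-T)   # negate to maximize total overlap
    return T[rows, cols].sum() / len(labels_true)
```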
VIII. CONCLUSION
The performance of dimension reduction techniques combined with the K-means clustering method was compared on different real gene data sets. This work considered various properties of data sets and data preprocessing procedures, and has confirmed the importance of preprocessing data prior to performing the core cluster analysis. All clustering methods were affected by the data dimension and data characteristics, such as the overlap between clusters and the presence of noise. In particular, LLE+K-means was effective at reducing the dimension of the data. Future extensions of this work can explore other preprocessing techniques.
REFERENCES
[1] Alon, U., et al., "Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays", PNAS, 96(12): 6745-6750, 1999.
[2] Schwaighofer, A., "Matlab interface to SVMlight", 2004.
[3] Banerjee, A., Dhillon, I., Ghosh, J., Merugu, S., & Modha, D. (2004). "A generalized maximum entropy approach to Bregman co-clustering and matrix approximation". Proc. ACM Int'l Conf. Knowledge Discovery and Data Mining (KDD).
[4] Barbara, D., "An Introduction to Cluster Analysis for Data Mining".
[5] Broomhead, D.S., Kirby, M., "A new approach to dimensionality reduction: theory and algorithms", SIAM Journal of Applied Mathematics, 60(6) (2000) 2114-2142.
[6] Carreira-Perpinan, M.A., "A review of dimension reduction techniques". Technical report CS-96-09, Department of Computer Science, University of Sheffield, 1997.
[7] Cheng, Y., & Church, G. (2000). "Biclustering of expression data". Proc. Int'l Symp. Mol. Bio. (ISMB), 93-103.
[8] Ding, C., et al., "K-means clustering via PCA", 21st International Conference on Machine Learning, Canada, 2004.
[9] Diamantaras, K.I. and Kung, S.-Y., "Principal Component Neural Networks: Theory and Applications". John Wiley & Sons, New York, London, Sydney, 1996.
[10] Ding, C., & He, X. (2004). "K-means clustering and principal component analysis". Int'l Conf. Machine Learning (ICML).
[11] Ding, C., He, X., Zha, H., & Simon, H. (2002). "Adaptive dimension reduction for clustering high dimensional data". Proc. IEEE Int'l Conf. Data Mining.
[12] De la Torre, F., & Kanade, T. (2006). "Discriminative cluster analysis". Proc. Int'l Conf. Machine Learning.
[13] DeMers, D. and Cottrell, G.W., "Non-linear dimensionality reduction". In C.L. Giles, S.J. Hanson, and J.D. Cowan, editors, Advances in Neural Information Processing Systems 5, pages 580-587. Morgan Kaufmann, San Mateo, CA, 1993.
[14] Fukunaga, K., Olsen, D.R., "An algorithm for finding intrinsic dimensionality of data", IEEE Transactions on Computers, 20(2) (1976) 165-171.
[15] Halkidi, M., Batistakis, Y., "On Clustering Validation Techniques", 2001.
[16] Han, J. and Kamber, M., "Data Mining: Concepts and Techniques", Morgan Kaufmann Publishers, August 2000. ISBN 1-55860-489-8.
[17] Hartigan, J.A., Wong, M.A., "A K-Means Clustering Algorithm", Appl. Stat., Vol. 28, 1979, pp. 100-108.
[18] Hartuv, E., Schmitt, A., Lange, J., "An Algorithm for Clustering cDNAs for Gene Expression Analysis", Third International Conference on Computational Molecular Biology (RECOMB), 1999.
[19] Hotelling, H., "Analysis of a complex of statistical variables into principal components". Journal of Educational Psychology, 24:417-441, 1933.
[20] Jackson, J.E., "A User's Guide to Principal Components". New York: John Wiley and Sons, 1991.
[21] Jain, A.K., Dubes, R.C., "Algorithms for Clustering Data", Prentice Hall, 1988.
[22] Jain, A.K., Murty, M.N., Flynn, P.J., "Data Clustering: A Review", ACM Computing Surveys, Vol. 31, No. 3, 1999.
[23] Johnstone, I.M. (2009), "Non-linear dimensionality reduction by LLE".
[24] Jolliffe, I. (2002). "Principal Component Analysis". Springer, 2nd Edition.
[25] Jolliffe, I.T., "Principal Component Analysis". Springer-Verlag, 1986.
[26] Jolliffe, I.T., "Principal Component Analysis" (Springer-Verlag, New York, 1989).
[27] Kambhatla, N. and Leen, T.K., "Dimension reduction by local principal component analysis". Neural Computation, 9, 1493-1516 (1997).
[28] Kaski, S., "Dimensionality reduction by random mapping: fast similarity computation for clustering". Proc. IEEE International Joint Conference on Neural Networks, 1:413-418, 1998.
[29] Kaufman, L., Rousseeuw, P.J., "Finding Groups in Data: An Introduction to Cluster Analysis". John Wiley, New York, 1990.
[30] Kirby, M., "Geometric Data Analysis: An Empirical Approach to Dimensionality Reduction and the Study of Patterns", John Wiley and Sons, 2001.
[31] Kohavi, R. and John, G., "The wrapper approach". In H. Liu and H. Motoda, editors, Feature Extraction, Construction and Selection: A Data Mining Perspective. Springer Verlag, 1998.
[32] Kouropteva, O., Okun, O., and Pietikäinen, M., "Selection of the optimal parameter value for the locally linear embedding algorithm", 2002. Submitted to 1st International Conference on Fuzzy Systems and Knowledge Discovery.
[33] Kramer, M., "Nonlinear principal component analysis using autoassociative neural networks". AIChE Journal, 37, 233-243 (1991).
[34] Lee, Y. and Lee, C.K., "Classification of multiple cancer types by multicategory support vector machines using gene expression data", Bioinformatics, vol. 19, pp. 1132-1139, 2003.
[35] Li, K.-C., "High dimensional data analysis via the SIR/PHD approach", April 2000. Lecture notes in progress.
[36] Li, T., Ma, S., & Ogihara, M. (2004). "Document clustering via adaptive subspace iteration". Proc. Conf. Research and Development in Information Retrieval (SIGIR), pp. 218-225.
[37] Wolfe, P.J. and Belabbas, M.A. (2008), "Hessian eigenmaps: locally linear techniques for high dimensional data".
[38] Luan, Y., Li, H., "Clustering of time-course gene expression data using a mixed-effects model with B-splines", Bioinformatics, Vol. 19, 2003, pp. 474-482.
[39] Roweis, S.T. and Saul, L.K., "Nonlinear dimensionality reduction by locally linear embedding". Science, 290, 2323-2326 (2000).
[40] Zha, H., Ding, C., Gu, M., He, X., & Simon, H. (2002). "Spectral relaxation for K-means clustering". Advances in Neural Information Processing Systems 14 (NIPS'01), 1057-1064.
J. SHALINI was born in Coimbatore on 07/10/1984. Her educational qualifications are B.Sc. (CT), M.Sc., and M.Phil. in Computer Science. She completed her undergraduate studies at Sri Krishna College of Engineering and Technology and her postgraduate studies at Sri Ramakrishna College of Arts and Science for Women. She is now working at PSGR Krishnammal College for Women as an Assistant Professor.

R. JAYASREE was born in Coimbatore. Her educational qualification is an MCA from Bharathiar University; she is pursuing an M.Phil. at PSGR Krishnammal College and is now working as an Assistant Professor at PSGR Krishnammal College for Women.

K. VAISHNAVI was born in Madurai. Her educational qualification is an MCA from Bharathiar University; she is pursuing an M.Phil. at PSGR Krishnammal College and is now working as an Assistant Professor at PSGR Krishnammal College for Women.
Más contenido relacionado

La actualidad más candente

Parametric Comparison of K-means and Adaptive K-means Clustering Performance ...
Parametric Comparison of K-means and Adaptive K-means Clustering Performance ...Parametric Comparison of K-means and Adaptive K-means Clustering Performance ...
Parametric Comparison of K-means and Adaptive K-means Clustering Performance ...IJECEIAES
 
A Hybrid Immunological Search for theWeighted Feedback Vertex Set Problem
A Hybrid Immunological Search for theWeighted Feedback Vertex Set ProblemA Hybrid Immunological Search for theWeighted Feedback Vertex Set Problem
A Hybrid Immunological Search for theWeighted Feedback Vertex Set ProblemMario Pavone
 
R sprojec3
R sprojec3R sprojec3
R sprojec3Qust04
 
Hybrid method for achieving Pareto front on economic emission dispatch
Hybrid method for achieving Pareto front on economic  emission dispatch Hybrid method for achieving Pareto front on economic  emission dispatch
Hybrid method for achieving Pareto front on economic emission dispatch IJECEIAES
 
Modeling cassava yield a response surface approach
Modeling cassava yield a response surface approachModeling cassava yield a response surface approach
Modeling cassava yield a response surface approachijcsa
 
Parameter Estimation for the Weibul distribution model Using Least-Squares Me...
Parameter Estimation for the Weibul distribution model Using Least-Squares Me...Parameter Estimation for the Weibul distribution model Using Least-Squares Me...
Parameter Estimation for the Weibul distribution model Using Least-Squares Me...IJMERJOURNAL
 
Textural Feature Extraction of Natural Objects for Image Classification
Textural Feature Extraction of Natural Objects for Image ClassificationTextural Feature Extraction of Natural Objects for Image Classification
Textural Feature Extraction of Natural Objects for Image ClassificationCSCJournals
 
Nonnegative Garrote as a Variable Selection Method in Panel Data
Nonnegative Garrote as a Variable Selection Method in Panel DataNonnegative Garrote as a Variable Selection Method in Panel Data
Nonnegative Garrote as a Variable Selection Method in Panel DataIJCSIS Research Publications
 
Density based spatial clustering of applications with noises for dna methylat...
Density based spatial clustering of applications with noises for dna methylat...Density based spatial clustering of applications with noises for dna methylat...
Density based spatial clustering of applications with noises for dna methylat...Atef Alghzzy
 
MK-Prototypes: A Novel Algorithm for Clustering Mixed Type Data
MK-Prototypes: A Novel Algorithm for Clustering Mixed Type  Data MK-Prototypes: A Novel Algorithm for Clustering Mixed Type  Data
MK-Prototypes: A Novel Algorithm for Clustering Mixed Type Data IJMER
 
K-MEDOIDS CLUSTERING USING PARTITIONING AROUND MEDOIDS FOR PERFORMING FACE R...
K-MEDOIDS CLUSTERING  USING PARTITIONING AROUND MEDOIDS FOR PERFORMING FACE R...K-MEDOIDS CLUSTERING  USING PARTITIONING AROUND MEDOIDS FOR PERFORMING FACE R...
K-MEDOIDS CLUSTERING USING PARTITIONING AROUND MEDOIDS FOR PERFORMING FACE R...ijscmc
 
EXPERIMENTS ON HYPOTHESIS "FUZZY K-MEANS IS BETTER THAN K-MEANS FOR CLUSTERING"
EXPERIMENTS ON HYPOTHESIS "FUZZY K-MEANS IS BETTER THAN K-MEANS FOR CLUSTERING"EXPERIMENTS ON HYPOTHESIS "FUZZY K-MEANS IS BETTER THAN K-MEANS FOR CLUSTERING"
EXPERIMENTS ON HYPOTHESIS "FUZZY K-MEANS IS BETTER THAN K-MEANS FOR CLUSTERING"IJDKP
 
Premeditated Initial Points for K-Means Clustering
Premeditated Initial Points for K-Means ClusteringPremeditated Initial Points for K-Means Clustering
Premeditated Initial Points for K-Means ClusteringIJCSIS Research Publications
 

La actualidad más candente (18)

Parametric Comparison of K-means and Adaptive K-means Clustering Performance ...
Parametric Comparison of K-means and Adaptive K-means Clustering Performance ...Parametric Comparison of K-means and Adaptive K-means Clustering Performance ...
Parametric Comparison of K-means and Adaptive K-means Clustering Performance ...
 
A Hybrid Immunological Search for theWeighted Feedback Vertex Set Problem
A Hybrid Immunological Search for theWeighted Feedback Vertex Set ProblemA Hybrid Immunological Search for theWeighted Feedback Vertex Set Problem
A Hybrid Immunological Search for theWeighted Feedback Vertex Set Problem
 
R sprojec3
R sprojec3R sprojec3
R sprojec3
 
Af4201214217
Af4201214217Af4201214217
Af4201214217
 
Hybrid method for achieving Pareto front on economic emission dispatch
Hybrid method for achieving Pareto front on economic  emission dispatch Hybrid method for achieving Pareto front on economic  emission dispatch
Hybrid method for achieving Pareto front on economic emission dispatch
 
Modeling cassava yield a response surface approach
Modeling cassava yield a response surface approachModeling cassava yield a response surface approach
Modeling cassava yield a response surface approach
 
Pca analysis
Pca analysisPca analysis
Pca analysis
 
Parameter Estimation for the Weibul distribution model Using Least-Squares Me...
Parameter Estimation for the Weibul distribution model Using Least-Squares Me...Parameter Estimation for the Weibul distribution model Using Least-Squares Me...
Parameter Estimation for the Weibul distribution model Using Least-Squares Me...
 
Textural Feature Extraction of Natural Objects for Image Classification
Textural Feature Extraction of Natural Objects for Image ClassificationTextural Feature Extraction of Natural Objects for Image Classification
Textural Feature Extraction of Natural Objects for Image Classification
 
Data Modelling
Data ModellingData Modelling
Data Modelling
 
Nonnegative Garrote as a Variable Selection Method in Panel Data
Nonnegative Garrote as a Variable Selection Method in Panel DataNonnegative Garrote as a Variable Selection Method in Panel Data
Nonnegative Garrote as a Variable Selection Method in Panel Data
 
Density based spatial clustering of applications with noises for dna methylat...
Density based spatial clustering of applications with noises for dna methylat...Density based spatial clustering of applications with noises for dna methylat...
Density based spatial clustering of applications with noises for dna methylat...
 
MK-Prototypes: A Novel Algorithm for Clustering Mixed Type Data
MK-Prototypes: A Novel Algorithm for Clustering Mixed Type  Data MK-Prototypes: A Novel Algorithm for Clustering Mixed Type  Data
MK-Prototypes: A Novel Algorithm for Clustering Mixed Type Data
 
K-MEDOIDS CLUSTERING USING PARTITIONING AROUND MEDOIDS FOR PERFORMING FACE R...
K-MEDOIDS CLUSTERING  USING PARTITIONING AROUND MEDOIDS FOR PERFORMING FACE R...K-MEDOIDS CLUSTERING  USING PARTITIONING AROUND MEDOIDS FOR PERFORMING FACE R...
K-MEDOIDS CLUSTERING USING PARTITIONING AROUND MEDOIDS FOR PERFORMING FACE R...
 
SVM
SVMSVM
SVM
 
EXPERIMENTS ON HYPOTHESIS "FUZZY K-MEANS IS BETTER THAN K-MEANS FOR CLUSTERING"
EXPERIMENTS ON HYPOTHESIS "FUZZY K-MEANS IS BETTER THAN K-MEANS FOR CLUSTERING"EXPERIMENTS ON HYPOTHESIS "FUZZY K-MEANS IS BETTER THAN K-MEANS FOR CLUSTERING"
EXPERIMENTS ON HYPOTHESIS "FUZZY K-MEANS IS BETTER THAN K-MEANS FOR CLUSTERING"
 
MISHRA DISTRIBUTION
MISHRA DISTRIBUTIONMISHRA DISTRIBUTION
MISHRA DISTRIBUTION
 
Premeditated Initial Points for K-Means Clustering
Premeditated Initial Points for K-Means ClusteringPremeditated Initial Points for K-Means Clustering
Premeditated Initial Points for K-Means Clustering
 

Destacado (19)

Eq33857862
Eq33857862Eq33857862
Eq33857862
 
Ev33889894
Ev33889894Ev33889894
Ev33889894
 
Ea33762765
Ea33762765Ea33762765
Ea33762765
 
Fj3110731078
Fj3110731078Fj3110731078
Fj3110731078
 
Fq3111191124
Fq3111191124Fq3111191124
Fq3111191124
 
Fk3110791084
Fk3110791084Fk3110791084
Fk3110791084
 
He3113871392
He3113871392He3113871392
He3113871392
 
Ak34224227
Ak34224227Ak34224227
Ak34224227
 
Y34147151
Y34147151Y34147151
Y34147151
 
Bu34437441
Bu34437441Bu34437441
Bu34437441
 
Ed33777782
Ed33777782Ed33777782
Ed33777782
 
El33827831
El33827831El33827831
El33827831
 
Gr3311701176
Gr3311701176Gr3311701176
Gr3311701176
 
Ef33787793
Ef33787793Ef33787793
Ef33787793
 
Hd3312521256
Hd3312521256Hd3312521256
Hd3312521256
 
Gx3312181223
Gx3312181223Gx3312181223
Gx3312181223
 
G33029037
G33029037G33029037
G33029037
 
Gw3312111217
Gw3312111217Gw3312111217
Gw3312111217
 
Ha3312361238
Ha3312361238Ha3312361238
Ha3312361238
 

Similar a Eu33884888

information theoretic subspace clustering
information theoretic subspace clusteringinformation theoretic subspace clustering
information theoretic subspace clusteringali hassan
 
Influence over the Dimensionality Reduction and Clustering for Air Quality Me...
Influence over the Dimensionality Reduction and Clustering for Air Quality Me...Influence over the Dimensionality Reduction and Clustering for Air Quality Me...
Influence over the Dimensionality Reduction and Clustering for Air Quality Me...IJAEMSJORNAL
 
Analysis On Classification Techniques In Mammographic Mass Data Set
Analysis On Classification Techniques In Mammographic Mass Data SetAnalysis On Classification Techniques In Mammographic Mass Data Set
Analysis On Classification Techniques In Mammographic Mass Data SetIJERA Editor
 
Web image annotation by diffusion maps manifold learning algorithm
Web image annotation by diffusion maps manifold learning algorithmWeb image annotation by diffusion maps manifold learning algorithm
Web image annotation by diffusion maps manifold learning algorithmijfcstjournal
 
Quantitative Comparison of Artificial Honey Bee Colony Clustering and Enhance...
Quantitative Comparison of Artificial Honey Bee Colony Clustering and Enhance...Quantitative Comparison of Artificial Honey Bee Colony Clustering and Enhance...
Quantitative Comparison of Artificial Honey Bee Colony Clustering and Enhance...idescitation
 
Survey on Various Classification Techniques in Data Mining
Survey on Various Classification Techniques in Data MiningSurvey on Various Classification Techniques in Data Mining
Survey on Various Classification Techniques in Data Miningijsrd.com
 
Multimodal Biometrics Recognition by Dimensionality Diminution Method
Multimodal Biometrics Recognition by Dimensionality Diminution MethodMultimodal Biometrics Recognition by Dimensionality Diminution Method
Multimodal Biometrics Recognition by Dimensionality Diminution MethodIJERA Editor
 
A Mathematical Programming Approach for Selection of Variables in Cluster Ana...
A Mathematical Programming Approach for Selection of Variables in Cluster Ana...A Mathematical Programming Approach for Selection of Variables in Cluster Ana...
A Mathematical Programming Approach for Selection of Variables in Cluster Ana...IJRES Journal
 
IRJET- Evaluation of Classification Algorithms with Solutions to Class Imbala...
IRJET- Evaluation of Classification Algorithms with Solutions to Class Imbala...IRJET- Evaluation of Classification Algorithms with Solutions to Class Imbala...
IRJET- Evaluation of Classification Algorithms with Solutions to Class Imbala...IRJET Journal
 
Single Reduct Generation Based on Relative Indiscernibility of Rough Set Theo...
Single Reduct Generation Based on Relative Indiscernibility of Rough Set Theo...Single Reduct Generation Based on Relative Indiscernibility of Rough Set Theo...
Single Reduct Generation Based on Relative Indiscernibility of Rough Set Theo...ijsc
 
Paper Summary of Disentangling by Factorising (Factor-VAE)
Paper Summary of Disentangling by Factorising (Factor-VAE)Paper Summary of Disentangling by Factorising (Factor-VAE)
Paper Summary of Disentangling by Factorising (Factor-VAE)준식 최
 
Particle Swarm Optimization based K-Prototype Clustering Algorithm
Particle Swarm Optimization based K-Prototype Clustering Algorithm Particle Swarm Optimization based K-Prototype Clustering Algorithm
Particle Swarm Optimization based K-Prototype Clustering Algorithm iosrjce
 
fuzzy LBP for face recognition ppt
fuzzy LBP for face recognition pptfuzzy LBP for face recognition ppt
fuzzy LBP for face recognition pptAbdullah Gubbi
 
Neural Networks with Complex Sample Data
Neural Networks with Complex Sample DataNeural Networks with Complex Sample Data
Neural Networks with Complex Sample DataSavano Pereira
 
K-Medoids Clustering Using Partitioning Around Medoids for Performing Face Re...
K-Medoids Clustering Using Partitioning Around Medoids for Performing Face Re...K-Medoids Clustering Using Partitioning Around Medoids for Performing Face Re...
K-Medoids Clustering Using Partitioning Around Medoids for Performing Face Re...ijscmcj
 
Scalable and efficient cluster based framework for multidimensional indexing
Scalable and efficient cluster based framework for multidimensional indexingScalable and efficient cluster based framework for multidimensional indexing
Scalable and efficient cluster based framework for multidimensional indexingeSAT Journals
 
Scalable and efficient cluster based framework for
Scalable and efficient cluster based framework forScalable and efficient cluster based framework for
Scalable and efficient cluster based framework foreSAT Publishing House
 
An Experimental Study of Diabetes Disease Prediction System Using Classificat...
An Experimental Study of Diabetes Disease Prediction System Using Classificat...An Experimental Study of Diabetes Disease Prediction System Using Classificat...
An Experimental Study of Diabetes Disease Prediction System Using Classificat...IOSRjournaljce
 

Similar a Eu33884888 (20)

information theoretic subspace clustering
information theoretic subspace clusteringinformation theoretic subspace clustering
information theoretic subspace clustering
 
call for papers, research paper publishing, where to publish research paper, ...
call for papers, research paper publishing, where to publish research paper, ...call for papers, research paper publishing, where to publish research paper, ...
call for papers, research paper publishing, where to publish research paper, ...
 
Influence over the Dimensionality Reduction and Clustering for Air Quality Me...
Influence over the Dimensionality Reduction and Clustering for Air Quality Me...Influence over the Dimensionality Reduction and Clustering for Air Quality Me...
Influence over the Dimensionality Reduction and Clustering for Air Quality Me...
 
Analysis On Classification Techniques In Mammographic Mass Data Set
Analysis On Classification Techniques In Mammographic Mass Data SetAnalysis On Classification Techniques In Mammographic Mass Data Set
Analysis On Classification Techniques In Mammographic Mass Data Set
 
Web image annotation by diffusion maps manifold learning algorithm
Web image annotation by diffusion maps manifold learning algorithmWeb image annotation by diffusion maps manifold learning algorithm
Web image annotation by diffusion maps manifold learning algorithm
 
Quantitative Comparison of Artificial Honey Bee Colony Clustering and Enhance...
Quantitative Comparison of Artificial Honey Bee Colony Clustering and Enhance...Quantitative Comparison of Artificial Honey Bee Colony Clustering and Enhance...
Quantitative Comparison of Artificial Honey Bee Colony Clustering and Enhance...
 
Survey on Various Classification Techniques in Data Mining
Survey on Various Classification Techniques in Data MiningSurvey on Various Classification Techniques in Data Mining
Survey on Various Classification Techniques in Data Mining
 
Multimodal Biometrics Recognition by Dimensionality Diminution Method
Multimodal Biometrics Recognition by Dimensionality Diminution MethodMultimodal Biometrics Recognition by Dimensionality Diminution Method
Multimodal Biometrics Recognition by Dimensionality Diminution Method
 
A Mathematical Programming Approach for Selection of Variables in Cluster Ana...
A Mathematical Programming Approach for Selection of Variables in Cluster Ana...A Mathematical Programming Approach for Selection of Variables in Cluster Ana...
A Mathematical Programming Approach for Selection of Variables in Cluster Ana...
 
IRJET- Evaluation of Classification Algorithms with Solutions to Class Imbala...
IRJET- Evaluation of Classification Algorithms with Solutions to Class Imbala...IRJET- Evaluation of Classification Algorithms with Solutions to Class Imbala...
IRJET- Evaluation of Classification Algorithms with Solutions to Class Imbala...
 
Single Reduct Generation Based on Relative Indiscernibility of Rough Set Theo...
Single Reduct Generation Based on Relative Indiscernibility of Rough Set Theo...Single Reduct Generation Based on Relative Indiscernibility of Rough Set Theo...
Single Reduct Generation Based on Relative Indiscernibility of Rough Set Theo...
 
Paper Summary of Disentangling by Factorising (Factor-VAE)
Paper Summary of Disentangling by Factorising (Factor-VAE)Paper Summary of Disentangling by Factorising (Factor-VAE)
Paper Summary of Disentangling by Factorising (Factor-VAE)
 
Particle Swarm Optimization based K-Prototype Clustering Algorithm
Particle Swarm Optimization based K-Prototype Clustering Algorithm Particle Swarm Optimization based K-Prototype Clustering Algorithm
Particle Swarm Optimization based K-Prototype Clustering Algorithm
 
I017235662
I017235662I017235662
I017235662
 
fuzzy LBP for face recognition ppt
fuzzy LBP for face recognition pptfuzzy LBP for face recognition ppt
fuzzy LBP for face recognition ppt
 
Neural Networks with Complex Sample Data
Neural Networks with Complex Sample DataNeural Networks with Complex Sample Data
Neural Networks with Complex Sample Data
 
K-Medoids Clustering Using Partitioning Around Medoids for Performing Face Re...
K-Medoids Clustering Using Partitioning Around Medoids for Performing Face Re...K-Medoids Clustering Using Partitioning Around Medoids for Performing Face Re...
K-Medoids Clustering Using Partitioning Around Medoids for Performing Face Re...
 
Scalable and efficient cluster based framework for multidimensional indexing
Scalable and efficient cluster based framework for multidimensional indexingScalable and efficient cluster based framework for multidimensional indexing
Scalable and efficient cluster based framework for multidimensional indexing
 
Scalable and efficient cluster based framework for
Scalable and efficient cluster based framework forScalable and efficient cluster based framework for
Scalable and efficient cluster based framework for
 
An Experimental Study of Diabetes Disease Prediction System Using Classificat...
An Experimental Study of Diabetes Disease Prediction System Using Classificat...An Experimental Study of Diabetes Disease Prediction System Using Classificat...
An Experimental Study of Diabetes Disease Prediction System Using Classificat...
 

Último

Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusZilliz
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Zilliz
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelDeepika Singh
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 

Último (20)

Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source Milvus
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 

Eu33884888

  • 1. J.Shalini, R.Jayasree, K.Vaishnavi / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com Vol. 3, Issue 3, May-Jun 2013, pp.884-888 884 | P a g e Clustering on High Dimensional data Using Locally Linear Embedding (LLE) Techniques First Author, J.Shalini, Second Author, R.Jayasree, Third Author,K.Vaishnavi Assistant Professor,Department Of Computer Science Abstract Clustering is the task of grouping a set of objects in such a way that objects in the same group (called cluster) are more similar (in some sense or another) to each other than to those in other groups (clusters). The dimension can be reduced by using some techniques of dimension reduction. Recently new non linear methods introduced for reducing the dimensionality of such data called Locally Linear Embedding (LLE).LLE combined with K-means clustering in to coherent frame work to adaptively select the most discriminant subspace. K-means clustering use to generate class labels and use LLE to do subspace selection. Keywords - Clustering, High Dimension Data, Locally Linear Embedding, k-means clustering, Principal Component Analysis I. INTRODUCTION High dimensional datasets present many mathematical challenges as well as some opportunities to bound to give rise to new theoretical development [23]. The problem is especially severe when large databases with many features are searched for patterns without filtering of important features based on prior knowledge. The growing importance of knowledge discovery and data mining methods in practical applications has made the feature selection/extraction problem. II BACKGROUND STUDY Developing effective clustering methods for high dimensional datasets is a challenging problem due to the curse of dimensionality. There are many approaches to address the problem of curse of dimensionality. The simplest approach of dimension reduction techniques [6], principal component analysis (PCA) (Duda et al., 2000; Jolliffe, 2002) and random projections (Dasgupta, 2000). In PCA method, dimension reduction is carried out as a preprocessing step and is decoupled from the clustering process. Once the subspace dimensions are selected, they stay fixed during the clustering process. This is The main draw back of PCA [11] [27]. An extension of this approach is Locally Linear Embedding Techniques. III PROPOSED WORK OF LLE The proposed work is carried out by collecting a large volume of high dimensional unsupervised gene datasets, by using dimension reduction techniques such as, LLE describe locally linear embedding (LLE) an unsupervised learning algorithm that computes low dimensional and neighborhood preserving embeddings of high dimensional data. Generating highly nonlinear embeddings—do not involve local minima. The work is implemented in Mat Lab 7.0. The reduction can be done with the combination of LLE + k-means clustering. A. LLE LLE is commonly called as Locally Linear Embedding; it is one of the dimension reduction techniques [23]. The LLE algorithm of Roweis and Saul (2000) it involves mapping high-dimensional inputs into a low dimensional ―description‖. B. STEPS OF LLE WORKING (1) Assign neighbors to each data point W Xi (for example by using the K nearest neighbors). (2) compute the weights Wij that best linearly reconstruct Xi from its neighbors, solving the constrained least-squares problem. 
(3) Compute the low-dimensional embedding vectors Yi best reconstructed by Wij, minimizing the smallest eigenmodes of the sparse symmetric matrix in Although the weights Wij and vectors Yi are computed by methods in linear algebra, the constraint that points are only Fig 1. Clusters Reconstruction errors are measured by the cost function. The weights Wij summarize the contribution of the jth data point to the ith reconstruction. To compute
  • 2. J.Shalini, R.Jayasree, K.Vaishnavi / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com Vol. 3, Issue 3, May-Jun 2013, pp. 885 | P a g e the weights Wij, we minimize the cost space with as many function subject to two constraints: first, that each data point Xi is reconstructed only from its neighbors. The optimal weights Wij subject to these constraints are found by solving a least-squares problem. This cost function, like the previous one, is based on locally linear reconstruction errors, but here we fix the weights Wij while optimizing the coordinates Yi. The embedding cost in Eq. 2 defines a quadratic form in the vectors W Yi. Subject to constraints that make the problem well-posed, it can be minimized by solving a sparse N * N eigenvalue problem[36]. In this work experiments, data points were reconstructed from their K nearest neighbors, as measured by Euclidean distance. C. LLE Algorithms IV. LLE+K-Means Roweis and Saul (2000) have recently proposed [36]an unsupervised learning algorithm of low dimensional manifolds, whose goal is to recover the non linear structure of high dimensional data. The idea behind their local linear embedding algorithm is that nearby points in the high dimensional space remain nearby and similarly co- located in the low dimensional embedding. Starting from this intuition each data point xi (i = 1; n) is approximated by a weighted linear combination of its neighbors (from the nature of these local and linear reconstructions the algorithm derives its name). In its base formulation, the LLE algorithm finds the linear coefficients wij by minimizing the reconstruction wh ich k ¢ k is the Euclidean norm. For each xi the weights wig are enforced to sum to one and are equal to zero if a point ax does not belong to the set of k neighbors for xi. The same weights that characterize the local geometry in the original data space are supposed to reconstruct the data points in the low dimensional embedding. The n coordinates are then estimated by minimizing the cost function: for reconstructing it by its k neighbors, as in the first step of LLE. LLE is an unsupervised, non-iterative method, which avoids the local minima problems plaguing many competing methods (e.g. those based on the EM algorithm) This formula is applied in this proposal on five available data sets: namely Iris, Wine, Zoo, Cancer, and Drosophila. Consider a set of input data vectors X =(x1, ··· ,xn) in high dimensional space. For simplicity, the data is centered in the preprocessing step, so that the standard K-means clustering is to minimize the clustering objective function [11]. Where the matrix is the cluster indicator: if xi belongs to the k-th cluster, . Suppose the data consist of real-valued vectors , each of dimensionality D, sampled from some smooth underlying manifold. It characterizes the local geometry of these patches by linear coefficients that reconstruct each data point from its neighbors. In the simplest formulation of LLE, one identifies nearest neighbors per data point, as measured by Euclidean distance. V. EXPERIMENT AND RESULT In this section it describes about the experiments evaluate the Performance of LLE and LLE +k -means in Dimension reduction and compare with LDA+K-means, used widely in gene data sets. VI. DATASET DESCRIPTION The proposed work has been implemented and tested on five public available different gene data sets namely Iris, Breast Cancer, Drosophila, Wine, and Zoo. 
The descriptions of these datasets are as follows. Five datasets including Iris, Wine, Drosophila, cancer and Zoo  Zoo gene describes the animal’s hair, feathers, eggs, milk, airborne, back bone, fins, tails etc.  Iris gene describes the flowers sepal length, sepal width, petal length, petal width it’s a multivariate. The attribute used is real. 1. Compute the neighbors of each data point Xi 2. Compute the Weights Wij that best reconstruct each data point Xi from its neighbors Minimizing the cost by constraints linear fits. 3. Compare the vectors Yi best reconstructed by the weights Wij Minimizing the quadratic form by its bottom nonzero eigen vectors.
  • 3. J.Shalini, R.Jayasree, K.Vaishnavi / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com Vol. 3, Issue 3, May-Jun 2013, pp. 886 | P a g e  Drosophila describes concise genomic, proteomic, transcriptomic, genetic and functional information on all known and predicted human genes.  Breast Cancer describes tumour size, Class: no-recurrence-events, recurrence-events, age, menopause, and inv-nodes6. Node-caps, Breast- quad: left-up, left-low, right-up, right-low, central. Wine data describes the attribute about Alcohol, Malic acid, Ash Alkalinity of ash, Magnesium, Total phenols, Flavanoids, Nonflavanoidphenols, Proanthocyanins. Table 1: Data set Description Table 2. Clustering accuracy table on UCI dataset. The results are obtained by averaging5 trials of LLE+ K-means Fig 2: k-means clustering with Iris Fig 3: LDA+ k-Means with Iris Data set Fig 4: LLE+k-means clustering Fig 5: K-Means clustering with wine Data set With Iris Dataset Fig 6: LDA-k means clustering Fig7: LLE+K means clustering with wine Dataset VII. RESULT ANALYSIS Datasets #Samples #Dimensions #Class Iris 150 4 3 Wine 178 13 3 Zoo 101 18 7 Breast Cancer 286 9 4 Drosophila 163 4 3
  • 4. J.Shalini, R.Jayasree, K.Vaishnavi / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com Vol. 3, Issue 3, May-Jun 2013, pp. 887 | P a g e All the above datasets have labels. View all the labels of the datasets as the objective knowledge on the structure of the datasets and use accuracy as the clustering performance measure. Accuracy discovers the one-to-one relationship between clusters and classes and measures the extent to which each cluster contains data points from the corresponding class and it has been used as performance measures for clustering analysis. Accuracy can be described Where n is the number of data points, Ck denotes the k-the cluster, and Lm is the m-the class. Is the number of data points that belong to class m are assigned to cluster k. Accuracy is then computed as the maximum sum of for all pairs of clusters and classes, and these pairs have no overlaps. On the five datasets data repository, it compares the LLE-Km algorithm with standard K-means algorithm. This work is also comparing it with PCA-based clustering algorithm: clustering. This shows that LLE-Km clustering is viable and competitive. The subspace clustering is able to perform the subspace selection and data reduction. Note that LLE-Km is also able to discover clusters in the low dimensional subspace to overcome the curse of dimensionality. IX. CONCLUSION The performance of dimension reduction techniques with k-means clustering method using real different gene data sets was compared. In this thesis, considered various properties of data sets and data preprocessing procedures, and have confirmed the importance of preprocessing data prior to performing core cluster analysis. All clustering methods were affected by data dimension and data characteristics, such as the overlapping between clusters and the presence of noise. In particular the LLE + k m which are commonly used to reduce the dimension of data, The future extension of this work can be done via some other preprocessing techniques. REFERENCES [1] Alone et alum Broad Pattern of gene expression revealed by clustering analysis of tumor and normal colon cancer tissues probed by oligonucleatide arrays. PNAS, 96(12): 6745-6750, 1999. [2] Anton Schwaighofer. Matlab interface to svmlight.In (2004) [3] Banerjee, A., Dhillon, I., Ghosh, J., Merugu, S., & Modha, D. (2004). A generalized maximum entropy approach to bregman co-clustering and matrix approximation. Proc. ACM Int’l Conf Knowledge Disc. Data Mining (KDD). [4] arbara.D,‖ An Introduction to Cluster Analysis for Data mining‖, [5] room Head .D.S, Kirby, A new approach to dimensionality reduction:Theory and algorithms, SIAM journal of applied Mathematics 60 (6) (2000)2114-2142. [6] Carreira-Perpinan.M.A, Areview of dimension reduction techniques. Technical report CS-96-09, Department of Computer Science, University of Sheffield, 1997. [7] Cheng, Y., & Church, G. (2000). Biclustering of expression data. Proc. Int’l Symp. Mol. Bio (ISMB), 93–103. [8] Chrisding et al ―K-means clustering Via PCA‖ 21st International Conference on Machine Learning, Canada-2004. [9] Diamantras. K.I and Kung.S-Y.Principal Component Neural Networks. Theory and Applications. John Wiley & Sons, New York, Londin, Sydney, 1996. [10] Ding, C., & He, X. (2004). K-means clustering and principal component analysis. Int’l Conf. Machine Learning (ICML) [11] Ding, C., He, X., Zha, H., & Simon, H. (2002). Adaptive dimension reduction for clustering high dimensional data. Proc. 
VIII. CONCLUSION

The performance of dimension reduction techniques combined with the k-means clustering method was compared on several real gene datasets. This work considered various properties of the datasets and the data preprocessing procedures, and confirmed the importance of preprocessing data prior to performing the core cluster analysis. All clustering methods were affected by the data dimension and by data characteristics such as the overlap between clusters and the presence of noise. In particular, the LLE + k-means combination, used to reduce the dimension of the data, proved viable and competitive. Future extensions of this work can explore other preprocessing techniques.

REFERENCES
[1] Alon, U., et al., "Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays," PNAS, 96(12):6745–6750, 1999.
[2] Schwaighofer, A., "Matlab interface to SVMlight," 2004.
[3] Banerjee, A., Dhillon, I., Ghosh, J., Merugu, S., & Modha, D., "A generalized maximum entropy approach to Bregman co-clustering and matrix approximation," Proc. ACM Int'l Conf. Knowledge Discovery and Data Mining (KDD), 2004.
[4] Barbara, D., "An Introduction to Cluster Analysis for Data Mining."
[5] Broomhead, D.S., & Kirby, M., "A new approach to dimensionality reduction: theory and algorithms," SIAM Journal of Applied Mathematics, 60(6):2114–2142, 2000.
[6] Carreira-Perpinan, M.A., "A review of dimension reduction techniques," Technical Report CS-96-09, Department of Computer Science, University of Sheffield, 1997.
[7] Cheng, Y., & Church, G., "Biclustering of expression data," Proc. Int'l Symp. Mol. Bio. (ISMB), pp. 93–103, 2000.
[8] Ding, C., et al., "K-means clustering via PCA," Proc. 21st International Conference on Machine Learning, Canada, 2004.
[9] Diamantaras, K.I., & Kung, S.Y., Principal Component Neural Networks: Theory and Applications, John Wiley & Sons, New York, 1996.
[10] Ding, C., & He, X., "K-means clustering and principal component analysis," Int'l Conf. Machine Learning (ICML), 2004.
[11] Ding, C., He, X., Zha, H., & Simon, H., "Adaptive dimension reduction for clustering high dimensional data," Proc. IEEE Int'l Conf. Data Mining, 2002.
[12] De la Torre, F., & Kanade, T., "Discriminative cluster analysis," Proc. Int'l Conf. Machine Learning, 2006.
[13] DeMers, D., & Cottrell, G.W., "Non-linear dimensionality reduction," in C.L. Giles, S.J. Hanson, and J.D. Cowan, editors, Advances in Neural Information Processing Systems 5, pp. 580–587, Morgan Kaufmann, San Mateo, CA, 1993.
[14] Fukunaga, K., & Olsen, D.R., "An algorithm for finding the intrinsic dimensionality of data," IEEE Transactions on Computers, 20(2):165–171, 1976.
[15] Halkidi, M., & Batistakis, Y., "On Clustering Validation Techniques," 2001.
[16] Han, J., & Kamber, M., Data Mining: Concepts and Techniques, Morgan Kaufmann, August 2000, ISBN 1-55860-489-8.
[17] Hartigan, J.A., & Wong, M.A., "A K-Means Clustering Algorithm," Applied Statistics, Vol. 28, 1979, pp. 100–108.
[18] Hartuv, E., Schmitt, A., Lange, J., et al., "An Algorithm for Clustering cDNAs for Gene Expression Analysis," Third International Conference on Computational Molecular Biology (RECOMB), 1999.
[19] Hotelling, H., "Analysis of a complex of statistical variables into principal components," Journal of Educational Psychology, 24:417–441, 1933.
[20] Jackson, J.E., A User's Guide to Principal Components, John Wiley and Sons, New York, 1991.
[21] Jain, A.K., & Dubes, R.C., Algorithms for Clustering Data, Prentice Hall, 1988.
[22] Jain, A.K., Murty, M.N., & Flynn, P.J., "Data Clustering: A Review," ACM Computing Surveys, Vol. 31, No. 3, 1999.
[23] Johnstone, I.M., "Non-linear dimensionality reduction by LLE," 2009.
[24] Jolliffe, I., Principal Component Analysis, 2nd Edition, Springer, 2002.
[25] Jolliffe, I.T., Principal Component Analysis, Springer-Verlag, 1986.
[26] Jolliffe, I.T., Principal Component Analysis, Springer-Verlag, New York, 1989.
[27] Kambhatla, N., & Leen, T.K., "Dimension reduction by local principal component analysis," Neural Computation, 9:1493–1516, 1997.
[28] Kaski, S., "Dimensionality reduction by random mapping: fast similarity computation for clustering," Proc. IEEE International Joint Conference on Neural Networks, 1:413–418, 1998.
[29] Kaufman, L., & Rousseeuw, P.J., Finding Groups in Data: An Introduction to Cluster Analysis, John Wiley, New York, 1990.
[30] Kirby, M., Geometric Data Analysis: An Empirical Approach to Dimensionality Reduction and the Study of Patterns, John Wiley and Sons, 2001.
[31] Kohavi, R., & John, G., "The wrapper approach," in H. Liu and H. Motoda, editors, Feature Extraction, Construction and Selection: A Data Mining Perspective, Springer Verlag, 1998.
[32] Kouropteva, O., Okun, O., & Pietikäinen, M., "Selection of the optimal parameter value for the locally linear embedding algorithm," submitted to 1st International Conference on Fuzzy Systems and Knowledge Discovery, 2002.
[33] Kramer, M., "Nonlinear principal component analysis using autoassociative neural networks," AIChE Journal, 37:233–243, 1991.
[34] Lee, Y., & Lee, C.K., "Classification of multiple cancer types by multicategory support vector machines using gene expression data," Bioinformatics, Vol. 19, pp. 1132–1139, 2003.
[35] Li, K.-C., "High dimensional data analysis via the SIR/PHD approach," lecture notes in progress, April 2000.
[36] Li, T., Ma, S., & Ogihara, M., "Document clustering via adaptive subspace iteration," Proc. Conf. Research and Development in Information Retrieval (SIGIR), pp. 218–225, 2004.
[37] Wolfe, P.J., & Belabbas, A., "Hessian eigenmaps: locally linear techniques for high-dimensional data," 2008.
[38] Luan, Y., & Li, H., "Clustering of time-course gene expression data using a mixed effects model with B-splines," Bioinformatics, Vol. 19, 2003, pp. 474–482.
[39] Roweis, S.T., & Saul, L.K., "Nonlinear dimensionality reduction by locally linear embedding," Science, 290:2323–2326, 2000.
[40] Zha, H., Ding, C., Gu, M., He, X., & Simon, H., "Spectral relaxation for K-means clustering," Advances in Neural Information Processing Systems 14 (NIPS'01), pp. 1057–1064, 2002.

J. SHALINI was born in Coimbatore on 07/10/1984. She holds B.Sc. (CT), M.Sc. and M.Phil. degrees in Computer Science; she completed her undergraduate degree at Sri Krishna College of Engineering and Technology and her postgraduate degree at Sri Ramakrishna College of Arts and Science for Women. She is currently working as an Assistant Professor at PSGR Krishnammal College for Women.
R. JAYASREE was born in Coimbatore. She holds an MCA from Bharathiar University and is pursuing an M.Phil. at PSGR Krishnammal College. She is currently working as an Assistant Professor at PSGR Krishnammal College for Women.

K. VAISHNAVI was born in Madurai. She holds an MCA from Bharathiar University and is pursuing an M.Phil. at PSGR Krishnammal College. She is currently working as an Assistant Professor at PSGR Krishnammal College for Women.