Tutorial on Dimensionality Reduction
             Shatakirti
             MT2011096


Contents

1 Introduction
2 Linear Dimensionality Reduction
3 Non-Linear Dimensionality Reduction
  3.1 Manifolds
  3.2 Manifold Learning
      3.2.1 Isomap
      3.2.2 Locally Linear Embedding (LLE)
      3.2.3 Isomap vs Locally Linear Embedding (LLE)
  3.3 Applications of Manifold Learning
References


List of Figures

1 Finding the Principal Components
2 Principal Components Analysis on 3D data
3 Manifolds
4 Swiss Roll Manifold
5 Isomap Manifold Learning
6 LLE algorithm






1     Introduction
    Many applications in data mining require deriving a classifier or a
function estimate from an extremely large data set. Such data sets typically
provide a large number of labelled examples, which are used to classify the
data that may come in the future. A labelled dataset consists of a large
number of features, some of which may be irrelevant and sometimes even
misleading. This is a problem for any algorithm attempting to generalize
from the data: an extremely complex feature set slows down any algorithm
that attempts to classify it and makes it difficult to find an optimal
result. In order to decrease the burden on the classifiers and function
estimators, we reduce the dimensionality of the data so that the number of
features drops by a large extent. Thus, dimensionality reduction simplifies
data so that it can be efficiently processed.
    Apart from aiding visualization, dimensionality reduction helps reveal
the main features governing a data set. For example, suppose we want to
classify an email as spam or non-spam. A common approach is to represent
the email as a vector of the words appearing in it. The dimensionality of
this vector can easily be in the hundreds, but a dimensionality reduction
approach may reveal that there are only a few telling features, such as the
words “free”, “donate”, etc. These features can then help classify the mail
as spam.

   There are two broad approaches by which we can reduce the dimensionality
of a given data set. They are:

    1. Linear Dimensionality Reduction

    2. Non-linear Dimensionality Reduction


2     Linear Dimensionality Reduction
The most popular algorithm for dimensionality reduction is Principal Com-
ponent Analysis(PCA). Given a data set, PCA finds the vectors along which
the data has maximum variance in addition to the relative importance of
these directions. An example will explain the PCA in a more intutive way.
Take for example, the data we have is the surface of a teapot and we need
to capture the most information about the 3D teapot.



   In order to achieve this, we rotate the teapot to a position where we can
get the most visual information. The method to achieve this is: first find
the axis along which the object has the largest average extent (the red
axis). Next, rotate the object around the first axis to find the axis that
is perpendicular to the first axis and along which the object has the
largest average extent (the green axis).




               Figure 1: Finding the Principal Components


  The two axes found are the first and the second principal components,
and the average extents along these axes are called the eigenvalues.

Mathematically, the steps involved in PCA are as follows (a minimal NumPy
sketch is given after the list). Suppose we have n documents and m terms
overall:
  1. Construct an m x n term document matrix A. Each document is rep-
     resented as a column vector of m dimensions.

  2. Compute the empirical mean of each term.

  3. Compute the normalized matrix by subtracting the empirical mean from
     each data dimension. The mean subtracted is the average across each
     dimension.

  4. Calculate the m x m term covariance matrix from the normalized pro-
     jections.

  5. Calculate the eigenvectors and the eigenvalues of the covariance matrix.
     Since the covariance matrix is square, we can always calculate its
     eigenvectors and eigenvalues. It is important to note that these
     eigenvectors are unit eigenvectors; this is very important for PCA.

  6. Once the eigenvectors of the covariance matrix are found, the next step
     is to order them by eigenvalue, highest to lowest. This gives the
     components in order of significance. We can now, if we wish, ignore the
     components of lower significance. We may lose some information, but if
     their eigenvalues are small, we do not lose much, and the final data set
     will have fewer dimensions. To be precise, if you originally have n
     dimensions in your data, you calculate n eigenvectors and n eigenvalues,
     and if you then keep only the first p eigenvectors, the final data set
     has only p dimensions. The value of p can be decided by computing the
     cumulative energy of the eigenvalues: choose p such that the cumulative
     energy exceeds a certain threshold, say 90% of the total.
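
The steps above map almost directly onto code. The following is a minimal
sketch, assuming NumPy is available; the function name pca_term_document, the
90% energy threshold, and the random example matrix are illustrative choices,
not part of the original text.

```python
import numpy as np

def pca_term_document(A, energy_threshold=0.90):
    """PCA on an m x n term-document matrix A (documents are columns).

    Follows the listed steps: centre each term, form the m x m covariance
    matrix, eigendecompose it, and keep the first p components whose
    cumulative eigenvalue energy exceeds the threshold.
    """
    m, n = A.shape

    # Steps 2-3: empirical mean of each term (row) and centring.
    A_centred = A - A.mean(axis=1, keepdims=True)

    # Step 4: m x m covariance matrix of the terms.
    cov = (A_centred @ A_centred.T) / (n - 1)

    # Step 5: eigh returns unit eigenvectors, in ascending eigenvalue order.
    eigvals, eigvecs = np.linalg.eigh(cov)

    # Step 6: reorder from highest to lowest eigenvalue.
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Choose p so the cumulative energy exceeds the threshold (say 90%).
    energy = np.cumsum(eigvals) / np.sum(eigvals)
    p = int(np.argmax(energy >= energy_threshold)) + 1

    # Project the centred documents onto the first p principal components.
    Y = eigvecs[:, :p].T @ A_centred          # p x n reduced representation
    return Y, eigvals[:p], eigvecs[:, :p]

# Hypothetical example: 500 terms, 40 documents of random term counts.
A = np.random.default_rng(0).poisson(1.0, size=(500, 40)).astype(float)
Y, top_eigvals, components = pca_term_document(A)
print(Y.shape)                                # (p, 40)
```

Note that np.linalg.eigh returns unit eigenvectors, matching the requirement
in step 5.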




           Figure 2: Principal Components Analysis on 3D data


   Despite PCA’s popularity, it has a number of drawbacks. One of the major
drawbacks is the requirement that the data lie on a linear subspace. For
example, in Figure 4 below (known as a swiss roll), the data actually forms
a 2-dimensional manifold, but PCA will not correctly recover it.
   There are other approaches to reducing the number of dimensions. It has
been observed that high-dimensional data is often much simpler than its
apparent dimensionality suggests. In other words, a high-dimensional data
set may contain many features that are all measurements of the same
underlying cause and are thus very closely related. For example, consider
video footage of a single object taken from multiple angles simultaneously.
The features of such a dataset contain a lot of overlapping information.
This notion is formalized using the concept of a manifold.


3     Non-Linear Dimensionality Reduction
3.1    Manifolds
A manifold is basically a low-dimensional Euclidean subspace onto which a
higher-dimensional space can be mapped. More generally, a topological
manifold can be described as a topological space that, on a small enough
scale, resembles Euclidean space of a specific dimension, called the
dimension of the manifold. Thus, a line and a circle are one-dimensional
manifolds, while a plane and a sphere are two-dimensional manifolds, and so
on.




      (a) The sphere (the surface of a ball) is a two-dimensional manifold,
      since it can be represented by a collection of two-dimensional maps
      (source: Wikipedia). (b) A 1D manifold embedded in 3D.

                               Figure 3: Manifolds

    In figure (a) above, we notice that the triangle drawn on the 3D globe
can actually be represented linearly in a 2D space.
    In figure (b), notice that although the curve lies in 3D, it has zero
volume and zero area, so the 3D representation is somewhat misleading: the
curve can be represented as a line (1D).




3.2     Manifold Learning
Manifold learning is one of the most popular approach to non-linear dimen-
sionality reduction. The algorithms used for this job are based on the idea
that the data is acually present in the low-dimension but is embedded in a
high-dimension space, where the low-dimensional space reflects the under-
lying parameters. Manifold learning algorithms try to get these parameters
in order to find a low-dimensional representation of the data. Some of the
widely used algorithms for this purpose are: Isomap, Locally Linear Embed-
ding, Laplacian Eigenmaps, Semidefinite Embedding. The best example used
to explain the manifold learning is the swiss roll, a 2D manifold embedded
in 3D shown in the figure below.




                       Figure 4: Swiss Roll Manifold
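
As a concrete illustration, the swiss roll can be generated synthetically and
embedded with off-the-shelf implementations of the algorithms named above.
This is a hedged sketch assuming scikit-learn is installed; the sample size,
noise level and neighbourhood size are arbitrary illustrative values.

```python
from sklearn.datasets import make_swiss_roll
from sklearn.decomposition import PCA
from sklearn.manifold import Isomap, LocallyLinearEmbedding

# A 2D manifold (the swiss roll) embedded in 3D.
X, t = make_swiss_roll(n_samples=1500, noise=0.05, random_state=0)

# A linear method projects the roll onto a plane without unrolling it ...
X_pca = PCA(n_components=2).fit_transform(X)

# ... whereas manifold learners aim to recover the underlying 2D parameters.
X_isomap = Isomap(n_neighbors=10, n_components=2).fit_transform(X)
X_lle = LocallyLinearEmbedding(n_neighbors=10, n_components=2).fit_transform(X)

print(X_pca.shape, X_isomap.shape, X_lle.shape)   # each (1500, 2)
```

Plotting X_isomap coloured by the roll parameter t shows the unrolled
rectangle, while the PCA projection retains the spiral structure.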


3.2.1   Isomap
Isomap, short for isometric feature mapping, was one of the first algorithms
introduced for manifold learning, and it remains one of the most widely
applied procedures for the problem.

   Isomap consists of two main steps:
  1. Estimate the geodesic distances (distances along a manifold) between
     the points using shortest path distances on the data set’s k-nearest
     neighbour graph.

  2. Then use Multidimensional Scaling (MDS) to map the distances obtained
     in the first step onto a low-dimensional (LD) Euclidean space, while
     preserving the interpoint distances computed in the first step.




Estimating Geodesic distances

     A geodesic is defined as a curve that locally minimizes the distance be-
tween two points on any mathematically defined space, such as a curved
manifold. Equivalently, it is a path of minimal curvature. In noncurved
three-dimensional space, the geodesic is a straight line.
    We assume that the data is given in D dimensions and that the manifold
has dimension d. Isomap further assumes that there is a chart that preserves
the distances between points, i.e. if xi, xj are points on the manifold and
G(xi, xj) is the geodesic distance between them, then there is a chart
f : M → R^d such that ||f(xi) − f(xj)|| = G(xi, xj), and that the manifold is
smooth enough that the geodesic distance between nearby points is
approximately linear.

Multidimensional Scaling (MDS)

    After finding the geodesic distances, Isomap finds points whose
Euclidean distances are equal to these geodesic distances. Since the
manifold is isometrically embedded, such points exist. Multidimensional
Scaling is a classical technique that can be used to find such points: it
constructs a set of points from a matrix of dissimilarities such that the
interpoint Euclidean distances closely match the dissimilarities measured in
the data's original dimension D. Isomap uses the classical MDS (cMDS)
algorithm to minimize this cost. The cMDS algorithm takes an input matrix
giving dissimilarities between pairs of items and outputs a coordinate
matrix whose configuration minimizes a loss function called strain.
Hence, first compute the pairwise distances from a given set of m vectors
(x1, x2, ..., xm) in n-dimensional space:

$$\Delta = \begin{pmatrix} 0 & \delta_{1,2} & \cdots & \delta_{1,m} \\ \delta_{2,1} & 0 & \cdots & \delta_{2,m} \\ \vdots & \vdots & \ddots & \vdots \\ \delta_{m,1} & \delta_{m,2} & \cdots & 0 \end{pmatrix}$$
Then, map the vectors onto a manifold of lower dimension k << n, subject to
the following optimization criterion:


$$\min_{x_1, x_2, \ldots, x_m} \sum_{i<j} \big( |x_i - x_j| - \delta_{i,j} \big)^2$$
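
Classical MDS has a well-known closed-form recipe: double-centre the squared
dissimilarities and keep the top eigenpairs of the resulting Gram matrix. The
sketch below illustrates that recipe in NumPy; the function name
classical_mds and the small random example are assumptions made here, not
part of the original text.

```python
import numpy as np

def classical_mds(delta, k):
    """Classical MDS: embed m items in k dimensions from an m x m
    dissimilarity matrix delta, minimising the strain loss."""
    m = delta.shape[0]

    # Double-centre the squared dissimilarities: B = -1/2 * J * delta^2 * J.
    J = np.eye(m) - np.ones((m, m)) / m
    B = -0.5 * J @ (delta ** 2) @ J

    # Eigendecomposition of the symmetric Gram matrix B.
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Keep the k largest (non-negative) eigenvalues and scale the eigenvectors.
    scale = np.sqrt(np.clip(eigvals[:k], 0.0, None))
    return eigvecs[:, :k] * scale             # m x k coordinate matrix

# Example: recover a 2D configuration from pairwise Euclidean distances.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
delta = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
Y = classical_mds(delta, k=2)    # same configuration, up to rotation/reflection
```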






Isomap working
The Isomap algorithm estimates the geodesic distances using shortest-path
algorithms and then finds an embedding of these distances in Euclidean space
using the cMDS algorithm. A minimal code sketch is given after the algorithm
box below.

 Algorithm 1: Isomap
  input : x1, x2, ..., xn ∈ R^D, k

        1. Form the k-nearest neighbor graph with edge weights Wij := |xi − xj|
           for neighboring points xi, xj.

        2. Compute the shortest path distances between all pairs of points
           using Dijkstra's or Floyd's algorithm. Store the squares of these
           distances in the matrix D.

        3. Return Y := cMDS(D).
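
Algorithm 1 can be sketched by chaining a k-nearest-neighbour graph, a
shortest-path solver for the geodesic estimates, and the classical MDS
computation shown earlier (repeated inline here so the function is
self-contained). This is an illustrative sketch, not a reference
implementation: it assumes scikit-learn and SciPy are available and that the
neighbourhood graph is connected, and the function name isomap and the
parameter values are choices made here.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph
from scipy.sparse.csgraph import shortest_path

def isomap(X, k, d):
    """Isomap sketch: k-NN graph -> geodesic (shortest-path) distances
    -> classical MDS embedding in d dimensions."""
    # 1. k-nearest-neighbour graph with Euclidean edge weights.
    W = kneighbors_graph(X, n_neighbors=k, mode='distance')

    # 2. Shortest-path distances approximate the geodesic distances
    #    (Dijkstra; assumes the graph is connected, otherwise some
    #    entries of G are infinite).
    G = shortest_path(W, method='D', directed=False)

    # 3. Classical MDS on the geodesic distance matrix.
    n = G.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (G ** 2) @ J
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:d]
    return eigvecs[:, order] * np.sqrt(np.clip(eigvals[order], 0.0, None))

# Usage on the swiss roll.
from sklearn.datasets import make_swiss_roll
X, _ = make_swiss_roll(n_samples=1000, random_state=0)
Y = isomap(X, k=10, d=2)
print(Y.shape)   # (1000, 2)
```

Inspecting how many eigenvalues of B are clearly non-zero gives the
dimensionality estimate discussed below.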




                      Figure 5: Isomap Manifold Learning


    One particularly helpful feature of Isomap - not found in some of the
other algorithms - is that it automatically provides an estimate of the dimen-
sionality of the underlying manifold. In particular, the number of non-zero
eigenvalues found by Classical MDS (cMDS) gives the underlying dimension-
ality.

3.2.2    Locally Linear Embedding (LLE)
LLE assumes the manifold to be a collection of overlapping coordinate patches.
If the neighbourhood sizes are small and the manifold is smooth, then the
patches can be considered almost linear. Like Isomap, LLE begins by finding a
set of nearest neighbors of each point. It then computes, for each point, a
set of weights over its neighbors that best reconstructs that point. Finally,
it uses an eigenvector-based optimization to find the low-dimensional (LD)
embedding of the points such that these intrinsic weights are preserved,
maintaining the nonlinear manifold structure in the LD space.




                         Figure 6: LLE algorithm


    To be more precise, the LLE algorithm takes as input an n x p data
matrix X, with rows xi, a desired number of dimensions q < p, and an integer
k for finding local neighborhoods, where k ≥ q + 1. The output is an n x q
matrix Y, with rows yi. The steps involved in the LLE algorithm are given
below, followed by a minimal code sketch.






 Algorithm 2: Locally Linear Embedding (LLE)
        1. For each xi, find the k nearest neighbors.

        2. Find the weight matrix W which minimizes the residual sum of
           squares for reconstructing each xi from its neighbors,

           $$\mathrm{RSS}(W) \equiv \sum_{i=1}^{n} \Big\lVert x_i - \sum_{j} w_{ij}\, x_j \Big\rVert^2,$$

           where $w_{ij} = 0$ unless xj is one of xi's k nearest neighbors,
           and for each i, $\sum_{j} w_{ij} = 1$.

        3. Find the coordinates Y which minimize the reconstruction error
           using these weights,

           $$\Phi(Y) \equiv \sum_{i=1}^{n} \Big\lVert y_i - \sum_{j} w_{ij}\, y_j \Big\rVert^2,$$

           subject to the constraints that $\sum_{i} Y_{ij} = 0$ for each j
           and that $Y^T Y = I$.
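
A minimal NumPy sketch of Algorithm 2. It assumes the standard closed-form
solution for step 2 (solve the local Gram system C w = 1 and normalize, with
a small regularizer for numerical stability) and takes the bottom
non-constant eigenvectors of (I − W)^T (I − W) for step 3; the function name
lle, the regularization constant and the example data are illustrative
choices, not part of the original text.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.datasets import make_swiss_roll

def lle(X, k, q, reg=1e-3):
    """LLE sketch: reconstruction weights from k neighbours, then the
    bottom eigenvectors of (I - W)^T (I - W) give the q-dim embedding."""
    n = X.shape[0]

    # Step 1: k nearest neighbours (the first neighbour returned is the
    # point itself, so ask for k + 1 and drop the first column).
    nbrs = NearestNeighbors(n_neighbors=k + 1).fit(X)
    idx = nbrs.kneighbors(X, return_distance=False)[:, 1:]

    # Step 2: weights minimising the reconstruction error, rows summing to 1.
    W = np.zeros((n, n))
    for i in range(n):
        Z = X[idx[i]] - X[i]                  # neighbours relative to x_i
        C = Z @ Z.T                           # local Gram matrix (k x k)
        C += reg * np.trace(C) * np.eye(k)    # regularise for stability
        w = np.linalg.solve(C, np.ones(k))
        W[i, idx[i]] = w / w.sum()

    # Step 3: bottom eigenvectors of M = (I - W)^T (I - W); skip the
    # constant eigenvector belonging to the (near-)zero eigenvalue.
    I = np.eye(n)
    M = (I - W).T @ (I - W)
    eigvals, eigvecs = np.linalg.eigh(M)
    return eigvecs[:, 1:q + 1]                # n x q embedding

# Usage on the swiss roll.
X, _ = make_swiss_roll(n_samples=800, random_state=0)
Y = lle(X, k=10, q=2)
print(Y.shape)   # (800, 2)
```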




3.2.3    Isomap vs Locally Linear Embedding (LLE)
Embedding Type
Isomap looks for an isometric embedding, i.e. it assumes that there is a
coordinate chart from the parameter space to the high-dimensional (HD) space
that preserves interpoint distances, and it attempts to uncover this chart.
LLE looks for conformal mappings, i.e. mappings which preserve local
distances between points but not the distances between all pairs of points.

Local vs Global
Isomap is a global method because it considers the geodesic distances
between all pairs of points on the manifold. LLE is a local method because
it constructs an embedding considering only the placement of each point with
respect to its neighbors.






3.3    Applications of Manifold Learning
Manifold learning methods are adaptable data-representation techniques that
enable dimensionality reduction and processing tasks in meaningful spaces.
Their success in medical image analysis, as well as in other scientific
fields, lies in both their flexibility and the simplicity of their
application. In medical imaging, manifold learning has been successfully
used to visualize, cluster, classify and fuse high-dimensional data, as well
as for Content Based Image Retrieval (CBIR), segmentation, registration,
statistical population analysis, and shape modeling and classification.

  1. Manifold learning is used for patient position detection in MRI. Low-
     resolution images, acquired during the initial placement of the patient
     in the scanner, are exploited for detecting the patient position.

  2. Isomap Method is used in prediction of Protein Quaternary Structure.

  3. Medical image analysis: applications to video endoscopy and 4D
     imaging.

  4. Identifying spectral clustering.

  5. Identifying increases or decreases in diseased cells or tumours, with
     applications to neuroimaging.

  6. Character Recognition.

  7. Research is ongoing into using manifold learning for image and video
     indexing. There are millions of videos on the internet, stored in
     repositories along with information about the people creating and
     sharing them. When a person queries for an image or a video, we need
     to effectively identify duplication and copyright issues for the image
     or the videos. For this purpose, manifold learning is being used for
     image/video analysis, indexing and searching.






References
[1] “Algorithms for Manifold Learning” by Lawrence Cayton, June 15, 2005.

[2] “A Tutorial on Principal Component Analysis” by Lindsay I Smith,
    February 26, 2002.

[3] “Linear Dimensionality Reduction” by Percy Liang, October 16, 2006.

[4] “A Layman's Introduction to Principal Component Analysis” by VisuMap
    Technologies.

[5] en.wikipedia.org/wiki/Principal_component_analysis

[6] en.wikipedia.org/wiki/Nonlinear_dimensionality_reduction




