Principal Component Analysis-II
PCA Process – STEP 1: Get some data
• Subtract the mean from each of the dimensions.
• This produces a data set whose mean is zero.
• Subtracting the mean makes the variance and covariance calculations easier by simplifying their equations.
• The variance and covariance values are not affected by the mean value.
• Suppose we have two measurement types X1 and X2, hence m = 2, and ten samples each, hence n = 10.
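Below is a minimal NumPy sketch of this centering step; the data values and the variable names X, mean and X_centered are illustrative assumptions, not taken from the slides.

```python
import numpy as np

# Hypothetical example data: n = 10 samples of m = 2 measurement types (X1, X2).
# The numbers are placeholders for illustration only.
X = np.array([
    [2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
    [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9],
])  # shape (n, m) = (10, 2): rows are samples, columns are dimensions

# STEP 1: subtract the mean of each dimension, giving a zero-mean data set.
mean = X.mean(axis=0)
X_centered = X - mean
```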
PCA Process – STEP 2
• Calculate the covariance matrix
• Since the data is two-dimensional, the covariance matrix will be 2 × 2.
• Since the off-diagonal elements of this covariance matrix are positive, we should expect that the X1 and X2 variables increase together.
• Since the covariance matrix is symmetric, we expect its eigenvectors to be orthogonal.
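Continuing the sketch above (still an illustrative snippet, using the X_centered array defined earlier):

```python
# STEP 2: covariance matrix of the centered data.
# rowvar=False tells np.cov that the columns are the variables, so the result is 2 x 2.
cov = np.cov(X_centered, rowvar=False)
print(cov)  # positive off-diagonal entries: X1 and X2 tend to increase together
```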
PCA Process – STEP 3
• Since the covariance matrix is square, we can calculate the eigenvectors and
eigenvalues for this matrix.
• Calculate the eigenvectors V and eigenvalues D of the covariance matrix.
• It is important to notice that these eigenvectors are both unit eigenvectors, i.e. their lengths are both 1.
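One way to carry this out in the running NumPy sketch (a hedged example, not the slides' own computation):

```python
# STEP 3: eigenvalues D and eigenvectors V of the covariance matrix.
# The covariance matrix is symmetric, so np.linalg.eigh applies; it returns
# real eigenvalues and orthonormal eigenvectors (one per column of V).
D, V = np.linalg.eigh(cov)
print(np.linalg.norm(V, axis=0))  # each eigenvector has length 1
```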
PCA Process – STEP 3
The eigenvectors are plotted as diagonal dotted lines on the plot (note that they are perpendicular to each other).
One of the eigenvectors goes through the middle of the points, like drawing a line of best fit.
The second eigenvector gives us the other, less important, pattern in the data: all the points follow the main line, but are offset to either side of it by some amount.
PCA Process – STEP 4
Reduce dimensionality and form feature
vector
• The eigenvector with the highest
eigenvalue is the principal component of
the data set.
• In our example, the eigenvector with the
largest eigenvalue is the one that points
down the middle of the data.
• Once eigenvectors are found from the
covariance matrix, the next step is to order
them by eigenvalue, highest to lowest. This
gives the components in order of
significance.
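Continuing the sketch, the ordering step might look like this (again illustrative; D and V come from the earlier snippet):

```python
# STEP 4a: order the eigenvector/eigenvalue pairs by eigenvalue, highest to lowest.
order = np.argsort(D)[::-1]   # indices of the eigenvalues in descending order
D_sorted = D[order]
V_sorted = V[:, order]        # reorder the eigenvector columns to match
```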
PCA Process – STEP 4
You need to form a feature vector, which is just a fancy name for a matrix of vectors. This is
constructed by taking the eigenvectors that you want to keep from the list of eigenvectors,
and forming a matrix with these eigenvectors in the columns.
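As a sketch, keeping the top k components as columns then gives the feature vector (k here is an assumed choice, not a value from the slides):

```python
# STEP 4b: form the feature vector from the k most significant eigenvectors,
# placed as the columns of a matrix.
k = 1                              # keep only the principal component
FeatureVector = V_sorted[:, :k]    # shape (m, k)
```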
STEP 4 - Deriving the new data set
• This is the final step in PCA, and is also the easiest.
• Once we have chosen the components (eigenvectors) that we wish to keep in our data and formed a feature vector, we simply take the transpose of the vector and multiply it on the left of the original data set, transposed:
FinalData = RowFeatureVector × RowZeroMeanData
• RowFeatureVector is the matrix with the eigenvectors in the columns transposed so that the eigenvectors are now in the rows, with the most significant eigenvector at the top.
• RowZeroMeanData is the mean-adjusted data transposed, i.e. the data items are in each column, with each row holding a separate dimension.
• Note: this is essentially rotating the coordinate axes so that the higher-variance axes come first.
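In the running NumPy sketch, the projection might be written as follows (variable names carried over from the snippets above):

```python
# STEP 4c: project the mean-adjusted data onto the chosen components.
RowFeatureVector = FeatureVector.T              # eigenvectors as rows, most significant first
RowZeroMeanData = X_centered.T                  # data items in columns, dimensions in rows
FinalData = RowFeatureVector @ RowZeroMeanData  # shape (k, n): the new data set
```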
STEP 4 - Deriving the new data set
FinalData is the final data set, with data items in columns, and dimensions along rows.
STEP 5 - Getting the old data back
• So, how do we get the original data back?
• Before we do that, remember that only if we took all the eigenvectors in our transformation will we get exactly the original data back.
• If we have reduced the number of eigenvectors in the final transformation, then the retrieved data has lost some information.
STEP 5 - Getting the old data back
• Recall that the final transform is this:
FinalData = RowFeatureVector × RowZeroMeanData
• which can be turned around so that, to get the original data back,
RowZeroMeanData = RowFeatureVector^-1 × FinalData
• where RowFeatureVector^-1 is the inverse of RowFeatureVector. However, when we take all the eigenvectors in our feature vector, it turns out that the inverse of our feature vector is actually equal to the transpose of our feature vector. This is only true because the elements of the matrix are all the unit eigenvectors of our data set, so
RowZeroMeanData = RowFeatureVector^T × FinalData
• But, to get the actual original data back, we need to add on the mean of that original data (remember we subtracted it right at the start). So, for completeness,
RowOriginalData = (RowFeatureVector^T × FinalData) + OriginalMean
• This formula also applies when you do not have all the eigenvectors in the feature vector. So even when you leave out some eigenvectors, the above equation still makes the correct transform; the reconstruction simply loses the information carried by the discarded components.
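A final piece of the NumPy sketch for the reconstruction (again assuming RowFeatureVector, FinalData and mean from the earlier snippets):

```python
# STEP 5: map the (possibly reduced) data back and re-add the original mean.
# With all eigenvectors kept this recovers the original data exactly; with
# fewer eigenvectors it returns an approximation (some information is lost).
RowOriginalData = RowFeatureVector.T @ FinalData + mean.reshape(-1, 1)
X_reconstructed = RowOriginalData.T   # back to shape (n, m)
```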