In statistics, machine learning, and information theory, dimensionality reduction or dimension reduction is the process of reducing the number of random variables under consideration by obtaining a set of principal variables.
4. About Dimensionality Reduction
● “Curse of dimensionality”
○ High time + space complexity
○ Overfitting
○ Presence of irrelevant data
● What ML algorithms want
○ Uncorrelated data or independent variables
○ Just enough data to predict well
● “blessing of dimensionality”
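The distance-concentration effect behind the "curse of dimensionality" can be sketched with a small NumPy experiment (my illustration, not from the slides): as the number of dimensions grows, distances between random points bunch together, so nearest and farthest neighbors become hard to tell apart.

```python
import numpy as np

# Sketch: relative spread of pairwise distances shrinks as dimension grows,
# which is one symptom of the curse of dimensionality.
rng = np.random.default_rng(0)

spreads = []
for d in (2, 200, 20000):
    points = rng.random((100, d))        # 100 random points in [0, 1]^d
    ref = rng.random(d)                  # a random query point
    dists = np.linalg.norm(points - ref, axis=1)
    spread = (dists.max() - dists.min()) / dists.min()
    spreads.append(spread)
    print(f"d={d:>6}: relative spread of distances = {spread:.3f}")
```

With growing `d`, the printed spread drops sharply, meaning "nearest neighbor" carries less and less signal.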
5. Feature Selection
● What is Feature Selection?
○ Simplify models to make them easier to interpret by users
○ Shorter training times
○ Avoid the curse of dimensionality
○ Enhanced generalization by reducing overfitting
● Ways to select feature subsets
○ Optimum method
○ Heuristic method
○ Randomized method
● Feature evaluation methods
○ Filter methods (unsupervised)
○ Wrapper methods (supervised)
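A minimal sketch of one filter method, assuming a variance threshold as the unsupervised criterion (one common choice; the slides do not name a specific filter): features are scored without ever looking at the labels.

```python
import numpy as np

def variance_filter(X, threshold=0.0):
    """Filter method (unsupervised): keep features whose variance
    exceeds `threshold`; the labels are never consulted."""
    variances = X.var(axis=0)
    keep = variances > threshold
    return X[:, keep], keep

# Toy data: the middle column is constant, so it carries no information.
X = np.array([[1.0, 5.0, 0.2],
              [2.0, 5.0, 0.9],
              [3.0, 5.0, 0.1]])
X_reduced, mask = variance_filter(X, threshold=0.01)
print(mask)   # → [ True False  True]  (the constant feature is dropped)
```

A wrapper method would instead retrain the actual model on each candidate subset and keep the subset with the best validation score.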
7. Subset Selection
● Forward Selection method
○ Starts with an empty feature set
○ Try each remaining feature
○ Estimate the classification/regression error of adding each feature
○ Add the feature that gives the maximum improvement
○ Stop when there is no significant improvement
● Backward Elimination method
○ Starts with the full feature set
○ Try removing each feature
○ Drop the feature whose removal has the smallest impact on error
○ Stop when there is no significant improvement
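The forward-selection steps above can be sketched as greedy search (a minimal illustration, assuming a least-squares fit as the error estimator and a tolerance as the stopping rule, neither of which is specified in the slides):

```python
import numpy as np

def mse(X, y):
    """Least-squares fit error used to score a candidate feature subset."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.mean((X @ w - y) ** 2)

def forward_selection(X, y, tol=1e-3):
    """Greedy forward selection: start empty, repeatedly add the single
    feature that most reduces the error, stop when the improvement
    falls below `tol`."""
    selected, remaining = [], list(range(X.shape[1]))
    best_err = np.inf
    while remaining:
        errs = {j: mse(X[:, selected + [j]], y) for j in remaining}
        j_best = min(errs, key=errs.get)
        if best_err - errs[j_best] < tol:
            break                        # no significant improvement
        selected.append(j_best)
        remaining.remove(j_best)
        best_err = errs[j_best]
    return selected

# Toy data: y depends only on features 0 and 2; feature 1 is irrelevant.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 2.0 * X[:, 0] - 1.5 * X[:, 2]
sel = forward_selection(X, y)
print(sel)   # → [0, 2]
```

Backward elimination is the mirror image: start from all features and repeatedly drop the one whose removal increases the error least.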
8. Univariate and Multivariate
● Univariate
○ Pearson correlation coefficient
○ F-score
○ Chi-square
○ Signal-to-noise ratio
○ Mutual information
● Multivariate
○ Minimum Redundancy and Maximum Relevance (mRMR)
○ Fast Correlation based Feature Selection (FCBF)
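As a sketch of one univariate criterion from the list, here is feature ranking by absolute Pearson correlation (each feature is scored on its own against the target, ignoring interactions between features, which is exactly what multivariate methods like mRMR try to account for):

```python
import numpy as np

def pearson_scores(X, y):
    """Univariate criterion: score each feature independently by the
    absolute Pearson correlation with the target."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    num = Xc.T @ yc
    den = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    return np.abs(num / den)

# Toy data: only feature 1 is related to the target.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 4))
y = 3.0 * X[:, 1] + rng.normal(scale=0.1, size=500)
scores = pearson_scores(X, y)
print(scores.argmax())   # → 1
```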
9. Feature extraction
● It doesn’t select a subset of the existing features
● It creates a new feature set
● The created features are uncorrelated among themselves
● Types of feature extractions
○ PCA (Principal Component Analysis)
○ LDA (Linear Discriminant Analysis)
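A minimal PCA sketch (via eigendecomposition of the covariance matrix, one standard formulation) that also demonstrates the claim above: the extracted features are uncorrelated among themselves.

```python
import numpy as np

def pca(X, k):
    """PCA: project centered data onto the k eigenvectors of the
    covariance matrix with the largest eigenvalues."""
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)        # ascending eigenvalues
    top = eigvecs[:, np.argsort(eigvals)[::-1][:k]]
    return Xc @ top

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 5))
Z = pca(X, k=2)

# The new features are uncorrelated: off-diagonal covariance ≈ 0.
C = np.cov(Z, rowvar=False)
print(np.round(C, 6))
```

LDA works similarly but is supervised: it picks directions that separate the classes rather than directions of maximum variance.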
16. Eigenvectors and eigenvalues
● Each principal component is an eigenvector (of the covariance matrix)
● When we use SVD, every eigenvector comes with an eigenvalue
● Discard eigenvectors whose eigenvalue is (near) zero
● The remaining eigenvectors give projection values which can later be used to reduce the dimensions of other inputs
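The SVD workflow above can be sketched as follows (my illustration, assuming NumPy's `svd`): singular vectors with (near-)zero singular values carry no variance and are dropped, and the surviving vectors form a projection matrix that can be reused on new inputs.

```python
import numpy as np

# Construct data that truly lives in a 2-dimensional subspace of R^5.
rng = np.random.default_rng(3)
basis = rng.normal(size=(2, 5))
X = rng.normal(size=(100, 2)) @ basis

mean = X.mean(axis=0)
U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
keep = s > 1e-10 * s.max()        # discard (near-)zero singular values
print(keep.sum())                  # → 2 surviving directions

P = Vt[keep]                       # projection matrix, shape (k, 5)
x_new = rng.normal(size=(1, 5))
z = (x_new - mean) @ P.T           # reduce a new input to k dimensions
print(z.shape)                     # → (1, 2)
```

Keeping `mean` and `P` is all that is needed to project future inputs into the reduced space.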