Application of Fisher Linear Discriminant Analysis
to Speech/Music Classification
Enrique Alexandre, Manuel Rosa, Lucas Cuadra, and Roberto Gil-Pita
Departamento de Teoría de la Señal y Comunicaciones
Universidad de Alcalá. 28805 - Alcalá de Henares,
Madrid, Spain
Presented By:
S. Lushanthan
Agenda
• Objective
• Time-Frequency Decomposition
• Feature Extraction
• Classification Algorithms
• Data Collection
• Results and Discussion
Objective
• The well-known K-nearest-neighbor (K-NN) algorithm has been widely used in many sound classification applications. The objective here is to
“Demonstrate the superior behavior of the Fisher's Linear Discriminant algorithm compared to the K-Nearest-Neighbor algorithm”
Why Speech/Music Classification?
• The Fisher LDA classifier has rarely been tried in the domain of speech/audio classification
• If it succeeds, it could serve as a first step in many music-genre classification systems
Signal Processing – Time-Frequency Decomposition
Feature Extraction
• According to the literature, features can be grouped into 3 classes:
1. Timbre-related
2. Rhythm-related
3. Pitch-related
• For simplicity, only timbre-related features are used
• A 512-sample window is used, with no overlap between adjacent frames
• The time-frequency decomposition is performed using either a Modified Discrete Cosine Transform (MDCT) or a Discrete Fourier Transform (DFT)
• All the features are calculated, and their mean and standard deviation are computed every 43 frames (1.85 seconds at our sampling rate). Thus a 2-dimensional vector, containing the mean and standard deviation, is obtained every 43 frames
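The framing and aggregation steps above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, the input is synthetic noise, and per-frame mean energy is used only as a stand-in for whichever feature is being summarized.

```python
import numpy as np

FRAME_LEN = 512      # samples per frame, no overlap (from the slides)
BLOCK_FRAMES = 43    # frames per statistics block (from the slides)

def split_into_frames(signal, frame_len=FRAME_LEN):
    """Cut a 1-D signal into non-overlapping frames, dropping the incomplete tail."""
    signal = np.asarray(signal, dtype=float)
    n_frames = len(signal) // frame_len
    return signal[: n_frames * frame_len].reshape(n_frames, frame_len)

def block_statistics(per_frame_values, block_frames=BLOCK_FRAMES):
    """Return an (n_blocks, 2) array holding [mean, std] of the feature per block."""
    values = np.asarray(per_frame_values, dtype=float)
    n_blocks = len(values) // block_frames
    blocks = values[: n_blocks * block_frames].reshape(n_blocks, block_frames)
    return np.column_stack([blocks.mean(axis=1), blocks.std(axis=1)])

# Hypothetical input: three blocks' worth of noise plus a partial tail.
rng = np.random.default_rng(0)
signal = rng.standard_normal(FRAME_LEN * BLOCK_FRAMES * 3 + 100)

frames = split_into_frames(signal)
energy = (frames ** 2).mean(axis=1)   # stand-in per-frame feature
vectors = block_statistics(energy)    # one 2-D vector per 43-frame block
print(frames.shape, vectors.shape)
```

Each row of `vectors` is one 2-dimensional observation of the kind that would be passed to a classifier.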
Feature                                      Description
Spectral Centroid                            Measure of the brightness of a sound
Spectral Roll-off                            Shape of the spectrum
Zero Crossing Rate (ZCR)                     How noisy a signal is
High Zero Crossing Rate Ratio (HZCRR)        Ratio of frames whose ZCR is above 1.5 times the mean ZCR
Short-Time Energy (STE)                      Mean energy of the signal within each analysis frame
Low Short-Time Energy Ratio (LSTER)          Ratio of frames whose STE is below 0.5 times the mean STE
Mel-Frequency Cepstral Coefficients (MFCC)   Provide a compact representation of the spectral envelope
Voice2White                                  Measure of the energy inside the typical speech band (300-4000 Hz) with respect to the whole energy of the signal
Activity Level                               Calculated using a method for the objective measurement of active speech
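A few of the features in the table can be sketched in NumPy as shown below. The exact equations from the slides did not survive, so these follow common textbook definitions and should be read as assumptions: ZCR as the fraction of adjacent samples whose signs differ, the centroid as the magnitude-weighted mean DFT bin index, and the two ratios using the 1.5 and 0.5 thresholds from the table. All function names are hypothetical.

```python
import numpy as np

def zero_crossing_rate(frames):
    """Fraction of adjacent sample pairs in each frame whose signs differ."""
    signs = np.signbit(frames)
    return (signs[:, 1:] != signs[:, :-1]).mean(axis=1)

def short_time_energy(frames):
    """Mean squared amplitude of each frame."""
    return (frames ** 2).mean(axis=1)

def spectral_centroid(frames):
    """Magnitude-weighted mean DFT bin index of each frame (assumed definition)."""
    mag = np.abs(np.fft.rfft(frames, axis=1))
    bins = np.arange(mag.shape[1])
    total = mag.sum(axis=1)
    total[total == 0] = 1.0          # avoid dividing by zero on silent frames
    return (mag * bins).sum(axis=1) / total

def high_zcr_ratio(zcr):
    """Fraction of frames whose ZCR exceeds 1.5 times the mean ZCR."""
    zcr = np.asarray(zcr, dtype=float)
    return float(np.mean(zcr > 1.5 * zcr.mean()))

def low_ste_ratio(ste):
    """Fraction of frames whose STE falls below 0.5 times the mean STE."""
    ste = np.asarray(ste, dtype=float)
    return float(np.mean(ste < 0.5 * ste.mean()))

# Hypothetical test frame: a sinusoid sitting exactly on DFT bin 8.
n = np.arange(512)
tone = np.sin(2 * np.pi * 8 * n / 512)[None, :]
print(zero_crossing_rate(tone), short_time_energy(tone), spectral_centroid(tone))
```

For the single-bin tone the centroid lands on bin 8 and the energy is close to 0.5, which is a quick sanity check on the definitions.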
Classification Algorithms
K-Nearest-Neighbor
• Classification rule
Assume that we have a training set with L vectors grouped into C different classes. To obtain the class of a new observed vector X, the algorithm simply looks for the K nearest neighbors of X in the training set and weighs the classes they belong to, usually using a majority rule.
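The classification rule above can be sketched as follows. The slides do not name a distance metric, so Euclidean distance is an assumption here, and the function name and toy data are hypothetical.

```python
from collections import Counter

import numpy as np

def knn_classify(train_x, train_y, x, k=3):
    """Label x by majority vote among its k nearest training vectors."""
    train_x = np.asarray(train_x, dtype=float)
    # Euclidean distance from x to every training vector (assumed metric).
    dists = np.linalg.norm(train_x - np.asarray(x, dtype=float), axis=1)
    nearest = np.argsort(dists, kind="stable")[:k]
    votes = Counter(train_y[i] for i in nearest)
    # On a tie, the label seen first (closest neighbor) wins.
    return votes.most_common(1)[0][0]

# Hypothetical 2-D training vectors, e.g. [mean, std] of one feature.
train_x = [[0.0, 0.1], [0.1, 0.0], [0.2, 0.2],
           [1.0, 1.1], [1.1, 0.9], [0.9, 1.0]]
train_y = ["speech"] * 3 + ["music"] * 3

print(knn_classify(train_x, train_y, [0.15, 0.1], k=3))
print(knn_classify(train_x, train_y, [1.0, 1.0], k=1))
```

With K = 1 the rule reduces to copying the label of the single closest training vector, which is the 1-NN case reported in the results.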
Fisher LDA
• Data are projected onto a line, and the classification is performed in this one-dimensional space
• The class separability function in a direction w ∈ R^n is defined as:
J(w) = \frac{w^T S_B w}{w^T S_W w}
• Find an analytic expression for the w that maximizes J(w); for two classes with means m_1 and m_2:
w \propto S_W^{-1} (m_1 - m_2)
S_B and S_W are the between-class and within-class scatter matrices, respectively
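A minimal two-class sketch of the Fisher discriminant is given below. It uses the standard closed form, in which the direction maximizing the ratio of between-class to within-class scatter is proportional to the inverse within-class scatter matrix applied to the difference of class means. The midpoint threshold, the function names, and the synthetic data are assumptions for illustration; the slides do not say how the decision threshold was chosen.

```python
import numpy as np

def scatter_matrices(x0, x1):
    """Return (S_B, S_W) for two classes given as (n_samples, n_features) arrays."""
    m0, m1 = x0.mean(axis=0), x1.mean(axis=0)
    s_w = (x0 - m0).T @ (x0 - m0) + (x1 - m1).T @ (x1 - m1)
    s_b = np.outer(m1 - m0, m1 - m0)
    return s_b, s_w

def fisher_direction(x0, x1):
    """Unit vector proportional to S_W^{-1} (m1 - m0)."""
    _, s_w = scatter_matrices(x0, x1)
    w = np.linalg.solve(s_w, x1.mean(axis=0) - x0.mean(axis=0))
    return w / np.linalg.norm(w)

def fisher_criterion(w, x0, x1):
    """Class separability J(w) = (w^T S_B w) / (w^T S_W w)."""
    s_b, s_w = scatter_matrices(x0, x1)
    return (w @ s_b @ w) / (w @ s_w @ w)

def classify(w, threshold, x):
    """Return 1 if the projection of x onto w exceeds the threshold, else 0."""
    return (np.asarray(x) @ w > threshold).astype(int)

# Hypothetical data: classes differ mostly along the low-variance second axis.
rng = np.random.default_rng(0)
x0 = rng.normal([0.0, 0.0], [3.0, 0.3], size=(200, 2))
x1 = rng.normal([1.0, 1.0], [3.0, 0.3], size=(200, 2))

w = fisher_direction(x0, x1)
# Assumed decision rule: midpoint between the projected class means.
threshold = 0.5 * ((x0 @ w).mean() + (x1 @ w).mean())
print(w, fisher_criterion(w, x0, x1), classify(w, threshold, x1[:5]))
```

Projecting onto the plain mean-difference direction instead would give a lower value of the criterion whenever the within-class scatter is not isotropic, which is what the comparison in a quick experiment would show.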
Data Collection
• “Music-Speech” corpus for speech/music classification, provided by Dan Ellis and originally recorded by Eric Scheirer during his internship at Interval Research Corporation
• Training data (45 minutes in total, 15 seconds per file):
  - music (60 files)
  - speech (60 files)
  - music + speech (60 files)
• Test data (15.25 minutes in total, 2.5 seconds per file):
  - speech without background music (120 files)
  - music with no vocals (126 files)
  - music with vocals (120 files)
Results and Discussion
• Probability of error for Fisher LDA, 1-NN, and 3-NN, using each feature individually

Feature            Fisher     1-NN     3-NN
Centroid (MDCT)     8.74%   17.48%   21.85%
Centroid (DFT)     16.66%   29.23%   30.60%
Roll-off (MDCT)    14.48%   25.40%   21.85%
Roll-off (DFT)      8.19%   13.11%   13.11%
ZCR                 9.83%   19.67%   18.03%
HZCRR              25.13%   39.89%   36.33%
STE                48.63%   22.40%   22.67%
LSTER              11.74%   33.87%   23.77%
MFCC                4.09%   22.13%   26.50%
Voice2White         4.91%    6.28%    6.01%
Activity level     12.84%   18.03%   18.85%
Combining two or more of these features does not seem to improve the results.
For example, using the MFCC and Voice2White features together with a Fisher linear discriminant classifier leads to a probability of error of 4.09%, the same as with MFCC alone.
Confusion matrices using the Voice2White feature

Classifier   Input    Speech   Music
Fisher       Speech      104      16
             Music         2     244
1-NN         Speech      114       6
             Music        17     229
3-NN         Speech      116       4
             Music        18     228
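The overall and per-class error probabilities can be recomputed from these counts, as in the short sketch below. It assumes that rows are the actual input class and columns the assigned class, which is consistent with the row totals of 120 speech and 246 music test files; the function name is hypothetical.

```python
import numpy as np

# Rows: actual class (speech, music); columns: assigned class (speech, music).
confusions = {
    "Fisher": np.array([[104, 16], [2, 244]]),
    "1-NN": np.array([[114, 6], [17, 229]]),
    "3-NN": np.array([[116, 4], [18, 228]]),
}

def error_rates(cm):
    """Return (overall error, per-class error) for a square confusion matrix."""
    total_error = 1.0 - np.trace(cm) / cm.sum()
    per_class_error = 1.0 - np.diag(cm) / cm.sum(axis=1)
    return total_error, per_class_error

for name, cm in confusions.items():
    total, per_class = error_rates(cm)
    print(f"{name}: total {total:.2%}, "
          f"speech {per_class[0]:.2%}, music {per_class[1]:.2%}")
```

The overall figures agree with the Voice2White row of the results table up to rounding in the last digit, and the per-class figures make the asymmetry between the classifiers explicit.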
Fisher LDA has a higher probability of error when the input is speech.
K-NN has a higher probability of error when the input is music.
So why not combine the classifiers using a majority rule for better results?
The probability of error drops to 4.5%
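A majority-rule combination can be sketched as below. The slides do not state which classifiers were combined or how ties were handled, so this assumes an odd number of binary classifiers (for example Fisher, 1-NN, and 3-NN) with labels 0 for speech and 1 for music; the decisions shown are made up for illustration.

```python
import numpy as np

def majority_vote(predictions):
    """Combine binary decisions of shape (n_classifiers, n_samples) by majority."""
    predictions = np.asarray(predictions)
    votes_for_one = predictions.sum(axis=0)
    # Label 1 wins when more than half of the classifiers chose it.
    return (2 * votes_for_one > predictions.shape[0]).astype(int)

# Hypothetical decisions for five test segments (0 = speech, 1 = music).
fisher = [0, 1, 1, 0, 1]
knn_1 = [0, 0, 1, 1, 1]
knn_3 = [1, 0, 1, 0, 1]

combined = majority_vote([fisher, knn_1, knn_3])
print(combined)
```

The combination helps only where the classifiers make different mistakes, which is the situation the confusion matrices suggest for Fisher LDA and K-NN.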
Conclusion
• Fisher linear discriminant analysis can provide very promising results using only one feature for the classification
• Better results may be obtained by combining the outputs of two or more classifiers
Thank you!

More Related Content

What's hot

NICE Implementations of Variational Inference
NICE Implementations of Variational Inference NICE Implementations of Variational Inference
NICE Implementations of Variational Inference Natan Katz
 
Density Based Clustering
Density Based ClusteringDensity Based Clustering
Density Based ClusteringSSA KPI
 
CVPR2010: Advanced ITinCVPR in a Nutshell: part 6: Mixtures
CVPR2010: Advanced ITinCVPR in a Nutshell: part 6: MixturesCVPR2010: Advanced ITinCVPR in a Nutshell: part 6: Mixtures
CVPR2010: Advanced ITinCVPR in a Nutshell: part 6: Mixtureszukun
 
Instance Based Learning in Machine Learning
Instance Based Learning in Machine LearningInstance Based Learning in Machine Learning
Instance Based Learning in Machine LearningPavithra Thippanaik
 
Optics ordering points to identify the clustering structure
Optics ordering points to identify the clustering structureOptics ordering points to identify the clustering structure
Optics ordering points to identify the clustering structureRajesh Piryani
 
Interactive Latent Dirichlet Allocation
Interactive Latent Dirichlet AllocationInteractive Latent Dirichlet Allocation
Interactive Latent Dirichlet AllocationQuentin Pleplé
 
Error Estimates for Multi-Penalty Regularization under General Source Condition
Error Estimates for Multi-Penalty Regularization under General Source ConditionError Estimates for Multi-Penalty Regularization under General Source Condition
Error Estimates for Multi-Penalty Regularization under General Source Conditioncsandit
 
Spectral Clustering Report
Spectral Clustering ReportSpectral Clustering Report
Spectral Clustering ReportMiaolan Xie
 
Coordinate sampler : A non-reversible Gibbs-like sampler
Coordinate sampler : A non-reversible Gibbs-like samplerCoordinate sampler : A non-reversible Gibbs-like sampler
Coordinate sampler : A non-reversible Gibbs-like samplerChristian Robert
 
Admission in india 2015
Admission in india 2015Admission in india 2015
Admission in india 2015Edhole.com
 
Refining Bayesian Data Analysis Methods for Use with Longer Waveforms
Refining Bayesian Data Analysis Methods for Use with Longer WaveformsRefining Bayesian Data Analysis Methods for Use with Longer Waveforms
Refining Bayesian Data Analysis Methods for Use with Longer WaveformsJames Bell
 
Smooth entropies a tutorial
Smooth entropies a tutorialSmooth entropies a tutorial
Smooth entropies a tutorialwtyru1989
 
icml2004 tutorial on spectral clustering part II
icml2004 tutorial on spectral clustering part IIicml2004 tutorial on spectral clustering part II
icml2004 tutorial on spectral clustering part IIzukun
 
icml2004 tutorial on spectral clustering part I
icml2004 tutorial on spectral clustering part Iicml2004 tutorial on spectral clustering part I
icml2004 tutorial on spectral clustering part Izukun
 
Probabilistic information retrieval models & systems
Probabilistic information retrieval models & systemsProbabilistic information retrieval models & systems
Probabilistic information retrieval models & systemsSelman Bozkır
 
presentation
presentationpresentation
presentationjie ren
 

What's hot (20)

NICE Implementations of Variational Inference
NICE Implementations of Variational Inference NICE Implementations of Variational Inference
NICE Implementations of Variational Inference
 
Density Based Clustering
Density Based ClusteringDensity Based Clustering
Density Based Clustering
 
CVPR2010: Advanced ITinCVPR in a Nutshell: part 6: Mixtures
CVPR2010: Advanced ITinCVPR in a Nutshell: part 6: MixturesCVPR2010: Advanced ITinCVPR in a Nutshell: part 6: Mixtures
CVPR2010: Advanced ITinCVPR in a Nutshell: part 6: Mixtures
 
Instance Based Learning in Machine Learning
Instance Based Learning in Machine LearningInstance Based Learning in Machine Learning
Instance Based Learning in Machine Learning
 
Optics ordering points to identify the clustering structure
Optics ordering points to identify the clustering structureOptics ordering points to identify the clustering structure
Optics ordering points to identify the clustering structure
 
Interactive Latent Dirichlet Allocation
Interactive Latent Dirichlet AllocationInteractive Latent Dirichlet Allocation
Interactive Latent Dirichlet Allocation
 
Error Estimates for Multi-Penalty Regularization under General Source Condition
Error Estimates for Multi-Penalty Regularization under General Source ConditionError Estimates for Multi-Penalty Regularization under General Source Condition
Error Estimates for Multi-Penalty Regularization under General Source Condition
 
Db Scan
Db ScanDb Scan
Db Scan
 
Spectral Clustering Report
Spectral Clustering ReportSpectral Clustering Report
Spectral Clustering Report
 
Coordinate sampler : A non-reversible Gibbs-like sampler
Coordinate sampler : A non-reversible Gibbs-like samplerCoordinate sampler : A non-reversible Gibbs-like sampler
Coordinate sampler : A non-reversible Gibbs-like sampler
 
Admission in india 2015
Admission in india 2015Admission in india 2015
Admission in india 2015
 
Refining Bayesian Data Analysis Methods for Use with Longer Waveforms
Refining Bayesian Data Analysis Methods for Use with Longer WaveformsRefining Bayesian Data Analysis Methods for Use with Longer Waveforms
Refining Bayesian Data Analysis Methods for Use with Longer Waveforms
 
Smooth entropies a tutorial
Smooth entropies a tutorialSmooth entropies a tutorial
Smooth entropies a tutorial
 
Combinatorial Optimization
Combinatorial OptimizationCombinatorial Optimization
Combinatorial Optimization
 
icml2004 tutorial on spectral clustering part II
icml2004 tutorial on spectral clustering part IIicml2004 tutorial on spectral clustering part II
icml2004 tutorial on spectral clustering part II
 
icml2004 tutorial on spectral clustering part I
icml2004 tutorial on spectral clustering part Iicml2004 tutorial on spectral clustering part I
icml2004 tutorial on spectral clustering part I
 
Dft
DftDft
Dft
 
Probabilistic information retrieval models & systems
Probabilistic information retrieval models & systemsProbabilistic information retrieval models & systems
Probabilistic information retrieval models & systems
 
presentation
presentationpresentation
presentation
 
main
mainmain
main
 

Viewers also liked

Introduction to Functional Data Analysis
Introduction to Functional Data AnalysisIntroduction to Functional Data Analysis
Introduction to Functional Data AnalysisRené Franck Essomba
 
face recognition system
face recognition systemface recognition system
face recognition systemAnil Kumar
 
Morphological Analyzer and Generator for Tamil Language
Morphological Analyzer and Generator for Tamil LanguageMorphological Analyzer and Generator for Tamil Language
Morphological Analyzer and Generator for Tamil LanguageLushanthan Sivaneasharajah
 
LDA presentation
LDA presentationLDA presentation
LDA presentationMohit Gupta
 
4 new-patch-agggregation.pptx
4 new-patch-agggregation.pptx4 new-patch-agggregation.pptx
4 new-patch-agggregation.pptxmustafa sarac
 
Isabelle Guyon, President, ChaLearn at MLconf SF - 11/13/15
Isabelle Guyon, President, ChaLearn at MLconf SF - 11/13/15Isabelle Guyon, President, ChaLearn at MLconf SF - 11/13/15
Isabelle Guyon, President, ChaLearn at MLconf SF - 11/13/15MLconf
 
Kernel fisher discriminant
Kernel fisher discriminantKernel fisher discriminant
Kernel fisher discriminantĐỗ Hợp
 
Lec-08 Feature Aggregation II: Fisher Vector, AKULA and Super Vector
Lec-08 Feature Aggregation II: Fisher Vector, AKULA and Super VectorLec-08 Feature Aggregation II: Fisher Vector, AKULA and Super Vector
Lec-08 Feature Aggregation II: Fisher Vector, AKULA and Super VectorUnited States Air Force Academy
 
Face recognition using laplacian faces
Face recognition using laplacian facesFace recognition using laplacian faces
Face recognition using laplacian facesPulkiŧ Sharma
 
Discriminant analysis
Discriminant analysisDiscriminant analysis
Discriminant analysisBhasker Rajan
 
T18 discriminant analysis
T18 discriminant analysisT18 discriminant analysis
T18 discriminant analysiskompellark
 
discriminant analysis
discriminant analysisdiscriminant analysis
discriminant analysiskrishnadk
 
Discriminant analysis
Discriminant analysisDiscriminant analysis
Discriminant analysisMurali Raj
 

Viewers also liked (20)

Introduction to Functional Data Analysis
Introduction to Functional Data AnalysisIntroduction to Functional Data Analysis
Introduction to Functional Data Analysis
 
face recognition system
face recognition systemface recognition system
face recognition system
 
Morphological Analyzer and Generator for Tamil Language
Morphological Analyzer and Generator for Tamil LanguageMorphological Analyzer and Generator for Tamil Language
Morphological Analyzer and Generator for Tamil Language
 
LDA presentation
LDA presentationLDA presentation
LDA presentation
 
4 new-patch-agggregation.pptx
4 new-patch-agggregation.pptx4 new-patch-agggregation.pptx
4 new-patch-agggregation.pptx
 
Isabelle Guyon, President, ChaLearn at MLconf SF - 11/13/15
Isabelle Guyon, President, ChaLearn at MLconf SF - 11/13/15Isabelle Guyon, President, ChaLearn at MLconf SF - 11/13/15
Isabelle Guyon, President, ChaLearn at MLconf SF - 11/13/15
 
Kernel fisher discriminant
Kernel fisher discriminantKernel fisher discriminant
Kernel fisher discriminant
 
Lec-08 Feature Aggregation II: Fisher Vector, AKULA and Super Vector
Lec-08 Feature Aggregation II: Fisher Vector, AKULA and Super VectorLec-08 Feature Aggregation II: Fisher Vector, AKULA and Super Vector
Lec-08 Feature Aggregation II: Fisher Vector, AKULA and Super Vector
 
Microsoft Web Technology Stack
Microsoft Web Technology StackMicrosoft Web Technology Stack
Microsoft Web Technology Stack
 
Face recognition using LDA
Face recognition using LDAFace recognition using LDA
Face recognition using LDA
 
Face recognition using laplacian faces
Face recognition using laplacian facesFace recognition using laplacian faces
Face recognition using laplacian faces
 
Discriminant analysis
Discriminant analysisDiscriminant analysis
Discriminant analysis
 
LDA
LDALDA
LDA
 
PCA vs LDA
PCA vs LDAPCA vs LDA
PCA vs LDA
 
T18 discriminant analysis
T18 discriminant analysisT18 discriminant analysis
T18 discriminant analysis
 
Understandig PCA and LDA
Understandig PCA and LDAUnderstandig PCA and LDA
Understandig PCA and LDA
 
Lda
LdaLda
Lda
 
discriminant analysis
discriminant analysisdiscriminant analysis
discriminant analysis
 
Discriminant analysis
Discriminant analysisDiscriminant analysis
Discriminant analysis
 
Discriminant analysis
Discriminant analysisDiscriminant analysis
Discriminant analysis
 

Similar to Application of Fisher LDA to Classify Speech and Music

sound level meter octave band ananlyser.pptx
sound level meter octave band ananlyser.pptxsound level meter octave band ananlyser.pptx
sound level meter octave band ananlyser.pptxpriyankatabhane
 
A Combined Sub-Band And Reconstructed Phase Space Approach To Phoneme Classif...
A Combined Sub-Band And Reconstructed Phase Space Approach To Phoneme Classif...A Combined Sub-Band And Reconstructed Phase Space Approach To Phoneme Classif...
A Combined Sub-Band And Reconstructed Phase Space Approach To Phoneme Classif...April Smith
 
129966863723746268[1]
129966863723746268[1]129966863723746268[1]
129966863723746268[1]威華 王
 
Acoustic fMRI noise reduction: a perceived loudness approach
Acoustic fMRI noise reduction: a perceived loudness approachAcoustic fMRI noise reduction: a perceived loudness approach
Acoustic fMRI noise reduction: a perceived loudness approachDimitri Vrehen
 
A Novel Method for Silence Removal in Sounds Produced by Percussive Instruments
A Novel Method for Silence Removal in Sounds Produced by Percussive InstrumentsA Novel Method for Silence Removal in Sounds Produced by Percussive Instruments
A Novel Method for Silence Removal in Sounds Produced by Percussive InstrumentsIJMTST Journal
 
A Noise Reduction Method Based on Modified Least Mean Square Algorithm of Rea...
A Noise Reduction Method Based on Modified Least Mean Square Algorithm of Rea...A Noise Reduction Method Based on Modified Least Mean Square Algorithm of Rea...
A Noise Reduction Method Based on Modified Least Mean Square Algorithm of Rea...IRJET Journal
 
HUFFMAN CODING ALGORITHM BASED ADAPTIVE NOISE CANCELLATION
HUFFMAN CODING ALGORITHM BASED ADAPTIVE NOISE CANCELLATIONHUFFMAN CODING ALGORITHM BASED ADAPTIVE NOISE CANCELLATION
HUFFMAN CODING ALGORITHM BASED ADAPTIVE NOISE CANCELLATIONIRJET Journal
 
Analysis of PEAQ Model using Wavelet Decomposition Techniques
Analysis of PEAQ Model using Wavelet Decomposition TechniquesAnalysis of PEAQ Model using Wavelet Decomposition Techniques
Analysis of PEAQ Model using Wavelet Decomposition Techniquesidescitation
 
An efficient peak valley detection based vad algorithm for robust detection o...
An efficient peak valley detection based vad algorithm for robust detection o...An efficient peak valley detection based vad algorithm for robust detection o...
An efficient peak valley detection based vad algorithm for robust detection o...csandit
 
AN EFFICIENT PEAK VALLEY DETECTION BASED VAD ALGORITHM FOR ROBUST DETECTION O...
AN EFFICIENT PEAK VALLEY DETECTION BASED VAD ALGORITHM FOR ROBUST DETECTION O...AN EFFICIENT PEAK VALLEY DETECTION BASED VAD ALGORITHM FOR ROBUST DETECTION O...
AN EFFICIENT PEAK VALLEY DETECTION BASED VAD ALGORITHM FOR ROBUST DETECTION O...cscpconf
 
An efficient peak valley detection based vad algorithm for robust detection o...
An efficient peak valley detection based vad algorithm for robust detection o...An efficient peak valley detection based vad algorithm for robust detection o...
An efficient peak valley detection based vad algorithm for robust detection o...csandit
 
A Comparative Study: Gammachirp Wavelets and Auditory Filter Using Prosodic F...
A Comparative Study: Gammachirp Wavelets and Auditory Filter Using Prosodic F...A Comparative Study: Gammachirp Wavelets and Auditory Filter Using Prosodic F...
A Comparative Study: Gammachirp Wavelets and Auditory Filter Using Prosodic F...CSCJournals
 
Graphical visualization of musical emotions
Graphical visualization of musical emotionsGraphical visualization of musical emotions
Graphical visualization of musical emotionsPranay Prasoon
 
129966864160453838[1]
129966864160453838[1]129966864160453838[1]
129966864160453838[1]威華 王
 

Similar to Application of Fisher LDA to Classify Speech and Music (20)

sound level meter octave band ananlyser.pptx
sound level meter octave band ananlyser.pptxsound level meter octave band ananlyser.pptx
sound level meter octave band ananlyser.pptx
 
A Combined Sub-Band And Reconstructed Phase Space Approach To Phoneme Classif...
A Combined Sub-Band And Reconstructed Phase Space Approach To Phoneme Classif...A Combined Sub-Band And Reconstructed Phase Space Approach To Phoneme Classif...
A Combined Sub-Band And Reconstructed Phase Space Approach To Phoneme Classif...
 
F010334548
F010334548F010334548
F010334548
 
129966863723746268[1]
129966863723746268[1]129966863723746268[1]
129966863723746268[1]
 
Acoustic fMRI noise reduction: a perceived loudness approach
Acoustic fMRI noise reduction: a perceived loudness approachAcoustic fMRI noise reduction: a perceived loudness approach
Acoustic fMRI noise reduction: a perceived loudness approach
 
A Novel Method for Silence Removal in Sounds Produced by Percussive Instruments
A Novel Method for Silence Removal in Sounds Produced by Percussive InstrumentsA Novel Method for Silence Removal in Sounds Produced by Percussive Instruments
A Novel Method for Silence Removal in Sounds Produced by Percussive Instruments
 
Speech Signal Processing
Speech Signal ProcessingSpeech Signal Processing
Speech Signal Processing
 
A Noise Reduction Method Based on Modified Least Mean Square Algorithm of Rea...
A Noise Reduction Method Based on Modified Least Mean Square Algorithm of Rea...A Noise Reduction Method Based on Modified Least Mean Square Algorithm of Rea...
A Noise Reduction Method Based on Modified Least Mean Square Algorithm of Rea...
 
example based audio editing
example based audio editingexample based audio editing
example based audio editing
 
T26123129
T26123129T26123129
T26123129
 
HUFFMAN CODING ALGORITHM BASED ADAPTIVE NOISE CANCELLATION
HUFFMAN CODING ALGORITHM BASED ADAPTIVE NOISE CANCELLATIONHUFFMAN CODING ALGORITHM BASED ADAPTIVE NOISE CANCELLATION
HUFFMAN CODING ALGORITHM BASED ADAPTIVE NOISE CANCELLATION
 
Final presentation
Final presentationFinal presentation
Final presentation
 
Analysis of PEAQ Model using Wavelet Decomposition Techniques
Analysis of PEAQ Model using Wavelet Decomposition TechniquesAnalysis of PEAQ Model using Wavelet Decomposition Techniques
Analysis of PEAQ Model using Wavelet Decomposition Techniques
 
An efficient peak valley detection based vad algorithm for robust detection o...
An efficient peak valley detection based vad algorithm for robust detection o...An efficient peak valley detection based vad algorithm for robust detection o...
An efficient peak valley detection based vad algorithm for robust detection o...
 
AN EFFICIENT PEAK VALLEY DETECTION BASED VAD ALGORITHM FOR ROBUST DETECTION O...
AN EFFICIENT PEAK VALLEY DETECTION BASED VAD ALGORITHM FOR ROBUST DETECTION O...AN EFFICIENT PEAK VALLEY DETECTION BASED VAD ALGORITHM FOR ROBUST DETECTION O...
AN EFFICIENT PEAK VALLEY DETECTION BASED VAD ALGORITHM FOR ROBUST DETECTION O...
 
An efficient peak valley detection based vad algorithm for robust detection o...
An efficient peak valley detection based vad algorithm for robust detection o...An efficient peak valley detection based vad algorithm for robust detection o...
An efficient peak valley detection based vad algorithm for robust detection o...
 
A Comparative Study: Gammachirp Wavelets and Auditory Filter Using Prosodic F...
A Comparative Study: Gammachirp Wavelets and Auditory Filter Using Prosodic F...A Comparative Study: Gammachirp Wavelets and Auditory Filter Using Prosodic F...
A Comparative Study: Gammachirp Wavelets and Auditory Filter Using Prosodic F...
 
Graphical visualization of musical emotions
Graphical visualization of musical emotionsGraphical visualization of musical emotions
Graphical visualization of musical emotions
 
S@P Noise.pptx
S@P Noise.pptxS@P Noise.pptx
S@P Noise.pptx
 
129966864160453838[1]
129966864160453838[1]129966864160453838[1]
129966864160453838[1]
 

Recently uploaded

Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 

Recently uploaded (20)

Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Data Cloud, More than a CDP by Matt Robison
Application of Fisher LDA to Classify Speech and Music

  • 1. Application of Fisher Linear Discriminant Analysis to Speech/Music Classification. Enrique Alexandre, Manuel Rosa, Lucas Cuadra, and Roberto Gil-Pita, Departamento de Teoría de la Señal y Comunicaciones, Universidad de Alcalá, 28805 Alcalá de Henares, Madrid, Spain. Presented by: S. Lushanthan
  • 2. Agenda
      - Objective
      - Time-Frequency Decomposition
      - Feature Extraction
      - Classification Algorithms
      - Data Collection
      - Results and Discussion
  • 3. Objective
      - The well-known K-Nearest-Neighbor (K-NN) algorithm has been widely used in sound classification applications. The objective here is to “demonstrate the superior behavior of the Fisher's Linear Discriminant algorithm compared to the K-Nearest-Neighbor algorithm”.
    Why Speech/Music Classification?
      - The Fisher LDA classifier has not been tried much in the domain of speech/audio classification.
      - If it succeeds, it could serve as a first step in many music-genre classification systems.
  • 4. Signal Processing – Time frequency decomposition
  • 5. Feature Extraction
      - According to the literature, features can be grouped into three classes: (1) timbre-related, (2) rhythm-related, (3) pitch-related.
      - For simplicity, only timbre-related features are used.
      - A 512-sample window is used, with no overlap between adjacent frames.
      - The time-frequency decomposition is performed using either a Modified Discrete Cosine Transform (MDCT) or a Discrete Fourier Transform (DFT).
      - All the features are calculated per frame, and their mean and standard deviation are computed every 43 frames (1.85 seconds at our sampling rate). Thus a 2-dimensional vector, containing the mean and standard deviation, is obtained every 43 frames.
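The framing and aggregation steps on this slide can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, the signal is synthetic noise, and mean frame energy stands in for whichever per-frame feature is being summarized.

```python
import numpy as np

FRAME_LEN = 512        # window length from the slide, no overlap
FRAMES_PER_BLOCK = 43  # statistics are taken every 43 frames


def frame_signal(x, frame_len=FRAME_LEN):
    """Split a 1-D signal into non-overlapping frames, dropping the tail."""
    n_frames = len(x) // frame_len
    return np.reshape(x[: n_frames * frame_len], (n_frames, frame_len))


def block_stats(per_frame_feature, block=FRAMES_PER_BLOCK):
    """Return one (mean, std) row per block of `block` consecutive frames."""
    n_blocks = len(per_frame_feature) // block
    blocks = np.reshape(per_frame_feature[: n_blocks * block], (n_blocks, block))
    return np.column_stack([blocks.mean(axis=1), blocks.std(axis=1)])


# Usage with a synthetic signal (assumption: any 1-D float array works here).
rng = np.random.default_rng(0)
signal = rng.standard_normal(FRAME_LEN * FRAMES_PER_BLOCK * 3 + 100)
frames = frame_signal(signal)           # shape (n_frames, 512)
energy = np.mean(frames ** 2, axis=1)   # stand-in per-frame feature
vectors = block_stats(energy)           # one 2-D vector per 43 frames
print(frames.shape, vectors.shape)
```

Each row of `vectors` is the 2-dimensional (mean, standard deviation) vector that would be passed to a classifier.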
  • 6. Features and their descriptions
      - Spectral Centroid: measure of the brightness of a sound
      - Spectral Roll-off: shape of the spectrum
      - Zero Crossing Rate (ZCR): how noisy a signal is
      - High Zero Crossing Rate Ratio (HZCRR): ratio of frames whose ZCR is above 1.5 times the mean ZCR
      - Short-Time Energy (STE): mean energy of the signal within each analysis frame
      - Low Short-Time Energy Ratio (LSTER): ratio of frames whose STE is below 0.5 times the mean STE
      - Mel-Frequency Cepstral Coefficients (MFCC): compact representation of the spectral envelope
      - Voice2White: measure of the energy inside the typical speech band (300-4000 Hz) with respect to the whole energy of the signal
      - Activity Level: calculated using a method for the objective measurement of active speech
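A few of these features can be sketched from the verbal descriptions above. The slide's equations are not reproduced in this transcript, so the definitions below are common textbook forms and should be read as assumptions: the centroid is taken as the magnitude-weighted mean DFT bin, and ZCR as the fraction of adjacent samples whose signs differ. All function names are hypothetical.

```python
import numpy as np


def zero_crossing_rate(frames):
    """Fraction of adjacent-sample pairs in each frame whose signs differ."""
    signs = np.signbit(frames)
    return np.mean(signs[:, 1:] != signs[:, :-1], axis=1)


def short_time_energy(frames):
    """Mean energy of the signal within each frame."""
    return np.mean(frames ** 2, axis=1)


def spectral_centroid(frames):
    """Magnitude-weighted mean DFT bin index per frame (assumed definition)."""
    mag = np.abs(np.fft.rfft(frames, axis=1))
    bins = np.arange(mag.shape[1])
    return (mag @ bins) / np.maximum(mag.sum(axis=1), 1e-12)


def high_zcr_ratio(zcr):
    """Ratio of frames whose ZCR exceeds 1.5 times the mean ZCR."""
    return np.mean(zcr > 1.5 * zcr.mean())


def low_ste_ratio(ste):
    """Ratio of frames whose STE is below 0.5 times the mean STE."""
    return np.mean(ste < 0.5 * ste.mean())


# Usage: a single 512-sample frame holding exactly 8 cycles of a sine,
# so its spectral energy sits in DFT bin 8.
n = np.arange(512)
tone = np.sin(2 * np.pi * 8 * n / 512)[None, :]
print(spectral_centroid(tone), short_time_energy(tone))
```

The two ratio features take a whole sequence of per-frame values, so they would be computed once per 43-frame block rather than per frame.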
  • 7. Classification Algorithms: K-Nearest-Neighbor
      - Classification rule: assume a training set of L vectors grouped into C different classes. To obtain the class of a new observed vector X, the algorithm simply finds the K training vectors nearest to X and combines the classes they belong to, usually using a majority rule.
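The classification rule above can be sketched in a few lines. This is an illustration under stated assumptions: the slide does not name the distance measure, so Euclidean distance is assumed, and the function name and toy data are hypothetical.

```python
import numpy as np


def knn_predict(train_x, train_y, x, k=3):
    """Label x by majority vote among its k nearest training vectors."""
    dists = np.linalg.norm(train_x - x, axis=1)  # Euclidean distance (assumed)
    nearest = np.argsort(dists)[:k]
    labels, counts = np.unique(train_y[nearest], return_counts=True)
    # On a tie, argmax returns the smallest label; an odd k avoids ties
    # in the two-class case.
    return labels[np.argmax(counts)]


# Usage on toy 2-D vectors: class 0 near the origin, class 1 near (5, 5).
train_x = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
train_y = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(train_x, train_y, np.array([0.5, 0.5]), k=3))
print(knn_predict(train_x, train_y, np.array([5.5, 5.5]), k=1))
```

Setting `k=1` and `k=3` corresponds to the 1-NN and 3-NN classifiers compared in the results.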
  • 8. Fisher LDA
      - Data are projected onto a line, and the classification is performed in this one-dimensional space.
      - The class separability function in a direction w ∈ R^n is defined as J(w) = (w^T S_B w) / (w^T S_W w), where S_B and S_W are the between-class and within-class scatter matrices, respectively.
      - The goal is to find an analytic expression for the w that maximizes J(w).
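For two classes, the standard closed-form maximizer of the Fisher criterion is w proportional to S_W^{-1}(m_1 - m_0), where m_0 and m_1 are the class means. The sketch below uses that textbook result. The midpoint threshold, the function names and the synthetic Gaussian data are assumptions for illustration, not details taken from the slides.

```python
import numpy as np


def fisher_lda_fit(x0, x1):
    """Return (w, threshold) for a two-class Fisher discriminant."""
    m0, m1 = x0.mean(axis=0), x1.mean(axis=0)
    # Within-class scatter: sum of the per-class scatter matrices.
    s_w = (x0 - m0).T @ (x0 - m0) + (x1 - m1).T @ (x1 - m1)
    # Direction maximizing J(w) for two classes: S_W^{-1} (m1 - m0).
    w = np.linalg.solve(s_w, m1 - m0)
    # Assumed decision rule: midpoint between the projected class means.
    threshold = 0.5 * (w @ m0 + w @ m1)
    return w, threshold


def fisher_lda_predict(w, threshold, x):
    """Project rows of x onto w and label them 1 above the threshold."""
    return (x @ w > threshold).astype(int)


# Usage on synthetic data (two well-separated Gaussian clouds).
rng = np.random.default_rng(0)
x0 = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(200, 2))
x1 = rng.normal(loc=[4.0, 4.0], scale=1.0, size=(200, 2))
w, threshold = fisher_lda_fit(x0, x1)
pred = fisher_lda_predict(w, threshold, np.vstack([x0, x1]))
truth = np.r_[np.zeros(200, dtype=int), np.ones(200, dtype=int)]
accuracy = np.mean(pred == truth)
print(w, threshold, accuracy)
```

Solving the linear system with `np.linalg.solve` avoids forming the inverse of S_W explicitly, which is generally the more stable choice.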
  • 9. Data Collection: corpus for speech/music classification provided by Dan Ellis, originally recorded by Eric Scheirer during his internship at Interval Research Corporation
  • 10. “Music-Speech” Corpus
      Training data (45 minutes in total, 15 seconds per file):
        - music (60 files)
        - speech (60 files)
        - music + speech (60 files)
      Test data (15.25 minutes in total, 2.5 seconds per file):
        - speech without background music (120 files)
        - music with no vocals (126 files)
        - music with vocals (120 files)
  • 11. Results and Discussion
      - Fisher LDA, 1-NN and 3-NN were evaluated on each feature individually.
      - Probability of error:

        Feature            Fisher     1-NN     3-NN
        Centroid (MDCT)     8.74%   17.48%   21.85%
        Centroid (DFT)     16.66%   29.23%   30.60%
        Roll-off (MDCT)    14.48%   25.40%   21.85%
        Roll-off (DFT)      8.19%   13.11%   13.11%
        ZCR                 9.83%   19.67%   18.03%
        HZCRR              25.13%   39.89%   36.33%
        STE                48.63%   22.40%   22.67%
        LSTER              11.74%   33.87%   23.77%
        MFCC                4.09%   22.13%   26.50%
        Voice2White         4.91%    6.28%    6.01%
        Activity level     12.84%   18.03%   18.85%

      - Combining two or more of these features does not seem to improve the results. For example, MFCC together with Voice2White, using a Fisher linear discriminant classifier, leads to a probability of error of 4.09%, the same as MFCC alone.
  • 12. Confusion matrices using the Voice2White feature (rows: input class; columns: assigned class)

        Classifier   Input    Speech   Music
        Fisher       Speech      104      16
                     Music         2     244
        1-NN         Speech      114       6
                     Music        17     229
        3-NN         Speech      116       4
                     Music        18     228

      - Fisher LDA has a high probability of error when the input is speech.
      - K-NN has a high probability of error when the input is music.
      - So why not combine the classifiers using a majority rule for better results? The probability of error drops to 4.5%.
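The error probabilities follow directly from the confusion matrices on this slide, and the majority-rule combination can be sketched in the same way. The counts below are copied from the slide. The slide does not say which classifiers were combined or how, so the `majority_vote` helper is a hypothetical illustration of the rule, not the authors' procedure.

```python
import numpy as np

# Rows: input class (speech, music); columns: assigned class (speech, music).
confusions = {
    "Fisher": np.array([[104, 16], [2, 244]]),
    "1-NN": np.array([[114, 6], [17, 229]]),
    "3-NN": np.array([[116, 4], [18, 228]]),
}


def error_rate(cm):
    """Overall probability of error: off-diagonal count over total count."""
    return 1.0 - np.trace(cm) / cm.sum()


def per_class_error(cm):
    """Probability of error for each input class (one value per row)."""
    return 1.0 - np.diag(cm) / cm.sum(axis=1)


def majority_vote(predictions):
    """Combine 0/1 label arrays from an odd number of classifiers."""
    votes = np.sum(predictions, axis=0)
    return (2 * votes > len(predictions)).astype(int)


for name, cm in confusions.items():
    print(name, error_rate(cm), per_class_error(cm))

# Usage of the vote on made-up labels from three classifiers.
print(majority_vote(np.array([[1, 0, 1], [1, 1, 0], [0, 0, 1]])))
```

The per-class figures make the slide's observation concrete: with this feature, Fisher LDA misclassifies 16 of 120 speech inputs but only 2 of 246 music inputs, while both K-NN variants err more often on music than Fisher does.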
  • 13. Conclusion
      - Fisher linear discriminant analysis can provide very promising results using only one feature for the classification.
      - Better results may be obtained by combining the outputs of two or more classifiers.

Editor's Notes

  1. Discrete Fourier Transform
  2. J(w) equation: Rayleigh quotient