ENVIRONMENTAL NATURAL
SOUND DETECTION AND
CLASSIFICATION USING
CONTENT-BASED RETRIEVAL
(CBR) AND MFCC

Project Mentor: Shiladitya Pujari
Project group members: Parth Sinha (20093043)
Pankaj Kumar (20093013)
Manas Sarkar (20093030)
Ruchasri Nath (20093055)

MAIN TOPICS
• Objective
• Methodology
• Result
• Future scope & conclusion

OBJECTIVE
• To develop an environmental sound detection and classification technique (using
  Content-Based Retrieval and MFCC) so that a computer system can predict and
  understand sound more accurately.
• To make computer systems more intelligent and reliable in understanding their
  environment, based on this technique.

DESCRIPTION OF TERMS
• MFCC
• CBR

WHAT ARE MFCCS?
In sound processing, the Mel-frequency cepstrum (MFC) is a
representation of the short-term power spectrum of a sound, based
on a linear cosine transform of a log power spectrum on a
nonlinear Mel scale of frequency.

Mel-frequency cepstral coefficients (MFCCs) are the coefficients
that collectively make up an MFC. They are derived from a type
of cepstral representation of the audio clip (a nonlinear
"spectrum-of-a-spectrum").

The difference between the cepstrum and the Mel-frequency
cepstrum is that in the MFC, the frequency bands are equally spaced
on the Mel scale, which approximates the human auditory system's
response more closely than the linearly spaced frequency bands used
in the normal cepstrum. This frequency warping can allow for better
representation of sound, for example, in audio compression.

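The slides do not state which Mel-scale formula the project used. As an assumption for illustration, one widely used closed form maps a frequency f in Hz to a value m in mels, with its inverse:

```latex
m = 2595 \,\log_{10}\!\left(1 + \frac{f}{700}\right),
\qquad
f = 700\left(10^{\,m/2595} - 1\right)
```

Under this formula 1000 Hz maps to roughly 1000 mel, and equal steps in m correspond to progressively wider steps in f. That is why bands equally spaced on the Mel scale are narrow at low frequencies and wide at high frequencies.
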
MFCCs are commonly derived as follows:
1. Take the Fourier transform of (a windowed excerpt of) a signal.
2. Map the powers of the spectrum obtained above onto the Mel scale, using
   triangular overlapping windows.
3. Take the logs of the powers at each of the Mel frequencies.
4. Take the discrete cosine transform of the list of Mel log powers,
   as if it were a signal.
5. The MFCCs are the amplitudes of the resulting spectrum.

• MFCCs are commonly used as features in speech recognition systems, such as
  systems that automatically recognize numbers spoken into a telephone. They
  are also common in speaker recognition, which is the task of recognizing
  people from their voices.
• MFCCs are also increasingly finding uses in music information retrieval
  applications such as genre classification and audio similarity measures.

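The five derivation steps listed above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the function names, the Hamming window, the 26 filters, the 13 retained coefficients and the particular Mel formula are all choices made here for the example.

```python
import numpy as np

def hz_to_mel(f_hz):
    # Common Mel-scale formula (assumed variant; the slides do not specify one).
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular, overlapping filters whose edges are equally spaced in mels."""
    mel_edges = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_edges) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        left, centre, right = bins[i], bins[i + 1], bins[i + 2]
        for k in range(left, centre):       # rising edge of the triangle
            fbank[i, k] = (k - left) / (centre - left)
        for k in range(centre, right):      # falling edge of the triangle
            fbank[i, k] = (right - k) / (right - centre)
    return fbank

def dct_ii(x):
    """Unnormalised DCT-II of a 1-D array."""
    n = x.shape[-1]
    idx = np.arange(n)
    basis = np.cos(np.pi / n * (idx[:, None] + 0.5) * idx[None, :])
    return x @ basis

def mfcc(frame, sample_rate, n_filters=26, n_coeffs=13):
    n_fft = len(frame)
    windowed = frame * np.hamming(n_fft)                      # step 1: window the excerpt
    power = np.abs(np.fft.rfft(windowed)) ** 2 / n_fft        # step 1: power spectrum
    mel_energy = mel_filterbank(n_filters, n_fft, sample_rate) @ power  # step 2
    log_energy = np.log(mel_energy + 1e-10)                   # step 3 (offset avoids log(0))
    return dct_ii(log_energy)[:n_coeffs]                      # steps 4 and 5

# Usage: MFCCs of one 512-sample frame of a 440 Hz tone sampled at 16 kHz.
frame = np.sin(2 * np.pi * 440 * np.arange(512) / 16000.0)
coeffs = mfcc(frame, 16000)
```

Keeping only the first few DCT coefficients is a common convention because they summarise the coarse shape of the log Mel spectrum. How many the project retained is not stated in the slides.
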
CBR
• Content-Based Retrieval means that search and retrieval are based on analysis
  of the actual contents of the data (here, sound) rather than on metadata such
  as keywords, tags, and/or descriptions associated with the sounds.
• In our project we will use a multimedia database that provides Content-Based
  Retrieval.

METHODOLOGY (1)
The major steps involved in the method are as follows:
• Extracting features for classifying highly diversified natural sounds.
• Forming clusters according to feature similarity.
• Finding a match for a particular sound query from the cluster.

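The clustering and matching steps above can be sketched as follows. The slides do not name a clustering algorithm or a distance measure, so plain k-means with Euclidean distance is an assumption made here, and `kmeans`, `assign` and `find_match` are hypothetical helper names. Each row of `features` stands for the feature vector of one stored sound.

```python
import numpy as np

def assign(features, centroids):
    """Index of the nearest centroid for every feature vector."""
    dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans(features, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means; random initialisation can give a poor local optimum."""
    rng = np.random.default_rng(seed)
    start = rng.choice(len(features), size=k, replace=False)
    centroids = features[start].astype(float)
    for _ in range(n_iter):
        labels = assign(features, centroids)
        for j in range(k):
            members = features[labels == j]
            if len(members) > 0:            # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids, assign(features, centroids)

def find_match(query, features, centroids, labels):
    """Pick the nearest cluster first, then the nearest stored sound inside it."""
    cluster = int(np.linalg.norm(centroids - query, axis=1).argmin())
    candidates = np.flatnonzero(labels == cluster)
    if candidates.size == 0:                # fall back to searching everything
        candidates = np.arange(len(features))
    dists = np.linalg.norm(features[candidates] - query, axis=1)
    return cluster, int(candidates[dists.argmin()])

# Usage with two synthetic groups of 2-D feature vectors.
rng = np.random.default_rng(1)
features = np.vstack([rng.normal(0.0, 0.5, size=(20, 2)),
                      rng.normal(10.0, 0.5, size=(20, 2))])
centroids, labels = kmeans(features, k=2)
cluster, index = find_match(np.array([10.0, 10.0]), features, centroids, labels)
```

Searching only inside the nearest cluster trades a little accuracy for speed, since a query is compared with one cluster's members instead of the whole database.
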
METHODOLOGY (2)
• First we take the input sound (an audio signal in any format).
• Then some preprocessing is done to normalize the signal.
• Features are extracted from the audio signal.
• Next comes the classification phase, which consists of two phases:
  • Training phase
  • Testing phase

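The slides do not say what the normalization step consists of. As one plausible reading, the sketch below removes the DC offset and scales the signal to a peak amplitude of 1. The function name `normalize` and both operations are assumptions made for illustration.

```python
import numpy as np

def normalize(signal):
    """Remove the mean (DC offset), then scale so the largest |sample| is 1."""
    signal = np.asarray(signal, dtype=float)
    centred = signal - signal.mean()
    peak = np.max(np.abs(centred))
    # A constant (silent) signal has no peak to scale by, so return it as zeros.
    return centred / peak if peak > 0 else centred

# Usage: recordings at different levels end up on the same scale.
quiet = normalize([0.0, 0.04, 0.08])
loud = normalize([0.0, 4.0, 8.0])
```

Putting every recording on the same scale keeps loudness differences between recordings from dominating the features extracted afterwards.
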
METHODOLOGY (3)
Fig: Mel Frequency Cepstral Coefficient pipeline

PROCESS DESCRIPTION
Sampling
Sampling is the process of converting a continuous signal into a discrete signal. Sampling can be
done for signals varying in space, time, or any other dimension, and similar results are obtained
in two or more dimensions.

Pre-emphasis
In the processing of electronic audio signals, pre-emphasis refers to a system process designed to
increase (within a frequency band) the magnitude of some (usually higher) frequencies with respect
to the magnitude of other (usually lower) frequencies, in order to improve the overall
signal-to-noise ratio (SNR) by minimizing adverse effects, such as attenuation distortion, in later
parts of the system.
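In digital MFCC front ends, pre-emphasis is often implemented as a first-order difference filter, y[n] = x[n] - alpha * x[n-1]. The sketch below assumes that form. The coefficient 0.97 is a typical textbook value, not one taken from the slides.

```python
import numpy as np

def pre_emphasis(signal, alpha=0.97):
    """y[n] = x[n] - alpha * x[n-1]; the first sample is passed through unchanged."""
    signal = np.asarray(signal, dtype=float)
    return np.append(signal[0], signal[1:] - alpha * signal[:-1])

# Usage: a constant (zero-frequency) signal is almost cancelled, while a
# signal that alternates every sample (highest frequency) is nearly doubled.
low = pre_emphasis([1.0, 1.0, 1.0, 1.0])
high = pre_emphasis([1.0, -1.0, 1.0, -1.0])
```
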

Windowing
In signal processing, a window function (also known as a tapering function) is a mathematical
function that is zero-valued outside of some chosen interval. For instance, a function that is
constant inside the interval and zero elsewhere is called a rectangular window, which describes the
shape of its graphical representation.
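As a sketch of how windowing is typically applied before the Fourier transform, the code below cuts a signal into overlapping frames and multiplies each frame by a window. The rectangular window is the one described above. The Hamming window is included as a common tapering alternative, which is an assumption since the slides do not say which window the project used. The frame length and hop size are illustrative values.

```python
import numpy as np

def frame_signal(signal, frame_len, hop):
    """Split a 1-D signal (at least frame_len samples long) into overlapping frames."""
    signal = np.asarray(signal, dtype=float)
    n_frames = 1 + (len(signal) - frame_len) // hop
    return np.stack([signal[i * hop : i * hop + frame_len] for i in range(n_frames)])

def apply_window(frames, kind="hamming"):
    frame_len = frames.shape[1]
    if kind == "rectangular":
        window = np.ones(frame_len)        # constant inside the interval
    elif kind == "hamming":
        window = np.hamming(frame_len)     # tapers towards the frame edges
    else:
        raise ValueError(f"unknown window kind: {kind}")
    return frames * window

# Usage: 100 samples, 40-sample frames, 20-sample hop.
frames = frame_signal(np.ones(100), frame_len=40, hop=20)
tapered = apply_window(frames, "hamming")
```

A tapered window reduces the abrupt jump at the frame boundaries that a rectangular window leaves in place, which lowers spectral leakage in the transform that follows.
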

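Fast Fourier Transform
A fast Fourier transform (FFT) is an algorithm that efficiently computes the discrete Fourier
transform (DFT) of a sequence. FFTs are of great importance to a wide variety of applications, from
digital signal processing and solving partial differential equations to algorithms for quick
multiplication of large integers.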

Absolute Value
In mathematics, the absolute value (or modulus) |a| of a real number a is the numerical value of a
without its sign. The absolute value of a number may be thought of as its distance from zero.
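In the MFCC pipeline the absolute value is applied to the complex output of the FFT, where it gives the magnitude of each frequency component. A minimal sketch using NumPy's real-input FFT follows. The function name `magnitude_spectrum` and the test tone are illustrative choices.

```python
import numpy as np

def magnitude_spectrum(frame, sample_rate):
    """Return (frequencies in Hz, magnitudes) for a real-valued frame."""
    spectrum = np.fft.rfft(frame)                        # complex FFT output
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    return freqs, np.abs(spectrum)                       # modulus of each bin

# Usage: a 1000 Hz tone sampled at 8 kHz for 800 samples (10 Hz per bin).
t = np.arange(800) / 8000.0
freqs, mags = magnitude_spectrum(np.sin(2 * np.pi * 1000 * t), 8000)
peak_hz = freqs[mags.argmax()]
```
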
PROCESS DESCRIPTION (CONTINUED)
Discrete cosine transform (DCT)
A DCT is a Fourier-related transform similar to the discrete Fourier transform (DFT), but using
only real numbers. DCTs are equivalent to DFTs of roughly twice the length, operating on real data
with even symmetry (since the Fourier transform of a real and even function is real and even),
where in some variants the input and/or output data are shifted by half a sample. There are eight
standard DCT variants, of which four are commonly used.

Linear Discriminant Analysis (LDA)
Linear discriminant analysis (LDA) and the related Fisher's linear discriminant are methods
used in statistics, pattern recognition and machine learning to find a linear combination of
features which characterizes or separates two or more classes of objects or events. The
resulting combination may be used as a linear classifier or, more commonly, for
dimensionality reduction before later classification.

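For two classes, Fisher's linear discriminant has a closed form: the projection direction is the inverse of the within-class scatter matrix applied to the difference of the class means. The sketch below illustrates that case only. The function names and the toy data are assumptions, and the slides do not describe how LDA was configured in the project.

```python
import numpy as np

def fisher_direction(class_a, class_b):
    """Unit vector w that best separates two classes in Fisher's sense."""
    mean_a, mean_b = class_a.mean(axis=0), class_b.mean(axis=0)
    dev_a, dev_b = class_a - mean_a, class_b - mean_b
    within_scatter = dev_a.T @ dev_a + dev_b.T @ dev_b   # summed class scatter
    w = np.linalg.solve(within_scatter, mean_b - mean_a)
    return w / np.linalg.norm(w)

def project(features, w):
    """Reduce each feature vector to a single discriminant value."""
    return features @ w

# Usage: the classes differ along x and spread widely along y,
# so the discriminant should ignore y.
class_a = np.array([[0.0, -5.0], [0.0, 5.0], [1.0, -5.0], [1.0, 5.0]])
class_b = np.array([[3.0, -5.0], [3.0, 5.0], [4.0, -5.0], [4.0, 5.0]])
w = fisher_direction(class_a, class_b)
```

Projecting onto `w` reduces two features to one while keeping the classes apart, which is the dimensionality-reduction use mentioned above.
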
TRAINING AND TESTING
Fig: Flowchart of Training Session
Fig: Flowchart of Testing Session

RESULT
Using the above approaches (MFCC and CBR) in the sound detection and classification system,
we find that the recognition rate is very high.

Although the recognition rate is high, one problem remains: the rejection rate is not yet
good enough.

That is, if the sound being tested is already present in the database, the matching process
is very accurate. If the sound is not present in the database, however, the system does not
reject it (or stop matching). Instead, it matches the sound to the nearest sounds in terms of
features.

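One common way to add rejection to nearest-neighbour matching is a distance threshold: if even the closest stored sound is too far from the query, the query is reported as unknown. The slides do not propose this, so the sketch below is only an illustration of the idea. The function name, the labels and the threshold value are hypothetical, and a usable threshold would have to be tuned on real data.

```python
import numpy as np

def match_with_rejection(query, database, labels, max_distance):
    """Return (label, distance); label is None when the query is rejected."""
    distances = np.linalg.norm(database - query, axis=1)
    best = int(distances.argmin())
    if distances[best] > max_distance:      # nothing in the database is close enough
        return None, float(distances[best])
    return labels[best], float(distances[best])

# Usage with two stored feature vectors and made-up labels.
database = np.array([[0.0, 0.0], [10.0, 10.0]])
labels = ["rain", "birdsong"]
known, _ = match_with_rejection(np.array([0.1, 0.0]), database, labels, max_distance=2.0)
unknown, _ = match_with_rejection(np.array([50.0, 50.0]), database, labels, max_distance=2.0)
```
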
CONCLUSION
Future scope and applications
• Environmental monitoring
• Speaker recognition
• Genre classification
• Audio similarity measures
• Robotic awareness
Conclusion
This method of environmental sound detection and classification was developed using the MFCC
pipeline to extract the features of a sound and CBR to retrieve sound features from the
multimedia database. The method can be applied in robotics, where sound detection and
recognition may be possible to a satisfactory level. If the method is properly combined with
computer vision, human-computer interaction can be improved considerably. MFCC is an efficient
feature extraction method because it is designed with emphasis on human auditory perception.
Using more than one feature of a sound is likely to improve the performance of the method, and
applying a clustering technique may boost accuracy. Another useful feature available today is
Audio Spectrum Projection, provided by the MPEG-7 specification. Including this feature may
further improve the performance of the method.

Environmental Sound detection Using MFCC technique

  • 1. ENVIRONMENTAL NATURAL SOUND DETECTION AND CLASSIFICATION USING CONTENT-BASED RETRIEVAL (CBR) AND MFCC. Project mentor: Shiladitya Pujari. Project group members: Parth Sinha (20093043), Pankaj Kumar (20093013), Manas Sarkar (20093030), Ruchasri Nath (20093055).
  • 2. MAIN TOPICS: Objective; Methodology; Result; Future scope & conclusion.
  • 3. OBJECTIVE: To develop an environmental sound detection and classification technique (using Content-Based Retrieval and MFCC) so that a computer system can predict and understand sound more accurately. To make computer systems more intelligent and reliable in understanding their environment based on this technique.
  • 5. WHAT ARE MFCCS? In sound processing, the mel-frequency cepstrum (MFC) is a representation of the short-term power spectrum of a sound, based on a linear cosine transform of a log power spectrum on a nonlinear mel scale of frequency. Mel-frequency cepstral coefficients (MFCCs) are the coefficients that collectively make up an MFC. They are derived from a type of cepstral representation of the audio clip (a nonlinear "spectrum-of-a-spectrum"). The difference between the cepstrum and the mel-frequency cepstrum is that in the MFC the frequency bands are equally spaced on the mel scale, which approximates the human auditory system's response more closely than the linearly spaced frequency bands used in the normal cepstrum. This frequency warping can allow for a better representation of sound, for example in audio compression. MFCCs are commonly derived as follows: 1. Take the Fourier transform of (a windowed excerpt of) a signal. 2. Map the powers of the spectrum obtained above onto the mel scale, using triangular overlapping windows.
  • 6. (CONTD.) 3. Take the logs of the powers at each of the mel frequencies. 4. Take the discrete cosine transform of the list of mel log powers, as if it were a signal. 5. The MFCCs are the amplitudes of the resulting spectrum. MFCCs are commonly used as features in speech recognition systems, such as systems that automatically recognize numbers spoken into a telephone. They are also common in speaker recognition, which is the task of recognizing people from their voices. MFCCs are also increasingly used in music information retrieval applications such as genre classification and audio similarity measures.
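The five derivation steps above can be sketched in plain Python. This is a minimal illustration, not the project's implementation: the Hamming window, the naive O(n²) DFT, the 2595·log10(1 + f/700) mel formula, the filter count (20), the coefficient count (12) and all function names are assumptions chosen for readability.

```python
import cmath
import math

def hz_to_mel(f):
    # A common mel-scale formula (assumed here; the slides do not specify one).
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def power_spectrum(frame):
    """Step 1: window the frame (Hamming) and take the DFT power."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        s = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(windowed))
        spectrum.append(abs(s) ** 2 / n)
    return spectrum

def mel_filterbank(num_filters, n_fft, sample_rate):
    """Step 2: triangular overlapping filters equally spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2)
    mel_points = [top * i / (num_filters + 1) for i in range(num_filters + 2)]
    # Fractional DFT-bin position of each filter edge and centre.
    bins = [mel_to_hz(m) * n_fft / sample_rate for m in mel_points]
    bank = []
    for j in range(1, num_filters + 1):
        left, centre, right = bins[j - 1], bins[j], bins[j + 1]
        row = []
        for k in range(n_fft // 2 + 1):
            if left < k <= centre:
                row.append((k - left) / (centre - left))
            elif centre < k < right:
                row.append((right - k) / (right - centre))
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def mfcc(frame, sample_rate, num_filters=20, num_coeffs=12):
    spec = power_spectrum(frame)
    bank = mel_filterbank(num_filters, len(frame), sample_rate)
    # Step 3: log of each mel band energy (the floor avoids log(0)).
    log_energies = [math.log(max(sum(w * p for w, p in zip(row, spec)), 1e-10))
                    for row in bank]
    # Steps 4 and 5: DCT-II of the log energies; the outputs are the MFCCs.
    return [sum(e * math.cos(math.pi * c * (m + 0.5) / num_filters)
                for m, e in enumerate(log_energies))
            for c in range(num_coeffs)]

# Usage: one 256-sample frame of a 440 Hz tone sampled at 8 kHz.
frame = [math.sin(2 * math.pi * 440 * i / 8000) for i in range(256)]
coeffs = mfcc(frame, 8000)
```

In practice a full recording would be split into overlapping frames and this computed per frame, with an FFT in place of the naive DFT.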
  • 7. CBR: Content-Based Retrieval means that the search and retrieval are based on analysis of the actual contents of the data (here, sound) rather than on metadata such as keywords, tags and/or descriptions associated with the sounds. In our project we will use a multimedia database that provides Content-Based Retrieval.
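As a rough illustration of the idea, content-based retrieval can be sketched as a nearest-neighbour search over stored feature vectors. The in-memory dictionary, the three-element feature vectors, the Euclidean distance and the names `retrieve` and `database` are all assumptions for this sketch; the slides do not say which multimedia database or similarity measure the project uses.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def retrieve(query_features, database, top_k=3):
    """Rank stored sounds by feature distance to the query.

    Only the feature vectors (the content) are compared; the names are
    returned as labels and play no part in the search itself.
    """
    scored = [(name, euclidean(query_features, feats))
              for name, feats in database.items()]
    scored.sort(key=lambda pair: pair[1])
    return scored[:top_k]

# Hypothetical feature vectors standing in for real MFCC features.
database = {
    "rain": [1.0, 0.2, -0.5],
    "thunder": [4.0, -1.0, 0.3],
    "birdsong": [-2.0, 1.5, 0.8],
}
results = retrieve([0.9, 0.1, -0.4], database, top_k=2)
```

Here the query lies closest to the stored "rain" vector, so that entry is ranked first.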
  • 8. METHODOLOGY (1): The major steps of the method are as follows: extracting features for classifying highly diversified natural sounds; forming clusters according to feature similarity; finding a match for a particular sound query within the clusters.
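The clustering and matching steps might look like the sketch below. The slides do not name a clustering algorithm, so k-means is an assumption here, as are the deterministic initialisation from the first `k` points, the 2-D toy features and the function names.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)), key=lambda c: sq_dist(point, centroids[c]))

def kmeans(points, k, iterations=10):
    """Group feature vectors by similarity (assumed algorithm: k-means)."""
    centroids = [list(p) for p in points[:k]]  # simple deterministic start
    assignment = [0] * len(points)
    for _ in range(iterations):
        assignment = [nearest(p, centroids) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignment

# Toy 2-D features forming two obvious groups.
points = [[0, 0], [10, 10], [0, 1], [10, 11], [1, 0], [11, 10]]
centroids, assignment = kmeans(points, k=2)

# Matching: a query is assigned to the cluster with the nearest centroid.
query_cluster = nearest([9, 9], centroids)
```

A query would then be compared only against the sounds in its matched cluster rather than against the whole database.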
  • 9. METHODOLOGY (2): First, we take the input sound (an audio signal of any format). Then some preprocessing is done to normalize the signal. Next, features are extracted from the audio signal. Finally comes the classification stage, which consists of two phases: a training phase and a testing phase.
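The slides do not specify which normalization is applied during preprocessing. One plausible choice, shown here purely as an assumption, is to remove the DC offset and scale the signal to a peak amplitude of 1 so that recordings made at different levels become comparable.

```python
def normalize(signal):
    """Remove the mean (DC offset) and scale to a peak absolute value of 1."""
    mean = sum(signal) / len(signal)
    centred = [x - mean for x in signal]
    peak = max(abs(x) for x in centred)
    if peak == 0:
        return centred  # a constant signal stays all zeros
    return [x / peak for x in centred]

normalized = normalize([1.0, 3.0, 5.0])
```

The function assumes a non-empty list of samples that has already been decoded from whatever audio format was supplied.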
  • 10. METHODOLOGY (3): Fig: Mel Frequency Cepstral Coefficient pipeline.
  • 11. PROCESS DESCRIPTION. Sampling: the process of converting a continuous signal into a discrete signal. Sampling can be done for signals varying in space, time, or any other dimension, and similar results are obtained in two or more dimensions. Pre-emphasis: in the processing of electronic audio signals, pre-emphasis refers to a system process designed to increase (within a frequency band) the magnitude of some (usually higher) frequencies with respect to the magnitude of other (usually lower) frequencies, in order to improve the overall signal-to-noise ratio (SNR) by minimizing the adverse effects of phenomena such as attenuation distortion in later parts of the system. Windowing: in signal processing, a window function (also known as a tapering function) is a mathematical function that is zero-valued outside some chosen interval. For instance, a function that is constant inside the interval and zero elsewhere is called a rectangular window, which describes the shape of its graphical representation. Fast Fourier Transform: a fast Fourier transform (FFT) is an algorithm that computes the discrete Fourier transform efficiently. FFTs are of great importance to a wide variety of applications, from digital signal processing and solving partial differential equations to algorithms for quick multiplication of large integers. Absolute value: in mathematics, the absolute value (or modulus) |a| of a real number a is the numerical value of a without its sign. The absolute value of a number may be thought of as its distance from zero.
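The pre-emphasis and windowing stages described above can be sketched as follows. The first-order filter y[n] = x[n] − α·x[n−1] with α = 0.97 and the Hamming window are common choices in MFCC front ends, but they are assumptions here; the slides do not give the filter, the coefficient or the window type. The signal is assumed to be a non-empty list of samples.

```python
import math

def pre_emphasis(signal, alpha=0.97):
    """y[n] = x[n] - alpha * x[n-1]: raises high frequencies relative to low ones."""
    return [signal[0]] + [signal[i] - alpha * signal[i - 1]
                          for i in range(1, len(signal))]

def hamming(n):
    """Hamming window of length n (n > 1): tapers towards both ends."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def windowed_frames(signal, frame_len, hop):
    """Cut the signal into overlapping frames and apply the window to each."""
    window = hamming(frame_len)
    return [[s * w for s, w in zip(signal[start:start + frame_len], window)]
            for start in range(0, len(signal) - frame_len + 1, hop)]

emphasized = pre_emphasis([1.0, 1.0, 1.0], alpha=0.5)
frames = windowed_frames([1.0] * 10, frame_len=4, hop=2)
```

Each windowed frame would then be passed to the Fourier transform, and the absolute value (magnitude) of each complex output taken before the mel filtering stage.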
  • 12. PROCESS DESCRIPTION (CONTINUED). Discrete cosine transform (DCT): a DCT is a Fourier-related transform similar to the discrete Fourier transform (DFT), but using only real numbers. DCTs are equivalent to DFTs of roughly twice the length, operating on real data with even symmetry (since the Fourier transform of a real and even function is real and even), where in some variants the input and/or output data are shifted by half a sample. There are eight standard DCT variants, of which four are commonly used. Linear Discriminant Analysis (LDA): linear discriminant analysis (LDA) and the related Fisher's linear discriminant are methods used in statistics, pattern recognition and machine learning to find a linear combination of features that characterizes or separates two or more classes of objects or events. The resulting combination may be used as a linear classifier or, more commonly, for dimensionality reduction before later classification.
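Both transforms can be illustrated in a few lines. The sketch below uses the unnormalized DCT-II (the variant most often used for MFCCs, assumed here) and Fisher's two-class discriminant restricted to 2-D points so that the scatter matrix can be inverted by hand. The function names and toy data are hypothetical.

```python
import math

def dct_ii(values):
    """Unnormalized DCT-II: X[k] = sum_i x[i] * cos(pi * (i + 0.5) * k / N)."""
    n = len(values)
    return [sum(v * math.cos(math.pi * (i + 0.5) * k / n)
                for i, v in enumerate(values))
            for k in range(n)]

def mean_vector(points):
    return [sum(col) / len(points) for col in zip(*points)]

def fisher_direction_2d(class_a, class_b):
    """Fisher's discriminant for two classes of 2-D points.

    Returns w = Sw^-1 (mean_b - mean_a), where Sw is the within-class
    scatter matrix (assumed invertible).
    """
    mean_a, mean_b = mean_vector(class_a), mean_vector(class_b)
    sxx = sxy = syy = 0.0
    for points, mean in ((class_a, mean_a), (class_b, mean_b)):
        for x, y in points:
            dx, dy = x - mean[0], y - mean[1]
            sxx += dx * dx
            sxy += dx * dy
            syy += dy * dy
    det = sxx * syy - sxy * sxy
    diff_x, diff_y = mean_b[0] - mean_a[0], mean_b[1] - mean_a[1]
    # Multiply the inverse of [[sxx, sxy], [sxy, syy]] by the mean difference.
    return [(syy * diff_x - sxy * diff_y) / det,
            (-sxy * diff_x + sxx * diff_y) / det]

spectrum = dct_ii([1.0, 1.0, 1.0, 1.0])

class_a = [[0, 0], [0, 2], [1, 1], [-1, 1]]
class_b = [[4, 0], [4, 2], [5, 1], [3, 1]]
w = fisher_direction_2d(class_a, class_b)
```

For a constant input the DCT puts all the energy in the first coefficient, and for the toy classes the discriminant direction lies along the x-axis, the only axis on which the two classes differ.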
  • 13. TRAINING AND TESTING: Fig: Flowchart of the training session. Fig: Flowchart of the testing session.
  • 14. RESULT: Using the approaches described above (MFCC and CBR) for the sound detection and classification system, we find that the recognition rate is very high. The rejection rate, however, is not good enough. If the sound being tested is already present in the database, the matching process is very accurate; but if the sound is not present in the database, the system does not reject it (or stop the matching). Instead, it matches the sound with the nearest sounds in terms of features.
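One common way to add rejection, which the slides do not describe and which is shown here only as a possible remedy, is to accept the nearest match only when its distance falls below a threshold. The threshold value, the toy feature vectors and the function name `match_or_reject` are hypothetical; a real threshold would have to be tuned on held-out data.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def match_or_reject(query, database, max_distance):
    """Return (name, distance) of the nearest sound, or (None, distance)
    when even the nearest sound is farther away than max_distance."""
    best_name, best_dist = None, math.inf
    for name, feats in database.items():
        d = euclidean(query, feats)
        if d < best_dist:
            best_name, best_dist = name, d
    if best_dist > max_distance:
        return None, best_dist
    return best_name, best_dist

# Hypothetical feature vectors standing in for real MFCC features.
database = {"rain": [1.0, 0.2], "thunder": [4.0, -1.0]}
known = match_or_reject([1.1, 0.2], database, max_distance=0.5)
unknown = match_or_reject([-6.0, 7.0], database, max_distance=0.5)
```

With this rule a query far from every stored sound is reported as unknown instead of being forced onto its nearest neighbour.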
  • 15. CONCLUSION. Future scope and applications: environmental monitoring; speaker recognition; genre classification; audio similarity measures; robotic awareness. Conclusion: This method of environmental sound detection and classification is developed using the MFCC pipeline to extract the features of a particular sound and CBR to retrieve sound features from the multimedia database. The method can be applied in robotics, where sound detection and recognition may be possible to a satisfactory level. If the method is properly combined with computer vision, human-computer interaction can be improved considerably. MFCC is an efficient feature extraction method because it is designed with emphasis on human perception. Using more than one feature of a sound may improve the performance of the method, and applying a clustering technique may boost accuracy. Another good feature available today is Audio Spectrum Projection, provided by the MPEG-7 specification. Including this feature may improve the performance of the method.