1. SHREEJEE INSTITUTE OF TECHNOLOGY AND MANAGEMENT
Speaker Recognition
• Guided by: Mr. Prakash Singh Panwar
• By: Rajpal Singh Chouhan
• EC Branch, 1st Year
2. What is Speaker Recognition?
Speaker Recognition is the process of automatically
recognizing who is speaking on the basis of individual
information included in speech signals.
Speaker Recognition = Speaker Identification + Speaker Verification
4. Speaker Verification
• Synonyms: authentication, detection.
• User claims an identity.
• System task: Accept or reject identity claim.
• Example: "Is this Ahmad’s voice?"
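The accept/reject task above can be sketched as a threshold test on a distance score. This is a minimal sketch, not the deck's own method: the function name `verify_claim`, the nearest-vector distance score, and the threshold value are all assumptions made for illustration.

```python
import numpy as np

def verify_claim(test_features, claimed_model, threshold):
    """Accept or reject an identity claim (hypothetical helper).

    test_features: (n_frames, dim) feature vectors from the test utterance.
    claimed_model: (n_vectors, dim) reference vectors of the claimed speaker.
    threshold:     assumed tuning value; smaller is stricter.
    """
    # Euclidean distance from every test frame to every model vector
    diffs = test_features[:, None, :] - claimed_model[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    # Score = average distance of each frame to its closest model vector
    score = float(dists.min(axis=1).mean())
    return score <= threshold, score

# Toy data: a two-vector model and two test utterances
model = np.array([[0.0, 0.0], [1.0, 1.0]])
genuine = np.array([[0.0, 0.1], [1.0, 0.9]])
impostor = np.array([[5.0, 5.0], [6.0, 6.0]])

accepted_genuine, score_genuine = verify_claim(genuine, model, threshold=0.5)
accepted_impostor, score_impostor = verify_claim(impostor, model, threshold=0.5)
```

In practice the threshold trades false acceptances against false rejections, so it would be tuned on held-out data rather than fixed by hand.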
5. Model of Speaker Recognizer
• Figure 1: Simple model of a speaker recognizer. (Figure labels: "Hello, Mr. John"; "You are permitted to access".)
6. The Structure of Speaker Recognizer
• Figure 2: Functional scheme of an ASR system.
• Training mode: Feature Extraction -> Feature Vector -> Speaker Modeling (e.g. Speaker_1)
• Recognition mode: Feature Extraction -> Feature Vector -> Classification -> Decision Logic -> Speaker #ID
7. Speech Signal Analysis
Feature Extraction
• The aim is to extract voice features that distinguish the different phonemes of a language.
8. MFCC extraction
• Block diagram: Speech signal x(n) -> Pre-emphasis -> x'(n) -> Window -> x_t(n) -> DFT -> X_t(k) -> Mel filter banks -> Y_t(m) -> Log(|·|^2) -> IDFT -> MFCC y_t^(m)(k)
MFCC stands for Mel-frequency cepstral coefficients, a representation of the short-term power spectrum of a sound used in audio processing.
The MFCCs are the amplitudes of the resulting spectrum, i.e. the output of the final IDFT step.
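The MFCC chain (pre-emphasis, windowing, DFT, mel filter banks, log, inverse transform) can be sketched with NumPy alone. This is a minimal sketch under stated assumptions, not the deck's implementation: the frame length (25 ms), frame step (10 ms), FFT size, number of filters, pre-emphasis coefficient 0.97, and the mel formula are common textbook choices, and a DCT-II is used in place of the IDFT, as is usual in practice.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                             n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):        # rising slope
            fbank[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):       # falling slope
            fbank[m - 1, k] = (right - k) / (right - center)
    return fbank

def mfcc(signal, sample_rate=8000, frame_len=200, frame_step=80,
         n_fft=256, n_filters=20, n_ceps=12, alpha=0.97):
    """Return an (n_frames, n_ceps) array; column 0 is the c0 term."""
    # Pre-emphasis: x'(n) = x(n) - alpha * x(n - 1)
    emphasized = np.append(signal[0], signal[1:] - alpha * signal[:-1])
    # Split into overlapping frames and apply a Hamming window
    n_frames = 1 + (len(emphasized) - frame_len) // frame_step
    idx = (np.arange(frame_len)[None, :]
           + frame_step * np.arange(n_frames)[:, None])
    frames = emphasized[idx] * np.hamming(frame_len)
    # DFT and power spectrum |X_t(k)|^2
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2
    # Mel filter bank energies, then log (small offset avoids log(0))
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_energies = np.log(energies + 1e-10)
    # DCT-II of the log energies (stands in for the IDFT block)
    m = np.arange(n_filters)
    basis = np.cos(np.pi * np.arange(n_ceps)[:, None] * (m + 0.5)[None, :]
                   / n_filters)
    return log_energies @ basis.T

# Usage on a synthetic one-second 440 Hz tone sampled at 8 kHz
t = np.arange(8000) / 8000.0
features = mfcc(np.sin(2 * np.pi * 440.0 * t))
```

Each row of `features` is one feature vector of the kind passed on to speaker modeling or classification.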
9. (Figure)
• Panels: speech waveform of the phoneme “ae”; after pre-emphasis and Hamming windowing; power spectrum; MFCC.
11. Vector Quantization (VQ)
• Aim of VQ: representation of large amounts of data by a few prototype vectors.
• Example: identification and grouping of similar data in clusters.
• Each feature vector is assigned to the closest prototype w, according to a similarity or distance measure (e.g. Euclidean distance).
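The two VQ ideas above (a small set of prototypes, and assignment of each vector to the closest one) can be sketched as follows. This is a minimal sketch: the deck does not say how the codebook is trained, so a plain k-means loop is assumed here, and the names `nearest_prototype` and `train_codebook` are hypothetical.

```python
import numpy as np

def nearest_prototype(vectors, codebook):
    """Assign each vector to its closest prototype (Euclidean distance)."""
    diffs = vectors[:, None, :] - codebook[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    return dists.argmin(axis=1), dists.min(axis=1)

def train_codebook(vectors, n_prototypes, n_iter=20, seed=0):
    """Build a codebook with a basic k-means loop (assumed training method)."""
    rng = np.random.default_rng(seed)
    # Start from randomly chosen data points
    start = rng.choice(len(vectors), n_prototypes, replace=False)
    codebook = vectors[start].astype(float)
    for _ in range(n_iter):
        labels, _ = nearest_prototype(vectors, codebook)
        for j in range(n_prototypes):
            members = vectors[labels == j]
            if len(members) > 0:              # keep old prototype if empty
                codebook[j] = members.mean(axis=0)
    return codebook

# Toy data: two well-separated groups of 2-D "feature vectors"
data = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
                 [10, 10], [10, 11], [11, 10], [11, 11]], dtype=float)
codebook = train_codebook(data, n_prototypes=2)
labels, distances = nearest_prototype(data, codebook)
```

For speaker recognition, one such codebook would typically be trained per speaker, and a test utterance scored by its average distance to each codebook.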
15. Database Creation Condition
Table 1: Database description.
Parameter            Characteristics
Language             Bangla
No. of speakers      5
Speech type          Sentence reading
Recording condition  Normal room condition
Audio length         60-90 seconds
Audio type           Stereo
Sample format        16-bit PCM
Sampling frequency   8 kHz
Bit rate             1411 kbps
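For uncompressed PCM audio, the bit rate follows from sampling frequency x bits per sample x number of channels. A minimal sketch of that arithmetic is below; the helper name is hypothetical. Note that 1411 kbps matches 44.1 kHz 16-bit stereo, while 8 kHz 16-bit stereo gives 256 kbps, so the sampling frequency and bit rate in Table 1 may describe different stages of the recordings (for example, before and after downsampling). That reading is an assumption; the source does not say.

```python
def pcm_bit_rate_kbps(sample_rate_hz, bits_per_sample, channels):
    """Bit rate of uncompressed PCM audio in kbps (1 kbps = 1000 bit/s)."""
    return sample_rate_hz * bits_per_sample * channels / 1000.0

rate_8k_stereo = pcm_bit_rate_kbps(8000, 16, 2)     # 8 kHz, 16-bit, stereo
rate_44k_stereo = pcm_bit_rate_kbps(44100, 16, 2)   # 44.1 kHz, 16-bit, stereo
```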
16. Speaker Recognition Result
Table 3: Test result for speaker recognition system.
Speaker     No. of inputs  Correct  Incorrect  Accuracy
Speaker_1   5              5        0          100%
Speaker_2   9              8        1          88.88%
Speaker_3   6              6        0          100%
Speaker_3   12             11       1          91.67%
Speaker_4   8              8        0          100%
Speaker_5   10             10       0          100%
Total       50             48       2          96%
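The accuracy column is the number of correct decisions divided by the number of inputs. The sketch below recomputes it from the counts in Table 3; the row labels are copied exactly as they appear in the table, and the variable names are illustrative.

```python
# (label, number of inputs, number correct), copied from Table 3
rows = [
    ("Speaker_1", 5, 5),
    ("Speaker_2", 9, 8),
    ("Speaker_3", 6, 6),
    ("Speaker_3", 12, 11),
    ("Speaker_4", 8, 8),
    ("Speaker_5", 10, 10),
]

def accuracy_percent(correct, total):
    return 100.0 * correct / total

per_row = [accuracy_percent(correct, total) for _, total, correct in rows]
total_inputs = sum(total for _, total, _ in rows)
total_correct = sum(correct for _, _, correct in rows)
overall = accuracy_percent(total_correct, total_inputs)

for (label, total, correct), acc in zip(rows, per_row):
    print(f"{label}: {correct}/{total} = {acc:.2f}%")
print(f"Total: {total_correct}/{total_inputs} = {overall:.2f}%")
```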
17. Applications
• Transaction authentication
– Toll fraud prevention
– Telephone credit card purchases
– Telephone brokerage (e.g., stock trading)
• Access control
– Physical facilities
– Computers and data networks
• Information retrieval
– Customer information for call centers
– Audio indexing (speech skimming device)
• Forensics
– Voice sample matching