Text Prompted Remote Speaker Authentication : Joint Speech and Speaker Recognition/Verification System Mid-term Project Presentation

Major Project Mid-Term Presentation: Speaker Verification for Remote Authentication. Members: Ganesh Tiwari (063BCT510), Madhav Pandey (063BCT514), Manoj Shrestha (063BCT518). Supervisor: Dr. Subarna Shakya, Associate Professor.

Introduction: A voice biometric system for user login. Text-prompted system: the claimant is asked to speak a prompted text, and both speech and speaker recognition/verification are performed, which makes the system more secure against playback attacks. Web application: the client (Adobe Flex) handles voice capture, preprocessing and feature extraction; the server (Java) handles training and classification; BlazeDS RPC provides Java-Flex connectivity.

Block Diagram of Speaker / Speech Recognition System

Signal Capture and Pre-Processing

Capture and Preprocessing: Get the audio signal, i.e., analog-to-digital conversion (ADC), and make it suitable for feature extraction.

Capture and Preprocessing: Capture. Sampling rate 22050 Hz; 16-bit signed samples, little-endian; mono; uncompressed PCM.

Capture and Preprocessing: PCM Extract

Capture and Preprocessing: Silence Removal. The algorithm used is the one described in the paper 'a new method for silence removal and endpoint detection' by G. Saha, Sandipan Chakroborty and Suman Senapati, Department of Electronics and Electrical Communication Engineering, Indian Institute of Technology, Kharagpur, India.

Capture and Preprocessing: Pre-Emphasis. Boosts the high-frequency energy. In the time domain, $y[n] = x[n] - \alpha\, x[n-1]$, with $0.9 \le \alpha \le 1.0$.
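The pre-emphasis filter can be sketched in a few lines. This is an illustrative Python version, not the project's Flex client code; the function name `pre_emphasis` and the default `alpha = 0.95` are assumptions (the slide only states the range 0.9 to 1.0), and passing the first sample through unchanged is also an assumption.

```python
def pre_emphasis(samples, alpha=0.95):
    """Apply y[n] = x[n] - alpha * x[n-1] to a list of PCM samples.

    alpha=0.95 is an assumed value inside the range given on the slide.
    The first sample has no predecessor and is passed through unchanged.
    """
    if not samples:
        return []
    out = [float(samples[0])]
    for n in range(1, len(samples)):
        out.append(samples[n] - alpha * samples[n - 1])
    return out
```

A constant (low-frequency) input is strongly attenuated after the first sample, while rapid sample-to-sample changes are kept, which is the high-frequency boost described above.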

Capture and Preprocessing: Framing. The speech signal is approximately stationary (in its statistical properties) over 10-30 ms. Frames of 23 ms each, with 50% overlap, are used.
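A minimal sketch of frame blocking with the stated 23 ms frames and 50% overlap at 22050 Hz. The function name `frame_signal` is hypothetical, and dropping trailing samples that do not fill a whole frame is an assumption; the slides do not say how the tail is handled.

```python
def frame_signal(samples, sample_rate=22050, frame_ms=23.0, overlap=0.5):
    """Split a sample list into overlapping frames.

    With the slide's values this gives 507-sample frames and a hop of
    253 samples. Trailing samples that do not fill a frame are dropped
    (an assumption, not stated in the slides).
    """
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    hop = max(1, int(frame_len * (1.0 - overlap)))
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frames.append(samples[start:start + frame_len])
    return frames
```

An implementation that feeds a radix-2 FFT would more likely round the frame up to 512 samples (about 23.2 ms); that choice is not confirmed by the slides.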

Capture and Preprocessing: Windowing. A Hamming window is applied to each frame of the frame-blocked signal.
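For reference, the standard Hamming window is $w[n] = 0.54 - 0.46\cos\!\left(\frac{2\pi n}{N-1}\right)$ for $n = 0, \dots, N-1$. The sketch below applies it to one frame; the function names are hypothetical and the code is only an illustration of the step.

```python
import math

def hamming(n_points):
    """Return the n_points coefficients of the standard Hamming window."""
    if n_points == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (n_points - 1))
            for n in range(n_points)]

def apply_window(frame, window):
    """Multiply a frame by the window, sample by sample."""
    return [s * w for s, w in zip(frame, window)]
```

The window tapers each frame toward its edges (to 0.08 at the ends), which reduces the spectral leakage caused by cutting the signal into frames.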

Feature Extraction: Transform the input audio signal into a sequence of acoustic feature vectors. MFCC (Mel-Frequency Cepstral Coefficients) are used as features. This is a perceptual approach: the human ear processes audio on the Mel scale, which is approximately linear up to 1 kHz and logarithmic above 1 kHz. MFCC give the distribution of energy across Mel frequency bands and are calculated for each frame.
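The slides do not give the Hz-to-Mel mapping itself. One widely used form is $m = 2595 \log_{10}(1 + f/700)$, which is roughly linear below 1 kHz and logarithmic above it. The sketch below assumes that formula; the project may have used a different variant.

```python
import math

def hz_to_mel(freq_hz):
    """Convert Hz to Mel using the common 2595*log10(1 + f/700) form (assumed)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)
```

With this formula 1000 Hz maps to approximately 1000 Mel, and equal steps in Mel correspond to progressively wider steps in Hz at higher frequencies.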

Feature Extraction: Fourier Transform. Gives the amount of energy in each frequency band. The FFT is used.

Feature Extraction: Mel Filter. We used a filter bank of triangular filters spaced on the Mel scale.
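A possible construction of such a filter bank is sketched below. The number of filters (20), the FFT size (512), the Hz-to-Mel formula, and the function names are all assumptions for illustration; none of them is stated in the slides.

```python
import math

def hz_to_mel(freq_hz):
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(num_filters=20, fft_size=512, sample_rate=22050):
    """Build triangular filters whose edges are equally spaced in Mel.

    Returns num_filters lists, each with fft_size // 2 + 1 weights,
    one weight per FFT bin from 0 Hz up to the Nyquist frequency.
    """
    high_mel = hz_to_mel(sample_rate / 2.0)
    step = high_mel / (num_filters + 1)
    # num_filters + 2 edge points (left, centers..., right) mapped to FFT bins
    bins = [int(math.floor((fft_size + 1) * mel_to_hz(i * step) / sample_rate))
            for i in range(num_filters + 2)]
    bank = []
    for m in range(1, num_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        weights = [0.0] * (fft_size // 2 + 1)
        for k in range(left, center):      # rising slope
            weights[k] = (k - left) / (center - left)
        for k in range(center, right):     # peak and falling slope
            weights[k] = (right - k) / (right - center)
        bank.append(weights)
    return bank
```

Each filter's output is the weighted sum of the frame's power spectrum under its triangle; the filters are narrow at low frequencies and wide at high frequencies.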

Feature Extraction: Mel Filter (contd.) [Mel filter equation and its symbol definitions are not preserved in this transcript]

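Feature Extraction: Log, IFT (DCT). Pipeline: Log → DCT → MFCC. The logarithm of the Mel filter outputs is taken, and a DCT serves as the inverse Fourier transform step to produce the MFCC.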
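The log and DCT step can be sketched as follows. This assumes an unnormalized DCT-II, $c_n = \sum_{k=0}^{K-1} \log(E_k)\cos\!\left(\frac{\pi n (k + 0.5)}{K}\right)$, keeping coefficients $n = 1, \dots, 12$. The function name, the energy floor, and skipping $c_0$ are assumptions made for illustration.

```python
import math

def mfcc_from_energies(filter_energies, num_coeffs=12):
    """Turn Mel filter-bank energies of one frame into MFCC.

    Takes the log of each energy (floored to avoid log(0)), then an
    unnormalized DCT-II, returning coefficients 1..num_coeffs.
    """
    K = len(filter_energies)
    log_e = [math.log(max(e, 1e-10)) for e in filter_energies]
    return [sum(log_e[k] * math.cos(math.pi * n * (k + 0.5) / K)
                for k in range(K))
            for n in range(1, num_coeffs + 1)]
```

A flat spectrum (equal energy in every filter) gives coefficients that are all zero, since only $c_0$, which is skipped here, carries the overall level.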

Feature Extraction: Cepstral Mean Subtraction (CMS), used to minimize channel effects.
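CMS relies on the fact that a fixed channel adds an approximately constant offset to every cepstral vector, so subtracting the per-coefficient mean over the utterance removes it. A minimal sketch, with a hypothetical function name:

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-coefficient mean, taken over all frames, from each frame."""
    if not frames:
        return []
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / len(frames) for d in range(dim)]
    return [[f[d] - means[d] for d in range(dim)] for f in frames]
```

After this step each coefficient has zero mean across the utterance.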

Feature Extraction: Energy and Deltas. Added for completeness of the feature vector and to achieve a high recognition rate: an energy feature, a delta (velocity) feature, and a double-delta (acceleration) feature. The deltas are calculated by linear regression over a regression window of size M.
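A common regression formula for deltas is $d_t = \frac{\sum_{m=1}^{M} m\,(c_{t+m} - c_{t-m})}{2\sum_{m=1}^{M} m^2}$. The sketch below assumes that formula with $M = 2$ and repeats the first and last frames at the edges; the slides state neither the exact formula nor the value of M. Applying the same function to the deltas gives the double deltas.

```python
def deltas(frames, M=2):
    """Regression-based delta features over a window of +/- M frames.

    Frames beyond either end are replaced by the first or last frame
    (edge handling is an assumption).
    """
    T = len(frames)
    if T == 0:
        return []
    dim = len(frames[0])
    denom = 2.0 * sum(m * m for m in range(1, M + 1))
    out = []
    for t in range(T):
        row = []
        for d in range(dim):
            num = 0.0
            for m in range(1, M + 1):
                ahead = frames[min(t + m, T - 1)][d]
                behind = frames[max(t - m, 0)][d]
                num += m * (ahead - behind)
            row.append(num / denom)
        out.append(row)
    return out
```

For a feature that rises by one unit per frame, the delta in the middle of the sequence is exactly 1.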

Composition of Feature Vector: 12 MFCC features; 12 delta MFCC; 12 delta-delta MFCC; 1 energy feature; 1 delta energy feature; 1 delta-delta energy feature → 39 features from each frame.

Speaker Recognition/Verification by GMM

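Gaussian Mixture Model: A parametric probability density function, based on a clustering technique, with $M$ Gaussian components: $p(x \mid \lambda) = \sum_{m=1}^{M} w_m \, g_m(x \mid \mu_m, C_m)$, where $x$ is a $K$-dimensional random vector, $w_m$ is the mixture weight of the $m$th component, and $g_m$ is a $K$-dimensional Gaussian function (pdf): $g_m(x \mid \mu_m, C_m) = \frac{1}{\sqrt{(2\pi)^K \, |C_m|}} \exp\left\{ -\frac{1}{2} (x - \mu_m)^{T} C_m^{-1} (x - \mu_m) \right\}$. The model parameters are $\lambda = (w_m, \mu_m, C_m)$, $m = 1, \dots, M$.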
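The mixture density can be evaluated as sketched below. To keep the example short it assumes diagonal covariance matrices (each $C_m$ given as a list of per-dimension variances), which the slides do not specify, and it works in the log domain to avoid underflow. All function names are hypothetical.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log of a K-dimensional Gaussian pdf with diagonal covariance (assumed)."""
    K = len(x)
    log_det = sum(math.log(v) for v in var)
    maha = sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var))
    return -0.5 * (K * math.log(2.0 * math.pi) + log_det + maha)

def gmm_log_density(x, weights, means, variances):
    """log p(x | lambda) for a mixture, using the log-sum-exp trick."""
    terms = [math.log(w) + log_gaussian_diag(x, mu, var)
             for w, mu, var in zip(weights, means, variances)]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))
```

Summing `gmm_log_density` over all frames of an utterance gives $\log P(X \mid \lambda)$ under the usual assumption that frames are independent.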

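GMM Training: Goal: estimate the parameters $\lambda$. Method: maximum likelihood estimation. Input: $X = \{x_1, x_2, \dots, x_T\}$, with $P(X \mid \lambda) = \prod_{t=1}^{T} p(x_t \mid \lambda)$. Maximize with the Expectation-Maximization (EM) algorithm, an iterative process: from an initial model $\lambda_i$, a new model $\lambda_{i+1}$ is estimated such that $P(X \mid \lambda_{i+1}) \ge P(X \mid \lambda_i)$.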
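One EM iteration is sketched below for a one-dimensional mixture, which is enough to show the E-step (responsibilities) and M-step (re-estimated weights, means, variances). The restriction to one dimension, the variance floor, and the function names are simplifying assumptions, not the project's training code.

```python
import math

def gauss(x, mu, var):
    """One-dimensional Gaussian pdf."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def log_likelihood(data, w, mu, var):
    """log P(X | lambda) = sum over t of log p(x_t | lambda)."""
    return sum(math.log(sum(w[m] * gauss(x, mu[m], var[m]) for m in range(len(w))))
               for x in data)

def em_step(data, w, mu, var, var_floor=1e-6):
    """One EM iteration: returns the re-estimated (weights, means, variances)."""
    M, T = len(w), len(data)
    # E-step: posterior probability of each component for each sample
    resp = []
    for x in data:
        joint = [w[m] * gauss(x, mu[m], var[m]) for m in range(M)]
        total = sum(joint)
        resp.append([j / total for j in joint])
    # M-step: weighted re-estimation of the parameters
    new_w, new_mu, new_var = [], [], []
    for m in range(M):
        n_m = sum(r[m] for r in resp)
        mean = sum(r[m] * x for r, x in zip(resp, data)) / n_m
        variance = sum(r[m] * (x - mean) ** 2 for r, x in zip(resp, data)) / n_m
        new_w.append(n_m / T)
        new_mu.append(mean)
        new_var.append(max(variance, var_floor))  # floor is an added safeguard
    return new_w, new_mu, new_var
```

Calling `em_step` repeatedly and checking `log_likelihood` after each call illustrates the non-decreasing likelihood property stated above; iteration would normally stop when the improvement falls below a small tolerance.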

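Verification Decision: Hypothesis test. $H_0$: the speaker is the claimed speaker. $H_1$: the speaker is an imposter. Based on the likelihood ratio $\Lambda = \frac{P(X \mid \lambda_{H_0})}{P(X \mid \lambda_{H_1})}$, where $\lambda_{H_0}$ is the claimed speaker's model and $\lambda_{H_1}$ is the imposter model. Decision by threshold: $\Lambda < \theta_T$: reject the identity claim; $\Lambda > \theta_T$: accept the identity claim.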
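In practice the ratio is usually computed in the log domain, where it becomes a difference of log-likelihoods. The sketch below assumes the two log-likelihoods have already been computed from the claimed-speaker and imposter models; the function name and the default threshold are hypothetical, and treating a score exactly equal to the threshold as a rejection is an assumption, since the slide leaves that case open.

```python
def verify(log_lik_claimed, log_lik_imposter, log_threshold=0.0):
    """Accept the identity claim if the log-likelihood ratio exceeds the threshold.

    Returns (accepted, score), where score = log P(X|claimed) - log P(X|imposter).
    log_threshold=0.0 is a placeholder; a real threshold would be tuned on data.
    """
    score = log_lik_claimed - log_lik_imposter
    return score > log_threshold, score
```

Raising the threshold lowers false acceptances at the cost of more false rejections, so its value is a trade-off chosen for the application.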

Hidden Markov Model: Definition. A Hidden Markov Model (HMM) is a statistical model that extends a Markov process. An HMM has hidden states and observable symbols per state. In this system: the observed data are the feature vectors; the hidden states are phonemes.

Codebook Generation: K-means clustering is applied to the whole database to generate the codebook. VQ (Vector Quantization) is used for mapping each input feature vector to a discrete quantized symbol. Once the codebook is built, each incoming feature vector is processed as follows: compare it to each of the prototype vectors in the codebook; select the one that is closest (by some distance metric); replace the input vector by the index of this prototype vector. The resulting indices form the observation sequence.
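The quantization step described above can be sketched as a nearest-neighbor search. Squared Euclidean distance is an assumption here (the slide only says "some distance metric"), the function names are hypothetical, and the codebook is taken as already trained by K-means.

```python
def quantize(vector, codebook):
    """Return the index of the codebook prototype closest to the vector.

    Uses squared Euclidean distance (assumed); ties go to the lower index.
    """
    best_index, best_dist = 0, float("inf")
    for index, codeword in enumerate(codebook):
        dist = sum((v - c) ** 2 for v, c in zip(vector, codeword))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index

def to_observation_sequence(vectors, codebook):
    """Map a sequence of feature vectors to a sequence of codebook indices."""
    return [quantize(v, codebook) for v in vectors]
```

The output is the discrete symbol sequence that a discrete-observation HMM takes as input.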

Speech Recognition System by HMM / VQ

Hidden Markov Model: Training. Training uses the forward-backward (Baum-Welch) algorithm, which iteratively re-estimates the parameters and improves the probability that the given observations are generated by the new parameters. Three parameters need to be re-estimated: the initial state distribution $\pi_i$, the transition probabilities $a_{ij}$, and the emission probabilities $b_i(o_t)$. The input is the observation sequence given by VQ.

Hidden Markov Model: Verification/Matching. The Viterbi algorithm is used. The inputs are the observation sequence given by VQ and the HMM of each candidate word. The best-matched word is returned.
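A minimal sketch of Viterbi scoring for a discrete HMM, followed by picking the best-scoring word. The model layout (`pi`, `A`, `B` as nested lists), the function names, and the `word_models` dictionary are assumptions for illustration, not the project's Java classes.

```python
import math

def _log(p):
    """Natural log that maps zero probability to -infinity."""
    return math.log(p) if p > 0 else float("-inf")

def viterbi(obs, pi, A, B):
    """Return (best log-probability, best state path) for a discrete HMM.

    obs: list of symbol indices; pi[i]: initial probability of state i;
    A[i][j]: transition probability i -> j; B[i][k]: probability that
    state i emits symbol k.
    """
    N = len(pi)
    delta = [_log(pi[i]) + _log(B[i][obs[0]]) for i in range(N)]
    back = []
    for o in obs[1:]:
        prev, delta, pointers = delta, [], []
        for j in range(N):
            best_i = max(range(N), key=lambda i: prev[i] + _log(A[i][j]))
            pointers.append(best_i)
            delta.append(prev[best_i] + _log(A[best_i][j]) + _log(B[j][o]))
        back.append(pointers)
    state = max(range(N), key=lambda i: delta[i])
    best = delta[state]
    path = [state]
    for pointers in reversed(back):   # trace the best path backwards
        state = pointers[state]
        path.append(state)
    path.reverse()
    return best, path

def recognize(obs, word_models):
    """Return the word whose HMM gives the highest Viterbi score.

    word_models maps each word to its (pi, A, B) tuple (hypothetical layout).
    """
    return max(word_models, key=lambda word: viterbi(obs, *word_models[word])[0])
```

Working with log-probabilities avoids numerical underflow on long observation sequences.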

Problems Faced: Learning curve; complex mathematics; Flex and Java connectivity (initially); data conversion.

Remaining Tasks: Speech training data collection; model training (HMM, GMM); module integration; testing.
