1. Submitted by: ABHINAV TYAGI (9911103403)
ANSHULI MITTAL (9911103436)
2. Introduction
Automatic speaker recognition is the use of software to
recognize a person from a spoken phrase. Such software can
operate in two modes: to identify a particular person or to
verify a person’s claimed identity.
Speaker recognition is a performance biometric; i.e., you
perform a task to be recognized. Your voice, like other
biometrics, cannot be forgotten or misplaced, unlike
knowledge-based (e.g., password) or possession-based (e.g.,
key) access control methods.
3. Literature Survey
SPEAKER RECOGNITION USING MFCC AND GMM
Authors: Ashutosh Parab, Joyeb Mulla, Pankaj Bhadoria, and
Vikram Bangar, University of Pune
A biometric is a physical characteristic unique to each individual. Due to
the increased number of dialogue system applications, interest in the field
has grown significantly in recent years. Nevertheless, there are many open
issues in automatic speaker identification, among them the choice of
appropriate speech signal features and machine learning algorithms.
We have also studied and compared different approaches and algorithms
to find the most efficient model for speaker recognition. We believe the
MFCC-GMM model is the most appropriate based on parameters such as
identification accuracy, computation time, false rejection rate, and false
acceptance rate. The proposed system is a version of voice biometrics that
incorporates text-independent speaker verification implemented
independently.
4. SPEAKER RECOGNITION IN THE BIOMETRIC SECURITY SYSTEMS
Author: Filip Orság, Faculty of Information Technology, Institute of Intelligent
Systems
At present, the importance of biometric security is increasing considerably
in the context of world events. The development of individual biometric
technologies such as fingerprint recognition, iris or retina recognition,
and speaker recognition is considered very important. However, it turns
out that a single biometric technology is not sufficient. Herein, the design
of a complex biometric security system based on speaker recognition and
fingerprint authentication is introduced. A method for acquiring a unique
vector from speaker-specific features is introduced as well.
5. SPEAKER RECOGNITION
Author : Joseph P. Campbell, Jr. (j.campbell@ieee.org)
A tutorial on the design and development of automatic
speaker recognition systems is presented. Automatic
speaker recognition is the use of a machine to recognize a
person from a spoken phrase. These systems can operate
in two modes: to identify a particular person or to verify a
person’s claimed identity. Speech processing and the basic
components of automatic speaker recognition systems are
shown and design tradeoffs are discussed. The
performances of various systems are compared.
6. Problem Statement
Today, security is one of the most important concerns for any
person. At banks, hospitals, and offices, a person may not be
physically present, yet his ID, passwords, or keys can be used
illegally to act in his name. Thus, much more secure software is
needed for security at these places.
7. Solution
"Biometrics" means "life measurement" but the term is usually associated
with the use of unique physiological characteristics to identify an
individual. A number of biometric traits have been developed and are used
to authenticate a person's identity.
Identification based on biometric characteristics is preferred over
traditional password and PIN based methods for various reasons: the
person to be identified must be physically present at the time of
identification, and biometric techniques obviate the need to remember a
password or carry a token. A biometric system is essentially a pattern
recognition system that makes a personal identification by determining the
authenticity of a specific physiological or behavioural characteristic
possessed by the user.
During the capture process, the raw biometric is captured by a sensing
device such as a fingerprint scanner or video camera. Among the various
biometric technologies being considered, the attributes that satisfy the
above requirements are fingerprint, facial features, hand geometry, voice,
iris, retina, vein patterns, palm print, DNA, keystroke dynamics, ear shape,
odor, signature, etc.
8.
9. Speaker verification is defined as deciding if a speaker is who he claims to
be. This is different from the speaker identification problem, which is
deciding if a speaker is a specific person or is among a group of persons.
In speaker verification, a person makes an identity claim (e.g., entering an
employee number or presenting his smart card). In text-dependent
recognition, the phrase is known to the system and it can be fixed, or not
fixed and prompted (visually or orally). He then attempts to be
authenticated by speaking the prompted phrase(s) into the microphone. In
addition to his voice, ambient room noise and delayed versions of his voice
enter the microphone via reflective acoustic surfaces. This signal is
analyzed by a verification system that makes the binary decision to accept
or reject the user’s identity claim, or possibly reports insufficient
confidence and requests additional input before making the decision. There
is generally a tradeoff between recognition accuracy and the test-session
duration of speech.
Prior to a verification session, users must enrol in the system (typically
under supervised conditions). During this enrolment, voice models are
generated and stored (possibly on a smart card) for use in later verification
sessions. There is also generally a tradeoff between recognition accuracy
and the enrolment-session duration of speech and the number of enrolment
sessions.
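The accept, reject, or request-more-input outcome of a verification session can be sketched as a pair of thresholds on a match score. This is only an illustration: the score range, the `decide` function, and both threshold values are hypothetical and do not come from any of the surveyed systems.

```python
from enum import Enum


class Decision(Enum):
    ACCEPT = "accept"
    REJECT = "reject"
    RETRY = "request additional input"


def decide(score: float, accept_at: float = 0.8, reject_at: float = 0.5) -> Decision:
    """Map a match score (assumed to lie in [0, 1]) to a verification decision.

    Both thresholds are placeholder values chosen for illustration only.
    """
    if reject_at > accept_at:
        raise ValueError("reject_at must not exceed accept_at")
    if score >= accept_at:
        return Decision.ACCEPT   # confident match with the claimed identity
    if score < reject_at:
        return Decision.REJECT   # confident mismatch
    return Decision.RETRY        # insufficient confidence: ask for more speech
```

Widening the gap between the two thresholds makes the system ask for more speech more often, which is one way the tradeoff between accuracy and session duration shows up in practice.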
10. Protocols And Algorithms
Text-dependent algorithm. Text-dependent speaker
recognition is based on saying the same phrase for enrollment
and verification. The algorithm determines whether a voice sample
matches the template that was extracted from a specific phrase.
Two-factor authentication with a passphrase. Each user records
a unique phrase (such as a passphrase or an answer to a “secret
question” that is known only by the person being enrolled).
Text-independent algorithm. This method is more convenient, as
it does not require each user to remember a passphrase.
Automatic voice activity detection. The system detects when
users start and finish speaking.
Liveness detection. A system may request each user to enroll a
set of unique phrases. Later the user will be requested to say a
specific phrase from the enrolled set.
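The slide does not say how voice activity detection is carried out. One simple possibility, shown here only as a sketch, is to threshold the short-time energy of the signal. The function name `detect_speech`, the frame length, and the threshold are all assumptions made for this example.

```python
def detect_speech(samples, frame_len=160, threshold=0.01):
    """Return (start, end) sample indices of the detected speech span, or None.

    A frame counts as speech when its mean squared amplitude exceeds
    `threshold`. Samples are assumed to be floats scaled to [-1, 1].
    """
    active = []
    # Walk over non-overlapping frames; a trailing partial frame is ignored.
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        if energy > threshold:
            active.append(start)
    if not active:
        return None
    # The span runs from the first active frame to the end of the last one.
    return active[0], active[-1] + frame_len
```

A fixed energy threshold is fragile in noisy rooms, so a practical system would likely adapt the threshold to the measured background level.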
11. Identification capability. VeriSpeak functions can be used
in 1-to-1 matching (verification) and 1-to-many
(identification) modes.
Multiple samples of the same phrase. A template may
store several voice records with the same phrase to improve
recognition reliability.
Fused matching. A system may ask users to pronounce
several specific phrases during speaker verification or
identification and match each audio sample against records
in the database.
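The difference between 1-to-1 and 1-to-many matching, and the use of several records per template, can be sketched as follows. This is not the VeriSpeak API: the function names, the cosine-similarity score, the feature vectors, and the threshold are all hypothetical choices made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def template_score(probe, template):
    """Best similarity between the probe and any record in one user's template."""
    return max(cosine(probe, record) for record in template)


def verify_claim(probe, template, threshold=0.9):
    """1-to-1 matching: does the probe match the claimed user's template?"""
    return template_score(probe, template) >= threshold


def identify(probe, database, threshold=0.9):
    """1-to-many matching: return the best-matching user id, or None."""
    best_id, best_score = None, threshold
    for user_id, template in database.items():
        score = template_score(probe, template)
        if score >= best_score:
            best_id, best_score = user_id, score
    return best_id
```

Storing several records per user means a probe only has to be close to one of them, which is the sense in which multiple samples improve reliability.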
12. Text Independent Algorithm
This method involves the training of speech patterns and
recognition of patterns via pattern comparison. This type
of characterization of speech via training is called pattern
classification.
1. Compute the power spectrum of the windowed speech.
2. Group the spectrum into 21 critical bands on the Bark or mel
scale (for a sampling frequency of 16 kHz).
3. Perform loudness equalization and cube-root compression
to simulate the power law of hearing.
4. Perform the IFFT.
5. Perform LP analysis using the Levinson-Durbin procedure.
6. Convert the LP coefficients into cepstral coefficients.
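Steps 5 and 6 can be sketched in a few lines. The example takes an autocorrelation sequence as input (in the pipeline above this would come out of the IFFT in step 4) and assumes the predictor convention x[n] ≈ a_1 x[n-1] + ... + a_p x[n-p]. The function names are illustrative, and the gain term c_0 is left out.

```python
def levinson_durbin(r, order):
    """Solve for LP coefficients a_1..a_p from autocorrelation r[0..p].

    Returns (coefficients, prediction_error).
    """
    a = [0.0] * (order + 1)          # a[0] is unused padding
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0:
            raise ValueError("autocorrelation is not positive definite")
        # Reflection coefficient for the order-i predictor.
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err


def lpc_to_cepstrum(a, n_ceps):
    """Convert LP coefficients (a[0] holds a_1) to cepstral coefficients c_1..c_n."""
    p = len(a)
    c = [0.0] * (n_ceps + 1)         # c[0] (gain term) is not computed
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]
```

For a first-order process with autocorrelation `[1.0, 0.5, 0.25]`, `levinson_durbin(..., 2)` gives coefficients `[0.5, 0.0]`, and the cepstrum follows the known closed form c_n = 0.5^n / n.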
13.
14. L training vectors can be clustered into a set of M codebook
vectors using the K-means clustering algorithm.
Clusters are formed in such a way that they capture the
characteristics of the training data distribution. It is observed that
the Euclidean distance is small for the most frequently occurring
vectors and large for the least frequently occurring ones.
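A minimal sketch of building such a codebook with K-means is given below. It assumes random initialisation from the training vectors and a fixed number of iterations. The function names and default parameters are illustrative, not taken from the slides.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance (same nearest-neighbour ordering as Euclidean)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(vec, codebook):
    """Index of the code vector closest to `vec`."""
    return min(range(len(codebook)), key=lambda m: sq_dist(vec, codebook[m]))


def train_codebook(vectors, m, iterations=20, seed=0):
    """Cluster L training vectors into M code vectors with plain K-means."""
    rng = random.Random(seed)
    codebook = [list(v) for v in rng.sample(vectors, m)]
    for _ in range(iterations):
        clusters = [[] for _ in range(m)]
        for v in vectors:                      # assignment step
            clusters[nearest(v, codebook)].append(v)
        for i, members in enumerate(clusters):  # update step
            if members:                        # keep old vector if cluster is empty
                dim = len(members[0])
                codebook[i] = [sum(v[d] for v in members) / len(members)
                               for d in range(dim)]
    return codebook


def average_distortion(vectors, codebook):
    """Mean squared distance from each vector to its nearest code vector."""
    return sum(sq_dist(v, codebook[nearest(v, codebook)]) for v in vectors) / len(vectors)
```

One common way to use such codebooks in speaker recognition is to train one per enrolled speaker and attribute a test utterance to the codebook with the lowest average distortion.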