3. Speech… How?
[Diagram: Sender → Message → Receiver, carried over a Channel that also adds Noise; the Message = Signal + Protocol]
4. Computer Analogy
[Diagram: human Speech Production ↔ computer Speech Synthesis (TTS: text → speech); human Speech Perception ↔ computer Speech Recognition (ASR: speech → text)]
5. Recognition Made Easy
I bought a boat. (English)
افرنقعوا أيها المتكأكئين (Arabic: roughly "Disperse, you who have crowded around!")
gute Nacht (German: "good night")
[Diagram: Feature Extraction → Decoder (Search), which consults the Grammar, Lexicon, and Phone Models]
6. Recognizer Characteristics
Discrete words / continuous speech
Read / spontaneous speech
Speaker dependent / independent
Small / large vocabulary
Finite state / context sensitive language model
7. What to study
Phonetics and Phonology (Linguistics)
Speech Signal Processing (DSP)
Pattern Recognition (AI)
Hidden Markov Models (HMMs)
Artificial Neural Networks (ANNs)
Hybrid ANN-HMM
8. Phonetics
Phonetics: study of the production, perception, and physical properties of speech sounds
Phonology: describes the way sounds function within a given language and how they are combined and organized
Phoneme: the smallest phonetic unit in a language that is capable of conveying a distinction in meaning
E.g. boat-bought, car-jar, نشاط-شمس, أرض-أحمد
9. Speech Signal Processing
Sampling
Rate: e.g. 16 kHz
Sample size: e.g. 16 bits
Format: PCM (.wav files)
Time or frequency domain features?
Spectrogram: represents the time-varying spectrum of a signal (x = time, y = frequency, intensity = energy)
How can we represent the features? Filter banks, LPCs, MFCCs
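The slide's time-versus-frequency question can be made concrete: a spectrogram is nothing more than the magnitude spectrum of successive short frames of the signal, stacked over time. Below is a minimal sketch in plain Python of the spectrum of a single frame, using a naive DFT; a real front end would use the FFT, windowing, and overlapping frames.

```python
import math

def dft_magnitudes(frame):
    """Naive DFT: magnitude of each non-negative frequency bin of one frame."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        re = sum(frame[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = -sum(frame[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        mags.append(math.hypot(re, im))
    return mags

# one 64-sample frame holding a pure tone with 4 cycles per frame
frame = [math.sin(2 * math.pi * 4 * t / 64) for t in range(64)]
mags = dft_magnitudes(frame)
peak_bin = max(range(len(mags)), key=lambda k: mags[k])
print(peak_bin)   # 4: all the energy sits in bin 4
```

Repeating this for every 10–25 ms frame of an utterance gives the (x = time, y = frequency, intensity = energy) picture the slide describes; filter banks and MFCCs are further compressions of exactly this spectrum.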
What is the need for speech technology? Why do we need to develop computer technologies that tackle human speech? Speech is the easiest way for people to communicate, so why not communicate with computers by the same means? It would be really useful. Recall the definition of AI: solving problems that humans still do better than machines. A very young child can speak, hear, and understand you, yet cannot read, write, or even do simple calculations. That is why we need speech technology.
Like any communication system, the speech communication process comprises a message that must be carried from a sender to a receiver through a channel. The sender turns the message in his brain into commands to the speech apparatus: a particular configuration of the vocal cords, throat, tongue, lips, and lungs. The air passing through all of these is shaped into compressions and rarefactions with a particular vibration pattern, which travel through the air to the other side. The receiver collects these signals with the outer ear; they strike the eardrum and cross the middle ear, via the hammer, anvil, and stirrup, into the cochlea of the inner ear, where they become nerve signals. The brain then searches for their meaning until it understands the message. Of course, the medium also carries other signals spreading through the air, such as the sound of a fan, cars, students outside, buzzing, and so on. All of this is called noise: the channel does not carry only the sender's signal, it carries many signals combined into one complex signal, and the receiver must do some processing to filter it first. But if all of this happens, will you be able to understand the incoming signal after all that processing and filtering? Imagine that I am talking in Japanese and you understand only German! Or will an air conditioner understand the signal transmitted by a TV remote control? So the message is not just a signal; it is a signal plus a communication protocol agreed upon between sender and receiver. In speech, the message = signal + language.
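The note's point that the receiver must first filter the channel's combined signal can be sketched numerically: a hypothetical low-frequency "message" tone plus high-frequency interference, cleaned with a crude moving-average low-pass filter. The signal lengths, frequencies, and helper names below are invented for illustration only.

```python
import math

def moving_average(x, w=5):
    """Crude low-pass filter: replace each sample by the mean of a w-sample window."""
    half = w // 2
    out = []
    for i in range(len(x)):
        window = x[max(0, i - half):i + half + 1]
        out.append(sum(window) / len(window))
    return out

n = 200
clean = [math.sin(2 * math.pi * 5 * t / n) for t in range(n)]          # the "message"
hiss = [0.5 * math.sin(2 * math.pi * 60 * t / n) for t in range(n)]    # channel noise
noisy = [c + h for c, h in zip(clean, hiss)]

def rms_error(a, b):
    """Root-mean-square difference between two equal-length signals."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

filtered = moving_average(noisy)
print(rms_error(noisy, clean) > rms_error(filtered, clean))   # True: filtering helps
```

The averaging passes the slow tone almost untouched while strongly attenuating the fast interference, so the filtered signal sits much closer to the original message, which is exactly the receiver-side clean-up step the note describes.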
Our focus is mainly on ASR. Note: besides the microphone and speaker, the sound card in the computer, with its A/D and D/A converters, plays the role of the ear and mouth (the physical part of speech processing). Note: the microphone converts acoustic pressure (the compressions and rarefactions of the sound wave) into an electrical analog signal; the speakers do the opposite operation.
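The A/D step the sound card performs amounts to sampling plus quantization. A minimal sketch using the 16 kHz / 16-bit PCM figures from the slide (the 440 Hz tone and the helper name are invented for illustration):

```python
import math

SAMPLE_RATE = 16000      # 16 kHz, as on the slide
FULL_SCALE = 32767       # largest positive value of a signed 16-bit sample

def quantize_16bit(x):
    """Map an 'analog' amplitude in [-1.0, 1.0] to a signed 16-bit PCM sample."""
    return max(-32768, min(32767, round(x * FULL_SCALE)))

# 10 ms of a 440 Hz tone, sampled at 16 kHz: 160 discrete samples
n_samples = SAMPLE_RATE // 100
pcm = [quantize_16bit(math.sin(2 * math.pi * 440 * t / SAMPLE_RATE))
       for t in range(n_samples)]
print(len(pcm))   # 160
```

These integer samples are exactly what a PCM .wav file stores; the D/A converter on playback runs the mapping in reverse.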
After the audience hear the three sentences from you (without displaying them), ask them what they understood from each utterance.

You will not understand the third sentence, assuming you know English only (no Arabic or German): your ear notices a strange sound (the German ch, like Arabic خ) that it cannot perceive.

In the second sentence (assuming you know Arabic), your ear can perceive every pronounced sound (you have what are called phone models in the sound database in your brain), and by intuition you can get the sentence structure, an imperative, because you have the language grammar in your brain too. But you still cannot understand the sentence, because the words you heard have no entries in your dictionary (the word lexicon in your brain).

For the first sentence (assuming you know English), the uttered sounds are fine and so are the words, but two of the words have almost the same pronunciation. Only with the aid of the language grammar can you work out that the first is a verb while the second is a noun.

From this example it becomes clear that speech perception is a search process the brain performs in a fraction of a second, trying to find the best match for the heard utterance given a large knowledge base made up of the language sounds, a word dictionary, and the language grammar. From here comes the structure of a speech recognition engine.
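The search described above can be caricatured in a few lines: a toy decoder that, given the phones it "heard", finds the grammar sentence whose expected phone string (built via the lexicon) is closest in edit distance. All of the lexicon entries, grammar sentences, and phone labels below are invented purely for the sketch; a real decoder searches a vastly larger space with probabilistic scores.

```python
# Toy knowledge base: phone models stand in for the lexicon's phone strings.
LEXICON = {
    "i": ["ay"], "a": ["ah"],
    "bought": ["b", "ao", "t"], "caught": ["k", "ao", "t"], "boat": ["b", "ow", "t"],
}
GRAMMAR = [  # the only sentences the language model allows
    ["i", "bought", "a", "boat"],
    ["i", "caught", "a", "boat"],
]

def edit_distance(a, b):
    """Plain Levenshtein distance between two phone sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]

def decode(heard_phones):
    """Return the grammar sentence whose expected phones best match what was heard."""
    def cost(sentence):
        expected = [p for w in sentence for p in LEXICON[w]]
        return edit_distance(heard_phones, expected)
    return min(GRAMMAR, key=cost)

heard = ["ay", "b", "ao", "t", "ah", "b", "ow", "t"]   # an idealized hearing
print(" ".join(decode(heard)))                          # "i bought a boat"
```

The three knowledge sources of the slide appear explicitly: phone models (the phone symbols), the lexicon (word to phones), and the grammar (allowed word sequences); the decoder is the search over their combination.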
Read speech: the text is known and expected before it is spoken. Spontaneous speech: unplanned, unexpected speech. Speaker-dependent: the engine needs to build a special profile for every user and be trained on their voice and way of speaking before it can run properly and give acceptable results. Finite-state language model: a small, limited set of sentences, such as telephone numbers. Context-sensitive language model: unrestricted and dependent on the context of the speech; it needs a complicated NLP system.
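A finite-state language model of the kind described here can be tiny. Below is a sketch of an automaton that accepts only "call" followed by one or more digit words, in the spirit of the telephone-number example; the vocabulary and state names are invented for the illustration.

```python
# A minimal finite-state language model: the automaton accepts only sentences
# of the form "call <digit> <digit> ...".
DIGITS = {"zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"}

def accepts(words):
    """Run the word sequence through the automaton; True iff it ends in a final state."""
    state = "start"
    for w in words:
        if state == "start" and w == "call":
            state = "need_digit"
        elif state in ("need_digit", "digit") and w in DIGITS:
            state = "digit"              # self-loop: any number of digits
        else:
            return False                 # word not allowed here: reject
    return state == "digit"              # final state: "call" plus >= 1 digit

print(accepts("call five five one".split()))   # True
print(accepts("call me maybe".split()))        # False
```

Because the set of accepted sentences is so constrained, the recognizer's search space stays small; a context-sensitive model has no such enumerable automaton, which is why it needs a full NLP system.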
Phonology answers the question: which sounds exist in this language? Phonetics answers the question: what are the properties of these sounds? Phonetics studies the sounds of languages from three basic points of view: their production in the vocal organs (articulatory phonetics), their physical properties (acoustic phonetics), and their effect on the ear (auditory phonetics).
When a child starts learning: he sees a dog, asks you what it is, and you tell him it is a dog. After that, when he sees a donkey or a cat, he points to it and says it is a dog; you tell him no, this is a donkey and this is a cat. Later he points to your cat and says it is a cat; you tell him, no, it is not just a cat, it is my cat, and its name is Poosy. This is the idea of a model. At first the child made a model in his mind mapping any four-legged creature to "dog". Then he narrowed his model to dogs, donkeys, and cats; then he narrowed it again to the cat Poosy.

The same idea applies to a mathematical model. Depending on your system's size and nature, you choose the granularity of your models. If your system only recognizes one of three fixed sentences, you might make one HMM per sentence. If it searches a dictionary of 10 words, make an HMM for each word. If it searches over combinations of words in different orders, narrow your models down to the level of sub-word units: tri-phones, mono-phones, or even allophones, according to the system size and the search-tree size and depth the system can bear. Note that the number of states in your model is a function of the model size you choose, i.e. a function of the feature vectors, or in other words of the time length of the unit of utterance you build a model for (ranging usually from a whole word down to a sub-phone).
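The per-unit HMMs discussed above can be illustrated with a minimal, invented two-state model decoded by the Viterbi algorithm. All numbers and the discrete "lo"/"hi" observation alphabet are made up for the sketch; a real recognizer uses one such model per word, tri-phone, or phone, with emission densities over feature vectors.

```python
# A minimal two-state HMM decoded with the Viterbi algorithm.
STATES = ["s1", "s2"]
START = {"s1": 0.8, "s2": 0.2}                                        # initial probs
TRANS = {"s1": {"s1": 0.6, "s2": 0.4}, "s2": {"s1": 0.1, "s2": 0.9}}  # transitions
EMIT = {"s1": {"lo": 0.7, "hi": 0.3}, "s2": {"lo": 0.2, "hi": 0.8}}   # emissions

def viterbi(obs):
    """Most likely hidden-state path for an observation sequence."""
    v = {s: START[s] * EMIT[s][obs[0]] for s in STATES}
    path = {s: [s] for s in STATES}
    for o in obs[1:]:
        nv, npath = {}, {}
        for s in STATES:
            best_prev = max(STATES, key=lambda p: v[p] * TRANS[p][s])
            nv[s] = v[best_prev] * TRANS[best_prev][s] * EMIT[s][o]
            npath[s] = path[best_prev] + [s]
        v, path = nv, npath
    return path[max(STATES, key=lambda s: v[s])]

print(viterbi(["lo", "lo", "hi", "hi"]))   # ['s1', 's1', 's2', 's2']
```

The number of states here is fixed at two only because the toy unit is short; as the note says, a model covering a whole word needs more states than one covering a single sub-phone.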