2. OUTLINEOUTLINE
What is speech Synthesis?What is speech Synthesis?
Concatenative SynthesisConcatenative Synthesis
Concatenative Synthesizer ModelConcatenative Synthesizer Model
Bangla keyword setBangla keyword set
Classification of KeywordsClassification of Keywords
Independent KeywordIndependent Keyword
Dependent keywordDependent keyword
Database ModelingDatabase Modeling
Speech Synthesis ProcessSpeech Synthesis Process
Synthesizer ComplexitySynthesizer Complexity
PerformancePerformance
ConclusionsConclusions
ReferencesReferences
2
3. 3
Primary communication process to exchanged
information among people is speech.
In the modern era of information technology we can
expect to carry out spoken dialogue with computers by the
speech technology.
A text to speech synthesis technology creating synthetic
voice from text that has integrated language and speech for
human-computer interaction.
Abstract
4. What isWhat is SSpeechpeech SSynthesisynthesis??
Speech Synthesis is the artificial production ofSpeech Synthesis is the artificial production of
human speech.human speech.
The speech synthesizer is device that is used toThe speech synthesizer is device that is used to
translate text characters into sounds thattranslate text characters into sounds that
approximate the sound of human speech.approximate the sound of human speech.
Speech synthesizer also known as Text-to-Speech synthesizer also known as Text-to-
Speech(TTS).Speech(TTS).
4
5. Synthesizer TechnologySynthesizer Technology
HMM-based synthesisHMM-based synthesis
Formant synthesisFormant synthesis
Concatenation synthesisConcatenation synthesis
Diphone synthesisDiphone synthesis
Sinewave synthesisSinewave synthesis
And so on...And so on...
5
6. Concatenative SynthesisConcatenative Synthesis
Concatenative synthesis is based on theConcatenative synthesis is based on the
concatenation of segments of recorded speech.concatenation of segments of recorded speech.
Concatenative synthesis technology can beConcatenative synthesis technology can be
created by concatenating number of recordedcreated by concatenating number of recorded
voice that are stored in a database as audio file.voice that are stored in a database as audio file.
To return the synthesizer speech, a keyword isTo return the synthesizer speech, a keyword is
taken as input and searched from the databasetaken as input and searched from the database
and returning the output as speech.and returning the output as speech.
6
8. KeywordKeyword
Keyword is a unit of organization for a sequenceKeyword is a unit of organization for a sequence
of speech sounds.of speech sounds.
For example, the word “For example, the word “ ”বাংলা”বাংলা consist of twoconsist of two
keywords, one is “keywords, one is “ ”বাং”বাং and other is “and other is “ ”লা”লা ..
8
9. Bangla keyword setBangla keyword set
Bangla keywords is collected by the Bangla literacyBangla keywords is collected by the Bangla literacy
books:books:
1)1) িরেক্তর েবদনিরেক্তর েবদন(Riktar Badon) Written by(Riktar Badon) Written by কাজী নজরুল ইসলামকাজী নজরুল ইসলাম(Kazi(Kazi
Nazrul Islam);Nazrul Islam);
2)2) দুেগশনিন্দনীদুেগশনিন্দনী (Durgashnandini)written by(Durgashnandini)written by বিঙ্কমচন্দর্ চেট্টাপাধয্া য়বিঙ্কমচন্দর্ চেট্টাপাধয্া য়
(Bumkimchandro Chittopadhai);(Bumkimchandro Chittopadhai);
3)3) েশষ পর্শ্নেশষ পর্শ্ন (Shas Prasno) written by(Shas Prasno) written by ৎশর চন্দর্চ েট্টাপাধয্া য়ৎশর চন্দর্চ েট্টাপাধয্া য়
(Shratchandro Chittopadhai);(Shratchandro Chittopadhai);
4)4) েমঘনাবদ কাবয্েমঘনাবদ কাবয্ (Magnabod Kabbo)written by(Magnabod Kabbo)written by মাইেকল মধুসূদনমাইেকল মধুসূদন
দত্তদত্ত (Maikal Modhosudon(Maikal Modhosudon Dotto).Dotto).
9
11. Independent KeywordIndependent Keyword
A keyword that is constructed by only one letter.A keyword that is constructed by only one letter.
a) Vowel(a) Vowel(সব্রবণসব্রবণ): A speech sound that is produced): A speech sound that is produced
by comparatively open configuration of theby comparatively open configuration of the
vocal tract likevocal tract like অঅ,, আআ,, ইই,, ঈঈ and so on.and so on.
b) Consonant(b) Consonant(বয্ঞ্জনবণবয্ঞ্জনবণ): A basic speech sound in): A basic speech sound in
which the breath is at least partly obstructed andwhich the breath is at least partly obstructed and
combined with a vowel to form a syllable likecombined with a vowel to form a syllable like কক,,
খখ,, গগ,, ঘঘ and so on.and so on.
11
12. Dependent keywordDependent keyword
A keyword is constructed by one or more consonant withA keyword is constructed by one or more consonant with
combining kar(combining kar(কারকার) (smallest term of vowel. i.e,) (smallest term of vowel. i.e,াা িা াু ৈাাা িা াু ৈা andand
like this ) or fola(like this ) or fola(ফলাফলা))
a) Modifier Character: A keyword that is constructed by onea) Modifier Character: A keyword that is constructed by one
consonant with kar (consonant with kar (কারকার ) for example) for example কাকা,, েঢেঢ,, তাতা,, রুরু,, নৃনৃ,, েলেল,,
েগােগা,, ঞীঞী and like this.and like this.
b) Compound Character: A keyword which is the combination ofb) Compound Character: A keyword which is the combination of
two or more consonants . For example,two or more consonants . For example, ক্কক্ক,, ঙ্কঙ্ক,, ত্তত্ত,, দ্ধদ্ধ,, ণ্ঠণ্ঠ,,
ম্ভম্ভ,, প্তপ্ত,, গ্মগ্ম,, জ্ঝজ্ঝ and link this.and link this.
c) Complex Character: A keyword is the combination of bothc) Complex Character: A keyword is the combination of both
modifier character and compound character; For example,modifier character and compound character; For example, ক্কাক্কা
স্তা েঙ্ক াত্তা েণ্ঠ াম্ভাস্তা েঙ্ক াত্তা েণ্ঠ াম্ভাand so on.and so on.
12
13. Speech DatabaseSpeech Database
-A speech database is a collection ofA speech database is a collection of
recorded speech accessible on a computerrecorded speech accessible on a computer
and supported with the necessaryand supported with the necessary
transcriptions.transcriptions.
-In this Speech Synthesizer model, I haveIn this Speech Synthesizer model, I have
used about 1200 keywords.used about 1200 keywords.
13
16. Synthesizer ComplexitySynthesizer Complexity
16
Table 1: Time variation for different keywords
before and after segmentation
Word
(শব্দ)
Total
audio
file
length
L(ms)
2:1
ratio
length
for
letter
“আ”
LA(ms)
Origina
l length
for
“আ”
LO
(ms)
Error(%
) |LA-LO|
----------
L
আিম 1120 746.67 810 5.65%
আজ 960 640 710 7.29%
আর 1050 700 670 2.85%
আম 940 626.67 690 6.73%
Figure 6: Error rate for different
keywords before and after segmentation.
Ratio problem to segmenting recorded voice
17. Synthesizer Complexity(Cont.)Synthesizer Complexity(Cont.)
17
Table 1: Time variation for different speakers Figure 6: Error rate for different speakers
for same word “ ”আজ .
Speaker variation problem
Speaker Audio
file
length
for
Speaker
S (ms)
2:1
ratio
length
for
letter
”আ” SA
(ms)
Original
length
for
letter
“আ” SO
(ms)
Error(%)
|SA-SO|
---------
S
Speaker_1 1150 766.67 820 4.63%
Speaker_2 980 653.34 690 3.74%
Speaker_3 1070 713.34 780 6.22%
Speaker_4 1020 680 750 6.86%
18. Synthesizer Complexity(Cont.)Synthesizer Complexity(Cont.)
18
Confusing letter problem
Some types of Bangla keyword utterance can’t be
detected properly like হসন্ত(ি্), চন্দর্িবন্দু(িঁ), িবসগ(িঃ)
।
For example, the word তারা and তঁারা both utterance are
same.
So the keyword তা and তঁা has no difference.
19. PerformancePerformance
19
Table 1: Time variation for different speakers
In our proposed Concatenative
Bangla Speech Synthesizer Model
the Bangla sentence as an input
(text) to synthesis that time listener
couldn’t fully identify all of the
words. It was found that the
listeners identified about 85% of the
words correctly from the text.
20. ConclusionsConclusions
Using Concatenation speech synthesis algorithmUsing Concatenation speech synthesis algorithm
phonetic contexts has to reduce the audiophonetic contexts has to reduce the audio
waveform discontinuities and the phantomwaveform discontinuities and the phantom
mismatches at the borders.mismatches at the borders.
Our goal is to develop a Bangla TTS applicationOur goal is to develop a Bangla TTS application
that can produce real-time speech from thethat can produce real-time speech from the
input text for human-computer interaction.input text for human-computer interaction.
So, in future we will try to improve the accuracySo, in future we will try to improve the accuracy
about this Bangla speech synthesizer model.about this Bangla speech synthesizer model.
20