Kanru Hua (華侃如)
June 19, 2016
The Past, Present and Future
of Singing Voice Modeling
Motivation
“You are making too many assumptions, this thing won’t work on real
speech signal.”
— Jont B. Allen
● What’s wrong with current and past research in this area?
● What’s our next step?
What’s in a Speech/Singing Synthesizer
Text / Music Score → Parameter Generator → Vocoder → Speech Audio
Parameter Generator: generates pitch, duration and spectrum… from input
Vocoder: generates waveform from parameters
Part 1
History of Speech
Analysis/Synthesis
(http://clas.mq.edu.au/speech/synthesis/history_synthesis/)
History of Math & Acoustics
1600-1700: Newton. Law of Forces/Motions, Foundation of Calculus
1700-1800: Bernoulli, Euler, d'Alembert. Wave Equation, Complex Number
1800-1900: Gauss, Fourier, Laplace, Riemann, Cauchy, Kirchhoff, Heaviside. Fourier/Laplace Transform, Analog Circuits & Electromagnetism
1900-2000: Filtering Theory, Digital Systems, Sampling Theory, ...
(http://www2.ling.su.se/staff/hartmut/kemplne.htm)
History of Math & Acoustics (continued)
[Figure: equivalence diagram (= =), labeled "Frequency Response"]
Source-Filter Model
Lung → Vocal Folds → Vocal Tract → Lip
Signal Generator (Source) → Filter 1 → Filter 2
Signal Generator → Filter 0 → Filter 1 → Filter 2
[Figure: one plot over time t and two plots over frequency f]
20th Century, the Dawn of Speech Processing
Cooley and Tukey (1965): Fast Fourier Transform
Oppenheim (1969): one of the earliest digital implementations of speech analysis/synthesis
[Diagram: Input → Analysis → Pitch (source) and Cepstrum (vocal tract filter) → Synthesis → Output; additional label: Spectrum]
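The analysis half of the diagram above can be sketched with a cepstrum. This is a minimal illustration, not Oppenheim's original system: the function name `cepstral_split`, the 2 ms lifter cutoff and the 60-500 Hz pitch search range are assumptions chosen for the example. Low quefrencies are kept as the smooth vocal tract envelope; the strongest high-quefrency peak is read as the pitch period.

```python
import numpy as np

def cepstral_split(frame, sample_rate, cutoff_ms=2.0, f0_range=(60.0, 500.0)):
    """Split one frame into a log spectral envelope and a pitch estimate.

    cutoff_ms and f0_range are illustrative assumptions, not values from the slides.
    """
    n = len(frame)
    spectrum = np.fft.rfft(frame * np.hanning(n))
    log_mag = np.log(np.abs(spectrum) + 1e-12)   # small floor avoids log(0)
    cepstrum = np.fft.irfft(log_mag, n)          # real cepstrum (real, even)

    # Low quefrency part -> slowly varying envelope (vocal tract filter).
    cutoff = int(sample_rate * cutoff_ms / 1000.0)
    lifter = np.zeros(n)
    lifter[:cutoff] = 1.0
    lifter[n - cutoff + 1:] = 1.0                # keep the symmetric half too
    envelope_log = np.fft.rfft(cepstrum * lifter).real

    # High quefrency peak -> pitch period of the source.
    q_lo = int(sample_rate / f0_range[1])
    q_hi = int(sample_rate / f0_range[0])
    period = q_lo + int(np.argmax(cepstrum[q_lo:q_hi]))
    return envelope_log, sample_rate / period
```

Feeding it a frame of a few pitch periods (for example 1024 samples at 16 kHz) returns the log envelope on the `rfft` frequency grid together with an f0 estimate in Hz.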
Family Tree of Speech A/S Algorithms
[Tree diagram: node labels listed below; the connecting edges are not preserved]
Homomorphic Filtering (Oppenheim, 1969)
STRAIGHT (Kawahara, 1998)
WORLD1 (Morise, 2009)
WORLD2 (Morise, 2013)
TANDEM-STRAIGHT (Kawahara & Morise, 2007)
PSOLA (?, 1985)
Phase Vocoder (Flanagan et al, 1966)
Source-Filter Model
Sinusoidal Model (McAulay & Quatieri, 1986)
SMS (Serra, 1989)
Autotune
CELP (Atal & Schroeder, 1983)
LSP/LSF (Itakura, 1975)
MGC/MLSA (Imai, et al., 1983)
Sinsy
Melodyne
& NiaoNiao
& tn_fnds
Harmonic+Noise (Stylianou, 1993)
NBVPM (Bonada, 2004)
WBVPM (Bonada, 2008)
Vocaloid
Vocaloid 2+
RUCE (Rocaloid 4)
Rocaloid 3
Sine+Noise+Transient (Levin & Smith, 1998)
CeVIO
Quasi-Harmonic Model (Pantazis, et al., 2008)
Chiptune
Vocaine (Agiomyrgiannakis, 2015)
Linear Prediction (Atal & Schroeder, 1967)
Part 2
What’s Wrong
Quasi-static Assumption
Algorithms affected:
● Homomorphic Filtering
● PSOLA
● Linear Prediction & CELP & MLSA
● Sinusoidal Model
● Harmonic+Noise Model
● SMS & NBVPM
● WORLD & STRAIGHT (slightly)
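A small numerical sketch of why the quasi-static assumption matters, under assumed values (40 ms Hann frame, 16 kHz sampling, a 200 Hz glide inside the frame; none of these come from the slides). Frame-based analysis treats the signal as stationary within each frame, so a partial whose frequency moves during the frame shows up as a smeared spectral peak rather than a sharp one. The helper `peak_width_hz` is hypothetical.

```python
import numpy as np

def peak_width_hz(frame, sample_rate, pad=8):
    """Width in Hz of the region within 6 dB of the strongest spectral peak."""
    n = len(frame)
    mag = np.abs(np.fft.rfft(frame * np.hanning(n), pad * n))  # zero-padded FFT
    freqs = np.fft.rfftfreq(pad * n, 1.0 / sample_rate)
    above = np.flatnonzero(mag >= 0.5 * mag.max())
    return freqs[above[-1]] - freqs[above[0]]

sample_rate = 16000
t = np.arange(int(0.040 * sample_rate)) / sample_rate  # one 40 ms frame

# Stationary partial at 440 Hz: satisfies the quasi-static assumption.
steady = np.sin(2 * np.pi * 440.0 * t)

# Partial gliding from 440 Hz to 640 Hz within the same frame: violates it.
chirp_rate = 200.0 / 0.040                             # Hz per second
glide = np.sin(2 * np.pi * (440.0 * t + 0.5 * chirp_rate * t ** 2))

steady_width = peak_width_hz(steady, sample_rate)
glide_width = peak_width_hz(glide, sample_rate)
print(f"steady: {steady_width:.1f} Hz, glide: {glide_width:.1f} Hz")
```

The glide is deliberately fast to make the effect obvious; slower pitch movement produces the same kind of smearing to a smaller degree.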
Mis-represented Aperiodic Component
Popular belief:
1. Speech = periodic signal + aperiodic signal (breathing noise)
2. Aperiodic signal is filtered white noise
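The two-part belief above amounts to the following signal model, sketched here with assumed parameters (ten harmonics with 1/k amplitudes, a two-tap FIR noise filter, a fixed noise gain). It is an illustration of the belief the slides criticize, not a recommended synthesizer, and `harmonic_plus_noise` is a hypothetical name.

```python
import numpy as np

def harmonic_plus_noise(f0, sample_rate, duration, n_harmonics=10,
                        noise_gain=0.05, seed=0):
    """Return (periodic, aperiodic) components under the 'popular belief' model."""
    t = np.arange(int(sample_rate * duration)) / sample_rate

    # 1. Periodic part: a sum of harmonics of f0 (amplitudes 1/k are an assumption).
    k = np.arange(1, n_harmonics + 1)[:, None]
    periodic = np.sum(np.sin(2 * np.pi * f0 * k * t) / k, axis=0)

    # 2. Aperiodic part: white noise through a fixed, time-invariant filter.
    rng = np.random.default_rng(seed)
    white = rng.standard_normal(len(t))
    aperiodic = noise_gain * np.convolve(white, [1.0, -0.9], mode="same")
    return periodic, aperiodic

periodic, aperiodic = harmonic_plus_noise(200.0, 16000, 0.1)
speech_like = periodic + aperiodic
```

Note that the noise term here has the same statistics at every instant: nothing ties it to the phase of the periodic part.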
Aperiodic
Periodic (Friction)
Mis-represented Aperiodic Component
Algorithms affected:
● (Quasi-)Harmonic+Noise Model
● SMS & Sines+Noise+Transients Model
● WORLD & (TANDEM-)STRAIGHT
● Algorithms that do not model aperiodic component
○ Phase vocoder, CELP, MLSA, ...
Over-simplified Source-Filter Model
[Diagram labels: Oscillator, Source Filter, Tract Filter, Lip Filter]
Assumption: the source filter is independent of pitch
Equivalent assumption:
“When my pitch is higher by 12 semitones, my vocal folds still
oscillate at the same speed.”
Affected algorithms: all of those listed under “Quasi-static Assumption”
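The assumption can be made concrete with a toy pulse train. In this sketch the "source filter" is a fixed 2 ms Hann-shaped pulse (an arbitrary stand-in, not a glottal model from the slides) and the helpers `pulse_train` and `duty_cycle` are hypothetical. Because the pulse shape does not change with pitch, raising the pitch by 12 semitones leaves each pulse exactly as long in milliseconds, so it occupies twice the fraction of every cycle.

```python
import numpy as np

def pulse_train(f0, sample_rate, duration, pulse):
    """Impulse train at f0 convolved with a fixed (pitch-independent) pulse."""
    n = int(sample_rate * duration)
    period = int(round(sample_rate / f0))
    impulses = np.zeros(n)
    impulses[::period] = 1.0
    return np.convolve(impulses, pulse)[:n]

def duty_cycle(signal, threshold=0.1):
    """Fraction of samples where the pulse is 'on' (above 10% of its maximum)."""
    return float(np.mean(signal > threshold * signal.max()))

sample_rate = 16000
fixed_pulse = np.hanning(32)            # 2 ms pulse, same shape at every pitch

low = pulse_train(100.0, sample_rate, 0.1, fixed_pulse)
high = pulse_train(200.0, sample_rate, 0.1, fixed_pulse)   # +12 semitones

duty_low = duty_cycle(low)
duty_high = duty_cycle(high)
print(f"pulse occupies {duty_low:.2f} of the cycle at 100 Hz, "
      f"{duty_high:.2f} at 200 Hz")
```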
Part 3
Future: How to Fix &
the Low Level Speech Model
“Neoclassical” Approaches to Speech Modeling
[Figure labels: Tract, Source, Lip, Input, Inverse; axes: t, f, f]
Linear Prediction (Atal & Schroeder, 1967)
ARX (Wen, et al., 1995)
ARX-LF (Vincent, et al., 2005)
LF Model (Liljencrants, Fant and Lin, 1985)
OVE Synthesizer (Fant, 1953)
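As a rough illustration of the inverse-filtering idea behind the first entry in this list, the sketch below estimates an all-pole filter with the autocorrelation method of linear prediction and then applies the inverse filter to recover an estimate of the excitation. It covers plain linear prediction only; ARX and ARX-LF additionally model the glottal source and are not shown. The function names `lpc` and `inverse_filter` are assumptions for this example.

```python
import numpy as np

def lpc(signal, order):
    """Autocorrelation-method linear prediction.

    Returns A(z) coefficients [1, a1, ..., ap] for the model
    x[n] + a1*x[n-1] + ... + ap*x[n-p] = e[n].
    """
    n = len(signal)
    r = np.array([np.dot(signal[:n - k], signal[k:]) for k in range(order + 1)])
    # Toeplitz normal equations: R a = -r[1:]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, -r[1:])
    return np.concatenate(([1.0], a))

def inverse_filter(signal, a):
    """Apply the FIR inverse filter A(z) to estimate the excitation e[n]."""
    return np.convolve(signal, a)[:len(signal)]
```

For a signal produced by an all-pole filter driven by white noise, `lpc` recovers the filter coefficients and `inverse_filter` returns a residual close to the driving noise. On real voiced speech the residual mixes glottal and lip-radiation effects.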
“Neoclassical” Approaches to Speech Modeling
Degottex (2013): similar idea, but in frequency domain
Hua (2016, in progress): more robust under poor recording conditions; less
sensitive to processed input.
The Low Level Speech Model (new version)
Level 0 (Signal Level)
Input Signal → Pitch, Harmonic Model, Noise Model → Output Signal
Noise Model: Spectrum; Channel 1 Energy, Channel 2 Energy, Channel 3 Energy, ... (each with a Harmonic Model)
Level 1 (Acoustic Level)
Glottal/Source Information (LF Model), Vocal Tract Filter, Lip Filter
An acoustically meaningful speech model
Inverse Analysis of Speech
Original
Glottal Flow (Source Signal)
Pitch Shifting powered by LLSM
Original
50% Pitch
200% Pitch
Instants of vocal fold closure were revealed
References
● Oppenheim, A. V. "Speech Analysis-Synthesis System Based on Homomorphic Filtering." Journal of the Acoustical Society of America 45.2 (1969).
● Degottex, Gilles, et al. "Mixed source model and its adapted vocal tract filter estimate for voice transformation and synthesis." Speech Communication 55.2 (2013): 278-294.
● Dunn, H. K. "The calculation of vowel resonances, and an electrical vocal tract." Journal of the Acoustical Society of America 22 (1950): 740-753.
● Pantazis, Yannis, and Yannis Stylianou. "Improving the modeling of the noise part in the harmonic plus noise model of speech." IEEE International Conference on Acoustics, Speech and Signal Processing (2008).
