The past, present and future of singing synthesis
1. The Past, Present and Future of Singing Voice Modeling
Kanru Hua (華侃如)
June 19, 2016
2. Motivation
“You are making too many assumptions, this thing won’t work on real speech signal.”
- Jont B. Allen
● What’s wrong with current and past research in this area?
● What’s our next step?
3. What’s in a Speech/Singing Synthesizer
Text / Music Score → Parameter Generator → Vocoder → Speech Audio
● Parameter Generator: generates pitch, duration, spectrum… from the input
● Vocoder: generates the waveform from the parameters
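The two-stage structure above can be sketched as a toy program. This is an illustration under loud assumptions: the score format (MIDI note, duration), the names `generate_parameters` and `vocode`, and the fixed 1/k harmonic roll-off are all hypothetical, and this parameter generator emits only a frame-level pitch track, not the spectral parameters a real system would produce.

```python
import math

def generate_parameters(score, frame_rate=200):
    """Hypothetical parameter generator: (MIDI note, seconds) pairs -> per-frame f0 in Hz."""
    f0_track = []
    for midi_note, duration in score:
        f0 = 440.0 * 2.0 ** ((midi_note - 69) / 12.0)   # equal-temperament pitch
        f0_track.extend([f0] * int(round(duration * frame_rate)))
    return f0_track

def vocode(f0_track, frame_rate=200, sample_rate=16000, n_harmonics=10):
    """Hypothetical vocoder: f0 track -> waveform, summing harmonics with 1/k amplitudes."""
    hop = sample_rate // frame_rate          # samples per parameter frame
    phase = 0.0
    out = []
    for f0 in f0_track:
        for _ in range(hop):
            phase += 2.0 * math.pi * f0 / sample_rate   # running phase keeps pitch changes smooth
            s = sum(math.sin(k * phase) / k
                    for k in range(1, n_harmonics + 1)
                    if k * f0 < sample_rate / 2)        # skip harmonics above Nyquist
            out.append(0.3 * s)
    return out

score = [(60, 0.5), (64, 0.5), (67, 1.0)]   # C4, E4, G4
f0_track = generate_parameters(score)
audio = vocode(f0_track)
```

Keeping the interface between the two stages as a plain parameter track is what lets either stage be swapped out independently.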
4. Part 1: History of Speech Analysis/Synthesis
(http://clas.mq.edu.au/speech/synthesis/history_synthesis/)
5. History of Math & Acoustics
● 1600s: Laws of Forces/Motions, Foundation of Calculus (Newton)
● 1700s: Wave Equation, Complex Numbers (Bernoulli, Euler, d'Alembert)
● 1800s: Fourier/Laplace Transform, Analog Circuits & Electromagnetism (Gauss, Fourier, Laplace, Riemann, Cauchy, Kirchhoff, Heaviside)
● 1900s: Filtering Theory, Digital Systems, Sampling Theory, ...
(http://www2.ling.su.se/staff/hartmut/kemplne.htm)
[Figure: Frequency Response]
8. 20th Century, the Dawn of Speech Processing
● Cooley and Tukey (1965): Fast Fourier Transform
● Oppenheim (1969): one of the earliest digital implementations of speech analysis/synthesis
[Diagram: Input → Analysis → Pitch (source) + Cepstrum (vocal tract filter) → Synthesis → Output, with a Spectrum panel]
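Homomorphic (cepstral) analysis of the kind Oppenheim's system is built on can be sketched in a few lines of NumPy. This is a minimal sketch, not the 1969 implementation: the Hann window, the 30-coefficient lifter, the 60-500 Hz pitch search range and the helper names are assumptions chosen for illustration. Low quefrencies give the smooth envelope (vocal tract filter); the largest high-quefrency peak gives the pitch period (source).

```python
import numpy as np

def real_cepstrum(frame):
    """Homomorphic step: windowed FFT -> log magnitude -> inverse FFT."""
    spectrum = np.fft.rfft(frame * np.hanning(len(frame)))
    log_mag = np.log(np.abs(spectrum) + 1e-12)        # small floor avoids log(0)
    return np.fft.irfft(log_mag, n=len(frame))

def split_source_filter(frame, sr, n_lifter=30, f0_min=60.0, f0_max=500.0):
    """Return a pitch estimate (source) and a smooth log-spectral envelope in dB (filter)."""
    c = real_cepstrum(frame)
    # Low-quefrency lifter: keep the first n_lifter coefficients and their mirror images
    lifter = np.zeros(len(c))
    lifter[:n_lifter] = 1.0
    lifter[len(c) - n_lifter + 1:] = 1.0
    envelope_db = (20.0 / np.log(10.0)) * np.fft.rfft(c * lifter).real
    # Largest peak in the allowed quefrency range = pitch period in samples
    q_lo, q_hi = int(sr / f0_max), int(sr / f0_min)
    period = q_lo + int(np.argmax(c[q_lo:q_hi]))
    return sr / period, envelope_db

sr = 16000
t = np.arange(1024) / sr
# Test frame: 200 Hz harmonic tone with 1/k amplitudes, a crude stand-in for voiced speech
frame = sum(np.cos(2 * np.pi * 200 * k * t) / k for k in range(1, 40))
f0, env_db = split_source_filter(frame, sr)
```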
11. Quasi-static Assumption
Assumption: signal parameters (pitch, spectrum) stay constant within each short analysis frame.
Algorithms affected:
● Homomorphic Filtering
● PSOLA
● Linear Prediction & CELP & MLSA
● Sinusoidal Model
● Harmonic+Noise Model
● SMS & NBVPM
● WORLD & STRAIGHT (slightly)
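The cost of the quasi-static assumption can be illustrated by analyzing one frame of a steady tone and one frame of a fast pitch glide with the same frame-based method. This sketch assumes a 128 ms Hann-windowed frame at 16 kHz and a 100 Hz glide inside it (both hypothetical choices). An analyzer that treats the frame as stationary sees the glide's energy smeared over many bins, so its spectral peak comes out markedly lower.

```python
import numpy as np

SR = 16000
FRAME = 2048  # one 128 ms analysis frame (hypothetical choice)

def peak_magnitude(frame, n_fft=16384):
    """Height of the largest spectral peak; zero-padding gives a fine frequency grid."""
    return np.abs(np.fft.rfft(frame * np.hanning(len(frame)), n=n_fft)).max()

t = np.arange(FRAME) / SR
# Stationary tone: exactly what a quasi-static analyzer expects.
steady = np.sin(2 * np.pi * 300.0 * t)
# Linear glide from 250 Hz to 350 Hz within the same frame (fast pitch movement).
f_start, f_end = 250.0, 350.0
rate = (f_end - f_start) / t[-1]                     # Hz per second
glide = np.sin(2 * np.pi * (f_start * t + 0.5 * rate * t ** 2))

ratio = peak_magnitude(glide) / peak_magnitude(steady)
print(f"glide/steady peak ratio: {ratio:.2f}")
```

Singing makes this worse than read speech, since vibrato and portamento routinely move f0 within a single frame.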
12. Mis-represented Aperiodic Component
Popular belief:
1. Speech = periodic signal + aperiodic signal (breathing noise)
2. Aperiodic signal is filtered white noise
[Figure labels: Aperiodic, Periodic (Friction)]
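For reference, the conventional model the slide calls a popular belief can be sketched as an impulse train plus shaped white noise. The details are hypothetical: the first-order FIR shaping filter, the noise gain and the function name are illustrative assumptions. The noise statistics are identical at every point of the glottal cycle, which is the kind of simplification the slide questions.

```python
import numpy as np

def conventional_excitation(f0, sr=16000, dur=0.5, noise_gain=0.1, seed=0):
    """'Popular belief' model: periodic pulses + filtered white noise (hypothetical helper)."""
    n = int(round(sr * dur))
    period = int(round(sr / f0))
    periodic = np.zeros(n)
    periodic[::period] = 1.0                          # impulse train at the pitch period
    rng = np.random.default_rng(seed)
    white = noise_gain * rng.standard_normal(n)       # stationary white noise
    # Fixed FIR (1 - 0.9 z^-1) tilts the noise toward high frequencies, a crude
    # "breathing noise" shaping that never changes within a glottal cycle.
    aperiodic = np.convolve(white, [1.0, -0.9])[:n]
    return periodic, aperiodic, periodic + aperiodic

p, a, s = conventional_excitation(200.0)
```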
13. Mis-represented Aperiodic Component
Algorithms affected:
● (Quasi-)Harmonic+Noise Model
● SMS & Sines+Noise+Transients Model
● WORLD & (TANDEM-)STRAIGHT
● Algorithms that do not model aperiodic component
○ Phase vocoder, CELP, MLSA, ...
14. Over-simplified Source-Filter Model
[Diagram blocks: Oscillator, Source Filter, Tract Filter, Lip Filter]
Assumption: the source filter is independent of pitch.
Equivalent assumption:
“When my pitch is higher by 12 semitones, my vocal folds still oscillate at the same speed.”
Affected algorithms: all of those listed on slide 11
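The equivalent assumption quoted above can be made concrete. If the source filter's impulse response (one glottal pulse) is fixed, raising the pitch only packs the same pulses closer together. The sketch assumes a hypothetical 4 ms raised-cosine pulse as the source filter and reports the open quotient (pulse length divided by period). Going up 12 semitones, from 100 Hz to 200 Hz, doubles it from 0.4 to 0.8, because the folds are effectively still opening and closing at the same speed.

```python
import math

SR = 16000

def source_filter_pulse(open_ms=4.0, sr=SR):
    """Hypothetical fixed source-filter impulse response: one raised-cosine glottal pulse.
    Its length in samples does not depend on pitch."""
    n = int(round(open_ms * 1e-3 * sr))
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]

def excite(f0, dur=0.1, sr=SR):
    """Oscillator (one pulse per period) followed by the fixed source filter.
    Returns the excitation and the open quotient (pulse length / period)."""
    period = int(round(sr / f0))
    pulse = source_filter_pulse(sr=sr)
    out = [0.0] * int(round(dur * sr))
    for start in range(0, len(out), period):
        for i, v in enumerate(pulse):
            if start + i < len(out):
                out[start + i] += v          # overlap-add if consecutive pulses collide
    return out, len(pulse) / period

for f0 in (100.0, 200.0):                    # 200 Hz is 12 semitones above 100 Hz
    _, oq = excite(f0)
    print(f"f0 = {f0:.0f} Hz -> open quotient {oq:.2f}")
```

Above 250 Hz the period in this sketch becomes shorter than the 64-sample pulse, giving an open quotient above 1, which no real glottal cycle can have.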
16. “Neoclassical” Approaches to Speech Modeling
[Diagram: Source → Tract → Lip, shown in time (t) and frequency (f); inverse filtering of the input]
● OVE Synthesizer (Fant, 1953)
● Linear Prediction (Atal & Schroeder, 1967)
● LF Model (Liljencrants, Fant and Lin, 1985)
● ARX (Wen, et al., 1995)
● ARX-LF (Vincent, et al., 2005)
17. “Neoclassical” Approaches to Speech Modeling
Degottex (2013): similar idea, but in the frequency domain
Hua (2016, in progress): more robust under poor recording conditions; less
sensitive to processed input.
18. The Low Level Speech Model (new version)
● Level 0 (Signal Level): Input Signal → Pitch, Harmonic Model, Noise Model (Spectrum; Channel 1, 2, 3, ... Energy) → Output Signal
● Level 1 (Acoustic Level): Glottal/Source Information (LF Model), Vocal Tract Filter, Lip Filter
An acoustically meaningful speech model
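The LF model used at the acoustic level (Liljencrants, Fant and Lin, 1985) describes one cycle of the glottal flow derivative. As it is commonly written in the literature (symbol names follow common usage, not the slides):

```latex
E(t) =
\begin{cases}
E_0 \, e^{\alpha t} \sin(\omega_g t), & 0 \le t \le t_e \\[4pt]
-\dfrac{E_e}{\varepsilon t_a} \left[ e^{-\varepsilon (t - t_e)} - e^{-\varepsilon (t_c - t_e)} \right], & t_e < t \le t_c
\end{cases}
\qquad
\omega_g = \frac{\pi}{t_p}, \qquad
\varepsilon t_a = 1 - e^{-\varepsilon (t_c - t_e)}
```

Here t_p is the instant of peak glottal flow, t_e the instant of the main excitation (where E(t_e) = -E_e), t_a the effective duration of the return phase and t_c the closure instant. The condition on ε makes the return phase start exactly at -E_e; E_0 and α are then fixed by continuity at t_e and by requiring zero net flow over the cycle, i.e. the integral of E(t) from 0 to t_c equals 0.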
21. Pitch Shifting powered by LLSM
[Figure: Original, 50% Pitch, 200% Pitch]
Instants of vocal fold closure were revealed.
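A basic way to change pitch without moving the resonances is to resynthesize harmonics at the new f0 while reading their amplitudes from an unchanged spectral envelope. This is a minimal sketch and not LLSM: it uses only a plain harmonic model, omits the noise model and glottal source level shown on slide 18, and the single-resonance envelope at 700 Hz is a hypothetical stand-in for a vocal tract filter.

```python
import numpy as np

SR = 16000

def formant_envelope(freq, center=700.0, bandwidth=150.0):
    """Hypothetical single-resonance spectral envelope (linear amplitude vs. Hz)."""
    return 1.0 / np.sqrt(1.0 + ((freq - center) / bandwidth) ** 2)

def harmonic_synth(f0, envelope, dur=0.2, sr=SR):
    """Harmonic model: cosines at k*f0 with amplitudes read off the envelope."""
    t = np.arange(int(round(dur * sr))) / sr
    out = np.zeros_like(t)
    k = 1
    while k * f0 < sr / 2:                           # stop below Nyquist
        out += envelope(k * f0) * np.cos(2 * np.pi * k * f0 * t)
        k += 1
    return out

def pitch_shift(f0, ratio):
    """Move the harmonics to ratio * f0 but keep sampling the same envelope."""
    return harmonic_synth(f0 * ratio, formant_envelope)

def strongest_partial_hz(x, sr=SR):
    spec = np.abs(np.fft.rfft(x))
    return np.fft.rfftfreq(len(x), 1.0 / sr)[np.argmax(spec)]

for ratio in (1.0, 0.5, 2.0):                        # original, 50% pitch, 200% pitch
    y = pitch_shift(220.0, ratio)
    print(f"{ratio:.0%} pitch: strongest partial at {strongest_partial_hz(y):.0f} Hz")
```

Because all three versions sample the same envelope, the strongest partial stays next to the 700 Hz resonance (660 Hz, 660 Hz and 880 Hz for the three ratios) while the harmonic spacing changes.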
22. References
● A. V. Oppenheim, "Speech Analysis-Synthesis System Based on Homomorphic Filtering," Journal of the Acoustical Society of America, vol. 45, no. 2, 1969.
● G. Degottex, et al., "Mixed source model and its adapted vocal tract filter estimate for voice transformation and synthesis," Speech Communication, vol. 55, no. 2, pp. 278-294, 2013.
● H. K. Dunn, "The calculation of vowel resonances, and an electrical vocal tract," Journal of the Acoustical Society of America, vol. 22, pp. 740-753, 1950.
● Y. Pantazis and Y. Stylianou, "Improving the modeling of the noise part in the harmonic plus noise model of speech," in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, 2008.