©Yuki Saito, 07/03/2017
TRAINING ALGORITHM TO DECEIVE
ANTI-SPOOFING VERIFICATION
FOR DNN-BASED SPEECH SYNTHESIS
Yuki Saito, Shinnosuke Takamichi, and Hiroshi Saruwatari
(The University of Tokyo)
ICASSP 2017 SP-L4.2
Outline of This Talk
• Issue: quality degradation in statistical parametric speech synthesis due to over-smoothing of the speech params.
• Countermeasures: reproducing natural statistics
– 2nd moment (a.k.a. Global Variance: GV) [Toda et al., 2007]
– Histogram [Ohtani et al., 2012]
• Proposed: training algorithm to deceive an Anti-Spoofing Verification (ASV) for DNN-based speech synthesis
– Tries to deceive the ASV, which distinguishes natural from synthetic speech.
– Compensates for the distribution difference between natural and synthetic speech.
• Results:
– Improves the synthetic speech quality.
– Is robust to its hyper-parameter setting.
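Conventional Training Algorithm:
Minimum Generation Error (MGE) Training [Wu et al., 2016]
[Figure: linguistic feats. → acoustic models → static-dynamic mean vectors (frames $t = 1, \dots, T$) → ML-based parameter generation → generated speech params. $\hat{\boldsymbol{c}}$; the generation error $L_G(\boldsymbol{c}, \hat{\boldsymbol{c}})$ is computed against the natural speech params. $\boldsymbol{c}$.]
$$L_G(\boldsymbol{c}, \hat{\boldsymbol{c}}) = \frac{1}{T} (\hat{\boldsymbol{c}} - \boldsymbol{c})^\top (\hat{\boldsymbol{c}} - \boldsymbol{c}) \rightarrow \text{Minimize}$$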
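As a minimal sketch of the generation error above, the following NumPy function evaluates $L_G$ for one utterance. It assumes the natural and generated parameters are already available as arrays of shape (T, dims); the function name and the toy values are illustrative only, and the ML-based parameter generation step is not modeled.

```python
import numpy as np

def generation_loss(c_nat, c_gen):
    """L_G(c, c_hat) = (1/T) * (c_hat - c)^T (c_hat - c).

    Both inputs have shape (T, dims); they are flattened into single
    vectors before taking the inner product.
    """
    c_nat = np.asarray(c_nat, dtype=float)
    c_gen = np.asarray(c_gen, dtype=float)
    if c_nat.shape != c_gen.shape:
        raise ValueError("natural and generated params must have the same shape")
    num_frames = c_nat.shape[0]
    diff = (c_gen - c_nat).ravel()
    return float(diff @ diff) / num_frames

# Toy example: T = 2 frames, 2 coefficients per frame (illustrative values).
c = np.array([[1.0, 2.0], [3.0, 4.0]])
c_hat = np.array([[1.0, 1.0], [2.0, 4.0]])
loss = generation_loss(c, c_hat)  # squared error 2.0 over T = 2 frames
```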
Issue of MGE Training:
Over-smoothing of Generated Speech Parameters
[Figure: scatter plots of the 21st and 23rd mel-cepstral coefficients for natural speech and for MGE; the MGE distribution is narrow.]
These distributions are significantly different.
(GV [Toda et al., 2007] explicitly compensates the 2nd moment.)
Proposed algorithm:
Training Algorithm to Deceive
Anti-Spoofing Verification (ASV)
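Anti-Spoofing Verification (ASV):
Discriminator to Prevent Spoofing Attacks w/ Speech [Wu et al., 2016] [Chen et al., 2015]
[Figure: natural speech params. $\boldsymbol{c}$ or generated speech params. $\hat{\boldsymbol{c}}$ → feature function $\boldsymbol{\phi}(\cdot)$ → ASV $D(\cdot)$ → cross entropy $L_D(\boldsymbol{c}, \hat{\boldsymbol{c}})$ with labels 1: natural, 0: generated.]
$$L_D(\boldsymbol{c}, \hat{\boldsymbol{c}}) = \underbrace{-\frac{1}{T} \sum_{t=1}^{T} \log D(\boldsymbol{c}_t)}_{L_{D,1}(\boldsymbol{c})} \; \underbrace{-\frac{1}{T} \sum_{t=1}^{T} \log \left(1 - D(\hat{\boldsymbol{c}}_t)\right)}_{L_{D,0}(\hat{\boldsymbol{c}})} \rightarrow \text{Minimize}$$
– $L_{D,1}(\boldsymbol{c})$: loss to recognize natural speech as natural
– $L_{D,0}(\hat{\boldsymbol{c}})$: loss to recognize generated speech as generated
Here, $\boldsymbol{\phi}(\boldsymbol{c}_t) = \boldsymbol{c}_t$.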
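The two cross-entropy terms can be sketched in NumPy as follows. The sketch assumes the ASV outputs $D(\cdot)$ have already been computed per frame as probabilities in (0, 1); the function name and the clipping constant `eps` are illustrative assumptions, not part of the slides.

```python
import numpy as np

def asv_losses(d_nat, d_gen, eps=1e-12):
    """Return (L_D1(c), L_D0(c_hat), L_D) from per-frame ASV outputs.

    d_nat: D(c_t) for natural frames, d_gen: D(c_hat_t) for generated frames.
    eps only guards against log(0); it is not part of the loss definition.
    """
    d_nat = np.clip(np.asarray(d_nat, dtype=float), eps, 1.0 - eps)
    d_gen = np.clip(np.asarray(d_gen, dtype=float), eps, 1.0 - eps)
    loss_d1_nat = float(-np.mean(np.log(d_nat)))        # natural as natural
    loss_d0_gen = float(-np.mean(np.log(1.0 - d_gen)))  # generated as generated
    return loss_d1_nat, loss_d0_gen, loss_d1_nat + loss_d0_gen

# An ASV that outputs 0.5 everywhere cannot tell the two classes apart,
# so each term equals log 2.
l_d1, l_d0, l_d = asv_losses([0.5, 0.5], [0.5, 0.5])
```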
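Training Algorithm to Deceive ASV
[Figure: linguistic feats. → acoustic models → static-dynamic mean vectors → ML-based parameter generation → generated speech params. $\hat{\boldsymbol{c}}$ → feature function $\boldsymbol{\phi}(\cdot)$ → ASV $D(\cdot)$ → $L_{D,1}(\hat{\boldsymbol{c}})$ with label 1: natural; $L_G(\boldsymbol{c}, \hat{\boldsymbol{c}})$ is computed against the natural speech params. $\boldsymbol{c}$.]
$$L(\boldsymbol{c}, \hat{\boldsymbol{c}}) = L_G(\boldsymbol{c}, \hat{\boldsymbol{c}}) + \omega_D \frac{E_{L_G}}{E_{L_D}} L_{D,1}(\hat{\boldsymbol{c}}) \rightarrow \text{Minimize}$$
– $L_{D,1}(\hat{\boldsymbol{c}})$: loss to recognize generated speech as natural
– $\omega_D$: weight
– $E_{L_G}$, $E_{L_D}$: expectation values of $L_G(\boldsymbol{c}, \hat{\boldsymbol{c}})$ and $L_{D,1}(\hat{\boldsymbol{c}})$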
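A small sketch of how the combined loss weights its two terms. How the expectation values $E_{L_G}$ and $E_{L_D}$ are estimated is not stated on the slide, so here they are simply passed in as numbers (an assumption); the function name and the values are illustrative.

```python
def combined_loss(loss_g, loss_d1_gen, omega_d, expect_g, expect_d):
    """L = L_G + omega_D * (E_LG / E_LD) * L_D1(c_hat).

    The ratio E_LG / E_LD rescales the adversarial term to the scale of
    the generation error, so omega_D acts as a relative weight.
    """
    if expect_d <= 0.0:
        raise ValueError("expect_d must be positive")
    return loss_g + omega_d * (expect_g / expect_d) * loss_d1_gen

# Illustrative numbers only: when the expectations equal the current
# losses, the adversarial term contributes omega_D * L_G.
total = combined_loss(loss_g=0.5, loss_d1_gen=2.0, omega_d=0.3,
                      expect_g=0.5, expect_d=2.0)
mge_only = combined_loss(loss_g=0.5, loss_d1_gen=2.0, omega_d=0.0,
                         expect_g=0.5, expect_d=2.0)
```

With $\omega_D = 0$ the loss reduces to conventional MGE training, which matches the "MGE, $\omega_D = 0.0$" condition used in the evaluations.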
Iterative Optimization of Acoustic Models and ASV
• ① Update the acoustic models, with the ASV fixed, using $L_G(\boldsymbol{c}, \hat{\boldsymbol{c}})$ and $L_{D,1}(\hat{\boldsymbol{c}})$ (label 1: natural).
• ② Update the ASV, with the acoustic models fixed, using $L_D(\boldsymbol{c}, \hat{\boldsymbol{c}})$ (labels 1: natural, 0: generated).
By iterating ① and ②, we construct the final acoustic models!
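The alternation of the two update steps can be sketched on a toy problem. Everything below is a stand-in chosen for brevity, not the setup of the talk: scalar "speech params.", a linear acoustic model `c_hat = a * x + b`, a logistic-regression ASV on a hypothetical feature function `phi(c) = [c, c^2]`, plain gradient descent instead of AdaGrad, and the ratio $E_{L_G} / E_{L_D}$ approximated by the current loss values and treated as a constant in the gradient.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def phi(c):
    """Hypothetical feature function: phi(c_t) = [c_t, c_t^2]."""
    return np.stack([c, c ** 2], axis=1)

def train_toy(num_iters=200, omega_d=0.3, lr=0.05, seed=0, eps=1e-8):
    rng = np.random.default_rng(seed)
    T = 512
    x = rng.normal(size=T)              # toy "linguistic feature"
    c = x + 0.5 * rng.normal(size=T)    # toy "natural" params.
    a, b = 0.0, 0.0                     # acoustic model: c_hat = a * x + b
    w, v = np.zeros(2), 0.0             # ASV: D(c) = sigmoid(phi(c) @ w + v)
    for _ in range(num_iters):
        # Step 1: update the acoustic model with the ASV fixed,
        # minimizing L_G + omega_D * (E_LG / E_LD) * L_D1(c_hat).
        c_hat = a * x + b
        d_gen = sigmoid(phi(c_hat) @ w + v)
        loss_g = np.mean((c_hat - c) ** 2)
        loss_d1 = -np.mean(np.log(np.clip(d_gen, eps, 1.0)))
        scale = omega_d * loss_g / max(loss_d1, eps)
        grad_c_hat = (2.0 * (c_hat - c)
                      + scale * (d_gen - 1.0) * (w[0] + 2.0 * w[1] * c_hat)) / T
        a -= lr * np.sum(grad_c_hat * x)
        b -= lr * np.sum(grad_c_hat)
        # Step 2: update the ASV with the acoustic model fixed,
        # minimizing the cross entropy L_D(c, c_hat).
        c_hat = a * x + b
        d_nat = sigmoid(phi(c) @ w + v)
        d_gen = sigmoid(phi(c_hat) @ w + v)
        grad_w = (phi(c).T @ (d_nat - 1.0) + phi(c_hat).T @ d_gen) / T
        grad_v = (np.sum(d_nat - 1.0) + np.sum(d_gen)) / T
        w = w - lr * grad_w
        v = v - lr * grad_v
    return a, b, w, v

a, b, w, v = train_toy()
```

The point of the sketch is the control flow: each iteration computes gradients for one model while the other model's parameters are only read, never updated.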
Discussions of Proposed Algorithm
• Compensations of speech feats. through the feature function:
– Automatically derived feats. such as auto-encoded feats.
– Conventional analytically derived feats. such as GV
• Loss function for training the acoustic models:
– Combination of MGE and adversarial training [Goodfellow et al., 2014]
• The effect of the adversarial training:
– Minimizes the Jensen-Shannon divergence between the distributions of the natural data and the generated data.
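The Jensen-Shannon claim can be made concrete with the standard result from [Goodfellow et al., 2014], restated here in the notation of this talk. The densities $p_{\mathrm{nat}}$ and $p_{\mathrm{gen}}$ of natural and generated params. are symbols introduced for this sketch. For fixed acoustic models, the ASV that minimizes the expected cross entropy, and the value of that minimum, are:

```latex
D^{*}(\boldsymbol{c}) =
  \frac{p_{\mathrm{nat}}(\boldsymbol{c})}
       {p_{\mathrm{nat}}(\boldsymbol{c}) + p_{\mathrm{gen}}(\boldsymbol{c})},
\qquad
\mathbb{E}\left[ L_D \right] \Big|_{D = D^{*}}
  = \log 4 - 2 \, \mathrm{JSD}\left( p_{\mathrm{nat}} \,\|\, p_{\mathrm{gen}} \right)
```

Against this optimal ASV, making the ASV's loss as large as possible is therefore the same as making the Jensen-Shannon divergence as small as possible. Note that this statement is for the minimax form; the loss $L_{D,1}(\hat{\boldsymbol{c}})$ used for the acoustic models corresponds to the alternative generator objective (maximizing $\log D$ of generated data) that the same paper suggests for stronger gradients.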
Distributions of Speech Parameters
[Figure: scatter plots of the 21st and 23rd mel-cepstral coefficients for Natural, MGE, and Proposed; MGE is narrow, Proposed is as wide as natural speech.]
Our algorithm alleviates the over-smoothing effect!
Compensation of Global Variance
• Global Variance (GV) [Toda et al., 2007]:
– 2nd moment of the parameter distribution
[Figure: global variance (log scale, $10^{-4}$ to $10^{1}$) versus feature index (0 to 20) for Natural, MGE, and Proposed.]
GV is NOT used for training, but compensated by the ASV!
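For reference, a minimal NumPy sketch of the GV computation: the per-dimension variance of the parameter trajectory over the frames of one utterance. The array values are toy numbers, and the "smoothed" trajectory is only a stand-in for over-smoothed output, not data from the experiments.

```python
import numpy as np

def global_variance(params):
    """Per-dimension variance over the T frames; params has shape (T, dims)."""
    params = np.asarray(params, dtype=float)
    return params.var(axis=0)

natural = np.array([[0.0, 1.0], [2.0, -1.0], [4.0, 1.0], [2.0, -1.0]])
smoothed = 0.5 * natural  # toy stand-in: halving the values quarters the GV

gv_natural = global_variance(natural)
gv_smoothed = global_variance(smoothed)
```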
Additional Effect:
Alleviation of Unnaturally Strong Correlation
• Maximal Information Coefficient (MIC) [Reshef et al., 2011]:
– Values to quantify a nonlinear correlation between two variables
– Natural speech params. tend to have weak correlation [Ijima et al., 2016]
[Figure: MIC among speech params. (axes 0 to 24) for Natural, MGE, and Proposed; scale from 0.0 (weak) to 1.0 (strong).]
The proposed algorithm not only compensates the GV, but also makes the correlations among speech params. natural!
Experimental Evaluations
Experimental Conditions
Dataset: ATR Japanese speech database (phonetically balanced 503 sentences)
Train / evaluation data: 450 sentences / 53 sentences (16 kHz sampling)
Linguistic feats.: 274-dimensional vector (phoneme, accent type, frame position, etc.)
Speech params.: Mel-cepstral coefficients (0th through 24th), $F_0$, 5-band aperiodicity
Prediction params.: Mel-cepstral coefficients (the others were NOT predicted)
Optimization algorithm: AdaGrad [Duchi et al., 2011] (learning rate: 0.01)
Acoustic models: Feed-Forward 274 – 3x400 (ReLU) – 75 (linear)
ASV: Feed-Forward 25 – 2x200 (ReLU) – 1 (sigmoid)
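The two layer specifications can be read as the following shape-level sketch in NumPy. The weights are random and the inputs are dummy data, so this only illustrates the layer sizes and activations listed above; the helper names and the initialization scale are assumptions. The ML-based parameter generation that sits between the acoustic models' 75-dimensional output and the ASV's 25-dimensional input is not modeled.

```python
import numpy as np

def init_mlp(layer_sizes, rng):
    """Random weights and zero biases for a feed-forward network."""
    return [(rng.normal(scale=0.01, size=(n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(x, layers, output):
    """ReLU on hidden layers; 'linear' or 'sigmoid' on the output layer."""
    h = x
    for i, (weight, bias) in enumerate(layers):
        h = h @ weight + bias
        if i < len(layers) - 1:
            h = np.maximum(h, 0.0)
    if output == "sigmoid":
        h = 1.0 / (1.0 + np.exp(-h))
    return h

rng = np.random.default_rng(0)
acoustic_models = init_mlp([274, 400, 400, 400, 75], rng)  # 274 - 3x400 - 75
asv = init_mlp([25, 200, 200, 1], rng)                     # 25 - 2x200 - 1

linguistic_feats = rng.normal(size=(10, 274))  # 10 dummy frames
mean_vectors = forward(linguistic_feats, acoustic_models, output="linear")

mel_cepstra = rng.normal(size=(10, 25))        # dummy 0th-24th coefficients
asv_outputs = forward(mel_cepstra, asv, output="sigmoid")
```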
Initialization, Training, and Objective Evaluation
• Initialization:
– Acoustic models: conventional MGE training
– ASV: distinguish natural / generated speech after the MGE training
• Training:
– Acoustic models: update with the proposed algorithm
– ASV: distinguish natural / generated speech after updating the acoustic models
• Objective evaluation:
– Generation loss $L_G(\boldsymbol{c}, \hat{\boldsymbol{c}})$ and spoofing rate
$$\text{Spoofing rate} = \frac{\text{\# of the spoofing synthetic speech params.}}{\text{Total \# of the synthetic speech params.}}$$
We calculated these values w/ various $\omega_D$.
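The spoofing rate can be sketched as the fraction of generated frames that the ASV labels as natural. The decision threshold of 0.5 is an assumption (the slide does not state one), and the ASV outputs below are made-up values.

```python
import numpy as np

def spoofing_rate(asv_outputs_on_generated, threshold=0.5):
    """Fraction of generated params. for which D(c_hat_t) exceeds the threshold."""
    outputs = np.asarray(asv_outputs_on_generated, dtype=float)
    return float(np.mean(outputs > threshold))

# Three of the four generated frames are judged natural.
rate = spoofing_rate([0.9, 0.2, 0.7, 0.6])
```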
Results of Objective Evaluations
[Figure: generation loss (left) and spoofing rate (right) versus weight $\omega_D$ (0.0 to 1.0). The generation loss got worse; the spoofing rate got better, and when $\omega_D > 0.3$ the spoofing rate is above 99%.]
Our algorithm makes the generation loss worse but can train the acoustic models to deceive the ASV!
Results of Subjective Evaluations in Terms of Speech Quality
[Figure: preference scores (0.0 to 1.0, w/ 8 listeners) for MGE ($\omega_D = 0.0$), Proposed ($\omega_D = 0.3$), and Proposed ($\omega_D = 1.0$). Annotations: "Got better" and "NO significant difference".]
Our algorithm improves the synthetic speech quality and is robust to its hyper-parameter setting!
Error bars denote 95% confidence intervals.
Speech samples: http://sython.org/demo/icassp2017advtts/demo.html
Conclusion
• Purpose:
– Improving the speech quality of statistical parametric speech synthesis
• Proposed:
– Training algorithm to deceive an ASV
• Compensates the difference between the distributions of natural and generated speech params. using adversarial training
• Results:
– Improved the speech quality compared to conventional training
– Was robust to its hyper-parameter setting
• Future work:
– Devising temporal- and linguistic-dependent ASV
– Extending our algorithm to generate $F_0$ and duration

 
miladyskindiseases-200705210221 2.!!pptx
miladyskindiseases-200705210221 2.!!pptxmiladyskindiseases-200705210221 2.!!pptx
miladyskindiseases-200705210221 2.!!pptxCarrieButtitta
 

Saito2017icassp

  • 1. ©Yuki Saito, 07/03/2017 TRAINING ALGORITHM TO DECEIVE ANTI-SPOOFING VERIFICATION FOR DNN-BASED SPEECH SYNTHESIS Yuki Saito, Shinnosuke Takamichi, and Hiroshi Saruwatari (The University of Tokyo) ICASSP 2017 SP-L4.2
  • 2. Outline of This Talk
      – Issue: quality degradation in statistical parametric speech synthesis due to over-smoothing of the speech params.
      – Countermeasures: reproducing natural statistics
          – 2nd moment (a.k.a. Global Variance: GV) [Toda et al., 2007.]
          – Histogram [Ohtani et al., 2012.]
      – Proposed: training algorithm to deceive an Anti-Spoofing Verification (ASV) for DNN-based speech synthesis
          – Tries to deceive the ASV, which distinguishes natural / synthetic speech.
          – Compensates the distribution difference betw. natural / synthetic speech.
      – Results:
          – Improves the synthetic speech quality.
          – Is robust to its hyper-parameter setting.
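  • 3. Conventional Training Algorithm: Minimum Generation Error (MGE) Training [Wu et al., 2016.]
      – Pipeline: linguistic feats. → acoustic models → static-dynamic mean vectors (frames $t = 1, \dots, T$) → ML-based parameter generation → generated speech params. $\hat{\boldsymbol{c}}$
      – Generation error between the natural speech params. $\boldsymbol{c}$ and $\hat{\boldsymbol{c}}$: $L_{\mathrm{G}}(\boldsymbol{c}, \hat{\boldsymbol{c}}) = \frac{1}{T} (\hat{\boldsymbol{c}} - \boldsymbol{c})^{\top} (\hat{\boldsymbol{c}} - \boldsymbol{c}) \to \text{Minimize}$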
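The generation error on this slide can be illustrated with a minimal Python sketch. It assumes the speech params. are given as a list of per-frame vectors (an assumption about the data layout, not something the slides specify), and the name `generation_loss` is hypothetical.

```python
def generation_loss(natural, generated):
    """L_G = (1/T) * (c_hat - c)^T (c_hat - c) over T frames of equal-size vectors."""
    if len(natural) != len(generated) or not natural:
        raise ValueError("need two non-empty sequences with the same number of frames")
    num_frames = len(natural)
    squared_error = 0.0
    for c_t, c_hat_t in zip(natural, generated):
        # Accumulate the squared difference of every coefficient in this frame.
        squared_error += sum((g - n) ** 2 for n, g in zip(c_t, c_hat_t))
    return squared_error / num_frames


# Toy data: T = 2 frames, 2 coefficients per frame (illustrative values only).
natural = [[1.0, 2.0], [3.0, 4.0]]
generated = [[1.5, 2.0], [2.0, 4.0]]
loss = generation_loss(natural, generated)
```

Minimizing this quantity alone pulls every generated frame toward the average of the natural frames, which is the over-smoothing discussed on the next slide.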
  • 4. Issue of MGE Training: Over-smoothing of Generated Speech Parameters
      – [Figure: 21st vs. 23rd mel-cepstral coefficient; panels: Natural, MGE (narrow).]
      – These distributions are significantly different. (GV [Toda et al., 2007.] explicitly compensates the 2nd moment.)
  • 5. Proposed algorithm: Training Algorithm to Deceive Anti-Spoofing Verification (ASV)
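  • 6. Anti-Spoofing Verification (ASV): Discriminator to Prevent Spoofing Attacks w/ Speech [Wu et al., 2016.] [Chen et al., 2015.]
      – The ASV $D(\cdot)$ receives natural speech params. $\boldsymbol{c}$ or generated speech params. $\hat{\boldsymbol{c}}$ through a feature function $\boldsymbol{\phi}(\cdot)$. Labels: 1: natural, 0: generated. Here, $\boldsymbol{\phi}(\boldsymbol{c}_t) = \boldsymbol{c}_t$.
      – Cross entropy: $L_{\mathrm{D}}(\boldsymbol{c}, \hat{\boldsymbol{c}}) = L_{\mathrm{D},1}(\boldsymbol{c}) + L_{\mathrm{D},0}(\hat{\boldsymbol{c}}) = -\frac{1}{T} \sum_{t=1}^{T} \log D(\boldsymbol{c}_t) - \frac{1}{T} \sum_{t=1}^{T} \log \left( 1 - D(\hat{\boldsymbol{c}}_t) \right) \to \text{Minimize}$
      – $L_{\mathrm{D},1}(\boldsymbol{c})$: loss to recognize natural speech as natural
      – $L_{\mathrm{D},0}(\hat{\boldsymbol{c}})$: loss to recognize generated speech as generated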
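As a rough illustration of the cross entropy on this slide, the sketch below computes both terms from per-frame ASV outputs. It assumes the outputs $D(\cdot)$ are already available as probabilities strictly between 0 and 1; the function name `asv_loss` and the toy values are hypothetical.

```python
import math


def asv_loss(d_natural, d_generated):
    """Return (L_D1(c), L_D0(c_hat), L_D) from per-frame ASV outputs in (0, 1)."""
    # Loss to recognize natural speech as natural: -(1/T) * sum log D(c_t).
    l_d1 = -sum(math.log(p) for p in d_natural) / len(d_natural)
    # Loss to recognize generated speech as generated: -(1/T) * sum log(1 - D(c_hat_t)).
    l_d0 = -sum(math.log(1.0 - p) for p in d_generated) / len(d_generated)
    return l_d1, l_d0, l_d1 + l_d0


# An undecided ASV (0.5 everywhere) versus a confident one (illustrative values).
undecided = asv_loss([0.5, 0.5], [0.5, 0.5])
confident = asv_loss([0.99, 0.98], [0.01, 0.02])
```

The undecided ASV pays $\log 2$ on each term, while the confident one drives both terms toward zero.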
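  • 7. Training Algorithm to Deceive ASV
      – Pipeline: linguistic feats. → acoustic models → static-dynamic mean vectors → ML-based parameter generation → generated speech params. $\hat{\boldsymbol{c}}$ → feature function $\boldsymbol{\phi}(\cdot)$ → ASV $D(\cdot)$
      – Loss: $L(\boldsymbol{c}, \hat{\boldsymbol{c}}) = L_{\mathrm{G}}(\boldsymbol{c}, \hat{\boldsymbol{c}}) + \omega_{\mathrm{D}} \frac{E_{L_{\mathrm{G}}}}{E_{L_{\mathrm{D}}}} L_{\mathrm{D},1}(\hat{\boldsymbol{c}}) \to \text{Minimize}$
      – $L_{\mathrm{D},1}(\hat{\boldsymbol{c}})$: loss to recognize generated speech as natural (1: natural)
      – $\omega_{\mathrm{D}}$: weight; $E_{L_{\mathrm{G}}}$, $E_{L_{\mathrm{D}}}$: expectation values of $L_{\mathrm{G}}(\boldsymbol{c}, \hat{\boldsymbol{c}})$, $L_{\mathrm{D},1}(\hat{\boldsymbol{c}})$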
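The combined loss can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the generation loss and the expectation values are passed in as precomputed numbers (how the authors estimate them is not given on the slide), and `deceive_loss` is a hypothetical name.

```python
import math


def deceive_loss(l_g, d_generated, omega_d, e_lg, e_ld):
    """L = L_G + omega_D * (E_LG / E_LD) * L_D1(c_hat)."""
    # Loss to recognize generated speech as natural: -(1/T) * sum log D(c_hat_t).
    l_d1_hat = -sum(math.log(p) for p in d_generated) / len(d_generated)
    # The ratio E_LG / E_LD rescales the ASV term to the range of L_G.
    return l_g + omega_d * (e_lg / e_ld) * l_d1_hat


# Illustrative values: ASV outputs of exp(-1) give L_D1(c_hat) = 1.
d_generated = [math.exp(-1.0), math.exp(-1.0)]
mge_only = deceive_loss(0.5, d_generated, omega_d=0.0, e_lg=0.6, e_ld=1.2)
proposed = deceive_loss(0.5, d_generated, omega_d=0.3, e_lg=0.6, e_ld=1.2)
```

With $\omega_{\mathrm{D}} = 0$ the expression reduces to plain MGE training, which matches the MGE baseline label used later in the evaluation slides.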
  • 8. Iterative Optimization of Acoustic Models and ASV
      – ① Update the acoustic models (ASV fixed), using $L_{\mathrm{G}}(\boldsymbol{c}, \hat{\boldsymbol{c}})$ and $L_{\mathrm{D},1}(\hat{\boldsymbol{c}})$ (1: natural)
      – ② Update the ASV (acoustic models fixed), using $L_{\mathrm{D}}(\boldsymbol{c}, \hat{\boldsymbol{c}})$ (1: natural or 0: generated)
      – By iterating ① and ②, we construct the final acoustic models!
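One way to write the two alternating steps is as gradient updates. The symbols $\theta_{\mathrm{G}}$ (acoustic model parameters), $\theta_{\mathrm{D}}$ (ASV parameters), and $\eta$ (step size) are notation introduced here for illustration and do not appear on the slides; a plain gradient step is also an assumption, since the experiments use AdaGrad.

```latex
\begin{align}
\text{Step 1 } (\theta_{\mathrm{D}} \text{ fixed}):\quad
  \theta_{\mathrm{G}} &\leftarrow \theta_{\mathrm{G}}
  - \eta \, \nabla_{\theta_{\mathrm{G}}}
  \left[ L_{\mathrm{G}}(\boldsymbol{c}, \hat{\boldsymbol{c}})
  + \omega_{\mathrm{D}} \frac{E_{L_{\mathrm{G}}}}{E_{L_{\mathrm{D}}}}
    L_{\mathrm{D},1}(\hat{\boldsymbol{c}}) \right] \\
\text{Step 2 } (\theta_{\mathrm{G}} \text{ fixed}):\quad
  \theta_{\mathrm{D}} &\leftarrow \theta_{\mathrm{D}}
  - \eta \, \nabla_{\theta_{\mathrm{D}}}
  L_{\mathrm{D}}(\boldsymbol{c}, \hat{\boldsymbol{c}})
\end{align}
```

Here $\hat{\boldsymbol{c}}$ depends on $\theta_{\mathrm{G}}$ through the acoustic models and the parameter generation, and $D(\cdot)$ depends on $\theta_{\mathrm{D}}$.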
  • 9. Discussions of Proposed Algorithm
      – Compensations of speech feats. through the feature function:
          – Automatically-derived feats. such as auto-encoded feats.
          – Conventional analytically-derived feats. such as GV
      – Loss function for training the acoustic models:
          – Combination of MGE and adversarial training [Goodfellow et al., 2014.]
      – The effect of the adversarial training:
          – Minimizes the Jensen-Shannon divergence betw. the distributions of the natural data / generated data.
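To make the Jensen-Shannon divergence mentioned above concrete, here is a small sketch for two discrete distributions (for example, normalized histograms of one speech parameter). The histogram setting and the function names are assumptions for illustration; the slides do not describe how, or whether, the divergence is computed explicitly.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def js_divergence(p, q):
    """JS(p, q) = 0.5 * KL(p || m) + 0.5 * KL(q || m), with m = (p + q) / 2."""
    m = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Identical distributions give 0; fully disjoint ones give log 2 (the maximum).
same = js_divergence([0.2, 0.8], [0.2, 0.8])
disjoint = js_divergence([1.0, 0.0], [0.0, 1.0])
```

The divergence is symmetric and bounded by $\log 2$, so it stays finite even when the generated distribution misses regions that the natural data covers.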
  • 10. Distributions of Speech Parameters
      – [Figure: 21st vs. 23rd mel-cepstral coefficient; panels: Natural, MGE (narrow), Proposed (as wide as natural speech).]
      – Our algorithm alleviates the over-smoothing effect!
  • 11. Compensation of Global Variance
      – Global Variance (GV) [Toda et al., 2007.]: 2nd moment of the parameter distribution
      – [Figure: global variance (log scale, $10^{-4}$ to $10^{1}$) vs. feature index (0 to 20) for Natural, MGE, and Proposed.]
      – GV is NOT used for training, but compensated by the ASV!
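The global variance plotted on this slide can be computed per feature dimension as sketched below. The sketch assumes GV is taken over the frames of one sequence with a $1/T$ normalization; the name `global_variance` and the toy sequences are hypothetical.

```python
def global_variance(frames):
    """Per-dimension variance of a parameter sequence over its T frames."""
    num_frames = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / num_frames for d in range(dim)]
    return [
        sum((f[d] - means[d]) ** 2 for f in frames) / num_frames
        for d in range(dim)
    ]


# An over-smoothed sequence keeps the mean but loses variance (toy values).
natural_gv = global_variance([[0.0], [2.0]])
smoothed_gv = global_variance([[0.5], [1.5]])
```

In this toy case both sequences have mean 1.0, but the smoothed one has a quarter of the variance, which is the kind of gap the figure shows between Natural and MGE.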
  • 12. Additional Effect: Alleviation of Unnaturally Strong Correlation
      – Maximal Information Coefficient (MIC) [Reshef et al., 2011.]: values to quantify a nonlinear correlation b/w two variables
      – Natural speech params. tend to have weak correlation [Ijima et al., 2016.]
      – [Figure: MIC among speech params. (axes 0 to 24; scale 0.0 (weak) to 1.0 (strong)); panels: Natural, MGE, Proposed.]
      – The proposed algorithm not only compensates the GV, but also makes the correlations among speech params. natural!
  • 14. Experimental Conditions
      – Dataset: ATR Japanese speech database (phonetically balanced 503 sentences)
      – Train / evaluate data: 450 sentences / 53 sentences (16 kHz sampling)
      – Linguistic feats.: 274-dimensional vector (phoneme, accent type, frame position, etc.)
      – Speech params.: mel-cepstral coefficients (0th through 24th), $F_0$, 5-band aperiodicity
      – Prediction params.: mel-cepstral coefficients (the others were NOT predicted)
      – Optimization algorithm: AdaGrad [Duchi et al., 2011.] (learning rate: 0.01)
      – Acoustic models: Feed-Forward 274 – 3x400 (ReLU) – 75 (linear)
      – ASV: Feed-Forward 25 – 2x200 (ReLU) – 1 (sigmoid)
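For a sense of scale, the layer sizes listed above can be turned into parameter counts. The sketch assumes plain fully connected layers with one bias per output unit, which the slide does not state explicitly; `count_parameters` is a hypothetical helper.

```python
def count_parameters(layer_sizes):
    """Weights plus biases of a fully connected feed-forward network."""
    return sum(
        n_in * n_out + n_out
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )


# Layer sizes taken from the experimental conditions.
acoustic_layers = [274, 400, 400, 400, 75]  # 274 - 3x400 (ReLU) - 75 (linear)
asv_layers = [25, 200, 200, 1]              # 25 - 2x200 (ReLU) - 1 (sigmoid)

acoustic_params = count_parameters(acoustic_layers)
asv_params = count_parameters(asv_layers)
```

Under these assumptions the ASV is roughly a tenth of the size of the acoustic models.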
  • 15. Initialization, Training, and Objective Evaluation
      – Initialization:
          – Acoustic models: conventional MGE training
          – ASV: distinguish natural / generated speech after the MGE training
      – Training:
          – Acoustic models: update with the proposed algorithm
          – ASV: distinguish natural / generated speech after updating the acoustic models
      – Objective evaluation:
          – Generation loss $L_{\mathrm{G}}(\boldsymbol{c}, \hat{\boldsymbol{c}})$ and spoofing rate
          – $\text{Spoofing rate} = \frac{\text{\# of the spoofing synthetic speech params.}}{\text{Total \# of the synthetic speech params.}}$
      – We calculated these values w/ various $\omega_{\mathrm{D}}$.
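The spoofing rate can be sketched as the fraction of generated frames that the ASV labels as natural. The decision threshold of 0.5 on the sigmoid output is an assumption (the slide does not give the decision rule), and `spoofing_rate` is a hypothetical name.

```python
def spoofing_rate(d_generated, threshold=0.5):
    """Fraction of generated speech params. that the ASV accepts as natural."""
    # A frame counts as "spoofing" when its ASV output exceeds the threshold.
    spoofed = sum(1 for p in d_generated if p > threshold)
    return spoofed / len(d_generated)


# Illustrative ASV outputs for four generated frames.
rate = spoofing_rate([0.9, 0.2, 0.7, 0.4])
```

A rate near 1.0 means almost every generated frame deceives the ASV.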
  • 16. Results of Objective Evaluations
      – [Figure: generation loss and spoofing rate plotted against the weight $\omega_{\mathrm{D}}$ (0.0 to 1.0).]
      – Generation loss: got worse.
      – Spoofing rate: got better; when $\omega_{\mathrm{D}} > 0.3$, spoofing rate > 99%.
      – Our algorithm makes the generation loss worse but can train the acoustic models to deceive the ASV!
  • 17. Results of Subjective Evaluations in Terms of Speech Quality
      – [Figure: preference scores (0.0 to 1.0, w/ 8 listeners) for MGE ($\omega_{\mathrm{D}} = 0.0$), Proposed ($\omega_{\mathrm{D}} = 0.3$), and Proposed ($\omega_{\mathrm{D}} = 1.0$). Annotations: "Got better" and "NO significant difference". Error bars denote 95% confidence intervals.]
      – Our algorithm improves the synthetic speech quality and is robust to its hyper-parameter setting!
      – Speech samples: http://sython.org/demo/icassp2017advtts/demo.html
  • 18. Conclusion
      – Purpose:
          – Improving the speech quality of statistical parametric speech synthesis
      – Proposed:
          – Training algorithm to deceive an ASV
          – Compensates the difference b/w the distributions of natural / generated speech params. using adversarial training
      – Results:
          – Improved the speech quality compared to conventional training
          – Was robust to its hyper-parameter setting
      – Future work:
          – Devising temporal- and linguistic-dependent ASV
          – Extending our algorithm to generate $F_0$ and duration