Prior Distribution Design for
Music Bleeding-Sound Reduction Based on
Nonnegative Matrix Factorization
Yusaku Mizobuchi, Daichi Kitamura,
Tomohiko Nakamura, Hiroshi Saruwatari,
Yu Takahashi, Kazunobu Kondo
13th Asia Pacific Signal and Information Processing Association Annual Summit and Conference
Session: OD-SLA-3: Speech Enhancement and Separation
Time: Thu., 16 Dec, 10:45-11:00 (UTC +9)
National Institute of Technology, Kagawa College, Japan
The University of Tokyo, Japan
Yamaha Corporation, Japan
• Background
– Blind source separation
– Bleeding sound and its reduction problem
• Conventional Methods
– Nonnegative matrix factorization (NMF)
– Independent low-rank matrix analysis (ILRMA)
– Linear demixed domain multichannel NMF (DMNMF)
– Time-channel NMF (TCNMF)
• Proposed Method
– Introduce an a priori generative model for relative leakage levels
– Estimate parameters based on maximum a posteriori
• Experiments
Contents
• Blind source separation (BSS)
– extracts audio sources from a multichannel mixture
– assumes that the mixing system is unknown and estimates the
demixing system
• Application
– Preprocessing for music analysis, automatic score production, and so on
– Sound reinforcement for live music shows
– High-quality music recording in a studio
Background
[Figure: the sources pass through an unknown mixing system; BSS estimates the demixing system]
• Typical microphone placement
– Each microphone is placed close to its own target sound source
• The bleeding sound
– Non-target sources are also captured by each microphone
– The bleeding sound is small relative to the target sound
Background
[Figure: close miking setup; each microphone captures its target sound plus bleeding sound from the non-target sources]
• Bleeding sound reduction problem
– is similar to a multichannel audio source separation problem
– aims to remove the interfering bleeding sounds from the non-target sources
• Peculiarities of this problem
(a) The signal-to-noise ratio (SNR) of the observed signal is relatively high because of the close miking setup
(b) The target source for each microphone is known because of the close miking setup
(c) The microphones are spatially far apart from each other (e.g., more than 2 m)
– Spatial aliasing occurs
– Phase information (observed time differences between microphones) is unreliable
– BSS that relies on phase information basically fails to separate bleeding sounds
(d) The signals to be processed are music
– The required separation quality is relatively high
Background
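Point (c) can be checked with a short calculation. The sketch below is an illustration, not part of the slides: it uses the standard half-wavelength rule, under which the phase difference between two microphones is unambiguous only below c / (2d). The speed of sound (343 m/s) and the function name are assumptions made for the example.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed value: dry air at roughly 20 degrees C


def spatial_aliasing_limit_hz(mic_spacing_m: float,
                              speed_of_sound: float = SPEED_OF_SOUND_M_S) -> float:
    """Highest frequency whose inter-microphone phase difference is unambiguous.

    Above c / (2 d) the spacing exceeds half a wavelength, so the phase
    difference wraps and no longer identifies a unique time difference.
    """
    if mic_spacing_m <= 0:
        raise ValueError("microphone spacing must be positive")
    return speed_of_sound / (2.0 * mic_spacing_m)


limit_2m = spatial_aliasing_limit_hz(2.0)    # spacing quoted in the slide
limit_5cm = spatial_aliasing_limit_hz(0.05)  # a compact array, for comparison
print(f"2 m spacing:  {limit_2m:.2f} Hz")
print(f"5 cm spacing: {limit_5cm:.0f} Hz")
```

With a 2 m spacing, almost the whole audible band lies above the limit, which is why phase-based BSS is expected to struggle in this setup.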
Low rank modeling method
• Nonnegative matrix factorization (NMF) [Lee+, 1999]
– Low-rank decomposition under a nonnegativity constraint
• Uses a limited number of nonnegative bases and their coefficients
– In acoustic signal processing, the spectrogram is decomposed
• Into frequently appearing spectral patterns and their activations
[Figure: the input data matrix (amplitude spectrogram; frequency bins × time frames) is approximated by the product of a basis matrix (spectral patterns; frequency bins × bases) and an activation matrix (time-varying gains; bases × time frames)]
• Optimizing parameters in NMF
– Define a cost function and minimize it
– Any cost function can be used
• Squared Euclidean distance, etc.
• The proposed method uses the Kullback-Leibler (KL) divergence
– Efficient iterative optimization
• Multiplicative update rules (auxiliary function technique) [Lee, 2001]
Low rank modeling method
[Equation not recovered: multiplicative update rules for the case where the cost function is the squared Euclidean distance]
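As a rough illustration of the multiplicative update rules mentioned above, the following sketch implements the well-known Lee-Seung updates for the squared Euclidean distance with NumPy. The variable names (`X`, `W`, `H`), the random initialization, and the small constant `eps` are choices made for this example rather than notation taken from the slides.

```python
import numpy as np


def nmf_euclidean(X, n_bases, n_iter=200, eps=1e-12, seed=0):
    """Approximate nonnegative X (freq x time) by W @ H using multiplicative updates."""
    rng = np.random.default_rng(seed)
    n_freq, n_time = X.shape
    W = rng.uniform(0.1, 1.0, size=(n_freq, n_bases))  # basis matrix (spectral patterns)
    H = rng.uniform(0.1, 1.0, size=(n_bases, n_time))  # activation matrix (time-varying gains)
    costs = []
    for _ in range(n_iter):
        # Each factor is multiplied by a nonnegative ratio, so nonnegativity is preserved.
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        costs.append(float(np.sum((X - W @ H) ** 2)))
    return W, H, costs


# Toy "spectrogram" that is exactly rank 3 and nonnegative.
rng = np.random.default_rng(1)
X = rng.uniform(size=(64, 3)) @ rng.uniform(size=(3, 100))
W, H, costs = nmf_euclidean(X, n_bases=3)
print(f"cost after first iteration: {costs[0]:.4f}, after last: {costs[-1]:.4f}")
```

The same structure carries over to the KL divergence; only the two update ratios change.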
• Independent low-rank matrix analysis (ILRMA) [Kitamura+, 2016]
– Estimates a frequency-wise complex-valued demixing matrix using amplitude and phase information
– The power spectrogram of each source is modeled by a low-rank matrix
Related method: ILRMA
[Figure: after the STFT, the observed spectrograms are multiplied by frequency-wise demixing matrices to obtain the estimated spectrograms; the demixing matrices are updated so that the estimated signals have a low-rank structure in the time-frequency domain (low-rank decomposition). The mixing system in the time-frequency domain is a complex-valued matrix.]
– When spatial aliasing occurs in the observed signal, BSS is difficult
• Linear demixed domain multichannel NMF (DMNMF) [Taniguchi+, 2017]
– Estimates a frequency-wise real-valued demixing matrix using power information
– The power spectrogram of each source is modeled by a low-rank matrix
– Power-based BSS (does not use phase information)
Related method: DMNMF
[Figure: the observed power spectrograms are multiplied by frequency-wise demixing matrices to obtain the estimated power spectrograms. The mixing system in the time-frequency domain is a real-valued (nonnegative) matrix.]
– BSS may be possible even when spatial aliasing occurs in the observed signal
• Time-channel NMF (TCNMF) [Togami+, 2010]
– Applies NMF to the frequency-wise time-channel signal in the amplitude domain
– Estimates the mixing matrix and the time-source activation matrix
– Amplitude-based BSS (does not use phase information)
Related method: TCNMF
[Equation not recovered: cost function]
[Figure: the channel-wise observed amplitude spectrograms (frequency × time) are rearranged into frequency-wise observed amplitude matrices, and each one is approximated by the product of a frequency-wise mixing matrix and a frequency-wise time-source activation matrix]
– BSS may be possible even when spatial aliasing occurs in the observed signal
• In the determined case, the minimization problem has a trivial solution in which the mixing matrix is the identity matrix and the activation matrix equals the observed signal
– Sources cannot be separated in this situation
• To avoid this trivial solution, a sparse regularizer is introduced on the time-source activation matrix
– The regularizer has a weight coefficient and applies a norm to each time-frame-wise vector of the activation matrix
Related method: TCNMF
[Equation not recovered: cost function with the sparse regularizer]
[Figure: frequency-wise mixing matrices and frequency-wise time-source activation matrices]
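The trivial solution can be made concrete with a small numerical check. This is a sketch under assumptions: the generalized KL divergence is used as the data-fidelity term (the slides name KL for the proposed method, but the original TCNMF cost is not recoverable here), and the names `X_f`, `A`, and `S` are chosen for this example.

```python
import numpy as np


def generalized_kl(X, Y, eps=1e-12):
    """Generalized KL divergence D(X | Y), summed over all elements."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    return float(np.sum(X * np.log((X + eps) / (Y + eps)) - X + Y))


rng = np.random.default_rng(0)
n_ch, n_frames = 4, 50  # determined case: as many microphones as sources
X_f = rng.uniform(0.1, 1.0, size=(n_ch, n_frames))  # observed amplitudes at one frequency bin

# Trivial solution: no leakage at all, activations copy the observation.
A_trivial = np.eye(n_ch)
S_trivial = X_f.copy()
cost_trivial = generalized_kl(X_f, A_trivial @ S_trivial)

# Any mixing matrix with leakage fits the same activations worse.
A_leaky = np.full((n_ch, n_ch), 0.2)
np.fill_diagonal(A_leaky, 1.0)
cost_leaky = generalized_kl(X_f, A_leaky @ S_trivial)

print(f"identity mixing: {cost_trivial:.6f}, leaky mixing: {cost_leaky:.6f}")
```

Because the identity mixing matrix already reaches zero cost, an unregularized optimizer has no incentive to explain any part of the observation as bleeding sound.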
Motivation
• To avoid the trivial solution, TCNMF applies the sparse regularizer to the time-source activation matrix
– This relies on the W-disjoint-orthogonality assumption [Yılmaz and Rickard, 2004]
• At most one source is active in each time-frequency slot (suitable only for speech mixtures)
– In music signals, multiple sound sources collide at the same frequency
• The sparse regularizer on the activation matrix degrades the sound quality
• Approach of the proposed method
– To avoid the trivial solution, regularize the mixing matrix instead of the activation matrix
• Diagonal elements are set to 1
• Off-diagonal elements are regularized to small values (e.g., 0.1 to 0.5)
[Figure: mixing matrix whose diagonal elements are set to 1 and whose off-diagonal elements take small values; the proposed method regularizes these relative leakage levels of the bleeding sound]
• Apply TCNMF to the observed amplitude signal, as in the conventional method
• Introduce an a priori generative model for both the diagonal and off-diagonal elements of the mixing matrix instead of regularizing the activation matrix
– The proposed method can be interpreted as MAP estimation
TCNMF based on MAP estimation
[Figure: each frequency-wise observed amplitude spectrogram is approximated by the product of a frequency-wise mixing matrix and a frequency-wise time-source activation matrix; conventional TCNMF regularizes the activation matrix, whereas the proposed TCNMF regularizes the mixing matrix]
• Introduce the following a priori generative model for the mixing matrix
– Diagonal elements: Dirac’s delta distribution (restricting them to 1)
– Off-diagonal elements: gamma distribution with a shape parameter and a scale parameter
– The trivial solution can be avoided by setting the shape parameter greater than 1, because the gamma density is then zero at the origin
TCNMF based on MAP estimation
[Figure: probability density function of the gamma distribution plotted against the random variable]
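The role of the shape parameter can be illustrated with the gamma density itself. The sketch below uses only the Python standard library and the shape and scale values listed in the experimental conditions (1.25 and 0.6); the function names are introduced for this example. For a shape greater than 1 the density is zero at the origin and peaks at (shape − 1) × scale.

```python
import math


def gamma_pdf(x: float, shape: float, scale: float) -> float:
    """Density of the gamma distribution with the given shape and scale."""
    if x < 0:
        return 0.0
    if x == 0:
        # Limit at the origin: 0 for shape > 1, 1/scale for shape == 1, unbounded below 1.
        if shape > 1:
            return 0.0
        return 1.0 / scale if shape == 1 else math.inf
    log_p = ((shape - 1) * math.log(x) - x / scale
             - math.lgamma(shape) - shape * math.log(scale))
    return math.exp(log_p)


def gamma_mode(shape: float, scale: float) -> float:
    """Mode of the gamma density (only defined here for shape >= 1)."""
    if shape < 1:
        raise ValueError("the mode formula requires shape >= 1")
    return (shape - 1) * scale


SHAPE, SCALE = 1.25, 0.6  # values from the experimental conditions
mode = gamma_mode(SHAPE, SCALE)
print(f"density at 0: {gamma_pdf(0.0, SHAPE, SCALE)}")
print(f"mode: {mode:.2f}, density at mode: {gamma_pdf(mode, SHAPE, SCALE):.3f}")
```

A leakage level of exactly zero therefore has zero prior density, while small positive leakage levels around the mode are favored.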
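Optimization of proposed TCNMF
• Cost function for MAP estimation
– Take the negative logarithm of the posterior and substitute the prior distributions
– The resulting cost function consists of the data-fidelity term, a regularizer for the diagonal elements, a regularizer for the off-diagonal elements, and a nonnegative prior expressed with an indicator function
– Estimate the parameters by minimizing the above cost function
[Equations not recovered: MAP cost function]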
Optimization of proposed TCNMF
• Minimizing the MAP cost function is equivalent to minimizing the sum of a data-fidelity term and a regularizer
– Use the majorization-minimization (MM) algorithm [Hunter+, 2004]
[Equation not recovered: data-fidelity term plus regularizer]
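A minimal sketch of such a cost, under stated assumptions, is given below. It combines a generalized KL data-fidelity term with the negative log of a gamma prior on the off-diagonal elements (additive constants dropped) and treats the Dirac delta prior as a hard constraint on the diagonal. How the weight coefficient enters and the names `map_cost`, `A`, and `S` are assumptions for this example; the exact cost function of the slides is not recoverable from the transcript.

```python
import math

import numpy as np


def generalized_kl(X, Y, eps=1e-12):
    """Generalized KL divergence D(X | Y), summed over all elements."""
    return float(np.sum(X * np.log((X + eps) / (Y + eps)) - X + Y))


def offdiag_gamma_penalty(A, shape, scale):
    """Negative log gamma prior (constants dropped) over off-diagonal elements."""
    if shape <= 1:
        raise ValueError("this sketch assumes shape > 1")
    off = A[~np.eye(A.shape[0], dtype=bool)]
    with np.errstate(divide="ignore"):  # log(0) -> -inf is intended here
        return float(np.sum(-(shape - 1.0) * np.log(off) + off / scale))


def map_cost(X_f, A, S, shape=1.25, scale=0.6, weight=0.56):
    """Data fidelity plus weighted prior penalty for one frequency bin."""
    if not np.allclose(np.diag(A), 1.0):
        return math.inf  # Dirac delta prior: diagonal elements must equal one
    return generalized_kl(X_f, A @ S) + weight * offdiag_gamma_penalty(A, shape, scale)


rng = np.random.default_rng(0)
X_f = rng.uniform(0.1, 1.0, size=(4, 50))

cost_identity = map_cost(X_f, np.eye(4), X_f)  # trivial solution

A_leaky = np.full((4, 4), 0.15)
np.fill_diagonal(A_leaky, 1.0)
cost_leaky = map_cost(X_f, A_leaky, X_f)

print(f"identity mixing: {cost_identity}, leaky mixing: {cost_leaky:.3f}")
```

Under this cost the identity mixing matrix is no longer attractive: its penalty is infinite, whereas a matrix with small positive leakage has a finite cost.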
Optimization of proposed TCNMF
• Update rules of the parameters
[Equations not recovered: update rules]
– Notation used in the update rules: element-wise multiplication, element-wise division, a matrix containing only ones, a vector consisting of the diagonal elements of its argument, and matrix transpose
• (i) The KL divergence is scale-dependent
• (ii) The diagonal elements of the mixing matrix are restricted to unity
– The off-diagonal elements correspond to the relative leakage levels of the bleeding sound
• From (i) and (ii),
– the gain of the observed signal affects the balance between the fidelity term and the regularizer
Scale dependency problem of regularizer
[Equation not recovered: fidelity term and regularizer for the observed signal multiplied by an arbitrary coefficient]
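Property (i) is easy to verify numerically: multiplying both arguments of the generalized KL divergence by a coefficient c multiplies the divergence by c. The sketch below checks this with NumPy on random data; the variable names are chosen for this example.

```python
import numpy as np


def generalized_kl(X, Y):
    """Generalized KL divergence D(X | Y) for strictly positive arrays."""
    return float(np.sum(X * np.log(X / Y) - X + Y))


rng = np.random.default_rng(0)
X = rng.uniform(0.1, 1.0, size=(4, 50))  # stands in for the observed amplitudes
Y = rng.uniform(0.1, 1.0, size=(4, 50))  # stands in for the model

base = generalized_kl(X, Y)
scaled = {c: generalized_kl(c * X, c * Y) for c in (0.01, 1.0, 100.0)}
for c, value in scaled.items():
    print(f"c = {c:>6}: D = {value:.6f}, c * D(X|Y) = {c * base:.6f}")
```

Since a regularizer on the mixing matrix with a unit diagonal does not change with c, the same weight yields a stronger or weaker effective regularization depending on the recording gain.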
• We also parameterize the observed gain
– The smaller the observed gain, the stronger the regularizer
• Normalize the observed signal in advance
– After the normalization, the dynamic range of the observed signal becomes [range not recovered]
• Apply Wiener filtering to the complex-valued observed signal
Scale dependency problem of regularizer
[Equation not recovered: Wiener filter]
Conditions
• Comparison of blind bleeding-sound reduction performance
– Compared methods
• IVA, ILRMA, DMNMF, and conventional TCNMF
– Evaluation criterion
• Source-to-distortion ratio (SDR) improvement [Vincent+, 2006]
• Dry sources
– “songKitamura” [Kitamura+, 2015]
– Four observed signals were created from four instruments (Ob., Cl., Pf., and Tb.)
• Simulation of bleeding sound
– Dry sources were mixed using frequency-wise nonnegative random mixing matrices
• Diagonal elements are set to unity
• Off-diagonal elements are set to uniformly distributed random values in the range (0, 0.2)
– 10 observed mixtures were prepared using 10 different mixing matrices
• Average SDR of the observed signals
– Ob. + bleeding sound: 18.8 dB
– Cl. + bleeding sound: 15.0 dB
– Pf. + bleeding sound: 14.7 dB
– Tb. + bleeding sound: 8.6 dB
– These are relatively high
Conditions
[Figure: the dry sources (complex spectrograms of Ob., Cl., Pf., and Tb.) are multiplied by the frequency-wise nonnegative mixing matrices to produce the observed signals (complex spectrograms)]
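The simulation described above can be sketched as follows. Random complex values stand in for the dry-source spectrograms, and the array layout (frequency, channel or source, time), the function names, and the number of bins are assumptions for this example; only the structure of the mixing matrices (unit diagonal, off-diagonal values drawn uniformly from (0, 0.2)) follows the slides.

```python
import numpy as np


def random_leakage_matrices(n_freq, n_src, max_leak=0.2, seed=0):
    """One nonnegative mixing matrix per frequency bin with a unit diagonal."""
    rng = np.random.default_rng(seed)
    A = rng.uniform(0.0, max_leak, size=(n_freq, n_src, n_src))
    idx = np.arange(n_src)
    A[:, idx, idx] = 1.0  # each microphone captures its own target at full level
    return A


def mix(dry, A):
    """Mix dry spectrograms (n_freq, n_src, n_time) bin by bin into observations."""
    return np.einsum("fmn,fnt->fmt", A, dry)


rng = np.random.default_rng(1)
n_freq, n_src, n_time = 513, 4, 20  # illustrative sizes, not the experimental ones
dry = (rng.standard_normal((n_freq, n_src, n_time))
       + 1j * rng.standard_normal((n_freq, n_src, n_time)))

A = random_leakage_matrices(n_freq, n_src)
observed = mix(dry, A)
print(observed.shape)
```

Because the mixing matrices are real and nonnegative, the phase of each source is unchanged by the leakage in this simulation; only the levels differ between microphones.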
Conditions
• Other conditions
– Sampling frequency: 44.1 kHz
– Window function: Hamming window
– Window length: 4096 points (approximately 92.9 ms)
– Shift length: 2049 points (approximately 46.5 ms)
– Number of iterations: 200
– Initial value of the mixing matrix: diagonal elements are set to unity; off-diagonal elements are uniform random values in the range (0, 0.1)
– Initial value of the activation matrix: uniform random values in the range (0, 1)
– Shape parameter: 1.25
– Scale parameter: 0.6
– Observed gain parameter: 0.006
– Weight coefficient for regularizer: 0.56
– Number of bases: 10, 30, 80
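The durations quoted for the window and shift lengths follow directly from the sampling frequency. A small check, with a helper name chosen for this example:

```python
SAMPLE_RATE_HZ = 44_100  # sampling frequency from the conditions


def samples_to_ms(n_samples: int, sample_rate_hz: int = SAMPLE_RATE_HZ) -> float:
    """Convert a length in samples to milliseconds."""
    return 1000.0 * n_samples / sample_rate_hz


window_ms = samples_to_ms(4096)
shift_ms = samples_to_ms(2049)
print(f"window: {window_ms:.1f} ms, shift: {shift_ms:.1f} ms")
```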
Results
[Figure: results for the phase-sensitive BSS methods and the phase-insensitive BSS methods]
Results
• Demonstrations
[Audio demonstrations for Ob., Cl., Pf., and Tb.: observed signals, conventional TCNMF, and proposed TCNMF]
Conclusion
• Purpose
– Reduce the bleeding sound in music signals
• The SNR of the observed signal is high
• The microphones are far apart from each other
• Motivation
– Phase-sensitive BSS failed to reduce bleeding sounds
– DMNMF, which is a phase-insensitive BSS method, also failed
• Due to the difficulty of maximum likelihood estimation with many parameters
– TCNMF is effective for reducing bleeding sound
• However, the regularization of the source-time matrix degrades the sound quality of the separated signal
• Proposed method
– Introduces a prior distribution on the gain matrix of TCNMF
• Based on the assumption that the volume of the bleeding sound is small
• The proposed method outperformed the conventional methods
Thank you for your attention.

Más contenido relacionado

La actualidad más candente

Online Divergence Switching for Superresolution-Based Nonnegative Matrix Fa...
Online Divergence Switching for  Superresolution-Based  Nonnegative Matrix Fa...Online Divergence Switching for  Superresolution-Based  Nonnegative Matrix Fa...
Online Divergence Switching for Superresolution-Based Nonnegative Matrix Fa...
奈良先端大 情報科学研究科
 
Depth Estimation of Sound Images Using Directional Clustering and Activation...
Depth Estimation of Sound Images Using  Directional Clustering and Activation...Depth Estimation of Sound Images Using  Directional Clustering and Activation...
Depth Estimation of Sound Images Using Directional Clustering and Activation...
奈良先端大 情報科学研究科
 
Robust Sound Field Reproduction against Listener’s Movement Utilizing Image ...
Robust Sound Field Reproduction against  Listener’s Movement Utilizing Image ...Robust Sound Field Reproduction against  Listener’s Movement Utilizing Image ...
Robust Sound Field Reproduction against Listener’s Movement Utilizing Image ...
奈良先端大 情報科学研究科
 
Final year project
Final year projectFinal year project
Final year project
Low Jun Jie
 

La actualidad más candente (20)

Ica2016 312 saruwatari
Ica2016 312 saruwatariIca2016 312 saruwatari
Ica2016 312 saruwatari
 
Online Divergence Switching for Superresolution-Based Nonnegative Matrix Fa...
Online Divergence Switching for  Superresolution-Based  Nonnegative Matrix Fa...Online Divergence Switching for  Superresolution-Based  Nonnegative Matrix Fa...
Online Divergence Switching for Superresolution-Based Nonnegative Matrix Fa...
 
Depth Estimation of Sound Images Using Directional Clustering and Activation...
Depth Estimation of Sound Images Using  Directional Clustering and Activation...Depth Estimation of Sound Images Using  Directional Clustering and Activation...
Depth Estimation of Sound Images Using Directional Clustering and Activation...
 
Hybrid NMF APSIPA2014 invited
Hybrid NMF APSIPA2014 invitedHybrid NMF APSIPA2014 invited
Hybrid NMF APSIPA2014 invited
 
Apsipa2016for ss
Apsipa2016for ssApsipa2016for ss
Apsipa2016for ss
 
コサイン類似度罰則条件付き半教師あり非負値行列因子分解と音源分離への応用
コサイン類似度罰則条件付き半教師あり非負値行列因子分解と音源分離への応用コサイン類似度罰則条件付き半教師あり非負値行列因子分解と音源分離への応用
コサイン類似度罰則条件付き半教師あり非負値行列因子分解と音源分離への応用
 
Robust Sound Field Reproduction against Listener’s Movement Utilizing Image ...
Robust Sound Field Reproduction against  Listener’s Movement Utilizing Image ...Robust Sound Field Reproduction against  Listener’s Movement Utilizing Image ...
Robust Sound Field Reproduction against Listener’s Movement Utilizing Image ...
 
Oceans13 Presentation
Oceans13 PresentationOceans13 Presentation
Oceans13 Presentation
 
Defying Nyquist in Analog to Digital Conversion
Defying Nyquist in Analog to Digital ConversionDefying Nyquist in Analog to Digital Conversion
Defying Nyquist in Analog to Digital Conversion
 
COGNITIVE RADIO
COGNITIVE RADIOCOGNITIVE RADIO
COGNITIVE RADIO
 
Learning the Statistical Model of the NMF Using the Deep Multiplicative Updat...
Learning the Statistical Model of the NMF Using the Deep Multiplicative Updat...Learning the Statistical Model of the NMF Using the Deep Multiplicative Updat...
Learning the Statistical Model of the NMF Using the Deep Multiplicative Updat...
 
Sampling Theorem, Quantization Noise and its types, PCM, Channel Capacity, Ny...
Sampling Theorem, Quantization Noise and its types, PCM, Channel Capacity, Ny...Sampling Theorem, Quantization Noise and its types, PCM, Channel Capacity, Ny...
Sampling Theorem, Quantization Noise and its types, PCM, Channel Capacity, Ny...
 
Ibfd presentation
Ibfd presentationIbfd presentation
Ibfd presentation
 
Overview of sampling
Overview of samplingOverview of sampling
Overview of sampling
 
Aliasing and Antialiasing filter
Aliasing and Antialiasing filterAliasing and Antialiasing filter
Aliasing and Antialiasing filter
 
Mixed presenration
Mixed presenrationMixed presenration
Mixed presenration
 
Sampling and Reconstruction of Signal using Aliasing
Sampling and Reconstruction of Signal using AliasingSampling and Reconstruction of Signal using Aliasing
Sampling and Reconstruction of Signal using Aliasing
 
Final year project
Final year projectFinal year project
Final year project
 
Pulse Compression Method for Radar Signal Processing
Pulse Compression Method for Radar Signal ProcessingPulse Compression Method for Radar Signal Processing
Pulse Compression Method for Radar Signal Processing
 
EC6651 COMMUNICATION ENGINEERING UNIT 2
EC6651 COMMUNICATION ENGINEERING UNIT 2EC6651 COMMUNICATION ENGINEERING UNIT 2
EC6651 COMMUNICATION ENGINEERING UNIT 2
 

Similar a Prior distribution design for music bleeding-sound reduction based on nonnegative matrix factorization

Final presentation
Final presentationFinal presentation
Final presentation
Rohan Lad
 
GETA fall 07- OFDM with MathCAD.ppt
GETA fall 07- OFDM with MathCAD.pptGETA fall 07- OFDM with MathCAD.ppt
GETA fall 07- OFDM with MathCAD.ppt
RUPALIAGARWAL14
 

Similar a Prior distribution design for music bleeding-sound reduction based on nonnegative matrix factorization (20)

Online divergence switching for superresolution-based nonnegative matrix fact...
Online divergence switching for superresolution-based nonnegative matrix fact...Online divergence switching for superresolution-based nonnegative matrix fact...
Online divergence switching for superresolution-based nonnegative matrix fact...
 
Blind source separation based on independent low-rank matrix analysis and its...
Blind source separation based on independent low-rank matrix analysis and its...Blind source separation based on independent low-rank matrix analysis and its...
Blind source separation based on independent low-rank matrix analysis and its...
 
Hybrid multichannel signal separation using supervised nonnegative matrix fac...
Hybrid multichannel signal separation using supervised nonnegative matrix fac...Hybrid multichannel signal separation using supervised nonnegative matrix fac...
Hybrid multichannel signal separation using supervised nonnegative matrix fac...
 
Superresolution-based stereo signal separation via supervised nonnegative mat...
Superresolution-based stereo signal separation via supervised nonnegative mat...Superresolution-based stereo signal separation via supervised nonnegative mat...
Superresolution-based stereo signal separation via supervised nonnegative mat...
 
Robust music signal separation based on supervised nonnegative matrix factori...
Robust music signal separation based on supervised nonnegative matrix factori...Robust music signal separation based on supervised nonnegative matrix factori...
Robust music signal separation based on supervised nonnegative matrix factori...
 
Divergence optimization in nonnegative matrix factorization with spectrogram ...
Divergence optimization in nonnegative matrix factorization with spectrogram ...Divergence optimization in nonnegative matrix factorization with spectrogram ...
Divergence optimization in nonnegative matrix factorization with spectrogram ...
 
Regularized superresolution-based binaural signal separation with nonnegative...
Regularized superresolution-based binaural signal separation with nonnegative...Regularized superresolution-based binaural signal separation with nonnegative...
Regularized superresolution-based binaural signal separation with nonnegative...
 
Introduction to DSP
Introduction to DSPIntroduction to DSP
Introduction to DSP
 
time based ranging via uwb radios
time based ranging via uwb radiostime based ranging via uwb radios
time based ranging via uwb radios
 
A Generalization of Laplace Nonnegative Matrix Factorizationand Its Multichan...
A Generalization of Laplace Nonnegative Matrix Factorizationand Its Multichan...A Generalization of Laplace Nonnegative Matrix Factorizationand Its Multichan...
A Generalization of Laplace Nonnegative Matrix Factorizationand Its Multichan...
 
DNN-based permutation solver for frequency-domain independent component analy...
DNN-based permutation solver for frequency-domain independent component analy...DNN-based permutation solver for frequency-domain independent component analy...
DNN-based permutation solver for frequency-domain independent component analy...
 
DNN-based frequency-domain permutation solver for multichannel audio source s...
DNN-based frequency-domain permutation solver for multichannel audio source s...DNN-based frequency-domain permutation solver for multichannel audio source s...
DNN-based frequency-domain permutation solver for multichannel audio source s...
 
Equalization.pdf
Equalization.pdfEqualization.pdf
Equalization.pdf
 
Digital Signal Processing-Digital Filters
Digital Signal Processing-Digital FiltersDigital Signal Processing-Digital Filters
Digital Signal Processing-Digital Filters
 
PCM-Part 1.pptx
PCM-Part 1.pptxPCM-Part 1.pptx
PCM-Part 1.pptx
 
Efficient initialization for nonnegative matrix factorization based on nonneg...
Efficient initialization for nonnegative matrix factorization based on nonneg...Efficient initialization for nonnegative matrix factorization based on nonneg...
Efficient initialization for nonnegative matrix factorization based on nonneg...
 
Amplitude spectrogram prediction from mel-frequency cepstrum coefficients and...
Amplitude spectrogram prediction from mel-frequency cepstrum coefficients and...Amplitude spectrogram prediction from mel-frequency cepstrum coefficients and...
Amplitude spectrogram prediction from mel-frequency cepstrum coefficients and...
 
Final presentation
Final presentationFinal presentation
Final presentation
 
GETA fall 07- OFDM with MathCAD.ppt
GETA fall 07- OFDM with MathCAD.pptGETA fall 07- OFDM with MathCAD.ppt
GETA fall 07- OFDM with MathCAD.ppt
 
Dynamic sub arrays for Hybrid Precoding in Wide Band Millimeter Wave Wireless...
Dynamic sub arrays for Hybrid Precoding in Wide Band Millimeter Wave Wireless...Dynamic sub arrays for Hybrid Precoding in Wide Band Millimeter Wave Wireless...
Dynamic sub arrays for Hybrid Precoding in Wide Band Millimeter Wave Wireless...
 

Más de Kitamura Laboratory

Más de Kitamura Laboratory (20)

付け爪センサによる生体信号を用いた深層学習に基づく心拍推定
付け爪センサによる生体信号を用いた深層学習に基づく心拍推定付け爪センサによる生体信号を用いた深層学習に基づく心拍推定
付け爪センサによる生体信号を用いた深層学習に基づく心拍推定
 
STEM教育を目的とした動画像処理による二重振り子の軌跡推定
STEM教育を目的とした動画像処理による二重振り子の軌跡推定STEM教育を目的とした動画像処理による二重振り子の軌跡推定
STEM教育を目的とした動画像処理による二重振り子の軌跡推定
 
ギタータブ譜からのギターリフ抽出アルゴリズム
ギタータブ譜からのギターリフ抽出アルゴリズムギタータブ譜からのギターリフ抽出アルゴリズム
ギタータブ譜からのギターリフ抽出アルゴリズム
 
時間微分スペクトログラムに基づくブラインド音源分離
時間微分スペクトログラムに基づくブラインド音源分離時間微分スペクトログラムに基づくブラインド音源分離
時間微分スペクトログラムに基づくブラインド音源分離
 
周波数双方向再帰に基づく深層パーミュテーション解決法
周波数双方向再帰に基づく深層パーミュテーション解決法周波数双方向再帰に基づく深層パーミュテーション解決法
周波数双方向再帰に基づく深層パーミュテーション解決法
 
Heart rate estimation of car driver using radar sensors and blind source sepa...
Heart rate estimation of car driver using radar sensors and blind source sepa...Heart rate estimation of car driver using radar sensors and blind source sepa...
Heart rate estimation of car driver using radar sensors and blind source sepa...
 
双方向LSTMによるラウドネス及びMFCCからの振幅スペクトログラム予測と評価
双方向LSTMによるラウドネス及びMFCCからの振幅スペクトログラム予測と評価双方向LSTMによるラウドネス及びMFCCからの振幅スペクトログラム予測と評価
双方向LSTMによるラウドネス及びMFCCからの振幅スペクトログラム予測と評価
 
深層ニューラルネットワークに基づくパーミュテーション解決法の基礎的検討
深層ニューラルネットワークに基づくパーミュテーション解決法の基礎的検討深層ニューラルネットワークに基づくパーミュテーション解決法の基礎的検討
深層ニューラルネットワークに基づくパーミュテーション解決法の基礎的検討
 
多重解像度時間周波数表現に基づく独立低ランク行列分析,
多重解像度時間周波数表現に基づく独立低ランク行列分析,多重解像度時間周波数表現に基づく独立低ランク行列分析,
多重解像度時間周波数表現に基づく独立低ランク行列分析,
 
深層パーミュテーション解決法の基礎的検討
深層パーミュテーション解決法の基礎的検討深層パーミュテーション解決法の基礎的検討
深層パーミュテーション解決法の基礎的検討
 
深層学習に基づく音響特徴量からの振幅スペクトログラム予測
深層学習に基づく音響特徴量からの振幅スペクトログラム予測深層学習に基づく音響特徴量からの振幅スペクトログラム予測
深層学習に基づく音響特徴量からの振幅スペクトログラム予測
 
音楽信号処理における基本周波数推定を応用した心拍信号解析
音楽信号処理における基本周波数推定を応用した心拍信号解析音楽信号処理における基本周波数推定を応用した心拍信号解析
音楽信号処理における基本周波数推定を応用した心拍信号解析
 
調波打撃音モデルに基づく線形多チャネルブラインド音源分離
調波打撃音モデルに基づく線形多チャネルブラインド音源分離調波打撃音モデルに基づく線形多チャネルブラインド音源分離
調波打撃音モデルに基づく線形多チャネルブラインド音源分離
 
コサイン類似度罰則条件付き非負値行列因子分解に基づく音楽音源分離
コサイン類似度罰則条件付き非負値行列因子分解に基づく音楽音源分離コサイン類似度罰則条件付き非負値行列因子分解に基づく音楽音源分離
コサイン類似度罰則条件付き非負値行列因子分解に基づく音楽音源分離
 
非負値行列因子分解を用いた被り音の抑圧
非負値行列因子分解を用いた被り音の抑圧非負値行列因子分解を用いた被り音の抑圧
非負値行列因子分解を用いた被り音の抑圧
 
独立成分分析に基づく信号源分離精度の予測
独立成分分析に基づく信号源分離精度の予測独立成分分析に基づく信号源分離精度の予測
独立成分分析に基づく信号源分離精度の予測
 
深層学習に基づく間引きインジケータ付き周波数帯域補間手法による音源分離処理の高速化
深層学習に基づく間引きインジケータ付き周波数帯域補間手法による音源分離処理の高速化深層学習に基づく間引きインジケータ付き周波数帯域補間手法による音源分離処理の高速化
深層学習に基づく間引きインジケータ付き周波数帯域補間手法による音源分離処理の高速化
 
独立低ランク行列分析を用いたインタラクティブ音源分離システム
独立低ランク行列分析を用いたインタラクティブ音源分離システム独立低ランク行列分析を用いたインタラクティブ音源分離システム
独立低ランク行列分析を用いたインタラクティブ音源分離システム
 
局所時間周波数構造に基づく深層パーミュテーション解決法の実験的評価
局所時間周波数構造に基づく深層パーミュテーション解決法の実験的評価局所時間周波数構造に基づく深層パーミュテーション解決法の実験的評価
局所時間周波数構造に基づく深層パーミュテーション解決法の実験的評価
 
基底共有型非負値行列因子分解に基づく楽器音の共通・固有成分の分析と音色変換への応用
基底共有型非負値行列因子分解に基づく楽器音の共通・固有成分の分析と音色変換への応用基底共有型非負値行列因子分解に基づく楽器音の共通・固有成分の分析と音色変換への応用
基底共有型非負値行列因子分解に基づく楽器音の共通・固有成分の分析と音色変換への応用
 

Último

Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
dollysharma2066
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdf
ankushspencer015
 
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Christo Ananth
 

Último (20)

Intze Overhead Water Tank Design by Working Stress - IS Method.pdf
Intze Overhead Water Tank  Design by Working Stress - IS Method.pdfIntze Overhead Water Tank  Design by Working Stress - IS Method.pdf
Intze Overhead Water Tank Design by Working Stress - IS Method.pdf
 
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
 
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
 
Double Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueDouble Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torque
 
Thermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VThermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - V
 
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
 
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghly
 
Generative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTGenerative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPT
 
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
 
Vivazz, Mieres Social Housing Design Spain
Vivazz, Mieres Social Housing Design SpainVivazz, Mieres Social Housing Design Spain
Vivazz, Mieres Social Housing Design Spain
 
Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdf
 
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
 
Thermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptThermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.ppt
 
data_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfdata_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdf
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
 
Double rodded leveling 1 pdf activity 01
Double rodded leveling 1 pdf activity 01Double rodded leveling 1 pdf activity 01
Double rodded leveling 1 pdf activity 01
 
Unleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leapUnleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leap
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
 

Prior distribution design for music bleeding-sound reduction based on nonnegative matrix factorization

  • 1. Prior Distribution Design for Music Bleeding-Sound Reduction Based on Nonnegative Matrix Factorization Yusaku Mizobuchi , Daichi Kitamura s Tomohiko Nakamura , Hiroshi Saruwatari s Yu Takahashi , Kazunobu Kondo s 13th Asia Pacific Signal and Information Processing Association Annual Summit and Conference Session: OD-SLA-3: Speech Enhancement and Separation Time: Thu., 16 Dec, 10:45-11:00 (UTC +9) National Institute of Technology, Kagawa College, Japan The University of Tokyo, Japan Yamaha Corporation, Japan
  • 2. Contents
    • Background
      – Blind source separation
      – Bleeding sound and its reduction problem
    • Conventional Methods
      – Nonnegative matrix factorization (NMF)
      – Independent low-rank matrix analysis (ILRMA)
      – Linear demixed domain multichannel NMF (DMNMF)
      – Time-channel NMF (TCNMF)
    • Proposed Method
      – Introduce an a priori generative model for relative leakage levels
      – Estimate parameters based on maximum a posteriori (MAP) estimation
    • Experiments
  • 4. Background
    • Blind source separation (BSS)
      – extracts audio sources from a multichannel mixture
      – assumes that the mixing system is unknown and estimates the demixing system
    • Applications
      – Preprocessing for music analysis, automatic score production, and so on
      – Sound reinforcement for live music shows
      – High-quality music recording in a studio
    • Figure: the unknown mixing system followed by the estimated demixing system
  • 5. Background
    • Typical microphone placement
      – Microphones are placed spatially close to each sound source
    • The bleeding sound
      – Non-target sources are also captured by each microphone
      – The bleeding sound is relatively small in level
    • Figure legend: microphone; target sound for each microphone; bleeding sound from non-target sources
  • 6. Background
    • Bleeding sound reduction problem
      – is similar to a multichannel audio source separation problem
      – aims to remove the interfering bleeding sounds from the non-target sources
    • Peculiarities of this problem
      (a) The signal-to-noise ratio (SNR) of the observed signal is relatively high because of the close miking setup
      (b) The target source for each microphone is known because of the close miking setup
      (c) The microphones are spatially apart from each other (e.g., more than 2 m)
          – Spatial aliasing occurs
          – Phase information (observed time differences between microphones) is unreliable
          – BSS that utilizes phase information basically fails to separate bleeding sounds
      (d) Signal processing of music
          – The required separation quality is relatively high
  • 8. Low-rank modeling method
    • Nonnegative matrix factorization (NMF) [Lee+, 1999]
      – Low-rank decomposition with a nonnegativity constraint
        • A limited number of nonnegative bases and their coefficients
      – In acoustic signal processing, a spectrogram is decomposed
        • Frequently appearing spectral patterns and their activations
    • Figure: the input data matrix X (amplitude spectrogram, frequency × time) is approximated by the product of the basis matrix T (spectral patterns) and the activation matrix V (time-varying gains); I: # of frequency bins, J: # of time frames, K: # of bases
  • 9. Low-rank modeling method
    • Optimizing parameters in NMF
      – Define a cost function and minimize it
      – Any cost function can be used
        • Squared Euclidean distance, etc.
        • The Kullback-Leibler (KL) divergence is used in the proposed method
      – Efficient iterative optimization
        • Multiplicative update rules (auxiliary function technique) [Lee, 2001]
    • Equation on the slide: the update rules when the cost function is the squared Euclidean distance
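The multiplicative update rules mentioned on this slide can be sketched as follows. This is a minimal illustration of the standard Lee and Seung updates for the squared Euclidean distance, not the authors' code; the function name `nmf_euclidean`, the random initialization range, and the small constant `eps` are assumptions made for this example.

```python
import numpy as np

def nmf_euclidean(X, n_bases, n_iter=200, seed=0, eps=1e-12):
    """Approximate a nonnegative matrix X (I x J) by T (I x K) @ V (K x J).

    Uses multiplicative updates for the squared Euclidean distance.
    `eps` is a hypothetical safeguard against division by zero.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = X.shape
    T = rng.uniform(0.1, 1.0, size=(n_freq, n_bases))    # basis matrix
    V = rng.uniform(0.1, 1.0, size=(n_bases, n_frames))  # activation matrix
    costs = []
    for _ in range(n_iter):
        # Each factor is multiplied by a nonnegative ratio, so it stays nonnegative.
        T *= (X @ V.T) / (T @ V @ V.T + eps)
        V *= (T.T @ X) / (T.T @ T @ V + eps)
        costs.append(float(np.sum((X - T @ V) ** 2)))
    return T, V, costs
```

Calling `nmf_euclidean(X, n_bases=K)` on an amplitude spectrogram returns the spectral patterns `T`, their time-varying gains `V`, and the cost after each iteration, which should not increase.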
  • 10. Related methods: ILRMA
    • Independent low-rank matrix analysis (ILRMA) [Kitamura+, 2016]
      – Estimates a frequency-wise complex-valued demixing matrix using amplitude and phase information
      – The power spectrogram of each source is modeled by a low-rank matrix
      – The demixing matrix is updated so that the estimated signals have a low-rank structure in the time-frequency domain
      – The mixing system in the time-frequency domain is a complex-valued matrix
    • When spatial aliasing occurs in the observed signal, BSS is difficult
    • Figure: observed spectrograms (after STFT) are multiplied by frequency-wise demixing matrices to give estimated spectrograms, each modeled by a low-rank decomposition
  • 11. Related method: DMNMF
    • Linear demixed domain multichannel NMF (DMNMF) [Taniguchi+, 2017]
      – Estimates a frequency-wise real-valued demixing matrix using power information
      – The power spectrogram of each source is modeled by a low-rank matrix
      – Power-based BSS (does not use phase information)
      – The mixing system in the time-frequency domain is a real-valued (nonnegative) matrix
    • BSS may be possible even when spatial aliasing occurs in the observed signal
    • Figure: observed power spectrograms are multiplied by frequency-wise demixing matrices to give estimated power spectrograms
  • 12. Related method: TCNMF
    • Time-channel NMF (TCNMF) [Togami+, 2010]
      – Applies NMF to the frequency-wise time-channel signal in the amplitude domain
      – Estimates the mixing matrix and the time-source activation matrix
      – Amplitude-based BSS (does not use phase information)
    • BSS may be possible even when spatial aliasing occurs in the observed signal
    • Figure: the channel-wise observed amplitude spectrograms (frequency × time) are rearranged into frequency-wise observed matrices (channel × time), each of which is approximated by the product of a frequency-wise mixing matrix (channel × source) and a frequency-wise time-source activation matrix (source × time); the cost function is shown on the slide
  • 13. Related method: TCNMF
    • In the determined case, the minimization problem has a trivial solution: the mixing matrix A_i is the identity matrix and the activation matrix S_i equals the observed matrix X_i
      – Sources cannot be separated in this situation
    • To avoid this trivial solution, a sparse regularizer is introduced on S_i
      – μ: a weight coefficient for the regularization
      – s_ij: a time-frame-wise vector in S_i
      – the L0.5 norm is applied to s_ij
    • Figure: frequency-wise mixing matrices and time-source activation matrices; the cost function with the regularizer is shown on the slide
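The trivial solution described above can be checked numerically. The sketch below rearranges channel-wise spectrograms into frequency-wise time-channel matrices and confirms that an identity mixing matrix reproduces the observation with zero cost. The array sizes, the random test data, and the helper name `kl_divergence` are assumptions for illustration only.

```python
import numpy as np

def kl_divergence(X, Y):
    """Generalized KL divergence between positive arrays X and Y."""
    return float(np.sum(X * np.log(X / Y) - X + Y))

rng = np.random.default_rng(0)
n_ch, n_freq, n_frames = 4, 8, 50  # hypothetical sizes

# Channel-wise amplitude spectrograms: (channel, frequency, time).
X_channelwise = rng.uniform(0.1, 1.0, size=(n_ch, n_freq, n_frames))

# Frequency-wise time-channel matrices: (frequency, channel, time).
X_freqwise = np.transpose(X_channelwise, (1, 0, 2))

# Trivial solution at one frequency bin: A_i = I and S_i = X_i.
X_i = X_freqwise[3]
A_trivial = np.eye(n_ch)
S_trivial = X_i.copy()
cost_trivial = kl_divergence(X_i, A_trivial @ S_trivial)
print(cost_trivial)
```

Because the cost is already zero without any separation, the data fidelity term alone cannot prefer a separating solution, which is why some regularizer is needed.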
  • 15. Motivation
    • To avoid the trivial solution of the mixing matrix A_i, TCNMF applies the sparse regularizer to the activation matrix S_i
      – W-disjoint-orthogonality assumption [Yılmaz and Rickard, 2004]
        • At most one source is active in each time-frequency slot (suitable only for speech mixtures)
      – In music signals, multiple sound sources collide at the same frequency
        • The sparse regularizer on S_i degrades the sound quality
    • Approach in the proposed method
      – To avoid the trivial solution of A_i, regularize A_i instead of regularizing S_i
        • Diagonal elements are set to 1
        • Off-diagonal elements are regularized to small values (e.g., 0.1 to 0.5)
      – This regularizes the relative leakage levels of the bleeding sound
  • 16. TCNMF based on MAP estimation
    • Apply TCNMF to the observed amplitude signal, as in the conventional method
    • Introduce an a priori generative model for both the diagonal and off-diagonal elements of A_i instead of regularizing S_i
      – The proposed method can be interpreted as MAP estimation
    • Figure: frequency-wise observed amplitude spectrograms, mixing matrices, and time-source activation matrices; conventional TCNMF regularizes the activation matrix, whereas the proposed TCNMF regularizes the mixing matrix
  • 17. TCNMF based on MAP estimation
    • Introduce the following a priori generative model for A_i
      – Diagonal elements: Dirac's delta distribution (restricting the diagonal elements to 1)
      – Off-diagonal elements: gamma distribution (k is the shape parameter and θ is the scale parameter)
    • We can avoid the trivial solution, in which the off-diagonal elements are zero, by setting the shape parameter k to a value larger than 1
    • Figure: probability density function of the gamma-distributed random variable for several shape parameters
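To see why a shape parameter larger than 1 keeps the off-diagonal elements away from zero, the gamma density can be evaluated directly. The sketch below uses the shape and scale values listed in the experimental conditions (1.25 and 0.6); the function names `gamma_pdf` and `gamma_mode` are hypothetical helpers written for this example.

```python
import math

def gamma_pdf(a, k, theta):
    """Density of the gamma distribution with shape k and scale theta."""
    if a < 0:
        return 0.0
    if a == 0:
        # The density at zero vanishes only when k > 1.
        if k > 1:
            return 0.0
        return 1.0 / theta if k == 1 else math.inf
    return a ** (k - 1) * math.exp(-a / theta) / (math.gamma(k) * theta ** k)

def gamma_mode(k, theta):
    """Location of the density peak (valid for k >= 1)."""
    return (k - 1) * theta

k, theta = 1.25, 0.6
print(gamma_pdf(0.0, k, theta))   # zero density at a = 0, so zero leakage is excluded
print(gamma_mode(k, theta))       # peak at a small positive leakage level
```

With these values the density peaks at (k - 1)θ = 0.15, a small positive leakage level, while a leakage of exactly zero has zero prior density.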
  • 18. Optimization of proposed TCNMF
    • Cost function for MAP estimation
      – Take the negative logarithm of the posterior distribution
      – Substitute the prior distributions: the likelihood term is equivalent to the KL divergence, and the priors yield a regularizer for the diagonal elements (an indicator function), a regularizer for the off-diagonal elements, and a nonnegativity prior
      – Estimate A_i and S_i by minimizing the above cost function
  • 19. Optimization of proposed TCNMF
    • Minimizing the cost function is equivalent to minimizing the sum of a data fidelity term and a regularizer
      – The majorization-minimization (MM) algorithm is used [Hunter+, 2004]
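The equation on this slide did not survive extraction. Under the stated priors (unit diagonal, gamma-distributed off-diagonal elements with shape k and scale θ, nonnegative activations), the negative log-posterior takes the following form up to constants. This is a reconstruction from the gamma density, not a copy of the slide; the element names a_{imn}, s_{inj}, and x_{imj} are assumed notation.

```latex
\min_{A_i,\,S_i}\;
\underbrace{\sum_{m,j}\left[
  x_{imj}\log\frac{x_{imj}}{\sum_{n} a_{imn}s_{inj}}
  - x_{imj} + \sum_{n} a_{imn}s_{inj}\right]}_{\text{data fidelity term (KL divergence)}}
+
\underbrace{\sum_{m\neq n}\left[-(k-1)\log a_{imn} + \frac{a_{imn}}{\theta}\right]}_{\text{regularizer (gamma prior)}}
\quad\text{s.t.}\quad a_{inn}=1,\; a_{imn}\ge 0,\; s_{inj}\ge 0
```

The regularizer follows from -log of the gamma density, a^{k-1} e^{-a/θ} / (Γ(k) θ^k), after dropping the terms that do not depend on a_{imn}.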
  • 20. Optimization of proposed TCNMF
    • Update rules of A_i and S_i (equations shown on the slide), written with the following notation
      – ⊙: element-wise multiplication
      – fraction bar: element-wise division
      – a matrix containing only ones
      – an operator that returns the vector consisting of the diagonal elements of its argument
      – matrix transpose
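The update equations themselves are not recoverable from this transcript. As an illustration only, the sketch below derives one possible set of MM updates for the KL divergence with a gamma prior on the off-diagonal gains, and overwrites the diagonal with ones after each update. It may differ from the authors' exact rules; the function names, the single-frequency-bin interface, and the lower bound of the initialization range are assumptions.

```python
import numpy as np

def kl_divergence(X, Y):
    """Generalized KL divergence between positive arrays X and Y."""
    return float(np.sum(X * np.log(X / Y) - X + Y))

def gamma_regularizer(A, k, theta):
    """Negative log gamma prior (up to constants) on off-diagonal elements."""
    off = ~np.eye(A.shape[0], dtype=bool)
    return float(np.sum(-(k - 1.0) * np.log(A[off]) + A[off] / theta))

def map_tcnmf_single_bin(X, k=1.25, theta=0.6, n_iter=200, seed=0):
    """One possible MM scheme for a single frequency bin.

    X: positive (channel x time) amplitude matrix, with as many sources
    as channels. Requires k >= 1 so that the regularizer is convex.
    """
    rng = np.random.default_rng(seed)
    n_ch, _ = X.shape
    A = rng.uniform(1e-3, 0.1, size=(n_ch, n_ch))  # small off-diagonal gains
    np.fill_diagonal(A, 1.0)                       # unit diagonal
    S = rng.uniform(1e-3, 1.0, size=X.shape)       # source-time activations
    ones = np.ones_like(X)
    costs = []
    for _ in range(n_iter):
        # Gain update: KL multiplicative term plus the gamma-prior terms.
        R = X / (A @ S)
        A = (A * (R @ S.T) + (k - 1.0)) / (ones @ S.T + 1.0 / theta)
        np.fill_diagonal(A, 1.0)  # enforce the delta prior on the diagonal
        # Activation update: standard KL-NMF multiplicative rule.
        R = X / (A @ S)
        S = S * (A.T @ R) / (A.T @ ones)
        costs.append(kl_divergence(X, A @ S) + gamma_regularizer(A, k, theta))
    return A, S, costs
```

Because the majorizer is separable over the elements of the gain matrix, overwriting the diagonal with ones does not break the monotonic decrease of the cost for the remaining elements.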
  • 21. Scale dependency problem of regularizer
    • (i) The KL divergence has a scale-dependent property: scaling both arguments by an arbitrary coefficient α scales the divergence by α
    • (ii) The diagonal elements of A_i are restricted to be unity
      – The off-diagonal elements correspond to the relative leakage levels of the bleeding sound
    • From (i) and (ii),
      – the gain of the observed signal affects the balance between the fidelity term and the regularizer
  • 22. Scale dependency problem of regularizer
    • We also parameterize the observed gain as α
      – The smaller α is, the stronger the regularizer becomes
    • Normalize the observed signal in advance
      – After the normalization, the dynamic range of the observed signal becomes ±α
    • Apply Wiener filtering to the complex-valued observed signal to obtain the estimated signals
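Both points above can be checked with a few lines of code. The sketch below verifies the scale-dependent property of the KL divergence and shows a peak normalization that limits the signal to ±α, following the description in the speaker notes (divide by the maximum value, then multiply by α). The helper names and the random test signal are assumptions; α = 0.006 is the value listed in the experimental conditions.

```python
import numpy as np

def kl_divergence(X, Y):
    """Generalized KL divergence between positive arrays X and Y."""
    return float(np.sum(X * np.log(X / Y) - X + Y))

def normalize_gain(x, alpha):
    """Scale a signal so that its peak absolute value equals alpha.

    Returns the scaled signal and the original peak, which is needed
    to restore the gain after separation.
    """
    peak = float(np.max(np.abs(x)))
    return alpha * x / peak, peak

rng = np.random.default_rng(0)
a = rng.uniform(0.1, 1.0, size=100)
b = rng.uniform(0.1, 1.0, size=100)
alpha = 0.006

# Scaling both arguments by alpha scales the divergence by alpha.
print(kl_divergence(alpha * a, alpha * b), alpha * kl_divergence(a, b))

# Peak normalization of a hypothetical time-domain signal.
x = rng.normal(size=1000)
x_norm, peak = normalize_gain(x, alpha)
x_restored = x_norm * peak / alpha
```

Since the regularizer does not scale with the signal while the fidelity term does, a smaller α gives the regularizer relatively more weight.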
  • 24. Conditions
    • Comparing blind bleeding-sound reduction performance
      – Compared methods
        • IVA, ILRMA, DMNMF, conventional TCNMF
      – Evaluation criterion
        • Source-to-distortion ratio (SDR) [Vincent+, 2006] improvement
    • Dry sources
      – “songKitamura” [Kitamura+, 2015]
      – Create 4 observed signals from 4 instruments (Ob., Cl., Pf., Tb.)
  • 25. Conditions
    • Simulation of bleeding sound
      – Dry sources were mixed by using a frequency-wise nonnegative random mixing matrix
        • Diagonal elements are set to unity
        • Off-diagonal elements are set to uniformly distributed random values in the range (0, 0.2)
      – 10 observed mixtures were prepared using 10 different mixing matrices
    • Average SDR in observed signals
      – Ob. + bleeding sound: 18.8 dB
      – Cl. + bleeding sound: 15.0 dB
      – Pf. + bleeding sound: 14.7 dB
      – Tb. + bleeding sound: 8.6 dB
      – These are relatively high
    • Figure: dry sources (complex spectrograms) are multiplied by the nonnegative mixing matrix at each frequency to give the observed signals (complex spectrograms)
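The simulated mixing described above can be sketched as follows: one nonnegative matrix per frequency bin, with unit diagonal and off-diagonal entries drawn uniformly from (0, 0.2), applied to complex source spectrograms. The function names, array layout, and random placeholder spectrograms are assumptions for this example and do not reproduce the authors' data.

```python
import numpy as np

def random_leakage_matrices(n_freq, n_src, max_leak=0.2, seed=0):
    """Frequency-wise mixing matrices with unit diagonal and small off-diagonals."""
    rng = np.random.default_rng(seed)
    A = rng.uniform(0.0, max_leak, size=(n_freq, n_src, n_src))
    idx = np.arange(n_src)
    A[:, idx, idx] = 1.0  # each microphone captures its target at unit gain
    return A

def mix_sources(A, S):
    """Mix source spectrograms S (source, freq, time) into (channel, freq, time)."""
    # For every frequency bin i: X[:, i, :] = A[i] @ S[:, i, :]
    return np.einsum("imn,nij->mij", A, S)

rng = np.random.default_rng(1)
n_src, n_freq, n_frames = 4, 16, 20  # hypothetical sizes
S = rng.normal(size=(n_src, n_freq, n_frames)) \
    + 1j * rng.normal(size=(n_src, n_freq, n_frames))
A = random_leakage_matrices(n_freq, n_src)
X = mix_sources(A, S)
```

Using a different `seed` for each call gives a different set of mixing matrices, in the spirit of the ten mixtures prepared in the experiment.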
  • 26. Conditions
    • Other conditions
      – Sampling frequency: 44.1 kHz
      – Window function: Hamming window
      – Window length: 4096 points (approximately 92.9 ms)
      – Shift length: 2049 points (approximately 46.5 ms)
      – Number of iterations: 200
      – Initial value of the gain matrix: diagonal elements are set to unity; off-diagonal elements are uniform random values in the range (0, 0.1)
      – Initial value of the activation matrix: uniform random values in the range (0, 1)
      – Shape parameter: 1.25
      – Scale parameter: 0.6
      – Observed gain parameter: 0.006
      – Weight coefficient for regularizer: 0.56
      – Number of bases: 10, 30, 80
  • 28. Results
    • Demonstrations: audio samples for Ob., Cl., Pf., and Tb. (observed signal, conventional TCNMF, proposed TCNMF)
  • 29. Conclusion
    • Purpose
      – Reducing the bleeding sound in music signals
        • The SNR of the observed signal is high
        • The microphones are apart from each other
    • Motivation
      – Phase-sensitive BSS failed to reduce bleeding sounds
      – DMNMF, which is a phase-insensitive BSS method, also failed
        • Due to the difficulty of maximum likelihood estimation with many parameters
      – TCNMF is effective for reducing the bleeding sound
        • However, the regularization of the source-time matrix degrades the sound quality of the separated signal
    • Proposed method
      – Introducing an a priori distribution for the gain matrix of TCNMF
        • Based on the assumption that the volume of the bleeding sound is small
      – The proposed method outperformed the conventional methods
    • Thank you for your attention.

Editor's notes

  1. Hi everyone, I'm Yusaku Mizobuchi from National Institute of Technology, Kagawa College, Japan. I'm going to talk about prior distribution design for music bleeding-sound reduction based on nonnegative matrix factorization.
  2. These are the contents of today's talk.
  3. First, I'm going to talk about the research background and the problem addressed in this work.
  4. Blind source separation (BSS) is a technique to extract audio sources, such as voices or instrument sounds, from an observed multichannel mixture, where the mixing system A is unknown. BSS estimates the demixing system W, which is an inverse system of A, and we get the separated signals. BSS uses no spatial prior information such as the microphone placement or the source positions, so no tedious setup of microphone positions is needed. The figure in the center illustrates four instrument sounds being mixed and recorded by four microphones. In this talk, we do not use a training dataset of audio sources. Therefore, deep learning methods are out of the scope of BSS and will not be treated in this talk. If source separation such as BSS can be applied to music signals, subsequent processing such as music signal analysis (estimation of chord progression, tempo, or genre) and automatic transcription becomes possible. BSS can also be applied in situations such as sound reinforcement, where it enables high-quality mixing and monitoring in live performances, and high-quality audio recording in a studio.
  5. In this research, we focus on a BSS problem in music recordings. In actual live performances or recordings, we often place a dedicated microphone close to each sound source (a performer, an instrument, or an amplifier), as in this picture. The signal of the source that a microphone is placed close to is called the target sound of that microphone. However, because several instruments are played at the same time, undesirable audio leakage from the non-target sources is also captured, which is often called “cross-talk” or “bleeding sound,” as shown in this figure. The bleeding sound deteriorates the quality of the live performance or music mastering. For example, if drum sound leaks into the microphone placed close to the piano, the quality of editing the piano sound is lowered. Therefore, sound engineers try to avoid bleeding sounds as much as possible when recording, but preventing them completely is almost impossible. In this research, we address the problem of reducing the bleeding sound.
  6. In this research, we suppress the bleeding sound in each microphone using only the observed signals and keep only the target sound component. Bleeding-sound reduction is similar to multichannel audio source separation, but some conditions are different. The first point is that the observed SNR is relatively high because of the close miking setup. Second, the observed multichannel signal is already “labeled,” namely, the target source for each microphone is known because each microphone is located close to one of the sound sources. Third, the phase information of the observed signal is unreliable. This is because the microphones are apart from each other, by 1 or 2 meters for example. This setup causes a “spatial aliasing problem”: the number of phase rotations becomes ambiguous, so methods that rely on phase information basically fail to suppress the bleeding sound. Finally, to preserve the artistic value of music, a high-quality BSS is required, which means that distortion should not remain in the separated signals. This is the background of the research.
  7. Next, we explain some conventional methods.
  8. The first one is nonnegative matrix factorization, NMF. NMF is often used for modeling an amplitude or power spectrogram, as in this figure. X is a nonnegative amplitude spectrogram, and we approximate X by the product of two other nonnegative matrices, T and V. This estimates a small number of spectral patterns that appear frequently in X and their temporal changes in intensity. T is called the basis matrix, which includes spectral patterns, and V is called the activation matrix, which includes the time-varying gains of each pattern. In the figure, the spectrogram X on the left contains a tone at one pitch followed, before the first tone ends, by a tone at another pitch. Decomposing X yields T, which contains the spectral patterns of the two tones, and V, which shows when they occur, like the information in a musical score. T and V are constrained to be nonnegative. Please note that I, J, and K are the numbers of frequency bins, time frames, and bases, respectively, and that i, j, and k are used hereafter as the corresponding indices.
  9. In NMF, we define a cost function as a divergence between X and TV, and minimize it under the nonnegativity constraint to find the optimal T and V. The divergence, denoted by D in the equation, is arbitrary; in this research, we use the KL divergence, which is often used in acoustic signal processing. There is no closed-form solution, but iterative update rules that decrease the divergence at each step were proposed, based on what is called the auxiliary function technique. For example, these equations are the update rules when the cost function is the squared Euclidean distance. (Note: focus on the spectral peaks.)
  10. As a well-known multichannel BSS method, independent low-rank matrix analysis, ILRMA, is often applied. ILRMA uses both the amplitude and phase of the observed signal and estimates the complex-valued demixing matrix Wi in each frequency. As shown in this figure, a demixing matrix Wi is prepared for each frequency of the observed complex spectrogram X and is optimized so that multiplying X by Wi gives the separated signals. Because the power spectrogram of each separated signal is modeled by NMF, ILRMA optimizes the demixing matrix so that the estimated signals have a low-rank time-frequency structure. However, in bleeding-sound reduction, spatial aliasing occurs in the observed signal, and ILRMA, which requires phase information, basically fails at the separation. This will be confirmed in the experiments. (When returning to this slide: in ILRMA, a complex-valued demixing matrix is estimated for the observed complex spectrogram, whereas DMNMF estimates the nonnegative real-valued demixing matrix Wi for the observed “power” spectrogram, X squared. If asked why NMF is used in ILRMA and DMNMF: without the low-rank approximation by NMF, the permutation problem cannot be solved.)
  11. For observed signals with spatial aliasing, linear demixed domain multichannel NMF, DMNMF for short, was proposed. This method can be interpreted as a real-valued version of ILRMA. Namely, the input signal is not a complex spectrogram but a power spectrogram, and DMNMF estimates a real-valued demixing matrix Wi. Since DMNMF does not utilize the phase information, it has the potential to separate the bleeding sound.
  12. As another method without phase information, time-channel NMF, TCNMF, was proposed. Since this method is closely related to the proposed method, we explain it in detail. Whereas typical NMF decomposes channel-wise time-frequency matrices, TCNMF decomposes frequency-wise time-channel matrices in the amplitude domain, as shown in this figure. In this method, the time-channel matrix Xi is approximated by the product of Ai and Si. Ai is a nonnegative square matrix, which contains the channel-source components. It is called the gain matrix because it contains the volume coefficients of each source at each microphone in each frequency. The other matrix, Si, contains the source-time components, which ideally become the completely separated source signals. As in simple NMF, Ai and Si can be estimated by minimizing the KL divergence between Xi and AiSi, as in this cost function. TCNMF does not use the phase information at all and therefore has the potential to separate the bleeding sound even if spatial aliasing occurs.
  13. However, the NMF decomposition in TCNMF has a problem. Since the gain matrix Ai is square, the decomposition is not a low-rank approximation, and this minimization has a trivial solution, namely, Ai is an identity matrix and Si equals the observed amplitude spectrogram Xi. This solution is meaningless because the sources are not separated. To avoid this trivial solution, a sparse regularization was introduced. The cost function of the regularized TCNMF is the equation in the center, where μ is a weight coefficient for the regularization. The L0.5 norm is applied to each time-frame vector sij of the time-source matrix Si, as depicted in this figure. This regularization therefore assumes that at most one source is active in each time-frequency slot, which is the so-called W-disjoint assumption.
  14. Next, we explain the motivation and details of our proposed method.
  15. This is the motivation of the proposed method. In TCNMF, sparse regularization was introduced on Si to avoid the trivial solution of Ai. This regularization is based on the so-called W-disjoint assumption, that is, at most one source is active in each time-frequency slot, which is often valid for speech sources. However, since music signals have a harmonic structure, multiple sources collide in the same time-frequency slot; the W-disjoint assumption does not hold and may degrade the sound quality and the separation accuracy of the separated signals. In the proposed method, instead of regularizing Si, we propose to regularize Ai to avoid the trivial solution without impairing the sound quality. Specifically, the diagonal elements of the gain matrix Ai are fixed to unity and the off-diagonal elements are regularized to be small values. Since the off-diagonal elements represent the volume of the bleeding sound, this regularization controls the relative loudness of the bleeding sounds in each frequency, that is, the relative leakage levels.
  16. From here, we explain the proposed method, TCNMF based on maximum a posteriori estimation (MAP-estimation TCNMF). As in the conventional TCNMF, we apply TCNMF to the amplitude spectrograms of the multichannel observed signal to reduce the bleeding sound. As explained in the previous slide, to avoid the trivial solution while keeping the sound quality of the separated signals, we regularize the gain matrix Ai, rather than the time-source activation matrix Si, by introducing an a priori distribution. We optimize Ai and Si in the MAP estimation framework with this prior model.
  17. Let me talk about the a priori distribution for Ai. For the diagonal elements of Ai, we introduce Dirac's delta distribution to restrict them to be unity. For the off-diagonal elements, we introduce the gamma distribution, whose random variable is nonnegative. It has the shape parameter k and the scale parameter θ. This figure shows the p.d.f. of the gamma distribution with various shape parameters larger than 1. As you can see, when k is larger than 1 we can avoid zero values at the off-diagonal elements, which correspond to the trivial solution, and the off-diagonal elements are guided toward small positive values of about 0.2 to 0.5, resulting in a non-trivial solution. (The gamma distribution is adopted because it is the conjugate prior of the Poisson distribution, which is the generative model of KL-divergence-based NMF; with a conjugate prior, the prior and the posterior have the same form.)
  18. Using this prior model, we can define the cost function in the MAP sense. From Bayes' theorem, the posterior distribution is proportional to the product of the likelihood and the prior. By taking the negative logarithm, we get the cost function J as shown here. In this cost function, the first term, the log-likelihood of Xi, is equivalent to the KL divergence under the assumption that each element of Xi is generated from a Poisson distribution. Substituting the prior distributions of Ai and Si into the second and third terms leads to the regularizer shown in the equation below. The term in red restricts the diagonal elements of Ai to be unity, and the term in blue is the regularizer for the off-diagonal elements of Ai. The proposed method estimates Ai and Si by minimizing the above cost function.
  19. The minimization of the previous cost function is equivalent to this minimization problem, which consists of the data fidelity term based on the KL divergence and the regularizer corresponding to the gamma distribution prior on the off-diagonal elements of the gain matrix Ai. We can solve this problem using the majorization-minimization algorithm (the auxiliary function technique), which is often used in NMF.
  20. The update rules of Ai and Si in the proposed method are shown here; the derivation is omitted. The rules are written in matrix form, where ⊙ denotes element-wise multiplication and the fraction denotes element-wise division. The first equation also updates the diagonal elements of Ai, but they are overwritten with 1 immediately afterwards, which gives an efficient implementation: the whole matrix is updated at once instead of accessing each element. By iterating this calculation, the cost function is minimized. Because the update rules are based on the auxiliary function technique, the value of the cost function is guaranteed to be monotonically non-increasing.
  21. However, there is a slight problem with the proposed method. The KL divergence has the scale-dependent property shown in this equation, where α is an arbitrary coefficient: when both a and b are multiplied by α, the divergence itself is multiplied by α. Also, in the proposed method, the diagonal elements of Ai are restricted to be unity, so the off-diagonal elements correspond to the relative leakage levels of the bleeding sound. Due to these properties, the gain of the observed signal itself affects the balance between the data fidelity term and the regularizer of the cost function. (For questions: because Ai is fixed in scale, the volume of the separated sound appears in Si. The regularizer does not depend on Si, so its magnitude does not change. However, when the volume increases, both Xi and Si become larger, and the data fidelity term becomes larger.)
  22. To solve this scale dependency problem, we normalize the observed signal and parameterize the observed gain. The normalization is performed by this equation: the observed signal is divided by its maximum value and then multiplied by α, so that the dynamic range of the signal becomes ±α. Therefore, a smaller gain parameter α strengthens the effect of the regularizer, and a larger α strengthens the effect of the data fidelity term. Finally, we recover the complex spectrogram of the estimated signals by applying this Wiener filter to the complex-valued observed signal, as in the conventional TCNMF. (After Wiener filtering and the inverse STFT, multiplying the estimated signal by v/α restores the gain. If asked whether two parameters would suffice: yes. Three parameters were varied in the experiments, but they can probably be reduced to two in the future.)
  23. OK, let's move on to the experiments, in which we evaluate the performance of each method.
  24. To evaluate the performance of the proposed method, we conducted an experiment on blind bleeding-sound reduction. We compared five methods: independent vector analysis (IVA) [9], ILRMA [11], DMNMF [15], conventional TCNMF [12], and the proposed TCNMF. The performance was evaluated by the improvement of the source-to-distortion ratio (SDR) over the SDR of the observed signal. The observed music mixture was simulated using songKitamura, which is an artificial music dataset produced by rendering the score shown below with MIDI sound sources. We chose four musical instruments, oboe (Ob.), clarinet (Cl.), piano (Pf.), and trombone (Tb.), as dry sources and prepared four-channel, four-source observed signals. Note that the SDR is already rather high before separation because the interference is only bleeding sound.
  25. In this experiment, to simulate the bleeding sound, we mixed these instrumental sounds Si using the frequency-wise nonnegative random mixing matrix Ai, where the diagonal elements are set to unity and the off-diagonal elements are set to uniformly distributed random values in the range from 0 to 0.2. Ten observed mixtures were prepared using different pseudo-random seeds, namely, ten different mixing matrices Ai. The average SDRs over the ten observed mixture signals of each instrument are shown here (18.8 dB for Ob., 15.0 dB for Cl., 14.7 dB for Pf., and 8.6 dB for Tb.), which are relatively high because of the close miking setup. We calculated the improvements from these input SDRs for each source to evaluate the performance of each method.
  26. The other conditions are shown in this table. The parameters of the gamma distribution in the proposed method were set to a shape parameter of 1.25 and a scale parameter of 0.6. In this case, the gamma distribution looks like this (click). In addition, the sparse regularization coefficient μ for the conventional TCNMF was set to 0.56. These parameters were found by an exhaustive search and recorded the best performance in this experiment. (Explain only the important items, depending on the remaining time.)
  27. This is the result of the experiment. For each method, we show the average SDR improvement. Each bar corresponds to a method, and the vertical axis is the average SDR improvement, where higher is better. (Click) Phase-sensitive BSS methods such as IVA and ILRMA, which estimate complex-valued demixing matrices, could not reduce the bleeding sounds because the phase of the observed signal is unreliable under spatial aliasing. (Click) DMNMF is a phase-insensitive BSS method, but it also failed to improve the SDR. This is presumably due to the difficulty of optimization caused by the large number of parameters. For both the conventional and proposed TCNMFs, we can confirm that the average SDR improvements exceed 0 dB. In particular, the proposed TCNMF outperformed the conventional TCNMF by more than 2.5 dB. This result shows the efficacy of regularizing the gain matrix Ai instead of the source-time matrix Si.
  28. Let me demonstrate the result. I will play the trombone signals obtained in the experiment in the order of the observed signal, conventional TCNMF, and proposed TCNMF.
  29. This is the conclusion. That's all. Thank you for your attention.
  30. For the activation matrix Si, we do not assume an explicit structure; only a nonnegativity prior is used. This means that the amplitude spectrogram of the separated signal is constrained to be nonnegative: the prior is a one-sided uniform distribution in which the probability of negative values is set to zero.
  31. The elements Aimn of the gain matrix are assumed to be mutually independent with respect to the frequency i, the channel m, and the source n. Thus, the prior distribution of Ai becomes the equation in the middle, which is the product of the prior distribution of the diagonal elements and the prior distribution of the off-diagonal elements. By making the same element-wise independence assumption for the source-time matrix Si, its prior distribution becomes the equation below.
  32. In the equation, the log-sum term in the KL divergence, shown in red, prevents us from deriving a stationary point with respect to the variables by direct partial differentiation. Using Jensen's inequality, which is a common technique, we replace the log-sum term with its upper bound and design the majorization function of the fidelity term as shown here. Then, we minimize this function instead of the cost function J. Here, ξ is a positive auxiliary variable that satisfies this condition. For the regularizer, the iterative optimization algorithm can be derived by direct partial differentiation.