13th Asia Pacific Signal and Information Processing Association
Annual Summit and Conference (APSIPA ASC 2021)
Overview Session OS-1: Acoustic Signal Processing
Blind Audio Source Separation Based
on Time-Frequency Structure Models
Daichi Kitamura
National Institute of Technology, Kagawa College
Japan
2
Self-introduction
• Daichi Kitamura
• National Institute of Technology, Kagawa College
• Research interests
– Audio source separation
– Array signal processing
– Machine learning
– Music signal processing
– Biosignal processing
3
Contents
• Background
– Blind source separation (BSS) for audio signals and its history
– Motivation
• Preliminaries
– Frequency-domain independent component analysis (FDICA)
– Independent vector analysis (IVA)
– Independent low-rank matrix analysis (ILRMA)
• Time-frequency-masking-based BSS (TFMBSS)
– Reformulation of BSS problems and its optimization
– BSS based on primal-dual splitting method
– Interpretation of TF masking and application
• Conclusion
5
Background: BSS for audio signals
• Blind source separation (BSS) for audio signals
– estimates specific audio sources in the observed mixture
– does not require prior information about recording conditions
• locations of mics and sources, room geometry, timbres, etc.
• The word “blind” means “unsupervised”.
– is applicable to many audio applications
• Hearing aid systems
• Automatic speech recognition (ASR)
• Preprocessing for music analysis, etc.
[Figure: observed mixture → BSS → estimated source signals]
6
Background: BSS for audio signals
• Music BSS using ILRMA
[Audio demo: a mixture of guitar, vocal, and keyboard → BSS → separated guitar, vocal, and keyboard]
Please listen carefully for the three parts in the mixture.
MATLAB code: https://github.com/d-kitamura/ILRMA
Python code: implemented in the “Pyroomacoustics” library
7
Background: BSS for audio signals
• Numbers of mics and sources
– Monaural recording: single-channel signal (1ch)
– Mic array: multichannel signal (1ch, 2ch, …, Mch)
• Consider only the “determined” situation
– # of mics = # of sources
– BSS estimates the “demixing system” (inverse of the mixing system)
[Figure: source signals → mixing system → observed signals → demixing system → estimated signals]
8
Historical overview (only the methods related to this talk)
[Figure: timeline from 1994 to 2021 with three rows (single-channel, underdetermined, determined); gray-colored methods are “supervised” (not fully blind)]
• Earlier approaches shown beside the timeline
– Spectral subtraction, time-frequency masking, many other methods
– Beamforming, sparse coding, time-frequency masking, DOA clustering, many other methods
• Single-channel
– NMF [Lee]; [Virtanen], [Smaragdis], [Kameoka], [Ozerov], …
– Itakura-Saito NMF [Févotte]
• Underdetermined
– Multichannel NMF [Ozerov, Sawada]
– Spatial covariance model [Duong]
– Spatial covariance model + DNN [Nugraha]
• Determined
– ICA (1994) [Comon], [Bell and Sejnowski], [Cardoso], [Amari], [Cichocki], …
– Frequency-domain ICA (1998) [Smaragdis]
– Permutation solvers [Saruwatari], [Murata], [Morgan], [Sawada], …
– IVA (2006) [Hiroe], [Kim]
– AuxIVA (2011) [Ono]
– Time-varying IVA [Ono]
– ILRMA (2016) [Kitamura]
– IDLMA (2018) [Mogami]
– Time-frequency-masking-based BSS (TFMBSS) (2021) [Yatabe&Kitamura]
• Supervised approaches based on deep neural networks (DNN)
• Arrow labels: extension of models, generative models
• Other years marked on the timeline: 1999, 2009, 2010, 2012, 2013
9
Motivation of determined BSS
• Conventional BSS: IVA, AuxIVA, and ILRMA
– Minimum distortion (linear demixing)
– Relatively fast and stable optimization
• Iterative projection (AuxIVA) [Ono+, 2010], [Ono, 2011]
– Time-frequency (TF) structure model affects performance
• IVA: co-occurrence along frequency axis
• ILRMA: NMF-based low-rank time-frequency structure
– Optimization algorithm depends on the TF model
• Difficult to derive update rules
• Goal: easily replace the TF model and search for the best one
– Time-frequency-masking-based BSS (TFMBSS) [Yatabe & Kitamura, 2021]
[Figure: in each frequency bin, the source signals pass through a frequency-wise mixing matrix to give the observed signal, and a frequency-wise demixing matrix gives the estimated signal; indices: frequency bins and time frames]
11
Independence-based BSS in time domain
• Independent component analysis (ICA) [Comon, 1994]
– If we assume that
• 1. the sources (latent components) are mutually independent,
• 2. the sources are non-Gaussian, and
• 3. the mixing matrix is invertible and time-invariant,
– then we can estimate the demixing matrix (the inverse of the mixing matrix)
• by maximizing independence between the estimated signals
[Figure: sources (latent components) → mixing matrix → mixtures (observed signals) → inverse matrix]
12
Independence-based BSS in time domain
• Independent component analysis (ICA) [Comon, 1994]
– Maximizes independence between source distributions
– Optimization problem in ICA
[Equation: ICA cost function, annotated “minimize similarity”; the source distribution is non-Gaussian (e.g., Laplace distribution)]
13
Independence-based BSS in time domain
• Independent component analysis (ICA) [Comon, 1994]
– However,
• 1. Signal scales (volumes) cannot be determined
• 2. Signal permutation (order) cannot be determined
[Figure: two examples of sources (latent components) → mixtures (observed signals) → separated signals (estimated by ICA), illustrating the scale and permutation ambiguities]
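The two ambiguities can be checked numerically. The following is a minimal sketch, assuming a hypothetical 2 × 2 mixing matrix `A` and Laplace-distributed toy sources (neither is taken from the slides): any row scaling `D` and row permutation `P` of the ideal demixing matrix still yields separated outputs, only rescaled and reordered.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two non-Gaussian (Laplace) toy sources, 1000 samples each: shape (2, 1000).
s = rng.laplace(size=(2, 1000))

# Hypothetical invertible, time-invariant mixing matrix (2 mics, 2 sources).
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])
x = A @ s                        # observed mixtures

# The ideal demixing matrix is the inverse of the mixing matrix.
W_ideal = np.linalg.inv(A)

# Row scaling D and row permutation P give another valid demixing matrix.
D = np.diag([2.0, -0.5])
P = np.array([[0.0, 1.0],
              [1.0, 0.0]])
W_alt = P @ D @ W_ideal
y = W_alt @ x                    # separated signals from the alternative solution

# W_alt @ A has one nonzero entry per row and column:
# the outputs are the sources, but rescaled and reordered.
G = W_alt @ A
print(np.round(G, 6))
```

Independence alone cannot distinguish `W_ideal` from `W_alt`, which is why the scale and the order of the separated signals have to be fixed by other means.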
14
Independence-based BSS in frequency domain
• General audio mixture
– Convolution with room reverberation
• To deconvolve (separate) the sources,
– apply the short-time Fourier transform (STFT) to convert the signals to the TF domain
– estimate a frequency-wise demixing matrix
[Figure: mixture without reverberation vs. mixture with reverberation (convolutive mixture in the time domain, with the reverberation length), which becomes a frequency-wise mixture in the TF domain; indices: frequency and time]
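Why the STFT helps can be illustrated with a small experiment. This is a sketch under stated assumptions: a single toy source, a hypothetical 4-tap impulse response `h` that is much shorter than the analysis window, and a simple Hann-windowed STFT written for this example. Under these conditions the time-domain convolution is well approximated by a multiplication in each frequency bin.

```python
import numpy as np

def stft(x, n_fft=512, hop=128):
    """Hann-windowed STFT; returns an array of shape (frames, n_fft // 2 + 1)."""
    win = np.hanning(n_fft)
    starts = range(0, len(x) - n_fft + 1, hop)
    frames = np.stack([x[t:t + n_fft] * win for t in starts])
    return np.fft.rfft(frames, axis=1)

rng = np.random.default_rng(0)
s = rng.standard_normal(16000)           # toy source signal
h = np.array([1.0, 0.5, -0.3, 0.1])      # hypothetical short impulse response

# Time domain: convolutive mixture (one source, one microphone).
x = np.convolve(s, h)[:len(s)]

# TF domain: the convolution is approximated by a per-frequency multiplication.
n_fft = 512
S = stft(s, n_fft)
X = stft(x, n_fft)
H = np.fft.rfft(h, n_fft)                # frequency response of the filter
X_approx = H[None, :] * S

rel_err = np.linalg.norm(X - X_approx) / np.linalg.norm(X)
print(f"relative error of the TF-domain approximation: {rel_err:.4f}")
```

The approximation degrades as the impulse response approaches or exceeds the window length, so long reverberation calls for a longer STFT window.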
15
FDICA
• Frequency-domain ICA (FDICA) [Smaragdis, 1998]
– applies ICA to each frequency bin separately
– estimates a frequency-wise demixing matrix (the inverse of the frequency-wise mixing matrix)
[Figure: frequency-wise mixing matrix and frequency-wise demixing matrix; indices: frequency and time]
16
FDICA
• Frequency-domain ICA (FDICA) [Smaragdis, 1998]
– Optimization problem in FDICA
• The source distribution is non-Gaussian and complex-valued (e.g., circularly symmetric complex Laplace distribution)
– By assuming a circularly symmetric complex Laplace distribution, the minimization problem in FDICA becomes
• separable w.r.t. frequency
17
Permutation problem
• Permutation problem in FDICA
– The order of the separated signals is not consistent across frequencies
– Alignment along the frequency axis is required (permutation solver)
[Figure: sources 1 and 2 → mixtures 1 and 2 → ICA in all frequencies → permutation solver → separated signals 1 and 2, shown over time]
*Signal scales are also inconsistent, but they can easily be fixed by applying the projection-back technique.
18
Popular permutation solvers
• Signal correlation between frequencies
– FDICA + correlation-based clustering [Murata+, 2001], [Sawada+, 2011]
• Direction of arrival of each source (DOA)
– FDICA + DOA-based alignment [Saruwatari+, 2006]
• Co-occurrence among frequencies of each source
– Independent vector analysis (IVA) [Hiroe, 2006], [Kim, 2006], [Kim, 2007]
• Low-rank TF modeling of each source
– Independent low-rank matrix analysis (ILRMA) [Kitamura+, 2016]
• DNN-based supervised TF modeling of each source
– Independent deeply learned matrix analysis (IDLMA) [Makishima+, 2019]
• DNN-based permutation solver
– Generalized permutation solver with training [Yamaji&Kitamura, 2020]
• Spectrogram consistency
– Consistent IVA and consistent ILRMA [Yatabe, 2020], [Kitamura+, 2020]
19
IVA
• Independent vector analysis (IVA) [Hiroe, 2006], [Kim, 2006]
– utilizes the sourcewise frequency vector as a random variable
– Vector source model in IVA
• The spherical property groups components that co-occur across all frequencies as one source
• Permutation-problem-free estimation of the demixing matrices can be achieved!
[Figure: source vectors following a multivariate distribution (with internal correlations) → mixing matrix → observed vectors → demixing matrix → estimated vectors; spectrogram (frequency vs. time) showing co-occurrence of all frequencies in each source]
20
IVA
• Independent vector analysis (IVA) [Hiroe, 2006], [Kim, 2006]
– How valid is IVA’s TF structure model?
• Typical audio sources have co-occurrence of all frequencies
• Can be interpreted as “group sparsity” in the TF domain
[Figure: spectrograms of a speech source (conversation) and a vocal source (pop music)]
21
IVA
• Independent vector analysis (IVA) [Hiroe, 2006], [Kim, 2006]
– Optimization problem in IVA
• The source distribution is non-Gaussian, multivariate, spherical, and complex-valued (e.g., spherical Laplace distribution)
– By assuming a spherical Laplace distribution [Hiroe, 2006], [Kim, 2006],
– the minimization problem in IVA becomes as follows
[Equation: IVA cost function with the spherical Laplace distribution]
22
Efficient optimization for IVA
• Auxiliary-function-based IVA (AuxIVA) [Ono, 2011]
– Fast and stable optimization called iterative projection (IP)
• Auxiliary function technique (or majorization-minimization algorithm)
• Alternates between the update of auxiliary variables and the update of original variables
– Convergence-guaranteed, fast, and stable optimization without step-size parameters
Python code: implemented in the “Pyroomacoustics” library
https://pyroomacoustics.readthedocs.io/en/pypi-release/pyroomacoustics.bss.auxiva.html
23
Extension of TF structure assumed in IVA
[Figure: TF structure in IVA (frequency vs. time): a frequency-uniform vector multiplied by a time activation. TF structure in ILRMA (frequency vs. time): several bases with their time activations; the # of bases can be set arbitrarily]
To represent more complicated TF structures, NMF modeling can be introduced, resulting in independent low-rank matrix analysis (ILRMA)
24
ILRMA
• Independent low-rank matrix analysis (ILRMA) [Kitamura+, 2016]
– assumes each source has a low-rank TF structure
– is a unification of
• independence-based estimation of demixing matrix (FDICA or IVA)
• low-rank TF modeling of each source (NMF)
– avoids encountering the permutation problem
• A TF structure model is introduced, as in IVA
– Update the demixing matrix so that the estimated signals
• 1. are mutually independent (ICA)
• 2. have low-rank TF structures (NMF)
[Figure: observed signal → STFT → frequency-wise demixing matrix → estimated signals; each estimated spectrogram (frequency vs. time) is approximated by NMF; the mixture is not low rank, whereas each estimated source is low rank]
25
ILRMA
• Independent low-rank matrix analysis (ILRMA) [Kitamura+, 2016]
– Optimization problem in ILRMA
• Cost function in FDICA or IVA: estimates the frequency-wise demixing matrix
• Cost function in NMF: estimates the low-rank TF structure of each source
– Convergence-guaranteed update rules
• NMF’s multiplicative update
• AuxIVA (IP)
MATLAB code: https://github.com/d-kitamura/ILRMA
Python code: implemented in the “Pyroomacoustics” library
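The NMF half of the update can be sketched on its own. The code below is an illustrative stand-alone Itakura-Saito NMF with the square-root multiplicative update, which is one known convergence-guaranteed form; it is not the linked MATLAB or Pyroomacoustics implementation, and the function names, the toy spectrogram, and the number of bases are assumptions made for this example.

```python
import numpy as np

def is_divergence(P, R):
    """Itakura-Saito divergence between a power spectrogram P and its model R."""
    ratio = P / R
    return np.sum(ratio - np.log(ratio) - 1.0)

def is_nmf(P, n_bases=4, n_iter=50, eps=1e-12, seed=0):
    """Fit P (freq x time) with T @ V by multiplicative updates.

    Returns the basis matrix T, the activation matrix V, and the cost history."""
    rng = np.random.default_rng(seed)
    n_freq, n_time = P.shape
    T = rng.uniform(0.1, 1.0, size=(n_freq, n_bases))
    V = rng.uniform(0.1, 1.0, size=(n_bases, n_time))
    costs = []
    for _ in range(n_iter):
        R = T @ V + eps                       # current low-rank model
        T *= np.sqrt(((P / R**2) @ V.T) / ((1.0 / R) @ V.T))
        R = T @ V + eps                       # refresh the model after updating T
        V *= np.sqrt((T.T @ (P / R**2)) / (T.T @ (1.0 / R)))
        costs.append(is_divergence(P, T @ V + eps))
    return T, V, costs

# Toy power spectrogram: a rank-3 structure with multiplicative noise.
rng = np.random.default_rng(1)
P = (rng.uniform(0.1, 1.0, (20, 3)) @ rng.uniform(0.1, 1.0, (3, 30)))
P = P * rng.exponential(size=P.shape) + 1e-6
T, V, costs = is_nmf(P)
print(f"IS divergence: first iteration {costs[0]:.2f}, last iteration {costs[-1]:.2f}")
```

In ILRMA this kind of update is applied to the power spectrogram of each estimated source and alternated with the IP update of the demixing matrices.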
27
Reformulation of BSS
• Cost functions of independence-based BSS
– FDICA w/ Laplace
– IVA w/ spherical Laplace
– ILRMA w/ Itakura-Saito NMF
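The equations of this slide are not preserved in the transcript. For orientation, the three cost functions are commonly written as follows. The notation (indices i, j, n, the separated signal y_{ijn}, and the NMF variables t and v) is an assumption for this sketch, and additive constants and scaling conventions differ between papers.

```latex
% i: frequency bin, j: time frame (J frames in total), n: source
% y_{ijn} = w_{in}^{H} x_{ij}: n-th separated signal, W_i: demixing matrix of bin i
\begin{align}
\text{FDICA (Laplace):}\quad
  & \min_{\{W_i\}} \sum_{i,j,n} |y_{ijn}| - 2J \sum_{i} \log|\det W_i| \\
\text{IVA (spherical Laplace):}\quad
  & \min_{\{W_i\}} \sum_{j,n} \sqrt{\sum_{i} |y_{ijn}|^{2}} - 2J \sum_{i} \log|\det W_i| \\
\text{ILRMA (Itakura-Saito NMF):}\quad
  & \min_{\{W_i\},\{T_n, V_n\}} \sum_{i,j,n}
    \left[ \frac{|y_{ijn}|^{2}}{r_{ijn}} + \log r_{ijn} \right]
    - 2J \sum_{i} \log|\det W_i|,
    \quad r_{ijn} = \sum_{k} t_{ikn} v_{kjn}
\end{align}
```

All three share the log-determinant term and differ only in the first term, which encodes the TF structure of the sources.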
28
Reformulation of BSS
• All of them are derived from ICA’s cost function
• Source generative model
– corresponds to the TF structure model for each source
– is necessary for avoiding the permutation problem
• Better assumption of TF structures
– provides better BSS performance
[Figure: example TF structures (frequency vs. time): low-rank, sparse, group-sparse, and more]
29
Reformulation of BSS
• Derivation of the optimization algorithm
– is problem-dependent (depends on the TF structure model)
– requires technical knowledge and math skills
• To try various TF structures in a plug-and-play manner,
– let’s reformulate BSS problems in a more general form
– then solve it using a TF-structure-independent algorithm
[Figure: sparse, low-rank, and group-sparse models plugged into a single BSS algorithm (plug and play)]
30
Reformulation of BSS
• Generalized optimization problem [Yatabe&Kitamura, 2018]
– Penalty term on the separated signals
• TF structure model for each source
• Often called the “source model” in the context of BSS
• Replace this function in a plug-and-play manner
– Log-determinant term of the demixing matrices
• Comes from ICA theory (Jacobian between the observed and separated signals)
• Interpreted as a “barrier function” that prevents the demixing matrices from becoming rank-deficient
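The plug-and-play structure of the generalized cost can be sketched in a few lines. This is an illustration under assumptions: the array layout (frequency, source or mic, time), the function names, and the unit weighting of the two terms are choices made for this example rather than the notation of the slides.

```python
import numpy as np

def demix(W, X):
    """Frequency-wise demixing: W is (freq, src, mic), X is (freq, mic, time)."""
    return W @ X

def l1_penalty(Y):
    """Sparse source model (FDICA with a Laplace prior)."""
    return np.sum(np.abs(Y))

def l21_penalty(Y):
    """Group-sparse source model (IVA with a spherical Laplace prior):
    L2 norm over frequency, summed over sources and time frames."""
    return np.sum(np.sqrt(np.sum(np.abs(Y) ** 2, axis=0)))

def bss_cost(W, X, penalty):
    """Generalized cost: penalty(separated signals) - sum_i log|det W_i|."""
    _, logabsdet = np.linalg.slogdet(W)
    return penalty(demix(W, X)) - np.sum(logabsdet)

# Toy data: 5 frequency bins, 2 channels, 40 frames, identity demixing matrices.
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 2, 40)) + 1j * rng.standard_normal((5, 2, 40))
W = np.tile(np.eye(2, dtype=complex), (5, 1, 1))
print(bss_cost(W, X, l1_penalty), bss_cost(W, X, l21_penalty))
```

Swapping the source model only changes the `penalty` argument. The log-determinant term grows without bound as any demixing matrix approaches rank deficiency, which is the barrier behavior described above.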
31
Reformulation of BSS
• Generalized optimization problem [Yatabe&Kitamura, 2018]
– FDICA w/ Laplace (L1 sparse regularizer)
– IVA w/ spherical Laplace (L2,1 group-sparse regularizer)
– ILRMA w/ Itakura-Saito NMF (low-rank approximation)
Freq. vector
32
Reformulation of BSS
• Generalized optimization problem [Yatabe&Kitamura, 2018]
– But, how?
• Apply convex optimization techniques [Condat, 2013], [Vu, 2013], [Komodakis+, 2015]
– Primal-dual splitting method
– Proximity operator
• If the TF structure model function is “proximable”, then we obtain an optimization algorithm!
• Objective: if we change the TF structure model, its optimization algorithm can easily be obtained!
33
Primal-dual splitting method
• Primal-dual splitting method [Condat, 2013], [Vu, 2013], [Komodakis+, 2015]
– considers the following problem: minimize g(w) + h(Lw), where L is a linear operator
• g and h: proper lower-semicontinuous convex functions
– Iterative optimization algorithm
• with step-size parameters
– Proximity operator
• If the proximity operator of a function can easily be calculated, the function is called “proximable”
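A primal-dual splitting iteration can be sketched on a toy problem. The code below is one common form of the algorithm for minimizing g(w) + h(Lw); the symbols `mu1`, `mu2`, `alpha`, the function names, and the toy problem (a quadratic g, an L1-norm h, and L equal to the identity) are assumptions for illustration, not the exact formulation of the slides. The toy problem is chosen because its solution is known in closed form (soft thresholding of `b`).

```python
import numpy as np

def soft_threshold(z, t):
    """Proximity operator of t * ||.||_1 for real-valued z."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def primal_dual_splitting(prox_g, prox_h, L, LH, w0, mu1, mu2, alpha=1.0, n_iter=200):
    """Minimize g(w) + h(L w).

    prox_g(v, mu) and prox_h(v, mu) evaluate the proximity operators of
    mu * g and mu * h; L and LH apply the linear operator and its adjoint."""
    w = w0.copy()
    y = np.zeros_like(L(w0))                  # dual variable
    for _ in range(n_iter):
        w_tilde = prox_g(w - mu1 * mu2 * LH(y), mu1)
        z = y + L(2.0 * w_tilde - w)
        y_tilde = z - prox_h(z, 1.0 / mu2)
        w = alpha * w_tilde + (1.0 - alpha) * w
        y = alpha * y_tilde + (1.0 - alpha) * y
    return w

# Toy problem: g(w) = 0.5 * ||w - b||^2, h = lam * ||.||_1, L = identity.
b = np.array([3.0, -0.5, 0.2, -4.0])
lam = 1.0
w_hat = primal_dual_splitting(
    prox_g=lambda v, mu: (v + mu * b) / (1.0 + mu),
    prox_h=lambda v, mu: soft_threshold(v, mu * lam),
    L=lambda w: w, LH=lambda y: y,
    w0=np.zeros_like(b), mu1=1.0, mu2=0.5)
print(np.round(w_hat, 6))
```

Only the two proximity operators and the linear operator are problem-specific, which is what makes the method attractive for swapping source models. The step sizes are chosen here so that `mu1 * mu2` times the squared operator norm of L does not exceed 1.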
34
BSS using Primal-dual splitting method
• Convert BSS to primal-dual-splitting-applicable form
– Vectorization of demixing matrices
• Matrix to vector, then collect all frequencies
– Matrixization
• The log-determinant term is written with the singular values of each demixing matrix
35
BSS using Primal-dual splitting method
• Convert BSS to primal-dual-splitting-applicable form
• Introduce vectorized notation
– (the linear operator is a reshaped matrix that includes the observed signals)
• Ready to apply primal-dual splitting!
– Cf. the problem for primal-dual splitting
36
BSS using Primal-dual splitting method
• General BSS algorithm using primal-dual splitting
– The log-determinant term is always proximable via singular value decomposition [Yatabe&Kitamura, 2018]
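The proximity operator of the log-determinant term can be sketched as follows. Because -log|det W| equals the negative sum of the log singular values, the operator only modifies the singular values: each s is mapped to the positive root of t² - s·t - μ = 0. The function name and the stacked-array layout are assumptions made for this example.

```python
import numpy as np

def prox_neg_logdet(W, mu):
    """Proximity operator of mu * (-log|det W|), applied to each matrix in a stack.

    W has shape (..., N, N); the singular values s are replaced by
    (s + sqrt(s^2 + 4 * mu)) / 2, while the singular vectors are kept."""
    U, s, Vh = np.linalg.svd(W)
    s_new = (s + np.sqrt(s ** 2 + 4.0 * mu)) / 2.0
    return (U * s_new[..., None, :]) @ Vh

# Even an all-zero (rank-deficient) input is mapped to an invertible matrix.
print(prox_neg_logdet(np.zeros((2, 2)), mu=1.0))
```

The updated singular values are always strictly positive, so every iterate stays invertible, which matches the barrier-function interpretation of this term.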
37
BSS using Primal-dual splitting method
• General BSS algorithm using primal-dual splitting
– L2,1 Group sparse BSS (IVA)
– Nuclear-norm-based low-rank BSS (ILRMA?)
Nuclear norm (sum of singular values)
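The proximity operator of the nuclear norm is singular value thresholding, sketched below for a single matrix. Applying it to the frequency-by-time matrix of each separated source is an assumption about how the low-rank model would be used here; the function name and the toy sizes are illustrative.

```python
import numpy as np

def prox_nuclear(Z, lam):
    """Singular value thresholding: prox of lam * (nuclear norm) for one matrix."""
    U, s, Vh = np.linalg.svd(Z, full_matrices=False)
    return (U * np.maximum(s - lam, 0.0)) @ Vh

# Toy complex spectrogram (6 frequency bins, 8 frames).
rng = np.random.default_rng(0)
Z = rng.standard_normal((6, 8)) + 1j * rng.standard_normal((6, 8))
Z_low = prox_nuclear(Z, lam=2.0)
print("rank before:", np.linalg.matrix_rank(Z), "rank after:", np.linalg.matrix_rank(Z_low))
```

Singular values below `lam` are set to zero, so a larger `lam` yields a lower-rank output.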
38
BSS using Primal-dual splitting method
• Multiple TF structures can also be utilized
– L2,1 group-sparse + L1 sparse BSS (sparse IVA): both terms are proximable
– Low-rank + L1 sparse BSS (sparse ILRMA?): both terms are proximable
• Advantage of proposed BSS
– If TF structure models are proximable, you can use them in a plug-and-play manner!
39
BSS using Primal-dual splitting method
• Experiment of two-speech-source BSS
– Compare improvement of source-to-distortion ratio (SDR)
[Figure: SDR improvement for Mixture A and Mixture B with four TF structure models: group-sparse, group-sparse + sparse, low-rank, and low-rank + sparse]
40
Interpretation of TF masking
• Proximity operators of many sparsity-inducing functions are obtained as thresholding operators
– L1 norm: [equation]
– L2,1 norm: [equation]
– They have the same form: TF masking applied to the variable
• Proximity operator = TF mask ⊙ variable in TF shape (⊙: elementwise product)
• The TF mask takes values from 0 to 1 and is determined by the TF structure model
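The mask interpretation can be made concrete with the two norms named above. In this sketch the array layout (frequency, source, time), the function names, and the small constant guarding against division by zero are assumptions; the masks themselves follow from the standard soft-thresholding and group-thresholding formulas.

```python
import numpy as np

def mask_l1(Z, lam):
    """TF mask equivalent to the prox of lam * L1 norm (elementwise thresholding)."""
    mag = np.maximum(np.abs(Z), 1e-12)
    return np.maximum(1.0 - lam / mag, 0.0)

def mask_l21(Z, lam):
    """TF mask equivalent to the prox of lam * L2,1 norm.

    Z has shape (freq, src, time); each group is the frequency vector of one
    source at one time frame, so the mask is constant along frequency."""
    norm = np.sqrt(np.sum(np.abs(Z) ** 2, axis=0, keepdims=True))
    return np.maximum(1.0 - lam / np.maximum(norm, 1e-12), 0.0)

rng = np.random.default_rng(0)
Z = rng.standard_normal((4, 2, 6)) + 1j * rng.standard_normal((4, 2, 6))

prox_l1 = mask_l1(Z, 0.3) * Z      # proximity operator = mask times variable
prox_l21 = mask_l21(Z, 0.3) * Z    # the (1, src, time) mask broadcasts over frequency
```

Both masks are real-valued, lie between 0 and 1, and leave the phase of `Z` unchanged. They differ only in whether the threshold acts on individual TF bins or on whole frequency vectors.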
41
TFMBSS
• Time-frequency-masking-based BSS (TFMBSS) [Yatabe&Kitamura, 2019], [Yatabe&Kitamura, 2021]
– Skip designing the TF structure model function
– The TF mask of the intended TF structure is employed directly in the optimization algorithm
• BSS based on primal-dual splitting method
1. Design intended TF structure model
2. Calculate proximal operator
3. Optimize the problem
• TFMBSS
1. (skipped; the model function is left undefined, shown as “???” on the slide)
2. Design intended TF mask
3. Optimize the problem
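Putting the pieces together, a TFMBSS-style iteration can be sketched as a primal-dual splitting loop in which the proximity operator of the source model is replaced by a user-supplied TF mask. This is a minimal sketch under assumptions, not the authors' reference implementation: the array layout, the normalization of the observation, the default step sizes `mu1`, `mu2`, `alpha`, and the example mask with its threshold `lam` are all illustrative choices.

```python
import numpy as np

def prox_neg_logdet(W, mu):
    """Prox of mu * (-log|det W|) for a stack of square matrices (via SVD)."""
    U, s, Vh = np.linalg.svd(W)
    return (U * ((s + np.sqrt(s ** 2 + 4.0 * mu)) / 2.0)[..., None, :]) @ Vh

def group_sparse_mask(Z, lam=1.0):
    """Example mask: group thresholding along frequency (IVA-like structure)."""
    norm = np.sqrt(np.sum(np.abs(Z) ** 2, axis=0, keepdims=True))
    return np.maximum(1.0 - lam / np.maximum(norm, 1e-12), 0.0)

def tfmbss(X, mask_fn, n_iter=100, mu1=1.0, mu2=1.0, alpha=1.0):
    """X: observed complex STFT, shape (freq, mic, time); determined case.

    Returns the demixing matrices W with shape (freq, src, mic)."""
    n_freq, n_mic, _ = X.shape
    # Normalize so that the linear operator W -> W @ X has norm at most 1.
    X = X / np.max(np.linalg.norm(X, ord=2, axis=(1, 2)))
    XH = X.conj().transpose(0, 2, 1)
    W = np.tile(np.eye(n_mic, dtype=complex), (n_freq, 1, 1))
    Y = np.zeros_like(X)                      # dual variable in TF shape
    for _ in range(n_iter):
        W_tilde = prox_neg_logdet(W - mu1 * mu2 * (Y @ XH), mu1)
        Z = Y + (2.0 * W_tilde - W) @ X
        Y_tilde = Z - mask_fn(Z) * Z          # the TF mask replaces the prox
        W = alpha * W_tilde + (1.0 - alpha) * W
        Y = alpha * Y_tilde + (1.0 - alpha) * Y
    return W

# Toy run on random data, only to show the calling convention.
rng = np.random.default_rng(0)
X = rng.standard_normal((8, 2, 50)) + 1j * rng.standard_normal((8, 2, 50))
W = tfmbss(X, group_sparse_mask, n_iter=20)
Y_est = W @ X                                 # separated signals, shape (freq, src, time)
```

Any function that maps a TF-shaped array to values between 0 and 1 can be passed as `mask_fn`. The scale of the estimates is arbitrary, so a projection-back step would normally follow, and thresholds such as `lam` need tuning to the scale of the normalized data.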
42
TFMBSS
• Time-frequency-masking-based BSS (TFMBSS) [Yatabe&Kitamura, 2019], [Yatabe&Kitamura, 2021]
– The intended TF structure model is input to TFMBSS as a TF mask
– The demixing matrix is optimized so that the estimated signals have the intended TF structures
– Iteratively updating the TF masks is also an interesting option
[Figure: mixture → STFT → frequency-wise demixing matrix → estimates (frequency vs. time), which are enhanced by TF masking; the demixing matrix is updated so that the estimated signals have TF structures enhanced by the input TF masks]
43
Application of TFMBSS
• HPSS-based TFMBSS [Oyabu&Kitamura, 2021]
– utilizes a TF mask obtained via harmonic-percussive sound separation (HPSS) in TFMBSS
44
Application of TFMBSS
• HPSS-based TFMBSS [Oyabu&Kitamura, 2021]
[Figure: a mixture and its estimated harmonic and percussive sounds for four methods: optimization-based HPSS [Ono+, 2008] and median-based HPSS [FitzGerald, 2010] (nonlinear, single-channel), and optimization-based HPSS + TFMBSS and median-based HPSS + TFMBSS (linear, multichannel)]
46
Conclusion
• Audio BSS with TF structure model
– A TF structure model is necessary for avoiding the permutation problem
• Conventional algorithms (IVA, ILRMA, and so on)
– Which TF structure is the best? Trial and error
– The optimization algorithm is problem-dependent
• Changing the TF structure model requires re-deriving the algorithm
• Proposed generalized BSS using primal-dual splitting
– Easy to replace the TF structure model
• (if the function is “proximable”)
– Easy to search for the best TF structure for each BSS problem
• TFMBSS
– Explicitly defines the TF structure as TF masking

Más contenido relacionado

La actualidad más candente

半教師あり非負値行列因子分解における音源分離性能向上のための効果的な基底学習法
半教師あり非負値行列因子分解における音源分離性能向上のための効果的な基底学習法半教師あり非負値行列因子分解における音源分離性能向上のための効果的な基底学習法
半教師あり非負値行列因子分解における音源分離性能向上のための効果的な基底学習法Daichi Kitamura
 
独立深層学習行列分析に基づく多チャネル音源分離(Multichannel audio source separation based on indepen...
独立深層学習行列分析に基づく多チャネル音源分離(Multichannel audio source separation based on indepen...独立深層学習行列分析に基づく多チャネル音源分離(Multichannel audio source separation based on indepen...
独立深層学習行列分析に基づく多チャネル音源分離(Multichannel audio source separation based on indepen...Daichi Kitamura
 
時間領域低ランクスペクトログラム近似法に基づくマスキング音声の欠損成分復元
時間領域低ランクスペクトログラム近似法に基づくマスキング音声の欠損成分復元時間領域低ランクスペクトログラム近似法に基づくマスキング音声の欠損成分復元
時間領域低ランクスペクトログラム近似法に基づくマスキング音声の欠損成分復元NU_I_TODALAB
 
Efficient initialization for nonnegative matrix factorization based on nonneg...
Efficient initialization for nonnegative matrix factorization based on nonneg...Efficient initialization for nonnegative matrix factorization based on nonneg...
Efficient initialization for nonnegative matrix factorization based on nonneg...Daichi Kitamura
 
GAN-based statistical speech synthesis (in Japanese)
GAN-based statistical speech synthesis (in Japanese)GAN-based statistical speech synthesis (in Japanese)
GAN-based statistical speech synthesis (in Japanese)Yuki Saito
 
独立低ランク行列分析に基づく音源分離とその発展
独立低ランク行列分析に基づく音源分離とその発展独立低ランク行列分析に基づく音源分離とその発展
独立低ランク行列分析に基づく音源分離とその発展Kitamura Laboratory
 
ウィナーフィルタと適応フィルタ
ウィナーフィルタと適応フィルタウィナーフィルタと適応フィルタ
ウィナーフィルタと適応フィルタToshihisa Tanaka
 
Ph.D. Thesis Presentation: A Study of Priors and Algorithms for Signal Recove...
Ph.D. Thesis Presentation: A Study of Priors and Algorithms for Signal Recove...Ph.D. Thesis Presentation: A Study of Priors and Algorithms for Signal Recove...
Ph.D. Thesis Presentation: A Study of Priors and Algorithms for Signal Recove...Shunsuke Ono
 
独立性基準を用いた非負値行列因子分解の効果的な初期値決定法(Statistical-independence-based efficient initia...
独立性基準を用いた非負値行列因子分解の効果的な初期値決定法(Statistical-independence-based efficient initia...独立性基準を用いた非負値行列因子分解の効果的な初期値決定法(Statistical-independence-based efficient initia...
独立性基準を用いた非負値行列因子分解の効果的な初期値決定法(Statistical-independence-based efficient initia...Daichi Kitamura
 
非負値行列因子分解を用いた被り音の抑圧
非負値行列因子分解を用いた被り音の抑圧非負値行列因子分解を用いた被り音の抑圧
非負値行列因子分解を用いた被り音の抑圧Kitamura Laboratory
 
喉頭摘出者のための歌唱支援を目指した電気音声変換法
喉頭摘出者のための歌唱支援を目指した電気音声変換法喉頭摘出者のための歌唱支援を目指した電気音声変換法
喉頭摘出者のための歌唱支援を目指した電気音声変換法NU_I_TODALAB
 
基底変形型教師ありNMFによる実楽器信号分離 (in Japanese)
基底変形型教師ありNMFによる実楽器信号分離 (in Japanese)基底変形型教師ありNMFによる実楽器信号分離 (in Japanese)
基底変形型教師ありNMFによる実楽器信号分離 (in Japanese)Daichi Kitamura
 
Moment matching networkを用いた音声パラメータのランダム生成の検討
Moment matching networkを用いた音声パラメータのランダム生成の検討Moment matching networkを用いた音声パラメータのランダム生成の検討
Moment matching networkを用いた音声パラメータのランダム生成の検討Shinnosuke Takamichi
 
実環境音響信号処理における収音技術
実環境音響信号処理における収音技術実環境音響信号処理における収音技術
実環境音響信号処理における収音技術Yuma Koizumi
 
Approximate Inference (Chapter 10, PRML Reading)
Approximate Inference (Chapter 10, PRML Reading)Approximate Inference (Chapter 10, PRML Reading)
Approximate Inference (Chapter 10, PRML Reading)Ha Phuong
 
[DL輪読会]Wavenet a generative model for raw audio
[DL輪読会]Wavenet a generative model for raw audio[DL輪読会]Wavenet a generative model for raw audio
[DL輪読会]Wavenet a generative model for raw audioDeep Learning JP
 
短時間発話を用いた話者照合のための音声加工の効果に関する検討
短時間発話を用いた話者照合のための音声加工の効果に関する検討短時間発話を用いた話者照合のための音声加工の効果に関する検討
短時間発話を用いた話者照合のための音声加工の効果に関する検討Shinnosuke Takamichi
 
畳み込みネットワークによる高次元信号復元と異分野融合への展開
畳み込みネットワークによる高次元信号復元と異分野融合への展開 畳み込みネットワークによる高次元信号復元と異分野融合への展開
畳み込みネットワークによる高次元信号復元と異分野融合への展開 Shogo Muramatsu
 
独立深層学習行列分析に基づく多チャネル音源分離の実験的評価(Experimental evaluation of multichannel audio s...
独立深層学習行列分析に基づく多チャネル音源分離の実験的評価(Experimental evaluation of multichannel audio s...独立深層学習行列分析に基づく多チャネル音源分離の実験的評価(Experimental evaluation of multichannel audio s...
独立深層学習行列分析に基づく多チャネル音源分離の実験的評価(Experimental evaluation of multichannel audio s...Daichi Kitamura
 

La actualidad más candente (20)

半教師あり非負値行列因子分解における音源分離性能向上のための効果的な基底学習法
半教師あり非負値行列因子分解における音源分離性能向上のための効果的な基底学習法半教師あり非負値行列因子分解における音源分離性能向上のための効果的な基底学習法
半教師あり非負値行列因子分解における音源分離性能向上のための効果的な基底学習法
 
独立深層学習行列分析に基づく多チャネル音源分離(Multichannel audio source separation based on indepen...
独立深層学習行列分析に基づく多チャネル音源分離(Multichannel audio source separation based on indepen...独立深層学習行列分析に基づく多チャネル音源分離(Multichannel audio source separation based on indepen...
独立深層学習行列分析に基づく多チャネル音源分離(Multichannel audio source separation based on indepen...
 
時間領域低ランクスペクトログラム近似法に基づくマスキング音声の欠損成分復元
時間領域低ランクスペクトログラム近似法に基づくマスキング音声の欠損成分復元時間領域低ランクスペクトログラム近似法に基づくマスキング音声の欠損成分復元
時間領域低ランクスペクトログラム近似法に基づくマスキング音声の欠損成分復元
 
Efficient initialization for nonnegative matrix factorization based on nonneg...
Efficient initialization for nonnegative matrix factorization based on nonneg...Efficient initialization for nonnegative matrix factorization based on nonneg...
Efficient initialization for nonnegative matrix factorization based on nonneg...
 
GAN-based statistical speech synthesis (in Japanese)
GAN-based statistical speech synthesis (in Japanese)GAN-based statistical speech synthesis (in Japanese)
GAN-based statistical speech synthesis (in Japanese)
 
独立低ランク行列分析に基づく音源分離とその発展
独立低ランク行列分析に基づく音源分離とその発展独立低ランク行列分析に基づく音源分離とその発展
独立低ランク行列分析に基づく音源分離とその発展
 
ウィナーフィルタと適応フィルタ
ウィナーフィルタと適応フィルタウィナーフィルタと適応フィルタ
ウィナーフィルタと適応フィルタ
 
Ph.D. Thesis Presentation: A Study of Priors and Algorithms for Signal Recove...
Ph.D. Thesis Presentation: A Study of Priors and Algorithms for Signal Recove...Ph.D. Thesis Presentation: A Study of Priors and Algorithms for Signal Recove...
Ph.D. Thesis Presentation: A Study of Priors and Algorithms for Signal Recove...
 
独立性基準を用いた非負値行列因子分解の効果的な初期値決定法(Statistical-independence-based efficient initia...
独立性基準を用いた非負値行列因子分解の効果的な初期値決定法(Statistical-independence-based efficient initia...独立性基準を用いた非負値行列因子分解の効果的な初期値決定法(Statistical-independence-based efficient initia...
独立性基準を用いた非負値行列因子分解の効果的な初期値決定法(Statistical-independence-based efficient initia...
 
非負値行列因子分解を用いた被り音の抑圧
非負値行列因子分解を用いた被り音の抑圧非負値行列因子分解を用いた被り音の抑圧
非負値行列因子分解を用いた被り音の抑圧
 
喉頭摘出者のための歌唱支援を目指した電気音声変換法
喉頭摘出者のための歌唱支援を目指した電気音声変換法喉頭摘出者のための歌唱支援を目指した電気音声変換法
喉頭摘出者のための歌唱支援を目指した電気音声変換法
 
基底変形型教師ありNMFによる実楽器信号分離 (in Japanese)
基底変形型教師ありNMFによる実楽器信号分離 (in Japanese)基底変形型教師ありNMFによる実楽器信号分離 (in Japanese)
基底変形型教師ありNMFによる実楽器信号分離 (in Japanese)
 
Moment matching networkを用いた音声パラメータのランダム生成の検討
Moment matching networkを用いた音声パラメータのランダム生成の検討Moment matching networkを用いた音声パラメータのランダム生成の検討
Moment matching networkを用いた音声パラメータのランダム生成の検討
 
Saito2103slp
Saito2103slpSaito2103slp
Saito2103slp
 
実環境音響信号処理における収音技術
実環境音響信号処理における収音技術実環境音響信号処理における収音技術
実環境音響信号処理における収音技術
 
Approximate Inference (Chapter 10, PRML Reading)
Approximate Inference (Chapter 10, PRML Reading)Approximate Inference (Chapter 10, PRML Reading)
Approximate Inference (Chapter 10, PRML Reading)
 
[DL輪読会]Wavenet a generative model for raw audio
[DL輪読会]Wavenet a generative model for raw audio[DL輪読会]Wavenet a generative model for raw audio
[DL輪読会]Wavenet a generative model for raw audio
 
短時間発話を用いた話者照合のための音声加工の効果に関する検討
短時間発話を用いた話者照合のための音声加工の効果に関する検討短時間発話を用いた話者照合のための音声加工の効果に関する検討
短時間発話を用いた話者照合のための音声加工の効果に関する検討
 
畳み込みネットワークによる高次元信号復元と異分野融合への展開
畳み込みネットワークによる高次元信号復元と異分野融合への展開 畳み込みネットワークによる高次元信号復元と異分野融合への展開
畳み込みネットワークによる高次元信号復元と異分野融合への展開
 
独立深層学習行列分析に基づく多チャネル音源分離の実験的評価(Experimental evaluation of multichannel audio s...
独立深層学習行列分析に基づく多チャネル音源分離の実験的評価(Experimental evaluation of multichannel audio s...独立深層学習行列分析に基づく多チャネル音源分離の実験的評価(Experimental evaluation of multichannel audio s...
独立深層学習行列分析に基づく多チャネル音源分離の実験的評価(Experimental evaluation of multichannel audio s...
 

Similar a Blind audio source separation based on time-frequency structure models

Experimental analysis of optimal window length for independent low-rank matri...
Experimental analysis of optimal window length for independent low-rank matri...Experimental analysis of optimal window length for independent low-rank matri...
Experimental analysis of optimal window length for independent low-rank matri...Daichi Kitamura
 
Linear multichannel blind source separation based on time-frequency mask obta...
Linear multichannel blind source separation based on time-frequency mask obta...Linear multichannel blind source separation based on time-frequency mask obta...
Linear multichannel blind source separation based on time-frequency mask obta...Kitamura Laboratory
 
Prior distribution design for music bleeding-sound reduction based on nonnega...
Prior distribution design for music bleeding-sound reduction based on nonnega...Prior distribution design for music bleeding-sound reduction based on nonnega...
Prior distribution design for music bleeding-sound reduction based on nonnega...Kitamura Laboratory
 
Audio Source Separation Based on Low-Rank Structure and Statistical Independence
Audio Source Separation Based on Low-Rank Structure and Statistical IndependenceAudio Source Separation Based on Low-Rank Structure and Statistical Independence
Audio Source Separation Based on Low-Rank Structure and Statistical IndependenceDaichi Kitamura
 
DNN-based frequency-domain permutation solver for multichannel audio source s...
DNN-based frequency-domain permutation solver for multichannel audio source s...DNN-based frequency-domain permutation solver for multichannel audio source s...
DNN-based frequency-domain permutation solver for multichannel audio source s...Kitamura Laboratory
 
Relaxation of rank-1 spatial constraint in overdetermined blind source separa...
Relaxation of rank-1 spatial constraint in overdetermined blind source separa...Relaxation of rank-1 spatial constraint in overdetermined blind source separa...
Relaxation of rank-1 spatial constraint in overdetermined blind source separa...Daichi Kitamura
 
Online divergence switching for superresolution-based nonnegative matrix fact...
Online divergence switching for superresolution-based nonnegative matrix fact...Online divergence switching for superresolution-based nonnegative matrix fact...
Online divergence switching for superresolution-based nonnegative matrix fact...Daichi Kitamura
 
Koyama ASA ASJ joint meeting 2016
Koyama ASA ASJ joint meeting 2016Koyama ASA ASJ joint meeting 2016
Koyama ASA ASJ joint meeting 2016SaruwatariLabUTokyo
 
Robust music signal separation based on supervised nonnegative matrix factori...
Robust music signal separation based on supervised nonnegative matrix factori...Robust music signal separation based on supervised nonnegative matrix factori...
Robust music signal separation based on supervised nonnegative matrix factori...Daichi Kitamura
 
Amplitude spectrogram prediction from mel-frequency cepstrum coefficients and...
Amplitude spectrogram prediction from mel-frequency cepstrum coefficients and...Amplitude spectrogram prediction from mel-frequency cepstrum coefficients and...
Amplitude spectrogram prediction from mel-frequency cepstrum coefficients and...Kitamura Laboratory
 
DNN-based permutation solver for frequency-domain independent component analy...
DNN-based permutation solver for frequency-domain independent component analy...DNN-based permutation solver for frequency-domain independent component analy...
DNN-based permutation solver for frequency-domain independent component analy...Kitamura Laboratory
 
DNN-based frequency component prediction for frequency-domain audio source se...
DNN-based frequency component prediction for frequency-domain audio source se...DNN-based frequency component prediction for frequency-domain audio source se...
DNN-based frequency component prediction for frequency-domain audio source se...Kitamura Laboratory
 
Heart rate estimation of car driver using radar sensors and blind source sepa...
Heart rate estimation of car driver using radar sensors and blind source sepa...Heart rate estimation of car driver using radar sensors and blind source sepa...
Heart rate estimation of car driver using radar sensors and blind source sepa...Kitamura Laboratory
 
International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)inventionjournals
 
Final presentation
Final presentationFinal presentation
Final presentationRohan Lad
 
Teaching Computers to Listen to Music
Teaching Computers to Listen to MusicTeaching Computers to Listen to Music
Teaching Computers to Listen to MusicEric Battenberg
 
Hybrid multichannel signal separation using supervised nonnegative matrix fac...
Hybrid multichannel signal separation using supervised nonnegative matrix fac...Hybrid multichannel signal separation using supervised nonnegative matrix fac...
Hybrid multichannel signal separation using supervised nonnegative matrix fac...Daichi Kitamura
 
Divergence optimization in nonnegative matrix factorization with spectrogram ...
Divergence optimization in nonnegative matrix factorization with spectrogram ...Divergence optimization in nonnegative matrix factorization with spectrogram ...
Divergence optimization in nonnegative matrix factorization with spectrogram ...Daichi Kitamura
 
Geometrically Constrained Independent Vector Analysis
Geometrically Constrained Independent Vector AnalysisGeometrically Constrained Independent Vector Analysis
Geometrically Constrained Independent Vector AnalysisAffan Khan
 

Blind audio source separation based on time-frequency structure models

  • 1. 13th Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC 2021) Overview Session OS-1: Acoustic Signal Processing Blind Audio Source Separation Based on Time-Frequency Structure Models Daichi Kitamura National Institute of Technology, Kagawa College Japan
  • 2. 2 • Daichi Kitamura • National Institute of Technology, Kagawa College • Research interests – Audio source separation – Array signal processing – Machine learning – Music signal processing – Biosignal processing Self introduction
  • 3. 3 Contents • Background – Blind source separation (BSS) for audio signals and its history – Motivation • Preliminaries – Frequency-domain independent component analysis (FDICA) – Independent vector analysis (IVA) – Independent low-rank matrix analysis (ILRMA) • Time-frequency-masking-based BSS (TFMBSS) – Reformulation of BSS problems and its optimization – BSS based on primal-dual splitting method – Interpretation of TF masking and application • Conclusion
  • 5. Background: BSS for audio signals • Blind source separation (BSS) for audio signals – estimates specific audio sources in the observed mixture – does not require prior information on the recording conditions • locations of mics and sources, room geometry, timbres, etc. • The word “blind” means “unsupervised”. – is applicable to many audio applications • Hearing aid systems • Automatic speech recognition (ASR) • Preprocessing for music analysis, etc. [Figure: observed mixture → BSS → estimated source signals]
  • 6. Background: BSS for audio signals • Music BSS using ILRMA [Figure: a mixture of guitar, vocal, and keyboard → BSS → estimated guitar, vocal, and keyboard] Please listen for the three parts in the mixture. MATLAB code: https://github.com/d-kitamura/ILRMA Python code: implemented in the “Pyroomacoustics” library
  • 7. Background: BSS for audio signals • Numbers of mics and sources – Monaural recording: 1ch (single-channel signal) – Mic array: 1ch, 2ch, …, Mch (multichannel signal) • Consider only the “determined” situation – # of mics = # of sources – BSS estimates the “demixing system” (inverse of the mixing system) [Figure: source signals → mixing system → observed signals → demixing system → estimated signals]
  • 8. Historical overview (only the methods related to this talk) [Timeline figure, 1994 to 2021, distinguishing single-channel, underdetermined, and determined methods] • Methods without a citation on the slide: spectral subtraction, time-frequency masking, sparse coding, beamforming, DOA clustering, many other methods, and supervised approaches based on deep neural networks (DNN) • ICA [Comon], [Bell and Sejnowski], [Cardoso], [Amari], [Cichocki], … • Frequency-domain ICA [Smaragdis] • Permutation solvers [Saruwatari], [Murata], [Morgan], [Sawada], … • IVA [Hiroe], [Kim] • AuxIVA [Ono] • Time-varying IVA [Ono] • NMF [Lee], with extensions of models and generative models [Virtanen], [Smaragdis], [Kameoka], [Ozerov], … • Itakura-Saito NMF [Févotte] • Multichannel NMF [Ozerov, Sawada] • Spatial covariance model [Duong] • Spatial covariance model + DNN [Nugraha] • ILRMA [Kitamura] • IDLMA [Mogami] • Time-frequency-masking-based BSS (TFMBSS) [Yatabe&Kitamura] • Gray-colored methods are “supervised” (not fully blind)
  • 9. Motivation of determined BSS • Conventional BSS: IVA, AuxIVA, and ILRMA – Minimum distortion (linear demixing) – Relatively fast and stable optimization • Iterative projection (AuxIVA) [Ono+, 2010], [Ono, 2011] – The time-frequency (TF) structure model affects performance • IVA: co-occurrence along the frequency axis • ILRMA: NMF-based low-rank time-frequency structure – The optimization algorithm depends on the TF model • Update rules are difficult to derive for each new model • Goal: easily replace the TF model and search for the best one – Time-frequency-masking-based BSS (TFMBSS) [Yatabe & Kitamura, 2021] [Figure: source signals → frequency-wise mixing matrix → observed signal → frequency-wise demixing matrix → estimated signal, indexed by frequency bins and time frames]
  • 10. 10 Contents • Background – Blind source separation (BSS) for audio signals and its history – Motivation • Preliminaries – Frequency-domain independent component analysis (FDICA) – Independent vector analysis (IVA) – Independent low-rank matrix analysis (ILRMA) • Time-frequency-masking-based BSS (TFMBSS) – Reformulation of BSS problems and its optimization – BSS based on primal-dual splitting method – Interpretation of TF masking and application • Conclusion
  • 11. Independence-based BSS in time domain • Independent component analysis (ICA) [Comon, 1994] – If we assume that 1. the sources (latent components) are mutually independent, 2. the sources are non-Gaussian, and 3. the mixing matrix is invertible and time-invariant, – then we can estimate the demixing matrix (the inverse of the mixing matrix) • by maximizing independence between the estimates [Figure: sources → mixing matrix → mixtures (observed signals) → inverse matrix → estimates]
  • 12. Independence-based BSS in time domain • Independent component analysis (ICA) [Comon, 1994] – Maximizes independence between the estimated sources, i.e., minimizes the similarity between their distributions – Optimization problem in ICA: [Equation: ICA cost function], where the source model is a non-Gaussian source distribution (e.g., Laplace distribution)
  • 13. Independence-based BSS in time domain • Independent component analysis (ICA) [Comon, 1994] – However, two ambiguities remain • 1. Signal scales (volumes) cannot be determined • 2. Signal permutation (the order of the separated signals) cannot be determined [Figure: sources (latent components) → mixtures (observed signals) → separated signals (estimated by ICA), shown once with changed scales and once with swapped order]
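
The permutation ambiguity on slide 13 can be checked numerically. The sketch below is only an illustration under stated assumptions: the cost (mean absolute value of the estimates minus log|det W|, i.e., a Laplace source model) is one common form of the ICA cost, and the mixing matrix `A` and the helper name `ica_cost` are hypothetical, not taken from the slides.

```python
import numpy as np

def ica_cost(W, x):
    """Laplace-model ICA cost (assumed form): sum_n mean_t |y_n(t)| - log|det W|."""
    y = W @ x                                   # estimates, shape (N, T)
    return np.mean(np.abs(y), axis=1).sum() - np.log(np.abs(np.linalg.det(W)))

rng = np.random.default_rng(0)
s = rng.laplace(size=(2, 5000))                 # two independent non-Gaussian sources
A = np.array([[1.0, 0.6], [0.4, 1.0]])          # hypothetical mixing matrix
x = A @ s                                       # observed mixtures

W_ideal = np.linalg.inv(A)                      # ideal demixing matrix
P = np.array([[0.0, 1.0], [1.0, 0.0]])          # permutation that swaps the outputs

cost_ideal = ica_cost(W_ideal, x)
cost_swapped = ica_cost(P @ W_ideal, x)         # same cost: the order is not identifiable
cost_no_demixing = ica_cost(np.eye(2), x)       # leaving the mixture untouched costs more
```

Because the cost cannot distinguish `W_ideal` from `P @ W_ideal`, any optimizer may return the sources in either order.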
  • 14. Independence-based BSS in frequency domain • General audio mixture – Convolution with room reverberation (convolutive mixture in the time domain, determined by the reverberation length) • To deconvolve (separate) the sources, – apply the short-time Fourier transform (STFT) and convert the signals to the TF domain, where the mixture is indexed by a frequency index and a time index – estimate a frequency-wise demixing matrix [Figure: mixture without reverberation vs. mixture with reverberation]
  • 15. FDICA • Frequency-domain ICA (FDICA) [Smaragdis, 1998] – applies ICA to each frequency separately – estimates a frequency-wise demixing matrix (the inverse of the frequency-wise mixing matrix), indexed by a frequency index and a time index
  • 16. FDICA • Frequency-domain ICA (FDICA) [Smaragdis, 1998] – Optimization problem in FDICA: [Equation: FDICA cost function], where the source model is a non-Gaussian complex-valued source distribution (e.g., circularly symmetric complex Laplace distribution) – By assuming the circularly symmetric complex Laplace distribution, – the minimization problem in FDICA becomes • separable w.r.t. frequency
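
The separability on slide 16 can be sketched as follows. This is a minimal illustration, not the author's code: the array layout `(frequency, frame, channel)`, the helper names, and the constant in front of the log-determinant term are assumptions (constant factors differ between papers).

```python
import numpy as np

def demix(W, X):
    """Frequency-wise linear demixing: Y[i, j] = W[i] @ X[i, j]."""
    return np.einsum('inm,ijm->ijn', W, X)

def fdica_cost_per_bin(W, X):
    """Laplace-model FDICA cost evaluated separately for every frequency bin."""
    Y = demix(W, X)
    sparsity = np.abs(Y).mean(axis=1).sum(axis=1)   # shape (I,): L1 term per bin
    logdet = np.linalg.slogdet(W)[1]                # shape (I,): log|det W_i|
    return sparsity - logdet

rng = np.random.default_rng(0)
I, J, M = 5, 100, 2                                 # bins, frames, channels (toy sizes)
X = rng.standard_normal((I, J, M)) + 1j * rng.standard_normal((I, J, M))
W = rng.standard_normal((I, M, M)) + 1j * rng.standard_normal((I, M, M))

W_changed = W.copy()
W_changed[3] = np.eye(M)                            # modify the demixing matrix of bin 3 only

cost_a = fdica_cost_per_bin(W, X)
cost_b = fdica_cost_per_bin(W_changed, X)           # differs from cost_a only at bin 3
```

Since each bin is optimized independently of the others, nothing in this cost ties the source order in one bin to the order in another, which is the origin of the permutation problem.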
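  • 17. Permutation problem • Permutation problem in FDICA – ICA is applied in all frequencies, so the order of the separated signals differs from one frequency to another – The separated components must be aligned along the frequency axis by a permutation solver *Signal scales are also inconsistent across frequencies, but they can easily be fixed by applying the projection back technique. [Figure: source 1 and source 2 → mixture 1 and mixture 2 → ICA in all frequencies → permutation solver → separated signal 1 and separated signal 2]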
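
The projection back technique mentioned in the footnote of slide 17 can be sketched as below. This is an illustrative version under assumptions: the array layout `(frequency, frame, channel)`, the function name `projection_back`, and the choice of reference microphone `ref=0` are hypothetical.

```python
import numpy as np

def projection_back(Y, W, ref=0):
    """Rescale each separated signal to its image at the reference microphone."""
    A = np.linalg.inv(W)                      # estimated mixing matrices, shape (I, M, N)
    return Y * A[:, ref, :][:, None, :]       # source n at bin i is scaled by A[i][ref, n]

rng = np.random.default_rng(0)
I, J, M = 4, 50, 2
X = rng.standard_normal((I, J, M)) + 1j * rng.standard_normal((I, J, M))
W = rng.standard_normal((I, M, M)) + 1j * rng.standard_normal((I, M, M))

Y = np.einsum('inm,ijm->ijn', W, X)           # separated signals with arbitrary scales
Z = projection_back(Y, W)

# Apply an arbitrary per-frequency, per-source rescaling to the demixing matrices.
D = rng.uniform(0.5, 2.0, size=(I, M))
W_scaled = D[:, :, None] * W
Y_scaled = np.einsum('inm,ijm->ijn', W_scaled, X)
Z_scaled = projection_back(Y_scaled, W_scaled)
```

The rescaled outputs `Z` and `Z_scaled` coincide, and the source images sum to the reference-microphone observation, so the scale ambiguity is removed without any prior information.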
  • 18. 18 Popular permutation solvers • Signal correlation between frequencies – FDICA + correlation-based clustering [Murata+, 2001], [Sawada+, 2011] • Direction of arrival of each source (DOA) – FDICA + DOA-based alignment [Saruwatari+, 2006] • Co-occurrence among frequencies of each source – Independent vector analysis (IVA) [Hiroe, 2006], [Kim, 2006] , [Kim, 2007] • Low-rank TF modeling of each source – Independent low-rank matrix analysis (ILRMA) [Kitamura+, 2016] • DNN-based supervised TF modeling of each source – Independent deeply learned matrix analysis (IDLMA) [Makishima+, 2019] • DNN-based permutation solver – Generalized permutation solver with training [Yamaji&Kitamura, 2020] • Spectrogram consistency – Consistent IVA and consistent ILRMA [Yatabe, 2020], [Kitamura+, 2020]
  • 19. IVA • Independent vector analysis (IVA) [Hiroe, 2006], [Kim, 2006] – utilizes the sourcewise frequency vector as a random variable – Vector source model in IVA • Each source vector follows a multivariate distribution whose frequency components have internal correlations • The spherical property groups components that co-occur across all frequencies into one source – Permutation-problem-free estimation of the demixing matrix can be achieved! [Figure: source vector → mixing matrix → observed vector → demixing matrix → estimated vector; co-occurrence of all frequencies in each source]
  • 20. IVA • Independent vector analysis (IVA) [Hiroe, 2006], [Kim, 2006] – How valid is IVA’s TF structure model? • Typical audio sources have co-occurrence of all frequencies • Can be interpreted as “group sparsity” in the TF domain [Figure: spectrograms of a speech source (conversation) and a vocal source (pop music)]
  • 21. IVA • Independent vector analysis (IVA) [Hiroe, 2006], [Kim, 2006] – Optimization problem in IVA: [Equation: IVA cost function], where the source model is a non-Gaussian, multivariate, and spherical complex-valued source distribution (e.g., spherical Laplace distribution) – By assuming the spherical Laplace distribution [Hiroe, 2006], [Kim, 2006], – the minimization problem in IVA becomes as follows: [Equation: IVA cost function with the spherical Laplace distribution]
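
Why the vector source model avoids the permutation problem can be illustrated by comparing the two sparsity terms. In the sketch below (toy data and helper names are assumptions), two sources are active in disjoint frames; swapping them at a single frequency bin leaves the FDICA-style L1 term unchanged but increases the IVA-style L2,1 term.

```python
import numpy as np

def l1_cost(Y):
    """FDICA-style term: sum of magnitudes over all TF bins and sources."""
    return np.abs(Y).sum()

def l21_cost(Y):
    """IVA-style term: L2 norm over frequency for each (frame, source), then summed."""
    return np.sqrt((np.abs(Y) ** 2).sum(axis=0)).sum()

rng = np.random.default_rng(0)
I, J = 8, 40                                        # frequency bins, time frames
half = J // 2
Y = np.zeros((I, J, 2), dtype=complex)
# Source 0 is active in the first half of the frames, source 1 in the second half,
# so each source has co-occurrence across all frequencies.
Y[:, :half, 0] = rng.standard_normal((I, half)) + 1j * rng.standard_normal((I, half))
Y[:, half:, 1] = rng.standard_normal((I, half)) + 1j * rng.standard_normal((I, half))

Y_perm = Y.copy()
Y_perm[0] = Y[0][:, ::-1]                           # swap the two sources at bin 0 only

l1_aligned, l1_permuted = l1_cost(Y), l1_cost(Y_perm)
l21_aligned, l21_permuted = l21_cost(Y), l21_cost(Y_perm)
```

The L1 term is blind to the swap, whereas the L2,1 term penalizes it, so minimizing the IVA cost favors consistently ordered sources.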
  • 22. Efficient optimization for IVA • Auxiliary-function-based IVA (AuxIVA) [Ono, 2011] – Fast and stable optimization called iterative projection (IP) • Auxiliary function technique (or majorization-minimization algorithm) • Alternates between the update of the auxiliary variables and the update of the original variables – Convergence-guaranteed, fast, and stable optimization without step size parameters Python code: implemented in the “Pyroomacoustics” library https://pyroomacoustics.readthedocs.io/en/pypi-release/pyroomacoustics.bss.auxiva.html
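
The alternation between auxiliary-variable and demixing-matrix updates can be sketched in NumPy as follows. This is a simplified illustration written for this transcript, not the Pyroomacoustics implementation; the array layout `(frequency, frame, channel)`, the function names, the toy data, and the cost normalization (spherical Laplace term minus the sum of log|det W_i|) are assumptions.

```python
import numpy as np

def iva_cost(W, X):
    """Assumed IVA cost: sum_n mean_j ||y_jn||_2 - sum_i log|det W_i|."""
    Y = np.einsum('inm,ijm->ijn', W, X)
    r = np.sqrt((np.abs(Y) ** 2).sum(axis=0))             # (J, N) source-wise norms
    return r.mean(axis=0).sum() - np.linalg.slogdet(W)[1].sum()

def auxiva_ip(X, n_iter=20, eps=1e-12):
    """AuxIVA with iterative projection for a determined mixture X of shape (I, J, M)."""
    I, J, M = X.shape
    W = np.tile(np.eye(M, dtype=complex), (I, 1, 1))      # rows of W[i] are w_n^H
    costs = [iva_cost(W, X)]
    for _ in range(n_iter):
        # Update of auxiliary variables: norm of each source over frequency.
        Y = np.einsum('inm,ijm->ijn', W, X)
        r = np.maximum(np.sqrt((np.abs(Y) ** 2).sum(axis=0)), eps)
        # Update of original variables: iterative projection, one source at a time.
        for n in range(M):
            V = np.einsum('ijm,ijk,j->imk', X, X.conj(), 1.0 / r[:, n]) / J
            e_n = np.zeros((I, M, 1), dtype=complex)
            e_n[:, n, 0] = 1.0
            w = np.linalg.solve(W @ V, e_n)[..., 0]       # w_n = (W V_n)^{-1} e_n
            scale = np.sqrt(np.einsum('im,imk,ik->i', w.conj(), V, w).real)
            W[:, n, :] = (w / scale[:, None]).conj()      # normalize so w^H V w = 1
        costs.append(iva_cost(W, X))
    return W, costs

# Toy determined mixture: activations shared across frequency, random mixing per bin.
rng = np.random.default_rng(0)
I, J, M = 6, 300, 2
act = rng.exponential(size=(J, M))
S = act[None] * (rng.standard_normal((I, J, M)) + 1j * rng.standard_normal((I, J, M)))
A = rng.standard_normal((I, M, M)) + 1j * rng.standard_normal((I, M, M))
X = np.einsum('imn,ijn->ijm', A, S)
W_est, costs = auxiva_ip(X)
```

No step size appears anywhere in the loop, and the recorded cost values should be non-increasing, which is the practical appeal of the auxiliary function approach.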
  • 23. Extension of TF structure assumed in IVA • TF structure in IVA: a frequency-uniform vector multiplied by a time activation • To represent more complicated TF structures, NMF modeling can be introduced, resulting in independent low-rank matrix analysis (ILRMA) • TF structure in ILRMA: bases multiplied by their time activations; the # of bases can be set arbitrarily [Figure: frequency-time plots of the TF structures in IVA and ILRMA]
  • 24. ILRMA • Independent low-rank matrix analysis (ILRMA) [Kitamura+, 2016] – assumes each source has a low-rank TF structure – is a unification of • independence-based estimation of the demixing matrix (FDICA or IVA) • low-rank TF modeling of each source (NMF) – avoids the permutation problem • A TF structure is introduced, as in IVA – Update the demixing matrix so that the estimated signals 1. are mutually independent (ICA) 2. have low-rank TF structures (NMF) [Figure: observed signal → STFT → frequency-wise demixing matrix → estimated signal → low-rank approximation by NMF; the mixture is not low rank, whereas each estimated source is low rank]
  • 25. ILRMA • Independent low-rank matrix analysis (ILRMA) [Kitamura+, 2016] – Optimization problem in ILRMA • Cost function in FDICA or IVA: estimates the frequency-wise demixing matrix • Cost function in NMF: estimates the low-rank TF structure of each source – Convergence-guaranteed update rules • NMF’s multiplicative update • AuxIVA (IP) MATLAB code: https://github.com/d-kitamura/ILRMA Python code: implemented in the “Pyroomacoustics” library
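
The NMF half of the ILRMA update can be sketched with the multiplicative update for the Itakura-Saito divergence. The code below is an illustrative stand-alone version (variable names, toy data, and the small floor `eps` are assumptions); in ILRMA the input `P` would be the power spectrogram of one estimated source, and this step would alternate with the IP update of the demixing matrices.

```python
import numpy as np

def is_divergence(P, R):
    """Itakura-Saito divergence between a power spectrogram P and its model R."""
    ratio = P / R
    return (ratio - np.log(ratio) - 1.0).sum()

def isnmf_update(P, T, V, eps=1e-12):
    """One multiplicative update of bases T (I, K) and activations V (K, J)."""
    R = T @ V
    T = np.maximum(T * np.sqrt(((P / R ** 2) @ V.T) / ((1.0 / R) @ V.T)), eps)
    R = T @ V
    V = np.maximum(V * np.sqrt((T.T @ (P / R ** 2)) / (T.T @ (1.0 / R))), eps)
    return T, V

rng = np.random.default_rng(0)
I, J, K = 20, 60, 3                              # bins, frames, number of bases
P = rng.exponential(size=(I, J)) + 1e-3          # toy power spectrogram (strictly positive)
T = rng.uniform(0.1, 1.0, size=(I, K))
V = rng.uniform(0.1, 1.0, size=(K, J))

divergences = [is_divergence(P, T @ V)]
for _ in range(30):
    T, V = isnmf_update(P, T, V)
    divergences.append(is_divergence(P, T @ V))
```

The square root in the update corresponds to the majorization-minimization derivation, which is what makes the divergence non-increasing over the iterations.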
  • 26. 26 Contents • Background – Blind source separation (BSS) for audio signals and its history – Motivation • Preliminaries – Frequency-domain independent component analysis (FDICA) – Independent vector analysis (IVA) – Independent low-rank matrix analysis (ILRMA) • Time-frequency-masking-based BSS (TFMBSS) – Reformulation of BSS problems and its optimization – BSS based on primal-dual splitting method – Interpretation of TF masking and application • Conclusion
  • 27. 27 Reformulation of BSS • Cost functions of independence-based BSS – FDICA w/ Laplace – IVA w/ spherical Laplace – ILRMA w/ Itakura-Saito NMF
  • 28. Reformulation of BSS • All of these cost functions derive from ICA’s cost function • Source generative model – corresponds to the TF structure model for each source – is necessary for avoiding the permutation problem • A better assumption of the TF structures – provides better BSS performance [Figure: frequency-time plots of low-rank, sparse, and group-sparse structures, and more]
  • 29. Reformulation of BSS • Derivation of the optimization algorithm – is problem dependent (depends on the TF structure model) – requires technical knowledge and math skills • To try various TF structures in a plug-and-play manner, – let’s reformulate BSS problems in a more general form – then solve them using a TF-structure-independent algorithm [Figure: sparse, low-rank, and group-sparse models plugged into a single BSS algorithm]
  • 30. Reformulation of BSS • Generalized optimization problem [Yatabe&Kitamura, 2018] – First term • TF structure model for each source • Often called the “source model” in the context of BSS • Replace this function in a plug-and-play manner – Second term • Derived from ICA theory (Jacobian of the transform between the observed and estimated signals) • Interpreted as a “barrier function” that prevents the demixing matrix from becoming rank-deficient
  • 31. 31 Reformulation of BSS • Generalized optimization problem [Yatabe&Kitamura, 2018] – FDICA w/ Laplace (L1 sparse regularizer) – IVA w/ spherical Laplace (L2,1 group-sparse regularizer) – ILRMA w/ Itakura-Saito NMF (low-rank approximation) Freq. vector
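
The equations of slides 30 and 31 are not preserved in the transcript. As a hedged reconstruction, the generalized problem and its three special cases can be written as follows, where $x_{ij}$ and $y_{ij} = W_i x_{ij}$ are the observed and estimated signals at frequency $i$ and frame $j$, and $n$ indexes the sources. The symbols $\mathcal{P}$, $T_n$, $V_n$ and the omission of constant factors are assumptions of this sketch and may differ from the slides.

```latex
\begin{align}
  &\underset{\{W_i\}_{i=1}^{I}}{\text{minimize}}\quad
    \mathcal{P}(Y) - \sum_{i=1}^{I} \log\lvert\det W_i\rvert,
    \qquad y_{ij} = W_i x_{ij}, \\
  &\text{FDICA w/ Laplace:}\quad
    \mathcal{P}(Y) = \lVert Y \rVert_{1}
      = \sum_{i,j,n} \lvert y_{ijn} \rvert, \\
  &\text{IVA w/ spherical Laplace:}\quad
    \mathcal{P}(Y) = \lVert Y \rVert_{2,1}
      = \sum_{j,n} \Bigl( \sum_{i} \lvert y_{ijn} \rvert^{2} \Bigr)^{1/2}, \\
  &\text{ILRMA w/ Itakura-Saito NMF:}\quad
    \mathcal{P}(Y) = \sum_{n} \min_{T_n, V_n \ge 0}
      \sum_{i,j} \Bigl[ \frac{\lvert y_{ijn} \rvert^{2}}{(T_n V_n)_{ij}}
      + \log (T_n V_n)_{ij} \Bigr].
\end{align}
```

Only the first term changes between the three methods, which is what makes a single algorithm with a replaceable $\mathcal{P}$ attractive.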
  • 32. Reformulation of BSS • Generalized optimization problem [Yatabe&Kitamura, 2018] – Objective: if we change the TF structure model, its optimization algorithm can easily be obtained! – But how? • Apply a convex optimization technique [Condat, 2013], [Vu, 2013], [Komodakis+, 2015] – Primal-dual splitting method – Proximity operator • If the TF structure model is “proximable”, then we obtain the optimization algorithm!
  • 33. 33 Primal-dual splitting method • Primal-dual splitting method [Condat, 2013], [Vu, 2013], – considers the following problem – Iterative optimization algorithm – Proximity operator • If the proximity operator of a function can easily be calculated, the function is called “proximable” [Komodakis+, 2015] Step size parameters g and h: proper lower-semicontinuous convex functions
  • 34. 34 BSS using Primal-dual splitting method • Convert BSS to primal-dual-splitting-applicable form – Vectorization of demixing matrices – Matrixization th singular value of ... ... Mat to vec Collect all freqs. ...
  • 35. 35 BSS using Primal-dual splitting method • Convert BSS to primal-dual-splitting-applicable form Introduce vectorized notation ( is a reshaped matrix that includes ) Ready to apply primal-dual splitting! C.f. problem for primal-dual splitting
  • 36. 36 BSS using Primal-dual splitting method • General BSS algorithm using primal-dual splitting – Function is always proximable [Yatabe&Kitamura, 2018] Singular value decomposition
  • 37. 37 BSS using Primal-dual splitting method • General BSS algorithm using primal-dual splitting – L2,1 Group sparse BSS (IVA) – Nuclear-norm-based low-rank BSS (ILRMA?) Nuclear norm (sum of singular values)
  • 38. 38 BSS using Primal-dual splitting method • Multiple TF structures can also be utilized – L2,1 group-sparse + L1 sparse BSS (sparse IVA) – Low-rank + L1 sparse BSS (sparse ILRMA?) Proximable Proximable Proximable Proximable If TF structure models are proximable, you can use them in a plug-and-play manner! Advantage of proposed BSS
  • 39. 39 BSS using Primal-dual splitting method • Experiment of two-speech-source BSS – Compare improvement of source-to-distortion ratio (SDR) Mixture A Mixture B Group-sparse Group-sparse + sparse Low-rank + sparse Low-rank Group-sparse Group-sparse + sparse Low-rank + sparse Low-rank
  • 40. 40 Interpretation of TF masking • Proximity operators of many sparsity-inducing functions are obtained as thresholding operators – L1 norm: – L2,1 norm: – They have the same form: TF masking to the variable Proximity operator TF mask (0~1 values) determined by TF structure model Variable in TF shape Elementwise product
  • 41. 41 TFMBSS • Time-frequency-masking-based BSS (TFMBSS) – Skip designing TF structure model function – TF mask of intended TF structure is employed in the optimization algorithm [Yatabe&Kitamura, 2021] 1. Design intended TF structure model 2. Calculate proximal operator 3. Optimize the problem BSS based on primal-dual splitting method TFMBSS ??? 1. ― 2. Design intended TF mask 3. Optimize the problem [Yatabe&Kitamura, 2019]
  • 42. 42 TFMBSS • Time-frequency-masking-based BSS (TFMBSS) – Intended TF structure model is input to TFMBSS as a TF mask – Demixing matrix is optimized so that the estimated signals have the intended TF structures – Iterative update of TF masks is also interesting Mixture Frequency-wise demixing matrix Time Frequency Frequency Time Update demixing matrix so that the estimated signals have TF structures enhanced by the input TF masks STFT Enhancement by TF masking Time Frequency Frequency Time Time Frequency Frequency Time Estimates [Yatabe&Kitamura, 2021] [Yatabe&Kitamura, 2019]
  • 43. 43 Application of TFMBSS • HPSS-based TFMBSS [Oyabu&Kitamura, 2021] – utilizes TF mask that is obtained via harmonic-percussive sound separation (HPSS) in TFMBSS
  • 44. 44 • HPSS-based TFMBSS [Oyabu&Kitamura, 2021] Mixture Optimization-based HPSS [Ono+, 2008] Median-based HPSS [FitzGerald, 2010] Optimization-based HPSS + TFMBSS Median-based HPSS + TFMBSS Application of TFMBSS Linear, multichannel Estimated percussive sound Estimated harmonic sound Nonlinear, single-channel
  • 45. 45 Contents • Background – Blind source separation (BSS) for audio signals and its history – Motivation • Preliminaries – Frequency-domain independent component analysis (FDICA) – Independent vector analysis (IVA) – Independent low-rank matrix analysis (ILRMA) • Time-frequency-masking-based BSS (TFMBSS) – Reformulation of BSS problems and its optimization – BSS based on primal-dual splitting method – Interpretation of TF masking and application • Conclusion
  • 46. 46 Application of TFMBSS • Audio BSS with TF structure model – TF structure model is necessary for avoiding the permutation problem • Conventional algorithms (IVA, ILRMA, and so on) – Which TF structure is the best? Trial and error – The optimization algorithm is problem-dependent • Changing TF structure model requires derivation of the algorithm • Proposed generalized BSS using primal-dual splitting – Easy to replace TF structure model • (if the function is “proximable”) – Easy to search the best TF structure for each BSS problem • TFMBSS – Explicitly define TF structure as TF masking

Editor's notes

  1. Hi everyone, thank you for coming to my overview presentation. The title is “Blind Audio Source Separation Based on Time-Frequency Structure Models”.
  2. First of all, let me introduce myself.
  3. These are the contents of this talk: background, preliminaries, the main topic, and the conclusion.
  4. The first topic is background.
  5. This talk deals with the blind source separation (BSS) problem, that is, the separation of individual sources from a recorded mixture. The word “blind” means “unsupervised”. Thus, a BSS method does not require any prior information about the recording conditions or the sources, such as the locations of the microphones and sources, the room geometry, or a training dataset of sound sources. This kind of technique is very useful for many applications, for example, hearing aid systems, automatic speech recognition, and preprocessing for music analysis.
  6. This is a demonstration of music BSS using the method called ILRMA. Here we have a mixture signal of three parts, which was recorded using three microphones. Please listen carefully for the three parts: guitar, vocal, and keyboard. OK? Let’s listen. Then, if we apply ILRMA to this multichannel signal, we obtain estimates like these. So, we can remix them, re-edit them, or do anything we want. This is source separation. By the way, the source code of ILRMA is available here, so please check it.
  7. In BSS for audio signals, the numbers of microphones and sources are important. In this talk, we only consider the “determined” situation, namely, the numbers of microphones and sources are equal. If we want to separate three sources, we have to use three microphones. In the determined situation, the BSS problem becomes the estimation of the demixing system W, which is the inverse of the mixing system A.
  8. This slide shows a historical overview that includes only the related methods. There are three columns: determined, underdetermined, and single-channel. The origin of determined BSS is independent component analysis, ICA. The important methods in this talk are IVA, AuxIVA, and ILRMA. In this talk, we review this column, namely, from ICA to the newest method called TFMBSS, from the viewpoint of the time-frequency structure model used in each method.
  9. Here I explain the motivation of this talk. Conventional determined BSS has two advantages. One is minimal distortion. Since these algorithms separate sources by multiplying frequency-wise demixing matrices, we can avoid artificial distortion as much as possible. The other advantage is fast and stable optimization. In AuxIVA, a very efficient algorithm called iterative projection was proposed, and this advantage was inherited by ILRMA. IVA and ILRMA each assume their own time-frequency structure model. However, if this model does not fit the actual sources in the mixture, the BSS performance is degraded. So, we want to try various TF structure models in BSS, but we need to derive an optimization algorithm for each TF structure model. Motivated by this issue, we propose a new BSS algorithm in which the TF structure model can easily be replaced, so that the best one can easily be searched for. This is the main topic of this talk.
  10. (5 min) The next part is the preliminaries. I’m going to review the conventional methods from ICA to ILRMA.
  11. ICA is a fundamental algorithm for BSS. ICA assumes that the sources are mutually independent and non-Gaussian. Also, the mixing system is modeled as multiplication by a mixing matrix A, which is invertible and time-invariant. Based on these assumptions, ICA estimates the demixing matrix W, which is ideally the inverse of A.
  12. The estimation theory in ICA is here. ICA minimizes the dissimilarity between these distributions, which is equivalent to maximizing the independence between the separated sources. Since the separated signal y depends on the demixing matrix, the optimization problem in ICA can be formulated as this problem, where p(y) is a non-Gaussian source distribution that we need to assume. So, we find the W that minimizes this function.
  13. However, ICA has two ambiguities: scales and permutation. ICA cannot determine the scales and the order of the estimated signals. In particular, the permutation ambiguity will be a serious problem in an audio BSS problem.
  14. For audio mixture signals, simple ICA cannot separate the sources. This is because a mixture of audio signals is not a multiplication by A but a convolution with mixing filters, which is due to room reverberation. To deconvolve the mixture, we apply the short-time Fourier transform and convert the signals to the TF domain. Since convolution in the time domain becomes multiplication in the TF domain, we can apply ICA and estimate a frequency-wise demixing matrix.
  15. This method is called frequency-domain ICA, or FDICA for short. We apply ICA to each frequency separately. Then, we estimate the demixing matrix Wi, where i is the index of frequencies and j is the index of time frames.
  16. The optimization problem in FDICA is formulated like this, where p(y) is a source distribution in the TF domain. The complex Laplace distribution, shown here, is often used for this assumption, and the minimization problem is obtained like this.
  17. However, FDICA encounters a serious problem, the so-called permutation problem. In FDICA, simple ICA is performed at each frequency separately. Therefore, the order of the estimated signals is scrambled along the frequency axis. Even if we completely separate the sources at each frequency, we have to align their order across frequencies. Several permutation solvers have been proposed so far.
  18. Here I list popular permutation solvers. Before 2006, the permutation solver was a post-processing step (go back) as shown in this figure, which uses the correlation between frequencies or the direction of arrival. Then, independent vector analysis, IVA, and independent low-rank matrix analysis, ILRMA, were proposed. These methods unify ICA and the permutation solver.
  19. From this slide, we review the important BSS algorithms, IVA and ILRMA, from the viewpoint of TF structure models. IVA is a multivariate extension of FDICA; namely, IVA uses a sourcewise frequency vector as the random variable to unify all the frequency components in the ICA estimation. IVA assumes a joint distribution of all the frequency components as the source distribution p(s). In addition, this distribution p(s) has an inner structure, a co-occurrence of all the frequency components. This model is called the “spherical property” of a multivariate distribution, but in any case, IVA assumes the co-occurrence of all the frequency components in the same source, which is depicted in this figure. By assuming this TF structure for each source, Wi is estimated so that the permutation problem does not arise. (10 min)
  20. The question is how valid IVA’s TF structure model is. Here I show the time-frequency powers of speech and vocal sources. As you can see, typical audio sources show co-occurrence of all the frequencies when the source is active, so IVA’s assumption seems to be valid. Also, this structure can be interpreted as group sparsity in the TF domain.
  21. The optimization problem in IVA can be defined like this, and the joint distribution p enforces the previous TF structure through the spherical distribution assumed here. For example, when we assume a spherical Laplace distribution, this model, the minimization problem in IVA becomes the one shown at the bottom. In the original IVA paper, this problem was optimized by simple gradient descent, but
  22. in 2011, an efficient update algorithm for IVA, called AuxIVA, was proposed. It provides elegant update rules called iterative projection, IP, and established a convergence-guaranteed fast optimization without step-size parameters. This graph shows the value of the cost function against the number of iterations. AuxIVA sufficiently converges in fewer than 20 updates. I will play the sound demo of AuxIVA.
  23. In 2016, we extended the TF structure model in IVA to a richer one. IVA assumes the uniform co-occurrence of all the frequencies. This can be considered a rank-1 time-frequency structure; namely, a frequency-uniform vector is activated along the time axis. As we have already shown, this model is valid for typical audio signals, but it may be too simple because audio sources have a harmonic frequency structure. To represent more complicated TF structures, we proposed independent low-rank matrix analysis, ILRMA, which employs NMF modeling as the TF structure. In ILRMA, the single uniform frequency vector in IVA is extended to multiple more complicated vectors, so a more accurate spectrogram can be modeled as a low-rank matrix. Such an accurate TF model improves the estimation performance of the frequency-wise demixing matrices.
  24. ILRMA assumes that each source has a low-rank TF structure and that mixing increases the rank of the spectrogram. Thus, by enforcing the low-rankness of each estimated signal in the TF domain, the demixing matrix avoids the permutation problem, and a richer TF structure model than that of IVA improves the BSS performance. (14 min)
  25. The optimization problem in ILRMA is shown here. We find Wi and the NMF variables Tn and Vn that minimize this cost function. (click) The first and second terms of this function coincide with the cost function in NMF, (click) and the second and third terms coincide with the cost function in FDICA or IVA. (click) Thus, we can alternate the NMF update rules and the IP-based update of the demixing matrix. This iteration has a theoretical convergence guarantee. This graph shows the behavior of the cost function value. ILRMA converges in fewer than 100 iterations. Let’s play the sample sounds. This result is better than that of IVA. (about 15 min)
  26. Let’s move on to the main topic of this talk.
  27. So far, we have shown the cost functions of FDICA, IVA, and ILRMA, which are listed in this slide. We can see that they have similar forms. This is because
  28. all of them come from the original ICA cost function, this one, and the only difference is the assumed source distribution p(Y), which is often called the source generative model. This generative model corresponds to the TF structure model for each source, and this model is necessary for avoiding the permutation problem. Of course, a better assumption about the TF structure provides better BSS performance, but the suitable TF structure model depends on the type of source, such as speech, music, harmonic sources, percussive sources, noise sources, and so on. Therefore, we have to search for the best TF structure model by trial and error.
  29. However, in the conventional methods, it is difficult to replace the TF structure model because we have to derive the optimization algorithm again, which requires technical knowledge and math skills. If we derive a general BSS algorithm in which the TF structure model can be replaced in a plug-and-play manner, it becomes very useful for searching for the best model for each problem. So, to try various TF structure models in a “plug-and-play manner”, we first reformulate the BSS problem in a more general form. Then, we solve it using a TF-structure-independent algorithm. (17 min)
  30. This is our proposed generalized BSS problem, which includes FDICA, IVA, and ILRMA. The function P(W, X) corresponds to the TF structure model we assume, which is often called the source model. By replacing the function P, we can try various TF structure models. The negative log-determinant term comes from the original ICA theory. We can interpret this term as a “barrier function” that prevents Wi from becoming rank-deficient. If Wi becomes rank-deficient, its determinant becomes zero, and this term goes to infinity. So, we can avoid such a solution in the optimization. (18 min)
  31. For the conventional BSS algorithms, the function P(W, X) corresponds to these functions, respectively. FDICA corresponds to an L1-norm sparse regularizer, and IVA to an L2,1-norm group-sparse regularizer. ILRMA is a little more difficult, but we can still represent it using a minimization, as shown here, where DIS is the Itakura-Saito divergence.
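The generalized problem and its three special cases described in notes 30 and 31 can be written compactly as below. This is a reconstruction from the spoken description, not a copy of the slide equations. The symbol y_{ijn} (estimated signal of source n at frequency i and time frame j), the subscripted names of P, and the omission of constant scaling factors are all assumptions made for illustration.

```latex
\begin{gather}
  \min_{W_1,\dots,W_I}\;
    \mathcal{P}(Y) - \sum_{i=1}^{I} \log \lvert \det W_i \rvert ,
  \qquad y_{ij} = W_i x_{ij}, \\
  \mathcal{P}_{\mathrm{FDICA}}(Y)
    = \sum_{n} \sum_{i,j} \lvert y_{ijn} \rvert
    \quad (\ell_{1}\ \text{norm}), \\
  \mathcal{P}_{\mathrm{IVA}}(Y)
    = \sum_{n} \sum_{j} \Bigl( \sum_{i} \lvert y_{ijn} \rvert^{2} \Bigr)^{1/2}
    \quad (\ell_{2,1}\ \text{norm}), \\
  \mathcal{P}_{\mathrm{ILRMA}}(Y)
    = \sum_{n} \min_{T_n \ge 0,\; V_n \ge 0}
      D_{\mathrm{IS}}\bigl( \lvert Y_n \rvert^{2} \,\big\|\, T_n V_n \bigr) .
\end{gather}
```

Here |Y_n|^2 denotes the elementwise power spectrogram of source n. Only the first term changes between the three methods, which is what makes the plug-and-play view possible.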
  32. The objective of this reformulation is that, if we change the TF structure model P, its optimization algorithm can easily be obtained. This is because we want to establish a new BSS algorithm with plug-and-play TF structure models. But the question is, how can we do that? The idea comes from the field of convex optimization. We use an algorithm called the “primal-dual splitting method”. In this algorithm, we need the proximity operator of the function P. A function whose proximity operator can easily be calculated is called “proximable”. So, if the TF structure model P is proximable, we obtain the optimization algorithm for this generalized BSS problem.
  33. The primal-dual splitting method considers this problem: find the vector w that minimizes g(w) + h(Lw), where L is just a matrix. This minimization can be solved by this iterative optimization algorithm, which is the primal-dual splitting method. In the first line, we calculate the proximity operator of the function g with this input. Then, the second line calculates the new input z, and in the third line, we calculate the proximity operator of the function h with the input z. By iterating these three steps, we can minimize this cost function. Prox is a regularized minimization of the function f in the neighborhood of the input x, which always has a unique solution. We do not dive into the details of this algorithm in this overview, but you can refer to the cited papers for the theory of the method. The important point is that we can use any TF structure model P as long as it is proximable. We just switch the proximity operator of P according to the recipes of well-known proximity operators of popular functions. (21 min)
  34. The goal is to convert this minimization problem to the primal-dual-splitting-applicable form. So, we convert this function (go back) to this. As a first step, we express the determinant of Wi in terms of its singular values sigma using this equation. Next, we vectorize the demixing matrices Wi with this computation, where V is a linear operator converting the matrices Wi into a vector w. We also define the inverse operation M; namely, M is a linear operator converting the vector w back into the matrices Wi.
  35. By introducing the vectorization, we get this function. It’s almost there. Then, we define I(w) like this, and now we are ready to apply the primal-dual splitting method. Now we have the same form as this original function.
  36. In summary, we defined the general BSS problem as this minimization problem, and we can optimize it using the primal-dual splitting method. The algorithm is shown here. We have the proximity operator of a new function I in this line. I(w) is the negative sum of the logarithms of the singular values. The proximity operators of the logarithm function and of functions of singular values are well-known. Thus, we can easily obtain the proximity operator of I(w), as shown at the bottom of this slide.
  37. OK, let me show how IVA and ILRMA are expressed in this BSS formulation. The TF structure assumed in IVA is group sparsity, which can be defined as the L2,1 norm of the estimated spectrogram Yn. So, we replace the function P with the L2,1 norm, and we do not have to re-derive the algorithm. The proximity operator of the L2,1 norm is obtained like this, so we use this calculation in the third line of this algorithm. Next, ILRMA assumes a low-rank TF structure by applying NMF to the estimated spectrogram Yn. Instead of NMF, we use the nuclear norm to represent the low-rank regularization. Again, the proximity operator of the nuclear norm is well-known. We obtain the optimization algorithm by replacing the third line with this calculation. From this, we can see that the proposed algorithm can handle various TF structures in a unified manner, which is very useful for searching for the best TF structure.
  38. In addition, multiple TF structures can also be used together. For example, group-sparse + sparse BSS can be defined by this function, which can be interpreted as a sparse IVA. Since these functions are both proximable, we can obtain the optimization algorithm. As another example, low-rank + sparse BSS can also be defined as a sparse ILRMA, like this problem. As you can see, the important point is that, when you want to use a new TF structure model P, you check whether P is proximable. If P is proximable, you can use it in the proposed BSS algorithm in a plug-and-play manner. This is a strong advantage of the proposed BSS. (25.5 min)
  39. These graphs show the BSS performance for two-speech mixtures with AuxIVA and various TF structures. The vertical axis shows the SDR improvement, which indicates the separation performance, and the horizontal axis shows the number of iterations of each algorithm. Since the group-sparse model is equivalent to the IVA model, it provides exactly the same performance at convergence. The low-rank model is similar to ILRMA, and the group-sparse + sparse model is a sparsity-induced IVA. Also, low-rank + sparse is a sparse version of ILRMA. Again, we can easily compare which TF structure model is the best for speech source separation. In this experiment, the low-rank + sparse model provides the best performance for both mixture samples. (26.5 min)
  40. We have now extended the proposed BSS algorithm to a more explicit formulation; namely, we do not assume a function P but directly introduce a TF mask as the intended TF structure. Let me explain this extension as the final topic of this talk. It is known that the proximity operators of many sparsity-inducing functions are obtained as thresholding operators. For example, the prox of the L1 norm is obtained like this, and this calculation is soft thresholding of the input variable because this term takes a value between 0 and 1. The prox of the L2,1 norm is also soft thresholding. Since the input vector z contains the spectrograms of the estimated signals, this elementwise soft thresholding can be interpreted as time-frequency soft masking. Namely, the calculation of the proximity operator, (go back) the third line of the algorithm, just applies a TF soft mask defined by the intended TF model and the current optimization variable Z. This fact tells us that we do not have to design a TF structure function P. All we have to do is design a TF mask for the intended TF structure. (28 min)
  41. From this motivation, we proposed time-frequency-masking-based BSS, or TFMBSS for short. The difference between the previous general BSS and TFMBSS is shown here. In the previous algorithm, we had to design the TF model function P and obtain its proximity operator. In TFMBSS, we skip designing the function P and directly design the intended TF mask. Therefore, we do not care what kind of cost function is minimized in this algorithm.
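The three steps described in note 33 can be sketched in NumPy as follows. This is a minimal illustration, not the code from the talk: it assumes the commonly used form of the iteration with step sizes `mu1`, `mu2` and an optional relaxation weight `alpha` (all hypothetical names), and it is exercised on a small real-valued toy problem rather than on BSS.

```python
import numpy as np


def primal_dual_splitting(prox_g, prox_h, L, w0, mu1, mu2, n_iter=200, alpha=1.0):
    """Minimize g(w) + h(L w) by primal-dual splitting (real-valued sketch).

    prox_g(v, t) and prox_h(v, t) must return prox_{t*g}(v) and prox_{t*h}(v).
    The step sizes are assumed to satisfy mu1 * mu2 * ||L||_2^2 <= 1.
    """
    w = np.array(w0, dtype=float)
    y = np.zeros(L.shape[0])
    for _ in range(n_iter):
        # Line 1: proximal step on g for the primal variable.
        w_tilde = prox_g(w - mu1 * mu2 * (L.conj().T @ y), mu1)
        # Line 2: compute the new input z of the second proximal step.
        z = y + L @ (2.0 * w_tilde - w)
        # Line 3: proximal step on h, applied to z.
        y_tilde = z - prox_h(z, 1.0 / mu2)
        # Relaxation (alpha = 1 means no relaxation).
        w = alpha * w_tilde + (1.0 - alpha) * w
        y = alpha * y_tilde + (1.0 - alpha) * y
    return w


def soft_threshold(v, t):
    """Prox of t * ||.||_1 for real vectors."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


# Toy problem: minimize 0.5 * ||w - b||^2 + lam * ||w||_1 with L = identity.
b = np.array([3.0, -0.5, 1.5, -2.0])
lam = 1.0
L = np.eye(b.size)
prox_g = lambda v, t: (v + t * b) / (1.0 + t)      # prox of t * 0.5 * ||. - b||^2
prox_h = lambda v, t: soft_threshold(v, lam * t)   # prox of t * lam * ||.||_1

w_hat = primal_dual_splitting(prox_g, prox_h, L, np.zeros_like(b), mu1=0.9, mu2=0.9)
w_ref = soft_threshold(b, lam)  # closed-form minimizer of the toy problem
print(np.max(np.abs(w_hat - w_ref)))
```

Only `prox_h` encodes the regularizer, so swapping the structure model means swapping that one function while the loop stays unchanged.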
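As a hedged illustration of the last point in note 36, the proximity operator of the negative log of the singular values can be computed with one SVD per matrix. The function name and the scalar formula used here (the positive root of x^2 - s x - mu = 0) are a sketch of the standard result, not a transcription of the slide. In the BSS setting it would be applied to each frequency-wise matrix Wi.

```python
import numpy as np


def prox_neg_log_singular(W, mu):
    """Prox of mu * (-sum_n log sigma_n(W)) for a square matrix W."""
    U, s, Vh = np.linalg.svd(W)
    # Scalar prox of -mu * log(.) applied to every singular value:
    # the positive root of x^2 - s*x - mu = 0.
    s_new = (s + np.sqrt(s**2 + 4.0 * mu)) / 2.0
    # Rebuild the matrix with the same singular vectors.
    return (U * s_new) @ Vh


rng = np.random.default_rng(0)
W = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
mu = 0.5
W_new = prox_neg_log_singular(W, mu)

# Check the stationarity condition x - s - mu / x = 0 on the singular values.
s_old = np.linalg.svd(W, compute_uv=False)
s_out = np.linalg.svd(W_new, compute_uv=False)
residual = s_out - s_old - mu / s_out

# Even an all-zero (rank-deficient) input is mapped to a full-rank matrix.
W_zero = prox_neg_log_singular(np.zeros((2, 2)), mu)
```

The output always has strictly positive singular values, which matches the barrier-function interpretation of the log-determinant term.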
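The two proximity operators mentioned in note 37 have well-known closed forms, sketched below for a single source spectrogram stored as a (frequency, time) array. The function names, the array layout, and the `EPS` guard are assumptions made for this example.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero for all-zero time frames


def prox_l21(Z, lam):
    """Prox of lam * sum_j ||Z[:, j]||_2 (group soft thresholding).

    Each time frame (column) is one group containing all frequencies.
    """
    norms = np.linalg.norm(Z, axis=0, keepdims=True)
    scale = np.maximum(1.0 - lam / np.maximum(norms, EPS), 0.0)
    return scale * Z


def prox_nuclear(Z, lam):
    """Prox of lam * (nuclear norm of Z): soft thresholding of singular values."""
    U, s, Vh = np.linalg.svd(Z, full_matrices=False)
    return (U * np.maximum(s - lam, 0.0)) @ Vh


# Two time frames: one strong (norm 5), one weak (norm 0.5).
Z = np.array([[3.0, 0.3],
              [4.0, 0.4]])
Z_group = prox_l21(Z, 1.0)        # weak frame is zeroed, strong frame shrinks
Z_lowrank = prox_nuclear(np.diag([3.0, 0.5]), 1.0)  # small singular value removed
```

Either function can be passed as the proximal step on the structure model, which is the plug-and-play property the note describes.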
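The masking interpretation in note 40 can be checked numerically: the soft-thresholding prox equals an elementwise product of the input with a mask whose values lie between 0 and 1. The sketch below assumes a (frequency, time) complex array and hypothetical function names.

```python
import numpy as np

EPS = 1e-12  # avoids division by zero at exactly silent bins


def l1_mask(Z, lam):
    """TF mask implied by the prox of lam * ||Z||_1 (one value per TF bin)."""
    return np.maximum(1.0 - lam / np.maximum(np.abs(Z), EPS), 0.0)


def l21_mask(Z, lam):
    """TF mask implied by the prox of the L2,1 norm (one value per time frame)."""
    norms = np.linalg.norm(Z, axis=0, keepdims=True)
    mask = np.maximum(1.0 - lam / np.maximum(norms, EPS), 0.0)
    return np.broadcast_to(mask, Z.shape)


rng = np.random.default_rng(1)
Z = rng.standard_normal((5, 4)) + 1j * rng.standard_normal((5, 4))
lam = 0.8

M1 = l1_mask(Z, lam)
M21 = l21_mask(Z, lam)

# Textbook complex soft thresholding: keep the phase, shrink the magnitude.
soft = Z / np.maximum(np.abs(Z), EPS) * np.maximum(np.abs(Z) - lam, 0.0)
masked = M1 * Z  # same result, written as TF masking
```

The L2,1 mask is constant along frequency within each time frame, which is the group-sparse structure associated with IVA earlier in the talk.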
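In symbols, the change described in note 41 affects only the third line of the iteration. The notation below (step size \mu_2, structure function \mathcal{P}, mask-generating function \mathcal{M}) is assumed for illustration and may differ from the slides.

```latex
\begin{align}
  \text{prox-based BSS:} \quad
    \tilde{y} &= z - \operatorname{prox}_{\frac{1}{\mu_2}\mathcal{P}}[\, z \,], \\
  \text{TFMBSS:} \quad
    \tilde{y} &= z - \mathcal{M}(z) \odot z ,
\end{align}
```

where \odot is the elementwise product and \mathcal{M}(z) returns a TF mask with values between 0 and 1. When \mathcal{M} is the mask implied by a proximable \mathcal{P}, the two lines coincide.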
  42. This figure is a concept of TFMBSS. We input TF masks as a TF structure model. And the demixing matrix is optimized so that the estimated signals have the intended TF structures.
  43. Let me introduce one application of TFMBSS. We used a well-known music BSS algorithm called harmonic-percussive sound separation, HPSS, to accurately separate drum sounds from the other musical instruments. In this method, we apply HPSS to the intermediate estimated signals Zharmonic and Zpercussive independently and produce the masks in a Wiener-filtering manner. These masks are input to TFMBSS as the TF structure model. This process is iterated until convergence, so two HPSS runs are performed in each iteration of TFMBSS.
  44. This is a demonstration. We used two types of HPSS. Since HPSS is a single-channel nonlinear algorithm, artificial distortions may arise. If we have a multichannel observation, we can use these HPSS methods in TFMBSS and achieve linear, distortionless separation. The red cells are the harmonic estimates, and the blue ones are the percussive estimates. (play) As you can see, TFMBSS provides better separation.
  45. This is the conclusion.