Prior distribution design for music bleeding-sound reduction based on nonnegative matrix factorization

Prior Distribution Design for
Music Bleeding-Sound Reduction Based on
Nonnegative Matrix Factorization
Yusaku Mizobuchi, Daichi Kitamura
Tomohiko Nakamura, Hiroshi Saruwatari
Yu Takahashi, Kazunobu Kondo
13th Asia Pacific Signal and Information Processing Association Annual Summit and Conference
Session: OD-SLA-3: Speech Enhancement and Separation
Time: Thu., 16 Dec, 10:45-11:00 (UTC +9)
National Institute of Technology, Kagawa College, Japan
The University of Tokyo, Japan
Yamaha Corporation, Japan

• Background
– Blind source separation
– Bleeding sound and its reduction problem
• Conventional Methods
– Nonnegative matrix factorization (NMF)
– Independent low-rank matrix analysis (ILRMA)
– Linear demixed domain multichannel NMF (DMNMF)
– Time-channel NMF (TCNMF)
• Proposed Method
– Introduce an a priori generative model for relative leakage levels
– Estimate parameters based on maximum a posteriori (MAP) estimation
• Experiments
Contents

• Blind source separation (BSS)
– extracts audio sources from a multichannel mixture
– assumes that the mixing system is unknown and estimates the
demixing system
• Application
– Preprocessing for music analysis, automatic score production, and so on
– Sound reinforcement for live music shows
– High-quality music recording in a studio
Background
[Figure: block diagram in which the sources pass through an unknown mixing system and BSS estimates the demixing system]

• Typical microphone placement
– Each microphone is placed close to its target sound source (close miking)
• The bleeding sound
– Non-target sources are also captured by each microphone
– The bleeding sound is small in volume compared with the target sound
Background
[Figure: close-miking setup showing each microphone, the target sound for each microphone, and the bleeding sound from non-target sources]

• Bleeding sound reduction problem
– is similar to a multichannel audio source separation problem
– aims to remove the interfering bleeding sounds from the non-target
sources
• Peculiarities of this problem
(a) The signal-to-noise ratio (SNR) of the observed signal is relatively high
because of the close-miking setup
(b) The target source for each microphone is known
because of the close-miking setup
(c) The microphones are spatially far apart from each other (e.g., more than 2 m)
– Spatial aliasing occurs
– Phase information (observed time differences between microphones) is unreliable
– BSS that relies on phase information generally fails to separate bleeding sounds
(d) The signals being processed are music
– The required separation quality is relatively high
Background

Low rank modeling method
• Nonnegative matrix factorization (NMF) [Lee+, 1999]
– Low-rank decomposition with nonnegative constraint
• Limited number of nonnegative bases and their coefficients
– In acoustic signal processing, the spectrogram is decomposed
• Frequently appearing spectral patterns and their activations
[Figure: the input data matrix (amplitude spectrogram, frequency × time) is approximated by the product of the basis matrix (spectral patterns, frequency × basis) and the activation matrix (time-varying gains, basis × time); the matrix sizes are given by the number of frequency bins, the number of time frames, and the number of bases]

• Optimizing parameters in NMF
– Define a cost function and minimize it
– Various cost functions can be used
• Squared Euclidean distance, etc.
• The proposed method uses the Kullback-Leibler (KL) divergence
– Efficient iterative optimization
• Multiplicative update rules (auxiliary function technique) [Lee, 2001]
Low rank modeling method
[Equations: multiplicative update rules when the cost function is the squared Euclidean distance]
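The multiplicative update rules for the squared Euclidean distance can be sketched as follows. This is a minimal illustration of the standard Lee and Seung updates, not code from the presentation; the names `nmf_euclidean`, `W`, `H`, and the toy input are assumptions made for the example.

```python
import numpy as np


def nmf_euclidean(X, n_bases, n_iter=200, eps=1e-12, seed=0):
    """Factorize a nonnegative matrix X (frequency x time) as W @ H."""
    rng = np.random.default_rng(seed)
    n_freq, n_time = X.shape
    W = rng.uniform(0.1, 1.0, size=(n_freq, n_bases))  # basis matrix
    H = rng.uniform(0.1, 1.0, size=(n_bases, n_time))  # activation matrix
    costs = []
    for _ in range(n_iter):
        # Multiplicative updates keep W and H nonnegative by construction.
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        costs.append(float(np.sum((X - W @ H) ** 2)))
    return W, H, costs


# Toy "spectrogram": an exactly rank-2 nonnegative matrix (hypothetical data).
rng = np.random.default_rng(1)
X = rng.uniform(size=(64, 2)) @ rng.uniform(size=(2, 100))
W, H, costs = nmf_euclidean(X, n_bases=2)
print(f"cost after first iteration: {costs[0]:.4f}, after last: {costs[-1]:.6f}")
```

Because each factor is only ever multiplied by a nonnegative ratio, no explicit projection onto the nonnegative orthant is needed.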

• Independent low-rank matrix analysis (ILRMA)
– Estimates a frequency-wise complex-valued demixing matrix using amplitude and
phase information
– The power spectrogram of each source is modeled by a low-rank matrix
Related method: ILRMA
[Kitamura+, 2016]
[Figure: the observed spectrograms (STFT of the observed signals) are multiplied by frequency-wise demixing matrices to obtain the estimated spectrograms; the mixing system in the time-frequency domain is a complex-valued matrix]
Update the demixing matrix so that the estimated signals
have a low-rank structure in the time-frequency domain (low-rank decomposition)
When spatial aliasing occurs in the observed signal, BSS is difficult

• Linear demixed domain multichannel NMF (DMNMF)
– Estimates a frequency-wise real-valued demixing matrix using power information
– The power spectrogram of each source is modeled by a low-rank matrix
– A power-based BSS method (does not use phase information)
Related method: DMNMF
[Taniguchi+, 2017]
[Figure: the observed power spectrograms are multiplied by frequency-wise demixing matrices to obtain the estimated power spectrograms; the mixing system in the time-frequency domain is a real-valued (nonnegative) matrix]
BSS may be possible even when
spatial aliasing occurs in the observed signal

• Time-channel NMF (TCNMF) [Togami+, 2010]
– Applies NMF to the frequency-wise time-channel signal in the amplitude
domain
– Estimates the mixing matrix and the time-source activation matrix
– An amplitude-based BSS method (does not use phase information)
Related method: TCNMF
[Equation: cost function]
[Figure: the channel-wise observed amplitude spectrograms (frequency × time) are rearranged into frequency-wise observed amplitude matrices (channel × time), each of which is approximated by the product of a frequency-wise mixing matrix (channel × source) and a frequency-wise time-source activation matrix (source × time)]
BSS may be possible even when
spatial aliasing occurs in the observed signal

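• In the determined case, the minimization problem has a trivial
solution: the mixing matrix is the identity matrix and the activation matrix equals the observed signal
– The sources cannot be separated in this situation
• To avoid this trivial solution, a sparse regularizer on the time-source activation matrix is introduced
– The regularizer has a weight coefficient for regularization
– It applies a norm to each time-frame-wise vector of the activation matrix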
Related method: TCNMF
[Equation: cost function with the regularizer]
[Figure: frequency-wise mixing matrices and frequency-wise time-source activation matrices (source × time)]
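The trivial solution in the determined case can be checked numerically. The sketch below assumes a generalized KL divergence as the fidelity measure and uses hypothetical names (`X`, `A_trivial`, `S_trivial`) for one frequency bin; any divergence that vanishes when the model equals the observation would behave the same way.

```python
import numpy as np


def kl_divergence(X, Y, eps=1e-12):
    """Generalized KL divergence D(X | Y), summed over all elements."""
    return float(np.sum(X * np.log((X + eps) / (Y + eps)) - X + Y))


rng = np.random.default_rng(0)
n_channels = n_sources = 4          # determined case
n_frames = 50

# Observed amplitudes of one frequency bin (channel x time), all positive.
X = rng.uniform(0.1, 1.0, size=(n_channels, n_frames))

# Trivial solution: identity mixing matrix, activations equal to the observation.
A_trivial = np.eye(n_channels)
S_trivial = X.copy()
cost_trivial = kl_divergence(X, A_trivial @ S_trivial)

# A generic nonnegative factor pair gives a strictly positive cost.
A_random = rng.uniform(size=(n_channels, n_sources))
S_random = rng.uniform(size=(n_sources, n_frames))
cost_random = kl_divergence(X, A_random @ S_random)

print(cost_trivial, cost_random)
```

The identity solution reproduces the observation exactly, so the unregularized cost cannot distinguish it from a meaningful separation; this is why some regularization is required.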

Motivation
• To avoid the trivial solution, TCNMF applies the sparse
regularizer to the time-source activation matrix
– W-disjoint-orthogonality assumption [Yılmaz and Rickard, 2004]
• At most one source is active in each time-frequency slot
(suitable only for speech mixtures)
– In music signals, multiple sound sources collide at the same frequency
• The sparse regularizer on the activation matrix degrades the sound quality
• Approach in the proposed method
– To avoid the trivial solution, regularize the mixing matrix instead of the activation matrix
• Diagonal elements are set to 1
• Off-diagonal elements are regularized to small values (e.g., 0.1 to 0.5)
[Figure: mixing matrix whose diagonal elements are set to 1 and whose off-diagonal elements are small values; regularizing the mixing matrix amounts to regularizing the relative leakage levels of the bleeding sound]

• Apply TCNMF to the observed amplitude signal, as in
the conventional method
• Introduce an a priori generative model for both the diagonal and
off-diagonal elements of the mixing matrix instead of regularizing the activation matrix
– The proposed method can be interpreted as MAP estimation
TCNMF based on MAP estimation
[Figure: the frequency-wise observed amplitude spectrograms (channel × time) are approximated by the product of the frequency-wise mixing matrices and the frequency-wise time-source activation matrices (source × time); conventional TCNMF regularizes the activation matrices, whereas the proposed TCNMF regularizes the mixing matrices]

TCNMF based on MAP estimation
• Introduce the following a priori generative model for the mixing matrix
– Diagonal elements: Dirac's delta distribution (restricting the diagonal elements to 1)
– Off-diagonal elements: Gamma distribution with a shape parameter and a scale parameter
– The trivial solution can be avoided through the choice of the shape parameter
[Figure: probability density function of the Gamma distribution plotted against the random variable, with the trivial solution marked]
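How a Gamma prior discourages the trivial solution can be illustrated by evaluating its negative log density near zero. The sketch below uses the shape (1.25) and scale (0.6) listed in the experimental conditions; the function name `gamma_neg_log_pdf` is hypothetical, and the remark about shape values greater than 1 is a general property of the Gamma distribution rather than a statement taken from the slide.

```python
import math


def gamma_neg_log_pdf(a, shape, scale):
    """Negative log of the Gamma density with the given shape and scale, for a > 0."""
    return (
        -(shape - 1.0) * math.log(a)
        + a / scale
        + math.lgamma(shape)
        + shape * math.log(scale)
    )


shape, scale = 1.25, 0.6           # values listed in the experimental conditions
mode = (shape - 1.0) * scale       # density peak, valid for shape > 1

for a in (1e-12, 1e-3, mode, 0.5, 1.0):
    print(f"a = {a:<8g} penalty = {gamma_neg_log_pdf(a, shape, scale):.3f}")
```

For a shape parameter greater than 1, the penalty grows without bound as an off-diagonal element approaches zero, so an identity mixing matrix is no longer a minimizer, while the penalty is smallest near the mode.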

Optimization of proposed TCNMF
• Cost function for MAP estimation
– Obtained by taking the negative logarithm of the posterior and substituting the prior distributions
– Contains a regularizer for the diagonal elements, a regularizer for the off-diagonal elements, and a nonnegative prior written with an indicator function
– Estimate the mixing and activation matrices by minimizing this cost function
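The general shape of such a MAP cost can be written out as a worked step. The symbols below are assumptions chosen for illustration (X for the observed amplitudes, A for the mixing matrix, S for the activation matrix, k and θ for the Gamma shape and scale), and the derivation assumes A and S are independent a priori; the exact notation and weighting on the original slide are not recoverable here.

```latex
\begin{align}
(\hat{A}, \hat{S})
  &= \arg\max_{A, S}\; p(A, S \mid X)
   = \arg\max_{A, S}\; p(X \mid A, S)\, p(A)\, p(S) \\
  &= \arg\min_{A, S}\; \bigl[ -\log p(X \mid A, S) - \log p(A) - \log p(S) \bigr], \\
-\log \mathrm{Gamma}(a; k, \theta)
  &= -(k - 1)\log a + \frac{a}{\theta} + \log\Gamma(k) + k\log\theta .
\end{align}
```

The first bracketed term plays the role of the data fidelity term, and the negative log priors act as regularizers. For k > 1, the term −(k − 1) log a diverges as a approaches 0, which penalizes vanishing off-diagonal elements.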

• Minimizing the MAP cost function is equivalent to minimizing the sum of a data fidelity term and a regularizer
– Use the majorization-minimization (MM) algorithm [Hunter+, 2004]

• Update rules of the parameters
– The update rules are written with the following notation:
• Element-wise multiplication
• Element-wise division
• A matrix containing only ones
• A vector that consists of the diagonal elements of the argument
• Matrix transpose
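One way MM-style updates of this kind could look is sketched below for a single frequency bin. This is an assumption-laden illustration, not the authors' update rules: it assumes a cost equal to the generalized KL divergence plus a weighted negative log Gamma prior on the off-diagonal elements, and derives the off-diagonal update with the usual Jensen-type majorizer. The function name `map_tcnmf_bin` and the toy data are hypothetical; the default shape, scale, and weight are the values listed in the experimental conditions.

```python
import numpy as np


def map_tcnmf_bin(X, shape=1.25, scale=0.6, weight=0.56, n_iter=200, eps=1e-12, seed=0):
    """MAP-style TCNMF sketch for one frequency bin, X (channel x time) ~= A @ S."""
    rng = np.random.default_rng(seed)
    n_ch, n_frames = X.shape
    off = ~np.eye(n_ch, dtype=bool)                  # mask of off-diagonal entries
    A = np.eye(n_ch)
    A[off] = rng.uniform(0.01, 0.1, size=off.sum())  # small initial leakage levels
    S = rng.uniform(0.01, 1.0, size=(n_ch, n_frames))
    costs = []
    for _ in range(n_iter):
        # Off-diagonal update: KL multiplicative term plus Gamma-prior terms.
        R = X / (A @ S + eps)
        numer = A * (R @ S.T) + weight * (shape - 1.0)
        denom = S.sum(axis=1)[None, :] + weight / scale
        A[off] = (numer / denom)[off]                # the diagonal stays fixed at 1
        # Activation update: standard KL multiplicative rule.
        R = X / (A @ S + eps)
        S *= (A.T @ R) / (A.sum(axis=0)[:, None] + eps)
        Y = A @ S
        fidelity = np.sum(X * np.log((X + eps) / (Y + eps)) - X + Y)
        prior = np.sum(-(shape - 1.0) * np.log(A[off]) + A[off] / scale)
        costs.append(float(fidelity + weight * prior))
    return A, S, costs


# Hypothetical toy data: unit diagonal, leakage levels drawn from [0, 0.2).
rng = np.random.default_rng(1)
A_true = np.eye(4)
A_true[~np.eye(4, dtype=bool)] = rng.uniform(0.0, 0.2, size=12)
X = A_true @ rng.uniform(0.0, 1.0, size=(4, 300))
A_est, S_est, costs = map_tcnmf_bin(X)
print(np.round(A_est, 2))
```

Keeping the diagonal out of the update is what encodes the delta prior; the added constants in the numerator and denominator are the only difference from a plain KL multiplicative update.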

• (i) The KL divergence is scale-dependent
• (ii) The diagonal elements of the mixing matrix are restricted to unity
– The off-diagonal elements correspond to the relative leakage levels of the
bleeding sound
• From (i) and (ii),
– the gain of the observed signal affects the balance between the fidelity term and the
regularizer
Scale dependency problem of regularizer
[Equation: cost function for an observed signal scaled by an arbitrary coefficient, consisting of the fidelity term and the regularizer]
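The scale dependence of the KL divergence is easy to verify numerically. The sketch below uses the generalized KL divergence and hypothetical arrays `X` (observation) and `Y` (model); it shows that scaling both by a coefficient c scales the divergence by c.

```python
import numpy as np


def kl_divergence(X, Y):
    """Generalized KL divergence D(X | Y) for strictly positive arrays."""
    return float(np.sum(X * np.log(X / Y) - X + Y))


rng = np.random.default_rng(0)
X = rng.uniform(0.1, 1.0, size=(4, 50))   # observed amplitudes (channel x time)
Y = rng.uniform(0.1, 1.0, size=(4, 50))   # model output, stands in for A @ S

base = kl_divergence(X, Y)
for c in (0.01, 1.0, 100.0):
    scaled = kl_divergence(c * X, c * Y)
    print(f"c = {c:<6g} D(cX | cY) = {scaled:.6f}   c * D(X | Y) = {c * base:.6f}")
```

Since the diagonal of the mixing matrix is fixed to unity, a regularizer on the off-diagonal elements does not grow with c, so the observed gain directly changes the relative weight of the two terms.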

• We also treat the gain of the observed signal as a parameter
– The smaller the observed gain, the stronger the regularizer
• Normalize the observed signal in advance
– After the normalization, the dynamic range of the observed signal becomes a known, fixed range
• Apply Wiener filtering to the complex-valued observed signal
Scale dependency problem of regularizer
[Equation: Wiener filter]
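A Wiener-like filtering step could be sketched as a ratio mask applied to the complex observation. The exact filter used in the presentation is not recoverable, so the form below is an assumption: each microphone's target is taken to be the source with the same index, and the hypothetical `power` argument selects an amplitude-ratio (1.0) or power-ratio (2.0) mask.

```python
import numpy as np


def wiener_like_filter(X_complex, A, S, power=1.0, eps=1e-12):
    """Apply a ratio mask to one frequency bin of the complex observation.

    X_complex: (channel x time) complex observation
    A:         (channel x source) nonnegative mixing matrix, diagonal = 1
    S:         (source x time) nonnegative activations
    """
    # Contribution of every source to every channel: (channel, source, time).
    parts = (A[:, :, None] * S[None, :, :]) ** power
    total = parts.sum(axis=1) + eps                    # (channel, time)
    idx = np.arange(A.shape[0])
    target = parts[idx, idx, :]                        # target source of each mic
    mask = target / total
    return mask * X_complex


# Hypothetical toy data: 3 microphones, leakage level 0.1 everywhere.
rng = np.random.default_rng(0)
A = np.eye(3) + 0.1 * (1 - np.eye(3))
S = rng.uniform(0.1, 1.0, size=(3, 20))
X_complex = (A @ S) * np.exp(1j * rng.uniform(0, 2 * np.pi, size=(3, 20)))
Y = wiener_like_filter(X_complex, A, S)
print(np.abs(Y).shape)
```

The mask only rescales the magnitude, so the phase of the observed signal is kept, which suits the phase-insensitive setting described above.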

Conditions
[Figure: the four instruments Ob., Cl., Pf., and Tb.]
• Comparing blind bleeding-sound reduction performance
– Compared methods
• IVA, ILRMA, DMNMF, conventional TCNMF
– Evaluation criterion
• Source-to-distortion ratio (SDR) [Vincent+, 2006] improvement
• Dry sources
– “songKitamura” [Kitamura+, 2015]
– Create 4 observed signals from 4 instruments

• Simulation of bleeding sound
– Dry sources were mixed using frequency-wise nonnegative
random mixing matrices
– 10 observed mixtures were prepared using 10 different mixing matrices
• Average SDR of the observed signals
– Ob. + bleeding sound: 18.8 dB
– Cl. + bleeding sound: 15.0 dB
– Pf. + bleeding sound: 14.7 dB
– Tb. + bleeding sound: 8.6 dB
– These values are relatively high
Conditions
[Figure: the dry sources (complex spectrograms, frequency × time for each source) are multiplied by the nonnegative mixing matrix to produce the observed signals (complex spectrograms) for Ob., Cl., Pf., and Tb.]
– Diagonal elements of the mixing matrix are set to unity
– Off-diagonal elements are set to uniformly distributed random values in the range (0, 0.2)
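The simulated mixing described above can be sketched as follows. The function name `simulate_bleeding`, the array layout, and the random toy input are assumptions for illustration; only the unit diagonal, the uniform off-diagonal range, and the frequency-wise nonnegative mixing come from the stated conditions.

```python
import numpy as np


def simulate_bleeding(dry, max_leak=0.2, seed=0):
    """Mix dry complex spectrograms with frequency-wise nonnegative matrices.

    dry: complex array of shape (source, frequency, time)
    Returns the observed array (channel, frequency, time) and the mixing
    matrices of shape (frequency, channel, source).
    """
    rng = np.random.default_rng(seed)
    n_src, n_freq, _ = dry.shape
    # Off-diagonal leakage levels, drawn uniformly from [0, max_leak).
    A = rng.uniform(0.0, max_leak, size=(n_freq, n_src, n_src))
    idx = np.arange(n_src)
    A[:, idx, idx] = 1.0                       # diagonal elements set to unity
    # observed[m, f, t] = sum_n A[f, m, n] * dry[n, f, t]
    observed = np.einsum("fmn,nft->mft", A, dry)
    return observed, A


# Hypothetical dry sources: 4 instruments, 8 frequency bins, 16 frames.
rng = np.random.default_rng(1)
dry = rng.normal(size=(4, 8, 16)) + 1j * rng.normal(size=(4, 8, 16))
observed, A = simulate_bleeding(dry)
print(observed.shape, A.shape)
```

Calling the function with 10 different seeds would give 10 different mixing matrices, mirroring the 10 observed mixtures mentioned above.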

Conditions
• Other conditions
– Sampling frequency: 44.1 kHz
– Window function: Hamming window
– Window length: 4096 points (approximately 92.9 ms)
– Shift length: 2049 points (approximately 46.5 ms)
– Number of iterations: 200
– Initial value of the mixing matrix: diagonal elements are set to unity; off-diagonal elements are uniform random values in the range (0, 0.1)
– Initial value of the activation matrix: uniform random values in the range (0, 1)
– Shape parameter: 1.25
– Scale parameter: 0.6
– Observed gain parameter: 0.006
– Weight coefficient for regularizer: 0.56
– Number of bases: 10, 30, 80

Results
[Figure: results of the compared methods, grouped into phase-sensitive BSS and phase-insensitive BSS]

Results
• Demonstrations
[Table: audio demonstrations for Ob., Cl., Pf., and Tb. (columns) with the observed signal, conventional TCNMF, and proposed TCNMF (rows)]

Conclusion
• Purpose
– Reducing the bleeding sound in music signals
• SNR in observed signal is high
• The microphones are apart from each other
• Motivation
– Phase-sensitive BSS failed to reduce bleeding sounds
– DMNMF, which is a phase-insensitive BSS method, also failed
• Due to the difficulty of maximum likelihood estimation with many parameters
– TCNMF is effective for reducing bleeding sound
• However, the regularization of the source-time matrix degrades the
sound quality of the separated signal.
• Proposed method
– Introduces a prior distribution on the gain matrix of TCNMF
• Based on the assumption that the volume of the bleeding sound is small
• Proposed method outperformed conventional methods
Thank you for your attention.
