Blind seperation image sources via adaptive dictionary learning

IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 21, NO. 6, JUNE 2012 2921

Blind Separation of Image Sources via Adaptive
Dictionary Learning
Vahid Abolghasemi, Member, IEEE, Saideh Ferdowsi, Student Member, IEEE, and Saeid Sanei, Senior Member, IEEE

Abstract—Sparsity has been shown to be very useful in source represents the instrumental noise or the imperfection of the
separation of multichannel observations. However, in most cases, model. In BSS, the aim is to estimate both and from the ob-
the sources of interest are not sparse in their current domain and servations. This problem does not have a unique solution in gen-
one needs to sparsify them using a known transform or dictionary.
If such a priori about the underlying sparse domain of the sources eral. However, one can find a solution for (1) by imposing some
is not available, then the current algorithms will fail to success- constraints into the separation process and making the sources
fully recover the sources. In this paper, we address this problem distinguishable. Independent component analysis (ICA) is one
and attempt to give a solution via fusing the dictionary learning of the well-established methods that exploits the statistical in-
into the source separation. We first define a cost function based dependence of the sources [2]. It also assumes that the sources
on this idea and propose an extension of the denoising method in
the work of Elad and Aharon to minimize it. Due to impracticality are non-Gaussian and attempts to separate them by minimizing
of such direct extension, we then propose a feasible approach. In the mutual information.
the proposed hierarchical method, a local dictionary is adaptively Nonnegativity is another constraint that has been shown to be
learned for each source along with separation. This process im- useful for source separation. In nonnegative matrix factoriza-
proves the quality of source separation even in noisy situations. tion, a nonnegative matrix is decomposed into a product of two
In another part of this paper, we explore the possibility of adding
global priors to the proposed method. The results of our experi- nonnegative matrices. The nonnegativity constraint allows only
ments are promising and confirm the strength of the proposed ap- additive combination; hence, it can produce a part-based repre-
proach. sentation of data [3].
Index Terms—Blind source separation (BSS), dictionary In particular, and in image separation applications, there
learning, image denoising, morphological component analysis have been proposed several Bayesian approaches [4]–[8]. In
(MCA), sparsity. [4], Kayabol et al. adopted a Markov random field (MRF)
model to preserve the spatial dependence of neighboring pixels
(e.g., sharpness of edges) in a 2-D model. They provided a
I. INTRODUCTION
full Bayesian solution to the image separation using Gibbs
sampling. Another MRF-based method has been proposed by

I N SIGNAL and image processing, there are many instances
where a set of observations is available and we wish to re-
cover the sources generating these observations. This problem,
Tonazzini et al. in [7]. They used a maximum a posteriori
(MAP) estimation method for recovering both the mixing
matrix and the sources, based on alternating maximization in a
which is known as blind source separation (BSS), can be pre- simulated annealing scheme. Their experimental results have
sented by the following linear mixture model: shown that this scheme is able to increase the robustness against
noise. Later, Tonazzini et al. [8] proposed to apply mean field
(1) approximation to MRF and used expectation maximization
(EM) algorithm to estimate parameters of both the mixing
where is the observation matrix,
matrix and the sources. This method has been successfully
is the source matrix, and is the
used in document processing application. In addition, in [9],
mixing matrix (throughout the paper, we assume that the mixing
the images have been modeled by hidden Markov fields with
matrix is column normalized). The additive of size
unknown parameters and a Bayesian formulation has been
considered. A fast Monte Carlo Markov chain (MCMC) al-
gorithm has been proposed in this paper for joint separation
Manuscript received July 16, 2011; revised November 04, 2011 and Jan-
uary 08, 2012; accepted February 04, 2012. Date of publication February 13, and segmentation of mixed images. Kuruoglu et al. [10] have
2012; date of current version May 11, 2012. The associate editor coordinating studied the detection of cosmic microwave background (CMB)
the review of this manuscript and approving it for publication was Prof. Bulent
using image separation techniques. They exploited the spatial
Sankur.
V. Abolghasemi is with the School of Engineering and Design, Brunel Uni- information in the images by modeling the correlation and
versity, Uxbridge UB8 3PH, U.K., and also with the Faculty of Engineering and interactions between pixels by MRFs. They proposed a MAP
Physical Sciences, University of Surrey, Guildford, GU2 7XH, U.K. (e-mail:
algorithm for estimation of the sources and the mixing matrix
vabolghasemi@ieee.org).
S. Ferdowsi and S. Sanei are with the Department of Computing, Faculty of parameters.
Engineering and Physical Sciences, University of Surrey, Guildford GU2 7XH, Different from MRF-based approaches for CMB detection
U.K.
in astrophysical images is spectral matching ICA (SMICA)
Color versions of one or more of the figures in this paper are available online
at http://ieeexplore.ieee.org. method proposed by Cardoso et al. [5]. In SMICA, the spectral
Digital Object Identifier 10.1109/TIP.2012.2187530 diversity of the sources is exploited. This enables SMICA

1057-7149/$31.00 © 2012 IEEE

2922 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 21, NO. 6, JUNE 2012

to separate the sources even if they are Gaussian and noisy, in MMCA is still an open issue. One possible approach to use
provided that they have different power spectra. It is shown in dictionary learning for MMCA is to learn a specific dictionary
[5] that this idea leads to an efficient EM algorithm, much faster for each source from a set of exemplar images. This, however,
than previous algorithms with non-Gaussianity assumption. is rarely the case, since in most BSS problems, such training
An extension of this method to the wavelet domain, called samples are not available.
wSMICA, has been proposed by Moudden et al. [6]. In this paper, we adapt MMCA to those cases that the spar-
Wavelet domain analysis for image separation problem has sifying dictionaries/transforms are not available. The proposed
been also shown to be advantageous in some other works. In algorithm is designed to adaptively learn the dictionaries from
[11], Ichir and Mohammad-Djafari assumed three different the mixed images within the source separation process. This
models of the wavelet coefficients, i.e., independent Gaussian method is motivated by the idea of image denoising using a
mixture model, hidden Markov tree model, and contextual learned dictionary from the corrupted image in [1]. Other ex-
hidden Markov field model. Then, they proposed an MCMC tensions of the work in [1] have been also reported in [22], [24],
algorithm for joint blind separation of the sources and estima- and [25] for the purpose of image separation from a single mix-
tion of the mixing matrix and the hyperparameters. The work ture. In this paper, the multichannel case with more observations
in [12] also uses the wavelet domain for source separation. than sources is considered. We start by theoretically
In another work, Bronstein et al. [13] studied the problem of extending the denoising problem to BSS. Then, a practical algo-
recovering a scene recorded through a semi-reflecting medium rithm is proposed for BSS without any prior knowledge about
and proposed a technique called sparse ICA (SPICA). They the sparse domains of the sources. The results indicate that adap-
proposed to apply the wavelet packet transform (WPT) to the tive dictionary learning, one for each source, enhances the sep-
mixtures prior to applying Infomax ICA. Their algorithm is arability of the sources.
suitable for the cases where all the sources can be sparsely
represented in the wavelet domain. A. Notation
Sparsity-based approaches for the BSS problem have re- In this paper, all parameters have real values although it is not
ceived much attention recently. The term sparse refers to explicitly mentioned. We use small and capital boldface charac-
signals or images with small number of nonzeros with respect ters to represent vectors and matrices, respectively. All vectors
to some representation bases. In sparse component analysis are represented as column vectors by default. For instance, the
(SCA), the assumption is that the sources can be sparsely th column and th row of matrix are represented by column
represented using a known common basis or dictionary. For vectors and , respectively. The vectorized version of ma-
instance, the aforementioned SPICA method [13] considers trix is shown by vec . The matrix Frobenius norm
the wavelet domain for this purpose. In addition, the proposed is indicated by . The transpose operation is denoted by
methods in [14] and [15] use sparse dictionaries for separation . The -norm, which counts the number of nonzeros, and
of speech mixtures. Nevertheless, there are many cases where the -norm, which sums over the absolute values of all ele-
each source is sparse in a different domain, which makes it ments, are shown by and , respectively.
difficult to directly apply SCA methods to their mixtures. Mul-
tichannel morphological component analysis (MMCA) [16] B. Organization of the Paper
and generalized morphological component analysis (GMCA) In the next section, the motivation behind the proposed
[17] have been recently proposed to address this problem. method is stated. We start by describing the denoising problem
The main assumption in MMCA is that each source can be and its relation to image separation. Then, the extension of
sparsely represented in a specific known transform domain. In single mixture separation to the multichannel observation
GMCA, each source is modeled as the linear combination of a scenario is argued. In Section III, a practical approach for this
number of morphological components where each component purpose is proposed. Section IV is devoted to discussing the
is sparse in a specific basis. MMCA and GMCA are extensions possibility of adding global sparsity priors to the proposed
of a previously proposed method of morphological component method. Some practical issues are discussed in Section V. The
analysis (MCA) [18]–[21] to the multichannel case. In MCA, numerical results are given in Section VI. Finally, this paper is
the given signal/image is decomposed into different morpho- concluded in Section VII.
logical components subject to sparsity of each component in a
known basis (or dictionary). II. FROM IMAGE DENOISING TO SOURCE SEPARATION
MMCA performs well where prior knowledge about the
sparse domain of each individual source is available; however, A. Image Denoising
what if such a priori is not available or what if the sources Consider a noisy image corrupted by additive noise. Elad and
are not sparsely representable using the existing transforms? Aharon [1] showed that if the knowledge about the noise power
One may answer these questions by referring to the dictionary is available, it is possible to denoise it by learning a local dic-
learning framework. In dictionary learning framework, the aim tionary from the noisy image itself. In order to deal with large
is to find an overcomplete dictionary that can sparsely represent images, they used small (overlapped) patches to learn such dic-
a given set of images or signals. Utilizing learned dictionaries tionary. The obtained dictionary is considered local since it de-
into MCA has been shown to yield promising results [22], scribes the features extracted from small patches. Let us repre-
[23]. However, taking the advantages of learned dictionaries sent the noisy image by vector of length . The

ABOLGHASEMI et al.: BLIND SEPARATION OF IMAGE SOURCES VIA ADAPTIVE DICTIONARY LEARNING 2923

unknown denoised image is also vectorized and represented global transforms (e.g., wavelet and curvelet) for image sepa-
as . The th patch from is shown by vector ration from a single mixture. Their simulation results show the
of pixels. For notational simplicity, the th patch is ex- effects of adaptive dictionaries on separation of complex texture
pressed as explicit multiplication of operator (a binary patterns from natural images.
matrix) by .1 The overall denoising problem is expressed as All the related studies have demonstrated the advantages that
adaptive dictionary learning can have on the separation task.
However, there is still one missing piece and that is considering
(2)
such adaptivity for the multichannel mixtures. Performing the
idea of learning local dictionaries within the source separation
where scalars and control the noise power and sparsity de- of multichannel observations obviously has many benefits in
gree, respectively. In addition, is the sparsifying dic- different applications. Next, we extend the denoising method
tionary that contains normalized columns (also called atoms), in [1] for this purpose.
and are sparse coefficients of length .
In the proposed algorithm by Elad and Aharon [1], and
C. Multichannel Source Separation
are respectively initialized with and overcomplete
discrete cosine transform (DCT) dictionary. The minimization In this section, the aim is to extend the denoising problem
of (2) starts with extracting and rearranging all the patches of . (2) to the multichannel source separation. Consider the BSS
The patches are then processed by K-SVD [26], which updates model introduced in Section I, and assume further that the
and estimates sparse coefficients . Afterward, and sources of interest are 2-D grayscale images. The BSS model
are assumed fixed and is estimated by computing for 2-D sources can be represented by vectorizing all images
to and then stacking them
to form . The BSS model (1) cannot be
(3) directly incorporated into (2) as it requires both and to
be single vectors. Hence, we use the vectorized versions of
these matrices in model (1), which can be obtained using the
where is the identity matrix and is the refined version of .
properties of the Kronecker product2
Again, and are updated by K-SVD but this time using
the patches from that are less noisy. Such conjoined denoising
and dictionary adaptation is repeated to minimize (2). In prac-
tice, (3) is obtained computationally easier since is (4)
diagonal and the above expression can be calculated in a pixel-
wise fashion. In the above expression, and are column vectors of lengths
It is shown that (3) is a kind of averaging using both noisy and , respectively. In addition, is a column vector of
and denoised patches, which, if repeated along with updating of length . is a block diagonal matrix of size
other parameters, will denoise the entire image [1]. However, in , and is the Kronecker product symbol. We consider the
the sequel, we will try to find out if this strategy is extendable noiseless setting and modify (2) to
for the cases where the noise is added to the mixtures of more
than one image.

B. Image Separation
(5)
Image separation is a more complicated case of image de-
noising where more than one image are to be recovered from a
single observation. Consider a single linear mixture of two tex- It is clearly seen that the above expression is similar to (2) but
tures with additive noise: (or ). with an extra mixing matrix to be estimated. In addition, vec-
The authors of [25] attempt to recover and using the prior tors and are much lengthier than and as they represent
knowledge about two sparsifying dictionaries and . They vectorized versions of multiple images and mixtures.
use a minimum mean-squared-error (MSE) estimator for this The problem (5) can be minimized in an alternating scheme
purpose. In contrast, the recent work in [24] does not assume by keeping all but one unknown fixed at a time. The estima-
any prior knowledge about the dictionaries. It rather attempts to tion of and can be achieved using K-SVD similar to
learn a single dictionary from the mixture and then applies a de- [1]. In K-SVD, the dictionary is updated columnwise using sin-
cision criterion to the dictionary atoms to separate the images. gular value decomposition (SVD); the sparse coefficients are
In another recent work, Peyre et al. [22] presented an adaptive estimated using one of the common sparse coding techniques.
MCA scheme by learning the morphologies of image layers. However, estimation of is slightly different from that of
They proposed to use both adaptive local dictionaries and fixed in (3) and needs more attention. It has a closed-form solution,
1Practically, we apply a nonlinear operation to extract the patches from image 2The matrix multiplication can be expressed as a linear transformation on
by sliding a mask of appropriate size over the entire image similar to [1] and matrices. In particular, vec vec for three matrices ,
[22]. , and . In our case, vec vec .


which is obtained by taking the gradient of (5) and setting it to cost function in (10) does not require dealing with cumbersome
zero matrices and vectors [as shown in (5)] and allows calculating
the image sources in a pixelwise fashion.
The solution for (10) is achieved using an alternating scheme.
(6) The minimization process for the th level of hierarchy can be
expressed as follows. The image patches are extracted from
leading to and then processed by K-SVD for learning and sparse coef-
ficients , whereas other parameters are kept fixed. Then, the
gradient of (10) with respect to is calculated and set to zero

(12)
(7)

Finally, after some manipulations and simplifications in (12),
In order to estimate , we consider all unknowns, except , the estimation of the th source is obtained as
fixed and simplify (5) by converting the first quadratic term into
an ordinary matrix product and obtain

(8)
(13)
It is interesting to notice that the inverting term in the above
The above minimization problem is easily solved using pseu-
expression is the same as that in (3) for the denoising problem.
doinverse of as
Thus, as aforementioned, this calculation can be obtained pixel-
(9) wise. Next, in order to update , a simple least square linear re-
gression such as the one in [16] will give the following solution:
The above steps (i.e., estimation of , , , and ) should
(14)
be alternately repeated to minimize (5). However, the long ex-
pression (7) is not practically computable, particularly if the However, normalization of is necessary after each update
number of sources and observations is large. This is because to preserve the column norm of the mixing matrix. The above
of dealing with a huge matrix . In addition, steps for updating all variables are executed for all from 1 to
in contrast to the aforementioned denoising problem, the matrix . Moreover, the entire procedure should be repeated to mini-
to be inverted in (7) is not diagonal, and therefore, the estima- mize (10). A pseudocode of the proposed algorithm is given in
tion of cannot be calculated pixel by pixel, which makes the Algorithm 1.
situation more difficult to handle. In the next section, a practical
approach is proposed to solve this problem. Algorithm 1 The proposed algorithm.

III. ALGORITHM Input: Observation matrix , patch size , number of
In order to find a practical solution for (5), we use a hierar- dictionary atoms , number of sources , noise standard
chical scheme such as the one in MMCA [16]. To do this, the deviation , noise gain , regularization parameter , and
BSS model (1) is broken into rank-1 multiplications and the total number of iterations .
following minimization problem is defined: Output: Dictionaries , sparse coefficients , source
matrix , and mixing matrix .
1 begin
(10) 2 Initialization:
3 Set to overcomplete DCT;
where denotes the dictionary corresponding to the th
source, i.e., , and is the th residual expressed as 4 Set to a random column-normalized matrix;
5 ;
(11)
6 Choose to be multiple times of ;
7 ;
It is important to note that, with this new formulation, we have
to learn dictionaries, one for each source, different from (5). 8 repeat
The advantage of this scheme is learning adaptive source-spe-
9 ;
cific dictionaries, which improves source diversity, as also seen
in MMCA using known transform domains [21]. In addition, the 10 for to do


11 Extract all the patches from ; Consider a known global unitary basis for each
source. The minimization problem (10) can be modified as fol-
12 Solve the sparse recovery problem: lows to capture both global and local structures of the sources:
s.t.
;
13 Update using K-SVD; (15)
14 Calculate the residual:
;
Note that term is exactly similar to what was used in
15 Compute using (13); the original MMCA [16]. All variables in the above expression
can be similarly estimated using Algorithm 1, except the actual
16 ; sources . In order to find , the gradient of (15) with
17 ; respect to is set to zero, leading to

18 end
19 ;
20 until stopping criterion is met; sgn (16)
21 end

As already implied, the motivation behind the proposed al-
gorithm is to learn source-specific dictionaries offering sparse where sgn is a componentwise signum function. The above
representations. Such dictionary learning, when embedded into expression amounts to soft thresholding [16] due to the signum
the source separation process, improves both separation and dic- function, and hence, the estimation of can be obtained by
tionary learning quality. In other words, in the first few iter- the following steps:
ations of Algorithm 1, each source includes portions of other • Soft thresholding of with threshold and
sources. However, the dictionaries gradually learn the domi- attaining ;
nant components and reject the weak portions caused by other • Reconstructing by
sources. Using these dictionaries, the estimated sources, which
are used for dictionary learning in the next iteration, will have
(17)
less amount of irrelevant components. This adaptive process is
repeated until most of the irrelevant portions are rejected and
dictionaries become source specific. The entire above procedure Note that, since is a unitary and known matrix, it is not
is carried out to minimize the cost function in (10). It is also explicitly stored but implicitly applied as a forward or inverse
noteworthy to mention that (10) is not jointly convex in all pa- transform where applicable. Similar to the previous section, the
rameters. Therefore, the alternating estimation of these parame- above expression can be executed pixelwise and is not compu-
ters, using the proposed method, does not necessarily lead to an tationally expensive.
optimal solution. This is the case for other previous alternating
minimization problems too [16], [24]. However, the obtained
parameters are local minima of (10) and, hence, are considered V. PRACTICAL ISSUES
as approximate solutions. Implementation of the proposed methods needs some careful
considerations, which are addressed here.

IV. GLOBAL DICTIONARIES VERSUS LOCAL DICTIONARIES A. Noise

The proposed method takes the advantages of fully local dic- Similar to the denoising methods in [1], the proposed method
tionaries that are learned within the separation task. We call also requires the knowledge about noise power. This should be
these dictionaries local since they capture the structure of small utilized in the sparse coding step of the K-SVD algorithm for
image patches to generate dictionary atoms. In contrast, the solving the following problem:
global dictionaries are generally applied to the entire image or
signal and capture the global features. Incorporating the global s.t. (18)
fixed dictionaries into the proposed method can be advanta-
geous where a prior knowledge about the structure of sources where is a constant and is the noise standard deviation. The
is available. Such combined local and global dictionaries have above well-known sparse recovery problem can be solved using
been used in [22] for single mixture separation. Here, we con- orthogonal matching pursuit (OMP) [27]. The FOCal Underde-
sider the multichannel case and extend our proposed method in termined System Solver (FOCUSS) [28] method is also used to
Section III for this purpose. solve (18) by replacing the -norm with the -norm.


Furthermore, incorporating the noise power in solving (18) and , where denotes the number of ex-
has an important advantage in the dictionary update stage. It tracted patches. Finally, normalization in line 17 costs .
ensures that the dictionary does not learn the existing noise in Since all these calculations are executed for each source (i.e.,
the patches. Consequently, the estimated image using this dic- ), then the total computation cost per each iteration of
tionary would become cleaner, which will later refine the dic- the algorithm would be
tionary atoms in the next iteration. This progressive denoising . It is seen that, due to learning one dictio-
loop is repeated until a clean image is achieved. nary for each individual source, the proposed algorithm would
In the proposed method, however, noise means the portions be computationally demanding for large-scale problems.
of other sources that are mixed with . Hence, we initially con- A possible approach to speed up the algorithm is to learn one
sider a high value for since the portions of other sources dictionary for all the sources and then choose relevant atoms
might be high although the noise is zero, and then gradually for estimating each source. This strategy, however, has not been
reduce it to zero as the iterations evolve. Nevertheless, if the completed yet and is under development. In addition, consid-
observations themselves are noisy, then we should start from a ering faster dictionary learning algorithms rather than K-SVD
higher bound and decrease it toward the actual noise power as can alleviate the problem.
the iterations evolve.
VI. RESULTS
B. Dictionary and Patch Sizes
In the first experiment, we illustrate the results of applying
Unfortunately, not much can be theoretically said about
Algorithm 1 to the mixtures of four sources. A severe case
choosing the optimum dictionary size (or redundancy factor
of image sources with different morphologies was chosen to
). Highly overcomplete dictionaries allow higher sparsity
examine the performance of the proposed method. A 6 4
of the coefficients, although being computationally expensive.
full-rank random column-normalized was used as the mixing
However, it may not be necessary to choose very high redun-
matrix. Five hundred iterations were selected as the stopping
dancy and it depends on the data type and number of training
criterion. The mixtures were noiseless. However, we selected
signals [24]. There have been some reported works allowing a
(see below for the reasoning of this choice) and used
variable dictionary size to find the optimum redundancy factor
a decreasing starting from 10 and reaching 0.01 at the end
such as in [29] and [30]. Such analysis is out of the scope of
of iterations. The patches had 50% overlap. Other parameters
this paper, and we choose a fixed redundancy factor in our
were chosen as follows: , , ,
simulations.
and . In addition, in order to show the advantages of
Patch size is normally chosen depending on the entire image
learning adaptive local dictionaries over the fixed local dic-
size and also the knowledge (if available) about the patterns of
tionaries, we applied Algorithm 1 to the same mixtures while
the actual sources. However, adopting very large patches should
ignoring the dictionary learning part. This way, source sepa-
be avoided as they lead to large dictionaries and also provide
ration is performed using local fixed DCT dictionaries for all
few training signals for the dictionary learning stage. We have
sources. We also applied the following methods for comparison
chosen 8 8 patches for all the experiments, which seems to
purposes: GMCA3 [21] based on wavelet transform, SPICA4
become a standard in the literature.
[13] based on WPT, and SMICA [5] based on Fourier trans-
Furthermore, we normally use overlapping patches as shown
form. Fig. 1 illustrates the results of applying all these methods
to give better results in our simulations and also in the cor-
together with the corresponding MSEs. MSE is calculated as
responding literature [1], [25]. However, we have empirically
MSE , where and are the original
observed that the percentage of overlapping can be adjusted
and recovered images, respectively. Fig. 1 (see the bottom row)
based on the noise level. In noiseless settings, less percentage of
shows that the proposed method could successfully recover the
overlap (e.g., 50%) would be sufficient, whereas in noisy situa-
image sources via adaptive learning of the sparsifying dictio-
tions, full overlapped patches result in the desired performance.
naries. The achieved MSEs, given in Fig. 1, are lowest for the
Full overlapped patches are obtained by one-pixel shifting of
proposed method for all image sources except for the bricklike
the operator , in both vertical and horizontal directions, over
texture. In addition, the learned dictionary atoms, as shown in
the entire image.
Fig. 2, indicate good adaptation to the corresponding sources.
C. Complexity Analysis Other methods do not perform as well as the proposed method.
For instance, as shown in Fig. 1, SPICA has a problem in sep-
The proposed algorithm is more computationally expensive
aration of noiselike texture and Barbara. GMCA has separated
than standard MMCA. It includes K-SVD, which imposes two
all image sources with some interference from other sources.
extra steps of sparse coding and dictionary update. However,
SMICA has perfectly recovered the bricklike texture with the
the complexity of computing is almost the same as in MMCA
lowest MSE among all other methods. However, it has some
since (13) can be computed pixelwise. The detailed analysis of
difficulties in recovering Barbara and cartoon boy images.
the complexity of the proposed method per iteration is as fol-
As another measure on the performance of the proposed
lows. Consider lines 12 and 13 in Algorithm 1, and assume
method, the reconstruction error as a function of the number
further that we use approximate K-SVD [31]. Based on [31],
of iterations is shown in Fig. 3. It is shown in this figure that
these two lines cost , where , for
each . The complexity of computing the residual (line 14) is 3Available at: http://md.cosmostat.org/Home.html
. Lines 15 and 16 respectively cost 4Available at: http://visl.technion.ac.il/bron/spica/


Fig. 2. Obtained dictionaries as a result of applying the proposed method to
the mixtures shown in Fig. 1.

Fig. 3. MSE computed as . Note that the elements of
image sources have amplitude in the range [0, 255].
Fig. 1. Results of applying different methods to six noiseless mixtures.

lead the algorithm to fail. Among the existing related methods,
a monotonic decrease in the value of the separation error is GMCA [21] uses a thresholding scheme to denoise the image
achieved and the asymptotic error is almost zero. sources during the separation. Hence, we applied this algorithm
In the next experiment, the performance of the proposed to the same mixtures to compare its performance with that of
method in noisy situations is evaluated. For this purpose, we the proposed method. We applied SPICA and SMICA to the
generated four mixtures from two image sources, namely, Lena same mixtures too. The results of this experiment are demon-
and boat. Then, Gaussian noise with a standard deviation of strated in Fig. 4. It is shown in Fig. 4(f) that both separation and
15 was added to the mixtures so that the peak signal-to-noise denoising tasks have been successful in the proposed method.
ratios (PSNRs) of the mixtures are equal to 20 dB. (Note that The corresponding learned dictionaries using the proposed
the image size is 128 128.) The noisy mixtures are shown in method, given in Fig. 5(a), confirm these observations. In
Fig. 4(a). We applied the proposed algorithm to these mixtures addition, the proposed method is superior to GMCA, SPICA,
starting with and evolving while was gradually and SMICA, and it denoises the sources with higher PSNRs,
decreasing to . We used fully overlapped patches and as seen by comparing Fig. 4(c)–(f). Although GMCA, SPICA,
200 iterations, with the rest of parameters similar to those in and SMICA have separated the image sources, they are not
the first experiment. One of the considerable advantages of the successful in denoising them.
proposed method is the ability to denoise the sources during In addition, the sparsity rate of ’s, obtained by averaging
the separation. This is because the core idea of the proposed over the support of all (where is the number of
method comes from a denoising task. In contrast, in most patches), for both sources is given in Fig. 5(b). Note that a 4
conventional BSS methods, denoising should be carried out 2 mixing matrix was used in this experiment. This graph
either before or after the separation, which is not ideal and may shows the percentage of used atoms by sparse coefficients,


Fig. 4. (a) Noisy mixtures. (b) Original sources. Separated sources using (c) GMCA, (d) SPICA, (e) SMICA, and (f) the proposed method.

linear combinations of their atoms are sufficient to generate the
image sources of interest.
In another experiment, we evaluated and compared the influ-
ences of adding global sparsity prior to the proposed algorithm.
The experiment settings are as follows. Two images, namely,
texture and Lena, were mixed together using a 4 2 random
. Algorithm 1, which only takes advantage of locally learned
dictionaries, was applied to the mixtures. The proposed algo-
rithm in Section IV (local and global) was also applied to the
same mixtures. We considered discrete wavelet transform and
DCT as global sparsifying bases for Lena and texture, respec-
tively. The methods of FastICA5 [32], JADE [2], and SMICA
were also used in this experiment for comparison. We varied the
noise standard deviation from 0 to 20 to investigate the perfor-
mance of all these methods in dealing with different noise levels.
Similar to [16] and [17], we calculated the mixing matrix crite-
rion (also called Amari criterion [33]) as ,
where is the scaling and permutation matrix.6 The mixing ma-
trix criterion curves are depicted in Fig. 6 as a function of stan-
dard deviation. As it is shown in Fig. 6, in low/moderate noise
situations, the best performance is achieved when both global
and local dictionaries are used. However, the performance of
the proposed method using only local and both local and global
Fig. 5. (a) Obtained dictionaries as a result of applying the proposed method to dictionaries decreases in high noise. Moreover, the recovery re-
the mixtures shown in Fig. 4(a). (b) Percentage of average number of nonzeros sults of JADE and FastICA are less accurate than those of the
of sparse coefficients, equivalent to the occurrence rate of dictionary atoms
sorted based on a descending order. other methods. The performance of SMICA is somewhere be-
tween JADE’s and that of the proposed method.
Next, in order to find an optimal , we investigated the ef-
sorted in descending order, for both dictionaries corresponding fects of choosing different ’s on the recovery quality. As ex-
to Lena and boat. As found from Fig. 5(b), the most frequently pected and also shown in [1], selection of is dependent on the
used atom shows a small percentage of appearance ( 17%), 5Available at: http://research.ics.tkk.fi/ica/fastica/
and other dictionary atoms have been used less frequently. This 6The proposed method only causes signed permutations due to dealing with
means that the learned dictionaries are so efficient that few column-normalized mixing matrices.


TABLE I
DETAILED SIMULATION TIME OF ALGORITHM 1. “TOTAL” MEANS THE TOTAL TIME (IN SECONDS) ELAPSED PER ONE ITERATION

sources. A set of simulations was conducted to numerically
demonstrate this effect and also to evaluate the effects of
changing other dimensions such as patch size and dictionary
size on the simulation time. In Table I, we have given these re-
sults per one iteration of Algorithm 1, where the image patches
had 50% overlap. A desktop computer with a Core 2 Duo CPU
of 3 GHz and 2 GB of RAM was used for this experiment. It
is interesting to note that, as shown in Table I, much of the
computational burden is incurred by the dictionary update
stage (using K-SVD). In addition, the computation of is
far less complicated than both “sparse coding” and “dictionary
update” due to pixelwise operation. This implies that further
Fig. 6. Mixing matrix criterion as a function of noise standard deviation for
different algorithms. Note that the “Global” method is actually equivalent to effort is required to speed up the dictionary learning part of the
the original MMCA in [16]. proposed algorithm.

VII. CONCLUSION
In this paper, the BSS problem has been addressed. The aim
has been to take advantage of sparsifying dictionaries for this
purpose. Unlike the existing sparsity-based methods, we as-
sumed no prior knowledge about the underlying sparsity domain
of the sources. Instead, we have proposed to fuse the learning
of adaptive sparsifying dictionaries for each individual source
into the separation process. Motivated by the idea of image de-
noising via a learned dictionary from the patches of the cor-
rupted image in [1], we proposed a hierarchical approach for this
purpose. Our simulation results on both noisy and noiseless mix-
tures have been encouraging and conﬁrmed the effectiveness
of the proposed approach. However, further work is required
Fig. 7. Mixing matrix criterion as a function of . to speed up the algorithm and make it suitable for large-scale
problems. In addition, the possibility of applying the proposed
method to the underdetermined mixtures (less observations than
noise standard deviation. However, our case is slightly different sources) is on our future plans.
due to the source separation task.7 In our simulations, Gaussian
noises with different power values were added to the mixtures REFERENCES
and the mixing matrix criterion was calculated while varying
[1] M. Elad and M. Aharon, “Image denoising via sparse and redun-
. Fig. 7 represents the achieved results for this experiment. In dant representations over learned dictionaries,” IEEE Trans. Image
this ﬁgure, it can be found that all the curves achieve nearly the Process., vol. 15, no. 12, pp. 3736–3745, Dec. 2006.
same recovery quality for . Therefore, is [2] A. Hyvärinen, J. Karhunen, and E. Oja, Independent Component Anal-
ysis. New York: Wiley-Interscience, May 2001.
a reasonable choice. Interestingly, these results and the range of [3] D. D. Lee and H. S. Seung, “Learning the parts of objects by non-
appropriate choices for are very similar to what were achieved negative matrix factorization,” Nature vol. 401, no. 6755, pp. 788–791,
in [1]. Oct. 1999 [Online]. Available: http://dx.doi.org/10.1038/44565
[4] K. Kayabol, E. E. Kuruoglu, and B. Sankur, “Bayesian separation
As already mentioned, it is expected that the computation of images modeled with MRFs using MCMC,” IEEE Trans. Image
time of the proposed method increases for a large number of Process., vol. 18, no. 5, pp. 982–994, May 2009.
[5] J.-F. Cardoso, H. Snoussi, J. Delabrouille, and G. Patanchon, “Blind
7Indeed, we empirically observed better performance by starting with a higher separation of noisy Gaussian stationary sources. Application to cosmic
value of for solving (18) and decreasing it toward the true noise standard microwave background imaging,” in Proc. 11th EUSIPCO, Toulouse,
deviation as the iterations proceed. France, Sep. 2002, pp. 561–564.


[6] Y. Moudden, J.-F. Cardoso, J.-L. Starck, and J. Delabrouille, “Blind [26] M. Aharon, M. Elad, and A. Bruckstein, “K-SVD: An algorithm for
component separation in wavelet space: Application to CMB analysis,” designing overcomplete dictionaries for sparse representation,” IEEE
EURASIP J. Appl. Signal Process., vol. 2005, pp. 2437–2454, 2005. Trans. Signal Process., vol. 54, no. 11, pp. 4311–4322, Nov. 2006.
[7] A. Tonazzini, L. Bedini, E. E. Kuruoglu, and E. Salerno, “Blind [27] J. A. Tropp and A. C. Gilbert, “Signal recovery from random measure-
separation of auto-correlated images from noisy mixtures using MRF ments via orthogonal matching pursuit,” IEEE Trans. Inf. Theory, vol.
models,” in Proc. Int. Symp. ICA, Nara, Japan, Apr. 2003, pp. 675–680. 53, no. 12, pp. 4655–4666, Dec. 2007.
[8] A. Tonazzini, L. Bedini, and E. Salerno, “A Markov model for blind [28] I. Gorodnitsky and B. Rao, “Sparse signal reconstruction from limited
image separation by a mean-field EM algorithm,” IEEE Trans. Image data using FOCUSS: A re-weighted minimum norm algorithm,” IEEE
Process., vol. 15, no. 2, pp. 473–482, Feb. 2006. Trans. Signal Process., vol. 45, no. 3, pp. 600–616, Mar. 1997.
[9] H. Snoussi and A. Mohammad-Djafari, “Fast joint separation and seg- [29] M. Yaghoobi, T. Blumensath, and M. Davies, “Parsimonious dictio-
mentation of mixed images,” J. Electron. Imaging, vol. 13, no. 2, pp. nary learning,” in Proc. IEEE ICASSP, Apr. 2009, pp. 2869–2872.
349–361, Apr. 2004. [30] S. Tjoa, M. Stamm, W. Lin, and K. Liu, “Harmonic variable-size dic-
[10] E. Kuruoglu, A. Tonazzini, and L. Bianchi, “Source separation in noisy tionary learning for music source separation,” in Proc. IEEE ICASSP,
astrophysical images modelled by Markov random fields,” in Proc. Mar. 2010, pp. 413–416.
ICIP, Oct. 2004, vol. 4, pp. 2701–2704. [31] R. Rubinstein, M. Zibulevsky, and M. Elad, Efficient implementation
[11] M. M. Ichir and A. Mohammad-Djafari, “Hidden Markov models for of the K-SVD algorithm using batch orthogonal matching pursuit Tech-
wavelet-based blind source separation,” IEEE Trans. Image Process., nion, Haifa, Israel, Tech. Rep., 2008.
vol. 15, no. 7, pp. 1887–1899, Jul. 2006. [32] A. Hyvarinen, “Fast and robust fixed-point algorithms for independent
[12] W. Addison and S. Roberts, “Blind source separation with non-sta- component analysis,” IEEE Trans. Neural Netw., vol. 10, no. 3, pp.
tionary mixing using wavelets,” in Proc. Int. Conf. ICA, Charleston, 626–634, May 1999.
SC, 2006. [33] S. Amari, A. Cichocki, and H. H. Yang, “A new learning algorithm for
[13] A. M. Bronstein, M. M. Bronstein, M. Zibulevsky, and Y. Y. Zeevi, blind signal separation,” in Advances in Neural Information Processing
“Sparse ICA for blind separation of transmitted and reflected images,” Systems. Cambridge, MA: MIT Press, 1996, pp. 757–763.
Int. J. Imaging Syst. Technol., vol. 15, no. 1, pp. 84–91, 2005.
[14] M. G. Jafari and M. D. Plumbley, “Separation of stereo speech signals
based on a sparse dictionary algorithm,” in Proc. 16th EUSIPCO, Lau-
sanne, Switzerland, Aug. 2008, pp. 25–29.
[15] M. G. Jafari, M. D. Plumbley, and M. E. Davies, “Speech separation
Vahid Abolghasemi (S’08–M’12) received the
using an adaptive sparse dictionary algorithm,” in Proc. Joint Work-
Ph.D. degree in signal and image processing from
shop HSCMA, Trento, Italy, May 2008, pp. 25–28.
the University of Surrey, Guildford, U.K., in 2011.
[16] J. Bobin, Y. Moudden, J. Starck, and M. Elad, “Morphological diver-
He is currently with the School of Engineering
sity and source separation,” IEEE Signal Process. Lett., vol. 13, no. 7,
and Design, Brunel University, Uxbridge, U.K. His
pp. 409–412, Jul. 2006.
main research interests include sparse signal and
[17] J. Bobin, J. Starck, J. Fadili, and Y. Moudden, “Sparsity and mor-
image analysis, dictionary learning, compressed
phological diversity in blind source separation,” IEEE Trans. Image
sensing, and blind source separation.
Process., vol. 16, no. 11, pp. 2662–2674, Nov. 2007.
[18] J. Starck, M. Elad, and D. Donoho, “Redundant multiscale transforms
and their application for morphological component separation,” Adv.
Imaging Electron Phys., vol. 132, pp. 287–348, 2004.
[19] J.-L. Starck, M. Elad, and D. Donoho, “Image decomposition via the
combination of sparse representations and a variational approach,”
IEEE Trans. Image Process., vol. 14, no. 10, pp. 1570–1582, Oct. Saideh Ferdowsi (S’09) received the B.Sc. and M.Sc. degrees in electronic
2005. engineering from Shahrood University of Technology, Shahrood, Iran, in 2005
[20] M. Elad, J. Starck, P. Querre, and D. Donoho, “Simultaneous cartoon and 2007, respectively. She is currently working toward the Ph.D. degree in the
and texture image inpainting using morphological component analysis Faculty of Engineering and Physical Sciences, University of Surrey, Guildford,
(MCA),” Appl. Comput. Harmonic Anal., vol. 19, no. 3, pp. 340–358, U.K.
Nov. 2005. Her main research interests include blind source separation and biomedical
[21] J. Bobin, J.-L. Starck, J. M. Fadili, Y. Moudden, and D. L. Donoho, signal and image processing.
“Morphological component analysis: An adaptive thresholding
strategy,” IEEE Trans. Image Process., vol. 16, no. 11, pp. 2675–2681,
Nov. 2007.
[22] G. Peyre, J. Fadili, and J. Starck, “Learning the morphological diver-
sity,” SIAM J. Imaging Sci., vol. 3, no. 3, pp. 646–669, Jul. 2010. Saeid Sanei (SM’05) received the Ph.D. degree in
[23] G. Peyre, J. Fadili, and J.-L. Starck, “Learning adapted dictionaries for biomedical signal and image processing from Impe-
geometry and texture separation,” in Proc. SPIE, 2007, p. 670 11T. rial College London, London, U.K., in 1991.
[24] M. Elad, Sparse and Redundant Representations: From Theory He is currently a Reader in Neurocomputing at the
to Applications in Signal and Image Processing. New York: Faculty of Engineering and Physical Sciences, Uni-
Springer-Verlag, 2010. versity of Surrey, Guildford, U.K. His main area of
[25] N. Shoham and M. Elad, “Algorithms for signal separation exploiting research is signal processing and its application in
sparse representations, with application to texture image separation,” biomedical engineering.
in Proc. IEEEI 25th Conv. Elect. Electron. Eng. Israel, Dec. 2008, pp. Dr. Sanei is an associate editor of IEEE SIGNAL
538–542. PROCESSING LETTERS and EURASIP Journals.

Blind seperation image sources via adaptive dictionary learning

Recomendados

Recomendados

Más contenido relacionado

La actualidad más candente

La actualidad más candente (19)

Similar a Blind seperation image sources via adaptive dictionary learning

Similar a Blind seperation image sources via adaptive dictionary learning (20)

Más de Mohan Raj

Más de Mohan Raj (8)

Último

Último (20)

Blind seperation image sources via adaptive dictionary learning