Adaptive noise estimation algorithm for speech enhancement
1. Base paper: - http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1199987
Adaptive Noise Estimation Algorithm for Speech Enhancement
Institute of Electrical and Electronics Engineers
Abstract:
A fast and robust speech noise estimation technique is proposed. The noisy speech is decomposed using a
critical-band-rate filter bank so that a perceptual modification of Wiener filtering can be applied in
speech denoising. The sub-band noise estimate is updated adaptively using a smoothing parameter that
depends on the estimated signal-to-noise ratio (SNR). This noise estimation technique can give accurate
results even at very low signal-to-noise ratios. Speech denoising using perceptually modified Wiener
filtering combined with the proposed noise estimation technique gives enhanced speech of good
quality.
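The adaptive sub-band noise update described in the abstract can be sketched as follows. This is a minimal sketch under stated assumptions, not the base paper's exact method: the input is assumed to be sub-band power |Y(k,l)|^2 already produced by a critical-band-rate filter bank (not shown), the sigmoid mapping from a posteriori SNR to the smoothing parameter alpha and its constants (slope, threshold_db, alpha_min, init_frames) are hypothetical, and a plain Wiener-type gain stands in for the perceptually modified Wiener filter.

```python
import numpy as np

def smoothing_from_snr(post_snr_db, slope=0.5, threshold_db=5.0, alpha_min=0.7):
    # Hypothetical mapping (assumption, not the paper's formula): alpha -> 1 at
    # high SNR (speech likely, estimate nearly frozen) and -> alpha_min at low
    # SNR (noise likely, estimate tracks quickly).
    s = 1.0 / (1.0 + np.exp(-slope * (post_snr_db - threshold_db)))
    return alpha_min + (1.0 - alpha_min) * s

def estimate_noise(subband_power, init_frames=5):
    # subband_power: shape (frames, bands), holding |Y(k,l)|^2 per sub-band.
    power = np.asarray(subband_power, dtype=float)
    eps = 1e-12
    # Assumption: the first few frames are noise only and seed the estimate.
    noise = power[:init_frames].mean(axis=0)
    history = np.empty_like(power)
    for l, frame in enumerate(power):
        post_snr_db = 10.0 * np.log10(frame / (noise + eps) + eps)
        alpha = smoothing_from_snr(post_snr_db)
        # First-order recursive update with an SNR-dependent smoothing parameter.
        noise = alpha * noise + (1.0 - alpha) * frame
        history[l] = noise
    return history

def wiener_gain(subband_power, noise, floor=0.1):
    # Plain (non-perceptual) Wiener-type gain 1 - N/|Y|^2, clipped to [floor, 1].
    power = np.maximum(np.asarray(subband_power, dtype=float), 1e-12)
    return np.clip(1.0 - noise / power, floor, 1.0)

# Example: 50 noise-only frames (power 1) followed by 20 loud frames (power 100).
power = np.vstack([np.ones((50, 4)), 100.0 * np.ones((20, 4))])
noise = estimate_noise(power)
gain = wiener_gain(power, noise)
```

In this example the estimate keeps updating during the loud frames but only slowly (it stays below 2 while the signal power rises 100-fold), so the gain stays close to 1 there and drops to the floor in the noise-only frames.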
A: -
(a) Basic Overview of Additive Noise
(b) Basic Overview of Speech Enhancement System
(c) Overview of Spectral Subtraction System
(d) Two Channel Speech Enhancement
(e) Voice Activity Detection
(f) Block Diagram of Subspace Speech Enhancement System
(g) Block Diagram of Complete Subspace Speech Enhancement with Adaptive Noise Estimation Algorithm
(h) Block Diagram of PESQ Algorithm
B: - Waveforms
(a) Original wave, (b) Noisy (Corrupted) wave, and (c) Enhanced wave.
C: - Spectrograms
Spectrogram of (a) Original wave, (b) Noisy (Corrupted) wave, and (c) Enhanced wave.
Conclusion:
This thesis has focused on the design, implementation and testing of an adaptive noise estimation
algorithm for signal subspace speech enhancement. This is a novel approach to the subspace method [5],
which traditionally uses voice activity detection to estimate the noise in a signal. The proposed method
requires no voice activity detection and thus can update the noise estimate throughout the signal instead
of being limited to silence intervals. This allows a more accurate noise estimate to be produced and
improves the quality of the enhanced speech.
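To make the role of the noise estimate in the subspace method concrete, the following is a minimal sketch of a time-domain-constrained signal subspace estimator in the spirit of [5]. It assumes white noise whose variance noise_var is supplied by a noise tracker running throughout the signal rather than by a VAD. The function names, the vector dimension dim, the parameter mu and the overlap-add reconstruction are illustrative choices, not the thesis's implementation; numpy.lib.stride_tricks.sliding_window_view requires NumPy 1.20 or later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def overlap_add(estimates, length):
    # Average the overlapping length-dim estimates back into one signal.
    dim = estimates.shape[1]
    out = np.zeros(length)
    count = np.zeros(length)
    for i, vec in enumerate(estimates):
        out[i:i + dim] += vec
        count[i:i + dim] += 1.0
    return out / count

def subspace_enhance(frame, noise_var, dim=16, mu=2.0):
    x = np.asarray(frame, dtype=float)
    vecs = sliding_window_view(x, dim)          # rows are overlapping vectors y_i
    r_y = vecs.T @ vecs / vecs.shape[0]         # sample covariance of noisy vectors
    lam_y, u = np.linalg.eigh(r_y)              # KLT basis = eigenvectors of r_y
    # Clean-speech eigenvalues under the white-noise assumption; components at
    # or below the noise level are treated as the noise subspace (gain 0).
    lam_x = np.maximum(lam_y - noise_var, 0.0)
    gains = lam_x / (lam_x + mu * noise_var + 1e-12)
    h = u @ np.diag(gains) @ u.T                # estimator H = U G U^T
    return overlap_add(vecs @ h.T, len(x))      # row i is H @ y_i

rng = np.random.default_rng(0)
n = np.arange(512)
clean = np.sin(2 * np.pi * 0.05 * n)
noisy = clean + rng.normal(scale=np.sqrt(0.1), size=n.size)
enhanced = subspace_enhance(noisy, noise_var=0.1)
```

Because noise_var enters both the eigenvalue correction and the gains, a tracker that refreshes it in every frame, including frames that contain speech, directly affects how much of each subspace component is retained.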
Objective and subjective tests were carried out to evaluate the proposed algorithm. Its results were
compared with those of contemporary speech enhancement systems, and it was shown to outperform these
systems in the majority of test conditions. The proposed algorithm was shown to produce good-quality
speech in most noise types, even at low signal-to-noise ratios. The proposed system has potential
applications in cellular telephony, audio archive restoration and automatic speech recognition. All of
these applications rely heavily on accurate and robust noise estimation to provide high-quality enhanced
speech, so the proposed method is well suited to these situations.
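Figure (h) refers to PESQ (ITU-T P.862 [28]), a standardised perceptual model that is not reproduced here. As a simpler stand-in for the kind of objective comparison described above, the sketch below corrupts a clean signal at a chosen SNR and scores it with segmental SNR. Segmental SNR is a swapped-in measure, not necessarily one used in the thesis, and the frame length and the -10 dB / 35 dB clipping limits are common but assumed choices; the function names are hypothetical.

```python
import numpy as np

def add_noise_at_snr(clean, noise, snr_db):
    # Scale `noise` so that 10*log10(P_clean / P_noise) equals snr_db, then add it.
    clean = np.asarray(clean, dtype=float)
    noise = np.asarray(noise, dtype=float)[: len(clean)]
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_clean / (p_noise * 10.0 ** (snr_db / 10.0)))
    return clean + scale * noise

def segmental_snr(clean, processed, frame_len=256, lo_db=-10.0, hi_db=35.0):
    # Mean of per-frame SNRs, each clipped to [lo_db, hi_db] (assumed limits).
    clean = np.asarray(clean, dtype=float)
    processed = np.asarray(processed, dtype=float)
    eps = 1e-12
    scores = []
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        s = clean[start:start + frame_len]
        e = s - processed[start:start + frame_len]
        snr = 10.0 * np.log10((np.sum(s ** 2) + eps) / (np.sum(e ** 2) + eps))
        scores.append(np.clip(snr, lo_db, hi_db))
    return float(np.mean(scores))

rng = np.random.default_rng(1)
clean = np.sin(2 * np.pi * 0.05 * np.arange(4096))
noisy = add_noise_at_snr(clean, rng.normal(size=4096), snr_db=5.0)
score = segmental_snr(clean, noisy)
```

Averaging per-frame scores weights quiet and loud frames equally, which is why segmental SNR is often preferred to a single global SNR for speech; the clipping keeps silent or perfectly reconstructed frames from dominating the mean.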
Future Work:
Recent developments in subspace-based speech enhancement, such as Klein and Kabal’s perceptual post-filter
[22] and the work of Jabloun and Champagne [34], have exploited auditory masking properties. The
algorithm in this thesis does not make use of these properties, but they could be incorporated relatively
easily, which could further improve system performance.
The subspace method is also computationally complex, so future work should also focus on reducing this
complexity. The discrete cosine transform (DCT) has been proposed as an alternative to the
computationally expensive Karhunen-Loève transform (KLT), and singular value decomposition is another
option for reducing complexity. This is significant because some applications require real-time
implementation of speech enhancement algorithms, and more efficient algorithms reduce power consumption
and processor usage.
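The DCT alternative mentioned above can be illustrated by replacing the data-dependent KLT eigenvectors with a fixed orthonormal DCT-II basis, so that no eigendecomposition is needed per frame; per-coefficient powers then take the place of the eigenvalues. This is a minimal sketch assuming white noise of known variance noise_var; the function names, dim, mu and the gain rule are illustrative assumptions rather than a specific published DCT-based subspace method.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def dct_matrix(dim):
    # Orthonormal DCT-II basis: row k is the k-th cosine basis vector.
    k = np.arange(dim)[:, None]
    i = np.arange(dim)[None, :]
    c = np.sqrt(2.0 / dim) * np.cos(np.pi * (2 * i + 1) * k / (2 * dim))
    c[0, :] /= np.sqrt(2.0)
    return c

def dct_subspace_enhance(frame, noise_var, dim=16, mu=2.0):
    x = np.asarray(frame, dtype=float)
    c = dct_matrix(dim)
    vecs = sliding_window_view(x, dim)           # rows are overlapping vectors y_i
    coeffs = vecs @ c.T                          # row i holds the DCT of y_i
    # Per-coefficient power replaces the KLT eigenvalues: no eigendecomposition.
    lam_y = np.mean(coeffs ** 2, axis=0)
    lam_x = np.maximum(lam_y - noise_var, 0.0)
    gains = lam_x / (lam_x + mu * noise_var + 1e-12)
    estimates = (coeffs * gains) @ c             # inverse DCT of each row
    # Overlap-add: average the overlapping estimates of each sample.
    out = np.zeros(len(x))
    count = np.zeros(len(x))
    for i, vec in enumerate(estimates):
        out[i:i + dim] += vec
        count[i:i + dim] += 1.0
    return out / count

rng = np.random.default_rng(0)
n = np.arange(512)
clean = np.sin(2 * np.pi * 0.05 * n)
noisy = clean + rng.normal(scale=np.sqrt(0.1), size=n.size)
enhanced = dct_subspace_enhance(noisy, noise_var=0.1)
```

The saving comes from skipping the O(dim^3) eigendecomposition; the price is that a fixed basis does not fully decorrelate the speech, so some signal energy leaks into coefficients that the gains then attenuate.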
References:
Ambikairajah, E., Epps, J. and Lin, L. (2001). Wideband speech and audio coding using gammatone filter
banks. Proc. ICASSP, pp. 773-776.
Brandenburg, K.B. and Stoll, G. (1994). ISO-MPEG-1 audio: A generic standard for coding of high-quality
digital audio. Journal of the Audio Engineering Society, 42 (10) 780-792.
Doblinger, G. (1995). Computationally efficient speech enhancement by spectral minima tracking in
subbands. Proc. EUROSPEECH'95, Madrid, pp. 1513-1516.
Gustafsson, S., Jax, P. and Vary, P. (1998). A novel psychoacoustically motivated audio enhancement
algorithm preserving background noise characteristics. Proc. ICASSP, pp. 397-400.
Lim, J.S. and Oppenheim, A.V. (1979). Enhancement and bandwidth compression of noisy speech. Proc.
of IEEE, 67 (12) 1586-1604.
Lin, L., Ambikairajah, E. and Holmes, W.H. (2001). Auditory filterbank design using masking curves. Proc.
EUROSPEECH, Aalborg, pp. 411-414.
Lin, L., Holmes, W.H. and Ambikairajah, E. (2002). Speech enhancement based on a perceptual
modification of Wiener filtering. Proc. ICSLP, Denver, pp. 781-784.
Martin, R. (2001). Noise power spectral density estimation based on optimal smoothing and minimum
statistics. IEEE Transactions on Speech and Audio Processing, 9 (5) 504-512.
Virag, N. (1999). Single channel speech enhancement based on masking properties of the human
auditory system. IEEE Transactions on Speech and Audio Processing, 7 (2) 126-137.
Additional references:
[1] S.F. Boll, “Suppression of acoustic noise in speech using spectral subtraction,” IEEE Transactions on
Acoustics, Speech, Signal Processing, vol. ASSP-27, Apr. 1979
[2] History of Automatic Speech Synthesis and Recognition
http://www.ieee.org/organizations/history_center/sloan/ASSR/assr_index.html
[3] Audio Demonstration: Speech Enhancement for Electronic Hearing Aids
http://www.ind.rwth-aachen.de/research/cochlear/audiodemo.html
[4] S. J. Godsill, P. J. Wolfe, and W. N. W. Fong, “Statistical model-based approaches to audio restoration
and analysis”. Journal of New Music Research, 30(4):323-338, 2001. Special Issue: Conservation,
Restoration and Archiving of Electroacoustic Music.
[5] Y. Ephraim and H.L. Van Trees, “A signal subspace approach for speech enhancement,” IEEE
Transactions on Speech and Audio Processing, vol. 3, July 1995
[6] J.S. Lim and A.V. Oppenheim, “Enhancement and bandwidth compression of Noisy Speech,” Proc.
IEEE, vol. 67, No. 12, pp. 1586-1604, Dec. 1979
[7] M. Berouti, R. Schwartz and J. Makhoul, “Enhancement of speech corrupted by acoustic noise,” Proc.
IEEE Int. Conf. on Acoustics, Speech and Signal Processing, pp. 208-211, Apr. 1979
[8] Y. Ephraim and D. Malah, “Speech enhancement using a minimum mean square error short-term
spectral amplitude estimator”, IEEE Trans. on Acoustics, Speech and Signal Processing, vol. ASSP-32, No.
6, pp. 1109-1121, Dec. 1984.
[9] P. Lockwood and J. Boudy, “Experiments with a nonlinear spectral subtractor (NSS), hidden Markov
models and projection, for robust recognition in cars,” Speech Commun., vol. 11, pp. 215-228, June
1992.
[10] N. Virag, “Single Channel Speech Enhancement Based on Masking Properties of the Human Auditory
System,” IEEE Trans. on Speech and Audio Processing, vol. 7, No. 2, March 1999.
[11] K. Brandenburg, G. Stoll, et al., "The ISO/MPEG-Audio Codec: A Generic Standard for Coding of High
Quality Digital Audio," 92nd AES Convention, preprint 3336, Vienna 1992
[12] S.F. Boll and D.C. Pulsipher, “Suppression of acoustic noise in speech using two microphone adaptive
noise cancellation,” IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-28, pp. 752-753, Dec. 1980
[13] M. Dorbecker, S. Ernst, “Combination of Two-Channel Spectral Subtraction and Adaptive Wiener
Post-Filtering for Noise Reduction and Dereverberation,”
[14] L.R. Rabiner and M.R. Sambur, “An algorithm for determining the Endpoint of Isolated Utterances,”
The Bell System Technical Journal, Vol. 54, No. 2, pp. 297-315, February 1975
[15] R. Martin, “Spectral Subtraction based on Minimum Statistics,” Proc. EUSIPCO, pp. 1182-1185,
1994.
[16] G. Doblinger, “Computationally Efficient Speech Enhancement By Spectral Minima Tracking in
Subbands,” Proc. EuroSpeech, vol. 2, pp. 1513-1516, 1995.
[17] R. Martin, “Noise Power spectral density estimation based on optimal smoothing and minimum
statistics,” IEEE Trans. on Speech and Audio Processing, vol. 9, no. 5, pp. 504-512, July 2001
[18] S. Rangachari, P.C. Loizou and Y. Hu, “A Noise Estimation Algorithm with Rapid Adaptation for
Highly Non-Stationary Environments,” Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing,
pp. I-305-I-308, May 2004
[19] L. Lin, W.H. Holmes and E. Ambikairajah, “Subband noise estimation for speech enhancement using
a perceptual Wiener filter,” Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, pp. I-80-
I-83, 2003
[20] I. Cohen and B. Berdugo, “Noise Estimation by Minima Controlled Recursive Averaging for Robust
Speech Enhancement,” IEEE Signal Processing Letters, vol. 9, no. 1, pp. 12-15, Jan 2002
[21] Y. Bresler and A. Macovski, “Exact Maximum Likelihood Parameter Estimation of Superimposed
Exponential Signals in Noise” IEEE Trans On Acoustics, Speech and Signal Processing, vol. ASSP-34, no. 5,
pp. 1081-1089, Oct 1986.
[22] M. Klein and P. Kabal, “Signal Subspace Speech enhancement with perceptual post filtering,” Proc.
IEEE Int. Conf. on Acoustics, Speech and Signal Processing, pp. I-537-I-540, May 2002
[23] N. Merhav, “The Estimation of Model Order in Exponential Families,” IEEE Trans. Inform.
Theory. vol. 35, pp. 1109-1114, Sept. 1989
[24] S. Gazor and A. Rezayee, “An adaptive KLT approach for Speech Enhancement,” IEEE Trans. on
Speech and Audio Processing, vol. 9, pp. 87-95, Feb. 2001
[25] E. Wan, A. Nelson, and Rick Peterson, Speech Enhancement Assessment Resource (SpEAR)
Database http://ee.ogi.edu/NSEL/
[26] Noisex-92 database, taken from Signal Processing information base website:
http://spib.rice.edu/spib/select_noise.html
[27] “Subjective Performance Assessment of Telephone-Band and Wideband Digital
Codecs,” recommendation ITU-T P.830, International Telecommunication Union, Feb 1996
[28] “Perceptual Evaluation of Speech Quality (PESQ),” recommendation ITU-T P.862, International
Telecommunication Union, Feb. 2001
[29] M. Klein, “Signal Subspace Speech Enhancement with Perceptual Post-Filtering,” Master’s
Thesis, McGill University, Montreal, Canada, 2002
[30] N. Ma, M. Bouchard, R.A. Goubran, “Perceptual Kalman Filtering for Speech Enhancement in
Colored Noise,” Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, May 2004
[31] I. Cohen, “Speech Enhancement using a non-causal a priori SNR Estimator”, IEEE Signal
Processing Letters, vol. 11, no. 9, September 2004
[32] T.S. Gunawan, E. Ambikairajah, “Speech Enhancement using Temporal Masking and Fractional Bark
Gammatone Filters,” Proc. 10th Australian International Conference on Speech Science and
Technology, Dec 2004
[33] Opticom website PESQ description: http://www.opticom.de/technology/pesq.html
[34] F. Jabloun and B. Champagne, “Incorporating the Human Hearing Properties in the Signal Subspace
Approach for Speech Enhancement,” IEEE Transactions on Speech and Audio Processing, vol. 11, No.6,
Nov 2003