Shoichi Koyama, Naoki Murata, and Hiroshi Saruwatari. "Super-resolution in sound field recording and reproduction based on sparse representation"
presented at 5th Joint Meeting Acoustical Society of America and Acoustical Society of Japan (28 Nov. - 2 Dec. 2016, Honolulu, USA)
Selaginella: features, morphology ,anatomy and reproduction.
Koyama ASA ASJ joint meeting 2016
1. Super-resolution in sound field recording and
reproduction based on sparse representation
Shoichi Koyama1,2, Naoki Murata1, and Hiroshi Saruwatari1
1The University of Tokyo
2Paris Diderot University / Institute Langevin
2. November 29, 2016
Sound field reproduction for audio system
Microphone array Loudspeaker array
Large listening area can be achieved
Listeners can perceive source distance
Real-time recording and reproduction can be achieved
without recording engineers
Recording area Target area
3. November 29, 2016
Sound field reproduction for audio system
Microphone array Loudspeaker array
Telecommunication system
NW
Home Theatre
Live broadcasting
Applications
Recording area Target area
4. November 29, 2016
Sound field reproduction for audio system
Microphone array Loudspeaker array
Improve reproduction accuracy when # of array elements is small
# of microphones > # of loudspeakers
– Higher reproduction accuracy within local region of target area
# of microphones < # of loudspeakers
– Higher reproduction accuracy of sources in local region of recording area
[Koyama+ IEEE JSTSP2015], [Koyama+ ICASSP 2014, 2015]
[Ahrens+ AES Conv. 2010], [Ueno+ ICASSP 2017 (submitted)]
Recording area Target area
5. Sound Field Recording and Reproduction
November 29, 2016
Recording area Target area
Obtain driving signals of secondary sources (= loudspeakers)
arranged on to reconstruct desired sound field inside
Inherently, sound pressure and its gradient on is required to obtain
, but sound pressure is usually only known
Signal conversion for sound field recording and reproduction with ordinary
acoustic sensors and transducers is necessary
Primary
sources
6. November 29, 2016
Conventional: WFR filtering method
Recording area Target area
Secondary source planeReceiving plane
Primary
sources
Signal
conversion
[Koyama+ IEEE TASLP 2013]
Received
signals
Driving signals
Plane wave Plane wave
Each plane wave determines entire sound field
Signal conversion can be achieved in spatial frequency domain
7. November 29, 2016
Conventional: WFR filtering method
Target area
Received
signals
Driving signals
Plane wave Plane wave
Each plane wave determines entire sound field
Spatial aliasing artifacts due to plane wave decomposition
Significant error at high freq even when microphone < loudspeaker
Recording area
Signal
conversion
Secondary source planeReceiving plane
Primary
sources
[Koyama+ IEEE TASLP 2013]
8. Sound field representation for super-resolution
Plane wave decomposition suffers from spatial aliasing artifacts because
many basis functions are used
Observed signals should be represented by a few basis functions for accurate
interpolation of sound field
Appropriate basis function may be close to pressure distribution originating
from sound sources
To obtain driving signals of loudspeakers, basis functions must be
fundamental solutions of Helmholtz equation (e.g. Green functions)
November 29, 2016
Basis functionReceived
signals
Sound field decomposition into fundamental solutions
of Helmholtz equation is necessary
Sound field decomposition
9. Generative model of sound field
Inhomogeneous and homogeneous Helmholtz eq. Distribution of
source components
November 29, 2016
[Koyama+ ICASSP 2014]
Sound field is divided into two regions
10. Generative model of sound field
Inhomogeneous and homogeneous Helmholtz eq.
November 29, 2016
[Koyama+ ICASSP 2014]
Green’s function
Inhomogeneous + homogeneous terms
Plane wave
11. November 29, 2016
Generative model of sound field
Observe sound pressure distribution on plane
Conversion into driving signals
Synthesize monopole sources [Spors+ AES Conv. 2008]
Ambient componentsDirect source components
Applying WFR filtering method [Koyama+ IEEE TASLP 2013]
Decomposition into two components can lead to
higher reproduction accuracy above spatial Nyquist freq
12. November 29, 2016
Sparse sound field representation
・・・・・・・・
Microphone array
Source components
Grid points Sparsity-based signal decomposition
Discretization
Ambient components
Dictionary matrix of Green’s functions
Observed signal Distribution of source components
A few elements of has non-zero values
under the assumption of spatially sparse source distribution
13. Sparse signal decomposition
Sparse signal representation in vector form
Signal decomposition based on sparsity of
November 29, 2016
Minimize -norm of
14. Group sparsity based on physical properties
November 29, 2016
Group sparse signal models for robust decomposition
• Multiple time frames
• Temporal frequencies
• Multipole components
Decomposition algorithm extending FOCUSS
[Koyama+ ICASSP 2015]
Sparse signal representation in vector form Structure of sparsity induced
by physical properties
15. Block diagram of signal conversion
Decomposition stage
– Group sparse decomposition of
Reconstruction stage
– and are respectively converted into driving signals
– is obtained as sum of two components
November 29, 2016
16. Simulation Experiment
Proposed method (Proposed), WFR filtering method (WFR), and Sound
Pressure Control method (SPC) were compared
32 microphones (6 cm intervals)and 48 loudspeakers (4 cm intervals)
: Rectangular region of 2.4x2.4 m, Grid points: (10cm, 20cm) intervals
Source directivity: unidirectional
Source signal: single frequency sinewave
Recording area Target area
November 29, 2016
17. Simulation Experiment
Signal-to-distortion ratio of reproduction (SDRR)
Recording area Target area
November 29, 2016
Original pressure distribution
Reproduced pressure distribution
18. November 29, 2016
Frequency vs. SDR
SDRRs above spatial Nyquist frequency were improved
Source location: (-0.32, -0.84, 0.0) m
19. Reproduced sound pressure distribution (1.0 kHz)PressureError
November 29, 2016
Proposed WFR SPC
18.1 dB 18.0 dB 19.4 dB
Source location: (-0.32, -0.84, 0.0) m
SDRR:
20. Reproduced sound pressure distribution (4.0 kHz)PressureError
November 29, 2016
19.7 dB 6.8 dB 7.8 dB
Proposed WFR SPC
Source location: (-0.32, -0.84, 0.0) m
SDRR:
21. Frequency response of reproduced sound field
November 29, 2016
Frequency response at (0.0, 1.0, 0.0) m
Reproduced frequency response was improved
22. Conclusion
Super-resolution sound field recording and
reproduction based on sparse representation
– Conventional plane wave decomposition is suffered from
spatial aliasing artifacts
– Sound field representation using source and plane wave
components
– Sound field decomposition based on spatial sparsity of
source components
– Group sparsity based on physical properties of sound field
– Experimental results indicated that reproduction accuracy
above spatial Nyquist frequency can be improved
November 29, 2016
Thank you for your attention!
23. Related publications
• S. Koyama and H. Saruwatari, “Sound field decomposition in reverberant environment
using sparse and low-rank signal models,” Proc. IEEE ICASSP, 2016.
• N. Murata, S. Koyama, et al. “Sparse sound field decomposition with multichannel
extension of complex NMF,” Proc. IEEE ICASSP, 2016.
• S. Koyama, et al. “Sparse sound field decomposition using group sparse Bayesian
learning,” Proc. APSIPA ASC, 2015.
• N. Murata, S. Koyama, et al. “Sparse sound field decomposition with parametric
dictionary learning for super-resolution recording and reproduction,” Proc. IEEE
CAMSAP, 2015.
• S. Koyama, et al. “Source-location-informed sound field recording and reproduction
with spherical arrays,” Proc. IEEE WASPAA, 2015.
• S. Koyama, et al. “Source-location-informed sound field recording and reproduction,”
IEEE J. Sel. Topics Signal Process., vol. 9, no. 5, pp. 881-894, 2015.
• S. Koyama, et al. “Structured sparse signal models and decomposition algorithm for
super-resolution in sound field recording and reproduction,” Proc. IEEE ICASSP, 2015.
• S. Koyama, et al. “Sparse sound field representation in recording and reproduction for
reducing spatial aliasing artifacts,” Proc. IEEE ICASSP, 2014.
November 29, 2016
Notas del editor
These are the reproduced sound pressure and time-averaged squared error distributions of the three methods at 1 kHz. The white region indicates the region of high reproduction accuracy. Because 1.0 kHz is below the spatial Nyquist frequency, the sound field was accurately reproduced using these three methods.