
Defense
Sound Space Rendering Based on the
Virtual Sphere Model
Graduate School of Information Sciences
System Information Sciences
Acoustic Information System laboratory
Junjie Shi
B7IM2028

Motivation
• Human beings have a remarkable ability to observe their surroundings through hearing.
o Hearing enables us to localize sound sources in any direction.
o Listeners can roughly perceive the acoustic environment.
• An immersive experience is expected in games, movies, and virtual reality.
o Audio content (media content and spatial cues) must match the visual content.
o Spatial cues of sound are one of the keys to the feeling of immersion.
Chapter 1: Introduction
[Figure: immersion in games, movies, and virtual reality through vision and hearing]

Previous studies
• Head-Related Impulse Response (HRIR);
Head-Related Transfer Function (HRTF)
Describes how an ear receives a sound from a point in
space.
o Localization cues
▪ Interaural time difference (ITD)
▪ Interaural level difference (ILD)
▪ Spectrum
o Azimuth/Elevation/Distance
• Room Impulse Response (RIR); Room
Transfer Function (RTF)
Characterizes how sound travels in a room.
o Direct sound
o Early reflections
o Late reverberation

Previous studies
• Computational room acoustics
o Geometrical room acoustics
Treats sound as rays and approximates their reflection paths.
▪ Image-source method
▪ Ray tracing
o Physically based room acoustics
Treats sound as waves and simulates the wave propagation.
▪ Finite-difference time-domain method (FDTD)
▪ Adaptive rectangular decomposition (ARD)
o Only frequencies up to about 5 kHz (the low-frequency components) are perceptually critical for acoustic simulation.

Previous studies
• Issues of spatial audio in VR
o Recording is not possible in a virtual scene.
→ A spatial sound rendering technique designed for virtual sound spaces is needed.
o The listener's movement breaks the immersion if the sound is static.
→ A dynamic spatial sound rendering technique is needed.
o Auditory display based on the virtual sphere model (ADVISE)
▪ Works for both virtual and real sound scenes.
▪ Interactive spatial audio: movement, rotation.
▪ Takane et al., 1997: Kirchhoff-Helmholtz integral equation (KHIE)-based ADVISE
▪ Tamura et al., 2016: higher-order ambisonics (HOA)-based ADVISE


Objective
• Develop and Implement ADVISE
o Complete room acoustics program.
o Integrate room acoustics into ADVISE.
• Structure of this thesis
[Figure: structure of the thesis, relating Chapters 2, 3, 4, and 5]

Chapter 2: Review of auditory
display based on the virtual
sphere model (ADVISE)

HOA-based ADVISE
Higher-order ambisonics (HOA)
• Uses a spherical array of speakers to synthesize the desired field.
$$p(\mathbf{r}, k) = \sum_{l=1}^{L} D_l(k)\, G(\mathbf{r}|\mathbf{r}_l, k)$$
o $G(\mathbf{r}|\mathbf{r}_l, k)$: transfer function of sound in the free field.
• Goal: find the driving signals played by the speakers.
o $p(\mathbf{r}, k) \rightarrow D_l(k)$?
Spherical harmonics
• Fourier transform: $f(t) \rightarrow F(\omega)$
o Represents a time-varying function by its frequencies.
• Spherical harmonic transform: $p(\mathbf{r}) \rightarrow A_n^m$
o Represents a space-varying function by its directions.

HOA-based ADVISE
• Spherical harmonic representation
$$p(r, \theta, \phi, k) = \sum_{n=0}^{\infty} \sum_{m=-n}^{n} j_n(kr)\, A_n^m(k)\, Y_n^m(\theta, \phi)$$
o $p(r, \theta, \phi, k)$: sound pressure in spherical coordinates.
o $n$: order of the spherical harmonic.
o $A_n^m(k)$: spherical harmonic coefficients.
o $j_n(kr)$: spherical Bessel function of the first kind.
o $Y_n^m(\theta, \phi)$: spherical harmonic.
• Mode-matching method
$$p(\mathbf{r}, k) = \sum_{l=1}^{L} D_l(k)\, G(\mathbf{r}|\mathbf{r}_l, k)$$
$$\mathbf{D} = \boldsymbol{\Psi}^{\dagger} \mathbf{A}$$
o $\boldsymbol{\Psi}^{\dagger}$: inverse transfer matrix.
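The mode-matching relation D = Ψ†A can be illustrated with a small numerical sketch. It is not the thesis implementation: it truncates the expansion at order 1, places six hypothetical speakers at the vertices of an octahedron, assumes c = 340 m/s, reads the source position (1.5 m, 60°, 0°) as (radius, polar angle, azimuth), and uses the Green function exp(ik|r0 − r|)/|r0 − r| without a 4π factor. The helper names are made up for this example.

```python
import numpy as np

def sph_harm_order1(theta, phi):
    """Complex spherical harmonics Y_n^m for n <= 1, ordered (0,0), (1,-1), (1,0), (1,1).
    theta is the polar angle and phi the azimuth, both in radians."""
    theta = np.atleast_1d(np.asarray(theta, dtype=float))
    phi = np.atleast_1d(np.asarray(phi, dtype=float))
    y00 = np.full(theta.shape, 0.5 / np.sqrt(np.pi), dtype=complex)
    y1m1 = np.sqrt(3 / (8 * np.pi)) * np.sin(theta) * np.exp(-1j * phi)
    y10 = np.sqrt(3 / (4 * np.pi)) * np.cos(theta) + 0j
    y1p1 = -np.sqrt(3 / (8 * np.pi)) * np.sin(theta) * np.exp(1j * phi)
    return np.stack([y00, y1m1, y10, y1p1])

def sph_hankel1(n, x):
    """Spherical Hankel function of the first kind, closed forms for n = 0 and n = 1."""
    if n == 0:
        return -1j * np.exp(1j * x) / x
    return -np.exp(1j * x) * (x + 1j) / x**2

def point_source_coeffs(k, r, theta, phi):
    """Interior coefficients A_n^m (n <= 1) of exp(ik|x - s|)/|x - s| for sources s
    at radius r. Returns an array of shape (4, number_of_sources)."""
    radial = np.array([sph_hankel1(0, k * r)] + [sph_hankel1(1, k * r)] * 3)
    return 4 * np.pi * 1j * k * radial[:, None] * np.conj(sph_harm_order1(theta, phi))

# Hypothetical array: six speakers on a 1 m sphere (octahedron vertices).
spk_theta = np.array([0.0, np.pi, np.pi / 2, np.pi / 2, np.pi / 2, np.pi / 2])
spk_phi = np.array([0.0, 0.0, 0.0, np.pi / 2, np.pi, 3 * np.pi / 2])
k = 2 * np.pi * 200.0 / 340.0   # wave number at 200 Hz, assuming c = 340 m/s
R = 1.0

Psi = point_source_coeffs(k, R, spk_theta, spk_phi)            # transfer matrix, (4, 6)
A = point_source_coeffs(k, 1.5, np.radians(60.0), 0.0)[:, 0]   # target monopole field
D = np.linalg.pinv(Psi) @ A                                    # driving signals

# Pressure at the sphere center produced by the array and by the target source.
p_center = np.sum(D * np.exp(1j * k * R) / R)
p_target = np.exp(1j * k * 1.5) / 1.5
```

With more speakers than coefficients the pseudoinverse returns the minimum-norm driving signals; a real array would use a higher truncation order and many more speakers, as in the 252- and 256-speaker experiments of the thesis.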

Binaural rendering
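HRTF (head-related transfer function)
For the right ear: $H_R(\mathbf{r}_i, \omega) = \dfrac{P_R(\mathbf{r}_i, \omega)}{P_O(\mathbf{r}_i, \omega)}$
$\mathbf{r}_i$: position of the source
$P_O$: sound pressure at the sphere center
$P_R$: sound pressure at the right ear
Binaural signal: sum the responses at the ears over all speakers.
$$p_R = \sum_{\ell=1}^{L} D_\ell(\omega)\, G(\mathbf{r}_O|\mathbf{r}_\ell, \omega)\, H_R(\mathbf{r}_\ell, \omega)$$
$$p_L = \sum_{\ell=1}^{L} D_\ell(\omega)\, G(\mathbf{r}_O|\mathbf{r}_\ell, \omega)\, H_L(\mathbf{r}_\ell, \omega)$$
Head rotation: shift the HRTF.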
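The binaural sums above amount to a weighted sum over speakers for each frequency bin. A minimal NumPy sketch follows, assuming all quantities are already sampled on a common frequency grid. The function names are hypothetical, and `rotate_head` only illustrates the "shift the HRTF" idea for speakers on a uniform horizontal ring, which is an assumption rather than the layout used in the thesis.

```python
import numpy as np

def binaural_signals(D, G, H_left, H_right):
    """Sum the speaker contributions at both ears for every frequency bin.

    D:       (L, F) driving signals of the L virtual speakers
    G:       (L, F) free-field transfer functions from each speaker to the sphere center
    H_left:  (L, F) left-ear HRTFs for the speaker directions
    H_right: (L, F) right-ear HRTFs for the speaker directions
    Returns (p_left, p_right), each of shape (F,).
    """
    at_center = np.asarray(D) * np.asarray(G)
    p_left = np.sum(at_center * np.asarray(H_left), axis=0)
    p_right = np.sum(at_center * np.asarray(H_right), axis=0)
    return p_left, p_right

def rotate_head(H, shift):
    """Re-index the HRTF set by `shift` speaker positions (uniform ring assumed)."""
    return np.roll(np.asarray(H), shift, axis=0)

# Toy data: two speakers, one frequency bin.
D = np.array([[1.0], [2.0]])
G = np.array([[1.0], [0.25]])
H_left = np.array([[2.0], [0.0]])
H_right = np.array([[0.0], [3.0]])

p_left, p_right = binaural_signals(D, G, H_left, H_right)
p_left_rotated, _ = binaural_signals(D, G, rotate_head(H_left, 1), H_right)
```

Because the driving signals stay fixed and only the HRTF set is re-indexed, a head rotation does not require recomputing the sound field.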

Chapter 3: Review of adaptive
rectangular decomposition

Finite difference time domain method (FDTD)
• The propagation of sound waves is governed by the wave equation.
$$\frac{\partial^2 p}{\partial t^2} - c^2 \nabla^2 p = f$$
o $\nabla^2$: Laplace operator, $\nabla^2 p = \dfrac{\partial^2 p}{\partial x^2} + \dfrac{\partial^2 p}{\partial y^2} + \dfrac{\partial^2 p}{\partial z^2}$
o $f$: forcing term.
• The sound field is discretized in both space and time; the pressure is updated over time by applying finite-difference approximations.
• Limitations of FDTD
1. The error introduced by the finite-difference approximation leads to numerical dispersion in the simulation.
2. A sampling rate much higher than the Nyquist rate (10 to 20 times the desired frequency) is required for faithful results.
3. Increasing the sampling rate $n$ times requires $n^3$ times the memory and $n^4$ times the computation time.

Adaptive rectangular decomposition (ARD)
1. Decompose an arbitrary space into rectangular partitions.
2. Update the sound field in each partition using the analytical solution, which allows error-free updating of the sound field.
3. Put all partitions together and let waves pass through the boundaries.
o The analytical solution assumes that the boundary between two partitions is perfectly reflective, whereas it should be transparent.
o This boundary-condition assumption is compensated for by applying imaginary sources close to the boundary.
*Detailed formulas of ARD are in the appendix (pp. 34-35).

ARD program

Chapter 4: Spherical harmonic
representation of generated sound
fields

Introduction
Room model ➔ $p(x, y, z, \omega)$: room acoustics (Chap. 3)
Room model ➔ $p(r, \theta, \phi, \omega)$ ➔ $A_n^m(\omega)$ ➔ $D_l(\omega)$: sound field mapping (Chap. 2)
Room model ➔ $p(r, \theta, \phi, \omega)$ ➔ $A_n^m(\omega)$ ➔ $D_l(\omega)$ ➔ $p_L(\omega)$ & $p_R(\omega)$: binaural rendering
$p(x, y, z, \omega)$ ➔ $m_\eta(\omega)$ ➔ $A_n^m(\omega)$
A formula that derives the spherical harmonic coefficients $A_n^m(\omega)$ from the generated sound field $p(x, y, z, \omega)$ is proposed.

Formulation
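• $p(x, y, z, \omega)$ ➔ $m_\eta(\omega)$: 3D discrete cosine transform (3D DCT), $\mathbf{M} = \mathrm{DCT}(\mathbf{P})$
$$p(x, y, z, \omega) = \sum_{\eta=(\eta_x, \eta_y, \eta_z)} m_\eta(\omega) \cos\left(\frac{\pi \eta_x}{l_x} x\right) \cos\left(\frac{\pi \eta_y}{l_y} y\right) \cos\left(\frac{\pi \eta_z}{l_z} z\right)$$
• $m_\eta(\omega)$ ➔ $A_n^m(\omega)$
$$A_n^m(\omega) = 4\pi i^n \sum_{\eta=(\eta_x, \eta_y, \eta_z)} m_\eta(\omega)\, \frac{1}{8} \sum_{\ell=1}^{8} e^{i \mathbf{k}_{\eta,\ell} \cdot \mathbf{d}}\, Y_n^{m*}(\hat{\mathbf{k}}_{\eta,\ell})$$
o $\mathbf{d}$: displacement vector pointing from the corner of the space to the listener's position.
• Strategy for listener movement
o Adjust $e^{i \mathbf{k}_{\eta,\ell} \cdot \mathbf{d}}$ based on the listener's position.
*Detailed derivation in the appendix (p. 36).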

Numerical experiments
• (3 m, 3 m, 3 m) rectangular space, discretized every 0.1 m.
• Monopole source: (1.5 m, 60°, 0°), 1000 Hz.
• 256 speakers located on the dashed-line sphere.
• Maximum error on the 10 cm sphere: −37 dB
• Maximum error on the 20 cm sphere: −26 dB
• The synthesis error is as small as −37 dB within a volume comparable to the size of a human head, which is imperceptible.

Numerical experiments
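Strategy for listener movement
$$A_n^m(\omega) = 4\pi i^n \sum_{\eta} m_\eta(\omega)\, \frac{1}{8} \sum_{\ell=1}^{8} e^{i \mathbf{k}_{\eta,\ell} \cdot \mathbf{d}}\, Y_n^{m*}(\hat{\mathbf{k}}_{\eta,\ell})$$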

Chapter 5: Implementation of
ADVISE

Demonstration
• Scene:
o A highly simplified music hall
o $V = 7600\ \mathrm{m}^3$, $S = 2720\ \mathrm{m}^2$
[Figure: hall geometry with labeled dimensions 30 m, 20 m, 30 m, and 10 m, showing the listener and source positions]

Demonstration
• Scene in Unity
o The listener is surrounded by a spherical speaker array.
o Virtual secondary speakers are represented by the green spheres.

Demonstration
• Simulation results
o Maximum reliable frequency of reverberation: 1700 Hz
Processed sound samples (music and speech) for each condition:
Absorption coefficients | Reverb time
original | -
≈0.2 | ≈2.0 s

Demonstration
• Computational aspects
Maximum reliable frequency | Sampling properties | Total samples in the scene | Time steps (per sec) | Theoretical FLOP requirement | Actual time cost of 2 sec simulation
3400 Hz | dh = 5 × 10^-2 m, dt = 6.25 × 10^-5 s | 60.8 M | 16000 | per step: 6 G FLOP; per sec: 96 T FLOP | 19.2 hr
1700 Hz | dh = 10 × 10^-2 m, dt = 1.25 × 10^-4 s | 7.6 M | 8000 | per step: 0.7 G FLOP; per sec: 5.6 T FLOP | 1.5 hr
Intel(R) Core(TM) i7-8086K: 62.12 G FLOPS (5.18 G FLOPS/core)
NVIDIA GeForce GTX 1080 Ti: 11.3 T FLOPS
*FLOPS: FLOP per second.
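The figures in the table can be cross-checked with a few lines of arithmetic. The sketch below takes the per-step FLOP counts from the table as given, and assumes c = 340 m/s and two spatial samples per wavelength for the maximum reliable frequency, which happens to reproduce the tabulated 3400 Hz and 1700 Hz. The function name `grid_cost` is made up for this example.

```python
def grid_cost(volume_m3, dh_m, dt_s, flop_per_step, c=340.0):
    """Derive grid size and FLOP rate from the sampling properties.

    c is an assumed speed of sound; the maximum frequency assumes two
    spatial samples per wavelength (f_max = c / (2 * dh)).
    """
    steps_per_sec = 1.0 / dt_s
    return {
        "max_frequency_hz": c / (2.0 * dh_m),
        "samples": volume_m3 / dh_m**3,
        "steps_per_sec": steps_per_sec,
        "flop_per_sim_sec": flop_per_step * steps_per_sec,
    }

HALL_VOLUME = 7600.0      # m^3, from the demonstration scene
GPU_PEAK = 11.3e12        # FLOPS, GTX 1080 Ti figure quoted in the table

fine = grid_cost(HALL_VOLUME, 5e-2, 6.25e-5, 6e9)
coarse = grid_cost(HALL_VOLUME, 10e-2, 1.25e-4, 0.7e9)

# Ratio above 1 means the peak GPU rate exceeds the FLOP needed per simulated second.
realtime_ratio_fine = GPU_PEAK / fine["flop_per_sim_sec"]
realtime_ratio_coarse = GPU_PEAK / coarse["flop_per_sim_sec"]
```

Under these assumptions only the 1700 Hz grid stays below the quoted GPU peak rate, and peak rates are rarely reached in practice, so the comparison is an upper bound on what real-time operation could achieve.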

Conclusions
1. Chapter 3: Built a 3D sound propagation simulator based on ARD.
2. Chapter 4: Derived a formula that integrates room acoustics into ADVISE.
3. Chapter 5: Gave an implementation of ADVISE.
Even for a large scene such as a music hall, real-time sound space rendering using ADVISE is theoretically possible.
Chapter 6: Conclusions

Response to comments in pre-defense
• What role does ARD play in ADVISE?
o ARD generates the sound field in the given scene. The generated sound field is then used to render the spatial audio. As a computational room acoustics method, ARD enables ADVISE to render any sound space, whether it exists or not.
• How long does ARD take to simulate the seminar room?
o A two-second simulation; seminar room size: 6 m × 6 m × 3 m.
o The theoretical computational cost is 60 G FLOPS for a reliable simulation up to 1700 Hz and 960 G FLOPS for 3400 Hz.
o Our ARD program takes around 2 minutes for frequencies up to 1700 Hz and 21 minutes for frequencies up to 3400 Hz.

KHIE-based ADVISE
• Kirchhoff-Helmholtz integral equation (KHIE)
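The sound field inside a volume can be represented by the pressure and the pressure gradient on its surface.
$$P(\mathbf{r}_0, k) = \int_{\Gamma} \left[ G(\mathbf{r}_0|\mathbf{r}, k) \frac{\partial P(\mathbf{r}, k)}{\partial n} - P(\mathbf{r}, k) \frac{\partial G(\mathbf{r}_0|\mathbf{r}, k)}{\partial n} \right] d\Gamma$$
o $k$: wave number, $k = \omega / c$, where $\omega$ denotes the angular frequency and $c$ is the speed of sound.
o $P(\mathbf{r}_0, k)$: sound pressure at $\mathbf{r}_0$.
o $G(\mathbf{r}_0|\mathbf{r}, k)$: free-field Green function from $\mathbf{r}$ to $\mathbf{r}_0$, $G(\mathbf{r}_0|\mathbf{r}, k) = \dfrac{e^{ik|\mathbf{r}_0 - \mathbf{r}|}}{|\mathbf{r}_0 - \mathbf{r}|}$.
• Discretization of KHIE
$$P(\mathbf{r}_0, k) \approx \sum_{i=1}^{N} \left[ G(\mathbf{r}_0|\mathbf{r}_i, k) \frac{P(\mathbf{r}_i^{+}, k) - P(\mathbf{r}_i^{-}, k)}{\delta_i} - P(\mathbf{r}_i, k) \frac{G(\mathbf{r}_0|\mathbf{r}_i^{+}, k) - G(\mathbf{r}_0|\mathbf{r}_i^{-}, k)}{\delta_i} \right] \Delta S_i$$
o Use $3N$ secondary sources to reproduce the interior sound field.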

KHIE-based ADVISE
• Kirchhoff-Helmholtz integral equation (KHIE)
The sound field inside a volume can be represented by the pressure and the pressure gradient on its surface.
• KHIE-based ADVISE can reproduce a 2D sound field with high accuracy but is unstable when reproducing a 3D sound field.
[Figure: reproduction error of the 2D field and of the 3D field; $N$: division number on the surface]

HOA-based ADVISE
• Sound field reproduction using HOA
o 252 secondary sources located on a 1 m sphere.
o 1000 Hz monopole source located at (1.5 m, 60°, 0°).
o The reproduction error is less than −20 dB when the distance is less than 0.5 m.
[Figure: ideal field, reproduced field, and reproduction error]

Finite difference time domain method (FDTD)
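• The propagation of sound waves is governed by the wave equation.
$$\frac{\partial^2 p}{\partial t^2} - c^2 \nabla^2 p = f$$
o $\nabla^2$: Laplace operator, $\nabla^2 p = \dfrac{\partial^2 p}{\partial x^2} + \dfrac{\partial^2 p}{\partial y^2} + \dfrac{\partial^2 p}{\partial z^2}$
o $f$: forcing term.
• The sound field is discretized in both space and time; the pressure is updated over time by applying finite-difference approximations.
$$\frac{\partial^2 p}{\partial t^2} \approx \frac{p(t+1) - 2p(t) + p(t-1)}{\Delta t^2}$$
$$\frac{\partial^2 p}{\partial x^2} \approx \frac{p(x+1) - 2p(x) + p(x-1)}{\Delta x^2}$$
▪ 1D FDTD update formula:
$$p(x, t+1) = \left[ f + c^2\, \frac{p(x+1, t) - 2p(x, t) + p(x-1, t)}{\Delta x^2} \right] \Delta t^2 + 2p(x, t) - p(x, t-1)$$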
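The 1D update formula translates almost directly into code. The following is a minimal sketch rather than the thesis program: the function name and parameters are illustrative, the ends are treated as rigid walls by mirroring the edge samples, and the field is started at rest by using the same array for p(x, 0) and p(x, −1).

```python
import numpy as np

def fdtd_1d(p0, c, dx, dt, steps, source=None):
    """Leapfrog update of the 1D wave equation with rigid (mirrored) ends.

    p0 is used for both p(x, 0) and p(x, -1), so the field starts at rest.
    source, if given, maps the step index n to the forcing term f (scalar or array).
    """
    if c * dt / dx > 1.0:
        raise ValueError("unstable: the Courant number c*dt/dx must not exceed 1")
    p_prev = np.array(p0, dtype=float)
    p = p_prev.copy()
    for n in range(steps):
        padded = np.pad(p, 1, mode="edge")                  # ghost cells equal the edge samples
        lap = (padded[2:] - 2.0 * p + padded[:-2]) / dx**2  # second difference in x
        f = 0.0 if source is None else source(n)
        p_next = (f + c**2 * lap) * dt**2 + 2.0 * p - p_prev
        p_prev, p = p, p_next
    return p

# Example: a Gaussian pressure pulse in a 3 m tube, assuming c = 340 m/s.
x = (np.arange(60) + 0.5) * 0.05
pulse = np.exp(-((x - 1.5) / 0.2) ** 2)
field = fdtd_1d(pulse, c=340.0, dx=0.05, dt=1e-4, steps=200)
```

With mirrored ends and no source, the sum of the pressure samples stays constant from step to step, which gives a quick sanity check on the boundary handling.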

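• Normal modes in a rectangular space with rigid boundaries
$$p(x, y, z, t) = \sum_{\eta=(\eta_x, \eta_y, \eta_z)} m_\eta(t) \cos\left(\frac{\pi \eta_x}{l_x} x\right) \cos\left(\frac{\pi \eta_y}{l_y} y\right) \cos\left(\frac{\pi \eta_z}{l_z} z\right)$$
o $p(x, y, z, t)$: sound pressure sampled in Cartesian coordinates.
o $m_\eta$: mode coefficients of the rectangular room.
o $\eta_x, \eta_y, \eta_z$: indices of the discretized space, $\eta_x = 1, 2, \ldots, l_x / \Delta x$.
o The formulation can be interpreted as a discrete cosine transform:
$$\mathbf{P} = \mathrm{iDCT}(\mathbf{M}) \iff \mathbf{M} = \mathrm{DCT}(\mathbf{P})$$
• Update formula of the mode coefficients $m_\eta(t)$
$$\frac{\partial^2 M_\eta}{\partial t^2} + c^2 k_\eta^2 M_\eta = \mathrm{DCT}(f)$$
o $k_\eta^2 = \pi^2 \left( \dfrac{\eta_x^2}{l_x^2} + \dfrac{\eta_y^2}{l_y^2} + \dfrac{\eta_z^2}{l_z^2} \right)$
[Figure (p. 34): rectangular room with side lengths $l_x$, $l_y$, $l_z$ along the $x$, $y$, $z$ axes]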
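In the cosine-mode domain, each mode coefficient behaves as a harmonic oscillator with angular frequency ω_η = c k_η, so it can be advanced in time with a closed-form recurrence instead of a finite-difference stencil. The 1D sketch below uses the recurrence commonly given for ARD, M(t+Δt) = 2M(t)cos(ωΔt) − M(t−Δt) + (2F/ω²)(1 − cos(ωΔt)). Treat it as an assumed form, since the slides do not spell it out. The DCT is built as an explicit matrix to keep the example NumPy-only, mode indices start at 0 here, and all names are illustrative.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix C, so that M = C @ P and P = C.T @ M."""
    j = np.arange(n)
    C = np.sqrt(2.0 / n) * np.cos(np.pi * np.outer(j, j + 0.5) / n)
    C[0] /= np.sqrt(2.0)
    return C

def ard_step(M, M_prev, F, omega, dt):
    """Closed-form update of the mode coefficients, with forcing F held constant over the step."""
    cos_wt = np.cos(omega * dt)
    # (1 - cos(w*dt)) / w^2 tends to dt^2 / 2 as w -> 0 (the constant mode).
    safe = np.where(omega > 0, omega, 1.0)
    gain = np.where(omega > 0, (1.0 - cos_wt) / safe**2, 0.5 * dt**2)
    return 2.0 * M * cos_wt - M_prev + 2.0 * F * gain

# 1D rigid-walled domain of length n*dx, assuming c = 340 m/s.
n, dx, c, dt = 32, 0.1, 340.0, 1e-3
C = dct_matrix(n)
omega = c * np.pi * np.arange(n) / (n * dx)        # omega_eta = c * k_eta
j = np.arange(n)
P0 = np.cos(np.pi * 3 * (j + 0.5) / n)             # initial pressure: pure mode 3

M = C @ P0
M_prev = M * np.cos(omega * dt)                    # consistent with a field at rest at t = 0
steps = 50
for _ in range(steps):
    M, M_prev = ard_step(M, M_prev, 0.0, omega, dt), M
P = C.T @ M
```

For a pure mode and no forcing, the result matches cos(ω_η t) times the initial shape to rounding error even with this large time step, which is the sense in which the update inside a partition is free of numerical dispersion.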

• Interface handling
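o Rigid boundary condition: $p(x) = p(x+1)$
o Finite difference close to a rigid boundary
$$S_x^0 = \frac{p(x) - 2p(x) + p(x-1)}{\Delta x^2}$$
o Finite difference of propagation
$$S_x = \frac{p(x+1) - 2p(x) + p(x-1)}{\Delta x^2}$$
o Residual term
$$S_x' = S_x - S_x^0 = \frac{p(x+1) - p(x)}{\Delta x^2}$$
[Figure (p. 35): adjacent grid samples $p(x-1)$, $p(x)$, $p(x+1)$ across the interface]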

Derivation of spherical harmonic coefficients
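• 3D discrete cosine transform of the sound field in a rectangular space, $\mathbf{M} = \mathrm{DCT}(\mathbf{P})$
$$p(x, y, z, \omega) = \sum_{\eta=(\eta_x, \eta_y, \eta_z)} m_\eta(\omega) \cos\left(\frac{\pi \eta_x}{l_x} x\right) \cos\left(\frac{\pi \eta_y}{l_y} y\right) \cos\left(\frac{\pi \eta_z}{l_z} z\right)$$
• Plane-wave representation of sound fields
$$p(\mathbf{x}, \omega) = \sum_{\eta=(\eta_x, \eta_y, \eta_z)} \left( m_\eta(\omega)\, \frac{1}{8} \sum_{\ell=1}^{8} e^{i \mathbf{k}_{\eta,\ell} \cdot \mathbf{x}} \right)$$
• Coordinate transformation and displacement compensation
$$p(\mathbf{r}, \omega) = \sum_{\eta=(\eta_x, \eta_y, \eta_z)} \left( m_\eta(\omega)\, \frac{1}{8} \sum_{\ell=1}^{8} e^{i \mathbf{k}_{\eta,\ell} \cdot (\mathbf{r} + \mathbf{d})} \right)$$
• Plane-wave expansion and matching
$$e^{i \mathbf{k} \cdot \mathbf{r}} = \sum_{n=0}^{\infty} \sum_{m=-n}^{n} 4\pi i^n\, j_n(kr)\, Y_n^{m*}(\hat{\mathbf{k}})\, Y_n^m(\hat{\mathbf{r}})$$
$$p(r, \theta, \phi, k) = \sum_{n=0}^{\infty} \sum_{m=-n}^{n} j_n(kr)\, A_n^m(k)\, Y_n^m(\theta, \phi)$$
$$A_n^m(\omega) = 4\pi i^n \sum_{\eta} m_\eta(\omega)\, \frac{1}{8} \sum_{\ell=1}^{8} e^{i \mathbf{k}_{\eta,\ell} \cdot \mathbf{d}}\, Y_n^{m*}(\hat{\mathbf{k}}_{\eta,\ell})$$
Here $\mathbf{x} = (x, y, z)^{T}$, $k_{\eta x} = \dfrac{\pi \eta_x}{l_x}$, $k_{\eta y} = \dfrac{\pi \eta_y}{l_y}$, $k_{\eta z} = \dfrac{\pi \eta_z}{l_z}$, and the eight wave vectors $\mathbf{k}_{\eta,\ell}$ are the rows of
$$\begin{bmatrix} k_{\eta x} & k_{\eta y} & k_{\eta z} \\ k_{\eta x} & k_{\eta y} & -k_{\eta z} \\ \vdots & \ddots & \vdots \\ -k_{\eta x} & -k_{\eta y} & -k_{\eta z} \end{bmatrix}.$$
[Figure (p. 36): room-corner origin $\mathbf{o}$ with axes $\mathbf{x}$, $\mathbf{y}$, $\mathbf{z}$, listener-centered origin $\mathbf{o}'$, and displacement vector $\mathbf{d}$]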
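The step from cosine modes to plane waves rests on the identity cos(a)cos(b)cos(c) = (1/8) Σ exp(i(±a ± b ± c)), summed over the eight sign combinations. The sketch below checks that identity numerically for one mode and shows how the displacement d enters only as a phase factor on each plane wave. The function names and the example numbers (room size, mode index, positions) are illustrative assumptions.

```python
import itertools
import numpy as np

# The eight sign patterns, ordered from (+, +, +) to (-, -, -).
SIGNS = np.array(list(itertools.product([1.0, -1.0], repeat=3)))   # shape (8, 3)

def cosine_mode(k_eta, x):
    """cos(k_x x) cos(k_y y) cos(k_z z) for one mode at position x."""
    return np.prod(np.cos(k_eta * x))

def plane_wave_average(k_eta, x):
    """Average of the eight plane waves with wave vectors (+-k_x, +-k_y, +-k_z)."""
    k_all = SIGNS * k_eta
    return np.mean(np.exp(1j * (k_all @ x)))

def displaced_weights(k_eta, d):
    """Wave vectors and complex weights exp(i k.d)/8 for a listener displaced by d."""
    k_all = SIGNS * k_eta
    return k_all, np.exp(1j * (k_all @ d)) / 8.0

# Mode (1, 2, 3) of a 3 m cube; room-corner coordinates x, listener offset d.
k_eta = np.pi * np.array([1.0, 2.0, 3.0]) / np.array([3.0, 3.0, 3.0])
x = np.array([0.7, 1.1, 2.0])
d = np.array([1.5, 1.5, 1.5])
r = x - d                                  # same point in listener-centered coordinates

direct = cosine_mode(k_eta, x)
via_plane_waves = plane_wave_average(k_eta, x)
k_all, weights = displaced_weights(k_eta, d)
via_listener_frame = np.sum(weights * np.exp(1j * (k_all @ r)))
```

When the listener moves, only the weights exp(i k·d)/8 change, while the mode coefficients and the wave-vector directions stay fixed.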

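Flowchart of the implementation
[Figure: room model and impulse signal ➔ ARD ➔ RIRs ($\mathbb{R}^{N_x \times N_y \times N_z \times N_t}$) ➔ RTFs ($\mathbb{C}^{N_x \times N_y \times N_z \times N_f}$) ➔ sound field mapping ➔ spherical array driving-signal TFs ($\mathbb{C}^{N_{speakers} \times N_f}$); audio clip and driving-signal TFs ➔ spherical array driving signals; driving signals and HRIRs ➔ binaural rendering ➔ binaural audio]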
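Two of the array-shaped stages in the flowchart can be sketched with NumPy FFTs: turning simulated RIRs into RTFs, and filtering an audio clip with the driving-signal transfer functions. This is an illustration under assumptions, not the thesis code: the function names are hypothetical, the number of time samples is assumed even, and the filtering uses a plain linear convolution over the whole clip rather than block-wise real-time processing.

```python
import numpy as np

def rirs_to_rtfs(rirs):
    """FFT along the time axis: (Nx, Ny, Nz, Nt) real -> (Nx, Ny, Nz, Nt // 2 + 1) complex."""
    return np.fft.rfft(rirs, axis=-1)

def render_speaker_signals(driving_tfs, audio):
    """Filter a mono clip with each speaker's driving-signal transfer function.

    driving_tfs: (N_speakers, Nf) one-sided spectra with Nf = Nt // 2 + 1, Nt even
    audio:       (N_samples,) mono clip
    Returns an array of shape (N_speakers, N_samples + Nt - 1).
    """
    filters = np.fft.irfft(driving_tfs, axis=-1)             # back to impulse responses
    n_out = audio.shape[0] + filters.shape[1] - 1            # linear-convolution length
    spectrum = np.fft.rfft(audio, n_out)
    product = np.fft.rfft(filters, n_out, axis=-1) * spectrum
    return np.fft.irfft(product, n_out, axis=-1)

# Toy data standing in for simulated responses and an audio clip.
rng = np.random.default_rng(0)
rirs = rng.standard_normal((2, 2, 2, 8))
rtfs = rirs_to_rtfs(rirs)

speaker_filters = rng.standard_normal((3, 8))                # 3 speakers, Nt = 8
clip = rng.standard_normal(16)
signals = render_speaker_signals(np.fft.rfft(speaker_filters, axis=-1), clip)
```

The binaural stage would then convolve each speaker signal with the HRIR pair for that speaker direction and sum the results per ear.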
