1. Defense
Sound Space Rendering Based on the Virtual Sphere Model
Graduate School of Information Sciences
System Information Sciences
Acoustic Information System laboratory
Junjie Shi
B7IM2028
2. Motivation
• Human beings have a remarkable ability to observe their surroundings through hearing.
o Hearing enables us to localize sound sources in any direction.
o Listeners can roughly perceive the acoustic environment.
• Immersive experiences are in demand in games, movies, and virtual reality.
o Audio content (media content and spatial cues) is required to match the visual content.
o Spatial cues of sound are one of the keys to the feeling of immersion.
Chapter 1: Introduction
[Figure: immersion in games, movies, and virtual reality draws on both vision and hearing]
3. Previous studies
• Head-Related Impulse Response (HRIR);
Head-Related Transfer Function (HRTF)
Describes how an ear receives a sound from a point in
space.
o Localization cues
Interaural time difference (ITD)
Interaural level difference (ILD)
Spectrum
o Azimuth/Elevation/Distance
• Room Impulse Response (RIR); Room Transfer Function (RTF)
Characterizes how sound is transferred in a room.
o Direct sound
o Early reflection
o Late reverberation
4. Previous studies
• Computational room acoustics
o Geometrical room acoustics
Treats sound as rays and approximates their reflection paths.
Image-source method
Ray tracing
o Physically based room acoustics
Treats sound as waves and simulates wave propagation.
Finite-difference time-domain method (FDTD)
Adaptive rectangular decomposition (ARD)
o Only frequencies up to about 5 kHz (low-frequency components) are perceptually critical for acoustic simulation.
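The image-source method listed above can be illustrated in its simplest setting. The sketch below assumes a rigid shoebox room spanning [0, Lx] × [0, Ly] × [0, Lz], first-order reflections only, and c = 343 m/s; the helpers `first_order_images` and `arrival_delays` are hypothetical and not taken from the thesis.

```python
import numpy as np

def first_order_images(src, room):
    """Mirror a point source across each of the six walls of a shoebox room.

    Assumes the room spans [0, Lx] x [0, Ly] x [0, Lz] (hypothetical helper).
    """
    src = np.asarray(src, dtype=float)
    images = []
    for axis, length in enumerate(room):
        for wall in (0.0, float(length)):
            img = src.copy()
            img[axis] = 2.0 * wall - src[axis]  # reflect the coordinate about the wall plane
            images.append(img)
    return np.array(images)

def arrival_delays(images, listener, c=343.0):
    """Propagation delay (s) from each image source to the listener."""
    return np.linalg.norm(images - np.asarray(listener, dtype=float), axis=1) / c

# Example in a 6 m x 6 m x 3 m room (the seminar-room size quoted later in the slides)
images = first_order_images([2.0, 3.0, 1.5], (6.0, 6.0, 3.0))
delays = arrival_delays(images, [4.0, 3.0, 1.5])
```

Higher reflection orders follow by mirroring the images again; ray tracing replaces this exhaustive mirroring with sampled ray paths.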
5. Previous studies
• Issue of spatial audio in VR
o Recording is not possible in a virtual scene.
A spatial sound rendering technique designed for virtual sound spaces is required.
o The listener's movement will break the immersion if the sound is static.
A dynamic spatial sound rendering technique is required.
o Auditory display based on the virtual sphere model (ADVISE)
For both virtual and real sound scenes.
Interactive spatial audio: movement and rotation.
Takane et al., 1997: Kirchhoff-Helmholtz integral equation (KHIE)-based ADVISE
Tamura et al., 2016: Higher-order ambisonics (HOA)-based ADVISE
7. Objective
• Develop and implement ADVISE
o Complete the room acoustics program.
o Integrate room acoustics into ADVISE.
• Structure of this thesis
[Figure: structure of the thesis, relating Chapters 2, 3, 4, and 5]
8. Chapter 2: Review of auditory display based on the virtual sphere model (ADVISE)
9. HOA-based ADVISE
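Higher-order ambisonics (HOA)
• Use a spherical array of speakers to synthesize the desired field.
$p(\boldsymbol{r}, k) = \sum_{l=1}^{L} D_l(k)\, G(\boldsymbol{r} \mid \boldsymbol{r}_l, k)$
o $G(\boldsymbol{r} \mid \boldsymbol{r}_l, k)$: free-field transfer function of sound from the speaker at $\boldsymbol{r}_l$ to $\boldsymbol{r}$.
• Goal: find the driving signals $D_l(k)$ played by the $L$ speakers.
o $p(\boldsymbol{r}, k) \rightarrow D_l(k)$?
Spherical harmonics
• Fourier transform: $f(t) \rightarrow F(\omega)$
o Represents a time-varying function by frequencies.
• Spherical harmonic transform: $p(\boldsymbol{r}) \rightarrow A_n^m$
o Represents a space-varying function by directions.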
Chapter 2: Review of ADVISE
10. HOA-based ADVISE
• Spherical harmonic representation
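$p(r, \theta, \phi, k) = \sum_{n=0}^{\infty} \sum_{m=-n}^{n} j_n(kr)\, A_n^m(k)\, Y_n^m(\theta, \phi)$
o $p(r, \theta, \phi, k)$: sound pressure in spherical coordinates.
o $n$: order of the spherical harmonic.
o $A_n^m(k)$: spherical harmonic coefficients.
o $j_n(kr)$: spherical Bessel function of the first kind.
o $Y_n^m(\theta, \phi)$: spherical harmonic.
• Mode matching method
$p(\boldsymbol{r}, k) = \sum_{l=1}^{L} D_l(k)\, G(\boldsymbol{r} \mid \boldsymbol{r}_l, k)$
Expanding each $G(\boldsymbol{r} \mid \boldsymbol{r}_l, k)$ in spherical harmonics gives $\boldsymbol{A} = \boldsymbol{\Psi} \boldsymbol{D}$, which is inverted as
$\boldsymbol{D} = \boldsymbol{\Psi}^{\dagger} \boldsymbol{A}$
o $\boldsymbol{\Psi}$: transfer matrix from the driving signals to the spherical harmonic coefficients; $\boldsymbol{\Psi}^{\dagger}$: its (pseudo-)inverse.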
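The mode-matching step amounts to applying a (regularized) pseudo-inverse. To stay within plain NumPy, the sketch below swaps in a closely related technique, pressure matching: it matches the target pressure at control points inside the listening region instead of matching spherical harmonic coefficients. Assumptions not taken from the thesis: the Green's function is normalized as e^{ikR}/(4πR), speakers and control points lie on Fibonacci spheres, the source position (1.5 m, 60°, 0°) reads 60° as the polar angle, and a small Tikhonov term `beta` stabilizes the inverse. All function names are hypothetical.

```python
import numpy as np

def green_matrix(points, sources, k):
    """G[j, l] = e^{ik|r_j - r_l|} / (4 pi |r_j - r_l|) for points r_j and sources r_l."""
    dist = np.linalg.norm(points[:, None, :] - sources[None, :, :], axis=-1)
    return np.exp(1j * k * dist) / (4.0 * np.pi * dist)

def fibonacci_sphere(n, radius):
    """Nearly uniform points on a sphere of the given radius."""
    i = np.arange(n) + 0.5
    polar = np.arccos(1.0 - 2.0 * i / n)
    azim = np.pi * (1.0 + np.sqrt(5.0)) * i
    return radius * np.stack([np.sin(polar) * np.cos(azim),
                              np.sin(polar) * np.sin(azim),
                              np.cos(polar)], axis=-1)

def driving_signals(control, speakers, target, k, beta=1e-8):
    """Regularized pseudo-inverse D = G^H (G G^H + reg I)^{-1} p (pressure matching)."""
    G = green_matrix(control, speakers, k)
    gram = G @ G.conj().T
    reg = beta * np.trace(gram).real / len(control)
    return G.conj().T @ np.linalg.solve(gram + reg * np.eye(len(control)), target)

k = 2.0 * np.pi * 1000.0 / 343.0
src = 1.5 * np.array([[np.sin(np.pi / 3), 0.0, np.cos(np.pi / 3)]])
speakers = fibonacci_sphere(252, 1.0)        # 252 secondary sources on a 1 m sphere
control = fibonacci_sphere(100, 0.1)         # control points around the head
target = green_matrix(control, src, k)[:, 0]
D = driving_signals(control, speakers, target, k)

check = fibonacci_sphere(30, 0.05)           # points not used in the fit
p_syn = green_matrix(check, speakers, k) @ D
p_ref = green_matrix(check, src, k)[:, 0]
err_db = 20.0 * np.log10(np.max(np.abs(p_syn - p_ref)) / np.max(np.abs(p_ref)))
```

Replacing the pressure-matching matrix by the spherical-harmonic transfer matrix $\boldsymbol{\Psi}$ (and the target pressures by $\boldsymbol{A}$) gives the mode-matching form on the slide; the regularized solve is the same.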
11. Binaural rendering
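HRTF (head-related transfer function)
For the right ear: $H_R(\boldsymbol{r}_i, \omega) = \frac{P_R(\boldsymbol{r}_i, \omega)}{P_O(\boldsymbol{r}_i, \omega)}$
$\boldsymbol{r}_i$: position of the source
$P_O$: sound pressure at the sphere center
$P_R$: sound pressure at the right ear
Binaural signal: sum up the responses of all speakers at the ears.
$p_R = \sum_{\ell=1}^{L} D_\ell(\omega)\, G(\boldsymbol{r}_O \mid \boldsymbol{r}_\ell, \omega)\, H_R(\boldsymbol{r}_\ell, \omega)$
$p_L = \sum_{\ell=1}^{L} D_\ell(\omega)\, G(\boldsymbol{r}_O \mid \boldsymbol{r}_\ell, \omega)\, H_L(\boldsymbol{r}_\ell, \omega)$
Head rotation: shift the HRTFs according to the rotation.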
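Per frequency bin, the binaural sums are weighted sums over speakers, which vectorize directly. The sketch below assumes every quantity is already sampled on a common frequency grid as a complex array of shape (number of speakers, number of bins); the names are hypothetical. The head-rotation helper further assumes the speakers form a uniform azimuth ring, so a yaw of one grid step is a circular shift of the HRTF rows (the direction convention is also an assumption); on a general spherical layout the HRTFs would instead be re-selected or interpolated for the rotated directions.

```python
import numpy as np

def binaural_signals(D, G0, H_left, H_right):
    """p_{L,R}(w) = sum_l D_l(w) G(r_O | r_l, w) H_{L,R}(r_l, w).

    All inputs: complex arrays of shape (num_speakers, num_freqs).
    """
    p_left = np.einsum("lf,lf,lf->f", D, G0, H_left)
    p_right = np.einsum("lf,lf,lf->f", D, G0, H_right)
    return p_left, p_right

def rotate_hrtfs(H, steps):
    """Head yaw on a uniform azimuth ring: circularly shift the HRTF rows by `steps`."""
    return np.roll(H, steps, axis=0)

rng = np.random.default_rng(0)
shape = (8, 16)  # toy sizes: 8 speakers on a ring, 16 frequency bins
D, G0, HL, HR = (rng.standard_normal(shape) + 1j * rng.standard_normal(shape) for _ in range(4))
p_left, p_right = binaural_signals(D, G0, HL, HR)
p_left_rot, p_right_rot = binaural_signals(D, G0, rotate_hrtfs(HL, 1), rotate_hrtfs(HR, 1))
```

Because only the HRTF array is shifted, head rotation does not require recomputing the driving signals $D_\ell(\omega)$.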
13. Finite difference time domain method (FDTD)
• The propagation of sound waves is governed by the wave equation:
$\frac{\partial^2 p}{\partial t^2} - c^2 \nabla^2 p = f$
o $\nabla^2$: Laplace operator, $\nabla^2 p = \frac{\partial^2 p}{\partial x^2} + \frac{\partial^2 p}{\partial y^2} + \frac{\partial^2 p}{\partial z^2}$
o $f$: force (source) terms.
• The sound field is discretized in both space and time, and the pressure is updated step by step by applying finite-difference approximations.
• Limitations of FDTD
1. Errors introduced by the finite-difference approximation cause numerical dispersion in the simulation.
2. A sampling rate well above the Nyquist rate (10 to 20 times the desired frequency) is required for faithful results.
3. Increasing the sampling rate $n$ times requires $n^3$ times the memory and $n^4$ times the computation time ($n$ times more samples along each of $x$, $y$, $z$, and $n$ times more time steps).
Chapter 3: Review of adaptive rectangular decomposition
14. Adaptive rectangular decomposition (ARD)
1. Decompose an arbitrary space into rectangular partitions.
2. Update the sound field in each partition using the analytical solution, which allows error-free updating (no numerical dispersion) inside the partition.
3. Put all partitions together and let waves pass through the interfaces.
o Each partition's solution assumes its boundaries are perfectly reflective, but the interface between two partitions should be transparent.
o This boundary-condition mismatch is compensated by applying imaginary sources close to the interface.
*Detailed ARD formulas are given in the appendix (pp. 34-35).
17. Introduction
Room model ➔ $p(x, y, z, \omega)$: room acoustics (Chap. 3)
Room model ➔ $p(r, \theta, \phi, \omega)$ ➔ $A_n^m(\omega)$ ➔ $D_l(\omega)$: sound field mapping (Chap. 2)
Room model ➔ $p(r, \theta, \phi, k)$ ➔ $A_n^m(k)$ ➔ $D_l(\omega)$ ➔ $p_L(\omega)$ & $p_R(\omega)$: binaural rendering
Chapter 4: Spherical harmonic representation of generated sound fields
$p(x, y, z, \omega)$ ➔ $m_\eta(\omega)$ ➔ $A_n^m(\omega)$
A formula that derives the spherical harmonic coefficients $A_n^m(\omega)$ from the generated sound field $p(x, y, z, \omega)$ is proposed.
18. Formulation
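• $p(x, y, z, \omega) \rightarrow m_\eta(\omega)$: 3D discrete cosine transform (3D DCT)
o $p(x, y, z, \omega) = \sum_{\eta = (\eta_x, \eta_y, \eta_z)} m_\eta(\omega) \cos\left(\frac{\pi \eta_x}{l_x} x\right) \cos\left(\frac{\pi \eta_y}{l_y} y\right) \cos\left(\frac{\pi \eta_z}{l_z} z\right)$
• $m_\eta(\omega) \rightarrow A_n^m(\omega)$
o $A_n^m(\omega) = 4\pi i^n \sum_{\eta = (\eta_x, \eta_y, \eta_z)} m_\eta(\omega)\, \frac{1}{8} \sum_{\ell=1}^{8} e^{i \boldsymbol{k}_{\eta,\ell} \cdot \boldsymbol{d}}\, Y_n^{m*}(\hat{\boldsymbol{k}}_{\eta,\ell})$
o $\boldsymbol{k}_{\eta,\ell}$: the eight wave vectors $(\pm \pi \eta_x / l_x, \pm \pi \eta_y / l_y, \pm \pi \eta_z / l_z)$ of the plane waves that make up mode $\eta$; $\hat{\boldsymbol{k}}_{\eta,\ell}$: their directions.
o $\boldsymbol{d}$: displacement vector pointing from the space corner to the listener's position.
• Strategy for listener movement
o Adjust $e^{i \boldsymbol{k}_{\eta,\ell} \cdot \boldsymbol{d}}$ according to the listener's position; the mode coefficients $m_\eta(\omega)$ stay unchanged.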
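The $\frac{1}{8}\sum_{\ell=1}^{8}$ factor comes from writing each cosine mode as eight plane waves, $\cos(ax)\cos(by)\cos(cz) = \frac{1}{8}\sum_{\pm} e^{i(\pm ax \pm by \pm cz)}$, so moving the listener by $\boldsymbol{d}$ only multiplies each plane wave by $e^{i\boldsymbol{k}_{\eta,\ell}\cdot\boldsymbol{d}}$. The NumPy sketch below checks this identity numerically for one mode; the room size, mode index, listener position, and helper names are hypothetical illustration choices, and it does not evaluate the spherical harmonic part of the formula.

```python
import itertools
import numpy as np

def mode_wave_vectors(eta, room):
    """The eight wave vectors (+-pi*eta_x/l_x, +-pi*eta_y/l_y, +-pi*eta_z/l_z)."""
    base = np.pi * np.asarray(eta, dtype=float) / np.asarray(room, dtype=float)
    signs = np.array(list(itertools.product((1.0, -1.0), repeat=3)))
    return signs * base  # shape (8, 3)

def mode_direct(eta, room, x):
    """cos(pi*eta_x*x/l_x) cos(pi*eta_y*y/l_y) cos(pi*eta_z*z/l_z) at points x of shape (P, 3)."""
    k = np.pi * np.asarray(eta, dtype=float) / np.asarray(room, dtype=float)
    return np.prod(np.cos(x * k), axis=-1)

def mode_around_listener(eta, room, d, r):
    """The same mode at d + r, built from plane waves around the listener at d."""
    kv = mode_wave_vectors(eta, room)
    phase = np.exp(1j * kv @ d)        # listener-position factor e^{i k.d}, shape (8,)
    waves = np.exp(1j * r @ kv.T)      # plane waves e^{i k.r}, shape (P, 8)
    return waves @ phase / 8.0

room = (3.0, 3.0, 3.0)
eta = (2, 5, 1)
d = np.array([1.2, 0.7, 1.9])
r = np.random.default_rng(1).uniform(-0.1, 0.1, size=(50, 3))
via_waves = mode_around_listener(eta, room, d, r)
direct = mode_direct(eta, room, d + r)
```

Only `phase` depends on the listener position, which is why a move of the listener only requires recomputing the exponential factors.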
*Detailed derivation in the appendix (p. 36).
$\boldsymbol{M} = \mathrm{DCT}(\boldsymbol{P})$
19. Numerical experiments
• A 3 m × 3 m × 3 m rectangular space, discretized every 0.1 m.
• Monopole source: (1.5 m, 60°, 0°), 1000 Hz.
• 256 speakers located on the dashed-line sphere.
[Figure: synthesis error; maximum error −37 dB within the 10 cm sphere and −26 dB within the 20 cm sphere]
• The synthesis error is as small as −37 dB within a volume comparable to the size of a human head, which is considered imperceptible.
22. Demonstration
• Scene:
o A highly simplified music hall
o V = 7600 m³, S = 2720 m²
Chapter 5: Implementation of ADVISE
[Figure: simplified music hall (dimensions labeled 30 m, 20 m, 30 m, and 10 m) with the listener and source positions]
23. Demonstration
• Scene in Unity
o The listener is surrounded by a spherical speaker array.
o The virtual secondary speakers are represented by the green spheres.
24. Demonstration
• Simulation results
o Maximum reliable frequency of the simulated reverberation: 1700 Hz
Absorption coefficient | Reverb time | Processed sound: music | Processed sound: speech
original | - | (audio clip) | (audio clip)
≈0.2 | ≈2.0 s | (audio clip) | (audio clip)
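As a rough plausibility check of the table (not part of the thesis), classical diffuse-field formulas relate the hall's volume, surface area, and average absorption coefficient to the reverberation time. With V = 7600 m³, S = 2720 m², and α ≈ 0.2, Eyring's formula gives about 2.0 s and Sabine's about 2.25 s, so the simulated ≈2.0 s is in the expected range. Both formulas assume a diffuse sound field, which a simulated hall need not satisfy; the function names are hypothetical.

```python
import math

def sabine_rt60(volume, surface, alpha):
    """Sabine reverberation time (s): 0.161 V / (S * alpha)."""
    return 0.161 * volume / (surface * alpha)

def eyring_rt60(volume, surface, alpha):
    """Eyring reverberation time (s): 0.161 V / (-S * ln(1 - alpha))."""
    return 0.161 * volume / (-surface * math.log(1.0 - alpha))

t_sabine = sabine_rt60(7600.0, 2720.0, 0.2)  # about 2.25 s
t_eyring = eyring_rt60(7600.0, 2720.0, 0.2)  # about 2.02 s
```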
25. Demonstration
• Computational aspects
Maximum reliable frequency | Sampling properties | Total samples in the scene | Time steps (per sec) | Theoretical FLOP requirement | Actual time cost of a 2 s simulation
3400 Hz | dh = 5 × 10⁻² m, dt = 6.25 × 10⁻⁵ s | 60.8 M | 16000 | per step: 6 G FLOP; per sec: 96 T FLOP | 19.2 hr
1700 Hz | dh = 10 × 10⁻² m, dt = 1.25 × 10⁻⁴ s | 7.6 M | 8000 | per step: 0.7 G FLOP; per sec: 5.6 T FLOP | 1.5 hr
Hardware: Intel(R) Core(TM) i7-8086K, 62.12 G FLOPS (5.18 G FLOPS/core); NVIDIA GeForce GTX 1080 Ti, 11.3 T FLOPS
*FLOPS: FLOP per second.
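The table entries follow from simple arithmetic, which also reflects the n³/n⁴ scaling noted for grid-based methods: halving dh multiplies the sample count by 8 and the per-second FLOP count by about 17 (close to 16). The sketch below reproduces the 1700 Hz row and an idealized runtime on the listed CPU. It takes the per-step FLOP counts from the table as given and ignores memory bandwidth and other overheads, which is why the measured 1.5 hr is far above the ideal figure. The function name is hypothetical.

```python
def ard_budget(volume_m3, dh, dt, flop_per_step, device_flops, sim_seconds=2.0):
    """Back-of-the-envelope cost of a grid-based room acoustics simulation."""
    samples = volume_m3 / dh**3                  # grid cells in the scene
    steps_per_second = 1.0 / dt                  # time steps per simulated second
    flop_per_sim_second = flop_per_step * steps_per_second
    ideal_runtime_s = flop_per_sim_second * sim_seconds / device_flops
    return samples, steps_per_second, flop_per_sim_second, ideal_runtime_s

# 1700 Hz row: dh = 0.1 m, dt = 1.25e-4 s, 0.7 G FLOP per step, on the i7-8086K (62.12 G FLOPS)
samples, steps, flop_s, runtime = ard_budget(7600.0, 0.1, 1.25e-4, 0.7e9, 62.12e9)
# samples ~ 7.6e6, steps ~ 8000, flop_s ~ 5.6e12, runtime ~ 180 s
```

With the GTX 1080 Ti's 11.3 T FLOPS, the same 5.6 T FLOP per simulated second would ideally take about 0.5 s per simulated second, which is presumably the sense in which real-time rendering is called theoretically possible.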
26. Conclusions
1. Chapter 3: Build a 3D sound propagation simulator based on ARD.
2. Chapter 4: Derive a formula that integrates room acoustics into ADVISE.
3. Chapter 5: Provide an implementation of ADVISE.
Even for a large scene such as a music hall, real-time sound space rendering using ADVISE is theoretically possible.
Chapter 6: Conclusions
27. Responses to comments from the pre-defense
• What role does ARD play in ADVISE?
o ARD generates the sound field in the given scene. The generated sound field is then used to render the spatial audio. As a computational room acoustics method, ARD enables ADVISE to render any sound space, whether or not it physically exists.
• How long does ARD take to simulate the seminar room?
o A two-second simulation; seminar room size: 6 m × 6 m × 3 m.
o The theoretical computational cost is 60 G FLOPS for a reliable simulation up to 1700 Hz and 960 G FLOPS for up to 3400 Hz.
o Our ARD program takes around 2 minutes for frequencies up to 1700 Hz and 21 minutes for frequencies up to 3400 Hz.
29. KHIE-based ADVISE
• Kirchhoff-Helmholtz integral equation (KHIE)
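The sound field inside a volume can be represented by the pressure and its normal gradient on the volume's surface $\Gamma$:
$P(\boldsymbol{r}_0, k) = \oint_{\Gamma} \left[ G(\boldsymbol{r}_0 \mid \boldsymbol{r}, k) \frac{\partial P(\boldsymbol{r}, k)}{\partial n} - P(\boldsymbol{r}, k) \frac{\partial G(\boldsymbol{r}_0 \mid \boldsymbol{r}, k)}{\partial n} \right] d\Gamma$
o $k$: wave number, $k = \omega / c$, where $\omega$ is the angular frequency and $c$ is the speed of sound.
o $P(\boldsymbol{r}_0, k)$: sound pressure at $\boldsymbol{r}_0$ inside the volume.
o $n$: outward normal of $\Gamma$.
o $G(\boldsymbol{r}_0 \mid \boldsymbol{r}, k)$: free-field Green's function from $\boldsymbol{r}$ to $\boldsymbol{r}_0$, $G(\boldsymbol{r}_0 \mid \boldsymbol{r}, k) = \frac{e^{ik|\boldsymbol{r}_0 - \boldsymbol{r}|}}{4\pi |\boldsymbol{r}_0 - \boldsymbol{r}|}$.
• Discretization of KHIE
$P(\boldsymbol{r}_0, k) \approx \sum_{i=1}^{N} \left[ G(\boldsymbol{r}_0 \mid \boldsymbol{r}_i, k) \frac{P(\boldsymbol{r}_i^{+}, k) - P(\boldsymbol{r}_i^{-}, k)}{\delta_i} - P(\boldsymbol{r}_i, k) \frac{G(\boldsymbol{r}_0 \mid \boldsymbol{r}_i^{+}, k) - G(\boldsymbol{r}_0 \mid \boldsymbol{r}_i^{-}, k)}{\delta_i} \right] \Delta S_i$
o $\boldsymbol{r}_i^{\pm}$: points just outside and just inside the surface element $\Delta S_i$ at $\boldsymbol{r}_i$, a distance $\delta_i$ apart along the normal.
o Use $3N$ secondary sources (at $\boldsymbol{r}_i$, $\boldsymbol{r}_i^{+}$, and $\boldsymbol{r}_i^{-}$) to reproduce the interior sound field.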
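The discretized KHIE can be checked numerically. The sketch below makes these assumptions, none of which come from the thesis: the Green's function is normalized with 4π, the surface is a 1 m sphere sampled by a Gauss-Legendre (in cos θ) × uniform-azimuth grid whose weights serve as $\Delta S_i$, δ = 1 mm, and the target is a 300 Hz monopole outside the sphere. It sums the N terms at $\boldsymbol{r}_i$ and the 2N terms at $\boldsymbol{r}_i^{\pm}$, i.e. the 3N secondary sources, and compares the result with the exact field at an interior point. All helper names are hypothetical.

```python
import numpy as np

def green(r0, r, k):
    """Free-field Green's function e^{ik|r0 - r|} / (4 pi |r0 - r|), broadcasting over points."""
    dist = np.linalg.norm(np.asarray(r0) - np.asarray(r), axis=-1)
    return np.exp(1j * k * dist) / (4.0 * np.pi * dist)

def sphere_quadrature(n_polar, n_azim, radius):
    """Gauss-Legendre in cos(polar) x uniform azimuth: points, outward normals, patch areas."""
    x, w = np.polynomial.legendre.leggauss(n_polar)
    polar = np.arccos(x)
    azim = 2.0 * np.pi * np.arange(n_azim) / n_azim
    pol, az = np.meshgrid(polar, azim, indexing="ij")
    normals = np.stack([np.sin(pol) * np.cos(az),
                        np.sin(pol) * np.sin(az),
                        np.cos(pol)], axis=-1).reshape(-1, 3)
    areas = np.repeat(w, n_azim) * (2.0 * np.pi / n_azim) * radius**2
    return radius * normals, normals, areas

def khie_pressure(r0, pressure, k, radius=1.0, n_polar=40, n_azim=80, delta=1e-3):
    """Discretized KHIE: terms at r_i plus finite-difference pairs at r_i^+ / r_i^-."""
    pts, normals, areas = sphere_quadrature(n_polar, n_azim, radius)
    r_plus = pts + 0.5 * delta * normals     # just outside the surface
    r_minus = pts - 0.5 * delta * normals    # just inside the surface
    dp_dn = (pressure(r_plus) - pressure(r_minus)) / delta
    dg_dn = (green(r0, r_plus, k) - green(r0, r_minus, k)) / delta
    return np.sum((green(r0, pts, k) * dp_dn - pressure(pts) * dg_dn) * areas)

# Target: a 300 Hz monopole outside the 1 m sphere (60 deg read as the polar angle)
k = 2.0 * np.pi * 300.0 / 343.0
src = 1.5 * np.array([np.sin(np.pi / 3), 0.0, np.cos(np.pi / 3)])
r0 = np.array([0.2, -0.1, 0.1])
approx = khie_pressure(r0, lambda r: green(r, src, k), k)
exact = green(r0, src, k)
rel_err = abs(approx - exact) / abs(exact)
```

The tensor-product grid is chosen because it integrates the smooth integrand very accurately, so the remaining error mainly reflects the finite-difference normal derivatives; a real reproduction system would instead be limited by the number and spacing of physical sources.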
30. KHIE-based ADVISE
• Kirchhoff-Helmholtz integral equation (KHIE)
The sound field inside a volume can be represented by the pressure and its gradient on the surface.
• KHIE-based ADVISE can reproduce 2D sound fields with high accuracy, but is unstable when reproducing 3D sound fields.
[Figure: reproduction error of the 2D field and of the 3D field; $N$: number of divisions on the surface]
31. HOA-based ADVISE
• Sound field reproduction using HOA
o 252 secondary sources located on a 1 m sphere.
o A 1000 Hz monopole source located at (1.5 m, 60°, 0°).
o The reproduction error is below −20 dB when the distance (from the array center) is less than 0.5 m.
[Figure: ideal field, reproduced field, and reproduction error]
32. Finite difference time domain method (FDTD)
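• The propagation of sound waves is governed by the wave equation:
$\frac{\partial^2 p}{\partial t^2} - c^2 \nabla^2 p = f$
o $\nabla^2$: Laplace operator, $\nabla^2 p = \frac{\partial^2 p}{\partial x^2} + \frac{\partial^2 p}{\partial y^2} + \frac{\partial^2 p}{\partial z^2}$
o $f$: force (source) terms.
• The sound field is discretized in both space and time, and the pressure is updated step by step by applying finite-difference approximations:
$\frac{\partial^2 p}{\partial t^2} \approx \frac{p(t+1) - 2p(t) + p(t-1)}{\Delta t^2}$
$\frac{\partial^2 p}{\partial x^2} \approx \frac{p(x+1) - 2p(x) + p(x-1)}{\Delta x^2}$
1D FDTD update formula:
$p(x, t+1) = \left[ f + c^2 \frac{p(x+1, t) - 2p(x, t) + p(x-1, t)}{\Delta x^2} \right] \Delta t^2 + 2p(x, t) - p(x, t-1)$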
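The 1D update formula translates directly into a leapfrog loop. The sketch below is a minimal NumPy version under stated assumptions (both ends held at zero pressure, an optional forcing callback, a hypothetical function name). The usage example starts from a discrete standing wave, for which the scheme can be solved exactly: each step advances the phase by θ with cos θ = 1 − λ/2, where λ = (cΔt/Δx)² · 4 sin²(π/(2N)). Since θ is slightly smaller than the exact ωΔt, the example also exhibits the numerical dispersion listed among FDTD's limitations.

```python
import numpy as np

def fdtd_1d(p_prev, p_now, c, dx, dt, steps, source=None):
    """Leapfrog FDTD: p(t+1) = dt^2 [f + c^2 (p(x+1)-2p(x)+p(x-1))/dx^2] + 2p(t) - p(t-1).

    Both ends are held at zero pressure (an assumption of this sketch).
    `source(n)` may return the forcing array f at step n.
    """
    p_prev = np.array(p_prev, dtype=float)
    p_now = np.array(p_now, dtype=float)
    for n in range(steps):
        lap = np.zeros_like(p_now)
        lap[1:-1] = (p_now[2:] - 2.0 * p_now[1:-1] + p_now[:-2]) / dx**2
        f = 0.0 if source is None else source(n)
        p_next = dt**2 * (f + c**2 * lap) + 2.0 * p_now - p_prev
        p_next[0] = p_next[-1] = 0.0
        p_prev, p_now = p_now, p_next
    return p_now

# Discrete standing wave on N + 1 grid points with fixed ends
N, L, c = 50, 1.0, 343.0
dx = L / N
dt = 0.5 * dx / c                       # Courant number 0.5 (1D stability needs c*dt/dx <= 1)
x = np.arange(N + 1) * dx
p0 = np.sin(np.pi * x / L)
lam = (c * dt / dx) ** 2 * 4.0 * np.sin(np.pi / (2 * N)) ** 2
theta = np.arccos(1.0 - lam / 2.0)      # numerical phase advance per step
p100 = fdtd_1d(p0 * np.cos(theta), p0, c, dx, dt, steps=100)
exact_phase = np.pi * c / L * dt        # omega * dt of the continuous mode
```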
33. Adaptive rectangular decomposition (ARD)
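• Normal modes in a rectangular space with rigid boundaries
$p(x, y, z, t) = \sum_{\eta = (\eta_x, \eta_y, \eta_z)} m_\eta(t) \cos\left(\frac{\pi \eta_x}{l_x} x\right) \cos\left(\frac{\pi \eta_y}{l_y} y\right) \cos\left(\frac{\pi \eta_z}{l_z} z\right)$
o $p(x, y, z, t)$: sound pressure sampled in Cartesian coordinates.
o $m_\eta$: mode coefficients of the rectangular room.
o $\eta_x, \eta_y, \eta_z$: indices of the discretized space, $\eta_x = 1, 2, \ldots, l_x / \Delta x$.
o The formulation can be interpreted as a discrete cosine transform:
$\boldsymbol{P} = \mathrm{iDCT}(\boldsymbol{M}) \Leftrightarrow \boldsymbol{M} = \mathrm{DCT}(\boldsymbol{P})$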
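• Update formula of the mode coefficients $m_\eta(t)$ (substituting the expansion into the wave equation; the Laplacian of each cosine mode is $-k_\eta^2$ times that mode):
$\frac{\partial^2 m_\eta}{\partial t^2} + c^2 k_\eta^2 m_\eta = \mathrm{DCT}(f)$
o $k_\eta^2 = \pi^2 \left( \frac{\eta_x^2}{l_x^2} + \frac{\eta_y^2}{l_y^2} + \frac{\eta_z^2}{l_z^2} \right)$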
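Each mode coefficient obeys an undamped oscillator equation, $\ddot m_\eta + \omega_\eta^2 m_\eta = \tilde f_\eta$ with $\omega_\eta = c k_\eta$, so it can be advanced exactly in time. A commonly used closed-form step (stated here as an assumption, since the slide gives only the differential equation, and taking the forcing $\tilde f_\eta$ as constant over the step) is $m_\eta^{t+1} = 2\cos(\omega_\eta \Delta t)\, m_\eta^{t} - m_\eta^{t-1} + \frac{2\tilde f_\eta}{\omega_\eta^2}\left(1 - \cos(\omega_\eta \Delta t)\right)$. The 1D NumPy sketch below builds an orthonormal DCT-II matrix on cell centers to avoid a SciPy dependency; the names are hypothetical. Starting from a single standing mode, the step reproduces $\cos(\omega_\eta t)$ exactly even with a large Δt, which is the sense in which the partition update has no numerical dispersion.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix: row eta samples cos(pi*eta*x/l) at cell centers."""
    eta = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * eta * (j + 0.5) / n)
    c[0, :] = np.sqrt(1.0 / n)
    return c

def ard_mode_step(m_now, m_prev, f_modes, omega, dt):
    """Exact update of m'' + omega^2 m = f_modes with f_modes held constant over the step."""
    cos_wdt = np.cos(omega * dt)
    safe = np.where(omega > 0.0, omega, 1.0)          # avoid dividing by zero for eta = 0
    forcing = np.where(omega > 0.0,
                       2.0 * f_modes / safe**2 * (1.0 - cos_wdt),
                       f_modes * dt**2)                # limit as omega -> 0
    return 2.0 * cos_wdt * m_now - m_prev + forcing

# 1D partition of length l with n cells; omega_eta = c * k_eta, k_eta = pi * eta / l
n, l, c_sound, dt = 32, 1.0, 343.0, 1e-3              # dt deliberately large
C = dct_matrix(n)
omega = c_sound * np.pi * np.arange(n) / l
x = (np.arange(n) + 0.5) * (l / n)
p0 = np.cos(3 * np.pi * x / l)                        # single standing mode, eta = 3
m_now = C @ p0
m_prev = m_now * np.cos(omega * dt)                   # field at rest at t = 0
for _ in range(50):
    m_now, m_prev = ard_mode_step(m_now, m_prev, np.zeros(n), omega, dt), m_now
p_50 = C.T @ m_now
expected = p0 * np.cos(omega[3] * 50 * dt)
```

In 3D the same step applies per mode after a 3D DCT; the interface forcing between partitions would enter through `f_modes`.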
[Figure: rectangular partition with side lengths $l_x$, $l_y$, $l_z$ along the $x$, $y$, and $z$ axes]
34. Adaptive rectangular decomposition (ARD)
• Interface handling
o Rigid boundary condition: $p(x) = p(x+1)$
o Finite difference close to a rigid boundary:
$S_x^0 = \frac{p(x) - 2p(x) + p(x-1)}{\Delta x^2}$
o Finite difference of propagation:
$S_x = \frac{p(x+1) - 2p(x) + p(x-1)}{\Delta x^2}$
o Residual term:
$S'_x = S_x - S_x^0 = \frac{p(x+1) - p(x)}{\Delta x^2}$
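The residual term can be checked in 1D. If each partition uses the rigid (mirrored ghost cell) stencil, adding $S'_x$ to the two cells adjacent to the interface recovers the stencil of the undivided domain. The NumPy sketch below assumes the second-order stencil shown on the slide, and its helpers are hypothetical. In ARD this residual, scaled by $c^2$, would presumably enter each partition as part of the forcing $f$.

```python
import numpy as np

def rigid_laplacian(p, dx):
    """Second-order Laplacian with rigid ends: ghost cells copy the edge cells."""
    padded = np.concatenate(([p[0]], p, [p[-1]]))
    return (padded[2:] - 2.0 * padded[1:-1] + padded[:-2]) / dx**2

def interface_residuals(p_left, p_right, dx):
    """S' = S - S^0 for the last cell of the left part and the first cell of the right part."""
    res_left = (p_right[0] - p_left[-1]) / dx**2
    res_right = (p_left[-1] - p_right[0]) / dx**2
    return res_left, res_right

dx = 0.1
p = np.random.default_rng(2).standard_normal(20)
left, right = p[:8], p[8:]
s_left = rigid_laplacian(left, dx)
s_right = rigid_laplacian(right, dx)
r_left, r_right = interface_residuals(left, right, dx)
s_left[-1] += r_left
s_right[0] += r_right
combined = np.concatenate((s_left, s_right))
reference = rigid_laplacian(p, dx)   # undivided domain, rigid only at the outer walls
```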
[Figure: grid cells $p(x-1)$, $p(x)$, $p(x+1)$ at the interface]
36. Binaural rendering
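[Figure: flowchart of the implementation. Components: room model, impulse signal, ARD, RIRs $\in \mathbb{R}^{N_x \times N_y \times N_z \times N_t}$, RTFs $\in \mathbb{C}^{N_x \times N_y \times N_z \times N_f}$, sound field mapping, spherical array driving signal TFs $\in \mathbb{C}^{N_{\mathrm{speakers}} \times N_f}$, spherical array driving signals, HRIRs, audio clip, binaural audio]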
Speaker notes
By analyzing the responses at both ears, humans can perceive where a sound comes from.
The environment also affects the sound we hear.
To render a sound scene, what we have is the room structure and the position of the source.
Now comes the key part of ADVISE. From here on, we only care about the sound field around the listener, and we use a speaker array to synthesize a sound field. The signals played by the speakers must be designed carefully so that the synthesized field fits the sound field we generated before.
To present the spatial audio, we do not actually need this speaker array: we virtualize it and render the responses at the ears, which are then played back over headphones.
The most straightforward method of solving the wave equation in the time domain is FDTD.