
Build Your Own VR Display Course - SIGGRAPH 2017: Part 4

Spatial sound


  1. 1. Build Your Own VR Display Spatial Sound Nitish Padmanaban Stanford University stanford.edu/class/ee267/
  2. 2. Overview • What is sound? • The human auditory system • Stereophonic sound • Spatial audio of point sound sources • Recorded spatial audio Zhong and Xie, “Head-Related Transfer Functions and Virtual Auditory Display”
  3. 3. What is Sound? • “Sound” is a pressure wave propagating in a medium • The speed of sound is c = √(K/ρ), where c is velocity, ρ is the density of the medium, and K is the elastic bulk modulus • In air, the speed of sound is 340 m/s • In water, the speed of sound is 1,483 m/s
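A quick numeric check of the formula above; the bulk modulus and density values are approximate reference numbers for air and water, not taken from the slides:

```python
import math

def speed_of_sound(bulk_modulus, density):
    """c = sqrt(K / rho): K in Pa, rho in kg/m^3, result in m/s."""
    return math.sqrt(bulk_modulus / density)

# Approximate values for air (~15-20 C) and for water.
print(speed_of_sound(1.42e5, 1.21))   # ~343 m/s
print(speed_of_sound(2.2e9, 1000.0))  # ~1483 m/s
```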
  4. 4. Producing Sound • Sound is a longitudinal vibration of air particles • Speakers create wavefronts by physically compressing the air, much like one could compress a slinky
  5. 5. The Human Auditory System pinna Wikipedia
  6. 6. The Human Auditory System pinna cochlea Wikipedia • Hair receptor cells pick up vibrations
  7. 7. The Human Auditory System • Human hearing range: ~20–20,000 Hz • Variation between individuals • Degrades with age D. W. Robinson and R. S. Dadson, 1957 Hearing Threshold in Quiet
  8. 8. Stereophonic Sound • Mainly captures differences between the ears: • Inter-aural time difference • Amplitude differences from path length and scatter (figure: a “Hello, SIGGRAPH!” source reaching the L ear at time t and the R ear at t + Δt; image: Wikipedia)
  9. 9. Stereo Panning • Only uses the amplitude differences • Relatively common in stereo audio tracks • Works with any source of audio (figure: L/R gains between 0 and 1 as a source moves along the line of sound)
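A minimal panning sketch for a mono signal. The slide only illustrates that the left/right gains trade off between 0 and 1; the constant-power (sin/cos) pan law used here is an assumption, chosen because it keeps perceived loudness roughly constant:

```python
import numpy as np

def pan_stereo(mono, pan):
    """Pan a mono signal to stereo.

    pan: 0.0 = full left, 0.5 = center, 1.0 = full right.
    Uses a constant-power (sin/cos) pan law.
    """
    angle = pan * np.pi / 2
    left = np.cos(angle) * mono
    right = np.sin(angle) * mono
    return np.stack([left, right], axis=0)

# Example: one second of a 440 Hz tone, panned halfway to the right.
t = np.linspace(0, 1, 44100, endpoint=False)
stereo = pan_stereo(np.sin(2 * np.pi * 440 * t), pan=0.75)
```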
  10. 10. Stereophonic Sound Recording • Use two microphones • The A-B technique captures differences in time-of-arrival • Other configurations work too, capturing differences in amplitude (e.g. the X-Y technique; images: Rode, Olympus, Wikipedia)
  11. 11. Stereophonic Sound Synthesis • Ideal case: the L and R channels are scaled & shifted Dirac peaks of the input signal • Shortcoming: many positions are identical
  12. 12. Stereophonic Sound Synthesis • In practice: the path lengths and scattering are more complicated, including scattering off the ear, shoulders, etc.
  13. 13. Head-Related Impulse Response (HRIR) • Captures temporal responses at all possible sound directions, parameterized by azimuth θ and elevation φ • Could also have a distance parameter • Can be measured with two microphones in the ears of a mannequin & speakers all around (Zhong and Xie, “Head-Related Transfer Functions and Virtual Auditory Display”)
  14. 14. Head-Related Impulse Response (HRIR) • CIPIC HRTF database: http://interface.cipic.ucdavis.edu/sound/hrtf.html • Elevation: –45° to 230.625°, azimuth: –80° to 80° • Need to interpolate between discretely sampled directions V. R. Algazi, R. O. Duda, D. M. Thompson and C. Avendano, "The CIPIC HRTF Database,” 2001
  15. 15. Head-Related Impulse Response (HRIR) • Storing the HRIR • Need one time series per ear for each location: hrir_L(t; θ, φ) and hrir_R(t; θ, φ) • Total of 2 × N_θ × N_φ × N_t samples, where N_θ, N_φ, N_t are the number of samples for azimuth, elevation, and time, respectively
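To make the storage cost concrete, a sketch assuming the CIPIC sampling (25 azimuths, 50 elevations, 200-sample impulse responses); those dimensions come from the CIPIC database documentation, not from the slide:

```python
import numpy as np

N_theta, N_phi, N_t = 25, 50, 200   # azimuths, elevations, samples per HRIR
# One array per ear, or equivalently a leading dimension of size 2.
hrir = np.zeros((2, N_theta, N_phi, N_t), dtype=np.float32)
print(hrir.size)  # 2 * 25 * 50 * 200 = 500,000 samples
```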
  16. 16. Head-Related Impulse Response (HRIR) Applying the HRIR: • Given a mono sound source s(t) and its 3D position
  17. 17. Head-Related Impulse Response (HRIR) Applying the HRIR: • Given a mono sound source s(t) and its 3D position 1. Compute (θ, φ) relative to the center of the listener’s head
  18. 18. Head-Related Impulse Response (HRIR) Applying the HRIR: • Given a mono sound source s(t) and its 3D position 1. Compute (θ, φ) relative to the center of the listener’s head 2. Look up the interpolated HRIRs hrir_L(t; θ, φ) and hrir_R(t; θ, φ) for the left and right ear at these angles
  19. 19. Head-Related Impulse Response (HRIR) Applying the HRIR: • Given a mono sound source s(t) and its 3D position 1. Compute (θ, φ) relative to the center of the listener’s head 2. Look up the interpolated HRIRs for the left and right ear at these angles 3. Convolve the signal with the HRIRs to get the sound at each ear: s_L(t) = hrir_L(t; θ, φ) ∗ s(t), s_R(t) = hrir_R(t; θ, φ) ∗ s(t)
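The three steps above as a time-domain NumPy sketch. The angle convention (azimuth measured in the x-y plane from +x, elevation from the horizontal plane) and the nearest-neighbor lookup are simplifying assumptions; a real renderer would interpolate between the measured directions as the slides note:

```python
import numpy as np

def nearest_hrir(hrir, azimuths, elevations, az, el):
    """Nearest-neighbor lookup; a real renderer would interpolate instead."""
    i = np.argmin(np.abs(azimuths - az))
    j = np.argmin(np.abs(elevations - el))
    return hrir[0, i, j], hrir[1, i, j]   # left, right impulse responses

def render_binaural(s, source_pos, head_pos, hrir, azimuths, elevations):
    """Render a mono signal s at a 3D position into a stereo (2, N) array."""
    # 1. Direction (azimuth, elevation) relative to the center of the head.
    d = source_pos - head_pos
    az = np.arctan2(d[1], d[0])
    el = np.arcsin(d[2] / np.linalg.norm(d))
    # 2. Look up the left/right HRIRs for that direction.
    h_l, h_r = nearest_hrir(hrir, azimuths, elevations, az, el)
    # 3. Convolve the source signal with each ear's impulse response.
    return np.stack([np.convolve(s, h_l), np.convolve(s, h_r)], axis=0)
```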
  20. 20. Head-Related Transfer Function (HRTF) • The HRTF is the Fourier transform of the HRIR! (you’ll find the term HRTF more often than HRIR) • Equivalent to the convolutions s_L(t) = hrir_L(t; θ, φ) ∗ s(t) and s_R(t) = hrir_R(t; θ, φ) ∗ s(t): s_L(t) = F⁻¹{hrtf_L(ω; θ, φ) · F{s(t)}}, s_R(t) = F⁻¹{hrtf_R(ω; θ, φ) · F{s(t)}}
  21. 21. Head-Related Transfer Function (HRTF) • The HRTF is the Fourier transform of the HRIR! (you’ll find the term HRTF more often than HRIR) • The HRTF is complex-conjugate symmetric (since the HRIR must be real-valued)
  22. 22. Spatial Sound of N Point Sound Sources • The superposition principle holds, so just sum the contributions of each source s_i(t) at direction (θ_i, φ_i): s_L(t) = Σ_{i=1..N} F⁻¹{hrtf_L(ω; θ_i, φ_i) · F{s_i(t)}}, s_R(t) = Σ_{i=1..N} F⁻¹{hrtf_R(ω; θ_i, φ_i) · F{s_i(t)}}
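A sketch of the frequency-domain version with superposition over N sources. It assumes each source comes with its left/right HRTFs already sampled on the same rFFT frequency grid of length n_fft // 2 + 1 (e.g. obtained by FFT-ing the looked-up HRIRs); that packaging is an illustrative choice, not something specified by the slides:

```python
import numpy as np

def render_sources(sources, hrtfs, n_fft):
    """Sum HRTF-filtered contributions of N mono sources into a (2, n_fft) array.

    sources: list of 1D arrays s_i(t)
    hrtfs:   list of (hrtf_l, hrtf_r) complex frequency responses, one pair per
             source direction (theta_i, phi_i), each of length n_fft // 2 + 1
    """
    out_l = np.zeros(n_fft)
    out_r = np.zeros(n_fft)
    for s, (hrtf_l, hrtf_r) in zip(sources, hrtfs):
        S = np.fft.rfft(s, n_fft)                  # F{s_i(t)}
        out_l += np.fft.irfft(hrtf_l * S, n_fft)   # F^-1{hrtf_L . F{s_i}}
        out_r += np.fft.irfft(hrtf_r * S, n_fft)   # F^-1{hrtf_R . F{s_i}}
    return np.stack([out_l, out_r], axis=0)
```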
  23. 23. Spatial Audio for VR • VR/AR requires us to re-think audio, especially spatial audio! • The user’s head rotates freely → traditional surround sound systems like 5.1 or even 9.2 aren’t sufficient
  24. 24. Spatial Audio for VR Two primary approaches: 1. Real-time sound engine • Render 3D sound sources via HRTF in real time, just as discussed in the previous slides • Used for games and synthetic virtual environments • A lot of libraries available: FMOD, OpenAL, etc.
  25. 25. Spatial Audio for VR Two primary approaches: 2. Spatial sound recorded from real environments • Most widely used format now: Ambisonics • Simple microphones exist • Relatively easy mathematical model • Only need 4 channels for starters • Used in YouTube VR and many other platforms
  26. 26. Ambisonics • Idea: represent sound incident at a point (i.e. the listener) with some directional information (θ, φ) • Using all angles is impractical – we would need too many sound channels (one for each direction) • Some lower-order (in direction) components may be sufficient → a directional basis representation to the rescue!
  27. 27. Ambisonics – Spherical Harmonics • Use spherical harmonics! • Orthogonal basis functions on the surface of a sphere, i.e. full-sphere surround sound • Think Fourier transform equivalent on a sphere
  28. 28. Ambisonics – Spherical Harmonics (figure: 0th, 1st, 2nd, and 3rd order harmonics; Wikipedia) Remember, these represent functions on a sphere’s surface
  29. 29. Ambisonics – Spherical Harmonics • 1st order approximation → 4 channels: W, X, Y, Z (Wikipedia)
  30. 30. Ambisonics – Recording • Can record 4-channel Ambisonics via special microphone • Same format supported by YouTube VR and other platforms http://www.oktava-shop.com/
  31. 31. Ambisonics – Rendered Sources • Can easily convert a point sound source, S, to the 4-channel Ambisonics representation • Given azimuth θ and elevation φ, compute W, X, Y, Z as: W = S · (1/√2) (omnidirectional, angle-independent component), X = S · cos θ cos φ (“stereo in x”), Y = S · sin θ cos φ (“stereo in y”), Z = S · sin φ (“stereo in z”)
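The encoding equations on this slide as a small function (angles in radians, following the traditional first-order B-format convention):

```python
import numpy as np

def encode_ambisonics(s, theta, phi):
    """Encode a mono signal s from direction (theta, phi) into W, X, Y, Z."""
    w = s * (1 / np.sqrt(2))            # omnidirectional component
    x = s * np.cos(theta) * np.cos(phi) # "stereo in x"
    y = s * np.sin(theta) * np.cos(phi) # "stereo in y"
    z = s * np.sin(phi)                 # "stereo in z"
    return w, x, y, z
```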
  32. 32. Ambisonics – Playing it Back • Easiest way to render Ambisonics: convert the W, X, Y, Z channels into 4 virtual speaker positions • For a regularly-spaced square setup (left-front LF, left-back LB, right-front RF, right-back RB), this results in: LF = (2W + X + Y)/√8, LB = (2W − X + Y)/√8, RF = (2W + X − Y)/√8, RB = (2W − X − Y)/√8
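And the corresponding square-layout decode, following the slide’s formulas; note that a purely horizontal speaker square ignores the height channel Z:

```python
import numpy as np

def decode_square(w, x, y, z):
    """Decode first-order Ambisonics to LF, LB, RF, RB virtual speaker feeds."""
    norm = 1 / np.sqrt(8)
    lf = (2 * w + x + y) * norm
    lb = (2 * w - x + y) * norm
    rf = (2 * w + x - y) * norm
    rb = (2 * w - x - y) * norm
    # z is unused: a horizontal square layout carries no height information.
    return lf, lb, rf, rb
```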
  33. 33. Ambisonics – Omnitone • JavaScript-based first-order Ambisonic decoder (Google, https://github.com/GoogleChrome/omnitone)
  34. 34. References and Further Reading • Google’s take on spatial audio: https://developers.google.com/vr/concepts/spatial-audio HRTF: • Algazi, Duda, Thompson, Avendano, “The CIPIC HRTF Database,” Proc. 2001 IEEE Workshop on Applications of Signal Processing to Audio and Electroacoustics • Download the CIPIC HRTF database here: http://interface.cipic.ucdavis.edu/sound/hrtf.html Resources by Google: • https://github.com/GoogleChrome/omnitone • https://developers.google.com/vr/concepts/spatial-audio • https://opensource.googleblog.com/2016/07/omnitone-spatial-audio-on-web.html • http://googlechrome.github.io/omnitone/#home • https://github.com/google/spatial-media/
  35. 35. Demo
