Color-plus-Depth Level-of-Detail in 3D Tele-immersive Video: A Psychophysical Approach
1. Color-plus-Depth Level-of-Detail in 3D Tele-immersive Video: A Psychophysical Approach
Wanmin Wu, Ahsan Arefin, Gregorij Kurillo,
Pooja Agarwal, Klara Nahrstedt, Ruzena Bajcsy
University of Illinois at Urbana-Champaign
University of California, Berkeley
ACM Multimedia 2011
2. Outline
Background (3D Tele-immersion)
Motivation
“Color-plus-Depth Level-of-Detail” (CZLoD)
Our Psychophysical Study on CZLoD
Perception-based reduction of CZLoD
Experimental Results
Conclusion
3. Background: 3D Video in Tele-immersion
[Diagram: two sites, Scottsdale and Urbana. Each site has 3D Capturing and 3D Visualization; the sites are connected by Video Streaming over the Internet.]
4. Challenge
Huge Computation Resource Demand: each 3D video frame taking too long to process
Poor 3D Video Performance: low frame rate, flickering/freezing effects
6. How can we improve 3D video performance?
Past approaches: system-centric (algorithmic optimization)
• 3D Capturing: Depth reconstruction [Wurmlin’03][Vasudevan’10]
• Video Streaming: Coordinated data transport protocol [Ott’02]
• 3D Visualization: rendering [Towles’02]
Our (orthogonal) approach: human-centric (psychophysics)
Reduces per-frame computation resource usage by up to 60%
AND significantly improves overall video quality
7. Inspiration
Raw Image: 108.5 KB; JPEG: 9.4 KB
Human vision is limited.
“Free” data reduction in tele-immersive video may be possible.
8. Is “Free” Data Reduction Possible in Tele-immersive Video?
HOW TELE-IMMERSIVE VIDEO IS GENERATED
[Diagram: 2D Capture (left and right images) → Background Subtraction (foreground) → Meshing → Depth Mapping; Depth + Color (Texture Mapping) = 3D Frame]
9. Is “Free” Data Reduction Possible in Tele-immersive Video?
HOW TELE-IMMERSIVE VIDEO IS GENERATED
Vertex pixels: accurate depth mapping, accurate texture mapping
Non-vertex pixels: linear interpolation (for both depth and texture)
A critical metric: number of vertex pixels
10. Is “Free” Data Reduction Possible in Tele-immersive Video?
METRIC: “Color-plus-Depth Level-of-Detail” (CZLoD)
= Number of vertices in mesh
Accuracy and density of color and depth maps
Computation resource usage
Examples: 22K vertices vs. 1K vertices
Our idea: keep CZLoD at a minimally necessary level
11. Is “Free” Data Reduction Possible in Tele-immersive Video?
HOW MUCH IS “MINIMALLY NECESSARY”
\[ \text{CZLoD Degradation Ratio} = 1 - \frac{\text{Number of vertices (in current video frame)}}{\text{Number of vertices (in baseline/best-quality frame)}} \]
Scale: 0%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%
Thresholds to find:
1. When degradation becomes noticeable
2. When degradation becomes unacceptable
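The ratio defined on this slide can be computed directly from two vertex counts. The following is a minimal sketch; the function name is hypothetical. Using the 22K and 1K counts from the previous slide as baseline and degraded frame is an assumption made only for illustration.

```python
def czlod_degradation_ratio(current_vertices: int, baseline_vertices: int) -> float:
    """Return 1 - current/baseline, following the definition on the slide."""
    if baseline_vertices <= 0:
        raise ValueError("baseline_vertices must be positive")
    if not 0 <= current_vertices <= baseline_vertices:
        raise ValueError("current_vertices must lie in [0, baseline_vertices]")
    return 1.0 - current_vertices / baseline_vertices


# Illustration only: treat the 22K-vertex frame as the baseline and the
# 1K-vertex frame as the degraded one (an assumption, not a stated pairing).
ratio = czlod_degradation_ratio(1_000, 22_000)
print(f"{ratio:.1%}")  # prints 95.5%
```

A ratio of 0% means the frame keeps every baseline vertex; higher values mean more of the mesh detail has been removed.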
12. Is “Free” Data Reduction Possible in Tele-immersive Video?
EXPERIMENTAL METHOD: Ascending Method of Limits (Psychophysics)
Clip pairs: baseline (best) vs. ~10% degraded, baseline (best) vs. ~20% degraded, …, baseline (best) vs. ~90% degraded
Voting:
1. Do you notice any difference in quality between the clips?
2. Do you feel any clip has an unacceptable quality?
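One ascending series of this kind can be sketched as a loop over increasing degradation levels that records the first level at which each question is answered "yes". This is a sketch under assumptions, not the study's actual procedure: the function names, the `respond` callback, and the decision to stop once a clip is judged unacceptable are all hypothetical, and the simulated viewer's thresholds are made up to exercise the code.

```python
from typing import Callable, Optional, Sequence, Tuple


def ascending_limits(
    levels: Sequence[float],
    respond: Callable[[float], Tuple[bool, bool]],
) -> Tuple[Optional[float], Optional[float]]:
    """Run one ascending series.

    levels:  degradation ratios in increasing order.
    respond: hypothetical stand-in for one viewer; given a level, returns
             (noticed_difference, found_unacceptable).
    Returns the first level at which each answer was "yes" (None if never).
    """
    first_noticed: Optional[float] = None
    first_unacceptable: Optional[float] = None
    for level in levels:
        noticed, unacceptable = respond(level)
        if noticed and first_noticed is None:
            first_noticed = level
        if unacceptable:
            first_unacceptable = level
            break  # assumption: the series ends once quality is unacceptable
    return first_noticed, first_unacceptable


def simulated_viewer(level: float) -> Tuple[bool, bool]:
    """Made-up viewer (not study data): notices at 50%, rejects at 80%."""
    return level >= 0.5, level >= 0.8


levels = [i / 10 for i in range(1, 10)]  # 10%, 20%, ..., 90%
result = ascending_limits(levels, simulated_viewer)
print(result)  # prints (0.5, 0.8)
```

Repeating the series for each viewer and aggregating the per-viewer values is one way to arrive at population-level thresholds; the aggregation rule is not specified on the slide.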
13. Is “Free” Data Reduction Possible in Tele-immersive Video?
RESULTS
[Chart: CZLoD Degradation Ratio, axis from 0% to 100%]
Just Noticeable Degradation Ratio: 70%
Just Unacceptable Degradation Ratio: 90%
“Free” data reduction in tele-immersive video is indeed feasible.
14. Overview of Our Approach
Understanding Perceptual Limits on 3D Tele-immersive Video → Applying the Perception Thresholds to System Development
15. Applying to System Development
MAIN IDEA
[Plot: Frame Rate (3 fps, 7 fps, 12 fps marked) vs. CZLoD Degradation Ratio (0%, 60%, 70%, 80%, 90%), with the Just Noticeable and Just Unacceptable thresholds marked]
16. Applying to System Development
ARCHITECTURE
[Diagram. Pipeline: 2D Capture, Bkgnd Subtraction → Meshing → 3D Depth Mapping. Control components: Controller, Decision Engine, QoS Monitor (monitors for abnormal frame rate, etc.). Labels on the connections: CZLoD Degradation Ratio, Control Parameter, Frame Rate, {unnoticeable, acceptable}.]
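The slide names the components but not their logic. The following is a hypothetical sketch of what a decision step could look like: raise the degradation ratio when the monitored frame rate is below a target, lower it otherwise, and never exceed a perceptual cap. The function name, the fixed step size, and the target frame rate are assumptions; reading {unnoticeable, acceptable} as a choice of cap (the just-noticeable or the just-unacceptable ratio) is also an interpretation, not something the slide states.

```python
def next_degradation_ratio(
    current_ratio: float,
    measured_fps: float,
    target_fps: float,
    max_ratio: float,
    step: float = 0.1,
) -> float:
    """Choose the CZLoD degradation ratio for the next frames.

    Hypothetical rule: step the ratio up when the frame rate is below the
    target, step it down otherwise, and clamp the result to [0, max_ratio].
    """
    if measured_fps < target_fps:
        proposed = current_ratio + step
    else:
        proposed = current_ratio - step
    return min(max(proposed, 0.0), max_ratio)


# Frame rate below target: degrade a little more.
print(next_degradation_ratio(0.2, measured_fps=5, target_fps=10, max_ratio=0.6))
# Already at the cap: stay there even though the frame rate is still low.
print(next_degradation_ratio(0.6, measured_fps=5, target_fps=10, max_ratio=0.6))
```

Clamping to `max_ratio` is what keeps the adaptation within the chosen perceptual limit regardless of how low the frame rate drops.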
17. Applying to System Development
SYSTEM EVALUATION
[Chart: Average Frame Rate Improvement (axis 0% to 160%, with 150% marked) vs. Average CZLoD Degradation Ratio (10% to 60%); regions labeled “imperceptible” and “perceptible”; “Free” Improvement marked]
18. Quality Adaptor
USER EVALUATION
Anonymous Crowdsourcing, 78 Users
Comparison Test: (Unimpaired) vs. (Adapted)
The Same: 3.80%
Slightly Better: 12.80%
Better: 51.30%
Much Better: 32.10%
20. Conclusion
Main Contribution
We introduce a psychophysical approach
for 3D tele-immersive video that reduces
per-frame computation resource usage by
up to 60% “for free” and significantly
improves overall perceived video quality.
Perception-based degradation of
Color-plus-Depth Level-of-Detail
21. Thank You
Wanmin Wu:
wanmin.wu@gmail.com
TEEVE Project:
http://cairo.cs.uiuc.edu/projects/teleimmersion
Editor's notes
Hi everybody. I’m Wanmin Wu. The title of my talk today is “…”. This is joint work with my many colleagues at Univ… and Univ
First, I’ll give a little background on 3D tele-immersion. I’ll then talk about the motivation for our work. Next, I’ll introduce a new concept called color-plus-depth level-of-detail (CZLoD) in tele-immersive video. I’ll then present our psychophysical study on CZLoD and our perception-based strategy for reducing CZLoD. I’ll present our experimental results to show how we vastly improve the performance of 3D tele-immersive video. Finally, I’ll conclude the talk.
So first, background: 3D video tele-immersion. What is tele-immersion? 3D tele-immersion is basically a multimedia technology that enables remote people to interact in a virtual space. Suppose there are two sites, Scottsdale and Urbana. In each site, we use an array of 3D cameras to capture the scene from different angles. The data are then exchanged on the Internet; and finally, the remote data and local data are visualized in a virtual-reality space. In this case, the two users can play a light saber game as if they were face-to-face.
That sounds like an awesome technology, right?! One can imagine a lot of applications for it. Unfortunately, the technology still faces a lot of practical challenges. One of the foremost challenges is the huge demand for computation resources. Basically, each 3D video frame takes too long to process, like 200 milliseconds! This is not desirable for real-time interaction. It leads to poor 3D video performance, such as a low frame rate (because each frame takes so long, we cannot produce many frames in one second). Also, some complicated frames can take even longer to process, leading to flickering and freezing effects.
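To make the link between per-frame processing time and frame rate concrete, here is a small arithmetic sketch. It assumes frames are processed strictly one after another with no pipelining or parallelism, which the talk does not state; the function name is hypothetical.

```python
def max_frame_rate(frame_time_ms: float) -> float:
    """Upper bound on frames per second when each frame takes frame_time_ms."""
    if frame_time_ms <= 0:
        raise ValueError("frame_time_ms must be positive")
    return 1000.0 / frame_time_ms


# The 200 ms per frame mentioned above caps the rate at 5 frames per second.
print(max_frame_rate(200))  # prints 5.0
```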
To illustrate, I would like to show a short video as an example. This video was recorded in a real 3D tele-immersive system. You can see the freezing… the flickering… in general, very bad video performance.
So how can we improve? There have been many past approaches that try to improve performance in different parts of the system, including…. Most of these approaches are system-centric, algorithmic optimization. At this point, we’re still facing performance challenges. In this work, we present a sort of orthogonal approach, which is human-centric. Our psychophysical approach can reduce… and significantly…
How did we do that? We got inspired by some old multimedia technology. So let me show you two pictures. How many of you think the two pictures have the same quality? Most of you do. In fact, the left image is a raw image, which takes over 100 kilobytes, and the right image is a JPEG image, which takes only about 1/10 of the size. But most people can’t tell any difference in their visual quality. The reason behind this is that human vision is limited. This implies that free data reduction in tele-immersive video may be possible. Remember that once we reduce the data in every video frame, the frame processing time will be reduced as well, because it directly depends on how much data there is in a frame.
So we want to find out whether free data reduction is indeed possible in tele-immersive video. To do this, let’s first understand how tele-immersive video is generated. What is its format? Basically, on this camera host computer, there’s this whole pipeline going on. First, 2D images are taken from different eyes of the camera, then, one of them is taken as a reference image to do background subtraction, then the foreground is partitioned into a polygon mesh; then importantly, depth mapping is done for the mesh vertices by correlating with the other 2D image, here, the right one. The depth map is finally combined with the texture or color map to produce a 3D video frame.
Now, this mesh representation is critical, because for vertex pixels, depth mapping and texture mapping are done accurately, which is computationally expensive. At the receiver side, the depth and texture information for the non-vertex pixels is just approximated by linear interpolation. You might wonder why this is so. Normally one would want to do depth mapping and texture mapping for every pixel, but this turns out to be very expensive, hurting real-time performance. The mesh is a good approximate representation. Because of the different treatments for vertices and non-vertices, the number of vertex pixels in the mesh is a critical metric, as it determines the accuracy and density of the depth and texture maps.
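The linear interpolation for non-vertex pixels can be illustrated with barycentric weights inside a single triangle. This is a sketch under assumptions: the talk says only "polygon mesh" and "linear interpolation", so the triangular faces, the barycentric formulation, and the function name are all choices made here for illustration.

```python
from typing import Tuple

Point = Tuple[float, float]


def interpolate_in_triangle(
    p: Point, a: Point, b: Point, c: Point,
    va: float, vb: float, vc: float,
) -> float:
    """Linearly interpolate a per-vertex value (e.g. depth) at pixel p.

    a, b, c are the triangle's vertex positions; va, vb, vc are the values
    known at those vertices. Weights are the barycentric coordinates of p.
    """
    (px, py), (ax, ay), (bx, by), (cx, cy) = p, a, b, c
    det = (by - cy) * (ax - cx) + (cx - bx) * (ay - cy)
    if det == 0:
        raise ValueError("degenerate triangle")
    wa = ((by - cy) * (px - cx) + (cx - bx) * (py - cy)) / det
    wb = ((cy - ay) * (px - cx) + (ax - cx) * (py - cy)) / det
    wc = 1.0 - wa - wb
    return wa * va + wb * vb + wc * vc


# Depth known only at the three vertices; estimate it halfway along edge a-b.
depth = interpolate_in_triangle(
    (2.0, 0.0), (0.0, 0.0), (4.0, 0.0), (0.0, 4.0), 1.0, 2.0, 3.0
)
print(depth)  # prints 1.5
```

With fewer vertices, each triangle covers more pixels, so more of the frame is filled by this kind of approximation rather than by accurately mapped values.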
So, we define this metric as “color-plus-depth level-of-detail”, which is essentially the number of vertices in the mesh. As I mentioned, it determines … So here are two examples of different CZLoD levels: the left one has 22K vertices, and the right one has 1K vertices. Those of you who sit closer can notice the face is particularly blurry in the right frame. So our basic idea is to see if it is necessary to maintain the original best CZLoD level. Do people notice it if we remove some of the details? Or in other words, as the title suggests, is free data reduction possible? We want to keep CZLoD at a minimally necessary level.
The next step is to find out how much is minimally necessary. First of all, CZLoD itself is the number of vertices in the mesh, and therefore it largely depends on the complexity of the scene. To make the metric comparable across videos, we use its degradation ratio in our study. Basically, it is 1-… What we want to do is gradually change this degradation ratio from 0% (which is the best quality possible) up to 90%, and see how human eyes react to that. We want to understand two perceptual thresholds: 1…2…
So we adapted some old psychophysical methodology as our experimental method. The basic idea is to…