Defense_20140625
1. Video Summarization in Video Sensor Networks
Presenter: Shun-Hsing Ou (歐順興)
Advisor: Dr. Shao-Yi Chien (簡韶逸)
Media IC & System Lab
Graduate Institute of Electronics Engineering
National Taiwan University
2. Video Sensor Network (1/2)
• Widely applied in our daily life
Media IC & System Lab Shun-Hsing Ou 2
Traffic, Security, Environment Monitoring
3. Video Sensor Network (2/2)
• The EYEs of Machine-to-Machine (M2M)
or Internet-of-Things (IoT)
Plenty of video sensor companies in M2M or
IoT applications shown in Computex 2014.
Goal-line Technology in
FIFA World Cup 2014
4. Problems
• Video data is usually very large
– Requires large storage space
– Requires high transmission bandwidth
• Watching video is usually time-consuming
5. Wireless Video Sensor Network (1/2)
• Streaming videos through wireless
communication
– No wires = more flexible
• Wider coverage
• Better view angles
6. Wireless Video Sensor Network (2/2)
• Power is the key
– Powered by
• Batteries
• Energy harvesting devices
– Streaming video consumes significant power
7. An efficient video management and filtering method is required
8. Redundancy of Video Data
• Video usually contains redundant data
– Repeated events
– Overlapping fields of view
9. Automatic Video Summarization
• Generating a short representation of the original video
• Providing an effective solution for video management
10. Our Idea
• Applying multi-view video summarization
in video sensor networks
– Saving storage space
– Saving transmission data
– Saving power
– Increasing usability
[Diagram: video sensor (sensor → summarization unit → encoder → transceiver) sends data to the server (analyzer), which returns info]
11. Contributions
• Propose to apply video summarization algorithms
in (wireless) video sensor networks
– Saving 60% ~ 90% storage space & transmission data
– Saving 50% ~ 80% power
– Increasing usability
• Propose an efficient video summarization
algorithm
– Multi-view
– Distributed
– On-line
• Implement a real wireless video sensor network with the summarization system
12. Outline
• Background
• Proposed summarization algorithm
• Experiments
• Implementations
• Conclusion
14. Requirements (1/2)
• Multi-view
• On-line
• Distributed
• Low-complexity
[Diagram: video sensor (sensor → summarization unit → encoder → transceiver) sends data to the server (analyzer), which returns info]
15. Requirements (2/2)
• 28 summarization methods were surveyed
– Only 4 on-line approaches
– Only 7 multi-view approaches
– No multi-view AND on-line approach
– Existing on-line approaches require large memory and computing
power
– Existing multi-view approaches are centralized
• As a result, a new summarization algorithm is required
[Chart: conferences and journals of the surveyed references: TMM 4, CVPR 5, ICIP 2, ACMMM 4, ICME 4, CSVT 1, ICCV 2, other 6]
17. System Structure
• Two-stage design
– Intra-view stage
– Inter-view stage
[Diagram: sensors 1-3 each run On-line Single-view Summarization (intra-view stage) followed by Content Matching & View Selection (inter-view stage); video and features flow to the server]
18. Intra-view Stage: Overview
• On-line single-view video summarization
– Clustering
• A common technique of video summarization
• Applied to reduce redundancy
– On-line clustering is applied in our system
[Diagram: input frame → feature extraction → on-line clustering (GMM: cluster 1 … cluster n) → frame selection → summarization]
19. Intra-view Stage: Feature Extraction
• A representative feature for each frame is required
• The MPEG-7 color layout descriptor is applied
– Simple
– Good representational ability
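The color-layout idea can be sketched in a few lines. The real MPEG-7 CLD downsamples the frame to an 8x8 grid, applies a DCT, and keeps low-frequency coefficients; this illustrative sketch stops at the grid averages (the DCT step is omitted), so `color_layout` is a simplification, not the thesis implementation.

```python
# Simplified color-layout feature inspired by the MPEG-7 Color Layout
# Descriptor: average each color channel over an 8x8 grid of blocks.
def color_layout(frame, grid=8):
    """frame: 2-D list of (r, g, b) tuples; returns grid*grid*3 averages."""
    h, w = len(frame), len(frame[0])
    feature = []
    for gy in range(grid):
        for gx in range(grid):
            y0, y1 = gy * h // grid, (gy + 1) * h // grid
            x0, x1 = gx * w // grid, (gx + 1) * w // grid
            pixels = [frame[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            n = max(len(pixels), 1)  # guard against empty blocks
            for c in range(3):  # average each channel over the block
                feature.append(sum(p[c] for p in pixels) / n)
    return feature
```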
20. Intra-view Stage: Clustering (1/2)
• Gaussian Mixture Model
– Each cluster has three parameters
• Mean
• Covariance
• Weighting
– At time t, the probability of each feature can be
represented as
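The equation itself did not survive extraction. With one weight, mean, and covariance per cluster, the standard Gaussian-mixture density at time t is usually written as (a reconstruction, not necessarily the slide's exact notation):

```latex
P(x_t) = \sum_{k=1}^{K} \omega_{k,t}\, \mathcal{N}\!\left(x_t;\ \mu_{k,t},\ \Sigma_{k,t}\right)
```

where x_t is the frame feature, K the number of clusters, and ω, μ, Σ the weight, mean, and covariance of each component.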
21. Intra-view Stage: Clustering (2/2)
• Parameter estimation
– EM is usually applied in off-line applications
– On-line estimation
• Step 1: Matching
• Step 2: Updating
α : pre-defined learning rate
M : 1 for matched component, 0 otherwise
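The match-and-update loop can be sketched as below, in the style of Stauffer-Grimson on-line background modelling; the learning-rate value, the 2.5-sigma matching rule, and the scalar (1-D) feature are illustrative assumptions, not the thesis' exact rules.

```python
# On-line GMM update sketch (Stauffer-Grimson style). Each cluster is a
# dict with scalar 'mean', 'var', 'weight' (1-D feature for brevity).
ALPHA = 0.05  # pre-defined learning rate (assumed value)

def online_update(clusters, x, match_sigmas=2.5):
    # Step 1: matching -- first cluster whose mean is within a few std devs
    matched = None
    for c in clusters:
        if abs(x - c['mean']) <= match_sigmas * c['var'] ** 0.5:
            matched = c
            break
    # Step 2: updating -- weights move toward M (1 if matched, else 0)
    for c in clusters:
        m = 1.0 if c is matched else 0.0
        c['weight'] = (1 - ALPHA) * c['weight'] + ALPHA * m
    if matched is None:
        # no match: spawn a new low-weight, high-variance cluster
        clusters.append({'mean': x, 'var': 10.0, 'weight': ALPHA})
    else:
        matched['mean'] = (1 - ALPHA) * matched['mean'] + ALPHA * x
        matched['var'] = (1 - ALPHA) * matched['var'] + ALPHA * (x - matched['mean']) ** 2
    total = sum(c['weight'] for c in clusters)
    for c in clusters:
        c['weight'] /= total  # keep weights a valid distribution
    return clusters
```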
22. Intra-view Stage: Frame Selection
• Using clustering parameters
– Low-weighting cluster: rare events
– High-variance cluster: high activity events
• Algorithm:
– Step 1: Sort clusters in ascending order by
– Step 2: Keep frames if
:pre-defined summarization rate
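The selection step might look like the sketch below. The slide's sort key and keep condition were lost in extraction, so this assumes clusters are visited in ascending weight order (rare events first) and frames are kept until the cumulative cluster weight reaches the summarization rate r.

```python
# Frame-selection sketch: keep frames from rare (low-weight) clusters
# first, stopping once the kept clusters' total weight reaches r.
def select_frames(clusters, r):
    """clusters: list of (weight, frame_ids); r: summarization rate in (0, 1]."""
    kept, cumulative = [], 0.0
    for weight, frame_ids in sorted(clusters, key=lambda c: c[0]):
        if cumulative >= r:
            break
        kept.extend(frame_ids)
        cumulative += weight
    return kept
```

With r = 1 every frame survives; a smaller r keeps only the rarest clusters.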
23. Intra-view Stage: Another Point of View (1/2)
• The difficulty of on-line summarization
– Partial Information
[Figure: an off-line process sees the whole video; an on-line process sees only the frames so far; an on-line process with memory limitation sees only a short window]
24. Intra-view Stage: Another Point of View (2/2)
• The Gaussian-Mixture-Model keeps the
information of previous frames
– A model for what is redundant and what is
active
• No frame buffer is required
25. Inter-view Stage: Overview
• View selection
• Distributed view selection
– Exchange features & scores between sensors
26. Inter-view Stage: Overview
• Step 1: Extract inter-view feature and score for
each frame
– Color Layout Descriptor is not suitable
• Step 2: Exchange features and scores with other
sensors
• Step 3: If there is a “matched” feature with higher
score, drop the current frame
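Step 3 can be sketched as follows; histogram intersection as the matching test and the 0.8 threshold are illustrative assumptions, since the slides do not specify the similarity measure.

```python
# Drop rule sketch: discard the current frame when any peer reported a
# matching feature with a higher score. Features are assumed to be
# normalized histograms (entries summing to 1).
def should_drop(my_feature, my_score, peers, match_thresh=0.8):
    """peers: list of (feature, score) pairs received from other sensors."""
    for feature, score in peers:
        overlap = sum(min(a, b) for a, b in zip(my_feature, feature))
        if overlap >= match_thresh and score > my_score:
            return True  # same content seen better elsewhere
    return False
```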
27. Inter-view Stage: Feature Extraction
• Step 1: Foreground mask
– By color layout feature & GMM
• Step 2: Extract an HSV histogram of the foreground pixels (H: 16, S: 2, V: 2 bins) as the inter-view feature
• Step 3: Mask size is used as the frame score
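Steps 2 and 3 can be sketched directly from the slide's bin counts. The pixel format (RGB in [0, 1]) and the bin indexing are assumptions for illustration.

```python
import colorsys

# Inter-view feature sketch: a 16x2x2 HSV histogram over foreground
# pixels, normalized by foreground size; the mask size doubles as score.
def interview_feature(pixels, mask):
    """pixels: list of (r, g, b) in [0, 1]; mask: parallel list of 0/1."""
    hist = [0.0] * (16 * 2 * 2)
    fg = 0
    for (r, g, b), m in zip(pixels, mask):
        if not m:
            continue  # Step 1's foreground mask decides which pixels count
        fg += 1
        h, s, v = colorsys.rgb_to_hsv(r, g, b)
        hb = min(int(h * 16), 15)  # 16 hue bins
        sb = min(int(s * 2), 1)    # 2 saturation bins
        vb = min(int(v * 2), 1)    # 2 value bins
        hist[(hb * 2 + sb) * 2 + vb] += 1
    if fg:
        hist = [c / fg for c in hist]
    return hist, fg  # (inter-view feature, frame score)
```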
30. Dataset (1/2)
• Three datasets are used
– BL-7F: 19 videos, 320 x 240, 30 FPS
– Office1: 4 videos, 640 x 480, 30 FPS
– Lobby1: 3 videos, 640 x 480, 30 FPS
1Yanwei Fu, et al., “Multi-view Video Summarization,” TMM 2010
31. Dataset (2/2)
• Ground truth
– People with no knowledge of our project were asked to mark the time periods of events in each video
– They were also asked to flag pairs of segments from different views that show the same event
33. Intra-view Stage: Evaluation
• Single-View Video Summarization
– Frame level precision & recall are applied
• Precision: the ability of the algorithm to remove
useless content
• Recall: the ability of the algorithm to keep important
events
34. Intra-view Stage: Baseline
• Tree-based1
– D = 30
– D = 90
• Compressed domain2
1Víctor Valdés, et al., “Binary Tree Based On-line Video Summarization,” TVS 2008
2J. Almeida, et al., “Online Video Summarization on Compressed Domain,” JVCIR 2012
37. Inter-view Stage: Evaluation
• Multi-View Video Summarization
– Cross-view redundant frames are counted as false positives
38. Inter-view Stage: Baseline
• Baseline
– Concatenate the results of single-view
methods
• Tree-based
• Compressed domain
• The proposed GMM
– Graph-based1
• The results are provided by the authors
1Yanwei Fu, et al., “Multi-view Video Summarization,” TMM 2010
41. Complexity (1/2)
• Tested on EeePC
– CPU: ATOM N570
– RAM: 2GB
• Dataset: Office
– 640 x 480
• All methods are implemented using C++
42. Video Skimming: Complexity (2/2)
                   Tree-Based   Tree-Based   Compressed   GMM
                   D=30         D=90         Domain
FPS (f/s)          21.8         18.8         9.3          34.7
Latency (s)        30           90           ~200         ~0
# Buffered Frames  900          2700         ~6000        1
Memory             > 414.7 MB   > 1244.1 MB  > 2764.8 MB  474.6 KB
44. Power Analysis
• We compare the power consumption
– With/Without summarization
• Platform: EeePC
– Battery power is measured
– DVC is applied as the encoder
1 S.-Y. Chien, et al., “Power consumption analysis for distributed video sensors in machine-to-machine networks,” JETCAS 2013
45. Without Summarization
• Total power
– Encoding power (Pc)
– Transmission power (Pt)
[Diagram: wireless video sensor (sensor → encoder → transceiver) sends data to the server (analyzer), which returns info]
46. With Summarization
• Total power
– Encoding power (Pc)
– Video transmission power (Pt)
– Feature transmission power (Pf)
– Summarization power (Ps)
[Diagram: video sensor with a summarization unit between the sensor and the encoder; data to the server (analyzer), info back]
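The trade-off these two slides describe reduces to a one-line comparison; the function below computes the fractional saving, and the numbers in the test are made up for illustration.

```python
# Power accounting sketch: the baseline pays encoding + transmission; the
# summarized pipeline adds feature-transmission and summarization power
# but encodes and transmits far less video.
def power_saving(pc, pt, pc_s, pt_s, pf, ps):
    """All arguments in mW; returns the fractional saving vs. the baseline."""
    baseline = pc + pt                  # without summarization: Pc + Pt
    summarized = pc_s + pt_s + pf + ps  # with summarization: Pc + Pt + Pf + Ps
    return 1.0 - summarized / baseline
```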
47. [Chart: power (mW) on BL-7F, processor-based, for DVC, DVC + intra-view stage, and DVC + inter-view stage; stacked components: Pc (encoding), Pt (transmission), Ps (summarization), Pf (feature transmission); overall saving 73.5%]
49. Implementation
• We use Raspberry Pi boards to implement our wireless video sensor network
50. Raspberry Pi
• Spec
– SoC: Broadcom BCM2835
– CPU: 700 MHz ARM11
– GPU: Broadcom VideoCore IV @ 250 MHz
– Memory: 512 MB
– Power: 5V x 700mA = 3.5W
• Related I/O
– 5V Micro USB power input
– Two USB I/O
– Camera Serial Interface (CSI)
52. Video Acquisition and Encoding (1/2)
• We need raw RGB from the camera module
– Color space conversion is slow
• We need to encode video after
summarization
– Encoding is a high-complexity task
53. Video Acquisition and Encoding (2/2)
• Hardware Acceleration: Broadcom
VideoCore IV
– Hardware camera pipeline
– Hardware H.264 encoder/decoder
– OpenMAX API
54. Synchronization
• Network Time Protocol (NTP)
– Error may be large across network domains (>100 ms)
– Error is small on a local network (<1 ms)
• We run an NTP server on our own server
58. Conclusion
• In this thesis, we propose to apply summarization in video sensor networks
– Saving 60% ~ 90% storage space & transmission
data
– Saving 50% ~ 80% power
• A distributed on-line multi-view
summarization algorithm is proposed
– Low complexity, low memory requirement
– Generates results comparable to existing methods
• A wireless video sensor network is
implemented to validate the concept
60. Appendix: Proposed System II -
Distributed On-line Multi-view Keyframe
Extraction
61. Representation of Video Summarization (1/3)
• Video Skimming: A short video highlight
– More enjoyable to watch
– Better for further vision processing
• Keyframe Extraction: Representative
keyframes
– More compact representation
– Better for video browsing, surveillance, etc.
62. Representation of Video Summarization (2/3)
• Storyboard: Arranged keyframes
• Fast-forward: smart video player
• Video Synopsis: Retargeting in time
domain
1Y. Pritch, et al., “Webcam Synopsis: Peeking Around the World,” ICCV 2007
63. Representation of Video Summarization (3/3)
• “Video skimming” and “Keyframe
extraction” are better for video sensor
networks
– The results are more suitable for other vision
processing
– We focus on data filtering instead of summary
representation
64. Video-MMR1 (1/2)
• Video maximum marginal relevance
• Iterative algorithm
– Selects the frame with the maximum Video-MMR score, one at a time
1Yingbo Li, et al., “Multi-video Summarization Based on Video-MMR,” WAMIAS 2010
f : a frame
V : the set of all frames
S : the frames already in the summary
(first term: representativeness; second term: redundancy)
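The selection criterion itself was lost in extraction. Writing f for a candidate frame, V for the set of all frames, and S for the current summary, Video-MMR is commonly given as (a reconstruction after Li & Merialdo; the thesis' notation may differ):

```latex
f^{*} = \arg\max_{f \in V \setminus S}
\Big[\, \lambda\, \mathrm{Sim}_1\big(f,\ V \setminus (S \cup \{f\})\big)
\;-\; (1-\lambda) \max_{g \in S} \mathrm{Sim}_2(f, g) \,\Big]
```

The first term rewards representativeness of the unselected video; the second penalizes similarity to frames already in the summary; λ balances the two.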
65. Video-MMR1 (2/2)
• Centralized algorithm
• Off-line algorithm
1Yingbo Li, et al., “Multi-video Summarization Based on Video-MMR,” WAMIAS 2010
66. Distributed On-line Video-MMR (1/2)
• Perform the selection once every fixed time period T
– The set of frames captured from t to t + T is used instead of the full frame set
– Avoids buffering all frames
• If there are M cameras
– We change the MMR criterion to a multi-view form
68. Distributed On-line Video-MMR (3/3)
• The first term can be calculated at each sensor
• The second term can be calculated by sending all features of the current summary from the server to the sensors
– Large data overhead
• We send frames as
69. Data Overhead
• There is a large data overhead if we want to send all features belonging to the summary to all sensors
• MsWAVE1 is applied
– MsWAVE is a distributed kNN/kFN algorithm
– MsWAVE greatly reduces the amount of data exchanged
1J.-P. Wang, et al., “Communication-efficient distributed multiple reference pattern matching for M2M systems,” ICDM 2013
70. MsWAVE
• Distributed kNN/kFN search algorithm
between a group of sensors and a server
• The Haar transform is applied to generate coarse-level features
– Upper and lower bounds are estimated using the coarse features
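One level of the coarse-feature construction can be sketched as below; the unnormalized averaging variant is an assumption, and MsWAVE's actual bound estimation is not reproduced here.

```python
# One Haar level: pairwise averages form the coarse feature, pairwise
# differences are the detail discarded at that level.
def haar_step(x):
    """x: even-length list of numbers; returns (averages, details)."""
    averages = [(x[i] + x[i + 1]) / 2 for i in range(0, len(x), 2)]
    details = [(x[i] - x[i + 1]) / 2 for i in range(0, len(x), 2)]
    return averages, details
```

Repeating the step on the averages yields progressively coarser features.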
73. Keyframe Extraction: Baseline
• Single-view
– Uniform sampling (US)
– Random sampling (RS)
– Visual attention based1 (VA)
• Multi-view
– MMR2
– K-means (KM)
1Y.-F. Ma, “A Generic Framework of User Attention Model and Its Application in Video Summarization,” TMM 2005
2Yingbo Li, et al., “Multi-video Summarization Based on Video-MMR,” WAMIAS 2010
74. Keyframe Extraction: Extra Data
• Since keyframes are much smaller than a video skim
– The extra data becomes relatively large
• We compare the extra data against the centralized method, in which features of all frames are sent
75.
                    Single-view        Multi-view
                    RS    US    VA     KM    MMR   Ours
BL-7F (19 videos)
  Keyframe          77    77    82     77    77    77
  Recall (%)        22    30    74     74    67    74
  Redundant Frame   1     3     64     38    36    32
  Data Sent (%)     0     0     0      100   100   33
Office (4 videos)
  Keyframe          94    94    116    94    94    94
  Recall (%)        13    18    52     52    66    63
  Redundant Frame   2     0     44     45    38    21
  Data Sent (%)     0     0     0      100   100   26
Lobby (3 videos)
  Keyframe          70    70    117    70    70    70
  Recall (%)        66    63    72     72    70    76
  Redundant Frame   8     11    69     29    28    14
  Data Sent (%)     0     0     0      100   100   16
77. On-line Summarization (1/3)
• Tree-based Method1
– Type: video skimming
– Method:
• On-line decision tree
– Cons
• Long latency
• Large memory required
1Víctor Valdés, et al., “Binary Tree Based On-line Video Summarization,” TVS 2008
78. On-line Summarization (2/3)
• Summarization in the compressed domain1
– Type: video skimming
– Method
• On-line shot detection: calculates differences between frames
• Redundancy removal
– Cons
• Long latency
• Large memory required
1J. Almeida, et al., “Online Video Summarization on Compressed Domain,” JVCIR 2012
79. On-line Summarization (3/3)
• Visual Attention Model1
– Type: keyframe
– Method
• Visual attention index
• Attention curve peak detection
– Cons
• Not able to remove redundant frames
1Y.-F. Ma, “A Generic Framework of User Attention Model and Its Application in Video Summarization,” TMM 2005
80. Multi-view Summarization (1/2)
• Clustering1
– Type: video skimming
– Method
• Shot detection
• Graph
• Clustering
– Cons
• Centralized
• High-complexity
1Yanwei Fu, et al., “Multi-view Video Summarization,” TMM 2010
81. Multi-view Summarization (2/2)
• MMR1
– Type: keyframe extraction
– Method:
• Video maximum marginal relevance
– Cons
• Centralized
• Large memory required
1Yingbo Li, et al., “Multi-video Summarization Based on Video-MMR,” WAMIAS 2010
83. Video Skimming
• The result is like video skimming
– Parameter updating is smooth
84. Tree-based, D=30
85. Tree-based, D=90
86. Compressed Domain
87. The Proposed GMM Approach
88. Video Skimming: Packet Loss
• Dataset: BL-7F
• Each sensor has a uniform probability of failing to receive a feature
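A toy model of that experiment: if a peer's feature is lost with probability p, the matching frame cannot be dropped and stays in the summary. The simulation below is illustrative only, not the thesis' experiment.

```python
import random

# Each redundant frame is dropped only if the peer feature arrives; with
# loss probability p the frame is kept, so redundancy grows with p.
def simulate_kept_redundant(n_frames, p, seed=0):
    rng = random.Random(seed)  # seeded for reproducibility
    kept = 0
    for _ in range(n_frames):
        if rng.random() < p:  # feature lost -> frame kept although redundant
            kept += 1
    return kept
```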
89. Platform
• Processor-based
– EeePC
– Battery power is measured
• ASIC-based1
– Transmission power is
estimated
– H.264 power is estimated
– Summarization power is
estimated
1 S.-Y. Chien, et al., “Power consumption analysis for distributed video sensors in machine-to-machine networks,” JETCAS 2013
90. [Chart: power (mW) on BL-7F, ASIC-based, for no motion, DVC, DVC + intra stage, and DVC + inter stage; stacked components: Pc (encoding), Pt (transmission), Ps (summarization), Pf (feature transmission); overall saving 83.4%]
93. Communication Issues
• Feature broadcasting
– Only need to broadcast to nearby sensors
• Communication latency
– An additional buffer is needed
• Synchronization
– Clocks of all sensors are synchronized
94. Wireless Video Sensor Network
• Connected by a single Wi-Fi AP
95. Communication Channel
• Each sensor opens three TCP channels to the server
– Video Channel: Streaming video
– Feature Channel: Exchanging features
– Control Channel: Control signals, time
information
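The sensor-side setup might look like the sketch below; the port numbers in `CHANNEL_PORTS` are placeholders, since the slides do not specify them.

```python
import socket

# Three TCP channels per sensor, one per role; port numbers are assumed.
CHANNEL_PORTS = {'video': 9000, 'feature': 9001, 'control': 9002}

def open_channels(server_host):
    """Connect one TCP socket per channel and return them by name."""
    channels = {}
    for name, port in CHANNEL_PORTS.items():
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.connect((server_host, port))
        channels[name] = sock  # e.g. channels['video'].sendall(frame_bytes)
    return channels
```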