© 2019 Pathpartner Technology
Using Deep Learning for Video Event Detection on a Compute Budget
Praveen Nayak
Pathpartner Technology
May 2019
Outline
• Introduction to Video Event Detection
• Learning Representations from Video
• From Video Representation to Event Detection
• Decoupling the “When” and “What”
• Results, case study on UCF101-Thumos2015 challenge
• Conclusion
Introduction to Video Event Detection
Video data as viewed by ML
• Video is a 3D signal
• Spatial coordinates x, y (limited by W x H)
• Temporal coordinate t (limited by T)
• If we fix t, we obtain an image/frame
• We can understand a video as a sequence of images
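The view of video as a 3D signal can be illustrated with a small array sketch. The sizes below are made up for illustration, and a single-channel (grayscale) video is assumed so that the array has exactly the three axes t, y, x.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
T, H, W = 16, 128, 171                        # frames, height, width

# A grayscale video as a 3D signal indexed by (t, y, x).
video = np.zeros((T, H, W), dtype=np.uint8)

# Fixing t yields a single image/frame of size H x W.
frame = video[5]

# The same video viewed as a sequence of T images.
frames = [video[t] for t in range(T)]
```

A colour video would add a fourth channel axis; that extension is not shown here.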
Event Detection
• Retrieve the start (t_start) and end (t_end) points of an “event” from a temporally “untrimmed” video
• Evaluation metric: mAP and recall at a given temporal IoU (tIoU)
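$\mathrm{tIoU} = \frac{n_I}{n_U}$, where $n_I$ is the temporal intersection and $n_U$ the temporal union of the predicted and ground-truth segments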
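The tIoU ratio can be computed directly from segment boundaries. The sketch below assumes segments are given as (start, end) frame indices with an exclusive end; the function name and that convention are illustrative choices, not part of the slides.

```python
def temporal_iou(pred, gt):
    """tIoU of two segments given as (t_start, t_end), with t_end exclusive."""
    p_start, p_end = pred
    g_start, g_end = gt
    # n_I: length of the overlap, zero when the segments are disjoint.
    n_i = max(0, min(p_end, g_end) - max(p_start, g_start))
    # n_U: total length covered by either segment.
    n_u = (p_end - p_start) + (g_end - g_start) - n_i
    return n_i / n_u if n_u > 0 else 0.0
```

A prediction would typically count as correct when its tIoU with a ground-truth event reaches the chosen cut-off, such as the 0.2 and 0.4 values used in the results tables.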
Learning Representations from Video
Spatiotemporal fusion networks
Image: Kim et al., Weighing classes and streams: toward better methods for two-stream convolutional networks
Convolutions for spatiotemporal data
• C3D model: all convolutions are 3D, with fewer parameters than 2D convolutions over multiple frames
[Figure: C3D architecture; 3D convolutions give joint appearance and motion features at every layer and produce the C3D feature vector]
Image: D. Tran et al., Learning Spatiotemporal Features with 3D Convolutional Networks
State-of-the-art video descriptors
Image: J. Carreira et al., Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset
Action Classification Datasets
Dataset        Action classes   # Clips   Temporal trimming
HMDB-51 [2]    51               ~7k       Yes
UCF-101 [1]    101              ~13k      Yes
Kinetics [3]   400              ~160k     No
• Classification metric: mAP, as in image-based classification, extended to the temporal domain
• The classifier makes one decision per clip
[Figure panels: UCF-101, HMDB-51, Kinetics]
[1] UCF-101, University of Central Florida, https://www.crcv.ucf.edu/data/UCF101.php
[2] HMDB-51, Brown University, http://serre-lab.clps.brown.edu/resource/hmdb-a-large-human-motion-database/
[3] Kinetics dataset, DeepMind, https://deepmind.com/research/open-source/open-source-datasets/kinetics/
State-of-the-art video descriptors
Image and Table: J. Carreira et al., Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset
From Video Representation to Event Detection
From RCNN to Segment-CNN
Image: Z. Shou, Temporal Action Localization in Untrimmed Videos via Multi-stage CNNs
Computationally very intensive!
TAL-NET: Parallels from Faster R-CNN
• Problem split: class-sensitive proposal generation, followed by inference
• End-to-end trainable
[Figure: Faster R-CNN and TAL-NET architectures side by side]
Image: Y. Chao et al., Rethinking the Faster R-CNN Architecture for Temporal Action Localization
Computational cost of Detection
Table: GMACs per video of event detectors
Model          GMAC/inference   #params (million)   GMAC/video
VGG            15               138                 46.5k
C3D-SCNN [1]   79               80                  237k
C3D-LSTM [2]   24               86                  72k
TAL-NET [3]    29               98                  87k
SSAD [4]       61               356                 183k
• Template size:
1. C3D, TAL-NET: 171x128x16
2. VGG: 224x224x3
• GMAC/inference: may refer to frame-level inference (VGG) or clip-level inference (C3D)
• GMAC/video: assumes an average video length of 3000 frames
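The GMAC/video figures for the C3D-based detectors and SSAD appear to equal GMAC/inference multiplied by the assumed 3000 frames, which amounts to one inference per frame. That relationship is inferred from the numbers in the table, not stated in the slides, and the helper below is a hypothetical illustration of it.

```python
def gmac_per_video(gmac_per_inference, num_frames=3000, inferences_per_frame=1.0):
    """Total GMACs for one video, assuming a fixed inference rate per frame."""
    return gmac_per_inference * num_frames * inferences_per_frame

# GMAC/inference values copied from the table.
per_inference = {"C3D-SCNN": 79, "C3D-LSTM": 24, "TAL-NET": 29, "SSAD": 61}
per_video = {name: gmac_per_video(g) for name, g in per_inference.items()}
```

The VGG entry (46.5k) is slightly higher than 15 x 3000 = 45k, presumably because the per-inference figure in the table is rounded.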
[1] Temporal Action Localization in Untrimmed Videos via Multi-stage CNNs
http://dvmmweb.cs.columbia.edu/files/dvmm_scnn_paper.pdf
[2] Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks
https://imatge-upc.github.io/activitynet-2016-cvprw/
[3] Rethinking the Faster R-CNN Architecture for Temporal Action Localization
https://arxiv.org/pdf/1804.07667.pdf
[4] Single Shot Temporal Action Detection https://arxiv.org/abs/1710.06236
[Figure: accuracy vs. GMACs/video of event detectors]
Decoupling the “When” and “What”
Proposed event detection system
• Break down the problem into two parts:
1. Class-agnostic segment proposal (low complexity)
2. Video segment inference (can be high complexity, depending on the nature of the video)
• The inference model may be long- or short-term temporal, and is called on demand
• Characteristics of a good segment proposal:
  • Caters to arbitrary event lengths
  • Discriminates events from background, irrespective of the event type
  • Needs to run on every frame, so it must be low in complexity
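The two-part split can be sketched as a loop in which a cheap per-frame gate decides when the expensive model is called. The function and parameter names below (`detect_events`, `is_event_frame`, `classify_segment`) are hypothetical placeholders, not the presenter's implementation.

```python
def detect_events(frames, is_event_frame, classify_segment):
    """Run a cheap gate on every frame; classify only the flagged runs.

    is_event_frame(frame) -> bool      low-complexity, class-agnostic proposal
    classify_segment(start, end) -> label   expensive model, called on demand
    Segments are reported as (start, end, label) with an exclusive end index.
    """
    detections = []
    start = None
    num_frames = 0
    for t, frame in enumerate(frames):
        num_frames = t + 1
        if is_event_frame(frame):
            if start is None:
                start = t                      # a new candidate segment opens
        elif start is not None:
            detections.append((start, t, classify_segment(start, t)))
            start = None                       # the segment closes at frame t
    if start is not None:                      # segment still open at the end
        detections.append((start, num_frames, classify_segment(start, num_frames)))
    return detections
```

Because the classifier runs once per proposed segment and not once per frame, total compute depends on how many segments the gate lets through.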
Class agnostic segment proposal
• Formulate the problem as unsupervised “anomaly detection”
• Train a model to represent background, so that events register as anomalies
• At deployment, predict a binary label: has an anomaly occurred or not?
[Figure: an anomaly detector is applied to successive video clips along time t; each application yields a Yes/No result]
Training an anomaly detector
Video Autoencoder Framework
[Figure: frames Y_{t-1}, Y_t, Y_{t+1}; convolutional encoder E (2D-CNN, space); encoder and decoder Conv-LSTMs with memory (time); convolutional decoder D (2D-CNN); output Y'_{t+1} compared with Y_{t+1} to give the error]
• Sparse, low-dimensional encoding
• “Learn to Represent Background”
Error: $e_t = \lVert Y'_{t+1} - Y_{t+1} \rVert_2$
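The error term of the autoencoder framework can be sketched in a few lines. This assumes the error is the L2 norm of the difference between the predicted frame Y'_{t+1} and the actual frame Y_{t+1}; the formula is garbled in the extracted slides, so a squared norm is equally possible. The function name is hypothetical.

```python
import numpy as np

def prediction_error(predicted, actual):
    """L2 norm of the pixel-wise difference between two frames (assumed form)."""
    diff = predicted.astype(np.float64) - actual.astype(np.float64)
    # Flatten so that the norm is the Euclidean norm over all pixels.
    return float(np.linalg.norm(diff.ravel(), ord=2))
```

A model trained only on background should give small errors on background frames and larger errors once an event starts.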
Deployment of anomaly detector
• When an event occurs, reconstruction is poor
• The anomaly decision is based on the “regularity score”
• r_min and r_max are derived on the validation set
• 0.8 GMACs/frame, template: 171x128
[Figure: video autoencoder framework]
Regularity score: $R_t = 1 - \frac{r_t - r_{\min}}{r_{\max}}$
$R_t \approx 1$: higher likelihood of background
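The regularity score can be computed for a whole sequence at once. The sketch assumes r_t is the per-frame reconstruction error, which the slides do not define explicitly, and that r_min and r_max have already been measured on a validation set; the function name is hypothetical.

```python
import numpy as np

def regularity_scores(errors, r_min, r_max):
    """R_t = 1 - (r_t - r_min) / r_max for every frame error r_t."""
    errors = np.asarray(errors, dtype=np.float64)
    return 1.0 - (errors - r_min) / r_max
```

Under this formula a frame whose error equals r_min scores 1, and a frame whose error equals r_max scores r_min / r_max, the lowest value seen on the validation set.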
Event detection pipeline
[Figure: regularity score plotted over time t; C3D is invoked on the proposed segments, each of which is labelled CleanAndJerk]
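One way to turn a regularity-score trace into segments for the classifier is to group consecutive frames whose score falls below a threshold. This grouping rule is an assumption made for illustration; the slides do not spell out how frames are merged into segments, and `propose_segments` is a hypothetical name.

```python
import numpy as np

def propose_segments(scores, threshold):
    """Return (start, end) frame ranges, end exclusive, where score < threshold."""
    flagged = np.asarray(scores) < threshold
    # Pad with False so runs touching either end of the video are closed.
    padded = np.concatenate(([False], flagged, [False]))
    changes = np.diff(padded.astype(np.int8))
    starts = np.flatnonzero(changes == 1)     # False -> True transitions
    ends = np.flatnonzero(changes == -1)      # True -> False transitions
    return [(int(s), int(e)) for s, e in zip(starts, ends)]
```

Each returned range would then be passed to the clip-level model (C3D in the slides) for classification.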
Choosing threshold for characterizing anomalies
• Choosing the threshold R is critical for accuracy and complexity
• Large values of R (≈ 1) → more false positives, better recall
• Small values of R (≈ r_min/r_max) → fewer false positives, but more missed events (lower recall)
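The effect of the threshold can be illustrated by counting how many frames are flagged as R varies. The scores below are made-up values, and treating frames with a score below R as event candidates is an assumption consistent with scores near 1 indicating background.

```python
import numpy as np

# Made-up regularity scores for eight frames (illustration only).
scores = np.array([0.95, 0.90, 0.35, 0.25, 0.85, 0.55, 0.45, 0.92])
thresholds = [0.3, 0.4, 0.5, 0.6, 0.7]

# Number of frames flagged as event candidates at each threshold.
flagged_counts = [int((scores < r).sum()) for r in thresholds]
```

The count never decreases as R grows, so a larger threshold sends more frames to the expensive inference model.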
[Figure: predicted (Pred) vs. ground-truth (GT) segments for three threshold values R, showing a false detection and a missed event]
Results – F1 score vs complexity on Thumos ’15
• Thumos ’15 challenge, conducted as a CVPR ’15 workshop
• Subset of UCF101; only 20 human action classes have event labels in untrimmed videos
• Evaluation metric: average recall and mAP at a given tIoU
Table: Effect of Threshold R on Accuracy/Compute
Threshold (R)   #frame proposals   mAP (0.4 tIoU)   Recall (0.4 tIoU)   GMAC/video (avg)
0.3             10978              0.43             0.19                15.5k
0.4             16505              0.33             0.21                22.2k
0.5             18048              0.31             0.39                24.1k
0.6             23486              0.17             0.40                30.5k
0.7             30107              0.11             0.41                38.5k

Table: Comparison with Event Proposal methods
Model         #Event proposals (avg)   Recall (0.2 tIoU)   GMAC/video (avg)
TAL-NET       200                      0.51                87k
TORNADO [1]   30                       0.63                46.8k
Ours          84                       0.421               13.9k
TORNADO is a better event proposal, at the cost of additional compute
[1] TORNADO: A Spatio-Temporal Convolutional Regression Network for Video Action Proposal
http://www.ntu.edu.sg/home/shijian.lu/Publicationss
Analysis of effect of Segment proposals
No proposals, inference at single temporal scale
Missed detections for a class are likely to be false positives for another!
Analysis of effect of Segment proposals
With proposals, inference at single temporal scale
Reduced confusion between events → fewer false positives, but a decrease in recall for some events. Also, an increase in confusion against background.
Computational Cost of Detection
Table: GMACs per video of event detectors
Model                           GMAC/inference (max)   #params (million)   GMAC/video
VGG                             15                     138                 46.5k
C3D-SCNN [1]                    79                     80                  237k
C3D-LSTM [2]                    24                     86                  72k
TAL-NET [3]                     29                     98                  87k
SSAD [4]                        61                     356                 183k
TORNADO [5]                     32                     90.5                46.8k
Ours (ConvLSTM-AD + C3D)        24                     81                  13.9k
[Figure: accuracy vs. GMACs/video of event detectors; annotation: segment proposals]
[1] Temporal Action Localization in Untrimmed Videos via Multi-stage CNNs
http://dvmmweb.cs.columbia.edu/files/dvmm_scnn_paper.pdf
[2] Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks
https://imatge-upc.github.io/activitynet-2016-cvprw/
[3] Rethinking the Faster R-CNN Architecture for Temporal Action Localization
https://arxiv.org/pdf/1804.07667.pdf
[4] Single Shot Temporal Action Detection https://arxiv.org/abs/1710.06236
[5] TORNADO: A Spatio-Temporal Convolutional Regression Network for Video Action Proposal
http://www.ntu.edu.sg/home/shijian.lu/Publicationss
Summary
• Event detection in video with joint spatiotemporal features is computationally expensive
• Decoupling the models that make inferences in the spatial and long-term temporal modalities can effectively reduce the overall GMACs/video
• On systems with multiple compute units, decoupling logically separates the algorithm into a part that must run at a high rate and a part that is called only on demand, which enables heterogeneous compute
• The accuracy/complexity trade-off is controlled by the segment proposals
Thank You
Example of Resource Slide
Technical Papers
[1] Learning Spatiotemporal Features with 3D Convolutional Networks
https://arxiv.org/pdf/1412.0767.pdf
[2] Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks
https://imatge-upc.github.io/activitynet-2016-cvprw/
[3] Rethinking the Faster R-CNN Architecture for Temporal Action Localization
https://arxiv.org/pdf/1804.07667.pdf
Embedded Vision Summit
“Using Deep Learning for Video Event Detection on a Compute Budget”

“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...Edge AI and Vision Alliance
 
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...Edge AI and Vision Alliance
 
“Updating the Edge ML Development Process,” a Presentation from Samsara
“Updating the Edge ML Development Process,” a Presentation from Samsara“Updating the Edge ML Development Process,” a Presentation from Samsara
“Updating the Edge ML Development Process,” a Presentation from SamsaraEdge AI and Vision Alliance
 
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...Edge AI and Vision Alliance
 
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...Edge AI and Vision Alliance
 
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...Edge AI and Vision Alliance
 
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...Edge AI and Vision Alliance
 

Más de Edge AI and Vision Alliance (20)

“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
 
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...
 
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
 
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
 
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
 
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
 
“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...
 
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
 
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
 
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
 
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
 
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
 
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
 
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
 
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
 
“Updating the Edge ML Development Process,” a Presentation from Samsara
“Updating the Edge ML Development Process,” a Presentation from Samsara“Updating the Edge ML Development Process,” a Presentation from Samsara
“Updating the Edge ML Development Process,” a Presentation from Samsara
 
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
 
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
 
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
 
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
 

Último

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 

Último (20)

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 

"Using Deep Learning for Video Event Detection on a Compute Budget," a Presentation from PathPartner Technology

  • 1. © 2019 Pathpartner Technology Using Deep Learning for Video Event Detection on a Compute Budget Praveen Nayak Pathpartner Technology May 2019
  • 2. Outline
      • Introduction to Video Event Detection
      • Learning Representations from Video
      • From Video Representation to Event Detection
      • Decoupling the “When” and “What”
      • Results: case study on the UCF101-Thumos2015 challenge
      • Conclusion
  • 3. Introduction to Video Event Detection
  • 4. Video data as viewed by ML
      • Video is a 3D signal
      • Spatial coordinates x, y (limited by W×H)
      • Temporal coordinate t (limited by T)
      • If we fix t, we obtain an image/frame
      • We can understand videos as a sequence of images
  • 5. Event Detection
      • Retrieve the start ($t_{start}$) and end ($t_{end}$) points of an “event” from a temporally “untrimmed” video
      • Evaluation metric: mAP and recall for a given temporal IoU (tIoU)
      • $tIoU = \frac{n_I}{n_U}$, where $n_I$ is the intersection and $n_U$ the union of the predicted and ground-truth segments
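The tIoU metric on this slide can be sketched in a few lines. This is a minimal illustration, assuming segments are given as `(start, end)` pairs in frames or seconds; the function name `temporal_iou` is a hypothetical helper, not something from the talk.

```python
def temporal_iou(pred, gt):
    """Temporal IoU of two (start, end) segments (frames or seconds)."""
    p_start, p_end = pred
    g_start, g_end = gt
    # n_I: length of the overlap, clamped at zero for disjoint segments
    intersection = max(0.0, min(p_end, g_end) - max(p_start, g_start))
    # n_U: combined length covered by the two segments
    union = (p_end - p_start) + (g_end - g_start) - intersection
    return intersection / union if union > 0 else 0.0


if __name__ == "__main__":
    # Overlap is 10 frames, union is 30 frames
    print(round(temporal_iou((10, 30), (20, 40)), 3))
```

A detection is then counted as correct when its tIoU with a ground-truth segment meets the chosen threshold (for example 0.4 or 0.2, the values used in the results slide).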
  • 6. Learning Representations from Video
  • 7. Spatiotemporal fusion networks
      Image: Kim et al., Weighing classes and streams: toward better methods for two-stream convolutional networks
  • 8. Convolutions for spatiotemporal data
      • C3D model: all convolutions are 3D; fewer parameters than 2D convolutions over multiple frames
      Image (labels: 3D convolutions, C3D feature vector): D. Tran et al., Learning Spatiotemporal Features with 3D Convolutional Networks
  • 9. Convolutions for spatiotemporal data (continued)
      • Joint appearance and motion features at every layer
  • 10. State-of-the-art video descriptors
      Image: J. Carreira et al., Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset
  • 11. Action Classification Datasets

      | Dataset      | Action classes | # Clips | Temporal trimming |
      |--------------|----------------|---------|-------------------|
      | HMDB-51 [2]  | 51             | ~7k     | Yes               |
      | UCF-101 [1]  | 101            | ~13k    | Yes               |
      | Kinetics [3] | 400            | ~160k   | No                |

      • The classification metric is mAP, similar to the image-based classification metric, extended to the temporal domain
      • The classifier makes one decision per clip
      Images: sample frames from UCF-101, HMDB-51 and Kinetics
      [1] UCF-101, University of Central Florida, https://www.crcv.ucf.edu/data/UCF101.php
      [2] HMDB-51, Brown University, http://serre-lab.clps.brown.edu/resource/hmdb-a-large-human-motion-database/
      [3] Kinetics dataset, DeepMind, https://deepmind.com/research/open-source/open-source-datasets/kinetics/
  • 12. State-of-the-art video descriptors
      Image and table: J. Carreira et al., Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset
  • 13. From Video Representation to Event Detection
  • 14. From RCNN to Segment-CNN
      Image: Z. Shou et al., Temporal Action Localization in Untrimmed Videos via Multi-stage CNNs
  • 15. From RCNN to Segment-CNN (continued)
      • Computationally very intensive!
  • 16. TAL-NET: Parallels from Faster R-CNN
      • Problem split: class-sensitive proposal generation, followed by inference
      Image (Faster R-CNN vs. TAL-NET): Y. Chao et al., Rethinking the Faster R-CNN Architecture for Temporal Action Localization
  • 17. TAL-NET: Parallels from Faster R-CNN (continued)
      • End-to-end trainable
  • 18. Computational cost of Detection

      GMACs per video of event detectors:

      | Model        | GMAC/inference | #params (million) | GMAC/video |
      |--------------|----------------|-------------------|------------|
      | VGG          | 15             | 138               | 46.5k      |
      | C3D-SCNN [1] | 79             | 80                | 237k       |
      | C3D-LSTM [2] | 24             | 86                | 72k        |
      | TAL-NET [3]  | 29             | 98                | 87k        |
      | SSAD [4]     | 61             | 356               | 183k       |

      • Template size: C3D, TAL-NET: 171x128x16; VGG: 224x224x3
      • GMAC/inference: may be for frame-level inference (VGG) or clip-level inference (C3D)
      • GMAC/video: assumes an average video length of 3000 frames
      Figure: Accuracy vs. GMACs/video of event detectors
      [1] Temporal Action Localization in Untrimmed Videos via Multi-stage CNNs, http://dvmmweb.cs.columbia.edu/files/dvmm_scnn_paper.pdf
      [2] Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks, https://imatge-upc.github.io/activitynet-2016-cvprw/
      [3] Rethinking the Faster R-CNN Architecture for Temporal Action Localization, https://arxiv.org/pdf/1804.07667.pdf
      [4] Single Shot action detection, https://arxiv.org/abs/1710.06236
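The GMAC/video column appears to follow from the per-inference cost and the 3000-frame average video length. The sketch below reproduces that arithmetic under the assumption (not stated on the slide) that one inference is counted per frame position; the names `gmac_per_video` and `AVG_FRAMES_PER_VIDEO` are illustrative.

```python
AVG_FRAMES_PER_VIDEO = 3000  # average video length stated on the slide

# GMAC per inference, copied from the table on the slide
GMAC_PER_INFERENCE = {
    "VGG": 15,
    "C3D-SCNN": 79,
    "C3D-LSTM": 24,
    "TAL-NET": 29,
    "SSAD": 61,
}


def gmac_per_video(gmac_inference, n_inferences=AVG_FRAMES_PER_VIDEO):
    """Total GMACs for one video, assuming one inference per frame position."""
    return gmac_inference * n_inferences


if __name__ == "__main__":
    for model, cost in GMAC_PER_INFERENCE.items():
        print(f"{model}: {gmac_per_video(cost) / 1000:.1f}k GMAC/video")
```

Under this assumption the clip-based rows match the table exactly (for example 79 × 3000 = 237k). The VGG row comes out at 45k rather than 46.5k, which suggests the 15 GMAC figure is a rounded value.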
  • 19. Decoupling the “When” and “What”
  • 20. Proposed event detection system
      • Break the problem down into two parts:
          1. Class-agnostic segment proposal (low complexity)
          2. Video segment inference (can be high complexity, depending on the nature of the video)
  • 21. Proposed event detection system (continued)
      • The inference model may be long- or short-term temporal, and is called on demand
  • 22. Proposed event detection system (continued)
      • Characteristics of a good segment proposal:
          • Caters to arbitrary event lengths
          • Discriminates event from background, irrespective of the event
          • Needs to run for every frame, so must be low in complexity
  • 23. Class agnostic segment proposal
      • Formulate the problem as unsupervised “anomaly detection”
      • Train a model to learn anomalies against background
      • At deployment, predict a binary label, i.e., whether an anomaly has occurred or not
      [Diagram: an anomaly detector is applied along the time axis t of a video clip and outputs a Yes/No result at each step.]
  • 24. Training an anomaly detector
      [Diagram: Video Autoencoder Framework. Labels: frames $Y_{t-1}$, $Y_t$, $Y_{t+1}$; convolutional encoder (E, space); encoder LSTM and decoder LSTM (time, memory); convolutional decoder (D); predicted frame $Y'_{t+1}$.]
      • Error: $e_t = \lVert Y'_{t+1} - Y_{t+1} \rVert_2$
  • 25. Training an anomaly detector (continued)
      • The convolutional encoder and decoder are labeled 2D-CNN; the encoder and decoder LSTMs are labeled Conv-LSTM
  • 26. Training an anomaly detector (continued)
      • Sparse, low-dimensional encoding
  • 27. Training an anomaly detector (continued)
      • “Learn to Represent Background”
  • 28. Deployment of anomaly detector
      • When an event occurs, reconstruction is poor
      • The anomaly decision is based on a “regularity score”: $R_t = 1 - \frac{r_t - r_{min}}{r_{max}}$
      • $r_{min}$ and $r_{max}$ are derived on the validation set
      • $R_t \approx 1$: higher likelihood of background
  • 29. Deployment of anomaly detector (continued)
      • 0.8 GMACs/frame, template: 171x128
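The regularity score can be illustrated with a short sketch. It assumes that r_t is the per-frame prediction error of the autoencoder and that this error is the L2 distance between predicted and actual frames, with frames given as flat lists of pixel values; both function names are hypothetical, and the error values in the example are made up for illustration.

```python
import math


def prediction_error(predicted, actual):
    """L2 distance between a predicted frame and the actual frame.

    Frames are flat sequences of pixel values (an assumption for this sketch).
    """
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)))


def regularity_scores(errors, r_min, r_max):
    """R_t = 1 - (r_t - r_min) / r_max for each per-frame error r_t.

    r_min and r_max are taken from a validation set, as on the slide.
    """
    return [1.0 - (r - r_min) / r_max for r in errors]


if __name__ == "__main__":
    # Illustrative errors: frames 2 and 3 reconstruct poorly (likely an event)
    errors = [2.0, 2.5, 9.0, 10.0, 3.0]
    scores = regularity_scores(errors, r_min=2.0, r_max=10.0)
    print([round(s, 2) for s in scores])
```

Scores close to 1 indicate background; the lowest score this formula can produce for an error of r_max is r_min / r_max.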
  • 30. Event detection pipeline
      [Diagram: regularity score plotted over time t; C3D is invoked on three proposed segments, each classified as CleanAndJerk.]
  • 31. Choosing threshold for characterizing anomalies
      • The choice of threshold R is critical for both accuracy and complexity
      • Large values, R ≈ 1 → more false positives, better recall
      • Small values, R ≈ $r_{min}/r_{max}$ → fewer false positives, but more missed events (lower recall)
      [Figure: predictions (Pred) vs. ground truth (GT) at three thresholds R, marking a false detection and a missed event.]
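Turning per-frame regularity scores into segment proposals can be sketched as a simple run-length grouping. This assumes a frame is flagged as a candidate event when its score falls below the threshold R, which is consistent with the number of frame proposals growing as R increases; the function `propose_segments` and the `min_length` parameter are hypothetical additions for illustration.

```python
def propose_segments(scores, threshold, min_length=1):
    """Group consecutive frames with score < threshold into segments.

    Returns half-open (start, end) frame index pairs.
    """
    segments = []
    start = None
    for t, score in enumerate(scores):
        if score < threshold:
            if start is None:
                start = t  # a new candidate segment begins here
        elif start is not None:
            if t - start >= min_length:
                segments.append((start, t))
            start = None
    # Close a segment that runs to the end of the video
    if start is not None and len(scores) - start >= min_length:
        segments.append((start, len(scores)))
    return segments


if __name__ == "__main__":
    scores = [0.9, 0.95, 0.4, 0.3, 0.35, 0.9, 0.2, 0.92]
    print(propose_segments(scores, threshold=0.5))
    print(propose_segments(scores, threshold=0.5, min_length=2))
```

Only the proposed segments would then be passed to the heavier classifier, so the threshold directly controls how often that classifier runs.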
  • 32. Results – F1 score vs complexity on Thumos ’15
      • Thumos ’15 challenge, conducted as a CVPR ’15 workshop
      • Subset of UCF101; only 20 human action classes have event labels in untrimmed videos
      • Evaluation metric: average recall and mAP for a given tIoU

      Effect of threshold R on accuracy/compute:

      | Threshold (R) | #frame proposals | mAP (0.4 tIoU) | Recall (0.4 tIoU) | GMAC/video (avg) |
      |---------------|------------------|----------------|-------------------|------------------|
      | 0.3           | 10978            | 0.43           | 0.19              | 15.5k            |
      | 0.4           | 16505            | 0.33           | 0.21              | 22.2k            |
      | 0.5           | 18048            | 0.31           | 0.39              | 24.1k            |
      | 0.6           | 23486            | 0.17           | 0.40              | 30.5k            |
      | 0.7           | 30107            | 0.11           | 0.41              | 38.5k            |

      Comparison with event proposal methods:

      | Model       | #Event proposals (avg) | Recall (0.2 tIoU) | GMAC/video (avg) |
      |-------------|------------------------|-------------------|------------------|
      | TAL-NET     | 200                    | 0.51              | 87k              |
      | TORNADO [1] | 30                     | 0.63              | 46.8k            |
      | Ours        | 84                     | 0.421             | 13.9k            |

      • TORNADO is a better event proposal method, at the cost of additional compute
      [1] TORNADO: A Spatio-Temporal Convolutional Regression Network for Video Action Proposal, http://www.ntu.edu.sg/home/shijian.lu/Publicationss
  • 33. Analysis of effect of Segment proposals
      • No proposals, inference at a single temporal scale
      • Missed detections for one class are likely to be false positives for another!
  • 34. Analysis of effect of Segment proposals (continued)
      • With proposals, inference at a single temporal scale
      • Reduced confusion between events → fewer false positives, with a decrease in recall for some events
      • Also, increased confusion against background
  • 35. Computational cost of detection
    Model                      | GMAC/inference (max) | #Params (million) | GMAC/video
    VGG                        | 15                   | 138               | 46.5k
    C3D-SCNN[1]                | 79                   | 80                | 237k
    C3D-LSTM[2]                | 24                   | 86                | 72k
    TAL-NET[3]                 | 29                   | 98                | 87k
    SSAD[4]                    | 61                   | 356               | 183k
    TORNADO[5]                 | 32                   | 90.5              | 46.8k
    Ours (ConvLSTM-AD + C3D)   | 24                   | 81                | 13.9k
    [Figures: accuracy vs GMACs/video of event detectors; GMACs per video of event detectors, with segment proposals]
    [1] Temporal Action Localization in Untrimmed Videos via Multi-stage CNNs http://dvmmweb.cs.columbia.edu/files/dvmm_scnn_paper.pdf
    [2] Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks https://imatge-upc.github.io/activitynet-2016-cvprw/
    [3] Rethinking the Faster R-CNN Architecture for Temporal Action Localization https://arxiv.org/pdf/1804.07667.pdf
    [4] Single Shot action detection https://arxiv.org/abs/1710.06236
    [5] TORNADO: A Spatio-Temporal Convolutional Regression Network for Video Action Proposal http://www.ntu.edu.sg/home/shijian.lu/Publicationss
  • 36. Summary • Event detection in video with joint spatiotemporal features is computationally expensive • Decoupling the models that make inferences in the spatial and long-term temporal modalities can effectively reduce the overall GMACs/video • On systems with multiple compute units, decoupling logically separates the algorithm into a part that must run at a high rate and a part that is called only on demand, which enables heterogeneous compute • The accuracy/complexity trade-off is controlled by segment proposals
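The compute argument in the summary can be illustrated with a simple cost model: an always-on detector that runs on every frame plus a classifier that runs only on proposed clips. This is a rough sketch under stated assumptions. The 0.8 GMACs/frame figure is the anomaly detector cost quoted earlier in the deck; the per-clip classifier cost, the clip length, and the fraction of clips proposed are hypothetical placeholders, not measured values.

```python
def decoupled_gmacs(n_frames, n_proposed_clips,
                    detector_gmac_per_frame, classifier_gmac_per_clip):
    """Cost of running the detector on every frame and the classifier on demand."""
    always_on = n_frames * detector_gmac_per_frame
    on_demand = n_proposed_clips * classifier_gmac_per_clip
    return always_on + on_demand


def dense_gmacs(n_clips, classifier_gmac_per_clip):
    """Cost of running the classifier on every clip, with no proposals."""
    return n_clips * classifier_gmac_per_clip


# Hypothetical video: 1600 frames split into 100 clips of 16 frames.
N_FRAMES = 1600
N_CLIPS = 100
DETECTOR_GMAC_PER_FRAME = 0.8    # from the anomaly detector slide
CLASSIFIER_GMAC_PER_CLIP = 40.0  # placeholder, not a measured value

dense = dense_gmacs(N_CLIPS, CLASSIFIER_GMAC_PER_CLIP)
sparse = decoupled_gmacs(N_FRAMES, 20, DETECTOR_GMAC_PER_FRAME,
                         CLASSIFIER_GMAC_PER_CLIP)
```

Under these placeholder numbers the decoupled pipeline is cheaper because only a fifth of the clips reach the classifier. If nearly every clip were proposed, the detector would become pure overhead, which is one way to read the threshold trade-off reported in the results.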
  • 37. Thank You
  • 38. Resources
    Technical papers:
    [1] Learning Spatiotemporal Features with 3D Convolutional Networks https://arxiv.org/pdf/1412.0767.pdf
    [2] Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks https://imatge-upc.github.io/activitynet-2016-cvprw/
    [3] Rethinking the Faster R-CNN Architecture for Temporal Action Localization https://arxiv.org/pdf/1804.07667.pdf
    Embedded Vision Summit: “Using Deep Learning for Video Event Detection on a Compute Budget”