Video Object Detection
Challenge
Box-level post-processing
*Feature level learning
• Flow-Guided Feature Aggregation for Video Object Detection
• Deep Feature Flow for Video Recognition
• Towards High Performance Video Object Detection
• Fully Motion-Aware Network for Video Object Detection
Flow-Guided Feature Aggregation
1. Warping:
2. Aggregation:
3. Adaptive weight:
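The formulas for these three steps did not survive on the slide. The sketch below is one way they could look in NumPy, under stated assumptions rather than as the authors' implementation. Neighbour features are warped to the reference frame by bilinear sampling along a given flow field, per-position weights come from a softmax over cosine similarity with the reference feature, and the result is the weighted sum. The function names (`warp`, `adaptive_weights`, `aggregate`) are hypothetical. The flow is assumed to be supplied as `(dx, dy)` offsets instead of being predicted by a flow network, and similarity is computed on the raw features instead of a learned embedding.

```python
import numpy as np

def warp(feat, flow):
    """Bilinearly sample feat (C, H, W) at p + flow(p); flow is (2, H, W) as (dx, dy)."""
    C, H, W = feat.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    # Sampling positions, clamped to the feature map (assumed border handling).
    x = np.clip(xs + flow[0], 0, W - 1)
    y = np.clip(ys + flow[1], 0, H - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, W - 1)
    y1 = np.minimum(y0 + 1, H - 1)
    wx, wy = x - x0, y - y0
    top = feat[:, y0, x0] * (1 - wx) + feat[:, y0, x1] * wx
    bot = feat[:, y1, x0] * (1 - wx) + feat[:, y1, x1] * wx
    return top * (1 - wy) + bot * wy

def adaptive_weights(warped, ref, eps=1e-8):
    """warped: (N, C, H, W), ref: (C, H, W) -> weights (N, H, W) summing to 1 over N."""
    num = (warped * ref[None]).sum(axis=1)
    den = np.linalg.norm(warped, axis=1) * np.linalg.norm(ref, axis=0) + eps
    cos = num / den                                   # cosine similarity per position
    e = np.exp(cos - cos.max(axis=0, keepdims=True))  # numerically stable softmax
    return e / e.sum(axis=0, keepdims=True)

def aggregate(feats, flows, ref_index):
    """Warp every frame's feature to the reference frame, then take the weighted sum."""
    warped = np.stack([warp(f, fl) for f, fl in zip(feats, flows)])
    w = adaptive_weights(warped, feats[ref_index])
    return (w[:, None] * warped).sum(axis=0)
```

In this sketch the flow entry for the reference frame itself should be all zeros, so its feature passes through unchanged and receives the highest possible similarity.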
Flow-Guided Feature Aggregation Deep Feature Flow
Towards High Performance Video Object Detection
Fully Motion-Aware Network
Pixel-level feature calibration performs poorly under:
• Dynamic appearance changes
• Occlusion
ROI-level motion:
Motion guided calibration:
Aggregation:
Fully Motion-Aware Network
| Paper | Date | Base detector | Backbone | Tracking? | Optical flow? | Online? | mAP (%) | FPS (Titan X) |
|---|---|---|---|---|---|---|---|---|
| Seq-NMS | Feb 2016 | R-FCN | ResNet101 | no | no | no | 76.8 | 2.3 |
| T-CNN | Apr 2016 | RCNN | DeepIDNet+CRAFT | yes | no | no | 73.8 | - |
| DFF | Nov 2016 | R-FCN | ResNet101 | no | yes | yes | 73.0 | 29 |
| TPN | Feb 2017 | TPN | GoogLeNet | yes | no | no | 68.4 | - |
| FGFA | Mar 2017 | R-FCN | ResNet101 | no | yes | yes | 76.3 | 1.4 |
| FGFA + Seq-NMS | 29 Mar 2017 | R-FCN | ResNet101 | no | yes | no | 78.4 | 1.14 |
| D&T | Oct 2017 | R-FCN (15 anchors) | ResNet101 | yes | no | no | 79.8 | 7.09 |
| STMN | Dec 2017 | R-FCN | ResNet101 | no | no | no | 80.5 | - |
| Scale-time-lattice | 16 Apr 2018 | Faster RCNN (15 anchors) | ResNet101 | no | no | no | 79.6 | 20 |
| Scale-time-lattice | Apr 2018 | Faster RCNN (15 anchors) | ResNet101 | no | no | no | 79.0 | 62 |
| SSN (per-frame baseline for STSN) | Mar 2018 | R-FCN | Deformable ResNet101 | no | no | yes | 76.0 | - |
| STSN | Mar 2018 | R-FCN | Deformable ResNet101 | no | no | yes | 78.9 | - |
| STSN + Seq-NMS | Mar 2018 | R-FCN | Deformable ResNet101 | no | no | no | 80.4 | - |
| MANet | Sep 2018 | R-FCN | ResNet101 | no | yes | yes | 78.1 | 5 |
| MANet + Seq-NMS | Sep 2018 | R-FCN | ResNet101 | no | yes | no | 80.3 | - |
| Tracklet-Conditioned Detection | Nov 2018 | R-FCN | ResNet101 | yes | no | yes | 78.1 | - |
| Tracklet-Conditioned Detection + DCNv2 | Nov 2018 | R-FCN | ResNet101 | yes | no | yes | 82.0 | - |
