Information Technologies Institute
Centre for Research and Technology Hellas
Fast object re-detection and localization
in video for spatio-temporal fragment
creation
Evlampios Apostolidis, Vasileios Mezaris, Ioannis Kompatsiaris
Information Technologies Institute / Centre for Research and Technology Hellas
ICME MMIX 2013, San Jose, CA, USA, July 2013
Overview
• Introduction - problem formulation
• Related work
• Baseline approach
• Proposed approach
– GPU-based processing
– Video-structure-based sampling of video frames
– Robustness to scale variations
• Experiments and results
• Conclusions
Introduction – problem formulation
• Object re-detection: a particular case of image matching
• Main goal: find instances of a specific object within a single video or a
collection of videos
– Input: object of interest + video file
– Processing: similarity estimation by means of image matching
– Output: detected instances of the object of interest
Introduction – problem formulation
Extension for interactive and linked TV
• Semi-automatic identification and annotation of object-specific spatio-temporal media fragments
– Annotate the object of interest
– Run the object re-detection algorithm
– Automatically obtain instance-based annotated video fragments
– Find related content fragments and establish links between them
[Figure: assign a label to the object of interest; instance-based annotated video fragment; links to related content]
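The step from per-frame detections to spatio-temporal fragments can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Detection` record, the `max_gap` parameter and the function names are hypothetical, and the output is serialized with the W3C Media Fragments URI syntax (`#t=start,end&xywh=x,y,w,h`), which the slides do not prescribe.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    frame: int   # frame index in the video
    box: tuple   # (x, y, w, h) bounding box in pixels


def _close(dets, fps):
    """Turn a run of detections into (start_sec, end_sec, union_box)."""
    x0 = min(d.box[0] for d in dets)
    y0 = min(d.box[1] for d in dets)
    x1 = max(d.box[0] + d.box[2] for d in dets)
    y1 = max(d.box[1] + d.box[3] for d in dets)
    start = dets[0].frame / fps
    end = (dets[-1].frame + 1) / fps  # end of the last detected frame
    return (start, end, (x0, y0, x1 - x0, y1 - y0))


def group_into_fragments(detections, fps, max_gap=1):
    """Merge detections on consecutive frames into fragments.

    A new fragment starts whenever two detections are more than
    `max_gap` frames apart (assumed threshold, not from the slides).
    """
    fragments, current = [], []
    for det in sorted(detections, key=lambda d: d.frame):
        if current and det.frame - current[-1].frame > max_gap:
            fragments.append(_close(current, fps))
            current = []
        current.append(det)
    if current:
        fragments.append(_close(current, fps))
    return fragments


def fragment_uri(video_url, fragment):
    """Serialize one fragment as a Media Fragments URI."""
    start, end, (x, y, w, h) = fragment
    return f"{video_url}#t={start:.2f},{end:.2f}&xywh={x},{y},{w},{h}"
```

Using the union of the per-frame boxes gives a single region per fragment; a per-frame box track would be more precise but cannot be expressed in a single `xywh` dimension.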
Related work
• Extraction and matching of scale- and rotation-invariant local descriptors is one of the most popular state-of-the-art (SoA) approaches for similarity estimation between pairs of images
– Local feature extraction
• Edge detectors (e.g. Canny), corner detectors (e.g. Harris-Laplace)
– Local feature description
• SIFT or extensions of it, SURF, BRISK, binary descriptors such as BRIEF, …
– Matching of local descriptors
• k-Nearest Neighbor search between descriptor pairs using brute-force or hashing
– Filtering of erroneous matches
• Symmetry test between the pairs of matched descriptors
• Ratio test regarding the distances of the calculated nearest neighbors
• Geometric verification between the pair of images using RANSAC
– Extensions
• Combined use of keypoints and motion information (tracking)
• Bag-of-Words (BoW) matching for pruning
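The matching and filtering steps listed above (brute-force 2-nearest-neighbor search, ratio test, symmetry test) can be sketched in plain Python. This is an illustrative sketch on toy descriptors, not the authors' code: the function names are hypothetical, Euclidean distance is assumed, and the `0.8` ratio threshold is an assumed value. Geometric verification with RANSAC is left out.

```python
import math


def two_nearest(query, train):
    """Brute-force 2-NN: for each query descriptor return
    (index of best match, best distance, second-best distance)."""
    result = []
    for q in query:
        dists = sorted((math.dist(q, t), j) for j, t in enumerate(train))
        best_d, best_j = dists[0]
        second_d = dists[1][0] if len(dists) > 1 else math.inf
        result.append((best_j, best_d, second_d))
    return result


def ratio_test(knn, ratio=0.8):
    """Keep a match only if its best distance is clearly smaller
    than the distance to the second-nearest neighbor."""
    return {i: j for i, (j, d1, d2) in enumerate(knn) if d1 < ratio * d2}


def symmetric_matches(desc_a, desc_b, ratio=0.8):
    """Symmetry test: keep (i, j) only if i -> j survives the ratio
    test in one direction and j -> i survives it in the other."""
    ab = ratio_test(two_nearest(desc_a, desc_b), ratio)
    ba = ratio_test(two_nearest(desc_b, desc_a), ratio)
    return sorted((i, j) for i, j in ab.items() if ba.get(j) == i)
```

Real descriptors such as SURF are 64- or 128-dimensional vectors; the logic is unchanged, only the brute-force search becomes expensive, which is what motivates GPU acceleration.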
Proposed approach
• Starting from a baseline approach,
– Improve detection accuracy
– Reduce the needed processing time
• Work directions:
– GPU-based processing
– Video-structure-based sampling of frames
– Enhancing robustness to scale variations
GPU-based processing
Accelerated parts of the overall pipeline:
• Video decompression into frames
• Keypoint detection and description
• Brute-force matching and 2-NN search
• Drawing of the calculated bounding boxes (optional)
Video-structure-based sampling
• Sequential processing of video frames is replaced by a structure-based
one, using the analysis results of a shot segmentation method
Example
– Check shot 1: no detection, so move to the next shot
– Check shot 2: detection, so check all shot-2 frames
– Detect and highlight the object of interest
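The shot-based strategy in the example can be sketched as below. The slides do not state how many frames are checked per shot, so `samples_per_shot` and the evenly spaced probe positions are assumptions, and `detect` is a hypothetical callable standing in for the re-detection step.

```python
def redetect_by_shots(shots, detect, samples_per_shot=3):
    """Run detection on a few probe frames per shot and process the
    whole shot only if at least one probe frame contains the object.

    shots:  list of (first_frame, last_frame) pairs, inclusive,
            as produced by a shot segmentation method
    detect: callable frame_index -> bounding box, or None if absent
    Returns a dict frame_index -> bounding box.
    """
    results = {}
    for first, last in shots:
        length = last - first + 1
        n = min(samples_per_shot, length)
        # Evenly spaced probe frames, including both shot boundaries.
        probes = sorted({first + (k * (length - 1)) // max(n - 1, 1)
                         for k in range(n)})
        probe_hits = {f: detect(f) for f in probes}
        if all(box is None for box in probe_hits.values()):
            continue  # no detection on any probe: skip this shot
        for f in range(first, last + 1):
            # Reuse probe results instead of running detection twice.
            box = probe_hits[f] if f in probe_hits else detect(f)
            if box is not None:
                results[f] = box
    return results
```

The trade-off in this sketch is that an object visible only briefly between two probe frames is missed; more probes per shot reduce that risk at the cost of processing time.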
Robustness to scale variations
Problem
• Major changes in scale may lead to detection failure due to the significant
limitation of the area that is used for matching
• Zoom-in case: the middle image (b) corresponds to a small upper right
area of the object O in the left one (a)
• Zoom-out case: in the right image (c) the object O occupies a very small
part of the frame
• Both cases lead to a considerable reduction of the number of matched
pairs of descriptors, and thus often to detection failure
[Figure: three example images labeled (a), (b) and (c)]
Robustness to scale variations
Solution
• We automatically generate a zoomed-out and a centralized zoomed-in instance of the object O and use them in the matching procedure
Zoomed-in instance
– selection of a center-aligned sub-area of the original object O and enlargement to the actual size of O using bilinear interpolation
– choice: 70% of the original image area → 140% zoom-in factor
Zoomed-out instance
– shrink the original image O into a smaller one using nearest neighbor interpolation
– the maximum zoom-out factor is determined by the restrictions of the GPU-based implementation of SURF
[Figure: original image; zoomed-in instance; zoomed-out instance]
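Generating the two extra instances can be sketched on a grayscale image stored as a list of rows. This is a hedged illustration, not the authors' GPU code. Two assumptions are made: the 70% is read as a fraction of each side (since 1/0.7 ≈ 1.43 agrees with the stated 140% zoom-in factor), and the `shrink=0.5` zoom-out factor is a placeholder, because the slides only say the maximum is bounded by the GPU-based SURF implementation.

```python
def center_crop_box(width, height, fraction=0.7):
    """(x, y, w, h) of a center-aligned sub-area keeping `fraction`
    of each side of the image."""
    w, h = round(width * fraction), round(height * fraction)
    return ((width - w) // 2, (height - h) // 2, w, h)


def resize_nearest(img, new_w, new_h):
    """Nearest-neighbor resize (used here for the zoomed-out instance)."""
    old_h, old_w = len(img), len(img[0])
    return [[img[y * old_h // new_h][x * old_w // new_w]
             for x in range(new_w)] for y in range(new_h)]


def resize_bilinear(img, new_w, new_h):
    """Bilinear resize (used here for the zoomed-in instance)."""
    old_h, old_w = len(img), len(img[0])
    out = []
    for y in range(new_h):
        # Map the output pixel centre back into source coordinates.
        sy = min(max((y + 0.5) * old_h / new_h - 0.5, 0.0), old_h - 1.0)
        y0 = int(sy)
        y1 = min(y0 + 1, old_h - 1)
        fy = sy - y0
        row = []
        for x in range(new_w):
            sx = min(max((x + 0.5) * old_w / new_w - 0.5, 0.0), old_w - 1.0)
            x0 = int(sx)
            x1 = min(x0 + 1, old_w - 1)
            fx = sx - x0
            top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
            bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out


def make_scaled_instances(img, crop_fraction=0.7, shrink=0.5):
    """Return (zoomed_in, zoomed_out) versions of the object image."""
    h, w = len(img), len(img[0])
    x, y, cw, ch = center_crop_box(w, h, crop_fraction)
    crop = [row[x:x + cw] for row in img[y:y + ch]]
    zoomed_in = resize_bilinear(crop, w, h)  # enlarge back to the size of O
    zoomed_out = resize_nearest(img, max(1, round(w * shrink)),
                                max(1, round(h * shrink)))
    return zoomed_in, zoomed_out
```

In such a setup, descriptors would be extracted from all three images (original, zoomed-in, zoomed-out) and each video frame matched against them.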
Experiments and Results
• System specifications
– Intel Core i7 processor at 3.4 GHz
– 8 GB RAM
– CUDA-enabled NVIDIA GeForce GTX560 GPU
• Dataset
– 6 videos* of 273 minutes total duration
– 30 manually selected objects
• Ground-truth (generated via manual annotation)
– 75,632 frames contain at least one of these objects
– 333,455 frames do not include any of the selected objects
* The videos are episodes from the “Antiques Roadshow” of the Dutch public broadcaster AVRO (http://avro.nl/)
Examples of sought objects
Experiments and Results
• Aim: quantify the improvement contributed by each extension of the baseline approach
• Four experimental configurations:
– C1: baseline implementation
– C2: GPU-accelerated implementation
– C3: GPU-accelerated implementation with video-structure-based sampling
– C4: complete proposed approach, which includes GPU-based processing, video-structure-based sampling and robustness to scale variations
Experiments and Results
• Detection accuracy is expressed in terms of Precision, Recall and F-Score
• Evaluation was performed on a per-frame basis, i.e. considering the 30 selected objects and counting the number of frames where these were correctly detected, missed, etc.
• Time efficiency was evaluated by expressing the processing time of each configuration as a factor of the actual duration of the processed videos
• Robustness to scale variations was quantified using two specific sets of frames where the object of interest was observed from:
– a very close viewing position (2,940 frames) and
– a very distant viewing position (4,648 frames)
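The evaluation measures above follow the standard definitions and can be sketched as below. The function names are illustrative, and the F-Score is assumed to be the balanced F1 measure. That assumption is consistent with the reported figures: a precision of 0.999 and a recall of 0.868 (configuration C1) give an F-Score of 0.929.

```python
def f_score(precision, recall):
    """Balanced F-Score (harmonic mean of precision and recall)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f(tp, fp, fn):
    """Per-frame evaluation from frame counts.

    tp: frames where the object was correctly detected
    fp: frames with a detection although the object is absent
    fn: frames where the object is present but was missed
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f_score(precision, recall)


def realtime_factor(processing_seconds, video_seconds):
    """Processing time as a factor of the video duration;
    values below 1.0 mean faster than real time."""
    return processing_seconds / video_seconds


# Consistency check against the values reported for C1.
print(round(f_score(0.999, 0.868), 3))  # prints 0.929
```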
Experiments and Results
      Precision  Recall  F-Score
C1    0.999      0.856   0.922
C2    0.999      0.856   0.922
C3    1.000      0.852   0.920
C4    1.000      0.992   0.996

      Precision  Recall  F-Score  Processing Time (x Real-Time)
C1    0.999      0.868   0.929    2.98-5.26
C2    0.999      0.850   0.918    0.35-1.24
C3    0.999      0.849   0.918    0.03-0.13
C4    0.999      0.872   0.931    0.03-0.19
Evaluation results for configurations C1 to C4

      Precision  Recall  F-Score
C1    0.999      0.831   0.907
C2    0.999      0.831   0.907
C3    1.000      0.799   0.888
C4    1.000      0.914   0.955
Evaluation results for highly zoomed-out instances; Evaluation results for highly zoomed-in instances
Experiments and Results
Detection accuracy
• All versions exhibited very good results in terms of detection accuracy
• Version C4 (complete proposed approach) achieved the best results
• The algorithm performed well across a range of different scales and orientations, and under partial visibility or partial occlusion
Processing time
• The video-structure-based sampling strategy greatly reduced the required processing time, from 0.35-1.24 (C2) to 0.03-0.13 (C3) times real-time
• The algorithm needs about 10% of the video’s duration, while preserving the same high levels of detection accuracy as the slower configurations
Online demo available at: http://www.youtube.com/watch?v=0IeVkXRTYu8
Extensions, ideas and plans
• Recent extension: Multiple instances of an object of interest can be used
as input for more efficient re-detection of 3D objects
• Future ideas: test the algorithm’s performance as a tool for chapter
segmentation in videos where the chapters are temporally demarcated by
the presence of a specific object (e.g. a painting in a video about art)
• Future plans: evaluate the extended algorithm’s performance (detection
accuracy and time efficiency) in a new set of videos
[Figure: input and output]
Conclusions
• The proposed method can be used for fast and accurate re-detection of pre-defined objects in videos
• The time performance of the implemented algorithm allows for real-time processing of multimedia content
• Extended by a prior object labeling step, this technique can be seen as:
– A reliable tool for creating instance-based annotated spatio-temporal fragments in videos
– A key enabling technology for finding similar content and establishing links between related media fragments, thus contributing to the realization of interactive and linked TV
Questions?
More information:
http://www.iti.gr/~bmezaris
bmezaris@iti.gr

TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 

Fast Object Re-Detection in Video for Fragment Creation

  • 1. Information Technologies Institute, Centre for Research and Technology Hellas
      Fast object re-detection and localization in video for spatio-temporal fragment creation
      Evlampios Apostolidis, Vasileios Mezaris, Ioannis Kompatsiaris
      Information Technologies Institute / Centre for Research and Technology Hellas
      ICME MMIX 2013, San Jose, CA, USA, July 2013
  • 2. Overview
      • Introduction - problem formulation
      • Related work
      • Baseline approach
      • Proposed approach
          – GPU-based processing
          – Video-structure-based sampling of video frames
          – Robustness to scale variations
      • Experiments and results
      • Conclusions
  • 3. Introduction – problem formulation
      • Object re-detection: a particular case of image matching
      • Main goal: find instances of a specific object within a single video or a collection of videos
          – Input: object of interest + video file
          – Processing: similarity estimation by means of image matching
          – Output: detected instances of the object of interest
  • 4. Introduction – problem formulation: extension for interactive and linked TV
      • Semi-automatic identification and annotation of object-specific spatio-temporal media fragments
          – Annotate the object of interest
          – Run the object re-detection algorithm
          – Automatically obtain instance-based annotated video fragments
          – Find related content fragments and establish links between them
      Figure labels: Assign a label to the object of interest; Instance-based annotated video fragment; Links to related content
  • 5. Related work
      • Extraction and matching of scale- and rotation-invariant local descriptors is one of the most popular state-of-the-art (SoA) approaches for similarity estimation between pairs of images
          – Local feature extraction
              • Edge detectors (e.g. Canny), corner detectors (e.g. Harris-Laplace)
          – Local feature description
              • SIFT or extensions of it, SURF, BRISK, binary descriptors such as BRIEF, …
          – Matching of local descriptors
              • k-Nearest Neighbor search between descriptor pairs using brute-force or hashing
          – Filtering of erroneous matches
              • Symmetry test between the pairs of matched descriptors
              • Ratio test regarding the distances of the calculated nearest neighbors
              • Geometric verification between the pair of images using RANSAC
          – Extensions
              • Combined use of keypoints and motion information (tracking)
              • Bag-of-Words (BoW) matching for pruning
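The matching and filtering steps listed on this slide can be illustrated with a small sketch. It is not the authors' implementation: it uses plain Python tuples as toy descriptors, a brute-force 2-NN search, a ratio test with an assumed threshold of 0.8 (the slides do not state a value), and a symmetry test. The function names are hypothetical, and geometric verification with RANSAC is left out.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def two_nearest(query, candidates):
    """Brute-force 2-NN: (distance, index) of the two closest candidates."""
    ranked = sorted((dist(query, c), j) for j, c in enumerate(candidates))
    return ranked[0], ranked[1]


def ratio_matches(src, dst, max_ratio=0.8):
    """Map src index -> dst index, keeping only unambiguous nearest neighbors.

    A match survives the ratio test when its best distance is clearly smaller
    than the second-best one. max_ratio=0.8 is an assumed value.
    """
    matches = {}
    for i, descriptor in enumerate(src):
        (d1, j1), (d2, _) = two_nearest(descriptor, dst)
        if d1 < max_ratio * d2:
            matches[i] = j1
    return matches


def symmetric_matches(desc_a, desc_b, max_ratio=0.8):
    """Symmetry test: keep (i, j) only if a->b and b->a agree on the pair."""
    forward = ratio_matches(desc_a, desc_b, max_ratio)
    backward = ratio_matches(desc_b, desc_a, max_ratio)
    return sorted((i, j) for i, j in forward.items() if backward.get(j) == i)


# Toy 2-D "descriptors"; real SURF descriptors have 64 or 128 dimensions.
object_desc = [(0.0, 0.0), (10.0, 0.0), (5.0, 5.0)]
frame_desc = [(10.1, 0.1), (0.1, -0.1), (5.0, 5.15), (5.1, 4.9)]

pairs = symmetric_matches(object_desc, frame_desc)
print(pairs)
```

In this toy input the third object descriptor has two almost equally close candidates in the frame, so the ratio test discards it; the two remaining pairs also pass the symmetry test.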
  • 6. Proposed approach
      • Starting from a baseline approach:
          – Improve detection accuracy
          – Reduce the required processing time
      • Work directions:
          – GPU-based processing
          – Video-structure-based sampling of frames
          – Enhancing robustness to scale variations
  • 7. GPU-based processing
      Accelerated parts of the overall pipeline:
      • Video decompression into frames
      • Keypoint detection and description
      • Brute-Force matching and 2-NN search
      • Drawing of the calculated bounding boxes (optional)
  • 8. Video-structure-based sampling
      • Sequential processing of video frames is replaced by a structure-based one, using the analysis results of a shot segmentation method
      Example:
          – Check shot 1: no detection, so move to the next shot
          – Check shot 2: detection, so check all shot-2 frames
          – Detect and highlight the object of interest
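The shot-based control flow in this example can be sketched as follows. The slide does not say which frames of a shot are checked first, so the sketch assumes a small number of evenly spaced probe frames per shot (by default the first and last frame). `redetect_by_shots` and `contains_object` are hypothetical names, and the per-frame matcher is replaced by a stand-in that also counts how often it is called.

```python
def redetect_by_shots(shots, contains_object, probes_per_shot=2):
    """Return the frame indices where the object is detected.

    shots: list of (first_frame, last_frame) pairs, inclusive, as produced by
        a shot segmentation method.
    contains_object: callable(frame_index) -> bool, the per-frame matcher.
    probes_per_shot: number of evenly spaced frames probed per shot
        (an assumption; the slides do not specify the probing strategy).
    """
    detections = []
    for first, last in shots:
        length = last - first + 1
        n = min(probes_per_shot, length)
        if n == 1:
            probes = [first]
        else:
            probes = sorted({first + round(k * (length - 1) / (n - 1))
                             for k in range(n)})
        # Only a shot with at least one probe hit is processed frame by frame.
        if any(contains_object(f) for f in probes):
            detections.extend(f for f in range(first, last + 1)
                              if contains_object(f))
    return detections


# Stand-in matcher: the object is visible in frames 100..249 (the second shot).
calls = 0


def fake_matcher(frame_index):
    global calls
    calls += 1
    return 100 <= frame_index <= 249


shots = [(0, 99), (100, 249), (250, 299)]
found = redetect_by_shots(shots, fake_matcher)
print(len(found), "detections using", calls, "matcher calls instead of 300")
```

With this probing assumption, an object that appears only in the middle of a shot would be missed, so the number and placement of probe frames trade speed against recall.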
  • 9. Robustness to scale variations: problem
      • Major changes in scale may lead to detection failure, because the area available for matching is severely limited
      • Zoom-in case: the middle image (b) corresponds to a small upper-right area of the object O in the left image (a)
      • Zoom-out case: in the right image (c), the object O occupies a very small part of the frame
      • Both cases considerably reduce the number of matched descriptor pairs and thus often lead to detection failure
      Figure panels: a, b, c
  • 10. Robustness to scale variations: solution
      • We automatically generate a zoomed-out instance and a centered zoomed-in instance of the object O and use both in the matching procedure
      • Zoomed-in instance
          – Select a center-aligned sub-area of the original object O and enlarge it to the actual size of O using bilinear interpolation
          – Choice: 70% of the original image area → 140% zoom-in factor
      • Zoomed-out instance
          – Shrink the original image O into a smaller one using nearest neighbor interpolation
          – The maximum zoom-out factor is determined by the restrictions of the GPU-based implementation of SURF
      Figure labels: Original image; Zoomed-in instance; Zoomed-out instance
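A minimal pure-Python sketch of how the two extra instances could be generated, using a grayscale image stored as a list of rows. Several details are assumptions: the 70% figure is applied to the side lengths (which gives a 1/0.7 ≈ 1.43 enlargement, close to the 140% quoted on the slide), the bilinear resize uses a corner-aligned pixel mapping, and the zoom-out factor of 0.5 is purely illustrative, since the slide only says the limit comes from the GPU-based SURF implementation. All function names are hypothetical.

```python
def center_crop(img, fraction):
    """Center-aligned sub-area whose sides are `fraction` of the original sides."""
    h, w = len(img), len(img[0])
    ch, cw = max(1, round(h * fraction)), max(1, round(w * fraction))
    top, left = (h - ch) // 2, (w - cw) // 2
    return [row[left:left + cw] for row in img[top:top + ch]]


def resize_bilinear(img, out_h, out_w):
    """Bilinear resize with a corner-aligned mapping (corners map to corners)."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(out_h):
        sy = y * (h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(sy)
        y1 = min(y0 + 1, h - 1)
        fy = sy - y0
        row = []
        for x in range(out_w):
            sx = x * (w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(sx)
            x1 = min(x0 + 1, w - 1)
            fx = sx - x0
            # Interpolate along x on the two neighboring rows, then along y.
            upper = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
            lower = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
            row.append(upper * (1 - fy) + lower * fy)
        out.append(row)
    return out


def resize_nearest(img, out_h, out_w):
    """Nearest neighbor resize: each output pixel copies one source pixel."""
    h, w = len(img), len(img[0])
    return [[img[min(h - 1, int((y + 0.5) * h / out_h))]
                [min(w - 1, int((x + 0.5) * w / out_w))]
             for x in range(out_w)]
            for y in range(out_h)]


def zoomed_in(img, fraction=0.7):
    """Crop the center and enlarge it back to the original size."""
    return resize_bilinear(center_crop(img, fraction), len(img), len(img[0]))


def zoomed_out(img, factor=0.5):
    """Shrink the whole image; factor=0.5 is an illustrative assumption."""
    h, w = len(img), len(img[0])
    return resize_nearest(img, max(1, round(h * factor)), max(1, round(w * factor)))


# 10x10 test gradient: pixel value encodes its position as 10 * row + column.
img = [[10 * y + x for x in range(10)] for y in range(10)]
zi = zoomed_in(img)
zo = zoomed_out(img)
print(len(zi), len(zi[0]), len(zo), len(zo[0]))
```

The zoomed-in instance keeps the original dimensions but covers only the central region, while the zoomed-out instance covers the whole object at a lower resolution; both would then be matched against the video frames alongside the original image.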
  • 11. Experiments and Results
      • System specifications
          – Intel Core i7 processor at 3.4 GHz
          – 8 GB RAM
          – CUDA-enabled NVIDIA GeForce GTX560 GPU
      • Dataset
          – 6 videos* with a total duration of 273 minutes
          – 30 manually selected objects
      • Ground truth (generated via manual annotation)
          – 75,632 frames contain at least one of these objects
          – 333,455 frames do not include any of the selected objects
      * The videos are episodes from the “Antiques Roadshow” of the Dutch public broadcaster AVRO (http://avro.nl/)
      Figure caption: Examples of sought objects
  • 12. Experiments and Results
      • Aim: quantify the improvement contributed by each extension of the baseline approach
      • Four experimental configurations:
          – C1: baseline implementation
          – C2: GPU-accelerated implementation
          – C3: GPU-accelerated implementation with video-structure-based sampling
          – C4: complete proposed approach, which includes GPU processing, video-structure-based sampling and robustness to scale variations
  • 13. Experiments and Results
      • Detection accuracy is expressed in terms of Precision, Recall and F-Score
      • Evaluation was performed on a per-frame basis, i.e. considering the 30 selected objects and counting the number of frames where these were correctly detected, missed, etc.
      • Time efficiency was evaluated by expressing the processing time of each configuration as a factor of the actual duration of the processed videos
      • Robustness to scale variations was quantified using two specific sets of frames where the object of interest was observed from:
          – a very close viewing position (2,940 frames) and
          – a very distant viewing position (4,648 frames)
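The evaluation measures named on this slide can be written out as a short sketch. The slides do not define the F-Score explicitly; the balanced F1 (harmonic mean of precision and recall) is assumed here, and it reproduces the value reported for configuration C1 (precision 0.999, recall 0.868, F-Score 0.929). The per-frame counts in the usage lines are made up for illustration, and the function names are hypothetical.

```python
def precision_recall(tp, fp, fn):
    """Per-frame precision and recall from true/false positives and misses."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def f_score(precision, recall):
    """Balanced F1: harmonic mean of precision and recall (assumed definition)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def realtime_factor(processing_seconds, video_seconds):
    """Processing time expressed as a factor of the video's duration."""
    return processing_seconds / video_seconds


# Illustrative counts, not taken from the experiments.
p, r = precision_recall(tp=90, fp=10, fn=30)
print(round(p, 3), round(r, 3), round(f_score(p, r), 3))

# Check against the reported C1 row: precision 0.999, recall 0.868.
print(round(f_score(0.999, 0.868), 3))

# A 273-minute collection processed in 27.3 minutes runs at 0.1 x real time.
print(realtime_factor(1638, 16380))
```

A factor below 1 means the configuration runs faster than real time; a factor of 0.1 corresponds to needing about 10% of the video's duration.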
  • 14. Experiments and Results

            Precision   Recall   F-Score
      C1    0.999       0.856    0.922
      C2    0.999       0.856    0.922
      C3    1.000       0.852    0.920
      C4    1.000       0.992    0.996

            Precision   Recall   F-Score   Processing Time (x Real-Time)
      C1    0.999       0.868    0.929     2.98-5.26
      C2    0.999       0.850    0.918     0.35-1.24
      C3    0.999       0.849    0.918     0.03-0.13
      C4    0.999       0.872    0.931     0.03-0.19
      Evaluation results for configurations C1 to C4

            Precision   Recall   F-Score
      C1    0.999       0.831    0.907
      C2    0.999       0.831    0.907
      C3    1.000       0.799    0.888
      C4    1.000       0.914    0.955
      Evaluation results for highly zoomed-out instances
      Evaluation results for highly zoomed-in instances
  • 15. Experiments and Results
      Detection accuracy
      • All versions exhibited very good results in terms of detection accuracy
      • Version C4 (complete proposed approach) achieved the best results
      • The algorithm performed well across a range of scales and orientations, and under partial visibility or partial occlusion
      Processing time
      • The video-structure-based sampling strategy greatly reduced the required processing time
      • The algorithm needs about 10% of the video’s duration, while preserving the same high level of detection accuracy as the slower configurations
      Online demo available at: http://www.youtube.com/watch?v=0IeVkXRTYu8
  • 16. Extensions, ideas and plans
      • Recent extension: multiple instances of an object of interest can be used as input for more efficient re-detection of 3D objects
      • Future ideas: test the algorithm’s performance as a tool for chapter segmentation in videos where the chapters are temporally demarcated by the presence of a specific object (e.g. a painting in a video about art)
      • Future plans: evaluate the extended algorithm’s performance (detection accuracy and time efficiency) on a new set of videos
      Figure labels: Input; Output
  • 17. Conclusions
      • The proposed method can be used for fast and accurate re-detection of pre-defined objects in videos
      • The time performance of the implemented algorithm allows for real-time processing of multimedia content
      • Extended by a prior object labeling step, this technique can be seen as:
          – A reliable tool for creating instance-based annotated spatio-temporal fragments in videos
          – A key enabling technology for finding similar content and establishing links between related media fragments, thus contributing to the realization of interactive and linked TV
  • 18. Questions?
      More information: http://www.iti.gr/~bmezaris
      bmezaris@iti.gr