Exploiting Collective Knowledge in an Image Folksonomy for Semantic-based Near-duplicate Video Detection
Hyun-seok Min, Wesley De Neve, and Yong Man Ro
Image and Video Systems Lab
Korea Advanced Institute of Science and Technology (KAIST)
Daejeon, South Korea
e-mail: hsmin@kaist.ac.kr
website: http://ivylab.kaist.ac.kr
I. INTRODUCTION
- Increasing number of duplicates and near-duplicates on websites for video sharing
   - need for efficient and effective near-duplicate detection techniques
- Conventional video signatures are based on low-level visual features
   - highly sensitive to spatiotemporal transformations
- This paper proposes a novel technique for semantic-based near-duplicate video detection
   - based on the observation that near-duplicates still convey the same semantic information
   - takes advantage of the wide variety of user-supplied tags present in a set of user-contributed images (i.e., an image folksonomy)

II. SYSTEM ARCHITECTURE
Input: query video sequence
- Pre-processing
   - Shot segmentation
   - Low-level feature extraction
- Creation of a semantic video signature
   - Detection of semantic concepts (using the image folksonomy)
   - Creation of semantic signature
- Video matching using semantic video signatures
   - Semantic video matching (using the reference video database)
   - Computation of similarity
- Near-duplicate detection
   - Decide whether the query video is a near-duplicate or not
Fig. 1. Semantic-based near-duplicate detection using an image folksonomy

III. MODEL-FREE SEMANTIC CONCEPT DETECTION
[Figure: the i-th shot s_i of a query video sequence is compared with the folksonomy images (strongly tagged images) I_1, I_2, …, I_F by visual similarity measurement; the tags of the resulting nearest neighbor images I_1, …, I_K form a set of tags from which the semantic concepts are selected. Legend: I_f: folksonomy image; tag; tag frequency and the number of images labeled with t in the image folksonomy.]
Fig. 2. Folksonomy-based semantic concept detection
- The frequency of tag t in the set of visual neighbors reflects the relevance of tag t with respect to the content of s_i
- Metric for measuring the relevance of a tag t:
   \[ J(t) = \frac{c}{K} - \frac{L_t}{F} \]
   c: the frequency of tag t in the set of K nearest neighbor images
   L_t: the number of images labeled with tag t in the image folksonomy (containing F images)
- The semantic signature U of V, with V = {S_1, S_2, …, S_N}:
   \[ U = \{A_1, A_2, \ldots, A_N\} \]
   A_i: the set of semantic concepts for S_i

IV. DETECTION OF NEAR-DUPLICATES
Video matching aims at determining whether a given query video sequence V_q appears in a target or reference video sequence V_t.
- The semantic dissimilarity between two video sequences V_q and V_t:
   \[ d_{video}(U_q, U_t) = \frac{1}{N} \sum_{i=1}^{N} d_{shot}(A_i^q, A_{i+p}^t) \]
   U_q, U_t: the semantic video signatures of V_q and V_t
   p: the video shot in the reference video sequence at which similarity measurement starts
- The semantic distance between two video shots:
   \[ d_{shot}(A_i^q, A_j^t) = \frac{|A_i^q \cap A_j^t|}{|A_i^q| \times |A_j^t|} \]
   |A|: the cardinality of A

V. EXPERIMENTS
1. Experimental setup
- Our experiments made use of the MUSCLE-VCD-2007 dataset
- To construct an image folksonomy, 3,000 images with at least one relevant tag were retrieved from Flickr
2. Experimental results
- The proposed method misclassified only two out of 15 spatially transformed query video sequences
- For the 1,604 query video shots, the total number of detected semantic concepts is 7,927
   - five semantic concepts were predicted on average for a video shot
   - among the 7,927 detected semantic concepts, 272 different concepts could be identified
3. Visual results
[Figure: key frame, nearest neighbor images, and detected semantic concepts for a reference video sequence and a query video sequence]
- Detected semantic concepts, reference video sequence: interior, home, inside, night, sunset
- Detected semantic concepts, query video sequence: home, house, interior, inside, style, cottage
Fig. 3. Example key frames with visual neighbors and detected semantic concepts (underlined semantic concepts are considered to be correct)

VI. CONCLUSIONS
- This paper discussed a novel technique for semantic-based near-duplicate video detection
   - near-duplicates still convey the same semantic information
   - takes advantage of the wide variety of user-supplied tags present in an image folksonomy (i.e., collective knowledge)
- Semantic video signatures are constructed by detecting semantic concepts along the temporal axis of video sequences
   - our model-free approach is able to exploit an unrestricted tag vocabulary (unlike model-based semantic concept detection)
- Preliminary experimental results are encouraging (only two of 15 spatially transformed query video sequences were misclassified)
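
The formulas of Sections III and IV (the tag relevance J(t), the shot-level measure d_shot, and the video-level measure d_video) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names, the relevance threshold used to select concepts, the 0-based start offset p, counting a tag at most once per image, and returning 0.0 for empty concept sets are all hypothetical choices that the poster does not specify. Visual nearest-neighbor search is left out: the sketch takes the tag sets of the K neighbors as given. As written on the poster, d_shot grows with the overlap between the two concept sets, so the sketch only reports the values and does not pick a decision threshold.

```python
from collections import Counter


def tag_relevance(neighbor_tags, folksonomy_tags):
    """Score every tag t found in the K neighbors with J(t) = c/K - L_t/F."""
    K = len(neighbor_tags)      # number of nearest neighbor images
    F = len(folksonomy_tags)    # number of images in the folksonomy
    # Assumption: a tag counts at most once per image.
    c = Counter(t for tags in neighbor_tags for t in set(tags))
    L = Counter(t for tags in folksonomy_tags for t in set(tags))
    return {t: c[t] / K - L[t] / F for t in c}


def detect_concepts(neighbor_tags, folksonomy_tags, threshold=0.0):
    """Return A_i: tags with J(t) above a hypothetical threshold."""
    scores = tag_relevance(neighbor_tags, folksonomy_tags)
    return {t for t, j in scores.items() if j > threshold}


def d_shot(a_q, a_t):
    """|A_q ∩ A_t| / (|A_q| * |A_t|); 0.0 if a set is empty (assumption)."""
    if not a_q or not a_t:
        return 0.0
    return len(a_q & a_t) / (len(a_q) * len(a_t))


def d_video(u_q, u_t, p=0):
    """(1/N) * sum over i of d_shot(A_i^q, A_{i+p}^t), N = len(u_q), 0-based p."""
    n = len(u_q)
    if n == 0 or p < 0 or p + n > len(u_t):
        raise ValueError("query signature must fit in the reference at offset p")
    return sum(d_shot(u_q[i], u_t[i + p]) for i in range(n)) / n


if __name__ == "__main__":
    # Toy folksonomy with F = 4 tagged images (made-up data).
    folksonomy = [
        {"home", "interior"},
        {"home", "night"},
        {"sunset", "night"},
        {"home", "cottage"},
    ]
    neighbors = folksonomy[:2]  # pretend these are the K = 2 visual neighbors
    a_query = detect_concepts(neighbors, folksonomy)
    print(sorted(a_query))      # ['home', 'interior']

    u_q = [a_query]                              # one-shot query signature
    u_t = [{"sunset"}, {"home", "house"}]        # two-shot reference signature
    print(d_video(u_q, u_t, p=0), d_video(u_q, u_t, p=1))  # 0.0 0.25
```

In the toy data, "night" appears in half of the neighbors but also in half of the folksonomy, so J(night) = 0 and the tag is not kept. This is the effect of the L_t/F term: it discounts tags that are frequent everywhere.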

IEEE International Conference on Image Processing (ICIP), September 2010, Hong Kong


Exploiting Collective Knowledge in an Image Folksonomy for Semantic-based Near-duplicate Video Detection

Hyun-seok Min, Wesley De Neve, and Yong Man Ro
Image and Video Systems Lab
Korea Advanced Institute of Science and Technology (KAIST)
Daejeon, South Korea
e-mail: hsmin@kaist.ac.kr
website: http://ivylab.kaist.ac.kr

I. INTRODUCTION
- Increasing number of duplicates and near-duplicates on websites for video sharing
  - need for efficient and effective near-duplicate detection techniques
- Conventional video signatures are based on low-level visual features
  - highly sensitive to spatiotemporal transformations
- This paper proposes a novel technique for semantic-based near-duplicate video detection
  - based on the observation that near-duplicates still convey the same semantic information
  - takes advantage of the wide variety of user-supplied tags present in a set of user-contributed images (i.e., an image folksonomy)

II. SYSTEM ARCHITECTURE
[Fig. 1. Semantic-based near-duplicate detection using an image folksonomy. Stages shown, starting from the query video sequence:]
- Pre-processing: shot segmentation; low-level feature extraction
- Creation of a semantic video signature: detection of semantic concepts (input: image folksonomy); creation of semantic signature
- Video matching using semantic video signatures: semantic video matching (input: reference video database); computation of similarity
- Near-duplicate detection: decide whether the query video is a near-duplicate or not

III. MODEL-FREE SEMANTIC CONCEPT DETECTION
[Fig. 2. Folksonomy-based semantic concept detection. The key frame s_i of the i-th shot of a query video sequence is compared, by visual similarity measurement, with the folksonomy images I_1, I_2, ..., I_F (strongly tagged images). The tags of the nearest neighbor images I_1, ..., I_K form the set of tags; semantic concepts are detected from the tag frequency and the number of images labeled with t in the image folksonomy.]
- The frequency of tag t in the set of visual neighbors reflects the relevance of tag t with respect to the content of s_i
- Metric for measuring the relevance of a tag t:
  \[ J(t) = \frac{c}{K} - \frac{L_t}{F} \]
  c: the frequency of tag t in the set of K nearest neighbor images
  L_t: the number of images labeled with tag t in the image folksonomy (containing F images)
- The semantic signature U of V, with V = {S_1, S_2, ..., S_N}:
  \[ U = \{A_1, A_2, \ldots, A_N\} \]
  A_i: the set of semantic concepts for S_i

IV. DETECTION OF NEAR-DUPLICATES
Video matching aims at determining whether a given query video sequence V_q appears in a target or reference video sequence V_t
- The semantic dissimilarity between two video sequences V_q and V_t:
  \[ d_{video}(U_q, U_t) = \frac{1}{N} \sum_{i=1}^{N} d_{shot}(A_i^q, A_{i+p}^t) \]
  U_q, U_t: the semantic video signatures of V_q and V_t
  p: the video shot in the reference video sequence at which similarity measurement starts
- The semantic distance between two video shots:
  \[ d_{shot}(A_i^q, A_j^t) = \frac{\lvert A_i^q \cap A_j^t \rvert}{\lvert A_i^q \rvert \times \lvert A_j^t \rvert} \]
  |A|: the cardinality of A

V. EXPERIMENTS
1. Experimental setup
- Our experiments made use of the MUSCLE-VCD-2007 dataset
- To construct an image folksonomy, 3000 images with at least one relevant tag were retrieved from Flickr
2. Experimental results
- The proposed method misclassified only two out of 15 spatially transformed query video sequences
- For the 1,604 query video shots, the total number of detected semantic concepts is 7,927
  - five semantic concepts were predicted on average for a video shot
  - among the 7,927 detected semantic concepts, 272 different concepts could be identified
3. Visual results
[Fig. 3. Example key frames with visual neighbors and detected semantic concepts (underlined semantic concepts are considered to be correct). Panels: reference video sequence and query video sequence, each with a key frame, nearest neighbor images, and detected semantic concepts. Concept lists shown: "interior, home, inside, night, sunset" and "home, house, interior, inside, style, cottage"; the underlining did not survive extraction.]

VI. CONCLUSIONS
- This paper discussed a novel technique for semantic-based near-duplicate video detection
  - near-duplicates still convey the same semantic information
  - takes advantage of the wide variety of user-supplied tags present in an image folksonomy (i.e., collective knowledge)
- Semantic video signatures are constructed by detecting semantic concepts along the temporal axis of video sequences
  - our model-free approach is able to exploit an unrestricted tag vocabulary (unlike model-based semantic concept detection)
- Preliminary experimental results look encouraging

IEEE International Conference on Image Processing (ICIP), September 2010, Hong Kong
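The three formulas on the poster can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. It takes the tag relevance as J(t) = c/K - L_t/F, the shot-level measure as |A_q ∩ A_t| / (|A_q| × |A_t|), and the video-level measure as the mean of the shot-level values with the reference signature offset by p. The function names, the `top_n` selection rule, the handling of empty concept sets, and the toy tags and counts are all hypothetical. The poster does not say how many top-ranked tags are kept per shot or how the final near-duplicate decision is thresholded, so neither is modelled as a fact here.

```python
from collections import Counter
from typing import Dict, List, Sequence, Set


def tag_relevance(c: int, K: int, L_t: int, F: int) -> float:
    """J(t) = c/K - L_t/F, with c the count of tag t among the K neighbors
    and L_t the count of tag t among the F folksonomy images."""
    return c / K - L_t / F


def detect_concepts(neighbor_tags: Sequence[Set[str]],
                    folksonomy_counts: Dict[str, int],
                    F: int,
                    top_n: int = 5) -> Set[str]:
    """Rank the tags of the K nearest neighbor images by J(t).

    Keeping the top_n tags is an assumption made for this sketch; the
    poster does not state the selection rule.
    """
    K = len(neighbor_tags)
    freq = Counter(tag for tags in neighbor_tags for tag in tags)
    scores = {t: tag_relevance(c, K, folksonomy_counts[t], F)
              for t, c in freq.items()}
    # Sort by descending score, breaking ties alphabetically.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return set(ranked[:top_n])


def d_shot(A_q: Set[str], A_t: Set[str]) -> float:
    """|A_q ∩ A_t| / (|A_q| * |A_t|) for two sets of semantic concepts."""
    if not A_q or not A_t:
        return 0.0  # assumption: an empty concept set contributes nothing
    return len(A_q & A_t) / (len(A_q) * len(A_t))


def d_video(U_q: List[Set[str]], U_t: List[Set[str]], p: int) -> float:
    """Mean of d_shot over the N query shots, aligned at reference shot p
    (p is 0-based here, which is a convention of this sketch)."""
    N = len(U_q)
    if N == 0 or p < 0 or p + N > len(U_t):
        raise ValueError("query signature does not fit at offset p")
    return sum(d_shot(U_q[i], U_t[i + p]) for i in range(N)) / N


if __name__ == "__main__":
    # Toy data: tags of K = 4 visual neighbors and invented folksonomy counts.
    neighbors = [{"home", "interior"}, {"home", "night"},
                 {"interior", "sunset"}, {"home"}]
    counts = {"home": 300, "interior": 150, "night": 600, "sunset": 900}
    print(detect_concepts(neighbors, counts, F=3000, top_n=2))

    U_q = [{"home", "interior"}, {"night"}]
    U_t = [{"cottage"}, {"home", "house"}, {"night"}]
    for p in range(len(U_t) - len(U_q) + 1):
        print(p, d_video(U_q, U_t, p))
```

Note that the shot-level formula, as it can be read from the poster, grows with the overlap between the two concept sets even though the poster calls it a distance. Whether a high or a low value should trigger a near-duplicate decision is therefore left open in this sketch.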