TUD at MediaEval 2012 genre tagging task: Multi-modality video categorization with one-vs-all classifiers & MediaEval 2012 Tagging Task: Prediction based on One Best List and Confusion Networks
1. TUD MediaEval 2012 Tagging Task
Reporter: Martha A. Larson
Multimedia Information Retrieval Lab
Delft University of Technology
05-10-2012
2. Outline
• TUD-MM: Multi-modality video categorization with one-vs-all classifiers
• Peng Xu, Yangyang Shi, Martha A. Larson
• MediaEval 2012 Tagging Task: Prediction based on One Best
List and Confusion Networks
• Yangyang Shi, Martha A. Larson, Catholijn M. Jonker
TUD MediaEval 2012 Tagging Task
Visual similarity measures for semantic video retrieval
2
4. Introduction
• Features from different modalities
• Visual feature
• Visual Words based representation & Global video representation
• Text features
• ASR, Metadata
• Term-frequency, LDA
• Classification and Fusion
• One-vs-all linear SVMs
• Reciprocal Rank Fusion
• Post-processing procedure to assign one category label for each video
5. Visual representations
• Visual words based video representation
• SIFT features are extracted from each key-frame
• Visual vocabulary is built by hierarchical k-means clustering
• The video is represented by its normalized term frequencies of visual words over the entire video
• Global video representation
• Edit features
• Content features
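As a sketch, the visual-words representation above (nearest-word assignment plus normalized term frequency) might look like the following; the random descriptors stand in for real SIFT features, and flat nearest-centroid assignment stands in for the hierarchical k-means vocabulary:

```python
import numpy as np

def bovw_histogram(descriptors, vocabulary):
    """Assign each local descriptor to its nearest visual word and
    return the normalized term-frequency histogram for the video."""
    # Squared Euclidean distance from every descriptor to every visual word.
    d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(-1)
    words = d2.argmin(axis=1)  # hard assignment to the nearest word
    tf = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return tf / tf.sum()       # normalize so the histogram sums to 1

# Toy stand-ins: 50 "SIFT" descriptors (128-D) and a 4-word vocabulary.
rng = np.random.default_rng(0)
descriptors = rng.normal(size=(50, 128))
vocabulary = rng.normal(size=(4, 128))
hist = bovw_histogram(descriptors, vocabulary)
```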
6. Classification and Fusion
• One-vs-all linear SVM
• C is determined by 5-fold cross-validation
• Reciprocal Rank Fusion (RRF)*
• k = 60 balances the importance of the lower-ranked items
• The weights w(r) are determined by the cross-validation errors from each modality
• Post-processing procedure
* G. V. Cormack, C. L. A. Clarke, and S. Buettcher. Reciprocal rank fusion outperforms Condorcet and individual rank learning methods. SIGIR '09, pages 758-759.
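A minimal sketch of weighted RRF as cited above; the `weights` argument is a hypothetical stand-in for the per-modality weights w(r):

```python
def reciprocal_rank_fusion(rankings, weights=None, k=60):
    """Fuse ranked lists: score(d) = sum over lists r of w_r / (k + rank_r(d)).
    k = 60 damps the influence of items ranked highly by a single list."""
    if weights is None:
        weights = [1.0] * len(rankings)
    scores = {}
    for w, ranking in zip(weights, rankings):
        for rank, item in enumerate(ranking, start=1):
            scores[item] = scores.get(item, 0.0) + w / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Two modality rankings; the first is weighted twice as heavily.
fused = reciprocal_rank_fusion([["a", "b", "c"], ["b", "a", "c"]],
                               weights=[2.0, 1.0])
```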
7. Result analysis
• MAP of different runs
      Run_1   Run_2   Run_3   Run_4   Run_5   *Run_6  *Run_7
MAP   0.0061  0.3127  0.2279  0.3675  0.2157  0.0577  0.0047
• Run_1 to Run_5 are official runs
• Run_6 is the visual-only run without post-processing
• Run_7 is the visual-only run with global feature
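For reference, the MAP figures reported above can be computed from binary relevance judgments as follows (a generic sketch, not the official evaluation script):

```python
def average_precision(ranked_relevance):
    """AP for one query: ranked_relevance[i] is 1 if the item at
    rank i+1 is relevant, else 0."""
    hits, ap = 0, 0.0
    for rank, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            ap += hits / rank  # precision at each relevant rank
    total = sum(ranked_relevance)
    return ap / total if total else 0.0

def mean_average_precision(queries):
    """MAP: mean of the per-query average precisions."""
    return sum(average_precision(q) for q in queries) / len(queries)
```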
8. Performance of visual features
[Bar chart: MAP of the random baseline vs. visual-words (VW) vs. global features; y-axis from 0 to 0.025]
9. MediaEval 2012 Tagging Task:
Prediction based on One Best List and
Confusion Networks
Yangyang Shi, Martha A. Larson, Catholijn M. Jonker
05-10-2012
10. Models for One-best list and
Confusion Networks
• ASR transcripts are fed to three models:
• Dynamic Bayesian Networks
• Support Vector Machines
• Conditional Random Fields
11. One-best List SVM
• Vocabulary with cut-off 3
• TF-IDF weighting
• Linear-kernel multi-class SVM (C = 0.5)
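The pipeline above might be reproduced with scikit-learn roughly as below; the toy corpus is invented, and using `min_df=3` for the cut-off-3 vocabulary is an assumption about how the cut-off was applied:

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Toy ASR-like transcripts (invented for illustration).
docs = [
    "goal match football team", "football match goal score",
    "team football match win", "election vote government policy",
    "government election vote debate", "policy vote election campaign",
]
labels = ["sports"] * 3 + ["politics"] * 3

# Cut-off-3 vocabulary (terms in fewer than 3 documents are dropped),
# TF-IDF weighting, and a linear multi-class SVM with C = 0.5.
clf = make_pipeline(TfidfVectorizer(min_df=3), LinearSVC(C=0.5))
clf.fit(docs, labels)
```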
12. One-best List DBN
[Diagram: DBN unrolled over three time slices, with nodes E1-E3 and T1-T3 above the observed words W1-W3]
14. Results on Only ASR Run
Models MAP
Run2-one-best SVM 0.23
Run2-one-best DBN 0.25
Run2-one-best CRF 0.10
Run2-CN-CRF 0.09
15. Average Precision on Each Genre
[Bar chart: average precision per genre for the DBN and SVM runs; y-axis from 0 to 0.8]
16. Discussion and Future work
• Discussion
• Visual-only methods can be improved in several ways
• Feature selection or dimensionality reduction methods can be applied.
• Genre-level video representation
• CRF failure
• A document is treated as an item rather than one word.
• The feature set is too large for training to converge.
• DBN outperforms SVM: the sequence-order information probably helps prediction
• Potentials
• Generate clear and useful labels
17. Thank you!