~ Multimodal Video Classification ~

                            ARF (Austria-Romania-France) team


       Bogdan IONESCU*1,3              Ionuț MIRONICĂ1              Klaus SEYERLEHNER2
           bionescu@imag.pub.ro          imironica@imag.pub.ro               music@cp.jku.at

          Peter KNEES2                  Jan SCHLÜTER4                  Markus SCHEDL2
            peter.knees@jku.at            jan.schlueter@ofai.at           markus.schedl@jku.at

           Horia CUCU1                     Andi BUZO1                 Patrick LAMBERT3
            horia.cucu@upb.ro              andi.buzo@upb.ro           patrick.lambert@univ-savoie.fr


    *this work was partially supported under European Structural Funds EXCEL POSDRU/89/1.5/S/62557.
    1 University POLITEHNICA of Bucharest; 2 [logo]; 3 [logo]; 4 Austrian Research Institute for Artificial Intelligence
Presentation outline


          • The approach

          • Video content description

          • Experimental results

          • Conclusions and future work




MediaEval - Pisa, Italy, 4-5 October 2012
The approach
  > challenge: find a way to assign (genre) tags to unknown videos;
  > approach: machine learning paradigm;

  [Figure: labeled data (e.g. web, food, autos, …) is used to train a classifier; the classifier turns the unlabeled video database into a tagged video database.]
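
The train-then-tag loop above can be sketched as follows. This is only an illustration of the paradigm, not the team's system: the nearest-centroid classifier, the function names `train_centroids` and `predict`, and the 2-D toy descriptors are all assumptions made for the example.

```python
import math
from collections import defaultdict


def train_centroids(features, labels):
    """Average the feature vectors of each genre into one centroid."""
    sums = {}
    counts = defaultdict(int)
    for vec, label in zip(features, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}


def predict(centroids, vec):
    """Assign the genre whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], vec))


# Toy labeled data: hypothetical 2-D descriptors for three genres.
labeled = [([0.9, 0.1], "autos"), ([0.8, 0.2], "autos"),
           ([0.1, 0.9], "food"), ([0.2, 0.8], "food"),
           ([0.5, 0.5], "web")]
model = train_centroids([f for f, _ in labeled], [g for _, g in labeled])
print(predict(model, [0.85, 0.2]))  # tag assigned to an unlabeled video
```

Any classifier with a train step and a predict step fits the same pattern; the centroid model is used here only because it is short.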
The approach: classification
  > the entire process relies on the concept of “similarity” computed
  between content annotations (numeric features),

  > this year's focus is on:

       objective 1: go (truly) multimodal: visual, audio, text;

       objective 2: test a broad range of classifiers and descriptor
       combinations.
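
One possible way to combine modalities and compare two videos is sketched below. The slides do not say which fusion scheme or similarity measure was used, so the per-modality L2 normalization, the concatenation (early fusion) and the cosine similarity are assumptions, as are the function names.

```python
import math


def l2_normalize(vec):
    """Scale a descriptor to unit length so no modality dominates by magnitude."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def early_fusion(visual, audio, text):
    """Concatenate the normalized per-modality descriptors into one vector."""
    return l2_normalize(visual) + l2_normalize(audio) + l2_normalize(text)


def cosine_similarity(a, b):
    """Cosine of the angle between two descriptors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a > 0 and norm_b > 0 else 0.0


# Hypothetical descriptors for two videos (values are made up).
video_a = early_fusion([3.0, 4.0], [0.0, 2.0], [1.0, 0.0, 0.0])
video_b = early_fusion([4.0, 3.0], [1.0, 1.0], [0.0, 1.0, 0.0])
print(cosine_similarity(video_a, video_b))
```

Normalizing each modality before concatenation keeps a high-dimensional or large-valued descriptor from swamping the others.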


Video content description - audio
   block-level audio features
  (also capture local temporal information)

    blocks of frames, e.g. 50% overlapping, summarized by average, median, variance, ...

     • Spectral Pattern ~ soundtrack’s timbre;
     • delta Spectral Pattern ~ strength of onsets;
     • variance delta Spectral Pattern ~ variation of the onset strength;
     • Logarithmic Fluctuation Pattern ~ rhythmic aspects;
     • Correlation Pattern ~ loudness changes;
     • Spectral Contrast Pattern ~ ”toneness”;
     • Local Single Gaussian model ~ timbral;
     • George Tzanetakis model ~ timbral;

   [Klaus Seyerlehner et al., MIREX’11, USA]
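
The block-level idea (overlapping blocks of frames, then a summary over all blocks) can be sketched as below. The helper names, the toy per-frame energies and the stand-in block descriptor are hypothetical; the actual patterns listed above are considerably more involved.

```python
import statistics


def split_into_blocks(frames, block_size, hop=None):
    """Group consecutive frames into blocks; the default hop gives 50% overlap."""
    if hop is None:
        hop = max(1, block_size // 2)
    return [frames[start:start + block_size]
            for start in range(0, len(frames) - block_size + 1, hop)]


def summarize_blocks(block_values, how="median"):
    """Collapse one value per block into a single number for the soundtrack."""
    summarizers = {
        "average": statistics.fmean,
        "median": statistics.median,
        "variance": statistics.pvariance,
    }
    return summarizers[how](block_values)


frames = [1.0, 2.0, 4.0, 7.0, 11.0, 16.0]  # hypothetical per-frame energies
blocks = split_into_blocks(frames, block_size=4)
# Stand-in block descriptor: the largest frame-to-frame rise inside each block.
onsets = [max(b - a for a, b in zip(blk, blk[1:])) for blk in blocks]
print(summarize_blocks(onsets, "median"))
```

Because each block spans several frames, a block descriptor can encode short-term temporal structure that a single frame cannot.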

Video content description - audio
   standard audio features
  (audio frame-based)

     • Zero-Crossing Rate,
     • Linear Predictive Coefficients,
     • Line Spectral Pairs,
     • Mel-Frequency Cepstral Coefficients,
     • spectral centroid, flux, rolloff, and kurtosis,
     + variance of each feature over a certain window.

   [Figure: per-frame features f1, f2, …, fn over time, extended with var{f2}, …, var{fn}; global feature = mean & variance]

   [B. Mathieu et al., Yaafe toolbox, ISMIR’10, Netherlands]
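
As an illustration of the frame-based scheme, the sketch below computes one of the listed features (zero-crossing rate) per frame and reduces the resulting track to a global mean and variance. It does not use the Yaafe toolbox; the function names, the non-overlapping framing and the toy signal are assumptions.

```python
import statistics


def frames_of(signal, frame_size):
    """Cut a signal into consecutive, non-overlapping frames."""
    return [signal[i:i + frame_size]
            for i in range(0, len(signal) - frame_size + 1, frame_size)]


def zero_crossing_rate(frame):
    """Fraction of consecutive sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def global_feature(per_frame_values):
    """Summarize a per-frame feature track by its mean and variance."""
    return (statistics.fmean(per_frame_values),
            statistics.pvariance(per_frame_values))


signal = [1, -1, 1, -1, 1, 1, 1, 1]  # toy samples, not real audio
track = [zero_crossing_rate(f) for f in frames_of(signal, frame_size=4)]
print(global_feature(track))
```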

Video content description - visual
   MPEG-7 & color/texture descriptors
  (visual frame-based)

     • Local Binary Pattern,
     • Autocorrelogram,
     • Color Coherence Vector,
     • Color Layout Pattern,
     • Edge Histogram,
     • Classic color histogram,
     • Scalable Color Descriptor,
     • Color moments.

   [Figure: per-frame features f1, f2, …, fn over time; global feature = mean & dispersion & skewness & kurtosis & median & root mean square]

   [OpenCV toolbox, http://opencv.willowgarage.com]
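
The temporal aggregation named in the figure (mean, dispersion, skewness, kurtosis, median, root mean square over all frames) might look like the sketch below. Reading "dispersion" as the population standard deviation is an assumption, and so are the function names and the use of non-excess kurtosis.

```python
import math
import statistics

STAT_ORDER = ("mean", "dispersion", "skewness", "kurtosis", "median", "rms")


def temporal_statistics(values):
    """Six summary statistics of one feature dimension tracked over frames."""
    n = len(values)
    mean = statistics.fmean(values)
    centered = [v - mean for v in values]
    m2 = sum(c ** 2 for c in centered) / n  # second central moment
    m3 = sum(c ** 3 for c in centered) / n
    m4 = sum(c ** 4 for c in centered) / n
    std = math.sqrt(m2)
    return {
        "mean": mean,
        "dispersion": std,  # assumed to mean standard deviation
        "skewness": m3 / std ** 3 if std > 0 else 0.0,
        "kurtosis": m4 / m2 ** 2 if m2 > 0 else 0.0,
        "median": statistics.median(values),
        "rms": math.sqrt(sum(v * v for v in values) / n),
    }


def global_descriptor(frame_features):
    """Aggregate per-frame vectors into one vector, six statistics per dimension."""
    descriptor = []
    for track in zip(*frame_features):
        stats = temporal_statistics(list(track))
        descriptor.extend(stats[name] for name in STAT_ORDER)
    return descriptor


# Three hypothetical frames with two feature dimensions each.
print(global_descriptor([[1.0, 10.0], [2.0, 10.0], [3.0, 10.0]]))
```

The result has a fixed length regardless of how many frames the video contains, which is what a standard classifier expects.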

Video content description - visual
   feature descriptors
  (visual frame-based)
  • Histogram of Oriented Gradients (HoG)
  ~ counts occurrences of gradient orientation
  in localized portions of an image (20º per bin)

  • Harris corner detector

  • Speeded Up Robust Features (SURF)

  [Figure: feature points (e.g. Harris); image source http://www.ifp.illinois.edu/~yuhuang]

  [OpenCV toolbox, http://opencv.willowgarage.com]
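
The core of HoG, counting gradient orientations in 20º bins, can be illustrated with the single-cell sketch below. It is not the OpenCV implementation: it assumes unsigned orientations (0º to 180º, hence 9 bins), central-difference gradients and magnitude weighting, and it skips the cell/block layout and normalization of a full HoG descriptor.

```python
import math


def orientation_histogram(image, bin_width_deg=20):
    """Magnitude-weighted histogram of unsigned gradient orientations."""
    n_bins = 180 // bin_width_deg
    hist = [0.0] * n_bins
    rows, cols = len(image), len(image[0])
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            gx = image[y][x + 1] - image[y][x - 1]  # central differences
            gy = image[y + 1][x] - image[y - 1][x]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            # Clamp guards against an angle that rounds up to exactly 180.
            hist[min(int(angle // bin_width_deg), n_bins - 1)] += magnitude
    return hist


# Toy 4x4 grayscale patch containing a vertical edge.
patch = [[0, 0, 1, 1]] * 4
print(orientation_histogram(patch))
```

A vertical edge produces purely horizontal gradients, so all the weight falls into the first bin; a horizontal edge would fill the bin around 90º instead.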

Video content description - text
   TF-IDF descriptors
  (Term Frequency-Inverse Document Frequency)

  > text sources: ASR and metadata,

     1. remove XML markup,

     2. remove terms below the 5th percentile of the frequency distribution,

     3. select the term corpus: for each genre class, retain the m terms (e.g. m =
     150 for ASR and 20 for metadata) with the highest χ2 values that
     occur more frequently in that class than in the complement classes,

     4. represent each document by its TF-IDF values.
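
Steps 3 and 4 above can be sketched as follows, assuming documents are already tokenized (steps 1 and 2 are omitted). The χ2 statistic is computed on a 2x2 presence/genre table and IDF is taken as log(N/df); both are common choices but are assumptions here, as are the function names and the toy documents.

```python
import math
from collections import Counter


def chi_square(term, docs, labels, genre):
    """Chi-square of the 2x2 table (term present?) x (document in genre?)."""
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        present, in_genre = term in tokens, label == genre
        if present and in_genre:
            a += 1
        elif present:
            b += 1
        elif in_genre:
            c += 1
        else:
            d += 1
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def select_terms(docs, labels, m):
    """Per genre, keep the m highest-chi-square terms that favor that genre."""
    vocabulary = sorted({t for tokens in docs for t in tokens})
    selected = set()
    for genre in sorted(set(labels)):
        inside = [d for d, l in zip(docs, labels) if l == genre]
        outside = [d for d, l in zip(docs, labels) if l != genre]
        scored = []
        for term in vocabulary:
            rate_in = sum(term in d for d in inside) / len(inside)
            rate_out = (sum(term in d for d in outside) / len(outside)
                        if outside else 0.0)
            if rate_in > rate_out:  # more frequent than in complement classes
                scored.append((chi_square(term, docs, labels, genre), term))
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        selected.update(term for _, term in scored[:m])
    return sorted(selected)


def tf_idf(docs, terms):
    """One TF-IDF vector per document over the selected terms."""
    n_docs = len(docs)
    df = {t: sum(1 for d in docs if t in d) for t in terms}
    vectors = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens) or 1
        vectors.append([counts[t] / total * math.log(n_docs / df[t])
                        if df[t] else 0.0 for t in terms])
    return vectors


docs = [["engine", "car", "road"], ["car", "race", "engine"],
        ["recipe", "cook", "car"], ["cook", "oven", "recipe"]]
labels = ["autos", "autos", "food", "food"]
terms = select_terms(docs, labels, m=2)
print(terms, tf_idf(docs, terms)[0])
```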



Experimental results: devset (5,127 seq.)
  > classifiers from Weka (Bayes, lazy, functional, trees, etc.),
  > cross-validation (train 50% – test 50%),
  [Figure: avg. Fscore (over all genres)]

    - visual descriptors reach an Fscore of 30%±10%,
    - using more visual descriptors is not more accurate than using a few,
    - best: LBP+CCV+histogram (Fscore=41.2%).
                                            [Weka toolbox, http://www.cs.waikato.ac.nz/ml/weka/]
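
The evaluation protocol on these slides (a 50%/50% split and the Fscore averaged over all genres) can be sketched as below. This is not Weka code; the function names, the fixed random seed and the toy labels are assumptions, and the average is taken as an unweighted (macro) mean over genres.

```python
import random


def split_half(items, seed=0):
    """Shuffle and split a list into two halves (train 50%, test 50%)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def f_score(true_labels, predicted_labels, genre):
    """Harmonic mean of precision and recall for one genre."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == genre and p == genre)
    fp = sum(1 for t, p in pairs if t != genre and p == genre)
    fn = sum(1 for t, p in pairs if t == genre and p != genre)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def average_f_score(true_labels, predicted_labels):
    """Unweighted mean of the per-genre Fscores."""
    genres = sorted(set(true_labels))
    return sum(f_score(true_labels, predicted_labels, g)
               for g in genres) / len(genres)


truth = ["autos", "autos", "food", "food"]
guess = ["autos", "food", "food", "food"]
print(average_f_score(truth, guess))
```

Averaging per genre rather than per video keeps frequent genres from dominating the score.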

Experimental results: devset (5,127 seq.)
  > cross-validation (train 50% – test 50%),


  [Figure: avg. Fscore (over all genres)]

     - audio is still better than visual (improvement of ~6%),

     - the proposed block-based audio features are better than the standard ones (by ~10%).

                                            [Weka toolbox, http://www.cs.waikato.ac.nz/ml/weka/]

Experimental results: devset (5,127 seq.)
  > cross-validation (train 50% – test 50%),


  [Figure: avg. Fscore (over all genres)]

     - ASR from LIMSI is more representative than ASR from LIUM (by ~3%),

     - best performance: ASR LIMSI + metadata (Fscore=68%).

                                            [Weka toolbox, http://www.cs.waikato.ac.nz/ml/weka/]

Experimental results: devset (5,127 seq.)
  > cross-validation (train 50% – test 50%),


  [Figure: avg. Fscore (over all genres)]

     - for the automatic descriptors, audio-visual is close to text (ASR),

     - increasing the number of modalities increases the performance.

                                            [Weka toolbox, http://www.cs.waikato.ac.nz/ml/weka/]

Experimental results: official runs (9,550 seq.)
  > train on devset, test on testset (SVM linear),

  [Figure: MAP per run; chart annotations: MediaEval 2011 MAP 12% (left) and MediaEval 2011 MAP 10.3% (right)]

     Run1: LBP + CCV + hist + audio block-based
     Run2: TF-IDF on ASR LIMSI
     Run3: audio block-based + LBP + CCV + hist + TF-IDF on ASR LIMSI
     Run4: audio block-based
     Run5: TF-IDF on metadata + ASR LIMSI
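
For readers unfamiliar with the MAP figures reported for these runs, a minimal sketch of mean average precision follows. It assumes each genre has a ranked list of videos with a relevance flag per position and that every relevant video appears somewhere in the ranking; the function names and toy rankings are hypothetical, and the official evaluation tooling may differ in details.

```python
def average_precision(ranked_relevance):
    """AP of one ranked list; each entry says whether that result is relevant."""
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this cut-off
    return precision_sum / hits if hits else 0.0


def mean_average_precision(rankings):
    """Mean of the per-genre average precisions."""
    return sum(average_precision(r) for r in rankings.values()) / len(rankings)


# Toy rankings for two genres (True marks a correctly tagged video).
rankings = {"autos": [True, False, True], "food": [False, True]}
print(mean_average_precision(rankings))
```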




Experimental results: official runs (9,550 seq.)
  > genre MAP for Run 5: TF-IDF on ASR + metadata,
                  Run 1: visual + audio
  [Figure: per-genre MAP chart; labeled values: autos 52%, gaming 71%, religion 71%, environment 50%]




Conclusions and future work
  > classification adapts to the corpus – changing the corpus will
  change the performance;
  > audio-visual descriptors are inherently limited;
  > how far can we go with ad-hoc classification without human
  intervention?

  > future work:
      more elaborate late fusion?
      pursue tests on the entire data set;
      perhaps a more elaborate Bag-of-Visual-Words.

    Acknowledgement: we would like to thank Prof. Fausto Giunchiglia and
    Prof. Nicu Sebe from the University of Trento for their support.

Thank you! Any questions?




MediaEval - Pisa, Italy, 4-5 October 2012   16/16
                                                17

Más contenido relacionado

Destacado

GTTS System for the Spoken Web Search Task at MediaEval 2012
GTTS System for the Spoken Web Search Task at MediaEval 2012GTTS System for the Spoken Web Search Task at MediaEval 2012
GTTS System for the Spoken Web Search Task at MediaEval 2012MediaEval2012
 
Brave New Task: User Account Matching
Brave New Task: User Account MatchingBrave New Task: User Account Matching
Brave New Task: User Account MatchingMediaEval2012
 
Como hacer una pagina web en wix sharon
Como hacer una pagina web en wix sharonComo hacer una pagina web en wix sharon
Como hacer una pagina web en wix sharonSharon Jimenez
 
The Shanghai-Hongkong Team at MediaEval2012: Violent Scene Detection Using Tr...
The Shanghai-Hongkong Team at MediaEval2012: Violent Scene Detection Using Tr...The Shanghai-Hongkong Team at MediaEval2012: Violent Scene Detection Using Tr...
The Shanghai-Hongkong Team at MediaEval2012: Violent Scene Detection Using Tr...MediaEval2012
 
Ghent and Cardiff University at the 2012 Placing Task
Ghent and Cardiff University at the 2012 Placing TaskGhent and Cardiff University at the 2012 Placing Task
Ghent and Cardiff University at the 2012 Placing TaskMediaEval2012
 
The L2F Spoken Web Search system for Mediaeval 2012
The L2F Spoken Web Search system for Mediaeval 2012The L2F Spoken Web Search system for Mediaeval 2012
The L2F Spoken Web Search system for Mediaeval 2012MediaEval2012
 
KIT at MediaEval 2012 – Content–based Genre Classification with Visual Cues
KIT at MediaEval 2012 – Content–based Genre Classification with Visual CuesKIT at MediaEval 2012 – Content–based Genre Classification with Visual Cues
KIT at MediaEval 2012 – Content–based Genre Classification with Visual CuesMediaEval2012
 
TUB @ MediaEval 2012 Tagging Task: Feature Selection Methods for Bag-of-(visu...
TUB @ MediaEval 2012 Tagging Task: Feature Selection Methods for Bag-of-(visu...TUB @ MediaEval 2012 Tagging Task: Feature Selection Methods for Bag-of-(visu...
TUB @ MediaEval 2012 Tagging Task: Feature Selection Methods for Bag-of-(visu...MediaEval2012
 
Activities for journalistic skills
Activities for journalistic skillsActivities for journalistic skills
Activities for journalistic skillsJNavarro0321
 
How Spatial Segmentation improves the Multimodal Geo-Tagging
How Spatial Segmentation improves the Multimodal Geo-TaggingHow Spatial Segmentation improves the Multimodal Geo-Tagging
How Spatial Segmentation improves the Multimodal Geo-TaggingMediaEval2012
 
NII, Japan at MediaEval 2012 Violent Scenes Detection Affect Task
NII, Japan at MediaEval 2012 Violent Scenes Detection Affect TaskNII, Japan at MediaEval 2012 Violent Scenes Detection Affect Task
NII, Japan at MediaEval 2012 Violent Scenes Detection Affect TaskMediaEval2012
 
The TUM Cumulative DTW Approach for the Mediaeval 2012 Spoken Web Search Task
The TUM Cumulative DTW Approach for the Mediaeval 2012 Spoken Web Search TaskThe TUM Cumulative DTW Approach for the Mediaeval 2012 Spoken Web Search Task
The TUM Cumulative DTW Approach for the Mediaeval 2012 Spoken Web Search TaskMediaEval2012
 
TUKE MediaEval 2012: Spoken Web Search using DTW and Unsupervised SVM
TUKE MediaEval 2012: Spoken Web Search using DTW and Unsupervised SVMTUKE MediaEval 2012: Spoken Web Search using DTW and Unsupervised SVM
TUKE MediaEval 2012: Spoken Web Search using DTW and Unsupervised SVMMediaEval2012
 
Intro totransportphenomenanew
Intro totransportphenomenanewIntro totransportphenomenanew
Intro totransportphenomenanewilovepurin
 
2010 Marketing Plan
2010 Marketing Plan2010 Marketing Plan
2010 Marketing PlanJPemberton15
 
6dicas– veda 4
6dicas– veda 46dicas– veda 4
6dicas– veda 4souzadea1
 
14 10 21_презентация сту
14 10 21_презентация сту14 10 21_презентация сту
14 10 21_презентация стуStanislav Litvinenko
 

Destacado (20)

10 ρ. δρακουλησ
10 ρ. δρακουλησ10 ρ. δρακουλησ
10 ρ. δρακουλησ
 
GTTS System for the Spoken Web Search Task at MediaEval 2012
GTTS System for the Spoken Web Search Task at MediaEval 2012GTTS System for the Spoken Web Search Task at MediaEval 2012
GTTS System for the Spoken Web Search Task at MediaEval 2012
 
Brave New Task: User Account Matching
Brave New Task: User Account MatchingBrave New Task: User Account Matching
Brave New Task: User Account Matching
 
Como hacer una pagina web en wix sharon
Como hacer una pagina web en wix sharonComo hacer una pagina web en wix sharon
Como hacer una pagina web en wix sharon
 
The Shanghai-Hongkong Team at MediaEval2012: Violent Scene Detection Using Tr...
The Shanghai-Hongkong Team at MediaEval2012: Violent Scene Detection Using Tr...The Shanghai-Hongkong Team at MediaEval2012: Violent Scene Detection Using Tr...
The Shanghai-Hongkong Team at MediaEval2012: Violent Scene Detection Using Tr...
 
Ghent and Cardiff University at the 2012 Placing Task
Ghent and Cardiff University at the 2012 Placing TaskGhent and Cardiff University at the 2012 Placing Task
Ghent and Cardiff University at the 2012 Placing Task
 
κειμενο
κειμενοκειμενο
κειμενο
 
The L2F Spoken Web Search system for Mediaeval 2012
The L2F Spoken Web Search system for Mediaeval 2012The L2F Spoken Web Search system for Mediaeval 2012
The L2F Spoken Web Search system for Mediaeval 2012
 
KIT at MediaEval 2012 – Content–based Genre Classification with Visual Cues
KIT at MediaEval 2012 – Content–based Genre Classification with Visual CuesKIT at MediaEval 2012 – Content–based Genre Classification with Visual Cues
KIT at MediaEval 2012 – Content–based Genre Classification with Visual Cues
 
TUB @ MediaEval 2012 Tagging Task: Feature Selection Methods for Bag-of-(visu...
TUB @ MediaEval 2012 Tagging Task: Feature Selection Methods for Bag-of-(visu...TUB @ MediaEval 2012 Tagging Task: Feature Selection Methods for Bag-of-(visu...
TUB @ MediaEval 2012 Tagging Task: Feature Selection Methods for Bag-of-(visu...
 
Papiloma humano
Papiloma humanoPapiloma humano
Papiloma humano
 
Activities for journalistic skills
Activities for journalistic skillsActivities for journalistic skills
Activities for journalistic skills
 
How Spatial Segmentation improves the Multimodal Geo-Tagging
How Spatial Segmentation improves the Multimodal Geo-TaggingHow Spatial Segmentation improves the Multimodal Geo-Tagging
How Spatial Segmentation improves the Multimodal Geo-Tagging
 
NII, Japan at MediaEval 2012 Violent Scenes Detection Affect Task
NII, Japan at MediaEval 2012 Violent Scenes Detection Affect TaskNII, Japan at MediaEval 2012 Violent Scenes Detection Affect Task
NII, Japan at MediaEval 2012 Violent Scenes Detection Affect Task
 
The TUM Cumulative DTW Approach for the Mediaeval 2012 Spoken Web Search Task
The TUM Cumulative DTW Approach for the Mediaeval 2012 Spoken Web Search TaskThe TUM Cumulative DTW Approach for the Mediaeval 2012 Spoken Web Search Task
The TUM Cumulative DTW Approach for the Mediaeval 2012 Spoken Web Search Task
 
TUKE MediaEval 2012: Spoken Web Search using DTW and Unsupervised SVM
TUKE MediaEval 2012: Spoken Web Search using DTW and Unsupervised SVMTUKE MediaEval 2012: Spoken Web Search using DTW and Unsupervised SVM
TUKE MediaEval 2012: Spoken Web Search using DTW and Unsupervised SVM
 
Intro totransportphenomenanew
Intro totransportphenomenanewIntro totransportphenomenanew
Intro totransportphenomenanew
 
2010 Marketing Plan
2010 Marketing Plan2010 Marketing Plan
2010 Marketing Plan
 
6dicas– veda 4
6dicas– veda 46dicas– veda 4
6dicas– veda 4
 
14 10 21_презентация сту
14 10 21_презентация сту14 10 21_презентация сту
14 10 21_презентация сту
 

Similar a ARF @ MediaEval 2012: Multimodal Video Classification

Lec18 bag of_features
Lec18 bag of_featuresLec18 bag of_features
Lec18 bag of_featuresBo Li
 
Lecture 21 - Image Categorization - Computer Vision Spring2015
Lecture 21 - Image Categorization -  Computer Vision Spring2015Lecture 21 - Image Categorization -  Computer Vision Spring2015
Lecture 21 - Image Categorization - Computer Vision Spring2015Jia-Bin Huang
 
Color: from craft to computation
Color: from craft to computationColor: from craft to computation
Color: from craft to computationJan Morovic
 
Overview of the MediaEval 2012 Tagging Task
Overview of the MediaEval 2012 Tagging TaskOverview of the MediaEval 2012 Tagging Task
Overview of the MediaEval 2012 Tagging TaskMediaEval2012
 
Nema e newsletter
Nema e newsletterNema e newsletter
Nema e newsletterLeigh Smead
 
Experimental Media Voodoo™
Experimental Media Voodoo™Experimental Media Voodoo™
Experimental Media Voodoo™SkyRonDotOrg
 
Vdfp audio and video fingerprinting
Vdfp   audio and video fingerprintingVdfp   audio and video fingerprinting
Vdfp audio and video fingerprintingWietskevdHeuvel
 
Teaching Machines to Listen: An Introduction to Automatic Speech Recognition
Teaching Machines to Listen: An Introduction to Automatic Speech RecognitionTeaching Machines to Listen: An Introduction to Automatic Speech Recognition
Teaching Machines to Listen: An Introduction to Automatic Speech RecognitionZachary S. Brown
 
Fairfield High School Handout
Fairfield High School HandoutFairfield High School Handout
Fairfield High School HandoutKatherineHaratsis
 
Horst Goes Pop - Wieviel Musikempfehlung braucht der Mensch
Horst Goes Pop - Wieviel Musikempfehlung braucht der MenschHorst Goes Pop - Wieviel Musikempfehlung braucht der Mensch
Horst Goes Pop - Wieviel Musikempfehlung braucht der MenschStephan Baumann
 
Open archive islandora-channel-training
Open archive islandora-channel-trainingOpen archive islandora-channel-training
Open archive islandora-channel-trainingscottmertz
 

Similar a ARF @ MediaEval 2012: Multimodal Video Classification (14)

Speech recognition (dr. m. sabarimalai manikandan)
Speech recognition (dr. m. sabarimalai manikandan)Speech recognition (dr. m. sabarimalai manikandan)
Speech recognition (dr. m. sabarimalai manikandan)
 
Lec18 bag of_features
Lec18 bag of_featuresLec18 bag of_features
Lec18 bag of_features
 
Lecture 21 - Image Categorization - Computer Vision Spring2015
Lecture 21 - Image Categorization -  Computer Vision Spring2015Lecture 21 - Image Categorization -  Computer Vision Spring2015
Lecture 21 - Image Categorization - Computer Vision Spring2015
 
Color: from craft to computation
Color: from craft to computationColor: from craft to computation
Color: from craft to computation
 
Overview of the MediaEval 2012 Tagging Task
Overview of the MediaEval 2012 Tagging TaskOverview of the MediaEval 2012 Tagging Task
Overview of the MediaEval 2012 Tagging Task
 
Nema e newsletter
Nema e newsletterNema e newsletter
Nema e newsletter
 
Experimental Media Voodoo™
Experimental Media Voodoo™Experimental Media Voodoo™
Experimental Media Voodoo™
 
Vdfp audio and video fingerprinting
Vdfp   audio and video fingerprintingVdfp   audio and video fingerprinting
Vdfp audio and video fingerprinting
 
VAEs for multimodal disentanglement
VAEs for multimodal disentanglementVAEs for multimodal disentanglement
VAEs for multimodal disentanglement
 
Teaching Machines to Listen: An Introduction to Automatic Speech Recognition
Teaching Machines to Listen: An Introduction to Automatic Speech RecognitionTeaching Machines to Listen: An Introduction to Automatic Speech Recognition
Teaching Machines to Listen: An Introduction to Automatic Speech Recognition
 
Fairfield High School Handout
Fairfield High School HandoutFairfield High School Handout
Fairfield High School Handout
 
Dmk audioviz
Dmk audiovizDmk audioviz
Dmk audioviz
 
Horst Goes Pop - Wieviel Musikempfehlung braucht der Mensch
Horst Goes Pop - Wieviel Musikempfehlung braucht der MenschHorst Goes Pop - Wieviel Musikempfehlung braucht der Mensch
Horst Goes Pop - Wieviel Musikempfehlung braucht der Mensch
 
Open archive islandora-channel-training
Open archive islandora-channel-trainingOpen archive islandora-channel-training
Open archive islandora-channel-training
 

Más de MediaEval2012

MediaEval 2012 Opening
MediaEval 2012 OpeningMediaEval 2012 Opening
MediaEval 2012 OpeningMediaEval2012
 
A Multimodal Approach for Video Geocoding
A Multimodal Approach for   Video Geocoding A Multimodal Approach for   Video Geocoding
A Multimodal Approach for Video Geocoding MediaEval2012
 
Brave New Task: Musiclef Multimodal Music Tagging
Brave New Task: Musiclef Multimodal Music TaggingBrave New Task: Musiclef Multimodal Music Tagging
Brave New Task: Musiclef Multimodal Music TaggingMediaEval2012
 
Search and Hyperlinking Task at MediaEval 2012
Search and Hyperlinking Task at MediaEval 2012Search and Hyperlinking Task at MediaEval 2012
Search and Hyperlinking Task at MediaEval 2012MediaEval2012
 
CUNI at MediaEval 2012: Search and Hyperlinking Task
CUNI at MediaEval 2012: Search and Hyperlinking TaskCUNI at MediaEval 2012: Search and Hyperlinking Task
CUNI at MediaEval 2012: Search and Hyperlinking TaskMediaEval2012
 
DCU Search Runs at MediaEval 2012: Search and Hyperlinking Task
DCU Search Runs at MediaEval 2012: Search and Hyperlinking TaskDCU Search Runs at MediaEval 2012: Search and Hyperlinking Task
DCU Search Runs at MediaEval 2012: Search and Hyperlinking TaskMediaEval2012
 
Ghent University-IBBT at MediaEval 2012 Search and Hyperlinking: Semantic Sim...
Ghent University-IBBT at MediaEval 2012 Search and Hyperlinking: Semantic Sim...Ghent University-IBBT at MediaEval 2012 Search and Hyperlinking: Semantic Sim...
Ghent University-IBBT at MediaEval 2012 Search and Hyperlinking: Semantic Sim...MediaEval2012
 
The CLEF Initiative From 2010 to 2012 and Onwards
The CLEF Initiative From 2010 to 2012 and OnwardsThe CLEF Initiative From 2010 to 2012 and Onwards
The CLEF Initiative From 2010 to 2012 and OnwardsMediaEval2012
 
Overview of MediaEval 2012 Visual Privacy Task
Overview of MediaEval 2012 Visual Privacy TaskOverview of MediaEval 2012 Visual Privacy Task
Overview of MediaEval 2012 Visual Privacy TaskMediaEval2012
 
MediaEval 2012 Visual Privacy Task: Privacy and Intelligibility through Pixel...
MediaEval 2012 Visual Privacy Task: Privacy and Intelligibility through Pixel...MediaEval 2012 Visual Privacy Task: Privacy and Intelligibility through Pixel...
MediaEval 2012 Visual Privacy Task: Privacy and Intelligibility through Pixel...MediaEval2012
 
MediaEval 2012 Visual Privacy Task: Applying Transform-domain Scrambling to A...
MediaEval 2012 Visual Privacy Task: Applying Transform-domain Scrambling to A...MediaEval 2012 Visual Privacy Task: Applying Transform-domain Scrambling to A...
MediaEval 2012 Visual Privacy Task: Applying Transform-domain Scrambling to A...MediaEval2012
 
Violent Scenes Detection with Large, Brute-forced Acoustic and Visual Feature...
Violent Scenes Detection with Large, Brute-forced Acoustic and Visual Feature...Violent Scenes Detection with Large, Brute-forced Acoustic and Visual Feature...
Violent Scenes Detection with Large, Brute-forced Acoustic and Visual Feature...MediaEval2012
 
Technicolor/INRIA/Imperial College London at the MediaEval 2012 Violent Scene...
Technicolor/INRIA/Imperial College London at the MediaEval 2012 Violent Scene...Technicolor/INRIA/Imperial College London at the MediaEval 2012 Violent Scene...
Technicolor/INRIA/Imperial College London at the MediaEval 2012 Violent Scene...MediaEval2012
 
The MediaEval 2012 Affect Task: Violent Scenes Detectio
The MediaEval 2012 Affect Task: Violent Scenes DetectioThe MediaEval 2012 Affect Task: Violent Scenes Detectio
The MediaEval 2012 Affect Task: Violent Scenes DetectioMediaEval2012
 
LIG at MediaEval 2012 affect task: use of a generic method
LIG at MediaEval 2012 affect task: use of a generic methodLIG at MediaEval 2012 affect task: use of a generic method
LIG at MediaEval 2012 affect task: use of a generic methodMediaEval2012
 
Violence Detection in Video by Large Scale Multi-Scale Local Binary Pattern D...
Violence Detection in Video by Large Scale Multi-Scale Local Binary Pattern D...Violence Detection in Video by Large Scale Multi-Scale Local Binary Pattern D...
Violence Detection in Video by Large Scale Multi-Scale Local Binary Pattern D...MediaEval2012
 
UNICAMP-UFMG at MediaEval 2012: Genre Tagging Task
UNICAMP-UFMG at MediaEval 2012: Genre Tagging TaskUNICAMP-UFMG at MediaEval 2012: Genre Tagging Task
UNICAMP-UFMG at MediaEval 2012: Genre Tagging TaskMediaEval2012
 
Telefonica Research System for the Spoken Web Search task at Mediaeval 2012
Telefonica Research System for the Spoken Web Search task at Mediaeval 2012Telefonica Research System for the Spoken Web Search task at Mediaeval 2012
Telefonica Research System for the Spoken Web Search task at Mediaeval 2012MediaEval2012
 

Más de MediaEval2012 (20)

MediaEval 2012 Opening
MediaEval 2012 OpeningMediaEval 2012 Opening
MediaEval 2012 Opening
 
Closing
ClosingClosing
Closing
 
A Multimodal Approach for Video Geocoding
A Multimodal Approach for   Video Geocoding A Multimodal Approach for   Video Geocoding
A Multimodal Approach for Video Geocoding
 
Brave New Task: Musiclef Multimodal Music Tagging
Brave New Task: Musiclef Multimodal Music TaggingBrave New Task: Musiclef Multimodal Music Tagging
Brave New Task: Musiclef Multimodal Music Tagging
 
Search and Hyperlinking Task at MediaEval 2012
Search and Hyperlinking Task at MediaEval 2012Search and Hyperlinking Task at MediaEval 2012
Search and Hyperlinking Task at MediaEval 2012
 
CUNI at MediaEval 2012: Search and Hyperlinking Task
CUNI at MediaEval 2012: Search and Hyperlinking TaskCUNI at MediaEval 2012: Search and Hyperlinking Task
CUNI at MediaEval 2012: Search and Hyperlinking Task
 
DCU Search Runs at MediaEval 2012: Search and Hyperlinking Task
DCU Search Runs at MediaEval 2012: Search and Hyperlinking TaskDCU Search Runs at MediaEval 2012: Search and Hyperlinking Task
ARF @ MediaEval 2012: Multimodal Video Classification

  • 1. ~ Multimodal Video Classification ~
    ARF (Austria-Romania-France) team
    Bogdan IONESCU*1,3 (bionescu@imag.pub.ro), Ionuț MIRONICĂ1 (imironica@imag.pub.ro), Klaus SEYERLEHNER2 (music@cp.jku.at),
    Peter KNEES2 (peter.knees@jku.at), Jan SCHLÜTER4 (jan.schlueter@ofai.at), Markus SCHEDL2 (markus.schedl@jku.at),
    Horia CUCU1 (horia.cucu@upb.ro), Andi BUZO1 (andi.buzo@upb.ro), Patrick LAMBERT3 (patrick.lambert@univ-savoie.fr)
    * This work was partially supported under European Structural Funds EXCEL POSDRU/89/1.5/S/62557.
    1 University POLITEHNICA of Bucharest; 4 Austrian Research Institute for Artificial Intelligence
  • 2. Presentation outline
      • The approach
      • Video content description
      • Experimental results
      • Conclusions and future work
    MediaEval - Pisa, Italy, 4-5 October 2012
  • 3. The approach
    > challenge: find a way to assign (genre) tags to unknown videos;
    > approach: machine learning paradigm.
    [Diagram: labeled data (genres such as web, food, autos, …) is used to train a classifier; the classifier turns unlabeled data from the video database into labeled data, giving a tagged video database.]
  • 4. The approach: classification
    > the entire process relies on the concept of “similarity” computed between content annotations (numeric features);
    > this year's focus is on:
      objective 1: go (truly) multimodal: visual, audio, text;
      objective 2: test a broad range of classifiers and descriptor combinations.
  • 5. Video content description - audio
    block-level audio features (also capture local temporal information):
      • Spectral Pattern ~ soundtrack’s timbre;
      • delta Spectral Pattern ~ strength of onsets;
      • variance delta Spectral Pattern ~ variation of the onset strength;
      • Logarithmic Fluctuation Pattern ~ rhythmic aspects;
      • Correlation Pattern ~ loudness changes;
      • Spectral Contrast Pattern ~ “toneness”;
      • Local Single Gaussian model ~ timbral;
      • George Tzanetakis model ~ timbral.
    [Diagram: features are computed over blocks of frames (e.g. 50% overlapping) and summarized with average, median, variance, ...]
    [Klaus Seyerlehner et al., MIREX’11, USA]
  • 6. Video content description - audio
    standard audio features (audio frame-based):
      • Zero-Crossing Rate,
      • Linear Predictive Coefficients,
      • Line Spectral Pairs,
      • Mel-Frequency Cepstral Coefficients,
      • spectral centroid, flux, rolloff, and kurtosis,
      + mean & variance of each feature over a certain window.
    [Diagram: frame features f1, f2, …, fn over time; global feature = mean & variance of each frame feature, e.g. var{f2}, …, var{fn}]
    [B. Mathieu et al., Yaafe toolbox, ISMIR’10, Netherlands]
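The "mean & variance over a window" step above can be sketched as follows. This is a minimal illustration, not the authors' Yaafe pipeline: the function name `global_descriptor`, the toy frame values, and the use of the population variance are all assumptions made here.

```python
from statistics import fmean, pvariance


def global_descriptor(frames):
    """Concatenate the per-dimension mean and variance of frame-level features.

    `frames` is a list of equally sized feature vectors, one per audio frame
    in the window (hypothetical input layout).
    """
    if not frames:
        raise ValueError("need at least one frame")
    columns = list(zip(*frames))  # one tuple of values per feature dimension
    means = [fmean(col) for col in columns]
    variances = [pvariance(col) for col in columns]  # population variance (assumption)
    return means + variances


# Toy window: three frames, two features per frame.
frames = [[0.0, 2.0], [2.0, 2.0], [4.0, 2.0]]
descriptor = global_descriptor(frames)
```

The descriptor has twice as many entries as a single frame: the means come first, then the variances.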
  • 7. Video content description - visual
    MPEG-7 & color/texture descriptors (visual frame-based):
      • Local Binary Pattern,
      • Autocorrelogram,
      • Color Coherence Vector,
      • Color Layout Pattern,
      • Edge Histogram,
      • Classic color histogram,
      • Scalable Color Descriptor,
      • Color moments.
    [Diagram: frame features f1, f2, …, fn over time; global feature = mean & dispersion & skewness & kurtosis & median & root mean square]
    [OpenCV toolbox, http://opencv.willowgarage.com]
  • 8. Video content description - visual
    feature descriptors (visual frame-based):
      • Histogram of oriented Gradients (HoG) ~ counts occurrences of gradient orientation in localized portions of an image (20° per bin);
      • Harris corner detector;
      • Speeded Up Robust Feature (SURF).
    [Figure: feature points (e.g. Harris); image source http://www.ifp.illinois.edu/~yuhuang]
    [OpenCV toolbox, http://opencv.willowgarage.com]
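To make the "20° per bin" idea concrete, here is a rough sketch of an orientation histogram for one image cell. It is not the OpenCV implementation used in the slides. It assumes unsigned orientations over 0 to 180 degrees (so 9 bins), central-difference gradients, and magnitude-weighted votes, and it omits the block normalization that a full HoG descriptor applies. The function name `orientation_histogram` is hypothetical.

```python
import math


def orientation_histogram(cell, bin_width_deg=20):
    """Magnitude-weighted histogram of unsigned gradient orientations for one cell.

    `cell` is a 2-D list of grey levels; border pixels are skipped because the
    central difference needs both neighbours.
    """
    n_bins = 180 // bin_width_deg
    hist = [0.0] * n_bins
    rows, cols = len(cell), len(cell[0])
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]  # horizontal central difference
            gy = cell[y + 1][x] - cell[y - 1][x]  # vertical central difference
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue
            angle = math.degrees(math.atan2(gy, gx)) % 180.0  # fold to [0, 180)
            hist[int(angle // bin_width_deg) % n_bins] += magnitude
    return hist


# A vertical edge has horizontal gradients (0 degrees, first bin);
# a horizontal edge has vertical gradients (90 degrees, fifth bin).
vertical_edge = [[0, 0, 1, 1] for _ in range(4)]
horizontal_edge = [[0] * 4, [0] * 4, [1] * 4, [1] * 4]
hist_vertical = orientation_histogram(vertical_edge)
hist_horizontal = orientation_histogram(horizontal_edge)
```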
  • 9. Video content description - text
    TF-IDF descriptors (Term Frequency-Inverse Document Frequency)
    > text sources: ASR and metadata,
      1. remove XML markup,
      2. remove terms below the 5% percentile of the frequency distribution,
      3. select the term corpus: for each genre class, retain the m terms (e.g. m = 150 for ASR and 20 for metadata) with the highest χ² values that occur more frequently than in the complement classes,
      4. represent each document by the TF-IDF values of the selected terms.
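Steps 3 and 4 above can be sketched roughly as below. This is an illustration under stated assumptions, not the authors' code: χ² is computed from a 2x2 document-presence table, TF is the term count divided by document length, IDF uses the natural logarithm, and the percentile filter of step 2 is left out. The names `chi_square`, `select_terms`, and `tfidf`, and the toy documents, are hypothetical.

```python
import math
from collections import Counter


def chi_square(n11, n10, n01, n00):
    """Chi-square of a 2x2 table (term present/absent vs. in/out of the class)."""
    n = n11 + n10 + n01 + n00
    denom = (n11 + n01) * (n11 + n10) * (n10 + n00) * (n01 + n00)
    return 0.0 if denom == 0 else n * (n11 * n00 - n10 * n01) ** 2 / denom


def select_terms(docs, labels, m):
    """Keep, per genre, the m highest-chi-square terms that favour that genre."""
    vocab = set()
    for genre in set(labels):
        in_class = [set(d) for d, g in zip(docs, labels) if g == genre]
        out_class = [set(d) for d, g in zip(docs, labels) if g != genre]
        scores = {}
        for term in set().union(*in_class):
            n11 = sum(term in d for d in in_class)
            n10 = sum(term in d for d in out_class)
            n01, n00 = len(in_class) - n11, len(out_class) - n10
            # Only terms relatively more frequent inside the class than outside.
            if n11 * len(out_class) > n10 * len(in_class):
                scores[term] = chi_square(n11, n10, n01, n00)
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        vocab.update(ranked[:m])
    return sorted(vocab)


def tfidf(docs, vocab):
    """One TF-IDF vector per document, restricted to the selected vocabulary."""
    n_docs = len(docs)
    df = {t: sum(t in d for d in docs) for t in vocab}
    vectors = []
    for d in docs:
        counts = Counter(d)
        vectors.append(
            [counts[t] / len(d) * math.log(n_docs / df[t]) if d else 0.0 for t in vocab]
        )
    return vectors


docs = [
    ["engine", "car", "road"],
    ["car", "race", "engine"],
    ["recipe", "oven", "cook"],
    ["cook", "pasta", "oven"],
]
labels = ["autos", "autos", "food", "food"]
vocab = select_terms(docs, labels, m=2)
vectors = tfidf(docs, vocab)
```

With m = 2 the toy vocabulary keeps two terms per genre, and each document vector is zero on the terms of the other genre.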
  • 10. Experimental results: devset (5,127 seq.)
    > classifiers from Weka (Bayes, lazy, functional, trees, etc.),
    > cross-validation (train 50% – test 50%), avg. Fscore (over all genres):
      - visual descriptors reach an Fscore of about 30%±10%,
      - using more visual descriptors is not more accurate than using a few,
      - best: LBP+CCV+histogram (Fscore=41.2%).
    [Weka toolbox, http://www.cs.waikato.ac.nz/ml/weka/]
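The "avg. Fscore (over all genres)" measure can be illustrated with the sketch below. It assumes an unweighted (macro) average of per-genre F1 scores; the slides do not state the exact averaging, and the function name and toy labels are hypothetical.

```python
def average_fscore(true_labels, predicted_labels):
    """Unweighted mean of per-genre F1 scores (macro average, an assumption)."""
    genres = sorted(set(true_labels) | set(predicted_labels))
    pairs = list(zip(true_labels, predicted_labels))
    scores = []
    for genre in genres:
        tp = sum(t == genre and p == genre for t, p in pairs)
        fp = sum(t != genre and p == genre for t, p in pairs)
        fn = sum(t == genre and p != genre for t, p in pairs)
        denom = 2 * tp + fp + fn  # F1 = 2TP / (2TP + FP + FN)
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


true_labels = ["autos", "autos", "food", "food", "web", "web"]
predicted = ["autos", "food", "food", "food", "web", "autos"]
score = average_fscore(true_labels, predicted)
```

Here the per-genre F1 values are 0.5 (autos), 0.8 (food), and 2/3 (web), so every genre counts equally regardless of how many videos it contains.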
  • 11. Experimental results: devset (5,127 seq.)
    > cross-validation (train 50% – test 50%), avg. Fscore (over all genres):
      - audio still better than visual (improvement ~6%),
      - proposed block-based better than standard (by ~10%).
    [Weka toolbox, http://www.cs.waikato.ac.nz/ml/weka/]
  • 12. Experimental results: devset (5,127 seq.)
    > cross-validation (train 50% – test 50%), avg. Fscore (over all genres):
      - ASR from LIMSI more representative than LIUM (~3%),
      - best performance: ASR LIMSI + metadata (Fscore=68%).
    [Weka toolbox, http://www.cs.waikato.ac.nz/ml/weka/]
  • 13. Experimental results: devset (5,127 seq.)
    > cross-validation (train 50% – test 50%), avg. Fscore (over all genres):
      - audio-visual close to text (ASR) for the automatic descriptors,
      - increasing the number of modalities increases the performance.
    [Weka toolbox, http://www.cs.waikato.ac.nz/ml/weka/]
  • 14. Experimental results: official runs (9,550 seq.)
    > train on devset, test on testset (SVM linear),
      Run1: LBP + CCV + hist + audio block-based
      Run2: TF-IDF on ASR LIMSI
      Run3: audio block-based + LBP + CCV + hist + TF-IDF on ASR LIMSI
      Run4: audio block-based
      Run5: TF-IDF on metadata + ASR LIMSI
    [Chart annotations: MediaEval 2011, MAP 12%; MediaEval 2011, MAP 10.3%]
  • 15. Experimental results: official runs (9,550 seq.)
    > genre MAP for Run 5: TF-IDF on ASR + metadata, Run 1: visual + audio
    [Chart callouts: autos, gaming, religion, environment; 52%, 71%, 71%, 50%]
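The official runs are scored with MAP (Mean Average Precision). A minimal sketch of that measure follows, assuming each genre has a ranked list of videos marked relevant or not, and that every relevant video appears somewhere in the list. The function names and the toy rankings are hypothetical and unrelated to the reported results.

```python
def average_precision(ranked_relevance):
    """Average of precision@k over the ranks k where a relevant item appears."""
    hits, total = 0, 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / hits if hits else 0.0


def mean_average_precision(rankings):
    """Mean of the per-genre average precision values."""
    return sum(average_precision(r) for r in rankings.values()) / len(rankings)


# Toy rankings: True marks a video that really belongs to the genre.
rankings = {
    "autos": [True, False, True],
    "gaming": [False, True],
}
map_score = mean_average_precision(rankings)
```

Per-genre values such as those quoted on the slide correspond to `average_precision` for a single genre, while the run-level figure is the mean over genres.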
  • 16. Conclusions and future work
    > classification adapts to the corpus: changing the corpus will change the performance;
    > audio-visual descriptors are inherently limited;
    > how far can we go with ad-hoc classification without human intervention?
    > future work:
      - more elaborate late fusion?
      - pursue tests on the entire data set;
      - perhaps a more elaborate Bag-of-Visual-Words.
    Acknowledgement: we would like to thank Prof. Fausto Giunchiglia and Prof. Nicu Sebe from University of Trento for their support.
  • 17. Thank you! Any questions?