Technische Universität München




     Violent Scenes Detection with
Large, Brute-Forced Acoustic and Visual
              Feature Sets

   Florian Eyben, Felix Weninger, Nicolas Lehment,
            Gerhard Rigoll, Björn Schuller
          Institute for Human-Machine Communication,
                 Technische Universität München




           Session “Affect Task: Violent Scenes Detection”
                           October 4, 2012




“Large”
• Start with frame-wise features (audio / video)
• Summarize over a “meaningful unit”
      – Shot?

      – Sliding window?

      – Overlap?

• Application of functionals:
      – Percentiles, moments, …
• Results in 3.8k audio and 9.7k video features
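The summarization step above can be sketched as follows. This is a minimal illustration, not the authors' openSMILE configuration: the function names, the particular functionals (mean, standard deviation, quartiles, extremes), and the linear-interpolation percentile are all assumptions chosen for brevity.

```python
import statistics

def percentile(values, q):
    """Percentile with linear interpolation between ranks, q in [0, 100]."""
    s = sorted(values)
    if len(s) == 1:
        return s[0]
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def functionals(contour):
    """Map one frame-wise contour to a fixed set of summary statistics."""
    return {
        "mean": statistics.fmean(contour),
        "std": statistics.pstdev(contour),
        "p25": percentile(contour, 25),
        "median": percentile(contour, 50),
        "p75": percentile(contour, 75),
        "min": min(contour),
        "max": max(contour),
    }

def summarize_unit(lld_frames):
    """lld_frames: dict mapping a (hypothetical) LLD name to its per-frame
    values within one unit (a shot or a sliding window).
    Returns one flat feature vector as a dict."""
    features = {}
    for lld_name, contour in lld_frames.items():
        for func_name, value in functionals(contour).items():
            features[f"{lld_name}_{func_name}"] = value
    return features
```

Every frame-wise descriptor is crossed with every functional, so the feature count grows as (number of LLDs) × (number of functionals). That product is how a modest set of frame-wise descriptors can expand into thousands of features per unit.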


October 4, 2012    TUM / Felix Weninger                                     2




Frame-Wise Features (LLDs)
• Acoustic energy LLDs
      – Loudness, energy, ZCR
• Acoustic spectral LLDs
      – MFCCs, band energy, centroid, roll-off
        point, flux, entropy, moments, sharpness, harmonicity
• Visual LLDs
      – HSV histogram
      – Optical Flow: histogram + mean + std.dev.
      – Laplacian edge image histogram + strongest edge
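As an illustration of two of the acoustic energy LLDs listed above, the sketch below computes frame-wise RMS energy and zero-crossing rate (ZCR) from a list of samples. The frame length and hop (400 and 160 samples, i.e. 25 ms and 10 ms at an assumed 16 kHz sampling rate) are assumptions, not values stated on the slide, and the function names are hypothetical.

```python
import math

def frames(signal, frame_len, hop):
    """Yield successive fixed-length frames; a trailing partial frame is dropped."""
    for start in range(0, len(signal) - frame_len + 1, hop):
        yield signal[start:start + frame_len]

def rms_energy(frame):
    """Root-mean-square energy of one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)

def energy_llds(signal, frame_len=400, hop=160):
    """Return one (rms_energy, zcr) pair per frame.
    Defaults are assumed values, not taken from the slides."""
    return [
        (rms_energy(f), zero_crossing_rate(f))
        for f in frames(signal, frame_len, hop)
    ]
```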








“Brute-Forced”
•   Fully data-based approach (no pre-classification)
•   Little hand-crafting / engineering of features
•   Systematic feature (over-)generation
•   Emphasis on machine learning
•   Successful in affect recognition and speaker
    characterization tasks
      – INTERSPEECH 2009 Emotion Challenge
      – INTERSPEECH 2010 Paralinguistic Challenge
      – INTERSPEECH 2011 Intoxication / Sleepiness
• Generalization?





A Data-Based Approach
• System development based on 3-fold CV of
  development data
      – “Movie-independent”
      – Stratified by violence proportion and age
• Use all features from development data for evaluation
  on test data
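One way to build movie-independent folds is sketched below: whole movies are assigned to folds, so no movie contributes to both training and held-out data. The stratification shown is a simple assumption (rank movies by violence proportion, then deal them out round-robin); the slide does not specify the procedure, and stratification by age is omitted here. Function names are hypothetical.

```python
def movie_independent_folds(violence_by_movie, n_folds=3):
    """violence_by_movie: dict mapping a movie id to its proportion of
    violent material. Returns n_folds lists of movie ids.
    Round-robin over the ranked movies is an assumed stratification rule."""
    ranked = sorted(violence_by_movie, key=violence_by_movie.get, reverse=True)
    folds = [[] for _ in range(n_folds)]
    for i, movie in enumerate(ranked):
        folds[i % n_folds].append(movie)
    return folds

def cv_splits(folds):
    """Yield (train_movies, held_out_movies) for each fold."""
    for k, held_out in enumerate(folds):
        train = [m for j, fold in enumerate(folds) if j != k for m in fold]
        yield train, held_out
```

Splitting by movie rather than by shot matters because shots from the same movie share actors, soundtrack, and color grading; mixing them across folds would likely make cross-validation scores optimistic.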








“Acoustic and Visual”
• Expect complementarity of modalities
• Late fusion by confidences of single-modal classifiers
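Late fusion can be illustrated with a weighted average of per-shot confidences from the two single-modality classifiers. The slide states only that fusion is late and operates on confidences; the averaging rule and the `audio_weight` parameter below are assumptions.

```python
def late_fusion(audio_conf, video_conf, audio_weight=0.5):
    """Combine per-shot violence confidences from an audio-only and a
    video-only classifier. Weighted averaging is an assumed fusion rule."""
    if len(audio_conf) != len(video_conf):
        raise ValueError("both classifiers must score the same shots")
    w = audio_weight
    return [w * a + (1.0 - w) * v for a, v in zip(audio_conf, video_conf)]
```

Because each classifier is trained separately, either modality can be dropped or reweighted without retraining the other.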








Segmentation and Classification
• Two segmentations evaluated on development set:
      – Functionals over shots
      – Functionals over X sec. sliding window
• Sliding window segmentation:
      – Classify per window
      – Fuse window classification per shot
      – Alternative: Generate segmentation
• Weka, SVM (SMO), C = 0.01
• Logistic regression to obtain confidences
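The step "fuse window classification per shot" could look like the sketch below, which assumes each window already carries a confidence (for example from the logistic model fitted to the SVM outputs). Weighting each window by its temporal overlap with the shot is an assumption; the slide does not name the fusion rule. Times are in seconds and all names are hypothetical.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length of the intersection of two time intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))

def fuse_windows_per_shot(shots, windows):
    """shots: list of (start, end).
    windows: list of (start, end, confidence).
    Returns one confidence per shot: the overlap-weighted mean of the
    confidences of all windows intersecting that shot, or None if no
    window intersects it."""
    fused = []
    for s_start, s_end in shots:
        weighted = 0.0
        total = 0.0
        for w_start, w_end, conf in windows:
            ov = overlap(s_start, s_end, w_start, w_end)
            weighted += ov * conf
            total += ov
        fused.append(weighted / total if total > 0 else None)
    return fused
```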







TUM Test Runs


 Run     Modality   Overlap   Overlap   MAP100   MAP100     MAP20
                    (train)   (eval)    Test     Dev (CV)   Dev (CV)
 TUM-1   A+V        X         -         .484     .397       .525
 TUM-2   A          X         -         .376     .445       .515
 TUM-3   A          X         X         .360     .428       .518
 TUM-4   A          -         -         .392     .442       .503
 TUM-5   V          -         -         .320     .224       .213








TUM Test Runs


 Run     Modality   Overlap   Overlap   UA Rec   WA Rec
                    (train)   (eval)    Dev      Dev
 TUM-1   A+V        X         -         .584     .848
 TUM-2   A          X         -         .648     .830
 TUM-3   A          X         X         .648     .826
 TUM-4   A          -         -         .634     .829
 TUM-5   V          -         -         .537     .832
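The two recall columns can be computed as in the sketch below, assuming "UA Rec" and "WA Rec" denote unweighted and weighted average recall (the slide does not expand the abbreviations). Unweighted average recall is the mean of per-class recalls; weighted average recall weights each class by its frequency and so equals overall accuracy.

```python
from collections import Counter

def average_recalls(y_true, y_pred):
    """Return (unweighted_average_recall, weighted_average_recall)."""
    totals = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    per_class = {label: correct[label] / n for label, n in totals.items()}
    ua = sum(per_class.values()) / len(per_class)
    wa = sum(correct.values()) / len(y_true)
    return ua, wa
```

With heavily imbalanced classes the two can diverge sharply: a classifier that always predicts the majority class on a 9:1 split scores 0.9 weighted but only 0.5 unweighted. This is one reading of why the WA values in the table are high and nearly flat while UA separates the runs more clearly.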








Test Data: MAP 100 by Movie



Movie                               TUM-1 (A+V)          TUM-2 (A)
Dead Poets Society                          .523               .158
Fight Club                                  .321               .315
Independence Day                            .609               .656








Discussion
• MAP very sensitive to segmentation
      – Ex.: MAP100 = .73, MAP20 = .88 on Dev iff segment
        boundaries are aligned to violent / non-violent scenes
        [Figure: NV / V scene timeline with aligned and non-aligned segment
        boundaries; a non-aligned segment is marked “?”]

      – Train on aligned data / test on not aligned data: MAP100 = .49
• Accuracies: similar ranking, but less “sensitive”
      – Correlated with target function in learning
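To make the sensitivity argument concrete, the sketch below computes average precision over the top N ranked segments and its mean over movies. It assumes MAP100 and MAP20 mean average precision at cutoffs 100 and 20, and it normalizes by the number of violent segments found within the cutoff; the official evaluation tool may normalize differently, so treat this as an approximation with hypothetical function names.

```python
def average_precision_at(ranked_labels, n):
    """ranked_labels: 1/0 (violent / non-violent) ground-truth labels of
    segments, sorted by decreasing classifier confidence.
    Normalizing by the hits inside the top n is an assumption."""
    hits = 0
    precision_sum = 0.0
    for rank, is_violent in enumerate(ranked_labels[:n], start=1):
        if is_violent:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0

def mean_average_precision_at(rankings_by_movie, n):
    """Mean of per-movie average precision at cutoff n."""
    scores = [average_precision_at(r, n) for r in rankings_by_movie]
    return sum(scores) / len(scores)
```

Because only the top of the ranking counts, a single segment whose boundaries straddle a violent and a non-violent scene can change the label at a high rank and shift the score noticeably, whereas accuracy-style measures average over all segments.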





Conclusions and Outlook
•   Demonstrated feasibility of “brute-force” approach
•   Acoustic features alone are often competitive
•   Visual features are complementary
•   Future: Deeper analysis of
      – Individual features’ worth
      – Influence of segmentation on model training and evaluation








                                   Thank you.

                            weninger@tum.de

            openSMILE: http://opensmile.sourceforge.net





Violent Scenes Detection with Large, Brute-forced Acoustic and Visual Feature Sets

  • 1. Technische Universität München
      Violent Scenes Detection with Large, Brute-Forced Acoustic and Visual Feature Sets
      Florian Eyben, Felix Weninger, Nicolas Lehment, Gerhard Rigoll, Björn Schuller
      Institute for Human-Machine Communication, Technische Universität München
      Session “Affect Task: Violent Scenes Detection”, October 4, 2012
  • 2. “Large”
      • Start with frame-wise features (audio / video)
      • Summarize over a “meaningful unit”
          – Shot?
          – Sliding window?
          – Overlap?
      • Application of functionals:
          – Percentiles, moments, …
      • Results in 3.8k audio and 9.7k video features
      October 4, 2012, TUM / Felix Weninger
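The summarization step on this slide can be sketched as follows. This is a minimal illustration, not the authors' openSMILE configuration: the function names, the particular set of functionals, and the window parameters are assumptions chosen for the example.

```python
import statistics


def functionals(frames):
    """Summarize one frame-wise feature contour (list of floats) as fixed-length statistics."""
    ordered = sorted(frames)
    n = len(ordered)

    def percentile(p):
        # Linear interpolation between the two closest ranks, p in [0, 100].
        pos = (n - 1) * p / 100.0
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

    return {
        "mean": statistics.fmean(frames),
        "std": statistics.pstdev(frames),  # second moment (population form)
        "min": ordered[0],
        "max": ordered[-1],
        "p25": percentile(25),
        "median": percentile(50),
        "p75": percentile(75),
    }


def sliding_windows(frames, size, hop):
    """Return windows of `size` frames; hop < size yields overlapping windows."""
    return [frames[start:start + size]
            for start in range(0, len(frames) - size + 1, hop)]
```

Applying `functionals` to every LLD contour within each shot or window turns a variable-length sequence into one fixed-length vector, which is why the feature count grows quickly with the number of LLDs and functionals.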
  • 3. Frame-Wise Features (LLDs)
      • Acoustic energy LLDs
          – Loudness, energy, ZCR
      • Acoustic spectral LLDs
          – MFCCs, band energy, centroid, roll-off point, flux, entropy, moments, sharpness, harmonicity
      • Visual LLDs
          – HSV histogram
          – Optical flow: histogram + mean + std. dev.
          – Laplacian edge image histogram + strongest edge
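To make the notion of a frame-wise low-level descriptor concrete, here is a small sketch of two of the acoustic energy LLDs named above (ZCR and energy). The definitions are common textbook forms; the exact definitions and frame settings used by openSMILE may differ, and the function names are illustrative.

```python
import math


def frame_signal(samples, frame_len, hop):
    """Cut a sample sequence into fixed-length frames with a given hop size."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def rms_energy(frame):
    """Root-mean-square energy of one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))
```

Each function maps one frame to one number, so a recording becomes one contour per LLD, ready for summarization by functionals.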
  • 4. “Brute-Forced”
      • Fully data-based approach (no pre-classification)
      • Little hand-crafting / engineering of features
      • Systematic feature (over-)generation
      • Emphasis on machine learning
      • Successful in affect recognition and speaker characterization tasks
          – INTERSPEECH 2009 Emotion Challenge
          – INTERSPEECH 2010 Paralinguistic Challenge
          – INTERSPEECH 2011 Intoxication / Sleepiness
      • Generalization?
  • 5. A Data-Based Approach
      • System development based on 3-fold CV of development data
          – “Movie-independent”
          – Stratified by violence proportion and age
      • Use all features from development data for evaluation on test data
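A movie-independent split keeps all segments of one movie in the same fold, so no movie contributes to both training and validation. The sketch below assumes a simple heuristic for balancing violence proportion (rank the movies and deal them round-robin); the slides do not say how the stratification was done, and stratification by age is left out here.

```python
def movie_independent_folds(violence_proportion, k=3):
    """Assign whole movies to k folds.

    violence_proportion: dict mapping movie name -> fraction of violent material.
    Movies are ranked by that fraction and dealt round-robin (an assumed heuristic),
    so each fold receives a similar mix of violent and calm movies.
    """
    ranked = sorted(violence_proportion, key=violence_proportion.get, reverse=True)
    folds = [[] for _ in range(k)]
    for i, movie in enumerate(ranked):
        folds[i % k].append(movie)
    return folds


def split_by_movie(segments, held_out_movies):
    """segments: list of (movie, features, label) tuples. Returns (train, validation)."""
    held_out = set(held_out_movies)
    train = [s for s in segments if s[0] not in held_out]
    validation = [s for s in segments if s[0] in held_out]
    return train, validation
```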
  • 6. “Acoustic and Visual”
      • Expect complementarity of modalities
      • Late fusion by confidences of single-modal classifiers
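Late fusion combines the per-segment confidences of the audio and video classifiers after each has made its own decision. The slide does not state the combination rule; the sketch below assumes a weighted mean, and `weight_audio` is a hypothetical parameter.

```python
def late_fusion(conf_audio, conf_video, weight_audio=0.5):
    """Fuse two lists of per-segment confidences by a weighted mean (assumed rule)."""
    if len(conf_audio) != len(conf_video):
        raise ValueError("both modalities must score the same segments")
    return [weight_audio * a + (1.0 - weight_audio) * v
            for a, v in zip(conf_audio, conf_video)]
```

Fusing at the confidence level keeps the two feature spaces separate, so each modality can be trained and tuned on its own.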
  • 7. Segmentation and Classification
      • Two segmentations evaluated on development set:
          – Functionals over shots
          – Functionals over X sec. sliding window
      • Sliding window segmentation:
          – Classify per window
          – Fuse window classification per shot
          – Alternative: Generate segmentation
      • Weka, SVM (SMO), C = 0.01
      • Logistic regression to obtain confidences
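The two post-processing steps on this slide can be sketched as below: mapping an SVM output to a confidence with a logistic function, and fusing window-level confidences into one score per shot. The slope and intercept are placeholders that would normally be fitted on held-out data, and averaging over overlapping windows is an assumed fusion rule, not one stated in the slides.

```python
import math


def to_confidence(svm_output, slope=1.0, intercept=0.0):
    """Map a signed SVM output to (0, 1) with a logistic function (placeholder parameters)."""
    return 1.0 / (1.0 + math.exp(-(slope * svm_output + intercept)))


def fuse_windows_per_shot(shots, windows):
    """Average the confidences of all windows that overlap each shot.

    shots:   list of (start, end) times
    windows: list of (start, end, confidence)
    Returns one fused confidence per shot, or None if no window overlaps it.
    """
    fused = []
    for s_start, s_end in shots:
        overlapping = [c for w_start, w_end, c in windows
                       if w_start < s_end and w_end > s_start]
        fused.append(sum(overlapping) / len(overlapping) if overlapping else None)
    return fused
```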
  • 8. TUM Test Runs
      Run    Modality  Overlap (Train / Eval)  MAP100 Test  MAP100 Dev (CV)  MAP20 Dev (CV)
      TUM-1  A+V       X                       .484         .397             .525
      TUM-2  A         X                       .376         .445             .515
      TUM-3  A         X X                     .360         .428             .518
      TUM-4  A                                 .392         .442             .503
      TUM-5  V                                 .320         .224             .213
  • 9. TUM Test Runs
      Run    Modality  Overlap (Train / Eval)  UA Rec (Dev)  WA Rec (Dev)
      TUM-1  A+V       X                       .584          .848
      TUM-2  A         X                       .648          .830
      TUM-3  A         X X                     .648          .826
      TUM-4  A                                 .634          .829
      TUM-5  V                                 .537          .832
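The slide does not expand “UA Rec” and “WA Rec”. Reading them as unweighted and weighted average recall, which is an assumption based on common usage in the INTERSPEECH challenges, the two measures can be computed as in this sketch:

```python
from collections import Counter


def average_recalls(y_true, y_pred):
    """Return (unweighted average recall, weighted average recall).

    UA is the plain mean of the per-class recalls; WA weights each class by its
    frequency, which makes it equal to overall accuracy.
    """
    total = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    per_class = {c: correct[c] / total[c] for c in total}
    ua = sum(per_class.values()) / len(per_class)
    wa = sum(correct.values()) / len(y_true)
    return ua, wa
```

On imbalanced data a classifier that always predicts the majority class still scores a high WA while its UA stays at chance level, which is consistent with WA values being much higher than UA values in the table.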
  • 10. Test Data: MAP100 by Movie
      Movie               TUM-1 (A+V)  TUM-2 (A)
      Dead Poets Society  .523         .158
      Fight Club          .321         .315
      Independence Day    .609         .656
  • 11. Discussion
      • MAP very sensitive to segmentation
          – Ex.: MAP100 = .73, MAP20 = .88 on Dev iff segment boundaries are aligned to violent / non-violent scenes
            [Figure: segment boundaries aligned vs. not aligned to non-violent (NV) / violent (V) scenes]
          – Train on aligned data / test on not aligned data: MAP100 = .49
      • Accuracies: similar ranking, but less “sensitive”
          – Correlated with target function in learning
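For reference, MAP100 and MAP20 are averages of per-movie average precision over the top 100 or top 20 ranked segments. The sketch below uses one common normalization (dividing by the number of violent segments found within the top N); the official task metric may normalize differently, so treat this as an assumption.

```python
def average_precision_at_n(ranked_labels, n):
    """Average precision over the top-n entries of a ranked list.

    ranked_labels: 1 (violent) or 0 (non-violent), sorted by decreasing confidence.
    """
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_labels[:n], start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / hits if hits else 0.0


def map_at_n(rankings_per_movie, n):
    """Mean of average_precision_at_n over a list of per-movie rankings."""
    scores = [average_precision_at_n(r, n) for r in rankings_per_movie]
    return sum(scores) / len(scores)
```

Because the ranked units are segments, changing where the boundaries fall changes both the labels and the ranking, which is one way to see why the measure reacts so strongly to segmentation.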
  • 12. Conclusions and Outlook
      • Demonstrated feasibility of “brute-force” approach
      • Acoustic features alone are often competitive
      • Visual features are complementary
      • Future: Deeper analysis of
          – Individual features’ worth
          – Influence of segmentation on model training and evaluation
  • 13. Thank you.
      weninger@tum.de
      openSMILE: http://opensmile.sourceforge.net