Violent Scenes Detection with Large, Brute-forced Acoustic and Visual Feature Sets
1. Technische Universität München
Violent Scenes Detection with
Large, Brute-Forced Acoustic and Visual
Feature Sets
Florian Eyben, Felix Weninger, Nicolas Lehment,
Gerhard Rigoll, Björn Schuller
Institute for Human-Machine Communication,
Technische Universität München
Session “Affect Task: Violent Scenes Detection”
October 4, 2012
2. Technische Universität München
“Large”
• Start with frame-wise features (audio / video)
• Summarize over a “meaningful unit”
– Shot?
– Sliding window?
– Overlap?
• Application of functionals:
– Percentiles, moments, …
• Results in 3.8k audio and 9.7k video features
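The summarization step above can be sketched as follows. This is a minimal illustration, not the authors' openSMILE configuration: the function names, the choice of functionals, the window size, and the toy loudness values are all assumptions made for the example.

```python
import statistics

def functionals(values):
    """Summarize one frame-wise feature contour with a few functionals."""
    ordered = sorted(values)
    n = len(ordered)

    def percentile(p):
        # Simple index-based percentile without interpolation (an assumption;
        # other percentile definitions are equally valid).
        idx = min(n - 1, max(0, round(p / 100 * (n - 1))))
        return ordered[idx]

    return {
        "mean": statistics.fmean(values),
        "stddev": statistics.pstdev(values),
        "p25": percentile(25),
        "p50": percentile(50),
        "p75": percentile(75),
        "min": ordered[0],
        "max": ordered[-1],
    }

def sliding_windows(frames, size, step):
    """Yield windows of `size` frames; step < size gives overlapping windows."""
    for start in range(0, max(1, len(frames) - size + 1), step):
        yield frames[start:start + size]

# Hypothetical loudness contour, one value per frame.
loudness = [0.1, 0.2, 0.9, 0.8, 0.3, 0.2, 0.7, 0.6]
features = [functionals(w) for w in sliding_windows(loudness, size=4, step=2)]
```

Applying every functional to every frame-wise feature is what makes the feature count grow into the thousands.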
October 4, 2012 TUM / Felix Weninger 2
3. Technische Universität München
Frame-Wise Features (LLDs)
• Acoustic energy LLDs
– Loudness, energy, ZCR
• Acoustic spectral LLDs
– MFCCs, band energy, centroid, roll-off point, flux, entropy, moments, sharpness, harmonicity
• Visual LLDs
– HSV histogram
– Optical Flow: histogram + mean + std.dev.
– Laplacian edge image histogram + strongest edge
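Two of the acoustic energy LLDs listed above, energy and zero-crossing rate (ZCR), can be sketched per frame as below. This is a plain-Python illustration under assumed frame and hop lengths, not the openSMILE implementation; all function names are hypothetical.

```python
import math

def frame_signal(samples, frame_len, hop):
    """Cut a sample sequence into fixed-length frames advanced by `hop`."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def log_energy(frame):
    """Log of the mean squared amplitude; a small floor avoids log(0)."""
    energy = sum(x * x for x in frame) / len(frame)
    return math.log(energy + 1e-12)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose sign differs."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

# Toy signal that changes sign at every sample (made-up data).
samples = [1.0, -1.0] * 8
frames = frame_signal(samples, frame_len=8, hop=4)
llds = [(log_energy(f), zero_crossing_rate(f)) for f in frames]
```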
4. Technische Universität München
“Brute-Forced”
• Fully data-based approach (no pre-classification)
• Little hand-crafting / engineering of features
• Systematic feature (over-)generation
• Emphasis on machine learning
• Successful in affect recognition and speaker characterization tasks
– INTERSPEECH 2009 Emotion Challenge
– INTERSPEECH 2010 Paralinguistic Challenge
– INTERSPEECH 2011 Speaker State Challenge (intoxication / sleepiness)
• Generalization?
5. Technische Universität München
A Data-Based Approach
• System development based on 3-fold cross-validation (CV) of the development data
– “Movie-independent”
– Stratified by violence proportion and age
• Use all features from development data for evaluation on test data
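A movie-independent fold assignment can be sketched as below. The slides do not say how the stratification was carried out, so the round-robin scheme, the function name, and the violence ratios are assumptions for illustration; stratification by age is left out.

```python
def movie_independent_folds(violence_ratio, n_folds=3):
    """Assign whole movies to folds so no movie is split across folds.

    Movies are ranked by their proportion of violent material and dealt
    round-robin, which roughly balances that proportion across folds.
    """
    ranked = sorted(violence_ratio, key=violence_ratio.get)
    folds = [[] for _ in range(n_folds)]
    for rank, movie in enumerate(ranked):
        folds[rank % n_folds].append(movie)
    return folds

# Hypothetical movies with made-up violence proportions.
ratios = {
    "movie_a": 0.02, "movie_b": 0.05, "movie_c": 0.08,
    "movie_d": 0.12, "movie_e": 0.15, "movie_f": 0.30,
}
folds = movie_independent_folds(ratios)
```

In each CV round, one fold serves as evaluation data and the other two as training data, so the classifier is always tested on movies it has not seen.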
6. Technische Universität München
“Acoustic and Visual”
• Expect complementarity of modalities
• Late fusion by confidences of single-modal classifiers
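Late fusion of the two single-modal classifiers might look like the sketch below. The slides do not state the fusion rule, so the weighted average and the equal default weight are assumptions, as are the confidence values.

```python
def late_fusion(audio_conf, video_conf, audio_weight=0.5):
    """Combine per-segment violence confidences from two modalities.

    A weighted average is only one possible fusion rule (assumed here).
    """
    if len(audio_conf) != len(video_conf):
        raise ValueError("both modalities must score the same segments")
    return [audio_weight * a + (1.0 - audio_weight) * v
            for a, v in zip(audio_conf, video_conf)]

# Made-up confidences for two segments.
fused = late_fusion([0.8, 0.2], [0.4, 0.6])
```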
7. Technische Universität München
Segmentation and Classification
• Two segmentations evaluated on development set:
– Functionals over shots
– Functionals over X sec. sliding window
• Sliding window segmentation:
– Classify per window
– Fuse window classification per shot
– Alternative: Generate segmentation
• Classifier: SVM trained with SMO in Weka, C = 0.01
• Logistic regression to obtain confidences
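Fusing window-level decisions into one score per shot can be sketched as follows. The slides do not specify the fusion rule; averaging the confidences of all windows that overlap a shot is an assumption, and the time stamps and confidences are invented.

```python
def fuse_windows_per_shot(shots, windows):
    """Return one confidence per shot from window-level confidences.

    shots:   list of (start_sec, end_sec)
    windows: list of (start_sec, end_sec, confidence)
    A shot with no overlapping window gets None.
    """
    fused = []
    for s_start, s_end in shots:
        overlapping = [conf for w_start, w_end, conf in windows
                       if w_start < s_end and w_end > s_start]
        fused.append(sum(overlapping) / len(overlapping) if overlapping else None)
    return fused

# Hypothetical shots and overlapping 2-second windows with a 1-second step.
shots = [(0.0, 3.0), (3.0, 5.0)]
windows = [(0.0, 2.0, 0.2), (1.0, 3.0, 0.4), (2.0, 4.0, 0.9), (3.0, 5.0, 0.7)]
shot_conf = fuse_windows_per_shot(shots, windows)
```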
8. Technische Universität München
TUM Test Runs
Run    Modality  Overlap        MAP100  MAP100    MAP20
                 (Train, Eval)  Test    Dev (CV)  Dev (CV)
TUM-1  A+V       X              .484    .397      .525
TUM-2  A         X              .376    .445      .515
TUM-3  A         X X            .360    .428      .518
TUM-4  A                        .392    .442      .503
TUM-5  V                        .320    .224      .213
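For readers unfamiliar with the MAP100 and MAP20 columns, the sketch below computes average precision over the top N ranked segments of one movie and averages it over movies. Normalizing by the number of violent segments found in the top N is one common convention and is assumed here; the official evaluation tool may differ. The labels are made up.

```python
def average_precision_at(ranked_labels, n):
    """Average precision over the top-n segments of one ranked list.

    ranked_labels: 1 (violent) or 0 (non-violent), sorted by descending
    classifier confidence.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_violent in enumerate(ranked_labels[:n], start=1):
        if is_violent:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0

def mean_average_precision_at(ranked_lists, n):
    """Mean of the per-movie average precision values."""
    return sum(average_precision_at(r, n) for r in ranked_lists) / len(ranked_lists)

# Two hypothetical movies with ranked segment labels.
movie_1 = [1, 0, 1, 1, 0]
movie_2 = [0, 0, 0, 0, 0]
map_5 = mean_average_precision_at([movie_1, movie_2], n=5)
```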
9. Technische Universität München
TUM Test Runs
Run    Modality  Overlap        UA Rec  WA Rec
                 (Train, Eval)  Dev     Dev
TUM-1  A+V       X              .584    .848
TUM-2  A         X              .648    .830
TUM-3  A         X X            .648    .826
TUM-4  A                        .634    .829
TUM-5  V                        .537    .832
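The difference between unweighted average (UA) and weighted average (WA) recall can be illustrated with a small sketch. UA is the mean of the per-class recalls, while WA weights each class by its frequency and therefore equals overall accuracy. The labels below are invented.

```python
def ua_wa_recall(y_true, y_pred):
    """Return (unweighted average recall, weighted average recall)."""
    per_class = []
    for cls in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        per_class.append(sum(1 for i in idx if y_pred[i] == cls) / len(idx))
    ua = sum(per_class) / len(per_class)
    wa = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    return ua, wa

# Imbalanced toy data: 8 non-violent (NV) and 2 violent (V) segments.
y_true = ["NV"] * 8 + ["V"] * 2
y_pred = ["NV"] * 8 + ["V", "NV"]
ua, wa = ua_wa_recall(y_true, y_pred)
```

With mostly non-violent data, a classifier that favors the majority class keeps a high WA even when UA is close to chance level, which is the pattern visible for the video-only run.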
10. Technische Universität München
Test Data: MAP100 by Movie
Movie               TUM-1 (A+V)  TUM-2 (A)
Dead Poets Society  .523         .158
Fight Club          .321         .315
Independence Day    .609         .656
11. Technische Universität München
Discussion
• MAP very sensitive to segmentation
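– Ex.: MAP100 = .73, MAP20 = .88 on Dev iff segment boundaries are aligned to violent / non-violent scenes
[Figure: aligned vs. not aligned segment boundaries relative to non-violent (NV) and violent (V) scenes; the label of a non-aligned segment is marked “?”]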
– Train on aligned data / test on not aligned data: MAP100 = .49
• Accuracies: similar ranking, but less “sensitive”
– Correlated with target function in learning
12. Technische Universität München
Conclusions and Outlook
• Demonstrated feasibility of “brute-force” approach
• Acoustic features alone are often competitive
• Visual features are complementary
• Future: Deeper analysis of
– Individual features’ worth
– Influence of segmentation on model training and evaluation
13. Technische Universität München
Thank you.
weninger@tum.de
openSMILE: http://opensmile.sourceforge.net