Competence Center Information Retrieval & Machine Learning
11th International Workshop on Content-Based Multimedia Indexing (CBMI), Veszprém, Hungary, 2013
Detecting Violent Content in Hollywood Movies by Mid-level Audio Representations
Esra Acar, Frank Hopfgartner, Sahin Albayrak
Outline
17 June 2013, CBMI 2013
► Motivation
► The Violence Detection Method
  • Audio Representation of Videos
  • Learning Violence Detection Model
► Performance Evaluation
► Conclusions & Future Work
Motivation
► Goal: detecting the most violent scenes in Hollywood movies.
► Use case: parents select or reject movies by previewing the parts of each movie that contain the most violent moments.
► We investigate the discriminative power of mid-level audio features:
  • Bag-of-Audio-Words (BoAW) representations based on Mel-Frequency Cepstral Coefficients (MFCCs)
  • Two different BoAW construction methods:
     - a vector quantization-based (VQ-based) method, and
     - a sparse coding-based (SC-based) method
The Violence Detection Method
► Definition of violence: “physical violence or accident resulting in human injury or pain”, as defined in the MediaEval Violent Scenes Detection (VSD) task.
► Two main components of the method:
  • the representation of video shots
  • the learning of a violence model
Audio Representation of Videos (1)
► Mel-Frequency Cepstral Coefficients (MFCCs)
  • are commonly used in speech recognition and music information retrieval (e.g., genre classification);
  • are computed on the mel frequency scale, which approximates human auditory perception of pitch;
  • work well for distinguishing excitement from non-excitement, i.e., as indicators of the excitement level of video segments.
► An MFCC-based audio representation is used to describe the audio content of Hollywood movies.
► Mid-level representations may help to model video segments in a way that is one step closer to human perception. Examples are:
  • bags of features,
  • the upper units of convolutional networks or deep belief networks.
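The MFCC extraction itself is not spelled out on the slides. As a rough, self-contained illustration, the sketch below computes MFCCs for a single audio frame with a common textbook recipe (Hamming window, power spectrum, triangular mel filterbank, log, DCT-II). The HTK-style mel formula, the 26 filters, the 13 coefficients and the 512-sample frame are assumptions, not settings taken from this work; a real system would use an FFT library and frame the whole soundtrack.

```python
import math


def hz_to_mel(freq_hz):
    # HTK-style mel scale; other conventions (e.g. Slaney) also exist.
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame):
    """Naive O(N^2) DFT power spectrum for bins 0..N/2 (an FFT would be used in practice)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2.0 * math.pi * k * t / n) for t, x in enumerate(frame))
        im = -sum(x * math.sin(2.0 * math.pi * k * t / n) for t, x in enumerate(frame))
        spectrum.append((re * re + im * im) / n)
    return spectrum


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters with centres equally spaced on the mel scale."""
    low, high = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    mels = [low + i * (high - low) / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mels]
    bank = []
    for j in range(1, n_filters + 1):
        left, centre, right = bins[j - 1], bins[j], bins[j + 1]
        weights = [0.0] * (n_fft // 2 + 1)
        for k in range(left, centre):  # rising edge of the triangle
            weights[k] = (k - left) / (centre - left)
        for k in range(centre, right):  # falling edge of the triangle
            weights[k] = (right - k) / (right - centre)
        bank.append(weights)
    return bank


def dct_ii(values, n_coeffs):
    """Unnormalised DCT-II, keeping the first n_coeffs outputs."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
        for k in range(n_coeffs)
    ]


def mfcc_frame(frame, sample_rate, n_filters=26, n_coeffs=13):
    """MFCCs of one frame: Hamming window -> power spectrum -> mel energies -> log -> DCT."""
    n = len(frame)
    window = [0.54 - 0.46 * math.cos(2.0 * math.pi * t / (n - 1)) for t in range(n)]
    spectrum = power_spectrum([x * w for x, w in zip(frame, window)])
    energies = [
        sum(w * p for w, p in zip(weights, spectrum))
        for weights in mel_filterbank(n_filters, n, sample_rate)
    ]
    # A small floor avoids log(0) for filters that receive no energy.
    return dct_ii([math.log(e + 1e-10) for e in energies], n_coeffs)


if __name__ == "__main__":
    sr, n = 16000, 512  # 32 ms frame at 16 kHz (illustrative values)
    tone = [math.sin(2.0 * math.pi * 440.0 * t / sr) for t in range(n)]
    print([round(c, 2) for c in mfcc_frame(tone, sr)])
```

Each frame yields one short coefficient vector; in a BoAW setup these per-frame vectors are what later get mapped to audio words.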
Audio Representation of Videos (2)
► We use mid-level audio features based on MFCCs (i.e., the BoAW approach).
► The BoAW approach is applied with two different coding schemes:
  • Vector quantization (via k-means clustering): the feature vectors are partitioned into groups, and each group is represented by its centroid.
  • Sparse coding (via the LARS algorithm): each feature vector is represented as a sparse linear combination of an over-complete set of basis vectors.
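The VQ-based scheme can be sketched in plain Python as follows. The sketch assumes Lloyd's k-means with random initialisation for dictionary generation and hard assignment to the nearest codeword for encoding; the codebook size, the iteration count, the toy 2-D vectors and the normalised-histogram output are illustrative choices, not the parameters used in this work.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(codebook, vector):
    """Index of the closest audio word (hard assignment)."""
    return min(range(len(codebook)), key=lambda i: squared_distance(codebook[i], vector))


def kmeans(points, k, n_iter=50, seed=0):
    """Lloyd's k-means; the k centroids form the audio-word dictionary."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(n_iter):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(centroids, p)].append(p)
        # Move each centroid to the mean of its group; keep it if the group is empty.
        updated = [
            [sum(col) / len(g) for col in zip(*g)] if g else centroids[i]
            for i, g in enumerate(groups)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def vq_boaw(frames, codebook):
    """Normalised histogram of audio-word occurrences over the frames of one shot."""
    if not frames:
        return [0.0] * len(codebook)
    counts = [0] * len(codebook)
    for f in frames:
        counts[nearest(codebook, f)] += 1
    return [c / len(frames) for c in counts]


if __name__ == "__main__":
    # Toy 2-D "MFCC" vectors; real MFCC vectors have more dimensions.
    train = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [5.0, 5.2]]
    words = kmeans(train, k=2)
    print(vq_boaw([[0.05, 0.0], [5.0, 5.1], [4.9, 5.0]], words))
```

Each shot ends up as a k-dimensional histogram regardless of how many frames it contains, which is what makes shots of different lengths comparable for a classifier.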
Audio Representation of Videos (3)
Dictionary Generation Phase
Audio Representation of Videos (4)
Representation Construction Phase
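For the SC-based scheme, each MFCC vector is encoded against a dictionary by solving an ℓ1-regularised least-squares (lasso) problem, min over a of ½‖x − Da‖² + λ‖a‖₁. The sketch below swaps in ISTA (iterative soft-thresholding) for LARS purely for brevity; both target the same objective. The fixed toy dictionary, the value of `lam`, and max-pooling of absolute coefficients to obtain a shot-level vector are assumptions for illustration, not details confirmed by the slides.

```python
def soft_threshold(value, threshold):
    """Proximal operator of threshold * |value|."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sparse_code_ista(x, atoms, lam=0.1, n_iter=1000):
    """Approximately solve min_a 0.5 * ||x - sum_j a_j * atoms[j]||^2 + lam * ||a||_1.

    ISTA stands in for LARS here; both address the same lasso objective.
    """
    # Step 1/L, with L bounded above by the squared Frobenius norm of the dictionary.
    step = 1.0 / sum(v * v for atom in atoms for v in atom)
    a = [0.0] * len(atoms)
    for _ in range(n_iter):
        recon = [sum(a[j] * atoms[j][d] for j in range(len(atoms))) for d in range(len(x))]
        residual = [xi - ri for xi, ri in zip(x, recon)]
        # Gradient step on every coefficient (same residual for all), then shrinkage.
        a = [
            soft_threshold(a[j] + step * sum(r * v for r, v in zip(residual, atom)), step * lam)
            for j, atom in enumerate(atoms)
        ]
    return a


def sc_boaw(frames, atoms, lam=0.1):
    """Shot-level vector: max-pooling of absolute sparse codes (pooling rule assumed)."""
    if not frames:
        return [0.0] * len(atoms)
    codes = [sparse_code_ista(f, atoms, lam) for f in frames]
    return [max(abs(code[j]) for code in codes) for j in range(len(atoms))]


if __name__ == "__main__":
    # Over-complete toy dictionary: three atoms in two dimensions.
    atoms = [[1.0, 0.0], [0.0, 1.0], [0.5 ** 0.5, 0.5 ** 0.5]]
    print(sparse_code_ista([3.0, 0.0], atoms))
```

With `lam=0.1` and the vector `[3, 0]`, only the first atom stays active (coefficient about 2.9), which illustrates how the ℓ1 penalty zeroes out the redundant diagonal atom.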
Learning Violence Detection Model
Learning a Violence Model
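The slide does not name the classifier. Purely as an assumed example, the sketch trains a linear SVM on shot-level BoAW vectors with the Pegasos stochastic sub-gradient method and then scores shots so that they can be ranked by violence. The labels (+1 violent, −1 non-violent), the regularisation constant `lam`, the toy histograms and the choice of a linear model are all illustrative assumptions.

```python
import random


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def train_linear_svm(vectors, labels, lam=0.01, n_iter=2000, seed=0):
    """Pegasos stochastic sub-gradient training of a linear SVM.

    labels: +1 for violent shots, -1 for non-violent ones. A constant 1.0 feature
    is appended as a bias term (and, for brevity, regularised like the weights).
    """
    rng = random.Random(seed)
    data = [(list(v) + [1.0], y) for v, y in zip(vectors, labels)]
    w = [0.0] * len(data[0][0])
    for t in range(1, n_iter + 1):
        x, y = rng.choice(data)
        eta = 1.0 / (lam * t)
        violated = y * dot(w, x) < 1.0  # hinge loss is active for this sample
        w = [(1.0 - eta * lam) * wi for wi in w]
        if violated:
            w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def violence_score(w, vector):
    """Linear decision value; shots are ranked by decreasing score."""
    return dot(w, list(vector) + [1.0])


if __name__ == "__main__":
    # Toy 2-bin BoAW histograms (hypothetical data).
    shots = [[0.8, 0.2], [0.7, 0.3], [0.2, 0.8], [0.3, 0.7]]
    labels = [1, 1, -1, -1]
    w = train_linear_svm(shots, labels)
    print(sorted(range(len(shots)), key=lambda i: violence_score(w, shots[i]), reverse=True))
```

Ranking by the raw decision value, rather than only thresholding it at zero, matches the use case in which the most violent shots should come first.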
Performance Evaluation
► Dataset:
  • 32,708 video shots from 18 Hollywood movies of different genres, ranging from extremely violent movies to movies without violence.
     - Training set: 26,138 video shots from 15 movies.
     - Test set: 6,570 video shots from 3 movies.
► Ground truth:
  • generated by 7 human assessors; violent movie segments are annotated at frame level.
  • Each video shot is labeled as violent or non-violent.
Characteristics of the training and test datasets
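How the frame-level annotations are turned into shot labels is not detailed here. The hypothetical helper below assumes a simple rule: a shot is violent if its frame range overlaps any annotated violent segment. Frame ranges are inclusive `(start, end)` pairs, and both the rule and the function name are illustrative only.

```python
def label_shots(shots, violent_segments):
    """Return 1 for each shot overlapping any violent segment, else 0.

    shots, violent_segments: lists of inclusive (start_frame, end_frame) pairs.
    The overlap rule is a hypothetical choice for illustration.
    """
    labels = []
    for shot_start, shot_end in shots:
        overlaps = any(
            seg_start <= shot_end and shot_start <= seg_end
            for seg_start, seg_end in violent_segments
        )
        labels.append(1 if overlaps else 0)
    return labels


if __name__ == "__main__":
    shots = [(0, 99), (100, 199), (200, 299)]
    segments = [(150, 160), (299, 310)]
    print(label_shots(shots, segments))  # [0, 1, 1]
```

A stricter variant could require a minimum overlap ratio; which rule is used changes how many borderline shots count as violent.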
Evaluation Metrics
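► For the use case, the ranking of violent shots (most violent first) matters more than their individual classification.
► Therefore, metrics other than precision and recall are required to compare performance.
► Average precision at 20 and at 100 (AP@20, AP@100) is used; these are the official metrics of the MediaEval VSD task.
► R-precision, i.e., precision at rank R, where R is the number of violent shots in the ground truth, is reported as an alternative to precision at k.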
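These metrics can be computed from a ranked list of ground-truth labels as sketched below. AP@k follows one common definition (the mean of precision@i over the violent shots found in the top k); the official MediaEval evaluation tool may normalise differently, so the sketch and its toy scores are illustrative only.

```python
def precision_at_k(ranked_labels, k):
    """Fraction of violent shots (label 1) among the top k."""
    return sum(ranked_labels[:k]) / k


def average_precision_at_k(ranked_labels, k):
    """Mean of precision@i over the ranks i <= k that hold a violent shot.

    Normalising by the number of violent shots in the top k is one common
    variant; other definitions divide by min(k, R) instead.
    """
    hits, total = 0, 0.0
    for i, label in enumerate(ranked_labels[:k], start=1):
        if label:
            hits += 1
            total += hits / i
    return total / hits if hits else 0.0


def r_precision(ranked_labels):
    """Precision at rank R, where R is the number of violent shots in the ground truth."""
    r = sum(ranked_labels)
    return sum(ranked_labels[:r]) / r if r else 0.0


def rank_by_score(scores, labels):
    """Sort ground-truth labels (1 = violent) by descending classifier score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [labels[i] for i in order]


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]  # hypothetical classifier outputs
    labels = [1, 0, 1, 0, 0, 1]              # 1 = violent shot
    ranked = rank_by_score(scores, labels)
    print(round(average_precision_at_k(ranked, 3), 3), round(r_precision(ranked), 3))  # 0.833 0.667
```

For the ranked labels `[1, 0, 1, 0, 0, 1]`, AP@3 is (1/1 + 2/3) / 2 ≈ 0.833, while R-precision (R = 3) is 2/3 because only two of the top three shots are violent.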
Results & Discussions (1)
Average Precision at 100 for the Baseline and Our Methods
Average Precision at 20 & 100 and R-precision for the VQ- and SC-based methods
Results & Discussions (2)
Average Precision at 20 & 100 and R-precision on Independence Day
Average Precision at 20 & 100 and R-precision on Dead Poets Society
Average Precision at 20 & 100 and R-precision on Fight Club
Results & Discussions (3)
Team               | Features                                                   | Modality     | AP@100*
ARF                | Color, texture, audio and concepts                         | audio-visual | 0.651
Shanghai-Hong Kong | Trajectory-based features, SIFT, STIP, MFCCs               | audio-visual | 0.624
TEC                | Color, motion, acoustic features                           | audio-visual | 0.618
TUM                | Acoustic energy and spectral, color, texture, optical flow | audio-visual | 0.484
SC-based (ours)    | BoAW with sparse coding                                    | audio        | 0.444
VQ-based (ours)    | BoAW with vector quantization                              | audio        | 0.387
LIG-MIRM           | Color, texture, bag of SIFT and MFCCs                      | audio-visual | 0.314
NII                | Visual concepts learned from color and texture             | visual       | 0.308
DYNI-LSIS          | Multi-scale local binary pattern                           | visual       | 0.125
* AP@100: average precision at 100 (the official evaluation metric of the MediaEval VSD task)
Sample Video Shots (Correctly Classified)
Sample Video Shots (Wrongly Classified)
Conclusions
► An approach for detecting violent content in movies at the video shot level is presented.
► Mid-level audio features based on the BoAW approach with two different coding schemes are employed.
► Promising results are obtained:
  • the SC-based BoAW outperforms all uni-modal submissions in the MediaEval VSD task except one vision-based method.
► One notable point is that the average precision of the proposed method varies strongly across movies with different levels of violence.
Future Work
► Construction of more sophisticated mid-level representations for video content analysis.
► Augmenting the feature set with visual features (both low-level and mid-level) to further improve classification.
► Extending our approach to user-generated videos.
  • Unlike Hollywood movies, these videos are not professionally edited (e.g., to enhance dramatic scenes).
THANKS!
QUESTIONS?
