Computational modeling of music emotion has been addressed primarily by two approaches: the categorical approach that categorizes emotions into mood classes and the dimensional approach that regards emotions as numerical values over a few dimensions such as valence and activation. Being two extreme scenarios (discrete/continuous), the two approaches actually share a unified goal of understanding the emotion semantics of music. This paper presents the first computational model that unifies the two semantic modalities under a probabilistic framework, which makes it possible to explore the relationship between them in a computational way. With the proposed framework, mood labels can be mapped into the emotion space in an unsupervised and content-based manner, without any training ground truth annotations for the semantic mapping. Such a function can be applied to automatically generate a semantically structured tag cloud in the emotion space. To demonstrate the effectiveness of the proposed framework, we qualitatively evaluate the mood tag clouds generated from two emotion-annotated corpora, and quantitatively evaluate the accuracy of the categorical-dimensional mapping by comparing the results with those created by psychologists, including the one proposed by Whissell & Plutchik and the one defined in the Affective Norms for English Words (ANEW).
Exploring the Relationship Between Multi-Modal Emotion Semantics of Music
1. 1
Exploring the Relationship
Between Multi-Modal Emotion
Semantics of Music
Ju-Chiang Wang, Yi-Hsuan Yang, Kaichun
Chang, Hsin-Min Wang, and Shyh-Kang Jeng
Academia Sinica,
National Taiwan University,
Taipei, Taiwan
2. 2
Outline
• Introduction and Potentiality
• Methodology
– The ATB and AEG models
– Framework to combine the two models
• Evaluation and Result
• Conclusion
• In this presentation, the terms mood and
emotion are used interchangeably
3. 3
Introduction – Tag and Valence-Arousal (VA)
• Music emotion modeling follows two approaches
• Both share a unified goal of understanding the emotion semantics of music
• (Arbitrary) mood tags can be mapped into the VA space in an unsupervised and content-based manner, without any training ground truth for the semantic mapping
• Automatically generate a semantically structured tag cloud in the VA space
[Figure: the categorical approach (mood classes) vs. the dimensional VA plane, with arousal (low to high) on the vertical axis, valence (negative to positive) on the horizontal axis, and quadrants numbered 1–4]
5. 5
Potentiality (Clarifying the Debate)
• A novice user may be unfamiliar with the VA model; it would be helpful to display mood tags in the VA space
• Facilitates applications such as tag-based music search and browsing interfaces
• Dimension reduction for tag visualization may yield dimensions that do not conform to valence and arousal
• The VA values of some affective terms can be found in the literature, but were not elicited from music
• Affective terms are not cross-lingual and do not always have exact translations in different languages
• Culture-dependent, corpus-dependent
8. 8
Methodology of the Framework
• A probabilistic framework with two component models, Acoustic Tag Bernoullis (ATB) and Acoustic Emotion Gaussians (AEG)
– These computationally model the generative processes from acoustic features to a mood tag and to a VA value, respectively
• Based on the same acoustic feature space, the ATB and AEG models can share and transfer semantic information to each other
• Bridged by the acoustic feature space, one emotion modality can be aligned to the other
• The first attempt to establish a joint model for exploring the relationship between discrete mood categories and the continuous emotion space
9. 9
Construct Feature Reference Model
[Figure: pipeline for building the feature reference model — frame-based features are extracted from the music tracks and audio signals of a universal music database, frame vectors are randomly selected from each track into a global set, and EM training fits a global acoustic GMM (components A1 … AK) for acoustic feature encoding]
10. 10
Represent a Song in a Probabilistic Space
[Figure: a song's feature vectors are mapped to posterior probabilities over the acoustic GMM components A1 … AK and aggregated into a K-bin histogram (the acoustic GMM posterior), where each dimension corresponds to a specific acoustic pattern]
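The posterior histogram described on this slide can be sketched as follows, assuming a fitted scikit-learn `GaussianMixture` stands in for the global acoustic GMM; the toy sizes (K = 8, 4-dim features) are assumptions for illustration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
K, feat_dim = 8, 4  # illustrative sizes, not the paper's

# Fit a toy acoustic GMM on pooled frames (stands in for the global model).
pool = rng.standard_normal((2000, feat_dim))
gmm = GaussianMixture(n_components=K, random_state=0).fit(pool)

def song_posterior(frames, gmm):
    """Average the per-frame posteriors over the acoustic GMM components,
    yielding one K-dim histogram per song (each dim = one acoustic pattern)."""
    return gmm.predict_proba(frames).mean(axis=0)

song = rng.standard_normal((300, feat_dim))  # one song's frame vectors
theta = song_posterior(song, gmm)
print(theta.shape, round(theta.sum(), 6))  # (8,) 1.0
```

Because each frame's posterior sums to one, the averaged histogram is itself a proper probability distribution over the K acoustic patterns.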
11. 11
Acoustic Tag Bernoullis (ATB)
• Given a mood-tagged music dataset with a binary label for each mood tag
• Learn an ATB model that describes the generative process from acoustic features to the mood tag for each song in the dataset
• Won the Mood Tag Classification task (clip-level AUC) in MIREX 2009 and 2010
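The slides do not give ATB's exact parameterization, but one plausible reading, sketched below under that assumption, attaches a Bernoulli parameter to each acoustic-GMM component and mixes them with the song's posterior histogram; `beta` and all sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
K = 8

# Hypothetical ATB parameters: one Bernoulli probability beta_k per
# acoustic-GMM component (a mixture-of-Bernoullis sketch, not the
# paper's confirmed formulation).
beta = rng.uniform(0.05, 0.95, size=K)

def tag_probability(theta, beta):
    """P(tag = 1 | song) as the posterior-weighted mix of the component
    Bernoulli parameters, where theta is the song's K-dim acoustic
    GMM posterior histogram."""
    return float(theta @ beta)

theta = rng.dirichlet(np.ones(K))  # a song's posterior histogram
p = tag_probability(theta, beta)
print(0.0 <= p <= 1.0)  # True
```

Since `theta` is a convex combination, the resulting tag probability always stays in [0, 1].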
12. 12
Acoustic Emotion Gaussians (AEG)
• Given a VA-annotated music dataset
• Learn an AEG model that describes the generative process from acoustic features to the VA space for each song in the dataset
• Presented in OS2; superior to its rivals, SVR and MLR
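AEG ties each acoustic component to a 2-D Gaussian in the VA plane, so a song's emotion distribution is the mixture weighted by its posterior histogram. A minimal sketch, assuming hypothetical per-component VA means and variances (the fitted values are not in the slides):

```python
import numpy as np

rng = np.random.default_rng(3)
K = 8

# Hypothetical AEG parameters: one 2-D Gaussian in the VA plane per
# acoustic component; means/variances below are illustrative only.
mu = rng.uniform(-1.0, 1.0, size=(K, 2))  # per-component (valence, arousal)
var = rng.uniform(0.05, 0.2, size=K)      # per-component isotropic variance

def song_va_mean(theta, mu):
    """Expected VA value of a song whose emotion distribution is the
    GMM with mixture weights theta over the K component Gaussians."""
    return theta @ mu

theta = rng.dirichlet(np.ones(K))  # a song's posterior histogram
va = song_va_mean(theta, mu)
print(va.shape)  # (2,)
```

Because the mixture weights are convex, the expected VA value stays inside the convex hull of the component means.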
14. 14
Multi-Modal Emotion Semantic Mapping
• Three models are aligned: ATB, the acoustic GMM, and AEG
• Transfer the weights from a mood tag to the VA GMM
• The semantic mapping process is transparent and easy to observe and interpret
Mapping a tag into a VA Gaussian distribution
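Since ATB and AEG share the same K acoustic components, a tag can be pushed into the VA plane by using its per-component weights to mix the AEG Gaussians, then collapsing the mixture into a single 2-D Gaussian by moment matching. The weights and Gaussians below are hypothetical placeholders for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
K = 8

# Hypothetical shared parameters: per-component tag affinities (from an
# ATB-style model) and per-component VA Gaussians (from an AEG-style model).
tag_affinity = rng.uniform(0.0, 1.0, size=K)
mu = rng.uniform(-1.0, 1.0, size=(K, 2))
cov = np.stack([np.eye(2) * v for v in rng.uniform(0.05, 0.2, size=K)])

def map_tag_to_gaussian(tag_affinity, mu, cov):
    """Collapse the tag-weighted VA mixture into one 2-D Gaussian by
    moment matching (mixture mean and mixture covariance)."""
    w = tag_affinity / tag_affinity.sum()  # normalize to mixture weights
    mean = w @ mu
    diff = mu - mean
    # Mixture covariance: sum_k w_k * (cov_k + diff_k diff_k^T)
    full = sum(w[k] * (cov[k] + np.outer(diff[k], diff[k])) for k in range(K))
    return mean, full

mean, full = map_tag_to_gaussian(tag_affinity, mu, cov)
print(mean.shape, full.shape)  # (2,) (2, 2)
```

The resulting mean places the tag in the VA plane, and the covariance (larger when the tag spreads over dissimilar components) can drive the font size in the generated tag cloud.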
15. 15
Evaluation – Corpora and Settings
• Two corpora are used: MER60 and AMG1644
• MER60: a jointly annotated corpus (MER60-alone setting)
– 60 music clips, each 30 seconds long
– 99 subjects in total; each clip annotated by 40 subjects
– VA values are entered by clicking on the emotion space shown on a computer display
– Last.fm is queried and the top 50 mood tags for the 60 songs are kept
• AMG1644: used for the separately annotated corpora (AMG1644-MER60 setting)
– The audio of the “top songs” for 33 mood tags is crawled from AMG; most of the tags are used in the MIREX mood classification task
– This yields 1,644 clips, each about 30 seconds long
16. 16
Acoustic Features
• Adopt the bag-of-frames representation
• Extract frame-based musical features from audio using MIRToolbox 1.3
• All the frames of a clip are aggregated into the acoustic GMM posterior, so emotion analysis is performed at the clip level instead of the frame level
• Frame-based features
– Dynamic, spectral, timbral, and tonal
– A 70-dim concatenated feature vector per frame
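The deck's extraction uses MIRToolbox in MATLAB; a rough bag-of-frames analogue in plain NumPy can illustrate the idea. The two features below (RMS energy and spectral centroid) are stand-ins for the 70-dim set, and the synthetic tone replaces real audio.

```python
import numpy as np

# Bag-of-frames sketch in plain NumPy (the deck uses MIRToolbox 1.3;
# the two features below are illustrative, not the actual 70-dim set).
sr = 22050
t = np.arange(sr) / sr
y = np.sin(2 * np.pi * 440 * t)  # 1 s synthetic 440 Hz tone

frame_len, hop = 1024, 512
n_frames = 1 + (len(y) - frame_len) // hop
feats = []
for i in range(n_frames):
    frame = y[i * hop: i * hop + frame_len]
    rms = np.sqrt(np.mean(frame ** 2))                # dynamic feature
    spec = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(frame_len, 1 / sr)
    centroid = (freqs @ spec) / (spec.sum() + 1e-12)  # spectral feature
    feats.append([rms, centroid])

X = np.array(feats)  # one feature row per frame
print(X.shape)  # (42, 2)
```

In the deck's pipeline, rows like these are then encoded as posteriors over the acoustic GMM rather than used directly.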
17. 17
Result for the MER60-Alone Setting
• Graphviz is used for visualization, with a Voronoi diagram-based heuristic to avoid tag overlapping
18. 18
Result for the AMG-MER Setting
• Graphviz is used for visualization, with a Voronoi diagram-based heuristic to avoid tag overlapping
19. 19
Comparison with Psychologists
• Quantitative comparison
– Refer to the VA values of 30 affective terms proposed by Whissell and Plutchik (WP) and by the Affective Norms for English Words (ANEW)
– For each tag, measure the Euclidean distance between the generated VA value and the psychologists’ value
• Baseline
– Set the generated VA value of each tag to the origin
– Represents an ineffective tag-to-VA mapping
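The evaluation above amounts to a mean Euclidean distance between the generated and reference VA points, with an all-at-the-origin baseline. A sketch with made-up coordinates (not the paper's data):

```python
import numpy as np

# Illustrative VA points (made up, not the paper's data): generated
# positions vs. psychologists' reference coordinates for three tags.
generated = {"happy": (0.6, 0.5), "sad": (-0.5, -0.3), "angry": (-0.4, 0.7)}
reference = {"happy": (0.8, 0.5), "sad": (-0.6, -0.4), "angry": (-0.5, 0.8)}

def mean_distance(pred, ref):
    """Mean Euclidean distance between predicted and reference VA points."""
    d = [np.linalg.norm(np.subtract(pred[t], ref[t])) for t in ref]
    return float(np.mean(d))

model_err = mean_distance(generated, reference)

# Baseline: map every tag to the origin (an ineffective mapping).
origin = {t: (0.0, 0.0) for t in reference}
baseline_err = mean_distance(origin, reference)
print(model_err < baseline_err)  # True
```

Any mapping that beats the origin baseline carries at least some emotion-semantic information.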
20. 20
Discussion
• The result is not sensitive to K
• Such a learning-based framework is scalable and can do better when more annotated data is available
• Automatic discovery
– For instance, construct a balanced audio music corpus and have Chinese speakers label Chinese mood tags
– Generate a Chinese mood tag cloud
• Inverse correlation between the VA intensity and the
covariance of a tag
– Tags lying on the outer circle would have larger font sizes
22. 22
Conclusion
• A novel framework that unifies the categorical and
dimensional emotion semantics of music
• Demonstrated how to map a mood tag to a 2-D VA
Gaussian and generate the corresponding tag cloud,
and this can be further extended to arbitrary tags
• Verify whether an arbitrary tag is mood-related or not
• We will conduct user studies on the results
• Further investigation of acoustic feature representations for better generalization of the emotion modeling