10/5/2012




ISMIR 2012 Tutorial 2

Music Affect Recognition:
The State-of-the-art and Lessons Learned

Xiao Hu, Ph.D., The University of Hong Kong
Yi-Hsuan Eric Yang, Ph.D., Academia Sinica, Taiwan

Speaker




Speaker

The Audience
  Do you believe that music is powerful? Why do you think so?
  Have you searched for music by affect?
  Have you searched for other things (photos, video) by affect?
  Have you questioned the difference between emotion and mood?
  Is your research related to affect?




Music Affect: [illustrative figures]








Agenda
  Grand challenges on music affect
  Music affect taxonomy and annotation
  Automatic music affect analysis
    Categorical approach
    Multimodal approach
    Dimensional approach
    Temporal approach
  Beyond music
  Conclusion




Emotion or Mood?
  Mood: “relatively permanent and stable”
  Emotion: “temporary and evanescent”
  “most of the supposed [psychological] studies of emotion in music are
  actually concerned with mood and association.” (Leonard Meyer)

Meyer, Leonard B. (1956). Emotion and Meaning in Music. Chicago: University of Chicago Press.








Expressed or Induced
  Designated / indicated / expressed by a music piece
  Induced / evoked / felt by a listener
  Both are studied in MIR
    Mainly differ in the ways of collecting labels:
    “indicate how you feel when listening to the music”
    “indicate the mood conveyed by the music”

Which Moods? 1/2
  Different websites / studies use different terms
  Thayer’s stress-energy model gives 4 clusters
  Farnsworth’s 10 adjective groups
  Tellegen-Watson-Clark model




Which Moods? 2/2
  Lack of a general theory of emotions
    Ekman’s 6 basic emotions: anger, joy, surprise, disgust, sadness, fear
  Verbalization of emotional states is often a “distortion” (Meyer, 1956)
    “unspeakable feelings”
    “a restful feeling throughout ... like one of going downstream while swimming”

Sources of Music Emotion
  Intrinsic (structural characteristics of the music)
    e.g., modality -> happy vs. sad
    What about melody?
  Extrinsic emotion (semantic context related to, but outside, the music)
  Lee et al. (2012) identified a range of factors in people’s assessment of music mood
    Lyrics, tempo, instrumentation, genre, delivery, and even cultural context
    Little is known about the mapping of these factors to music mood

Lee, J. H., Hill, T., & Work, L. (2012). What does music mood mean for real users? In Proceedings of the iConference.




Let’s ask the users… (Lee et al., 2012)
  [survey-results figure]

Data, data, data!
  Extremely scarce resource
  Annotations are time consuming
  Consistency is low across annotators
  Existing public datasets on mood:
    MoodSwings Turk dataset
      240 30-sec clips; Arousal-Valence scores
    MIREX mood classification task
      600 30-sec clips; in 5 mood clusters
    MIREX tag classification task (mood sub-task)
      3,469 30-sec clips; in 18 mood-related tag groups
    Yang’s emotion regression dataset
      193 25-sec clips; on an 11-level Arousal-Valence scale








Suboptimal Performance
  MIREX Mood Classification (2012)
    Accuracy: 46% - 68%
  MIREX Tag Classification, mood subtask (2011)

Newer Challenges
  Cross-cultural applicability
    Existing efforts focus on Western music
    OS1 @ ISMIR 2012 (tomorrow): Yang & Hu: Cross-cultural Music Mood
    Classification: A Comparison on English and Chinese Songs
  Personalization
    Ultimate solution to the subjectivity problem
  Contextualization
    Even the same person’s emotional responses change across times, locations, and occasions
    PS1 @ ISMIR 2012 (tomorrow): Watson & Mandryk: Modeling Musical Mood
    From Audio Features and Listening Context on an In-Situ Data Set




Summary of Challenges
  Terminology
  Models and categories
    No consensus
  Sources and factors
    No clear mapping between sources and affects
  Data scarcity
  Suboptimal performances
  Newer issues
    Cross-cultural, personalization, contextualization, ...

Agenda
  Grand challenges on music affect
  Music affect taxonomy and annotation
  Automatic music affect analysis
    Categorical approach
    Multimodal approach
    Dimensional approach
    Temporal approach
  Beyond music
  Conclusion




Music affect taxonomy and annotation
  Background
    What are taxonomies?
    Taxonomy vs. Folksonomy
  Developing music mood taxonomies
    Taxonomy from editorial labels
    Taxonomies from social tags
  Annotations
    Experts
    Crowdsourcing (e.g., MTurk, games)
    Subjects
    Derived from online services

Taxonomy
  Domain-oriented controlled vocabulary
  Contains labels (metadata)
  Commonly used on websites
    Pick lists, browsable directories, etc.








Taxonomy vs. Folksonomy
  Taxonomy
    Controlled, structured vocabulary
    Often requires expert knowledge
    Top-down and bottom-up approaches
  Folksonomy
    Uncontrolled, unstructured vocabulary
    Social tags freely applied by users
    Commonality exists across large numbers of tags

Models in Music Psychology 1/2
  Categorical
    Hevner’s adjective circle (1936)

Hevner, K. (1936). Experimental studies of the elements of expression in music. American Journal of Psychology, 48.




Models in Music Psychology 2/2
  Dimensional
    Russell’s circumplex model (1980)

Russell, J. A. (1980). A circumplex model of affect. Journal of Personality and Social Psychology, 39: 1161-1178.

Borrow from Psychology to MIR
  Thayer’s stress-energy model gives 4 clusters
  Farnsworth’s 10 adjective groups
  Tellegen-Watson-Clark model
  Grounded in music perception research, but lacking the social context of
  music listening (Juslin & Laukka, 2004)

Juslin, P. N., & Laukka, P. (2004). Expression, perception, and induction of musical emotions: a review and a questionnaire study of everyday listening. JNMR.




Taxonomy Built from Editorial Labels
  Editorial labels:
    Given by professional editors of online repositories
    Have a certain level of control
    Rooted in realistic social contexts
  allmusic.com: “the most comprehensive music reference source on the planet”
    288 mood labels created and assigned to music works








Mood Label Clustering
  [dendrograms of mood labels for albums and for songs, grouped into clusters C1-C5]

A Taxonomy of 5 Mood Clusters
  Cluster_1: passionate, rousing, confident, boisterous, rowdy
  Cluster_2: rollicking, cheerful, fun, sweet, amiable/good natured
  Cluster_3: literate, poignant, wistful, bittersweet, autumnal, brooding
  Cluster_4: humorous, silly, campy, quirky, whimsical, witty, wry
  Cluster_5: aggressive, fiery, tense/anxious, intense, volatile, visceral

Hu, X., & Downie, J. S. (2007). Exploring Mood Metadata: Relationships with Genre, Artist and Usage Metadata. In Proceedings of ISMIR.




Taxonomy from Social Tags
  Social tags (last.fm: “The largest music tagging site for Western music”)
    Pros: users’ perspectives; large quantity
    Cons: non-standardized; ambiguous
  Combines linguistic resources and human expertise

Hu, X. (2010). Music and Mood: Where Theory and Reality Meet. In Proceedings of the 5th iConference. (Best Student Paper)

The Method
  1,586 terms in WordNet-Affect (a lexicon of affective words)
  – 202 evaluation terms in General Inquirer (“good”, “great”, “poor”, etc.)
  – 135 non-affect / ambiguous terms identified by experts (“cold”, “chill”, “beat”, etc.)
  = 1,249 terms
  476 of these terms are last.fm tags
  Group the tags by WordNet-Affect and experts => 36 categories
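The vocabulary filtering above amounts to plain set subtraction. A minimal Python sketch with placeholder word sets; the real lists come from WordNet-Affect, General Inquirer, and expert review, and are not reproduced here:

```python
# Toy illustration of the term-filtering logic with placeholder word sets.
affect_lexicon = {"happy", "sad", "good", "great", "poor", "cold", "chill", "tense"}
evaluation_terms = {"good", "great", "poor"}   # General Inquirer evaluation words
ambiguous_terms = {"cold", "chill"}            # non-affect terms flagged by experts

candidates = affect_lexicon - evaluation_terms - ambiguous_terms
print(sorted(candidates))  # ['happy', 'sad', 'tense']

# With the full resources the same subtraction yields the slide's counts:
assert 1586 - 202 - 135 == 1249
```

The surviving candidates are then intersected with actual last.fm tags (476 terms on the slide) before grouping into categories.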




2-D Mood Taxonomy
  2-Dimensional Representation [figure]

Comparison to Russell’s 2-D Model
  [figure]








Our Taxonomy
  [36 mood categories laid out on VALENCE and AROUSAL axes]

Laurier et al. (2009) Taxonomy from Social Tags 1/2
  Manually compiled 120 mood words from the literature
  Crawled 6.8M social tags from last.fm
  107 unique tags matched mood words
  80 tags with more than 100 occurrences

    Most used     Least used
    sad           rollicking
    fun           solemn
    melancholy    rowdy
    happy         tense

Laurier et al. (2009). Music mood representations from social tags. In Proceedings of ISMIR.




Laurier et al. (2009) Taxonomy from Social Tags 2/2
  Used LSA to project the tag-track matrix into a space of 100 dimensions
  Clustering trials with varied numbers of clusters

    cluster 1    cluster 2     cluster 3   cluster 4
    angry        sad           tender      happy
    aggressive   bittersweet   soothing    joyous
    visceral     sentimental   sleepy      bright
    rousing      tragic        tranquil    cheerful
    intense      depressing    quiet       humorous
    confident    sadness       calm        gay
    anger        spooky        serene      amiable
    +A -V        -A -V         -A +V       +A +V

Agreement between Laurier’s and the 5-Cluster Taxonomy
  Based on Laurier’s 100-dimensional space
  Intra-cluster similarity; inter-cluster dissimilarity (off-diagonal entries):

         C1    C2    C3    C4    C5
    C1   0     .74   .13   .20   .11
    C2         0     .86   .82   .88
    C3               0     .32   .27
    C4                     0     .53
    C5                           0

Laurier et al. (2009). Music mood representations from social tags. In Proceedings of ISMIR.
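The LSA-then-cluster pipeline above can be sketched in a few lines. This is an assumed reconstruction: scikit-learn, the random toy matrix, and the reduced dimensionality (20 instead of the paper's 100, since the toy matrix is small) are illustrative choices, not details from Laurier et al.:

```python
# Sketch of an LSA + clustering pipeline on a tag-by-track count matrix.
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
tag_track = rng.poisson(0.3, size=(80, 500))  # 80 mood tags x 500 tracks (toy counts)

# LSA: project each tag into a low-dimensional latent space.
svd = TruncatedSVD(n_components=20, random_state=0)
tag_vectors = svd.fit_transform(tag_track)    # shape (80, 20)

# Clustering trials with varied numbers of clusters, as in the paper.
for k in (2, 3, 4, 5):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(tag_vectors)
    print(k, np.bincount(labels))             # cluster sizes for each k
```

Cross-taxonomy agreement tables like the one above can then be computed from distances between cluster centroids in the same latent space.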




Summary on Taxonomy
  What are taxonomies?
  Taxonomy vs. Folksonomy
  Developing music mood taxonomies
    from editorial labels
    from social tags

Mood Annotations
  All annotation needs three things: taxonomy, music, people
  People
    Experts
    Subjects
    Crowdsourcing (e.g., MTurk, games)
  Derive annotations from online services








Expert Annotation
  The MIREX Audio Mood Classification (AMC) task
    5-cluster taxonomy
    1,250 tracks selected from the APM libraries
    A Web-based annotation system called E6K

Expert Annotation: MIREX AMC
  Each expert had 250 clips
  8 of 21 experts finished all assignments
  2,468 judgments collected (3,750 planned)
  Each clip had 2 or 3 judgments
  Avg. Cohen’s kappa: 0.5

  Dataset built from agreements among experts:

    Agreements       C1    C2    C3    C4    C5    Total   Accuracy
    3 of 3 judges    21    24    56    21    31    153     0.59
    2 of 3 judges    41    35    18    26    14    134     0.38
    2 of 2 judges    58    61    46    73    75    313     0.54
    Total            120   120   120   120   120   600

  Lessons: 1. Missed judgments -> low accuracy
           2. Need more motivated annotators

Hu, X., Downie, J. S., Laurier, C., Bay, M., & Ehmann, A. (2008). The 2007 MIREX Audio Mood Classification Task: Lessons Learned. In ISMIR.
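The agreement statistic quoted above, Cohen's kappa, corrects raw agreement for agreement expected by chance. A minimal sketch; the judge labels below are hypothetical, not actual MIREX judgments:

```python
# Cohen's kappa between two annotators' cluster labels (hypothetical data).
from collections import Counter

def cohens_kappa(a, b):
    """Chance-corrected agreement between two raters over the same items."""
    assert len(a) == len(b)
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    # Chance agreement: probability both raters pick the same label at random.
    expected = sum(ca[c] * cb[c] for c in ca) / n**2
    return (observed - expected) / (1 - expected)

judge1 = ["C1", "C2", "C2", "C3", "C5", "C4", "C1", "C3"]
judge2 = ["C1", "C2", "C4", "C3", "C5", "C4", "C2", "C3"]
print(round(cohens_kappa(judge1, judge2), 3))  # 0.686
```

A kappa of 0.5, as reported for the MIREX experts, is commonly read as moderate agreement, which underlines the subjectivity of mood labels.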




Crowdsourcing: Amazon Mechanical Turk
  Lee & Hu (2012): compared expert and MTurk annotations
    The same 1,250 music clips as in MIREX AMC
    The same 5 clusters
    Annotators: “Turkers” who work on Human Intelligence Tasks for very low payment
  Advantages of MTurk
    Plenty of labor
  Disadvantages of MTurk
    Quality control

Annotation: Amazon Mechanical Turk
  Human Intelligence Task (HIT)
    Each HIT had 27 clips
    2 duplicates for consistency checks
    Each clip had 2 judges
    Paid 0.55 USD per HIT
    Qualification test before proceeding to the task
  186 HITs collected; 100 HITs accepted
  Avg. Cohen’s kappa: 0.48

Lee, J. H., & Hu, X. (2012). Generating Ground Truth for Music Mood Classification Using Mechanical Turk. In Proceedings of the Joint Conference on Digital Libraries.




Comparison: Stats on Collecting Data

                                      EVALUTRON 6000        MTurk
    Judgments collected               2,468 (incomplete)    2,500 (complete)
    Total time for all judgments      38 days (+ additional 19 days
                                      in-house assessment)
    Cost for all judgments            $0                    $60.50
    Avg. time per music clip          21.54 seconds         17.46 seconds

Comparison: Agreement Rates

    % of clips with agreement    EVALUTRON 6000    MTurk
    C1                           40.2%             39.6%
    C2                           60.2%             48.9%
    C3                           70.5%             69.5%
    C4                           39.6%             46.3%
    C5                           70.8%             60.0%
    Other                        16.9%             21.3%








    Comparison: Confusions among Clusters

            Clusters            Disagreed in E6K    Disagreed in MTurk
    Cluster 1 & Cluster 2              20                   95
    Cluster 2 & Cluster 4              31                   86
    Cluster 1 & Cluster 5              13                   74
               ⁞                        ⁞                    ⁞
    Cluster 3 & Cluster 4               6                   27
    Cluster 2 & Cluster 5               1                   22
    Cluster 3 & Cluster 5               1                   20
             Total                    253                  595


    Confusions Shown in Russell's Model
    [figure: Clusters 1-5 placed in Russell's valence-arousal circumplex]




    Comparison: System Performances (MIREX 2007)
    [figure: system accuracies under Evalutron 6000 vs. MTurk ground truth]


            Crowdsourcing: Games

    MoodSwings (Kim et al., 2008)
        2-player Web-based game to collect
        annotations of music pieces in the
        arousal-valence space
        Time-varying annotations are
        collected at a rate of 1 sample per
        second
        Players "score" for agreement with
        their competitor

Kim, Y. E., Schmidt, E., and Emelle, L. (2008). Moodswings: a collaborative
game for music mood label collection, ISMIR
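MoodSwings rewards players for tracking each other's per-second valence-arousal annotations. A simplified sketch of such a scoring rule follows; the distance threshold is an assumption for illustration (the actual game scores overlap of the players' cursor regions, not a fixed radius):

```python
import math

def agreement_score(player1, player2, radius=0.2):
    """Fraction of one-per-second samples where the two players'
    valence-arousal points (each in [-1, 1] x [-1, 1]) lie within
    `radius` of each other.  The radius is an illustrative choice."""
    hits = sum(math.dist(a, b) <= radius for a, b in zip(player1, player2))
    return hits / len(player1)

p1 = [(0.5, 0.6), (0.4, 0.5), (-0.3, 0.2)]
p2 = [(0.45, 0.55), (0.9, -0.9), (-0.25, 0.15)]
print(agreement_score(p1, p2))  # 2 of 3 samples agree
```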




            MoodSwings: Challenges
    Needs a pair of players
        Simulated AI player
            Randomly following the real player: less challenging
            Based on a prediction model: needs training data
    Attracting players (true for all games)
        Must be challenging and fun
        Music: more recent and entertaining
        Game interface: sleek, aesthetic
    Research values
        Variety of music and mood

Morton, B. G., Speck, J. A., Schmidt, E. M., and Kim, Y. E. (2010). Improving
music emotion labeling using human computation, HCOMP


            MoodSwings: MTurk version
    Single-person game
    No competition, no scores
    Monetary reward (0.25 USD / 11 pieces)
    Consistency check:
        -- 2 identical pieces whose labels must be within
           experts' decision boundary
        -- must not label all clips the same way

Speck, J. A., Schmidt, E. M., Morton, B. G., and Kim, Y. E. (2011). A comparative
study of collaborative vs. traditional music mood annotation, ISMIR
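The two MTurk consistency checks above can be sketched in a few lines. Everything specific here is an assumption for illustration: the duplicate-clip positions, and a simple per-axis tolerance standing in for the experts' decision boundary:

```python
def passes_consistency(labels, dup=(0, 10), tol=0.3):
    """labels: one (valence, arousal) pair per clip, in HIT order.
    Check 1: the two hidden duplicate clips must get similar labels
             (tol is an assumed stand-in for the experts' boundary).
    Check 2: the worker must not label every clip the same way."""
    (v1, a1), (v2, a2) = labels[dup[0]], labels[dup[1]]
    duplicates_agree = abs(v1 - v2) <= tol and abs(a1 - a2) <= tol
    not_all_same = len(set(labels)) > 1
    return duplicates_agree and not_all_same

# An 11-clip HIT where clips 0 and 10 are the hidden duplicates
good = [(0.5, 0.5)] + [(i / 10, -i / 10) for i in range(9)] + [(0.4, 0.6)]
lazy = [(0.0, 0.0)] * 11
print(passes_consistency(good), passes_consistency(lazy))  # → True False
```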








    MoodSwings: Two-Version Comparison
    [figure: label correlation between the game and MTurk versions --
     valence: 0.71, arousal: 0.85]

Speck, J. A., Schmidt, E. M., Morton, B. G., and Kim, Y. E. (2011). A comparative
study of collaborative vs. traditional music mood annotation, ISMIR


            Subject Annotation
    Do not require music expertise
        Easier to recruit than experts
        Arguably more authentic to MIR situations
    Can be trained for annotation task
        Higher data quality than MTurk
        Still needs verification/evaluation
    Often with payments
        Rates much higher than MTurk




    Derive Annotations from Online Services

    Harness the power of Music 2.0
    Based on editorial labels and noisy user tags
    e.g., the Million Song Dataset (MSD)
    e.g., the MIREX Audio Tag Classification mood dataset

    Music 2.0 Logo by Rocketsurgeon


    MIREX Mood Tag Classification




    MIREX Mood Tag Classification Dataset:
     Positive Examples in Each Category
        Based on the top 100 tags provided by the last.fm API
        Select songs tagged heavily with terms in a category


    MIREX Mood Tag Classification Dataset:
              An Example








    Annotation Derived from Music 2.0

    PROS                               CONS
      Grounded on real-life usage     • Need mood-related social tags
      Larger dataset, supporting      • Need clever ways to filter out noise
      multi-label                     • May be culturally dependent
      No manual annotation required


    Cross-Cultural Issue in Annotation

    A survey of 30 clips on American and Chinese listeners
        C1: passionate
        C2: cheerful
        C3: bittersweet
        C4: humorous
        C5: aggressive

    "Got to Get You into My Life" by The Beatles

Hu, X. & Lee, J. H. (2012). A Cross-cultural Study of Music Mood Perception
between American and Chinese Listeners, ISMIR (PS3 – Thursday!)




            Summary on Annotation
    Expert annotation for small datasets
    Crowdsourcing with careful designs
    Music 2.0 for super-size datasets
    ??


                    Agenda
    Grand challenges on music affect
    Music affect taxonomy and annotation
    Automatic music affect analysis
        Categorical approach
        Multimodal approach
        Dimensional approach
        Temporal approach
    Beyond music
    Conclusion




            Automatic Approaches
    Categorical vs. Dimensional

                    Pros                        Cons
    Categorical     • Intuitive                 • Terms are ambiguous
                    • Natural language          • Difficult to offer fine-grained
                                                  differentiation
    Dimensional     • Continuous affective      • Less intuitive
                      scales                    • Difficult to annotate
                    • Good user interface


    Categorical and Multimodal Approaches
    Classification problem and framework
    Audio features and classification models
    Existing experiments
    Multimodal classification
    Cross-cultural classification








            Automatic Classification
             (supervised learning)

    Training examples:
        "Here comes the sun" -> Happy
        "I will be back" -> Sad
        "Down with the sickness" -> Angry
        Song X -> Happy
        Song Y -> Sad
        ...

    Training examples --(Training)--> Classifier
    New examples --(Testing)--> Classifier --> Prediction (Happy / Angry / Sad)


    A Framework for Multimodal Mood Classification

    Textual (social tags, lyrics, ...)              Audio (MP3s, ...)
                        Dataset Construction
    Feature Extraction:          linguistic, stylistic, ... | tempo, timbral, ...
    Feature Generation
    and Selection:               F-score, language modeling, PCA, ...
    Classification and
    Multimodal Combination:      SVM, KNN, ...; feature concatenation,
                                 late fusion, hybrid methods, ...
    Evaluation and Analysis:     performance comparison, learning curves,
                                 feature comparison, ...
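The training/testing loop above can be sketched with the simplest of the generic learners listed later, a 1-nearest-neighbour classifier. The two-dimensional features and the songs' feature values here are made up purely for illustration:

```python
import math

def predict_1nn(train, query):
    """train: list of (feature_vector, mood_label) pairs.
    Return the label of the training example nearest to query."""
    vec, label = min(train, key=lambda ex: math.dist(ex[0], query))
    return label

# Toy 2-D features per song -- values are illustrative, not real features
train = [([0.9, 0.8], "happy"), ([0.2, 0.1], "sad"), ([0.8, 0.2], "angry")]
print(predict_1nn(train, [0.85, 0.75]))  # → happy
```

Swapping in an SVM or GMM changes only the model; the train-then-predict structure stays the same.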




                 Audio Features

    Type        Description                                      Tool
    Energy      The mean and standard deviation of root          Marsyas,
                mean square energy                               MIR Toolbox
    Rhythm      Fluctuation pattern and tempo                    MIR Toolbox,
                                                                 PsySound
    Pitch       Pitch class profile, the intensity of 12         MIR Toolbox,
                semitones of the musical octave in the           PsySound
                Western twelve-tone scale
    Tonal       Key clarity, musical mode (major/minor),         MIR Toolbox
                and harmonic change (e.g., chord change)
    Timbre      The mean and standard deviation of the           Marsyas,
                first 13 MFCCs, delta MFCCs, and delta-          MIR Toolbox
                delta MFCCs
    Psycho-     Perceptual loudness, volume, sharpness           PsySound
    acoustic    (dull/sharp), timbre width (flat/rough),
                spectral and tonal dissonance
                (dissonant/consonant) of music


              Classification Models

    Generic supervised learning algorithms
        neural network, k-nearest neighbor (k-NN), maximum likelihood,
        decision tree, support vector machine (SVM), Gaussian mixture
        models (GMM), etc.
    Tools: generic machine learning packages
        Weka, RapidMiner, LibSVM, SVMLight
    SVM seems superior

    [figure: MIREX AMC 2007 Results]
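The Energy row of the table needs no toolbox at all. A pure-Python sketch over pre-framed samples (Marsyas and the MIR Toolbox additionally handle framing and windowing internally):

```python
import math

def rms_energy_stats(frames):
    """Mean and standard deviation of per-frame RMS energy,
    i.e. the 'Energy' feature row above."""
    rms = [math.sqrt(sum(s * s for s in f) / len(f)) for f in frames]
    mean = sum(rms) / len(rms)
    std = math.sqrt(sum((r - mean) ** 2 for r in rms) / len(rms))
    return mean, std

# A loud frame followed by a silent one: RMS per frame is [1.0, 0.0]
loud_then_silent = [[1.0, -1.0, 1.0, -1.0], [0.0, 0.0, 0.0, 0.0]]
print(rms_energy_stats(loud_then_silent))  # → (0.5, 0.5)
```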




        Audio Signal's "Glass Ceiling"
    Aucouturier & Pachet (2004)
        "Semantic gap" between low-level music features
        and high-level human perception
    MIREX AMC performance (5 classes)

        Year            Top 3 accuracies
        2007            61.50%, 60.50%, 59.67%
        2008            63.67%, 58.20%, 56.00%
        2009            65.67%, 65.50%, 63.67%
        2010            63.83%, 63.50%, 63.17%
        2011            69.50%, 67.17%, 66.67%
        2012            67.83%, 67.67%, 67.17%

Aucouturier, J.-J., & Pachet, F. (2004). Improving timbre similarity: How high is the
sky? Journal of Negative Results in Speech and Audio Sciences, 1(1).


        Multimodal Classification
    [figure: MUSIC linked to Audio, Lyrics, Social Tags, and Metadata --
     Bischoff et al., 2009; Schuller et al., 2011; Yang & Lee, 2004;
     Laurier et al., 2009; Hu & Downie, 2010]

    Improving classification performance by combining
    multiple independent sources








                 Lyric Features

    Basic features:
        Content words, part-of-speech, function words
    Lexicon features:
        Words in WordNet-Affect
    Psycholinguistic features:
        Psychological categories in GI (General Inquirer)
        Scores in ANEW (Affective Norms for English Words)
    Stylistic features:
        Punctuation marks; interjection words
        Statistics: e.g., how many words per minute


             Lyric Feature Example

    ANEW examples
              Valence   Arousal   Dominance
    Happy      8.21      6.49       6.63
    Sad        1.61      4.13       3.45
    Thrill     8.05      8.02       6.54
    Kiss       8.26      7.32       6.93
    Dead       1.94      5.73       2.84
    Dream      6.73      4.53       5.53
    Angry      2.85      7.17       5.55
    Fear       2.76      6.96       3.22

    Top General Inquirer (GI) features in category "Aggressive"
    GI Feature   Description                                     Example
    WlbPhys      words connoting the physical aspects of         blood, dead, drunk,
                 well-being, including its absence               pain
    Perceiv      words referring to the perceptual process of    dazzle, fantasy, hear,
                 recognizing or identifying something by         look, make, tell, view
                 means of the senses
    Exert        action words                                    hit, kick, drag, upset
    TIME         words indicating time                           noon, night, midnight
    COLL         words referring to all human collectivities     people, gang, party
    WlbLoss      words related to a loss in a state of           burn, die, hurt, mad
                 well-being, including being upset

Hu, X. & Downie, J. S. (2010). Improving Mood Classification in Music Digital
Libraries by Combining Lyrics and Audio, JCDL
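The ANEW scores above become a lyric feature by averaging over the words a lyric contains. A sketch using only the valence values from the slide (real systems use the full ANEW lexicon of roughly a thousand words; the lyric strings are made up):

```python
# Valence scores (1 = negative, 9 = positive) copied from the slide
ANEW_VALENCE = {"happy": 8.21, "sad": 1.61, "thrill": 8.05, "kiss": 8.26,
                "dead": 1.94, "dream": 6.73, "angry": 2.85, "fear": 2.76}

def mean_valence(lyric):
    """Average ANEW valence over the lyric words found in the lexicon;
    None if no word matches."""
    hits = [ANEW_VALENCE[w] for w in lyric.lower().split() if w in ANEW_VALENCE]
    return sum(hits) / len(hits) if hits else None

print(mean_valence("I dream of a happy kiss"))    # (6.73 + 8.21 + 8.26) / 3
print(mean_valence("dead angry and full of fear"))  # clearly lower
```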




    Lyric Classification Results
    [figure: no significant difference between top feature combinations]


    Distribution of feature "!"
    [figure]




    Distribution of feature "hey"
    [figure]

    "Number of words per minute"
    [figure]








    Combine with Audio-based Classifier

    A leading system in MIREX AMC 2007 and 2008: Marsyas
        Music Analysis, Retrieval and Synthesis for Audio Signals
        Led by Prof. Tzanetakis at the University of Victoria
        Uses audio spectral features
        marsyas.info
        Finalist in the SourceForge Community Choice Awards 2009


                Hybrid Methods

    – Late fusion
        Lyric classifier  -> prediction  \
                                           -> final prediction
        Audio classifier  -> prediction  /
        Dominant due to clarity and the avoidance
        of the "curse of dimensionality"

    – Feature concatenation (early fusion)
        Concatenated features -> one classifier -> prediction
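The two combination schemes can be sketched in a few lines. The 50/50 weight and the class probabilities are illustrative assumptions, not values from the studies:

```python
def late_fusion(p_audio, p_lyrics, w=0.5):
    """Weighted average of two classifiers' per-class probabilities;
    the weight w is an illustrative choice."""
    return {c: w * p_audio[c] + (1 - w) * p_lyrics[c] for c in p_audio}

def early_fusion(audio_features, lyric_features):
    """Feature concatenation: one long vector for a single classifier."""
    return audio_features + lyric_features

fused = late_fusion({"happy": 0.8, "sad": 0.2}, {"happy": 0.4, "sad": 0.6})
print(max(fused, key=fused.get))         # → happy
print(early_fusion([0.1, 0.2], [3, 1]))  # → [0.1, 0.2, 3, 1]
```

Note the trade-off the slide names: late fusion keeps each classifier in its own low-dimensional feature space, while early fusion gives one model a much longer vector.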




              Effectiveness
    [figure: classification accuracy of hybrid (late fusion),
     hybrid (early fusion), lyrics-only, and audio-only systems]




              Learning Curves
    [figure]


              Audio vs. Lyrics
    [figure]

Hu & Downie (2010). When Lyrics Outperform Audio for
Music Mood Classification: A Feature Analysis, ISMIR








        Top Lyric Features
    [figure]

    Top Lyric Features in "Calm"
    [figure]




        Top Affective Words
    [figure: affective words in contrasting mood categories]


    Other Textual Features Used in
     Music Mood Classification

    Based on SentiWordNet
        assigns to each synset of WordNet three sentiment
        scores: positivity, negativity, objectivity
    Simple syntactic structures
        Negation, modifiers
    Lyric rhyme patterns (inspired by poems)
    Contextual features (beyond lyrics)
        Social tags, blogs, playlists, etc.
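As a taste of the rhyme-pattern idea, here is a deliberately crude end-rhyme test. It is spelling-based and purely illustrative; real rhyme-feature work uses a pronouncing dictionary, since suffix matching misses true rhymes like "eye"/"sky" and accepts mere eye-rhymes:

```python
def end_rhyme(line1, line2, n=2):
    """Crude check: do the two lines' final words share an n-letter
    suffix without being the same word?"""
    w1 = line1.split()[-1].strip(".,!?").lower()
    w2 = line2.split()[-1].strip(".,!?").lower()
    return w1 != w2 and w1[-n:] == w2[-n:]

print(end_rhyme("All through the night,", "we danced in the light"))  # → True
print(end_rhyme("hello world", "goodbye moon"))                       # → False
```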




                                                                                         Summary of Categorical and
    Cross-cultural Mood Classification
     Tomorrow, Oral Session 1

    Cross-cultural model applicability:
      - 23 mood categories based on AllMusic.com
      - Train on songs in one culture and
        classify songs in the other

Yang & Hu (2012). Cross-cultural Music Mood Classification: A Comparison on
English and Chinese Songs, ISMIR


        Summary of Categorical and
          Multimodal Approaches
    Natural language labels are intuitive to end users
    Based on supervised learning techniques
        Studies mostly focusing on feature engineering
    Multimodal approaches improve performances
        Effectiveness and efficiency
    Cross-cultural mood classification: just started
    Challenges
        Ambiguity inherent in terms (Meyer's "distortion")
        Hierarchy of mood categories
        Connections between features and mood categories




                                                                                                                                               15
10/5/2012




Agenda
    Grand challenges on music affect
    Music affect taxonomy and annotation
    Automatic music affect analysis
        Categorical approach
        Multimodal approach
        Dimensional approach
        Temporal approach
    Beyond music
    Conclusion

Dimensional Approach
    What is and why the dimensional model
    Computational model for dimensional music emotion recognition
    Issues
        Difficulty of emotion rating
        Subjectivity of emotion perception
        Context of music listening
        Usability of UI

Categorical Approach
    Audio spectrum
    [Figure: mapping to discrete mood categories, Hevner’s model (1936)]

Dimensional Approach
    Audio spectrum
    [Figure: mapping onto the Circumplex model (Russell 1980)]




What is the Dimensional Model
    Alternative conceptualization of emotions based on their placement along broad affective dimensions
    Obtained by analyzing “similarity ratings” of emotion words or facial expressions with factor analysis or multi-dimensional scaling
        For example, Russell (1980) asked 343 subjects to describe their emotional states using 28 emotion words, and used four different methods to analyze the correlations between the emotion ratings
        Many studies identify similar dimensions

The Valence-Arousal (VA) Emotion Model
    Activation‒Arousal: energy or neurophysiological stimulation level
    Evaluation‒Valence: pleasantness; positive and negative affective states
    [psp80]




More Dimensions
    The world of emotions is not 2D (Fontaine et al., 2007)
        3rd dimension: potency‒control
            Feeling of power/weakness; dominance/submission
            Anger ↔ fear
            Pride ↔ shame
            Interest ↔ disappointment
        4th dimension: predictability
            Surprise
            Stress ↔ fear
            Contempt ↔ disgust
        However, the 2D model seems to work fine for music emotion

Why the Dimensional Model 1/3
    Free of emotion words
        Emotion words are not always precise and consistent
            We often cannot find proper words to express our feelings
            Different people have different understandings of the words
            Emotion words are difficult to translate and might not exist with the exact same meaning in different languages (Russell 1991)
        Semantic overlap between emotion categories
            Cheerful, happy, joyous, party/celebratory
            Melancholy, gloomy, sad, sorrowful
        Difficult to determine how many and which categories to use in a mood classification system




No Consensus on Mood Taxonomy in MIR

    Work                                             | #  | Emotion description
    Katayose et al [icpr98]                          | 4  | Gloomy, urbane, pathetic, serious
    Feng et al [sigir03]                             | 4  | Happy, angry, fear, sad
    Li et al [ismir03], Wieczorkowska et al [imtci04]| 13 | Happy, light, graceful, dreamy, longing, dark, sacred, dramatic, agitated, frustrated, mysterious, passionate, bluesy
    Wang et al [icsp04]                              | 6  | Joyous, robust, restless, lyrical, sober, gloomy
    Tolos et al [ccnc05]                             | 3  | Happy, aggressive, melancholic+calm
    Lu et al [taslp06]                               | 4  | Exuberant, anxious/frantic, depressed, content
    Yang et al [mm06]                                | 4  | Happy, angry, sad, relaxed
    Skowronek et al [ismir07]                        | 12 | Arousing, angry, calming, carefree, cheerful, emotional, loving, peaceful, powerful, sad, restless, tender
    Wu et al [mmm08]                                 | 8  | Happy, light, easy, touching, sad, sublime, grand, exciting
    Hu et al [ismir08]                               | 5  | Passionate, cheerful, bittersweet, witty, aggressive
    Trohidis et al [ismir08]                         | 6  | Surprised, happy, relaxed, quiet, sad, angry

Why the Dimensional Model 2/3
    Reliable and economical model
        Only two variables (valence, arousal), instead of tens or hundreds of mood tags
        Easy to compare the performance of different systems
    Suitable for continuous measurements
        Emotions may change over time (emotion changes as time unfolds)
        Emotion intensity is more precise and intuitive than emotion words
    [Figure: VA plane with “neutral”, “angry”, “very angry” at increasing arousal and decreasing valence]




Why the Dimensional Model 3/3
    Ready canvas for user interaction
        Emotion-based retrieval
        Song collection navigation
    [Figure: song-collection navigation UI; three dimensions are used: valence, arousal, synthetic/acoustic]

Mapping Songs to the VA Space
    Assumption
        View the VA space as a continuous, Euclidean space
        View each point as an emotional state (valence, arousal)
    Goal
        Given a short music clip (e.g., 10 to 30 seconds)
        Automatically compute a pair of valence and arousal (VA) values that best quantify (summarize) the expressed emotion of the overall clip
        The research on time-dependent, second-by-second emotion recognition (emotion tracking) will be introduced in the next session





How to Predict Emotion Values 1/3
    Sol (A): by dividing the emotion space into several mood classes
        For example, into 16 classes
    Pros
        Standard classification problem: y = f(x), where x is a feature vector and y is a discrete label (1‒16)
    Cons
        Poor granularity of the emotion space (not really VA values)
    [Screenshot: Moody by Crayonroom]

How to Predict Emotion Values 2/3
    Sol (B): by further exploiting the “geographic information” (Yang et al., 2006)
        For example, perform binary classification for each quadrant
        Apply arithmetic operations to the probability estimates:
            Valence = u1 + u4 – u2 – u3
            Arousal = u1 + u2 – u3 – u4
            (u denotes the likelihood of each quadrant)
        [Figure: bar chart of probability estimates for class 1 through class 4]
    Pros
        Easy to compute
    Cons
        Lacks theoretical foundation
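The arithmetic in Sol (B) can be sketched in a few lines (an illustration only; the quadrant likelihoods below are made-up numbers, not the output of any real classifier):

```python
def va_from_quadrants(u):
    """Map per-quadrant likelihoods u = (u1, u2, u3, u4) to VA values.

    Quadrant convention implied by the formulas:
    q1 = (+V, +A), q2 = (-V, +A), q3 = (-V, -A), q4 = (+V, -A).
    """
    u1, u2, u3, u4 = u
    valence = u1 + u4 - u2 - u3
    arousal = u1 + u2 - u3 - u4
    return valence, arousal

# A clip classified mostly into quadrant 2 (negative valence, high arousal)
v, a = va_from_quadrants([0.1, 0.6, 0.2, 0.1])
print(round(v, 2), round(a, 2))  # -> -0.6 0.4
```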




How to Predict Emotion Values 3/3
    Sol (C): by means of regression (Yang et al., 2007, 2008; MacDorman et al., 2007; Eerola et al., 2009)
        Given features, predict a numerical value
        One regressor for valence, one for arousal:
            yv = fv(x), ya = fa(x)
            where x is a feature vector and yv, ya are both numerical values
    Pros
        Regression analysis is theoretically sound and well developed
        Many good off-the-shelf regression algorithms
    Cons
        Requires ground-truth “emotion values”
        Human subjects must be asked to “rate” the emotion values of songs

Linear Regression: Example
    Linear regression: f(x) = wTx + b
    Possible (hypothesized) w for valence and arousal:

                | loudness    | tempo       | pitch level | harmony               | mode
                | (loud/soft) | (fast/slow) | (high/low)  | (consonant/dissonant) | (major/minor)
        valence | 0           | 0           | 0           | 1                     | 1
        arousal | 1           | 1           | 1           | 0                     | 0

        positive valence = consonant harmony & major mode
        high arousal = loud sound & fast tempo & high pitch
    Nonlinear regression functions can also be used
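The hypothesized weight table can be exercised as a toy linear predictor (a sketch under assumed binary feature encodings — 1 for loud/fast/high/consonant/major, 0 otherwise — and a zero bias; none of this comes from a trained model):

```python
# Feature order: [loudness(loud=1), tempo(fast=1), pitch(high=1),
#                 harmony(consonant=1), mode(major=1)]
W_VALENCE = [0, 0, 0, 1, 1]
W_AROUSAL = [1, 1, 1, 0, 0]

def linear_predict(w, x, b=0.0):
    """f(x) = w^T x + b"""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

# A loud, fast, high-pitched, dissonant, minor-mode clip:
x = [1, 1, 1, 0, 0]
print(linear_predict(W_VALENCE, x), linear_predict(W_AROUSAL, x))  # -> 0.0 3.0
```

With these weights the clip scores maximal arousal and neutral valence, mirroring the table's reading that arousal is driven by loudness, tempo, and pitch while valence is driven by harmony and mode.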




Computational Framework
    Emotion annotation: obtain y for the training data
    Feature extraction: obtain x
    Regression model training: obtain w
    Automatic prediction: obtain y for the test data
    [Diagram: training data → emotion annotation → emotion value y and feature extraction → feature x; (x, y) → regressor training → regressor w; test data → feature extraction → feature x → automatic prediction → emotion value y]

Feature Extraction: Get x

    Extractor                | Language | Features
    Marsyas-0.2              | C        | MFCC, LPCC, spectral properties (centroid, moment, flatness, crest factor)
    MIR toolbox              | Matlab   | Spectral features, rhythm features, pitch, key clarity, harmonic change, mode
    MA toolbox               | Matlab   | MFCC, spectral histogram, periodicity histogram, fluctuation pattern
    PsySound                 | Matlab   | Psychoacoustic model-based features (loudness, sharpness, roughness, virtual pitch, volume, timbre width, dissonance)
    Rhythm pattern extractor | Matlab   | Rhythm pattern, beat histogram, tempo
    EchoNest API             | Python   | Timbre, pitch, loudness, key, mode, tempo
    MPEG-7 audio encoder     | Java     | Spectral properties, harmonic ratio, noise level, fundamental frequency type
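The four framework steps can be sketched end to end, with ordinary least squares standing in for the regressor and synthetic data standing in for real annotations and audio features (all numbers here are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Emotion annotation + feature extraction (synthetic stand-ins):
# 100 training clips, 5 audio features each, valence driven by a hidden w
X_train = rng.normal(size=(100, 5))
w_true = np.array([0.0, 0.0, 0.0, 0.8, 0.6])
y_train = X_train @ w_true + 0.05 * rng.normal(size=100)

# Regressor training: obtain w (least squares, with an appended bias column)
A = np.hstack([X_train, np.ones((100, 1))])
w, *_ = np.linalg.lstsq(A, y_train, rcond=None)

# Automatic prediction: obtain y for the test data
X_test = rng.normal(size=(10, 5))
y_pred = np.hstack([X_test, np.ones((10, 1))]) @ w
print(np.round(w[:5], 1))  # recovered weights, close to the hidden ones
```

In a real system X would come from the extractors in the table above and y from human VA ratings; a second regressor would be trained the same way for arousal.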




Relevant Features
    [Figure: relevance of musical features to emotion (Gomez and Danuser, 2007) — sound intensity, tempo, rhythm, pitch range, mode (major), consonance]

Example Matlab Code for Extracting MFCC
    Using the MA Toolbox
    [Screenshot of Matlab code; annotations: discard the DC value, take mean & STD along time, we take 20 coefficients]
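Since the Matlab code on the slide survives only as an image, here is a sketch of the same pooling step in NumPy. The MFCC matrix is random stand-in data; in practice it would come from the MA Toolbox (or, say, librosa's `librosa.feature.mfcc`):

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in MFCC matrix: 21 coefficients x 500 frames (row 0 is the DC value)
mfcc = rng.normal(size=(21, 500))

# Discard the DC value; we take 20 coefficients
mfcc = mfcc[1:, :]

# Take mean & STD along time -> one 40-dim feature vector per clip
feature = np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])
print(feature.shape)  # -> (40,)
```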




Emotion Annotation: Get y
    Rate the VA values of each song
        Ordinal rating scale
        Scroll bar
    Only the y of the training data needs to be annotated; the y of the test data can then be predicted automatically by the regression model
    [Diagram: same training/prediction pipeline as the Computational Framework slide]

Example System
    Data set (Yang et al., 2008)
        195 pop songs (Chinese, Japanese, and English)
        Each song is rated by 10+ subjects
        Ground truth is set by averaging the ratings
        Marsyas and PsySound are used to extract features
    Model learning (get w)
        Linear regression
        AdaBoost.RT (nonlinear)
        Support vector regression (SVR) (nonlinear)

    Y.-H. Yang, Y.-C. Lin, Y.-F. Su, and H.-H. Chen (2008) A regression approach to music emotion recognition, IEEE TASLP 16(2)




Performance Evaluation
    Evaluation metric
        R² statistic
            Squared correlation between the estimate and the ground truth
            The higher the better
            R² = 1: perfect fit
            R² = 0: random guess
    10-fold cross validation
        9/10 of the data for training and 1/10 for testing
        Repeat 20 times and average the results

Quantitative Result

    Method                                    | R² of valence | R² of arousal
    Linear regression                         | 0.109         | 0.568
    AdaBoost.RT [ijcnn04]                     | 0.117         | 0.553
    SVR (support vector regression) [sc04]    | 0.222         | 0.570
    SVR + RReliefF (feature selection) [ml03] | 0.254         | 0.609

    Result
        SVR (nonlinear) performs the best
        Feature selection with the RReliefF algorithm offers a gain
            Valence: 0.254
            Arousal: 0.609
        Valence is more difficult to model (it is more subjective)
            Valence: 0.25 – 0.35
            Arousal: 0.60 – 0.85
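The R² used here — the squared correlation between estimates and ground truth — can be computed directly (the VA values below are toy numbers, for illustration):

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Squared Pearson correlation between ground truth and estimates."""
    r = np.corrcoef(y_true, y_pred)[0, 1]
    return r ** 2

y_true = np.array([0.1, 0.4, -0.2, 0.8, -0.5])
y_pred = np.array([0.0, 0.5, -0.1, 0.6, -0.4])
print(round(r_squared(y_true, y_pred), 3))  # -> 0.942
```

Note that r_squared(y, y) = 1 for a perfect fit, matching the slide's reading of R² = 1.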





Powerpoint exploring the locations used in television show Time Clash
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 

Ismir2012 tutorial2

  • 1. ISMIR 2012 Tutorial 2: Music Affect Recognition: The State-of-the-Art and Lessons Learned. Speakers: Xiao Hu, Ph.D. (The University of Hong Kong) and Yi-Hsuan Eric Yang, Ph.D. (Academia Sinica, Taiwan). Questions for the audience: Do you believe that music is powerful? Why do you think so? Have you searched for music by affect? Have you searched for other things (photos, video) by affect? Have you questioned the difference between emotion and mood? Is your research related to affect?
  • 2. Agenda: grand challenges on music affect; music affect taxonomy and annotation; automatic music affect analysis (categorical, multimodal, dimensional, and temporal approaches); beyond music; conclusion. Emotion or mood? Mood is "relatively permanent and stable," while emotion is "temporary and evanescent": "most of the supposed [psychological] studies of emotion in music are actually concerned with mood and association." (Meyer, Leonard B. (1956). Emotion and Meaning in Music. Chicago: University of Chicago Press.)
  • 3. Expressed or induced? Different websites and studies use different terms: affect can be designated/indicated/expressed by a music piece, or induced/evoked/felt by a listener. Both are studied in MIR; they mainly differ in how labels are collected ("indicate how you feel when you listen to the music" vs. "indicate the mood conveyed by the music"). Which moods? Candidate models include Thayer's stress-energy model (4 clusters), Farnsworth's 10 adjective groups, and the Tellegen-Watson-Clark model. There is no general theory of emotions; Ekman's 6 basic emotions are anger, joy, surprise, disgust, sadness, and fear, and verbalization of emotional states is often a "distortion" (Meyer, 1956): "unspeakable feelings," "a restful feeling throughout ... like one of going downstream while swimming." Sources of music emotion: intrinsic (structural characteristics of the music, e.g., modality -> happy vs. sad; but what about melody?) and extrinsic (semantic context related to but outside the music). Let's ask the users: Lee et al. (2012) identified a range of factors in people's assessment of music mood, including lyrics, tempo, instrumentation, genre, delivery, and even cultural context; little is known about how these factors map to music mood. (Lee, J. H., Hill, T., & Work, L. (2012). What does music mood mean for real users? Proceedings of the iConference.) Data, data, data! Mood annotations are an extremely scarce resource: they are time consuming to collect, and consistency across annotators is low. Existing public mood datasets: the MoodSwings Turk dataset (240 30-sec clips with arousal-valence scores); the MIREX mood classification task (600 30-sec clips in 5 mood clusters); the MIREX tag classification mood sub-task (3,469 30-sec clips in 18 mood-related tag groups); Yang's emotion regression dataset (193 25-sec clips on an 11-level arousal-valence scale).
  • 4. Suboptimal performance: MIREX Mood Classification (2012) accuracies ranged from 46% to 68%; see also the MIREX Tag Classification mood subtask (2011). Newer challenges: cross-cultural applicability (existing efforts focus on Western music; see OS1 at ISMIR 2012 tomorrow, Yang & Hu, Cross-cultural Music Mood Classification: A Comparison on English and Chinese Songs); personalization (the ultimate solution to the subjectivity problem); and contextualization (even the same person's emotional responses change with time, location, and occasion; see PS1 at ISMIR 2012 tomorrow, Watson & Mandryk, Modeling Musical Mood From Audio Features and Listening Context on an In-Situ Data Set). Summary of challenges: terminology; models and categories (no consensus); sources and factors (no clear mapping between sources and affects); data scarcity; suboptimal performances; newer issues (cross-cultural applicability, personalization, contextualization, ...). Music affect taxonomy and annotation, background: what are taxonomies? A taxonomy is a domain-oriented controlled vocabulary containing labels (metadata), commonly used on websites as pick lists, browsable directories, etc. Topics: taxonomy vs. folksonomy; developing music mood taxonomies from editorial labels and from social tags; annotation by experts, subjects, crowdsourcing (e.g., MTurk, games), or derived from online services.
  • 5. Taxonomy vs. folksonomy: a taxonomy is a controlled, structured vocabulary, often requiring expert knowledge, built with top-down and bottom-up approaches; a folksonomy is an uncontrolled, unstructured vocabulary of social tags freely applied by users, where commonality emerges across large numbers of tags. Models in music psychology. Categorical: Hevner's adjective circle (Hevner, K. (1936). Experimental studies of the elements of expression in music. American Journal of Psychology, 48). Dimensional: Russell's circumplex model (Russell, J. A. (1980). A circumplex model of affect. Journal of Personality and Social Psychology, 39: 1161-1178); related models include Thayer's stress-energy model (4 clusters), Farnsworth's 10 adjective groups, and the Tellegen-Watson-Clark model. Borrowing from psychology to MIR: these models are grounded in music perception research but lack the social context of music listening (Juslin, P. N., & Laukka, P. (2004). Expression, perception, and induction of musical emotions: a review and a questionnaire study of everyday listening. JNMR). Taxonomy built from editorial labels: editorial labels are given by professional editors of online repositories, have a certain level of control, and are rooted in realistic social contexts. allmusic.com ("the most comprehensive music reference source on the planet") has created 288 mood labels and assigned them to music works.
  • 6. Mood label clustering: clustering allmusic.com mood labels for albums and for songs yields a taxonomy of 5 mood clusters. Cluster 1: passionate, rousing, confident, boisterous, rowdy. Cluster 2: rollicking, cheerful, fun, sweet, amiable/good-natured. Cluster 3: literate, poignant, wistful, bittersweet, autumnal, brooding. Cluster 4: humorous, silly, campy, quirky, whimsical, witty, wry. Cluster 5: aggressive, fiery, tense/anxious, intense, volatile, visceral. (Hu, X., & Downie, J. S. (2007). Exploring Mood Metadata: Relationships with Genre, Artist and Usage Metadata. In Proceedings of ISMIR.) Taxonomy from social tags: last.fm is "the largest music tagging site for Western music"; pros: reflects users' perspectives, large quantity; cons: non-standardized, ambiguous. The method combines linguistic resources and human expertise: start from 1,586 terms in WordNet-Affect (a lexicon of affective words), subtract 202 evaluation terms from the General Inquirer ("good", "great", "poor", etc.) and 135 non-affect/ambiguous terms identified by experts ("cold", "chill", "beat", etc.), leaving 1,249 terms, of which 476 are last.fm tags. Grouping the tags by WordNet-Affect and expert judgment yields 36 categories, arranged in a 2-D mood taxonomy and compared to Russell's 2-D model. (Hu, X. (2010). Music and Mood: Where Theory and Reality Meet. In Proceedings of the 5th iConference; Best Student Paper.)
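The lexicon-filtering step described above (affect terms minus evaluation and ambiguous terms, intersected with actual tags) can be sketched in a few lines. The word lists below are tiny hypothetical stand-ins, not the actual 1,586-term WordNet-Affect vocabulary or the real expert lists.

```python
# Illustrative sketch of deriving a mood vocabulary from social tags.
# All word lists are toy examples; the real pipeline used WordNet-Affect,
# the General Inquirer, and expert judgment over last.fm tags.

affect_lexicon = {"happy", "sad", "melancholy", "aggressive", "calm", "good", "chill"}
evaluation_terms = {"good", "great", "poor"}   # General Inquirer-style evaluation words
ambiguous_terms = {"cold", "chill", "beat"}    # expert-flagged non-affect/ambiguous terms

def mood_vocabulary(social_tags):
    """Keep tags that are affect words but not evaluation or ambiguous terms."""
    kept = affect_lexicon - evaluation_terms - ambiguous_terms
    return sorted(t for t in set(social_tags) if t in kept)

tags = ["happy", "chill", "good", "sad", "rock", "melancholy"]
print(mood_vocabulary(tags))  # ['happy', 'melancholy', 'sad']
```

"good" and "chill" are dropped even though they appear in the affect lexicon, mirroring the subtraction of the 202 evaluation terms and 135 ambiguous terms.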
  • 7. Laurier et al. (2009) taxonomy from social tags: manually compiled 120 mood words from the literature and crawled 6.8M social tags from last.fm; 107 unique tags matched the mood words, and 80 tags had more than 100 occurrences (most used: sad, fun, melancholy, happy; least used: rollicking, solemn, rowdy, tense). LSA was used to project the tag-track matrix into a 100-dimensional space, and clustering trials with varied numbers of clusters (maximizing intra-cluster similarity and inter-cluster dissimilarity) produced 4 clusters: cluster 1 (angry, aggressive, visceral, rousing, intense, confident, anger): +arousal, -valence; cluster 2 (sad, bittersweet, sentimental, tragic, depressing, sadness, spooky): -arousal, -valence; cluster 3 (tender, soothing, sleepy, tranquil, quiet, calm, serene): -arousal, +valence; cluster 4 (happy, joyous, bright, cheerful, humorous, gay, amiable): +arousal, +valence. Agreement between Laurier's clusters and the 5-cluster taxonomy, measured in Laurier's 100-dimensional space (pairwise similarities): C1-C2 .74, C1-C3 .13, C1-C4 .20, C1-C5 .11, C2-C3 .86, C2-C4 .82, C2-C5 .88, C3-C4 .32, C3-C5 .27, C4-C5 .53. (Laurier et al. (2009). Music mood representations from social tags, ISMIR.) Summary on taxonomy: what taxonomies are; taxonomy vs. folksonomy; developing music mood taxonomies from editorial labels and from social tags. Mood annotation needs three things: a taxonomy, music, and people; the people can be experts, subjects, or crowdsourced workers (e.g., MTurk, games), or annotations can be derived from online services.
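The arousal-valence quadrant reading above can be expressed as a simple lookup. This sketch maps a hypothetical (valence, arousal) point to the corresponding Laurier-style cluster label; thresholds at 0 are an assumption for illustration, not part of the original work.

```python
# Minimal sketch: map an (valence, arousal) point to the quadrant /
# Laurier-style cluster it falls in. Zero thresholds are assumed.

def av_quadrant(valence, arousal):
    if arousal >= 0 and valence < 0:
        return "cluster 1 (angry)"    # +A, -V
    if arousal < 0 and valence < 0:
        return "cluster 2 (sad)"      # -A, -V
    if arousal < 0 and valence >= 0:
        return "cluster 3 (tender)"   # -A, +V
    return "cluster 4 (happy)"        # +A, +V

print(av_quadrant(0.7, 0.6))    # cluster 4 (happy)
print(av_quadrant(-0.5, -0.2))  # cluster 2 (sad)
```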
  • 8. Expert annotation: the MIREX Audio Mood Classification (AMC) task used the 5-cluster taxonomy, 1,250 tracks selected from the APM libraries, and a Web-based annotation system called Evalutron 6000 (E6K). Each expert had 250 clips; 8 of 21 experts finished all assignments; 2,468 of 3,750 planned judgments were collected; each clip had 2 or 3 judgments; average Cohen's kappa: 0.5. The dataset was built from agreements among experts:
Agreements: C1, C2, C3, C4, C5; Total; Accuracy
3 of 3 judges: 21, 24, 56, 21, 31; 153; 0.59
2 of 3 judges: 41, 35, 18, 26, 14; 134; 0.38
2 of 2 judges: 58, 61, 46, 73, 75; 313; 0.54
Total: 120 per cluster; 600 overall.
Lessons: 1. Missed judgments lead to low accuracy. 2. More motivated annotators are needed. (Hu, X., Downie, J. S., Laurier, C., Bay, M., & Ehmann, A. (2008). The 2007 MIREX Audio Mood Classification Task: Lessons Learned. In ISMIR.)
Crowdsourcing on Amazon Mechanical Turk: Lee & Hu (2012) compared expert and MTurk annotations on the same 1,250 music clips and the same 5 clusters. Annotators ("Turkers") work on Human Intelligence Tasks (HITs) for very low payment; each HIT had 27 clips, including 2 duplicates for a consistency check; each clip had 2 judges; workers were paid 0.55 USD per HIT and took a qualification test before proceeding. Advantages of MTurk: plenty of labor. Disadvantages: quality control. 186 HITs were collected and 100 accepted; average Cohen's kappa: 0.48. (Lee, J. H., & Hu, X. (2012). Generating Ground Truth for Music Mood Classification Using Mechanical Turk. In Proceedings of the Joint Conference on Digital Libraries.)
Comparison of data collection (Evalutron 6000 vs. MTurk): judgments collected: 2,468 (incomplete) vs. 2,500 (complete); total collection time: 38 days vs. 19 days (plus additional in-house assessment); cost: $0 vs. $60.50; average time spent per clip: 21.54 vs. 17.46 seconds. Comparison of agreement rates (% of clips with agreement, E6K vs. MTurk): C1 40.2% vs. 39.6%; C2 60.2% vs. 48.9%; C3 70.5% vs. 69.5%; C4 39.6% vs. 46.3%; C5 70.8% vs. 60.0%; Other 16.9% vs. 21.3%.
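Cohen's kappa, the agreement statistic quoted for both the expert and MTurk annotations, corrects observed agreement for the agreement expected by chance. A from-scratch sketch on made-up labels (not the MIREX or MTurk data):

```python
# Cohen's kappa for two annotators' mood-cluster labels.
# kappa = (p_observed - p_expected) / (1 - p_expected)
from collections import Counter

def cohens_kappa(a, b):
    assert len(a) == len(b) and len(a) > 0
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    # Chance agreement from each annotator's marginal label frequencies.
    expected = sum(ca[k] * cb.get(k, 0) for k in ca) / (n * n)
    return (observed - expected) / (1 - expected)

# Toy label sequences for two annotators over six clips.
ann1 = ["C1", "C2", "C2", "C3", "C1", "C3"]
ann2 = ["C1", "C2", "C3", "C3", "C2", "C3"]
print(round(cohens_kappa(ann1, ann2), 3))  # 0.5
```

Here observed agreement is 4/6 but chance agreement is 1/3, so kappa lands at 0.5, i.e., "moderate" agreement of the same order as the 0.5 and 0.48 figures above.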
  • 9. Comparison of confusions among clusters (number of clips disagreed on, E6K vs. MTurk): Cluster 1 & Cluster 2: 20 vs. 95; Cluster 2 & Cluster 4: 31 vs. 86; Cluster 1 & Cluster 5: 13 vs. 74; ...; Cluster 3 & Cluster 4: 6 vs. 27; Cluster 2 & Cluster 5: 1 vs. 22; Cluster 3 & Cluster 5: 1 vs. 20; total: 253 vs. 595. The confusions can be visualized in Russell's model, and system performances on the two ground truths were compared (MIREX 2007). Crowdsourcing with games: MoodSwings (Kim et al., 2008) is a 2-player Web-based game that collects annotations of music pieces in the arousal-valence space; time-varying annotations are collected at a rate of 1 sample per second, and players "score" for agreement with their competitor. (Kim, Y. E., Schmidt, E., and Emelle, L. (2008). MoodSwings: a collaborative game for music mood label collection, ISMIR.) MoodSwings challenges: it needs a pair of players, so a simulated AI player is used (randomly following the real player is less challenging; a prediction model needs training data); attracting players (true for all games) requires the game to be challenging and fun (more recent and entertaining music; a sleek, aesthetic interface) while keeping research value (variety of music and mood). MoodSwings MTurk version: a single-person game with no competition and no scores; a monetary reward (0.25 USD per 11 pieces); consistency checks: 2 identical pieces whose labels must fall within the experts' decision boundary, and workers must not label all clips the same way. (Speck, J. A., Schmidt, E. M., Morton, B. G., and Kim, Y. E. (2011). A comparative study of collaborative vs. traditional music mood annotation, ISMIR; Morton, B. G., Speck, J. A., Schmidt, E. M., and Kim, Y. E. (2010). Improving music emotion labeling using human computation. In HCOMP.)
  • 10. Comparison of the two MoodSwings versions: label correlations of 0.71 for valence and 0.85 for arousal. (Speck, J. A., Schmidt, E. M., Morton, B. G., and Kim, Y. E. (2011). A comparative study of collaborative vs. traditional music mood annotation, ISMIR.) Subject annotation: subjects do not require music expertise, are easier to recruit than experts, and are arguably more authentic to MIR situations; they can be trained for the annotation task, yielding higher data quality than MTurk, though it still needs verification/evaluation and often involves payments at rates much higher than MTurk. Deriving annotations from online services harnesses the power of Music 2.0 (logo by Rocketsurgeon), e.g., the MSD (Million Song Dataset) and the MIREX Audio Tag Classification mood dataset. The MIREX mood tag classification dataset is based on editorial labels and noisy user tags: using the top 100 tags provided by the last.fm API, songs tagged heavily with terms in a category are selected as positive examples for that category.
  • 11. Cross-cultural issues in annotation: a survey of 30 clips (e.g., "Got to Get You into My Life" by The Beatles) with American and Chinese listeners, over the 5 clusters (C1 passionate, C2 cheerful, C3 bittersweet, C4 humorous, C5 aggressive). (Hu, X., & Lee, J. H. (2012). A Cross-cultural Study of Music Mood Perception between American and Chinese Listeners, ISMIR; PS3, Thursday!) Annotations derived from Music 2.0: pros: grounded in real-life usage; larger datasets supporting multi-label annotation; no manual annotation required. Cons: need mood-related social tags; need clever ways to filter out noise; may be culturally dependent. Summary on annotation: expert annotation for small datasets; crowdsourcing with careful designs; Music 2.0 for super-size datasets. Next on the agenda: automatic music affect analysis (categorical, multimodal, dimensional, and temporal approaches). Categorical vs. dimensional: categorical approaches are intuitive and use natural language, but terms are ambiguous and fine-grained differentiation is difficult; dimensional approaches offer continuous affective scales and good user interfaces, but are less intuitive and difficult to annotate. Topics for the categorical and multimodal approaches: the classification problem and framework; audio features and classification models; existing experiments; multimodal classification; cross-cultural classification.
  • 12. A framework for automatic (supervised) multimodal mood classification: dataset construction (e.g., "Here Comes the Sun" -> happy; "I Will Be Back" -> happy; "Down with the Sickness" -> angry); feature extraction from textual sources (social tags, lyrics: linguistic, stylistic, ...) and from audio (tempo, timbral, ...); feature generation and selection (F-score, language modeling, PCA); classification training and testing (SVM, k-NN, ...); multimodal combination (feature concatenation, late fusion, hybrid methods); evaluation and analysis (performance comparison, learning curves, feature comparison). Audio features:
Energy: mean and standard deviation of root-mean-square energy (Marsyas, MIR Toolbox).
Rhythm: fluctuation pattern and tempo (MIR Toolbox, PsySound).
Pitch: pitch class profile, the intensity of the 12 semitones of the musical octave in the Western twelve-tone scale (MIR Toolbox, PsySound).
Tonal: key clarity, musical mode (major/minor), and harmonic change, e.g., chord change (MIR Toolbox).
Timbre: mean and standard deviation of the first 13 MFCCs, delta MFCCs, and delta-delta MFCCs (Marsyas, MIR Toolbox).
Psychoacoustic: perceptual loudness, volume, sharpness (dull/sharp), timbre width (flat/rough), and spectral and tonal dissonance (dissonant/consonant) of music (PsySound).
Classification models: generic supervised learning algorithms such as neural networks, k-nearest neighbor (k-NN), maximum likelihood, decision trees, support vector machines (SVM), and Gaussian mixture models (GMM); tools include generic machine learning packages such as Weka, RapidMiner, LibSVM, and SVMLight. SVM seems superior in the MIREX AMC 2007 results. The audio signal's "glass ceiling" (Aucouturier & Pachet, 2004): a "semantic gap" between low-level music features and high-level human perception. MIREX AMC performance (5 classes), top 3 accuracies per year: 2007: 61.50%, 60.50%, 59.67%; 2008: 63.67%, 58.20%, 56.00%; 2009: 65.67%, 65.50%, 63.67%; 2010: 63.83%, 63.50%, 63.17%; 2011: 69.50%, 67.17%, 66.67%; 2012: 67.83%, 67.67%, 67.17%. (Aucouturier, J.-J., & Pachet, F. (2004). Improving timbre similarity: How high is the sky? Journal of Negative Results in Speech and Audio Sciences, 1(1).) Multimodal classification improves performance by combining multiple independent sources: music audio with social tags, metadata, and lyrics (Bischoff et al.; Schuller et al., 2009, 2011; Yang & Lee, 2004; Laurier et al., 2009; Hu & Downie, 2010).
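To make the supervised setup concrete, here is a toy run of one of the generic learners listed above, k-nearest neighbor, on fabricated clip-level feature vectors. Real systems extract such features with Marsyas, MIR Toolbox, or PsySound and typically prefer SVMs; this pure-Python k-NN is only a sketch, and the feature values and labels are invented.

```python
# Tiny k-NN mood classifier over made-up clip-level feature vectors,
# e.g. [mean RMS energy, normalized tempo]. Illustrative only.
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """train: list of (feature_vector, label); returns majority label of k nearest."""
    nearest = sorted(train, key=lambda fv_y: math.dist(fv_y[0], query))[:k]
    labels = [label for _, label in nearest]
    return Counter(labels).most_common(1)[0][0]

train = [
    ([0.90, 0.80], "aggressive"), ([0.80, 0.90], "aggressive"), ([0.85, 0.70], "aggressive"),
    ([0.10, 0.20], "calm"),       ([0.20, 0.10], "calm"),       ([0.15, 0.25], "calm"),
]
print(knn_predict(train, [0.12, 0.18]))  # calm
```

A low-energy, slow query clip lands among the "calm" neighbors; a high-energy one would land among "aggressive".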
  • 13. Lyric features. Basic features: content words, part-of-speech, function words. Lexicon features: words in WordNet-Affect. Psycholinguistic features: psychological categories in GI (General Inquirer) and scores in ANEW (Affective Norms for English Words). Stylistic features: punctuation marks, interjection words, and statistics such as the number of words per minute. ANEW examples (valence / arousal / dominance): happy 8.21 / 6.49 / 6.63; sad 1.61 / 4.13 / 3.45; thrill 8.05 / 8.02 / 6.54; kiss 8.26 / 7.32 / 6.93; dead 1.94 / 5.73 / 2.84; dream 6.73 / 4.53 / 5.53; angry 2.85 / 7.17 / 5.55; fear 2.76 / 6.96 / 3.22. Top General Inquirer features in the category "Aggressive": WlbPhys, words connoting the physical aspects of well-being, including its absence (blood, dead, drunk, pain); Perceiv, words referring to the perceptual process of recognizing or identifying something by means of the senses (dazzle, fantasy, hear, look, make, tell, view); Exert, action words (hit, kick, drag, upset); TIME, words indicating time (noon, night, midnight); COLL, words referring to all human collectivities (people, gang, party); WlbLoss, words related to a loss in a state of well-being, including being upset (burn, die, hurt, mad). (Hu, X., & Downie, J. S. (2010). Improving Mood Classification in Music Digital Libraries by Combining Lyrics and Audio, JCDL.) Lyric classification results: no significant difference between the top feature combinations. Example feature distributions: "!", "hey", and "number of words per minute".
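A sketch of the stylistic and lexicon lyric features above: exclamation counts, words per minute, and a mean ANEW-style valence score. The lexicon here holds only the happy/sad/dead valence values from the example table; the function and feature names are my own, and the tokenization is deliberately naive.

```python
# Toy lyric feature extractor: stylistic features (punctuation count,
# words per minute) plus a lexicon feature (mean ANEW-style valence).

def lyric_features(lyrics, duration_sec, valence_lexicon):
    words = lyrics.lower().replace("!", " ").replace(",", " ").split()
    scored = [valence_lexicon[w] for w in words if w in valence_lexicon]
    return {
        "exclamations": lyrics.count("!"),                      # stylistic
        "words_per_minute": len(words) / (duration_sec / 60),   # stylistic statistic
        "mean_valence": sum(scored) / len(scored) if scored else None,  # lexicon
    }

# Hypothetical subset of ANEW valence scores (values from the table above).
toy_anew = {"happy": 8.21, "sad": 1.61, "dead": 1.94}
feats = lyric_features("Hey! Hey! I feel happy, not sad", 30, toy_anew)
print(feats)
```

On this 30-second toy clip the extractor reports 2 exclamation marks, 14 words per minute, and a mean valence of 4.91, averaging the "happy" and "sad" scores.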
Combine with an Audio-based Classifier: Marsyas
- A leading system in MIREX AMC 2007 and 2008
- Music Analysis, Retrieval and Synthesis for Audio Signals, led by Prof. Tzanetakis at the University of Victoria (marsyas.info)
- Uses audio spectral features
- Finalist in the SourceForge Community Choice Awards 2009

Hybrid Methods
- Late fusion: the lyric classifier and the audio classifier each make a prediction, and the two predictions are combined into a final prediction. This is the dominant approach, due to its clarity and its avoidance of the "curse of dimensionality".
- Early fusion (feature concatenation): lyric and audio features are concatenated into a single vector, and one classifier makes the prediction.

Effectiveness
[figure: audio vs. hybrid (late fusion) vs. hybrid (early fusion) vs. lyrics]

Audio vs. Lyrics
[figure]

Learning Curves
[figure]
- Hu & Downie (2010). When Lyrics Outperform Audio for Music Mood Classification: A Feature Analysis. ISMIR.
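The two hybrid schemes can be sketched as follows. `late_fusion` and `early_fusion` are hypothetical helpers, and the equal weighting of the two classifiers' class probabilities is an illustrative choice, not the exact combination rule of any cited system.

```python
import numpy as np

def late_fusion(p_lyrics, p_audio, w=0.5):
    """Late fusion: combine per-class probability estimates from the
    lyric and audio classifiers, then return the winning class index."""
    p = w * np.asarray(p_lyrics) + (1 - w) * np.asarray(p_audio)
    return int(np.argmax(p))

def early_fusion(x_lyrics, x_audio):
    """Early fusion: concatenate the two feature vectors so that a
    single classifier sees both modalities at once."""
    return np.concatenate([x_lyrics, x_audio])
```

Late fusion keeps each modality's classifier in its own (lower-dimensional) feature space, which is why the slide notes it avoids the "curse of dimensionality" that early fusion can suffer from.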
Top Lyric Features
[figure]

Top Lyric Features in "Calm"
[figure]

Other Textual Features Used in Music Mood Classification
- Top affective words based on SentiWordNet, which assigns to each synset of WordNet three sentiment scores: positivity, negativity, objectivity
- Simple syntactic structures: negation, modifiers
- Lyric rhyme patterns (inspired by poems)
- Contextual features (beyond lyrics): social tags, blogs, playlists, etc.

Summary of Categorical and Multimodal Approaches
- Natural-language labels are intuitive to end users
- Based on supervised learning techniques
- Studies mostly focus on feature engineering
- Multimodal approaches improve performance, in both effectiveness and efficiency
- Cross-cultural mood classification: just started
- Challenges: ambiguity inherent in the terms (Meyer's "distortion"); hierarchy of mood categories; connections between features and mood categories

Cross-cultural Mood Classification (tomorrow, Oral Session 1)
- Cross-cultural model applicability: train on songs in one culture and classify songs in the other
- 23 mood categories based on AllMusic.com
- Yang & Hu (2012). Cross-cultural Music Mood Classification: A Comparison on English and Chinese Songs. ISMIR.
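A toy sketch of combining SentiWordNet-style scores with the negation feature mentioned above. The per-word scores here are illustrative stand-ins, not actual SentiWordNet values, and `polarity` is a hypothetical helper; a real system would look up synsets in SentiWordNet and disambiguate word senses.

```python
# Toy SentiWordNet-style scores: word -> (positivity, negativity);
# objectivity would be 1 - positivity - negativity. Illustrative values.
SENTI = {"good": (0.75, 0.0), "bad": (0.0, 0.625), "love": (0.5, 0.0)}
NEGATORS = {"not", "never", "no"}

def polarity(tokens):
    """Sum positivity minus negativity over the tokens, flipping the
    sign of a sentiment word that directly follows a negator (the
    slide's simple 'negation' syntactic feature)."""
    score, negate = 0.0, False
    for t in tokens:
        if t in NEGATORS:
            negate = True
            continue
        if t in SENTI:
            pos, neg = SENTI[t]
            s = pos - neg
            score += -s if negate else s
        negate = False  # negation only scopes over the next word here
    return score
```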
Agenda
- Grand challenges of music affect
- Music affect taxonomy and annotation
- Automatic music affect analysis: categorical approach, multimodal approach, dimensional approach, temporal approach
- Beyond music
- Conclusion

Dimensional Approach
- What the dimensional model is, and why
- Computational models for dimensional music emotion recognition
- Issues: difficulty of emotion rating; subjectivity of emotion perception; context of music listening; usability of UI

Categorical vs. Dimensional Approach
- Categorical: audio spectrum → Hevner's model (1936)
- Dimensional: audio spectrum → circumplex model (Russell, 1980)

The Valence-Arousal (VA) Emotion Model
- Activation (arousal): energy or neurophysiological stimulation level
- Evaluation (valence): pleasantness; positive vs. negative affective states

What Is the Dimensional Model
- An alternative conceptualization of emotions, based on their placement along broad affective dimensions
- Obtained by analyzing "similarity ratings" of emotion words or facial expressions with factor analysis or multi-dimensional scaling
- For example, Russell (1980) asked 343 subjects to describe their emotional states using 28 emotion words, and used four different methods to analyze the correlations between the emotion ratings
- Many studies identify similar dimensions [psp80]
More Dimensions
- The world of emotions is not 2D (Fontaine et al., 2007)
- 3rd dimension, potency/control: feeling of power vs. weakness, dominance vs. submission (anger ↔ fear; pride ↔ shame; interest ↔ disappointment)
- 4th dimension, predictability: surprise (stress ↔ fear; contempt ↔ disgust)
- However, the 2D model seems to work fine for music emotion

Why the Dimensional Model 1/3
- Free of emotion words. Emotion words are not always precise and consistent:
  - We often cannot find proper words to express our feelings
  - Different people understand the words differently
  - Emotion words are difficult to translate and might not exist with the exact same meaning in different languages (Russell, 1991)
- Semantic overlap between emotion categories: cheerful, happy, joyous, party/celebratory; melancholy, gloomy, sad, sorrowful
- Difficult to determine how many and which categories should be used in a mood classification system

No Consensus on Mood Taxonomy in MIR

  Work                         #   Emotion description
  Katayose et al. [icpr98]     4   Gloomy, urbane, pathetic, serious
  Feng et al. [sigir03]        4   Happy, angry, fear, sad
  Li et al. [ismir03],
  Wieczorkowska et al.
  [imtci04]                   13   Happy, light, graceful, dreamy, longing, dark,
                                   sacred, dramatic, agitated, frustrated,
                                   mysterious, passionate, bluesy
  Wang et al. [icsp04]         6   Joyous, robust, restless, lyrical, sober, gloomy
  Tolos et al. [ccnc05]        3   Happy, aggressive, melancholic+calm
  Lu et al. [taslp06]          4   Exuberant, anxious/frantic, depressed, content
  Yang et al. [mm06]           4   Happy, angry, sad, relaxed
  Skowronek et al. [ismir07]  12   Arousing, angry, calming, carefree, cheerful,
                                   emotional, loving, peaceful, powerful, sad,
                                   restless, tender
  Wu et al. [mmm08]            8   Happy, light, easy, touching, sad, sublime,
                                   grand, exciting
  Hu et al. [ismir08]          5   Passionate, cheerful, bittersweet, witty, aggressive
  Trohidis et al. [ismir08]    6   Surprised, happy, relaxed, quiet, sad, angry

Why the Dimensional Model 2/3
- Reliable and economical: only two variables (valence, arousal) instead of tens or hundreds of mood tags; easy to compare the performance of different systems
- Suitable for continuous measurement: emotion changes as time unfolds, and emotion intensity (e.g., on the arousal axis: very angry > angry > neutral) is more precise and intuitive than emotion words

Why the Dimensional Model 3/3
- A ready canvas for user interaction: emotion-based retrieval; song collection navigation
[figure: interface using three dimensions: valence, arousal, synthetic/acoustic]

Mapping Songs to the VA Space
- Assumption: view the VA space as a continuous, Euclidean space; view each point as an emotional state (valence, arousal)
- Goal: given a short music clip (e.g., 10 to 30 seconds), automatically compute the pair of valence and arousal (VA) values that best quantifies (summarizes) the expressed emotion of the overall clip
- Research on time-dependent, second-by-second emotion recognition (emotion tracking) will be introduced in the next session
How to Predict Emotion Values 1/3
- Solution (A): divide the emotion space into several mood classes, e.g., into 16 classes
- A standard classification problem y = f(x): x is a feature vector, y is a discrete label (1-16)
- Pros: easy to compute
- Cons: poor granularity of the emotion space (not really VA values); lacks a theoretical foundation
[figure: Moody by Crayonroom]

How to Predict Emotion Values 2/3
- Solution (B): further exploit the "geographic information" (Yang et al., 2006)
- For example, perform binary classification for each quadrant (class 1, class 2, class 3, class 4), then apply arithmetic operations to the probability estimates:
    Valence = u1 + u4 - u2 - u3
    Arousal = u1 + u2 - u3 - u4
  (u denotes the likelihood of each quadrant)

How to Predict Emotion Values 3/3
- Solution (C): regression (Yang et al., 2007, 2008; MacDorman et al., 2007; Eerola et al., 2009)
- Given features, predict a numerical value: one regressor for valence and one for arousal, yv = fv(x) and ya = fa(x), where x is a feature vector and yv, ya are both numerical values
- Pros: regression analysis is theoretically sound and well developed; many good off-the-shelf regression algorithms exist
- Cons: requires ground-truth "emotion values"; human subjects must be asked to "rate" the emotion values of songs

Linear Regression: Example
- f(x) = w^T x + b
- Possible (hypothesized) w for valence and arousal:

            loudness     tempo        pitch level  harmony                mode
            (loud/soft)  (fast/slow)  (high/low)   (consonant/dissonant)  (major/minor)
  valence   0            0            0            1                      1
  arousal   1            1            1            0                      0

- Positive valence = consonant harmony & major mode; high arousal = loud loudness & fast tempo & high pitch
- Nonlinear regression functions can also be used

Computational Framework
- Emotion annotation: obtain y for the training data
- Feature extraction: obtain x
- Regression model training: obtain w
- Automatic prediction: obtain y for the test data
- Training: (emotion annotation → emotion value y) + (feature extraction → x) → regressor training → regressor
- Testing: feature extraction → x → automatic prediction → emotion value y

Feature Extraction: Get x

  Extractor             Language  Features
  Marsyas-0.2           C         MFCC, LPCC, spectral properties (centroid, moment,
                                  flatness, crest factor)
  MIR Toolbox           Matlab    Spectral features, rhythm features, pitch, key clarity,
                                  harmonic change, mode
  MA toolbox            Matlab    MFCC, spectral histogram, periodicity histogram,
                                  fluctuation pattern
  PsySound              Matlab    Psychoacoustic-model-based features (loudness, sharpness,
                                  roughness, virtual pitch, volume, timbre width, dissonance)
  Rhythm pattern        Matlab    Rhythm pattern, beat histogram, tempo
  EchoNest API          Python    Timbre, pitch, loudness, key, mode, tempo
  MPEG-7 audio encoder  Java      Spectral properties, harmonic ratio, noise level,
                                  fundamental frequency type
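Solutions (B) and (C) above, together with the framework's training and prediction steps, can be sketched as follows. This assumes quadrant likelihoods u1..u4 in the order +V+A, -V+A, -V-A, +V-A, and uses ordinary least squares as a stand-in for the regressor (the linear-regression option from the slides; SVR would occupy the same slot). All function names are hypothetical.

```python
import numpy as np

def quadrant_probs_to_va(u):
    """Solution (B): map the likelihoods of the four VA quadrants
    (u1: +V+A, u2: -V+A, u3: -V-A, u4: +V-A) to VA values using the
    arithmetic from the slide."""
    u1, u2, u3, u4 = u
    return u1 + u4 - u2 - u3, u1 + u2 - u3 - u4  # valence, arousal

def train_regressor(X, y):
    """Solution (C), training step: fit w and b of f(x) = w^T x + b by
    least squares.  Train one such regressor for valence and an
    independent one for arousal."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append a bias column
    coef, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return coef[:-1], coef[-1]  # w, b

def predict(X, w, b):
    """The framework's automatic-prediction step for test data."""
    return X @ w + b
```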
Relevant Features [Gomez & Danuser, 2007]
- Sound intensity, tempo, rhythm, pitch range, mode, consonance
[figure]

Example Matlab Code for Extracting MFCC Using the MA Toolbox
[figure: code screenshot; annotations: discard the DC value; take 20 coefficients; take the mean and standard deviation along time]

Emotion Annotation: Get y
- Rate the VA values of each song: ordinal rating scale, scroll bar
- Only the y of the training data needs to be annotated; the y of the test data can be predicted automatically by the regression model

Example System (Yang et al., 2008)
- Data set: 195 pop songs (Chinese, Japanese, and English); each song rated by 10+ subjects; ground truth set by averaging
- Features extracted with Marsyas and PsySound
- Model learning (get w): linear regression; AdaBoost.RT (nonlinear); support vector regression (SVR, nonlinear)
- Y.-H. Yang, Y.-C. Lin, Y.-F. Su, & H.-H. Chen (2008). A regression approach to music emotion recognition. IEEE TASLP, 16(2).

Performance Evaluation
- Evaluation metric: R^2 statistics, the squared correlation between the estimate and the ground truth; the higher the better (R^2 = 1: perfect fit; R^2 = 0: random guess)
- 10-fold cross-validation: 9/10 of the data for training and 1/10 for testing; repeated 20 times to get the average result

Quantitative Result

  Method                                     R^2 valence  R^2 arousal
  Linear regression                          0.109        0.568
  AdaBoost.RT [ijcnn04]                      0.117        0.553
  SVR (support vector regression) [sc04]     0.222        0.570
  SVR + RReliefF (feature selection) [ml03]  0.254        0.609

- SVR (nonlinear) performs best, and feature selection with the RReliefF algorithm offers a further gain (valence: 0.254; arousal: 0.609)
- Valence is more difficult to model than arousal (it is more subjective); in general: valence 0.25-0.35, arousal 0.60-0.85
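The evaluation metric above is straightforward to compute. Note that R^2 as defined on the slide is the squared Pearson correlation, which always lies in [0, 1]; this differs from the coefficient-of-determination R^2 reported by some regression libraries, which can be negative for poor fits.

```python
import numpy as np

def r2_squared_correlation(y_true, y_pred):
    """R^2 as defined in the tutorial's evaluation slide: the squared
    Pearson correlation between the estimates and the ground truth
    (1 = perfect linear fit, 0 = no linear relationship)."""
    r = np.corrcoef(y_true, y_pred)[0, 1]
    return r ** 2
```

Because it is a squared correlation, this metric is insensitive to the scale and offset of the predictions; it only measures how well they track the ground truth linearly.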