Presentation of "Generating Ground Truth for Music Mood Classification Using Mechanical Turk" by Jin Ha Lee and Xiao Hu at the 12th Annual ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL).
1. Generating Ground Truth for
Music Mood Classification
Using Mechanical Turk
Jin Ha Lee & Xiao Hu
JCDL 2012
2. Mood: a relatively long-lasting
and stable emotional state (Meyer, 1956)
Emotion?
Affect?
3. Music mood
• Recently received a lot of attention in
MIR (Music Information Retrieval) domain
• “Audio Music Mood Classification” task in
MIREX, starting in 2007
• Critical for developing MDL (music digital libraries)
Music Information Retrieval
Evaluation eXchange
4. • Evaluation is based
on ground truth
Passionate Bittersweet Bittersweet
Bittersweet
5. More is better!
However, generating ground truth
based on human input is expensive
and time-consuming
6. How is it done in MIREX?
• A web-based survey system called E6K
• Invitations posted to MIREX and music-ir
mailing lists in order to recruit
volunteers
7.
8. Can we use the
CROWD
instead of
MUSIC
EXPERTS?
Is there a
better way?
9. 1. How do music mood classification results
obtained from Mechanical Turk
compare to those collected from music
experts in MIREX?
2. How different or similar are the
evaluation outcomes for the MIREX
AMC task when based on ground truth
collected from Mechanical Turk vs. E6K?
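14. Stats on Collecting Data
(EVALUTRON 6000 vs. Mechanical Turk)
• Average time spent on each music clip
– EVALUTRON 6000: 21.54 seconds
– Mechanical Turk: 17.46 seconds
• Total time for collecting all judgments
– EVALUTRON 6000: 38 days (+ additional in-house assessment)
– Mechanical Turk: 19 days
• Cost for collecting all judgments
– EVALUTRON 6000: $0
– Mechanical Turk: $60.50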
22. Conclusion
• Overall the human judgments from E6K and
MTurk showed similar patterns:
– Judgment distribution across five mood clusters
– Agreement distribution across clusters
– Confusion among clusters
• System performance rankings from E6K and
MTurk were also comparable
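The comparisons listed above can be illustrated with a minimal sketch. It assumes each source has already been reduced to one mood cluster label per clip and one accuracy score per system. The cluster names (C1 to C5), the function names, and all data values are hypothetical and are not taken from the study.

```python
from collections import Counter

# Hypothetical names for the five mood clusters.
CLUSTERS = ["C1", "C2", "C3", "C4", "C5"]

def distribution(labels):
    """Fraction of clips assigned to each cluster."""
    counts = Counter(labels.values())
    total = len(labels)
    return {c: counts.get(c, 0) / total for c in CLUSTERS}

def confusion(labels_a, labels_b):
    """Count (label in A, label in B) pairs over clips judged in both sources."""
    shared = labels_a.keys() & labels_b.keys()
    return Counter((labels_a[clip], labels_b[clip]) for clip in shared)

def ranking(scores):
    """System names ordered from best to worst score."""
    return sorted(scores, key=scores.get, reverse=True)

# Made-up toy data, not results from the study.
e6k = {"clip1": "C1", "clip2": "C3", "clip3": "C3", "clip4": "C5"}
mturk = {"clip1": "C1", "clip2": "C3", "clip3": "C4", "clip4": "C5"}
acc_e6k = {"sysA": 0.61, "sysB": 0.55, "sysC": 0.48}
acc_mturk = {"sysA": 0.58, "sysB": 0.57, "sysC": 0.45}

print(distribution(e6k))
print(confusion(e6k, mturk))
print(ranking(acc_e6k) == ranking(acc_mturk))
```

Off-diagonal pairs in the confusion counts, such as ("C3", "C4"), mark clips where the two sources disagree.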
23. Conclusion (Cont’d.)
• However, combined ground truth from E6K
and MTurk is only about 60% of the size of the
original E6K ground truth
• Mood is a highly subjective feature for
describing and organizing music
• Other means for judging the moods should be
explored (e.g., ranking)
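One way to picture why a combined ground truth shrinks is sketched below. It assumes, as an illustration only, that "combined" means keeping just the clips on which both sources assign the same cluster. The function name and the toy labels are hypothetical, and the printed ratio is not a result from the study.

```python
def combined_ground_truth(labels_a, labels_b):
    """Keep only clips where both sources assign the same label (assumed definition)."""
    return {
        clip: label
        for clip, label in labels_a.items()
        if labels_b.get(clip) == label
    }

# Made-up toy labels, not data from the study.
e6k_labels = {"clip1": "C1", "clip2": "C3", "clip3": "C3", "clip4": "C5"}
mturk_labels = {"clip1": "C1", "clip2": "C3", "clip3": "C4", "clip4": "C5"}

combined = combined_ground_truth(e6k_labels, mturk_labels)
ratio = len(combined) / len(e6k_labels)
print(sorted(combined), ratio)
```

Under this definition, every clip on which the two sources disagree drops out, so the combined set can never be larger than either original set.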
24. Future work
• In-depth interviews with users to investigate
factors affecting people’s judgments on music
mood
• More controlled studies with different user
groups