More Like This: Machine Learning Approaches to Music Similarity

Brian McFee

Computer Science & Engineering
University of California, San Diego

Music discovery in days of yore...

Music discovery 2.0: the present

• ~20 million songs available

• Discovery is still largely human-powered

A Google for music?

• Standard text search can work with meta-data
• Can we predict meta-data from audio?
- [Turnbull, 2008], [Barrington, 2011]

Query by example

• Natural, user-friendly alternative to text search

This talk

• Learning algorithms for QBE, geared toward music discovery

• We'll look at two consumption models:

- Active browsing (search & ranking)
- Passive listening (playlist generation)

• Evaluation derived from user behavior

Defining similarity: semantics?

Song similarity = tag similarity?

• Drawbacks:
- Choosing, weighting vocabulary is surprisingly difficult
- Hard to maintain quality at scale

Defining similarity: human judgements?
[M. & Lanckriet, 2009, 2011]
• Which is more similar?

• Drawbacks: ambiguity, subjectivity, scale

Collaborative filter similarity

• Collect listening histories for (lots of!) users

• Song similarity = portion of users in common
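One reading of "portion of users in common" is Jaccard similarity. Under that reading, the shared listeners of two songs are divided by everyone who played either song. The sketch below assumes this definition, since the slide does not name a formula. The user IDs are made-up toy data.

```python
def cf_similarity(listeners_a: set, listeners_b: set) -> float:
    """Jaccard similarity: shared listeners / listeners of either song."""
    union = listeners_a | listeners_b
    if not union:
        return 0.0  # no listening data at all: treat as unrelated
    return len(listeners_a & listeners_b) / len(union)


# Hypothetical toy listening histories (song -> set of user IDs).
histories = {
    "song_a": {"u1", "u2", "u3"},
    "song_b": {"u2", "u3", "u4"},
    "song_c": set(),  # a new song that nobody has played yet
}

print(cf_similarity(histories["song_a"], histories["song_b"]))  # 0.5
print(cf_similarity(histories["song_a"], histories["song_c"]))  # 0.0
```

The unplayed `song_c` scores zero against everything. This is the cold start problem in miniature.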

• Collaborative filters perform well...
- ... for tagging [Kim, Tomasik, & Turnbull, 2009]
- ... and playlisting [Barrington, Oda, & Lanckriet, 2009]
- ... and recommendation (Yahoo, Last.fm, iTunes...)

• Implicit feedback requires no additional effort from users

• ... but fails on unpopular items: the cold start problem!

Learning from a collaborative filter
[M., Barrington, & Lanckriet, 2010, 2012]


Metric learning to rank
[M. & Lanckriet, 2010]

• The goal: rankings in audio space = rankings in CF space

• Equivalently: ranking by (learned) distance = target rankings

• Optimize a linear transformation for ranking

Structure prediction: nearest neighbors

• Setup: database $\mathcal{X}$, target ranking $y_q$ of $\mathcal{X}$ for each query $q$

• PSD matrix $W \succeq 0$ transforms features

• Order by distance from $q$: $\|q - x\|_W^2 = (q - x)^\top W (q - x)$

• A joint feature map $\psi(q, y)$ encodes each (query, ranking) pair
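The ordering step can be sketched with NumPy. A PSD matrix W defines the squared distance (q − x)^T W (q − x), and the database is sorted by it. Learning W is MLR's job. Here W is a hand-picked, hypothetical example that shows how changing the metric changes the ranking.

```python
import numpy as np


def rank_by_learned_distance(query, database, W):
    """Return database indices ordered by increasing (q - x)^T W (q - x)."""
    diffs = database - query                          # shape (n, d)
    dists = np.einsum("nd,de,ne->n", diffs, W, diffs)  # one distance per row
    return np.argsort(dists, kind="stable")


# Any W = L^T L is PSD; this hypothetical L stretches feature 0, ignores feature 1.
L = np.array([[2.0, 0.0],
              [0.0, 0.0]])
W = L.T @ L

database = np.array([[0.0, 5.0],   # close in dim 0, far in dim 1
                     [1.0, 0.0],
                     [3.0, 0.0]])
query = np.zeros(2)

print(rank_by_learned_distance(query, database, W))          # [0 1 2]
print(rank_by_learned_distance(query, database, np.eye(2)))  # [1 2 0]
```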

Metric learning to rank (MLR)

• Score for target ranking > score for any other ranking + prediction error

• Supported losses Δ:
AUC, KNN, MAP, MRR, NDCG, Prec@k
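The sketch below illustrates one loss from the list: AUC for a single query's ranking. AUC is computed as the fraction of (relevant, irrelevant) pairs that the ranking orders correctly, and the loss is Δ = 1 − AUC. The function name and input format are assumptions for illustration, not the MLR implementation.

```python
def auc_loss(ranking, relevant):
    """1 - AUC for one ranking.

    ranking:  list of item IDs, best first.
    relevant: set of item IDs that should appear near the top.
    """
    n_rel = sum(1 for item in ranking if item in relevant)
    n_irr = len(ranking) - n_rel
    if n_rel == 0 or n_irr == 0:
        return 0.0  # AUC is undefined here; treat as no error
    correct_pairs = 0
    irrelevant_seen = 0
    # Each relevant item is correctly ordered against every irrelevant
    # item that has not yet appeared above it.
    for item in ranking:
        if item in relevant:
            correct_pairs += n_irr - irrelevant_seen
        else:
            irrelevant_seen += 1
    return 1.0 - correct_pairs / (n_rel * n_irr)


print(auc_loss(["a", "b", "c", "d"], {"a", "b"}))  # 0.0 (perfect ranking)
print(auc_loss(["a", "c", "b", "d"], {"a", "b"}))  # 0.25
```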

MLR solver
• Cutting-plane algorithm based on 1-slack Structural SVM
[Joachims, et al. 2009]

• Repeat until convergence:
- Constraint generation (DP)
- Semi-definite programming (sequence of QPs)

• Multiple kernel extensions:
[Galleguillos, M., Belongie, & Lanckriet 2011]
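Any solver that optimizes over PSD matrices needs a way to keep W in the PSD cone. A common building block is the Euclidean projection onto that cone: symmetrize, eigendecompose, and clip negative eigenvalues to zero. It is shown here as an assumption, not as the MLR solver's actual step.

```python
import numpy as np


def project_psd(W):
    """Nearest PSD matrix to W in Frobenius norm."""
    W = 0.5 * (W + W.T)                    # enforce symmetry first
    eigvals, eigvecs = np.linalg.eigh(W)   # real eigendecomposition
    eigvals = np.clip(eigvals, 0.0, None)  # zero out negative eigenvalues
    return (eigvecs * eigvals) @ eigvecs.T # V diag(lambda) V^T


# Symmetric but not PSD: eigenvalues are 3 and -1.
W = np.array([[1.0, 2.0],
              [2.0, 1.0]])
P = project_psd(W)
print(np.round(P, 3))  # [[1.5 1.5]
                       #  [1.5 1.5]]
```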

Audio pipeline

1. Feature extraction: audio signal → bag of ΔMFCCs
2. Vector quantization: bag of ΔMFCCs → codeword histogram
3. Probability product kernel (PPK) on codeword histograms

• Features: audio signal → PPK → MLR
• Supervision: CF similarity → MLR
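Steps 2 and 3 can be sketched as follows, with several assumptions. Each ΔMFCC frame is assigned to its nearest codeword; hard assignment is assumed, since the slides do not say how frames are quantized. The counts are normalized into a histogram, and two histograms are compared with the probability product kernel. For discrete distributions with exponent ρ = 1/2, the PPK reduces to Σ_i √(p_i q_i); ρ = 1/2 is also an assumption. The random codebook and frames stand in for a k-means codebook and real audio features, and the 39-dimensional frames are an arbitrary choice.

```python
import numpy as np


def codeword_histogram(frames, codebook):
    """Normalized histogram of nearest-codeword assignments.

    frames:   (n_frames, d) array, e.g. ΔMFCC vectors for one song.
    codebook: (k, d) array of codewords.
    """
    # Squared Euclidean distance from every frame to every codeword.
    d2 = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    nearest = d2.argmin(axis=1)
    counts = np.bincount(nearest, minlength=len(codebook)).astype(float)
    return counts / counts.sum()


def ppk(p, q, rho=0.5):
    """Probability product kernel for discrete distributions: sum_i (p_i q_i)^rho."""
    return float(np.sum((p * q) ** rho))


rng = np.random.default_rng(0)
codebook = rng.normal(size=(16, 39))  # hypothetical 16-word codebook
song_a = rng.normal(size=(200, 39))   # stand-in ΔMFCC frames
song_b = rng.normal(size=(300, 39))

h_a = codeword_histogram(song_a, codebook)
h_b = codeword_histogram(song_b, codebook)
print(ppk(h_a, h_a))  # 1.0 up to rounding: any histogram with itself
print(ppk(h_a, h_b))  # between 0 and 1
```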

Evaluation: CAL10K

• Last.fm collaborative filter [Celma, 2008]
- 360K users, 186K artists

• CAL10K songs [Tingle, Turnbull, & Kim, 2010]
- 5.4K songs, 2K artists (after CF matching)

• Evaluation:
- Split artists into train/val/test
- Target rankings: top-10 most similar train artists
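Splitting by artist rather than by song keeps any one artist's songs out of both the training and test sets. The sketch below is minimal and assumes two hypothetical inputs: a song→artist mapping and a caller-supplied CF similarity function. The split fractions are illustrative, not the ones used in the talk.

```python
import random


def split_by_artist(song_artist, fractions=(0.6, 0.2, 0.2), seed=0):
    """Assign each song to train/val/test so that no artist spans two splits.

    song_artist: dict mapping song ID -> artist ID.
    """
    artists = sorted(set(song_artist.values()))
    random.Random(seed).shuffle(artists)
    n_train = int(fractions[0] * len(artists))
    n_val = int(fractions[1] * len(artists))
    split_of = {}
    for i, artist in enumerate(artists):
        if i < n_train:
            split_of[artist] = "train"
        elif i < n_train + n_val:
            split_of[artist] = "val"
        else:
            split_of[artist] = "test"
    return {song: split_of[artist] for song, artist in song_artist.items()}


def target_artists(query_artist, train_artists, cf_sim, k=10):
    """Top-k training artists by CF similarity: the query's relevant set."""
    ranked = sorted(train_artists, key=lambda a: cf_sim(query_artist, a), reverse=True)
    return ranked[:k]


# Hypothetical toy data: 30 songs by 10 artists (3 songs each).
song_artist = {f"song{i}": f"artist{i // 3}" for i in range(30)}
splits = split_by_artist(song_artist)
print(sorted(set(splits.values())))  # ['test', 'train', 'val']
```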

Evaluation: comparison

• Gaussian mixture models + KL divergence
- 8 component, diagonal covariance GMM per song

• Auto-tags: predict 149 semantic tags from audio
[Turnbull, 2008]

• [Our method] VQ+MLR: 1024 codewords

• Expert tags: 1053 tags from Pandora
[Tingle, et al., 2009]
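KL divergence between two Gaussian mixtures has no closed form, and the slides do not say which approximation the GMM baseline used. The sketch below assumes a Monte Carlo estimate: sample from the first diagonal-covariance mixture and average log p(x) − log q(x). KL is asymmetric, and symmetrizing it is a common but separate choice.

```python
import numpy as np


def gmm_logpdf(x, weights, means, variances):
    """Log density of a diagonal-covariance GMM at points x, shape (n, d)."""
    # Per-component diagonal Gaussian log densities, shape (n, k).
    diff2 = (x[:, None, :] - means[None, :, :]) ** 2
    log_comp = -0.5 * (np.log(2 * np.pi * variances)[None] + diff2 / variances[None]).sum(axis=-1)
    log_joint = log_comp + np.log(weights)[None, :]
    # Log-sum-exp over components for numerical stability.
    m = log_joint.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(log_joint - m).sum(axis=1, keepdims=True)))[:, 0]


def gmm_sample(n, weights, means, variances, rng):
    comps = rng.choice(len(weights), size=n, p=weights)
    return means[comps] + rng.normal(size=(n, means.shape[1])) * np.sqrt(variances[comps])


def kl_monte_carlo(p, q, n=10000, seed=0):
    """Estimate KL(p || q) = E_p[log p(x) - log q(x)] by sampling from p."""
    rng = np.random.default_rng(seed)
    x = gmm_sample(n, *p, rng)
    return float(np.mean(gmm_logpdf(x, *p) - gmm_logpdf(x, *q)))


rng = np.random.default_rng(1)
k, d = 8, 2  # 8 diagonal components as on the slide; d = 2 only for illustration
p = (np.full(k, 1 / k), rng.normal(size=(k, d)), np.full((k, d), 0.5))
q = (np.full(k, 1 / k), rng.normal(size=(k, d)) + 1.0, np.full((k, d), 0.5))
print(kl_monte_carlo(p, p))  # 0.0
print(kl_monte_carlo(p, q))  # positive
```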

Similarity learning: results

[Figure: AUC (0.65 to 0.95) for GMM (KL), Auto-tags, Auto-tags + MLR, Audio VQ, Audio VQ + MLR, Expert tags (cos), Expert tags + MLR]

Example playlists
The Ramones - Go Mental

Def Leppard - Promises
The Buzzcocks - Harmony In My Head
Los Lonely Boys - Roses
Wolfmother - Colossal
Judas Priest - Diamonds and Rust (live)

MLR:
Mötley Crüe - Same Ol' Situation
The Offspring - Gotta Get Away
The Misfits - Skulls
AC/DC - Who Made Who (live)

Example playlists
Fats Waller - Winter Weather

Dizzy Gillespie - She's Funny That Way
Enrique Morente - Solea
Chet Atkins - In the Mood
Rachmaninov - Piano Concerto #4
Eluvium - Radio Ballet

MLR:
Charlie Parker - What Is This Thing Called Love?
Bud Powell - Oblivion
Bob Wills & His Texas Playboys - Lyla Lou
Bob Wills & His Texas Playboys - Sittin' On Top of the World

Scaling up: fast retrieval

• Audio similarity search for a million songs?

• Idea: index data with spatial trees

• 100-NN search over 900K songs:
- Brute force: 2.4s
- 50% recall: 0.14s (17x speedup)
- 20% recall: 0.02s (120x speedup)
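Here is a sketch of one kind of spatial tree: a plain random-projection tree. Each node splits at the median of a random projection, and a query descends to a single leaf and brute-forces only that leaf. The tree variant, leaf size, and single-leaf search are assumptions; the slides do not say which tree produced the timings above. Searching fewer points is what trades recall for speed.

```python
import numpy as np


class RPTree:
    """Random-projection tree: split each node at the median of a random projection."""

    def __init__(self, data, leaf_size=32, seed=0):
        self.data = data
        self.rng = np.random.default_rng(seed)
        self.root = self._build(np.arange(len(data)), leaf_size)

    def _build(self, idx, leaf_size):
        if len(idx) <= leaf_size:
            return {"leaf": idx}
        direction = self.rng.normal(size=self.data.shape[1])
        proj = self.data[idx] @ direction
        threshold = np.median(proj)
        left, right = idx[proj <= threshold], idx[proj > threshold]
        if len(left) == 0 or len(right) == 0:  # degenerate split: stop here
            return {"leaf": idx}
        return {"dir": direction, "thr": threshold,
                "left": self._build(left, leaf_size),
                "right": self._build(right, leaf_size)}

    def query(self, q, k=10):
        """Approximate k-NN: descend to one leaf, then brute-force inside it."""
        node = self.root
        while "leaf" not in node:
            node = node["left"] if q @ node["dir"] <= node["thr"] else node["right"]
        cand = node["leaf"]
        dists = ((self.data[cand] - q) ** 2).sum(axis=1)
        return cand[np.argsort(dists)[:k]]


rng = np.random.default_rng(0)
data = rng.normal(size=(5000, 16))  # stand-in for song feature vectors
tree = RPTree(data, leaf_size=64)
q = rng.normal(size=16)
approx = tree.query(q, k=10)
exact = np.argsort(((data - q) ** 2).sum(axis=1))[:10]
print(len(set(approx) & set(exact)) / 10)  # recall@10 for this one query
```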

Similarity learning: summary

• Collaborative filters provide user-centric music similarity

• CF similarity can be approximated by audio features

• Audio search can be done quickly at large-scale

Playlist generation

• Goal: generate a "good" song sequence
- Music auto-pilot (given context)

• Many existing algorithms, but no standard evaluation

• What makes one algorithm better than another?

Playlist evaluation 1: Human survey

• Idea: generate playlists, ask for opinions

• Impractical at large-scale:
- Huge search space
- User taste, expertise can be problematic
- Slow, expensive

• Does not facilitate rapid evaluation and optimization

Playlist evaluation 2: Information retrieval

• Idea:
- Define "good" and "bad" playlists
- Predict the next song, measure accuracy

• But what makes a bad playlist?

• Do users agree on good/bad?

A generative approach
[M. & Lanckriet, 2011b]

• Playlist algorithm = distribution over playlists

• Don't evaluate synthetic playlists

• Do evaluate the likelihood of generating real playlists

The playlist collection: AOTM-2011

• Art of the Mix
- 13 years of playlists
- ~210K playlist segments
- ~100K songs from the Million Song Dataset (MSD)

• Top 25 playlist categories:
- Genre: Punk, Hip-hop, Reggae...
- Context: Road trip, Break-up, Sleep...
- Other: Mixed genre, Alternating DJ...

A simple playlist model

1. Start with a set of songs
2. Select a subset (e.g., jazz songs)
3. Select a song
4. Select a new subset
5. Select a new song
6. Repeat...

Connecting the dots...

• Random walk on a hypergraph
- Vertices = songs
- Edges = subsets

• Edges derived from:
- Audio clusters, tags, lyrics, era, popularity, CF
- or combinations/intersections

• Goal: optimize edge weights from example playlists
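Below is a minimal sketch of generating a playlist by a random walk on a weighted hypergraph. The transition rule is assumed as one natural reading of the slides. From the current song, pick one of the edges containing it with probability proportional to the edge's weight. Then pick the next song uniformly from that edge. The edge names, songs, and weights are hypothetical toy data. A "uniform" edge containing every song guarantees that every song has somewhere to go.

```python
import random


def random_walk_playlist(edges, weights, start, length, seed=0):
    """edges: dict edge name -> set of songs; weights: dict edge name -> weight > 0."""
    rng = random.Random(seed)
    playlist = [start]
    current = start
    for _ in range(length - 1):
        # Edges (subsets) that contain the current song.
        options = [e for e in edges if current in edges[e]]
        edge = rng.choices(options, weights=[weights[e] for e in options], k=1)[0]
        current = rng.choice(sorted(edges[edge]))  # uniform song within the edge
        playlist.append(current)
    return playlist


songs = {"s1", "s2", "s3", "s4"}
edges = {
    "audio_cluster_7": {"s1", "s2"},  # hypothetical feature-derived subsets
    "decade_1990": {"s2", "s3"},
    "uniform": songs,
}
weights = {"audio_cluster_7": 3.0, "decade_1990": 1.0, "uniform": 0.1}
print(random_walk_playlist(edges, weights, start="s1", length=5))
```

This sketch allows a song to repeat. Learning the weights from example playlists is a separate step.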

Playlist model

[Figure: generative model, exp. prior → edge weights → transitions → playlists]

Playlist generation: evaluation

• Setup:
- Split playlist collection into train/test
- Learn edge weights on training playlists
- Evaluate average likelihood of test playlists

• Train per category, or all together

• Compare against uniform shuffle baseline
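The evaluation can be sketched in three steps. First, score each test playlist by its log-likelihood under the model. Second, score it under a uniform shuffle, where each song is drawn uniformly, with replacement, from an N-song library, so a length-T playlist scores T·log(1/N). Third, report the relative gain. The function names and the normalization of the gain are assumptions; the slides only say "log-likelihood gain over random shuffle".

```python
import math


def playlist_log_likelihood(playlist, first_prob, trans_prob):
    """log P(playlist) under a first-order model.

    first_prob(s):      probability of starting with song s.
    trans_prob(prev, s): probability of moving from prev to s.
    """
    ll = math.log(first_prob(playlist[0]))
    for prev, cur in zip(playlist, playlist[1:]):
        ll += math.log(trans_prob(prev, cur))
    return ll


def gain_over_shuffle(test_playlists, first_prob, trans_prob, n_songs):
    """Relative log-likelihood gain of the model over uniform shuffle."""
    model = sum(playlist_log_likelihood(p, first_prob, trans_prob) for p in test_playlists)
    shuffle = sum(len(p) * math.log(1.0 / n_songs) for p in test_playlists)
    # Both totals are negative; a positive gain means the model assigns
    # the real playlists higher probability than shuffle does.
    return (model - shuffle) / abs(shuffle)


# Toy model over a 4-song library: uniform start, then every transition has prob 0.5.
g = gain_over_shuffle([["a", "b", "c"]], lambda s: 0.25, lambda p, c: 0.5, n_songs=4)
print(g)  # ≈ 0.333
```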

Random walk results
[Figure: log-likelihood gain over random shuffle (0% to 25%), global model vs. category-specific models, per playlist category: ALL, Mixed, Theme, Rock-pop, Alternating DJ, Indie, Single artist, Romantic, Road trip, Punk, Depression, Break up, Narrative, Hip-hop, Sleep, Electronic, Dance-house, R&B, Country, Cover songs, Hardcore, Rock, Jazz, Folk, Reggae, Blues]

Stationary model results
[Figure: log-likelihood gain over random shuffle (-15% to 20%), global model vs. category-specific models, per playlist category: ALL, Mixed, Theme, Rock-pop, Alternating DJ, Indie, Single artist, Romantic, Road trip, Punk, Depression, Break up, Narrative, Hip-hop, Sleep, Electronic, Dance-house, R&B, Country, Cover songs, Hardcore, Rock, Jazz, Folk, Reggae, Blues]

Example playlists

Rhythm & Blues
[70s & soul] Lyn Collins - Think
[Audio #14 & funk] Isaac Hayes - No Name Bar
[DECADE 1965 & soul] Michael Jackson - My Girl

Electronic music
[Audio #11 & downtempo] Everything But The Girl - Blame
[DECADE 1990 & trip-hop] Massive Attack - Spying Glass
[Audio #11 & electronica] Björk - Hunter

Playlist generation summary

• Generative approach simplifies evaluation

• AOTM-2011 collection facilitates learning and evaluation

• Robust, efficient and transparent feature integration

Directions for future work

• Audio features: coding, dynamics and rhythm

• Playlist models: mixtures, long-range interactions

• UI models: interactive, context-aware, diversity

Personalized recommendation
[M., Bertin-Mahieux, Ellis, & Lanckriet, 2012]

• The Million Song Dataset Challenge

• Listening histories for 1.1M users, 380K songs

• Task: personalized song recommendation

Conclusion

• MLR can optimize distance metrics for ranking, QBE retrieval

• Audio similarity can approximate a collaborative filter

• Generative playlist model integrates data, models dynamics

• User-centric evaluation makes it all possible

Metric partial order feature

• Score is large when distances match ranking
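The slide's formula did not survive extraction. One form of such a feature is assumed in the sketch below. Let φ(q, x) = −(q − x)(q − x)^T, so that ⟨W, φ(q, x)⟩ = −‖q − x‖²_W. Then ψ(q, y) averages y_ij (φ(q, x_i) − φ(q, x_j)) over relevant i and irrelevant j, where y_ij = +1 if i is ranked above j and −1 otherwise. The score ⟨W, ψ⟩ is then large when relevant items are closer than irrelevant ones. The function names are hypothetical.

```python
import numpy as np


def phi(q, x):
    """phi(q, x) = -(q - x)(q - x)^T, so <W, phi(q, x)> = -||q - x||_W^2."""
    d = q - x
    return -np.outer(d, d)


def partial_order_feature(q, relevant, irrelevant, ranking):
    """psi(q, y) for a ranking given as a list of item keys, best first.

    relevant / irrelevant: dicts mapping item key -> feature vector.
    """
    pos = {item: r for r, item in enumerate(ranking)}
    psi = np.zeros((len(q), len(q)))
    for i, xi in relevant.items():
        for j, xj in irrelevant.items():
            y_ij = 1.0 if pos[i] < pos[j] else -1.0  # +1 iff i is ranked above j
            psi += y_ij * (phi(q, xi) - phi(q, xj))
    return psi / (len(relevant) * len(irrelevant))


def score(W, psi):
    """Frobenius inner product <W, psi>."""
    return float(np.sum(W * psi))


q = np.zeros(2)
relevant = {"a": np.array([1.0, 0.0])}    # closer to q
irrelevant = {"b": np.array([3.0, 0.0])}  # farther from q
W = np.eye(2)
print(score(W, partial_order_feature(q, relevant, irrelevant, ["a", "b"])))  # 8.0
print(score(W, partial_order_feature(q, relevant, irrelevant, ["b", "a"])))  # -8.0
```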

Playlist weights: 6390 edges
[Figure: edge weights per playlist category (ALL, Mixed, Theme, Rock-pop, Alternating DJ, Indie, Single Artist, Romantic, RoadTrip, Punk, Depression, Break Up, Narrative, Hip-hop, Sleep, Electronic music, Dance-house, Rhythm and Blues, Country, Cover, Hardcore, Rock, Jazz, Folk, Reggae, Blues), grouped by feature family: Audio, CF, Era, Familiarity, Lyrics, Tags, Uniform]

• Audio & CF: k-means (16/64/256)
• Era: year, decade, decade+5
• Familiarity: high/med/low
• Lyrics: LDA (k=32, top-1/3/5)
• Tags: Last.fm top-10
• Conjunctions
