More Like This: Machine Learning Approaches to Music Similarity

Brian McFee
Computer Science & Engineering
University of California, San Diego
Music discovery in days of yore...
Music discovery 2.0: the present



• ~20 million songs available

• Discovery is still largely human-powered
A Google for music?




• Standard text search can work with meta-data
• Can we predict meta-data from audio?
 ⁃ [Turnbull, 2008], [Barrington, 2011]
Query by example

• Natural, user-friendly alternative to text search
This talk

• Learning algorithms for QBE, geared toward music discovery

• We'll look at two consumption models:
  - Active browsing (search & ranking)
  - Passive listening (playlist generation)

• Evaluation derived from user behavior
Learning similarity
Defining similarity: semantics?

• Song similarity = tag similarity?

• Drawbacks:
  - Choosing, weighting vocabulary is surprisingly difficult
  - Hard to maintain quality at scale
Defining similarity: human judgements?
                           [M. & Lanckriet, 2009, 2011]
• Which is more similar?
• Drawbacks: ambiguity, subjectivity, scale
Collaborative filter similarity




• Collect listening histories for (lots of!) users

• Song similarity = portion of users in common
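A minimal sketch of "portion of users in common", assuming the Jaccard index (shared listeners divided by all listeners of either song) as the overlap measure. The slide does not name the exact formula, and the song and user ids below are hypothetical.

```python
def cf_similarity(listeners_a, listeners_b):
    """Jaccard index: shared listeners divided by listeners of either song."""
    a, b = set(listeners_a), set(listeners_b)
    union = a | b
    if not union:
        return 0.0  # neither song has any listeners
    return len(a & b) / len(union)

# Hypothetical listening histories: song id -> set of user ids
histories = {
    "song_x": {"u1", "u2", "u3", "u4"},
    "song_y": {"u2", "u3", "u5"},
    "song_z": {"u6"},
}

sim_xy = cf_similarity(histories["song_x"], histories["song_y"])
sim_xz = cf_similarity(histories["song_x"], histories["song_z"])
```

A song with few listeners (like `song_z` here) overlaps with almost nothing, which is one way to see why this kind of similarity degrades on unpopular items.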
• Collaborative filters perform well...
 - ... for tagging [Kim, Tomasik, & Turnbull, 2009]
 - ... and playlisting [Barrington, Oda, & Lanckriet, 2009]
 - ... and recommendation (Yahoo, Last.fm, iTunes...)



• Implicit feedback requires no additional effort from users



• ... but fails on unpopular items: the cold start problem!
Learning from a collaborative filter
                [M., Barrington, & Lanckriet, 2010, 2012]

[Figure: ranked list of results 1., 2., 3.]
Metric learning to rank
                                         [M. & Lanckriet, 2010]
• The goal: rankings in audio space = rankings in CF space

• That is: ranking by (learned) distance = target rankings

• Optimize a linear transformation for ranking
Structure prediction: nearest neighbors

• Setup: database X, rankings y

• PSD matrix W transforms features

• Order x ∈ X by distance from q: ‖q − x‖_W

• ψ(q, y) encodes each (query, ranking) pair
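To make the ordering step concrete, here is a small sketch of ranking a database by a distance defined through a PSD matrix. It assumes the squared form (q − x)ᵀ W (q − x); the names `learned_distance` and `rank_database` and the toy data are illustrative, not taken from the talk.

```python
def learned_distance(W, q, x):
    """Squared distance (q - x)^T W (q - x) for a PSD matrix W (list of rows)."""
    d = [qi - xi for qi, xi in zip(q, x)]
    Wd = [sum(w_ij * d_j for w_ij, d_j in zip(row, d)) for row in W]
    return sum(d_i * wd_i for d_i, wd_i in zip(d, Wd))

def rank_database(W, q, database):
    """Return database indices sorted from nearest to farthest from q."""
    return sorted(range(len(database)),
                  key=lambda i: learned_distance(W, q, database[i]))

database = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
q = [0.0, 0.0]

identity = [[1.0, 0.0], [0.0, 1.0]]   # plain Euclidean distance
W = [[4.0, 0.0], [0.0, 0.25]]         # stretches feature 0, shrinks feature 1

euclidean_ranking = rank_database(identity, q, database)
learned_ranking = rank_database(W, q, database)
```

Changing W changes the ranking without touching the features themselves, which is the degree of freedom that metric learning optimizes.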
Metric learning to rank (MLR)




         Score for target ranking > Score for any other ranking + Prediction error

• Supported losses Δ:
              AUC, KNN, MAP, MRR, NDCG, Prec@k
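As an illustration of some of the ranking measures listed above, the sketch below scores one ranked list with binary relevance labels. It assumes the prediction error is taken as one minus the score, a common convention that the slide does not spell out; the function names are hypothetical.

```python
def precision_at_k(ranked_relevance, k):
    """Fraction of the top-k results that are relevant."""
    return sum(ranked_relevance[:k]) / k

def reciprocal_rank(ranked_relevance):
    """1 / position of the first relevant result (0 if there is none)."""
    for position, rel in enumerate(ranked_relevance, start=1):
        if rel:
            return 1.0 / position
    return 0.0

def average_precision(ranked_relevance):
    """Mean of precision values measured at each relevant result."""
    hits, total = 0, 0.0
    for position, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            total += hits / position
    return total / hits if hits else 0.0

def auc(ranked_relevance):
    """Fraction of (relevant, irrelevant) pairs ranked in the correct order."""
    n_rel = sum(ranked_relevance)
    n_irr = len(ranked_relevance) - n_rel
    if n_rel == 0 or n_irr == 0:
        return 0.0
    correct, irrelevant_below = 0, 0
    # Walk up from the bottom, counting irrelevant items below each relevant one.
    for rel in reversed(ranked_relevance):
        if rel:
            correct += irrelevant_below
        else:
            irrelevant_below += 1
    return correct / (n_rel * n_irr)

ranking = [1, 0, 1, 0, 0]          # 1 = relevant, listed best-first
loss_auc = 1.0 - auc(ranking)      # assumed form of the error term
```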
MLR solver
• Cutting-plane algorithm based on 1-slack Structural SVM
 [Joachims, et al. 2009]

• Repeat until convergence:
  - Constraint generation (DP)
  - Semi-definite programming (sequence of QPs)

• Multiple kernel extensions:
  [Galleguillos, M., Belongie, & Lanckriet 2011]
Audio pipeline

  1. Feature extraction: audio signal → bag of ΔMFCCs
  2. Vector quantization: bag of ΔMFCCs → codeword histogram
  3. Probability product kernel (PPK): codeword histogram → features

• MLR is trained on the PPK features, with CF similarity as supervision
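Steps 2 and 3 of the pipeline can be sketched as follows. This assumes a codebook that has already been learned (for example by k-means), nearest-codeword assignment under Euclidean distance, and a probability product kernel with exponent ρ = 1/2 (the Bhattacharyya kernel); none of these choices is stated on the slide, and the tiny 2-D "frames" stand in for real ΔMFCC vectors.

```python
import math

def quantize(frames, codebook):
    """Normalized histogram of nearest-codeword assignments for one song.

    Assumes `frames` contains at least one feature vector.
    """
    counts = [0] * len(codebook)
    for frame in frames:
        nearest = min(
            range(len(codebook)),
            key=lambda k: sum((f - c) ** 2 for f, c in zip(frame, codebook[k])),
        )
        counts[nearest] += 1
    total = sum(counts)
    return [c / total for c in counts]

def ppk(p, q, rho=0.5):
    """Probability product kernel between two discrete distributions."""
    return sum((pi * qi) ** rho for pi, qi in zip(p, q))

# Hypothetical 3-codeword codebook and two short "songs"
codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
song_a = [[0.1, 0.0], [0.9, 1.1], [1.0, 0.8], [0.2, -0.1]]
song_b = [[5.1, 4.9], [4.8, 5.2], [1.1, 1.0], [5.0, 5.0]]

hist_a = quantize(song_a, codebook)
hist_b = quantize(song_b, codebook)
k_aa = ppk(hist_a, hist_a)
k_ab = ppk(hist_a, hist_b)
```

With ρ = 1/2 a histogram has kernel value 1 with itself, and the value shrinks as two songs use different codewords.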
Evaluation: CAL10K

• Last.fm collaborative filter                    [Celma, 2008]
 - 360K users, 186K artists

• CAL10K songs                     [Tingle, Turnbull, & Kim, 2010]
  - 5.4K songs, 2K artists (after CF matching)

• Evaluation:
  - Split artists into train/val/test
 - Target rankings: top-10 most similar train artists
Evaluation: comparison

• Gaussian mixture models + KL divergence
 - 8-component, diagonal-covariance GMM per song

• Auto-tags: predict 149 semantic tags from audio
  [Turnbull, 2008]


• [Our method] VQ+MLR: 1024 codewords

• Expert tags: 1053 tags from Pandora
  [Tingle, et al., 2010]
Similarity learning: results


[Figure: bar chart of AUC (axis 0.65 to 0.95) for GMM (KL), Auto-tags, Auto-tags + MLR, Audio VQ, Audio VQ + MLR, Expert tags (cos), Expert tags + MLR]
Example playlists
 The Ramones - Go Mental

 Def Leppard - Promises
 The Buzzcocks - Harmony In My Head
 Los Lonely Boys - Roses
 Wolfmother - Colossal
 Judas Priest - Diamonds and Rust (live)



 MLR:
 The Buzzcocks - Harmony In My Head
 Mötley Crüe - Same Ol' Situation
 The Offspring - Gotta Get Away
 The Misfits - Skulls
 AC/DC - Who Made Who (live)
Example playlists
 Fats Waller - Winter Weather

 Dizzy Gillespie - She's Funny That Way
 Enrique Morente - Solea
 Chet Atkins - In the Mood
 Rachmaninov - Piano Concerto #4
 Eluvium - Radio Ballet


 Chet Atkins - In the Mood
 Charlie Parker - What Is This Thing Called Love?
 Bud Powell - Oblivion
 Bob Wills & His Texas Playboys - Lyla Lou
 Bob Wills & His Texas Playboys - Sittin' On Top of the World
Scaling up: fast retrieval
                                            [M. & Lanckriet, 2011]

• Audio similarity search for a million songs?



• Idea: Index data with spatial trees



• 100-NN search over 900K songs:
  - Brute force:     2.4s
  - 50% recall:     0.14s 17x speedup
  - 20% recall:     0.02s 120x speedup
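The trade-off between recall and speed can be illustrated with a deliberately simple spatial tree. The sketch below assumes a plain median-split KD-tree and searches only the one leaf that contains the query; the trees used in the cited work may differ, and the random 4-D points are stand-ins for real audio features.

```python
import random

def build_tree(indices, points, leaf_size, depth=0):
    """Recursively split on the median of one coordinate (a plain KD-tree)."""
    if len(indices) <= leaf_size:
        return {"leaf": indices}
    axis = depth % len(points[0])
    ordered = sorted(indices, key=lambda i: points[i][axis])
    mid = len(ordered) // 2
    return {
        "axis": axis,
        "threshold": points[ordered[mid]][axis],
        "left": build_tree(ordered[:mid], points, leaf_size, depth + 1),
        "right": build_tree(ordered[mid:], points, leaf_size, depth + 1),
    }

def candidates(tree, query):
    """Descend to the single leaf whose region contains the query."""
    while "leaf" not in tree:
        side = "left" if query[tree["axis"]] < tree["threshold"] else "right"
        tree = tree[side]
    return tree["leaf"]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def knn(query, points, indices, k):
    """Brute-force k nearest neighbors restricted to the given indices."""
    return sorted(indices, key=lambda i: sq_dist(query, points[i]))[:k]

random.seed(0)
points = [[random.random() for _ in range(4)] for _ in range(2000)]
tree = build_tree(list(range(len(points))), points, leaf_size=200)

query = [0.5, 0.5, 0.5, 0.5]
exact = knn(query, points, range(len(points)), k=10)
leaf = candidates(tree, query)
approx = knn(query, points, leaf, k=10)
recall = len(set(exact) & set(approx)) / len(exact)
```

Only the points in one leaf are scanned instead of the whole database, so the search is faster but `recall` can fall below 1, which is the same kind of trade-off as the timings above.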
Similarity learning: summary

• Collaborative filters provide user-centric music similarity

• CF similarity can be approximated by audio features

• Audio search can be done quickly at large scale
Playlist generation

• Goal: generate a "good" song sequence
 - Music auto-pilot (given context)



• Many existing algorithms, but no standard evaluation



• What makes one algorithm better than another?
Playlist evaluation 1: Human survey

• Idea: generate playlists, ask for opinions



• Impractical at large scale:
   - Huge search space
   - User taste, expertise can be problematic
   - Slow, expensive



• Does not facilitate rapid evaluation and optimization
Playlist evaluation 2: Information retrieval


• Idea:
 - Define "good" and "bad" playlists
 - Predict the next song, measure accuracy

• But what makes a bad playlist?


• Do users agree on good/bad?
A generative approach
                                           [M. & Lanckriet, 2011b]




• Playlist algorithm = distribution over playlists

• Don't evaluate synthetic playlists

• Do evaluate the likelihood of generating real playlists
The playlist collection: AOTM-2011

• Art of the Mix
 - 13 years of playlists
 - ~210K playlist segments
 - ~100K songs from MSD



• Top 25 playlist categories:
  - Genre:        Punk, Hip-hop, Reggae...
  - Context:     Road trip, Break-up, Sleep...
  - Other:       Mixed genre, Alternating DJ...
A simple playlist model

  1. Start with a set of songs
  2. Select a subset (e.g., jazz songs)
  3. Select a song
  4. Select a new subset
  5. Select a new song
  6. Repeat...
Connecting the dots...


• Random walk on a hypergraph
 - Vertices = songs
 - Edges = subsets

• Edges derived from:
  - Audio clusters, tags, lyrics, era, popularity, CF
  - or combinations/intersections

• Goal: optimize edge weights from example playlists
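One way to read the random walk is sketched below: from the current song, choose an edge (subset) that contains it with probability proportional to the edge weight, then choose a different song uniformly from that edge. This transition rule is an assumption made for illustration, and the songs, edges, and weights are hypothetical.

```python
import random

def transition_probs(current, edges, weights):
    """P(next | current) under the assumed edge-then-song sampling rule."""
    active = [(e, w) for e, w in zip(edges, weights)
              if current in e and len(e) > 1 and w > 0]
    total = sum(w for _, w in active)
    probs = {}
    for edge, w in active:
        others = [s for s in edge if s != current]
        for song in others:
            probs[song] = probs.get(song, 0.0) + (w / total) / len(others)
    return probs

def sample_playlist(start, length, edges, weights, rng):
    """Generate a playlist by repeatedly sampling the next song."""
    playlist = [start]
    while len(playlist) < length:
        probs = transition_probs(playlist[-1], edges, weights)
        if not probs:
            break  # the current song shares no weighted edge with another song
        songs = sorted(probs)
        playlist.append(rng.choices(songs, weights=[probs[s] for s in songs])[0])
    return playlist

edges = [
    {"jazz_1", "jazz_2", "jazz_3"},  # e.g. a "jazz" tag
    {"jazz_3", "rock_1"},            # e.g. a shared decade
    {"rock_1", "rock_2"},            # e.g. a "rock" tag
]
weights = [2.0, 1.0, 1.0]

probs = transition_probs("jazz_3", edges, weights)
playlist = sample_playlist("jazz_1", 5, edges, weights, random.Random(0))
```

Raising the weight of an edge makes transitions inside that subset more likely, which is what learning the weights from example playlists adjusts.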
Playlist model

[Figure: graphical model relating exp. prior, edge weights, transitions, and playlists]
Playlist generation: evaluation


• Setup:
 - Split playlist collection into train/test
 - Learn edge weights on training playlists
 - Evaluate average likelihood of test playlists


• Train per category, or all together

• Compare against uniform shuffle baseline
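The evaluation loop can be sketched as follows, assuming a first-order model that assigns a probability to each song-to-song transition. The relative gain formula at the end is an assumption about how a percentage gain over the shuffle baseline might be computed; the "learned" model and test playlists are toy stand-ins.

```python
import math

def avg_log_likelihood(playlists, transition_prob):
    """Mean log-probability per transition over a collection of playlists."""
    total, count = 0.0, 0
    for playlist in playlists:
        for prev, nxt in zip(playlist, playlist[1:]):
            total += math.log(transition_prob(prev, nxt))
            count += 1
    return total / count

songs = ["a", "b", "c", "d"]

def uniform(prev, nxt):
    # Shuffle baseline: every other song is equally likely to come next.
    return 1.0 / (len(songs) - 1)

def learned(prev, nxt):
    # Hypothetical learned model that prefers a->b, b->c, c->d, d->a.
    preferred = {"a": "b", "b": "c", "c": "d", "d": "a"}
    return 0.8 if preferred[prev] == nxt else 0.1

# Held-out playlists, never seen while fitting the model
test_playlists = [["a", "b", "c"], ["c", "d", "a", "c"]]

ll_uniform = avg_log_likelihood(test_playlists, uniform)
ll_learned = avg_log_likelihood(test_playlists, learned)
gain = (ll_learned - ll_uniform) / abs(ll_uniform)  # assumed definition
```

Because the score is the likelihood of real playlists, no synthetic playlist ever has to be judged by a person.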
Random walk results
[Figure: log-likelihood gain over random shuffle (axis 0% to 25%), global model vs. category-specific models, for ALL, Mixed, Theme, Rock-pop, Alternating DJ, Indie, Single artist, Romantic, Road trip, Punk, Depression, Break up, Narrative, Hip-hop, Sleep, Electronic, Dance-house, R&B, Country, Cover songs, Hardcore, Rock, Jazz, Folk, Reggae, Blues]
Stationary model results
[Figure: log-likelihood gain over random shuffle (axis -15% to 20%), global model vs. category-specific models, for ALL, Mixed, Theme, Rock-pop, Alternating DJ, Indie, Single artist, Romantic, Road trip, Punk, Depression, Break up, Narrative, Hip-hop, Sleep, Electronic, Dance-house, R&B, Country, Cover songs, Hardcore, Rock, Jazz, Folk, Reggae, Blues]
Example playlists

 Rhythm & Blues
  70s & soul                Lyn Collins - Think
  Audio #14 & funk          Isaac Hayes - No Name Bar
  DECADE 1965 & soul        Michael Jackson - My Girl


 Electronic music
  Audio #11 & downtempo     Everything But The Girl - Blame
  DECADE 1990 & trip-hop    Massive Attack - Spying Glass
  Audio #11 & electronica   Björk - Hunter
Playlist generation summary


• Generative approach simplifies evaluation

• AOTM-2011 collection facilitates learning and evaluation

• Robust, efficient and transparent feature integration
The future
Directions for future work



• Audio features: coding, dynamics and rhythm

• Playlist models: mixtures, long-range interactions

• UI models: interactive, context-aware, diversity
Personalized recommendation
                   [M., Bertin-Mahieux, Ellis, & Lanckriet, 2012]

• The Million Song Dataset Challenge

• Listening histories for 1.1M users, 380K songs

• Task: personalized song recommendation
Conclusion


• MLR can optimize distance metrics for ranking, QBE retrieval

• Audio similarity can approximate a collaborative filter

• Generative playlist model integrates data, models dynamics


• User-centric evaluation makes it all possible
Thanks!
Metric partial order feature




 • Score is large when distances match ranking
Playlist weights: 6390 edges
[Figure: learned edge weights by feature type (columns: Audio, CF, Era, Familiarity, Lyrics, Tags, Uniform) for each category (rows: ALL, Mixed, Theme, Rock-pop, Alternating DJ, Indie, Single Artist, Romantic, Road trip, Punk, Depression, Break Up, Narrative, Hip-hop, Sleep, Electronic music, Dance-house, Rhythm and Blues, Country, Cover, Hardcore, Rock, Jazz, Folk, Reggae, Blues)]

 • Audio & CF: k-means (16/64/256)
 • Era: year, decade, decade+5
 • Familiarity: high/med/low
 • Lyrics: LDA (k=32, top-1/3/5)
 • Tags: Last.fm top-10
 • Conjunctions

How to convert PDF to text with Nanonets
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 

More Like This: Machine Learning Approaches to Music Similarity

  • 1. More Like This: Machine Learning Approaches to Music Similarity Brian McFee Computer Science & Engineering University of California, San Diego
  • 2. Music discovery in days of yore...
  • 3. Music discovery 2.0: the present • ~20 million songs available • Discovery is still largely human-powered
  • 4. A Google for music?
  • 5. A Google for music? • Standard text search can work with meta-data • Can we predict meta-data from audio? ⁃ [Turnbull, 2008], [Barrington, 2011]
  • 6. Query by example • Natural, user-friendly alternative to text search
  • 9. This talk • Learning algorithms for QBE, geared toward music discovery • We'll look at two consumption models: - Active browsing (search & ranking) - Passive listening (playlist generation) • Evaluation derived from user behavior
  • 11. Defining similarity: semantics? Song similarity = tag similarity?
  • 12. Defining similarity: semantics? • Drawbacks: - Choosing, weighting vocabulary is surprisingly difficult - Hard to maintain quality at scale
  • 13. Defining similarity: human judgements? [M. & Lanckriet, 2009, 2011] • Which is more similar?
  • 14. Defining similarity: human judgements? [M. & Lanckriet, 2009, 2011] • Which is more similar? • Drawbacks: ambiguity, subjectivity, scale
  • 15. Collaborative filter similarity • Collect listening histories for (lots of!) users • Song similarity = portion of users in common
  • 16. Collaborative filter similarity • Collaborative filters perform well... - ... for tagging [Kim, Tomasik, & Turnbull, 2009] - ... and playlisting [Barrington, Oda, & Lanckriet, 2009] - ... and recommendation (Yahoo, Last.fm, iTunes...) • Implicit feedback requires no additional effort from users • ... but fails on unpopular items: the cold start problem!
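
The collaborative filter similarity on slides 15 and 16 can be illustrated with a small sketch. The slides only say "portion of users in common"; the Jaccard overlap used here is one plausible reading, and the song names, user ids, and function names are hypothetical. The last song has no listeners, which shows the cold-start problem: its similarity to everything is zero.

```python
def cf_similarity(listeners_a, listeners_b):
    """Jaccard overlap between the sets of users who played two songs.

    Assumption: "portion of users in common" is read as |A & B| / |A | B|.
    """
    union = listeners_a | listeners_b
    if not union:
        return 0.0
    return len(listeners_a & listeners_b) / len(union)


def rank_similar(query, histories):
    """Return all other songs, most similar to the query first."""
    others = [song for song in histories if song != query]
    return sorted(
        others,
        key=lambda song: cf_similarity(histories[query], histories[song]),
        reverse=True,
    )


# Hypothetical listening histories: song -> set of user ids.
histories = {
    "song_a": {"u1", "u2", "u3"},
    "song_b": {"u2", "u3", "u4"},
    "song_c": {"u5"},
    "song_new": set(),  # never played: the cold-start case
}

print(cf_similarity(histories["song_a"], histories["song_b"]))
print(rank_similar("song_a", histories))
```

Because a song with no listening history scores zero against every other song, a purely collaborative similarity cannot place it, which motivates learning an audio-based substitute.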
  • 17. Learning from a collaborative filter [M., Barrington, & Lanckriet, 2010, 2012] 1. 2. 3.
  • 20. Metric learning to rank • The goal: Rankings in audio space = Rankings in CF space
  • 21. Metric learning to rank [M. & Lanckriet, 2010] • The goal: Ranking by (learned) distance = Target rankings
  • 22. Metric learning to rank [M. & Lanckriet, 2010] • The goal: Ranking by (learned) distance = Target rankings • Optimize a linear transformation for ranking
  • 23. Structure prediction: nearest neighbors • Setup: a database of songs and target rankings • A PSD matrix transforms the features • Order the database by distance from the query
  • 24. Structure prediction: nearest neighbors • Setup: a database of songs and target rankings • A PSD matrix transforms the features • Order the database by distance from the query • A feature map encodes each (query, ranking) pair
  • 25. Metric learning to rank (MLR) • Score for target ranking > Score for any other ranking + Prediction error • Supported losses Δ: AUC, KNN, MAP, MRR, NDCG, Prec@k
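
To make the retrieval step concrete, here is a minimal sketch of ranking a database by a distance defined through a PSD matrix, scored with AUC (one of the losses listed above). It does not implement the MLR training procedure. The matrix `W`, the toy vectors, and all function names are assumptions chosen for illustration.

```python
def learned_distance(W, q, x):
    """Squared distance (q - x)^T W (q - x), with W given as a list of rows."""
    d = [qi - xi for qi, xi in zip(q, x)]
    Wd = [sum(w * dj for w, dj in zip(row, d)) for row in W]
    return sum(di * wi for di, wi in zip(d, Wd))


def rank_database(W, q, database):
    """Database indices ordered by increasing distance from the query q."""
    return sorted(range(len(database)),
                  key=lambda i: learned_distance(W, q, database[i]))


def auc(ranking, relevant):
    """Fraction of (relevant, irrelevant) pairs ranked in the right order."""
    position = {item: rank for rank, item in enumerate(ranking)}
    rel = [i for i in ranking if i in relevant]
    irr = [i for i in ranking if i not in relevant]
    if not rel or not irr:
        return 1.0
    correct = sum(1 for a in rel for b in irr if position[a] < position[b])
    return correct / (len(rel) * len(irr))


# Hypothetical 2-D features; item 1 is the only relevant item for this query.
query = [0.0, 0.0]
database = [[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
relevant = {1}

identity = [[1.0, 0.0], [0.0, 1.0]]   # plain Euclidean distance
reweighted = [[9.0, 0.0], [0.0, 0.1]]  # a PSD matrix that downweights axis 2

print(rank_database(identity, query, database), auc(rank_database(identity, query, database), relevant))
print(rank_database(reweighted, query, database), auc(rank_database(reweighted, query, database), relevant))
```

In this toy case, changing the matrix moves the relevant item to the top of the ranking, which is the effect the learning objective is meant to produce on real data.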
  • 26. MLR solver • Cutting-plane algorithm based on 1-slack Structural SVM [Joachims, et al. 2009] • Repeat until convergence: - Constraint generation (DP) - Semi-definite programming
  • 27. MLR solver • Cutting-plane algorithm based on 1-slack Structural SVM [Joachims, et al. 2009] • Repeat until convergence: - Constraint generation (DP) - Semi-definite programming • Sequence of QPs
  • 28. MLR solver • Cutting-plane algorithm based on 1-slack Structural SVM [Joachims, et al. 2009] • Repeat until convergence: - Constraint generation (DP) - Semi-definite programming • Sequence of QPs • Multiple kernel extensions: [Galleguillos, M., Belongie, & Lanckriet 2011]
  • 29. Audio pipeline Audio signal
  • 30. Audio pipeline • Audio signal • 1. Feature extraction: bag of ΔMFCCs
  • 31. Audio pipeline • Audio signal • 1. Feature extraction: bag of ΔMFCCs • 2. Vector quantization: codeword histogram
  • 32. Audio pipeline • Audio signal • 1. Feature extraction: bag of ΔMFCCs • 2. Vector quantization: codeword histogram • 3. Probability product kernel (PPK)
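
Steps 2 and 3 of the pipeline can be sketched as follows. The codebook and frames are tiny hypothetical stand-ins for real ΔMFCC vectors and a learned codebook, and the kernel uses ρ = 1/2 (the Bhattacharyya coefficient), which is an assumption: the slides do not state which ρ was used.

```python
import math


def nearest_codeword(frame, codebook):
    """Index of the codeword closest (squared Euclidean) to a feature frame."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((f - c) ** 2 for f, c in zip(frame, codebook[k])),
    )


def codeword_histogram(frames, codebook):
    """Normalized histogram of codeword assignments (frames must be non-empty)."""
    counts = [0] * len(codebook)
    for frame in frames:
        counts[nearest_codeword(frame, codebook)] += 1
    total = sum(counts)
    return [c / total for c in counts]


def ppk(p, q):
    """Probability product kernel with rho = 1/2 (assumed), on two histograms."""
    return sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))


# Hypothetical 2-codeword codebook and per-song feature frames.
codebook = [[0.0, 0.0], [1.0, 1.0]]
song_a = [[0.1, 0.0], [0.9, 1.0], [0.0, 0.2], [1.1, 0.9]]
song_b = [[0.0, 0.1], [0.1, 0.1]]

hist_a = codeword_histogram(song_a, codebook)
hist_b = codeword_histogram(song_b, codebook)
print(hist_a, hist_b, ppk(hist_a, hist_b))
```

With this choice of ρ, a histogram compared with itself scores exactly 1, so the kernel behaves like a normalized similarity between songs.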
  • 33. Audio pipeline • Audio signal → PPK: features for MLR • CF similarity: supervision for MLR
  • 34. Evaluation: CAL10K • Last.fm collaborative filter [Celma, 2008] - 360K users, 186K artists • CAL10K songs [Tingle, Turnbull, & Kim, 2010] - 5.4K songs, 2K artists (after CF matching)
  • 35. Evaluation: CAL10K • Last.fm collaborative filter [Celma, 2008] - 360K users, 186K artists • CAL10K songs [Tingle, Turnbull, & Kim, 2010] - 5.4K songs, 2K artists (after CF matching) • Evaluation: - Split artists into train/val/test - Target rankings: top-10 most similar train artists
  • 36. Evaluation: comparison • Gaussian mixture models + KL divergence - 8-component, diagonal covariance GMM per song • Auto-tags: predict 149 semantic tags from audio [Turnbull, 2008] • [Our method] VQ+MLR: 1024 codewords • Expert tags: 1053 tags from Pandora [Tingle, et al., 2010]
  • 37. Similarity learning: results [Bar chart of AUC (axis from 0.65 to 0.95) for: GMM (KL), Auto-tags, Auto-tags + MLR, Audio VQ, Audio VQ + MLR, Expert tags (cos), Expert tags + MLR]
  • 38. Example playlists • The Ramones - Go Mental; Def Leppard - Promises; The Buzzcocks - Harmony In My Head; Los Lonely Boys - Roses; Wolfmother - Colossal; Judas Priest - Diamonds and Rust (live)
  • 39. Example playlists • The Ramones - Go Mental; Def Leppard - Promises; The Buzzcocks - Harmony In My Head; Los Lonely Boys - Roses; Wolfmother - Colossal; Judas Priest - Diamonds and Rust (live) • MLR: The Buzzcocks - Harmony In My Head; Mötley Crüe - Same Ol' Situation; The Offspring - Gotta Get Away; The Misfits - Skulls; AC/DC - Who Made Who (live)
  • 40. Example playlists • Fats Waller - Winter Weather; Dizzy Gillespie - She's Funny That Way; Enrique Morente - Solea; Chet Atkins - In the Mood; Rachmaninov - Piano Concerto #4; Eluvium - Radio Ballet
  • 41. Example playlists • Fats Waller - Winter Weather; Dizzy Gillespie - She's Funny That Way; Enrique Morente - Solea; Chet Atkins - In the Mood; Rachmaninov - Piano Concerto #4; Eluvium - Radio Ballet • Chet Atkins - In the Mood; Charlie Parker - What Is This Thing Called Love?; Bud Powell - Oblivion; Bob Wills & His Texas Playboys - Lyla Lou; Bob Wills & His Texas Playboys - Sittin' On Top of the World
  • 42. Scaling up: fast retrieval [M. & Lanckriet, 2011] • Audio similarity search for a million songs? • Idea: Index data with spatial trees • 100-NN search over 900K songs: - Brute force: 2.4s - 50% recall: 0.14s (17x speedup) - 20% recall: 0.02s (120x speedup)
  • 43. Similarity learning: summary • Collaborative filters provide user-centric music similarity • CF similarity can be approximated by audio features • Audio search can be done quickly at large-scale
  • 45. Playlist generation • Goal: generate a "good" song sequence - Music auto-pilot (given context) • Many existing algorithms, but no standard evaluation • What makes one algorithm better than another?
  • 46. Playlist evaluation 1: Human survey • Idea: generate playlists, ask for opinions • Impractical at large-scale: - Huge search space - User taste, expertise can be problematic - Slow, expensive • Does not facilitate rapid evaluation and optimization
  • 47. Playlist evaluation 2: Information retrieval • Idea: - Define "good" and "bad" playlists - Predict the next song, measure accuracy • But what makes a bad playlist? • Do users agree on good/bad?
  • 48. A generative approach [M. & Lanckriet, 2011b] • Playlist algorithm = distribution over playlists • Don't evaluate synthetic playlists • Do evaluate the likelihood of generating real playlists
  • 49. The playlist collection: AOTM-2011 • Art of the Mix - 13 years of playlists - ~210K playlist segments - ~100K songs from MSD • Top 25 playlist categories: - Genre: Punk, Hip-hop, Reggae... - Context: Road trip, Break-up, Sleep... - Other: Mixed genre, Alternating DJ...
  • 50. A simple playlist model 1. Start with a set of songs
  • 51. A simple playlist model 2. Select a subset (e.g., jazz songs)
  • 52. A simple playlist model 3. Select a song
  • 53. A simple playlist model 4. Select a new subset
  • 55. A simple playlist model 5. Select a new song
  • 56. A simple playlist model 6. Repeat...
  • 58. Connecting the dots... • Random walk on a hypergraph - Vertices = songs - Edges = subsets • Edges derived from: - Audio clusters, tags, lyrics, era, popularity, CF - or combinations/intersections • Goal: optimize edge weights from example playlists
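
The random walk on a hypergraph can be sketched as below. This is one plausible reading of the model rather than the authors' exact formulation: the probability of moving to a song is taken to be proportional to the total weight of the edges (subsets) that contain both the current and the next song, and immediate repeats are excluded. The edge names, songs, and weights are hypothetical.

```python
def transition_probs(current, edges, weights):
    """P(next song | current song) for a weighted hypergraph random walk.

    edges:   dict mapping edge name -> set of songs in that subset
    weights: dict mapping edge name -> non-negative edge weight
    Assumption: score(next) = sum of weights of edges shared with `current`.
    """
    scores = {}
    for name, members in edges.items():
        if current not in members:
            continue
        for song in members:
            if song != current:  # assumption: no immediate repeats
                scores[song] = scores.get(song, 0.0) + weights[name]
    total = sum(scores.values())
    if total <= 0:
        return {}
    return {song: score / total for song, score in scores.items()}


# Hypothetical edges: one tag subset, one era subset, one unrelated subset.
edges = {
    "jazz": {"a", "b", "c"},
    "1950s": {"a", "b"},
    "punk": {"d", "e"},
}
weights = {"jazz": 1.0, "1950s": 3.0, "punk": 1.0}

print(transition_probs("a", edges, weights))
```

Song "b" shares two edges with "a" and so receives most of the probability mass, while songs that share no edge with "a" are unreachable in one step. Learning the edge weights from example playlists then amounts to choosing which subsets matter for transitions.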
  • 59. Playlist model • Exp. prior → edge weights → transitions → playlists
  • 60. Playlist generation: evaluation • Setup: - Split playlist collection into train/test - Learn edge weights on training playlists - Evaluate average likelihood of test playlists • Train per category, or all together • Compare against uniform shuffle baseline
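
The evaluation setup can be illustrated with a short sketch that scores held-out playlists under a model and under a uniform baseline. Several details are assumptions made for illustration: the first song is drawn uniformly, the baseline draws every song uniformly with replacement, the gain is reported relative to the magnitude of the baseline log-likelihood, and `toy_transition` is a hypothetical stand-in for a trained model.

```python
import math


def playlist_log_likelihood(playlist, transition, num_songs):
    """Log-likelihood: uniform first song, then model transition probabilities."""
    ll = -math.log(num_songs)
    for prev, nxt in zip(playlist, playlist[1:]):
        ll += math.log(transition(prev, nxt))
    return ll


def uniform_log_likelihood(playlist, num_songs):
    """Baseline: every song drawn uniformly at random (with replacement)."""
    return -len(playlist) * math.log(num_songs)


def relative_gain(test_playlists, transition, num_songs):
    """Log-likelihood gain over the uniform baseline, as a fraction of |baseline|."""
    model = sum(playlist_log_likelihood(p, transition, num_songs)
                for p in test_playlists)
    base = sum(uniform_log_likelihood(p, num_songs) for p in test_playlists)
    return (model - base) / abs(base)


# Hypothetical 4-song catalogue and a toy model that prefers same-genre moves.
GENRE = {"a": "jazz", "b": "jazz", "c": "punk", "d": "punk"}


def toy_transition(prev, nxt):
    """Valid distribution over 4 songs: 0.7 to the genre mate, 0.1 elsewhere."""
    if nxt != prev and GENRE[nxt] == GENRE[prev]:
        return 0.7
    return 0.1


test_playlists = [["a", "b", "a"], ["c", "d"]]
gain = relative_gain(test_playlists, toy_transition, num_songs=len(GENRE))
print(f"gain over uniform shuffle: {gain:.1%}")
```

A positive gain means the model assigns more probability to real playlists than shuffling does, which is the comparison behind the per-category results.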
  • 61. Random walk results [Bar chart: log-likelihood gain over random shuffle (axis from 0% to 25%), global model vs. category-specific models, for ALL and each category: Mixed, Theme, Rock-pop, Alternating DJ, Indie, Single artist, Romantic, Road trip, Punk, Depression, Break up, Narrative, Hip-hop, Sleep, Electronic, Dance-house, R&B, Country, Cover songs, Hardcore, Rock, Jazz, Folk, Reggae, Blues]
  • 62. Stationary model results [Bar chart: log-likelihood gain over random shuffle (axis from -15% to 20%), global model vs. category-specific models, for ALL and each category: Mixed, Theme, Rock-pop, Alternating DJ, Indie, Single artist, Romantic, Road trip, Punk, Depression, Break up, Narrative, Hip-hop, Sleep, Electronic, Dance-house, R&B, Country, Cover songs, Hardcore, Rock, Jazz, Folk, Reggae, Blues]
  • 63. Example playlists • Rhythm & Blues: [70s & soul] Lyn Collins - Think; [Audio #14 & funk] Isaac Hayes - No Name Bar; [DECADE 1965 & soul] Michael Jackson - My Girl • Electronic music: [Audio #11 & downtempo] Everything But The Girl - Blame; [DECADE 1990 & trip-hop] Massive Attack - Spying Glass; [Audio #11 & electronica] Björk - Hunter
  • 64. Playlist generation summary • Generative approach simplifies evaluation • AOTM-2011 collection facilitates learning and evaluation • Robust, efficient and transparent feature integration
  • 66. Directions for future work • Audio features: coding, dynamics and rhythm • Playlist models: mixtures, long-range interactions • UI models: interactive, context-aware, diversity
  • 67. Personalized recommendation [M., Bertin-Mahieux, Ellis, & Lanckriet, 2012] • The Million Song Dataset Challenge • Listening histories for 1.1M users, 380K songs • Task: personalized song recommendation
  • 68. Conclusion • MLR can optimize distance metrics for ranking, QBE retrieval • Audio similarity can approximate a collaborative filter • Generative playlist model integrates data, models dynamics • User-centric evaluation makes it all possible
  • 70.
  • 71. Metric partial order feature • Score is large when distances match ranking
  • 72. Playlist weights: 6390 edges [Chart: edge weights by feature type (Audio, CF, Era, Familiarity, Lyrics, Tags, Uniform) for ALL and each category: Mixed, Theme, Rock-pop, Alternating DJ, Indie, Single Artist, Romantic, RoadTrip, Punk, Depression, Break Up, Narrative, Hip-hop, Sleep, Electronic music, Dance-house, Rhythm and Blues, Country, Cover, Hardcore, Rock, Jazz, Folk, Reggae, Blues] • Audio & CF: k-means (16/64/256) • Lyrics: LDA (k=32, top-1/3/5) • Era: year, decade, decade+5 • Tags: Last.fm top-10 • Familiarity: high/med/low • Conjunctions