Presentation slides for the paper "Towards Seed-Free Music Playlist Generation: Enhancing Collaborative Filtering with Playlist Title Information", whose authors were accepted as a presenting team at the RecSys Challenge 2018.
Towards Seed-Free Music Playlist Generation: Enhancing Collaborative Filtering with Playlist Title Information
1. Towards Seed-Free Music Playlist Generation: Enhancing Collaborative Filtering with Playlist Title Information
Jaehun Kim, Minz Won, Cynthia C. S. Liem, Alan Hanjalic
3. First attempt : WRMF
● Good Old MF
○ Weighted Regularized Matrix Factorization [1]
■ Developed for implicit feedback
■ ALS* optimization : fast and reliable
■ Only 2~3 hyperparameters
*Alternating Least Squares (or Coordinate Descent)
R ≈ U × V
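The factorization on this slide can be sketched in a few lines of NumPy. This is a minimal, dense illustration of WRMF with ALS in the spirit of [1], not the authors' implementation: the function name `wrmf_als`, the confidence form `1 + alpha * R`, and all default values are assumptions made for the example. Here `V` is stored with one row per track, so the slide's R ≈ U × V corresponds to `U @ V.T`.

```python
import numpy as np


def wrmf_als(R, n_factors=8, alpha=10.0, reg=0.1, n_iters=10, seed=0):
    """Sketch of WRMF: factorize a playlist-by-track matrix R as U @ V.T.

    Hypothetical helper for illustration. The three hyperparameters are
    n_factors, alpha (confidence weight) and reg (L2 regularization).
    """
    R = np.asarray(R, dtype=float)
    rng = np.random.default_rng(seed)
    n_rows, n_cols = R.shape
    U = 0.01 * rng.standard_normal((n_rows, n_factors))
    V = 0.01 * rng.standard_normal((n_cols, n_factors))

    P = (R > 0).astype(float)   # binary preference: is the track in the playlist?
    C = 1.0 + alpha * R         # confidence: higher for observed entries
    reg_eye = reg * np.eye(n_factors)

    def solve_side(fixed, conf, pref):
        # For each row i, solve (F^T C_i F + reg * I) x = F^T C_i p_i,
        # the closed-form least-squares update with the other side fixed.
        out = np.empty((conf.shape[0], fixed.shape[1]))
        for i in range(conf.shape[0]):
            weighted = fixed * conf[i][:, None]   # rows of F scaled by confidence
            A = fixed.T @ weighted + reg_eye
            b = weighted.T @ pref[i]
            out[i] = np.linalg.solve(A, b)
        return out

    for _ in range(n_iters):
        U = solve_side(V, C, P)        # update playlist factors, tracks fixed
        V = solve_side(U, C.T, P.T)    # update track factors, playlists fixed
    return U, V


# Toy usage: two groups of playlists that share tracks within each group.
R_toy = np.array([[1, 1, 0, 0],
                  [1, 1, 0, 0],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1]], dtype=float)
U_toy, V_toy = wrmf_als(R_toy, n_factors=2, n_iters=20)
scores = U_toy @ V_toy.T   # predicted preference for every playlist-track pair
```

A playlist with no tracks has an all-zero row in R, so its factor vector carries no information; that is the no-seed cold-start case the following slides address.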
8. First attempt : WRMF
● Good Old MF
○ Already reasonable performance
○ Except No-Seed case => Cold Start Problem
● Any metadata or content for playlist?
○ Playlist titles!
11. Playlist Titles
● Text information (implicitly) represents the playlist
● Some key statistics
Statistic                            Value
# of titles (MPD + Challenge Set)    1,010,000
# of unique titles                   93,250
# of unique titles (stemmed)         49,808
Single word                          ~60%
Two words or fewer                   ~92%
(1) Playlist titles ≈ words
13. Playlist Titles
● Noisiness
Category                Examples
Special characters      //Pretty Little Liars//, ** some tunes, ?!?
Repeated characters     Yaaaas, summerrrr, partayyy
Shortened words         Chillin, Temp, favss
Abbreviated words       loml, jb, IDFK, jjjj
Symbolic expressions
Multiple languages      otoño, 電台收藏, アニメ
(2) Standard word-level approaches may not work
14. Playlist Titles
● Playlist titles ≈ words
● Standard word-level approaches may not work
● Character-level approach : Character N-gram
16. Character N-gram
Input Text            “Character N-gram”
Unigram (1-gram)      C, h, a, r, a, c, t, e, r, " ", N, -, g, r, a, m
Bigram (2-gram)       Ch, ha, ar, ra, ac, ct, te, er, "r ", " N", N-, -g, ...
Trigram (3-gram)      Cha, har, ara, rac, act, cte, ter, "er ", "r N", " N-", ...
Four-gram (4-gram)    Char, hara, arac, ract, acte, cter, "ter ", "er N", ...
...                   ...
[Figure: Bag-of-Character-N-gram. The title is represented as a vector with one 0/1 entry per n-gram in the vocabulary (e.g. Cha, har, cte, ter).]
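The extraction on this slide can be sketched as a sliding window over the raw title string. The helper names `char_ngrams` and `bag_of_char_ngrams` and the choice of n = 1, 2, 3 are assumptions for illustration; the slides do not state which values of n or which text normalization were used.

```python
from collections import Counter


def char_ngrams(text, n):
    """Return all contiguous character n-grams of text, spaces included."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def bag_of_char_ngrams(title, n_values=(1, 2, 3)):
    """Count every character n-gram of a title for the given values of n.

    Hypothetical helper: a Counter keeps occurrence counts; clipping the
    counts to 1 would give a binary presence vector instead.
    """
    bag = Counter()
    for n in n_values:
        bag.update(char_ngrams(title, n))
    return bag


trigrams = char_ngrams("Character N-gram", 3)
bag = bag_of_char_ngrams("summerrrr")
```

Because the window ignores word boundaries, noisy variants such as "summerrrr" and "summer" still share most of their n-grams, which is the motivation for working at the character level.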
17. Title-based RecSys
NGRAM: Similarity Based
● Build bag-of-n-grams for each playlist (Train + Test Set)
● For each testing playlist
○ Find the M closest playlists in the Train set using cosine distance
○ Collect tracks from the retrieved playlists
○ Recommend the L most popular tracks
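The steps on this slide can be sketched end to end with the standard library. This is an illustrative sketch under stated assumptions, not the authors' code: the names `recommend_from_title`, `ngram_bag` and `cosine_similarity`, the n-gram sizes, the toy data, and the tie-breaking rule are all hypothetical. Ranking by highest cosine similarity is equivalent to ranking by smallest cosine distance.

```python
import math
from collections import Counter


def ngram_bag(title, n_values=(1, 2, 3)):
    """Bag of character n-grams of a title (counts per n-gram)."""
    bag = Counter()
    for n in n_values:
        bag.update(title[i:i + n] for i in range(len(title) - n + 1))
    return bag


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend_from_title(test_title, train_playlists, m=2, l=3):
    """Recommend the l most popular tracks among the m closest train playlists.

    train_playlists is a list of (title, [track_id, ...]) pairs.
    """
    query = ngram_bag(test_title)
    ranked = sorted(
        train_playlists,
        key=lambda playlist: cosine_similarity(query, ngram_bag(playlist[0])),
        reverse=True,
    )
    popularity = Counter()
    for _, tracks in ranked[:m]:
        popularity.update(set(tracks))   # each playlist votes once per track
    # Sort by vote count, breaking ties by track id for determinism.
    ordered = sorted(popularity.items(), key=lambda kv: (-kv[1], kv[0]))
    return [track for track, _ in ordered[:l]]


train = [
    ("chill vibes", ["a", "b", "c"]),
    ("chillin", ["a", "b", "d"]),
    ("workout", ["x", "y"]),
    ("gym workout", ["x", "z"]),
]
recommended = recommend_from_title("chilll", train, m=2, l=2)
```

No seed tracks are needed for the test playlist, only its title, so this covers the no-seed case that plain matrix factorization cannot handle.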
18. Title-based RecSys
NGRAM: Similarity Based
[Figure: the Bag-of-Character-N-gram vectors of the testing playlists are compared with those of the training playlists by cosine distance; the L most popular tracks among the M closest training playlists are recommended.]
42. Take away
● MF is still powerful
● Setting up the right (internal) evaluation is more important than the model
● Software engineering DOES MATTER
○ Since the scalability DOES MATTER
○ Since hyper-parameter tuning DOES MATTER
● Deep learning is not a magic wand
○ No Free Lunch
○ It costs a LOT
● Content-based algorithms still give a small (but significant) gain to CF
44. References
[1] Hu, Yifan, Yehuda Koren, and Chris Volinsky. "Collaborative filtering for implicit feedback datasets." Eighth IEEE International Conference on Data Mining (ICDM '08). IEEE, 2008.
[2] Wang, Xinxi, and Ye Wang. "Improving content-based and hybrid music recommendation using deep learning." Proceedings of the 22nd ACM International Conference on Multimedia. ACM, 2014.
[3] Van den Oord, Aäron, Sander Dieleman, and Benjamin Schrauwen. "Deep content-based music recommendation." Advances in Neural Information Processing Systems. 2013.