5. Factorization Machine
• Algorithm for Recommendation
• Classification (Clustering)
• Regression
• Supervised Learning
• Needs Input/Output Data
• Suitable for Sparse Data
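As a rough, minimal sketch of the second-order FM prediction described above (and why it copes well with sparse one-hot inputs), assuming the standard formulation ŷ(x) = w0 + Σ wᵢxᵢ + Σ⟨vᵢ,vⱼ⟩xᵢxⱼ; the function and parameter names below are illustrative, not Hivemall's API:

    import numpy as np

    def fm_predict(x, w0, w, V):
        """Second-order FM prediction for one feature vector x.

        x  : (n,) feature vector, mostly zeros for one-hot identifiers
        w0 : global bias
        w  : (n,) linear weights
        V  : (n, k) latent factors; the pairwise weight of (i, j) is <V[i], V[j]>
        """
        linear = w0 + w @ x
        # O(n*k) form of the pairwise term:
        # 0.5 * sum_f [ (sum_i V[i,f] x_i)^2 - sum_i V[i,f]^2 x_i^2 ]
        s = V.T @ x
        s_sq = (V ** 2).T @ (x ** 2)
        return linear + 0.5 * np.sum(s ** 2 - s_sq)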
9. INPUT Details
• Identifier
- User Identifier: [0, 0, …, 0, 1, 0, …, 0]
- Movie Identifier: [0, 0, …, 0, 0, 1, 0, …, 0]
• Designed Features
- Ratings of Other Movies
- Time
- Last Movie Rated
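A minimal sketch of how one rating event could be encoded as a single sparse input row, concatenating the one-hot user and movie identifiers with the designed features; the index layout, sizes, and helper name are assumptions for illustration, not the actual data format:

    import numpy as np

    N_USERS, N_MOVIES = 1000, 500   # assumed sizes

    def encode_row(user_id, movie_id, other_ratings, time_norm, last_movie_id):
        """other_ratings: {movie_id: rating} for the user's other rated movies."""
        x = np.zeros(N_USERS + 3 * N_MOVIES + 1)
        x[user_id] = 1.0                                  # one-hot user identifier
        x[N_USERS + movie_id] = 1.0                       # one-hot movie identifier
        off = N_USERS + N_MOVIES
        for m, r in other_ratings.items():                # ratings of other movies
            x[off + m] = r / len(other_ratings)
        x[off + N_MOVIES] = time_norm                     # time (normalized)
        x[off + N_MOVIES + 1 + last_movie_id] = 1.0       # last movie rated (one-hot)
        return x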
24. Other Use Case
• E-Commerce User-Item Recommendation
• Input Data
• Age
• Purchase timezone
• Past bought items
• Cluster ID
• Target Data
• Evaluation of an Item by a User
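For concreteness, one (input, target) record in this setting might look like the sketch below; every field name and value here is hypothetical:

    # One supervised training record: user/item/context features as input,
    # the user's evaluation of the item as the target.
    sample = {
        "input": {
            "user_age": 34,
            "purchase_timezone": "JST",
            "past_bought_items": ["item_102", "item_881"],
            "cluster_id": 7,
            "item_id": "item_377",
        },
        "target": 4.0,   # the user's evaluation (e.g. rating) of the item
    }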
26. Latent Dirichlet Allocation
• Most Popular Topic Model Algorithm
• Mostly applied to text data
• Finds hidden structure in data
• Unsupervised Learning
• Needs Input Data only
• Generative Model
27. Latent Dirichlet Allocation
• Generative Modelling in LDA
• Mimics how a document is generated:
• 1. Choose what to write about (topics)
• 2. Choose words from those topics
• 3. Write
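A minimal NumPy sketch of this generative story, using the usual LDA symbols (alpha for the per-document topic prior, beta for per-topic word distributions); the toy sizes and seed are arbitrary:

    import numpy as np

    rng = np.random.default_rng(0)
    K, V, doc_len = 3, 6, 20                         # topics, vocabulary size, words per doc
    alpha = np.full(K, 0.1)                          # prior over topic mixtures
    beta = rng.dirichlet(np.full(V, 0.1), size=K)    # word distribution of each topic

    theta = rng.dirichlet(alpha)                     # 1. choose what to write about
    doc = []
    for _ in range(doc_len):
        z = rng.choice(K, p=theta)                   # 2. choose a topic for this word
        w = rng.choice(V, p=beta[z])                 #    ...then a word from that topic
        doc.append(w)                                # 3. write the word
    print(doc)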
36. Mini-batch Online LDA
• Faster than Batch Algorithm
• Less noise than pure Online LDA
[Diagram: Pure Online, Mini-batch Online, and Batch arranged along a Batch Size axis]
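A rough sketch of the mini-batch lambda update, in the spirit of Hoffman et al.'s online variational Bayes (which the method on these slides appears to follow); the e_step callable, argument names, and learning-rate schedule here are assumptions:

    import numpy as np

    def minibatch_update(lam, minibatch, D, eta, e_step, t, tau0=1.0, kappa=0.7):
        """One mini-batch step on the topic-word parameters lambda.

        lam       : (K, V) current variational parameters
        minibatch : list of documents in this mini-batch
        D         : total number of documents in the corpus
        eta       : symmetric prior on topic-word distributions
        e_step    : callable returning (K, V) expected word counts for the mini-batch
        t         : update counter, drives the decaying learning rate
        """
        rho = (tau0 + t) ** (-kappa)                   # learning rate, decays over time
        sstats = e_step(lam, minibatch)                # E-step on this mini-batch only
        lam_hat = eta + (D / len(minibatch)) * sstats  # as if the batch were the corpus
        # A larger batch size gives a less noisy lam_hat, hence the smoother updates
        return (1.0 - rho) * lam + rho * lam_hat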
37. Implemented Models
• Mini-Batch Map Model
• For unknown data
• Doesn't assume a Vocabulary List
• Mini-Batch Array Model (other implementation)
• For known data
• Assumes a Vocabulary List
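A toy sketch of the difference between the two storage strategies for lambda; the class and field names are illustrative, not the actual Hivemall implementation:

    from collections import defaultdict

    class ArrayModel:
        """Vocabulary list known up front: lambda is one dense row per topic."""
        def __init__(self, vocab, n_topics, init=0.5):
            self.word_index = {w: i for i, w in enumerate(vocab)}
            self.lam = [[init] * len(vocab) for _ in range(n_topics)]

    class MapModel:
        """No vocabulary list assumed: a lambda entry appears when a word is first fetched."""
        def __init__(self, n_topics, init=0.5):
            self.lam = [defaultdict(lambda: init) for _ in range(n_topics)]

        def get(self, topic, word):
            return self.lam[topic][word]   # unseen words get the initial value lazily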
40. Faced Implementation Problem
• Meaningless words
• LDA clusters words by co-occurrence
• "a", "the", "I", "he", "is", "in", "on"
• Stop Words: ignore them
• TF-IDF: how important a word is to a document in a collection or dataset
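As a plain-Python illustration of the TF-IDF quantity (not the Hivemall query referenced on the next slide; the particular weighting variant is one common choice among several):

    import math
    from collections import Counter

    def tf_idf(docs):
        """docs: list of token lists. Returns one {word: tf-idf} map per document."""
        n_docs = len(docs)
        df = Counter(w for doc in docs for w in set(doc))    # document frequency
        out = []
        for doc in docs:
            tf = Counter(doc)
            out.append({w: (tf[w] / len(doc)) * math.log(n_docs / df[w]) for w in tf})
        return out

    docs = [["the", "cat", "sat"], ["the", "dog", "ran"], ["cat", "and", "dog"]]
    print(tf_idf(docs))   # common tokens like "the" get a lower weight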
41. Faced Implementation Problem
• TF-IDF
• Can be calculated by Hivemall
• Input Data: (DocId, Words)
• https://github.com/myui/hivemall/wiki/TFIDF-calculation
43. Faced Implementation Problem
• Vocabulary List Model
• Initializes lambda for all words at first
• If a word does not appear in the doc, its lambda decreases at the same rate
• No initialization problem
44. Faced Implementation Problem
• Online Map Model
• Initializes lambda when a new word is fetched
• Final lambda depends on when the word first appeared
• Initialization problem
45. Faced Implementation Problem
• Prepared Dummy Lambdas
• Initialize dummy lambdas at first
• Apply the lambda update rule to the dummy lambdas
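A toy sketch of the dummy-lambda workaround described on slides 43-45: dummy values receive every update from step 0, so a word fetched for the first time later can presumably start from a value consistent with the vocabulary-list model; the class and names are illustrative assumptions, not the actual code:

    class DummyLambdaPool:
        """Dummy lambda values that receive every update from step 0.

        A word fetched for the first time is initialized from a dummy value
        instead of a fresh constant, so its final lambda no longer depends on
        when the word happened to appear first.
        """
        def __init__(self, init=0.5, eta=0.01):
            self.value = init
            self.eta = eta

        def update(self, rho):
            # Same decay an absent word gets in the vocabulary-list model:
            # zero counts, so lambda shrinks toward the prior eta at rate rho.
            self.value = (1.0 - rho) * self.value + rho * self.eta

        def init_new_word(self):
            return self.value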
46. Faced Implementation Problem
• Implicit Φ Normalization
• Not written explicitly
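For context, the normalization in question is the usual per-token constraint that the topic responsibilities Φ sum to one over topics; a minimal sketch with assumed variable names:

    import numpy as np

    def normalize_phi(log_phi):
        """log_phi: (K,) unnormalized log topic responsibilities of one word token.

        Returns phi with phi.sum() == 1, computed stably in log space.
        """
        log_phi = log_phi - log_phi.max()   # guard against overflow in exp
        phi = np.exp(log_phi)
        return phi / phi.sum()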
55. Impression about Internship
• Machine Learning
• Implementing an ML algorithm from scratch was fun
• Contributing to OSS was a precious experience for me