4. Why Build a RecSys?
• College students may not know what they want; must show options
• Promote customer jobs
• Ongoing engagement with content (blog, guide) recs
12. Leverage the Profile
• Structured & Unstructured Data
• Natural Language Processing
• Learning to Rank
• Domain Knowledge & Feature Engineering
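13. Architecture
• User & Front End: “Hey, show me jobs!”
• Main App (with its DB): “That’s hard! But I know who you are!”
• Microservice (with its DB, fed by Offline Machine Learning): “Got you. Feature Engineering your Profile…”
Data passed between components:
• Front End → Main App: User ID, Params
• Main App DB → Main App: User Details, Listing Details
• Main App → Microservice: Profile, Interaction History
• Microservice → Main App: Listing IDs
• Main App → Front End: Ranked Listings & Details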
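The request flow on the architecture slide can be sketched from the main app's side. This is a minimal illustration, not the speaker's implementation: the in-memory `USERS` and `LISTINGS` dictionaries, the `recommender_microservice` stub, and the `limit` parameter are all hypothetical, and filtering out already-seen listings is just one plausible use of the interaction history.

```python
# In-memory stand-ins for the main app's database; all records are invented.
USERS = {42: {"profile": {"major": "Math", "college": "Yale"}, "history": [12]}}
LISTINGS = {
    4: {"title": "Risk Manager"},
    12: {"title": "Graphic Designer"},
    17: {"title": "Visual Brand Lead"},
}


def recommender_microservice(profile, history, params):
    """Stub for the remote call: returns listing IDs only, best first."""
    candidates = [17, 4, 12]  # fixed ranking standing in for real model output
    # Assumption: interaction history is used to drop listings already seen.
    unseen = [lid for lid in candidates if lid not in history]
    return unseen[: params.get("limit", 10)]


def handle_request(user_id, params):
    """Main app: look up the user, ask for ranked IDs, attach listing details."""
    user = USERS[user_id]  # User Details from the main app's DB
    ids = recommender_microservice(user["profile"], user["history"], params)
    return [{"id": lid, **LISTINGS[lid]} for lid in ids]  # add Listing Details


response = handle_request(42, {"limit": 2})
print(response)
```

The microservice never sees listing titles or user IDs here; it receives a profile plus history and answers with IDs, which the main app then joins with its own data.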
14. What do you mean by… Similar?
• Graphic Designer: “Lehman Brothers is the leading firm in highly leveraged mortgages! We have a ping pong table! You’re a great artist.”
• Risk Manager: “Lehman Brothers is the leading firm in highly leveraged mortgages! We have a ping pong table! You’re OK at math.”
• Visual Brand Lead: “Can you draw? Dunder Mifflin seeks a talented person to help bring our office paper business to the next level. And you’ll be on television!”
Meetup, next week!
15. How to Build a Multi-Factor, Profile-Based, Cold-Start Content Recommendation System
24. Divide & Conquer
1. Popular: nightly update, absolute or relative?
2. Relevant to Career Status: needs content tagging/taxonomy
3. Relevant to Major (Category): needs content tagging
4. Recent: e.g., “10 great internships you can apply to now!”
5. Collaborative: people with profiles like yours read content with tags like this
6. Sponsored: why wouldn’t we…?
7. Random!
[Diagram: profile {major: Math, grad_date: 2018/05/15, college: Yale, skills: video games} → Ranker #1 → ranked content IDs (17, 4, 12, 97, 11, 3) → Aggregator → final ranking (4, 12, 7, 17, 2, 3)]
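The divide-and-conquer idea above, where each ranker looks at one factor and returns its own ordered list of content IDs, might look roughly like this. It is a sketch under stated assumptions: the catalog, its `views`, `published`, and `majors` fields, and the three ranker functions are invented for illustration and cover only three of the seven rankers listed.

```python
from datetime import date

# Hypothetical content catalog; IDs, tags, and counts are made up.
CONTENT = [
    {"id": 17, "views": 900, "published": date(2017, 9, 1), "majors": {"Math"}},
    {"id": 4, "views": 750, "published": date(2017, 10, 5), "majors": {"Art"}},
    {"id": 12, "views": 300, "published": date(2017, 10, 20), "majors": {"Math", "Physics"}},
    {"id": 97, "views": 120, "published": date(2017, 8, 15), "majors": set()},
]


def rank_popular(profile, content):
    """General ranker: ignores the profile and orders by view count."""
    return [c["id"] for c in sorted(content, key=lambda c: -c["views"])]


def rank_recent(profile, content):
    """General ranker: newest content first."""
    return [c["id"] for c in sorted(content, key=lambda c: c["published"], reverse=True)]


def rank_major(profile, content):
    """Personalized ranker: items tagged with the user's major come first.

    sorted() is stable, so ties keep their catalog order.
    """
    return [c["id"] for c in sorted(content, key=lambda c: profile["major"] not in c["majors"])]


RANKERS = {"popular": rank_popular, "recent": rank_recent, "major": rank_major}

profile = {"major": "Math", "grad_date": "2018/05/15", "college": "Yale", "skills": ["video games"]}

# Every ranker sees the same profile and catalog and returns its own list.
ranked_lists = {name: ranker(profile, CONTENT) for name, ranker in RANKERS.items()}
for name, ids in ranked_lists.items():
    print(name, ids)
```

Because each ranker shares one signature, adding a sponsored or random ranker is a matter of registering another function; the aggregator only ever sees lists of IDs.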
26. The More the Better
• Sum Weighted Log Rank (not Score)
• Tune with A/B tests (or reinforcement learning)
• Plausible “why” could be exposed to user
• Mix of general and personalized rankers

id  recent  major  log rec (*1.0)  log maj (*1.5)  tot  rank  why
1   1       4      0               1.4             2.1  3     rec
2   2       2      0.7             0.7             1.7  2     maj
3   4       3      1.4             1.1             3.0  4     maj
4   3       1      1.1             0               1.1  1     maj
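The table's arithmetic can be checked with a short sketch of a weighted log-rank sum. The natural log and the weights of 1.0 for the recent ranker and 1.5 for the major ranker reproduce the tot column. The function names are hypothetical, and the tie-break in `why` (best position wins, ties go to the heavier ranker) is an assumption that happens to match the table rather than a rule stated on the slide.

```python
import math

# Position of each content ID in each ranker's list (1 = best), from the table.
ranks = {
    1: {"recent": 1, "major": 4},
    2: {"recent": 2, "major": 2},
    3: {"recent": 4, "major": 3},
    4: {"recent": 3, "major": 1},
}
weights = {"recent": 1.0, "major": 1.5}


def aggregate(ranks, weights):
    """Sum weight * ln(position) per item; a lower total is better."""
    totals = {
        item: sum(weights[r] * math.log(pos) for r, pos in per_ranker.items())
        for item, per_ranker in ranks.items()
    }
    order = sorted(totals, key=totals.get)
    return totals, order


def why(per_ranker, weights):
    """Assumed rule: credit the ranker where the item placed best,
    breaking ties in favor of the ranker with the larger weight."""
    return min(per_ranker, key=lambda r: (per_ranker[r], -weights[r]))


totals, order = aggregate(ranks, weights)
for item in order:
    print(item, round(totals[item], 1), why(ranks[item], weights))
# Final order of IDs: [4, 2, 1, 3]
```

Working from positions rather than raw scores means the rankers never need a common score scale, and the log keeps differences deep in a list from outweighing differences near the top.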
30. Separation of Concerns
Main App
• Built by software engineers, not data scientists
• Knows about user immediately
• Sends JSON profile with no feature engineering
Recommender microservice
• Knows about content, not users
• Updated nightly with new content & statistics
• Parses, engineers features, ranks
• Returns ranked IDs
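A minimal sketch of the microservice side of this split: it accepts the raw JSON profile, does its own parsing and feature engineering, and returns only ranked IDs. Everything named here is an assumption for illustration: the `CONTENT_INDEX` tags, the student/graduate career-status rule, and the simple tag-match count that stands in for the full set of rankers.

```python
import json
from datetime import date, datetime

# Hypothetical content index the service would refresh nightly; tags are invented.
CONTENT_INDEX = {
    17: {"majors": {"math"}, "career_status": {"student"}},
    4: {"majors": {"art"}, "career_status": {"student", "graduate"}},
    12: {"majors": {"math"}, "career_status": {"graduate"}},
}


def engineer_features(profile, today):
    """Turn the raw profile into features; the main app never does this."""
    grad = datetime.strptime(profile["grad_date"], "%Y/%m/%d").date()
    return {
        "major": profile.get("major", "").strip().lower(),
        # Assumed rule: anyone graduating after today counts as a student.
        "career_status": "student" if grad > today else "graduate",
    }


def recommend(profile_json, today):
    """Parse, engineer features, rank, and return content IDs only."""
    features = engineer_features(json.loads(profile_json), today)

    def score(tags):
        # Count matching tags; booleans add as 0 or 1.
        return (features["major"] in tags["majors"]) + (
            features["career_status"] in tags["career_status"]
        )

    # Stable sort: equal scores keep index order.
    return sorted(CONTENT_INDEX, key=lambda cid: -score(CONTENT_INDEX[cid]))


raw = json.dumps(
    {"major": "Math", "grad_date": "2018/05/15", "college": "Yale", "skills": ["video games"]}
)
ranked_ids = recommend(raw, today=date(2017, 11, 1))
print(ranked_ids)
```

Passing `today` in explicitly keeps the function deterministic, which helps when replaying stored requests in an offline debugging tool.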
35. Metrics & Tuning
• Need to store: User X was recommended Content A, B, C on Page Y, then read B
• Metrics & A/B tests: Click-through Rate (did they like the suggestions?), Mean Reciprocal Rank (did they like the top items?)
• Avoid hurting top KPIs!
• Offline debugging tool is very handy
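Both metrics fall out of the stored record described above. The sketch below assumes a hypothetical log format with one row per impression and at most one read item per row, and it uses a per-impression definition of click-through rate; other definitions (for example per recommended item) are equally common.

```python
# Hypothetical impression log: what was shown, where, and what was read (if anything).
impressions = [
    {"user": "X", "page": "Y", "recommended": ["A", "B", "C"], "read": "B"},
    {"user": "U2", "page": "Y", "recommended": ["C", "A", "B"], "read": None},
    {"user": "U3", "page": "Z", "recommended": ["B", "A", "C"], "read": "B"},
]


def click_through_rate(log):
    """Share of impressions in which a recommended item was read."""
    clicks = sum(1 for row in log if row["read"] in row["recommended"])
    return clicks / len(log)


def mean_reciprocal_rank(log):
    """Average of 1 / position of the read item; impressions with no read add 0."""
    total = 0.0
    for row in log:
        if row["read"] in row["recommended"]:
            position = row["recommended"].index(row["read"]) + 1
            total += 1.0 / position
    return total / len(log)


print(round(click_through_rate(impressions), 3))  # 0.667
print(mean_reciprocal_rank(impressions))  # 0.5
```

Two of three impressions led to a read, so the click-through rate is 2/3. The reads sat at positions 2 and 1, so the mean reciprocal rank is (1/2 + 0 + 1) / 3 = 0.5.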
42. Pros & Cons
• Incredibly fast to prototype offline; fairly fast to build in production
• Amenable to explanations
• Easy to extend once history is available (matrix factorization or learning-to-rank subrankers)
• Easy to incorporate business priorities
• Works with new users and new-ish content
• Doesn’t work with a very large number of items; requires tuning