Cold-Start Recommendations to Users With Rich Profiles
Harlan D. Harris, PhD
Director of Data Science at WayUp
September 2018
RecSys NYC Meetup
After This Meetup!
• Go to The Storehouse!
• Meet other RecSys peeps!
Why Build a RecSys?
• College students may not know what they want, so we must show them options
• Promote customer jobs
• Ongoing engagement through content (blog, guide) recommendations
RecSys UX Categories
[Figure: recommender UX categories labeled "Who You Are", "What You've Done", "Feed-Like", and "Catalog-Like", with "(Feed)" marked.]
The Problem with Collaborative Filters…
Collaborative filters learn from interaction history, which a brand-new user does not have yet.
Leverage the Profile
• Structured & Unstructured Data
• Natural Language Processing
• Learning to Rank
• Domain Knowledge & Feature Engineering
Architecture
[Diagram of the system's components:]
• User & Front End: "Hey, show me jobs!"
• Main App: "That's hard! But I know who you are!"
• Microservice: "Got you. Feature Engineering your Profile…"
• Two databases (DB) and Offline Machine Learning
• Data passed between components: User ID, Params; Profile, Interaction History; User Details; Listing IDs; Listing Details; Ranked Listings & Details
What do you mean by… Similar?
• Graphic Designer: "Lehman Brothers is the leading firm in highly leveraged mortgages! We have a ping pong table! You're a great artist."
• Risk Manager: "Lehman Brothers is the leading firm in highly leveraged mortgages! We have a ping pong table! You're OK at math."
• Visual Brand Lead: "Can you draw? Dunder Mifflin seeks a talented person to help bring our office paper business to the next level. And you'll be on television!"
Meetup, next week!
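One reading of this slide is that "similar" depends on what you compare. The sketch below contrasts a bag-of-words cosine similarity on the listing text with a Jaccard overlap on role tags: by text, the two Lehman Brothers listings look nearly identical, while by role, the Graphic Designer is closer to the Visual Brand Lead. The tag sets are hypothetical, standing in for the content tagging the later slides call for; this is an illustration, not the system described in the talk.

```python
import math
import re
from collections import Counter

# Listing text from the slide (job title -> body).
LISTINGS = {
    "Graphic Designer": "Lehman Brothers is the leading firm in highly leveraged "
    "mortgages! We have a ping pong table! You're a great artist.",
    "Risk Manager": "Lehman Brothers is the leading firm in highly leveraged "
    "mortgages! We have a ping pong table! You're OK at math.",
    "Visual Brand Lead": "Can you draw? Dunder Mifflin seeks a talented person to "
    "help bring our office paper business to the next level. And you'll be on television!",
}

# Hypothetical role tags a content taxonomy might assign (not from the talk).
TAGS = {
    "Graphic Designer": {"design", "visual-arts"},
    "Risk Manager": {"finance", "quantitative"},
    "Visual Brand Lead": {"design", "visual-arts", "marketing"},
}


def bag_of_words(text: str) -> Counter:
    """Lowercased word counts; apostrophes stay inside words like "you're"."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def jaccard(a: set, b: set) -> float:
    """Size of the intersection over size of the union."""
    return len(a & b) / len(a | b) if a | b else 0.0


def text_similarity(x: str, y: str) -> float:
    return cosine(bag_of_words(LISTINGS[x]), bag_of_words(LISTINGS[y]))


def tag_similarity(x: str, y: str) -> float:
    return jaccard(TAGS[x], TAGS[y])


if __name__ == "__main__":
    for other in ("Risk Manager", "Visual Brand Lead"):
        print(f"Graphic Designer vs {other}: "
              f"text={text_similarity('Graphic Designer', other):.2f}, "
              f"tags={tag_similarity('Graphic Designer', other):.2f}")
```

Shared company boilerplate dominates the text score, which is one argument for ranking on tags and profile features rather than raw listing text.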
How to Build a Multi-Factor, Profile-Based, Cold-Start Content Recommendation System
Divide & Conquer
[Figure: a user profile such as {major: Math, grad_date: 2018/05/15, college: Yale, skills: video games} is fed to several rankers (Ranker #1, …), each returning its own ranked list of content IDs; an Aggregator merges the lists into one ranking.]
1. Popular: nightly update; absolute or relative?
2. Relevant to Career Status: needs content tagging/taxonomy
3. Relevant to Major (Category): needs content tagging
4. Recent: e.g., "10 great internships you can apply to now!"
5. Collaborative: people with profiles like yours read content with tags like this
6. Sponsored: why wouldn't we…?
7. Random!
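Each sub-ranker can be a small function with one shared signature: take the raw profile and the catalog, return every content ID best first. A minimal sketch of several of the rankers listed above, assuming a hypothetical `Content` record with taxonomy tags, a publish date, and a sponsored flag. The read counts and the profile-attribute/tag statistics would come from a nightly job; all names here are illustrative, and Career Status is omitted because it would follow the Major pattern with a status tag.

```python
import random
from dataclasses import dataclass
from datetime import date
from typing import Callable, Optional


@dataclass
class Content:
    id: int
    tags: set[str]          # hypothetical taxonomy tags, e.g. {"math", "finance"}
    published: date
    sponsored: bool = False


# Every sub-ranker takes the raw profile plus the catalog and returns ALL
# content IDs, best first, so an aggregator can combine them by rank.
Ranker = Callable[[dict, list[Content]], list[int]]


def make_popular(reads: dict[int, int]) -> Ranker:
    """1. Popular: `reads` is a hypothetical nightly count of reads per item."""
    def rank(profile: dict, catalog: list[Content]) -> list[int]:
        return [c.id for c in sorted(catalog, key=lambda c: reads.get(c.id, 0), reverse=True)]
    return rank


def relevant_to_major(profile: dict, catalog: list[Content]) -> list[int]:
    """3. Relevant to Major: items tagged with the user's major come first."""
    major = str(profile.get("major", "")).lower()
    # False sorts before True, and sorted() is stable, so ties keep catalog order.
    return [c.id for c in sorted(catalog, key=lambda c: major not in c.tags)]


def recent(profile: dict, catalog: list[Content]) -> list[int]:
    """4. Recent: newest content first; ignores the profile entirely."""
    return [c.id for c in sorted(catalog, key=lambda c: c.published, reverse=True)]


def make_collaborative(reads_by_attr_tag: dict[tuple[str, str], int]) -> Ranker:
    """5. Collaborative: reads of tag t by users whose profile has value v."""
    def rank(profile: dict, catalog: list[Content]) -> list[int]:
        attrs = [str(profile[k]).lower() for k in ("major", "college") if k in profile]

        def score(c: Content) -> int:
            return sum(reads_by_attr_tag.get((a, t), 0) for a in attrs for t in c.tags)

        return [c.id for c in sorted(catalog, key=score, reverse=True)]
    return rank


def sponsored_first(profile: dict, catalog: list[Content]) -> list[int]:
    """6. Sponsored: paid placements first, otherwise catalog order."""
    return [c.id for c in sorted(catalog, key=lambda c: not c.sponsored)]


def make_random(seed: Optional[int] = None) -> Ranker:
    """7. Random: a shuffled order, useful for exploration."""
    rng = random.Random(seed)

    def rank(profile: dict, catalog: list[Content]) -> list[int]:
        ids = [c.id for c in catalog]
        rng.shuffle(ids)
        return ids
    return rank
```

Keeping every ranker this narrow is what makes the approach fast to prototype: adding a ranker means adding one function, and none of them needs to know about the others.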
The More the Better
• Sum Weighted Log Rank (not Score)
• Tune with A/B tests (or reinforcement learning)
• A plausible "why" could be exposed to the user
• Mix of general and personalized rankers

id  recent rank  major rank  log rec  log maj  total  final rank  why
1   1            4           0        1.4      2.1    3           rec
2   2            2           0.7      0.7      1.7    2           maj
3   4            3           1.4      1.1      3.0    4           maj
4   3            1           1.1      0        1.1    1           maj

Weights: recent ×1.0, major ×1.5. total = 1.0 × log rec + 1.5 × log maj (natural log); the lowest total ranks first.
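Summing log ranks rather than raw scores means each sub-ranker only has to produce an ordering, and the log shrinks the differences far down a list. A minimal sketch, assuming every sub-ranker returns a best-first list of IDs; with weights 1.0 for recent and 1.5 for major it reproduces the totals, final ranks, and "why" column of the table above. The rule for "why" (the ranker that placed the item highest, ties going to the larger weight) and the rank given to items missing from a list are assumptions, chosen as one rule consistent with the table.

```python
import math


def aggregate(rankings: dict[str, list[int]], weights: dict[str, float]):
    """Combine sub-rankers by summing weighted natural-log ranks (lower is better).

    rankings: ranker name -> item IDs, best first (rank 1 = first element).
    weights:  ranker name -> weight; a larger weight gives that ranker more say.
    Returns (ids best-first, totals, why), where why[id] names the sub-ranker
    that ranked the item highest, ties going to the larger weight (assumption).
    """
    items = {i for ids in rankings.values() for i in ids}
    totals: dict[int, float] = {}
    why: dict[int, str] = {}
    for item in items:
        total, best = 0.0, None
        for name, ids in rankings.items():
            # Assumption: an item missing from a list ranks just past its end.
            rank = ids.index(item) + 1 if item in ids else len(ids) + 1
            total += weights[name] * math.log(rank)
            key = (rank, -weights[name])
            if best is None or key < best[0]:
                best = (key, name)
        totals[item] = total
        why[item] = best[1]
    # Lowest total first; break exact ties by ID so the output is deterministic.
    order = sorted(items, key=lambda i: (totals[i], i))
    return order, totals, why


if __name__ == "__main__":
    # Best-first lists that reproduce the slide's table.
    rankings = {"rec": [1, 2, 4, 3], "maj": [4, 2, 3, 1]}
    order, totals, why = aggregate(rankings, {"rec": 1.0, "maj": 1.5})
    print(order)  # [4, 2, 1, 3]
    print({i: round(t, 1) for i, t in totals.items()})
    print(why)
```

Because the weights are plain numbers, they are exactly the knobs the slide suggests tuning with A/B tests.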
Separation of Concerns
Main App
• Built by software engineers, not data scientists
• Knows about user immediately
• Sends JSON profile with no feature engineering
Recommender microservice
• Knows about content, not users
• Updated nightly with new content & statistics
• Parses, engineers features, ranks
• Returns ranked IDs
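The split keeps the main app simple: it forwards the profile as-is, and the microservice owns parsing, feature engineering, and ranking. A minimal sketch of that contract using only the standard library and no web framework, assuming the field names from the example profile on the Divide & Conquer slide. The function names, the `ranked_ids` response key, and the career-status buckets are hypothetical, and the tag-match scoring merely stands in for the sub-rankers and aggregator.

```python
import json
from datetime import date, datetime


def engineer_features(profile: dict, today: date) -> dict:
    """Derive ranker features from the raw profile the main app forwards.

    Assumes grad_date is formatted YYYY/MM/DD as on the slide; the
    career-status buckets are hypothetical.
    """
    grad = datetime.strptime(profile["grad_date"], "%Y/%m/%d").date()
    months_to_grad = (grad.year - today.year) * 12 + (grad.month - today.month)
    if months_to_grad > 12:
        status = "underclassman"
    elif months_to_grad >= 0:
        status = "graduating_senior"
    else:
        status = "recent_grad"
    return {
        "major": str(profile.get("major", "")).strip().lower(),
        "career_status": status,
        # Free-text skills split on commas: a stand-in for real NLP.
        "skills": [s.strip().lower() for s in str(profile.get("skills", "")).split(",") if s.strip()],
    }


def handle_request(body: str, catalog: dict[int, set[str]], today: date) -> str:
    """Microservice entry point: JSON profile in, JSON ranked content IDs out.

    `catalog` maps content ID -> tags; it stands in for the content and
    statistics the service reloads nightly and holds no user data.
    """
    features = engineer_features(json.loads(body), today)
    wanted = {features["major"], features["career_status"], *features["skills"]}
    # Placeholder scoring (number of matching tags, ties by ID); the real
    # service would run the sub-rankers and the aggregator here.
    ranked = sorted(catalog, key=lambda cid: (-len(catalog[cid] & wanted), cid))
    return json.dumps({"ranked_ids": ranked})


if __name__ == "__main__":
    request = json.dumps({"major": "Math", "grad_date": "2018/05/15",
                          "college": "Yale", "skills": "video games"})
    catalog = {1: {"design"}, 2: {"math", "recent_grad"}, 3: {"video games"}}
    print(handle_request(request, catalog, today=date(2018, 9, 20)))  # {"ranked_ids": [2, 3, 1]}
```

Passing `today` in explicitly, rather than reading the clock inside the service, keeps feature engineering deterministic and easy to replay in an offline debugging tool.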
Metrics & Tuning
• Need to store: User X was recommended Content A, B, C on Page Y, then read B
• Metrics & A/B tests: Click-through Rate (did they like the suggestions?), Mean Reciprocal Rank (did they like the top items?)
• Avoid hurting top KPIs!
• Offline debugging tool is very handy
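Both metrics fall out directly from the stored record "User X was recommended A, B, C on Page Y, then read B". A minimal sketch, assuming a hypothetical `Impression` record with at most one read per impression. Here click-through rate is the share of impressions that led to a read, and impressions with no read count as 0 in the reciprocal-rank average; some teams instead average only over impressions that were clicked, so treat that as a choice to make explicit.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Impression:
    """One logged event: a user saw items on a page, then maybe read one."""
    user_id: str
    page: str
    shown: list[str]            # content IDs in display order
    read: Optional[str] = None  # the item the user went on to read, if any


def click_through_rate(log: list[Impression]) -> float:
    """Share of impressions that led to a read."""
    return sum(imp.read is not None for imp in log) / len(log) if log else 0.0


def mean_reciprocal_rank(log: list[Impression]) -> float:
    """Mean of 1/position of the read item; impressions without a read count as 0."""
    def reciprocal_rank(imp: Impression) -> float:
        if imp.read is None or imp.read not in imp.shown:
            return 0.0
        return 1.0 / (imp.shown.index(imp.read) + 1)

    return sum(reciprocal_rank(imp) for imp in log) / len(log) if log else 0.0


if __name__ == "__main__":
    log = [
        Impression("X", "Y", ["A", "B", "C"], read="B"),  # the slide's example
        Impression("Z", "Y", ["D", "E", "F"]),            # shown, nothing read
    ]
    print(click_through_rate(log), mean_reciprocal_rank(log))  # 0.5 0.25
```

CTR only asks whether any suggestion was good; MRR also rewards putting that suggestion near the top, which is why the two are worth tracking together.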
Pros & Cons
• Incredibly fast to prototype offline; fairly fast to build in production
• Amenable to explanations
• Easy to extend once history is available (matrix-factorization or learning-to-rank subrankers)
• Easy to incorporate business priorities
• Works with new users and new-ish content
• Doesn't work with a very large number of items; requires tuning
Thank You!
Harlan Harris
harlan@wayup.com
@harlanh on Twitter, Medium, GitHub
http://harlan.harris.name
What Happens When?
Real Time
• Ranking
Nightly
• Update content
• Compute popularity
• Refit collaborative ranker
Periodically
• Tuning parameters
• Exploring new rankers
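The nightly "compute popularity" step could look like this. The talk leaves "absolute or relative?" open, so this sketch assumes "absolute" means raw read counts and "relative" means reads per impression, smoothed toward the site-wide rate with a hypothetical `prior_weight` so that rarely shown items are pulled toward the average instead of jumping to the top on a few lucky clicks. The function and parameter names are illustrative.

```python
from collections import Counter


def nightly_popularity(read_ids: list[int], shown_ids: list[int],
                       relative: bool = False, prior_weight: float = 10.0) -> list[int]:
    """Rebuild the Popular ranker's ordering from one window of logs.

    read_ids:  one entry per read event (content ID).
    shown_ids: one entry per time a content ID was shown.
    relative=False ranks by raw read counts; relative=True ranks by reads per
    impression, shrunk toward the site-wide rate using `prior_weight`
    pseudo-impressions (an assumed interpretation of "relative").
    """
    reads, shown = Counter(read_ids), Counter(shown_ids)
    items = set(reads) | set(shown)
    if not relative:
        score = {i: float(reads[i]) for i in items}
    else:
        base_rate = sum(reads.values()) / max(sum(shown.values()), 1)
        score = {i: (reads[i] + prior_weight * base_rate) / (shown[i] + prior_weight)
                 for i in items}
    # Highest score first; ties broken by ID for a stable nightly output.
    return sorted(items, key=lambda i: (-score[i], i))


if __name__ == "__main__":
    reads = [1] * 10 + [2] * 3 + [3]        # item 1 read 10 times, item 2 three, item 3 once
    shown = [1] * 100 + [2] * 5 + [3] * 50  # how often each item was shown
    print(nightly_popularity(reads, shown))                 # absolute
    print(nightly_popularity(reads, shown, relative=True))  # relative
```

The two modes can disagree: an item shown rarely but read often wins under the relative score even though the heavily exposed item has more reads in absolute terms.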
