Large-Scale Recommendation Systems Workshop
RecSys 2013, Hong Kong

Recommendation at Netflix Scale
Justin Basilico
Netflix Algorithm Engineering

October 13, 2013
1
Outline

Reintroduction to Netflix
Approach to Recommendation
Netflix Scale
Architecture

2
Reintroduction to Netflix
3
4
Change of focus

2006

2013

5
Approach to Recommendation
6
Goal

Help members find content to watch and enjoy
to maximize member satisfaction and retention

7
Everything is a Recommendation

Rows

Ranking

Over 75% of what people watch comes from our recommendations

8
Top 10: Our best guess
Personalization awareness
Diversity

[Figure: Top 10 row with each title labeled by the household member it targets: All, Dad, Dad&Mom, Daughter, All, All?, Daughter, Son, Mom, Mom]
9
But…

10
Genre Personalization
● Personalized genre rows focus on user interest
  ○ Also provide context and “evidence”
● How are they generated?
  ○ Implicit: based on user’s recent plays, ratings, & other interactions
  ○ Explicit taste preferences
  ○ Hybrid: combine the above
● Also take into account:
  ○ Freshness: has this been shown before?
  ○ Diversity: avoid repeating tags and genres, limit number of TV genres, etc.
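The freshness and diversity constraints above can be sketched as a greedy row selector. This is only an illustration under stated assumptions: the `GenreRow` type, the `interest` score, the freshness penalty, and the TV-row cap are all hypothetical names and values, not the method used in production.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenreRow:
    name: str
    tags: frozenset
    interest: float      # hypothetical score from implicit/explicit signals
    is_tv: bool = False


def select_rows(candidates, recently_shown, max_rows=3, max_tv=1,
                freshness_penalty=0.5):
    """Greedily pick rows by interest, penalizing stale rows and repeats."""
    def adjusted(row):
        # Freshness: rows shown recently lose part of their score.
        penalty = freshness_penalty if row.name in recently_shown else 0.0
        return row.interest - penalty

    chosen, used_tags, tv_count = [], set(), 0
    for row in sorted(candidates, key=adjusted, reverse=True):
        if len(chosen) == max_rows:
            break
        if row.tags & used_tags:
            continue  # Diversity: avoid repeating tags and genres.
        if row.is_tv and tv_count >= max_tv:
            continue  # Diversity: limit the number of TV genres.
        chosen.append(row)
        used_tags |= row.tags
        tv_count += int(row.is_tv)
    return chosen


candidates = [
    GenreRow("Dark Thrillers", frozenset({"thriller", "dark"}), 0.90),
    GenreRow("Crime Thrillers", frozenset({"thriller", "crime"}), 0.85),
    GenreRow("TV Comedies", frozenset({"comedy"}), 0.80, is_tv=True),
    GenreRow("TV Dramas", frozenset({"drama"}), 0.70, is_tv=True),
    GenreRow("Documentaries", frozenset({"documentary"}), 0.60),
]
chosen_names = [r.name for r in select_rows(candidates, {"Dark Thrillers"})]
print(chosen_names)
```

In this toy run the recently shown thriller row is pushed down by the freshness penalty, and the second TV row is skipped by the TV cap.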

11
Similars
● Displayed in many contexts
  ○ Video display page
  ○ In response to user actions (search, queue add, …)
  ○ “Because you watched” rows

12
Support for Recommendations

Social Support

13
EVERYTHING is a Recommendation

14
… EVERYTHING

15
Netflix Scale
16
Netflix Data
● > 37M members
● > 40 countries
● > 1000 device types
● Ratings: > 4M/day
● Searches: > 3M/day
● Plays: > 30M/day
● 1B hours in June 2012
● > 4B hours in Q1 2013
● Log 100B events/day
● 32.25% of peak US downstream traffic
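To get a feel for these volumes, the daily figures can be converted to average per-second rates. This is a back-of-the-envelope sketch: the slide gives lower bounds ("more than"), and real traffic peaks well above a daily average, so treat the results as rough floors.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Daily lower bounds taken from the slide.
daily_volumes = {
    "ratings": 4_000_000,
    "searches": 3_000_000,
    "plays": 30_000_000,
    "logged_events": 100_000_000_000,
}


def per_second(per_day):
    """Average rate per second for a given daily volume."""
    return per_day / SECONDS_PER_DAY


rates = {name: per_second(volume) for name, volume in daily_volumes.items()}
for name, rate in rates.items():
    print(f"{name}: about {rate:,.0f} per second on average")
```

On these numbers, event logging averages over a million events per second, several orders of magnitude above the rate of explicit ratings.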
17
Plays
● What people watch
● The most important source of data for our algorithms
● A few plays are usually more valuable than most of our other data
● We have a lot of information associated with a play:
  ○ Duration
  ○ Start/stop/pause/rewind
  ○ Device, location, time, …
  ○ Page context
  ○ …

18
Ratings
● Explicit information about a member’s taste should be great
● But we find ratings are…
  ○ Noisy
  ○ Sparse
  ○ Biased
● Quality of our ratings has decreased over time
19
Metadata
● Our tag space is made of thousands of different concepts
● Manually annotated by a set of experts
● Although an automatic approach may be possible, we believe it would be of lesser quality
  ○ However, we are researching automatic annotation of scenes, transitions…
● Metadata is useful
  ○ Especially for cold start
20
Social
● Can your “friends’” interests help us predict yours better?
● The answer is similar to the Metadata case:
  ○ If we know enough about you, social information becomes less useful
  ○ But it is very interesting for cold start
● Social support for recommendations has been shown to matter

21
Affordances
● Highly curated catalog
● Catalog changes daily
● Videos have long shelf-lives
● Videos take time to consume

22
Smart Models
● Logistic/linear regression
● Elastic nets
● SVD and other Matrix Factorizations
● Restricted Boltzmann Machines
● Deep Networks
● Factorization Machines
● Markov Chains
● Different clustering approaches
● Latent Dirichlet Allocation
● Gradient Boosted Decision Trees/Random Forests
● …
23
Offline/Online testing process

[Figure: process flow. Offline testing → [success] → Online A/B testing → [success] → Rollout feature to all users, with a [fail] branch. Timescale labels on the stages: "days" and "Weeks to months".]

24
System Architecture
25
Design Considerations

Recommendations
• Personal
• Accurate
• Novel
• Diverse
• Fresh

Systems
• Scalable
• Responsive
• Resilient
• Efficient
• Flexible

26
Technology Stack

http://techblog.netflix.com

27
Cloud Computing at Netflix
● Layered services
● Clusters: Horizontal scaling
● Auto-scale with demand
● Plan for failure
  ○ Replication
  ○ Fail fast
  ○ State is bad
● Simian Army: Induce failures to ensure resiliency
28
System Overview
● Blueprint for multiple personalization algorithm services
  ○ Ranking
  ○ Row selection
  ○ Ratings
  ○ …
● Recommendation involving multi-layered Machine Learning

[Figure: system architecture diagram with three layers labeled OFFLINE, NEARLINE, and ONLINE. Components: Netflix.Hermes, Query results, Offline Data, Model training, Offline Computation, Models, Netflix.Manhattan, Nearline Computation, User Event Queue, Event Distribution, Algorithm Service, Online Data Service, Online Computation, Machine Learning Algorithm, UI Client, Member. Flows between Member and UI Client: "Play, Rate, Browse..." and "Recommendations".]

29
Event & Data Distribution
● Collect actions
  ○ Plays, browsing, searches, ratings, etc.
● Events
  ○ Small units
  ○ Time sensitive
● Data
  ○ Dense information
  ○ Processed for further use
  ○ Saved

[Figure: system architecture diagram (OFFLINE, NEARLINE, and ONLINE layers), with detail of the event path: Member, UI Client, "Play, Rate, Browse...", User Event Queue, Event Distribution, Netflix.Manhattan, Netflix.Hermes]

30
Computation Layers
● Offline
  ○ Process data
● Nearline
  ○ Process events
● Online
  ○ Process requests

[Figure: system architecture diagram (OFFLINE, NEARLINE, and ONLINE layers), showing Offline Computation, Nearline Computation, and Online Computation]

31
Online Computation
● Synchronous computation in response to a member request
● Pros:
  ○ Access to most fresh data
  ○ Knowledge of full request context
  ○ Compute only what is necessary
● Cons:
  ○ Strict Service Level Agreements
    ● Must respond quickly … in all cases
    ● Requires high availability
  ○ Limited view of data
● Good for:
  ○ Simple algorithms
  ○ Model application
  ○ Business logic
  ○ Context-dependence
  ○ Interactivity

[Figure: system architecture diagram (OFFLINE, NEARLINE, and ONLINE layers), with detail of the online path: Member, UI Client (www.netflix.com), Algorithm Service, Online Computation, Online Data Service, Event Distribution]
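The "must respond quickly … in all cases" constraint can be illustrated with a time budget and a fallback. This is a minimal sketch, not the actual service code: the function names, the 100 ms budget, and the "popular titles" fallback are all assumptions made for the example.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

# Shared worker pool; the size is an arbitrary choice for this sketch.
_POOL = ThreadPoolExecutor(max_workers=4)


def recommend_within_budget(compute, request, fallback, budget_s=0.1):
    """Run compute(request), but never wait longer than budget_s seconds."""
    future = _POOL.submit(compute, request)
    try:
        return future.result(timeout=budget_s)
    except FutureTimeout:
        future.cancel()  # best effort; a call already running is not interrupted
        return list(fallback)
    except Exception:
        # Any failure in the ranker also degrades to the fallback.
        return list(fallback)


def fast_ranker(request):
    """Hypothetical ranker: sort candidates by a precomputed score."""
    return sorted(request["candidates"], key=request["scores"].get, reverse=True)


def slow_ranker(request):
    """Same ranking, but simulating a dependency that is too slow."""
    time.sleep(0.5)
    return fast_ranker(request)


request = {"candidates": ["A", "B", "C"],
           "scores": {"A": 0.2, "B": 0.9, "C": 0.5}}
popular = ["C", "A", "B"]  # hypothetical unpersonalized fallback

on_time = recommend_within_budget(fast_ranker, request, popular)
too_late = recommend_within_budget(slow_ranker, request, popular)
print(on_time, too_late)
```

The member always gets an answer within the budget; only its quality degrades when the online computation is slow or fails.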

32
Offline Computation
● Asynchronous computation done on a regular schedule
● Pros:
  ○ Can handle large data
  ○ Can do bulk processing
  ○ Relaxed time constraints
● Cons:
  ○ Cannot react quickly
  ○ Results can become stale
● Good for:
  ○ Batch learning
  ○ Model training
  ○ Complex algorithms
  ○ Precomputing

[Figure: system architecture diagram (OFFLINE, NEARLINE, and ONLINE layers), with detail of the offline path: Netflix.Hermes, Query results, Offline Data, Model training, Offline Computation, Models]
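A batch precomputation job of this kind might look like the sketch below, which builds "similars" lists from a play log in one bulk pass and checks whether a stored result has gone stale. Co-occurrence counting is a stand-in chosen for brevity, not the algorithm described in the talk, and the log format and function names are assumptions.

```python
from collections import Counter, defaultdict
from itertools import combinations


def batch_similars(play_log, top_n=2):
    """Bulk pass over (user_id, video_id) pairs -> top co-played videos."""
    videos_by_user = defaultdict(set)
    for user_id, video_id in play_log:
        videos_by_user[user_id].add(video_id)

    co_plays = defaultdict(Counter)
    for videos in videos_by_user.values():
        # Count every pair of videos played by the same member.
        for a, b in combinations(sorted(videos), 2):
            co_plays[a][b] += 1
            co_plays[b][a] += 1

    return {video: [other for other, _ in counts.most_common(top_n)]
            for video, counts in co_plays.items()}


def is_stale(computed_at, now, max_age_s):
    """Offline results age between scheduled runs; flag the old ones."""
    return now - computed_at > max_age_s


play_log = [("u1", "A"), ("u1", "B"),
            ("u2", "A"), ("u2", "B"),
            ("u3", "A"), ("u3", "C")]
similars = batch_similars(play_log)
print(similars)
print(is_stale(computed_at=0, now=90_000, max_age_s=86_400))
```

The job can afford to scan the full log because it has relaxed time constraints, at the cost of not reflecting plays that arrive after the run.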

33
Nearline Computation
● Asynchronous computation in response to a member event
● Pros:
  ○ Can keep data fresh
  ○ Can run moderate complexity algorithms
  ○ Can average computational cost across users
  ○ Change from actions
● Cons:
  ○ Has some delay
  ○ Done in event context
● Good for:
  ○ Incremental learning
  ○ User-oriented algorithms
  ○ Moderate complexity algorithms
  ○ Keeping precomputed results fresh

[Figure: system architecture diagram (OFFLINE, NEARLINE, and ONLINE layers), with detail of the nearline path: User Event Queue, Netflix.Manhattan, Nearline Computation]
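The nearline pattern, where a member event triggers an asynchronous refresh of that member's precomputed results, can be sketched as follows. The class, the event tuple layout, and the "top genres" cache are hypothetical, and a plain in-process deque stands in for a real event queue.

```python
from collections import Counter, defaultdict, deque


class NearlineUpdater:
    """Consumes member events and refreshes per-member cached results."""

    def __init__(self, top_n=2):
        self.top_n = top_n
        self.events = deque()                    # stand-in for an event queue
        self.genre_counts = defaultdict(Counter)
        self.cached_top_genres = {}              # precomputed results store

    def publish(self, user_id, event_type, genre):
        """Called when the member acts; returns immediately."""
        self.events.append((user_id, event_type, genre))

    def drain(self):
        """Process queued events, then refresh only the members touched."""
        touched = set()
        while self.events:
            user_id, event_type, genre = self.events.popleft()
            if event_type == "play":
                # Incremental update: no full recomputation needed.
                self.genre_counts[user_id][genre] += 1
                touched.add(user_id)
        for user_id in touched:
            ranked = self.genre_counts[user_id].most_common(self.top_n)
            self.cached_top_genres[user_id] = [genre for genre, _ in ranked]
        return touched


updater = NearlineUpdater()
updater.publish("u1", "play", "comedy")
updater.publish("u1", "play", "comedy")
updater.publish("u1", "play", "drama")
updater.publish("u2", "search", "horror")  # ignored by this handler
touched = updater.drain()
print(touched, updater.cached_top_genres)
```

The delay between `publish` and `drain` is the "has some delay" drawback: results are fresher than a scheduled batch but not immediate.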
34
Where to place components?
● Example: Matrix Factorization
  ○ Offline:
    ● Collect sample of play data
    ● Run batch learning algorithm to produce factorization
    ● Publish item factors
  ○ Nearline:
    ● Solve user factors
    ● Compute user-item products
    ● Combine
  ○ Online:
    ● Presentation-context filtering
    ● Serve recommendations

[Figure: system architecture diagram (OFFLINE, NEARLINE, and ONLINE layers) annotated with the factorization steps. Offline: model training on $X$ gives $X \approx UV^{T}$ and publishes $V$ as the model. Nearline: solve $A u_i = b$, then compute $s_{ij} = u_i v_j$. Online: keep items with $s_{ij} > t$.]
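The split across layers can be sketched end to end in a few lines. Several parts are assumptions, since the slide does not define them: $A$ and $b$ are taken to be the regularized normal equations $A = \sum_j v_j v_j^{T} + \lambda I$ and $b = \sum_j x_{ij} v_j$ over the items a member has played; the item factors are made-up values standing in for the output of the offline batch learner; the "Combine" step is omitted; and presentation-context filtering is reduced to hiding titles already played.

```python
def solve(A, b):
    """Solve the small dense system A x = b by Gauss-Jordan elimination."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def solve_user_factors(item_factors, plays, lam=0.1):
    """Nearline: solve A u_i = b for one member, with V held fixed."""
    k = len(next(iter(item_factors.values())))
    A = [[lam if r == c else 0.0 for c in range(k)] for r in range(k)]
    b = [0.0] * k
    for item, x in plays.items():
        v = item_factors[item]
        for r in range(k):
            b[r] += x * v[r]
            for c in range(k):
                A[r][c] += v[r] * v[c]
    return solve(A, b)


def nearline_scores(item_factors, user_factors):
    """Nearline: compute user-item products s_ij = u_i v_j."""
    return {item: sum(u * v for u, v in zip(user_factors, factors))
            for item, factors in item_factors.items()}


def online_serve(scores, already_played, threshold):
    """Online: keep s_ij > t, drop played titles, rank by score."""
    kept = [(s, item) for item, s in scores.items()
            if s > threshold and item not in already_played]
    return [item for s, item in sorted(kept, reverse=True)]


# Offline output (assumed): item factors V published by the batch learner.
V = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [1.0, 1.0], "D": [0.5, 0.0]}
plays = {"A": 1.0, "B": 0.2}  # hypothetical play strengths for one member

u = solve_user_factors(V, plays)
scores = nearline_scores(V, u)
served = online_serve(scores, already_played=set(plays), threshold=0.5)
print(u, served)
```

Only the cheap steps (a small linear solve, dot products, a threshold) run near or at request time; the expensive factorization of $X$ stays offline.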

35
Netflix Manhattan

Stan Lanning

● Event-based precomputation framework
● Supports both nearline and offline computation modes
● Customer-centric events and data

[Figure: Netflix Manhattan components, in order of appearance: Play Service, Rating Service, Event Queue, Event Handlers, Request Queue, …, Event Rules, Algorithm Managers, Algorithms, Cached User Data]
36
Signals & Models
● Similar pattern across layers
● Models
  ○ Parameter files
  ○ Trained offline
● Data
  ○ Previously processed and stored information
● Signals
  ○ Fresh data from live services
  ○ User-related or context-related

[Figure: system architecture diagram (OFFLINE, NEARLINE, and ONLINE layers), with Models, Offline Data, and Signals (Online Service) feeding the Machine Learning Algorithm in the Offline, Nearline, and Online Computation components]
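The three kinds of input can be made concrete with a tiny scoring function. This is a sketch under assumptions: the JSON layout, the feature names, and the use of a logistic model are illustrative choices, not the actual file format or model.

```python
import json
import math

# Model: a small parameter file, assumed to have been trained offline.
MODEL_JSON = """
{"bias": -1.0,
 "weights": {"genre_affinity": 2.0, "is_evening": 0.5, "on_tv": 0.5}}
"""


def load_model(text):
    """Parse the parameter file into a dict of bias and weights."""
    return json.loads(text)


def predict(model, data, signals):
    """Combine stored data and fresh signals into one feature vector."""
    features = {**data, **signals}
    z = model["bias"] + sum(weight * features.get(name, 0.0)
                            for name, weight in model["weights"].items())
    return 1.0 / (1.0 + math.exp(-z))  # logistic link


model = load_model(MODEL_JSON)
data = {"genre_affinity": 0.5}              # previously processed and stored
signals = {"is_evening": 1.0, "on_tv": 0.0}  # fresh, context-related
p = predict(model, data, signals)
print(round(p, 4))
```

The same function shape works in any layer; what changes is where the model, data, and signals come from and how fresh they are.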

37
Recommendation Results
● Precomputed results
  ○ Fetch from data store
  ○ Post-process in context
● Generated on the fly
  ○ Collect signals, apply model
● Combination
  ○ Dynamically choose
  ○ Fallbacks

[Figure: system architecture diagram (OFFLINE, NEARLINE, and ONLINE layers), with detail of the serving path: Online Data Service, Algorithm Service, Online Computation, UI Client, Recommendations, Member]
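One way to read the "combination" option is as a chain of sources tried in order. The sketch below assumes a dict-like precomputed store, a `generate` callable for on-the-fly results, and a static fallback list; all of these names are hypothetical, and for simplicity it applies the context post-processing to every source rather than only to precomputed results.

```python
def recommend(user_id, context, store, generate, fallback):
    """Return (source, videos), choosing the source dynamically."""
    precomputed = store.get(user_id)
    if precomputed is not None:
        # Precomputed results: fetch from data store.
        source, ranked = "precomputed", list(precomputed)
    else:
        try:
            # Generated on the fly: collect signals, apply model.
            source, ranked = "on_the_fly", list(generate(user_id, context))
        except Exception:
            # Fallback: something unpersonalized but always available.
            source, ranked = "fallback", list(fallback)
    # Post-process in context: hide titles already shown on this page.
    hidden = context.get("already_on_page", set())
    return source, [video for video in ranked if video not in hidden]


def generate(user_id, context):
    """Hypothetical on-the-fly ranker that fails for one member."""
    if user_id == "u3":
        raise RuntimeError("signal service unavailable")
    return ["D", "A"]


store = {"u1": ["A", "B", "C"]}
popular = ["P"]

r1 = recommend("u1", {"already_on_page": {"B"}}, store, generate, popular)
r2 = recommend("u2", {}, store, generate, popular)
r3 = recommend("u3", {}, store, generate, popular)
print(r1, r2, r3)
```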

38
Conclusions
39
Research Directions

Personalized learning to rank
Context awareness
Presentation effects
Social recommendation
Full-page optimization
Cold start

40
Take Aways
● Behind-the-scenes peek at a real-world, industrial-scale recommender system
● Recommendation is not just ratings
● Scaling is not only about batch, offline algorithms
● Use application domain advantages

41
We’re hiring

Thank You

Justin Basilico
42
@JustinBasilico

Videogame localization & technology_ how to enhance the power of translation.pdf
 
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
 
9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding Team9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding Team
 
UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6
 
VoIP Service and Marketing using Odoo and Asterisk PBX
VoIP Service and Marketing using Odoo and Asterisk PBXVoIP Service and Marketing using Odoo and Asterisk PBX
VoIP Service and Marketing using Odoo and Asterisk PBX
 
20150722 - AGV
20150722 - AGV20150722 - AGV
20150722 - AGV
 
How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?
 
Designing A Time bound resource download URL
Designing A Time bound resource download URLDesigning A Time bound resource download URL
Designing A Time bound resource download URL
 

Recommendation at Netflix Scale

  • 1. Large-Scale Recommendation Systems Workshop, RecSys 2013, Hong Kong. Recommendation at Netflix Scale. Justin Basilico, Netflix Algorithm Engineering. October 13, 2013.
  • 7. Goal: Help members find content to watch and enjoy, to maximize member satisfaction and retention.
  • 8. Everything is a Recommendation: rows and ranking. Over 75% of what people watch comes from our recommendations.
  • 9. Top 10: Our best guess. Personalization awareness and diversity. [Figure: Top 10 row annotated with household-member labels: All, Dad, Dad&Mom, Daughter, All, All?, Daughter, Son, Mom, Mom.]
  • 11. Genre Personalization: Personalized genre rows focus on user interest and also provide context and “evidence”. How are they generated? Implicit: based on the user’s recent plays, ratings, and other interactions. Explicit: taste preferences. Hybrid: combine the above. Also take into account freshness (has this been shown before?) and diversity (avoid repeating tags and genres, limit the number of TV genres, etc.).
  • 12. Similars: Displayed in many contexts: the video display page; in response to user actions (search, queue add, …); “Because you watched” rows.
  • 14. EVERYTHING is a Recommendation.
  • 17. Netflix Data: > 37M members; > 40 countries; > 1000 device types; ratings: > 4M/day; searches: > 3M/day; plays: > 30M/day; 1B hours in June 2012; > 4B hours in Q1 2013; log 100B events/day; 32.25% of peak US downstream traffic.
  • 18. Plays: What people watch. The most important source of data for our algorithms. A few plays are usually more valuable than most of our other data. We have a lot of information associated with a play: duration; start/stop/pause/rewind; device, location, time, …; page context; …
  • 19. Ratings: Explicit information about a member’s taste should be great. But we find ratings are noisy, sparse, and biased. The quality of our ratings has decreased over time.
  • 20. Metadata: Our tag space is made of thousands of different concepts, manually annotated by a set of experts. Although an automatic approach may be possible, we believe it would be of lesser quality; however, we are researching automatic annotation of scenes, transitions… Metadata is useful, especially for cold start.
  • 21. Social: Can your “friends’” interests help us predict yours better? The answer is similar to the Metadata case: if we know enough about you, social information becomes less useful, but it is very interesting for cold start. Social support for recommendations has been shown to matter.
  • 22. Affordances: highly curated catalog; catalog changes daily; videos have long shelf-lives; videos take time to consume.
  • 23. Smart Models: logistic/linear regression; elastic nets; SVD and other matrix factorizations; restricted Boltzmann machines; deep networks; factorization machines; Markov chains; different clustering approaches; latent Dirichlet allocation; gradient boosted decision trees/random forests; …
  • 24. Offline/Online testing process. [Flow diagram: Offline testing → [success] → Online A/B testing → [success] → Roll out feature to all users; a [fail] branch is also shown. Timescale labels: “days” and “weeks to months”.]
  • 26. Design Considerations. Recommendations: personal, accurate, novel, diverse, fresh. Systems: scalable, responsive, resilient, efficient, flexible.
  • 28. Cloud Computing at Netflix: layered services; clusters: horizontal scaling; auto-scale with demand; plan for failure; replication; fail fast; state is bad; Simian Army: induce failures to ensure resiliency.
  • 29. System Overview: Blueprint for multiple personalization algorithm services (ranking, row selection, ratings, …). Recommendation involving multi-layered machine learning. [Architecture diagram in three layers. OFFLINE: Netflix.Hermes, query results, offline data, model training, offline computation, models. NEARLINE: Netflix.Manhattan, user event queue, event distribution, nearline computation. ONLINE: algorithm service, online data service, online computation, UI client, member, with the labels “Play, Rate, Browse...” and “Recommendations”. A machine learning algorithm box appears in each computation layer.]
  • 30. Event & Data Distribution: Collect actions (plays, browsing, searches, ratings, etc.). Events: small units; time sensitive. Data: dense information; processed for further use; saved. [Architecture diagram as on slide 29, with a detail of the UI client, user event queue, event distribution, and Netflix.Manhattan.]
  • 31. Computation Layers: Offline: process data. Nearline: process events. Online: process requests. [Architecture diagram as on slide 29.]
  • 32. Online Computation: Synchronous computation in response to a member request. Pros: access to the freshest data; knowledge of full request context; compute only what is necessary. Cons: strict Service Level Agreements; must respond quickly … in all cases; requires high availability; limited view of data. Good for: simple algorithms; model application; business logic; context-dependence; interactivity. [Architecture diagram as on slide 29, with a detail of the online layer; www.netflix.com.]
  • 33. Offline Computation: Asynchronous computation done on a regular schedule. Pros: can handle large data; can do bulk processing; relaxed time constraints. Cons: cannot react quickly; results can become stale. Good for: batch learning; model training; complex algorithms; precomputing. [Architecture diagram as on slide 29, with a detail of the offline layer.]
  • 34. Nearline Computation: Asynchronous computation in response to a member event. Pros: can keep data fresh; can run moderate complexity algorithms; can average computational cost across users; change from actions. Cons: has some delay; done in event context. Good for: incremental learning; user-oriented algorithms; moderate complexity algorithms; keeping precomputed results fresh. [Architecture diagram as on slide 29, with a detail of the nearline layer.]
  • 35. Where to place components? Example: Matrix Factorization. Offline: collect a sample of play data; run a batch learning algorithm to produce the factorization; publish item factors. Nearline: solve user factors; compute user-item products; combine. Online: presentation-context filtering; serve recommendations. [Architecture diagram as on slide 29, annotated with $X$, $X \approx UV^t$, $V$, $A u_i = b$, $s_{ij} = u_i v_j$, and $s_{ij} > t$.]
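The nearline and online steps of the matrix factorization example on slide 35 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Netflix's implementation: the item factors in `ITEM_FACTORS` are made-up values standing in for the output of the offline batch job, the user factors are solved with a regularized least-squares system (one possible reading of $A u_i = b$), and the names `solve_user_factors`, `score_items`, `serve`, and the `reg` parameter are all hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical item factors V, standing in for what the offline batch
# factorization X ≈ U V^t would publish (rank 2 purely for illustration).
ITEM_FACTORS: Dict[str, List[float]] = {
    "drama_a": [1.0, 0.0],
    "drama_b": [0.9, 0.1],
    "comedy_a": [0.0, 1.0],
    "comedy_b": [0.1, 0.9],
}


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        partial = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - partial) / m[i][i]
    return x


def solve_user_factors(plays: Dict[str, float], reg: float = 0.1) -> List[float]:
    """Nearline step: solve (V_p^t V_p + reg * I) u = V_p^t x for one user,
    where V_p holds the factors of the items the user played (assumed form)."""
    k = len(next(iter(ITEM_FACTORS.values())))
    a = [[reg if i == j else 0.0 for j in range(k)] for i in range(k)]
    b = [0.0] * k
    for item, value in plays.items():
        v = ITEM_FACTORS[item]
        for i in range(k):
            b[i] += v[i] * value
            for j in range(k):
                a[i][j] += v[i] * v[j]
    return solve_linear(a, b)


def score_items(user_factors: List[float]) -> Dict[str, float]:
    """Nearline step: user-item products s_ij = u_i . v_j for every item."""
    return {
        item: sum(u * v for u, v in zip(user_factors, factors))
        for item, factors in ITEM_FACTORS.items()
    }


def serve(scores: Dict[str, float], already_played: Set[str], threshold: float) -> List[str]:
    """Online step: keep items with s_ij > t, filter by context, rank by score."""
    kept = [i for i, s in scores.items() if s > threshold and i not in already_played]
    return sorted(kept, key=lambda i: scores[i], reverse=True)


plays = {"drama_a": 1.0}  # a single play event for one member
u = solve_user_factors(plays)
recommendations = serve(score_items(u), already_played=set(plays), threshold=0.5)
```

In this split, only `serve` would run inside the request path; the more expensive solve and scoring would happen when a play event arrives, which is the trade-off the slide describes.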
  • 36. Netflix Manhattan. Stan Lanning. Event-based precomputation framework. Supports both nearline and offline computation modes. Customer-centric events and data. [Diagram labels: Play Service, Rating Service, …; Event Queue; Event Handler (×3); Request Queue; Event; Rules; Manager (×3); Algorithm (×3); Cached User Data.]
  • 37. Signals & Models: Similar pattern across layers. Models: parameter files; trained offline. Data: previously processed and stored information. Signals: fresh data from live services; user-related or context-related. [Architecture diagram as on slide 29, with a detail showing offline data, Netflix.Hermes, models, and signals (online service) feeding the machine learning algorithm in the offline, nearline, and online computation layers.]
  • 38. Recommendation Results: Precomputed results: fetch from data store; post-process in context. Generated on the fly: collect signals, apply model. Combination: dynamically choose; fallbacks. [Architecture diagram as on slide 29, with a detail of the algorithm service, online computation, UI client, and member.]
  • 40. Research Directions: personalized learning to rank; context awareness; presentation effects; social recommendation; full-page optimization; cold start.
  • 41. Take Aways: A behind-the-scenes peek at a real-world, industrial-scale recommender system. Recommendation is not just ratings. Scaling is not only about batch, offline algorithms. Use application domain advantages.
  • 42. We’re hiring. Thank you. Justin Basilico, @JustinBasilico

Editor's notes

  1. http://www.businessweek.com/articles/2013-05-09/netflix-reed-hastings-survive-missteps-to-join-silicon-valleys-elite#p5
  2. http://techblog.netflix.com/2013/03/system-architectures-for.html