Netflix is the world’s leading Internet television network with over 48 million members in more than 40 countries enjoying more than one billion hours of TV shows and movies per month, including original series. Netflix uses machine learning to deliver a personalized experience to each one of our 48 million users.
In this talk you will hear about the machine learning algorithms that power almost every part of the Netflix experience, including some of our recent work on distributed Neural Networks on AWS GPUs. You will also get insight into our innovation approach, which includes offline experimentation and online AB testing. Finally, you will learn about the system architectures that enable all of this at Netflix scale.
13. Proxy question:
▪ Accuracy in predicted rating
▪ Improve by 10% = $1 million!
What we were interested in:
▪ High quality recommendations
[Plot: predicted vs. actual ratings]
19. ▪ > 40M subscribers
▪ Ratings: ~5M/day
▪ Searches: >3M/day
▪ Plays: > 50M/day
▪ Streamed hours:
o 5B hours in Q3 2013
[Diagram: data sources feeding personalization — Member Behavior, Plays, Ratings, Impressions, Metadata, Social, Demographics, Geo Info, Device Info, Time]
20. [Diagram: matrix factorization — a user ("Aish") and an item ("House of Cards"), each represented by a Latent User Vector and a Latent Item Vector]
32. ▪ App Logs
▪ User Actions
▪ Ratings
▪ Plays
▪ Queue Adds
▪ Algo Actions
▪ Impressions (Presentation Bias)
▪ Context
▪ Device Info
▪ User Demographics
▪ Social
▪ Time
▪ …
Many different types of data…
- Who in the audience has an ML background?
Who has a big data background?
Who’s an engineer?
Going to cover:
Bit of everything. A few models, our approach to architecture of ML systems, and how it all comes together
Feel free to ask questions as we go along.
- We use Machine Learning in many places at Netflix, but perhaps the place we’re best known for ML is in our recommender systems, and our personalization
- So wanted to start with quick overview of what is personalization in Netflix
If you’ve logged into Netflix before this should look familiar. This is what it looks like when you login to our website
What you might not realize however is that almost every element on this page is driven by a ML algorithm
- There’s the obvious recommendations. We have a row of explicit recommendations, where we pull together everything we know about you, and present our “top picks” for you
You’ll also see “Genre” rows, that provide shows around a particular theme.
Movies are tagged in our system based on a number of different aspects
The tags are editorially added by our team of content experts
Which genres we pick, however, is personalized. So “Movies based on books” is shown for me based on my predicted likelihood of wanting to watch this genre
There’s also a level of personalization within the row itself. So a genre like “Movies based on books” spans a lot of different tastes.
For example, movies about Wall Street, documentaries on the GFC, and Young Adult fiction are all types of “Movies based on Books”, but they serve different tastes.
But based on what we know about you, we can construct a set of “Movies based on books” tailored to your particular view of what that means.
We also do “Similar” rows. So as the title says, because I last watched Bob’s Burgers, here are some choices that are similar to it.
Even our marketing images are personalized. Much of the hero images and marketing you see within Netflix is personalized to your taste.
I see OITNB here because it fits with my tastes
Finally we put it all together.
Unsurprisingly, most of what people play is from the top left hand corner, and if they are forced to scroll further down, or right, then that means we failed to predict what they want to watch
So we also rank the entire page. I’ve already shown how we rank the titles within each row left-to-right. We also rank the rows themselves top-to-bottom, so that the most relevant (for you) rows are pushed to the top of the page.
The net result of this personalization is that 75% of what our users watch is selected from the homepage, and the rows I’ve just shown you.
Which means that we’ve been able to provide a very personalized experience for our users, where what they see on the homepage, when they login to Netflix, matches pretty well with what they want to watch.
- Okay, I’m going to take a minute now to provide some back story.
Who’s heard of the Netflix prize?
It ran from 2006->2009.
The challenge was:
We give you 100M anonymized ratings from users, to build a “rating prediction” model with.
We then get you to predict 2.8M ratings for users whose actual ratings we already know, but held back.
If you can improve on our predicted ratings by 10%, then we give you 1 million dollars.
We measure this as the root mean square error between your predicted rating and the real rating that we held back.
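The metric just described can be sketched in a few lines. This is a toy illustration (not the Prize’s official scoring code), with made-up example ratings:

```python
import math

def rmse(predicted, actual):
    """Root mean square error between predicted and held-back ratings."""
    assert len(predicted) == len(actual)
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(predicted))

# Hypothetical predictions vs. held-back ratings
print(rmse([3.5, 4.0, 2.0], [4, 4, 1]))  # → ~0.645
```

Lower is better; a 10% improvement meant reducing this number by 10% relative to Netflix’s own Cinematch baseline.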
- Team KorBell (AT&T) won it in 2009.
- They improved the predictions by 8.43%
http://mathurl.com/osuomvj
Two significant algorithms came out of the Netflix Prize.
SVD - Prize RMSE: 0.8914
RBM - Prize RMSE: 0.8990
They were known in academia already, but hadn’t made their way out into industry recommender systems.
I talk through how SVD works at a high level in later slides
These two algorithms are still used in parts of the Netflix Recommender System to this day.
- There are limitations though.
Ratings != Plays. People’s ratings are somewhat “aspirational”. People may rate Citizen Kane 5 stars, but what they watch is Sharknado.
For our use case, we’re interested in predicting what people actually want to watch, not predicting what they think are critically worthy movies.
Also Netflix has changed a lot since the start of the Netflix Prize.
In 2006 we were mailing out DVDs. Now we’re more about streaming to devices.
This also changed people’s viewing habits.
The investment in selecting a great DVD that the entire family could watch was higher. Everyone had to agree on it, and getting it wrong might ruin your night.
With streaming, people want content that is more personalized, and more context-sensitive to what they want to watch NOW.
Also Netflix has grown. A lot.
What algorithms worked in 2006 don’t necessarily work with the volume we now have
- Okay, so let’s dive a little into the models and data we use to do our personalization
On the data side we have a lot to work with.
There’s a lot of signal that we get beyond straight plays/ratings.
If you think about it, the context in which someone chooses what to watch tells you a lot too.
So I want to give you a quick overview of how SVD (aka Matrix Factorization) works. This is one of the classic algorithms used in the NF prize, and was a big breakthrough at the time. This should give you a flavor of how these systems work.
The basic model is:
http://mathurl.com/pgux65w
- To make that more visual
http://mathurl.com/pgux65w
http://mathurl.com/l4w5yd6
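The factorization idea can be sketched in a few lines of Python. This is a toy illustration, not Netflix’s production code: a synthetic low-rank rating matrix R is approximated as U @ M.T (latent user vectors times latent item vectors) by plain SGD; all sizes and hyper-parameters are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, k = 6, 8, 2

# Synthetic low-rank "true" ratings so the demo has recoverable structure
U_true = rng.standard_normal((n_users, k))
M_true = rng.standard_normal((n_items, k))
R = U_true @ M_true.T

# Learn latent user vectors U and latent item vectors M so U @ M.T ≈ R
U = 0.1 * rng.standard_normal((n_users, k))
M = 0.1 * rng.standard_normal((n_items, k))
lr, reg = 0.02, 0.001  # learning rate, L2 regularization
for _ in range(2000):
    for u in range(n_users):
        for i in range(n_items):
            err = R[u, i] - U[u] @ M[i]
            u_old = U[u].copy()           # use pre-update value for M's step
            U[u] += lr * (err * M[i] - reg * U[u])
            M[i] += lr * (err * u_old - reg * M[i])

print(float(np.sqrt(np.mean((R - U @ M.T) ** 2))))  # training RMSE, small
```

In the Prize setting R is of course sparse (you only sum the error over observed ratings), but the structure of the update is the same.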
So that’s one of the foundational algorithms used in recommender systems. But things have moved on a lot since then too.
These days we’re mostly focused on ranking rather than rating prediction. This allows us to balance things like diversity, freshness, and global popularity against our prediction of how well this fits your tastes
We have AB tested many of these. Which algorithm to use really depends on your application, and what you’re trying to achieve. All have pros and cons.
You’ll likely end up with a few different algorithms for different parts of the problem
The important thing is to test them in your production system
Over time we’ve been able to improve on the results we got from the Netflix prize.
It’s been a combination of adding more data, and adding in more sophisticated models
As you can see here, we’ve moved things on a lot. These are improvements to Netflix’s core business metrics. So even a 1% improvement equates to real benefits to the business
One quick note: Always make sure you select a realistic baseline to test against. Just straight global popularity is usually pretty tough to beat. So you can fool yourself if you’re not testing against that, or your equivalent of that.
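The popularity baseline mentioned above is trivial to build, which is exactly why you should always have it on hand. A minimal sketch, with a made-up play log:

```python
from collections import Counter

# Toy popularity baseline: recommend the globally most-played titles.
# Any personalized model worth shipping should beat this.
plays = ["A", "B", "A", "C", "A", "B"]  # hypothetical play log

def popularity_baseline(play_log, n=2):
    """Return the n most-played titles, most popular first."""
    return [title for title, _ in Counter(play_log).most_common(n)]

print(popularity_baseline(plays))  # → ['A', 'B']
```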
- So now you have an idea of what a recommender system algorithm looks like, let’s see how you can productionize it
So here’s the core workflow you’ll need to support. Whatever decisions you make about your architecture, you’ll need to make the above process seamless.
Machine Learning Approach
Define problem (what you think needs solving, or a hypothesis of what can be improved)
Gather data on which to train model
Experiment offline to see if you can improve over baseline
Produce Model/Algorithm and deploy
Track key metrics in production to see if hypothesis is proven
- Here’s a blueprint for different layers you’ll need.
- We’ll step through each area next.
Okay, let’s start with the front-end (aka online). I won’t cover much here, except to point out that you’ll need an extremely good data pipeline.
You’ll spend 90% of your time building this.
Often it needs to be built by an engineering team in collaboration with your researchers.
There’s many different types of data you’ll want to capture
Incl. what your algorithms are doing. You’ll need to correct for presentation bias
And the context and behavior in which users interact with you
- Need backend service that can accept and aggregate all these disparate data sources
Want to look at technologies like Suro, Kafka, etc
Stream to longer term (cheap) storage (S3, HDFS)
Need common framework that makes it easier to instrument your code for events.
Adopt early and get into every app as “standard”
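A common event-instrumentation framework like the one described might look like this minimal sketch. The Queue stands in for the real transport (Suro/Kafka streaming to S3/HDFS in the talk); the event schema and field names are purely illustrative:

```python
import json
import time
from queue import Queue

# Stand-in for the real event transport (e.g. a Kafka producer)
event_bus = Queue()

def log_event(event_type, user_id, payload):
    """Emit one event in the shared format every app agrees on."""
    event_bus.put(json.dumps({
        "type": event_type,        # e.g. "play", "rating", "impression"
        "user": user_id,
        "ts": time.time(),         # context: when it happened
        "payload": payload,        # event-specific details
    }))

# Capturing both what the user did AND what the algorithm showed them
log_event("impression", 42, {"row": "Top Picks", "position": 3})
log_event("play", 42, {"title": "House of Cards"})
print(event_bus.qsize())  # → 2
```

The point of a shared helper like this is the last line of the notes: adopt it early and get it into every app as the standard, so impressions, plays, and context all land in one pipeline.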
Okay, let’s talk about where you (typically) define and train your models
Most of your models will be produced offline & embedded in production
You’ll need a platform that allows easy experimentation across diverse tools: R, iPython, in-house
Common Format (can be code) that allows you to embed models once learned
Common confusion: models change less than you think
The values you’ll be plugging into them can still be real-time
http://mathurl.com/kuxa5hw
Let’s walk through an example of a model we train: Neural Networks
These days we use GPUs (CUDA) to do the training of the network.
Thousands of cores
Massively parallel
Computing power is what’s changed. ANNs are really an old idea
But still need to explore hyper-parameter space.
Parameters:
▪ Learning rate theta
▪ …
Architecture parameters:
▪ How many layers, and how deep
AWS offers GPU compute instances.
Approach: conduct a search over many different architectures / parameters
- Distribute different architecture to each instance
- Train model
- Evaluate
Can get smarter with how you explore this space. So rather than doing grid search, you search in areas most likely to have improvement
60 cents an hour. A comparative fortune compared to other instances, but it only takes a few hours to train a model that is used in production for weeks (or months)
Perfect for experimental work
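The search loop above can be sketched as a simple random search. This is a toy stand-in, not Netflix’s tooling: `train_and_evaluate` is a hypothetical placeholder for the real job you would farm out to one GPU instance per trial, and the parameter ranges are illustrative:

```python
import random

def train_and_evaluate(layers, units, lr):
    """Placeholder score; in practice this trains and evaluates a network
    on one GPU instance and returns validation performance."""
    return -abs(layers - 3) - abs(units - 128) / 64 - abs(lr - 0.01) * 100

random.seed(0)
best = None
for _ in range(20):  # one trial per instance
    cfg = {
        "layers": random.randint(1, 6),
        "units": random.choice([32, 64, 128, 256]),
        "lr": 10 ** random.uniform(-4, -1),  # log-uniform learning rate
    }
    score = train_and_evaluate(**cfg)
    if best is None or score > best[0]:
        best = (score, cfg)

print(best[1])  # best architecture / hyper-parameters found
```

Random search is the naive version; as the notes say, you can get smarter by concentrating trials in the regions of the space most likely to improve (e.g. Bayesian optimization).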
Your offline models won’t reflect sudden changes in behavior that they haven’t seen before.
Here’s OITNB and House of Cards (as searched for in Google). These can represent massive shifts in global user behavior, which can throw the model off
Also some models degrade faster than others. You see this especially with tree models.
Another problem: The models themselves still run in production (even though they’re trained offline). This limits how sophisticated you can make your models. They still need to return results within your SLAs.
One Solution. Near-line computing.
Re-train models based on events from the system
Pre-compute results where you can
Now you don’t always have to pre-compute the final results. The beauty of the near-line approach is that it lets you half-bake the model. So that the parts that are more static are pre-generated, and the parts that are more sensitive to changes get worked on the fly.
Remember our SVD model. U is users, M is movies, and R are ratings
- Turns out that solving for U, if you know M and R, is a simple least squares solution. With modern linear algebra libraries we can compute that in milliseconds.
Recomputes are event driven. No need to re-compute if nothing has changed
So in this example, we re-compute the latent vectors representing my tastes, whenever there’s more information available about me to re-train that vector with.
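The near-line recompute just described can be sketched with a single least-squares solve. This is a toy illustration with synthetic, noise-free data (all names and sizes are illustrative): with the item matrix M fixed from offline training, one user’s latent vector u solves r ≈ M u over the items that user has rated:

```python
import numpy as np

rng = np.random.default_rng(1)
k = 3
M = rng.standard_normal((50, k))           # latent item vectors (fixed offline)
u_true = rng.standard_normal(k)            # the taste vector we want to recover
rated = rng.choice(50, size=10, replace=False)
r = M[rated] @ u_true                      # this user's known ratings

# Event-driven recompute: when new ratings arrive, re-solve for u.
# A 10x3 least-squares problem takes well under a millisecond.
u, *_ = np.linalg.lstsq(M[rated], r, rcond=None)
print(np.allclose(u, u_true))  # → True on this noise-free toy data
```

In practice the solve would include regularization and real (noisy) ratings, but the shape of the computation is the same: cheap enough to rerun per-user whenever an event says something changed.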