People You May Know
Fast Recommendations Over Massive Data
Jeff Weiner
Chief Executive Officer
Sumit Rangwala
Artificial Intelligence
Felix GV
Data Infrastructure
My Professional Network
Professional network in real world: Sumit, Felix, Peter, Amol, Gaojie
Professional network on LinkedIn: Sumit, Felix, Peter, Amol, Gaojie
Predicting real-world connections
People You May Know
• Recommends people that one might know
• Helps grow members' professional networks
• Enables many other LinkedIn services
Talk Outline
People You May Know
PYMK: Generating Recommendations
PYMK Architecture Evolution
PYMK Rebirth
Insights and Road Ahead
PYMK: Generating Recommendations
PYMK: Prediction Strategy
Data Mining
• LinkedIn’s Economic Graph
• Member’s activities and profile
LinkedIn Economic Graph
Figure: Sumit, Felix, Peter, Amol, Gaojie, Microsoft, USC
Recommendation System
Candidate Generation
Feature Generation
Scoring
PYMK: Candidate Generation
Using commonalities in
economic graph
• Friends of my friends
(triangle closing)
• Coworkers
• Personalized Page Rank
LinkedIn Economic Graph
Figure: Sumit, Felix, Amol, Peter, Gaojie, Microsoft
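Triangle closing can be illustrated with a small sketch. The graph below reuses the names from the slides, but the edges are assumptions made up for illustration, and the function name is hypothetical. It collects friends of friends who are not yet connected to the member and counts how many connections they share.

```python
from collections import Counter

# Toy undirected connection graph. The names come from the slides;
# the edges are illustrative assumptions.
connections = {
    "Sumit": {"Amol", "Peter"},
    "Amol": {"Sumit", "Felix", "Gaojie"},
    "Peter": {"Sumit", "Gaojie"},
    "Felix": {"Amol"},
    "Gaojie": {"Amol", "Peter"},
}


def triangle_closing_candidates(graph, member):
    """Return friends of friends of `member`, with common-connection counts."""
    direct = graph.get(member, set())
    counts = Counter()
    for friend in direct:
        for friend_of_friend in graph.get(friend, set()):
            # Skip the member and anyone already connected to the member.
            if friend_of_friend != member and friend_of_friend not in direct:
                counts[friend_of_friend] += 1
    return counts


candidates = triangle_closing_candidates(connections, "Sumit")
# Sort by descending common-connection count, then by name for stability.
ranked = sorted(candidates.items(), key=lambda kv: (-kv[1], kv[0]))
print(ranked)
```

In this toy graph Felix is reached only through Amol, which matches the later slide stating that Sumit and Felix have one common friend.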
PYMK: Feature Generation
Using economic graph
characteristics
• Number of common friends
Using member
activities/profile
• Common work location
LinkedIn Economic Graph
Figure: Sumit, Felix, Amol, Peter, Gaojie, Microsoft
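The two example features on this slide could be computed for a (member, candidate) pair roughly as follows. This is a minimal sketch: the `Profile` type, the field names, and the toy data are assumptions, not the schema used in production.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    """Hypothetical member profile with a single field of interest."""
    name: str
    work_location: str


def pair_features(graph, profiles, member, candidate):
    """Build graph-based and profile-based features for one pair."""
    common = graph[member] & graph[candidate]
    same_location = (
        profiles[member].work_location == profiles[candidate].work_location
    )
    return {
        "num_common_connections": len(common),
        "same_work_location": int(same_location),
    }


# Illustrative data only.
graph = {
    "Sumit": {"Amol"},
    "Felix": {"Amol"},
    "Amol": {"Sumit", "Felix"},
}
profiles = {
    "Sumit": Profile("Sumit", "Bay Area"),
    "Felix": Profile("Felix", "Bay Area"),
    "Amol": Profile("Amol", "Bay Area"),
}

features = pair_features(graph, profiles, "Sumit", "Felix")
print(features)
```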
PYMK: Recommendation System
Candidate Generation
• Sumit might know Amol’s friend Felix
Feature Generation
• Sumit and Felix have one common friend
• Sumit and Felix both work in the Bay Area
Graph processing
Data processing
PYMK Architecture Evolution
Approach: Pre-compute recommendations
PYMK: The Beginning
Problem Space
• 10s of millions of members
Architecture
• Pre-compute using SQL
Shortcomings
• Staleness of 6 weeks to 6 months
• Extraneous computation
Figure: Oracle, PYMK Service, online service request
PYMK: Keeping up with Growth
Problem space
• Low 100s of millions of members
Architecture
• Pre-compute using Hadoop MR
• Push to a key-value store
Shortcomings
• Staleness of 2-3 days
• Extraneous computation
Figure: Voldemort, PYMK Service
PYMK: Pushing the Technology Limits
Problem Space
• Mid 100s of millions of members
Architecture
• Pre-compute using Spark [1]
• Push to a key-value store
Shortcomings
• Staleness of 1-2 days
• Excessive computation cost
Figure: Venice, PYMK Service
[1] Managing Exploding Big Data
PYMK: Exploring Data Freshness
Problem Space
• Use up to date member data
Architecture
• Hybrid offline-online approach
Shortcomings
• Split-brain design
• Didn’t scale
Figure: Venice, realtime signals, PYMK Service
Key Realization
• Freshness matters
• Pre-computation is costly
PYMK Rebirth
Approach: Compute recommendations on demand
PYMK: Recommendation System
Candidate Generation
• Sumit might know Amol’s friend Felix
Feature Generation
• Sumit and Felix have one common friend
• Sumit and Felix both work in the Bay Area
Online Graph Traversal
Fast Data Access
Gaia: An online graph processing system
A generic service for executing complex graph algorithms with low latency on massive graphs
Gaia: Overview
Inputs to Gaia:
• Any kind of graph: a snapshot on HDFS
• Updates to graph: via Kafka, etc.
• Graph algorithm code: using compute framework, e.g., triangle closing, random graph walks
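As an illustration of the "random graph walks" family of algorithms, the sketch below runs random walks with restart from one member and uses visit frequencies as scores, which is one common way to approximate personalized PageRank. It is not Gaia's compute framework; the function name, parameters, and toy graph are assumptions.

```python
import random
from collections import Counter

# Toy undirected graph; edges are illustrative assumptions.
connections = {
    "Sumit": {"Amol", "Peter"},
    "Amol": {"Sumit", "Felix", "Gaojie"},
    "Peter": {"Sumit", "Gaojie"},
    "Felix": {"Amol"},
    "Gaojie": {"Amol", "Peter"},
}


def random_walk_scores(graph, source, num_walks=2000, restart_prob=0.3, seed=0):
    """Estimate proximity to `source` from visit counts of short random walks."""
    rng = random.Random(seed)
    visits = Counter()
    for _ in range(num_walks):
        node = source
        # Keep walking until a restart is drawn; each step moves to a
        # uniformly chosen neighbor of the current node.
        while rng.random() >= restart_prob:
            neighbors = graph.get(node)
            if not neighbors:
                break
            node = rng.choice(sorted(neighbors))
            visits[node] += 1
    visits.pop(source, None)  # the source itself is never a candidate
    total = sum(visits.values())
    return {n: c / total for n, c in visits.items()} if total else {}


scores = random_walk_scores(connections, "Sumit")
for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.3f}")
```

A real candidate generator would likely also filter out existing connections before ranking; that step is left out here to keep the walk itself visible.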
Design Choice
Gaia
• Single server architecture with replicas
• Full in-memory graph for fast execution
Gaia: Architecture
Graph snapshot on disk
Graph updates via Kafka, etc.
Figure: Gaia with three servers, each running the graph algorithm (Algo)
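The design described here (each replica holds the full graph in memory, bootstraps from a snapshot, then applies a stream of updates) can be sketched as below. The class, method names, and update format are hypothetical; the snapshot is a plain list of edges and the update stream is a plain list standing in for Kafka.

```python
from collections import defaultdict


class InMemoryGraph:
    """Hypothetical single-server replica holding the whole graph in memory."""

    def __init__(self):
        self._adj = defaultdict(set)

    def load_snapshot(self, edges):
        """Replace the in-memory state with the edges from a snapshot."""
        self._adj.clear()
        for a, b in edges:
            self._add(a, b)

    def apply_update(self, op, a, b):
        """Apply one incremental update on top of the loaded snapshot."""
        if op == "add":
            self._add(a, b)
        elif op == "remove":
            self._adj[a].discard(b)
            self._adj[b].discard(a)
        else:
            raise ValueError(f"unknown update op: {op}")

    def _add(self, a, b):
        self._adj[a].add(b)
        self._adj[b].add(a)

    def neighbors(self, node):
        return frozenset(self._adj.get(node, ()))


snapshot = [("Sumit", "Amol"), ("Amol", "Felix")]
updates = [("add", "Sumit", "Felix"), ("remove", "Sumit", "Amol")]

# Every replica loads the same snapshot and consumes the same updates,
# so any replica can answer any query.
replicas = [InMemoryGraph() for _ in range(3)]
for replica in replicas:
    replica.load_snapshot(snapshot)
    for op, a, b in updates:
        replica.apply_update(op, a, b)

print([sorted(r.neighbors("Sumit")) for r in replicas])
```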
PYMK
Gaia
• Candidate generation using triangle closing and common connection count
• Latency: 10s of milliseconds (p90)
A key-value store with scoring capability
At a glance
Venice
• Tailored for serving ML jobs’ output
• High throughput ingestion
• Fast lookups
• Self-service onboarding
Supported Ingestion Modes in Venice
Source | Batch | Incremental
Hadoop | Push Job | Push Job
Samza | Reprocessing Job (Kappa Architecture) | Streaming Job
Hybrid | Any Batch Job + Streaming Job (Lambda Architecture)
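To make the hybrid mode concrete, here is a toy store that accepts a full batch push plus streaming writes layered on top. It is only a sketch of the general Lambda-style idea under assumed semantics (timestamps decide which write wins); it does not reflect how Venice actually merges the two, and every name in it is hypothetical.

```python
class HybridStore:
    """Toy key-value store combining batch pushes with streaming writes."""

    def __init__(self):
        self._batch = {}   # key -> value from the latest batch push
        self._stream = {}  # key -> (timestamp, value) from streaming writes

    def push_batch(self, records, as_of):
        """Swap in a full dataset computed from data up to time `as_of`."""
        self._batch = dict(records)
        # Keep only streaming writes that are newer than the batch data.
        self._stream = {
            key: (ts, value)
            for key, (ts, value) in self._stream.items()
            if ts > as_of
        }

    def write_stream(self, key, value, ts):
        """Apply one streaming write; the newest timestamp wins per key."""
        current = self._stream.get(key)
        if current is None or ts >= current[0]:
            self._stream[key] = (ts, value)

    def get(self, key):
        if key in self._stream:
            return self._stream[key][1]
        return self._batch.get(key)


store = HybridStore()
store.push_batch({"a": 1, "b": 2}, as_of=10)
store.write_stream("a", 5, ts=12)
first_read = (store.get("a"), store.get("b"))

store.push_batch({"a": 7, "b": 8}, as_of=20)
second_read = (store.get("a"), store.get("b"))
print(first_read, second_read)
```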
First PYMK Use Case: Online Feature Retrieval
Requirements
Online Feature Retrieval
• Millions of lookups / sec at peak
• ~1000 keys / query
• Thousands of queries / sec
• ~80B / value
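These requirements are consistent with each other, as a quick back-of-the-envelope calculation shows. The slide only says "thousands of queries / sec", so the 2,000 used below is an assumed stand-in; the other two numbers come from the slide.

```python
QUERIES_PER_SEC = 2_000  # assumption standing in for "thousands of queries / sec"
KEYS_PER_QUERY = 1_000   # "~1000 keys / query"
VALUE_BYTES = 80         # "~80B / value"

lookups_per_sec = QUERIES_PER_SEC * KEYS_PER_QUERY
payload_bytes_per_query = KEYS_PER_QUERY * VALUE_BYTES
payload_bytes_per_sec = lookups_per_sec * VALUE_BYTES

print(f"{lookups_per_sec:,} lookups/sec")
print(f"{payload_bytes_per_query:,} value bytes per query")
print(f"{payload_bytes_per_sec:,} value bytes per second")
```

Under that assumption the store sees on the order of millions of key lookups per second, which lines up with the first bullet.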
Before / After
Online Feature Retrieval
• Base latency: 4 seconds (p99)
• Changed storage engine to RocksDB: 60 ms (p99)
Second PYMK Use Case: Embeddings
Requirements
Embeddings
• Millions of lookups / sec at peak
• ~1000 keys / query
• Thousands of queries / sec
• ~800B / value
• 10x the previous size
Before / After
Embeddings
• Base latency: 275 ms (p99)
• Server-side computation: 60 ms (p99)
At a glance
Server-side Computation
• Simple vector operations
• Smaller response size
• Big input (vector)
• Small output (scalar)
• Declarative API
• No arbitrary code
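The effect on response size can be sketched with a toy store. The example assumes 200-dimensional float32 embeddings (800 bytes, in line with the "~800B / value" figure), uses a dot product as the "simple vector operation", and models the declarative request as a small dictionary checked against a fixed table of operations. None of these names are Venice's actual API.

```python
import struct

DIM = 200  # 200 float32 values = 800 bytes per stored embedding (assumed layout)


def encode(vec):
    return struct.pack(f"<{len(vec)}f", *vec)


def decode(blob):
    return struct.unpack(f"<{len(blob) // 4}f", blob)


# Toy store: 1000 keys, matching "~1000 keys / query".
store = {f"member:{i}": encode([float(i)] * DIM) for i in range(1000)}

# Only operations listed here can run next to the data: no arbitrary code.
OPERATIONS = {
    "dot_product": lambda vec, operand: sum(a * b for a, b in zip(vec, operand)),
}


def fetch_vectors(keys):
    """Plain lookup: the full vectors travel back to the client."""
    return {key: store[key] for key in keys}


def compute_on_server(keys, request):
    """Declarative request: big vector in, one scalar per key out."""
    operation = OPERATIONS[request["op"]]  # KeyError for unsupported ops
    return {key: operation(decode(store[key]), request["operand"]) for key in keys}


keys = list(store)
query_vector = [1.0] * DIM

raw = fetch_vectors(keys)
scores = compute_on_server(keys, {"op": "dot_product", "operand": query_vector})

raw_bytes = sum(len(blob) for blob in raw.values())
score_bytes = len(scores) * 4  # one float32 scalar per key
print(raw_bytes, score_bytes)
```

With these assumed sizes the per-query response shrinks by a factor of 200, at the price of sending the query vector with each request.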
More tuning
Fast Avro
• Online feature retrieval: 60 to 40 ms (p99)
• Embeddings w/ computation: 60 to 35 ms (p99)
• Now open-source: github.com/linkedin/avro-util
PYMK Today: Putting it all together
PYMK: Recommendation System
Candidate Generation
• Sumit might know Amol’s friend, Felix
Feature Generation
• Sumit and Felix have one common friend
• Sumit and Felix both work in the Bay Area
Scoring
• Sumit and Felix likely know each other
Components: PYMK Service, Venice, Gaia
PYMK: Today
Venice
PYMK
Service
Gaia
1. Ingest in Gaia & Venice
2. Candidate gen & graph features from Gaia
3. Member features & partial scoring from Venice
4. Final scoring by PYMK Service
Staleness
• Seconds to minutes
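The on-demand request flow can be sketched end to end with local stand-ins for the two backing systems. Everything here is hypothetical: the function names, the toy data, the weights, and the logistic form of the final score are assumptions chosen only to show how the pieces compose at request time.

```python
import math

# Stand-in for the graph held by Gaia; edges are illustrative assumptions.
GRAPH = {
    "Sumit": {"Amol", "Peter"},
    "Amol": {"Sumit", "Felix", "Gaojie"},
    "Peter": {"Sumit", "Gaojie"},
    "Felix": {"Amol"},
    "Gaojie": {"Amol", "Peter"},
}
# Stand-in for member features held by Venice; values are made up.
REGION = {"Sumit": "Bay Area", "Felix": "Bay Area", "Gaojie": "Seattle"}

# Made-up model parameters.
W_COMMON, W_PARTIAL, BIAS = 1.0, 1.5, -2.0


def gaia_candidates(member):
    """Candidates plus a graph feature (common connection count)."""
    direct = GRAPH[member]
    counts = {}
    for friend in direct:
        for fof in GRAPH[friend]:
            if fof != member and fof not in direct:
                counts[fof] = counts.get(fof, 0) + 1
    return counts


def venice_partial_scores(member, candidates):
    """Member features reduced to a partial score next to the data."""
    return {c: 1.0 if REGION[c] == REGION[member] else 0.0 for c in candidates}


def pymk_recommend(member, top_k=2):
    """Final scoring: combine graph features with partial scores and rank."""
    common_counts = gaia_candidates(member)
    partial = venice_partial_scores(member, common_counts)
    scored = {}
    for candidate, common in common_counts.items():
        z = W_COMMON * common + W_PARTIAL * partial[candidate] + BIAS
        scored[candidate] = 1.0 / (1.0 + math.exp(-z))
    return sorted(scored, key=lambda c: (-scored[c], c))[:top_k]


recommendations = pymk_recommend("Sumit")
print(recommendations)
```

Because nothing is pre-computed per member, a new connection or profile change is visible as soon as the backing stores ingest it, which is where the seconds-to-minutes staleness comes from.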
Key Learnings
• Pre-computation is viable for many products
• Scaling real-time computation requires moving compute close to data
• Infra-aware machine learning
Looking Ahead
• Further scale Gaia & Venice
• More candidates
• More features
• Larger features
• More complex computations
ML-Aware Infra
• Continue democratizing access
• Easier onboarding to Venice & Gaia
• Multi-tenancy for Venice Compute
• Integration with other frameworks
Productive ML
Contributors
Amol Ghoting, Gaojie Liu, Kevinjeet Gill, Peter Chng, Min Huang, Yao Chen, Hema Raghavan, Ashish Singhai, and many others
Thank You
[QCon.ai 2019] People You May Know: Fast Recommendations Over Massive Data
