How to build a personalization platform for 30M users
… and not go bust in the process
2
Piotr Turek
Senior Big Data Architect
& Team Leader
Michał Żmuda
Big Data Architect
DreamLab – the IT hub of Ringier Axel Springer
Today's journey – next Reasons to Believe
Reasons to Believe
Challenges of the Real World
Our Approach
Brave New World
4
Reasons to believe – Personalisation as a Service
• 20M real users
• 10K RPS
• 30K events/s
• Diverse brands applicable: 10+
• Deployment time: 1w
• From 5% to 90+% at Onet.pl: 10d
Reasons to believe – works like a charm!
Comparison of personalised and manual mobile versions, 10.12.2018 to 10.01.2019
• Users are ATTRACTED: CTR, headline section, +43%
• Become ACTIVE: PageViews/UU, average, +12%
• Stay LOYAL: Active Users for 10+ days/month, +7%
Today's journey – next Challenges of the Real World
7
8
1-to-1 Personalisation – basic intuition
Nearest Neighbour Search
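The nearest-neighbour intuition on this slide can be sketched roughly as below. This is a minimal illustration and not the speakers' implementation: the function names `cosine` and `recommend`, the toy `likes` data and the choice of cosine similarity over sets of liked articles are all assumptions made for the example.

```python
from math import sqrt


def cosine(a: set, b: set) -> float:
    """Cosine similarity between two sets of liked item ids (binary vectors)."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def recommend(user: str, likes: dict, k: int = 2, n: int = 3) -> list:
    """Recommend up to n items liked by the k users most similar to `user`."""
    mine = likes[user]
    # Similarity of every other user to the target user.
    sims = {u: cosine(mine, items) for u, items in likes.items() if u != user}
    neighbours = sorted(sims, key=sims.get, reverse=True)[:k]
    scores = {}
    for u in neighbours:
        if sims[u] == 0.0:
            continue  # nothing in common, so this neighbour tells us nothing
        for item in likes[u] - mine:
            scores[item] = scores.get(item, 0.0) + sims[u]
    return sorted(scores, key=lambda i: (-scores[i], i))[:n]


# Hypothetical toy data: user -> set of liked article ids.
likes = {
    "ann": {"a1", "a2", "a3"},
    "bob": {"a1", "a2", "a4"},
    "cat": {"a5"},
}
print(recommend("ann", likes))  # ['a4']
```

A user with an empty history has zero similarity to everyone, so the sketch returns nothing for them, which is the cold start problem discussed later in the deck.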
9
KEEP ME
INFORMED
OR I LEAVE
:<
NEW NEW
Mix with content-based?
Mix with popularity-based?
Cold User
Cold Start
Online model updates?
Mix with popularity-based?
~25% of users are cold!
Exploitation only!
No built-in exploration!
Adding random for fairness
Adding proper exploration?
𝟏 𝟎 𝟏 𝟎 𝟏 𝟎 𝟎 ⋯ 𝟏
𝟎 𝟎 𝟎 𝟎 𝟎 𝟏 𝟎 ⋯ 𝟏
𝟎 𝟏 𝟎 𝟎 𝟎 𝟎 𝟎 ⋯ 𝟎
𝟎 𝟎 𝟏 𝟎 𝟎 𝟎 𝟎 ⋯ 𝟎
⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋱ 𝟏
𝟏 𝟎 𝟎 𝟎 𝟎 𝟎 𝟎 ⋯ 𝟎
12
1-to-1 Personalisation – underlying data structure
Columns: USERS (30M)
Rows: ITEMS (10K)
Online updates?
10K RPS? <<100ms latency?
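A rough sizing of the matrix on this slide, plus one possible sparse layout, might look like the sketch below. The one-bit-per-cell figure and the `SparseInteractions` class are assumptions for illustration only; the 30M and 10K sizes come from the slide.

```python
from collections import defaultdict

USERS, ITEMS = 30_000_000, 10_000  # sizes taken from the slide

# Dense layout: one cell per (user, item) pair.
dense_cells = USERS * ITEMS
# Assumption: even at a single bit per cell the dense matrix is large.
dense_gb = dense_cells / 8 / 1e9
print(dense_cells, dense_gb)


class SparseInteractions:
    """Hypothetical sparse store: keeps only the (user, item) pairs that exist."""

    def __init__(self):
        self.by_user = defaultdict(set)  # user -> items consumed
        self.by_item = defaultdict(set)  # item -> users who consumed it

    def add(self, user, item):
        # An online update touches two small sets, not the whole matrix.
        self.by_user[user].add(item)
        self.by_item[item].add(user)

    def nnz(self):
        """Number of stored (non-zero) cells."""
        return sum(len(items) for items in self.by_user.values())


store = SparseInteractions()
store.add("u1", "a1")
store.add("u1", "a2")
store.add("u2", "a1")
print(store.nnz())
```

Storage stops being the hard part once the matrix is sparse. The similarity search still has to run within the latency budget while the data changes underneath it.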
THE COST OF PERSONALISATION
IS TOO DAMN HIGH
Today's journey – next Our Approach
14
USER SEGMENTATIONS
REAL-TIME ALGORITHMS
16
REAL-TIME ALGORITHM(s)
Ensure Time-Sensitivity
Address Cold Start
Low Complexity
CLEAN, FITTING approach is needed
1-to-1 PERSONALISATION alone isn’t a solution
INSTANT EXPLORATION
IMMEDIATE FEEDBACK
CONTINUOUS OPTIMIZATION
Real-Time Algorithms – Multi-armed Bandit
EXPLOITATION: popular, proven items; predictable payout
EXPLORATION: new items & trends; high reward chance
Harnessing trends
• Tunable stats horizon
No filter bubble
• Tunable exploration
Continuous improvement
• Config A/B testing
• Hyperparam opt.*
I'm gonna balance those odds, boy!
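The exploitation/exploration balance on this slide can be illustrated with one of the simplest bandit strategies, epsilon-greedy. The deck does not say which bandit variant the platform uses, so treat this as a sketch under that assumption; the class name, method names and the click-through-rate reward are hypothetical.

```python
import random


class EpsilonGreedyBandit:
    """Epsilon-greedy bandit: explore with probability epsilon, else exploit."""

    def __init__(self, epsilon=0.1, rng=None):
        self.epsilon = epsilon            # the "tunable exploration" knob
        self.rng = rng or random.Random()
        self.shows = {}                   # item -> times recommended
        self.clicks = {}                  # item -> times clicked

    def add_arm(self, item):
        # New articles become selectable immediately, with no history needed.
        self.shows.setdefault(item, 0)
        self.clicks.setdefault(item, 0)

    def ctr(self, item):
        shows = self.shows[item]
        return self.clicks[item] / shows if shows else 0.0

    def select(self):
        items = list(self.shows)
        if self.rng.random() < self.epsilon:
            return self.rng.choice(items)   # exploration: any item
        return max(items, key=self.ctr)     # exploitation: best CTR so far

    def update(self, item, clicked):
        self.shows[item] += 1
        self.clicks[item] += int(clicked)


bandit = EpsilonGreedyBandit(epsilon=0.1, rng=random.Random(42))
for article in ("old-hit", "fresh-story"):
    bandit.add_arm(article)
choice = bandit.select()
bandit.update(choice, clicked=True)
```

Raising `epsilon` spends more traffic on new items and trends; lowering it favours proven items.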
Real-Time Data Flow – Overall Architecture
Feedback via EVENTS → Collect STREAMS → Compute MEASURES → Update OLAP CUBE → Calculate Item KPIs → New RECOMMENDATIONS → Feedback via EVENTS
18
Sub-second latency
Cost effective
Trend responsive
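The events-to-measures-to-KPI loop can be sketched in memory as below. The real platform uses streams and an OLAP cube, so this is only an illustration; the event shape (`type`, `item`, `ts`), the exponential decay used for a tunable statistics horizon, and the class names are assumptions. Events are assumed to arrive in timestamp order.

```python
import math
from collections import defaultdict


class DecayedCounter:
    """Counter whose past contributions halve every `half_life` seconds."""

    def __init__(self, half_life):
        self.rate = math.log(2) / half_life
        self.value = 0.0
        self.last_ts = None

    def read(self, ts):
        if self.last_ts is None:
            return 0.0
        return self.value * math.exp(-self.rate * (ts - self.last_ts))

    def add(self, ts, amount=1.0):
        # Decay the old value up to `ts`, then add the new observation.
        self.value = self.read(ts) + amount
        self.last_ts = ts


class ItemKpis:
    """Turns a stream of view/click events into a per-item CTR measure."""

    def __init__(self, half_life=600.0):
        self.views = defaultdict(lambda: DecayedCounter(half_life))
        self.clicks = defaultdict(lambda: DecayedCounter(half_life))

    def on_event(self, event):
        if event["type"] == "view":
            self.views[event["item"]].add(event["ts"])
        elif event["type"] == "click":
            self.clicks[event["item"]].add(event["ts"])

    def ctr(self, item, ts):
        views = self.views[item].read(ts)
        return self.clicks[item].read(ts) / views if views > 0 else 0.0


kpis = ItemKpis(half_life=600.0)
for event in (
    {"type": "view", "item": "a1", "ts": 0.0},
    {"type": "view", "item": "a1", "ts": 0.0},
    {"type": "click", "item": "a1", "ts": 0.0},
):
    kpis.on_event(event)
print(kpis.ctr("a1", ts=0.0))
```

A shorter `half_life` makes the KPI follow trends faster at the price of noisier estimates.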
USER SEGMENTATIONS
REAL-TIME ALGORITHMS
20
SEGMENTATION(s)
Caches reduce costs
Counter Filter Bubble
Address cold start
Users don’t care if recommendations are UNIQUE
They care about ATTRACTIVENESS
SPORT CELEBRITIES
SUPERCARS
RALLY
GENERAL SEGMENTATION
MOTO SEGMENTATION
COLD USERS
21
Segmentations – Tunability of Costs
[Figure: Number of Segments axis, ranging from Lower Costs to Greater Performance]
ARTICLES
EVENTS
TOPIC
MODELING
REAL-TIME
RECOMMENDER
ONLINE
VIEW
CLUSTERING
Segmentations – Overall Architecture
USER SEGMENTATIONS
REAL-TIME ALGORITHMS
25
Platforms >> Products
"A product is useless without a platform,
or more precisely, a platform-less
product will always be replaced by an
equivalent platform-ised product"
--- Jeff Bezos
Products
• "Vendor knows best"
• Fixed functionality
• Closed system
Platforms
• "Customer knows best"
• Customizable
• Open to change
• Open to extension
26
27
ANY SEGMENTATION ANY ALGORITHM ANY KPI
One Simple Integration
28
Context Mapping
Bounded Contexts => Microservices
Build around APIs
Pluggable logic
Streaming-first
OLAP databases
Domain-Driven Design
Modern Data Architecture
Today's journey – next Brave New World
29
Great success. Are we finished then?
Personalisation will transform how your business operates
32
What to write/publish about?
Which topics need attention?
Which articles no longer perform?
33
Too many reports, too little time
N (10+) sections per page
K (10+) versions per section
N*K (100+) section versions to manage
34
Too many reports, too little time
N (10+) sections per page
K (10+) versions per section
N*K (100+) section versions to manage
Prescriptive!
• Suggest sensible actions
Predictive
• Predict outcomes
Diagnostic
• Explain why
Descriptive
• Say what happened
35
Analytics reimagined - the paradigm shift
36
What to write/publish about?
Which topics need attention?
Which articles no longer perform?
37
How can I
help you?
Hey Ring!
38
Hey! Users interested in "Game of Thrones, winter, anticlimax" are less active than they used to be. Consider writing more about this.
Estimated users affected: 3.2M
Hey Ring!
What can I do
better?
Editorial Insights
Platform, Streaming, OLAP
Any advanced,
unforeseen functionality
Platform and vision… Success!
Machine Learning
Project
Business Transformation
Platform-approach
The way to execute
Transformation
Complexity Enemy of Success
KEY TAKEAWAYS!
• linkedin.com/in/zmu-michal
• twitter.com/zmu_michal
• github.com/zmumi
• michal.zmuda@dreamlab.pl
• linkedin.com/in/pturek
• twitter.com/rekurencja
• github.com/turu
• piotr.turek@dreamlab.pl
THANK YOU!


How to build a personalisation platform for 30M users and not go bust in the process

Editor's notes

  1. [T] Hello everyone, thanks for coming and tuning in. Today we are going to tell the story of how to build a personalisation platform for 30M users and not go bust in the process. However, it's more than that. It's also about building formidable data-driven platforms and about conducting game-changing projects with limited resources. So this story may apply to many of you.
  2. [T] My name is Piotr Turek and this is Michał Żmuda. We hope that what we are about to present today is not just a statistical fluke, but a result of our previous experience as well as the best practices and lessons learned that we applied in this project.
  3. [T] Today we represent DreamLab, the IT hub of Ringier Axel Springer, which itself is the biggest digital publisher in CEE. Our publishing platform, Ring Publishing, powers a multitude of leading, diverse brands in 9 countries on 2 continents. Our products are used every day by over 30M real, active users. As you can imagine, our challenge was to build a personalisation platform for all these brands and people.
  4. Take a look at our journey plan for today. First, we are going to share with you a couple of reasons to believe.
  5. Platform: personalisation as a service. Success can be estimated by demand and by the scale the platform operates at, and ours is not too shabby! Quality, on the other hand, can be estimated by capabilities. The platform has been built from the start to suit a very diverse portfolio of brands. Thanks to that, we are confident offering our services to incoming clients. And when someone wants to integrate with us, we can deliver: in just a week we can start bringing value to any new client. We’ve already deployed to Onet, the largest media site in CEE. We went from initial evaluation to almost the whole homepage surface area in less than two weeks.
  6. And why has Onet adopted us so eagerly? We are making real money: take a look at the numbers. Gains in those metrics translate into gains in revenue. Revenue is important, but partnership isn’t neglected either. We’ve taken part in Onet's transition: we’ve helped them go from being ad-centric to being user-engagement oriented. //We’ve done so by providing a suitable KPI for optimization, one of many that our platform has to offer. Such partnership in reaching goals builds trust and fosters adoption.
  7. [T] Now, we’ve shown that we’ve got results. Let’s go on to what we’ve learned along the way. We are about to discuss real-world problems that could have doomed us (but they didn’t :p)
  8. [T] But first, let's briefly discuss the intuition behind the most popular and most widely studied approach to personalisation: 1-to-1 personalisation. The basic intuition is simply that whenever we need to generate recommendations for a particular user, we perform a nearest neighbour search to find the users most similar to them, based on the content they liked. It's as if the yellow guy and the pink girl actually approached their friends and asked them to recommend a few articles. This idea is a basic building block for many advanced methods and is the obvious first choice when building any kind of personalisation. However, it faces many obstacles in the world of digital publishing.
  9. [T] The goal of a digital publisher is not only to provide people with content they like, but even more so, to keep them informed about events around the world. We've got over 7K articles published every day. We have to deliver them to our users in minutes; every second counts. The traditional approach to personalisation struggles with this, because it relies on the small group of the most similar users having already had the opportunity to consume a given article. A chicken and egg problem, really. One could try using content-based approaches to find articles similar to the newly added one and, based on that, try to guess who may like the new one. We can also try to mix in global, popularity-based approaches. However, both ideas are workarounds that significantly increase the COMPLEXITY of our solution.
  10. [T] Do you know what this is? The guy on the left is a cold user suffering from the cold start problem ;) You may think that you know a lot about your users. On aggregate that may be true, but I bet that you know much less about particular users than you think (true even in subscription-based products such as Netflix). Since classical approaches to personalisation rely on finding similar users, they struggle to provide sensible recommendations to users we know little about. One could try fixing this by doing online model updates to incorporate the preferences of "new" users faster. We can also mix in global, popularity-based approaches. But again, these sound like COMPLEX and not necessarily the most performant workarounds!
  11. [T] And there is more... I bet most of you have heard about filter bubbles. It's this dangerous phenomenon which occurs often in today's Internet. Somehow you find yourself trapped in a tiny, cozy bubble of information, only ever getting things that the system is confident you will like. Have you ever wondered how filter bubbles happen? Well, one possible answer is that the basic idea of 1-to-1 personalisation is by design prone to this problem, because all it does is exploit what it already knows about you. THERE IS NO BUILT-IN EXPLORATION! You can look for various workarounds such as adding random items for fairness, or even try to build proper exploration into the system, but AGAIN, one word: COMPLEXITY.
  12. [T] Finally, perhaps the biggest elephant in the room. You know what this is? It's a big sparse matrix. Incidentally, it's the basic underlying data structure behind many 1-to-1 personalisation approaches. It's big because there are millions of users and tens of thousands of articles. It's sparse because each user consumes just a tiny, tiny fraction of all content. I guess you can intuitively feel that in such a setting, delivering recommendations with very low latency and high throughput can be very challenging and COMPLEX. Even more so if we want to update the state of the system online... and as mentioned, in digital publishing, we HAVE TO do that.
  13. [T] All this complexity and dimensionality can easily lead you to the conclusion that THE COST OF PERSONALISATION IS TOO DAMN HIGH. BUT IS IT REALLY? ;)
  14. As you can see we are clearly in need of something else…
  15. Sometimes, some assumptions/constraints must go. This is such a case – forget about 1-1 personalization. Bob & Sally don’t really care if their recommendation is UNIQUE. They care if it’s interesting to them. Instead, DECOMPOSE the whole problem into two (that's what engineers often do; divide & conquer). For us it is: Segmenting users Building algorithms which can successfully optimize for each of the segments in real time
  16. [Z] We’ve seen problems with making Collaborative Filtering work in the real world. Often going around limitations means we need to integrate Collaborative filtering with something, that works in real-time by design. We’ve identified this means trouble with complexity & costs. Instead we need clean solution working in real-time by design and which addresses our challenges from the start.
  17. [Z] Here come bandit algorithms. They are AGENTs which automatically balance exploitation and exploration. The simplest of bandit algorithms can be implemented in 30 lines of code. It’s all about getting the right data, to the right place, at the right time. The model itself can be REALLY simple and yet generate fantastic results -> „THE UNREASONABLE EFFECTIVENESS OF DATA” The bandit algorithm (if applied correctly) may address all our challenges. The secret again lies in using the data the right way. And bandits are prone to tuning go make them more effective. Even simple A/B tests may boost your gains. Hyperparameter optimization (which we are currently investigating) may take performance to the sky.
  18. [Z] How to bring the data to that right place? Here is our overall architecture. We are using continuous feedback from browsers to build event streams that are used to calculate performance measures. But we don’t stop at those simple measures, as measures are used to compute business KPIs using our OLAP qubes. With business-defined KPIs updated with sub-second latecny we deliver new recommendations, that again result in another events with feedback. The data flows with minimal latency, and thanks to use of proper technologies, flows in a cost effective way.
  19. [Z] So we’ve got the right tool for the job – bandit algorithms with an adequate data flow. Now let us discuss where segmentation fits into this picture.
  20. As we’ve mentioned… Our results presented at the beginning of the presentation prove the recommendations are attractive. Segmentation (if applied correctly) reduces costs through caching. It may be used to address the filter bubble, as coarse segments introduce greater variety in recommendations. It may address cold start: we may know very little about particular cold users, but we know SOMETHING about them as a whole. They are users who don’t visit often or who do not accept cookies, and that is what makes them a segment. This is something that 1-on-1 personalization didn’t provide – you can optimize for cold users. //If one wants to optimize that further, it is possible to segment cold users further by use of geolocation, browser info, and so on.
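The two effects mentioned here, a dedicated cold segment and cheaper serving through caching, can be sketched in a few lines. Everything below is an assumption for illustration: the profile fields (`events`, `segment`), the `min_events` threshold, and the `recommend` callable are hypothetical, and a real cache would also expire entries so recommendations stay fresh.

```python
COLD_SEGMENT = "cold"


def assign_segment(profile, min_events=5):
    """Return the user's segment, falling back to one shared segment for cold users.

    `profile` is None for users without cookies; `min_events` is an assumed threshold.
    """
    if profile is None or profile.get("events", 0) < min_events:
        return COLD_SEGMENT
    return profile["segment"]


class SegmentCache:
    """Computes recommendations once per segment and reuses them for every member."""

    def __init__(self, recommend):
        self.recommend = recommend   # hypothetical callable: segment -> list of items
        self.cache = {}
        self.misses = 0              # number of times recommendations were computed

    def get(self, segment):
        if segment not in self.cache:
            self.misses += 1
            self.cache[segment] = self.recommend(segment)
        return self.cache[segment]
```

With this shape, the amount of recommendation work depends on how many segments exist rather than on how many users arrive, and the cold segment gets its own optimised list instead of a generic fallback.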
  21. However, the most powerful benefit of segments is that costs are now tunable (as they scale with the number of segments used). Thanks to that, you can deliver to any client no matter the budget, as not everyone has enough traffic to benefit from 1-on-1 personalization or even fine-grained segments.
  22. [Z] You’ve seen why we need segmentation – now let us see how to build an architecture for providing segments. See our sample architecture (we have many). This architecture is used to provide a dynamic, unsupervised, interest-based segmentation (which involves topic modeling and clustering users around the topics found). That unsupervised method boosts trend responsiveness even more than a bandit alone! Because the process is unsupervised, it allows segments to appear spontaneously. This has proven effective – for instance, we’ve recently observed a Game of Thrones segment, which helped us harness that trend.
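As a rough illustration of the last step only, grouping users around topics, the sketch below assumes that a topic model has already produced per-user topic weights. The talk does not describe the actual clustering method, so the dominant-topic rule, the `min_weight` threshold and the `mixed` fallback segment are all assumptions.

```python
def segment_users(user_topics, min_weight=0.5):
    """Group users by their dominant topic.

    `user_topics` maps user id -> {topic: weight}, assumed to come from a topic
    model. Users without a clearly dominant topic go to a 'mixed' segment.
    """
    segments = {}
    for user, weights in user_topics.items():
        name = "mixed"
        if weights:
            topic, weight = max(weights.items(), key=lambda kv: kv[1])
            if weight >= min_weight:
                name = topic
        # A segment is created the first time a topic dominates some user,
        # so new segments appear without anyone defining them up front.
        segments.setdefault(name, []).append(user)
    return segments
```

Because segment names come straight from the topics found in the data, a newly trending topic would show up as a new key in the result as soon as enough reading behaviour concentrates on it.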
  23. [Z] We’ve got both the algorithm & the segmentation – what have we ended up with?
  24. [Z] We’ve ended up with an army! An army of multi-armed bandits, contextualized with user segments ;) And that army fights our clients’ fight – a war for user engagement, which brings us sizable spoils.
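One way to picture such an "army" is a separate set of bandit statistics per user segment. The sketch below uses Beta-Bernoulli Thompson sampling, which is an assumption: the talk does not name the bandit variant, and the class name `SegmentedThompsonBandits` and the click/no-click reward are hypothetical.

```python
import random
from collections import defaultdict


class SegmentedThompsonBandits:
    """One Thompson-sampling bandit per user segment, sharing the same arms.

    Illustrative sketch only; the variant and names are assumptions.
    """

    def __init__(self, arms, rng=None):
        self.arms = list(arms)
        self.rng = rng or random.Random()
        # [clicks, non-clicks] per (segment, arm), created lazily on first use.
        self.stats = defaultdict(lambda: [0, 0])

    def select(self, segment):
        """Sample a plausible click rate per arm for this segment; pick the highest."""
        def draw(arm):
            clicks, skips = self.stats[(segment, arm)]
            return self.rng.betavariate(clicks + 1, skips + 1)
        return max(self.arms, key=draw)

    def update(self, segment, arm, clicked):
        """Record one piece of feedback for the segment's own bandit."""
        self.stats[(segment, arm)][0 if clicked else 1] += 1
```

Each segment learns independently, so the same article can win in one segment and lose in another; a brand-new segment starts from uniform priors and explores on its own.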
  25. [T] Ok, we’ve presented the key elements of successful media personalisation… however, is it a product or a platform? To quote Jeff:
  26. [T] What he really meant by this is: products represent the "vendor knows best" mindset, while platforms are built upon a completely different philosophy of "customer knows best". Seeing how many diverse brands we need to deploy to, we knew we had to build a platform! And so we did.
  27. [T] In the case of personalisation, a platform means, among other things, that after completing one simple integration with the target brand (which automatically handles things like recommendation delivery, optimal caching or collecting user feedback), we can then use: any segmentation (not just the interest-based one); any algorithm (not just one particular type of multi-armed bandit), and in fact we already support multiple algorithms; and FINALLY, pretty much any KPI / goal function can be optimised, and we already support multiple of those as well. The point here is really that, in our opinion, you should always look for components that can be platformized and design your system with that in mind. Your future self will thank you. It certainly is true in our case, as you will see today.
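The "any segmentation, any algorithm, any KPI" idea can be sketched as a small plugin registry. This is not the platform's actual API: the class name `Platform`, the three plugin kinds as strings, the per-brand config keys and the plugin call signatures are all assumptions chosen to keep the example short.

```python
class Platform:
    """Tiny plugin registry: each brand picks a segmentation, an algorithm and a KPI.

    Assumed plugin signatures (hypothetical):
      segmentation(user) -> segment name
      kpi(segment, item) -> score to optimise
      algorithm(segment, items, kpi) -> ranked list of items
    """

    KINDS = ("segmentation", "algorithm", "kpi")

    def __init__(self):
        self.plugins = {kind: {} for kind in self.KINDS}

    def register(self, kind, name, plugin):
        """Add a plugin under a name; reject kinds the platform does not know."""
        if kind not in self.plugins:
            raise ValueError(f"unknown plugin kind: {kind}")
        self.plugins[kind][name] = plugin

    def recommend(self, brand_config, user, items):
        """Resolve the brand's chosen plugins by name and run them in sequence."""
        segmentation = self.plugins["segmentation"][brand_config["segmentation"]]
        algorithm = self.plugins["algorithm"][brand_config["algorithm"]]
        kpi = self.plugins["kpi"][brand_config["kpi"]]
        return algorithm(segmentation(user), items, kpi)
```

Under this design, onboarding a new brand is a matter of writing a config that names existing plugins, and adding a new KPI or algorithm does not touch the integration code.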
  28. [T] Once you decide to platformize, there are two fundamental building blocks that will help you build a data-driven platform, not just a product: Domain-Driven Design-inspired thinking and a Modern Data Architecture.
  29. [Z] So, you've built such a platform, deployed it successfully for 30M people, made the business part of this revolution, and achieved great results... are we finished, then, and can we keep deploying it to more and more brands until we achieve world domination?
  30. [Z] Such a change just shifts the problems elsewhere. Let us review an example of such a shift.
  31. [Z] To illustrate that with a concrete example, meet Alice, who works in one of the newsrooms as an editor. Every day she faces questions such as: (…) In the pre-personalisation era, we provided Alice and her colleagues with a multitude of analytic tools and reports to help them make those decisions.
  32. [Z] However, consider what happens once we add personalisation to the mix. We could expand the existing tools to allow digging through all these additional dimensions of data, however…
  33. [Z] Our brains were simply not designed to process so much data so quickly, especially if you consider that people in the newsroom are non-technical. They are creative types. If anything, we would kill their creativity this way.
  34. [T] Instead, we need a paradigm shift: Some years ago, Gartner described a hierarchy of analytics…
  35. [T] Obviously the goal is to go from this
  36. [T] To this
  37. [T] So when Alice asks what she can do better, the publishing platform may respond by saying that: (…)
  38. [T] You may find this much less futuristic than it seems at first glance if you've got 3 pillars in place that will give you a head start: (1) an OLAP database-oriented statistics source, so you can query it in previously unforeseen ways without costly development or increased operational costs; (2) a streaming-first processing architecture, which allows you to reuse different streams in (again) originally unforeseen ways; (3) but most of all, an extensible plugin architecture, created thanks to the platform approach described today.
  39. [T] If all three pillars are in place, you will find it surprisingly easy to add such advanced functionalities to your data-driven PLATFORM
  40. [Z] So, we’ve built a platform, we’ve deployed it & we are executing a vision of how to address the new challenges. This time it is a complete success!
  41. [Z] We would like to leave you with 3 key takeaways: Machine Learning projects shouldn’t start with throwing a bunch of data scientists at a problem, making a fuss about it and seeing what happens. They are a business transformation and should have a clear path to production, which means they need engineering, a platform approach and simplicity. The really complex machine learning (shooting for the stars) should come into play only once you have the trust of stakeholders & a proven baseline in production.
  42. [Z] Thanks for your attention, it was a pleasure. We would love to hear your thoughts - find us later.