How Mobile.de brings Data Science to Production for a Personalized Web Experience
Dr. Markus Schüler & Dr. Florian Wilhelm
2018-07-08, PyData 2018, Berlin
2
Introduction
@FlorianWilhelm
FlorianWilhelm
florianwilhelm.info
Dr. Florian Wilhelm
Data Scientist
inovex GmbH
Dr. Markus Schüler
Data Scientist & Team Lead
mobile.de GmbH
3
Agenda
• General Introduction
• Personalization Use Cases at mobile.de
• Predicting Car Buying Intent
• Python for Big Data Processing
• Optimizing Performance
4
5
MOBILE.DE
• German market leader
• 13.5 million unique users per month
• 1.6 million vehicles
• 290 employees
• Headquarters: Dreilinden / Friedrichshain, Berlin
• Part of eBay Tech
6
IT-project house for digital transformation:
‣ Agile Development & Management
‣ Web · UI/UX · Replatforming · Microservices
‣ Mobile · Apps · Smart Devices · Robotics
‣ Big Data & Business Intelligence Platforms
‣ Data Science · Data Products · Search · Deep Learning
‣ Data Center Automation · DevOps · Cloud · Hosting
‣ Trainings & Coachings
Using technology to inspire our
clients. And ourselves.
inovex offices in
Karlsruhe · Cologne · Munich ·
Pforzheim · Hamburg · Stuttgart.
www.inovex.de
7
Why Recommendations? Why Personalization?
Inspiration
Engagement
Memory of past interactions
You are unique!
8
Why Personalization?
Data-driven personalization improves:
• User experience
• User engagement
Source: https://www.kleinerperkins.com/perspectives/internet-trends-report-2018
9
Personalization at mobile.de
(Overview diagram: User Event Tracking & Storage; Daily preference profiles; Daily activity profiles; Recommendations; Segmentation. Legend: User Car Preferences, User Interactions)
Example user profile (User 12345):
Looking For: Used Car (100%)
Prefers (Make): BMW (50%), Audi (50%)
Prefers (Model): Audi A3 (25%), Audi A4 (25%), BMW 318 (50%)
Searching In: lat 52.5206, lon 13.409
Search Radius: 300km
Preferred Price: 20 000€ ± 1500€
Preferred Mileage: 10 000km ± 5000km
Buyer · Last Action: Yesterday · Frequent User
Likelihood to buy: 88 %
10
User Car Preference Example
User preferences based on the user's interactions
(Diagram labels: User Preferences, Marketing, Anonymous; example profile of User 12345 as on slide 9)
11
Uncertainty Quantification
Bayesian approach: Posterior probability ∝ Likelihood × Prior probability
• Prior based on all users (the average user), e.g. 30% Volkswagen, 25% gray, 50% automatic, 8% SUV, 10,000 €
• User profile + prior → posterior user profile
(Plot: impact of the prior (avg. user) vs. number of user events)
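The slide only states Bayes' rule. For categorical preferences such as the preferred make, one common way to realise it is a Dirichlet-multinomial model in which the shares across all users act as pseudo-counts. The sketch below assumes exactly that; `posterior_preferences`, `prior_weight` and the `prior_strength` parameter are hypothetical names, not mobile.de's implementation.

```python
import numpy as np


def posterior_preferences(user_counts, prior_shares, prior_strength=10.0):
    """Posterior mean preference shares under a Dirichlet-multinomial model.

    user_counts:    number of events per category for one user (e.g. views per make)
    prior_shares:   category shares across all users (the "average user"), summing to 1
    prior_strength: pseudo-count weight of the prior (hypothetical tuning parameter)
    """
    counts = np.asarray(user_counts, dtype=float)
    alpha = prior_strength * np.asarray(prior_shares, dtype=float)
    return (counts + alpha) / (counts.sum() + alpha.sum())


def prior_weight(n_events, prior_strength=10.0):
    """Fraction of the posterior mean contributed by the prior; shrinks as events accumulate."""
    return prior_strength / (n_events + prior_strength)


# Two categories: Volkswagen (30% prior share, as on the slide) vs. all other makes.
prior = [0.3, 0.7]
print(posterior_preferences([0, 0], prior))   # no events: the posterior equals the prior
print(posterior_preferences([0, 10], prior))  # 10 non-VW events: VW share drops to 0.15
```

The posterior mean equals `prior_weight` × prior share + (1 − `prior_weight`) × the user's own share, so the prior dominates for users with few events and fades as events accumulate, which is one way to read the plot on this slide.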
12
Recommendation
Inputs to the Mobile.de Recommendation Engine:
• All listings and the features of each vehicle
• Content-based information (user preferences, e.g. the profile of User 12345)
• Collaborative information
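The slide does not say how the engine combines content-based and collaborative information. A minimal sketch, assuming a linear blend: the content-based part matches listing features against the profile (make preference times a Gaussian match around the preferred price), and the collaborative scores are taken as given. The scoring rule, `alpha` and all function names are hypothetical.

```python
import math


def content_score(listing, prefs):
    """Content-based match: make preference times a Gaussian price match (hypothetical rule)."""
    make_pref = prefs["make"].get(listing["make"], 0.0)
    z = (listing["price"] - prefs["price_mean"]) / prefs["price_std"]
    return make_pref * math.exp(-0.5 * z * z)


def rank_listings(listings, prefs, collab_scores, alpha=0.7, k=10):
    """Blend content-based and collaborative scores and return the top-k listing ids.

    collab_scores: listing id -> score from some collaborative model (assumed given)
    alpha:         hypothetical weight of the content-based part
    """
    scored = [
        (alpha * content_score(item, prefs) + (1 - alpha) * collab_scores.get(item["id"], 0.0),
         item["id"])
        for item in listings
    ]
    scored.sort(key=lambda t: t[0], reverse=True)
    return [listing_id for _, listing_id in scored[:k]]


# Preferences from the example profile: BMW/Audi 50% each, price 20 000 € ± 1500 €.
prefs = {"make": {"BMW": 0.5, "Audi": 0.5}, "price_mean": 20000, "price_std": 1500}
listings = [
    {"id": "a", "make": "BMW", "price": 20000},
    {"id": "b", "make": "Audi", "price": 30000},
    {"id": "c", "make": "Volkswagen", "price": 20000},
]
print(rank_listings(listings, prefs, collab_scores={}))
print(rank_listings(listings, prefs, collab_scores={"c": 1.0}))
```

With no collaborative signal the preferred-make, well-priced BMW ranks first; a strong collaborative score lifts the Volkswagen above the overpriced Audi even though it does not match the content preferences.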
13
Personalization at mobile.de (overview diagram repeated from slide 9)
14
Different User Intents
“I have no idea about
cars. I need basic
information and
guidance.”
“I’m a car expert.
Lead me to the
best deals in the
fastest way.”
“I love to browse
expensive cars,
yet I have
no buying intent.”
“As a dealer, I need
detailed data to
compare my own
listings with my
competitor’s”
15
Events of a Car Buying Journey
contacts
parkings
views
16
Analysing events of car buyers
                        Control        Buyers
Events total         72,621,069     2,500,771
Median events               153           188
Median days active           22            15
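Statistics like those in the table can be derived from a raw event log with two pandas group-bys: first per user, then per group. The sketch assumes a hypothetical schema with columns `user_id`, `group` and `date` (calendar day of the event); the toy data is invented for illustration.

```python
import pandas as pd


def journey_summary(events):
    """Per-group statistics: total events, median events per user, median active days."""
    per_user = events.groupby(["group", "user_id"]).agg(
        n_events=("date", "count"),
        days_active=("date", "nunique"),
    )
    summary = per_user.groupby("group").agg(
        events_total=("n_events", "sum"),
        median_events=("n_events", "median"),
        median_days_active=("days_active", "median"),
    )
    return summary.T  # one column per group, one row per statistic


events = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 3, 3],
    "group": ["buyers", "buyers", "buyers", "control", "control", "control"],
    "date": pd.to_datetime(["2018-06-01", "2018-06-01", "2018-06-03",
                            "2018-06-02", "2018-06-02", "2018-06-05"]),
})
print(journey_summary(events))
```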
17
User Events: Event counts
(Plots: average count of contact, parking, search and view events over the position in the user journey (0.0 to 1.0), buyers vs. control group, each with local mean, linear model and lowess fit)
• contact: buyer slope p = 1.815e−22 ***; control intercept diff p = 9.823e−02 .; control slope diff p = 9.956e−04 ***
• parking: buyer slope p = 7.999e−06 ***; control intercept diff p = 1.399e−21 ***; control slope diff p = 6.702e−06 ***
• search: buyer slope p = 6.694e−51 ***; control intercept diff p = 1.141e−01; control slope diff p = 9.044e−07 ***
• view: buyer slope p = 1.824e−08 ***; control intercept diff p = 2.506e−45 ***; control slope diff p = 2.824e−02 *
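The panel statistics ("buyer slope", "control intercept diff", "control slope diff") read like the coefficients of a linear model with a group interaction in which buyers are the reference group. Assuming that, the coefficients can be estimated by ordinary least squares as sketched below. The p-values on the slide additionally need standard errors (e.g. from statsmodels), which this numpy-only sketch omits; `fit_group_trend` is a hypothetical name.

```python
import numpy as np

COEF_NAMES = ["buyer_intercept", "buyer_slope", "control_intercept_diff", "control_slope_diff"]


def fit_group_trend(position, count, is_control):
    """OLS fit of count ~ 1 + position + control + position:control (buyers = reference).

    position:   position of the observation in the user journey (0..1)
    count:      average event count at that position
    is_control: 1 for the control group, 0 for buyers
    """
    position = np.asarray(position, dtype=float)
    is_control = np.asarray(is_control, dtype=float)
    X = np.column_stack([
        np.ones_like(position),   # buyer intercept
        position,                 # buyer slope
        is_control,               # intercept difference of the control group
        position * is_control,    # slope difference of the control group
    ])
    coef, *_ = np.linalg.lstsq(X, np.asarray(count, dtype=float), rcond=None)
    return dict(zip(COEF_NAMES, coef))


# Noise-free toy data: buyers follow 1 + 2x, the control group 0.5 + 0.5x.
x = np.linspace(0, 1, 11)
pos = np.concatenate([x, x])
grp = np.concatenate([np.zeros_like(x), np.ones_like(x)])
cnt = np.where(grp == 1, 0.5 + 0.5 * pos, 1 + 2 * pos)
print(fit_group_trend(pos, cnt, grp))
```

A significant "control slope diff" then means the two groups' event counts change at different rates over the journey.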
18
User Events: Duplicated views
(Plot: amount of duplicated views over the position in the user journey, buyers vs. control group)
• Buyers look at cars they have already seen more often than the control group, and their ratio increases faster (both significant)
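Both the "position in user journey" axis and the duplicated-view ratio can be computed per user from a view log. The sketch below assumes a hypothetical schema (`user_id`, `listing_id`, numeric `timestamp`) and normalises each event's time so that a user's first event sits at 0 and the last at 1; a view counts as duplicated if the same user viewed the same listing earlier.

```python
import pandas as pd


def add_journey_position(events):
    """Add 'position': 0 = user's first event, 1 = user's last event (numeric timestamps)."""
    ts = events["timestamp"].astype(float)
    by_user = ts.groupby(events["user_id"])
    start, end = by_user.transform("min"), by_user.transform("max")
    span = (end - start).where(end > start)          # NaN for single-event users
    out = events.copy()
    out["position"] = ((ts - start) / span).fillna(0.0)
    return out


def duplicated_view_share(views):
    """Per user: share of views on listings that the user had already viewed before."""
    views = views.sort_values(["user_id", "timestamp"])
    repeat = views.duplicated(subset=["user_id", "listing_id"])  # first view of a listing -> False
    return repeat.groupby(views["user_id"]).mean()


views = pd.DataFrame({
    "user_id": [1, 1, 1, 2],
    "listing_id": ["a", "b", "a", "c"],
    "timestamp": [0, 5, 10, 3],
})
print(add_journey_position(views)["position"].tolist())
print(duplicated_view_share(views).to_dict())
```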
19
When did buyers interact with the car they bought?
• Buyers view "their" car most often at about 4/5 of the way through their user journey
(Plot: "When do buyers view the car they buy?", % of users vs. position in user journey)
20
ML Model: How close to buy?
• Aim: predict how likely a user is to make their buying decision today
• Personalization
  • Highlight dealer contact details
  • Provide car buying assistance
21
Feature Generation
Features:
• Event counts (view, search, contact, parking)
• Share of each event type among all events (e.g. % views among all events)
• a = number of active days, b = maximum difference between active days, a/b
• Additional features:
  • Views/(Searches + Views)
  • % of duplicated views among all views
(Figure: the 30 days before the buying date (= day 0), split into windows of 0-2 days, 3-9 days and 10-30 days, with ratios between them)
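A minimal sketch of the windowed features, assuming a hypothetical event log with columns `user_id`, `event_type` and `date`, and a reference date per model run (the buying date for buyers; how the reference date is chosen for other users is not stated on the slide). It produces one count column per event type and window, plus each event type's share among all events in the 30 days.

```python
import pandas as pd

# Windows in days before the reference date, taken from the slide: 0-2, 3-9 and 10-30 days.
WINDOWS = [(0, 2), (3, 9), (10, 30)]


def window_features(events, ref_date, windows=WINDOWS):
    """Event counts per event type and window, plus overall event-type shares."""
    days_before = (ref_date - events["date"]).dt.days
    parts = []
    for lo, hi in windows:
        in_window = events[(days_before >= lo) & (days_before <= hi)]
        if in_window.empty:
            continue
        counts = in_window.groupby(["user_id", "event_type"]).size().unstack(fill_value=0)
        counts.columns = [f"{c}_{lo}_{hi}" for c in counts.columns]
        parts.append(counts)
    feats = pd.concat(parts, axis=1).fillna(0) if parts else pd.DataFrame()

    # Share of each event type among all events of the user within the full 30 days.
    recent = events[(days_before >= 0) & (days_before <= windows[-1][1])]
    shares = pd.crosstab(recent["user_id"], recent["event_type"], normalize="index")
    shares.columns = [f"share_{c}" for c in shares.columns]
    return feats.join(shares, how="left").fillna(0)


ref = pd.Timestamp("2018-06-30")
events = pd.DataFrame({
    "user_id": [1, 1, 1, 2],
    "event_type": ["view", "view", "search", "contact"],
    "date": pd.to_datetime(["2018-06-29", "2018-06-25", "2018-06-10", "2018-06-30"]),
})
print(window_features(events, ref))
```

Changing `windows` is all it takes to try the alternative splits discussed under window size optimization.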
22
Modelling
• Logistic Regression
• Automatic Feature Selection
  • start from different sub-selections of features (like "all", "no ratios", etc.)
  • allow addition and removal of features based on minimizing AIC
  • needed to prevent overfitting
• Window optimization
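The feature selection described above resembles bidirectional stepwise selection. Lower AIC (2k − 2 log L) is better, so a step is kept only if it lowers AIC. The sketch below fits the logistic regression with a small Newton-Raphson loop to stay dependency-free (numpy only); it is an illustration under these assumptions, not the talk's code, and all names and the toy data are hypothetical.

```python
import numpy as np


def fit_logistic(X, y, n_iter=100, tol=1e-10):
    """Logistic regression via Newton-Raphson; X already contains an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        step = np.linalg.solve(hess + 1e-9 * np.eye(X.shape[1]), grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    p = np.clip(1.0 / (1.0 + np.exp(-X @ beta)), 1e-12, 1.0 - 1e-12)
    return beta, np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))


def aic(features, cols, y):
    """AIC = 2k - 2 log L for a logistic model on the given feature columns."""
    X = np.column_stack([np.ones(len(y))] + [features[c] for c in cols])
    _, log_lik = fit_logistic(X, y)
    return 2 * X.shape[1] - 2 * log_lik


def stepwise_aic(features, y, start=()):
    """Bidirectional stepwise selection: add or drop one feature while AIC decreases."""
    current = list(start)
    best = aic(features, current, y)
    while True:
        candidates = [current + [c] for c in features if c not in current]
        candidates += [[c for c in current if c != d] for d in current]
        if not candidates:
            return current, best
        score, cols = min(((aic(features, cand, y), cand) for cand in candidates),
                          key=lambda t: t[0])
        if score >= best:  # no candidate lowers AIC: stop
            return current, best
        best, current = score, cols


# Toy data: only x1 influences the outcome; x2 and x3 are noise.
rng = np.random.default_rng(0)
n = 2000
features = {name: rng.normal(size=n) for name in ["x1", "x2", "x3"]}
y = (rng.random(n) < 1 / (1 + np.exp(-(0.5 + 2.0 * features["x1"])))).astype(float)

for start in ([], ["x1", "x2", "x3"]):  # different starting sub-selections
    print(start, "->", stepwise_aic(features, y, start))
```

Running the search from several starting sub-selections, as the slide suggests, guards against getting stuck in a locally optimal feature set.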
23
Window size optimization
• Window sizes and the number of windows were used as optimization parameters
Candidate splits of the 30 days before the buying date (= day 0):
• 0-2 days | 3-9 days | 10-30 days
• 0 days | 1-9 days | 10-30 days
• 0 days | 1-7 days | 8-30 days
• 0-9 days | 10-19 days | 20-30 days
• 0 days | 1-4 days | 5-9 days | 10-30 days
24
Modelling
• Logistic Regression
• Automatic Feature Selection
  • start from different sub-selections of features (like "all", "no ratios", etc.)
  • allow addition and removal of features based on minimizing AIC
  • needed to prevent overfitting
• Window optimization
• Cross-validation (15 folds, each a 70/30 train/test split)
25
Results
Prediction: the user made their buying decision today
(Plot: "Modelling statistics: closeToBuy_now_cid", accuracy, sensitivity and specificity (axis 0.65 to 0.80) of five models, Model1 to Model5: closeToBuy_now_0−1−10−30_cid, closeToBuy_now_0−1−7−30_cid, closeToBuy_now_0−10−20−30_cid, closeToBuy_now_0−3−10−30_cid, closeToBuy_now_0−5−10−30_cid)
Best model: 72% accuracy / 68% sensitivity / 76% specificity
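The three reported statistics come straight from the confusion matrix. The sketch below also shows one reading of "15 fold, 70/30 train/test split": 15 random 70/30 splits whose statistics are averaged. That interpretation, and the `fit`/`predict` callables, are assumptions for illustration.

```python
import numpy as np


def classification_stats(y_true, y_pred):
    """Accuracy, sensitivity (true positive rate) and specificity (true negative rate)."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred)
    tn = np.sum(~y_true & ~y_pred)
    fp = np.sum(~y_true & y_pred)
    fn = np.sum(y_true & ~y_pred)
    return {
        "accuracy": (tp + tn) / y_true.size,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def repeated_holdout(X, y, fit, predict, n_repeats=15, train_frac=0.7, seed=0):
    """Average the statistics over n_repeats random train/test splits (70/30 by default)."""
    rng = np.random.default_rng(seed)
    n_train = int(round(train_frac * len(y)))
    runs = []
    for _ in range(n_repeats):
        idx = rng.permutation(len(y))
        train, test = idx[:n_train], idx[n_train:]
        model = fit(X[train], y[train])
        runs.append(classification_stats(y[test], predict(model, X[test])))
    return {key: float(np.mean([r[key] for r in runs])) for key in runs[0]}


print(classification_stats([1, 1, 0, 0], [1, 0, 0, 0]))
```

Since accuracy = π · sensitivity + (1 − π) · specificity for a positive share π, the best model's figures (72 = π · 68 + (1 − π) · 76) imply π = 0.5, i.e. a balanced evaluation set up to rounding.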
26
Buys tomorrow, next week, next two weeks
(Plot: accuracy, sensitivity and specificity (0% to 80%) for Buy Today, Buy Tomorrow, Buy in a Week, Buy in two Weeks)
• Considerably lower predictive power when predicting events further in the future
• Still room for improvement
27
Python & Big Data
BIG
DATA
28
Hive for heavy lifting
• Apache project
• built on top of Hadoop
• SQL interface to your data
• basically a MapReduce abstraction layer
• robust and mature
• but slow and certainly not "interactive"
Data Team:
• used for batch processing of user preferences, user segmentation etc.
• PyHive by Dropbox for Python support
• usage of Python-based UD(A)Fs
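Hive's native UDFs are written in Java; a common way to run Python logic inside Hive queries is the streaming `TRANSFORM ... USING` clause, where Hive pipes rows to a script as tab-separated lines on stdin and reads tab-separated lines back from stdout. The slide does not say which mechanism mobile.de used, so this is only a sketch under that assumption; the table `views`, its columns and the script name `make_shares.py` are hypothetical.

```python
import sys
from itertools import groupby

# Hypothetical Hive usage (not executed here):
#   ADD FILE make_shares.py;
#   SELECT TRANSFORM(user_id, make) USING 'python make_shares.py' AS (user_id, make, share)
#   FROM (SELECT user_id, make FROM views DISTRIBUTE BY user_id SORT BY user_id) t;
# DISTRIBUTE BY / SORT BY make all rows of one user arrive contiguously at one script.


def make_shares(lines):
    """Reducer logic: 'user_id<TAB>make' lines grouped by user -> 'user_id<TAB>make<TAB>share'."""
    rows = (line.rstrip("\n").split("\t") for line in lines)
    for user_id, group in groupby(rows, key=lambda r: r[0]):
        makes = [r[1] for r in group]
        for make in sorted(set(makes)):
            yield f"{user_id}\t{make}\t{makes.count(make) / len(makes):.4f}"


def main(stdin=sys.stdin, stdout=sys.stdout):
    """Entry point; the script file shipped to Hive would end with: if __name__ == "__main__": main()"""
    for out_line in make_shares(stdin):
        stdout.write(out_line + "\n")
```

Such a script is effectively a UDAF (per-user make shares); its speed is bounded by the text serialization between Hive and Python, which is one reason UD(A)Fs are much slower than native functions.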
29
User Defined Functions (UDFs)
User defined (aggregation) functions:
• needed when native functions aren't sufficient
• are usually much slower than native functions
• work on a column or multiple (grouped) columns
• are vector-valued operations and/or aggregations
(Figure: transform, aggregate, apply)
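The three shapes named in the figure map directly onto pandas group-by operations, which makes them easy to illustrate locally: a transform returns one value per input row, an aggregation one value per group, and apply an arbitrary number of values per group. The toy data below is invented.

```python
import pandas as pd

df = pd.DataFrame({"user_id": [1, 1, 2, 2, 2], "price": [10.0, 20.0, 5.0, 5.0, 20.0]})
prices = df.groupby("user_id")["price"]

# transform: one output per input row, e.g. price relative to the user's mean price
centered = prices.transform(lambda s: s - s.mean())

# aggregate: one output per group, e.g. the user's mean price
mean_price = prices.agg("mean")

# apply: arbitrary output size per group, e.g. the user's two most expensive viewed cars
top2 = prices.apply(lambda s: s.nlargest(2))

print(centered.tolist(), mean_price.to_dict(), top2.tolist(), sep="\n")
```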
30
PySpark for fast analysis and machine learning
Apache Spark (fast and general engine for large-scale data processing) + Python = PySpark
31
Conversion Example of User Preferences
Hive:
• 2483 lines of code
• Jinja2 to generate SQL queries
• Temporary tables for performance
• Runtime 5-10 h
• Logic hard to understand at times
Spark:
• 1745 lines of code
• programmatic definition of queries
• No temporary tables needed
• Runtime 1-2 h
• Quite easy to understand
32
How Spark works
Driver program, e.g. JupyterLab
Source: Spark documentation
33
How do Python UD(A)Fs work?
Source: Spark documentation
34
Apache Arrow
Source: Arrow documentation
35
PySpark & Pandas
Vectorized UDFs for Spark 2.3:
• built on top of Apache Arrow,
• avoid high serialization and invocation overhead,
• replace row-at-a-time UDFs and cumulative UDAFs with vectorized versions,
• as flexible as Pandas' apply
Source: databricks blog
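With vectorized UDFs the Python side is ordinary pandas code: a scalar UDF receives a whole batch as a pandas Series, a grouped-map UDF receives all rows of one group as a DataFrame. The sketch below keeps the pandas logic testable without a cluster and shows the Spark 2.3 registration only in comments; the grouped-map function follows the "subtract mean" pattern from the Databricks post, while the column names are hypothetical.

```python
import pandas as pd


def plus_one(v: pd.Series) -> pd.Series:
    """Scalar Pandas UDF logic: operates on a whole batch (pandas Series) at once."""
    return v + 1


def subtract_user_mean(pdf: pd.DataFrame) -> pd.DataFrame:
    """Grouped-map Pandas UDF logic: receives all rows of one user as a DataFrame."""
    return pdf.assign(price=pdf["price"] - pdf["price"].mean())


# Registration with Spark 2.3 (not executed here), roughly:
#   from pyspark.sql.functions import pandas_udf, PandasUDFType
#   plus_one_udf = pandas_udf(plus_one, "double", PandasUDFType.SCALAR)
#   centered_udf = pandas_udf(subtract_user_mean, df.schema, PandasUDFType.GROUPED_MAP)
#   df.withColumn("price_plus_one", plus_one_udf(df["price"]))
#   df.groupby("user_id").apply(centered_udf)

# Local check without Spark: simulate the per-group calls Spark would make.
pdf = pd.DataFrame({"user_id": [1, 1, 2], "price": [10.0, 20.0, 5.0]})
print(pd.concat([subtract_user_mean(g) for _, g in pdf.groupby("user_id")]))
```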
36
Performance gains
Source: https://databricks.com/blog/2017/10/30/introducing-vectorized-udfs-for-pyspark.html
37
But what if Spark < 2.3?
It's possible to write flexible UD(A)Fs by
• using RDD functionality, df.rdd.mapPartitions(my_func)
• converting low-level Row objects to a Pandas dataframe
• wrapping everything into a nice decorator
Detailed information under:
https://www.inovex.de/blog/efficient-udafs-with-pyspark/
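The three steps above can be sketched as a decorator that turns a pandas DataFrame → DataFrame function into a function suitable for `rdd.mapPartitions`. This is an illustration of the idea, not the code from the linked blog post; `pandas_partition` and `events_per_user` are hypothetical names. Rows are converted via `Row.asDict()`, and plain dicts are accepted as well so the logic can be tested without Spark.

```python
import functools

import pandas as pd


def pandas_partition(func):
    """Turn a pandas DataFrame -> DataFrame function into a function for rdd.mapPartitions.

    Rows may be pyspark.sql.Row objects (converted via asDict()) or plain dicts.
    The wrapped function yields plain tuples, one per row of the result.
    """
    @functools.wraps(func)
    def wrapper(rows):
        records = [r.asDict() if hasattr(r, "asDict") else dict(r) for r in rows]
        if not records:  # empty partition
            return iter([])
        result = func(pd.DataFrame.from_records(records))
        return result.itertuples(index=False, name=None)
    return wrapper


@pandas_partition
def events_per_user(pdf):
    """Hypothetical example UDAF: count events per user within one partition."""
    return pdf.groupby("user_id").size().reset_index(name="n_events")


# On a cluster (not executed here), e.g.:
#   df.repartition("user_id").rdd.mapPartitions(events_per_user).toDF(["user_id", "n_events"])
# Repartitioning by user_id keeps all rows of a user in one partition, so the
# per-user aggregates are complete.
```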
38
Isolated environments with PySpark
39
Concept
• create a local environment based on wheels,
• upload the unpacked wheels to HDFS,
• read and distribute these Python packages from the Spark driver to the executors with sc.addFile,
• use the packages on the executors, e.g. in a UDF.
Detailed information under:
https://www.inovex.de/blog/managing-isolated-environments-with-pyspark/
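On the executor side, the last step comes down to putting the distributed directory on `sys.path` before importing from it. A minimal sketch, assuming the unpacked wheels were shipped as one directory with `sc.addFile(..., recursive=True)` and are located via `SparkFiles.get`; the helper name, the HDFS path and the package name are hypothetical, and only the pure-Python part is executed here.

```python
import os
import sys


def activate_package_dir(root):
    """Put an unpacked-wheel directory on sys.path (once) so its packages become importable."""
    root = os.path.abspath(root)
    if not os.path.isdir(root):
        raise FileNotFoundError(root)
    if root not in sys.path:
        sys.path.insert(0, root)
    return root


# Driver side (hypothetical HDFS path, not executed here):
#   sc.addFile("hdfs:///user/me/wheels/unpacked", recursive=True)
# Executor side, e.g. at the start of a UDF:
#   from pyspark import SparkFiles
#   activate_package_dir(SparkFiles.get("unpacked"))
#   import my_package  # hypothetical package from the uploaded wheels
```

Calling the helper is idempotent, so it is safe to invoke it at the start of every UDF call on an executor.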
40
Architecture
41
Summary
PyData Stack
Interesting & Challenging Use Cases
Data Science
Data Engineering
Business Impact
42
Any Questions?
How mobile.de brings Data Science to Production for a Personalized Web Experience

More Related Content

What's hot

Artwork Personalization at Netflix
Artwork Personalization at NetflixArtwork Personalization at Netflix
Artwork Personalization at Netflix
Justin Basilico
 
Interactive Recommender Systems with Netflix and Spotify
Interactive Recommender Systems with Netflix and SpotifyInteractive Recommender Systems with Netflix and Spotify
Interactive Recommender Systems with Netflix and Spotify
Chris Johnson
 

What's hot (20)

Computer Vision Introduction
Computer Vision IntroductionComputer Vision Introduction
Computer Vision Introduction
 
Image retrieval
Image retrievalImage retrieval
Image retrieval
 
Deep neural networks for Youtube recommendations
Deep neural networks for Youtube recommendationsDeep neural networks for Youtube recommendations
Deep neural networks for Youtube recommendations
 
You only look once (YOLO) : unified real time object detection
You only look once (YOLO) : unified real time object detectionYou only look once (YOLO) : unified real time object detection
You only look once (YOLO) : unified real time object detection
 
Object Detection Methods using Deep Learning
Object Detection Methods using Deep LearningObject Detection Methods using Deep Learning
Object Detection Methods using Deep Learning
 
Security of Machine Learning
Security of Machine LearningSecurity of Machine Learning
Security of Machine Learning
 
Marel Q3 2022 Investor Presentation
Marel Q3 2022 Investor PresentationMarel Q3 2022 Investor Presentation
Marel Q3 2022 Investor Presentation
 
Artwork Personalization at Netflix
Artwork Personalization at NetflixArtwork Personalization at Netflix
Artwork Personalization at Netflix
 
Privacy-preserving Data Mining in Industry (WWW 2019 Tutorial)
Privacy-preserving Data Mining in Industry (WWW 2019 Tutorial)Privacy-preserving Data Mining in Industry (WWW 2019 Tutorial)
Privacy-preserving Data Mining in Industry (WWW 2019 Tutorial)
 
Interactive Recommender Systems with Netflix and Spotify
Interactive Recommender Systems with Netflix and SpotifyInteractive Recommender Systems with Netflix and Spotify
Interactive Recommender Systems with Netflix and Spotify
 
[PR12] You Only Look Once (YOLO): Unified Real-Time Object Detection
[PR12] You Only Look Once (YOLO): Unified Real-Time Object Detection[PR12] You Only Look Once (YOLO): Unified Real-Time Object Detection
[PR12] You Only Look Once (YOLO): Unified Real-Time Object Detection
 
Real-time object detection coz YOLO!
Real-time object detection coz YOLO!Real-time object detection coz YOLO!
Real-time object detection coz YOLO!
 
A Brief History of Object Detection / Tommi Kerola
A Brief History of Object Detection / Tommi KerolaA Brief History of Object Detection / Tommi Kerola
A Brief History of Object Detection / Tommi Kerola
 
Netflix - Realtime Impression Store
Netflix - Realtime Impression Store Netflix - Realtime Impression Store
Netflix - Realtime Impression Store
 
Engagement, metrics and "recommenders"
Engagement, metrics and "recommenders"Engagement, metrics and "recommenders"
Engagement, metrics and "recommenders"
 
Deep learning for object detection
Deep learning for object detectionDeep learning for object detection
Deep learning for object detection
 
YOLO v1
YOLO v1YOLO v1
YOLO v1
 
Object detection with deep learning
Object detection with deep learningObject detection with deep learning
Object detection with deep learning
 
Computer Vision – From traditional approaches to deep neural networks
Computer Vision – From traditional approaches to deep neural networksComputer Vision – From traditional approaches to deep neural networks
Computer Vision – From traditional approaches to deep neural networks
 
Making Confluence an Enterprise Standard for Knowledge Management - Atlassian...
Making Confluence an Enterprise Standard for Knowledge Management - Atlassian...Making Confluence an Enterprise Standard for Knowledge Management - Atlassian...
Making Confluence an Enterprise Standard for Knowledge Management - Atlassian...
 

Similar to How mobile.de brings Data Science to Production for a Personalized Web Experience

Which car fits my life? Mobile.de’s approach to recommendations
Which car fits my life? Mobile.de’s approach to recommendationsWhich car fits my life? Mobile.de’s approach to recommendations
Which car fits my life? Mobile.de’s approach to recommendations
inovex GmbH
 
BYD Seal Atto 4 SR 2024 Price in Europe Review & Features.pdf
BYD Seal Atto 4 SR 2024 Price in Europe Review & Features.pdfBYD Seal Atto 4 SR 2024 Price in Europe Review & Features.pdf
BYD Seal Atto 4 SR 2024 Price in Europe Review & Features.pdf
Umair Aijaz
 
Digiprog iii digiprog_3_html
Digiprog iii digiprog_3_htmlDigiprog iii digiprog_3_html
Digiprog iii digiprog_3_html
EchoCullen
 
Trends shaping the automotive remarketing industry melinda zabritski
Trends shaping the automotive remarketing industry   melinda zabritskiTrends shaping the automotive remarketing industry   melinda zabritski
Trends shaping the automotive remarketing industry melinda zabritski
IARAWeb
 

Similar to How mobile.de brings Data Science to Production for a Personalized Web Experience (20)

Which car fits my life? - PyData Berlin 2017
Which car fits my life? - PyData Berlin 2017Which car fits my life? - PyData Berlin 2017
Which car fits my life? - PyData Berlin 2017
 
Which car fits my life? Mobile.de’s approach to recommendations
Which car fits my life? Mobile.de’s approach to recommendationsWhich car fits my life? Mobile.de’s approach to recommendations
Which car fits my life? Mobile.de’s approach to recommendations
 
Deep Learning-based Recommendations for Germany's Biggest Online Vehicle Mark...
Deep Learning-based Recommendations for Germany's Biggest Online Vehicle Mark...Deep Learning-based Recommendations for Germany's Biggest Online Vehicle Mark...
Deep Learning-based Recommendations for Germany's Biggest Online Vehicle Mark...
 
Toyota Fortuner - Price, Images & Specification
Toyota Fortuner - Price, Images & SpecificationToyota Fortuner - Price, Images & Specification
Toyota Fortuner - Price, Images & Specification
 
HySolarKit - Solar Hybridization of Conventional Vehicles
HySolarKit - Solar Hybridization of Conventional Vehicles HySolarKit - Solar Hybridization of Conventional Vehicles
HySolarKit - Solar Hybridization of Conventional Vehicles
 
European Car Market Analysis
European Car Market AnalysisEuropean Car Market Analysis
European Car Market Analysis
 
BYD Seal Atto 4 SR 2024 Price in Europe Review & Features.pdf
BYD Seal Atto 4 SR 2024 Price in Europe Review & Features.pdfBYD Seal Atto 4 SR 2024 Price in Europe Review & Features.pdf
BYD Seal Atto 4 SR 2024 Price in Europe Review & Features.pdf
 
Cars on the Go - Project
Cars on the Go - ProjectCars on the Go - Project
Cars on the Go - Project
 
Digiprog iii digiprog_3_html
Digiprog iii digiprog_3_htmlDigiprog iii digiprog_3_html
Digiprog iii digiprog_3_html
 
Sf city8222016
Sf city8222016Sf city8222016
Sf city8222016
 
Automotive Industry Disruption
Automotive Industry Disruption Automotive Industry Disruption
Automotive Industry Disruption
 
Digiprog III
Digiprog IIIDigiprog III
Digiprog III
 
5-Forces Analysis, S-W-O-T, Strategic Recommendations - Car2Go (2017)
5-Forces Analysis, S-W-O-T, Strategic Recommendations - Car2Go (2017)5-Forces Analysis, S-W-O-T, Strategic Recommendations - Car2Go (2017)
5-Forces Analysis, S-W-O-T, Strategic Recommendations - Car2Go (2017)
 
Competitive Spotlight: Measuring Design Effectiveness in the UK Car Rental Ma...
Competitive Spotlight: Measuring Design Effectiveness in the UK Car Rental Ma...Competitive Spotlight: Measuring Design Effectiveness in the UK Car Rental Ma...
Competitive Spotlight: Measuring Design Effectiveness in the UK Car Rental Ma...
 
Trends shaping the automotive remarketing industry melinda zabritski
Trends shaping the automotive remarketing industry   melinda zabritskiTrends shaping the automotive remarketing industry   melinda zabritski
Trends shaping the automotive remarketing industry melinda zabritski
 
Sf south bay8202016
Sf south bay8202016Sf south bay8202016
Sf south bay8202016
 
Chicago8142016
Chicago8142016Chicago8142016
Chicago8142016
 
101118 Car Pass Mileage Fraud Presentation Brussels
101118 Car Pass Mileage Fraud Presentation Brussels101118 Car Pass Mileage Fraud Presentation Brussels
101118 Car Pass Mileage Fraud Presentation Brussels
 
Fiat Group Final Version
Fiat Group Final VersionFiat Group Final Version
Fiat Group Final Version
 
How can Insights on Sustainable Transport Solutions Lead to Customer Value? -...
How can Insights on Sustainable Transport Solutions Lead to Customer Value? -...How can Insights on Sustainable Transport Solutions Lead to Customer Value? -...
How can Insights on Sustainable Transport Solutions Lead to Customer Value? -...
 

More from Florian Wilhelm

Bridging the Gap: from Data Science to Production
Bridging the Gap: from Data Science to ProductionBridging the Gap: from Data Science to Production
Bridging the Gap: from Data Science to Production
Florian Wilhelm
 

More from Florian Wilhelm (13)

Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Unlocking the Power of Integer Programming
Unlocking the Power of Integer ProgrammingUnlocking the Power of Integer Programming
Unlocking the Power of Integer Programming
 
WALD: A Modern & Sustainable Analytics Stack
WALD: A Modern & Sustainable Analytics StackWALD: A Modern & Sustainable Analytics Stack
WALD: A Modern & Sustainable Analytics Stack
 
Forget about AI and do Mathematical Modelling instead!
Forget about AI and do Mathematical Modelling instead!Forget about AI and do Mathematical Modelling instead!
Forget about AI and do Mathematical Modelling instead!
 
An Interpretable Model for Collaborative Filtering Using an Extended Latent D...
An Interpretable Model for Collaborative Filtering Using an Extended Latent D...An Interpretable Model for Collaborative Filtering Using an Extended Latent D...
An Interpretable Model for Collaborative Filtering Using an Extended Latent D...
 
Honey I Shrunk the Target Variable! Common pitfalls when transforming the tar...
Honey I Shrunk the Target Variable! Common pitfalls when transforming the tar...Honey I Shrunk the Target Variable! Common pitfalls when transforming the tar...
Honey I Shrunk the Target Variable! Common pitfalls when transforming the tar...
 
Matrix Factorization for Collaborative Filtering Is Just Solving an Adjoint L...
Matrix Factorization for Collaborative Filtering Is Just Solving an Adjoint L...Matrix Factorization for Collaborative Filtering Is Just Solving an Adjoint L...
Matrix Factorization for Collaborative Filtering Is Just Solving an Adjoint L...
 
Uncertainty Quantification in AI
Uncertainty Quantification in AIUncertainty Quantification in AI
Uncertainty Quantification in AI
 
Performance evaluation of GANs in a semisupervised OCR use case
Performance evaluation of GANs in a semisupervised OCR use casePerformance evaluation of GANs in a semisupervised OCR use case
Performance evaluation of GANs in a semisupervised OCR use case
 
Bridging the Gap: from Data Science to Production
Bridging the Gap: from Data Science to ProductionBridging the Gap: from Data Science to Production
Bridging the Gap: from Data Science to Production
 
Declarative Thinking and Programming
Declarative Thinking and ProgrammingDeclarative Thinking and Programming
Declarative Thinking and Programming
 
PyData Meetup Berlin 2017-04-19
PyData Meetup Berlin 2017-04-19PyData Meetup Berlin 2017-04-19
PyData Meetup Berlin 2017-04-19
 
Explaining the idea behind automatic relevance determination and bayesian int...
Explaining the idea behind automatic relevance determination and bayesian int...Explaining the idea behind automatic relevance determination and bayesian int...
Explaining the idea behind automatic relevance determination and bayesian int...
 

Recently uploaded

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 

Recently uploaded (20)

MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...

How mobile.de brings Data Science to Production for a Personalized Web Experience

  • 1. How Mobile.de brings Data Science to Production for a Personalized Web Experience. Dr. Markus Schüler & Dr. Florian Wilhelm, 2018-07-08, PyData 2018, Berlin
  • 2. Introduction: Dr. Florian Wilhelm, Data Scientist, inovex GmbH (@FlorianWilhelm · FlorianWilhelm · florianwilhelm.info); Dr. Markus Schüler, Data Scientist & Team Lead, mobile.de GmbH
  • 3. Agenda: General Introduction; Personalization Use Cases at mobile.de; Predicting Car Buying Intent; Python for Big Data Processing; Optimizing Performance
  • 5. mobile.de: German market leader; 13.5 million unique users per month; 1.6 million vehicles; 290 employees; headquarters in Berlin (Dreilinden / Friedrichshain); part of eBay Tech
  • 6. inovex, an IT project house for digital transformation: Agile Development & Management; Web · UI/UX · Replatforming · Microservices; Mobile · Apps · Smart Devices · Robotics; Big Data & Business Intelligence Platforms; Data Science · Data Products · Search · Deep Learning; Data Center Automation · DevOps · Cloud · Hosting; Training & Coaching. "Using technology to inspire our clients. And ourselves." inovex offices in Karlsruhe · Cologne · Munich · Pforzheim · Hamburg · Stuttgart. www.inovex.de
  • 9. Personalization at mobile.de (diagram). Components: User Event Tracking & Storage; User Interactions; User Car Preferences; daily preference profiles; daily activity profiles; Recommendations; Segmentation. Example user profile (User 12345): Looking For: Used Car (100%); Prefers (Make): BMW (50%), Audi (50%); Prefers (Model): Audi A3 (25%), Audi A4 (25%), BMW 318 (50%); Searching In: lat 52.5206, lon 13.409; Search Radius: 300 km; Preferred Price: 20 000 € ± 1500 €; Preferred Mileage: 10 000 km ± 5000 km; Buyer; Frequent User; Last Action: Yesterday; Likelihood to buy: 88 %
  • 10. User Preferences based on the user’s interactions (example user car preferences, anonymous). Looking For: Used Car (100%); Prefers (Make): BMW (50%), Audi (50%); Prefers (Model): Audi A3 (25%), Audi A4 (25%), BMW 318 (50%); Searching In: lat 52.5206, lon 13.409; Search Radius: 300 km; Preferred Price: 20 000 € ± 1500 €; Preferred Mileage: 10 000 km ± 5000 km. Profile labels: User 12345, Buyer, Marketing, Frequent User, Last Action: Yesterday
  • 11. Uncertainty Quantification with a Bayesian approach: posterior probability ∝ likelihood × prior probability. The prior is based on all users (the average user, e.g. 30% Volkswagen, 25% gray, 50% automatic, 8% SUV, 10,000 €); combining it with the user profile yields the posterior user preferences. Plot: impact of the prior (avg. user) against the number of user events; the more events a user has, the less the prior weighs.
  • 12. Recommendation (diagram): starting from all listings, the mobile.de Recommendation Engine draws on content-based information (the user preferences, illustrated with the example profile of User 12345), collaborative information and the features of each vehicle.
  • 13. Personalization at mobile.de (overview diagram repeated from slide 9)
  • 14. Different User Intents: “I have no idea about cars. I need basic information and guidance.” · “I’m a car expert. Lead me to the best deals in the fastest way.” · “I love to browse expensive cars, yet I have no buying intent.” · “As a dealer, I need detailed data to compare my own listings with my competitor’s.”
  • 15. Events of a Car Buying Journey: views, parkings (listings saved by the user) and contacts
  • 16. Analysing events of car buyers: total events 72,621,069 (control) vs. 2,500,771 (buyers); median events 153 (control) vs. 188 (buyers); median days active 22 (control) vs. 15 (buyers)
  • 17. User Events: Event counts. Four panels, “Event count over user journey” for contact, parking, search and view (x: position in user journey from 0 to 1; y: average count; Buyer vs. Control; local mean, linear model and lowess curves). Linear-model p-values: contact: buyer slope p = 1.815e−22 ***, control intercept diff p = 9.823e−02 ., control slope diff p = 9.956e−04 ***; parking: buyer slope p = 7.999e−06 ***, control intercept diff p = 1.399e−21 ***, control slope diff p = 6.702e−06 ***; search: buyer slope p = 6.694e−51 ***, control intercept diff p = 1.141e−01, control slope diff p = 9.044e−07 ***; view: buyer slope p = 1.824e−08 ***, control intercept diff p = 2.506e−45 ***, control slope diff p = 2.824e−02 *
  • 18. User Events: Duplicated views (plot: amount of duplicated views against position in user journey, Buyer vs. Control). Buyers look at cars they have already seen more often than the control group does, and their ratio increases faster (both significant).
  • 19. When did buyers interact with the car they bought? Buyers view “their” car most often about four-fifths of the way through their user journey (plot: “When do buyers view the car they buy?”, % of users against position in user journey).
  • 20. ML Model: How close to buy? Aim: predict how likely a user is to make their buying decision today. Uses: personalization, highlighting dealer contact details, providing car buying assistance.
  • 21. Feature Generation. Features: event counts (view, search, contact, parking); share of each event type among all events (e.g. % views among all events); a = number of active days, b = maximum difference between active days, and the ratio a/b. Additional features: views/(searches + views); % of duplicated views among all views. The features are computed per time window, counting back from the buying date (day 0) over 30 days: 0-2 days, 3-9 days and 10-30 days.
  • 22. Modelling: logistic regression; automatic feature selection (start from different subsets of features, such as “all” or “no ratios”, and let features be added and removed so as to minimize the AIC; needed to prevent overfitting); window optimization
  • 23. Window size optimization: both the size and the number of windows were optimized. Candidate windows before the buying date (day 0), covering 30 days: 0-2 / 3-9 / 10-30 days; 0 / 1-9 / 10-30 days; 0 / 1-7 / 8-30 days; 0-9 / 10-19 / 20-30 days; 0 / 1-4 / 5-9 / 10-30 days
  • 24. Modelling: logistic regression; automatic feature selection (start from different subsets of features, such as “all” or “no ratios”, and let features be added and removed so as to minimize the AIC; needed to prevent overfitting); window optimization; cross-validation (15 folds, 70/30 train/test split)
  • 25. Results. Prediction: the user made their buying decision today. Best model: 72% accuracy / 68% sensitivity / 76% specificity. Plot “Modelling statistics: closeToBuy_now_cid” shows accuracy, sensitivity and specificity (range 0.65 to 0.80) for five models (Model1 to Model5) built with different windows: closeToBuy_now_0−1−10−30_cid, closeToBuy_now_0−1−7−30_cid, closeToBuy_now_0−10−20−30_cid, closeToBuy_now_0−3−10−30_cid, closeToBuy_now_0−5−10−30_cid.
  • 26. Buys tomorrow, next week, next two weeks (plot: accuracy, sensitivity and specificity, 0% to 80%, for Buy Today, Buy Tomorrow, Buy in a Week and Buy in two Weeks). Predictive power is considerably lower for events further in the future; there is still room for improvement.
  • 27. Python & Big Data
  • 28. Hive for heavy lifting: an Apache project built on top of Hadoop; SQL interface to your data; essentially an abstraction layer over MapReduce; robust and mature, but slow and certainly not “interactive”. In the data team: used for batch processing of user preferences, user segmentation etc.; PyHive by Dropbox for Python support; Python-based UD(A)Fs.
  • 29. User Defined Functions (UDFs). User-defined (aggregation) functions: are needed when native functions aren’t sufficient; are always much slower than native functions; work on one column or on multiple (grouped) columns; are vector-valued operations and/or aggregations (transform, aggregate, apply).
  • 30. PySpark for fast analysis and machine learning: Spark (“fast and general engine for large-scale data processing”) + Python = PySpark
  • 31. Conversion Example of User Preferences. Hive: 2483 lines of code; Jinja2 to generate SQL queries; temporary tables for performance; runtime 5-10 h; logic hard to understand at times. Spark: 1745 lines of code; programmatic definition of queries; no temporary tables needed; runtime 1-2 h; quite easy to understand.
  • 32. How Spark works (diagram from the Spark documentation, annotated “e.g. JupyterLab”)
  • 33. How do Python UD(A)Fs work? (diagram from the Spark documentation)
  • 35. PySpark & Pandas: vectorized UDFs in Spark 2.3 are built on top of Apache Arrow; avoid high serialization and invocation overhead; allow row-wise UDFs and cumulative UDAFs; are as flexible as Pandas’ apply (source: Databricks blog)
  • 37. But what if Spark < 2.3? It’s possible to write flexible UD(A)Fs by using RDD functionality, df.rdd.mapPartitions(my_func), converting low-level Row objects to a Pandas DataFrame, and wrapping everything into a nice decorator. Details: https://www.inovex.de/blog/efficient-udafs-with-pyspark/
  • 39. Concept: create a local environment based on wheels; upload the unpacked wheels to HDFS; read and distribute these Python packages from the Spark driver to the executors with sc.addFile; use the packages on the executors, e.g. in a UDF. Details: https://www.inovex.de/blog/managing-isolated-environments-with-pyspark/
  • 41. Summary: PyData stack; interesting & challenging use cases; data science; data engineering; business impact
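Slide 11 (Uncertainty Quantification) combines a prior based on all users with the user's own events. For a single categorical preference such as the car make, one common way to do this is a Dirichlet prior over the category shares; the talk does not say which prior family mobile.de uses, so the Dirichlet choice, the function name and the prior_strength pseudo-count below are assumptions. The posterior mean is (count + prior_strength × prior share) / (events + prior_strength).

```python
from collections import Counter


def posterior_preferences(user_events, prior_shares, prior_strength=10.0):
    """Posterior mean of one categorical user preference (e.g. car make).

    Assumption: Dirichlet prior with mean `prior_shares` (the average user,
    shares summing to 1) and total pseudo-count `prior_strength`; each event
    contributes one observation of the category it refers to.
    """
    counts = Counter(user_events)
    n_events = sum(counts.values())
    categories = set(prior_shares) | set(counts)
    return {
        c: (counts.get(c, 0) + prior_strength * prior_shares.get(c, 0.0))
        / (n_events + prior_strength)
        for c in categories
    }


# Illustrative prior; only the 30 % Volkswagen share appears on the slide.
prior = {"Volkswagen": 0.3, "BMW": 0.2, "Audi": 0.2, "Other": 0.3}
print(posterior_preferences(["BMW", "BMW", "Audi", "Audi"], prior))
```

With no events the result equals the prior; as events accumulate, the user's own counts dominate, which is the relation between number of user events and impact of the prior plotted on slide 11.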
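Slide 12 names the inputs of the recommendation engine (content-based user preferences, collaborative information, vehicle features) but not how they are combined. Purely as an illustration, the sketch below scores a listing against the example profile (make probability times a Gaussian fit around the preferred price of 20 000 € ± 1500 €) and blends it linearly with a collaborative score. The linear blend, the alpha weight, the function names and the collaborative scores are all hypothetical, not mobile.de's engine.

```python
import math


def content_score(listing, profile):
    """Hypothetical content-based score: P(make | profile) times a Gaussian price fit."""
    make_prob = profile["make"].get(listing["make"], 0.0)
    z = (listing["price"] - profile["price_mean"]) / profile["price_std"]
    price_fit = math.exp(-0.5 * z * z)  # 1.0 at the preferred price, decays with distance
    return make_prob * price_fit


def hybrid_score(listing, profile, collab_score, alpha=0.5):
    """Hypothetical linear blend of content-based and collaborative scores (alpha in [0, 1])."""
    return alpha * content_score(listing, profile) + (1 - alpha) * collab_score


profile = {"make": {"BMW": 0.5, "Audi": 0.5}, "price_mean": 20_000, "price_std": 1_500}
listings = [
    {"id": 1, "make": "BMW", "price": 20_000},
    {"id": 2, "make": "Audi", "price": 23_000},
    {"id": 3, "make": "Fiat", "price": 19_500},
]
collab = {1: 0.2, 2: 0.9, 3: 0.6}  # made-up collaborative scores for illustration
ranking = sorted(listings, key=lambda x: hybrid_score(x, profile, collab[x["id"]]), reverse=True)
print([x["id"] for x in ranking])
```

A strong collaborative signal can lift a listing that matches the profile less well (listing 2 here), which is one reason to combine both kinds of information rather than rely on either alone.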
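Slide 17 reports, per event type, p-values for a buyer slope, a control intercept difference and a control slope difference. One plausible model that yields exactly these three quantities is an ordinary least-squares fit with a group interaction, count ~ position × control, with buyers as the reference group. The slides do not give the exact model, so this is an assumed reconstruction; for brevity it uses a normal approximation for the p-values, which is reasonable at the sample sizes on slide 16.

```python
import math

import numpy as np


def journey_trend_test(position, count, is_control):
    """OLS of count ~ 1 + position + control + position:control with normal-approximation p-values.

    Buyers are the reference group, so the coefficients are: intercept, buyer slope,
    control intercept difference and control slope difference.
    """
    position = np.asarray(position, dtype=float)
    count = np.asarray(count, dtype=float)
    control = np.asarray(is_control, dtype=float)
    X = np.column_stack([np.ones_like(position), position, control, position * control])
    beta, *_ = np.linalg.lstsq(X, count, rcond=None)
    resid = count - X @ beta
    sigma2 = resid @ resid / (len(count) - X.shape[1])  # residual variance
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    # two-sided p-value P(|Z| > |z|) = erfc(|z| / sqrt(2)); adequate for large samples
    pvals = [math.erfc(abs(b / s) / math.sqrt(2)) for b, s in zip(beta, se)]
    names = ["intercept", "buyer_slope", "control_intercept_diff", "control_slope_diff"]
    return {name: (float(b), p) for name, b, p in zip(names, beta, pvals)}
```

Each entry holds (coefficient, p-value); a small p-value for control_slope_diff means the event count evolves differently over the journey for the control group than for buyers.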
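The features listed on slide 21 can be sketched with pandas, assuming an event log with one row per event and hypothetical columns user_id, days_before (0 = reference or buying day), event_type and listing_id. The window boundaries default to the 0-2 / 3-9 / 10-30 day split; passing other tuples reproduces the alternatives on slide 23. Here b is taken as the maximum minus the minimum active day within the window, which is one reading of "max-diff active days".

```python
import numpy as np
import pandas as pd

EVENT_TYPES = ["view", "search", "contact", "parking"]


def window_features(events, windows=((0, 2), (3, 9), (10, 30))):
    """Per-user features for each look-back window (inclusive day ranges before day 0)."""
    users = pd.Index(events["user_id"].unique(), name="user_id")
    blocks = []
    for lo, hi in windows:
        w = events[events["days_before"].between(lo, hi)]
        tag = f"{lo}_{hi}"
        counts = (
            w.groupby(["user_id", "event_type"]).size()
            .unstack(fill_value=0)
            .reindex(index=users, columns=EVENT_TYPES, fill_value=0)
        )
        total = counts.sum(axis=1).replace(0, np.nan)
        f = pd.DataFrame(index=users)
        for et in EVENT_TYPES:
            f[f"n_{et}_{tag}"] = counts[et]                # event counts
            f[f"pct_{et}_{tag}"] = counts[et] / total      # share among all events
        days = w.groupby("user_id")["days_before"]
        active = days.nunique().reindex(users, fill_value=0)        # a
        span = (days.max() - days.min()).reindex(users)             # b
        f[f"active_days_{tag}"] = active
        f[f"active_ratio_{tag}"] = active / span.replace(0, np.nan)  # a / b
        search_view = (counts["search"] + counts["view"]).replace(0, np.nan)
        f[f"views_per_search_view_{tag}"] = counts["view"] / search_view
        views = w[w["event_type"] == "view"]
        dup = views.duplicated(["user_id", "listing_id"]).groupby(views["user_id"]).mean()
        f[f"pct_dup_views_{tag}"] = dup.reindex(users)     # % duplicated views
        blocks.append(f)
    # undefined ratios (no events, single active day) are set to 0
    return pd.concat(blocks, axis=1).fillna(0.0)
```

Setting undefined ratios to 0 is a simplifying choice; a missing-value indicator column would be an alternative.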
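Slides 22 and 24 describe logistic regression with automatic feature selection that adds and removes features according to the AIC (AIC = 2k − 2 ln L, lower is better). A minimal sketch of such bidirectional stepwise selection follows; it fits an unpenalised logistic regression with Newton-Raphson in NumPy to stay self-contained (a library such as statsmodels would normally be used), and the function names are hypothetical. The start argument mirrors "start from different sub-selections of features".

```python
import numpy as np
import pandas as pd


def fit_logit(X, y, n_iter=50, tol=1e-10):
    """Unpenalised logistic regression via Newton-Raphson; returns (coefficients, log-likelihood)."""
    X = np.column_stack([np.ones(len(y)), X])  # prepend intercept column
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)                           # score vector
        info = X.T @ (X * (p * (1.0 - p))[:, None])    # Fisher information
        step = np.linalg.solve(info, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    p = np.clip(1.0 / (1.0 + np.exp(-X @ beta)), 1e-12, 1 - 1e-12)
    loglik = float(np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)))
    return beta, loglik


def aic_of(X, y, cols):
    """AIC = 2k - 2 ln L for a logistic model on the given feature columns (k includes the intercept)."""
    _, loglik = fit_logit(X[list(cols)].to_numpy(dtype=float), y)
    return 2 * (len(cols) + 1) - 2 * loglik


def stepwise_aic(X, y, start=()):
    """Bidirectional stepwise selection: add or drop one feature at a time while AIC decreases."""
    selected = list(start)
    best = aic_of(X, y, selected)
    while True:
        candidates = [selected + [c] for c in X.columns if c not in selected]
        candidates += [[c for c in selected if c != d] for d in selected]
        if not candidates:
            return selected, best
        score, cols = min(((aic_of(X, y, c), c) for c in candidates), key=lambda t: t[0])
        if score >= best:  # no single addition or removal improves the AIC
            return selected, best
        selected, best = cols, score


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    demo = pd.DataFrame({"x1": rng.normal(size=500), "x2": rng.normal(size=500)})
    target = (rng.random(500) < 1 / (1 + np.exp(-1.5 * demo["x1"]))).astype(float).to_numpy()
    print(stepwise_aic(demo, target))
```

Because every accepted step strictly lowers the AIC, the search cannot cycle and always terminates; running it from several start sets and keeping the lowest AIC reduces the risk of ending in a poor local optimum.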
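The results on slide 25 are reported as accuracy, sensitivity and specificity, and slide 24 mentions 15 folds with a 70/30 train/test split. The sketch below reads that as 15 repeated random 70/30 splits (an interpretation, not confirmed by the slides) and averages the three metrics; fit_predict is a hypothetical callable that trains on the training split and returns boolean predictions for the test split.

```python
import numpy as np


def confusion_metrics(y_true, y_pred):
    """Accuracy, sensitivity (true-positive rate) and specificity (true-negative rate)."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred)
    tn = np.sum(~y_true & ~y_pred)
    fp = np.sum(~y_true & y_pred)
    fn = np.sum(y_true & ~y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def repeated_holdout(fit_predict, X, y, n_splits=15, test_frac=0.3, seed=0):
    """Average metrics over repeated random train/test splits (70/30 by default)."""
    rng = np.random.default_rng(seed)
    n_test = int(round(test_frac * len(y)))
    scores = []
    for _ in range(n_splits):
        idx = rng.permutation(len(y))
        test, train = idx[:n_test], idx[n_test:]
        y_hat = fit_predict(X[train], y[train], X[test])
        scores.append(confusion_metrics(y[test], y_hat))
    return {k: float(np.mean([s[k] for s in scores])) for k in scores[0]}
```

Reporting sensitivity and specificity next to accuracy matters here because buying days are rare: a model that always predicts "not today" would score high accuracy but zero sensitivity.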
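Slide 28 mentions Python-based UD(A)Fs in Hive. Hive's usual route for Python code is the TRANSFORM clause, which streams rows to a script as tab-separated lines on stdin and reads its stdout back. The sketch below is an assumed example (script name, table and columns are hypothetical) of a UDAF-style script returning each user's most frequent car make; it relies on the rows arriving clustered by user_id.

```python
"""Hive TRANSFORM script: most frequent car make per user.

Hypothetical HiveQL usage (rows must reach the script clustered by user_id):
    ADD FILE top_make.py;
    FROM (SELECT user_id, make FROM events CLUSTER BY user_id) t
    SELECT TRANSFORM (t.user_id, t.make)
    USING 'python3 top_make.py' AS (user_id, top_make);
"""
import sys
from collections import Counter
from itertools import groupby

HIVE_NULL = "\\N"  # Hive's default text representation of NULL in TRANSFORM streams


def top_make(lines):
    """Yield 'user_id<TAB>make' for each block of consecutive rows with the same user_id."""
    rows = (line.rstrip("\n").split("\t") for line in lines)
    for user_id, group in groupby(rows, key=lambda row: row[0]):
        makes = Counter(row[1] for row in group if row[1] != HIVE_NULL)
        if makes:
            yield f"{user_id}\t{makes.most_common(1)[0][0]}"


def main():
    """Stream stdin to stdout; the deployed top_make.py would end with a call to main()."""
    for out in top_make(sys.stdin):
        sys.stdout.write(out + "\n")
```

Because every row crosses a process boundary as text, such scripts are typically much slower than built-in Hive functions, in line with the remark on slide 29.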
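For slide 35, a grouped-map Pandas UDF receives all rows of one group as a pandas DataFrame and returns a DataFrame. The sketch keeps the per-group logic (share of events per make, as in the example profile) as a plain pandas function so it can be tested without Spark, and wraps it with pyspark.sql.functions.pandas_udf and PandasUDFType.GROUPED_MAP from the Spark 2.3 API. Function names, column names and the schema string are assumptions.

```python
import pandas as pd


def share_per_make(pdf: pd.DataFrame) -> pd.DataFrame:
    """Grouped-map logic for one user's rows: share of events per car make."""
    counts = pdf.groupby("make").size()
    return pd.DataFrame({
        "user_id": pdf["user_id"].iloc[0],
        "make": counts.index.to_numpy(),
        "share": (counts / counts.sum()).to_numpy(),
    })


def as_spark_grouped_map():
    """Wrap share_per_make as a Spark 2.3-style grouped-map Pandas UDF."""
    # Imported lazily so the pandas logic above does not require PySpark.
    from pyspark.sql.functions import pandas_udf, PandasUDFType
    return pandas_udf(share_per_make, "user_id long, make string, share double",
                      PandasUDFType.GROUPED_MAP)

# Usage on a Spark DataFrame `events` with columns user_id and make:
#   events.groupby("user_id").apply(as_spark_grouped_map())
```

In Spark 3.x the same function can be passed directly to events.groupby("user_id").applyInPandas(share_per_make, schema), which replaces the deprecated PandasUDFType style.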
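Slide 37 outlines UD(A)Fs for Spark < 2.3 built from df.rdd.mapPartitions, a Row-to-pandas conversion and a decorator. The following is a minimal sketch of that idea, not the code from the linked inovex post; the decorator name and the example aggregation are hypothetical. It accepts pyspark.sql.Row objects (via their asDict method) or plain dicts, so the logic can be exercised without a cluster.

```python
import functools

import pandas as pd


def pandas_partition(func):
    """Hypothetical decorator: run func(pandas.DataFrame) -> pandas.DataFrame on one RDD partition.

    The wrapped function accepts an iterator of pyspark.sql.Row objects (or plain
    dicts) and returns an iterator of dicts, as rdd.mapPartitions expects.
    """
    @functools.wraps(func)
    def wrapper(rows):
        records = [row.asDict() if hasattr(row, "asDict") else dict(row) for row in rows]
        if not records:  # empty partitions produce no output
            return iter([])
        result = func(pd.DataFrame.from_records(records))
        return iter(result.to_dict("records"))
    return wrapper


@pandas_partition
def events_per_user(pdf):
    # Only a complete aggregation if all rows of a user share one partition,
    # e.g. after df.repartition("user_id").
    return pdf.groupby("user_id").size().rename("n_events").reset_index()

# Usage sketch (requires a SparkSession `spark` and pyspark.sql.Row):
#   rdd = df.repartition("user_id").rdd.mapPartitions(events_per_user)
#   result = spark.createDataFrame(rdd.map(lambda d: Row(**d)))
```

Converting a whole partition at once amortizes the Python overhead over many rows, which is the same motivation behind the vectorized UDFs of Spark 2.3.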
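The last step on slide 39, using the shipped packages on the executors, amounts to putting the directory distributed with sc.addFile on sys.path before importing from it. The helper below is a hypothetical sketch: by default it resolves the local path with pyspark.SparkFiles.get, and it accepts an alternative resolver so it can be tried outside Spark. The HDFS path in the comments is a placeholder.

```python
import sys


def use_shipped_packages(dirname, resolve=None):
    """Executor-side: put a directory shipped with sc.addFile(..., recursive=True) on sys.path.

    `resolve` maps the shipped name to a local path; it defaults to
    pyspark.SparkFiles.get, which only works inside a Spark application.
    """
    if resolve is None:
        from pyspark import SparkFiles  # imported lazily; PySpark is only needed on the cluster
        resolve = SparkFiles.get
    path = resolve(dirname)
    if path not in sys.path:
        sys.path.insert(0, path)  # shipped packages take precedence over installed ones
    return path


# Driver side (placeholder HDFS path holding the unpacked wheels):
#   sc.addFile("hdfs:///user/me/unpacked_wheels", recursive=True)
# Executor side, e.g. at the top of a UDF body:
#   use_shipped_packages("unpacked_wheels")
#   import some_package
```

Calling the helper inside the UDF body matters because the UDF runs in the executor's Python worker, where the driver's sys.path changes are not visible.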