Predictive quality metrics @ tinyclues - Artem Kozhevnikov - Tinyclues

Predictive Quality Metrics
Artem Kozhevnikov
Lead Data Scientist
Tinyclues

TINYCLUES IN A FEW WORDS
• SAAS SOLUTION USING FIRST PARTY DATA
• COMPATIBLE WITH ANY MARKETING STACK
• DESIGNED FOR MARKETERS
• DEEP AI FOR CUSTOMER ACTIVATION
• DRIVES REVENUE AND ENGAGEMENT ACROSS ALL CHANNELS
• UP IN 2 WEEKS
60 employees // 30 R&D // Hiring 20

YOU DON’T KNOW WHAT YOUR CUSTOMERS ARE INTERESTED IN TODAY
INTENT-DRIVEN MARKETING
YOUR STRATEGIC OPPORTUNITY TO DRIVE REVENUE FROM ALL YOUR CUSTOMERS IS UNDERSERVED BY YOUR MASS TARGETED CAMPAIGNS
1%
• ONSITE RECOMMENDATIONS
• REMARKETING MESSAGES
• DISPLAY RETARGETING
Works well for recent visitors, but quickly becomes repetitive and inefficient
99%
?

Success Stories
RETAIL: +30% CAMPAIGN REVENUE
RETAIL / FASHION: +151% REVENUE PER EMAIL
HOSPITALITY: +178% CAMPAIGN REVENUE
HOSPITALITY: +30% CAMPAIGN REVENUE
E-COMMERCE: +305% CAMPAIGN REVENUE
RETAIL / FASHION: +60% REVENUE PER EMAIL
E-COMMERCE: +80% CAMPAIGN REVENUE
TRAVEL: +115% CAMPAIGN REVENUE
10M+ / 10M+ / 5M+ / 10M+ / 10M+

FROM TOPIC TO TARGET TO MESSAGE TO REVENUE

Comments :
• This is a topic-centric formulation (unlike an onsite recommendation system)
• We need to score all users, in particular those without recent activity
• In reality, our goals are more complex: we follow various campaign-related ROI metrics (CTR, CR, opt-out, attributed revenue, ...)
Predictive Problem
Given a Topic, for each user u ∈ Users we want to build a predictive score
such that users with a higher score have a higher probability of conversion
(buying) after receiving a communication about the Topic through a given
channel (such as Email, Notifications, Facebook Custom Audience, ...).

Sample size vs Data Relevance tradeoff :
• A. contains much more information than the sparse C.,
• but A. is not as directly related to the CRM targeting problem as C. is
3 models
[Figure: bar chart of "Monthly sales count" on a log scale (1 to 1 000 000) for four categories: All sales; Topic sales; Topic sales attributed to email; Topic sales attributed to single Topic email campaign. The chart is annotated with the labels A, B and C.]

• A. has only implicit (positive-only) feedback
• To set a (binary) classification problem for A. we need to define a contrast, or negative
response.
• There is no canonical definition for negative response :
• u ∈ User at random ?
• user u ∈ User that bought some other products ?
• …
• Scores for A. problem are not calibrated
• For B. and C. we can use Explicit Feedback, so their scores are calibrated
• Collecting robust feedback takes several days (delayed response)
• You need to implement an explore/exploit strategy to learn C. more efficiently
3 models
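
The "user at random" option for the negative response in problem A. can be sketched as follows. This is a minimal illustration, not the Tinyclues implementation: the function name `build_training_rows`, the `(user, topic)` event format and the `n_neg_per_pos` ratio are all assumptions made for the example. Because that ratio is an arbitrary choice, the positive rate in the training set is arbitrary too, which is one way to see why scores for A. are not calibrated.

```python
import random


def build_training_rows(purchases, all_users, n_neg_per_pos=1, seed=0):
    """Turn positive-only (user, topic) events into labeled rows.

    Hypothetical helper: for each positive event, draw `n_neg_per_pos`
    users at random who did not buy that topic and label them 0.
    """
    rng = random.Random(seed)
    positives = set(purchases)
    users = sorted(all_users)

    # Every observed purchase is a positive example.
    rows = [(user, topic, 1) for user, topic in sorted(positives)]

    for _, topic in sorted(positives):
        # Candidate negatives: users with no purchase on this topic.
        candidates = [u for u in users if (u, topic) not in positives]
        if not candidates:
            continue  # everybody bought this topic, no contrast available
        for _ in range(n_neg_per_pos):
            rows.append((rng.choice(candidates), topic, 0))
    return rows


if __name__ == "__main__":
    events = [("u1", "shoes"), ("u2", "shoes")]
    for row in build_training_rows(events, ["u1", "u2", "u3", "u4", "u5"]):
        print(row)
```

Swapping the candidate list for "users who bought some other product" gives the second option from the slide; the two choices generally lead to different models.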

Data Base
[Diagram: Relational DataBase (Simplified Schema). Tables: User, Campaign, Campaign Attributes, Optout, Product, Product Attributes, Purchase, Pageview, WebSearch, Add To Cart, Attribution Rules. Channel Data: Impression, Click.]

Unsupervised learning
• Many sparse and long-tail categorical fields in relational and time-dependent data
• Heavy reliance on socio-demographic fields (zipcode, firstname, age, …)
• Various clustering methods for heavy tail distributions, no feature hasher
• Bank of latent representations
• Bayesian frameworks allowing finer meta-parameters control
• Long Time series aggregation (several years of logs through different event tables)
Feature Engineering Highlights

Unsupervised Feature propagation
[Diagram components: Multidimensional sparse tensor (DataBase); Raw sparse features; Multi-layer Unsupervised Module (asynchronous, daily updates); Latent Features Bank; Cold Start Features; Warm Start & Channel Specific Features; Scoring Micro Services (Scoring A., Scoring B., Scoring C.).]

• Train/Test AUCs at different points of pipeline
• Robustness
• Aggregations (average, min, max) of AUCs over most frequently used Topics
• Pre/Post campaign evaluation (“in time” generalization robustness)
• Accuracy/Recall at extract point
• NLL, RMSE
Predictive Metrics : AUC
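
The "aggregations of AUCs over Topics" bullet can be illustrated with a small sketch. It assumes each Topic comes with a list of binary labels and a list of scores; `auc` and `summarize_auc` are hypothetical names, and the rank-based (Mann-Whitney) formula is a standard way to compute AUC, not necessarily the one used in the talk.

```python
def auc(labels, scores):
    """Rank-based AUC (Mann-Whitney U); tied scores share an average rank."""
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of tied scores starting at i.
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1

    n_pos = sum(label for _, label in pairs)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative")
    rank_sum_pos = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def summarize_auc(per_topic):
    """per_topic maps a topic name to (labels, scores)."""
    values = {topic: auc(y, s) for topic, (y, s) in per_topic.items()}
    return {
        "mean": sum(values.values()) / len(values),
        "min": min(values.values()),
        "max": max(values.values()),
    }


if __name__ == "__main__":
    toy = {
        "shoes": ([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]),
        "bags": ([0, 1, 0, 1], [0.2, 0.9, 0.1, 0.7]),
    }
    print(summarize_auc(toy))
```

Tracking the minimum alongside the average is what surfaces a single badly served Topic that the mean would hide.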

• Calibration ratio = sum(observed) / sum(predicted)
• Works only for calibrated scores
• Monitor calibration ratio over different Topics
• Independent of feature engineering
• Predictive Debug : simple method to see how well a feature is taken into account by model
• In Pre/Post campaign shoot
Predictive Metrics : Calibration
Age >= 40 | nb_message | nb_clickers | CTR   | sum(proba_is_clicker) | mean(proba_is_clicker) | Calibration ratio
True      | 1304544    | 29350       | 2.25% | 20645.54              | 1.58%                  | 1.42
False     | 701544     | 21747       | 3.1%  | 30904.78              | 4.4%                   | 0.7
All       | 2006088    | 51097       | 2.55% | 51550.32              | 2.57%                  | 0.99
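
The "Predictive Debug" reading of this table can be reproduced with a few lines. The sketch below applies the slide's definition, calibration ratio = sum(observed) / sum(predicted), per segment and overall; the function name `calibration_by_segment` and the row format are assumptions for the example, and the inputs are the aggregated counts from the table.

```python
from collections import defaultdict


def calibration_by_segment(rows):
    """rows: iterable of (segment, observed, predicted).

    `observed` is a click count (or a 0/1 outcome per message) and
    `predicted` the matching sum of predicted probabilities. Returns
    sum(observed) / sum(predicted) per segment and under the key "All".
    A segment whose predictions sum to zero raises ZeroDivisionError.
    """
    observed = defaultdict(float)
    predicted = defaultdict(float)
    for segment, obs, pred in rows:
        for key in (segment, "All"):
            observed[key] += obs
            predicted[key] += pred
    return {key: observed[key] / predicted[key] for key in observed}


if __name__ == "__main__":
    # Aggregated figures from the table: (Age >= 40, nb_clickers, sum of probabilities).
    table = [(True, 29350, 20645.54), (False, 21747, 30904.78)]
    for segment, ratio in calibration_by_segment(table).items():
        print(segment, round(ratio, 2))
```

The overall ratio is close to 1 while the two age segments sit at roughly 1.42 and 0.70, which suggests the model under-predicts clicks for users aged 40 and over and over-predicts for younger users, i.e. the age feature is not well taken into account.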

• 20/80 rule : 20% of buyers make 80% of sales
• You may get very high AUC baselines from very simple intensity features (top buyers, …)
• Idea : compare the model of specific topic T against one of generic topic G (containing all products)
• Specificity – lift with respect to generic model :
Predictive Metrics : Specificity
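
The exact specificity formula is not part of the slide text, so the sketch below is only one possible reading of "lift with respect to generic model", not the definition used in the talk. It assumes that both the topic model T and the generic model G score the same users, takes the top-k users under each, and compares how many actual Topic buyers each selection contains. The names `top_k` and `specificity_lift` are hypothetical.

```python
def top_k(scores, k):
    """Return the k best-scored users (ties broken by user id)."""
    ranked = sorted(scores, key=lambda user: (-scores[user], user))
    return set(ranked[:k])


def specificity_lift(topic_scores, generic_scores, buyers, k):
    """Assumed definition: buyers captured in the top-k of the topic model
    divided by buyers captured in the top-k of the generic model.

    Returns None when the generic selection contains no buyer at all.
    """
    hits_topic = len(top_k(topic_scores, k) & buyers)
    hits_generic = len(top_k(generic_scores, k) & buyers)
    if hits_generic == 0:
        return None
    return hits_topic / hits_generic


if __name__ == "__main__":
    topic_model = {"a": 0.9, "b": 0.8, "c": 0.1, "d": 0.2}
    generic_model = {"a": 0.9, "c": 0.8, "b": 0.1, "d": 0.2}
    print(specificity_lift(topic_model, generic_model, buyers={"a", "b"}, k=2))
```

Under this reading, a lift near 1 means the topic model mostly re-selects the generic top buyers, while a larger value means it finds Topic buyers that the generic model misses.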

Same AUCs, Different Specificity

• Multi-topics context :
• we want to communicate on several Topics at the same moment
• with some pressure constraints (<= 1 message weekly per user)
Predictive Metrics : Overlaps
[Figure: the extracts Extract(Topic1), Extract(Topic2) and Extract(Topic3)]
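
A simple way to quantify overlaps in the multi-topic setting is sketched below. It assumes each extract is a set of user ids; the Jaccard ratio is one common overlap measure chosen for the example, and `pairwise_overlap` and `users_over_pressure` are hypothetical names, not part of the talk.

```python
from collections import Counter
from itertools import combinations


def pairwise_overlap(extracts):
    """Jaccard overlap |A ∩ B| / |A ∪ B| for every pair of extracts."""
    result = {}
    for a, b in combinations(sorted(extracts), 2):
        union = extracts[a] | extracts[b]
        shared = extracts[a] & extracts[b]
        result[(a, b)] = len(shared) / len(union) if union else 0.0
    return result


def users_over_pressure(extracts, max_messages=1):
    """Users selected by more extracts than the pressure constraint allows."""
    counts = Counter(user for users in extracts.values() for user in users)
    return {user for user, count in counts.items() if count > max_messages}


if __name__ == "__main__":
    extracts = {
        "Topic1": {1, 2, 3},
        "Topic2": {3, 4},
        "Topic3": {5},
    }
    print(pairwise_overlap(extracts))
    print(users_over_pressure(extracts))
```

Users returned by `users_over_pressure` are the ones for whom a single Topic has to be chosen when the constraint is at most one message per week.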

1. A solution to an approximate problem may provide you with very good
(unsupervised) features
2. Focus on how AI is going to change your system behavior and try to find
adequate offline metrics
Takeaways

• Long term CRM planning optimization
• Automatic Predictive Setup
• Large scale industrialization of the predictive modules and tools
Next challenges

Questions ?
WE ARE HIRING!
And many more….
