Presented at Snowplow London Meetup, 8 February 2017
Dejan Petelin, head of data science at Gousto, gave a presentation about the company's data journey, explaining how data reflects the customer's voice and why joining up all data sources matters. The goal is to delight and retain customers – critical for a subscription business like Gousto's. Gousto uses Snowplow as a unified log to scale up its data capabilities, listen to its customers and provide them with a more personalized experience. Finally, Gousto is moving to the real-time pipeline to enable just-in-time personalization.
How Gousto is moving to just-in-time personalization with Snowplow
1. Gousto USE SNOWPLOW
Dejan Petelin
Head of Data Science
love
Our journey of leveraging Snowplow Analytics …
2. • An online recipe box service.
• Customers come to our site, or use
our apps and select from 22 meals
each week.
• They pick the meals they want to
cook and say how many people
they’re cooking for.
• We deliver all the ingredients they
need in exact proportions with
step-by-step recipe cards in 2-3
days.
• No planning, no supermarkets and
no food waste – you just cook (and
eat).
• We’re a rapidly growing business.
About Gousto
3. • Transactional database and loads of
external data sources, e.g. Excel
spreadsheets, 3rd party tools etc.
• Multiple ad-hoc analyses, mostly in
Excel, which are difficult to update.
• Gap between web analytics (GA)
and transactional data.
• Lack of customer event logs
• We started snapshotting the
transactional database.
• Loads of questions from our CEO
Timo :)
Our data journey…
[Diagram: data sources (MySQL transactional read replica, Mailchimp, Excel spreadsheets, Google Analytics, Zendesk CRM, geo-demographic data) go through CRONed data processing and ad-hoc analyses into a MySQL data warehouse and Excel reports for stakeholders.]
4. Growing data capabilities
[Diagram labels: Data Science, Analytics, Data Engineering]
• As a subscription service we are
very retention focused – linking all
the data sources is challenging.
• We believe that data is the voice
of our customers, so we try to
collect as much data as possible.
• Therefore we invested a lot in
Snowplow as we own the data,
which is a very valuable asset and
the core of the business.
• The data is available to everyone –
SQL is a great competency at
Gousto.
5. Our data stack
[Diagram: Airflow (ETL orchestration), transactional DB, WMS and the data warehouse (lake), feeding daily email reports, ad-hoc analyses and predictive modelling.]
6. Snowplow as unified log
[Diagram: platform services (Customer Service, Activity Log Service, Order Service, Product Service, Recipe Service, ...) and SNS. One AWS Lambda subscribes to all messages; another AWS Lambda subscribes to customer-related messages. Other components shown: Amazon DynamoDB, Platform Deployment Bucket, Event API, Amazon Redshift.]
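To make the "subscribe to customer related messages" idea concrete, here is a minimal sketch of a Lambda-style handler that decodes SNS records and keeps only customer-related ones. The `Records[].Sns.Message` layout is the standard SNS-to-Lambda event shape; the message `type` field and the `CUSTOMER_TYPES` set are assumptions made for illustration and do not come from the talk.

```python
import json
from typing import Any

# Hypothetical message types treated as customer-related (not from the talk).
CUSTOMER_TYPES = {"customer_created", "order_placed", "subscription_paused"}


def extract_messages(sns_event: dict[str, Any]) -> list[dict[str, Any]]:
    """Decode the JSON body of every SNS record in one Lambda invocation."""
    return [json.loads(r["Sns"]["Message"]) for r in sns_event.get("Records", [])]


def handler(event: dict[str, Any], context: Any = None) -> list[dict[str, Any]]:
    """Keep only customer-related messages.

    A real handler would forward these to an event API or data store;
    this sketch just returns them so the filtering is easy to inspect.
    """
    return [m for m in extract_messages(event) if m.get("type") in CUSTOMER_TYPES]


# Example invocation with a hand-built SNS event.
sample_event = {
    "Records": [
        {"Sns": {"Message": json.dumps({"type": "order_placed", "customer_id": 1})}},
        {"Sns": {"Message": json.dumps({"type": "recipe_updated", "recipe_id": 7})}},
    ]
}
customer_messages = handler(sample_event)
```

Filtering inside the subscriber keeps the publishing services unaware of who consumes their messages, which is the usual appeal of a publish/subscribe log.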
7. Snowplow on isomorphic JS
• Shiny and super quick, but… what
happened to my events?!
• No page loads – no automatic page
views.
• We developed our own custom
framework for triggering events.
• We use structured events for that
purpose, but store (unstructured)
JSON objects in them.
• This approach allows us to be
flexible and quickly introduce new
events.
• But the lack of data validation can
lead to garbage leaking in.
• Data modelling in Redshift.
[Diagram labels: Client, Server, App API]
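The trade-off described above (flexible JSON inside structured events, but no validation) can be sketched as follows. Snowplow structured events carry category, action, label, property and value fields; which field Gousto puts the JSON in is not stated, so using `property` here is an assumption, as are the `REQUIRED_KEYS` schema and the event names.

```python
import json
from typing import Any, Optional

# Hypothetical minimal schema for one event type (not from the talk).
REQUIRED_KEYS = {"recipe_id", "position"}


def build_struct_event(category: str, action: str, payload: dict[str, Any]) -> dict[str, str]:
    """Pack an arbitrary payload into the 'property' field as a JSON string."""
    return {
        "category": category,
        "action": action,
        "property": json.dumps(payload, sort_keys=True),
    }


def parse_payload(event: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Decode the payload downstream; return None if it is malformed or incomplete."""
    try:
        payload = json.loads(event["property"])
    except (KeyError, TypeError, json.JSONDecodeError):
        return None
    if not isinstance(payload, dict) or not REQUIRED_KEYS.issubset(payload):
        return None
    return payload


good = build_struct_event("menu", "recipe_click", {"recipe_id": 42, "position": 3})
bad = {"category": "menu", "action": "recipe_click", "property": "{not json"}
```

Because nothing is checked at collection time, a check like `parse_payload` has to live in the data-modelling step, otherwise malformed rows end up in the modelled tables.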
8. Moving to the real-time pipeline – use case
[Diagram, steps numbered 1 to 5: Snowplow, events stream, process event, churn model, store churn score, if likely to churn, gift service, store action taken.]
• Analyse customer behaviour in real
time.
• Automatically react as soon as
possible.
• Feed the response back to
Snowplow (serving as a unified log).
• So the whole customer journey is
available to CRM & retention teams
instantly.
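The loop in this use case (score each event, act if the customer looks likely to churn, log the action back) might look roughly like the sketch below. Everything here is a stand-in: the threshold, the toy scoring rule and the in-memory "stores" are assumptions for illustration, not Gousto's implementation.

```python
from dataclasses import dataclass, field
from typing import Callable

CHURN_THRESHOLD = 0.7  # hypothetical cut-off, not from the talk


@dataclass
class ChurnResponder:
    score: Callable[[dict], float]                    # stand-in for the churn model
    scores: dict = field(default_factory=dict)        # stand-in for the score store
    unified_log: list = field(default_factory=list)   # stand-in for Snowplow

    def process(self, event: dict) -> None:
        """Score one incoming event and react if the customer is at risk."""
        customer_id = event["customer_id"]
        churn_score = self.score(event)
        self.scores[customer_id] = churn_score
        if churn_score >= CHURN_THRESHOLD:
            # A real system would call the gift service here; the action is
            # then written back to the log so the journey stays complete.
            self.unified_log.append(
                {"customer_id": customer_id, "action": "gift_sent", "churn_score": churn_score}
            )


def toy_score(event: dict) -> float:
    """Toy model: the longer since the last order, the higher the score."""
    return min(event.get("days_since_last_order", 0) / 30, 1.0)


responder = ChurnResponder(score=toy_score)
responder.process({"customer_id": "a", "days_since_last_order": 28})
responder.process({"customer_id": "b", "days_since_last_order": 3})
```

Writing the action back as an event is what lets CRM and retention teams see both the behaviour and the response in one place.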
10. From analytics to optimisation …
[Figure: analytics maturity chart (source: Gartner). x-axis: complexity / maturity; y-axis: competitive advantage. Stages: raw data, standard reports, ad-hoc reports, generic predictive analytics, predictive modelling, optimisation. Regions: "Sense & Respond" and "Predict & Act".]
16. From analytics to optimisation …
• Daily trading reports, e.g. signups by
channel, conversion rate, orders etc.
• Analytics
• Customer behaviour
• Actionable insights
• Customer segmentation
• Marketing attribution
• Channel mix optimisation
• Churn prediction
• Automated menu design
• Warehouse optimisation
• Tracking performance
[Figure: the same Gartner analytics maturity chart as on slide 10.]
17. Churn prediction – intro
• As a subscription service, we are very
retention focused.
• Some customers are immediately
convinced and become very loyal
customers, while some customers
need a bit more effort to get hooked.
• We use Snowplow events data to
model customer behaviour and find
customers more likely to churn so we
can focus on them.
• Use a personalised approach to retain
customers.
32. Churn prediction – pitfalls
• What actually is churn? How do we
define it?
• It might be better to predict
the likelihood of a customer placing an
order.
• How long should the horizon be? Where
should we draw the line?
• Using events data, there is an almost
unlimited number of features – how
do we find the really informative ones?
• How do we keep the model up to date if
we are affecting customer journeys?
• How do we measure success?
• No matter how accurate the model,
profit is what counts in the end.
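One way to read the "likelihood of placing an order within a horizon" framing is as a labelling rule: features come from events up to a cut-off date, and the label says whether an order follows within the horizon. The sketch below assumes a simple `(date, event_name)` event layout and two illustrative features; none of these names come from the talk.

```python
from datetime import date, timedelta


def ordered_within_horizon(order_dates: list[date], cutoff: date, horizon_weeks: int) -> bool:
    """True if at least one order falls in (cutoff, cutoff + horizon]."""
    end = cutoff + timedelta(weeks=horizon_weeks)
    return any(cutoff < d <= end for d in order_dates)


def build_example(events: list[tuple[date, str]], cutoff: date, horizon_weeks: int = 4):
    """Build one (features, label) pair for a customer.

    Features use only events on or before the cut-off; the label looks
    only at orders after it, so no future information leaks into features.
    """
    past = [name for d, name in events if d <= cutoff]
    features = {"n_events": len(past), "n_menu_views": past.count("menu_view")}
    orders = [d for d, name in events if name == "order"]
    return features, ordered_within_horizon(orders, cutoff, horizon_weeks)


events = [
    (date(2017, 1, 2), "menu_view"),
    (date(2017, 1, 3), "menu_view"),
    (date(2017, 1, 5), "order"),
    (date(2017, 1, 20), "order"),
]
features, label = build_example(events, cutoff=date(2017, 1, 10), horizon_weeks=2)
```

Changing `horizon_weeks` changes the label for the same customer, which is exactly why the choice of horizon is listed as a pitfall.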
33. Churn prediction – future
• Predicting when the next event will
happen, rather than the probability of an
event in the next X weeks.
• Using recurrent (deep) neural networks
(RNNs) to model event sequences directly,
rather than engineering features.
34. Churn prediction – results
• Accuracy of the model is ~80%.
• A bit too optimistic in the lower region
and a bit too pessimistic in the higher region.
• Significant uplift in retention.
• Of course, it depends on the action taken.
• Loads of A/B testing to find the right
actions to take.
• In the future, we want to build
another model, suggesting what
action should be taken for each
customer.
• Actually, why not build an
autonomous system trying different
approaches and communication
channels to find the best approach?
[Figure: actual proportion (y-axis, 0% to 100%) plotted against predicted likelihood (x-axis, 0% to 100%); series: control, variation A, variation B.]
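A plot of actual proportion against predicted likelihood is usually built by binning the predictions and comparing each bin's mean prediction with the observed rate. The following is a minimal sketch of that calculation, assuming equal-width bins; the bin count and the toy data are illustrative and are not the numbers behind the slide.

```python
def calibration_table(predicted: list[float], actual: list[int], n_bins: int = 10):
    """Return (mean predicted, actual proportion, count) per non-empty bin."""
    bins: list[list[tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(predicted, actual):
        # A prediction of exactly 1.0 goes into the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))
    rows = []
    for members in bins:
        if members:
            n = len(members)
            mean_pred = sum(p for p, _ in members) / n
            actual_rate = sum(y for _, y in members) / n
            rows.append((mean_pred, actual_rate, n))
    return rows


# Toy data: four customers with predicted likelihoods and observed outcomes.
table = calibration_table([0.05, 0.15, 0.15, 0.95], [0, 0, 1, 1])
```

A well-calibrated model has rows where the first two values are close; bins where they diverge show where the model is systematically off.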
35. Automated menu design - intro
• The food team used to manually
design menus – every week.
• With 22 recipes this task has become
too demanding – diversity, multiple
constraints, costs etc.
• They should be focusing on recipe
development to keep delivering
delicious recipes.
• Why not use machine learning to
leverage the data to understand
customers’ taste and design popular
menus?
36. Automated menu design – how it works (I)
• We developed a very detailed
ontology to describe our recipes.
• We built an internal Slack bot to
collect data on recipe similarity.
• Insights gathered with that data
enabled us to provide diverse menus.
• Understanding customers’ taste is a
crucial part of designing popular
menus.
• Transactional data (orders) is not
enough – Snowplow data gives us
way more insight into how customers
explore menus.
37. Automated menu design – how it works (II)
• Multi-objective optimisation:
• Maximising recipe diversity
• Maximising menu popularity
• Balancing costs
• Matching forecasts
• Using Genetic Algorithms (GA)
• Speed is not an issue as we have a whole
week to generate a new menu :)
• Multiple solutions so the food team
can choose which menu best fits their
objectives.
[Diagram: genetic algorithm loop: selection, cross-over, mutation, evaluation.]
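The selection, cross-over, mutation and evaluation loop can be illustrated with a small genetic algorithm over menus. This is a simplified sketch, not the method from the talk: it collapses the objectives into a single weighted sum of popularity and diversity (a true multi-objective GA would keep them separate), and the catalogue, cuisine tags, rates and sizes are all made up.

```python
import random


def make_catalogue(n: int, rng: random.Random) -> list[tuple[float, str]]:
    """Hypothetical recipe catalogue: (popularity score, cuisine tag) per recipe."""
    cuisines = ["italian", "asian", "british", "mexican", "indian"]
    return [(rng.random(), rng.choice(cuisines)) for _ in range(n)]


def fitness(menu, catalogue) -> float:
    """Evaluation: mean popularity plus share of distinct cuisines."""
    popularity = sum(catalogue[i][0] for i in menu) / len(menu)
    diversity = len({catalogue[i][1] for i in menu}) / len(menu)
    return popularity + diversity


def crossover(a, b, size, rng):
    """Cross-over: the child draws recipes from the union of both parents."""
    pool = sorted(set(a) | set(b))
    return sorted(rng.sample(pool, size))


def mutate(menu, n_recipes, rate, rng):
    """Mutation: with probability `rate`, swap one recipe for an unused one."""
    if rng.random() < rate:
        outside = [i for i in range(n_recipes) if i not in menu]
        if outside:
            menu = list(menu)
            menu[rng.randrange(len(menu))] = rng.choice(outside)
    return sorted(menu)


def evolve(catalogue, menu_size=5, pop_size=30, generations=40, top_k=3, seed=0):
    """Return up to `top_k` distinct menus, best first."""
    rng = random.Random(seed)
    n = len(catalogue)
    population = [sorted(rng.sample(range(n), menu_size)) for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: keep the fitter half of the population as parents.
        population.sort(key=lambda m: fitness(m, catalogue), reverse=True)
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, menu_size, rng), n, 0.2, rng))
        population = parents + children
    ranked = sorted({tuple(m) for m in population},
                    key=lambda m: (-fitness(m, catalogue), m))
    return [list(m) for m in ranked[:top_k]]


catalogue = make_catalogue(22, random.Random(1))
best_menus = evolve(catalogue)
```

Returning several distinct menus rather than a single winner mirrors the point above about letting the food team choose among candidate solutions.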
38. Concluding thoughts
• Snowplow has helped us to scale our data capabilities with limited data
engineering resources.
• TIP TO STARTUPS: start building data capabilities as early as possible – data is a huge asset.
• Snowplow also serves us as a unified log.
• Not necessarily limited to customer-focused data.
• Snowplow enables us to 'listen' to our customers and provide them with a more
personalised experience.
• Moving to the real-time pipeline to realise just-in-time personalisation,
e.g. personalised recipe ordering, add-on recommendations (upselling) etc.