A Year Of Data Science at Metail
Matt McDonnell - Data Scientist
Business Context
Startup: “A group of people operating in an environment of uncertainty
striving for a repeatable and scalable business model”
A scalable startup needs a Customer Factory
Figure adapted from ‘Scaling Lean’ by Ash Maurya https://leanstack.com/scaling-lean-book/
A look behind the curtain – what’s the data?
See Metail in action:
http://metail.myshopify.com?utm_source=DataInsightsNov2016
(Scary UTM code is there so I don’t have to spend the next week
digging into ‘Who are these mysterious visitors?’)
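As a minimal sketch of how such a tag can be read back out of a landing URL, the snippet below uses only the Python standard library. The helper name `utm_source` is hypothetical and is not part of any Metail code.

```python
from urllib.parse import urlsplit, parse_qs


def utm_source(url):
    """Return the utm_source value of a URL, or None if the URL has none."""
    query = urlsplit(url).query
    values = parse_qs(query).get("utm_source", [])
    return values[0] if values else None


# The demo link from the slide.
demo_url = "http://metail.myshopify.com?utm_source=DataInsightsNov2016"
source = utm_source(demo_url)
print(source)
```

Grouping visits by this value is what lets the traffic from the talk be told apart from ordinary visitors.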
Live Demo Starts Here!
Sheepish explanation of why it’s not working starts here
The road to Data Science
• Understand the data
• Learn the tools
• Build the analytics for business intelligence
• More sophisticated data analysis for deeper understanding
• Apply machine learning techniques
• Develop models for prediction and decision making
My experience prior to Metail
Careers
• Physics Postdoc (Oxford, Griffith)
• Technical Consultant (MathWorks)
• Quant Developer (Fidelity Worldwide Investment)
• Quant Analyst (Fidelity Worldwide Investment)
Tools used:
(plus some Java, C#, Excel and VBA when I had to)
Understanding the data and tools
My experience since joining Metail
Lots of event stream data
Many AWS components
Outputs:
- Business Intelligence
- Bespoke Analysis
- Productionised Science
Tools to learn
Tools we used a year ago
• R for analysis and science
• dplyr, tidyr, ggplot
• Looker for some of the analysis
Tools we use now
• Python
• pandas, SQLAlchemy, boto3, seaborn
• Still some R
• dplyr, tidyr, ggplot
• Looker for most day-to-day analysis
• Swagger
• AWS stack
Data Analytics
Business intelligence
• How well is the customer factory working? (KPIs)
• What about if we do this? (A/B Tests)
• How’s our retention? (Cohort analysis)
• How efficiently are we digitising garments? (Process monitoring)
• How are we growing?
To answer this we need …
LOTS AND LOTS OF SQL! (yay.)
Most of it embedded in Looker LookML (basically YAML) (yay - again.)
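To give a flavour of the SQL behind a question like "How's our retention?", here is a minimal cohort-analysis sketch. It is an illustration only: the `visits` table, its columns and its rows are hypothetical, and SQLite stands in for the real warehouse so that the example is self-contained.

```python
import sqlite3

# Hypothetical data: one row per user per month with at least one visit.
rows = [
    ("u1", "2016-01"), ("u1", "2016-02"), ("u1", "2016-03"),
    ("u2", "2016-01"), ("u2", "2016-03"),
    ("u3", "2016-02"), ("u3", "2016-03"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (user_id TEXT, visit_month TEXT)")
conn.executemany("INSERT INTO visits VALUES (?, ?)", rows)

# A user's cohort is the month of their first visit; for each cohort we
# count how many of its users were active in each later month.
COHORT_SQL = """
WITH first_visit AS (
    SELECT user_id, MIN(visit_month) AS cohort_month
    FROM visits
    GROUP BY user_id
)
SELECT f.cohort_month,
       v.visit_month,
       COUNT(DISTINCT v.user_id) AS active_users
FROM visits AS v
JOIN first_visit AS f ON f.user_id = v.user_id
GROUP BY f.cohort_month, v.visit_month
ORDER BY f.cohort_month, v.visit_month
"""

cohort_table = conn.execute(COHORT_SQL).fetchall()
for cohort_month, visit_month, active_users in cohort_table:
    print(cohort_month, visit_month, active_users)
```

Dividing each count by the size of its cohort in the first month gives the usual retention curve.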
Data Analytics
Raw Events, Engagement States, Analytics Model
(Looker demo goes here if time allows)
Data Science
Exploring Digitised Garments
Event  Data
{
"schema": "iglu:com.snowplowanalytics.snowplow/unstruct_event/jsonschema/1-0-0",
"data": {
"schema": "",
"data": {
"name": "GarmentCoverage",
"data": {
"page": {
"garments": 24,
"garmentsWithCtas": 14,
"scrollPosY": 201,
"load": {
"isInitiator": false,
"elapsedTimeMs": 1424
}
},
"batch": {
"garments": 12,
"garmentsWithCtas": 7,
"ctas": [
{
"sku": "32536",
"x": 0.2721021611002,
"y": 1.6311844077961
},
{
"sku": "32544",
"x": 0.51768172888016,
"y": 1.6311844077961
},
{
"sku": "32545",
"x": 0.51768172888016,
"y": 1.0134932533733
},
{
"sku": "32548",
"x": 0.51768172888016,
"y": 0.39580209895052
},
{
"sku": "53282",
"x": 0.76326129666012,
"y": 0.39580209895052
},
{
"sku": "53337",
"x": 0.026522593320236,
"y": 1.0134932533733
},
{
"sku": "134499",
"x": 0.2721021611002,
"y": 0.39580209895052
}
]
}
}
}
}
}
GarmentCoverage event, highlighted fields:
"scrollPosY": 201
"garmentsWithCtas": 7
{ "sku": "32544", "x": 0.51768172888016, "y": 1.6311844077961 }
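A minimal sketch of how a GarmentCoverage event like the one above could be flattened into one row per CTA for analysis. The function name `cta_rows` and the output field `scroll_pos_y` are hypothetical; the slides do not say how the production pipeline unpacks these events, nor what units `x` and `y` are in.

```python
import json

# The event from the slide, with whitespace compacted.
raw_event = """
{
  "schema": "iglu:com.snowplowanalytics.snowplow/unstruct_event/jsonschema/1-0-0",
  "data": {
    "schema": "",
    "data": {
      "name": "GarmentCoverage",
      "data": {
        "page": {"garments": 24, "garmentsWithCtas": 14, "scrollPosY": 201,
                 "load": {"isInitiator": false, "elapsedTimeMs": 1424}},
        "batch": {
          "garments": 12,
          "garmentsWithCtas": 7,
          "ctas": [
            {"sku": "32536", "x": 0.2721021611002, "y": 1.6311844077961},
            {"sku": "32544", "x": 0.51768172888016, "y": 1.6311844077961},
            {"sku": "32545", "x": 0.51768172888016, "y": 1.0134932533733},
            {"sku": "32548", "x": 0.51768172888016, "y": 0.39580209895052},
            {"sku": "53282", "x": 0.76326129666012, "y": 0.39580209895052},
            {"sku": "53337", "x": 0.026522593320236, "y": 1.0134932533733},
            {"sku": "134499", "x": 0.2721021611002, "y": 0.39580209895052}
          ]
        }
      }
    }
  }
}
"""


def cta_rows(event):
    """Flatten one GarmentCoverage event into one dict per CTA."""
    payload = event["data"]["data"]
    if payload.get("name") != "GarmentCoverage":
        return []  # ignore other event types
    body = payload["data"]
    scroll_pos_y = body["page"]["scrollPosY"]
    return [
        {"sku": cta["sku"], "x": cta["x"], "y": cta["y"],
         "scroll_pos_y": scroll_pos_y}
        for cta in body["batch"]["ctas"]
    ]


rows = cta_rows(json.loads(raw_event))
for row in rows:
    print(row)
```

Each resulting row carries the garment SKU and its reported position, which is the shape of data the garment-position analysis needs.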
Spread of digitised garments
• Look at positions of all digitised garments for a given category.
• ‘page’ is in units of #scrolls (based on browser height on the user’s device)
• Digitised garments on /women-dress and /women-tops-tees are more spread
out than garments on /women-jeans
Views by garment position
• Aggregate visitors who see garment ‘X’ in a given
category on a given date.
• Scale these visitor counts by the maximum #visitors for a
garment on that date in that category.
• In the /women-dress category:
• Digitised garments are spread between 0 and 120 page scrolls
with median ~40
• Long “tail” of digitised garments which get far fewer visits.
• The average digitised garment typically gets about 20% as many visitors as
the most popular garment in that category (on a given day).
Date        url_path      sku     Users  Page  scaled_count
2016-01-01  /women-dress  101742  699    5.0   0.743617
2016-01-01  /women-dress  101743  700    4.0   0.744681
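The scaling step can be sketched as follows, in plain Python so that it runs without extra packages. The first two rows come from the table above. The third row is hypothetical: it is added so that the group contains its maximum, which the two published rows imply is about 940 visitors (699 / 0.743617). The function name `add_scaled_count` is also an assumption.

```python
from collections import defaultdict

# (date, url_path, sku, users)
rows = [
    ("2016-01-01", "/women-dress", "101742", 699),
    ("2016-01-01", "/women-dress", "101743", 700),
    ("2016-01-01", "/women-dress", "hypothetical-top-sku", 940),  # assumed row
]


def add_scaled_count(rows):
    """Append users / (max users in the same date and category) to each row."""
    max_users = defaultdict(int)
    for date, url_path, _sku, users in rows:
        key = (date, url_path)
        max_users[key] = max(max_users[key], users)
    return [
        (date, url_path, sku, users, users / max_users[(date, url_path)])
        for date, url_path, sku, users in rows
    ]


scaled = add_scaled_count(rows)
for _date, _url_path, sku, users, scaled_count in scaled:
    print(sku, users, round(scaled_count, 6))
```

In pandas the same step would typically be a `groupby` on date and url_path followed by `transform("max")` and a division.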
Views by category
• Look at positions of all digitised garments for a given category.
• ‘page’ is in units of #scrolls (based on browser height on the user’s device)
• Digitised garments on /women-dress and /women-tops-tees are more spread out than digitised garments
on /women-jeans. Could also be that there are more digitised garments in /women-tops-tees.
• There are some “hotspots” of digitised garment positions e.g. ~page 100 for /women-tops-tees.
Unfortunately, they are quite far down the category page and visitor counts are typically around 10-20% of
the values for the most popular garments (closest to the top of the category page)
/women-tops-tees /women-jeans /women-dress
Views as time series
• Digitised garments on /women-dress over time
• The “hotspot” moves further down the page: most discernibly in the last 2 weeks.
Data Science
Exploring User Body Shapes
BMI Quantiles
BMI: 17.6
Height: 160cm
Weight: 45kg
BMI: 19.9
Height: 157cm
Weight: 49kg
BMI: 22.2
Height: 153cm
Weight: 52kg
BMI: 25.8
Height: 146cm
Weight: 55kg
BMI: 29.7
Height: 155cm
Weight: 71kg
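For reference, these figures follow the standard definition of BMI, weight in kilograms divided by the square of height in metres. The short check below recomputes the five examples. The first four match the slide; the last comes out slightly lower than the 29.7 shown, presumably because the displayed height and weight are rounded.

```python
def bmi(weight_kg, height_cm):
    """Body mass index: weight (kg) divided by height (m) squared."""
    height_m = height_cm / 100.0
    return weight_kg / height_m ** 2


# (height in cm, weight in kg) for the five examples on the slide.
examples = [(160, 45), (157, 49), (153, 52), (146, 55), (155, 71)]
bmis = [round(bmi(weight, height), 1) for height, weight in examples]
print(bmis)
```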
Our Shape Segmentation
Spoon, Triangle, Bottom Hourglass, Rectangle, Hourglass, Top Hourglass, Inverted Triangle
Adapting the shape segmentation rules of the Lee et al. (2007) paper used by FFIT
Users Segmented by Shape
Axes: Hips – Waist (cm) and Bust – Waist (cm)
Shape Distribution and Popular Garments
Engagement by Shape
% of users trying on at least two garments on a personalised MeModel (1 SD shown)
Data Science
Learning User Behaviour
Understanding Users
Event stream summary over a month
Visits by day of month
All users
Distinct types of users
Machine Learning Techniques
Data Driven User Segmentation
Distinct types of users
Use Machine Learning techniques to characterise which features define users in each cluster
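The slides do not name the clustering algorithm, so the sketch below uses plain k-means purely as an assumed stand-in. The two features (visits per month, garments tried on) and all data points are hypothetical. Comparing the resulting centroids is one simple way to see which features set the clusters apart.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign points to the nearest centroid, then re-average."""
    labels = []
    for _ in range(iterations):
        # Assignment step: index of the closest centroid for every point.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(old)  # leave an empty cluster where it was
        centroids = new_centroids
    return labels, centroids


# Hypothetical users: (visits per month, garments tried on).
users = [(1, 0), (2, 1), (1, 1), (8, 6), (9, 7), (10, 5)]
labels, centroids = kmeans(users, centroids=[users[0], users[3]])
print(labels)
print(centroids)
```

With fixed starting centroids the result is deterministic; a real analysis would normally standardise the features and try several initialisations.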
Identify clusters: engaged and converted users
Cluster Labels into Redshift / Looker
Chart labels: Acquisition Rate, RPV, Seen Size Advice Rate
A first look at the clusters
Chart labels: Acquisition, Retention Reuse, Retention Revisit, Deep Funnel, Revenue, Revenue, Try-ons (any model)
Cluster sizes: 674 users, 595 users, 541 users, 721 users, 312 users
Future plans: more MODELLING!
Some possibilities:
• Use engagement clustering to create labels for supervised learning
• Engagement prediction using trained machine learning models
• Apply Probabilistic Graphical Modelling techniques
• (I quite like Daphne Koller’s Coursera course and book
https://www.coursera.org/learn/probabilistic-graphical-models/home/welcome )
• More Bayesian reasoning
• … (any suggestions?)
Time permitting, SAMIAM (http://reasoning.cs.ucla.edu/samiam/) demo goes here
Bayesian inference – what are the variables?
(Disclaimer: this is me playing around with SAMIAM for 15 minutes and not an actual model)
Bayesian inference – how are things related?
(Disclaimer: this is me playing around with SAMIAM for 15 minutes and not an actual model)
Bayesian inference – what can we infer?
(Disclaimer: this is me playing around with SAMIAM for 15 minutes and not an actual model)
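To make the "what can we infer?" step concrete, here is a two-variable Bayes' rule calculation of the kind a tool such as SAMIAM automates for larger networks. The variables (engaged, tried on a garment) and every probability are hypothetical, chosen only to keep the arithmetic exact. This is not the model from the demo.

```python
from fractions import Fraction

# Hypothetical prior and conditional probabilities.
p_engaged = Fraction(1, 5)                  # P(engaged)
p_tryon_given_engaged = Fraction(3, 4)      # P(try-on | engaged)
p_tryon_given_not_engaged = Fraction(1, 8)  # P(try-on | not engaged)

# Total probability of observing a try-on.
p_tryon = (
    p_tryon_given_engaged * p_engaged
    + p_tryon_given_not_engaged * (1 - p_engaged)
)

# Bayes' rule: P(engaged | try-on).
posterior = p_tryon_given_engaged * p_engaged / p_tryon
print(p_tryon, posterior)
```

Under these assumed numbers, observing a try-on raises the probability that a user is engaged from 1/5 to 3/5.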
That’s all folks!
Questions?
