Data Science in
E-commerce industry
DSSP 2016/05/20
Vincent Michel
Big Data Europe, BDD, Rakuten Inc. / PriceMinister
vincent.michel@rakuten.com
@HowIMetYourData
2
Short Bio
Education
ESPCI: engineer in Physics / Biology
ENS Cachan: MVA Master Mathematics Vision and Learning
INRIA Parietal team: PhD in Computer Science
Understanding the visual cortex by using classification techniques
Experience
Logilab – Development and data science consulting
Data.bnf.fr (French National Library open-data platform)
Brainomics (platform for heterogeneous medical data)
Rakuten PriceMinister – Senior Developer and data scientist
Data engineer and data science consulting
Software engineering
Lessons learned from (painful) experiences
4
Do not redo it yourself!
Lots of really interesting open-source libraries for all your needs:
Scikit-learn, pandas, Caffe, scikit-image, OpenCV, …
Test first on a small POC, then contribute/develop
Be careful: it is really easy to do something wrong!
Open-data:
More and more open-data for catalogs, …
E.g. data.bnf.fr
~ 2.000.000 authors
~ 200.000 works
~ 200.000 topics
Contribute to open-source:
Is there a need / pool of potential developers ?
Do it well (documentation / test)
Unless you are doing some kind of super magical algorithm
May bring you help, bug fixes, and engineers ! But it takes time and energy
5
Quality in data science software engineering
Never underestimate integration cost
It is really easy to write 20 lines of Python code doing some fancy Random Forests…
…that could be really hard to deploy (data pipeline, packaging, monitoring)
Developer != DevOps != Sys admin
Make it clean from the start (> 2 days of dev or > 100 lines of code):
Tests, tests, tests, tests, tests, tests, tests, …
Documentation
Packaging / supervision / monitoring
Release early, release often
Agile development, Pull request, code versioning
Choose the right tool:
Do you really need this super fancy NoSQL database
to store your transactions?
6
Monitoring and metrics
Always monitor:
Your development: continuous integration (Jenkins)
Your service: nagios/shinken
Your business data (BI): Kibana
Your user: tracker
Your data science process : e.g. A/B test
Evaluation:
Choose the right metric
Prediction accuracy / Precision-recall …
Always A/B test rather than relying on personal thoughts
Good question leads to good answer: Define your problem
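To make the evaluation points above concrete, here is a minimal sketch in plain Python of precision/recall and of a significance check for an A/B test. The function names and the choice of a two-proportion z-test are assumptions made for illustration; the slides do not say which metrics or tests are actually used.

```python
import math


def precision_recall(predicted, relevant):
    """Precision and recall of a set of predicted items against the relevant ones."""
    predicted, relevant = set(predicted), set(relevant)
    hits = len(predicted & relevant)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def ab_test_z(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-test on conversion counts: returns (z, two-sided p-value)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value of a standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

Which metric is "right" depends on the problem: precision matters when only a few recommendation slots are shown, recall when missing a relevant item is costly.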
Hiring remarks
Finding the right data scientist
8
Finding your data scientist
Do not try to find a unicorn!
Define your needs
(and unicorns no longer exist…)
9
A few remarks on hiring – my personal opinion
Be careful of CVs with buzzwords!
E.g. “IT skills: SVM (linear, non-linear), Clustering (K-means, Hierarchical),
Random Forests, Regularization (L1, L2, Elastic net…) …”
It is like someone saying “IT skills: Python (for loop, if/else pattern, …)”
Often found in Junior CVs (ok), but huge warning in Senior CVs
Hungry for data?
Loving data is the most important thing to check
Open data? Personal project? Curious about data? (Hackathon?)
Multidisciplinary == knowing how to handle various datasets
Check for IT skills:
Should be able to install/develop new libraries/algorithms
A huge part of the job could be to format / cleanup the data
Experience VS education -> Autonomy
Recommendations @Rakuten
Data science use-case
11
Rakuten Group Worldwide
Recommendation
challenges
Different languages
User behavior
Business areas
12
Rakuten Group in Numbers
Rakuten in Japan
> 12.000 employees
> 48 billion euros of GMS
> 100.000.000 users
> 250.000.000 items
> 40.000 merchants
Rakuten Group
Kobo 18.000.000 users
Viki 28.000.000 users
Viber 345.000.000 users
13
Rakuten Ecosystem
Rakuten global ecosystem :
Member-based business model that connects Rakuten services
Rakuten ID common to various Rakuten services
Online shopping and services;
Main business areas
E-commerce
Internet finance
Digital content
Recommendation challenges
Cross-services
Aggregated data
Complex user features
14
Rakuten’s e-commerce: B2B2C Business Model
Business to Business to Consumer:
Merchants located in different regions / online virtual shopping mall
Main profit sources
• Fixed fees from merchants
• Fees based on each transaction and other services
Recommendation
challenges
Many shops
Item references
Global catalog
15
Big Data Department @ Rakuten
Big Data Department
150+ engineers – Japan / Europe / US
Missions
Development and operations of internal
systems for:
Recommendations
Search
Targeting
User behavior tracking
Average traffic
> 100.000.000 events / day
> 40.000.000 item views / day
> 50.000.000 searches / day
> 750.000 purchases / day
Technology stack
Java / Python / Ruby
Solr / Lucene
Cassandra / Couchbase
Hadoop / Hive / Pig
Redis / Kafka
16
Recommendations on Rakuten Marketplaces
Non-personalized recommendations
All-shop recommendations:
Item to item
User to item
In-shop recommendations
Review-based recommendations
Personalized recommendations
Purchase history recommendations
Cart add recommendations
Order confirmation recommendations
System status and scale
In production in over 35 services of Rakuten Group worldwide
Several hundreds of servers running:
Hadoop
Cassandra
APIs
17
Challenges in Recommendations
[Figure: items catalogue, items similarity, recommendations engine, evaluation process]
Items catalogues
Catalogue for multiple shops with different item references?
Items similarity / distances
Cross services aggregation ?
Lots of parameters ?
Recommendations engine
Best / optimal recommendations logic ?
Evaluation process
Offline / online evaluation ?
Long-tail ? KPI ?
18
Recommendations Architecture: Constantly Evolving
[Architecture diagram components: browsing events, purchase events, co-counts storage, catalogue(s), distribution layer, recommendations (offline / materialized), recommendations (online algebra / multi-arm)]
19
Items Catalogues
Use different levels of aggregation to improve recommendations
Category-level
(e.g. food, soda, clothes, …)
Product-level
(manufactured items)
Item in shop-level
(specific product sold by a specific shop)
Increased statistical power in co-events computation
Easier business handling (picking the right item)
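The gain in statistical power from aggregation can be sketched as follows: co-events counted at the item-in-shop level are rolled up to the product level, so that sparse counts spread over several shops add up. The identifiers and the `ITEM_TO_PRODUCT` mapping are hypothetical and only serve this illustration.

```python
from collections import Counter

# Hypothetical mapping from item-in-shop ids to product ids.
ITEM_TO_PRODUCT = {
    "shopA:cola-33cl": "cola-33cl",
    "shopB:cola-33cl": "cola-33cl",
    "shopA:chips": "chips",
    "shopC:chips": "chips",
}


def aggregate_pairs(item_pairs, mapping):
    """Roll up co-event counts from item-in-shop level to a coarser level."""
    counts = Counter()
    for (left, right), n in item_pairs.items():
        a, b = mapping[left], mapping[right]
        if a != b:  # a product co-occurring with itself carries no signal
            counts[tuple(sorted((a, b)))] += n
    return counts


# Each item-in-shop pair was seen only once or twice...
item_level = Counter({
    ("shopA:cola-33cl", "shopA:chips"): 1,
    ("shopB:cola-33cl", "shopC:chips"): 1,
    ("shopA:cola-33cl", "shopB:cola-33cl"): 2,
})
# ...but at product level the cola/chips evidence is pooled.
product_level = aggregate_pairs(item_level, ITEM_TO_PRODUCT)
```

The same function could be reused with a product-to-category mapping for the category level.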
20
Enriching Catalogues using Record Linkage
[Figure: Marketplace 1, Marketplace 2, Reference database]
Record linkage
• Use external sources (e.g., Wikidata) to align markets' products
• Fuzzy matching of 600K vs 350K items for the movie alignment use case
• Blocking algorithm
Cross recommendation
• Global catalog
• Item aggregation
• Helps with cold start issues
• Improved navigation
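As a rough illustration of fuzzy matching combined with blocking, the sketch below compares titles only within blocks that share a cheap key, which avoids the full 600K x 350K comparison. The blocking key (year plus first letter), the 0.85 threshold, the record fields, and the use of `difflib` are all assumptions for this example, not the method used at Rakuten.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def normalize(title):
    """Lowercase and keep only letters, digits, and single spaces."""
    kept = "".join(c if c.isalnum() else " " for c in title.lower())
    return " ".join(kept.split())


def block_key(record):
    """Hypothetical blocking key: release year plus first letter of the title."""
    return (record["year"], normalize(record["title"])[:1])


def link(left, right, threshold=0.85):
    """Return (left_id, right_id, score) for similar titles, comparing only within blocks."""
    blocks = defaultdict(list)
    for rec in right:
        blocks[block_key(rec)].append(rec)
    matches = []
    for rec in left:
        a = normalize(rec["title"])
        for cand in blocks.get(block_key(rec), []):
            score = SequenceMatcher(None, a, normalize(cand["title"])).ratio()
            if score >= threshold:
                matches.append((rec["id"], cand["id"], round(score, 2)))
    return matches


# Made-up records for the example.
market_1 = [{"id": "m1-1", "title": "The Matrix", "year": 1999}]
market_2 = [
    {"id": "m2-7", "title": "the matrix!", "year": 1999},
    {"id": "m2-8", "title": "The Mummy", "year": 1999},
    {"id": "m2-9", "title": "The Matrix", "year": 2003},  # other block, never compared
]
matches = link(market_1, market_2)
```

A key that is too strict misses true matches (for example "Matrix, The" lands in another block here), so the choice of blocking key is a trade-off between cost and recall.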
21
Co-occurrences and Similarities Computation
Only access to unitary data (purchase / browsing)
Use co-occurrences for computing items similarity
Multiple possible parameters:
• Size of the time window to be considered:
Do browsing and purchase data reflect similar behavior?
• Threshold on co-occurrences:
Is one co-occurrence significant enough to be used? Two? Three?
• Symmetric or asymmetric:
Is the order important in the co-occurrence? A then B == B then A?
• Similarity metrics:
Which similarity metric should be used on the co-occurrences?
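The parameters listed above can be sketched in a few lines of Python. The event format `(user, item, day)`, the function names, the default values, and the cosine-style normalisation are assumptions chosen for illustration; the slides leave all of these open.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrences(events, window_days=1, symmetric=True):
    """Count item pairs seen for the same user within `window_days`.

    `events` is an iterable of (user, item, day), with `day` an integer.
    Returns (pair_counts, item_counts), both counted in number of users.
    """
    by_user = defaultdict(list)
    for user, item, day in events:
        by_user[user].append((day, item))
    pair_counts, item_counts = Counter(), Counter()
    for history in by_user.values():
        history.sort()  # chronological order (ties broken by item name)
        item_counts.update({item for _, item in history})
        seen = set()
        for (d1, a), (d2, b) in combinations(history, 2):
            if a == b or d2 - d1 > window_days:
                continue
            # Symmetric: A then B == B then A. Asymmetric: keep the order.
            pair = tuple(sorted((a, b))) if symmetric else (a, b)
            if pair not in seen:  # count each pair once per user
                seen.add(pair)
                pair_counts[pair] += 1
    return pair_counts, item_counts


def cosine_similarity(pair_counts, item_counts, min_count=2):
    """Cosine-style similarity, ignoring pairs below the co-occurrence threshold."""
    return {
        (a, b): n / math.sqrt(item_counts[a] * item_counts[b])
        for (a, b), n in pair_counts.items()
        if n >= min_count
    }


# Made-up events for the example.
events = [
    ("u1", "tv", 1), ("u1", "hdmi", 1),
    ("u2", "tv", 3), ("u2", "hdmi", 4),
    ("u3", "tv", 1), ("u3", "hdmi", 20), ("u3", "book", 20),
]
pairs, items = cooccurrences(events, window_days=1)
similarities = cosine_similarity(pairs, items, min_count=2)
```

With a one-day window, the tv/hdmi pair is counted for two users and survives the threshold, while the single book/hdmi co-occurrence is dropped.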
22
Co-occurrences Example
[Figure: timeline of browsing and purchase events (08/09/2015 to 24/11/2015), with candidate groupings: session, time window 1, time window 2]
23
Co-occurrences Computation
Classical co-occurrences
• Co-purchases
• Co-browsing
Complementary items / Substitute items
Other possible co-occurrences
• Items browsed and bought together
• Items browsed and not bought together
“You may also want…” / “Similar items…”
[Figure: example browsing and purchase events with their dates]
24
Recommendation Quality Challenges
Recommendations categories
Cold start issue
• External data ?
• Cross-services ?
Hot products (A)
• Top-N items ?
Short tail (B)
Long tail (C + D)
[Figure: product quadrants (A) to (D) along two axes: minor vs. major (popular) product, and new vs. old product]
25
Long Tail is Fat
Long tail numbers
• Most of the items are long tail
• They still represent a large
portion of the traffic
Long tail approaches
• Content-based
• Aggregation / clustering
• Personalization
[Figure: browsing share and number of items for popular, short-tail, and long-tail items]
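One way to check how "fat" the tail is on a given catalogue is to compare the share of items in the tail with their share of the traffic. In this sketch the 20% head cut-off, the function name, and the view counts are made up for illustration.

```python
def tail_shares(views_per_item, head_fraction=0.2):
    """Return (share of items in the tail, share of views in the tail)."""
    counts = sorted(views_per_item.values(), reverse=True)
    head_size = max(1, int(len(counts) * head_fraction))
    tail = counts[head_size:]
    return len(tail) / len(counts), sum(tail) / sum(counts)


# Made-up view counts: two popular items, eight rarely viewed ones.
views = {"a": 40, "b": 20, "c": 5, "d": 5, "e": 5, "f": 5, "g": 5, "h": 5, "i": 5, "j": 5}
item_share, view_share = tail_shares(views)
```

On this toy data the tail holds 80% of the items and still 40% of the views.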
26
Recommendations Offline Evaluation
Pros/Cons
• Convenient way to
try new ideas
• Fast and cheap
• But hard to align
with online KPI
Approaches
• Rescoring
• Prediction game
• Business simulator
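The slides do not define the "prediction game". One common reading, assumed here, is a leave-last-out evaluation: hide each user's last item and check whether the recommender ranks it in its top k. The popularity baseline, the function names, and the histories below are hypothetical.

```python
from collections import Counter


def make_popularity_recommender(histories):
    """Baseline: rank items by frequency in the visible part of the histories."""
    popularity = Counter(
        item for items in histories.values() for item in items[:-1]
    )
    ranked = [item for item, _ in popularity.most_common()]

    def recommend(context):
        return [item for item in ranked if item not in context]

    return recommend


def hit_rate_at_k(histories, recommend, k=5):
    """Hide each user's last item and check whether it is in the top-k recommendations."""
    hits = trials = 0
    for items in histories.values():
        if len(items) < 2:
            continue  # nothing left to condition on
        *context, target = items
        trials += 1
        if target in recommend(context)[:k]:
            hits += 1
    return hits / trials if trials else 0.0


# Made-up histories, oldest item first.
histories = {
    "u1": ["tv", "hdmi"],
    "u2": ["tv", "hdmi"],
    "u3": ["hdmi", "tv"],
    "u4": ["book"],
    "u5": ["tv", "lamp"],
}
recommend = make_popularity_recommender(histories)
score = hit_rate_at_k(histories, recommend, k=1)
```

Such a score is fast and cheap to compute, but as noted above it does not necessarily align with online KPIs: a popularity baseline can score well here while never surfacing long-tail items such as the lamp.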
27
Public Initiative – Viki Recommendation Challenge
567 submissions from 132 participants
http://www.dextra.sg/challenges/rakuten-viki-video-challenge
28
Data science everywhere!
Rakuten provides marketplaces worldwide
Specific challenges for recommendations
Items catalogue: reinforce statistical power of co-occurrences across shops and services;
Items similarities: find the right parameters for the different use cases;
Recommendations models: what are the best models for in-shop, all-shops, personalization?
Evaluation: handling long-tail? Comparing different models?
29
THANKS !
Questions ?
More on Rakuten tech initiatives
http://www.slideshare.net/rakutentech
http://rit.rakuten.co.jp/oss.html
http://rit.rakuten.co.jp/opendata.html
Positions
• http://global.rakuten.com/corp/careers/bigdata/
• http://www.priceminister.com/recrutement/?p=197
30
We are Hiring!
Big Data Department – team in Paris
http://global.rakuten.com/corp/careers/bigdata/
http://www.priceminister.com/recrutement/?p=197
Data Scientist / Software Developer
• Build algorithms for recommendations, search, targeting
• Predictive modeling, machine learning, natural language processing
• Working close to business
• Python, Java, Hadoop, Couchbase, Cassandra…
• Also hiring: search engine developers, big data system administrators, etc.

Data Science in E-commerce

  • 1. Data Science in E-commerce industry DSSP 2016/05/20 Vincent Michel Big Data Europe, BDD, Rakuten Inc. / PriceMinister vincent.michel@rakuten.com @HowIMetYourData
  • 2. 2 Short Bio ESPCI: engineer in Physics / Biology ENS Cachan: MVA Master Mathematics Vision and Learning INRIA Parietal team: PhD in Computer Science Understanding the visual cortex by using classification techniques Logilab – Development and data science consulting Data.bnf.fr (French National Library open-data platform) Brainomics (platform for heterogeneous medical data) Education Experience Rakuten PriceMinister– Senior Developer and data scientist Data engineer and data science consulting
  • 3. Software engineering: lessons learned from (painful) experiences
  • 4. Do not redo it yourself!
      • Lots of really interesting open-source libraries for all your needs
          • Scikit-learn, pandas, Caffe, Scikit-image, OpenCV, …
          • Test first on a small POC, then contribute/develop
          • Be careful: it is really easy to do something wrong!
      • Open data
          • More and more open data for catalogs, …
          • E.g. data.bnf.fr: ~2,000,000 authors, ~200,000 works, ~200,000 topics
      • Contribute to open source
          • Is there a need / a pool of potential developers?
          • Do it well (documentation / tests)
          • Unless you are doing some kind of super magical algorithm
          • May bring you help, bug fixes, and engineers! But it takes time and energy
  • 5. Quality in data science software engineering
      • Never underestimate the integration cost
          • It is really easy to write 20 lines of Python doing some fancy Random Forests…
          • …that could be really hard to deploy (data pipeline, packaging, monitoring)
          • Developer != DevOps != Sys admin
      • Make it clean from the start (> 2 days of dev or > 100 lines of code)
          • Tests, tests, tests, tests, tests, tests, tests, …
          • Documentation
          • Packaging / supervision / monitoring
      • Release often, release early
          • Agile development, pull requests, code versioning
      • Choose the right tool
          • Do you really need this super fancy NoSQL database to store your transactions?
  • 6. Monitoring and metrics
      • Always monitor
          • Your development: continuous integration (Jenkins)
          • Your service: Nagios / Shinken
          • Your business data (BI): Kibana
          • Your users: tracker
          • Your data science process: e.g. A/B tests
      • Evaluation
          • Choose the right metric: prediction accuracy, precision-recall, …
          • Always A/B test rather than relying on personal opinions
      • A good question leads to a good answer: define your problem
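The slide recommends A/B testing over personal opinion but does not say how to read the result. A minimal sketch, assuming the metric is a conversion rate and that a pooled two-proportion z-test is an acceptable approximation (the function name and the example counts are illustrative, not from the talk):

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare the conversion rates of variants A and B.

    conv_*: number of converting users, n_*: number of exposed users.
    Returns (z, two-sided p-value) under the normal approximation.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis "A and B convert equally".
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 1000 users per variant, 100 vs 150 conversions.
z, p_value = two_proportion_z_test(100, 1000, 150, 1000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

The normal approximation only holds for reasonably large samples; for small traffic an exact test would be a safer choice.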
  • 7. Hiring remarks: finding the right data scientist
  • 8. Finding your data scientist: do not try to find a unicorn! Define your needs (and unicorns no longer exist…)
  • 9. Few remarks on hiring (my personal opinion)
      • Be careful of CVs with buzzwords!
          • E.g. “IT skills: SVM (linear, non-linear), Clustering (K-means, Hierarchical), Random Forests, Regularization (L1, L2, Elastic net…) …”
          • It is like someone saying “IT skills: Python (for loop, if/else pattern, …)”
          • Often found in junior CVs (OK), but a huge warning sign in senior CVs
      • Hungry for data? Loving data is the most important thing to check
          • Open data? Personal projects? Curious about data? (Hackathons?)
          • Multidisciplinary == knowing how to handle various datasets
      • Check for IT skills
          • Should be able to install/develop new libraries/algorithms
          • A huge part of the job could be to format / clean up the data
      • Experience vs. education -> autonomy
  • 11. Rakuten Group Worldwide. Recommendation challenges: different languages, user behavior, business areas
  • 12. Rakuten Group in Numbers
      • Rakuten in Japan: > 12,000 employees, > 48 billion euros of GMS, > 100,000,000 users, > 250,000,000 items, > 40,000 merchants
      • Rakuten Group: Kobo (18,000,000 users), Viki (28,000,000 users), Viber (345,000,000 users)
  • 13. Rakuten Ecosystem
      • Rakuten global ecosystem
          • Member-based business model that connects Rakuten services
          • Rakuten ID common to various Rakuten services
          • Online shopping and services
      • Main business areas: e-commerce, Internet finance, digital content
      • Recommendation challenges: cross-services, aggregated data, complex user features
  • 14. Rakuten’s e-commerce: B2B2C Business Model
      • Business to Business to Consumer: merchants located in different regions / online virtual shopping mall
      • Main profit sources
          • Fixed fees from merchants
          • Fees based on each transaction and other services
      • Recommendation challenges: many shops, item references, global catalog
  • 15. Big Data Department @ Rakuten
      • Big Data Department: 150+ engineers (Japan / Europe / US)
      • Missions: development and operations of internal systems for recommendations, search, targeting, and user behavior tracking
      • Average traffic
          • > 100,000,000 events / day
          • > 40,000,000 item views / day
          • > 50,000,000 searches / day
          • > 750,000 purchases / day
      • Technology stack
          • Java / Python / Ruby
          • Solr / Lucene
          • Cassandra / Couchbase
          • Hadoop / Hive / Pig
          • Redis / Kafka
  • 16. Recommendations on Rakuten Marketplaces
      • Non-personalized recommendations
          • All-shop recommendations: item to item, user to item
          • In-shop recommendations
          • Review-based recommendations
      • Personalized recommendations
          • Purchase history recommendations
          • Cart add recommendations
          • Order confirmation recommendations
      • System status and scale
          • In production in over 35 services of the Rakuten Group worldwide
          • Several hundred servers running Hadoop, Cassandra, and APIs
  • 17. Challenges in Recommendations
      • Items catalogues: catalogue for multiple shops with different item references?
      • Items similarity / distances: cross-services aggregation? Lots of parameters?
      • Recommendations engine: best / optimal recommendation logic?
      • Evaluation process: offline / online evaluation? Long tail? KPI?
  • 18. Recommendations Architecture: Constantly Evolving (architecture diagram; components: browsing events, purchase events, catalogue(s), co-counts storage, distribution layer, offline / materialized recommendations, online recommendations with algebra / multi-arm)
  • 19. Items Catalogues
      • Use different levels of aggregation to improve recommendations
          • Category level (e.g. food, soda, clothes, …)
          • Product level (manufactured items)
          • Item-in-shop level (a specific product sold by a specific shop)
      • Increased statistical power in co-event computation
      • Easier business handling (picking the right item)
  • 20. Enriching Catalogues using Record Linkage (diagram: Marketplace 1 and Marketplace 2 aligned through a reference database)
      • Record linkage
          • Use external sources (e.g. Wikidata) to align the marketplaces' products
          • Fuzzy matching of 600K vs 350K items for the movie alignment use case
          • Blocking algorithm
      • Cross recommendation
          • Global catalog
          • Items aggregation
          • Helps with cold start issues
          • Improved navigation
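The slide names blocking and fuzzy matching without giving details. A minimal sketch of how the two fit together, assuming records with `id`, `title`, and `year` fields, a blocking key made of the release year plus the first letter of the title, and `difflib` string similarity as the fuzzy matcher (all of these are illustrative choices, not the method used at Rakuten):

```python
from collections import defaultdict
from difflib import SequenceMatcher


def normalize(title):
    """Lowercase the title and collapse repeated whitespace."""
    return " ".join(title.lower().split())


def blocking_key(record):
    # Hypothetical key: only records sharing year and first letter are compared.
    return (record["year"], normalize(record["title"])[:1])


def link_records(catalog_a, catalog_b, threshold=0.85):
    """Return (id_a, id_b, score) for the best match of each record of A in B."""
    blocks = defaultdict(list)
    for rec_b in catalog_b:
        blocks[blocking_key(rec_b)].append(rec_b)

    links = []
    for rec_a in catalog_a:
        best, best_score = None, threshold
        # Blocking: compare only within the same block, not against all of B.
        for rec_b in blocks.get(blocking_key(rec_a), []):
            score = SequenceMatcher(
                None, normalize(rec_a["title"]), normalize(rec_b["title"])
            ).ratio()
            if score >= best_score:
                best, best_score = rec_b, score
        if best is not None:
            links.append((rec_a["id"], best["id"], round(best_score, 3)))
    return links


# Made-up records for illustration only.
marketplace_1 = [
    {"id": "a1", "title": "The Matrix", "year": 1999},
    {"id": "a2", "title": "Amelie", "year": 2001},
]
marketplace_2 = [
    {"id": "b1", "title": "the  matrix", "year": 1999},
    {"id": "b2", "title": "The Matrix Reloaded", "year": 2003},
    {"id": "b3", "title": "Alien", "year": 1979},
]
print(link_records(marketplace_1, marketplace_2))
```

Blocking trades recall for speed: a pair whose key differs (for example a wrong release year in one catalogue) is never compared, so the key has to be chosen on fields that are reliable in both sources.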
  • 21. Co-occurrences and Similarities Computation
      • Only access to unitary data (purchase / browsing)
      • Use co-occurrences for computing item similarity
      • Multiple possible parameters
          • Size of the time window to be considered: do browsing and purchase data reflect similar behavior?
          • Threshold on co-occurrences: is one co-occurrence significant enough to be used? Two? Three?
          • Symmetric or asymmetric: is the order important in the co-occurrence? A then B == B then A?
          • Similarity metrics: which similarity metric should be used on top of the co-occurrences?
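The parameters listed on this slide can be made concrete with a small sketch. It assumes events are `(user_id, item_id, timestamp)` tuples, counts each item pair at most once per user, and uses a cosine-style similarity; the function names and the choice of metric are assumptions, not the production logic:

```python
from collections import Counter, defaultdict
from itertools import combinations
import math


def cooccurrences(events, window, symmetric=True):
    """Count item pairs seen for the same user within `window` time units.

    events: iterable of (user_id, item_id, timestamp).
    Returns (pair_counts, item_counts); both count distinct users.
    """
    by_user = defaultdict(list)
    for user, item, ts in events:
        by_user[user].append((ts, item))

    pair_counts, item_counts = Counter(), Counter()
    for history in by_user.values():
        history.sort()  # chronological order, so pairs are (earlier, later)
        for item in {item for _, item in history}:
            item_counts[item] += 1
        seen = set()
        for (ts_a, a), (ts_b, b) in combinations(history, 2):
            if a == b or ts_b - ts_a > window:
                continue
            # Symmetric: "A then B" and "B then A" share one counter.
            pair = tuple(sorted((a, b))) if symmetric else (a, b)
            if pair not in seen:
                seen.add(pair)
                pair_counts[pair] += 1
    return pair_counts, item_counts


def cosine_similarities(pair_counts, item_counts, min_count=2):
    """Keep pairs seen at least `min_count` times and normalize the counts."""
    return {
        pair: count / math.sqrt(item_counts[pair[0]] * item_counts[pair[1]])
        for pair, count in pair_counts.items()
        if count >= min_count
    }


# Made-up events; timestamps are in arbitrary units (e.g. days).
events = [
    ("u1", "A", 0), ("u1", "B", 1),
    ("u2", "A", 0), ("u2", "B", 2),
    ("u3", "A", 0), ("u3", "C", 10),
]
pairs, items = cooccurrences(events, window=5)
print(cosine_similarities(pairs, items, min_count=2))
```

With `symmetric=False` the pair `(a, b)` means "a, then b", so the two orders are counted separately and can feed an asymmetric "bought after" recommendation.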
  • 22. Co-occurrences Example (timeline figure: browsing and purchase events dated from 08/09/2015 to 24/11/2015, grouped either by session or by time window 1 / time window 2)
  • 23. Co-occurrences Computation
      • Classical co-occurrences
          • Co-purchases: complementary items (“You may also want…”)
          • Co-browsing: substitute items (“Similar items…”)
      • Other possible co-occurrences
          • Items browsed and bought together
          • Items browsed and not bought together
  • 24. Recommendation Quality Challenges
      • Recommendation categories
          • Cold start issue: external data? Cross-services?
          • Hot products (A): top-N items?
          • Short tail (B)
          • Long tail (C + D)
      • (quadrant figure: minor product vs. major (popular) product, new product vs. old product, with regions (A), (B), (C), (D))
  • 25. Long Tail is Fat
      • Long tail numbers
          • Most of the items are long tail
          • They still represent a large portion of the traffic
      • Long tail approaches
          • Content-based
          • Aggregation / clustering
          • Personalization
      • (figures: browsing share and number of items, each split into popular, short tail, and long tail)
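The claim that the long tail still carries a large portion of the traffic is easy to check on one's own data. A minimal sketch, assuming a mapping from item to view count and treating the `head_size` most-viewed items as the head (the function name and the cut-off are illustrative):

```python
def tail_share(view_counts, head_size):
    """Fraction of all views that go to items outside the top `head_size`."""
    counts = sorted(view_counts.values(), reverse=True)
    total = sum(counts)
    if total == 0:
        return 0.0
    return sum(counts[head_size:]) / total


# Made-up view counts: one popular item, four rarely viewed ones.
views = {"a": 50, "b": 20, "c": 10, "d": 10, "e": 10}
print(tail_share(views, head_size=1))
```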
  • 26. Recommendations Offline Evaluation
      • Pros / cons
          • Convenient way to try new ideas
          • Fast and cheap
          • But hard to align with online KPIs
      • Approaches
          • Rescoring
          • Prediction game
          • Business simulator
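The slide does not define the "prediction game". One common reading, assumed here, is a leave-last-out test: hide the last item of each user history and check whether the recommender would have proposed it. A minimal sketch, where `recommend` is a hypothetical callable taking the visible history and returning a ranked list of items:

```python
def hit_rate_at_k(recommend, histories, k=10):
    """Share of histories whose held-out last item is in the top-k list."""
    hits = trials = 0
    for history in histories:
        if len(history) < 2:
            continue  # nothing left to predict from
        *context, held_out = history
        trials += 1
        if held_out in recommend(context)[:k]:
            hits += 1
    return hits / trials if trials else 0.0


def toy_recommender(context):
    # Stand-in model for illustration: always proposes B and C after A.
    return ["B", "C"] if context[-1] == "A" else []


histories = [["A", "B"], ["A", "D"], ["X"]]
print(hit_rate_at_k(toy_recommender, histories, k=2))
```

Such a score is fast and cheap to compute, which matches the pros on the slide, but it rewards predicting what users already did and may not track online KPIs.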
  • 27. Public Initiative: Viki Recommendation Challenge. 567 submissions from 132 participants. http://www.dextra.sg/challenges/rakuten-viki-video-challenge
  • 28. Data science everywhere!
      • Rakuten provides marketplaces worldwide
      • Specific challenges for recommendations
          • Items catalogue: reinforce the statistical power of co-occurrences across shops and services
          • Items similarities: find the right parameters for the different use cases
          • Recommendation models: what are the best models for in-shop, all-shops, and personalized recommendations?
          • Evaluation: handling the long tail? Comparing different models?
  • 29. Thanks! Questions?
      • More on Rakuten tech initiatives
          • http://www.slideshare.net/rakutentech
          • http://rit.rakuten.co.jp/oss.html
          • http://rit.rakuten.co.jp/opendata.html
      • Positions
          • http://global.rakuten.com/corp/careers/bigdata/
          • http://www.priceminister.com/recrutement/?p=197
  • 30. We are Hiring! Big Data Department, team in Paris
      • http://global.rakuten.com/corp/careers/bigdata/
      • http://www.priceminister.com/recrutement/?p=197
      • Data Scientist / Software Developer
          • Build algorithms for recommendations, search, targeting
          • Predictive modeling, machine learning, natural language processing
          • Working close to the business
          • Python, Java, Hadoop, Couchbase, Cassandra…
      • Also hiring: search engine developers, big data system administrators, etc.