Real Time Recommender System with Kiji
Jan 22, 2014

Daqing Zhao, Director of Advanced Analytics

Macy’s.com
Agenda

▪ Big data analytics versus traditional BI
▪ Macy’s Advanced Analytics Team
▪ Our analytics projects
▪ Example: site recommendations using Kiji
▪ High level architecture
▪ Kiji Schema table structure
▪ Model deployment using Kiji
▪ Key benefits of Kiji and WibiData team

Traditional BI process

[Figure (credit: Baseline Consulting): layers of the traditional BI process, top to bottom: Knowledge Discovery; Segmentation and Predictive Modeling; Multidimensional Report; Standard Report; Schema definition, ETL into RDBMS. Annotation: "Most companies stay in this area."]

▪ Data can be accessed and analyzed only after ETL
▪ Schema definition may not be optimal

Hadoop/NoSQL: paradigm shift

[Figure: Hadoop/NoSQL analytics stack. Labels: Raw data (Volume, Velocity, Variety); Write, Append, Read; Distributed storage; Computation near data; Hadoop, HBase, Avro, …; MapReduce; Hive, Mahout, Cascading, Scalding, Kiji, …; Standard Report; Multidimensional Report; Reports; Segmentation and Predictive Modeling; Models; Insights; Decision Agent; Decisions.]

▪ We can access raw data and analyze it using MapReduce
▪ This comes with pros and cons

Macy’s.com Advanced Analytics Group
▪ We are at the frontiers of Big Data science:
• Using Big Data technology
• Machine learning and statistical algorithms

▪ We have predictive modeling, experimental design and data science teams

▪ Our team members have very strong backgrounds in
• Quantitative fields: math, statistics, physics, bioinformatics, decision sciences, and computer science
• We collaborate with systems and IT teams internally as well as third-party vendors like WibiData, SAS Research, IBM Research…

▪ We use a wide range of tools
• Hadoop, SAS, R, Mahout, and others, as well as Kiji Models

▪ We are data scientists with a keen focus on domain problems

Customer acquisition and retention
▪ Targeting the right message to the right customer at the right time
• Build predictive models of purchase behavior and identify drivers

▪ Site recommendation algorithms
• Recommend products based on items that are added to bag, for cross-sell and up-sell
• We also look at market basket analysis
• Most work is in batch mode, expanding slowly into real time

▪ Rapid prototyping and testing of algorithms and policies
• All done in short development cycles

▪ The team’s output supports other marketing teams in identifying and reaching the best customers
• Search, display, social network, affiliates, retention, customer services, …
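
The market basket analysis mentioned on this slide can be illustrated with a small pairwise association-rule sketch. It is not the team's implementation: the basket data, the function names `pair_rules` and `recommend`, and the `min_support` threshold are all assumptions made for illustration. The sketch counts how often two items share a basket and derives support, confidence, and lift for each ordered pair.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.3):
    """Return {(a, b): (support, confidence, lift)} for rules a -> b."""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))          # de-duplicate within a basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    rules = {}
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue                          # drop rare pairs
        for x, y in ((a, b), (b, a)):         # emit both rule directions
            confidence = count / item_counts[x]
            lift = confidence / (item_counts[y] / n)
            rules[(x, y)] = (support, confidence, lift)
    return rules


def recommend(bag, rules, k=3):
    """Rank items not in the bag by the best lift of any rule the bag fires."""
    best = {}
    for (x, y), (_, _, lift) in rules.items():
        if x in bag and y not in bag:
            best[y] = max(best.get(y, 0.0), lift)
    return sorted(best, key=lambda item: (-best[item], item))[:k]


# Hypothetical baskets, invented for this example.
baskets = [
    ["dress", "heels", "clutch"],
    ["dress", "heels"],
    ["jeans", "sneakers"],
    ["dress", "clutch"],
    ["jeans", "t-shirt", "sneakers"],
]
rules = pair_rules(baskets)
print(recommend({"dress"}, rules))
```

Ranking by lift rather than raw confidence avoids always recommending the most popular items; a production system would likely combine such rules with the customer profile.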

Some other projects
▪ Data organization or data munging
• Data collections, individual and event level, 360 degrees, …
• Segmentation of customers
• Customer value, revenue, costs
• Multiple channel attribution of marketing contacts
• Product attributes

▪ Experimentation platform
• Success of online marketing depends highly on testing, learning and optimization
• Both for site layout and for content and recommendations

▪ Forecast and optimization
• Prediction, simulation, search, and optimization

▪ Big data refinement and scalability
• New data sources, and more efficient ways of accessing, organizing and processing data
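
As one possible illustration of the testing step in an experimentation platform, the sketch below compares the conversion rates of two site variants with a two-proportion z-test. The slides do not say which statistical test the team uses, so the choice of test, the function name, and the sample counts are assumptions.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rate between A and B."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)          # rate under H0: p_a == p_b
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))        # 2 * (1 - Phi(|z|))
    return z, p_value


# Hypothetical counts: 100 of 1000 visitors convert on A, 130 of 1000 on B.
z, p_value = two_proportion_z_test(100, 1000, 130, 1000)
print(f"z = {z:.2f}, p = {p_value:.3f}")
```

The normal approximation is reasonable only when each variant has a fair number of conversions and non-conversions; small tests call for an exact method.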

Example: similar and complementary products

Example: customer segmentation

Demographic
Socio-economic
Behavioral
Values and styles
Channels
Modality

Example: product social network

Demographic
Style
Size

Brand
Price range
Season

Example: site product recommendation
▪ Customer adds one or more products to bag

▪ We recommend similar/complementary products in real time
• Based on product associations and customer profile

▪ We use various machine learning algorithms
• Association rules
• Collaborative filtering
• Predictive modeling
• Business rules
• And others, …
• Models built offline

▪ Real time data, real time model scoring and real time decisions
▪ Champion/challenger tests; models evolve quickly over time
▪ Frequent model updates, adding new data
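
A champion/challenger test needs a stable way to send a small share of customers to the new model. The sketch below shows one common approach, hashing the customer id into a bucket; it is an assumption about how such a split could be done, not a description of Macy's system. The names `assign_arm` and `recommend`, the salt, and the two toy models are hypothetical.

```python
import hashlib


def assign_arm(customer_id, challenger_share=0.1, salt="rec-test-1"):
    """Deterministically map a customer to 'champion' or 'challenger'."""
    digest = hashlib.sha256(f"{salt}:{customer_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000   # roughly uniform in [0, 1)
    return "challenger" if bucket < challenger_share else "champion"


def recommend(customer_id, bag, champion_model, challenger_model,
              challenger_share=0.1):
    """Route the request to one model and report which arm served it."""
    arm = assign_arm(customer_id, challenger_share)
    model = challenger_model if arm == "challenger" else champion_model
    return arm, model(bag)


# Toy stand-ins for two scoring models (hypothetical).
def champion_model(bag):
    return ["clutch"]


def challenger_model(bag):
    return ["heels"]


arm, items = recommend("cust-42", {"dress"}, champion_model, challenger_model)
print(arm, items)
```

Because the assignment depends only on the salt and the customer id, a customer sees the same arm on every visit, and changing the salt starts a fresh test with a new split.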

Architecture

[Figure: high level architecture. Labels: real time data access, scoring, decisions; data mining environments: Kiji Express, Mahout, R, SAS, others; products; Kiji Model; Kiji Scoring; Kiji REST; Hadoop; HBase.]

Kiji Schema table structure

Customer table: entity id | customer | email | metadata | order
Product table: entity id | product | category | metadata | inventory

Schemas have column names and types, in contrast to the raw bytes stored in HBase
Group column families are structured, while Map column families are flexible
Accessible as collections from Kiji Express
Scala code focuses on model and business logic
Scalding underneath takes care of generating MapReduce jobs
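
The difference between group and map column families can be modeled in a few lines of plain Python. This is a conceptual sketch only and does not use or imitate the Kiji API: the classes `GroupFamily`, `MapFamily`, and `Table`, and the family and column names below, are hypothetical. A group family fixes its column names and types up front, while a map family accepts any column name but requires one value type.

```python
class GroupFamily:
    """Structured family: a fixed set of named columns, each with its own type."""

    def __init__(self, name, columns):
        self.name = name
        self.columns = dict(columns)          # qualifier -> expected Python type

    def validate(self, qualifier, value):
        if qualifier not in self.columns:
            raise KeyError(f"{self.name}:{qualifier} is not declared")
        if not isinstance(value, self.columns[qualifier]):
            raise TypeError(f"{self.name}:{qualifier} expects {self.columns[qualifier].__name__}")


class MapFamily:
    """Flexible family: any qualifier is allowed, all values share one type."""

    def __init__(self, name, value_type):
        self.name = name
        self.value_type = value_type

    def validate(self, qualifier, value):
        if not isinstance(value, self.value_type):
            raise TypeError(f"{self.name}:{qualifier} expects {self.value_type.__name__}")


class Table:
    """Rows keyed by entity id; every write is checked against the layout."""

    def __init__(self, name, families):
        self.name = name
        self.families = {family.name: family for family in families}
        self.rows = {}

    def put(self, entity_id, family, qualifier, value):
        self.families[family].validate(qualifier, value)
        self.rows.setdefault(entity_id, {})[(family, qualifier)] = value

    def get(self, entity_id, family, qualifier):
        return self.rows[entity_id][(family, qualifier)]


# Hypothetical layout loosely following the customer table on the slide.
customers = Table("customer", [
    GroupFamily("info", {"name": str, "email": str}),
    MapFamily("order", float),                # qualifier = order id, value = amount
])
customers.put("cust-42", "info", "email", "jane@example.com")
customers.put("cust-42", "order", "order-1001", 59.99)
print(customers.get("cust-42", "order", "order-1001"))
```

A map family suits data whose keys are not known in advance, such as order ids, whereas a group family suits a stable record such as contact details.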

Model Build and Deployment

[Figure: model build and deployment. Labels: Offline: model building (several models), Kiji Modeling, R, SAS, Mahout, …; Deployment: Kiji Express, Kiji Scoring, Kiji PMML, Kiji MR; Kiji Schema, HBase, Hadoop; Real time data update, real time scoring, real time decisions.]
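
The split between offline model building and real time scoring can be sketched with item-based collaborative filtering. This is an illustrative assumption, not the deployed model: the purchase data and the function names `build_item_similarity` and `score_bag` are invented. The offline step computes a cosine similarity table between items, and the online step only does dictionary lookups against that table, which keeps scoring cheap.

```python
import math
from collections import defaultdict


def build_item_similarity(purchases):
    """Offline step: cosine similarity between items that share buyers."""
    buyers = defaultdict(set)
    for customer, items in purchases.items():
        for item in items:
            buyers[item].add(customer)
    sims = defaultdict(dict)
    names = sorted(buyers)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            common = len(buyers[a] & buyers[b])
            if common:
                s = common / math.sqrt(len(buyers[a]) * len(buyers[b]))
                sims[a][b] = s
                sims[b][a] = s
    return {item: dict(neighbors) for item, neighbors in sims.items()}


def score_bag(bag, sims, k=2):
    """Online step: sum precomputed similarities for the items in the bag."""
    scores = defaultdict(float)
    for item in bag:
        for other, s in sims.get(item, {}).items():
            if other not in bag:
                scores[other] += s
    return sorted(scores, key=lambda x: (-scores[x], x))[:k]


# Hypothetical purchase history: customer id -> set of purchased items.
purchases = {
    "c1": {"dress", "heels"},
    "c2": {"dress", "heels", "clutch"},
    "c3": {"dress", "clutch"},
    "c4": {"jeans", "sneakers"},
}
model = build_item_similarity(purchases)   # rebuilt offline, e.g. nightly
print(score_bag({"heels"}, model))         # served per request
```

Frequent model updates then amount to recomputing the similarity table and swapping it in, without changing the scoring code.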

Key benefits of partnership with WibiData

▪ Open source Kiji suite, abstracted so we can focus on modeling
• Kiji Schema, KijiMR, Kiji Model, Kiji Scoring, Kiji Express, Kiji REST
• Allows a quick development cycle

▪ Packages popular open source projects
• Hadoop, HBase, Avro, Cascading, Scalding, Scala

▪ Better organization
• Create tables, query by field name, flexibility, …, more database-like than HBase

▪ WibiData professional services team helps develop, integrate, maintain, train the in-house team, consult, …
• Competence, knowledge
• Supports infrastructure, so that we can focus on the science

▪ Scalable real time model deployment environment
• Interactive
• In milliseconds

Acknowledgement

▪ Macy’s teams
• Analytics team: Kerem Tomak, Albert Zhai
• Infrastructure team: Winslow Holmes, Rakesh Sharma, Cherry Peng

▪ WibiData team
• Professional Services team: Adam, Christophe, Renuka, Lynn

TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 

Real Time Recommendation System using Kiji

  • 1. Real Time Recommender System with Kiji. Jan 22, 2014. Daqing Zhao, Director of Advanced Analytics, Macy’s.com
  • 2. Agenda: Big data analytics versus traditional BI; Macy’s Advanced Analytics Team; Our analytics projects; Example: site recommendations using Kiji; High level architecture; Kiji Schema table structure; Model deployment using Kiji; Key benefits of Kiji and WibiData team
  • 3. Traditional BI process. [Diagram levels: Knowledge Discovery; Segmentation and Predictive Modeling; Multidimensional Report; Standard Report; Schema definition, ETL into RDBMS. Annotation: "Most companies stay in this area". Source: Baseline Consulting.] Data can be accessed and analyzed only after ETL. Schema definition may not be optimal.
  • 4. Hadoop/NoSQL: paradigm shift. [Diagram labels: Decisions; Insights; Models; Reports; Decision Agent; Segmentation and Predictive Modeling; Multidimensional Report; Standard Report; Hive, Mahout, Cascading, Scalding, Kiji, …; MapReduce; Raw data (Volume, Velocity, Variety); Write, Append, Read; Distributed storage; Computation near data; Hadoop, HBase, Avro, …] We can access raw data and analyze it using MapReduce, with pros and cons.
  • 5. Macy’s.com’s Advanced Analytics Group. We are at the frontiers of Big Data science: using Big Data technology; machine learning and statistical algorithms. We have predictive modeling, experimental design and data science teams. Our team members have very strong backgrounds in quantitative fields: math, stat, physics, bioinformatics, decision sciences, and CS. We collaborate with systems and IT teams internally as well as 3rd party vendors like WibiData, SAS Research, IBM Research… We use a wide range of tools: Hadoop, SAS, R, Mahout, and others, as well as Kiji Models. We are data scientists with a keen focus on domain problems.
  • 6. Customer acquisition and retention. Targeting the right message to the right customer at the right time: build predictive models of purchase behavior and identify drivers. Site recommendation algorithms: recommend products based on items that are added to bag, for cross- and up-sell; we also look at market basket analysis; most work is in batch mode, expanding slowly into real time. Rapid prototyping and testing of algorithms and policies: all done in short development cycles. Output of the team’s work supports other marketing teams in identifying and reaching the best customers: search, display, social network, affiliates, retention, customer services, …
  • 7. Some other projects. Data organization or data munging: data collections, individual and event level, 360 degrees, …; segmentation of customers; customer value, revenue, costs; multiple channel attribution of marketing contacts; product attributes. Experimentation platform: success of online marketing depends highly on testing, learning and optimization, both for site layout as well as contents and recommendations. Forecast and optimization: prediction, simulation, and search and optimize. Big data refinement and scalability: new data sources, more efficient ways of accessing data, and organizing and processing data.
  • 8. Example: similar and complementary products
  • 10. Example: product social network. [Diagram labels: Demographic, Style, Size, Brand, Price range, Season]
  • 11. Example: site product recommendation. Customer adds one or more products to bag. We recommend similar/complementary products in real time, based on product associations and customer profile. We use various machine learning algorithms: association rules; collaborative filtering; predictive modeling; business rules; and others, …; models built offline. Real time data, real time model scoring and real time decision. Champion/challenger tests, models evolve quickly in time. Frequent model updates, add new data.
  • 12. Architecture. [Diagram labels: Real time data access, scoring, decisions; data mining environments: Kiji Express, Mahout, R, SAS, others; products; Kiji Model; Kiji Scoring; Kiji REST; Hadoop; HBase]
  • 13. Kiji Schema table structure. [Table sketch: Customer table: entity id, customer, email, metadata, order. Product table: entity id, product, category, metadata, inventory.] Schemas have column names and types, in contrast to the raw bytes stored in HBase. Group column families are structured, while Map column families are flexible. Tables are accessible as collections from Kiji Express. Scala code focuses on model and business logic. Scalding underneath takes care of generating MapReduce jobs.
  • 14. Model Build and Deployment. [Diagram labels: Model building; Kiji Express; Kiji Scoring; Kiji PMML; Kiji MR; Deployment; Kiji Schema; HBase; Hadoop; Offline; Kiji Modeling; R, SAS, Mahout, …; Real time data update; Real time scoring; Real time decisions]
  • 15. Key benefits of partnership with WibiData. Open source Kiji suite, abstracted with a focus on modeling: Kiji Schema, KijiMR, Kiji Model, Kiji Scoring, Kiji Express, Kiji REST; allows a quick development cycle. Packages popular open source projects: Hadoop, HBase, Avro, Cascading, Scalding, Scala. Better organization: create tables, query by field name, flexibility, …, more DB-like than HBase. WibiData professional services team helps develop, integrate, maintain, train the in-house team, consult, …: competence, knowledge; supports infrastructure, so that we can focus on the science. Real time and scalable model deployment environment: interactive; in milliseconds.
  • 16. Acknowledgement. Macy’s teams: Analytics team: Kerem Tomak, Albert Zhai; Infrastructure team: Winslow Holmes, Rakesh Sharma, Cherry Peng. WibiData team: Professional Services team: Adam, Christophe, Renuka, Lynn.
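
Slide 11 describes models that are built offline and then scored in real time when a customer adds a product to the bag, with association rules listed as one of the algorithms. The following is a minimal sketch of that split in plain Python, not the Kiji API and not necessarily the method used at Macy's. The function names (build_rules, recommend), the thresholds, and the sample baskets are all hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_rules(baskets, min_support=2, min_confidence=0.5):
    """Offline step: derive pairwise rules 'A -> B' from past baskets.

    Confidence of A -> B is count(A and B together) / count(A).
    The thresholds are illustrative assumptions.
    """
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = set(basket)
        item_counts.update(items)
        # Ordered pairs, so both A -> B and B -> A are counted.
        pair_counts.update(permutations(items, 2))

    rules = defaultdict(list)
    for (a, b), n_ab in pair_counts.items():
        confidence = n_ab / item_counts[a]
        if n_ab >= min_support and confidence >= min_confidence:
            rules[a].append((b, confidence))
    return dict(rules)


def recommend(rules, bag, top_n=3):
    """Online step: look up rules for items in the bag and rank candidates."""
    scores = {}
    for item in bag:
        for candidate, confidence in rules.get(item, []):
            if candidate not in bag:
                # Keep the strongest rule that points at each candidate.
                scores[candidate] = max(scores.get(candidate, 0.0), confidence)
    # Highest confidence first; ties broken alphabetically.
    ranked = sorted(scores.items(), key=lambda s: (-s[1], s[0]))
    return [candidate for candidate, _ in ranked[:top_n]]


# Hypothetical purchase history (illustrative data only).
baskets = [
    ["dress", "heels", "clutch"],
    ["dress", "heels"],
    ["dress", "scarf"],
    ["heels", "clutch"],
    ["jeans", "t-shirt"],
]
rules = build_rules(baskets)
print(recommend(rules, ["dress"]))           # ['heels']
print(recommend(rules, ["dress", "heels"]))  # ['clutch']
```

In this sketch the expensive counting happens once in build_rules, while recommend is a dictionary lookup per bag item, which is the kind of work that can plausibly run within an interactive request. A champion/challenger test as mentioned on the slide would amount to serving two such rule sets to different shares of traffic and comparing outcomes.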