This presentation describes Macy's real-time recommender system built on Kiji, the open-source framework from WibiData. Macy's Advanced Analytics team builds predictive models that recommend complementary products to customers on the website. The team uses a Hadoop/NoSQL approach to access raw data and build models near the data. Models are deployed with Kiji, which provides scalable real-time scoring and decisions. Kiji supports quick development and real-time updates of recommender models, in support of customer acquisition and retention.
1. Real Time Recommender System
with
Jan 22, 2014
Daqing Zhao, Director of Advanced Analytics
Macy’s.com
2. Agenda
Big data analytics versus traditional BI
Macy’s Advanced Analytics Team
Our analytics projects
Example: site recommendations using Kiji
High level architecture
Kiji Schema table structure
Model deployment using Kiji
Key benefits of Kiji and WibiData team
3. Traditional BI process
Figure (Baseline Consulting), levels from top to bottom:
• Knowledge Discovery
• Segmentation and Predictive Modeling
• Multidimensional Report
• Standard Report
• Schema definition, ETL into RDBMS
Figure annotation: most companies stay in this area
Data can be accessed and analyzed only after ETL
Schema definition may not be optimal
4. Hadoop/NoSQL: paradigm shift
Figure: the Hadoop/NoSQL analytics stack
• Outputs: Decisions, Insights, Models
• Decision Agent
• Segmentation and Predictive Modeling
• Reports: Multidimensional Report, Standard Report
• Tools: Hive, Mahout, Cascading, Scalding, Kiji, …
• MapReduce
• Raw data: Volume, Velocity, Variety
• Write, Append, Read
• Distributed storage, computation near data
• Platform: Hadoop, HBase, Avro, …
We can access raw data and analyze it directly using MapReduce
This approach has both pros and cons
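As a rough illustration of analyzing raw data with MapReduce, the following single-process Python sketch walks through the map, shuffle, and reduce phases. The event format (customer id, action, product id) and all function names are assumptions made for this example and do not come from the talk; a real job would run distributed on Hadoop, near the data.

```python
from collections import defaultdict

# Hypothetical raw click-stream events: (customer_id, action, product_id).
RAW_EVENTS = [
    ("c1", "view", "p1"),
    ("c1", "add_to_bag", "p1"),
    ("c2", "view", "p1"),
    ("c2", "view", "p2"),
    ("c3", "add_to_bag", "p2"),
]


def map_phase(event):
    """Emit (key, 1) pairs; the key is (product_id, action)."""
    _customer_id, action, product_id = event
    yield (product_id, action), 1


def shuffle(pairs):
    """Group mapped values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Sum the counts collected for one key."""
    return key, sum(values)


def run_job(events):
    """Run map, shuffle, and reduce in sequence on an in-memory event list."""
    mapped = (pair for event in events for pair in map_phase(event))
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


counts = run_job(RAW_EVENTS)
```

No schema or ETL step is needed before the job runs: the mapper reads the raw events as they were written.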
5. Macy’s.com Advanced Analytics Group
We are at the frontiers of Big Data science:
• Using Big Data technology
• Machine learning and Statistical algorithms
We have predictive modeling, experimental design, and data science teams
Our team members have strong backgrounds in
• Quantitative fields: math, statistics, physics, bioinformatics, decision sciences, and computer science
• We collaborate with systems and IT teams internally as well as third-party vendors like WibiData, SAS Research, IBM Research…
We use a wide range of tools
• Hadoop, SAS, R, Mahout, and others, as well as Kiji Models
We are data scientists with a keen focus on domain problems
6. Customer acquisition and retention
Targeting the right message to the right customer at the right time
• Build predictive models of purchase behavior and identify drivers
Site recommendation algorithms
• Recommend products based on items that are added to bag for cross- and up-sell
• We also look at market basket analysis
• Most work is in batch mode, expanding slowly into real time
Rapid-prototyping and testing of algorithms and policies
• All done in short development cycles
The output of the team’s work supports other marketing teams in identifying and reaching the best customers
• Search, display, social network, affiliates, retention, customer services, …
7. Some other projects
Data organization or data munging
• Data collections, individual and event level, 360 degrees, …
• Segmentation of customers
• Customer value, revenue, costs
• Multiple channel attribution of marketing contacts
• Product attributes
Experimentation platform
• Success of online marketing depends highly on testing, learning and optimization
• Both for site layout as well as contents and recommendations
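One common way to decide whether a tested variant really outperforms the control is a two-proportion z-test on conversion rates. The sketch below is a generic statistical example, not the team's actual experimentation platform; the function name and the sample counts are assumptions for illustration.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare conversion rates of variants A and B.

    Returns (z, two_sided_p_value). Assumes both sample sizes are positive
    and the pooled conversion rate is strictly between 0 and 1.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 100 of 1000 visitors convert on layout A, 150 of 1000 on B.
z_score, p_value = two_proportion_z_test(100, 1000, 150, 1000)
```

A small p-value suggests the difference between the two layouts or recommendation policies is unlikely to be due to chance alone.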
Forecast and optimization
• Prediction, simulation, search, and optimization
Big data refinement and scalability
• New data sources, more efficient ways of accessing data, and organizing and processing data
11. Example: site product recommendation
Customer adds one or more products to the bag
We recommend similar/complementary products in real time
• Based on product associations and customer profile
We use various machine learning algorithms
• Association rules
• Collaborative filtering
• Predictive modeling
• Business rules
• And others, …
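To illustrate the association-rules approach, here is a minimal Python sketch that mines pairwise rules (support, confidence, lift) from historical baskets and uses them to recommend complementary products for the items in a bag. The basket data, product names, and function names are hypothetical, and ranking by lift is one choice among several; the talk does not specify how rules are scored.

```python
from collections import Counter
from itertools import combinations

# Hypothetical historical baskets (sets of product ids).
BASKETS = [
    {"dress", "heels", "clutch"},
    {"dress", "heels"},
    {"dress", "scarf"},
    {"heels", "clutch"},
    {"scarf"},
]


def pair_rules(baskets):
    """Return {(lhs, rhs): (support, confidence, lift)} for each rule lhs -> rhs."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = {}
    for (a, b), both in pair_counts.items():
        for lhs, rhs in ((a, b), (b, a)):
            support = both / n
            confidence = both / item_counts[lhs]
            lift = confidence / (item_counts[rhs] / n)
            rules[(lhs, rhs)] = (support, confidence, lift)
    return rules


def recommend(bag, rules, top_k=2):
    """Rank products outside the bag by the best lift of any rule from a bag item."""
    best = {}
    for (lhs, rhs), (_support, _confidence, lift) in rules.items():
        if lhs in bag and rhs not in bag:
            best[rhs] = max(best.get(rhs, 0.0), lift)
    # Ties are broken alphabetically so the output is deterministic.
    return sorted(best, key=lambda item: (-best[item], item))[:top_k]


RULES = pair_rules(BASKETS)
suggestions = recommend({"dress"}, RULES)
```

The rule table can be computed offline in batch, while `recommend` is cheap enough to call at request time.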
Models built offline
Real time data, real time model scoring and real time decision
Champion/challenger tests; models evolve quickly over time
Frequent model updates as new data is added
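A champion/challenger test can be sketched as deterministic traffic splitting plus a per-model conversion tally. The code below is a minimal illustration under assumed names (`assign_model`, `conversion_rates`, the salt string, and the 10% challenger share are all hypothetical) and is not a description of the production setup.

```python
import hashlib


def assign_model(customer_id, challenger_share=0.1, salt="rec-test-1"):
    """Deterministically route a customer to 'champion' or 'challenger'.

    Hashing keeps each customer in the same arm across requests.
    """
    digest = hashlib.sha256(f"{salt}:{customer_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # value in [0, 1)
    return "challenger" if bucket < challenger_share else "champion"


def conversion_rates(outcomes):
    """Aggregate (model_name, converted) pairs into {model_name: rate}."""
    shown, converted = {}, {}
    for model, did_convert in outcomes:
        shown[model] = shown.get(model, 0) + 1
        converted[model] = converted.get(model, 0) + int(did_convert)
    return {model: converted[model] / shown[model] for model in shown}


# Hypothetical logged outcomes from both arms.
rates = conversion_rates(
    [("champion", True), ("champion", False), ("challenger", True)]
)
```

If the challenger keeps a higher rate over a sufficiently large sample, it can be promoted to champion and a new challenger introduced.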
12. Architecture
Figure: high-level architecture
• Real time: data access, scoring, decisions
• Data mining environments: Kiji Express, Mahout, R, SAS, others
• Kiji components: Kiji Model, Kiji Scoring, Kiji REST
• Platform: Hadoop, HBase
• Other label in the figure: products
13. Kiji Schema table structure
Customer table: entity id | customer | email | metadata | order
Product table: entity id | product | category | metadata | inventory
Schemas have column names and types, in contrast to the raw bytes stored in HBase
Group column families are structured, while map column families are flexible
Tables are accessible as collections from Kiji Express
Scala code focuses on the model and business logic
Scalding underneath takes care of generating MapReduce jobs
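The difference between group and map column families can be modeled in a few lines of Python. This is a conceptual sketch only: it does not use the Kiji API, it omits cell timestamps, and the class names, column names, and types (for example an `email` group family with an `address` column, and an `order` map family keyed by order id) are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple, Union


@dataclass
class GroupFamily:
    """Structured family: a fixed set of named, typed columns."""
    columns: Dict[str, type]

    def check(self, qualifier, value):
        if qualifier not in self.columns:
            raise KeyError(f"unknown column: {qualifier}")
        if not isinstance(value, self.columns[qualifier]):
            raise TypeError(f"{qualifier} expects {self.columns[qualifier].__name__}")


@dataclass
class MapFamily:
    """Flexible family: any qualifier is allowed, all values share one type."""
    value_type: type

    def check(self, qualifier, value):
        if not isinstance(value, self.value_type):
            raise TypeError(f"{qualifier} expects {self.value_type.__name__}")


@dataclass
class Table:
    name: str
    families: Dict[str, Union[GroupFamily, MapFamily]]
    rows: Dict[str, Dict[Tuple[str, str], object]] = field(default_factory=dict)

    def put(self, entity_id, family, qualifier, value):
        """Validate against the schema, then store the cell."""
        self.families[family].check(qualifier, value)
        self.rows.setdefault(entity_id, {})[(family, qualifier)] = value

    def get(self, entity_id, family, qualifier):
        return self.rows[entity_id][(family, qualifier)]


# Hypothetical customer table layout.
customers = Table(
    name="customer",
    families={
        "email": GroupFamily({"address": str}),
        "order": MapFamily(float),  # qualifier = order id, value = order total
    },
)
customers.put("cust-1", "email", "address", "a@example.com")
customers.put("cust-1", "order", "order-1001", 59.99)
```

Writing to an undeclared column of the group family fails, while the map family accepts any new order id as a qualifier.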
14. Model Build and Deployment
Figure: model build and deployment
• Offline model building: Kiji Modeling, R, SAS, Mahout, …
• Deployment: Kiji Express, Kiji Scoring, Kiji PMML, Kiji MR
• Platform: Kiji Schema, HBase, Hadoop
Real time data update
Real time scoring
Real time decisions
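The split between offline model building and real-time update, scoring, and decision can be sketched as follows. This is a plain-Python illustration, not Kiji Scoring code: the `RealTimeScorer` class, the JSON model format, the in-stock business rule, and the example weights are all assumptions made for this example.

```python
import json
from collections import defaultdict

# Hypothetical offline-built models: product -> {related product: score}.
MODEL_V1 = json.dumps({"dress": {"heels": 0.9, "clutch": 0.5}})
MODEL_V2 = json.dumps({"dress": {"clutch": 0.8, "heels": 0.3}})


class RealTimeScorer:
    def __init__(self, model_json):
        self.bags = defaultdict(list)  # real-time data: customer -> bag items
        self.deploy(model_json)

    def deploy(self, model_json):
        """Swap in a newly built model without losing session data."""
        self.model = json.loads(model_json)

    def add_to_bag(self, customer_id, product_id):
        """Real-time data update."""
        self.bags[customer_id].append(product_id)

    def score(self, customer_id):
        """Real-time scoring: sum model scores over the customer's bag."""
        bag = self.bags[customer_id]
        scores = defaultdict(float)
        for product in bag:
            for related, weight in self.model.get(product, {}).items():
                if related not in bag:
                    scores[related] += weight
        return dict(scores)

    def decide(self, customer_id, in_stock, top_k=1):
        """Real-time decision: apply a business rule (in stock), then rank."""
        scores = self.score(customer_id)
        eligible = [p for p in scores if p in in_stock]
        return sorted(eligible, key=lambda p: (-scores[p], p))[:top_k]


scorer = RealTimeScorer(MODEL_V1)
scorer.add_to_bag("c1", "dress")
first_pick = scorer.decide("c1", in_stock={"heels", "clutch"})
```

Calling `scorer.deploy(MODEL_V2)` afterwards changes the recommendation for the same bag, which mirrors the frequent model updates described in the slides.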
15. Key benefits of partnership with WibiData
Open source Kiji suite, with abstractions focused on modeling
• Kiji Schema, KijiMR, Kiji Model, Kiji Scoring, Kiji Express, Kiji REST
• Allows a quick development cycle
Packages popular open source projects
• Hadoop, HBase, Avro, Cascading, Scalding, Scala
Better organization
• Create tables, query by field name, flexibility, …, more DB-like than HBase
WibiData professional services team helps develop, integrate, maintain, train the in-house team, consult, …
• Competence, knowledge
• Support infrastructure, so that we can focus on the science
Scalable real-time model deployment environment
• Interactive
• In milliseconds
16. Acknowledgement
Macy’s teams
Analytics team: Kerem Tomak, Albert Zhai
Infrastructure team: Winslow Holmes, Rakesh Sharma, Cherry Peng
WibiData team
Professional Services team: Adam, Christophe, Renuka, Lynn