1. New Analytical Architectures
Why Classic Data Warehousing Approaches
Miss the Mark with Big Data
March 21, 2013
Casey Kiernan • casey.l.kiernan@gmail.com
Blog • www.the-data-platform.com
2. “The Future is Data”
“Hadoop is the kernel of a new Distributed Data OS”
Doug Cutting
3. Data has Changed
> Analytics has Changed
Transactional
> Trailing Indicators
Communities
> Reach/Influence
Personal
> Interactive
Can the Data Warehouse Architecture adapt?
6. My Mountain Bike as a Data Platform
Data Collection: Heart Rate, Altitude, Temperature, Speed / Trip Miles, Time, Cadence / RPM
Guidance: Performance, Rate of Climb, Calories Burned, Miles Obtained, Total Climbed, Elapsed Time
Current, Average, Max Values
Data Architecture - on a Local Wireless Network (ANT+ Protocol)
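As a rough illustration of how raw sensor samples could become the "Current, Average, Max Values" and "Total Climbed" figures on this slide, here is a minimal Python sketch. The class name `RunningStat`, the function `total_climbed`, and the sample readings are all hypothetical and are not taken from the talk or from the ANT+ protocol.

```python
from dataclasses import dataclass


@dataclass
class RunningStat:
    """Tracks the current, average, and max value of one sensor channel."""
    current: float = 0.0
    total: float = 0.0
    count: int = 0
    maximum: float = float("-inf")

    def update(self, value: float) -> None:
        # Each new sample updates all three derived values at once.
        self.current = value
        self.total += value
        self.count += 1
        self.maximum = max(self.maximum, value)

    @property
    def average(self) -> float:
        return self.total / self.count if self.count else 0.0


def total_climbed(altitudes):
    """Sum only the upward altitude changes between consecutive samples."""
    return sum(max(b - a, 0.0) for a, b in zip(altitudes, altitudes[1:]))


# Hypothetical readings: heart rate in bpm, altitude in metres.
heart_rate = RunningStat()
for bpm in [120, 135, 150, 142]:
    heart_rate.update(bpm)

altitudes = [100.0, 110.0, 105.0, 125.0]

print(heart_rate.current, heart_rate.average, heart_rate.maximum)
print(total_climbed(altitudes))
```

The point of the sketch is that the guidance values are derived on the fly from the collected stream; nothing here needs a warehouse.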
8. New Data Behaviors (individual actions) > Content > Time
Behaviors
Content
9. New Data More is Better…
Meaningful
Guidance
Massive
Data
10. “Business” Analytics - Classic “DW”
BUSINESS INTELLIGENCE
OLAP / DATA WAREHOUSE
OLTP / TRANSACTIONS
DATA.
Answers the question:
What are our most profitable Products?
11. Classic “Business” Analytics
Good for Reporting, Forecasting
What did Happen? What will Happen?
Operational Reporting Tactical Trending Strategic
Months Weeks Weeks Months Years
Descriptive/Trending Analytics
12. New “Personal” Analytics
SELF-SERVICE
GUIDANCE
BEHAVIOURS
DATA. Answers the question:
Show me a good movie to watch!
13. “Personal” Analytics
“Right Now” is a very important time-frame!
What is Happening
RIGHT NOW!
What did Happen? What will Happen?
Operational Reporting Tactical Trending Strategic
Months Weeks Weeks Months Years
Predictive/Prescriptive Analytics
16. “Business” Analytical Architecture
Classic “DW” Data Flow - Uni-Directional, Latent,…
Ordering App, Financial App, Master Data
> Staging (OLTP to OLAP Mapping)
> Data Warehouse (Facts/Dimensions)
> OLAP / Reports (Business Metrics, KPI, YTD Reporting)
> Business Analyst
What are our most Profitable Products?
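To make the facts/dimensions idea concrete, the following is a minimal sketch of how a question like this might be answered against a tiny star schema. It uses Python's built-in `sqlite3` module; the table names (`dim_product`, `fact_sales`), columns, and rows are invented for illustration and do not come from the talk.

```python
import sqlite3

# In-memory database standing in for the data warehouse (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE fact_sales (product_id INTEGER, revenue REAL, cost REAL);

    INSERT INTO dim_product VALUES (1, 'Helmet'), (2, 'Gloves'), (3, 'Pump');
    INSERT INTO fact_sales VALUES
        (1, 100.0, 60.0), (1, 120.0, 70.0),
        (2, 40.0, 35.0),
        (3, 30.0, 10.0), (3, 30.0, 10.0);
""")

# Join the fact table to its dimension and aggregate profit per product.
most_profitable = conn.execute("""
    SELECT p.name, SUM(f.revenue - f.cost) AS profit
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.name
    ORDER BY profit DESC
""").fetchall()

print(most_profitable)
```

Note that the answer is a trailing indicator: it summarizes transactions that have already been staged and loaded.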
17. “Personal” Analytical Architecture
“New” Data Flow - Iterative, Specialized, Extensible, plug & play Analytics, near real-time
[Some components are open-source]
Application / UX
Analytical Capabilities: Scoring/Ranking, Recommendations, Natural Language Processing, Relevancy, Classification, Optimization, Collaborative Filtering, Personalization, Digital Attribution,…
Data Analytics
Data
Analysts
What movie should I watch tonight?
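Collaborative filtering is one of the capabilities listed on this slide. A minimal sketch of the user-based variant, in plain Python, might look like the following. The ratings, user names, and the helper names `cosine` and `recommend` are hypothetical; a production system would more likely use a library such as Mahout, which the deck mentions later.

```python
from math import sqrt

# Hypothetical 1-5 star ratings per user (illustrative data only).
ratings = {
    "ann": {"Alien": 5, "Blade Runner": 4, "Amelie": 1},
    "bob": {"Alien": 4, "Blade Runner": 5},
    "cat": {"Amelie": 5, "Chocolat": 4},
    "dan": {"Alien": 5, "Blade Runner": 5, "Moon": 4},
}


def cosine(u, v):
    """Cosine similarity between two users' rating dictionaries."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[m] * v[m] for m in shared)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)


def recommend(user, ratings, top_n=1):
    """Score unseen movies by the similarity-weighted ratings of other users."""
    mine = ratings[user]
    scores = {}
    for other, theirs in ratings.items():
        if other == user:
            continue
        sim = cosine(mine, theirs)
        for movie, rating in theirs.items():
            if movie not in mine:
                scores[movie] = scores.get(movie, 0.0) + sim * rating
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_n]


print(recommend("bob", ratings))
```

Here "bob" is most similar to "ann" and "dan", so the movie they rated that he has not seen ranks first.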
18. “Personal Analytics” Data Architecture
“New” Data Flow – Detailed View of Components
End-User Experience
Browser, Tablet, Self-Service Application
Mobile,…
Personalization, Personalized
Preferences, State Recommendations
App Persistence Published Analytics
Persistence/Analytics “State” Persistence “Read” Performance
Analytics Engines
Pluggable
Social Signals Mass Data Storage
RSS/Facebook/… Behaviors / “Write” Performance
Data Scientists
23. How important is Social?
Install ghostery.com
Shows you who is actively watching you surf the web!
Lots of people!!!
24. Signals – The Core of New Data
Mixture of Proprietary and Public Data
Social
Personal
Content
Time
25. The New “Analytical Application” Architecture
“New” Data Flow – Specialized Technology Choices
End-User Experience
Browser, Tablet, Self-Service Application
Mobile,…
Personalization, Personalized
Preferences, State Recommendations
App Persistence Published Analytics
Persistence/Analytics Cassandra, Riak,… HBase
Data-Center or Cloud
Analytics
R, Mahout, Pig
Mass Data Storage
Hadoop
Specialization of Data Technologies
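The split between write-optimized mass storage and read-optimized published analytics can be sketched in a few lines of Python. In this hedged example, a plain list stands in for mass storage (Hadoop on the slide) and a dict stands in for the published store (Cassandra, Riak, or HBase on the slide). The function names and the popularity-based "analytics engine" are assumptions made for illustration only.

```python
from collections import Counter, defaultdict

behavior_log = []   # stand-in for mass storage: append-only, "write" performance
published = {}      # stand-in for a key-value store: "read" performance


def record(user, action, item):
    """Write path: the application only appends raw behaviors."""
    behavior_log.append((user, action, item))


def run_analytics(top_n=2):
    """Batch job (pluggable): publish each user's most popular unseen items."""
    popularity = Counter(
        item for _, action, item in behavior_log if action == "view"
    )
    seen = defaultdict(set)
    for user, _, item in behavior_log:
        seen[user].add(item)
    for user, items in seen.items():
        unseen = [i for i, _ in popularity.most_common() if i not in items]
        published[user] = unseen[:top_n]


def recommendations_for(user):
    """Read path: a single key lookup, no computation at request time."""
    return published.get(user, [])


# Hypothetical behaviors.
for user, item in [("ann", "m1"), ("bob", "m1"), ("bob", "m2"),
                   ("cat", "m2"), ("cat", "m3"), ("dan", "m2")]:
    record(user, "view", item)

run_analytics()
print(recommendations_for("ann"))
```

Because the analytics step only reads the log and writes the published store, it could be swapped for a different engine without touching either the write path or the read path.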
26. Servicing Multiple Analytical Systems
Using Shared Analytical Mass Storage
Self-Service Application A Self-Service Application B
Published Published
Persistence Persistence
Analytics Analytics
Riak Cassandra
HBase MySQL
Data Scientists
Analytics Engine
Pluggable
Mass Data Storage Analytics Engine
Behaviors / “Write” Performance Pluggable
Analytics Engine
Hadoop / AWS
Pluggable
27. Integrating the Architectures
“Personal” Analytics Stack + Classic “DW” Stack
Only Financial Events ($$$) cross the threshold (and are recorded into) the Data Warehouse
App
Staging
App
Data Warehouse OLAP / Business
OLTP to OLAP Mapping Reports Analyst
App
“Local” Events stay Local
(they are analyzed locally)
Not all DATA Belongs in the Data Warehouse!
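One way to picture this integration rule is a small router that keeps every event in the application's local store and forwards only financial events to warehouse staging. The sketch below is an assumption-laden illustration: the `Event` type, the set of financial event kinds, and the choice to keep a local copy of financial events are all hypothetical, not taken from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    app: str
    kind: str
    amount: float = 0.0


# Assumption: which event kinds count as "financial" is decided per business.
FINANCIAL_KINDS = {"purchase", "refund"}


def route(events):
    """Return (warehouse_staging, local_store) for a batch of events."""
    warehouse_staging, local_store = [], []
    for event in events:
        local_store.append(event)              # every event is analyzed locally
        if event.kind in FINANCIAL_KINDS:
            warehouse_staging.append(event)    # only $$$ events cross over
    return warehouse_staging, local_store


events = [
    Event("movies", "view"),
    Event("movies", "rating"),
    Event("movies", "purchase", amount=3.99),
]
staging, local = route(events)
print(len(staging), len(local))
```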
28. Classic DW Vs. the New Analytics
The Shift from “Business” Analytics to “Personal” Analytics
Attribute | Classic DW | New Analytics
Scope | Enterprise | Application
Analytics | Trailing: OLAP | Predictive: Machine Learning, Sentiment Analysis, Recommendations, Personalization, Natural Language Processing, Classification, Clustering, Optimization, Collaborative Filtering, Digital Attribution,…
Actionable? | Loosely Coupled | Tightly Coupled, Analytics Embedded in Application
Data Structures | Facts/Dimensions (Requires a DW) | Semantic Data, Graph / Triples, Observations, Direct Signals
Knowledge Expert | Business Analyst | Data Scientist
Technology Stack | Vendor Driven ($$$) | Open-Source
Architecture | Scale-Up | Scale-Out (or in the Cloud)
29. New Signals + New Analytics = New Scenarios
New Data Signals: Social, Location, Personal, Behaviors, Transactions, Content, Time
New Analytics: Recommendations, Natural Language Processing, Relevancy, Classification, Optimization, Collaborative Filtering, Digital Attribution,…
New Scenarios: Customer Engagement, Customer Loyalty / Attrition / Retention, Fraud, Risk Analysis, Intent, Customer Personalization
Most people see data as a storage issue: persistence and serialization, and the associated technologies (Oracle, MySQL, SQL Server, and now Hadoop, Riak, and HBase). But I don’t see data as a persistence problem, simply “where do you put it?” I see data as a lattice: there’s more data in the network than in databases. Its value comes from how and where the data is used, not how it is stored.
I had the opportunity to attend a number of training sessions held by Chris Date. We learned about RI, deadlocks,… But one thing he said to me really stuck. He said that the “data” is in the transaction log, not the database. The database only contains a current snapshot; what is happening is in the transaction log.
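The remark about the transaction log can be illustrated with a minimal sketch: the "database" is just the state you get by replaying the log, and replaying a prefix of the log recovers any earlier snapshot. The log format, keys, and the function name `snapshot` below are hypothetical and chosen only to make the idea concrete.

```python
# Hypothetical transaction log: (operation, key, value) in commit order.
log = [
    ("set", "acct-1", 100),
    ("set", "acct-2", 50),
    ("set", "acct-1", 70),
    ("delete", "acct-2", None),
]


def snapshot(log, upto=None):
    """Replay the first `upto` log entries (all if None) into a state dict."""
    state = {}
    for op, key, value in log[:upto]:
        if op == "set":
            state[key] = value
        elif op == "delete":
            state.pop(key, None)
    return state


print(snapshot(log))      # the current snapshot, i.e. "the database"
print(snapshot(log, 2))   # an earlier state, recoverable only from the log
```

The current snapshot has lost the fact that acct-2 ever existed and that acct-1 once held 100; the log still carries both, which is the sense in which the behaviors live in the log.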