SlideShare una empresa de Scribd logo
1 de 31
Customer Intelligence –
Harnessing Elephants at Transamerica
Stephen Lloyd
BI Architect, Transamerica Life & Protection
#strataconf #hadoopworld
Vishal Bamba
Chief Architect Transamerica Life & Protection
Dave Beaudoin
AVP Transamerica Investments & Retirement
2#strataconf #hadoopworld
Agenda
• About Transamerica
• Current State and Data Architecture
• Use Cases / Learnings
• POC Highlights
• Solution Design
• Lessons Learned
• Questions / Discussion
3#strataconf #hadoopworld
Transamerica Org Structure
• Investments & Retirement
– Retirement and Benefit Plan services to employers/employees
– Mutual funds and variable annuities
– Mission to help people save and invest wisely to secure their
retirement dreams, the I&R business unit serves more than 3 million
retirement plan participants across the entire spectrum of defined
benefit and defined contribution plans.
• Life & Protection
– Term and Perm insurance products
– Medicare supplement, long term care, accidental death, final expense
– Mission to protect what you’ve built, secure what’s next.
4#strataconf #hadoopworld
Data Architecture
• Rich data environment across organizational
business units, comprised of many source
systems across various platforms
• A consistent enterprise view of data across
business units is required
5#strataconf #hadoopworld
Data Architecture
6#strataconf #hadoopworld
Data Architecture
As in many industries,
we are focused on
leveraging technology
to build data-driven
customer
relationships.
How can the current data architecture
support this strategic direction?
7#strataconf #hadoopworld
Data Architecture
Enterprise Data Management
Crossroads
Another traditional
warehouse project?
or
Enterprise data
hub/lake/ocean with new
technologies?
8#strataconf #hadoopworld
Data Architecture – New Direction?
9#strataconf #hadoopworld
• 360 degree view of consumers for marketing, planning, and analytics
• Discover and mine relationships
• Create highly targeted and individualized marketing programs
The Vision
Enterprise Marketing and Analytics Platform
The How
• Co-location, master data management, custom data quality and cleansing rules
and more
• Which allow integration of data from across and outside of the company to
create 360 view of consumers
10#strataconf #hadoopworld
Use Cases
Complete a Data Append Process
Append Data Elements from Enrichment sources to
Customer/Prospect records
Tests: Data Integration, Processing Power
11#strataconf #hadoopworld
Use Cases
Score Prospects with a Predictive Model
Score prospects with the current model to predict likelihood
to respond to a direct marketing offer
Tests: Processing Power, Predictive Modeling
12#strataconf #hadoopworld
Use Cases
Measure online contracts/sales, create visitor
personas, create simple attribution
Match weblog visitor data to Salesforce leads and to policy
holders to generate a sales pipeline. Join to enrichment sources
to create personas. Join to Direct Mail solicitation history and
test for correlation.
Tests: Data Integration, Processing Power, Analytics
13#strataconf #hadoopworld
Web Log
Salesforce
Epsilon
Direct Mail
History
POC Success
Join weblogs > CRM > Demographics > mailings
Enabled:
• Connected online activity to offline activity
• Created time series of events
• Created demographic profile
TOTAL ANALYSIS TIME = 2 hours
120k visits
4,252 ppl
14% in niche
35% income > $100k
83 sent DM after visit
**Use case figures are for illustration purposes only**
14#strataconf #hadoopworld
Summary of Key Benefits
 Provides a single platform to house key customer and
prospect data sources,
 Establishes persistent keys across previously disparate data
sources ,
 Provides for rapid intake of new data sources (structured and
unstructured),
 Eliminates today’s data intake and append bottleneck,
 Empowers Analysts to explore all data elements,
 Increases processing power for statistical analysis and,
 Improves recruiting and retention of Data Engineers and Data
Scientists
15#strataconf #hadoopworld
Solution Architecture
PowerCenterBigDataEdition
HDFSDataQualityBigDataEdition
IdentityResolution
HBase
Hive
Map
Reduce
Cleansed
Files
Individual Household
Informatica Big Data Edition Cloudera Big Data Platform
Visualizations
Big Data Analytics
Extract Load & Transform
Data Quality –Cleaning, Identity Resolution
Customer
Data
Partner
Files
Prospects
Enrichment
Inputs
CRM
Solicitation
History
Weblogs
PredictiveAnalytics
Visualizations
Consumption
Datameer
Predictive Analytics
16#strataconf #hadoopworld
Product Value Proposition
Cloudera Enterprise Data Hub
– Strong commitment to community driven, open source platform
– Security & Governance : Authentication; Authorization; Auditing; Data lineage
& Data discovery
– Strong presence in Financial Services Industry
Informatica
– Leverage existing skills; Visual development; Increased Productivity
– Prebuilt connectors: RDBMS, VSAM, OLAP, Salesforce, Social Media (Facebook,
LinkedIn, Twitter)
– Data Profiling & data quality on Hadoop: Identify data issues earlier; score
carding; cleanse, match and standardize on Hadoop; prebuilt data quality
rules, data masking
– Natural Language Processing (NLP): Mine semi-structured & unstructured data
(emails, twitter feeds, Facebook posts)
17#strataconf #hadoopworld
Product Value Proposition
Datameer
– Big Data Analytics: Analyze structured & semi-structured; data mining;
built for Hadoop
– Easy, Excel-like interface eliminates need to learn new programming
languages… Would appeal to more broad analytic user community
across the enterprise.
– Create, execute & test data pipelines natively on Hadoop
– Rapidly combine and enrich existing data sets; what-if scenarios
– Pre-stage data for Campaigns / Reporting
18#strataconf #hadoopworld
POC Highlights
 10 Node Cluster running CDH 5.0
 718 Million Rows of data from
1200+ input files
 Seven use cases with increasing
complexity
 30 TB of Data
19#strataconf #hadoopworld
POC Timeline
Task Participants Duration
Cloudera Install, Setup and
cluster certification for
Enterprise Data Hub, Security
review
Cloudera
Core Team
2 weeks
Installation, Data preparation,
ingestion, using PowerCenter
BigData Edition
Informatica /
TA Core Team
2 weeks
Profiling, Data cleansing and
standardization, aggregation,
de-duping, customer
identification using Data
Quality and Identity
Resolution
Informatica
TA Core Team
4 weeks
Visualizations, models, pmml,
campaign files using Big Data
Analytics using Datameer
Datameer
TA Core Team
2 weeks
SAS integration TA Core Team 1 week
Wrap up 1 week
20#strataconf #hadoopworld
POC Infrastructure
21#strataconf #hadoopworld
Infrastucuture
• Cloudera
• Enterprise Data Hub (5.0.1)
• Hive (0.12), Hue (3.5), Impala, Hbase (0.96),
Pig (0.12), Spark (0.9)
• 10 node cluster, 6 data nodes
• RHEL 6.5, 20 cores, 128 GB RAM
• 80TB usable space
• Informatica Big Data Governance Edition
• BDE 9.6.1
• Identity Match (MapIR)
• Datameer
22#strataconf #hadoopworld
Team
• Business (Sponsor, Data, Analytics, Campaign)
• PM
• BA
• IT (6)
• Operations (3 part time)
• Support staff
• Legal
• Procurement
• Security
23#strataconf #hadoopworld
Why Hbase?
• Faster update processing
• Leverage index to perform seek for upsert
processing instead of full scan against hdfs.
Huge gains in update processing times.
• Faster Analytics
• Can leverage Hbase index for improved
query performance
24#strataconf #hadoopworld
Informatica Developer (Big Data Edition)
25#strataconf #hadoopworld
Data Profiling on Hadoop
26#strataconf #hadoopworld
Score carding
27#strataconf #hadoopworld
Datameer – Workbooks
28#strataconf #hadoopworld
Datameer - Visualizations
29#strataconf #hadoopworld
Lessons Learned
• Invest in a PoC
– Tie to business use cases to demonstrate value
• Partner with key vendors
– Technology is changing rapidly
• Small team with the right skillset
– Naturally innovative individuals
– Can wear multiple hats
• Evangelize & sell your idea
– Partner with business
– Socialize the platform & the vision
• Big Data requires Data Governance
– Establish tools & processes to support data governance
– Data Stewards: Profile, validate, catalog, metadata creation, lineage
– Managed & Curated Data
• Align with larger enterprise strategy
30#strataconf #hadoopworld
Questions
31#strataconf #hadoopworld
Stephen Lloyd
stephen.lloyd@transamerica.com
http://about.me/stephenlloyd
David Beaudoin
david.beaudoin@transamerica.com
Vishal Bamba
vishal.bamba@transamerica.com
Twitter: @vishalbamba

Más contenido relacionado

La actualidad más candente

New Innovations in Information Management for Big Data - Smarter Business 2013
New Innovations in Information Management for Big Data - Smarter Business 2013New Innovations in Information Management for Big Data - Smarter Business 2013
New Innovations in Information Management for Big Data - Smarter Business 2013IBM Sverige
 
Data governance datalakes_multitenancy
Data governance datalakes_multitenancyData governance datalakes_multitenancy
Data governance datalakes_multitenancySathish K S
 
Slides: Relational to NoSQL Migration
Slides: Relational to NoSQL MigrationSlides: Relational to NoSQL Migration
Slides: Relational to NoSQL MigrationDATAVERSITY
 
An Overview of the Neo4j Cloud Strategy and the Future of Graph Databases in ...
An Overview of the Neo4j Cloud Strategy and the Future of Graph Databases in ...An Overview of the Neo4j Cloud Strategy and the Future of Graph Databases in ...
An Overview of the Neo4j Cloud Strategy and the Future of Graph Databases in ...Neo4j
 
Accelerating Time to Success for Your Big Data Initiatives
Accelerating Time to Success for Your Big Data InitiativesAccelerating Time to Success for Your Big Data Initiatives
Accelerating Time to Success for Your Big Data Initiatives☁Jake Weaver ☁
 
The Merger is Happening, Now What Do We Do?
The Merger is Happening, Now What Do We Do?The Merger is Happening, Now What Do We Do?
The Merger is Happening, Now What Do We Do?DATUM LLC
 
Unlocking Greater Insights with Integrated Data Quality for Collibra
Unlocking Greater Insights with Integrated Data Quality for CollibraUnlocking Greater Insights with Integrated Data Quality for Collibra
Unlocking Greater Insights with Integrated Data Quality for CollibraPrecisely
 
2. Getvisibility. Innovative data governance, control & oversight
2. Getvisibility. Innovative data governance, control & oversight2. Getvisibility. Innovative data governance, control & oversight
2. Getvisibility. Innovative data governance, control & oversightVanessa Pulgarín Auquilla
 
How to Crunch Petabytes with Hadoop and Big Data Using InfoSphere BigInsights...
How to Crunch Petabytes with Hadoop and Big Data Using InfoSphere BigInsights...How to Crunch Petabytes with Hadoop and Big Data Using InfoSphere BigInsights...
How to Crunch Petabytes with Hadoop and Big Data Using InfoSphere BigInsights...DATAVERSITY
 
Top 3 Hot Data Security And Privacy Technologies
Top 3 Hot Data Security And Privacy TechnologiesTop 3 Hot Data Security And Privacy Technologies
Top 3 Hot Data Security And Privacy TechnologiesTyrone Systems
 
Slides: Using Analytics and Fraud Management To Increase Revenues and Differe...
Slides: Using Analytics and Fraud Management To Increase Revenues and Differe...Slides: Using Analytics and Fraud Management To Increase Revenues and Differe...
Slides: Using Analytics and Fraud Management To Increase Revenues and Differe...DATAVERSITY
 
Customer Case Studies of Self-Service Big Data Analytics
Customer Case Studies of Self-Service Big Data AnalyticsCustomer Case Studies of Self-Service Big Data Analytics
Customer Case Studies of Self-Service Big Data AnalyticsDatameer
 
Accelerating Fast Data Strategy with Data Virtualization
Accelerating Fast Data Strategy with Data VirtualizationAccelerating Fast Data Strategy with Data Virtualization
Accelerating Fast Data Strategy with Data VirtualizationDenodo
 
Choosing the Right Big Data Architecture for your Business
Choosing the Right Big Data Architecture for your BusinessChoosing the Right Big Data Architecture for your Business
Choosing the Right Big Data Architecture for your BusinessChicago Hadoop Users Group
 
What is big data - Architectures and Practical Use Cases
What is big data - Architectures and Practical Use CasesWhat is big data - Architectures and Practical Use Cases
What is big data - Architectures and Practical Use CasesTony Pearson
 
Deliver Data Governance with a “Yes”
Deliver Data Governance with a “Yes”Deliver Data Governance with a “Yes”
Deliver Data Governance with a “Yes”Jean-Michel Franco
 
NLB Analytics Overview
NLB Analytics OverviewNLB Analytics Overview
NLB Analytics OverviewKevin Dingle
 
Webinar - Big Data: Power to the User
Webinar - Big Data: Power to the User Webinar - Big Data: Power to the User
Webinar - Big Data: Power to the User Datameer
 
Slides: Why You Need End-to-End Data Quality to Build Trust in Kafka
Slides: Why You Need End-to-End Data Quality to Build Trust in KafkaSlides: Why You Need End-to-End Data Quality to Build Trust in Kafka
Slides: Why You Need End-to-End Data Quality to Build Trust in KafkaDATAVERSITY
 
Impact of BIG Data on MDM
Impact of BIG Data on MDMImpact of BIG Data on MDM
Impact of BIG Data on MDMSubhendu Dey
 

La actualidad más candente (20)

New Innovations in Information Management for Big Data - Smarter Business 2013
New Innovations in Information Management for Big Data - Smarter Business 2013New Innovations in Information Management for Big Data - Smarter Business 2013
New Innovations in Information Management for Big Data - Smarter Business 2013
 
Data governance datalakes_multitenancy
Data governance datalakes_multitenancyData governance datalakes_multitenancy
Data governance datalakes_multitenancy
 
Slides: Relational to NoSQL Migration
Slides: Relational to NoSQL MigrationSlides: Relational to NoSQL Migration
Slides: Relational to NoSQL Migration
 
An Overview of the Neo4j Cloud Strategy and the Future of Graph Databases in ...
An Overview of the Neo4j Cloud Strategy and the Future of Graph Databases in ...An Overview of the Neo4j Cloud Strategy and the Future of Graph Databases in ...
An Overview of the Neo4j Cloud Strategy and the Future of Graph Databases in ...
 
Accelerating Time to Success for Your Big Data Initiatives
Accelerating Time to Success for Your Big Data InitiativesAccelerating Time to Success for Your Big Data Initiatives
Accelerating Time to Success for Your Big Data Initiatives
 
The Merger is Happening, Now What Do We Do?
The Merger is Happening, Now What Do We Do?The Merger is Happening, Now What Do We Do?
The Merger is Happening, Now What Do We Do?
 
Unlocking Greater Insights with Integrated Data Quality for Collibra
Unlocking Greater Insights with Integrated Data Quality for CollibraUnlocking Greater Insights with Integrated Data Quality for Collibra
Unlocking Greater Insights with Integrated Data Quality for Collibra
 
2. Getvisibility. Innovative data governance, control & oversight
2. Getvisibility. Innovative data governance, control & oversight2. Getvisibility. Innovative data governance, control & oversight
2. Getvisibility. Innovative data governance, control & oversight
 
How to Crunch Petabytes with Hadoop and Big Data Using InfoSphere BigInsights...
How to Crunch Petabytes with Hadoop and Big Data Using InfoSphere BigInsights...How to Crunch Petabytes with Hadoop and Big Data Using InfoSphere BigInsights...
How to Crunch Petabytes with Hadoop and Big Data Using InfoSphere BigInsights...
 
Top 3 Hot Data Security And Privacy Technologies
Top 3 Hot Data Security And Privacy TechnologiesTop 3 Hot Data Security And Privacy Technologies
Top 3 Hot Data Security And Privacy Technologies
 
Slides: Using Analytics and Fraud Management To Increase Revenues and Differe...
Slides: Using Analytics and Fraud Management To Increase Revenues and Differe...Slides: Using Analytics and Fraud Management To Increase Revenues and Differe...
Slides: Using Analytics and Fraud Management To Increase Revenues and Differe...
 
Customer Case Studies of Self-Service Big Data Analytics
Customer Case Studies of Self-Service Big Data AnalyticsCustomer Case Studies of Self-Service Big Data Analytics
Customer Case Studies of Self-Service Big Data Analytics
 
Accelerating Fast Data Strategy with Data Virtualization
Accelerating Fast Data Strategy with Data VirtualizationAccelerating Fast Data Strategy with Data Virtualization
Accelerating Fast Data Strategy with Data Virtualization
 
Choosing the Right Big Data Architecture for your Business
Choosing the Right Big Data Architecture for your BusinessChoosing the Right Big Data Architecture for your Business
Choosing the Right Big Data Architecture for your Business
 
What is big data - Architectures and Practical Use Cases
What is big data - Architectures and Practical Use CasesWhat is big data - Architectures and Practical Use Cases
What is big data - Architectures and Practical Use Cases
 
Deliver Data Governance with a “Yes”
Deliver Data Governance with a “Yes”Deliver Data Governance with a “Yes”
Deliver Data Governance with a “Yes”
 
NLB Analytics Overview
NLB Analytics OverviewNLB Analytics Overview
NLB Analytics Overview
 
Webinar - Big Data: Power to the User
Webinar - Big Data: Power to the User Webinar - Big Data: Power to the User
Webinar - Big Data: Power to the User
 
Slides: Why You Need End-to-End Data Quality to Build Trust in Kafka
Slides: Why You Need End-to-End Data Quality to Build Trust in KafkaSlides: Why You Need End-to-End Data Quality to Build Trust in Kafka
Slides: Why You Need End-to-End Data Quality to Build Trust in Kafka
 
Impact of BIG Data on MDM
Impact of BIG Data on MDMImpact of BIG Data on MDM
Impact of BIG Data on MDM
 

Similar a Customer Intelligence_ Harnessing Elephants at Transamerica Presentation (1)

Introducing Trillium DQ for Big Data: Powerful Profiling and Data Quality for...
Introducing Trillium DQ for Big Data: Powerful Profiling and Data Quality for...Introducing Trillium DQ for Big Data: Powerful Profiling and Data Quality for...
Introducing Trillium DQ for Big Data: Powerful Profiling and Data Quality for...Precisely
 
When and How Data Lakes Fit into a Modern Data Architecture
When and How Data Lakes Fit into a Modern Data ArchitectureWhen and How Data Lakes Fit into a Modern Data Architecture
When and How Data Lakes Fit into a Modern Data ArchitectureDATAVERSITY
 
BAR360 open data platform presentation at DAMA, Sydney
BAR360 open data platform presentation at DAMA, SydneyBAR360 open data platform presentation at DAMA, Sydney
BAR360 open data platform presentation at DAMA, SydneySai Paravastu
 
Skillwise Big Data part 2
Skillwise Big Data part 2Skillwise Big Data part 2
Skillwise Big Data part 2Skillwise Group
 
Exclusive Verizon Employee Webinar: Getting More From Your CDR Data
Exclusive Verizon Employee Webinar: Getting More From Your CDR DataExclusive Verizon Employee Webinar: Getting More From Your CDR Data
Exclusive Verizon Employee Webinar: Getting More From Your CDR DataPentaho
 
Big Data's Impact on the Enterprise
Big Data's Impact on the EnterpriseBig Data's Impact on the Enterprise
Big Data's Impact on the EnterpriseCaserta
 
Why Your Data Science Architecture Should Include a Data Virtualization Tool ...
Why Your Data Science Architecture Should Include a Data Virtualization Tool ...Why Your Data Science Architecture Should Include a Data Virtualization Tool ...
Why Your Data Science Architecture Should Include a Data Virtualization Tool ...Denodo
 
Modern Data Challenges require Modern Graph Technology
Modern Data Challenges require Modern Graph TechnologyModern Data Challenges require Modern Graph Technology
Modern Data Challenges require Modern Graph TechnologyNeo4j
 
Accelerate Self-Service Analytics with Virtualization and Visualisation (Thai)
Accelerate Self-Service Analytics with Virtualization and Visualisation (Thai)Accelerate Self-Service Analytics with Virtualization and Visualisation (Thai)
Accelerate Self-Service Analytics with Virtualization and Visualisation (Thai)Denodo
 
Pentaho Analytics on MongoDB
Pentaho Analytics on MongoDBPentaho Analytics on MongoDB
Pentaho Analytics on MongoDBMark Kromer
 
BDW Chicago 2016 - Ramu Kalvakuntla, Sr. Principal - Technical - Big Data Pra...
BDW Chicago 2016 - Ramu Kalvakuntla, Sr. Principal - Technical - Big Data Pra...BDW Chicago 2016 - Ramu Kalvakuntla, Sr. Principal - Technical - Big Data Pra...
BDW Chicago 2016 - Ramu Kalvakuntla, Sr. Principal - Technical - Big Data Pra...Big Data Week
 
Learn How Financial Services Organizations Can Use Big Data to Mitigate Risks
Learn How Financial Services Organizations Can Use Big Data to Mitigate RisksLearn How Financial Services Organizations Can Use Big Data to Mitigate Risks
Learn How Financial Services Organizations Can Use Big Data to Mitigate RisksMapR Technologies
 
The Connected Consumer – Real-time Customer 360
The Connected Consumer – Real-time Customer 360The Connected Consumer – Real-time Customer 360
The Connected Consumer – Real-time Customer 360Capgemini
 
Accelerate Self-Service Analytics with Data Virtualization and Visualization
Accelerate Self-Service Analytics with Data Virtualization and VisualizationAccelerate Self-Service Analytics with Data Virtualization and Visualization
Accelerate Self-Service Analytics with Data Virtualization and VisualizationDenodo
 
Klarna Tech Talk - Mind the Data!
Klarna Tech Talk - Mind the Data!Klarna Tech Talk - Mind the Data!
Klarna Tech Talk - Mind the Data!Jeffrey T. Pollock
 
Hadoop 2015: what we larned -Think Big, A Teradata Company
Hadoop 2015: what we larned -Think Big, A Teradata CompanyHadoop 2015: what we larned -Think Big, A Teradata Company
Hadoop 2015: what we larned -Think Big, A Teradata CompanyDataWorks Summit
 
Five Critical Success Factors for Big Data and Traditional BI
Five Critical Success Factors for Big Data and Traditional BIFive Critical Success Factors for Big Data and Traditional BI
Five Critical Success Factors for Big Data and Traditional BIInside Analysis
 

Similar a Customer Intelligence_ Harnessing Elephants at Transamerica Presentation (1) (20)

Introducing Trillium DQ for Big Data: Powerful Profiling and Data Quality for...
Introducing Trillium DQ for Big Data: Powerful Profiling and Data Quality for...Introducing Trillium DQ for Big Data: Powerful Profiling and Data Quality for...
Introducing Trillium DQ for Big Data: Powerful Profiling and Data Quality for...
 
When and How Data Lakes Fit into a Modern Data Architecture
When and How Data Lakes Fit into a Modern Data ArchitectureWhen and How Data Lakes Fit into a Modern Data Architecture
When and How Data Lakes Fit into a Modern Data Architecture
 
BAR360 open data platform presentation at DAMA, Sydney
BAR360 open data platform presentation at DAMA, SydneyBAR360 open data platform presentation at DAMA, Sydney
BAR360 open data platform presentation at DAMA, Sydney
 
Skillwise Big Data part 2
Skillwise Big Data part 2Skillwise Big Data part 2
Skillwise Big Data part 2
 
Skilwise Big data
Skilwise Big dataSkilwise Big data
Skilwise Big data
 
Exclusive Verizon Employee Webinar: Getting More From Your CDR Data
Exclusive Verizon Employee Webinar: Getting More From Your CDR DataExclusive Verizon Employee Webinar: Getting More From Your CDR Data
Exclusive Verizon Employee Webinar: Getting More From Your CDR Data
 
Big Data's Impact on the Enterprise
Big Data's Impact on the EnterpriseBig Data's Impact on the Enterprise
Big Data's Impact on the Enterprise
 
Why Your Data Science Architecture Should Include a Data Virtualization Tool ...
Why Your Data Science Architecture Should Include a Data Virtualization Tool ...Why Your Data Science Architecture Should Include a Data Virtualization Tool ...
Why Your Data Science Architecture Should Include a Data Virtualization Tool ...
 
Modern Data Challenges require Modern Graph Technology
Modern Data Challenges require Modern Graph TechnologyModern Data Challenges require Modern Graph Technology
Modern Data Challenges require Modern Graph Technology
 
Accelerate Self-Service Analytics with Virtualization and Visualisation (Thai)
Accelerate Self-Service Analytics with Virtualization and Visualisation (Thai)Accelerate Self-Service Analytics with Virtualization and Visualisation (Thai)
Accelerate Self-Service Analytics with Virtualization and Visualisation (Thai)
 
Pentaho Analytics on MongoDB
Pentaho Analytics on MongoDBPentaho Analytics on MongoDB
Pentaho Analytics on MongoDB
 
BDW Chicago 2016 - Ramu Kalvakuntla, Sr. Principal - Technical - Big Data Pra...
BDW Chicago 2016 - Ramu Kalvakuntla, Sr. Principal - Technical - Big Data Pra...BDW Chicago 2016 - Ramu Kalvakuntla, Sr. Principal - Technical - Big Data Pra...
BDW Chicago 2016 - Ramu Kalvakuntla, Sr. Principal - Technical - Big Data Pra...
 
Learn How Financial Services Organizations Can Use Big Data to Mitigate Risks
Learn How Financial Services Organizations Can Use Big Data to Mitigate RisksLearn How Financial Services Organizations Can Use Big Data to Mitigate Risks
Learn How Financial Services Organizations Can Use Big Data to Mitigate Risks
 
The Connected Consumer – Real-time Customer 360
The Connected Consumer – Real-time Customer 360The Connected Consumer – Real-time Customer 360
The Connected Consumer – Real-time Customer 360
 
Accelerate Self-Service Analytics with Data Virtualization and Visualization
Accelerate Self-Service Analytics with Data Virtualization and VisualizationAccelerate Self-Service Analytics with Data Virtualization and Visualization
Accelerate Self-Service Analytics with Data Virtualization and Visualization
 
Klarna Tech Talk - Mind the Data!
Klarna Tech Talk - Mind the Data!Klarna Tech Talk - Mind the Data!
Klarna Tech Talk - Mind the Data!
 
KNIME Meetup 2016-04-16
KNIME Meetup 2016-04-16KNIME Meetup 2016-04-16
KNIME Meetup 2016-04-16
 
Hadoop 2015: what we larned -Think Big, A Teradata Company
Hadoop 2015: what we larned -Think Big, A Teradata CompanyHadoop 2015: what we larned -Think Big, A Teradata Company
Hadoop 2015: what we larned -Think Big, A Teradata Company
 
Five Critical Success Factors for Big Data and Traditional BI
Five Critical Success Factors for Big Data and Traditional BIFive Critical Success Factors for Big Data and Traditional BI
Five Critical Success Factors for Big Data and Traditional BI
 
A6 big data_in_the_cloud
A6 big data_in_the_cloudA6 big data_in_the_cloud
A6 big data_in_the_cloud
 

Customer Intelligence_ Harnessing Elephants at Transamerica Presentation (1)

  • 1. Customer Intelligence – Harnessing Elephants at Transamerica Stephen Lloyd BI Architect, Transamerica Life & Protection #strataconf #hadoopworld Vishal Bamba Chief Architect Transamerica Life & Protection Dave Beaudoin AVP Transamerica Investments & Retirement
  • 2. 2#strataconf #hadoopworld Agenda • About Transamerica • Current State and Data Architecture • Use Cases / Learnings • POC Highlights • Solution Design • Lessons Learned • Questions / Discussion
  • 3. 3#strataconf #hadoopworld Transamerica Org Structure • Investments & Retirement – Retirement and Benefit Plan services to employers/employees – Mutual funds and variable annuities – Mission to help people save and invest wisely to secure their retirement dreams, the I&R business unit serves more than 3 million retirement plan participants across the entire spectrum of defined benefit and defined contribution plans. • Life & Protection – Term and Perm insurance products – Medicare supplement, long term care, accidental death, final expense – Mission to protect what you’ve built, secure what’s next.
  • 4. 4#strataconf #hadoopworld Data Architecture • Rich data environment across organizational business units, comprised of many source systems across various platforms • A consistent enterprise view of data across business units is required
  • 6. 6#strataconf #hadoopworld Data Architecture As in many industries, we are focused on leveraging technology to build data-driven customer relationships. How can the current data architecture support this strategic direction?
  • 7. 7#strataconf #hadoopworld Data Architecture Enterprise Data Management Crossroads Another traditional warehouse project? or Enterprise data hub/lake/ocean with new technologies?
  • 9. 9#strataconf #hadoopworld • 360 degree view of consumers for marketing, planning, and analytics • Discover and mine relationships • Create highly targeted and individualized marketing programs The Vision Enterprise Marketing and Analytics Platform The How • Co-location, master data management, custom data quality and cleansing rules and more • Which allow integration of data from across and outside of the company to create 360 view of consumers
  • 10. 10#strataconf #hadoopworld Use Cases Complete a Data Append Process Append Data Elements from Enrichment sources to Customer/Prospect records Tests: Data Integration, Processing Power
  • 11. 11#strataconf #hadoopworld Use Cases Score Prospects with a Predictive Model Score prospects with the current model to predict likelihood to respond to a direct marketing offer Tests: Processing Power, Predictive Modeling
  • 12. 12#strataconf #hadoopworld Use Cases Measure online contracts/sales, create visitor personas, create simple attribution Match weblog visitor data to Salesforce leads and to policy holders to generate a sales pipeline. Join to enrichment sources to create personas. Join to Direct Mail solicitation history and test for correlation. Tests: Data Integration, Processing Power, Analytics
  • 13. 13#strataconf #hadoopworld Web Log Salesforce Epsilon Direct Mail History POC Success Join weblogs > CRM > Demographics > mailings Enabled: • Connected online activity to offline activity • Created time series of events • Created demographic profile TOTAL ANALYSIS TIME = 2 hours 120k visits 4,252 ppl 14% in niche 35% income > $100k 83 sent DM after visit **Use case figures are for illustration purposes only**
  • 14. 14#strataconf #hadoopworld Summary of Key Benefits  Provides a single platform to house key customer and prospect data sources,  Establishes persistent keys across previously disparate data sources ,  Provides for rapid intake of new data sources (structured and unstructured),  Eliminates today’s data intake and append bottleneck,  Empowers Analysts to explore all data elements,  Increases processing power for statistical analysis and,  Improves recruiting and retention of Data Engineers and Data Scientists
  • 15. 15#strataconf #hadoopworld Solution Architecture PowerCenterBigDataEdition HDFSDataQualityBigDataEdition IdentityResolution HBase Hive Map Reduce Cleansed Files Individual Household Informatica Big Data Edition Cloudera Big Data Platform Visualizations Big Data Analytics Extract Load & Transform Data Quality –Cleaning, Identity Resolution Customer Data Partner Files Prospects Enrichment Inputs CRM Solicitation History Weblogs PredictiveAnalytics Visualizations Consumption Datameer Predictive Analytics
  • 16. 16#strataconf #hadoopworld Product Value Proposition Cloudera Enterprise Data Hub – Strong commitment to community driven, open source platform – Security & Governance : Authentication; Authorization; Auditing; Data lineage & Data discovery – Strong presence in Financial Services Industry Informatica – Leverage existing skills; Visual development; Increased Productivity – Prebuilt connectors: RDBMS, VSAM, OLAP, Salesforce, Social Media (Facebook, LinkedIn, Twitter) – Data Profiling & data quality on Hadoop: Identify data issues earlier; score carding; cleanse, match and standardize on Hadoop; prebuilt data quality rules, data masking – Natural Language Processing (NLP): Mine semi-structured & unstructured data (emails, twitter feeds, Facebook posts)
  • 17. 17#strataconf #hadoopworld Product Value Proposition Datameer – Big Data Analytics: Analyze structured & semi-structured; data mining; built for Hadoop – Easy, Excel-like interface eliminates need to learn new programming languages… Would appeal to more broad analytic user community across the enterprise. – Create, execute & test data pipelines natively on Hadoop – Rapidly combine and enrich existing data sets; what-if scenarios – Pre-stage data for Campaigns / Reporting
  • 18. 18#strataconf #hadoopworld POC Highlights  10 Node Cluster running CDH 5.0  718 Million Rows of data from 1200+ input files  Seven use cases with increasing complexity  30 TB of Data
  • 19. 19#strataconf #hadoopworld POC Timeline Task Participants Duration Cloudera Install, Setup and cluster certification for Enterprise Data Hub, Security review Cloudera Core Team 2 weeks Installation, Data preparation, ingestion, using PowerCenter BigData Edition Informatica / TA Core Team 2 weeks Profiling, Data cleansing and standardization, aggregation, de-duping, customer identification using Data Quality and Identity Resolution Informatica TA Core Team 4 weeks Visualizations, models, pmml, campaign files using Big Data Analytics using Datameer Datameer TA Core Team 2 weeks SAS integration TA Core Team 1 week Wrap up 1 week
  • 21. 21#strataconf #hadoopworld Infrastucuture • Cloudera • Enterprise Data Hub (5.0.1) • Hive (0.12), Hue (3.5), Impala, Hbase (0.96), Pig (0.12), Spark (0.9) • 10 node cluster, 6 data nodes • RHEL 6.5, 20 cores, 128 GB RAM • 80TB usable space • Informatica Big Data Governance Edition • BDE 9.6.1 • Identity Match (MapIR) • Datameer
  • 22. 22#strataconf #hadoopworld Team • Business (Sponsor, Data, Analytics, Campaign) • PM • BA • IT (6) • Operations (3 part time) • Support staff • Legal • Procurement • Security
  • 23. 23#strataconf #hadoopworld Why Hbase? • Faster update processing • Leverage index to perform seek for upsert processing instead of full scan against hdfs. Huge gains in update processing times. • Faster Analytics • Can leverage Hbase index for improved query performance
  • 29. 29#strataconf #hadoopworld Lessons Learned • Invest in a PoC – Tie to business use cases to demonstrate value • Partner with key vendors – Technology is changing rapidly • Small team with the right skillset – Naturally innovative individuals – Can wear multiple hats • Evangelize & sell your idea – Partner with business – Socialize the platform & the vision • Big Data requires Data Governance – Establish tools & processes to support data governance – Data Stewards: Profile, validate, catalog, metadata creation, lineage – Managed & Curated Data • Align with larger enterprise strategy
  • 31. 31#strataconf #hadoopworld Stephen Lloyd stephen.lloyd@transamerica.com http://about.me/stephenlloyd David Beaudoin david.beaudoin@transamerica.com Vishal Bamba vishal.bamba@transamerica.com Twitter: @vishalbamba

Notas del editor

  1. Get rid of append