How Big Data ISVs get marketing data into lakes
Sumit Sarkar, Chief Data Evangelist, Progress DataDirect
Gary Angel, Principal, Advisory Digital Analytics Center of Excellence, EY
© 2015 Progress Software Corporation. All rights reserved.
Audio Bridge Options & Question Submission
Agenda
• What is a Marketing Data Lake?
• Industry trends around accessing marketing data in SaaS applications
• How to ingest data with Apache Sqoop and Apache Falcon directly from SaaS applications
• How big data vendors can embed SaaS connectivity
What is a Marketing Data Lake?
A data lake is a large-scale storage repository and processing engine. A data lake provides “massive storage for any kind of data, enormous processing power and the ability to handle virtually limitless concurrent tasks or jobs.”
- SAS Institute
Benefits of a Marketing Data Lake
Some of the benefits of a data lake include:
• Store data in all shapes and sizes
• Flexible analytics with “schema on read”
• Query data using SQL or big data programming frameworks
• Eliminate data silos
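To make “schema on read” concrete, here is a minimal Python sketch. It assumes raw marketing events are landed in the lake as JSON lines, exactly as the source emitted them; the field names (`visitor`, `event`, `campaign`, `amount`) and the `read_with_schema` helper are hypothetical. Nothing is enforced when the data is written; each consumer applies its own schema when it reads.

```python
import json
from typing import Any, Dict, Iterable, List

# Raw records land in the lake untouched; note they do not all share one shape.
# Field names are hypothetical, chosen only for illustration.
RAW_LINES = [
    '{"visitor": "v1", "event": "page_view", "ts": "2015-06-01T10:00:00"}',
    '{"visitor": "v1", "event": "email_click", "ts": "2015-06-01T10:05:00", "campaign": "summer"}',
    '{"visitor": "v2", "event": "purchase", "ts": "2015-06-02T09:00:00", "amount": 42.5}',
]

def read_with_schema(lines: Iterable[str], schema: Dict[str, type]) -> List[Dict[str, Any]]:
    """Apply a schema at read time: project the requested fields and cast them.

    Fields missing from a record come back as None instead of failing the read.
    """
    rows = []
    for line in lines:
        record = json.loads(line)
        row = {}
        for field, cast in schema.items():
            value = record.get(field)
            row[field] = cast(value) if value is not None else None
        rows.append(row)
    return rows

# Two different views over the same raw data, each defined only when read.
campaign_view = read_with_schema(RAW_LINES, {"visitor": str, "campaign": str})
revenue_view = read_with_schema(RAW_LINES, {"visitor": str, "amount": float})
```

Because the raw lines are never rewritten, answering a new question later only requires a new schema at read time, not a reload.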
Why Marketing Data?
• CMOs will outspend CIOs on technology by 2017 (Gartner)
• Oracle spent $3B on a martech acquisition spree to gain CMO mindshare
• Expect more collaboration between CMOs and CIOs (CIO.com)
• The Modern Marketing Data Warehouse webinar drew ~500 registrations (Progress)
Industry trends around accessing marketing data in SaaS applications
It’s easy to forget that it’s still about solving real business problems.
[Diagram: Manage Data → Perform Analytics → Drive Decisions, in a continuous feedback loop. Labels: appropriate data sources; relevant data (transaction / behavior history); insights; answers to business questions.]
Strategy (thinking) moves right to left; implementation moves left to right.
Before you think data, think decisions!
Our marketing data is almost all in the cloud
[Diagram: cloud-hosted marketing data sources: CRM, Web Behavior, Mobile Behavior, Search Buys, Display Buys, Owned Social, Public Social, Meta-Data]
And it’s almost all complex event-stream data, which means APIs that only return aggregations aren’t very useful.
Detail is important because this digital data is true big data
The relationship between events is critical.
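A small Python illustration of why the relationship between events matters, using two hypothetical visitors (the event names and helper functions are invented for the example). Both visitors generate the same four events, so an aggregation-only API returns identical counts; only the ordered detail reveals that one visitor clicked a search ad before buying and the other did not.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical, time-ordered event streams for two visitors.
visitor_a = ["search_click", "product_view", "purchase", "support_call"]
visitor_b = ["product_view", "purchase", "support_call", "search_click"]

def aggregate(events: List[str]) -> Counter:
    """What an aggregation-only API returns: counts per event type, order discarded."""
    return Counter(events)

def preceded_by(events: List[str], target: str, prior: str) -> bool:
    """True if `prior` occurs before the first `target` event (requires the detail)."""
    if target not in events:
        return False
    return prior in events[: events.index(target)]

def transitions(events: List[str]) -> List[Tuple[str, str]]:
    """Consecutive event pairs, i.e. the relationships between events."""
    return list(zip(events, events[1:]))
```

The `preceded_by` check is the shape of an attribution question (which touches came before a conversion), and it cannot be answered from the `aggregate` output.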
We’re almost never solving a single problem with a big data system
[Diagram: a spectrum from Reporting to Analytics, spanning Summarized Data, Segmented Data, and Detail Data. Use cases shown: dashboarding, campaign optimization, customer drill-down, attribution, CLTV (customer lifetime value), experience, personalization, targeting, forecasting.]
We can’t just aggregate, but we can’t not aggregate either.
Segmentation is one important technique to aggregate and join
Segmentation allows for effective aggregation of the meaning and outcome of streamed event data. The measurement foundation combines:
• Customer segmentation: customers v. prospects, owned products, persona
• Visit type identification: product focused, shopping focused, social focused, customer service
• Measurement of success specific to each segment and visit
• RFM models: recency and frequency for every segment and visit type
• KPDs and metrics: additional metrics that help identify drivers of success
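A minimal Python sketch of the “recency and frequency for every segment and visit type” measure. It assumes visits have already been classified into visit types; the visitor IDs, type labels, and the `recency_frequency` helper are hypothetical. A full RFM model would also include a monetary component and would usually bucket each measure into scores.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List, Tuple

Visit = Tuple[str, str, date]  # (visitor_id, visit_type, visit_date)

# Hypothetical visits that have already been classified by visit type.
visits: List[Visit] = [
    ("v1", "shopping", date(2015, 6, 1)),
    ("v1", "shopping", date(2015, 6, 20)),
    ("v1", "customer_service", date(2015, 6, 25)),
    ("v2", "product", date(2015, 5, 2)),
    ("v2", "shopping", date(2015, 6, 28)),
]

def recency_frequency(visits: List[Visit], as_of: date) -> Dict[Tuple[str, str], Tuple[int, int]]:
    """Map (visitor, visit_type) to (days since last visit, number of visits)."""
    last_seen: Dict[Tuple[str, str], date] = {}
    counts: Dict[Tuple[str, str], int] = defaultdict(int)
    for visitor, visit_type, day in visits:
        key = (visitor, visit_type)
        counts[key] += 1
        if key not in last_seen or day > last_seen[key]:
            last_seen[key] = day
    return {key: ((as_of - last_seen[key]).days, n) for key, n in counts.items()}

rf = recency_frequency(visits, as_of=date(2015, 6, 30))
```

Keying on visitor and visit type, rather than visitor alone, is what makes the measure specific to each segment and visit type.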
End-to-End Strategies
1. Full Detail → Park → Cube → Report
2. Full Detail → Park → Report (Direct to Detail)
3. Full Detail → Park → Semi-Detail → Report
• Most organizations use some combination of at least (1) and (2).
• Direct to Detail (2) has many advantages if it can be made performant: more flexible reporting and much less maintenance.
• Semi-Detail (3) is designed to capture most of the advantages of (2) when (2) isn’t performant.
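One way to picture Semi-Detail (3), sketched in Python under the assumption that “semi-detail” means collapsing hit-level events into one row per visit while keeping the attributes reports filter on most. The record layouts and the kept columns are hypothetical choices for illustration. The visit table is much smaller than full detail, yet it can still be regrouped by any kept attribute without rebuilding a pre-aggregated cube.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Hit:
    # Hypothetical hit-level (full detail) record.
    visit_id: str
    visitor_id: str
    page: str
    seconds_on_page: int
    converted: bool

@dataclass
class VisitRow:
    # Hypothetical semi-detail record: one row per visit.
    visit_id: str
    visitor_id: str
    entry_page: str
    page_count: int
    total_seconds: int
    converted: bool

def to_semi_detail(hits: List[Hit]) -> List[VisitRow]:
    """Collapse hits into one row per visit, keeping a few key attributes."""
    visits: Dict[str, VisitRow] = {}
    for hit in hits:  # assumes hits arrive in time order within each visit
        row = visits.get(hit.visit_id)
        if row is None:
            visits[hit.visit_id] = VisitRow(hit.visit_id, hit.visitor_id, hit.page, 1,
                                            hit.seconds_on_page, hit.converted)
        else:
            row.page_count += 1
            row.total_seconds += hit.seconds_on_page
            row.converted = row.converted or hit.converted
    return list(visits.values())

hits = [
    Hit("s1", "v1", "/home", 30, False),
    Hit("s1", "v1", "/product", 60, False),
    Hit("s1", "v1", "/checkout", 20, True),
    Hit("s2", "v2", "/blog", 45, False),
]
rows = to_semi_detail(hits)
```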
How to ingest data with Apache Sqoop and Apache Falcon directly from SaaS applications
What is Apache Sqoop?
Apache Sqoop
Apache Sqoop(TM) is a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases.
Sqoop graduated from the Incubator in March 2012 and is now a top-level Apache project.
http://sqoop.apache.org/
What is Apache Falcon?
Apache Falcon
Falcon is a feed processing and feed management system aimed at making it easier for end consumers to onboard their feed processing and feed management on Hadoop clusters.
https://falcon.apache.org/
Note: Falcon uses Sqoop for import/export operations. Sqoop requires the appropriate database driver to connect to the relational database; refer to the Sqoop documentation for any Sqoop-related questions. Make sure the database driver JAR is copied into the Oozie share lib for Sqoop.
Data in SaaS Applications is Siloed, Protected by Proprietary APIs Designed for Process Integration, not Data Integration
How to ingest data directly from SaaS applications
JDBC access to SaaS data
[Diagram: Apache Sqoop connects through the Progress DataDirect JDBC Connector and its Schema Manager to Salesforce.com. Schema layers shown: Salesforce.com Schema, User Defined Schema.]
The driver uses:
• SOAP API
• Bulk API
• Metadata API
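Conceptually, a schema manager presents SaaS objects as relational tables that any JDBC client, Sqoop included, can query. The hypothetical Python sketch below maps a simplified description of a Salesforce-style object to a `CREATE TABLE` statement. The input format, the type mapping, and `to_create_table` are assumptions for illustration, not the DataDirect Schema Manager's actual format or behavior.

```python
from typing import Dict, List

# Hypothetical mapping from SaaS field types to SQL column types.
TYPE_MAP: Dict[str, str] = {
    "id": "VARCHAR(18)",
    "string": "VARCHAR(255)",
    "email": "VARCHAR(255)",
    "double": "DOUBLE",
    "currency": "DECIMAL(18, 2)",
    "boolean": "BOOLEAN",
    "datetime": "TIMESTAMP",
}

def to_create_table(object_name: str, fields: List[Dict[str, str]]) -> str:
    """Render a relational table definition for one SaaS object.

    Unknown field types fall back to VARCHAR(255) rather than failing.
    """
    columns = [f"  {f['name']} {TYPE_MAP.get(f['type'], 'VARCHAR(255)')}" for f in fields]
    return f"CREATE TABLE {object_name} (\n" + ",\n".join(columns) + "\n)"

# A simplified description of a Salesforce-style Lead object (illustrative only).
lead_fields = [
    {"name": "Id", "type": "id"},
    {"name": "Email", "type": "email"},
    {"name": "AnnualRevenue", "type": "currency"},
    {"name": "CreatedDate", "type": "datetime"},
    {"name": "Rating", "type": "picklist"},
]
ddl = to_create_table("Lead", lead_fields)
```

A user-defined schema could be layered on top by, for example, overriding entries in the type map or renaming columns before the DDL is rendered.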
Geek Speak
$ sqoop help import
usage: sqoop import [GENERIC-ARGS] [TOOL-ARGS]

Common arguments:
   --connect <jdbc-uri>               Specify JDBC connect string
   --connection-manager <class-name>  Specify connection manager class to use
   --driver <class-name>              Manually specify JDBC driver class to use
   --hadoop-mapred-home <dir>+        Override $HADOOP_MAPRED_HOME
   --help                             Print usage instructions
   -P                                 Read password from console
   --password <password>              Set authentication password
   --username <username>              Set authentication username
   --verbose                          Print more information while working
   --hadoop-home <dir>+               Deprecated. Override $HADOOP_HOME
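As a hedged sketch of how these arguments combine, the Python below assembles (but does not run) a `sqoop import` command line for a JDBC-accessible SaaS source. `--connect`, `--driver`, `--username`, and `-P` appear in the help output above; `--table`, `--target-dir`, and `-m` are standard Sqoop import arguments. The JDBC URL, driver class, table, and HDFS path are placeholders, and `build_sqoop_import` is a hypothetical helper; substitute the values from your driver vendor's documentation.

```python
import shlex
from typing import List

def build_sqoop_import(jdbc_url: str, driver_class: str, username: str,
                       table: str, target_dir: str, mappers: int = 1) -> List[str]:
    """Return the argument list for a `sqoop import` run (nothing is executed).

    -P makes Sqoop prompt for the password on the console, so it never
    appears on the command line or in shell history.
    """
    if mappers < 1:
        raise ValueError("mappers must be >= 1")
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--driver", driver_class,
        "--username", username,
        "-P",
        "--table", table,
        "--target-dir", target_dir,
        "-m", str(mappers),
    ]

# Placeholder values: the URL and driver class below are hypothetical, not real driver names.
args = build_sqoop_import(
    jdbc_url="jdbc:example:saas://login.example.com",
    driver_class="com.example.jdbc.SaaSDriver",
    username="analyst@example.com",
    table="Opportunity",
    target_dir="/data/lake/raw/saas/opportunity",
)
print(shlex.join(args))
```

A single mapper is the conservative default here: with more than one mapper, Sqoop needs a primary key or an explicit `--split-by` column to divide the work.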
Why ISVs are turning to a single interface for SaaS
• Get a JDBC interface on top of any API. Each data source exposes its data differently:
  - Eloqua: Web Services API (REST/SOAP), Bulk and non-Bulk APIs; no query language
  - Oracle Service Cloud: Web Services APIs (REST/SOAP); ROQL
  - Google Analytics: hypercube (queries are limited to 10 metrics grouped by at most 7 dimensions)
  - Veeva CRM: SOAP, Bulk, and Metadata APIs; SOQL
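The Google Analytics limits listed above shape how a connector has to plan queries. A minimal Python sketch, assuming a request for more than 10 metrics is split into several queries that repeat the same dimensions so the partial results can be joined back on those dimension values; the `plan_queries` helper and the metric names are hypothetical. A request with more than 7 dimensions cannot be split this way and is rejected.

```python
from typing import List

MAX_METRICS = 10    # per-query metric limit cited on the slide
MAX_DIMENSIONS = 7  # per-query dimension limit cited on the slide

def plan_queries(metrics: List[str], dimensions: List[str]) -> List[dict]:
    """Split one logical request into API-sized queries.

    Every query keeps all dimensions, so the partial results can be joined
    back together on the dimension values.
    """
    if len(dimensions) > MAX_DIMENSIONS:
        raise ValueError(f"at most {MAX_DIMENSIONS} dimensions per query")
    if not metrics:
        raise ValueError("at least one metric is required")
    return [
        {"metrics": metrics[i:i + MAX_METRICS], "dimensions": list(dimensions)}
        for i in range(0, len(metrics), MAX_METRICS)
    ]

requested = [f"ga:metric{n}" for n in range(1, 24)]  # 23 hypothetical metrics
plan = plan_queries(requested, ["ga:date", "ga:source", "ga:medium"])
```

Hiding this kind of planning behind a single SQL interface is the point of the slide: the client asks for columns, and the driver deals with each API's limits.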
As the Market Switches from ETL to ELT, Data Access is Critical
• ETL: Extract → Transform → Load → View (Operational Systems → Staging Area → Data Warehouse → Analytics Apps)
• ELT: Extract & Load → Transform & View (Operational Systems → Big Data Warehouse → Analytics, Data Prep, and even traditional DW)
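A compact sketch of the ELT pattern in Python, using the standard-library `sqlite3` module as a stand-in for the big data warehouse (an assumption made only so the example is self-contained; in the deck's setting the target would be Hadoop or a similar platform). The hypothetical lead records are extracted and loaded unchanged, and the transformation happens afterward, inside the target, as a SQL view.

```python
import sqlite3

# Extracted rows from a hypothetical SaaS source, loaded as-is (no cleanup yet).
raw_leads = [
    ("L1", " Web ", "2015-06-01", "1200.00"),
    ("L2", "email", "2015-06-03", ""),
    ("L3", "WEB", "2015-06-04", "300.50"),
]

conn = sqlite3.connect(":memory:")

# Extract & Load: land the data untouched in a raw table.
conn.execute("CREATE TABLE raw_leads (id TEXT, source TEXT, created TEXT, revenue TEXT)")
conn.executemany("INSERT INTO raw_leads VALUES (?, ?, ?, ?)", raw_leads)

# Transform & View: clean and type the data inside the target.
conn.execute("""
    CREATE VIEW leads AS
    SELECT id,
           LOWER(TRIM(source)) AS source,
           DATE(created) AS created,
           CAST(NULLIF(revenue, '') AS REAL) AS revenue
    FROM raw_leads
""")

by_source = conn.execute(
    "SELECT source, COUNT(*), SUM(revenue) FROM leads GROUP BY source ORDER BY source"
).fetchall()
```

Because the raw table is kept, the view can be redefined later without going back to the SaaS API.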
How big data vendors are embedding SaaS connectivity
Progress DataDirect
Embed Sales & Marketing Connectors into the Data Access Layer
Ingest data across 200+ data sources (beyond marketing data sources)
• Big Data/NoSQL: Apache Hadoop Hive, Cloudera, Hortonworks, Pivotal HD, MapR, EMR, Pivotal HAWQ, Cloudera Impala, MongoDB, Spark SQL, Cassandra, SAP HANA
• Data Warehouses: Amazon Redshift, SAP Sybase IQ, Teradata, Pivotal Greenplum
• Relational: Oracle DB, Microsoft SQL Server, IBM DB2, MySQL, PostgreSQL, IBM Informix, SAP Sybase, Pervasive SQL, Progress OpenEdge, Progress Rollbase
• SaaS/Cloud: Salesforce.com, Database.com, FinancialForce, Veeva CRM, ServiceMax, any Force.com app, HubSpot, Marketo, Microsoft Dynamics CRM, Microsoft SQL Azure, Oracle Eloqua, Oracle Service Cloud, Google Analytics
• EDI/XML/Text: EDIFACT, EDIG@S, EANCOM, X12, IATA, healthcare EDI (X12, HIPAA, ICD-10, HL7), custom EDI, flat files (CSV, TSV, dBase, Clipper, FoxPro, Paradox), text files
• Any: SDK, SequeLink Socket Server, Customer Engineering
Single API for data lake ingestion from SaaS sources
• Ingest data against a single API (JDBC)
• Get a single dedicated partner
• Connect to unlimited data with a single API
• Get unlimited support
Building a marketing data lake

  • 1. How Big Data ISVs get marketing data into lakes Sumit Sarkar Chief Data Evangelist Progress DataDirect Gary Angel Advisory Digital Analytics Center of Excellence Principal EY
  • 2. © 2015 Progress Software Corporation. All rights reserved.2 Audio Bridge Options & Question Submission
  • 3. How Big Data ISVs get marketing data into lakes Sumit Sarkar Chief Data Evangelist Progress DataDirect Gary Angel Advisory Digital Analytics Center of Excellence Principal EY
  • 4. © 2015 Progress Software Corporation. All rights reserved.4 Agenda  What is a Marketing Data Lake?  Industry trends around accessing marketing data in SaaS applications  How to ingest data with Apache Sqoop and Apache Falcon directly from SaaS applications  How big data vendors can embed SaaS connectivity
  • 5. © 2015 Progress Software Corporation. All rights reserved.5 What is a Marketing Data Lake?
  • 6. © 2015 Progress Software Corporation. All rights reserved.6 A data lake is a large-scale storage repository and processing engine. A data lake provides "massive storage for any kind of data, enormous processing power and the ability to handle virtually limitless concurrent tasks or jobs” - SAS Institute What is a Marketing Data Lake?
  • 7. © 2015 Progress Software Corporation. All rights reserved.7 Benefits of a Marketing Data Lake? Some of the benefits of a data lake include:  Store data in all shapes and sizes  Flexible analytics with “schema on read”  Query data using SQL or big data programming frameworks  Eliminate data silos
  • 8. © 2015 Progress Software Corporation. All rights reserved.8 Why Marketing Data?  CMOs will outspend CIOs on technology by 2017 (Gartner)  Oracle spent $3B on a martech aquisition spree to gain CMO mindshare.  Expect more collaboration between CMO and CIO (CIO.com)  Modern Marketing Data Warehouse Webinar ~500 registrations (Progress)
  • 9. © 2015 Progress Software Corporation. All rights reserved.9 Industry trends around accessing marketing data in SaaS applications
  • 10. © 2015 Progress Software Corporation. All rights reserved.10 It’s easy to forget that it’s still about solving real business problems. Relevant data Transaction / behavior history Manage Data Perform Analytics Drive Decisions Insights continuous feedback loop Appropriate data sources Answers to business questions Strategy (Thinking) Moves Right to Left Implementation Moves Left to Right Before you think data, think decisions!
  • 11. © 2015 Progress Software Corporation. All rights reserved.11 Our marketing data is almost all in the cloud CRM Web Behavior Mobile Behavior Search Buys Display Buys Owned Social Public SocialMeta-Data And it’s almost all complex, stream data – which means APIs that only give aggregations aren’t too useful
  • 12. © 2015 Progress Software Corporation. All rights reserved.12 Detail is important because this digital data is true big data The relationship between events is critical
  • 13. © 2015 Progress Software Corporation. All rights reserved.13 We’re almost never solving for one problem with a big data system Reporting Analytics Summarized Data Segmented Data Detail Data We can’t just aggregate / We can’t not aggregate Dashboarding Campaign Optimization Customer Drill-down Attribution, CLTV, Experience, Personalization Targeting Forecasting
  • 14. © 2015 Progress Software Corporation. All rights reserved.14 Segmentation is a one important technique to aggregate and join Customer segmentation Visit type identification RFM models KPDs and metrics Measu remen t Found ation Customers v. prospects Owned products Persona Product focused Shopping focused Social focused Customer service Measurement of success specific to each segment and visit Recency and frequency for every segment and visit type Additional metrics that help identify drivers of success Segmentation allows for effective aggregation of the meaning and outcome of streamed event data: Measurement foundation
  • 15. © 2015 Progress Software Corporation. All rights reserved.15 End-to-End Strategies ReportCubeParkFull Detail ReportParkFull Detail  Most organizations do some combination of at least 1 & 2  Direct to Detail (2) has many advantages if it can be made performant (more flexible reporting and much less maintenance)  Semi-Detail (3) is designed to capture most of the advantages of (2) when (2) isn’t performantReport Semi- Detail ParkFull Detail 1 2 3
  • 16. © 2015 Progress Software Corporation. All rights reserved.16 How to ingest data with Apache Sqoop and Apache Falcon directly from SaaS applications
  • 17. © 2015 Progress Software Corporation. All rights reserved.17 What is Apache Sqoop? Apache Sqoop Apache Sqoop(TM) is a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases. Sqoop successfully graduated from the Incubator in March of 2012 and is now a Top-Level Apache project http://sqoop.apache.org/
  • 18. © 2015 Progress Software Corporation. All rights reserved.18 What is Apache Falcon? Apache Falcon Falcon is a feed processing and feed management system aimed at making it easier for end consumers to onboard their feed processing and feed management on hadoop clusters. https://falcon.apache.org/ Note: Falcon uses Sqoop for import/export operation. Sqoop will require appropriate database driver to connect to the relational database. Please refer to the Sqoop documentation for any Sqoop related question. Please make sure the database driver jar is copied into oozie share lib for Sqoop.
  • 19. © 2015 Progress Software Corporation. All rights reserved.19 Data in SaaS Applications is Siloed, Protected by Proprietary APIs Designed for Process Integration, not Data Integration
  • 20. © 2015 Progress Software Corporation. All rights reserved.20 How to ingest data directly from SaaS applications
  • 21. © 2015 Progress Software Corporation. All rights reserved.21 JDBC access to SaaS data Progress DataDirect JDBC Connector Schema Manager Apache Sqoop Salesforce.com Schema User Defined Schema Driver uses  SOAP API  Bulk API  Metadata API
  • 22. © 2015 Progress Software Corporation. All rights reserved.22 Geek Speak $ sqoop help import usage: sqoop import [GENERIC-ARGS] [TOOL-ARGS] Common arguments: --connect <jdbc-uri> Specify JDBC connect string --connect-manager <jdbc-uri> Specify connection manager class to use --driver <class-name> Manually specify JDBC driver class to use --hadoop-mapred-home <dir>+ Override $HADOOP_MAPRED_HOME --help Print usage instructions -P Read password from console --password <password> Set authentication password --username <username> Set authentication username --verbose Print more information while working --hadoop-home <dir>+ Deprecated. Override $HADOOP_HOME
  • 23. Why are ISVs turning to a single interface for SaaS? Get a JDBC interface on top of any API.
    Data source: API
    Eloqua: Web Services API (REST/SOAP); Bulk and non-Bulk APIs; no query language
    Oracle Service Cloud: Web Services APIs (REST/SOAP); ROQL
    Google Analytics: Hypercube (query limits of 10 metrics grouped by a maximum of 7 dimensions)
    Veeva CRM: SOAP, Bulk, and Metadata APIs; SOQL
  • 24. As the Market Switches from ETL to ELT, Data Access Is Critical. ETL (Extract, Transform, Load, View): Operational Systems, Staging Area, Data Warehouse, Analytics Apps. ELT (Extract & Load, then Transform & View): Operational Systems, Big Data Warehouse, then analytics, data prep, and even traditional DW.
  • 25. How big data vendors are embedding SaaS connectivity
  • 26. Progress DataDirect: Embed Sales & Marketing Connectors into the Data Access Layer
  • 27. Ingest data across 200+ data sources (beyond marketing data sources)
    Big Data/NoSQL: Apache Hadoop Hive, Cloudera, Hortonworks, Pivotal HD, MapR, EMR, Pivotal HAWQ, Cloudera Impala, MongoDB, Spark SQL, Cassandra, SAP HANA
    Data Warehouses: Amazon Redshift, SAP Sybase IQ, Teradata, Pivotal Greenplum
    Relational: Oracle DB, Microsoft SQL Server, IBM DB2, MySQL, PostgreSQL, IBM Informix, SAP Sybase, Pervasive SQL, Progress OpenEdge, Progress Rollbase
    SaaS/Cloud: Salesforce.com, Database.com, FinancialForce, Veeva CRM, ServiceMax, any Force.com app, HubSpot, Marketo, Microsoft Dynamics CRM, Microsoft SQL Azure, Oracle Eloqua, Oracle Service Cloud, Google Analytics
    EDI/XML/Text: EDIFACT, EDIG@S, EANCOM, X12, IATA, Healthcare EDI (X12, HIPAA, ICD-10, HL7), Custom EDI, flat files (CSV, TSV, dBase, Clipper, FoxPro, Paradox), text files
    Any: SDK, SequeLink Socket Server, Customer Engineering
  • 28. Single API for data lake ingestion from SaaS sources: ingest data against a single API (JDBC); get a single dedicated partner; connect to unlimited data with a single API; get unlimited support.
  • 29. How Big Data ISVs get marketing data into lakes Sumit Sarkar Chief Data Evangelist Progress DataDirect Gary Angel Advisory Digital Analytics Center of Excellence Principal EY
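Slide 18's note that the database driver JAR must be copied into the Oozie share lib for Sqoop can be sketched as a small shell script. This is a minimal sketch, not a vendor procedure: the sharelib path `/user/oozie/share/lib/lib_<timestamp>`, the JAR name `sforce.jar`, and the Oozie URL are hypothetical placeholders, and `-sharelibupdate` assumes Oozie 4.1 or later. By default it only prints the commands (dry run).

```shell
#!/bin/sh
# Dry-run sketch: stage a JDBC driver JAR in the Oozie Sqoop sharelib so that
# Sqoop actions scheduled through Oozie/Falcon can load the driver.
# Hypothetical placeholders: the sharelib path, JAR name, and Oozie URL.

DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands only; 0 = run them

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# install_driver JAR SHARELIB_DIR OOZIE_URL
install_driver() {
  jar="$1"
  sharelib_dir="$2"
  oozie_url="$3"
  # Copy the driver (overwriting an older copy) next to the Sqoop libraries in HDFS.
  run hdfs dfs -put -f "$jar" "$sharelib_dir/sqoop/"
  # Ask the Oozie server to reload its sharelib without a restart (Oozie 4.1+).
  run oozie admin -oozie "$oozie_url" -sharelibupdate
}

install_driver sforce.jar /user/oozie/share/lib/lib_20150601000000 http://oozie-host:11000/oozie
```

Run with `DRY_RUN=0` only after checking the actual sharelib directory on your cluster, for example with `oozie admin -shareliblist sqoop`.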
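The common arguments listed on slide 22 combine into a generic-JDBC import. The sketch below assembles one as an argument list and, by default, only prints it one argument per line; the connect string, the driver class `com.example.jdbc.Driver`, the user name, and the paths are hypothetical. It uses `-P` rather than `--password` so the password is read from the console instead of appearing on the command line.

```shell
#!/bin/sh
# Sketch: assemble a generic-JDBC `sqoop import` from the common arguments.
# All values passed in the example call are hypothetical placeholders.

DRY_RUN="${DRY_RUN:-1}"   # 1 = print the arguments only; 0 = run sqoop

# sqoop_import CONNECT DRIVER USER QUERY TARGET_DIR
sqoop_import() {
  # -P prompts for the password instead of exposing it via --password.
  set -- sqoop import \
    --connect "$1" \
    --driver "$2" \
    --username "$3" -P \
    --query "$4" \
    -m 1 \
    --target-dir "$5"
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$@"   # one argument per line, so quoting stays visible
  else
    "$@"
  fi
}

# Single quotes keep $CONDITIONS literal; Sqoop fills it in for each mapper.
sqoop_import 'jdbc:example://db-host/sales' com.example.jdbc.Driver analyst \
  'SELECT id, name FROM accounts WHERE $CONDITIONS' /lake/raw/accounts
```

The single quotes around the query matter: inside double quotes the shell would expand `$CONDITIONS` to an empty string before Sqoop ever saw it.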
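Slide 23's Google Analytics row (at most 10 metrics grouped by at most 7 dimensions per query) is the kind of source-specific rule a single JDBC interface has to hide. As an illustration only, with hypothetical function and metric names and no claim about how the DataDirect driver handles it, a data access layer could check a request against those limits and split an oversized metric list into several queries:

```shell
#!/bin/sh
# Illustration only: check a Google Analytics request against the per-query
# limits quoted on slide 23 and split an oversized metric list.
# Function names and metric names are hypothetical.

MAX_METRICS=10
MAX_DIMENSIONS=7

# count_items "a,b,c" -> 3 (empty string -> 0)
count_items() {
  if [ -z "$1" ]; then
    echo 0
  else
    printf '%s\n' "$1" | tr ',' '\n' | wc -l | tr -d ' '
  fi
}

# ga_fits METRICS DIMENSIONS -> exit status 0 if a single query is enough
ga_fits() {
  [ "$(count_items "$1")" -le "$MAX_METRICS" ] &&
    [ "$(count_items "$2")" -le "$MAX_DIMENSIONS" ]
}

# ga_batches METRICS -> one comma-separated batch of at most 10 metrics per line
ga_batches() {
  # Ten "-" operands make paste(1) join ten input lines per output line;
  # sed then drops the padding commas paste adds to a short final batch.
  printf '%s\n' "$1" | tr ',' '\n' | paste -d, - - - - - - - - - - | sed 's/,*$//'
}

ga_fits "ga:sessions,ga:users" "ga:date,ga:source" && echo "one query is enough"
ga_batches "m1,m2,m3,m4,m5,m6,m7,m8,m9,m10,m11,m12"
```

Only metrics are split here. Splitting dimensions the same way would change how each result is grouped, so those partial results could not simply be joined back together on shared dimension values.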
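The ELT flow on slide 24 (extract and load raw data first, transform when viewing) can be illustrated with a toy shell pipeline. The lake directory, the `opportunities` dataset, and its `id,stage,amount` columns are made up for illustration; the point is that the load step copies the extract unchanged and the aggregation happens only at read time.

```shell
#!/bin/sh
# Toy ELT flow: land the extract unchanged, then transform when it is read.
# The lake layout and the sample data are made up for illustration.

lake="$(mktemp -d)"

# Extract & Load: copy the source export into the lake as-is (no transform).
extract_and_load() {  # extract_and_load SOURCE_FILE DATASET
  mkdir -p "$lake/raw/$2"
  cp "$1" "$lake/raw/$2/"
}

# Transform & View: compute total amount per stage from raw "id,stage,amount"
# rows at query time, skipping header lines.
view_amount_by_stage() {  # view_amount_by_stage DATASET
  cat "$lake/raw/$1"/* |
    awk -F, '$1 != "id" { sum[$2] += $3 } END { for (s in sum) print s "," sum[s] }' |
    LC_ALL=C sort
}

src="$(mktemp)"
printf '%s\n' 'id,stage,amount' '1,Prospecting,100' '2,Closed Won,500' '3,Closed Won,1000' > "$src"
extract_and_load "$src" opportunities
view_amount_by_stage opportunities
```

A different consumer could run another transform over the same raw files later without reloading anything.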

Editor's Notes

  1. How Big Data ISVs get marketing data into lakes. Marketing data is driving significant new Big Data investments from CIO and CMO offices. The latest Big Data trend is storing that data in lakes for analytics, providing massive storage for any type of data to be used for 360-degree customer views, predictive lead scoring, personalization, or sentiment analysis. However, marketing data is increasingly stored in the cloud, creating a connectivity challenge. Big Data vendors provide facilities such as Apache Sqoop to transfer core business data between relational database systems and data lakes. But what about cloud data sources, where existing Apache Sqoop connection managers do not work well with SaaS applications that each expose a proprietary REST or SOAP API? The key to accelerating adoption of big data technology is providing easy access to disparate cloud data sources such as Salesforce, Oracle CX, Marketo, Eloqua, Google Analytics, or Adobe Omniture. Competitive advantage then comes from embedding connectivity in your technology for ingesting an organization’s most important data: customer data. Join this informative and entertaining webinar as we explore what a Marketing Data Lake is; industry trends around accessing marketing data in SaaS applications; how to ingest data with Apache Sqoop and Apache Falcon directly from SaaS applications; and how big data vendors can embed SaaS connectivity. Speaker(s): Sumit Sarkar, Data Connectivity Evangelist, Progress Software; Gary Angel, Advisory Digital Analytics Center of Excellence Principal, Ernst & Young. Asset(s): Follow-up asset sent in email: Mike Johnson’s blog, https://www.progress.com/blogs/are-you-ready-to-go-fishing-in-a-data-lake
  2. Give attendees a closer look at the control panel and how they can participate. 1) Join audio in one of two ways: to use VoIP, click “Mic & Speakers”; or to use your telephone, click “Telephone” and dial in using the numbers and information provided. 2) All lines are muted for today’s webinar. We plan to have a live Q&A session at the end of the presentations. However, if you have a question at any time during this webinar, simply submit it via the “Question” section of the webinar interface, located to the right of your screen; we will collect all questions through this “Question” window. Final note: we are recording today’s webinar, and the recording will be posted to PartnerLink.
  3. Why ISVs? Strata: big data vendors, data prep, data pipelines, data management, etc. Data lakes are part of the solution.
  4. The last webinar was about building a Marketing Data Warehouse. A data warehouse uses a “Schema on Write” architecture and is typically loaded with ETL tools. Data lakes are loaded with raw data (no “T”) and create the “Schema on Read” on business demand.
  5. The kinds of data from which you can derive value are unlimited: you can store all types of structured and unstructured data in a data lake, from CRM data to social media posts. You don’t have to have all the answers up front: simply store raw data and refine it as your understanding and insight improve. You have no limits on how you can query the data: you can use a variety of tools to gain insight into what the data means. You don’t create any more silos: you gain democratized access with a single, unified view of data across the organization. http://info.zaloni.com/hubfs/Architecting_Data_Lakes_Zaloni.pdf By Ben Sharma and Alice LaPlante
  6. Source: http://www.cio.com/article/2825086/cio-role/is-the-cio-cmo-transition-of-power-becoming-a-reality.html
  7. http://info.zaloni.com/hubfs/Architecting_Data_Lakes_Zaloni.pdf By Ben Sharma and Alice LaPlante
  8. Sqoop is traditionally positioned for RDBMS access via JDBC. There are specialized connectors for sources such as MySQL or PostgreSQL, and generic JDBC for any third-party driver.
  9. Commercial data lake management solutions are available from many Hadoop vendors (e.g., Cloudera Navigator), as well as standalone from companies such as Zaloni and Podium Data.
  10. bash-4.1$ sqoop import --connect "jdbc:datadirect:sforce://test.salesforce.com;User=<user>;Password=<password>;SecurityToken=<security-token>;DatabaseName=sandbox" --query 'SELECT TOP 10 t.* FROM Case AS t WHERE $CONDITIONS' -m 1 --target-dir /sample/table/q50 --driver com.ddtek.jdbc.sforce.SForceDriver --verbose
  11. R&D challenges in building SQL connectivity across cloud sources such as Marketo. Not all SaaS APIs expose a standard query language. In those cases, the engineering team looks at each object individually: each object may be exposed through a different API with its own rules for invoking, searching, filtering, etc. It took significant effort to provide a standard experience for querying across the entire data model. Handling full join capabilities: where the SaaS APIs do not support a query language with JOIN capability, the engineering team has to perform that operation. This requires translating SQL into efficient calls to the Marketo APIs that return the minimal amount of data before performing the join. When joining two very large objects, the data access layer may use considerable resources on the application server or desktop. Therefore, deploying the data access layer to an elastic cloud service such as DataDirect Cloud makes a lot of sense for two reasons: faster performance with less memory/CPU use on the client application server or desktop, and the superior bandwidth between DataDirect Cloud and Marketo, where pre-joined datasets get exchanged. How to handle data models? Are they static or dynamic? How are changes detected and communicated to the client? Each SaaS data source is different; in the case of Marketo, certain objects are better queried through views and others through tables. Handling this matrix of data models and objects across all SaaS sources was certainly a challenge.
  12. 350+ ISVs; 10,000 DEUs. We’re excited to get MongoDB data into the hands of more people through open data standards.
  13. Develop against open standards: Avoid vendor lock-in by adopting open industry standards. DataDirect is the leader in data connectivity standards, having co-founded the ODBC specification and serving on the JDBC Expert Group, the OData Technical Committee, and the ANSI SQL Committee. Connect to unlimited data with a single API: Access the full breadth of data sources using a single, decoupled code base and API for the data access layer, protecting you from changes in metadata, error handling, and API or protocol revisions. Get a single dedicated partner: Deliver full support for the breadth of data sources in all shapes and sizes, with constant vigilance for the next security vulnerability (POODLE, FREAK, LOGJAM) in your data access layer. Focus your engineering resources on your core business. Get unlimited support: We live for your next big customer. Make sure your POC is a success with 24/7 partner support and access to expertise from our engineering teams, partnerships, and leading technology companies such as Microsoft, Oracle, and IBM through our TSANet multi-vendor support channel.
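Editor's note 4 contrasts “Schema on Write” with “Schema on Read”. A minimal sketch of the latter, using a made-up `email|lead_score|source` record layout: the raw file is stored without validation, and each reader imposes its own schema when it reads, so a row that one consumer rejects remains available to another.

```shell
#!/bin/sh
# Sketch of "schema on read": raw records are stored as-is, and each reader
# applies its own schema at query time. The email|lead_score|source layout
# and the sample rows are made up for illustration.

raw="$(mktemp)"
printf '%s\n' 'ann@example.com|87|webinar' 'bob@example.com|n/a|trade show' \
  'cy@example.com|42|webinar' > "$raw"   # the "n/a" row is stored, not rejected

# Reader 1: a contact list only needs field 1, as text.
read_emails() { cut -d'|' -f1 "$1"; }

# Reader 2: lead scoring reads field 2 as an integer and skips rows that do
# not fit; the raw file itself is left untouched for other readers.
read_scores() { awk -F'|' '$2 ~ /^[0-9]+$/ { print $1 "," $2 }' "$1"; }

read_emails "$raw"
read_scores "$raw"
```

Under schema on write, the `n/a` row would typically have been rejected or coerced at load time, before any reader could decide what it needed.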
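Editor's note 8 distinguishes Sqoop's specialized connectors from generic JDBC. The function below approximates that decision from the connect-string prefix; the list of prefixes with built-in managers is a simplified assumption about Sqoop 1 and is not exhaustive. For other URLs, such as the DataDirect Salesforce connect string in note 10, you pass `--driver` and Sqoop uses its generic JDBC manager.

```shell
#!/bin/sh
# Sketch: which Sqoop 1 connection path a connect string is likely to take.
# The prefix list is a simplified assumption based on Sqoop's bundled
# managers; it is not exhaustive and varies by Sqoop version.

# sqoop_manager_for JDBC_URL -> "built-in", "generic (needs --driver)", or "not a JDBC URL"
sqoop_manager_for() {
  case "$1" in
    jdbc:mysql:*|jdbc:postgresql:*|jdbc:oracle:*|jdbc:sqlserver:*|jdbc:db2:*|jdbc:hsqldb:*)
      echo "built-in" ;;
    jdbc:*)
      echo "generic (needs --driver)" ;;
    *)
      echo "not a JDBC URL" ;;
  esac
}

sqoop_manager_for 'jdbc:mysql://db-host/crm'
sqoop_manager_for 'jdbc:datadirect:sforce://login.salesforce.com'
```

In the generic case the driver JAR must also be on Sqoop's classpath (or in the Oozie share lib when Sqoop runs under Falcon).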
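The import in note 10 runs with `-m 1`, so Sqoop substitutes a trivial condition for `$CONDITIONS`. With more mappers, a free-form `--query` import also needs `--split-by`, and each mapper receives one range of that column. The sketch below is a simplified illustration of that range arithmetic, not Sqoop's exact splitter, using a hypothetical numeric column `row_id`.

```shell
#!/bin/sh
# Simplified illustration of parallel-import splitting: divide [MIN, MAX] of a
# numeric --split-by column into one range per mapper. Each printed line is the
# kind of condition Sqoop would put in place of $CONDITIONS for one mapper.
# This mirrors the idea only; it is not Sqoop's exact splitter code.

# split_conditions COLUMN MIN MAX MAPPERS
split_conditions() {
  col="$1"; min="$2"; max="$3"; n="$4"
  size=$(( (max - min) / n ))
  if [ "$size" -lt 1 ]; then size=1; fi
  i=0
  lo="$min"
  while [ "$i" -lt "$n" ] && [ "$lo" -le "$max" ]; do
    hi=$(( lo + size ))
    # The last mapper (or a range that would pass MAX) takes everything up to MAX.
    if [ "$i" -eq $(( n - 1 )) ] || [ "$hi" -gt "$max" ]; then
      echo "$col >= $lo AND $col <= $max"
      break
    fi
    echo "$col >= $lo AND $col < $hi"
    lo="$hi"
    i=$(( i + 1 ))
  done
}

split_conditions row_id 1 100 4
```

With values 1 to 100 and four mappers, the ranges are [1,25), [25,49), [49,73), and [73,100]; the last range absorbs the remainder of the integer division.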
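Note 11 describes performing joins in the data access layer when a SaaS API has no JOIN. A minimal sketch of the idea, with made-up lead and activity extracts standing in for the minimal columns fetched from each API: sort both on the key and merge them with POSIX `join(1)`. This is a stand-in for illustration, not how DataDirect implements joins.

```shell
#!/bin/sh
# Sketch: join two extracts in the client when the source API has no JOIN.
# The lead/activity data are made up; each file holds only the columns needed.

# join_on_id LEFT_CSV RIGHT_CSV -> "id,left fields...,right fields..." (inner join)
join_on_id() {
  left_sorted="$(mktemp)"
  right_sorted="$(mktemp)"
  # join(1) needs both inputs sorted on the join key, in the same collation.
  LC_ALL=C sort -t, -k1,1 "$1" > "$left_sorted"
  LC_ALL=C sort -t, -k1,1 "$2" > "$right_sorted"
  LC_ALL=C join -t, "$left_sorted" "$right_sorted"
  rm -f "$left_sorted" "$right_sorted"
}

leads="$(mktemp)"
activities="$(mktemp)"
printf '%s\n' 'L2,Bob' 'L1,Ann' 'L3,Cy' > "$leads"
printf '%s\n' 'L1,Email Open' 'L3,Form Fill' 'L1,Web Visit' > "$activities"

join_on_id "$leads" "$activities"
```

Lead `L2` has no activities and drops out, as in an SQL inner join. Because this approach must sort both inputs locally, much of its cost for very large objects lands on the machine running the join, which is the resource concern the note raises.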