Building a marketing data lake
1. How Big Data ISVs get marketing data
into lakes
Sumit Sarkar
Chief Data Evangelist
Progress DataDirect
Gary Angel
Advisory Digital Analytics
Center of Excellence
Principal
EY
Editor's Notes
How Big Data ISVs get marketing data into lakes
Marketing data is driving significant new Big Data investments from CIO and CMO offices. The latest Big Data trend is storing that data in lakes for analytics, which provide massive storage for any type of data to be used for 360-degree customer views, predictive lead scoring, personalization, or sentiment analysis. However, marketing data is increasingly stored in the cloud, which creates a connectivity challenge. Big Data vendors provide facilities, such as Apache Sqoop, to transfer core business data between relational database systems and data lakes. But what about cloud data sources, where existing Apache Sqoop connection managers do not work well because each SaaS application exposes its own proprietary REST or SOAP API? The key to accelerating adoption of big data technology is providing easy access to disparate cloud data sources such as Salesforce, Oracle CX, Marketo, Eloqua, Google Analytics or Adobe Omniture. Competitive advantage then results from embedding connectivity within your technology so that it can ingest an organization’s most important data: customer data.
Join this informative and entertaining webinar as we explore:
What is a Marketing Data Lake?
Industry trends around accessing marketing data in SaaS applications
How to ingest data with Apache Sqoop and Apache Falcon directly from SaaS applications
How big data vendors can embed SaaS connectivity
Speaker(s):
Sumit Sarkar, Data Connectivity Evangelist, Progress Software
Gary Angel, Advisory Digital Analytics Center of Excellence Principal, Ernst & Young
Asset(s):
Follow-up asset sent by email: Mike Johnson’s blog, https://www.progress.com/blogs/are-you-ready-to-go-fishing-in-a-data-lake
Give attendees a closer look at the control panel and how they can participate.
1) Join Audio: there are 2 ways to do so. To use VoIP, click on “Mic & Speakers”. To use your telephone, click on “telephone” and dial in using the numbers and information provided.
2) All lines are muted for today’s webinar. We do plan to have a live Q&A session at the end of the presentations. However, if you have a question at any time during this webinar, simply submit it via the “Question” section of the webinar interface located to the right of your screen. We will collect all questions through this “Question Window”.
Final Note: we are recording today’s webinar and will post it to PartnerLink.
Why ISVs? Strata: big data vendors, data prep, data pipelines, data management, etc
Data Lakes are part of the solution.
The last webinar was about building a Marketing Data Warehouse.
A Data Warehouse is a “Schema on Write” architecture and is typically loaded with ETL tools.
Data Lakes are loaded with raw data (no “T”) and apply a “Schema on Read” as the business demands it.
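The difference between the two approaches can be sketched in a few lines of Python. This is only an illustration under assumptions: the sample records, the field names, and the `read_with_schema` helper are hypothetical and do not come from any particular data lake product.

```python
import json

# Raw events land in the lake exactly as received (no "T" step at load time).
# These sample records are made up for illustration.
RAW_EVENTS = [
    '{"email": "a@example.com", "source": "crm", "score": "42"}',
    '{"email": "b@example.com", "source": "social", "text": "love it"}',
]


def read_with_schema(raw_lines, schema):
    """Apply a schema at read time: select fields and cast them on demand."""
    for line in raw_lines:
        record = json.loads(line)
        row = {}
        for field, cast in schema.items():
            value = record.get(field)
            # Fields missing from a raw record come back as None.
            row[field] = cast(value) if value is not None else None
        yield row


# Two consumers project different schemas over the same raw data.
lead_scoring = list(read_with_schema(RAW_EVENTS, {"email": str, "score": int}))
sentiment = list(read_with_schema(RAW_EVENTS, {"email": str, "text": str}))
```

Neither consumer required a change to how the data was loaded, which is the practical appeal of deferring the schema to read time.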
The kinds of data from which you can derive value are unlimited. You can store all types of structured and unstructured data in a data lake, from CRM data, to social media posts.
You don’t have to have all the answers upfront. Simply store raw data—you can refine it as your understanding and insight improves.
You have no limits on how you can query the data. You can use a variety of tools to gain insight into what the data means.
You don’t create any more silos. You gain a democratized access with a single, unified view of data across the organization.
http://info.zaloni.com/hubfs/Architecting_Data_Lakes_Zaloni.pdf
By Ben Sharma and Alice LaPlante
Sqoop is traditionally positioned for RDBMS sources via JDBC. There are specialized connectors for sources such as MySQL or Postgres, and a generic JDBC connector for any third-party driver.
Note: Falcon uses Sqoop for import/export operations. Sqoop requires the appropriate database driver to connect to the relational database. Please refer to the Sqoop documentation for any Sqoop-related questions, and make sure the database driver jar is copied into the Oozie share lib for Sqoop.
Commercial data lake management solutions are available from many Hadoop vendors (for example, Cloudera Navigator), as well as standalone from companies such as Zaloni and Podium Data.
bash-4.1$ sqoop import --connect "jdbc:datadirect:sforce://test.salesforce.com;User=ids.integration@hp.com.fltesta;Password=informatica@123;SecurityToken=3jZ0x4NcgClYDhxJqMa3c744;DatabaseName=sandbox" --query 'SELECT TOP 10 t.* FROM Case as t WHERE $CONDITIONS' -m 1 --target-dir /sample/table/q50 --driver com.ddtek.jdbc.sforce.SForceDriver --verbose
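For scripted ingestion, the same kind of command can be assembled programmatically. The following is a minimal sketch, not a definitive wrapper: `build_sqoop_import` is a hypothetical helper, credentials are deliberately left out of the connection URL, and only Sqoop flags that appear in the command above (plus `--split-by`) are used. It enforces two Sqoop rules for free-form queries: the query must contain `$CONDITIONS`, and a split column is needed when more than one mapper is requested.

```python
import shlex


def build_sqoop_import(connect_url, driver, query, target_dir,
                       mappers=1, split_by=None):
    """Assemble a `sqoop import` argument list for a free-form query."""
    if "$CONDITIONS" not in query:
        raise ValueError("free-form queries must contain $CONDITIONS")
    if mappers > 1 and not split_by:
        raise ValueError("--split-by is required with more than one mapper")
    args = [
        "sqoop", "import",
        "--connect", connect_url,
        "--driver", driver,
        "--query", query,
        "--target-dir", target_dir,
        "-m", str(mappers),
    ]
    if split_by:
        args += ["--split-by", split_by]
    return args


cmd = build_sqoop_import(
    # Credentials are omitted here on purpose; supply them securely.
    connect_url="jdbc:datadirect:sforce://test.salesforce.com;DatabaseName=sandbox",
    driver="com.ddtek.jdbc.sforce.SForceDriver",
    query="SELECT TOP 10 t.* FROM Case as t WHERE $CONDITIONS",
    target_dir="/sample/table/q50",
)

# Shell-safe rendering for logging; the list itself could be handed to
# subprocess.run on a host where Sqoop is installed.
printable = shlex.join(cmd)
```

Building the command as a list rather than a string avoids quoting problems with the semicolon-separated JDBC URL and the `$CONDITIONS` placeholder.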
R&D challenges building SQL connectivity across cloud sources such as Marketo
Not all SaaS APIs expose a standard query language. In those cases, the engineering team looks at each object individually. Each object may be exposed through a different API with unique rules for invoking, searching, filtering, etc. It required significant effort to provide a standard experience for querying across the entire data model.
Handling full join capabilities. In cases where the SaaS APIs do not support a query language with JOIN capability, the engineering team has to perform that operation in the data access layer. This requires translating SQL into efficient Marketo API calls that return the minimal amount of data before the join is performed. When joining two very large objects, the data access layer may use up considerable resources on the application server or desktop. Therefore, deploying the data access layer to an elastic cloud service such as DataDirect Cloud makes a lot of sense for two reasons:
Faster performance, with fewer memory/CPU resources used on the client application server or desktop
Superior bandwidth between DataDirect Cloud and Marketo, where the datasets are exchanged before being joined
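The join handling described above can be sketched as follows. This is an illustration under assumptions, not how the DataDirect driver is implemented: the two in-memory lists stand in for SaaS objects behind separate endpoints, and `fetch_leads`, `fetch_activities`, and `join_leads_activities` are hypothetical names. The idea is to push the filter to the first call, request only the matching rows from the second, and then join in the access layer with a hash lookup.

```python
# Hypothetical stand-ins for two SaaS objects exposed by separate endpoints.
LEADS = [
    {"id": 1, "email": "a@example.com", "country": "US"},
    {"id": 2, "email": "b@example.com", "country": "DE"},
    {"id": 3, "email": "c@example.com", "country": "US"},
]
ACTIVITIES = [
    {"lead_id": 1, "type": "click"},
    {"lead_id": 2, "type": "open"},
    {"lead_id": 3, "type": "click"},
    {"lead_id": 3, "type": "open"},
]


def fetch_leads(country):
    """Pretend API call: the filter is applied by the remote service."""
    return [r for r in LEADS if r["country"] == country]


def fetch_activities(lead_ids):
    """Pretend API call: only activities for the requested leads come back."""
    wanted = set(lead_ids)
    return [r for r in ACTIVITIES if r["lead_id"] in wanted]


def join_leads_activities(country):
    """Join the two objects in the access layer, fetching as little as possible."""
    leads = fetch_leads(country)
    by_id = {lead["id"]: lead for lead in leads}  # build side of the hash join
    activities = fetch_activities(by_id.keys())   # probe side, pre-filtered
    return [
        {"email": by_id[a["lead_id"]]["email"], "type": a["type"]}
        for a in activities
    ]


rows = join_leads_activities("US")
```

The memory cost is dominated by the `by_id` table, which is why joins over two very large objects are better hosted on an elastic service than on a desktop client.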
How to handle data models? Is it static or dynamic? How are changes detected and communicated to the client? Each SaaS data source is different and in the case of Marketo, certain objects are better queried through views and others through tables. Handling this matrix of data models and objects across all SaaS sources was certainly a challenge.
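One common way to detect a change in a dynamic data model is to fingerprint the discovered metadata and compare it with a cached copy. The sketch below assumes the schema is available as a simple mapping of column name to type; `schema_fingerprint` and `diff_schema` are hypothetical helpers, and the column names are invented for illustration.

```python
import hashlib
import json


def schema_fingerprint(schema):
    """Return a stable hash of a {column: type} mapping, independent of key order."""
    canonical = json.dumps(schema, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def diff_schema(old, new):
    """Describe which columns were added, removed, or changed type."""
    return {
        "added": sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
        "changed": sorted(k for k in set(old) & set(new) if old[k] != new[k]),
    }


# Made-up example: the SaaS source gained a custom field since the last check.
cached = {"Id": "int", "Email": "string"}
discovered = {"Id": "int", "Email": "string", "LeadScore": "int"}

schema_changed = schema_fingerprint(cached) != schema_fingerprint(discovered)
changes = diff_schema(cached, discovered)
```

Comparing fingerprints is cheap enough to run on every connection, while the full diff is only computed when the fingerprints disagree and the client needs to be told what changed.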
350+ ISVs
10,000 DEUs
We’re excited to get MongoDB data into the hands of more people through open data standards
Develop against open standards
Avoid vendor lock-in by adopting open industry standards. DataDirect is the leader in data connectivity standards, having co-founded the ODBC specification and serving on the JDBC Expert Group, OData Technical Committee and ANSI SQL Committee.
Connect to unlimited data with a single API
Access the full breadth of data sources using a single, decoupled code base and API for the data access layer, protecting you from changes in metadata, error handling, and API or protocol revisions.
Get a single dedicated partner
Deliver full support for the breadth of data sources in all shapes and sizes, with constant vigilance for the next security vulnerability (POODLE, FREAK, LOGJAM) in your data access layer. Focus your engineering resources on your core business.
Get unlimited support
We live for your next big customer. Make sure your POC is a success with 24/7 partner support and access to expertise from our engineering teams, partnerships and leading technology companies such as Microsoft, Oracle, and IBM through our TSANet multi-vendor support channel.