Watch this recorded demonstration of SnapLogic from our team of experts who answer your hybrid cloud and big data integration questions.
3. The Data Lake
“Just as data integration is the foundation of the data warehouse, an end-to-end data processing capability is the core of the data lake. The new environment needs a new workhorse.”
– Mark Madsen, Third Nature
snaplogic.com/resources
4. Current Data Lake Architecture
[Diagram: Data Lake Components]
Data Acquisition – collect and integrate data from multiple sources into HDFS, AWS S3, or MS Azure Blob; batch and streaming ingest with Kafka, Sqoop, and Flume.
Data Management – add information and improve data using Spark, Python, Scala, Java, R, and Pig; schedule and manage with Oozie and Ambari.
Data Access – organize and prepare data for visualization with Hive; real-time access via Impala, HiveQL, and SparkSQL.
Sources: on-prem apps and data (ERP, CRM, RDBMS), cloud apps and data (CRM, HCM, social), and IoT data (sensors, wearables, devices).
Targets: lakeshore data mart (MS Azure, AWS Redshift) and BI/analytics tools (Tableau, MS Power BI/Azure, AWS QuickSight).
5. Modern Data Lake Architecture
[Diagram: The Modern Data Lake, Powered by SnapLogic]
Data Acquisition – collect and integrate data from multiple sources: SnapLogic pipelines with standard mode execution.
Data Management – sort, aggregate, join, merge, and transform: SnapLogic abstracts and operationalizes with MapReduce or Spark pipelines.
Data Access – organize and prepare data for visualization: SnapLogic pipelines with standard mode execution.
Schedule and manage batch, streaming, and real-time SnapLogic pipelines.
Sources: on-prem apps and data (ERP, CRM, RDBMS), cloud apps and data (CRM, HCM, social), and IoT data (sensors, wearables, devices).
Targets: lakeshore data mart (MS Azure, AWS Redshift) and BI/analytics tools (Tableau, MS Power BI/Azure, AWS QuickSight).
6. SnapLogic in the Modern Data Fabric
[Diagram columns: Source → Store & Process → Consume]
Sources: on-prem applications, relational databases, cloud applications, NoSQL databases, web logs, and Internet of Things.
Ingest: data integration and transformation feeds big data and data lakes.
Deliver: data warehouses and data marts (e.g., HANA) serve consumption.
7. Modern Architecture: Hybrid and Elastic
Streams: no data is stored or cached.
Secure: 100% standards-based.
Elastic: scales out and handles both data and app integration use cases.
[Diagram: the cloud-based Designer, Manager, and Dashboard exchange metadata (not data) with a Cloudplex in the cloud and, behind the firewall, a Groundplex or Hadooplex, which connect to databases, on-prem apps, big data, and cloud apps and data.]
This is just a sampling of the available technologies that may go into a data lake.
To date, most data lake deployments have been built through manual coding, open source tools and custom integration.
Manual coding of data processing applications is common because data processing is thought of in terms of application-specific work. Unfortunately, this manual effort is a dead-end investment over the long term because the underlying technologies are constantly changing.
Older data warehouse environments and ETL-type integration tools are good at what they do, but they can’t meet many of the new needs. The new environments are focused on data processing but require a lot of manual work.
The data lake must incorporate aspects of old data warehouse environments, like connecting to and extracting data from ERP or transaction processing systems, yet do this without clunky and inefficient tools like Sqoop. The data lake must also support new capabilities, like reliable collection of large volumes of events at high speed and timely processing to make data available immediately, and it must handle data coming from multiple sources in a hybrid model. This exceeds the abilities of traditional data integration tools.
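To make the event-collection requirement concrete, one common pattern behind high-speed ingest is micro-batching: grouping a fast stream of events into bounded batches before writing downstream. The sketch below is purely illustrative of that pattern (it is not SnapLogic's implementation, and the event names are hypothetical):

```python
import time
from queue import Queue, Empty

def micro_batch(events, batch_size=100, max_wait=0.5):
    """Collect queued events into one batch so downstream writes stay efficient.

    Flushes early when the batch fills, the wait deadline passes,
    or the source goes momentarily idle.
    """
    batch = []
    deadline = time.monotonic() + max_wait
    while len(batch) < batch_size and time.monotonic() < deadline:
        try:
            batch.append(events.get(timeout=0.05))
        except Empty:
            break  # source idle: flush what we have so far
    return batch

# Hypothetical sensor events arriving on an in-process queue
q = Queue()
for e in ("sensor-1", "sensor-2", "sensor-3"):
    q.put(e)
print(micro_batch(q, batch_size=2))
```

Real deployments put a durable log (e.g., Kafka, which the current-architecture slide names) in place of the in-process queue, but the batching trade-off between latency and write efficiency is the same.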
SnapLogic accelerates development of a modern data lake through:
Data acquisition: collecting and integrating data from multiple sources. SnapLogic goes beyond developer tools such as Sqoop and Flume with a cloud-based visual pipeline designer, and pre-built connectors for 300+ structured and unstructured data sources, enterprise applications and APIs.
Data transformation: adding information and transforming data. SnapLogic minimizes the manual tasks associated with data shaping and makes data scientists and analysts more efficient. SnapLogic includes Snaps for tasks such as transformations, joins and unions without scripting.
Data access: organizing and preparing data for delivery and visualization. SnapLogic makes data processed on Hadoop or Spark easily available to off-cluster applications and data stores such as statistical packages and business intelligence tools.
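To show the kind of shaping work those transformation Snaps replace, here is a minimal plain-Python sketch of an aggregate-then-join step; the record fields and the two sources are hypothetical, and SnapLogic expresses the same logic visually rather than in code:

```python
from collections import defaultdict

# Hypothetical extracts from two sources, e.g. a CRM and an order system
customers = [{"cust_id": 1, "name": "Acme"}, {"cust_id": 2, "name": "Globex"}]
orders = [{"cust_id": 1, "amount": 500},
          {"cust_id": 1, "amount": 250},
          {"cust_id": 2, "amount": 900}]

# Aggregate: total order amount per customer
totals = defaultdict(int)
for o in orders:
    totals[o["cust_id"]] += o["amount"]

# Join: enrich each customer record with its aggregated total
enriched = [{**c, "total": totals[c["cust_id"]]} for c in customers]
print(enriched)
```

In the pipeline model, each of these steps (aggregate, join) is a configurable Snap instead of hand-written code, which is what removes the scripting burden from analysts.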
Here is an example of a SnapLogic deployment.
The SnapLogic control plane – including the Designer, Manager, and Dashboard – does not store your data. It stores metadata only.
When a pipeline is executed, the control plane locates the associated Snaplex or Hadooplex. The plex dynamically scales out, adding more nodes as needed.
We like to say that SnapLogic “respects data gravity” and runs as close to the data as needed. If you are integrating only cloud applications, it makes no sense to run your integrations behind the firewall. Similarly, if you’re doing ground-to-ground or cloud-to-ground integration, you may want to run your Snaplex on Windows or Linux servers.
Note that the dotted line represents instructions sent via metadata to the plex, which is waiting to run. The solid line indicates how data moves bi-directionally between systems.