Harnessing Big Data in Real-Time
© 2014 SAP AG or an SAP affiliate company. All rights reserved.
2.
Information Processing is at a critical inflection point
Real-time Business Requirements
Sales: Real-time bonus calculations for consumers
Customer Service: Customer overdue credit calculation by product areas
Finance and Operations: Iterative period end closing with new posting into accounts constantly
Manufacturing: New ATP strategies; MRP run for individual ATP check/instant re-planning
IMPACT ON BUSINESS Slow Response Times | Usability Challenges | Lack Of Adaptability
IMPACT ON IT High Latency | Complexity | High Cost of Solutions
Transactional Datastore | Data Warehouse | Sensors Data | Mobile Data | Archives | Social & Text | Geo-Spatial
Location Intelligence | Order Processing | Operational Reporting | RT Risk & Fraud | Trend Analysis | Sentiment Analytics | Predictive Analytics | Pattern Recognition
Analyze | ETL | Staging | Collect | Clean-Data Quality | Transact | Aggregate | Summarize | Communicate | Monitor | Predict | Planning
Point optimization is no longer enough for real-time business
3.
H2 - the Power of HANA and Hadoop
Instant Platform: SAP HANA
Infinite Store: HADOOP
Real-Time Predictive Analytics
SAP Analytical Applications, BI and Infinite Insight
Combine INSTANT Results with INFINITE Store
4.
Open Hadoop Strategy
Big Data Science Services
SAP HANA Platform
SAP Data Services | Data Connectors
Acquire | Accelerate | Analyze
Sybase IQ | SAP HANA
Geospatial | Predictive | Text Analysis
Visualize and Act
Industry/LOB Apps | Custom Apps | Analytic Apps
SQL | XS Engine | R
5.
SAP Big Data Solutions Architecture
Data Ingestion / Acquisition
Processing Engine
Application Function Libraries & Data Models
Database Services (OLTP + OLAP)
Extended Application Services
Integration Services
SAP HANA PLATFORM
Unified Administration | Application Development
Custom Apps | Mobile Apps | Big Data Apps | ERP Apps | SAP Analytics
Smart Data Access | Transfer Datasets | SAP IQ
Web / Sensor | Call Center | Other Data Sources
SAP SLT / Rep Server | SAP Data Services | SAP SQL Anywhere | SAP ESP | Hadoop Adapter
Hadoop Hive | SAP ERP | BW | Hadoop
Large Scale Data Capture, Generate Analytical Datasets, Train/Validate Predictive Models
6.
SAP in the Modern Data Architecture
SOURCES: OLTP, ERP, CRM Systems | Documents & Emails | Web Logs, Click Streams | Social Networks | Machine Generated | Sensor Data | Geo-location Data
OPERATIONS TOOLS: Provision, Manage & Monitor
DEV & DATA TOOLS: Build & Test
APPLICATIONS: Statistical Analysis | BI / Reporting, Ad Hoc Analysis | Interactive Web & Mobile Applications | Enterprise Applications
DATA SYSTEMS: RDBMS | EDW | MPP
Governance & Integration | Security | Operations | Data Access | Data Management
HANA
7.
Explosion of Biological Health Information Has Surpassed Human Cognitive Capacity
BIG DATA
1 GB: 3D CT Scan
150 MB: 3D MRI
30 MB: X-ray
120 MB: Mammograms
300 TB+: 200 Cancer Genomes
200 TB+: All Known Variants
15 PB+: Broad & Sanger DB
800 MB: Per Genome
20-40% annual increase in medical image archives
[Chart: Facts per Decision (scale 5, 10, 100, 1000) by year (1990, 2000, 2010, 2020); series: Decisions by Clinical Phenotype, Structural Genetics, Functional Genetics, Proteomics and other effector molecules]
The Strategic Application of Information Technology in Health Care Organizations (Third Edition 2011) by John P. Glaser and Claudia Salzberg
8.
Example: Data Analysis of Cancer Genome
Goal: Analytics for Personalized Medicine
9.
MKI Design Decisions to Improve Speed of Processing
Use Hadoop for Pre-Processing; SAP HANA for Advanced Analytics
10.
"Genomic DNA analysis in real-time will transform how we enable comprehensive patient care to fight against cancer. SAP HANA will be the mission critical and reliable data platform to make real-time cancer analytics into a reality. Separately, our internal technical comparison demonstrated that SAP HANA outperforms a traditional disk-based system by factor of 408,000 when performing other types of data analysis."
Yukihisa Kato, Director & Executive Officer, CTO, Research and Development Center, MITSUI KNOWLEDGE INDUSTRY CO.,LTD.
Benefits
• Accelerated predictive & correlation analysis with in-memory processing
• Reduced time to detect variant DNA
• Optimized treatment plans based on DNA mutations
408,000x faster than traditional disk-based systems in PoC
216x faster DNA analysis results - from 2-3 days to 20 minutes
SAP HANA + HADOOP + R
SAP HANA + Hadoop for Advanced Analytics
Results: Deliver Personalized Results More Quickly
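The 216x figure can be checked against the quoted run times. This is a minimal sketch, assuming "2-3 days" means 48 to 72 hours of elapsed time; the function name `speedup` is illustrative and does not come from the slides.

```python
MINUTES_PER_DAY = 24 * 60


def speedup(before_minutes: float, after_minutes: float) -> float:
    """Return how many times faster the new run is than the old one."""
    return before_minutes / after_minutes


after = 20  # minutes, as quoted on the slide
low = speedup(2 * MINUTES_PER_DAY, after)   # assumes a 2-day baseline
high = speedup(3 * MINUTES_PER_DAY, after)  # assumes a 3-day baseline
print(f"speedup range: {low:.0f}x to {high:.0f}x")
```

Under these assumptions the quoted 216x corresponds to the 3-day end of the range; a 2-day baseline would give 144x.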
11.
Generic pattern 1: Machine Data Insight
Prototypical Machine Data case
SAP HANA Platform for Big Data
Transform High Volume, High Velocity data into High Value Data. Enable Real-Time Analytics.
Data Sources: SAP Enterprise data | Non-SAP Enterprise data | Mobile data | Machine data (Sensors, SCADA, Machine Logs, etc.)
Stream Processing | Real-Time Replication | Synchronization
HANA In Memory: Transactional | Planning & Simulation | Graph | Analytical | Predictive Analysis | Spatial
Extended Storage (SAP IQ)
Tiered Storage (Time Critical - Less Time Sensitive)
Smart Data Access
Large Low Cost Data Platform (Hadoop): Historical Data, Offline Batch Processes, Model Training etc.
Analytics & Applications: Dashboard / Reporting in Real-Time
Real-time data stream (Billions of events/day) | Millions of events/day correlated with Enterprise Data | Enable real-time operations, analysis and actions
Use Cases
• Energy Optimization
• Predictive maintenance
• Remote asset mgmt.
• Supply/demand forecast
• Inventory mgmt.
• Route optimization
12.
Generic pattern 2: Customer Insight
Prototypical customer behavior analysis case
SAP HANA Platform for Big Data
Process high volume, high variety, high velocity data, offline & real-time. Enabling real-time analytics and actionable insight.
Data Sources: SAP Enterprise data | Non-SAP Enterprise data | Click-Stream data | Social data
Stream Processing | Real-Time Replication | Synchronization
HANA In Memory: Transactional | Planning & Simulation | Graph | Analytical | Predictive Analysis | Spatial
Extended Storage (SAP IQ)
Tiered Storage (Time Critical - Less Time Sensitive)
Smart Data Access
Large Low Cost Data Platform (Hadoop): Historical Data, Offline Batch Processes, Model Training etc.
Analytics & Applications: Dashboard / Reporting in Real-Time
Terabytes of data/month | Millions of events/day correlated with Enterprise Data | Enable actionable insight for targeted applications
Historical Data | Real-Time Offers
Use Cases
• Customer Behavior
• Customer Segmentation
• Customer Loyalty
• Customer Churn
• Online Consumer Habits
• Campaign Performance
• Predictive Maintenance
13.
HANA integration with SAP Sybase IQ and Hadoop
Real-time insights by managing & analyzing ALL enterprise data - Fluid Integration
SAP HANA in memory: HANA Table | HANA Extended table
SAP Sybase IQ petascale: IQ Table
Hadoop massively parallel: HIVE | HDFS/MapReduce | Table
Integration paths: Policy-based automatic data movement | Smart Data Access (SDA) | ODBC | Java UDF | ETL
14.
SAP HANA and Hadoop Integration (ETL)
SAP Data Services | SAP HANA
• GUI for design & development
• High performance reading from and loading into Hadoop
• Extended optimizer: HiveQL and Pig aware
• MapReduce pushdown
• Text Data Processing (Entity Extraction)
15.
Loading data from Hadoop into your database
SAP Data Services
1. Based on target, SAP Data Services translates queries into:
   o Hive Query Language (HQL) → Hive
   o Pig script → HDFS
2. Hive/Pig converts queries to Map/Reduce jobs
3. Result data files are generated on the HDFS system
4. SAP Data Services uses multiple threads to process data from Hive/Pig
5. Optional transforms: Data quality operations
6. Load results into database
Diagram components: Job Process | HQL Generator | Pig Generator | Hive | HDFS | M/R | Result set | HDFS FileReader | Process Data | Database Loader | ODBC/JDBC driver
Diagram notes: Join tables, order / filter data, apply functions | Text data processing
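Steps 3 to 6 above can be sketched in a few lines. This is an illustration only, not SAP Data Services code: a temporary directory stands in for HDFS, SQLite stands in for the target database, and the file names, the `product` table and the cleaning rule are all assumptions made for the example.

```python
import csv
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def write_part_files(directory: Path) -> list[Path]:
    """Stand-in for step 3: pretend Hive/Pig left result files behind."""
    parts = {
        "part-00000": [("1", " widget "), ("2", "gadget")],
        "part-00001": [("3", "GIZMO"), ("4", "")],
    }
    paths = []
    for name, rows in parts.items():
        path = directory / name
        with path.open("w", newline="") as handle:
            csv.writer(handle).writerows(rows)
        paths.append(path)
    return paths


def read_and_clean(path: Path) -> list[tuple[int, str]]:
    """Steps 4-5: read one result file and apply a simple quality rule."""
    cleaned = []
    with path.open(newline="") as handle:
        for product_id, name in csv.reader(handle):
            name = name.strip().lower()
            if name:  # drop rows whose name is empty
                cleaned.append((int(product_id), name))
    return cleaned


def load(paths: list[Path], connection: sqlite3.Connection) -> int:
    """Step 6: read files on several threads, then load from this thread."""
    connection.execute("CREATE TABLE product (id INTEGER, name TEXT)")
    with ThreadPoolExecutor(max_workers=4) as pool:
        batches = list(pool.map(read_and_clean, paths))
    rows = [row for batch in batches for row in batch]
    connection.executemany("INSERT INTO product VALUES (?, ?)", rows)
    connection.commit()
    return len(rows)


with tempfile.TemporaryDirectory() as tmp:
    conn = sqlite3.connect(":memory:")
    loaded = load(write_part_files(Path(tmp)), conn)
    names = [r[0] for r in conn.execute("SELECT name FROM product ORDER BY id")]
    print(loaded, names)
```

Reading is parallel while the insert stays on the calling thread, which keeps the single database connection away from the worker threads.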
16.
Rapid Data Provisioning with Data Virtualization
Without data virtualization: Application | Merge Results | SELECT from DB(x) | SELECT from DB(y) | SELECT from HIVE | one Modeling Environment per source
With SAP HANA: Application | One SQL Script | SAP HANA Virtual Tables | one Modeling and Development Environment
Currently Supported DBs: SAP ASE, Oracle 12c, MS SQL Server v11, SAP IQ, Hadoop/HIVE, Teradata
Data-Type Mapping & Compensate Missing Functions in DB
17.
SAP HANA smart data access capability
Data virtualization for on-premise and hybrid cloud environments
Benefits
• Enables access to remote data just like a "local" table
• Provides SAP HANA to SAP HANA queries
• Smart query processing including query decomposition with predicate push-down, functional compensation
• Supports data location agnostic development
• No special syntax to access heterogeneous data sources
• Non-disruptive evolution
Heterogeneous data sources
• SAP HANA to Hadoop (Hive)
• SAP HANA to Teradata
• SAP HANA to SAP HANA
• SAP HANA to SAP ASE, Oracle 12c, Microsoft SQL Server ver11
• SAP HANA to SAP IQ
Transactional + Analytical
SAP HANA: HANA Tables | Virtual Tables
Remote sources: MS SQL Server | Oracle | Teradata | Hadoop | SAP HANA | ASE | IQ
18.
SMART DATA ACCESS WITH VIRTUAL TABLES
Location transparency of remote data is enabled by creating a local virtual table that maps to an
existing object at the remote data source site.
Example DDL:
CREATE VIRTUAL TABLE my_schema.my_table AT remote_source.catalog.schema.object
The remote table's datatypes and column definitions are used to create the virtual table.
When the virtual table is created, the HANA system catalog is updated to include local column names/datatypes, remote names/datatypes, index information, etc.
SAP HANA: Table | Table | Virtual Table
REMOTE SYSTEM: Remote Catalog Object | Remote Object (Table, View)
19.
Example: Smart Data Access
Steps For Creating and Using Virtual Tables
1. Create table in HIVE
2. On SAP HANA, create a DSN, e.g. "hive1"
3. With SAP HANA Studio or using a DDL command, create a remote source:
   CREATE REMOTE SOURCE HIVE1 ADAPTER "hiveodbc" CONFIGURATION 'DSN=hive1'
   WITH CREDENTIAL TYPE 'PASSWORD' USING 'user=dftest;password=dftest';
4. Using a DDL command, create a virtual table for Hive:
   CREATE VIRTUAL TABLE "HIVE1_PRODUCT" AT "HIVE1"."default"."default"."product";
5. Execute a query on the virtual table:
   SELECT * FROM HIVE1_PRODUCT;
6. Drop the remote source (CASCADE also drops the virtual tables that depend on it):
   DROP REMOTE SOURCE HIVE1 CASCADE;
20.
Query Federation
Between SAP HANA and multiple data stores (including Hadoop)
Users | BI and analytics software from SAP
Flow: Execute Query | Query Federation | Split Query | Execute (per data store) | Consolidate
Data stores: In-memory (analytic engine) … and/or … Disk-based data warehouse (SAP Sybase IQ, analytic engine) … and/or … Hadoop
Hadoop: Data storage (Hadoop Distributed File System) | Job Management | Computation Engine(s) | Hive | HBase | …
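The split / execute / consolidate flow can be sketched as follows. This is a toy illustration, not how SAP HANA implements federation: two in-memory SQLite databases stand in for the in-memory store and a remote store, and the `sales` table and the names `make_store` and `federated_totals` are assumptions made for the example.

```python
import sqlite3


def make_store(rows):
    """Create an in-memory store holding (region, amount) sales rows."""
    store = sqlite3.connect(":memory:")
    store.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    store.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return store


def federated_totals(stores, min_amount):
    """Split: push the filter and a partial aggregation down to each store.
    Execute: run the sub-query on every store.
    Consolidate: merge the partial sums into one result."""
    sub_query = (
        "SELECT region, SUM(amount) FROM sales "
        "WHERE amount >= ? GROUP BY region"
    )
    totals = {}
    for store in stores:
        for region, partial in store.execute(sub_query, (min_amount,)):
            totals[region] = totals.get(region, 0) + partial
    return totals


hot = make_store([("EMEA", 100), ("APJ", 40)])                 # stand-in for in-memory data
cold = make_store([("EMEA", 250), ("AMER", 75), ("APJ", 5)])   # stand-in for a remote store
result = federated_totals([hot, cold], min_amount=10)
print(sorted(result.items()))
```

Pushing the `WHERE` clause and the `SUM` to each store means only small partial results travel back for consolidation, which is the idea behind predicate push-down.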
21.
Example: UDF to invoke MapReduce job in Hadoop
Creating UDF to Return Results as a Table
For example:
CREATE VIRTUAL FUNCTION word_count() RETURNS TABLE (word NVARCHAR(60), count INT)
PACKAGE "SYSTEM"."WORD_COUNT"
CONFIGURATION 'sap.hana.hadoop.mapper=com.sap.hadoop.examples.WordCountMapper;sap.hana.hadoop.reducer=com.sap.hadoop.examples.WordCountReducer;sap.hana.hadoop.input=/path/to/input'
AT "HS1"
When the UDF is created, we specify the package WORD_COUNT, a JAR file containing a Java MapReduce program that calculates word counts.
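The logic such a word-count job performs can be sketched outside Hadoop. This is an illustration of the map, shuffle and reduce phases only; it is not the code of `WordCountMapper` or `WordCountReducer`, whose source is not shown on the slide, and all function names here are assumptions.

```python
from collections import defaultdict


def map_words(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(word, values):
    """Reducer: sum the counts collected for one word."""
    return word, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce, returning sorted (word, count) rows."""
    pairs = (pair for line in lines for pair in map_words(line))
    return sorted(reduce_counts(w, v) for w, v in shuffle(pairs).items())


rows = word_count(["big data in real time", "real time big insight"])
print(rows)
```

The returned rows have the same shape as the table declared by the function above: one `word` column and one `count` column.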
22.
SAP/Hadoop ETL Rationalization (loading data faster)
Sources: Transactional Systems, Databases, Flat Files, Batch Data Feeds
Batch | Data orchestration Services | Falcon
SAP HANA: Real-Time Analytics, Interactive Data Exploration & Application Platform
OLAP Engine | Predictive Engine | Spatial Engine | Application Logic & Rendering (XS)
Federated Smart Data Access
Hortonworks Data Platform: Data Reservoir
Parallel load of valuable data
Load, then transform at scale: (MR, Pig, Java)
► Low Latency ingestion of data from operational systems
► Tiered Storage model offers partitioning into Time Critical and less time sensitive data during ingestion.
► On-the-fly transformation for Time Critical Data can be performed in memory using HANA
► Off-load pre-processing of data to the Hadoop Platform
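Partitioning into time-critical and less time-sensitive data during ingestion could look like the sketch below. The one-day threshold, the `event_time` field, the example timestamp and the function name `route` are assumptions made for illustration; the slides do not state how the tiers are decided.

```python
from datetime import datetime, timedelta


def route(records, now, hot_window):
    """Split records into a time-critical tier and a less time-sensitive tier."""
    hot, cold = [], []
    for record in records:
        age = now - record["event_time"]
        (hot if age <= hot_window else cold).append(record)
    return hot, cold


now = datetime(2014, 6, 3, 12, 0)  # arbitrary reference time for the example
records = [
    {"id": 1, "event_time": now - timedelta(minutes=5)},
    {"id": 2, "event_time": now - timedelta(days=30)},
    {"id": 3, "event_time": now - timedelta(hours=23)},
]
hot, cold = route(records, now, hot_window=timedelta(days=1))
print([r["id"] for r in hot], [r["id"] for r in cold])
```

In this picture the hot list would go to the in-memory tier and the cold list to the lower-cost store.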
23.
Big Data Interactive Data Exploration
Sources: Transactional Systems, Databases, Flat Files, Batch Data Feeds
Batch | Data orchestration Services
SAP HANA: Real-Time Analytics, Interactive Data Exploration & Application Platform
OLAP Engine | Predictive Engine | Spatial Engine | Application Logic & Rendering (XS)
Federated Smart Data Access
Hortonworks Data Platform: Hive (Interactive SQL) | Science thru scalable stats and analysis (SAS, ML, custom) | HCatalog (late-binding schemas)
Visualization and Reporting
► Interactive high performance Analytics and Visualization
► Agile modeling and shorter turn-around on reports & dashboards
► Exploration of Data in-memory and interactively with Hadoop.
► Uniform Data Science Experience on in-memory and multi-terabyte data sets
24.
SAP/Hadoop Real-Time Stream processing
Sources: Transactional Systems, Databases, Flat Files, Batch Data Feeds | Mobile Apps | Online Apps
App events, mobile location data into platform for analysis
Real-time: Streaming Data Events, Replicate Data Tables from Transactional Applications
Batch | Data orchestration Services
SAP HANA: Real-Time Analytics, Interactive Data Exploration & Application Platform
OLAP Engine | Predictive Engine | Spatial Engine | Application Logic & Rendering (XS)
Federated Smart Data Access
Hortonworks Data Platform: Data Reservoir | Storm
Visualization and Reporting
► Real-time ingestion from operational systems, sensors and smart devices
► Pattern detection, anomaly detection and streaming analytics on data in flight.
► Scalable storage for offline model tuning and data science.
► Instant visibility across operations and corporate functions
Real-Time Data Acquisition: SAP ESP | SAP Replication Server | SAP SLT
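Anomaly detection on data in flight can be illustrated with a rolling-window check. This is a generic sketch, not the method used by SAP ESP or Storm: the rolling z-score rule, the window size, the threshold and the sample readings are all assumptions chosen for the example.

```python
from collections import deque
from statistics import mean, pstdev


def detect_anomalies(stream, window=5, threshold=3.0):
    """Yield (index, value) for readings far from the recent rolling mean."""
    recent = deque(maxlen=window)
    for index, value in enumerate(stream):
        if len(recent) == window:
            mu, sigma = mean(recent), pstdev(recent)
            # Flag values more than `threshold` standard deviations away.
            if sigma > 0 and abs(value - mu) > threshold * sigma:
                yield index, value
        recent.append(value)


readings = [20.0, 20.5, 19.5, 20.0, 20.5, 19.5, 35.0, 20.0, 20.5]
anomalies = list(detect_anomalies(readings))
print(anomalies)
```

Because each reading is judged only against a short window of earlier readings, the check works on an unbounded stream with constant memory.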
25.
SAP/Hadoop Real-Time insights and models
Sources: Transactional Systems, Databases, Flat Files, Batch Data Feeds | Mobile Apps | Online Apps
Real-time: Streaming Data Events, Replicate Data Tables from Transactional Applications
Real-Time Data Acquisition: SAP ESP | SAP Replication Server | SAP SLT
Batch | Data orchestration Services
SAP HANA: Real-Time Analytics, Interactive Data Exploration & Application Platform
OLAP Engine | Predictive Engine | Spatial Engine | Application Logic & Rendering (XS)
Federated Smart Data Access
Hortonworks Data Platform: Data Reservoir | Storm
Visualization and Reporting
► Real-Time Data Ingestion
► Real-Time Response
► Closed-Loop Analytics
► Real-Time Recommendation Applications
► Inline Predictive Analytics for Transactional Applications
► Smart Mobile Applications
26. Customer
Gain competitive advantage by
becoming a solution provider rather than
an equipment manufacturer
Requires predictive analytics and
algorithms to forecast equipment health
Impact gross transaction value
Optimize offerings from sellers with buyer
demand in the eBay economy by finding
signals within 50+ PB of noise daily
27. Customer
SAP Big Data Practice
To refine data into industry insights
Leading experts in 26 industries, 12 lines of
business
Global data science team who know how to turn
your data into relevant insights
Design Thinking experts trained to help you see
new opportunities in your business
Consulting and services with the experience to
make your project successful
28.
Every company deserves a "data scientist"
Achieve break-through results for your top business priorities with Data Science
Data Science - Delivering on your business imperatives
29. SAP Lumira: Visualizing Big Data
unleash analyst creativity
Provides the freedom to understand your data,
personalize it, and create beautiful content
Download and install on your desktop in
less than 5 minutes
Insight from many data sources
Combine, manipulate and enrich data to
apply it to your business scenarios
Self-service visualizations and analytics to
tell your story
Optimized for SAP HANA for real-time on
detailed data
Self Service for Analysts
30.
Thank you
Contact information:
John Schitka
john.schitka@sap.com
@johnschitka
www.sap.com/bigdata
facebook.com/sapanalytics
twitter.com/#!/@sapinmemory