Abstract: Companies are adopting big data to run high-velocity, real-time analytics on very large volumes of data, giving business users rapid self-service analysis and enabling use cases that were not previously feasible. However, many such projects have yielded limited value because the big data systems have become siloed from the other enterprise systems that hold critical operational business data. Big Data Fabric is a modern data architecture that combines data virtualization, data preparation, and lineage capabilities to integrate, at scale, these large siloed volumes of structured and unstructured data with other enterprise data assets. This presentation uses customer case studies in big data and IoT to demonstrate the value of using a big data fabric as a logical data lake for big data analytics.
2. 2
• Competition from a low-cost vendor
• Lower the price, affecting margins?
• Or maintain a high price, but differentiate in other ways?
3. 3
Large Heavy Equipment Manufacturer
Self-service / Predictive Analytics – IoT Integration
Benefits
• Improved asset performance and proactive maintenance
• Increased revenue from sale of services and parts
• Reduced warranty costs from parts failure
6. 6
Big Data Fabric – Data Abstraction Layer
• Abstracts access to disparate data sources
• Acts as a single (virtual) repository
• Makes data available in real time to consumers
7. 7
Data Virtualization: Connect, Combine, Consume
1. Connect to disparate data sources
• Disparate data sources (more structured to less structured): Databases & Warehouses, Cloud/SaaS Applications, Big Data, NoSQL, Web, XML, Excel, PDF, Word...
• Library of wrappers; web automation; any data or content; read & write
2. Combine related data into views
• Discover, transform, prepare, improve quality, integrate
• Normalized views of disparate data
3. Consume in business applications
• Data consumers (analytical and operational): Enterprise Applications, Reporting, BI, Portals, ESB, Mobile, Web, Users, IoT/Streaming Data
• Share, deliver, publish, govern, collaborate
• Multiple protocols and formats; linked data services (query, search, browse); request/reply and event-driven; secure delivery
Platform components
• Design Tools, Optimization Engine, Data Discovery & Search, In-memory Fabric, Cache, Scheduler
• Data Services (real-time & on-demand), Data Catalog / Metadata, Governance, Security, Management & Monitoring
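The connect, combine, consume flow on this slide can be illustrated with a small sketch. It is not Denodo's API: the wrapper functions, the `assets` table, and the sensor CSV are all hypothetical stand-ins, chosen only to show a view that reads its sources at request time instead of copying them.

```python
import csv
import io
import sqlite3
from typing import Callable, Dict, List

Row = Dict[str, object]
Source = Callable[[], List[Row]]


def sqlite_wrapper(conn: sqlite3.Connection, query: str) -> Source:
    """Connect: expose a relational source as a callable returning dict rows."""
    def fetch() -> List[Row]:
        cursor = conn.execute(query)
        columns = [d[0] for d in cursor.description]
        return [dict(zip(columns, row)) for row in cursor]
    return fetch


def csv_wrapper(text: str) -> Source:
    """Connect: expose a less structured source (CSV text) the same way."""
    def fetch() -> List[Row]:
        return list(csv.DictReader(io.StringIO(text)))
    return fetch


def combined_view(assets_src: Source, readings_src: Source) -> Source:
    """Combine: join both sources on asset_id each time the view is queried."""
    def fetch() -> List[Row]:
        names = {row["asset_id"]: row["name"] for row in assets_src()}
        return [
            {"asset": names[int(r["asset_id"])],
             "temperature": float(r["temperature"])}
            for r in readings_src()
            if int(r["asset_id"]) in names
        ]
    return fetch


# Hypothetical sources: an operational database and an IoT sensor export.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE assets (asset_id INTEGER, name TEXT)")
conn.executemany("INSERT INTO assets VALUES (?, ?)",
                 [(1, "excavator"), (2, "loader")])
SENSOR_CSV = "asset_id,temperature\n1,88.5\n2,71.0\n1,92.0\n"

view = combined_view(
    sqlite_wrapper(conn, "SELECT asset_id, name FROM assets"),
    csv_wrapper(SENSOR_CSV),
)

# Consume: a business application filters the view like any other dataset.
hot = [row for row in view() if row["temperature"] > 85]
```

Because the view holds no data of its own, a change in the underlying database is visible the next time `view()` is called, which is the behaviour the slide describes as real-time, on-demand data services.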
9. 9
Big Data Queries Faster with Denodo Platform
1. Data Virtualization delivers better performance without needing to replicate data into Hadoop.
2. Data Virtualization leverages Data Source Architectures for what they are good at.
Performance comparison of 5 different queries
Query | Impala (Hadoop-only) runtime (s) | Denodo runtime (s) | Denodo runtime with cache (s)
Query 1 | 199 | 120 | 68
Query 2 | 187 | 96 | 88
Query 3 | 120 | 212 | 115
Query 4 | timeout | 328 | 69
Query 5 | 46 | 91 | 56
Data volumes
• Queries 1, 2, 3, 5: Exadata row count ~5M; Impala row count ~500k
• Query 4: Exadata row count ~5M; Impala row count ~2M
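To compare the runtimes as ratios, the figures from this slide can be fed into a few lines of Python. This is only a convenience sketch: the column order (Impala, Denodo, Denodo with cache) is assumed from the order of the slide's headers, and the function names are made up for illustration.

```python
from typing import Dict, Optional, Tuple

# Runtimes in seconds as listed on the slide; None marks the Impala timeout.
# Assumed column order: (Impala Hadoop-only, Denodo, Denodo with cache).
RUNTIMES: Dict[str, Tuple[Optional[int], int, int]] = {
    "Query 1": (199, 120, 68),
    "Query 2": (187, 96, 88),
    "Query 3": (120, 212, 115),
    "Query 4": (None, 328, 69),
    "Query 5": (46, 91, 56),
}


def speedup(baseline: Optional[int], candidate: int) -> Optional[float]:
    """Return baseline / candidate, or None when the baseline timed out."""
    if baseline is None:
        return None
    return baseline / candidate


def summarize(runtimes):
    """Compute the speedup over Impala for both Denodo configurations."""
    return {
        query: {
            "denodo": speedup(impala, denodo),
            "denodo_cached": speedup(impala, cached),
        }
        for query, (impala, denodo, cached) in runtimes.items()
    }


if __name__ == "__main__":
    for query, ratios in summarize(RUNTIMES).items():
        print(query, ratios)
```

A ratio above 1 means the Denodo run finished faster than the Impala-only run, a ratio below 1 means it was slower, and `None` means no ratio can be computed because the baseline timed out.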
10. 10
Denodo Dynamic Query Optimizer
System | Execution Time | Data Transferred | Optimization Technique
Denodo | 9 sec. | 4 M | Aggregation push-down
Tableau | 125 sec. | 292 M | None: full scan
SELECT c.id, SUM(s.amount) as total
FROM customer c JOIN sales s
ON c.id = s.customer_id
GROUP BY c.id
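Execution plans compared (diagram)
• Without push-down: Sales (290 M) and Customer (2 M) are both transferred, joined, and then grouped.
• With aggregation push-down: Sales is grouped at the source down to 2 M, then joined with Customer (2 M).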
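The difference between the two plans can be reproduced on a toy scale. The sketch below is an illustration under assumptions, not the Denodo optimizer: two in-memory SQLite databases stand in for the remote Sales and Customer sources, the sample rows are invented, and "rows moved" simply counts what each plan fetches from the sources.

```python
import sqlite3


def build_sources():
    """Create two independent databases standing in for remote sources."""
    sales_db = sqlite3.connect(":memory:")
    sales_db.execute("CREATE TABLE sales (customer_id INTEGER, amount REAL)")
    sales_db.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [(1, 10.0), (1, 5.0), (2, 7.5), (2, 2.5), (2, 1.0), (3, 4.0)],
    )
    customer_db = sqlite3.connect(":memory:")
    customer_db.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY)")
    customer_db.executemany("INSERT INTO customer VALUES (?)",
                            [(1,), (2,), (3,)])
    return sales_db, customer_db


def fetch_customer_ids(customer_db):
    return {cid for (cid,) in customer_db.execute("SELECT id FROM customer")}


def naive_plan(sales_db, customer_db):
    """Fetch every sales row, then join and group in the middle tier."""
    sales = sales_db.execute(
        "SELECT customer_id, amount FROM sales").fetchall()
    customer_ids = fetch_customer_ids(customer_db)
    rows_moved = len(sales) + len(customer_ids)
    totals = {}
    for customer_id, amount in sales:
        if customer_id in customer_ids:  # inner join on customer id
            totals[customer_id] = totals.get(customer_id, 0.0) + amount
    return totals, rows_moved


def pushdown_plan(sales_db, customer_db):
    """Group at the sales source first, then join the partial result."""
    partial = sales_db.execute(
        "SELECT customer_id, SUM(amount) FROM sales GROUP BY customer_id"
    ).fetchall()
    customer_ids = fetch_customer_ids(customer_db)
    rows_moved = len(partial) + len(customer_ids)
    totals = {cid: total for cid, total in partial if cid in customer_ids}
    return totals, rows_moved


if __name__ == "__main__":
    sales_db, customer_db = build_sources()
    print(naive_plan(sales_db, customer_db))
    print(pushdown_plan(sales_db, customer_db))
```

Both plans return the same totals; the push-down plan just moves one row per customer from the sales source instead of one row per sale. The rewrite is safe here because the grouping key equals the join key and `customer.id` is unique, so aggregating before the join cannot change any group.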
11. 11
Data Virtualization as the Big Data Fabric
Simplify access to Big Data
Extend traditional data warehouses
Capitalize on in-memory computing
Exploit distributed query processing
Bring the benefits of big data to business users
“Plan to evolve your big data lake into a fabric over time by adding services like in-memory caching, data virtualization, or metadata cataloging.”
Source: Forrester Research, “The Anatomy Of A System Of Insight,” 2017
12. 12
ROI and TCO of Data Virtualization
Customer-reported projected savings by percentage
Data Integration Cost reduction
• 60-70% savings
Traditional Call Centres, Portals
• 30-70% savings
BI and Reporting
• 40-60% savings
ETL and Data Warehousing
• Project timelines of 6-12 months reduced to 3-6 months
• 85% time reduction
13. 13
Big Data Fabric Vendors
The Forrester Wave™: Big Data Fabric, Q4 2016
“Denodo’s key strength is delivering a unified and centralized data services fabric with security and real-time integration across multiple traditional and big data sources, including Hadoop, NoSQL, cloud, and software-as-a-service (SaaS). Customers like its easy-to-use, simple yet sophisticated data modeling capabilities, search, and support for various big data sources.”
– Analyst Noel Yuhanna, Forrester Research
14. 14
Data Virtualization, Federation, ETL, ESB Compared
Capability | Virtualization | Federation | ETL | ESB
Data abstraction | Full | Partial | Partial | Full
Robust performance | Full | Limited to a few data sources | Primarily in batch mode | Limited to a few data sources
Zero replication | Full | Partial | None | Partial
Real-time information | Full | Limited to a few data sources | Primarily batch | Limited to a few data sources
Self-service data services | Full | None | None | Partial
Centralized metadata, security, and governance | Full | None | Partial | None
Solutions | Denodo, Cisco, RedHat | Tableau, QueryGrid, SAP SDA | Informatica PowerCenter, IBM DataStage, Talend Data Fabric | TIBCO ESB, Mulesoft
15. 15
Denodo
The Leader in Data Virtualization
LEADERSHIP
• Longest continuous focus on data virtualization – since 1999
• 400+ customers
• Winner of numerous awards
Customer Awards
• AUTODESK: Finalist in 2017 Excellence Awards
• SEACOAST BANK: 2016 Business Leadership Award
• CIT BANK: 2016 Premier 100 Technology Leader
• ULTRA MOBILE: 2017 Best Practices Award
• AUTODESK: 2017 CIO 100 Award
• ASURION: 2017 Best Practices Award
• BIOSTORAGE: 2016 Business Leadership Award