This document discusses how a big data fabric can enable machine learning and artificial intelligence by providing a flexible and agile way for users to access and analyze large amounts of data from various sources. It explains that a big data fabric, powered by data virtualization, allows organizations to build a modern data ecosystem that provides governed access to both structured and unstructured data stored in different systems. This helps users develop new production analytics and insights. The document also provides an example of how Logitech used a big data fabric and data virtualization to improve their customer analytics.
The Big Data Fabric as an Enabler for Machine Learning & AI
1. The Big Data Fabric as an Enabler for ML & AI
Thomas Niewel
Technical Sales Director DACH, Denodo
Daniel Rapp
Marketing Manager Central Europe, Denodo
2. Denodo – The Leader in Data Virtualization
DENODO OFFICES, CUSTOMERS, PARTNERS
Headquartered in Palo Alto, CA, with a global presence throughout North America, EMEA, APAC, and Latin America.
LEADERSHIP
▪ Longest continuous focus on data virtualization – since 1999
▪ Leader in the 2020 Forrester Wave – Enterprise Data Fabric
▪ Challenger in the 2019 Gartner Magic Quadrant for Data Integration Tools
▪ Winner of numerous awards
CUSTOMERS
800+ customers, including many F500 and G2000 companies across every major industry, have gained significant business agility and ROI.
FINANCIALS
Backed by a $4B+ private equity firm. 50%+ annual growth; profitable.
5. The Promise of the Data Lake
The promise of a Data Lake is a place where you can store data in its raw form, unencumbered by validation, mastering, or quality processes, allowing consumers to choose which data is of value to them, with a quick time to market.
6. "…Data lakes lack semantic consistency and governed metadata. Meeting the needs of wider audiences requires curated repositories with governance, semantic consistency and access controls."
8. Data Lakes and Big Data Fabric
Data Lake (System of Insight) – Information Management and Governance
1. Multiple repositories, organized based on source and usage; hosted on the appropriate data platform for the workload
2. Catalog of data, ownership, meaning and permitted usage
3. No direct access to repositories
4. Curation of all data to define meaning and classifications
5. Business-led information governance and management
6. Active monitoring and management of data
7. Data-centric security
8. Access to raw data to develop new production analytics
9. Moderated, view-based self-service access to data and analytics for Line of Business
11. Data Virtualization Reference Architecture
(Figure: layered reference architecture, with the Data Virtualization layer in the middle)
• Data Insight (consumption): Data Mining, Dashboards, Data Discovery and Self-Service, Reporting
• Data Virtualization: Data Access, Security, Governance and Metadata Management, Data Services
• Data Ingestion: Real Time/Streaming, Batch, CDC, Metadata Enrichment, Search and Index
• Storage and Compute: RDBMS, Big Data Lakes, NoSQL, IMDG
• Structured sources: Data Warehouse, RDBMS, Excel, Flat Files, XML
• Unstructured sources: Email, Sensors (IIoT), Social Media, RFID, Wearables
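As a minimal sketch of the idea behind the virtualization layer in this architecture – a single access point that hides where data physically lives and enforces security in one place – consider the following. All names (`VirtualLayer`, `Source`, the view and role names) are hypothetical illustrations, not the Denodo API, and the "sources" are in-memory stand-ins for real systems:

```python
# Sketch of a data-virtualization access layer (hypothetical names,
# in-memory stand-ins for real sources such as an EDW or a data lake).

class Source:
    """A physical source: here just a named, in-memory list of dict rows."""
    def __init__(self, name, rows):
        self.name = name
        self.rows = rows

    def scan(self):
        return list(self.rows)

class VirtualLayer:
    """Maps logical view names to physical sources, with per-view access control."""
    def __init__(self):
        self.views = {}  # view name -> (source, allowed roles)

    def register(self, view, source, allowed_roles):
        self.views[view] = (source, set(allowed_roles))

    def query(self, view, role):
        source, allowed = self.views[view]
        if role not in allowed:  # security enforced once, not per source
            raise PermissionError(f"role {role!r} may not read {view!r}")
        return source.scan()

# Usage: consumers query view names; the backend is irrelevant to them.
layer = VirtualLayer()
layer.register("sales", Source("edw", [{"amount": 10}, {"amount": 5}]), ["analyst"])
layer.register("clicks", Source("lake", [{"url": "/home"}]), ["analyst", "ml"])

rows = layer.query("sales", role="analyst")
```

The point of the sketch is only the indirection: consumers bind to the logical view, so sources can move (e.g. from an on-premises warehouse to S3) without changing consumer code.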
12. Customer Case Study – Logitech
• Logitech needed a more agile (and cost-effective) analytics platform for customer analytics
• Moved analytics to AWS – utilized many different storage and compute capabilities on AWS
  • Redshift and Snowflake for the EDW, RDS for relational data, S3 for data ingestion
  • Spark, EMR, and NLP for analytics
• Used Data Virtualization as the data access layer
  • Users could access data irrespective of the storage or processing capability behind it
• The Data Virtualization layer provided data security and access permissions
  • Abstracted security away from the individual data sources
• Data governance was also provided by the Data Virtualization layer
  • Data lineage, metadata management, data catalog and discovery, etc.
17. What is the optimizer doing?

SELECT c.state, AVG(s.amount)
FROM customer c JOIN sales s
  ON c.id = s.customer_id
GROUP BY c.state

Customer and Sales are in different sources (Sales ~300 M rows, Customer ~2 M rows, ~50 states in the result). What is the best execution plan?
• Option 1 – Naïve strategy: ship both tables to the virtualization layer, join, then group by state
• Option 2 – Temporary data movement: create a temp table for Customer in the Sales source, then join and group there
• Option 3 – Partial aggregation pushdown: push a partial GROUP BY customer_id down to the Sales source, join the much smaller partial result with Customer, then finish the GROUP BY state
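The partial-aggregation-pushdown idea can be sketched in a few lines. This is an illustrative, in-memory version of the rewrite, not Denodo's implementation: AVG decomposes into (sum, count), so the Sales source can return one partial row per customer instead of every sales row, and the final average by state is computed from those partials:

```python
# Sketch of partial aggregation pushdown for
#   SELECT c.state, AVG(s.amount) FROM customer c JOIN sales s
#   ON c.id = s.customer_id GROUP BY c.state
# Tiny in-memory data stands in for the 300 M-row Sales source.

from collections import defaultdict

sales = [  # (customer_id, amount)
    (1, 100.0), (1, 200.0), (2, 50.0), (3, 10.0), (3, 30.0),
]
customers = {1: "CA", 2: "NY", 3: "CA"}  # customer_id -> state

# Step 1 (pushed to the Sales source): partial GROUP BY customer_id,
# keeping (sum, count) so AVG can be finished later.
partial = defaultdict(lambda: [0.0, 0])
for cid, amount in sales:
    partial[cid][0] += amount
    partial[cid][1] += 1

# Step 2 (in the virtualization layer): join the small partial result
# with Customer and re-aggregate by state.
by_state = defaultdict(lambda: [0.0, 0])
for cid, (total, count) in partial.items():
    by_state[customers[cid]][0] += total
    by_state[customers[cid]][1] += count

avg_by_state = {state: total / count for state, (total, count) in by_state.items()}
# avg_by_state == {"CA": 85.0, "NY": 50.0}
```

The key property is that (sum, count) pairs combine associatively, which is why the aggregate can be split across sources; a plain AVG per customer could not be re-averaged correctly.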
18. Performance & Scalability in Big Data Ecosystems
• Cost-based optimization
• Query rewrite & operations pushdown
• Partition pruning (join, union)
• Automatic data movement
• Partial and full data caching
• Work-offloading to MPP systems
• Resource management
• … and many more …
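Cost-based optimization, the first item above, ultimately comes down to estimating a cost for each candidate plan and picking the cheapest. A minimal sketch, using rows-moved-over-the-network as the cost metric and the illustrative row counts from the optimizer example (the exact costing formulas are assumptions, not Denodo's model):

```python
# Sketch of cost-based plan selection: estimate rows shipped between
# sources per strategy, then pick the minimum. Numbers are illustrative
# (Sales ~300 M rows, Customer ~2 M rows, ~50 result rows).

plans = {
    # ship both tables whole to the virtualization layer
    "naive strategy": 300_000_000 + 2_000_000,
    # ship Customer into the Sales source, ship ~50 result rows back
    "temporary data movement": 2_000_000 + 50,
    # ship one partial (sum, count) row per customer from the Sales source
    "partial aggregation pushdown": 2_000_000,
}

best = min(plans, key=plans.get)
# best == "partial aggregation pushdown"
```

A real optimizer would weigh more than network transfer (source compute capacity, indexes, memory, statistics freshness), but the selection step has this shape.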
19. Summary
• Big Data initiatives can add deep analytical insight to help organizations find new opportunities or provide more efficient services
• When defining your Big Data initiative, think about how users will find and access the data
• A Big Data Fabric is key to making this easy – and to making the initiative successful
• Data Virtualization is the right technology for an agile and flexible Big Data Fabric