The document discusses the challenges of data integration and proposes the logical data warehouse (LDW) as a solution. The LDW connects distributed data sources virtually, combines different technologies (such as a data lake) for different use cases, and provides flexible, agile, real-time data access without physically moving data. Compared with a traditional data warehouse or a data lake alone, this lets teams explore data, use it immediately, and optimize afterwards.
Beyond the Data Lake - Matthias Korn, Technical Consultant at Data Virtuality
1. US Office:
1355 Market Street, #488
San Francisco, CA 94103
German Office:
Katharinenstr. 15
04109 Leipzig, Germany
Beyond the Data Lake
Simplifying data integration for the modern age
Matthias Korn | Head of Presales
matthias.korn@datavirtuality.de
2. Variety is the Challenge
Gartner 2014: “VARIETY is the biggest challenge.”
“When asked about the dimensions of data organizations struggle with most, 49% answered variety, while 35% answered volume and 16% velocity.”
4. Integration using the Data Warehouse
Data is integrated by copying it into a central repository
Approach: ETL process (Extract/Transform/Load)
Structure is applied on the way into the repository
BI users query Data Marts
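The ETL approach described above can be sketched in a few lines of Python. This is a minimal illustration, not Data Virtuality's or any product's implementation: the source rows, the table names (`dim_customer`, `fact_order`) and the `mart_revenue_by_country` view are hypothetical, and an in-memory SQLite database stands in for the central repository. Note that the structure is applied in `transform`, before anything reaches the repository.

```python
import sqlite3

# Hypothetical raw rows, as they might arrive from two operational systems.
crm_rows = [("C1", "Alice", "de"), ("C2", "Bob", "us")]
shop_rows = [("C1", "19.99"), ("C2", "5.00"), ("C1", "10.01")]

def extract():
    """Extract: read raw rows from the (simulated) source systems."""
    return crm_rows, shop_rows

def transform(customers, orders):
    """Transform: apply the warehouse structure before loading
    (upper-case country codes, amounts converted from text to numbers)."""
    dim_customer = [(cid, name, country.upper()) for cid, name, country in customers]
    fact_order = [(cid, float(amount)) for cid, amount in orders]
    return dim_customer, fact_order

def load(conn, dim_customer, fact_order):
    """Load: copy the structured data into the central repository."""
    conn.execute("CREATE TABLE dim_customer (id TEXT PRIMARY KEY, name TEXT, country TEXT)")
    conn.execute("CREATE TABLE fact_order (customer_id TEXT, amount REAL)")
    conn.executemany("INSERT INTO dim_customer VALUES (?, ?, ?)", dim_customer)
    conn.executemany("INSERT INTO fact_order VALUES (?, ?)", fact_order)
    # A data mart: a subject-specific view that BI users query.
    conn.execute("""
        CREATE VIEW mart_revenue_by_country AS
        SELECT c.country, ROUND(SUM(o.amount), 2) AS revenue
        FROM fact_order o JOIN dim_customer c ON c.id = o.customer_id
        GROUP BY c.country
    """)

def run_etl():
    conn = sqlite3.connect(":memory:")
    load(conn, *transform(*extract()))
    return conn

conn = run_etl()
print(conn.execute("SELECT * FROM mart_revenue_by_country ORDER BY country").fetchall())
# [('DE', 30.0), ('US', 5.0)]
```

Every new question that needs a different structure means changing `transform`, the tables and the mart, which is the inflexibility the next slide refers to.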
5. Why do so many DWH projects fail? ETL
Inflexible; costly modifications
Labour-intensive setup and maintenance
Over 50% failure rate*
Slow data-to-actionable-insights (6 to 9+ months)
7. Where does the complexity come from?
Big Data
• Machine data, unstructured data, social data, streaming data, IoT, etc.
Cloud data
• APIs, cloud data platforms etc.
8. Data Lake – getting some data in is pretty easy…
…but other data is still a challenge
9. Integration using the Data Lake
Data is integrated by copying it into a central repository
Approach: ELT process (Extract/Load/Transform)
Data loaded in the original structure
For Data Scientists rather than for BI users
BI users query Data Marts: wait, didn't they do this before already?
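The ELT flow on this slide can be contrasted with ETL in a short sketch. It is a minimal illustration under stated assumptions: the JSON event lines are hypothetical, a plain Python list stands in for low-cost lake storage such as HDFS, and the `mart_purchases` table is an invented example. Data is loaded unchanged, structure is applied only when it is read (schema-on-read), and a data mart still has to be built before BI users can query it.

```python
import json
import sqlite3

# Hypothetical raw events, stored in the lake exactly as they arrived (JSON lines).
raw_events = [
    '{"user": "C1", "event": "purchase", "amount": "19.99"}',
    '{"user": "C2", "event": "click"}',
    '{"user": "C1", "event": "purchase", "amount": "10.01"}',
]

def extract_and_load(lake):
    """Extract + Load: append records unchanged; no schema is enforced on the way in."""
    lake.extend(raw_events)

def transform_on_read(lake):
    """Transform at query time (schema-on-read): parse, filter and type the data only now."""
    rows = []
    for line in lake:
        record = json.loads(line)
        if record.get("event") == "purchase":
            rows.append((record["user"], float(record["amount"])))
    return rows

def build_data_mart(lake):
    """Someone (typically a data scientist) still has to build the mart BI users query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mart_purchases (user_id TEXT, amount REAL)")
    conn.executemany("INSERT INTO mart_purchases VALUES (?, ?)", transform_on_read(lake))
    return conn

lake = []  # stands in for low-cost lake storage
extract_and_load(lake)
mart = build_data_mart(lake)
print(mart.execute(
    "SELECT user_id, ROUND(SUM(amount), 2) FROM mart_purchases GROUP BY user_id"
).fetchall())
# [('C1', 30.0)]
```

Loading is trivial here, but the mart at the end is the same kind of artifact the DWH produced, which is the point the slide's closing question makes.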
10. Data Lake and DWH
Both physical data integration
Both require significant upfront effort to create and fill with data
Both lack agility from the BI user's point of view
11. Reasons for physical data integration
Query all data with the same language
Model data with the same language
High performance
12. The Logical Data Warehouse
Introduced by Gartner in 2012
New data management architecture for analytics
Uses repositories just like the EDW
Adds distributed processes like the Data Lake
Adds virtualization of data sources for business agility
Removes the obstacle of physical data integration
14. What does the Logical Data Warehouse do?
LDW knows where the data is stored instead of copying it
Combines different technologies for different use cases
• big data processing
• Classical BI
• Agile business analytics
15. Advantages of the Logical Data Warehouse
Real-time data available and ready for analysis
Immediately productive
Flexible Logical Data Model
Permissions, governance
APIs, Webservices
Decoupling business layer and tech layer
17. Conclusion
Logical Data Warehouse holds enormous promise
Unified data architecture for both Big Data and classical BI use cases
Flexibility and real-time access give an advantage
Explore->Use->Optimize instead of Build->Test->Use provides a quicker time to solution
We dataconomy
18. Thanks for your attention
19. Backup 1: Example data flow in an LDW
Distributed query
BI frontend is aware of all data sources and creates the SQL statement
Performance optimization engine replicates data only if needed
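The distributed query flow on this backup slide can be illustrated with a toy virtual layer. This is a minimal sketch, not how Data Virtuality or any other LDW engine is implemented: the two sources (`crm`, `erp`) are hypothetical and modelled as separate in-memory SQLite databases, and the hand-written `VirtualLayer` class with its `revenue_by_customer` method is an assumption for illustration. A real engine derives the per-source queries from the SQL statement the BI frontend sends; here that plan is written by hand. The layer only records where each table lives, pushes filters down to the sources, and joins the results itself, so nothing is copied into a central store.

```python
import sqlite3

def make_source(ddl, insert_sql, rows):
    """Create one independent (simulated) source system."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    conn.executemany(insert_sql, rows)
    return conn

# Two hypothetical source systems that stay where they are.
crm = make_source("CREATE TABLE customers (id TEXT, country TEXT)",
                  "INSERT INTO customers VALUES (?, ?)",
                  [("C1", "DE"), ("C2", "US"), ("C3", "DE")])
erp = make_source("CREATE TABLE orders (customer_id TEXT, amount REAL)",
                  "INSERT INTO orders VALUES (?, ?)",
                  [("C1", 20.0), ("C2", 5.0), ("C3", 7.5), ("C1", 10.0)])

class VirtualLayer:
    """Knows where each virtual table lives and delegates work to the owning source."""

    def __init__(self):
        self.catalog = {}  # virtual table name -> source connection

    def register(self, table, source):
        self.catalog[table] = source

    def scan(self, table, where="", params=()):
        # Table names come only from the trusted catalog; values are bound as parameters.
        # The filter runs at the source, so only matching rows travel.
        sql = f"SELECT * FROM {table}" + (f" WHERE {where}" if where else "")
        return self.catalog[table].execute(sql, params).fetchall()

    def revenue_by_customer(self, country):
        # Step 1: filter customers at the CRM source.
        ids = [cid for cid, _ in self.scan("customers", "country = ?", (country,))]
        if not ids:
            return {}
        # Step 2: fetch only the matching orders from the ERP source.
        placeholders = ", ".join("?" for _ in ids)
        orders = self.scan("orders", f"customer_id IN ({placeholders})", tuple(ids))
        # Step 3: join and aggregate inside the virtual layer.
        totals = {}
        for customer_id, amount in orders:
            totals[customer_id] = totals.get(customer_id, 0.0) + amount
        return dict(sorted(totals.items()))

ldw = VirtualLayer()
ldw.register("customers", crm)
ldw.register("orders", erp)
print(ldw.revenue_by_customer("DE"))
# {'C1': 30.0, 'C3': 7.5}
```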
Twenty years ago there were already many variety challenges
The ETL process helps structure data and integrate it for analysis
But: deep knowledge about the data sources is required
Use case constraints
Growing number of data sources
Lots of manpower and time required
Missing flexibility even for slight changes
Cloud database and API internals not visible
Missing real-time data integration
But now it’s getting dramatic
Most Data Lakes are based on the Hadoop framework, which provides low-cost storage
Schema and data requirements are not defined until data is queried
No self-service BI supported
Data Marts have to be created by Data Scientists
BI users can't do new things
Structure of data in the Data Lake is not transparent
No permission concept
You cannot get all data from a webservice
Much of the stored data is never used, eating up the savings from low storage costs
EDW: An integrated, subject-oriented, time-variant and physically centralized data management system mounted on hardware optimized for mixed workload management and large-query processing.
LDW: An optimized combination of software and hardware that delivers a logically consistent, subject-oriented integration of time-variant data accessed via a centralized data management infrastructure. It uses repositories, virtualization and distributed processes in combination.
LDW knows where the data is stored instead of copying it
Repositories are used for data sources that are too slow
LDW knows how the data is stored in the original source systems
Federates data sources
Presents all data in a single virtual database
Quickly reacts to changes in data models of source systems
Enables multiple SLAs
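The idea that repositories are used only for data sources that are too slow can be sketched as a simple policy: query each source live, and replicate it into a local repository only when it misses a latency target. This is a minimal sketch under assumptions: `Source`, `LogicalWarehouse` and the `max_latency` threshold (standing in for an SLA) are hypothetical names, latency is simulated with `time.sleep`, and a real optimizer would use far richer statistics and refresh replicated data, which this sketch does not do.

```python
import time

class Source:
    """A hypothetical data source; `latency` simulates how slow it is to query."""

    def __init__(self, name, rows, latency):
        self.name, self.rows, self.latency = name, rows, latency
        self.calls = 0  # how often the source itself was hit

    def fetch(self):
        self.calls += 1
        time.sleep(self.latency)
        return list(self.rows)

class LogicalWarehouse:
    """Queries sources live, but replicates a source locally when it is too slow."""

    def __init__(self, max_latency):
        self.max_latency = max_latency  # assumed SLA threshold in seconds
        self.repository = {}            # source name -> replicated rows

    def query(self, source):
        if source.name in self.repository:
            return self.repository[source.name]  # served from the repository
        start = time.perf_counter()
        rows = source.fetch()                    # live, virtual access
        if time.perf_counter() - start > self.max_latency:
            self.repository[source.name] = rows  # replicate only if needed
        return rows

crm = Source("crm", [("C1", "DE")], latency=0.0)
web_api = Source("web_api", [("C1", 42)], latency=0.05)
ldw = LogicalWarehouse(max_latency=0.02)
for _ in range(2):
    ldw.query(crm)
    ldw.query(web_api)
print(crm.calls, web_api.calls)
```

The fast source keeps being queried live, so its data stays current, while the slow one is served from the replica after the first call. That is the trade-off behind supporting multiple SLAs: replicated data is faster to read but no longer real-time.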
Real-time data available and ready for analysis
Immediately productive
Different use cases supported: Exploration, data manipulation and batch processing
Data Model creation not tied to physical database: Logical Data Model!
Permission concept implemented
Webservice access using virtualization
Write back to the connected data sources
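Because every query passes through the virtual layer, the layer is a natural place to enforce the permission concept mentioned above, something the notes say the plain Data Lake lacks. The sketch below shows one possible form, column-level permissions per role. It is illustrative only: the `PERMISSIONS` mapping, the `SOURCES` data and the `query` function are hypothetical, and real products define their own permission models.

```python
# Hypothetical role -> table -> readable columns mapping, managed centrally in the layer.
PERMISSIONS = {
    "analyst": {"customers": {"id", "country"}},
    "admin": {"customers": {"id", "name", "country"}},
}

# Hypothetical data behind the virtual table "customers".
SOURCES = {
    "customers": [
        {"id": "C1", "name": "Alice", "country": "DE"},
        {"id": "C2", "name": "Bob", "country": "US"},
    ],
}

def query(role, table, columns):
    """Return the requested columns, refusing any column the role may not read."""
    allowed = PERMISSIONS.get(role, {}).get(table, set())
    denied = set(columns) - allowed
    if denied:
        raise PermissionError(f"{role} may not read {sorted(denied)} from {table}")
    return [{c: row[c] for c in columns} for row in SOURCES[table]]

print(query("analyst", "customers", ["id", "country"]))
# [{'id': 'C1', 'country': 'DE'}, {'id': 'C2', 'country': 'US'}]
```

Asking for `name` as an `analyst` raises `PermissionError`, while an `admin` may read it. Enforcing the rule once in the layer, rather than in each source, is what lets the same check apply to BI tools and web service access alike.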