1. Integrating Hadoop into Data
Warehousing Architecture
Where is the Wisdom? Lost in the Knowledge.
Where is the Knowledge? Lost in the Information.
T.S. Eliot
© Humza Naseer, University of Melbourne 2014
2. Outline
Introduction
Link Between Hadoop and Data Warehouse
Related Work: Trends in Data Warehouse Architecture
Current Work: Hadoop Integration into Data Warehouse Environment
Findings, Conclusion & Future Work
3. Identify all possible enterprise data assets
Select those assets that have actionable content and can be
accessed
Bring the data assets into a logically centralized “enterprise
data warehouse”
Expose those data assets most effectively for decision
making
(Kimball & Ross, 2013)
Intro: The Data Warehouse Mission
4. Hadoop is an Ecosystem of products
Open source
Vendor distributions
Additional tools for development and administration
Hadoop Benefits
Enables big data analytics
Supports advanced forms of analytics
Scales cost effectively
Extends a data warehouse environment
Hadoop Limitations
• Low latency queries
• Ease of access
• Data integration and integrity
• Fine grained security
Intro: Overview of Hadoop
[Figure: Unstructured Data → HDFS Data Nodes → MapReduce → Query Results]
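The HDFS and MapReduce flow named on this slide can be illustrated with a minimal in-memory sketch. This is plain Python, not the Hadoop API: the function names `map_phase`, `shuffle`, `reduce_phase` and `run_job` and the sample log lines are hypothetical, chosen only to show how unstructured records become aggregated query results.

```python
from collections import defaultdict


def map_phase(record):
    """Emit a (word, 1) pair for every word in one line of unstructured text."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Group values by key, as a framework would between the map and reduce steps."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Collapse all values for one key into a single count."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle and reduce over an in-memory list of records."""
    pairs = [pair for record in records for pair in map_phase(record)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    # Hypothetical log lines standing in for unstructured input data.
    logs = ["error disk full", "warning disk slow", "error network down"]
    print(run_job(logs))
```

On a real cluster the map and reduce steps would run in parallel across data nodes; the sketch only shows the shape of the computation.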
5. A data warehouse system fetches and unifies data from
heterogeneous source systems into a centralized dimensional
or normalized data repository
(Rainardi, 2008)
A data warehouse is not a tool or a technology
It is a business process that unifies an enterprise through data
(Eckerson, 2012)
Is Hadoop a problem or an opportunity?
Where does Hadoop fit into the data warehouse architecture?
Link Between Hadoop and Data
Warehouse
6. Traditional RDBMSs cannot handle
The new data types
Extended analytic processing
Terabytes/hour loading with immediate query access
We want to use SQL, but we don’t want the RDBMS storage
constraints
The disruptive solution: Hadoop (Kimball & Ross, 2013)
Why is Integration Happening?
[Figure: DB1, DB2, DB3 → Transformation and Load → Central DW → BI App-1, BI App-2, BI App-3 → Decision Making]
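The path from source databases through transformation and load into a central DW can be sketched as follows. This is an illustration under stated assumptions, not the architecture's actual implementation: the three source layouts, the `sales` table, and the helper names `transform` and `load` are all hypothetical, and an in-memory SQLite database stands in for the central warehouse.

```python
import sqlite3

# Hypothetical rows from three source systems, each with its own layout.
db1 = [("alice", "2014-01-05", "120.50")]                 # (customer, date, amount as text)
db2 = [{"cust": "BOB", "day": "2014-01-06", "amt": 80}]   # dict-based records
db3 = [("2014-01-07", "Carol", 42.0)]                     # (date, customer, amount)


def transform():
    """Unify the three layouts into (customer, sale_date, amount) tuples."""
    rows = [(c.title(), d, float(a)) for c, d, a in db1]
    rows += [(r["cust"].title(), r["day"], float(r["amt"])) for r in db2]
    rows += [(c.title(), d, float(a)) for d, c, a in db3]
    return rows


def load(conn, rows):
    """Load the unified rows into a single central table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer TEXT, sale_date TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(conn, transform())
    # A BI application would query the central table rather than the sources.
    for row in conn.execute("SELECT customer, amount FROM sales ORDER BY sale_date"):
        print(row)
```

The point of the sketch is that every BI application reads one unified schema, which is also why new data types that do not fit that schema strain this design.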
7. Ponniah (2011) notes that the selection of a DW architecture is based on
enterprise requirements.
DW architecture has multiple architectural layers and components
Logical architecture
Physical architecture
(Moss and Atre, 2013)
DW architecture overlaps with data integration, business intelligence and
enterprise data
(Russom, 2014)
Inmon vs Kimball dichotomy
(Ariyachandra and Watson, 2010)
Trends in Data Warehouse
Architectures
8. Eckerson (2012) notes that reporting and analytics have different
workload requirements
Reporting is based on the entities and facts which are well known
Advanced analytics empowers the discovery of new facts which are
not well known
Multi-platform unified data architecture
Includes enterprise data warehouse (EDW) and several other new data
platforms which augment EDW
(Russom, 2013)
Hadoop Integration into data
warehousing environment
9. Data staging
Data archiving
Advanced analytics
Multi-structured data
Uses of Hadoop that Extend DW
Architectures
[Figure: DB1, DB2, DB3 → Transformation and Load → EDW → BI App-1, BI App-2, BI App-3 → Decision Making]
10. Analytics and reporting have different requirements for DW
architectures
Characterize the DW architecture by counting the number and
types of workloads it supports
Logical DW architecture must integrate multiple physical
platforms
Design of logical DW architecture must be compartmentalized
Proposed logical architecture for new DW ecosystem
(An Extension of Eckerson (2012) BI architecture)
Findings
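The finding about characterizing a DW architecture by the number and types of workloads it supports can be made concrete with a small sketch. The platform inventory and the `characterize` helper below are hypothetical examples, not data from the study.

```python
from collections import Counter

# Hypothetical inventory: physical platform -> workload types it serves.
platforms = {
    "enterprise_data_warehouse": ["reporting", "olap"],
    "hadoop_cluster": ["discovery_analytics", "data_staging", "data_archiving"],
    "in_memory_sandbox": ["discovery_analytics"],
}


def characterize(inventory):
    """Summarize an architecture by counting its platforms and workload types."""
    counts = Counter(w for workloads in inventory.values() for w in workloads)
    return {
        "platforms": len(inventory),
        "workload_types": len(counts),
        "workloads_by_type": dict(counts),
    }


if __name__ == "__main__":
    print(characterize(platforms))
```

A summary like this makes it easy to see when one logical architecture spans several physical platforms, which is the situation the findings describe.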
11. Logical Architecture of New DW Ecosystem
[Figure: logical architecture diagram. Labels recovered from the slide:
Sources: Operational Systems (Online Transaction Processing Systems, relational data); Web Data, Machine Data, Log files, Legacy/External Data (non-relational)
Platforms: Operational Data Store, Enterprise Data Warehouse, Subject Area Data Marts, BI Server, Hadoop Ecosystem Cluster (non-relational data)
Sandboxes: DW-Centric Sandbox, Replicated Sandbox, In-memory BI Sandbox
Environments: event driven alerting, reporting/analysis, exploration/discovery
Data movement: Extract, Transform and Load (batch, real time or near real time); Query, ETL, Streaming
Users: Power User, Casual User
Approaches: top down architecture, bottom up architecture]
12. BI Assessment Model
[Figure: BI assessment model. Platforms shown: Data Marts, Enterprise Data Warehouse, Workload Specific Data Platforms, Data Warehouse Ecosystem. Axes: Workload Capacity, Degree of Integration, Degree of Standardization, with scales marked Low to High]
13. Hadoop enables new types of applications within the DW
environment
Big data analytics, advanced analytics and discovery analytics
Information exploration and augmenting a data warehouse
Hadoop should be implemented in a multi-platform DW environment
Future work:
Conformed dimensions
BI maturity roadmap
Conclusion