2. DEFINITION
Data Warehouse
A collection of corporate
information, derived directly
from operational systems
and some external data
sources. Its specific purpose
is to support business
decisions, not business
operations.
3. THE PURPOSE OF DATA WAREHOUSING
Realize the value of data
Data / information is an asset
Methods to realize the
value (Reporting, Analysis, etc.)
Make better decisions
Turn data into information
Create competitive advantage
Methods to support the decision-making
process (EIS, DSS, etc.)
4. DATA WAREHOUSE COMPONENTS
• Staging Area
• A preparatory repository where transaction data
can be transformed for use in the data warehouse
• Data Mart
• Traditional dimensionally modeled set of dimension
and fact tables
• Per Kimball, a data warehouse is the union of a set
of data marts
• Operational Data Store (ODS)
• Modeled to support near real-time reporting needs.
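The relationship between a staging area and a dimensionally modeled data mart can be sketched in a few lines. This is a minimal illustration, not a production design: the table names (dim_date, dim_product, fact_sales) and the sample rows are hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Hypothetical staging rows: raw transactions as they might arrive
# from an operational system (date, product, quantity, unit price).
staging_rows = [
    ("2024-01-05", "Widget", 3, 9.99),
    ("2024-01-05", "Gadget", 1, 24.50),
    ("2024-01-06", "Widget", 2, 9.99),
]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_date (
        date_key INTEGER PRIMARY KEY,
        calendar_date TEXT UNIQUE
    );
    CREATE TABLE dim_product (
        product_key INTEGER PRIMARY KEY,
        product_name TEXT UNIQUE
    );
    CREATE TABLE fact_sales (
        date_key INTEGER REFERENCES dim_date(date_key),
        product_key INTEGER REFERENCES dim_product(product_key),
        quantity INTEGER,
        amount REAL
    );
""")

# Transform each staging row: look up (or create) the dimension keys,
# then store the measures in the fact table.
for sale_date, product, qty, unit_price in staging_rows:
    conn.execute(
        "INSERT OR IGNORE INTO dim_date (calendar_date) VALUES (?)",
        (sale_date,),
    )
    conn.execute(
        "INSERT OR IGNORE INTO dim_product (product_name) VALUES (?)",
        (product,),
    )
    date_key = conn.execute(
        "SELECT date_key FROM dim_date WHERE calendar_date = ?",
        (sale_date,),
    ).fetchone()[0]
    product_key = conn.execute(
        "SELECT product_key FROM dim_product WHERE product_name = ?",
        (product,),
    ).fetchone()[0]
    conn.execute(
        "INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
        (date_key, product_key, qty, qty * unit_price),
    )

# A typical data mart query: total quantity sold per product.
totals = conn.execute("""
    SELECT p.product_name, SUM(f.quantity)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY p.product_name
    ORDER BY p.product_name
""").fetchall()
print(totals)
```

The fact table holds only keys and measures, while descriptive attributes live in the dimension tables. That separation is what makes the mart easy to aggregate by date, product, or any other dimension.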
6. EVOLUTION OF DATA WAREHOUSE ARCHITECTURE
Top-Down Architecture
Bottom-Up Architecture
Enterprise Data Mart Architecture
Data Stage/Data Mart Architecture
7. VERY LARGE DATA BASES
WAREHOUSES ARE VERY LARGE DATABASES
Terabytes -- 10^12 bytes: Wal-Mart -- 24 Terabytes
Petabytes -- 10^15 bytes: Geographic Information
Systems
Exabytes -- 10^18 bytes: National Medical Records
Zettabytes -- 10^21 bytes: Weather images
Yottabytes -- 10^24 bytes: Intelligence Agency Videos
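To make the scale of these units concrete, the powers of ten above can be turned into a small converter. This is only an illustrative sketch; the function name describe_size and the unit abbreviations are assumptions, not part of the original slides.

```python
# Decimal (SI) storage units from the slide, largest first.
UNITS = [
    ("YB", 10**24),  # yottabyte
    ("ZB", 10**21),  # zettabyte
    ("EB", 10**18),  # exabyte
    ("PB", 10**15),  # petabyte
    ("TB", 10**12),  # terabyte
]


def describe_size(num_bytes):
    """Express a byte count in the largest unit that fits (hypothetical helper)."""
    for abbreviation, factor in UNITS:
        if num_bytes >= factor:
            return f"{num_bytes / factor:g} {abbreviation}"
    return f"{num_bytes} bytes"


# The 24-terabyte example from the slide, written out in bytes.
print(describe_size(24 * 10**12))
```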
8. COMPLEXITIES OF CREATING A DATA WAREHOUSE
Incomplete errors
Missing Fields
Records or Fields That, by Design, are not
Being Recorded
Incorrect errors
Wrong Calculations, Aggregations
Duplicate Records
Wrong Information Entered into Source
System
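The two error families above (incomplete and incorrect) can be detected with simple record-level checks before loading. The sketch below is an assumption-laden illustration: the field names, the sample records, and the audit function are hypothetical, and real source systems would need rules specific to their data.

```python
# Hypothetical source records; the comments mark the planted problems.
records = [
    {"id": 1, "customer": "Acme", "qty": 2, "unit_price": 5.0, "total": 10.0},
    {"id": 2, "customer": None, "qty": 1, "unit_price": 3.0, "total": 3.0},      # missing field
    {"id": 3, "customer": "Birch", "qty": 4, "unit_price": 2.5, "total": 12.0},  # wrong calculation
    {"id": 1, "customer": "Acme", "qty": 2, "unit_price": 5.0, "total": 10.0},   # duplicate record
]

REQUIRED = ("customer", "qty", "unit_price", "total")


def audit(rows):
    """Return (record id, problem) pairs for the rows that fail a check."""
    issues = []
    seen = set()
    for row in rows:
        # Duplicate records: an identical row was already processed.
        key = (row["id"],) + tuple(row[field] for field in REQUIRED)
        if key in seen:
            issues.append((row["id"], "duplicate record"))
            continue
        seen.add(key)

        # Incomplete errors: a required field has no value.
        missing = [field for field in REQUIRED if row.get(field) is None]
        if missing:
            issues.append((row["id"], "missing field: " + ", ".join(missing)))
            continue

        # Incorrect errors: the stored total disagrees with qty * unit_price.
        if abs(row["qty"] * row["unit_price"] - row["total"]) > 0.005:
            issues.append((row["id"], "wrong calculation"))
    return issues


for record_id, problem in audit(records):
    print(record_id, problem)
```

Checks like these catch missing fields, duplicates, and miscalculations. They cannot catch wrong information that was entered consistently at the source, which usually requires comparison against an independent reference.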
9. SUCCESS & FUTURE OF DATA WAREHOUSE
The Data Warehouse has successfully supported the
increased needs of the State over the past eight years.
The need for growth continues, however, as the desire for
more integrated data increases.
The Data Warehouse has software and tools in place to
provide the functionality needed to support new
enterprise Data Warehouse projects.
The future capabilities of the Data Warehouse can be
expanded to include other programs and agencies.
10. DATA WAREHOUSE PITFALLS
You are going to spend much time
extracting, cleaning, and loading data
You are going to find problems with systems feeding the
data warehouse
You will find the need to store/validate data not being
captured/validated by any existing system
Large scale data warehousing can become an exercise
in data homogenizing
11. DATA WAREHOUSE PITFALLS…
The time it takes to load the warehouse will expand
to fill the available window...
and then some
You are building a HIGH maintenance system
You will fail if you concentrate on resource
optimization to the neglect of project, data, and
customer management issues and an understanding
of what adds value to the customer
12. BEST PRACTICES
Complete requirements and design
Prototyping is key to business understanding
Utilizing proper aggregations and detailed data
Training is an on-going process
Build data integrity checks into your system.
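One common form of integrity check is a reconciliation between what the source system sent and what the warehouse loaded. The following is a minimal sketch under stated assumptions: the reconcile function, the amount field, and the tolerance value are all hypothetical choices for illustration.

```python
def reconcile(source_rows, warehouse_rows, amount_field="amount", tolerance=0.01):
    """Compare row counts and a control total between source and warehouse."""
    source_total = sum(row[amount_field] for row in source_rows)
    warehouse_total = sum(row[amount_field] for row in warehouse_rows)
    return {
        "row_count_matches": len(source_rows) == len(warehouse_rows),
        "amount_total_matches": abs(source_total - warehouse_total) <= tolerance,
    }


# Hypothetical load in which one source row never reached the warehouse.
source = [{"amount": 10.0}, {"amount": 5.5}]
warehouse = [{"amount": 10.0}]

print(reconcile(source, warehouse))
```

Running a check like this after every load turns a silent data loss into a visible failure that can be investigated before reports are produced.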
Legacy data is historical data
The working information of a staff member
Working hours or time-off hours within the fiscal period, up to the current date
Working Hours = Overtime, etc.
Time-Off Hours = Vacation, Sick Leave, etc.
DataStage (database, tool)
A tool set for designing, developing, and running applications that populate one or more tables in a data warehouse