This is the presentation I gave at OakTable World 2013 in San Francisco. #OTW13 was held at the Children's Creativity Museum, next to the Moscone Convention Center, and ran in parallel with Oracle OpenWorld 2013.
The session discussed our attempts to be more agile in designing enterprise data warehouses and how the Data Vault data modeling technique supports that approach.
(OTW13) Agile Data Warehousing: Introduction to Data Vault Modeling
1. Agile Data Warehouse Modeling:
Introduction to Data Vault Modeling
Kent Graziano
Data Warrior LLC
Twitter @KentGraziano
2. Agenda
Bio
What do we mean by Agile?
What is a Data Vault?
Where does it fit in an Oracle BI architecture?
How to design a Data Vault model
Being “agile”
3. My Bio
Oracle ACE Director
Certified Data Vault Master and DV 2.0 Architect
Blogger: Oracle Data Warrior
Data Architecture and Data Warehouse Specialist
● 30+ years in IT
● 20+ years of Oracle-related work
● 15+ years of data warehousing experience
Co-Author of
● The Business of Data Vault Modeling
● The Data Model Resource Book (1st Edition)
Editor of “The” Data Vault Book
Past-President of ODTUG and Rocky Mountain Oracle
User Group
4. Manifesto for Agile Software Development
“We are uncovering better ways of developing
software by doing it and helping others do it.
Through this work we have come to value:
Individuals and interactions over processes and
tools
Working software over comprehensive
documentation
Customer collaboration over contract negotiation
Responding to change over following a plan
That is, while there is value in the items on the right,
we value the items on the left more.”
http://agilemanifesto.org/
5. Applying the Agile Manifesto to DW
User Stories instead of
requirements documents
Time-boxed iterations
● Iteration has a standard length
● Choose one or more user stories to fit in that
iteration
Rework is part of the game
● There are no “missed requirements”... only
those that haven’t been delivered or
discovered yet.
6. Data Vault Definition
The Data Vault is a detail oriented, historical tracking
and uniquely linked set of normalized tables that
support one or more functional areas of business.
It is a hybrid approach encompassing the best of
breed between 3rd normal form (3NF) and star
schema. The design is flexible, scalable, consistent
and adaptable to the needs of the enterprise.
Dan Linstedt: Defining the Data Vault
TDAN.com Article
Architected specifically to meet the needs
of today’s enterprise data warehouses
7. What is Data Vault Trying to Solve?
What are our other Enterprise
Data Warehouse options?
● Third-Normal Form (3NF): Complex
primary keys (PK’s) with cascading
snapshot dates
● Star Schema (Dimensional): Difficult to
reengineer fact tables for granularity
changes
Difficult to get it right the first
time
Not adaptable to rapid
business change
NOT AGILE!
(C) Kent Graziano
9. Data Vault Evolution
The work on the Data Vault approach began in the
early 1990s and was completed around 1999.
Throughout 1999, 2000, and 2001, the Data Vault
design was tested, refined, and deployed into specific
customer sites.
In 2002, the industry thought leaders were asked to
review the architecture.
● This is when I attended my first DV seminar in Denver and met
Dan!
In 2003, Dan began teaching the modeling techniques
to the mass public.
11. Oracle Information Management Reference
Architecture
Staging Layer
● Change tables
● Reject tables for Data Quality
● External tables for file feeds
Foundation Layer
● Transactional granularity
maintained
● Process neutral: no user or
business requirements
● Just recording what happened
Access and Performance
Layer
● Dimensional model
● “Star Schemas”
● Process specific: targeting user
and business requirements
13. What is a Foundation Layer?
Basis for long term enterprise scale data
warehouse
Must be atomic level data
● A historical source of facts
Not based on any one data source or system
Single point of integration
Flexible
Extensible
Provides data to the access/reporting layer
14. How to be Agile using DV and Oracle
Model iteratively
● Use Data Vault data modeling technique
● Create basic components, then add over time
Virtualize the Access Layer
● Don’t waste time building facts and dimensions up front
● ETL and testing take too long
● “Project” objects using pattern-based DV model with OBIEE
BMM or Oracle Views
Users see real reports with real data
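A minimal sketch of what "virtualizing" the access layer can look like. It uses SQLite from Python purely as a stand-in for Oracle views or the OBIEE BMM layer, and the table, column, and view names (hub_customer, sat_customer, dim_customer) are hypothetical, not taken from the slides. The dimension is only a view over a Hub and its Satellite, so there is nothing to load or re-test when the model changes.

```python
import sqlite3

# Hypothetical Data Vault tables; names and columns are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hub_customer (
    customer_id   INTEGER PRIMARY KEY,   -- surrogate key
    customer_num  TEXT NOT NULL UNIQUE,  -- business key
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL
);
CREATE TABLE sat_customer (
    customer_id   INTEGER NOT NULL REFERENCES hub_customer,
    load_date     TEXT NOT NULL,
    load_end_date TEXT,                  -- NULL marks the current row
    name          TEXT NOT NULL,
    city          TEXT NOT NULL,
    record_source TEXT NOT NULL,
    PRIMARY KEY (customer_id, load_date)
);
-- The "dimension" is only a view, so no extra ETL is needed to build it.
CREATE VIEW dim_customer AS
SELECT h.customer_num, s.name, s.city
FROM hub_customer h
JOIN sat_customer s ON s.customer_id = h.customer_id
WHERE s.load_end_date IS NULL;
""")
conn.execute("INSERT INTO hub_customer VALUES (1, 'C-100', '2013-09-01', 'CRM')")
conn.executemany(
    "INSERT INTO sat_customer VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "2013-09-01", "2013-09-20", "Acme", "Denver", "CRM"),
        (1, "2013-09-20", None, "Acme", "San Francisco", "CRM"),
    ],
)
# Only the current version of each customer is projected.
dim_rows = conn.execute("SELECT * FROM dim_customer").fetchall()
print(dim_rows)
```

If the reporting need changes, only the view definition is rewritten; the Hub and Satellite tables and their load routines stay as they are.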
19. 2. Links = Associations
Links =
Transactions and
Associations
They are used to
hook together
multiple sets of
information
20. Link Definitions
What Makes a Link?
● A Link is based on identifiable business element
relationships.
● Otherwise known as a foreign key.
● AKA a business event or transaction between business keys.
● The relationship shouldn’t change over time.
● It is established as a fact that occurred at a specific point in time
and will remain that way forever.
● The link table may also represent a hierarchy.
Attributes
● All attributes are mandatory
(C) LearnDataVault.com
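As a rough illustration of the Link rules above, here is a sketch of a Link table in SQLite (a stand-in for Oracle). The table name, the two Hub keys, and the unique constraint on the key pair are assumptions made for the example; the slide itself only states that all attributes are mandatory, which is expressed here as NOT NULL on every column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE link_customer_order (
    link_id       INTEGER PRIMARY KEY,
    customer_id   INTEGER NOT NULL,   -- key of a hypothetical customer Hub
    order_id      INTEGER NOT NULL,   -- key of a hypothetical order Hub
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL,
    UNIQUE (customer_id, order_id)    -- assumed: one row per relationship
);
""")
conn.execute(
    "INSERT INTO link_customer_order VALUES (1, 10, 500, '2013-09-23', 'ORDERS')"
)

def try_insert(row):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO link_customer_order VALUES (?, ?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

# A NULL attribute breaks the "all attributes are mandatory" rule.
missing_source = try_insert((2, 10, 501, "2013-09-23", None))
# The same relationship is a fact that is recorded only once.
duplicate_pair = try_insert((3, 10, 500, "2013-09-24", "ORDERS"))
print(missing_source, duplicate_pair)
```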
21. Modeling Links - 1:1 or 1:M?
Today:
● Relationship is a 1:1 so why model a Link?
Tomorrow:
● The business rule can change to a 1:M.
● You discover new data later.
With a Link in the Data Vault:
● No need to change the EDW structure.
● Existing data is fine.
● New data is added.
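The point of this slide can be sketched in a few lines. The example assumes a hypothetical Link between a customer Hub and an account Hub, with SQLite standing in for Oracle. The relationship starts as 1:1, and when the rule changes to 1:M the only thing that happens is that a new row arrives.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical Link between a customer Hub and an account Hub.
conn.execute("""
CREATE TABLE link_customer_account (
    customer_id   INTEGER NOT NULL,
    account_id    INTEGER NOT NULL,
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL,
    PRIMARY KEY (customer_id, account_id)
)""")

def max_accounts_per_customer():
    """Largest number of accounts linked to any single customer."""
    return conn.execute("""
        SELECT MAX(n) FROM (
            SELECT COUNT(*) AS n FROM link_customer_account GROUP BY customer_id
        )""").fetchone()[0]

# Today: the business rule is 1:1.
conn.execute("INSERT INTO link_customer_account VALUES (1, 100, '2013-01-01', 'CRM')")
conn.execute("INSERT INTO link_customer_account VALUES (2, 200, '2013-01-01', 'CRM')")
before = max_accounts_per_customer()

# Tomorrow: customer 1 opens a second account. Only a new row is added;
# the table structure and the existing rows are untouched.
conn.execute("INSERT INTO link_customer_account VALUES (1, 101, '2013-06-01', 'CRM')")
after = max_accounts_per_customer()
print(before, after)
```

Had the relationship been modeled as a foreign key column on the customer table, the same change would have required restructuring that table.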
22. 3. Satellites = Descriptors
Satellites provide
context for the
Hubs and the
Links
23. Satellite Definitions
What Makes a Satellite?
● A Satellite is based on non-identifying business
elements.
● The Satellite data changes, sometimes rapidly,
sometimes slowly.
● The Satellite is dependent on the Hub or Link key as
a parent.
● Satellites are never dependent on more than one parent table.
● The Satellite is never a parent table to any other table (no
snowflaking).
Attributes and Ordering
● All attributes are mandatory – EXCEPT END DATE.
● Parent ID 1st, Load Date 2nd, Load End Date
3rd, Record Source Last.
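The attribute and ordering rules can be sketched as a table definition. The Satellite name and its two descriptive columns (description, list_price) are hypothetical, the composite primary key is an assumption, and SQLite stands in for Oracle. The check at the end reads the column order and nullability back from the catalog.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE sat_product (
    product_id    INTEGER NOT NULL,  -- parent ID first (Hub key)
    load_date     TEXT NOT NULL,     -- load date second
    load_end_date TEXT,              -- load end date third, the only optional column
    description   TEXT NOT NULL,     -- descriptive attributes (hypothetical)
    list_price    REAL NOT NULL,
    record_source TEXT NOT NULL,     -- record source last
    PRIMARY KEY (product_id, load_date)
)""")

# PRAGMA table_info returns (cid, name, type, notnull, dflt_value, pk) per column.
info = conn.execute("PRAGMA table_info(sat_product)").fetchall()
column_order = [row[1] for row in info]
optional_columns = [row[1] for row in info if row[3] == 0]
print(column_order)
print(optional_columns)
```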
24. Satellite Entity - Details
A Satellite has only 1 foreign key; it is dependent on
the parent table (Hub or Link)
A Satellite may or may not have an “Item
Numbering” attribute.
A Satellite’s Load Date represents the date the
EDW saw the data (must be a delta set).
● This is not Effective Date from the Source!
A Satellite’s Record Source represents the actual
source of the row (unit of work).
To avoid Outer Joins, you must ensure that every
satellite has at least 1 entry for every Hub Key.
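A sketch of a delta-driven Satellite load, under some assumptions: a single descriptive column (city), a hypothetical load_satellite helper, SQLite in place of Oracle, and end-dating of the previous row when a change arrives. The load_date passed in is the date the warehouse saw the data, not an effective date from the source.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE sat_customer (
    customer_id   INTEGER NOT NULL,
    load_date     TEXT NOT NULL,
    load_end_date TEXT,
    city          TEXT NOT NULL,
    record_source TEXT NOT NULL,
    PRIMARY KEY (customer_id, load_date)
)""")

def load_satellite(customer_id, city, load_date, record_source):
    """Insert a row only when the attributes differ from the current row."""
    current = conn.execute(
        "SELECT city FROM sat_customer "
        "WHERE customer_id = ? AND load_end_date IS NULL",
        (customer_id,),
    ).fetchone()
    if current is not None and current[0] == city:
        return False  # no change, so nothing to store
    # Close the previous version (if any), then add the new one.
    conn.execute(
        "UPDATE sat_customer SET load_end_date = ? "
        "WHERE customer_id = ? AND load_end_date IS NULL",
        (load_date, customer_id),
    )
    conn.execute(
        "INSERT INTO sat_customer VALUES (?, ?, NULL, ?, ?)",
        (customer_id, load_date, city, record_source),
    )
    return True

results = [
    load_satellite(1, "Denver", "2013-09-01", "CRM"),         # first sighting
    load_satellite(1, "Denver", "2013-09-02", "CRM"),         # unchanged
    load_satellite(1, "San Francisco", "2013-09-03", "CRM"),  # changed
]
history = conn.execute(
    "SELECT load_date, load_end_date, city FROM sat_customer ORDER BY load_date"
).fetchall()
print(results, history)
```

The unchanged record on the second day is skipped, so the Satellite holds one row per actual change rather than one row per load.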
27. Standardized modeling rules
• Highly repeatable and learnable modeling technique
• Can standardize load routines
● Delta Driven process
● Re-startable, consistent loading patterns.
• Can standardize extract routines
● Rapid build of new or revised Data Marts
• Can be automated
‣ Can use a BI-meta layer to virtualize the reporting
structures
‣ Example: OBIEE Business Model and Mapping tool
‣ Can put views on the DV structures as well
‣ Simulate ODS/3NF or Star Schemas
Data Vault Productivity
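One way to picture a standardized, re-startable load routine is a single function that loads any Hub from any staging table. This is a sketch under assumptions: the load_hub helper, the stage_customer and hub_customer tables, and the simplified Hub layout are all hypothetical, and SQLite stands in for Oracle. The routine inserts only business keys the Hub does not yet hold, so running it again after a failure adds nothing twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stage_customer (customer_num TEXT);
CREATE TABLE hub_customer (
    customer_num  TEXT PRIMARY KEY,   -- business key
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL
);
""")
conn.executemany(
    "INSERT INTO stage_customer VALUES (?)", [("C-100",), ("C-200",), ("C-100",)]
)

def load_hub(hub, stage, key, load_date, record_source):
    """Insert business keys from the stage table that the Hub lacks."""
    # Table and column names come from trusted metadata here, not user input.
    cur = conn.execute(
        f"INSERT INTO {hub} ({key}, load_date, record_source) "
        f"SELECT DISTINCT s.{key}, ?, ? FROM {stage} s "
        f"WHERE NOT EXISTS (SELECT 1 FROM {hub} h WHERE h.{key} = s.{key})",
        (load_date, record_source),
    )
    return cur.rowcount  # number of new keys inserted

first_run = load_hub("hub_customer", "stage_customer", "customer_num", "2013-09-23", "CRM")
second_run = load_hub("hub_customer", "stage_customer", "customer_num", "2013-09-24", "CRM")
print(first_run, second_run)
```

Because every Hub follows the same pattern, the same function could be driven from a list of table names, which is what makes generating the load code practical.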
32. Notably…
In 2008 Bill Inmon stated that the “Data Vault
is the optimal approach for modeling the EDW
in the DW2.0 framework.” (DW2.0)
The number of Data Vault users in the US
surpassed 500 in 2010 and continues to grow rapidly
(http://danlinstedt.com/about/dv-customers/)
33. Organizations using Data Vault
WebMD Health Services
Anthem Blue-Cross Blue Shield
MD Anderson Cancer Center
Denver Public Schools
Independent Purchasing Cooperative (IPC, Miami)
• Purchasing cooperative owned by Subway franchisees
Kaplan
US Defense Department
Colorado Springs Utilities
State Court of Wyoming
Federal Express
US Dept. of Agriculture
36. Summary
• Data Vault provides a data
modeling technique that
allows:
‣ Model Agility
‣ Enabling rapid changes and additions
‣ Productivity
‣ Enabling low complexity systems with high
value output at a rapid pace
‣ Easy projections of dimensional models
‣ So? Agile Data Warehousing?
37. Super Charge Your Data Warehouse
Available on Amazon.com
Soft Cover or Kindle Format
Now also available in PDF at
LearnDataVault.com
Hint: Kent is the Technical
Editor