Data Vault 2.0 is a data modeling methodology for building enterprise data warehouses. Dan Linstedt developed it in response to the shortcomings of earlier methodologies, such as the Kimball and Inmon methodologies, in managing large volumes of data from disparate sources.
2. DATA WAREHOUSING VS BIG DATA
• Does Big Data replace Data Warehousing? Or do I need both?
• What’s the difference:
  • Between the data flowing into a data warehouse vs. big data tools?
  • Between the ingestion processes and infrastructure?
• Data Lakes arrived with Big Data, so are they useful in Data Warehousing?
• How should I model my data in the EDW?
  • 3NF, Star Schema, the same as my operational data stores?
  • Data Vault 2.0
  • Graph Databases
• What architecture allows both to co-exist effectively?
5. DATA VAULT 2.0
COMMON FOUNDATIONAL WAREHOUSE ARCHITECTURE
• “The Data Vault Model is a detail oriented, historical tracking and uniquely linked set of normalized tables that support one or more functional areas of business. It is a hybrid approach encompassing the best of breed between 3rd normal form (3NF) and star schema. The design is flexible, scalable, consistent and adaptable to the needs of the enterprise” -- Dan Linstedt, Creator of Data Vault
• Data loaded as-is from sources, no edits or cleanup
• Append-only (no updates or deletes) to afford the highest load performance
• Agile & agnostic to changes in the operational store’s data model
• Essentially, a prescription for Layered Graph to Relational Mapping
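The “as-is, append-only” loading rules above can be sketched in a few lines. This is a minimal in-memory illustration, not Linstedt's specification: the names `md5_hex`, `SatelliteRow`, and `load_satellite` are hypothetical, and the MD5 hash keys and "hash diff" (a hash of the descriptive attributes used to detect change) are an assumed convention often paired with Data Vault 2.0, not something stated on the slide. A new satellite row is appended only when a key's attributes change; existing rows are never touched.

```python
import hashlib
from dataclasses import dataclass
from datetime import date


def md5_hex(*parts: str) -> str:
    """MD5 of trimmed, upper-cased values joined by '||' (assumed convention).

    Note: the same normalization makes the hash diff case-insensitive.
    """
    normalized = "||".join(p.strip().upper() for p in parts)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class SatelliteRow:
    hub_key: str        # hash of the parent hub's business key
    load_date: date     # when the row entered the vault
    record_source: str  # which source system delivered it
    hash_diff: str      # hash of the descriptive attributes, used to detect change
    attributes: dict    # the descriptive data, loaded as-is


def load_satellite(sat: list, business_key: str, attributes: dict,
                   load_date: date, record_source: str) -> bool:
    """Append a row only when attributes changed; existing rows are never updated or deleted."""
    hub_key = md5_hex(business_key)
    hash_diff = md5_hex(*(f"{k}={attributes[k]}" for k in sorted(attributes)))
    # Most recent version already stored for this key, if any.
    latest = max((r for r in sat if r.hub_key == hub_key),
                 key=lambda r: r.load_date, default=None)
    if latest is not None and latest.hash_diff == hash_diff:
        return False  # unchanged: nothing to append
    sat.append(SatelliteRow(hub_key, load_date, record_source, hash_diff, dict(attributes)))
    return True
```

Loading the same attributes twice appends nothing; a changed value appends a second row and leaves the first intact as history.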
6. DATA WAREHOUSING & DATA VAULT 2.0
• 60’s, 70’s, 80’s:
  • E.F. Codd => 3NF
  • Bill Inmon invents the Data Warehousing concept
  • Dr. Ralph Kimball popularizes Star Schema design
• 90’s, 00’s:
  • Dan Linstedt creates the Data Vault Model @ DOD
• 2014:
  • Dan introduces Data Vault 2.0
8. Source: “What are Graph Databases and Why should I care?”, by Dave Bechberger of Expero
14. SERVICED_BY
Diagram: Flight and Aircraft entities connected by SERVICED_BY, each shown with a record-source header and tabbed satellite data.

Flight
Record Source: Airport CAE | Load Date: 2018-11-17 | Source Id: 20181117-32-983
Satellite tabs: Base, Dest, Forecast
Record Source | LoadDate   | Depart | Gate
LGA           | 2018-10-11 | 1:25PM | B27
CAE           | 2018-10-24 | 3:30PM | A14
SFO           | 2018-09-06 | 8:55PM | G19
RDU           | 2018-08-12 | 4:45PM | C22

Aircraft
Record Source: United Airlines | Load Date: 2018-01-17 | Source Id: 2412c
Satellite tabs: Base, Service, FAA, NTSB
Record Source | LoadDate   | Model | Tailno
United        | 2017-02-11 | 767   | 1477
Delta         | 2015-11-04 | A6    | 2381
Alaska        | 2013-08-28 | 747   | 8312
Frontier      | 2016-07-19 | 182   | 1438

SERVICED_BY
Record Source: United Airlines | Load Date: 2018-09-17
Satellite tabs: Base, Dest, Manifest
Record Source | LoadDate   | Begin      | End
United        | 2017-02-11 | 2017-04-23 | 2017-09-23
Delta         | 2015-11-04 | 2015-12-01 | 2017-04-22
Alaska        | 2013-08-28 | 2013-09-14 | 2016-05-04
Frontier      | 2016-07-19 | 2016-08-02 | 2018-04-11

Legend: Hubs, Links, Satellites, Tab
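The Flight/Aircraft/SERVICED_BY example reads naturally as a property graph stored in three kinds of tables: hubs hold business keys, links hold the hub keys they connect, and satellites hold descriptive attributes (such as Begin/End) for a hub or a link. The sketch below is one hypothetical in-memory rendering of that layering; the `Vault` class, its method names, the MD5 hash keys, and the flight key "UA-100" are illustrative assumptions, while the tail number and Begin/End values come from the sample data.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import date


def hash_key(*business_keys: str) -> str:
    """MD5 over trimmed, upper-cased keys (an assumed Data Vault 2.0-style convention)."""
    joined = "||".join(k.strip().upper() for k in business_keys)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


@dataclass
class Vault:
    hubs: dict = field(default_factory=dict)        # hub name -> {key: (business key, load date, source)}
    links: dict = field(default_factory=dict)       # link name -> {key: (hub keys, load date, source)}
    satellites: dict = field(default_factory=dict)  # sat name -> [(parent key, load date, source, attrs)]

    def add_hub(self, hub: str, business_key: str, load_date: date, source: str) -> str:
        key = hash_key(business_key)
        # Hubs are insert-only: the first sighting of a business key is kept.
        self.hubs.setdefault(hub, {}).setdefault(key, (business_key, load_date, source))
        return key

    def add_link(self, link: str, hub_keys: tuple, load_date: date, source: str) -> str:
        # A link row is identified by the combination of hub keys it connects (a graph edge).
        key = hash_key(*hub_keys)
        self.links.setdefault(link, {}).setdefault(key, (hub_keys, load_date, source))
        return key

    def add_satellite(self, sat: str, parent_key: str, load_date: date,
                      source: str, attrs: dict) -> None:
        # Satellites only ever grow; history is kept by appending.
        self.satellites.setdefault(sat, []).append((parent_key, load_date, source, dict(attrs)))

    def neighbors(self, link: str, hub_key: str) -> list:
        """Traverse a link like an edge: keys of hubs connected to hub_key."""
        return [k for (keys, _, _) in self.links.get(link, {}).values() if hub_key in keys
                for k in keys if k != hub_key]


v = Vault()
flight = v.add_hub("flight", "UA-100", date(2018, 11, 17), "Airport CAE")  # hypothetical flight key
aircraft = v.add_hub("aircraft", "1477", date(2018, 1, 17), "United Airlines")
edge = v.add_link("serviced_by", (flight, aircraft), date(2018, 9, 17), "United Airlines")
v.add_satellite("serviced_by_manifest", edge, date(2017, 2, 11), "United",
                {"begin": "2017-04-23", "end": "2017-09-23"})
```

Keeping relationships in their own link tables is what lets a new relationship type be added without altering existing hub or satellite tables.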
15. • Organizations which design systems ... are constrained to produce designs which are copies of the communication structures of these organizations - Mel Conway
16. FLIGHT
Diagram: a subset of the slide 14 sample records (Flight departures, Aircraft models and tail numbers, Begin/End dates) repeated under the node labels FLIGHT, Aircraft, Airport, and Airline.
Legend: Hubs, Links, Satellites, Tab
18. • Modeled after self-organizing networks
• A Business Key identifies a key concept in business.
  • They have a business meaning
  • They are unique and have a very low propensity to change
  • Business keys change only when the business changes
• Enables (forces) cross-source modeling
Source: http://www.di.univr.it/documenti/OccorrenzaIns/matdid/matdid232240.pdf
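Because hub rows are keyed by the business key rather than by each source system's own ID, two sources describing the same aircraft land on the same hub row; this is what "enables (forces) cross-source modeling" means in practice. A minimal sketch, assuming a hypothetical normalization rule (trim, upper-case), MD5 hashing, and a "||" delimiter between the parts of a composite key; the feed rows and the FAA ID "N-0001" are made up for illustration.

```python
import hashlib


def normalize(part: str) -> str:
    # Assumed rule: business keys compare case- and whitespace-insensitively.
    return part.strip().upper()


def business_hash_key(*parts: str, delimiter: str = "||") -> str:
    """Deterministic hub key for a (possibly composite) business key."""
    joined = delimiter.join(normalize(p) for p in parts)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


# Two hypothetical feeds describing the same aircraft with different source-local IDs.
united_feed = [{"source_id": "2412c", "tailno": "1477"}]
faa_feed = [{"source_id": "N-0001", "tailno": " 1477 "}]

hub = {}
for source, feed in (("United Airlines", united_feed), ("FAA", faa_feed)):
    for row in feed:
        key = business_hash_key(row["tailno"])  # keyed on the business key, not source_id
        # First source to deliver a key is recorded; later sightings map to the same row.
        hub.setdefault(key, {"tailno": normalize(row["tailno"]), "record_source": source})
```

The delimiter matters for composite keys: without it, ("AB", "C") and ("A", "BC") would both hash the string "ABC" and collide.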
30. DATA WAREHOUSING
• Deep Topic
• 60’s, 70’s, 80’s:
  • E.F. Codd => 3NF
  • Bill Inmon invents the Data Warehousing concept
  • Dr. Ralph Kimball popularizes Star Schema design
• 90’s, 00’s:
  • Dan Linstedt creates the Data Vault Model @ DOD
• 2014:
  • Dan introduces Data Vault 2.0
• Data Warehouse vs Operational Data Stores
• Data Warehouse as Version Control System
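Treating the warehouse as a version control system means history is appended rather than overwritten, so any past state can be reconstructed, much like checking out an old commit. A minimal sketch of a point-in-time ("as of") lookup over an append-only, load-date-ordered history; the `Version` type, the `as_of` function, and the gate history are hypothetical.

```python
from bisect import bisect_right
from datetime import date
from typing import NamedTuple, Optional


class Version(NamedTuple):
    load_date: date
    attributes: dict


def as_of(history: list, when: date) -> Optional[dict]:
    """Return the attributes that were current on `when`, or None if not yet loaded.

    `history` must be sorted by load_date; rows are only ever appended.
    """
    dates = [v.load_date for v in history]
    i = bisect_right(dates, when)  # number of versions loaded on or before `when`
    return history[i - 1].attributes if i else None


gate_history = [  # hypothetical versions of one flight's gate
    Version(date(2018, 10, 11), {"gate": "B27"}),
    Version(date(2018, 10, 24), {"gate": "A14"}),
]
```

An operational store that updates the gate in place can only answer "what is the gate now"; the append-only history can also answer "what was the gate on October 20".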
BIG DATA
• MapReduce, 2004, Google: Jeffrey Dean and Sanjay Ghemawat, “MapReduce: Simplified Data Processing on Large Clusters”; GFS
• Nutch 2005, Hadoop 2006, 2007: Doug Cutting
• What exactly is “Big Data”?
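The MapReduce programming model can be illustrated in a single process: a map function emits key/value pairs, a shuffle groups them by key, and a reduce function folds each group. This sketches only the model, not Hadoop's distributed runtime or fault tolerance; the `map_reduce` helper and the route strings (counting airport mentions) are hypothetical.

```python
from collections import defaultdict
from itertools import chain
from typing import Callable, Iterable


def map_reduce(records: Iterable, mapper: Callable, reducer: Callable) -> dict:
    # Map: each record yields zero or more (key, value) pairs.
    pairs = chain.from_iterable(mapper(r) for r in records)
    # Shuffle: group values by key (a cluster does this across machines).
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    # Reduce: fold each key's values into one result.
    return {key: reducer(key, values) for key, values in groups.items()}


routes = ["LGA CAE", "CAE SFO", "SFO RDU"]  # hypothetical input records
counts = map_reduce(
    routes,
    mapper=lambda line: [(airport, 1) for airport in line.split()],
    reducer=lambda key, values: sum(values),
)
```

Because map calls are independent and each reduce sees only one key's values, both phases can be spread across many machines.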
33. ETL OR SERDE?
Diagram components: Client, User, Internet, Serializer, Kafka/Kinesis, event logs (Eventlog.e, Eventlog.d), Single Source (Version Locked), Deserializer, S3, Hadoop, Time Series, Event Record, Analysis.
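The slide asks whether to transform data in an ETL step or to capture events once through a serializer and let each consumer deserialize them from a single, version-locked log. A minimal sketch of such a serializer/deserializer (SerDe) pair, assuming JSON-lines encoding and a hypothetical `schema_version` field standing in for the version lock; the `Event` type and sample values are illustrative, and a real pipeline on Kafka or Kinesis might use a binary format with a schema registry instead.

```python
import json
from dataclasses import asdict, dataclass

SCHEMA_VERSION = 1  # hypothetical version pinned for the single source


@dataclass(frozen=True)
class Event:
    record_source: str
    load_date: str  # ISO-8601 date string
    payload: dict   # raw event data, kept as-is


def serialize(event: Event) -> bytes:
    """Encode one event as a JSON line tagged with the schema version."""
    body = {"schema_version": SCHEMA_VERSION, **asdict(event)}
    return (json.dumps(body, sort_keys=True) + "\n").encode("utf-8")


def deserialize(line: bytes) -> Event:
    """Decode one JSON line, rejecting versions this reader does not understand."""
    body = json.loads(line)
    version = body.pop("schema_version")
    if version != SCHEMA_VERSION:
        raise ValueError(f"unsupported schema_version {version}")
    return Event(**body)


log = b"".join(serialize(e) for e in [
    Event("United Airlines", "2018-09-17", {"tailno": "1477"}),
    Event("Airport CAE", "2018-11-17", {"gate": "A14"}),
])
events = [deserialize(line) for line in log.splitlines()]
```

Each consumer (time series, analysis, archival to S3) reads the same log and applies its own deserializer, so no consumer-specific transformation is baked into ingestion.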