Dimensional Data Modeling – A Primer
Terry Bunio
Dimensional Data Modeling
A Primer
@tbunio
tbunio@protegra.com
bornagainagilist.wordpress.com
www.protegra.com
Agenda
• Data Modeling
• Relational vs Dimensional
• Dimensional concepts
– Facts
– Dimensions
• Complex Concept Introduction
• Why and How?
• My Top 10 Dimensional Modeling
Recommendations
What is Data Modeling?
Definition
• “A database model is a specification
describing how a database is
structured and used” – Wikipedia
• “A data model describes how the
data entities are related to each other
in the real world” - Terry
Data Model Characteristics
• Organize/Structure like Data Elements
• Define relationships between Data
Entities
• Highly Cohesive
• Loosely Coupled
Data Modeling- Chemistry
• I like to think about the similarities
between Data Modeling and
Chemistry
Data Modeling- Chemistry
• Organize items that share the same
characteristics
• Create standard abstractions to
represent characteristics
– Solid
– Liquid
– Gas
Data Modeling- Chemistry
• Molecules
– Define the relationships between and
within the standard abstractions
– Those relationships form patterns that can
be re-used and describe the behaviour of
the data in real life
Data Modeling- Chemistry
• Ultimately these abstractions, structures,
and patterns allow for the creation of a
model that:
– Allows for predictability
– Maximizes re-use and leverage
– Allows for flexibility and adaptability
– Describes reality
Database Design
Two design methods
• Relational
– “Database normalization is the process of organizing
the fields and tables of a relational database to
minimize redundancy and dependency. Normalization
usually involves dividing large tables into smaller (and less
redundant) tables and defining relationships between
them. The objective is to isolate data so that additions,
deletions, and modifications of a field can be made in just
one table and then propagated through the rest of the
database via the defined relationships.”
Two design methods
• Dimensional
– “Dimensional modeling always uses the concepts of facts
(measures), and dimensions (context). Facts are typically
(but not always) numeric values that can be aggregated,
and dimensions are groups of hierarchies and descriptors
that define the facts.”
Relational
• Relational Analysis
– Database design is usually in Third Normal
Form
– Database is optimized for transaction
processing. (OLTP)
– Normalized tables are optimized for
modification rather than retrieval
Normal forms
• 1st - Under first normal form, all occurrences of a
record type must contain the same number of
fields.
• 2nd - Second normal form is violated when a non-
key field is a fact about a subset of a key. It is only
relevant when the key is composite
• 3rd - Third normal form is violated when a non-key
field is a fact about another non-key field
Source: William Kent - 1982
Dimensional
• Dimensional Analysis
– Star Schema/Snowflake
– Database is optimized for analytical
processing. (OLAP)
– Facts and Dimensions optimized for
retrieval
• Facts – Business events – Transactions
• Dimensions – context for Transactions
– People
– Accounts
– Products
– Date
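The split between Facts and Dimensions can be sketched with a tiny star schema. This is a minimal illustration using Python's built-in sqlite3 module. The table names, column names, and sample rows are all assumptions made for the example and do not come from the slides.

```python
import sqlite3

# Hypothetical star schema: one fact table surrounded by two dimensions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,
    product_name TEXT NOT NULL
);
CREATE TABLE dim_date (
    date_key      INTEGER PRIMARY KEY,
    calendar_date TEXT NOT NULL,
    month         TEXT NOT NULL
);
CREATE TABLE fact_sales (
    sales_key   INTEGER PRIMARY KEY,
    date_key    INTEGER NOT NULL REFERENCES dim_date(date_key),
    product_key INTEGER NOT NULL REFERENCES dim_product(product_key),
    amount      REAL NOT NULL
);
INSERT INTO dim_product VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO dim_date VALUES
    (20130410, '2013-04-10', '2013-04'),
    (20130511, '2013-05-11', '2013-05');
INSERT INTO fact_sales (date_key, product_key, amount) VALUES
    (20130410, 1, 100.0), (20130410, 2, 50.0), (20130511, 1, 25.0);
""")

# The dimension supplies the label and grouping; the fact supplies the measure.
sales_by_product = conn.execute("""
    SELECT p.product_name, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON p.product_key = f.product_key
    GROUP BY p.product_name
    ORDER BY p.product_name
""").fetchall()
```

Every reporting query follows the same shape: start at the Fact, join outward to whichever Dimensions provide the filters and labels.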
Relational
• 3 Dimensions
• Spatial Model
– No historical components except for
transactional tables
• Relational – Models the one truth of
the data
– One account '11'
– One person 'Terry Bunio'
– One transaction of '$100.00' on April 10th
Dimensional
• 4 Dimensions
• Temporal Model
– All tables have a time component
• Dimensional – Models the data over
time
– Multiple versions of Accounts over time
– Multiple versions of people over time
– One transaction
• Transactions are already temporal
Kimball-lytes
• Bottom-up - incremental
– Operational systems feed the Data
Warehouse
– Data Warehouse is a corporate
dimensional model that Data Marts are
sourced from
– Data Warehouse is the consolidation of
Data Marts
– Sometimes the Data Warehouse is
generated from Subject area Data Marts
Inmon-ians
• Top-down
– Corporate Information Factory
– Operational systems feed the Data
Warehouse
– Enterprise Data Warehouse is a corporate
relational model that Data Marts are
sourced from
– Enterprise Data Warehouse is the source
of Data Marts
The gist…
• Kimball's approach is easier to
implement as you are dealing with
separate subject areas, but can be a
nightmare to integrate
• Inmon's approach requires more upfront
effort to avoid these consistency
problems, and so takes longer to
implement.
Facts
Fact Tables
• Contain the measurements or facts
about a business process
• Are thin and deep
• Usually represent a:
– Business transaction
– Business Event
• The grain of a Fact table is the level of
detail of the data recorded.
Fact Tables
• Contains the following elements
– Primary Key - Surrogate
– Timestamp
– Measure or Metrics
• Transaction Amounts
– Foreign Keys to Dimensions
– Degenerate Dimensions
• Transaction indicators or Flags
Fact Tables
• Types of Measures
– Additive - Measures that can be added
across all dimensions.
• Amounts
– Non Additive - Measures that cannot be
added across any dimension.
• Rates
– Semi Additive - Measures that can be
added across some dimensions.
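A semi-additive measure is easiest to see with an example. The sketch below assumes a daily account-balance snapshot, which is a common textbook case and not one taken from the slides. Balances can be summed across accounts on a single day, but summing them across days gives a meaningless number, so an average is used instead.

```python
# Hypothetical daily balance snapshots: (day, account, balance).
snapshots = [
    ("2013-04-01", "A", 100.0),
    ("2013-04-01", "B", 40.0),
    ("2013-04-02", "A", 120.0),
    ("2013-04-02", "B", 40.0),
]


def total_balance_on(day):
    """Summing across the account dimension for one day is meaningful."""
    return sum(balance for d, _, balance in snapshots if d == day)


def average_balance(account):
    """Across the time dimension a balance is averaged, not summed."""
    values = [balance for _, acct, balance in snapshots if acct == account]
    return sum(values) / len(values)
```

Here `total_balance_on` adds across accounts, while `average_balance` deliberately avoids adding across time.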
Fact Tables
• Types of Fact tables
– Transactional - A transactional table is the most basic
and fundamental. The grain associated with a
transactional fact table is usually specified as "one
row per line in a transaction".
– Periodic snapshots - The periodic snapshot, as the
name implies, takes a "picture of the moment", where
the moment could be any defined period of time.
– Accumulating snapshots - This type of fact table is
used to show the activity of a process that has a well-
defined beginning and end, e.g., the processing of
an order. An order moves through specific steps until
it is fully processed. As steps towards fulfilling the order
are completed, the associated row in the fact table is
updated.
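The accumulating snapshot differs from the other two types in that existing rows are updated. A minimal sketch, assuming a hypothetical order-fulfilment fact with one date column per process step (the table, columns, and order number are invented for illustration; the nullable step dates are a simplification):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE fact_order_fulfilment (
        order_number   TEXT PRIMARY KEY,
        ordered_date   TEXT NOT NULL,
        shipped_date   TEXT,   -- filled in when the step completes
        delivered_date TEXT
    )
""")
conn.execute(
    "INSERT INTO fact_order_fulfilment (order_number, ordered_date) "
    "VALUES ('SO-1', '2013-04-10')"
)

STEP_COLUMNS = ("shipped_date", "delivered_date")


def complete_step(order_number, column, on_date):
    """Record a completed step by updating the order's existing row."""
    # The column name is checked against a fixed whitelist before being
    # placed in the SQL text; the values are bound as parameters.
    if column not in STEP_COLUMNS:
        raise ValueError(f"unknown step column: {column}")
    conn.execute(
        f"UPDATE fact_order_fulfilment SET {column} = ? WHERE order_number = ?",
        (on_date, order_number),
    )


complete_step("SO-1", "shipped_date", "2013-04-12")
row = conn.execute("SELECT * FROM fact_order_fulfilment").fetchone()
```

A transactional fact would instead insert a new row for the shipment event and never touch the order row again.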
Special Fact Tables
• Degenerate Dimensions
– Degenerate Dimensions are Dimensions
that can typically provide additional
context about a Fact
• For example, flags that describe a transaction
• Degenerate Dimensions can either be
a separate Dimension table or be
collapsed onto the Fact table
– My preference is the latter
Special Fact Tables
• If Degenerate Dimensions are not
collapsed on a Fact table, they are
called Junk Dimensions and remain a
Dimension table
• Junk Dimensions can also have
attributes from different dimensions
– Not recommended
Dimensions
Dimension Tables
• Unlike fact tables, dimension tables
contain descriptive attributes that are
typically textual fields
• These attributes are designed to serve
two critical purposes:
– query constraining and/or filtering
– query result set labeling.
Source: Wikipedia
Dimension Tables
• Shallow and Wide
• Usually corresponds to entities that the
business interacts with
– People
– Locations
– Products
– Accounts
Time Dimension
• All Dimensional Models need a time
component
• This is either a:
– Separate Time Dimension
(recommended)
– Time attributes on each Fact Table
Dimension Tables
• Contains the following elements
– Primary Key – Surrogate
– Business Natural Key
• Person ID
– Effective and Expiry Dates
– Descriptive Attributes
• Includes de-normalized reference tables
Behavioural Dimensions
• A Dimension that is computed based
on Facts is termed a behavioural
dimension
Junk Dimensions
• A Junk Dimension can be a collection
of attributes associated to a Fact –
discussed earlier
• It can also be a common location to
store information for convenience
– I wouldn't recommend this
Mini-Dimensions
• Splitting a Dimension up due to the
activity of change for a set of
attributes
• Helps to reduce the growth of the
Dimension table
Slowly Changing Dimensions
• Type 1 – Overwrite the row with the
new values and update the effective
date
– Pre-existing Facts now refer to the
updated Dimension
– May cause inconsistent reports
Slowly Changing Dimensions
• Type 2 – Insert a new Dimension row with
the new data and new effective date
– Update the expiry date on the prior row
• Don't update old Facts that refer to the old
row
– Only new Facts will refer to this new Dimension
row
• Type 2 Slowly Changing Dimension
maintains the historical context of the data
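The Type 2 steps can be sketched as follows. This is a minimal illustration under stated assumptions: the `dim_person` table, its columns, the `9999-12-31` sentinel for an open-ended expiry date, and the half-open interval convention (effective date inclusive, expiry date exclusive) are all choices made for the example, not prescribed by the slides.

```python
import sqlite3

HIGH_DATE = "9999-12-31"  # assumed sentinel meaning "not yet expired"

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE dim_person (
        person_key     INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        person_id      TEXT NOT NULL,                      -- natural key
        city           TEXT NOT NULL,
        effective_date TEXT NOT NULL,
        expiry_date    TEXT NOT NULL
    )
""")


def apply_type2_change(person_id, city, change_date):
    """Expire the current row (if any), then insert the new version."""
    conn.execute(
        "UPDATE dim_person SET expiry_date = ? "
        "WHERE person_id = ? AND expiry_date = ?",
        (change_date, person_id, HIGH_DATE),
    )
    cursor = conn.execute(
        "INSERT INTO dim_person (person_id, city, effective_date, expiry_date) "
        "VALUES (?, ?, ?, ?)",
        (person_id, city, change_date, HIGH_DATE),
    )
    return cursor.lastrowid  # surrogate key that new Facts should reference


first_key = apply_type2_change("P-100", "Winnipeg", "2012-01-01")
second_key = apply_type2_change("P-100", "Calgary", "2013-04-10")
versions = conn.execute(
    "SELECT person_key, city, effective_date, expiry_date "
    "FROM dim_person ORDER BY person_key"
).fetchall()
```

Facts loaded before the change keep pointing at `first_key`, so reports on old transactions still show the old city.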
Slowly Changing Dimensions
• A type 2 change results in multiple
dimension rows for a given natural key
Slowly Changing Dimensions
• No longer do I have one row to
represent:
– Account 10123
– Terry Bunio
– Sales Representative 11092
• This changes the mindset and query
syntax to retrieve data
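The change in query syntax might look like the sketch below: a lookup by natural key now also needs an "as of" date to pick the right version. The table layout, sample rows, and the half-open date interval are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_person (
    person_key     INTEGER PRIMARY KEY,
    person_id      TEXT NOT NULL,
    city           TEXT NOT NULL,
    effective_date TEXT NOT NULL,
    expiry_date    TEXT NOT NULL
);
-- Two versions of the same person (hypothetical data).
INSERT INTO dim_person VALUES
    (1, 'P-100', 'Winnipeg', '2012-01-01', '2013-04-10'),
    (2, 'P-100', 'Calgary',  '2013-04-10', '9999-12-31');
""")


def person_as_of(person_id, as_of_date):
    """Return the version whose [effective, expiry) interval contains the date.

    ISO-8601 date strings compare correctly as plain text.
    """
    return conn.execute(
        "SELECT person_key, city FROM dim_person "
        "WHERE person_id = ? AND effective_date <= ? AND ? < expiry_date",
        (person_id, as_of_date, as_of_date),
    ).fetchone()
```

Calling `person_as_of("P-100", "2012-06-30")` returns the Winnipeg version, while any date on or after the change returns the Calgary one.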
Slowly Changing Dimensions
• Type 3 – The Dimension stores multiple
versions for the attribute in question
• This usually involves a current and
previous value for the attribute
• When a change occurs, no rows are
added but both the current and
previous attributes are updated
• Like Type 1, Type 3 does not retain full
historical context
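A Type 3 change can be sketched as a single UPDATE that shifts the current value into the previous column. The table, the column names, and the 'Unknown' placeholder for a missing previous value are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_person (
    person_key    INTEGER PRIMARY KEY,
    person_id     TEXT NOT NULL,
    current_city  TEXT NOT NULL,
    previous_city TEXT NOT NULL
);
INSERT INTO dim_person VALUES (1, 'P-100', 'Winnipeg', 'Unknown');
""")


def apply_type3_change(person_id, new_city):
    """Shift current to previous, then overwrite current. No row is added."""
    # The right-hand sides are evaluated against the row as it was before
    # the update, so previous_city receives the old current_city.
    conn.execute(
        "UPDATE dim_person SET previous_city = current_city, current_city = ? "
        "WHERE person_id = ?",
        (new_city, person_id),
    )


apply_type3_change("P-100", "Calgary")
apply_type3_change("P-100", "Toronto")
row = conn.execute("SELECT * FROM dim_person").fetchone()
row_count = conn.execute("SELECT COUNT(*) FROM dim_person").fetchone()[0]
```

After the second change the original value 'Winnipeg' is gone, which is the loss of full history noted above.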
Slowly Changing Dimensions
• You can also create hybrid versions of
Type 1, Type 2, and Type 3 based on
your business requirements
Type 1/Type 2 Hybrid
• Most common hybrid
• Used when you need history AND the
current name for some types of
statutory reporting
Frozen Attributes
• Sometimes you need to freeze
some attributes so that they are not
subject to Type 1, Type 2, or Type 3 changes
• Usually for audit or regulatory
requirements
Conformity
Recall - Kimball-lytes
• Bottom-up - incremental
– Operational systems feed the Data
Warehouse
– Data Warehouse is a corporate
dimensional model that Data Marts are
sourced from
– Data Warehouse is the consolidation of
Data Marts
– Sometimes the Data Warehouse is
generated from Subject area Data Marts
The problem
• Kimball's approach can lead to
Dimensions that are not conforming
• This is because separate
departments define what a client or
product is
– Sometimes their definitions do not agree
Conforming Dimension
• Definition of a conformed Dimension:
– A conformed dimension is a set of data
attributes that have been physically
referenced in multiple database tables using
the same key value to refer to the same
structure, attributes, domain values,
definitions and concepts. A conformed
dimension cuts across many facts.
• Dimensions are conformed when they
are either exactly the same (including
keys) or one is a perfect subset of the
other.
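The "exactly the same or a perfect subset" rule can be checked mechanically. A rough sketch, assuming each Dimension is loaded into a dict that maps surrogate key to a tuple of attribute values (this representation and the sample client data are invented for the example):

```python
def is_conformed(dim_a, dim_b):
    """True when the dimensions are identical or one is a subset of the other.

    Every key in the smaller dimension must exist in the larger one with
    exactly the same attribute values.
    """
    smaller, larger = sorted((dim_a, dim_b), key=len)
    return all(
        key in larger and larger[key] == attributes
        for key, attributes in smaller.items()
    )


# Hypothetical client dimensions owned by different departments.
corporate_client = {1: ("C-1", "Acme"), 2: ("C-2", "Globex")}
sales_client = {1: ("C-1", "Acme")}          # subset of corporate
finance_client = {1: ("C-1", "Acme Corp")}   # same key, different definition
```

In this data `sales_client` conforms to `corporate_client`, while `finance_client` does not because the two departments describe client 1 differently.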
If you take one thing away
• Ensure that your Dimensions are
conformed
Complexity
• Most textbooks stop here and only show
the simplest Dimensional Models
• Unfortunately, I've never run into a
Dimensional Model like that
Simple
More Complex
Real World
Complex Concept Introduction
• Snowflake vs Star Schema
• Multi-Valued Dimensions and Bridges
• Multi-Valued Attributes
• Factless Facts
• Recursive Hierarchies
Snowflake vs Star Schema
• These extra tables are termed
outriggers
• They are used to address real world
complexities with the data
– Excessive row length
– Repeating groups of data within the
Dimension
• I will use outriggers in a limited way for
repeating data
Multi-Valued Dimensions
• A Multi-Valued Dimension arises when a
Fact needs to connect to the same
Dimension more than once
– Primary Sales Representative
– Secondary Sales Representative
Multi-Valued Dimensions
• Two possible solutions
– Create copies of the Dimensions for each
role
– Create a Bridge table to resolve the many
to many relationship
Multi-Valued Dimensions
Bridge Tables
• Bridge Tables can be used to resolve any
many to many relationships
• This is frequently required with more
complex data areas
• These bridge tables need to be
considered a Dimension and they need
to use the same Slowly Changing
Dimension Design as the base Dimension
– My Recommendation
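A bridge table for the primary and secondary Sales Representative case might be sketched like this. The table names, the `rep_role` column, and the sample rows are assumptions made for the example; Slowly Changing Dimension columns are left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_sales_rep (
    rep_key  INTEGER PRIMARY KEY,
    rep_name TEXT NOT NULL
);
CREATE TABLE fact_sales (
    sales_key INTEGER PRIMARY KEY,
    amount    REAL NOT NULL
);
-- Bridge: one row per (sale, rep) pairing, with the rep's role on the sale.
CREATE TABLE bridge_sales_rep (
    sales_key INTEGER NOT NULL REFERENCES fact_sales(sales_key),
    rep_key   INTEGER NOT NULL REFERENCES dim_sales_rep(rep_key),
    rep_role  TEXT NOT NULL,
    PRIMARY KEY (sales_key, rep_key)
);
INSERT INTO dim_sales_rep VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO fact_sales VALUES (10, 100.0), (11, 60.0);
INSERT INTO bridge_sales_rep VALUES
    (10, 1, 'Primary'), (10, 2, 'Secondary'), (11, 2, 'Primary');
""")

sales_by_rep = conn.execute("""
    SELECT r.rep_name, SUM(f.amount)
    FROM fact_sales f
    JOIN bridge_sales_rep b ON b.sales_key = f.sales_key
    JOIN dim_sales_rep r ON r.rep_key = b.rep_key
    GROUP BY r.rep_name
    ORDER BY r.rep_name
""").fetchall()

total_sales = conn.execute("SELECT SUM(amount) FROM fact_sales").fetchone()[0]
```

The per-rep totals add up to more than `total_sales` because sale 10 is counted once for each rep. That double counting is the usual cost of a many-to-many bridge and needs to be handled in reports, for example by filtering on `rep_role`.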
Multi-Valued Attributes
• In some cases, you will need to keep
multiple values for an attribute or sets
of attributes
• Three solutions
– Outriggers or Snowflake (1:M)
– Bridge Table (M:M)
– Repeat attributes on the Dimension
• Simplest solution but can be hard to query
and causes long record length
Factless Facts
• Fact table with no metrics or measures
• Used for two purposes:
– Records the occurrence of activities.
Although no facts are stored explicitly, these
events can be counted, producing
meaningful process measurements.
– Records significant information that is not
part of a business activity. Examples of
conditions include eligibility of people for
programs and the assignment of Sales
Representatives to Clients
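A factless Fact for the Sales Representative assignment example could be sketched as a table holding only Dimension keys, where the measure comes from counting rows. The table name, key columns, and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Only foreign keys to Dimensions; there is no measure column.
CREATE TABLE fact_rep_assignment (
    date_key   INTEGER NOT NULL,
    rep_key    INTEGER NOT NULL,
    client_key INTEGER NOT NULL
);
INSERT INTO fact_rep_assignment VALUES
    (20130410, 1, 100), (20130410, 1, 101), (20130410, 2, 102);
""")

# The "measurement" is simply the number of rows per grouping.
clients_per_rep = conn.execute("""
    SELECT rep_key, COUNT(*)
    FROM fact_rep_assignment
    GROUP BY rep_key
    ORDER BY rep_key
""").fetchall()
```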
Hierarchies and Recursive
Hierarchies
• We would need a separate session to
cover this topic
• Solution involves defining Dimension
tables to record the Hierarchy with a
special solution to address the Slowly
Changing Dimension Hierarchy
• Any change in the Hierarchy can result
in needing to duplicate the Hierarchy
downstream
Why?
• Why Dimensional Model?
• Allows for a concise representation of
data for reporting. This is especially
important for Self-Service Reporting
– We reduced from 300+ tables in our
Operational Data Store to 40+ tables in
our Data Warehouse
– Aligns with real world business concepts
Why?
• The most important reason –
– Requires detailed understanding of the
data
– Validates the solution
– Uncovers inconsistencies and errors in the
Normalized Model
• Easy for inconsistencies and errors to hide in
300+ tables
• No place to hide when those tables are
reduced down
Why?
• Ultimately there must be a business
requirement for a temporal data
model and not just a spatial one.
• Although you could go through the
exercise to validate your
understanding and not implement the
Dimensional Data Model
How?
• Start with your simplest Dimension and Fact
tables and define the Natural Keys for them
– i.e. People, Product, Transaction, Time
• De-Normalize Reference tables to Dimensions
(And possibly Facts based on how large the
Fact tables will be)
– I place both codes and descriptions on the
Dimension and Fact tables
• Look to De-normalize other tables with the
same Cardinality into one Dimension
– Validate the Natural Keys still define one row
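De-normalizing a reference table onto a Dimension, and then validating the Natural Key, might look like this sketch. The person rows, the province reference table, and the 'Unknown' fallback description are all assumptions made for illustration.

```python
# Hypothetical normalized source data.
people = [
    {"person_id": "P-100", "name": "Terry", "province_code": "MB"},
    {"person_id": "P-101", "name": "Sam", "province_code": "ON"},
]
province_ref = {"MB": "Manitoba", "ON": "Ontario"}


def denormalize_person(person):
    """Carry both the code and its description onto the Dimension row."""
    description = province_ref.get(person["province_code"], "Unknown")
    return {**person, "province_desc": description}


def natural_key_is_unique(rows, key):
    """Check that the Natural Key still identifies exactly one row."""
    values = [row[key] for row in rows]
    return len(values) == len(set(values))


dim_person = [denormalize_person(person) for person in people]
```

Running `natural_key_is_unique(dim_person, "person_id")` after each merge is a cheap way to catch a join that accidentally multiplied rows.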
How?
• Don't force entities on the same
Dimension
– Tempting but you will find it doesn't
represent the data and will cause issues
for loading or retrieval
– Bridge table or mini-snowflakes are not
bad
• I don't like a deep snowflake, but shallow
snowflakes can be appropriate
• Don't fall into the Star-Schema/Snowflake Holy
War – Let your data define the solution
How?
• Iterate, Iterate, Iterate
– Your initial solution will be wrong
– Create it and start to define the load
process and reports
– You will learn more by using the data than
months of analysis to try and get the
model right
• Come to SDEC 13 if you want to hear
how our project technically did that
– Star Trek Theme
Top 10
1. Copy the design for the Time Dimension
from the Web. Lots of good solutions
with scripts to prepopulate the
dimension
2. Make all your attributes Not-Null. This
makes Self-Service Report writing easy
3. Create a single Surrogate Primary Key
for Dimensions – This will help to simplify
the design and table width
– These FKs get created on Fact tables !
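For recommendation 1, a prepopulation script for the Time Dimension can be very small. The sketch below is one possible shape and is not taken from any particular published solution. The column names and the YYYYMMDD integer key format are assumptions.

```python
from datetime import date, timedelta


def build_date_dimension(start, end):
    """Return one row per calendar day from start to end, inclusive."""
    rows = []
    current = start
    while current <= end:
        rows.append({
            "date_key": int(current.strftime("%Y%m%d")),  # e.g. 20130410
            "calendar_date": current.isoformat(),
            "year": current.year,
            "month": current.month,
            "iso_day_of_week": current.isoweekday(),  # Monday=1 .. Sunday=7
            "is_weekend": current.isoweekday() >= 6,
        })
        current += timedelta(days=1)
    return rows


date_rows = build_date_dimension(date(2013, 4, 10), date(2013, 4, 16))
```

Real Time Dimensions usually add fiscal periods and holiday flags, which depend on the organization and are left out here.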
Top 10
4. Never reject a record
– Create a Dummy Invalid record on each
Dimension. Allows you to store a Fact record
when the relationship is missing
5. Choose a Type 2 Slowly Changing
Dimension as your default
6. Use Effective and Expiry dates on your
Dimensions to allow for maximum
historical information
– If they are Type 2!
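Recommendation 4 can be sketched as a key lookup that falls back to the Dummy row. The surrogate key value -1, the lookup dict, and the sample SKUs are assumptions for the example. A real load would look keys up in the Dimension table.

```python
DUMMY_KEY = -1  # assumed surrogate key of the Dummy "Invalid" Dimension row

# Hypothetical lookup from Natural Key to surrogate key.
product_keys = {"SKU-1": 1, "SKU-2": 2}


def resolve_product_key(natural_key):
    """Never reject a Fact: unknown Natural Keys map to the Dummy row."""
    return product_keys.get(natural_key, DUMMY_KEY)


incoming = [("SKU-1", 100.0), ("SKU-9", 15.0)]  # SKU-9 has no Dimension row
fact_rows = [(resolve_product_key(sku), amount) for sku, amount in incoming]
```

The second Fact is still stored and its amount still appears in totals. It can be re-pointed to the correct Dimension row once the missing product arrives.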
Top 10
7. SSIS 2012 has some built-in
functionality for processing Slowly
Changing Dimensions – Check it out!
8. Add “Current_ind” and “Dummy_ind”
attributes to each Dimension to assist
in Report writing
9. Iterate, Iterate, Iterate
10. Read this book
Want More?
Whew! Questions?

Data Warehouse_Architecture.pptxData Warehouse_Architecture.pptx
Data Warehouse_Architecture.pptx
 
1 introductory slides (1)
1 introductory slides (1)1 introductory slides (1)
1 introductory slides (1)
 

Más de Terry Bunio

Uof m empathys role
Uof m empathys roleUof m empathys role
Uof m empathys roleTerry Bunio
 
Pr dc 2015 sql server is cheaper than open source
Pr dc 2015 sql server is cheaper than open sourcePr dc 2015 sql server is cheaper than open source
Pr dc 2015 sql server is cheaper than open sourceTerry Bunio
 
Ssrs and sharepoint there and back again - SQL SAT Fargo
Ssrs and sharepoint   there and back again - SQL SAT FargoSsrs and sharepoint   there and back again - SQL SAT Fargo
Ssrs and sharepoint there and back again - SQL SAT FargoTerry Bunio
 
A data driven etl test framework sqlsat madison
A data driven etl test framework sqlsat madisonA data driven etl test framework sqlsat madison
A data driven etl test framework sqlsat madisonTerry Bunio
 
SSRS and Sharepoint there and back again
SSRS and Sharepoint   there and back againSSRS and Sharepoint   there and back again
SSRS and Sharepoint there and back againTerry Bunio
 
Role of an agile pm
Role of an agile pmRole of an agile pm
Role of an agile pmTerry Bunio
 
Introduction to lean and agile
Introduction to lean and agileIntroduction to lean and agile
Introduction to lean and agileTerry Bunio
 
Pmi june 5th 2007
Pmi june 5th 2007Pmi june 5th 2007
Pmi june 5th 2007Terry Bunio
 
Pmi sac november 20
Pmi sac november 20Pmi sac november 20
Pmi sac november 20Terry Bunio
 
Iiba.november.09
Iiba.november.09Iiba.november.09
Iiba.november.09Terry Bunio
 
Sdec11 when user stories are not enough
Sdec11 when user stories are not enoughSdec11 when user stories are not enough
Sdec11 when user stories are not enoughTerry Bunio
 
Sdec09 kick off to deployment in 92days
Sdec09 kick off to deployment in 92daysSdec09 kick off to deployment in 92days
Sdec09 kick off to deployment in 92daysTerry Bunio
 
Sdec10 lean package implementation
Sdec10 lean package implementationSdec10 lean package implementation
Sdec10 lean package implementationTerry Bunio
 
Role of an agile Project Manager
Role of an agile Project ManagerRole of an agile Project Manager
Role of an agile Project ManagerTerry Bunio
 
Agile in different environments
Agile in different environmentsAgile in different environments
Agile in different environmentsTerry Bunio
 

Más de Terry Bunio (20)

Uof m empathys role
Uof m empathys roleUof m empathys role
Uof m empathys role
 
Ictam big data
Ictam big dataIctam big data
Ictam big data
 
#YesEstimates
#YesEstimates#YesEstimates
#YesEstimates
 
Pr dc 2015 sql server is cheaper than open source
Pr dc 2015 sql server is cheaper than open sourcePr dc 2015 sql server is cheaper than open source
Pr dc 2015 sql server is cheaper than open source
 
Breaking data
Breaking dataBreaking data
Breaking data
 
Ssrs and sharepoint there and back again - SQL SAT Fargo
Ssrs and sharepoint   there and back again - SQL SAT FargoSsrs and sharepoint   there and back again - SQL SAT Fargo
Ssrs and sharepoint there and back again - SQL SAT Fargo
 
A data driven etl test framework sqlsat madison
A data driven etl test framework sqlsat madisonA data driven etl test framework sqlsat madison
A data driven etl test framework sqlsat madison
 
SSRS and Sharepoint there and back again
SSRS and Sharepoint   there and back againSSRS and Sharepoint   there and back again
SSRS and Sharepoint there and back again
 
Role of an agile pm
Role of an agile pmRole of an agile pm
Role of an agile pm
 
Estimating 101
Estimating 101Estimating 101
Estimating 101
 
Introduction to lean and agile
Introduction to lean and agileIntroduction to lean and agile
Introduction to lean and agile
 
Pmi june 5th 2007
Pmi june 5th 2007Pmi june 5th 2007
Pmi june 5th 2007
 
Pmi sac november 20
Pmi sac november 20Pmi sac november 20
Pmi sac november 20
 
Iiba.november.09
Iiba.november.09Iiba.november.09
Iiba.november.09
 
Sdec11 when user stories are not enough
Sdec11 when user stories are not enoughSdec11 when user stories are not enough
Sdec11 when user stories are not enough
 
Sdec10 lean AMS
Sdec10 lean AMSSdec10 lean AMS
Sdec10 lean AMS
 
Sdec09 kick off to deployment in 92days
Sdec09 kick off to deployment in 92daysSdec09 kick off to deployment in 92days
Sdec09 kick off to deployment in 92days
 
Sdec10 lean package implementation
Sdec10 lean package implementationSdec10 lean package implementation
Sdec10 lean package implementation
 
Role of an agile Project Manager
Role of an agile Project ManagerRole of an agile Project Manager
Role of an agile Project Manager
 
Agile in different environments
Agile in different environmentsAgile in different environments
Agile in different environments
 

Último

How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 

Último (20)

How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 

Dimensional Data Modeling Essentials

  • 1. Dimensional Data Modeling – A Primer Terry Bunio
  • 4. Agenda • Data Modeling • Relational vs Dimensional • Dimensional concepts – Facts – Dimensions • Complex Concept Introduction • Why and How? • My Top 10 Dimensional Modeling Recommendations
  • 5. What is Data Modeling?
  • 6. Definition • “A database model is a specification describing how a database is structured and used” – Wikipedia
  • 7. Definition • “A database model is a specification describing how a database is structured and used” – Wikipedia • “A data model describes how the data entities are related to each other in the real world” - Terry
  • 8. Data Model Characteristics • Organize/Structure like Data Elements • Define relationships between Data Entities • Highly Cohesive • Loosely Coupled
  • 9. Data Modeling- Chemistry • I like to think about the similarities between Data Modeling and Chemistry
  • 11. Data Modeling- Chemistry • Organize items that share the same characteristics • Create standard abstractions to represent characteristics – Solid – Liquid – Gas
  • 13. Data Modeling- Chemistry • Molecules – Define the relationships between and within the standard abstractions – Those relationships form patterns that can be re-used and describe the behaviour of the data in real life
  • 15. Data Modeling- Chemistry • Ultimately this abstraction, structure, and patterns allow for the creation of model that: – Allows for predictability – Maximizes re-use and leverage – Allows for flexibility and adaptability – Describes reality
  • 17. Two design methods • Relational – “Database normalization is the process of organizing the fields and tables of a relational database to minimize redundancy and dependency. Normalization usually involves dividing large tables into smaller (and less redundant) tables and defining relationships between them. The objective is to isolate data so that additions, deletions, and modifications of a field can be made in just one table and then propagated through the rest of the database via the defined relationships.”
  • 18. Two design methods • Dimensional – “Dimensional modeling always uses the concepts of facts (measures), and dimensions (context). Facts are typically (but not always) numeric values that can be aggregated, and dimensions are groups of hierarchies and descriptors that define the facts”
  • 20. Relational • Relational Analysis – Database design is usually in Third Normal Form – Database is optimized for transaction processing. (OLTP) – Normalized tables are optimized for modification rather than retrieval
  • 21. Normal forms • 1st - Under first normal form, all occurrences of a record type must contain the same number of fields. • 2nd - Second normal form is violated when a non-key field is a fact about a subset of a key. It is only relevant when the key is composite. • 3rd - Third normal form is violated when a non-key field is a fact about another non-key field. Source: William Kent - 1982
  • 23. Dimensional • Dimensional Analysis – Star Schema/Snowflake – Database is optimized for analytical processing. (OLAP) – Facts and Dimensions optimized for retrieval • Facts – Business events – Transactions • Dimensions – context for Transactions – People – Accounts – Products – Date
  • 24. Relational • 3 Dimensions • Spatial Model – No historical components except for transactional tables • Relational – Models the one truth of the data – One account ‘11’ – One person ‘Terry Bunio’ – One transaction of ‘$100.00’ on April 10th
  • 25. Dimensional • 4 Dimensions • Temporal Model – All tables have a time component • Dimensional – Models the data over time – Multiple versions of Accounts over time – Multiple versions of people over time – One transaction • Transactions are already temporal
  • 27. Kimball-lytes • Bottom-up - incremental – Operational systems feed the Data Warehouse – Data Warehouse is a corporate dimensional model that Data Marts are sourced from – Data Warehouse is the consolidation of Data Marts – Sometimes the Data Warehouse is generated from Subject area Data Marts
  • 28. Inmon-ians • Top-down – Corporate Information Factory – Operational systems feed the Data Warehouse – Enterprise Data Warehouse is a corporate relational model that Data Marts are sourced from – Enterprise Data Warehouse is the source of Data Marts
  • 29. The gist… • Kimball’s approach is easier to implement as you are dealing with separate subject areas, but can be a nightmare to integrate • Inmon’s approach has more upfront effort to avoid these consistency problems, but takes longer to implement.
  • 30. Facts
  • 31. Fact Tables • Contains the measurements or facts about a business process • Are thin and deep • Usually is: – Business transaction – Business Event • The grain of a Fact table is the level of the data recorded.
  • 32. Fact Tables • Contains the following elements – Primary Key - Surrogate – Timestamp – Measure or Metrics • Transaction Amounts – Foreign Keys to Dimensions – Degenerate Dimensions • Transaction indicators or Flags
  • 33. Fact Tables • Types of Measures – Additive - Measures that can be added across any dimensions. • Amounts – Non Additive - Measures that cannot be added across any dimension. • Rates – Semi Additive - Measures that can be added across some dimensions.
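The three kinds of measure can be illustrated with a small sketch. The snapshot rows, account identifiers, and interest rates below are invented for illustration and do not come from the slides; balances stand in for a typical semi-additive measure.

```python
# Additive: transaction amounts can be summed across any dimension.
transaction_amounts = [100.0, 50.0, 20.0]
total_amount = sum(transaction_amounts)

# Hypothetical daily balance snapshot: (date, account, balance, interest_rate)
snapshot = [
    ("2013-04-10", "A-11", 100.0, 0.02),
    ("2013-04-10", "A-12", 300.0, 0.04),
    ("2013-04-11", "A-11", 150.0, 0.02),
    ("2013-04-11", "A-12", 300.0, 0.04),
]

# Semi-additive: balances add up across accounts for a single date...
balance_apr_11 = sum(bal for day, _, bal, _ in snapshot if day == "2013-04-11")

# ...but not across dates. For one account over time, average the
# balances (or take the last one) instead of summing them.
a11 = [bal for _, acct, bal, _ in snapshot if acct == "A-11"]
a11_average_balance = sum(a11) / len(a11)

# Non-additive: summing rates is meaningless. Recompute from components,
# here as a balance-weighted rate for one date.
apr_11 = [(bal, rate) for day, _, bal, rate in snapshot if day == "2013-04-11"]
weighted_rate = sum(bal * rate for bal, rate in apr_11) / sum(bal for bal, _ in apr_11)
```

Storing the components (balance and interest) rather than only the rate is one way to keep a non-additive measure recomputable at any grain.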
  • 34. Fact Tables • Types of Fact tables – Transactional - A transactional table is the most basic and fundamental. The grain associated with a transactional fact table is usually specified as "one row per line in a transaction". – Periodic snapshots - The periodic snapshot, as the name implies, takes a "picture of the moment", where the moment could be any defined period of time. – Accumulating snapshots - This type of fact table is used to show the activity of a process that has a well-defined beginning and end, e.g., the processing of an order. An order moves through specific steps until it is fully processed. As steps towards fulfilling the order are completed, the associated row in the fact table is updated.
  • 35. Special Fact Tables • Degenerate Dimensions – Degenerate Dimensions are Dimensions that can typically provide additional context about a Fact • For example, flags that describe a transaction • Degenerate Dimensions can either be a separate Dimension table or be collapsed onto the Fact table – My preference is the latter
  • 36. Special Fact Tables • If Degenerate Dimensions are not collapsed on a Fact table, they are called Junk Dimensions and remain a Dimension table • Junk Dimensions can also have attributes from different dimensions – Not recommended
  • 38. Dimension Tables • Unlike fact tables, dimension tables contain descriptive attributes that are typically textual fields • These attributes are designed to serve two critical purposes: – query constraining and/or filtering – query result set labeling. Source: Wikipedia
  • 39. Dimension Tables • Shallow and Wide • Usually corresponds to entities that the business interacts with – People – Locations – Products – Accounts
  • 41. Time Dimension • All Dimensional Models need a time component • This is either a: – Separate Time Dimension (recommended) – Time attributes on each Fact Table
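A separate Time Dimension is usually pre-populated with one row per calendar day. The following is a minimal sketch of such a generator; the column names and the `YYYYMMDD` integer key are assumptions chosen for illustration, not a design taken from the slides.

```python
from datetime import date, timedelta

def build_date_dimension(start, end):
    """Return one dictionary per calendar day from start to end, inclusive."""
    rows = []
    current = start
    while current <= end:
        rows.append({
            "date_key": int(current.strftime("%Y%m%d")),  # assumed key format
            "full_date": current.isoformat(),
            "year": current.year,
            "quarter": (current.month - 1) // 3 + 1,
            "month": current.month,
            "day_of_week": current.isoweekday(),  # 1 = Monday ... 7 = Sunday
            "is_weekend": current.isoweekday() >= 6,
        })
        current += timedelta(days=1)
    return rows

date_dim = build_date_dimension(date(2013, 1, 1), date(2013, 12, 31))
```

Real date dimensions often carry many more attributes (fiscal periods, holidays), which is why borrowing an existing script is attractive.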
  • 42. Dimension Tables • Contains the following elements – Primary Key – Surrogate – Business Natural Key • Person ID – Effective and Expiry Dates – Descriptive Attributes • Includes de-normalized reference tables
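The Fact and Dimension table elements listed so far can be put together in a small star schema. This sketch uses Python's built-in `sqlite3` module; the table names, column names, and sample rows are hypothetical and only meant to show where each element lives.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_account (
    account_key    INTEGER PRIMARY KEY,  -- surrogate primary key
    account_id     TEXT NOT NULL,        -- business natural key
    account_type   TEXT NOT NULL,        -- descriptive attribute
    effective_date TEXT NOT NULL,
    expiry_date    TEXT NOT NULL
);
CREATE TABLE fact_transaction (
    transaction_key INTEGER PRIMARY KEY,  -- surrogate primary key
    account_key     INTEGER NOT NULL REFERENCES dim_account(account_key),
    transaction_ts  TEXT NOT NULL,        -- timestamp
    amount          REAL NOT NULL,        -- additive measure
    reversal_flag   TEXT NOT NULL         -- flag collapsed onto the Fact
);
""")
conn.executemany("INSERT INTO dim_account VALUES (?, ?, ?, ?, ?)", [
    (1, "A-11", "Savings", "2013-01-01", "9999-12-31"),
    (2, "A-12", "Chequing", "2013-01-01", "9999-12-31"),
])
conn.executemany("INSERT INTO fact_transaction VALUES (?, ?, ?, ?, ?)", [
    (1, 1, "2013-04-10T09:00:00", 100.0, "N"),
    (2, 1, "2013-04-11T10:30:00", 50.0, "N"),
    (3, 2, "2013-04-11T11:00:00", 20.0, "N"),
])

# The Dimension supplies the label and grouping; the Fact supplies the measure.
totals = conn.execute("""
    SELECT d.account_type, SUM(f.amount)
    FROM fact_transaction f
    JOIN dim_account d ON d.account_key = f.account_key
    GROUP BY d.account_type
    ORDER BY d.account_type
""").fetchall()
```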
  • 43. Behavioural Dimensions • A Dimension that is computed based on Facts is termed a behavioural dimension
  • 44. Junk Dimensions • A Junk Dimension can be a collection of attributes associated to a Fact – discussed earlier • It can also be a common location to store information for convenience – I wouldn’t recommend this
  • 46. Mini-Dimensions • Splitting a Dimension up due to the activity of change for a set of attributes • Helps to reduce the growth of the Dimension table
  • 47. Slowly Changing Dimensions • Type 1 – Overwrite the row with the new values and update the effective date – Pre-existing Facts now refer to the updated Dimension – May cause inconsistent reports
  • 48. Slowly Changing Dimensions • Type 2 – Insert a new Dimension row with the new data and new effective date – Update the expiry date on the prior row • Don’t update old Facts that refer to the old row – Only new Facts will refer to this new Dimension row • Type 2 Slowly Changing Dimension maintains the historical context of the data
  • 49. Slowly Changing Dimensions • A type 2 change results in multiple dimension rows for a given natural key
  • 50. Slowly Changing Dimensions • I no longer have one row to represent: – Account 10123 – Terry Bunio – Sales Representative 11092 • This changes the mindset and query syntax to retrieve data
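A Type 2 change and the point-in-time lookup it forces on queries can be sketched as follows. The `SalesRepRow` class, the `job_title` attribute, the `9999-12-31` open-ended expiry date, and the half-open interval convention (effective date inclusive, expiry date exclusive) are all assumptions made for this illustration.

```python
from dataclasses import dataclass
from datetime import date

HIGH_DATE = date(9999, 12, 31)  # assumed marker for "still current"

@dataclass
class SalesRepRow:
    rep_key: int          # surrogate key, one per version
    rep_id: str           # natural key, shared by all versions
    job_title: str        # attribute tracked as Type 2
    effective_date: date
    expiry_date: date

def apply_type2_change(rows, rep_id, new_title, change_date):
    """Expire the current row for rep_id and append a new version."""
    current = next(r for r in rows
                   if r.rep_id == rep_id and r.expiry_date == HIGH_DATE)
    if current.job_title == new_title:
        return current  # nothing changed, so no new version
    current.expiry_date = change_date
    new_row = SalesRepRow(max(r.rep_key for r in rows) + 1, rep_id,
                          new_title, change_date, HIGH_DATE)
    rows.append(new_row)
    return new_row

def row_as_of(rows, rep_id, as_of):
    """Return the version valid on as_of: effective <= as_of < expiry."""
    return next(r for r in rows
                if r.rep_id == rep_id
                and r.effective_date <= as_of < r.expiry_date)

rows = [SalesRepRow(1, "11092", "Sales Representative", date(2010, 1, 1), HIGH_DATE)]
new_version = apply_type2_change(rows, "11092", "Sales Manager", date(2013, 4, 10))
before = row_as_of(rows, "11092", date(2012, 6, 1))
after = row_as_of(rows, "11092", date(2013, 4, 10))
```

Facts loaded before the change keep pointing at `rep_key` 1, while Facts loaded afterwards pick up `rep_key` 2, which is how the historical context is preserved.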
  • 51. Slowly Changing Dimensions • Type 3 – The Dimension stores multiple versions for the attribute in question • This usually involves a current and previous value for the attribute • When a change occurs, no rows are added but both the current and previous attributes are updated • Like Type 1, Type 3 does not retain full historical context
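The loss of history under Type 3 is easy to see in a small sketch. The `branch` attribute, its values, and the `previous_<attribute>` naming are hypothetical.

```python
def apply_type3_change(row, attribute, new_value):
    """Shift the current value into previous_<attribute>, then overwrite it."""
    updated = dict(row)
    updated[f"previous_{attribute}"] = row[attribute]
    updated[attribute] = new_value
    return updated

row = {"account_key": 1, "account_id": "10123",
       "branch": "Winnipeg", "previous_branch": None}
row = apply_type3_change(row, "branch", "Regina")
row = apply_type3_change(row, "branch", "Calgary")
# After the second change only "Calgary" and "Regina" remain on the row.
```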
  • 52. Slowly Changing Dimensions • You can also create hybrid versions of Type 1, Type 2, and Type 3 based on your business requirements
  • 53. Type 1/Type 2 Hybrid • Most common hybrid • Used when you need history AND the current name for some types of statutory reporting
  • 54. Frozen Attributes • Sometimes it is required to freeze some attributes so that they are not Type 1, Type 2, or Type 3 • Usually for audit or regulatory requirements
  • 56. Recall - Kimball-lytes • Bottom-up - incremental – Operational systems feed the Data Warehouse – Data Warehouse is a corporate dimensional model that Data Marts are sourced from – Data Warehouse is the consolidation of Data Marts – Sometimes the Data Warehouse is generated from Subject area Data Marts
  • 57. The problem • Kimball’s approach can lead to Dimensions that are not conforming • This happens because separate departments each define what a client or product is – Sometimes their definitions do not agree
  • 58. Conforming Dimension • A Dimension is said to be conforming if: – A conformed dimension is a set of data attributes that have been physically referenced in multiple database tables using the same key value to refer to the same structure, attributes, domain values, definitions and concepts. A conformed dimension cuts across many facts. • Dimensions are conformed when they are either exactly the same (including keys) or one is a perfect subset of the other.
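The "exactly the same or a perfect subset" rule can be turned into a simple check. This sketch simplifies the rule to row-level comparison: each Dimension copy is modelled as a dictionary from surrogate key to attribute tuple, and the sample data is invented.

```python
def is_conformed(dim_a, dim_b):
    """True when both copies are identical or one is a row-for-row subset."""
    small, large = sorted((dim_a, dim_b), key=len)
    return all(key in large and large[key] == attrs
               for key, attrs in small.items())

corporate = {1: ("A-11", "Savings"), 2: ("A-12", "Chequing"), 3: ("A-13", "Loan")}
mart_subset = {1: ("A-11", "Savings"), 2: ("A-12", "Chequing")}
mart_divergent = {1: ("A-11", "Savings"), 2: ("A-12", "Current")}

subset_ok = is_conformed(corporate, mart_subset)
divergent_ok = is_conformed(corporate, mart_divergent)
```

The divergent copy fails because the same key carries a different attribute value, which is the kind of disagreement between departments described on the previous slide.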
  • 59. If you take one thing away • Ensure that your Dimensions are conformed
  • 61. Complexity • Most textbooks stop here and only show the simplest Dimensional Models • Unfortunately, I’ve never run into a Dimensional Model like that
  • 65. Complex Concept Introduction • Snowflake vs Star Schema • Multi-Valued Dimensions and Bridges • Multi-Valued Attributes • Factless Facts • Recursive Hierarchies
  • 68. Snowflake vs Star Schema • These extra tables are termed outriggers • They are used to address real world complexities with the data – Excessive row length – Repeating groups of data within the Dimension • I will use outriggers in a limited way for repeating data
  • 69. Multi-Valued Dimensions • Multi-Valued Dimensions are when a Fact needs to connect more than once to a Dimension – Primary Sales Representative – Secondary Sales Representative
  • 70. Multi-Valued Dimensions • Two possible solutions – Create copies of the Dimensions for each role – Create a Bridge table to resolve the many to many relationship
  • 73. Bridge Tables • Bridge Tables can be used to resolve any many to many relationships • This is frequently required with more complex data areas • These bridge tables need to be considered a Dimension and they need to use the same Slowly Changing Dimension Design as the base Dimension – My Recommendation
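A Bridge table for the Primary and Secondary Sales Representative case might look like the sketch below, again using `sqlite3`. The table names, the `rep_group_key` grouping column, and the sample rows are assumptions, and the Slowly Changing Dimension columns recommended above are left out to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_sales_rep (
    sales_rep_key INTEGER PRIMARY KEY,
    rep_name      TEXT NOT NULL
);
CREATE TABLE bridge_rep_group (
    rep_group_key INTEGER NOT NULL,
    sales_rep_key INTEGER NOT NULL REFERENCES dim_sales_rep(sales_rep_key),
    rep_role      TEXT NOT NULL,
    PRIMARY KEY (rep_group_key, sales_rep_key)
);
CREATE TABLE fact_sale (
    sale_key      INTEGER PRIMARY KEY,
    rep_group_key INTEGER NOT NULL,
    amount        REAL NOT NULL
);
""")
conn.executemany("INSERT INTO dim_sales_rep VALUES (?, ?)",
                 [(1, "Rep A"), (2, "Rep B")])
conn.executemany("INSERT INTO bridge_rep_group VALUES (?, ?, ?)",
                 [(10, 1, "Primary"), (10, 2, "Secondary"), (20, 2, "Primary")])
conn.executemany("INSERT INTO fact_sale VALUES (?, ?, ?)",
                 [(1, 10, 100.0), (2, 20, 40.0)])

# Sales credited to each representative, through the Bridge.
per_rep = conn.execute("""
    SELECT r.rep_name, SUM(f.amount)
    FROM fact_sale f
    JOIN bridge_rep_group b ON b.rep_group_key = f.rep_group_key
    JOIN dim_sales_rep r ON r.sales_rep_key = b.sales_rep_key
    GROUP BY r.rep_name
    ORDER BY r.rep_name
""").fetchall()

# Total sales, straight from the Fact table.
grand_total = conn.execute("SELECT SUM(amount) FROM fact_sale").fetchone()[0]
```

Note that the per-representative figures add up to more than the grand total, because the first sale is credited to both representatives. Reports that total across a Bridge need to account for this.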
  • 74. Multi-Valued Attributes • In some cases, you will need to keep multiple values for an attribute or sets of attributes • Three solutions – Outriggers or Snowflake (1:M) – Bridge Table (M:M) – Repeat attributes on the Dimension • Simplest solution but can be hard to query and causes long record length
  • 75. Factless Facts • Fact table with no metrics or measures • Used for two purposes: – Records the occurrence of activities. Although no facts are stored explicitly, these events can be counted, producing meaningful process measurements. – Records significant information that is not part of a business activity. Examples of conditions include eligibility of people for programs and the assignment of Sales Representatives to Clients
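A Factless Fact holds only keys, so the "measure" comes from counting rows. A minimal sketch, with hypothetical key values for the assignment of Sales Representatives to Clients:

```python
from collections import Counter

# Each row is (date_key, client_key, sales_rep_key); there is no measure column.
assignments = [
    (20130410, 1, 7),
    (20130410, 2, 7),
    (20130411, 3, 8),
]

# Counting rows per Sales Representative produces the measurement.
clients_per_rep = Counter(rep_key for _, _, rep_key in assignments)
```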
  • 77. Hierarchies and Recursive Hierarchies • We would need a separate session to cover this topic • Solution involves defining Dimension tables to record the Hierarchy with a special solution to address the Slowly Changing Dimension Hierarchy • Any change in the Hierarchy can result in needing to duplicate the Hierarchy downstream
  • 78. Why? • Why Dimensional Model? • Allows for a concise representation of data for reporting. This is especially important for Self-Service Reporting – We reduced from 300+ tables in our Operational Data Store to 40+ tables in our Data Warehouse – Aligns with real world business concepts
  • 79. Why? • The most important reason – – Requires detailed understanding of the data – Validates the solution – Uncovers inconsistencies and errors in the Normalized Model • Easy for inconsistencies and errors to hide in 300+ tables • No place to hide when those tables are reduced down
  • 80. Why? • Ultimately there must be a business requirement for a temporal data model and not just a spatial one. • Although you could go through the exercise to validate your understanding and not implement the Dimensional Data Model
  • 81. How?
  • 82. How? • Start with your simplest Dimension and Fact tables and define the Natural Keys for them – i.e. People, Product, Transaction, Time • De-Normalize Reference tables to Dimensions (And possibly Facts based on how large the Fact tables will be) – I place both codes and descriptions on the Dimension and Fact tables • Look to De-normalize other tables with the same Cardinality into one Dimension – Validate the Natural Keys still define one row
  • 83. How? • Don’t force entities on the same Dimension – Tempting but you will find it doesn’t represent the data and will cause issues for loading or retrieval – Bridge table or mini-snowflakes are not bad • I don’t like a deep snowflake, but shallow snowflakes can be appropriate • Don’t fall into the Star-Schema/Snowflake Holy War – Let your data define the solution
  • 84. How? • Iterate, Iterate, Iterate – Your initial solution will be wrong – Create it and start to define the load process and reports – You will learn more by using the data than months of analysis to try and get the model right • Come to SDEC 13 if you want to hear how our project technically did that – Star Trek Theme
  • 86. Top 10 1. Copy the design for the Time Dimension from the Web. Lots of good solutions with scripts to prepopulate the dimension 2. Make all your attributes Not-Null. This makes Self-Service Report writing easy 3. Create a single Surrogate Primary Key for Dimensions – This will help to simplify the design and table width – These FKs get created on Fact tables !
  • 87. Top 10 4. Never reject a record – Create a dummy Invalid record on each Dimension. This allows you to store a Fact record when the relationship is missing 5. Choose a Type 2 Slowly Changing Dimension as your default 6. Use Effective and Expiry dates on your Dimensions to allow for maximum historical information – If they are Type 2!
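Recommendation 4 can be sketched as a key lookup that falls back to the dummy row instead of rejecting the Fact. The reserved key value `-1` and the sample natural keys are assumptions for illustration.

```python
DUMMY_KEY = -1  # assumed surrogate key reserved for the dummy Invalid row

# Natural key -> surrogate key for the rows currently in the Dimension.
account_dim = {"A-11": 1, "A-12": 2}

def resolve_account_key(natural_key):
    """Return the surrogate key, or the dummy key when no match exists."""
    return account_dim.get(natural_key, DUMMY_KEY)

# The second and third source rows have an unknown or missing account.
incoming = [("A-11", 100.0), ("A-99", 25.0), (None, 5.0)]
fact_rows = [{"account_key": resolve_account_key(natural_key), "amount": amount}
             for natural_key, amount in incoming]
```

Every incoming row is stored, so totals still reconcile with the source, and the rows pointing at the dummy key can be reported on and corrected later.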
  • 88. Top 10 7. SSIS 2012 has some built-in functionality for processing Slowly Changing Dimensions – Check it out! 8. Add “Current_ind” and “Dummy_ind” attributes to each Dimension to assist in Report writing 9. Iterate, Iterate, Iterate 10. Read this book