1. Dimensional Fact Model
Stuttgart, 26/11/2014
Stefano Cazzella @StefanoCazzella
http://caccio.blogdns.net
http://bimodeler.com
stefano.cazzella{at}gmail.com
BI ACADEMY Launch@Germany - Stuttgart, 26/11/2014 - Stefano Cazzella 1
2. Complexity in SE and IS development
The art of programming is the art of
organizing complexity, of mastering
multitude and avoiding its bastard chaos as
effectively as possible.
– Edsger Dijkstra, “Notes on Structured Programming”
3. Project Layers
Business
• User requirements
• Conceptual model
Design
• Technical choices
• Logical model
Build
• Technology
• Physical model
4. Civil Engineering Example
Business
• What the client wants
Design
• The technical blueprint
Build
• The desired building
5. Model-driven engineering
PIM (platform-independent model)
• Business centric
• No technical details
Model transformation
PSM (platform-specific model)
• Technical design
• System architecture
Model transformation
Build
• System deliverables
• System realization
6. Project Layers for Data Mart
Business
• Dimensional Fact Model
Design
• Relational model
Build
• DBMS-specific DDL
7. Why Dimensional Fact Model?
• Formal language with a well-specified syntax and an unequivocal interpretation (semantics) based on a sound algebraic definition
• Simple and effective graphical notation (representation)
• Specifically defined to represent multi-dimensional models
• Does not imply any technical/implementation choice
8. Multi-dimensional model
The SALES event: on Nov. 25th, 2014, Store 2 sold 10 pieces of Product X for a total revenue of € 220.
[Figure: 3-dimensional SALES hyper-space with axes Product (Product X, Product Y, Product Z), Store (Store 1, Store 2, Store 3) and Day; the point for this event carries Units sold: 10 pieces and Revenue: € 220.]
9. SALES Hyper-cube
[Figure: SALES hyper-cube with axes Days (Nov. 23, 2014 to Nov. 26, 2014), Stores and products (Product X, Product Y, Product Z); each cell holds the measures. The cell for the SALES event (on Nov. 25th, 2014, Store 2 sold 10 pieces of Product X for a total revenue of € 220) holds Units sold: 10 pieces and Revenue: € 220.]
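The hyper-cube idea can be sketched in a few lines of Python: each cell is addressed by one coordinate per dimension and stores the measures of the event. This is only an illustration; the names `sales_cube` and `roll_up` are hypothetical, and every cell other than the Store 2 / Product X event is invented sample data.

```python
from collections import defaultdict

# Each cell of the SALES hyper-cube is addressed by one coordinate per
# dimension (day, store, product) and holds the measures of that event.
sales_cube = {
    ("2014-11-25", "Store 2", "Product X"): {"units_sold": 10, "revenue": 220.0},
    # The two cells below are invented sample data, not taken from the slides.
    ("2014-11-25", "Store 1", "Product X"): {"units_sold": 4, "revenue": 88.0},
    ("2014-11-26", "Store 2", "Product Y"): {"units_sold": 3, "revenue": 45.0},
}

DIMENSIONS = ("day", "store", "product")


def roll_up(cube, keep):
    """Aggregate the cube by summing the measures over the dimensions not in `keep`."""
    positions = [DIMENSIONS.index(name) for name in keep]
    totals = defaultdict(lambda: {"units_sold": 0, "revenue": 0.0})
    for coordinates, measures in cube.items():
        key = tuple(coordinates[i] for i in positions)
        totals[key]["units_sold"] += measures["units_sold"]
        totals[key]["revenue"] += measures["revenue"]
    return dict(totals)


by_product = roll_up(sales_cube, keep=("product",))
print(by_product[("Product X",)])  # {'units_sold': 14, 'revenue': 308.0}
```

Summing is used here because both sample measures are additive; a measure such as an average duration would need a different aggregation rule.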
11. Data Mart building process
Business user’s needs
• Requirements definition
• Multidimensional data model (Dimensional Fact Model)
Technical knowledge
• Model transformation
• Logical data model (relational model: tables, columns, etc.)
Implementation strategy
• Deployment transformation
• Physical data model (DDL with indexes, partitions, etc.)
Data Mart
12. Data Mart building process
Business user’s needs
• Requirements definition: formalize the user’s needs in a conceptual (business-centric) model, then …
• Result: multidimensional data model (Dimensional Fact Model)
Technical knowledge
• Model transformation: … transform it into a logical model integrating technical specifications and best practices …
• Result: logical data model (relational model: tables, columns, etc.)
Implementation strategy
• Deployment transformation: … and transform it again into a physical model that realizes the business requirements
• Result: physical data model (DDL with indexes, partitions, etc.)
Data Mart
13. Business - From requirement to DFM
• Context: weblog analytics - the analysis of the visits to several web sites belonging to different domains (e.g. Google Analytics)
• Requirement: monitoring and analyzing the number of visits and their monthly and daily average duration for each page of the websites, or each domain, distributed by the geographic region of the visitors' IP addresses.
+
Domain definition
Aggregation rules
Optional dependencies
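One way to make the step from requirement to conceptual model concrete is to write the fact, its hierarchies and its measures down as plain data structures. The sketch below is an assumption-laden reading of the weblog requirement, not the DFM diagram from the slide: the classes `Fact`, `Hierarchy` and `Measure`, the level names, and the aggregation rules are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Hierarchy:
    """A dimension with its attributes ordered from finest to coarsest level."""
    name: str
    levels: list


@dataclass
class Measure:
    """A numeric property of the fact and the aggregation rules allowed for it."""
    name: str
    aggregations: list


@dataclass
class Fact:
    name: str
    hierarchies: list = field(default_factory=list)
    measures: list = field(default_factory=list)


# Assumed reading of the requirement: visits analyzed by page (up to domain),
# by time (day and month) and by viewer (IP address up to geographic region).
visits = Fact(
    name="VISITS",
    hierarchies=[
        Hierarchy("Page", ["page", "website", "domain"]),
        Hierarchy("Time", ["day", "month"]),
        Hierarchy("Viewer", ["ip_address", "geographic_region"]),
    ],
    measures=[
        Measure("number_of_visits", ["SUM"]),
        Measure("duration", ["AVG"]),
    ],
)

for hierarchy in visits.hierarchies:
    print(hierarchy.name, "->", " -> ".join(hierarchy.levels))
```

Nothing in this structure mentions tables, keys or data types, which is the point of keeping the conceptual model free of implementation choices.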
14. Design choice
Reference ROLAP model:
• Star schema (denormalized dimension tables)
• Snowflake (hierarchies implemented by tables in 3NF)
Hierarchy implementation strategy (for every dimension)
• Use natural key (the dimension attribute PK column)
• Use surrogate key (add a new column with no business meaning)
• Use a slowly changing dimension (SCD) of type 2
• Use implicit dimension (no dimension table, only a column in the fact table)
Domain to data type association
• Text VARCHAR(250) ; Currency NUMBER(9,2) ; etc.
Standard naming conventions and abbreviations
• Table name prefix (D for Dimensions, F for Facts) ; Number NBR ; etc.
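The last two design choices (domain to data type association and naming conventions) lend themselves to simple lookup tables. The following sketch assumes a hypothetical pair of helpers, `table_name` and `column_definition`; only the Text and Currency types, the D/F prefixes and the NBR abbreviation come from the slide, and the Integer entry is an invented example.

```python
# Assumed mapping tables: only Text, Currency, the D/F prefixes and the NBR
# abbreviation come from the slide; the Integer entry is illustrative.
DOMAIN_TYPES = {
    "Text": "VARCHAR(250)",
    "Currency": "NUMBER(9,2)",
    "Integer": "NUMBER(10)",  # assumption
}
TABLE_PREFIXES = {"dimension": "D", "fact": "F"}
ABBREVIATIONS = {"NUMBER": "NBR"}


def table_name(kind, business_name):
    """Build a physical table name such as F_VISITS from a business name."""
    return f"{TABLE_PREFIXES[kind]}_{business_name.upper().replace(' ', '_')}"


def column_definition(business_name, domain):
    """Apply the standard abbreviations, then attach the data type of the domain."""
    words = business_name.upper().split()
    words = [ABBREVIATIONS.get(word, word) for word in words]
    return f"{'_'.join(words)} {DOMAIN_TYPES[domain]}"


print(table_name("fact", "visits"))                  # F_VISITS
print(column_definition("page title", "Text"))       # PAGE_TITLE VARCHAR(250)
print(column_definition("visits number", "Integer")) # VISITS_NBR NUMBER(10)
```

Keeping these rules in data rather than in hand-written DDL is what lets a model transformation apply them uniformly to every table and column.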
15. Transform the DFM into a Relational Model
Technical design choices:
• Reference ROLAP model: star schema
• Hierarchy Viewer: use a surrogate key
• Hierarchy Page: SCD type 2
• Hierarchy Time: denormalized with natural key
[Figure: model transformation from the DFM to the relational model, with callouts for the fact grain, the surrogate key, and the SCD-2 start date and end date.]
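A possible outcome of these design choices is sketched below as a star schema, created in an in-memory SQLite database purely so that the example runs anywhere. SQLite is a stand-in engine, and every column name (VISIT_DAY, VIEWER_ID, PAGE_URL, and so on) is an assumption; only the table name F_VISITS and the four design choices come from the slides.

```python
import sqlite3

DDL = """
CREATE TABLE D_TIME (
    VISIT_DAY   TEXT PRIMARY KEY,    -- natural key, e.g. '2014-11-25'
    VISIT_MONTH TEXT NOT NULL        -- denormalized parent level
);
CREATE TABLE D_VIEWER (
    VIEWER_ID  INTEGER PRIMARY KEY,  -- surrogate key
    IP_ADDRESS TEXT NOT NULL,
    REGION     TEXT
);
CREATE TABLE D_PAGE (
    PAGE_ID    INTEGER PRIMARY KEY,  -- surrogate key, one per page version
    PAGE_URL   TEXT NOT NULL,
    WEBSITE    TEXT NOT NULL,
    DOMAIN     TEXT NOT NULL,
    START_DATE TEXT NOT NULL,        -- SCD type 2 validity interval
    END_DATE   TEXT                  -- NULL marks the current version
);
CREATE TABLE F_VISITS (
    VISIT_DAY  TEXT    NOT NULL REFERENCES D_TIME (VISIT_DAY),
    VIEWER_ID  INTEGER NOT NULL REFERENCES D_VIEWER (VIEWER_ID),
    PAGE_ID    INTEGER NOT NULL REFERENCES D_PAGE (PAGE_ID),
    VISITS_NBR INTEGER NOT NULL,
    DURATION   REAL    NOT NULL,
    PRIMARY KEY (VISIT_DAY, VIEWER_ID, PAGE_ID)  -- the fact grain
);
"""

connection = sqlite3.connect(":memory:")
connection.executescript(DDL)
tables = [row[0] for row in connection.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)  # ['D_PAGE', 'D_TIME', 'D_VIEWER', 'F_VISITS']
```

Because D_PAGE follows SCD type 2, the same page can appear in several rows with different validity intervals, and the fact table points at the version that was current when the visit happened.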
16. Build choice
Choose the DBMS
• SQL Server – Oracle – Hive / Hadoop
Generate constraints?
• Generate unique keys / primary keys / integrity constraints (foreign keys)
Add specific indexes
• Add clustered indexes / column-store indexes / bitmap indexes / etc.
Define table partitions
• Organize fact tables in partitions (by hash, value, range, etc.)
Distribute data over multiple volumes
• Define file groups / tablespaces for tables, partitions, indexes
17. Physical model and DDL (1)
Implementation choices & best practices:
• DBMS: SQL Server
• Fact F_VISITS partitioned by year
• Column-store index on day and duration
• 2 distinct file groups for tables and indexes
[Figure: physical model and DDL, with callouts for the partition scheme and functions, the file groups, and the column-store index.]
18. Physical model and DDL (2)
Implementation choices & best practices:
• DBMS: Oracle
• Fact F_VISITS partitioned by year
• Bitmap index on viewer dimension
• 2 distinct table spaces for tables and indexes
[Figure: physical model and DDL, with callouts for the table spaces, the table partitions, and the bitmap index.]
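The two physical-model slides apply the same logical choice (partition F_VISITS by year, add a DBMS-specific index) to different engines. The sketch below shows how a deployment transformation might emit different DDL text for each; it is not the output of any tool, the statements have not been run against SQL Server or Oracle, and all object and column names (PF_VISIT_YEAR, FG_TABLES, TS_INDEXES, VISIT_YEAR, VISIT_DAY, VIEWER_ID) are assumptions. File-group placement of the SQL Server index is left out of the sketch.

```python
def physical_ddl(dbms, years):
    """Return DBMS-specific DDL text for partitioning and indexing F_VISITS.

    All object names and the VISIT_YEAR, VISIT_DAY and VIEWER_ID columns
    are assumptions made for this illustration.
    """
    years = sorted(years)
    if dbms == "sqlserver":
        # RANGE RIGHT: every boundary is the lowest value of its partition,
        # so the first year needs no boundary of its own.
        boundaries = ", ".join(str(year) for year in years[1:])
        return [
            "CREATE PARTITION FUNCTION PF_VISIT_YEAR (int) "
            f"AS RANGE RIGHT FOR VALUES ({boundaries});",
            # F_VISITS would then be created ON PS_VISIT_YEAR (VISIT_YEAR).
            "CREATE PARTITION SCHEME PS_VISIT_YEAR "
            "AS PARTITION PF_VISIT_YEAR ALL TO (FG_TABLES);",
            "CREATE NONCLUSTERED COLUMNSTORE INDEX IX_F_VISITS_CS "
            "ON F_VISITS (VISIT_DAY, DURATION);",
        ]
    if dbms == "oracle":
        partitions = ", ".join(
            f"PARTITION P{year} VALUES LESS THAN ({year + 1})" for year in years
        )
        return [
            # Clause to append to CREATE TABLE F_VISITS (...) TABLESPACE TS_TABLES.
            f"PARTITION BY RANGE (VISIT_YEAR) ({partitions})",
            "CREATE BITMAP INDEX IX_F_VISITS_VIEWER "
            "ON F_VISITS (VIEWER_ID) LOCAL TABLESPACE TS_INDEXES;",
        ]
    raise ValueError(f"unsupported DBMS: {dbms}")


for target in ("sqlserver", "oracle"):
    for statement in physical_ddl(target, [2013, 2014]):
        print(statement)
```

The logical model stays the same in both cases; only the last transformation step knows about partition functions, table spaces or bitmap indexes.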
19. BI Modeler
• To apply a model-driven approach, BI project teams need a software tool to:
Manage (draw) all the models - DFM, relational, etc.
Support (and drive) the model transformation process
• There were (and still are) not many tools able to do that, so in 2006 I started working on the development of …
http://bimodeler.com