The Data Warehouse plays a central role in any BI solution: it's the back end upon which everything in the coming years will be created. It must be flexible enough to support the fast changes needed by today's business, but also have a well-known and well-defined structure in order to support the "engineerization" of its development process, making it cost effective. In this full-day session, we will discuss architectural design details and techniques, Agile Modeling, unit testing, automation, and software engineering applied to a Data Warehouse project.
The only way to do this is to have a clear idea of its architecture, understanding the concepts of measures and dimensions, and a proven engineered way to build it so that quality and stability can go hand-in-hand with cost reduction and scalability. This will allow you to start your BI project in the best way possible, avoiding errors, making implementation effective and efficient, building the groundwork for a winning Agile approach, and helping you to define the way in which your team should work so that your BI solution will stand the test of time.
2. Davide Mauri
• Microsoft SQL Server MVP
• Works with SQL Server since 6.5, on BI since 2003
• Specialized in Data Solution Architecture, Database Design, Performance
Tuning, High-Performance Data Warehousing, BI, Big Data
• President of UGISS (Italian SQL Server UG)
• Regular Speaker @ SQL Server events
• Consulting & Training, Mentor @ SolidQ
• E-mail: dmauri@solidq.com
• Twitter: @mauridb
• Blog: http://sqlblog.com/blogs/davide_mauri/default.aspx
3. Agenda
• Why a Data Warehouse?
• The Agile Approach
• Modeling the Data Warehouse
• Engineering the Solution
• Building the Data Warehouse
• Unit Testing Data
• The Complete Picture
• After the Data Warehouse
• Conclusions
4. Workshop Motivation
• Give you a solid background on why a DWH and an Agile approach are needed
• Convince your boss
• Convince your team
• Convince your co-workers
• Understand how important engineering and automation are to make it happen
• See in practice how a DWH can be built in an Agile way
7. Where does the data come from?
• OLTP: Online Transaction Processing
• OLTP databases are built to support
• single fast select/insert/update/delete operations
• high concurrency
• data consistency (normalization)
• the “current” version of data: usually there is no need to keep historical information
• Many OLTP databases exist within a company
• Data is scattered all around the company
• Not all of it in a relational format!
8. Accessing Data Directly – The Principle
(Diagram: several OLTP systems feed, through metadata and an integration layer, a “magic infinite scale-out database machine”.)
9. Accessing Data Directly – The Reality
(Diagram: the same picture as before, but in reality the integration layer must also move and crunch the data.)
10. Accessing Data Directly – Summing Up
• PROS
• Always up to date
• No copies
• Minimal Storage (3NF or above)
• Isolation/security
• CONS
• May change too fast
• Performance Impact
• Slow queries
• Complex Schema (if one exists!)
• Low or No Coherence
• Scattered Data
• Historical information may be missing
11. Is it only a technical detail?
• Can’t Big Data, In-Memory and all the new stuff just fix any performance problem?
• The answer would be “yes” if a simple “container” of data were enough
• (a simple technical artifact to speed up queries)
• But much more than this is needed.
12. What is a DWH, really?
In this new era, data is like water.
Who will ever drink from
• untested,
• untrusted,
• uncertified
data?
13. What is a DWH, really?
• Would a manager or a decision maker make a decision based on data whose source, integrity and correctness he doesn’t know?
14. What is a DWH, really?
• The Data Warehouse is the place where managers and decision makers will look for
• Correct
• Trusted
• Updated
• data in order to make an informed, “conscious” decision
15. What is a DWH, really? (Metaphysically)
• The answer is now easy:
16. What is a DWH, really? (Physically)
• A place to store consolidated data coming from the whole company
• A place where data is cleansed, verified and certified
• A place where historical data is stored
• A place that holds the single version of truth (if there is one!)
• Forms the core of a BI solution
• User-friendly data models, designed to make data analysis easier
17. Modern Data Environment
(Diagram: structured data flows from Master Data and the EDW into Data Marts, feeding the BI environment and the decision maker; Big Data and unstructured data feed the analytics environment and the data scientist. Together they produce the “Data Juice”.)
18. Forrester Research Says That:
• “Business intelligence (BI) is a set of methodologies, processes,
architectures, and technologies that transform raw data into meaningful
and useful information. It allows business users to make informed
business decisions with (real-time) data that can put a company ahead
of its competitors”
• “Data warehouses form the back-end infrastructure”
20. EDW: Reality Check
• EDW is the trusted container of all company data
• It cannot be created in “one day”
• It has to grow and evolve with business needs.
• (Likely) It will never be 100% complete
23. Adapt to Survive
“50% of requirements change in the first year of
a BI project”
Andreas Bitterer, Research VP, Gartner
24. A new approach is needed
• Reduce the risk of misunderstanding
• Increase the chances of delivering a useful DW/BI project
• Deliver Quickly
• Immediately create value and get user feedback
• Deliver Frequently
• Prioritize
• Set Quick-Win Objectives (again, create value)
• Fail Fast (and Recover Quickly)
25. Agile Manifesto
• Our highest priority is to satisfy the customer through early and
continuous delivery of valuable software.
• Welcome changing requirements, even late in development.
Agile processes harness change for the customer's competitive
advantage.
• Business people and developers must work together daily
throughout the project.
26. Agile Manifesto
• The most efficient and effective method of conveying
information to and within a development team is face-to-face
conversation.
• Simplicity - the art of maximizing the amount of work not done -
is essential.
• Source: http://agilemanifesto.org/principles.html
28. Agile Project Startup
• Identify the principal Business Unit
• Define a small scope
• Do some very small analysis and design
• JEDUF / JITD
• Create a Prototype
• Let the users “play” with data
• Redefine the requirements
• Grow: build the definitive project
29. A prototype is mandatory!
• Start with small data samples
• Helps to understand the data
• MDM anyone?
• Helps to better estimate efforts
• Low data quality is usually the problem here
• Creates a bridge between developers and users
• Helps to check that the analysis is correct and the project is feasible
30. Prototype Outcomes
• Users will change/refocus their minds when they see the actual data
• You have probably forgotten something
• Usually «implied» (for the user) requirements
• You may have misestimated data sizes
31. Agile Project Lifecycle - 2
• Iterative Approach
• The general scope is known
• Not the details
• Anything can (and will) change
• Even already deployed objects
• Only the certified data must stay stable
• Otherwise the solution will lose credibility
(Cycle: Analyze → Develop → Test → Deploy → Feedback → Evolve)
32. Agile Project Best Practices
• “JIT” Modeling: don’t try to model everything right from the
beginning, but engineer everything so that it will be easy to
make changes
• Prioritize Requirements
• Short iterations (weeks ideally)
• Automate as much as you can
• Follow a Test Driven Approach: release only after having tests
in place!
• «If it ain’t tested, it’s broken» (TDD motto)
33. Don’t Fear the Change!
• Ability to Embrace Changes is a key value for the DW
• DW and Users will grow and evolve together
• Agility is a mindset more than anything else
• There is NO “Agile Product”
• There is NO “Agile Model”
• Agility allows you to fail fast (and recover quickly)
34. Agile Challenges
• Deliver Quickly and Frequently
• Challenge: keep high quality, no matter who’s doing the work
• Embrace Changes
• Challenge: don’t introduce bugs. Change the smallest part possible.
Use automatic Testing to preserve and assure data quality.
35. Taking the Agile Challenge
• To be Agile, some engineering practices need to be included in
our work model
• Agility != Anarchy
• Engineering:
• Apply well-known models
• Define & Enforce rules
• Automate and/or Check rules application
• Measure
• Test
36. Information is like Water
• How can you be sure that changes won’t introduce unexpected errors?
• Data Quality Testing is Mandatory!
• Unit Tests
• Regression Tests
• “Gate” Tests
37. Agile Vocabulary
• Agile introduces a lot of specific words
• Here’s a very nice and complete summary:
• https://www.captechconsulting.com/blog/ben-harden/learning-the-agile-vocabulary
38. Lean BI?
• Has the same objective as Agile BI: support business decisions in an ever-changing world
• Limit the different types of waste that occur in BI projects (Lean
Manufacturing),
• Focus on the interdependencies of systems (Systems
Thinking),
• Develop based on values and principles in the agile manifesto
(Agile Software Development).
• http://www.maxmetrics.com/goingagile/agile-bi-vs-lean-bi/
• http://www.b-eye-network.com/view/10264
40. Data Warehouse is Undefined
• Data Warehousing is still a young discipline
• It lacks basic definitions
• Data Warehouse
• Data Marts
• Few “universal” rules:
• much depends on the business being modeled
41. Data Mart or Data Warehouse?
• No “standard” definition, but usually
• «Data Marts» contain departmental data
• the «Data Warehouse» contains all data
• The “role” played by DM/DW depends on the approach used
• Inmon
• Kimball
• Data Vault is on the rise
• The latest kid on the block is “Anchor Modeling”
44. DW – Still Two Philosophies
• KIMBALL: Star Schema, Specialized Models, Model Once (Mart), User-Friendly
• INMON: Normal Forms, One Model, Model Twice (EDW/Mart)
• But, we agree:
1. There IS a model
2. It is relational(ish)
45. Which way?
• Inmon or Kimball?
• Both have pros and cons
• Of course the difference between the two is not limited to the Data Warehouse definition!
• Why not both?
• Avoid religious wars and take the best of both worlds
46. Facts about Normalizing
• It is expensive to
• Join (especially between large tables)
• Maintain referential integrity
• Build query plans
• It is very hard to
• Get consistently good query plans
• Make users understand >=3NF data
• Write the right query
• This is why we are careful about normalizing warehouses!
47. DW – Choose your side. Or not?
• Why not have a hybrid solution?
• Take the best from both worlds
• An Inmon DW that generates Kimball DMs
• Solution will grow and evolve to its final design
• Agility is the key: it has to be engineered into the solution
• Emergent Design
• https://en.wikipedia.org/wiki/Emergent_Design
48. Kimball approach…with an accent
• The Kimball approach is the most widely used
• Easy to understand
• Easy to use
• Efficient
• Well supported by tools
• Well known
• But the Inmon idea of having one physical DWH is very good
• Again, the advice is not to be too rigid
• Be willing to mix things and move from one to the other
• Be «Adaptive»
• My «perfect solution» is one that evolves towards an Inmon Data Warehouse used to generate Kimball Data Marts
49. Data Vault?
• A modeling technique often associated with Agile BI.
• That’s a myth: agility is not in the model, remember?
• Introduces the concepts of “Hubs”, “Links” and “Satellites” to split keys from their dependent values
• Optimized to keep history, not for query performance
• At the end of the day, it will map to Dimensions and Facts
50. Is a model forever?
• Surely not!
• We’re going to use ANY model that fits our needs.
• We’ll start with the Kimball+Inmon mix
• But always present a Dimensional Model to the end user
• Behind the scenes we can make the model evolve into anything we need: Data Vault, 100% Inmon… whatever
51. Data Warehouse Performance
• A Data Warehouse may need specific hardware or software to work at its best
• Due to the huge amount of data
• Due to complex queries
• Why does this happen?
• Data is usually stored at the highest level of detail in order to allow any kind of analysis
• Users usually need aggregated data
• Several specific solutions (logical and physical)
• Using an RDBMS or a mixture of technologies
52. Data Warehouse Performance
• Solutions built to support
• very fast reading of huge amounts of data
• analyzing data from multiple perspectives
• easy querying & reporting
• pre-aggregated data
• Specific technology
• Online Analytical Processing (OLAP) Multi-dimensional database
• Different storage flavors (MOLAP, ROLAP, HOLAP)
• In-Memory Technology
• Column-Store Approach
54. Hardware is a game changer!
(Screenshot taken from a Fast Track DWH)
Cloud can offer good performance too (but not yet up to this…)
55. Dimensional Modeling
• Modeling a database schema using Fact and Dimension entities
• Proposed and documented by Kimball (mid-nineties)
• Applicable to both relational and multidimensional databases
• SQL Server
• Analysis Services
• Focus on the end user
56. Defining Facts
• A fact is something that happened
• A product has been sold
• A contract has been signed
• A payment has been made
• Facts contain measurable data
• Product final price
• Contract value
• Paid amount
• The measurable data is called a Measure
• Within the DWH, facts are stored in Fact Tables
57. Defining Measures
• Measures are usually Additive
• It makes sense to sum up measure values
• E.g.: money amounts, quantities, etc.
• Semi-Additive data also exists
• Data that cannot be summed up across every dimension
• E.g.: an account balance can be summed across customers, but not over time
• Tools may have specific support for semi-additive measures
58. Defining Dimensions
• Dimensions define how facts can be analyzed
• They provide a meaning to the fact
• They categorize and classify the fact
• E.g.: Customer, Date, Product, etc.
• Dimensions have Attributes
• Attributes are the building blocks of a dimension
• E.g.: Customer Name, Customer Surname, Product Color, etc.
• Within the DWH, dimensions are stored in Dimension Tables
• Dimension Members are the values stored in dimensions
59. Dimensional Modeling
• Dimensional Modeling comes in two flavors
• Star Schema
• Snowflake Schema
• Star Schema
• Dimensions have a direct relationship with fact tables
• Snowflake Schema
• Dimensions may have an indirect relationship with fact tables
62. Star Schema
• Pros
• Easy to understand and to query
• Offers very good performance
• Well supported by SQL engines (e.g.: Star-Join optimization)
• Cons
• May require a lot of space
• Makes dimension updates and maintenance harder
• Somewhat rigid
63. Snowflake Schema
• Pros
• Less duplicate data
• Easier dimension update
• Flexibility
• Cons
• (Much) More complex to understand
• (Much) More complex to query
• In turn this means: more resource-hungry, slower, expensive
64. Snowflake or Star Schema?
• Feel free to design the Data Warehouse as you prefer, but present a Star Schema to the OLAP engine or to the end user
• Views will protect end users from model complexity
• Views will guarantee you all the flexibility you need to properly model your data
• Views will allow you to make changes in the future (e.g.: moving from Star to Snowflake)
• If in doubt, start with the Star Schema
• It is usually the preferred solution
• So start with this one; you can always change your mind later
• Remember, we embrace changes
65. Understand fact granularity
• Before doing physical design
• Understand fact granularity
• Understand if and how historical data should be preserved
• Granularity is the level of detail
• Granularity has to be agreed with SMEs and Decision Makers
• Data should be stored at the highest granularity
• Aggregation will be done later
• It must be defined both for facts and dimensions
66. Deal with changes in dimension data
• Two options:
• Keep only the last value
• Keep all the values
• Kimball has defined specific terminology
• the “Slowly Changing Dimension”
• A kind of architectural pattern (well known, universally recognized)
• Three types of SCD
• 1, 2 and 3
• Mixes of them
67. SCD Type 1
• Update all data to the last value
• Use Cases
• Correct erroneous data
• Make the past look like the present situation
• E.g.: a Business Unit changed its name
68. SCD Type 2
• Preserve all past values
• Use Cases
• Keep the information known at the time the fact occurred
• Avoid inconsistent analysis
69. SCD Type 3
• Preserve only the last valid value before the current one (the “previous” value)
• Use Cases
• I’ve never seen it in use
70. Other well-known objects
• Junk Dimensions
• Generic attributes that do not belong to any specific dimension
• They are grouped into one dimension to avoid having too many dimensions, since this may “scare” the final user
• Degenerate Dimensions
• Dimensions generated from the fact table
• E.g.: Invoice Number
71. Fact Table Types
• Kimball has defined two main types
• Transactional
• Snapshot
• Again, a kind of architectural pattern (well known, universally recognized)
• We proposed a new fact table type at PASS Summit 2011
• Temporal Snapshot
• http://www.slideshare.net/davidemauri/temporal-snapshot-fact-tables
72. Transactional Fact Table
• Used to store «transactional data»
• Sales
• Invoices
• Quantities
• Each row represents an event that happened at a specific point in time
73. Snapshot Fact Table
• Useful when you need to store inventory/stock/quote data
• Data that is *not* additive
• Stores the entire situation at a precise point in time
• a «picture of the moment»
• Expensive in terms of data usage
• Usually snapshots are at week level or above (month / semester etc.)
• Though column-oriented storage can help a lot here
74. Temporal Snapshot Fact Table
• A new approach to storing snapshot data without taking snapshots
• Each row represents not a point in time but a time interval
• It seems easy, but it’s a completely new way to approach the problem
• Brings Temporal Database theory into Data Warehousing
• Free PDF book online: http://www.cs.arizona.edu/people/rts/tdbbook.pdf
75. Temporal Snapshot Fact Table
• Allows the user to have daily (or even hourly) snapshots of data
• Avoids data explosion
• Look in the PASS 2011 DVDs, the SQL Bits 11 website (shorter version), or SlideShare (shorter version)
76. Many-to-Many relationships
• How do you manage M:N relationships between dimensions?
• e.g.: Books and Authors
• An additional table is (still) needed
• The table will not hold facts (in the BI meaning)
• Hence it will be a “factless” table
• or, better, a Bridge table
• The OLAP engine must support such a modeling approach
77. Bridge / Factless Tables
• Bookstore sample:
• The bridge table (usually) doesn’t contain facts… so it’s a factless table. It’s only used to store the M:N relationship.
• In reality, a fact table can also act as a bridge/factless table
(Diagram: the Sales fact table relates to the Book dimension; a factless/bridge table links the Book dimension to the Author dimension.)
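A minimal T-SQL sketch of the bookstore sample (all table and column names are hypothetical, following this deck’s own naming conventions):

```sql
-- Hypothetical bridge table: one row per (book, author) pair.
CREATE TABLE dwh.factless_book_authors
(
    id_dim_books    int      NOT NULL,  -- surrogate key of the Book dimension
    id_dim_authors  int      NOT NULL,  -- surrogate key of the Author dimension
    insert_time     datetime NOT NULL DEFAULT (GETDATE())
);

-- Sales by author: the fact joins the Book dimension,
-- the bridge fans the join out to the Author dimension.
SELECT a.author_name,
       SUM(f.total_amount) AS sales_amount
FROM dwh.fact_sales            AS f
JOIN dwh.factless_book_authors AS ba ON ba.id_dim_books  = f.id_dim_books
JOIN dwh.dim_authors           AS a  ON a.id_dim_authors = ba.id_dim_authors
GROUP BY a.author_name;
```

Note that the fan-out counts a multi-author book’s sales once per author; if that is not wanted, the usual fix is a weighting-factor column on the bridge.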
78. Generic Modeling Best Practices
• Don’t create too many dimensions
• Keep It Super Simple
• If you have a lot of attributes in a dimension and some are SCD1 and some SCD2, it may make sense to split the dimension in two
• If a dimension becomes huge (>1M rows), it’s worth analyzing how to split it into two or more dimensions
• Keep security in mind right from the very first steps
• since it may require you to change the way you model your Data Warehouse
80. Architecture is well known
• We now have the «architectural» elements of a BI solution
• Inmon / Kimball / Other
• Star Schema / Snowflake Schema
• Facts & Dimensions
• In some specific cases we also have well-known «Design Patterns»
• Slowly Changing Dimensions
81. Implementation is problematic
• So, from an architectural point of view, we can be happy. But from the implementation standpoint, what can we say?
• Each time we have to start from scratch
• Every person has his or her own way to implement the architectural solutions adopted
• The quality of the implementation is directly proportional to the experience of the implementer
82. Time lost in low-value work
• You lose a lot of time implementing “technical” stuff, time that is subtracted from finding the optimal resolution to the business problem
• E.g.: loading an SCD type 2. How much time will you spend on its development?
• From 2 to 10 days, depending on the experience you have
• And a minimum of 2 days is always there
• Since there are no standard implementation rules, everyone applies their own
• That works, but everyone’s are different
83. Choices
• In the development of a BI solution you will need to make a lot of choices in terms of architecture and implementation
• Every choice we make brings pros and cons
• It will impact the future of the solution
• How do you choose? Who chooses? Why? Are all the people on the team able to make autonomous choices?
• How can you be sure that all those choices do not conflict with each other?
• Especially when made by different people?
84. Reaching the goal - 1
• This is the situation
• Everyone follows his own path
• It would be better to work in harmony…
• …with common rules
(Diagram: many divergent paths towards the same target)
85. DW is TeamWork
• Problems arise when the team is made of several people
• One person works well alone
• «Geniuses» (or geniuses-wannabe) work well together
• We need to do an “exceptional” job with “normal” people: smart and willing, but “normal”
• A minimum quality must be “guaranteed” regardless of who does the work
• It must be easy to “scale” the number of people at work
• It must be easy to replace a person
• It’s vital to allow people to do what they do best: give added value to the solution. The “monkey work” should be as small as possible.
86. Software Engineering for BI
• «Software Engineering is the application of a systematic,
disciplined, quantifiable approach to the development,
operation, and maintenance of software, and the study of these
approaches; that is, the application of engineering to software”
IEEE Computer Society
87. With clear and well-defined rules…
• We’d like to have this!
• So, we need to formally define our rules for work
(Diagram: aligned paths towards the target)
88. Objectives
• What are the objectives we want to set?
• It must be possible to “change our mind” during development (and thus be independent of the initial architectural choices)
• Each person must be able to solve the given problem in a personal way, but the implementation of the solution should follow a common path
• Careless mistakes and errors due to repetitive processes should be minimized
• It must be possible to parallelize and (when possible) automate the work
• The solution must be testable
• It must have rigidity and flexibility at the same time
• It should be “adaptive”!
89. Achieve a common goal
• Everything must be designed to achieve a common goal:
• Spend more time finding the best solution to the business problem
• Spend (much) less time implementing the solution
• making as few mistakes as possible
• preventing common mistakes
• In other words, take the best from each player on the field
• Humans -> Added value: Intelligence
• Machines -> Added value: Automation
90. Engineering The Solution
• A set of rules that defines
• Naming conventions
• Mandatory objects / attributes
• Standard implementations of solutions to common problems
• Dependencies between objects
• Best practices and development methodology
• Each and every rule has a purpose:
• Prevent errors
• Set a standard
• Assure maintainability
• Help the team scale out
• Let developers concentrate more on solving the business problem and less on the implementation
91. Engineering The Solution
• All the rules presented here are born from real-world experience
• Following the Agile principle of simplicity
• Metadata is embedded in the rules
• Sometimes this brings some ugly solutions…
• …if you want to avoid this, external files/documents MUST be maintained
93. Engineering The Solution
• A BI solution has three main layers
• Producers
• Coordinators
• Consumers
• Producers layer
• Contains all the data sources
• Coordinators layer
• Contains all the objects that process source data into the Data Warehouse
• Consumers layer
• Where Data Warehouse data is consumed
94. Engineering The Solution
• A BI solution can be thought of as made of 3 different layers
• Data flows from, and only from, lower levels to higher levels
• Higher levels don’t know how data is managed in lower levels
• (Information Hiding Principle)
(Diagram: Producers → Coordinators → Consumers, bottom to top)
96. Engineering The Solution
(Diagram: OLTP SYS 1 and OLTP SYS 2 (Producers) feed Helper 1 and Helper 2, then Staging and the Data Warehouse, supported by the Configuration database (Coordinators); Cubes and Reports consume the Data Warehouse (Consumers).)
97. Databases
• Helper
• Contains objects that permit access to the data in the OLTP databases.
98. Databases
• Staging
• Contains intermediate “volatile” data
• Contains ETL procedures and support objects (like error tables)
99. Databases
• Configuration
• Objects that add additional value to the data (e.g.: lookup tables)
• Objects that allow the BI solution to be configurable, e.g. for which company to load data
100. Databases
• Data Warehouse
• The final data store
101. Databases
• Metadata
• Contains all the information needed to automate the creation and the
loading of
• Staging
• Data Warehouse
• Log
• Guess?
102. Databases
• Naming convention:
• projectname_*
• * = CFG, LOG, STG, DWH, MD, HLP
• Database files
• STG & DWH databases MUST be created with 2 filegroups (at least)
• PRIMARY (system catalogs)
• SECONDARY (all other tables); this is the default filegroup
• Strongly recommended for the other databases too
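A sketch of the filegroup rule (database, logical and physical file names are all made up):

```sql
-- Hypothetical STG database: PRIMARY for system catalogs only,
-- SECONDARY for all user tables.
CREATE DATABASE myproject_STG
ON PRIMARY
    (NAME = 'myproject_stg_sys',  FILENAME = 'D:\Data\myproject_stg_sys.mdf'),
FILEGROUP [SECONDARY]
    (NAME = 'myproject_stg_data', FILENAME = 'D:\Data\myproject_stg_data.ndf')
LOG ON
    (NAME = 'myproject_stg_log',  FILENAME = 'L:\Log\myproject_stg_log.ldf');

-- Make SECONDARY the default, so new tables land there
-- unless a filegroup is specified explicitly.
ALTER DATABASE myproject_STG MODIFY FILEGROUP [SECONDARY] DEFAULT;
```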
103. Schemas
• Schemas help to
• create logical boundaries
• distinguish object scopes
• Several schemas are used to identify the different scopes
• stg, etl, cfg, dwh, tmp, bi, err, olap, rpt
• an optional “util” schema to store utility objects
• e.g.: fn_Nums, a function to generate numbers
• A schema (generally) cannot be used in more than one database
• Prevents careless mistakes
105. Views
• Views are the key to abstraction
• They shield higher levels from the complexity of the underlying levels
• Used throughout the entire solution to reduce “friction” between layers and objects
• They apply the “Information Hiding Principle” (helps teams work in parallel)
• They help to auto-document the solution
106. Views
• General rules
• Do basic data preparation in order to simplify SSIS package development
• Casts
• Column renames
• Basic data filtering
• Simple data normalization and cleansing
• Table joins
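A minimal example of such a preparation view (source table, schema and column names are hypothetical):

```sql
-- Hypothetical view over a helper-layer customers table:
-- casts, renames, light cleansing and filtering, so the SSIS
-- package can stay a simple source -> destination data flow.
CREATE VIEW etl.vw_customers
AS
SELECT  CAST(c.CustID AS varchar(20))     AS bk_customer_code,  -- rename + cast
        LTRIM(RTRIM(c.CustName))          AS customer_name,     -- simple cleansing
        UPPER(c.Segment)                  AS customer_segment,  -- normalization
        CAST(c.LastModified AS datetime)  AS last_update
FROM    hlp.Customers AS c
WHERE   c.IsDeleted = 0;                                        -- basic filtering
```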
107. Stored Procedures
• Their usage should be very, very limited
• The majority of the ETL logic is in SSIS
• Usage
• Incremental load/management
• SCD loading (MERGE)
• Dummy member management
• An additional abstraction that helps to avoid changing SSIS packages
• for debugging (import one specific fact table row)
• for optimizations (e.g.: query hints)
• for ordering data
108. Basic Concepts
• A dimension will gather data from one or more data sources
• A dimension will hold the key value of each source entity (if available)
• the “Business Key”
109. Basic Concepts
• The Business Key won’t be used to relate the dimension to the fact table
• A surrogate key will be created during the ETL phase
• The surrogate key will be used to create the relationship
• The surrogate key has several advantages
• It is meaningless
• It is small
• It is independent from the data source
• It helps to make the fact table smaller
110. Why Integer Keys are Better
• Smaller row sizes
• More rows/page = more compression
• Faster to join
• Faster in column stores
111. Dimensions – Example
• Data comes from three tables: Departments, SubDepartments and Working Areas (sample model from a logistics company)
(Table screenshot: surrogate key, business keys, «payload» columns)
112. Dimensions – Key points
• A dimension is (usually) created using data coming from master data or reference tables
• OLTP PK/AK -> Business Key
• The dimension PK will be artificial and surrogate
113. SCD Type 1
• Scope
• Update data to last value
• Implementation
• UPDATE
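As a sketch (all names are hypothetical; the checksum column follows the scd1_checksum convention used in this deck’s dimension rules):

```sql
-- SCD Type 1: overwrite the attributes in place whenever the
-- SCD1 checksum computed on the staged row differs.
UPDATE d
SET    d.customer_name = s.customer_name,
       d.scd1_checksum = s.scd1_checksum,
       d.last_update   = GETDATE()
FROM   dwh.dim_customers AS d
JOIN   etl.vw_customers  AS s
       ON s.bk_customer_code = d.bk_customer_code
WHERE  d.scd1_checksum <> s.scd1_checksum;  -- touch only changed rows
```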
114. SCD Type 2
• Scope
• Keep all the past values as well as the current ones
• Implementation
• Row valid time + UPDATE + INSERT
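A minimal sketch of the “row valid time + UPDATE + INSERT” idea (table, column and valid_from/valid_to names are hypothetical):

```sql
-- Step 1 (UPDATE): close the current version of members whose
-- SCD2 payload changed, by stamping the end of their validity.
UPDATE d
SET    d.valid_to = GETDATE()
FROM   dwh.dim_customers AS d
JOIN   etl.vw_customers  AS s
       ON s.bk_customer_code = d.bk_customer_code
WHERE  d.valid_to IS NULL                   -- current version
  AND  d.scd2_checksum <> s.scd2_checksum;  -- payload changed

-- Step 2 (INSERT): add a new open-ended version for every member
-- with no current row (brand-new members and the ones just closed).
INSERT INTO dwh.dim_customers
       (bk_customer_code, customer_segment, scd2_checksum, valid_from, valid_to)
SELECT s.bk_customer_code, s.customer_segment, s.scd2_checksum, GETDATE(), NULL
FROM   etl.vw_customers AS s
WHERE  NOT EXISTS (SELECT 1
                   FROM dwh.dim_customers AS d
                   WHERE d.bk_customer_code = s.bk_customer_code
                     AND d.valid_to IS NULL);
```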
115. SCD Type 3
• Scope
• Keep the current value and the one before that only
• Implementation
• Specific Columns + UPDATE
116. SCD Key vs BK
• We defined the SCD Key as the key used to look up dimension data while loading the fact table
• It may not be made up of *ALL* the BKs
• It’s an ALTERNATE KEY (and thus is UNIQUE)
117. Hierarchies
• In our sample the dimension also holds a (natural) hierarchy
• Department > SubDepartment > Working Area
118. Things to keep in mind
• Huge dimensions (>1M members)
• Evaluate splitting them in two
• Dimensions with SCD1+SCD2 attributes
• Evaluate splitting them in two
• Security: keep it in mind from the beginning, since it may be a painful process if done afterwards
119. Dimension Rules
• Dimensions have to be created in
• Database: DWH - Schema: dwh
• Table rules
• Name: dim_<plural_dimension_name>
• Dimension key: id_<table_name>
• Surrogate / artificial key
• Business Key: prefixed by bk_
• Additional mandatory columns
• last_update (datetime) or log id (int)
• scd1_checksum / scd2_checksum
• only one or both, depending on SCD usage
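The rules above, applied to a hypothetical Customers dimension (column names and sizes are made up):

```sql
CREATE TABLE dwh.dim_customers
(
    id_dim_customers  int IDENTITY(1,1) NOT NULL      -- surrogate/artificial key
        CONSTRAINT pk_dim_customers PRIMARY KEY,
    bk_customer_code  varchar(20)   NULL,             -- business key (bk_ prefix)
    customer_name     nvarchar(100) NOT NULL,         -- SCD1-managed attribute
    customer_segment  nvarchar(50)  NOT NULL,         -- SCD2-managed attribute
    scd1_checksum     binary(20)    NULL,             -- change detection (SCD1)
    scd2_checksum     binary(20)    NULL,             -- change detection (SCD2)
    last_update       datetime      NOT NULL
        CONSTRAINT df_dim_customers_lu DEFAULT (GETDATE())
);
```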
120. Dimension Dummy Values
• Add at least one «dummy» value
• to represent “not available” data
• Dummy value rules
• Dimension key: negative number
• Business Key: NULL
• Fixed values for text and numeric data
• Text: “N/A” or “Not Available”
• Choose appropriate terms if more than one dummy exists
• Numeric: NULL
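Applying these rules to a hypothetical dwh.dim_customers table with an identity surrogate key:

```sql
-- Dummy member: negative surrogate key, NULL business key,
-- fixed “N/A” for text attributes (numeric attributes stay NULL).
SET IDENTITY_INSERT dwh.dim_customers ON;

INSERT INTO dwh.dim_customers
       (id_dim_customers, bk_customer_code, customer_name, customer_segment, last_update)
VALUES (-1, NULL, N'N/A', N'N/A', GETDATE());

SET IDENTITY_INSERT dwh.dim_customers OFF;
```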
121. Date Dimension
• The Date Dimension is an exception
• The key (id_dim_date) is not meaningless
• Integer data type
• Format: yyyymmdd
• This allows easier queries on the fact table and the use of negative dummy values for dummy members
• E.g.: Unknown Date, Erroneous Date, Invalid Date
• Doesn’t need the last_update and scd_checksum mandatory columns
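The smart key is just the date written as an integer, which T-SQL can produce directly (CONVERT style 112 is yyyymmdd):

```sql
-- yyyymmdd smart key for a given date.
DECLARE @d date = '2024-03-15';
SELECT CAST(CONVERT(char(8), @d, 112) AS int) AS id_dim_date;  -- 20240315
```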
122. Time Dimension
• The Time Dimension is also an exception
• The key (id_dim_time) is not meaningless
• Integer data type
• Format: hhmmss
• Doesn’t need the last_update and scd_checksum mandatory columns
• If drill-down between them is not mandatory, Date & Time should be two separate dimensions
123. Fact Tables
• More than one fact table may exist within the same DW solution
• Different granularity? Different fact table!
• It’s only important that they all use the same dimensions
• where applicable
• Example: Product Sales and Product Costs
• This allows coherent queries to be made
124. Transactional Fact Table
• «total_amount» can simply be summed up to get aggregated values for all possible combinations of dimension values
125. Snapshot Fact Table
• All data is stored for each snapshot taken
• The «Snapshot Date» is mandatory for almost all analyses
126. Temporal Snapshot Fact Table
• Each row represents an interval (at most one year wide)
• Underlying interval: 20090701->20090920
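One way to read such interval rows is to join them to the Date dimension on the interval, re-creating the daily snapshots only at query time (fact, dimension and column names are hypothetical):

```sql
-- Explode (product, interval) rows back into daily snapshot rows:
-- the yyyymmdd smart keys make the range predicate a plain BETWEEN.
SELECT d.id_dim_date,
       f.id_dim_products,
       f.stock_qty
FROM dwh.fact_stock AS f          -- one row per product per interval
JOIN dwh.dim_dates  AS d
     ON d.id_dim_date BETWEEN f.interval_from AND f.interval_to;
```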
127. Temporal Snapshot Fact Table
• Some real-world usage
• Using the Temporal Fact
• 148.380.542 rows that use 13 GB
• Without this technique we would have had
• 11.733.038.614 rows that would have used 1 TB of data
• And this is just for one month. So for one year we would have more than 10 TB of data.
128. Fact Tables
• Fact tables have to be created in
• Database: DWH - Schema: dwh
• Table rules
• Table: fact_<plural_fact_name>
• Fact key: id_[fact]_<table_name>
• Additional mandatory columns
• insert_time (datetime) or log id (int)
• Foreign keys to dimensions: not needed
• Put the business key columns of the source OLTP table into the fact table to ease debugging and error checking
• if the BKs are not too big
• Business Key: prefixed by bk_
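The rules above, applied to a hypothetical Sales fact table (columns and types are made up; note the yyyymmdd smart key for the date and the absence of FK constraints):

```sql
CREATE TABLE dwh.fact_sales
(
    id_fact_sales     bigint IDENTITY(1,1) NOT NULL,  -- fact key
    id_dim_date       int           NOT NULL,  -- smart date key (yyyymmdd)
    id_dim_customers  int           NOT NULL,  -- surrogate keys, no FK constraints
    id_dim_products   int           NOT NULL,
    bk_invoice_number varchar(20)   NULL,      -- source BK kept for debugging
    quantity          int           NOT NULL,
    total_amount      decimal(19,4) NOT NULL,
    insert_time       datetime      NOT NULL DEFAULT (GETDATE())
);
```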
129. Factless/Bridge Tables
• Factless/bridge tables have to be created in
• Database: DWH - Schema: dwh
• Table rules
• Table: factless_<plural_table_name>
• Factless key: not needed
• Foreign keys to dimensions: not needed
• Additional mandatory columns
• insert_time (datetime) or log id (int)
130. The DW Query Pattern
SELECT foo [..n], <aggregate>(something)
FROM dwh.fact F
JOIN dwh.dim_a A
ON F.id_a = A.id_a
JOIN dwh.dim_b B
ON F.id_b = B.id_b
WHERE <filter>
GROUP BY foo [..n]
131. The expected Relational Query Plan
• Fact table: ColumnStore Index (CSI) Scan
• Dimensions: Scan or Seek, feeding the hash-table (batch) builds
• Hash Joins between the fact and each dimension
• Partial Aggregate, plus a final Stream Aggregate
133. Loading the Data Warehouse
• Loading the DWH means doing ETL
• Extract data from data sources
• Databases, Files, Web Services, etc.
• Transform extracted data so that
• It can be cleansed and verified
• It can be enriched with additional data
• It can be placed into a star-schema
• Load data into the Data Warehouse
134. Loading the Data Warehouse
• ETL is usually the most complex and longest phase
• roughly 80% of the entire work is done here
• Integration Services is the engine we use to do ETL
• Very fast
• Completely in-memory
• 64-bit aware
• Very scalable
135. Loading the Data Warehouse
• SSIS does NOT substitute T-SQL
• T-SQL and set-based operations are still faster
• When possible avoid working on a per-row basis and favor «set-based» operations
• Just keep in mind that you have to deal with the t-log
• They are complementary and work together
• T-SQL: ideal for “simple” set-oriented data manipulation
• SSIS: ideal for complex, multi-stage data manipulation
• Advanced scripting through SSIS Expressions or .NET
136. Loading the Data Warehouse
• Integration Services and T-SQL play the major role here
• .NET help may be needed from time to time for complex transformations
• Our objective: create an ETL solution that is almost self-documenting
• It should be possible to understand what the ETL does just by «reading» the SSIS packages
• Following the KISS principle, avoid mixing ETL logic
• “Simple” ETL logic in views
• “Complex” ETL logic in SSIS packages
137. Loading the Data Warehouse
• SSIS will NEVER load data directly from a table
• ALWAYS go through a view
• Views decrease package complexity and keep packages loosely coupled to the database schema
• This will make SSIS development easier
• Simple filtering changes or joins can be changed here without having to touch SSIS
• SSIS packages are like applications!
• The only exception to this rule will be seen when loading fact and dimension tables
• The exception is made because there is a case where using a view does not decrease complexity
138. Divide et Impera
• To be Agile it is *vital* to keep the business and technical processes completely separated
• Business process: ETL logic that applies only to the specific solution you’re building
• Technical process: ETL logic that can be used with any Data Warehouse and that can be highly automated
139. Divide et Impera
• Follow the “Divide et Impera” principle
• Move data from OLTP to Staging
• Move data from Staging to Data Warehouse
• Create at least two different SSIS solutions
• One to load the Staging Database
• One to load the Data Warehouse Database
141. Loading the Data Warehouse – Step 1
• Diagram: OLTP and other data sources (files, web services, etc.) are extracted and loaded into STG, going through views in the HLP database
142. Loading the Data Warehouse – Step 1
• The first step is to load data into the staging database
• From the data sources
• NO “transformation” here, just load data as is
• In other words, create a copy of the OLTP data used in the BI solution
• Total, or partial in the case of incremental load
• This leaves us free to run complex ETL queries without interfering with production systems
• Only filter out data that by definition should not be handled by the BI solution
• Sample or test data
143. The “Helper” database
• Create views to expose the data that will be used to build the DWH
• Views are simple “SELECT columns FROM…”
• no data transformation allowed
• no casts, no column renaming, no data cleansing
• only filter out data that should never ever be imported into the DWH
• e.g.: customer id 999, which is the “test customer”
• Views have to be put in the bi schema
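A sketch of what such a view looks like (the table and column names are hypothetical): a bare SELECT with only the “never import” filter.

```sql
-- Hypothetical sketch of a helper view: no casts, no renaming,
-- only the filter for data that must never reach the DWH.
CREATE VIEW bi.vw_customers
AS
SELECT customer_id, customer_name, country_code
FROM dbo.customers
WHERE customer_id <> 999;  -- 999 is the "test customer"
```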
144. Loading the Data Warehouse – Step 2
• Diagram: within STG, ETL views and stored procedures transform the data through TMP and ERR tables, supported by the CFG database
145. Loading the Data Warehouse – Step 2
• The second step is to transform data so that it can be loaded into the Data Warehouse
• “Transform” can be a complex duty
• Transform = Cleanse, Check, De-Duplicate, Correct
• Data may have to go through several transformations in order to reach its final shape
• Intermediate values never leave the staging database
• Here is where you’ll spend most of your time
146. The “Configuration” database
• “Configuration” data
• Data not available elsewhere
• E.g.: lookup tables of “well-known” values
• E.g.: C1 -> Company 1, C2 -> Company 2
• Tables used to hold “configuration” data
• Use the cfg schema
147. The “Staging” Database
• Contains a copy of the OLTP data
• Only the needed data, of course
• Copying data is fast; this allows us to avoid using the OLTP database for too long
• Avoids concurrency problems
• All further work will be done on the BI server and won’t affect OLTP performance
• Data from the OLTP data source tables has to be copied into staging tables
• the tables must have the same schema as the OLTP tables
• staging tables have to be created in the staging schema
148. The “Staging” Database
• Contains intermediate tables used to transform the data
• Favor the usage of several intermediate tables (even if you’ll use more space) instead of doing everything in memory with SSIS
• This will make debugging/troubleshooting much easier!
• The correct balance to decide how many intermediate tables are needed has to be found on a per-project basis
149. The “Staging” Database
• Tables used to hold data coming from files
• E.g.: Excel, flat files
• Use the etl schema
• Tables used to hold intermediate data
• Use the tmp schema
• Objects used in the ETL phase
• Views, stored procedures, user-defined functions, etc.
• All these objects must be placed in the etl schema
150. The “Staging” Database
• Views prepare data to be further processed by SSIS
• SSIS reads data only from views
• Source view naming convention
• vw_<logical_name>
• E.g.: etl.vw_claims
• Destination table naming convention
• <logical_name>
• E.g.: tmp.claims
• If the ETL has to be done in more than one step
• append the «step_number» to the object name
• E.g.: etl.vw_claims_step_1, tmp.claims_step_1
151. The “Staging” Database
• Views take care of creating a “logical” view of dimension or fact data
• rename columns to give them a human-understandable meaning
• CAST data types in order to make them consistent with the ones used in the DWH
• perform basic data filtering and data re-organization
• e.g.: flatten hierarchies to “n” columns, trim white spaces
• perform basic ETL logic
• CASE statements, ROW_NUMBER, joins, etc.
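For example, a hypothetical etl.vw_claims could rename, CAST and apply basic logic in one place (source columns and decode values are made up for illustration):

```sql
-- Hypothetical sketch of an etl view: rename columns, CAST to the DWH
-- types, trim whitespace and decode coded values with a CASE.
CREATE VIEW etl.vw_claims
AS
SELECT
    CAST(c.clm_id AS int)            AS bk_claim_number,
    LTRIM(RTRIM(c.clm_desc))         AS claim_description,
    CAST(c.clm_amt AS decimal(19,4)) AS claim_amount,
    CASE c.clm_status
        WHEN 'O' THEN 'Open'
        WHEN 'C' THEN 'Closed'
        ELSE 'Unknown'
    END                              AS claim_status
FROM staging.claims AS c;
```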
152. The “Staging” Database
• ETL Stored procedures are used only to manage dimension
loading (SCD 1 or 2) and Dummy Members:
• Naming convention:
• etl.stp_merge_dim_<dimension target>
• etl.stp_add_dummy_dim_<dimension target>
153. The “Staging” Database
• The err schema contains tables that hold rows with errors that cannot be corrected or ignored (rows that cannot be processed)
• For example: you have a temporal database and for some rows you find that “Valid To” happens before “Valid From”
• This data can later be exposed to SMEs in order to fix it
• It is interesting to note that the BI solution becomes useful already in the middle of development
• It helps to increase data quality
154. Loading the Data Warehouse – Step 3
• Diagram: SSIS moves the transformed data from STG to DWH, reading through views and using stored procedures
155. Loading the Data Warehouse – Step 3
• The third step is the loading of the Data Warehouse
• Very simple: just take the transformed data from the staging database and put it into facts and dimensions
• Load all dimensions
• Generate dimension IDs
• Load fact tables
• “Just” convert business keys to dimension IDs
• Not so easy
• Must handle incremental loading
• Mandatory for dimensions (otherwise you may have problems if reloaded data gets different dimension IDs)
• Would be nice also for facts
• More complex when you have «early arriving facts»/«late arriving dimensions»
156. Handling Dimension Keys
• Mapping source dimension keys (the BKs) to the surrogate dimension ID may be more complex than expected. You may encounter several key «pathologies»
• Composite Keys, Zombie Keys, Multi Keys, Dolly Keys
• A good way to solve these problems is to add an additional abstraction layer, using mapping tables
• Thomas Kejser has some very good posts on that here
• http://blog.kejser.org/tag/keys/
157. The “Data Warehouse” database
• The DWH database must contain only
• tables related to the DWH facts, factless tables and dimensions
• all tables must be in the dwh schema
• views to allow access to the physical tables
• use specific schemas to expose data to other tools
• use the olap schema for views used by SSAS
• use the rpt schema for views used by SSRS
• Add your own schema depending on the technology you use
• Or even create a Data Mart out of the Data Warehouse!
158. The “Data Warehouse” database
• Stored procedures
• If needed for reporting purposes, they must be put into the reporting schema
• No other use is allowed
159. The “Data Warehouse” database
• Dimension loading
• Always incremental
• With all the rules in place there is only one way to load them
• Of course there may be differences on a per-dimension basis
• But it is just like building a house: no two houses are identical, yet all are built following the same rules
• This means that it can be completely automated!
160. The “Data Warehouse” database
• Fact table loading
• Incremental would be nice
• But it may not be an easy task
• SQL Server 2008 CDC in the source can help a lot
• Sometimes just dropping and re-loading the facts is the most effective solution
• Rarely for the entire table
• More common with time-partitioning
• FAST load of fact tables:
• Drop and re-create indexes
• Remove compression and add it back later
• Load partitions in parallel
• A tool to automate partitioned table management exists
• SQL CAT Partition Management Tool
161. Improving DW Querying Performance
• Use ColumnStore indexes to speed up queries against the DW (if you’re not using other additional solutions)
• Try to keep Factless/Bridge tables as small as possible. A whitepaper details how to implement a «proprietary» compression that works extremely well:
• http://www.microsoft.com/en-us/download/details.aspx?id=137
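As a sketch (the table name is illustrative), a clustered columnstore index turns the fact table into the CSI Scan expected by the DW query pattern:

```sql
-- Sketch: a clustered columnstore index on a fact table speeds up
-- the typical scan-join-aggregate DW query pattern.
CREATE CLUSTERED COLUMNSTORE INDEX ccsi_fact_orders
ON dwh.fact_orders;
```

Note that clustered columnstore indexes require SQL Server 2014+; on 2012 only nonclustered (read-only) columnstore indexes are available.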
162. Tools that help
• Use the Multiple Hash component to calculate hash values
• http://ssismhash.codeplex.com/
• When looking up an SCD2 dimension, try to avoid the default Lookup transformation since it does not support FULL cache in this scenario. Matt Masson has a very good post on how to implement «Range Lookups»
• http://bit.ly/SSISRangeLookup
163. Integration Services Rules
• Avoid usage of the OLE DB Command in the Data Flow
• It’s just too slow; prefer a set-based solution
• Try to do as much transformation / operations as possible here and NOT in SSAS or SSRS
• In other words: avoid spreading the ETL process all around
• Always read from views
• Use of OPTION(RECOMPILE) is encouraged so that we can get optimal plans
• Except for the dimension loading lookup component
• (It doesn’t help to lower complexity)
164. Integration Services Rules
• Package naming convention
• Use the “setup_” prefix for all packages that contain logic that must be run first in order to be able to load data
• Use the “load_” prefix for all packages that load data into “final” tables
• E.g.: staging tables, dwh tables
• Use the “prepare_” prefix for all packages that transform data in order to make it usable by another transformation phase
• E.g.: tmp tables
• Use a sequence number (###)
• To group all independent packages
• To quickly identify package dependencies
165. Integration Services Rules - Staging
• load_DFKKKO, load_DFKKOP, load_BUT000, load_<xxxxxxxx>
• These packages are independent from each other and can be run simultaneously
• prepare_010_orders, prepare_010_customers
• Independent from each other and can be run simultaneously, but work on data loaded by the “load_” packages
• prepare_020_invoices, prepare_020_orders
• Independent from each other and can be run simultaneously, but work on data loaded by the previous “prepare_” packages
166. Integration Services Rules - DWH
• load_dim_time, load_dim_customers, load_dim_products, load_dim_categories, load_dim_geography
• load_fact_orders, load_fact_invoices, load_fact_costs
• load_factless_products_categories
• First load all dimensions
• Then load all facts
• Then load all factless tables
167. Integration Services Rules
• One “action” per package!
• With SQL Server 2012+ use shared connections and the «Project» deployment model
• Use one or more “master packages” to execute packages with the correct sequence / parallelism
• With previous versions, try to make sure that all packages of the same layer (STG or DWH) use the same connection managers
• In this way you can have only one configuration file to configure connections when running packages
• Don’t bother too much about logging
• SQL Server 2012+ has native support
• http://ssis-dashboard.azurewebsites.net/
• If using SQL Server 2005 or 2008/R2 use DTLoggedExec
• http://dtloggedexec.codeplex.com/
168. Building a DWH in 2013
• It is still an (almost) manual process
• A *lot* of repetitive low-value work
• No (or very few) standard tools available
169. How it should be
• Semi-automatic process
• “develop by intent”
• Define the mapping logic from a semantic perspective
• Source to Dimensions / Measures
• (Metadata, anyone?)
• Design the model and let the tool build it for you
CREATE DIMENSION Customer
FROM SourceCustomerTable
MAP USING CustomerMetadata
ALTER DIMENSION Customers
ADD ATTRIBUTE LoyaltyLevel
AS TYPE 1
CREATE FACT Orders
FROM SourceOrdersTable
MAP USING OrdersMetadata
ALTER FACT Orders
ADD DIMENSION Customer
171. Invest in Automation?
• Faster development
• Reduced costs
• Embrace changes
• Fewer bugs
• Increased solution quality, consistent throughout the whole product
172. Automation Pre-Requisites
• Split the process into two separate types of process
• What can be automated
• What can NOT be automated
• Create and impose a set of rules that defines
• How to solve common technical problems
• How to implement the identified solutions
173. No Monkey Work!
Let the people think and
let the machines do the
«monkey» work.
174. Design Pattern
“A general reusable
solution to a commonly
occurring problem within
a given context”
176. Design Pattern
• Specific SQL Server Patterns
• Change Data Capture
• Change Tracking
• Partition Load
• SSIS Parallelism
177. Engineering the DWH
• “Software Engineering allows and requires the formalization of the software building and maintenance process.”
178. Sample Rules
• Always include a «last_update» column
• Always log inserted/updated/deleted rows to the log.load_info table
• Use FNV1a64 for checksums
• Use views to expose data
• Dimension & fact views MUST use the same column names for lookup columns
179. Engineering the DWH
There are two intrinsic processes hidden in the development of a BI solution that must be allowed (or forced) to emerge.
180. Business Process
• Data manipulation, transformation, enrichment & cleansing logic
• Specific for every customer. Almost not automatable
181. Technical Process
• Application of data extraction
and loading techniques
• Recurring (pattern) in any
solution
• Highly Automatable
183. ETL Phases
• «E» and «L» must be
• Simple, easy and straightforward
• Completely automated
• Completely reusable
• «E» and «L» have ZERO value in a BI solution
• They should be done in the most economical way
188. Source Differential Load
• SQL Server 2012 has features that can help with incremental/differential load
• Change Data Capture
• Natively supported in SSIS 2012
• http://www.mattmasson.com/2011/12/cdc-in-ssis-for-sql-server-2012-2/
• Change Tracking
• An underused feature in BI… not as rich as CDC but MUCH simpler and easier
189. SCD 1 & SCD 2
Loading flow:
• Look up the dimension ID and MD5 checksum from the business key
• Calculate the MD5 checksum of the non-SCD-key columns
• If the dimension ID is NULL → insert the new members into the DWH
• Otherwise, if the checksums are different → store the rows into a temp table
• Merge the data from the temp table into the DWH
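For the SCD Type 1 case, the flow above boils down to a single MERGE; this sketch uses hypothetical table and column names, with the checksum pre-computed in the source view:

```sql
-- Hypothetical SCD Type 1 sketch: insert unknown business keys,
-- update rows whose checksum changed.
MERGE dwh.dim_customers AS d
USING etl.vw_customers AS s
    ON d.bk_customer_id = s.bk_customer_id
WHEN NOT MATCHED BY TARGET THEN
    INSERT (bk_customer_id, customer_name, scd_checksum, last_update)
    VALUES (s.bk_customer_id, s.customer_name, s.scd_checksum, GETDATE())
WHEN MATCHED AND d.scd_checksum <> s.scd_checksum THEN
    UPDATE SET d.customer_name = s.customer_name,
               d.scd_checksum  = s.scd_checksum,
               d.last_update   = GETDATE();
```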
190. SCD 2 Special Note
• Merge => UPDATE the validity interval + INSERT the new row
193. Parallel Load
• Logically split the work into several steps
• E.g.: load/process one customer at a time
• Create a «queue» table that stores information for each step
• Step 1 -> Load customer «A»
• Step 2 -> Load customer «B»
• Create a package that
• Picks the first step not already picked up
• Does the work
• Goes back to the first step
• Call the package «n» times simultaneously
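A sketch of the queue pick-up step (the table and columns are hypothetical): READPAST lets each parallel caller skip rows another instance has already locked, so every step is handed out exactly once.

```sql
-- Hypothetical sketch: atomically pick the next unprocessed step.
-- READPAST skips rows locked by other parallel packages.
UPDATE TOP (1) etl.load_queue WITH (ROWLOCK, UPDLOCK, READPAST)
SET    picked_up = 1,
       picked_at = GETDATE()
OUTPUT inserted.step_id, inserted.customer_code
WHERE  picked_up = 0;
```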
194. Other SSIS Specific Patterns
• Range Lookup
• Not natively supported
• Matt Masson has the answer in his blog
• http://blogs.msdn.com/b/mattm/archive/2008/11/25/lookup-pattern-range-
lookups.aspx
195. Metadata
• Provides context information
• Which columns are used to build/feed a dimension?
• Which columns are business keys?
• Which table is the fact table?
• How are facts and dimensions connected?
• Which columns are used?
196. How to manage Metadata?
• Naming convention
• Specific, ad hoc database or tables
• JSON
• Other (XML, files, etc.)
197. Naming Convention
• The easiest and cheapest
• No additional (hidden) costs
• No need to be maintained
• Never out-of-sync
• No documentation needed
• Actually, it IS PART of the documentation
• Imposes a standard
• Very limited in terms of flexibility and usage
198. Extended Properties
• Support most metadata needs
• No additional software needed
• Very verbose to use
• Developing a wrapper to make usage simpler is feasible and encouraged
199. Metadata Objects
• Dedicated ad-hoc database and tables
• As flexible as you need
• Maintenance overhead to keep metadata in sync with data
• Development of an automatic check procedure is needed
• DMVs can help a lot here
• Need a GUI to make them user-friendly
200. JSON
• Could be expensive to keep in sync
• A tool is needed, otherwise too much manual work
• User and developer friendly!
• VERY flexible
• If it grows too much, JSON.NET Schema may help
• Supported by Visual Studio
• And by SQL Server 2016
201. Automation Scenarios
• Run-time: «auto-configuring» packages
• Really hard to customize packages
• SSIS limitations must be managed
• E.g.: a Data Flow cannot be changed at runtime
• On-the-fly creation of packages may be needed
• Design-time: package generators / package templates
• Easy to customize the created packages
204. Useful Resources
• «STOCK» Tasks:
• http://msdn.microsoft.com/en-us/library/ms135956.aspx
• How to set Task properties at runtime:
• http://technet.microsoft.com/en-
us/library/microsoft.sqlserver.dts.runtime.executables.add.aspx
205. BIML – BI Markup Language
• Developed by Varigence
• http://www.varigence.com
• http://bimlscript.com/
• MIST: full-featured BIML IDE
• Free via BIDS Helper
• Support “limited” to SSIS package generation
• http://bidshelper.codeplex.com
207. Data Warehouse Unit Test
• Before releasing anything, the data in the DW must be tested
• A user has to validate a sample of the data
• (e.g.: total invoice amount of January 2012)
• That validated value will become the reference value
• Before each release, the same query will be executed again: if the result matches the expected reference value the test is green, otherwise the test fails
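As a sketch (the amount and the schema are made-up placeholders), the automated check is just the validated query compared against the stored reference value:

```sql
-- Hypothetical sketch: re-run the validated query and compare it
-- against the reference value approved by the user.
DECLARE @reference_value decimal(19, 4) = 1523476.18;  -- made-up placeholder

SELECT CASE
           WHEN SUM(f.total_amount) = @reference_value THEN 'green'
           ELSE 'failed'
       END AS test_result
FROM dwh.fact_invoices AS f
JOIN dwh.dim_date AS d
    ON f.id_dim_date = d.id_dim_date
WHERE d.[year] = 2012
  AND d.[month] = 1;
```

In practice, tools like NBi wrap exactly this kind of assertion so it can run as part of an automated test suite.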
208. Data Warehouse Unit Test
• Of course tests MUST be automated when possible
• Visual Studio
• BI.Quality (on CodePlex… now old)
• Based on NUnit
• NBi is the new way to go: http://www.nbi.io/ !
• Based on NUnit
• What to test?
• Structures
• Aggregated results
• Specific values of some «special» rule
• Fixed bugs/tickets
• Values in the various layers
211. Modern Data Environment - Details
• Diagram: data sources (files, web services, cloud/syndicated data, RDBMS, Master Data) are extracted into Staging and an Archive / Big Data store (kept for replay)
• The data is then standardised into facts and dimensions, transformed and aggregated
• Facts are copied and processed into Cubes, Marts and Virtual Marts (V-Mart), which are secured and exposed
214. What’s Next?
• Now that the DW is ready, any tool can be used to create a BI/Reporting solution on solid, simpler, user-friendly ground.
• Reporting
• Reporting Services / BusinessObjects / MicroStrategy / JasperReports
• Analysis
• Analysis Services, Cognos
• Power Pivot, QlikView, Tableau, Power BI
216. A Starting Point
• The presented content can be used as-is, or as a starting point to build your own framework
• Extend the content when it doesn’t fit your solution (for example: add additional databases, like «SYSCFG», if this helps you)
• Define your rules! Drive the tools, don’t be driven by them!
• Keep the layers separated and favor loose coupling (less «friction» to changes)
• Spread the idea of unit testing data, even if at the beginning it seems an expensive approach.
217. Real World Samples
• The presented content comes from on-the-field experience
• More than 40 (successful) projects using the proposed approach
• More than 2000 packages managed (biggest solution: 572 packages)
• Several teams involved (biggest team: 12 people)
• Several customers have grown their own standards starting from this
• Data coming from ANY source: SAP, Dynamics, DB2, text or Excel files
218. Some challenges faced
• Changed an entire accounting system, moving from one vendor to another
• The DWH and OLAP/Reporting solution were completely untouched; 2/3 of the budget saved
• Started with a full load only and added incremental load later
• Less than 5% of the Extract and Load logic changed (Transformations untouched)
• Created a solution in 3 months with a minimal set of features and evolved it into an enterprise data warehouse / BI solution
• Monthly delivery
• Never released bad data (helped to correct errors in the source systems)
• Helped an enterprise company reduce the time spent crunching data by 66%
219. Latest challenges faced
• Supported a *big* electronics retail company in creating their BI/DSS solution on their shiny new Dynamics CRM installation
• During the CRM development
• The first specification document for reporting was very “agile”…
• “What do you need?” — “Don’t know, but all of it”
DATA alone is not enough. It is like a raw material: it has to be processed in order to become INFORMATION, which drives the extraction and acquisition of KNOWLEDGE and ultimately allows people to make DECISIONS.
OLTP samples: ecommerce website, SAP, CRM, ERP, and so on
Usually OLTP databases are tied to a specific business purpose
Querying an OLTP database to analyze data and trends may not be a good idea:
The OLTP database is complex
Queries that analyze data are complex and will slow down your production system
The OLTP database schema may change unexpectedly
All the needed data may not be available in just one database
Data can be updated at any time, making «point-in-time» queries unreliable
“In a modern company, everyone is a Decision Maker.”
Data Juice
http://www.slideshare.net/davidemauri/data-juice
http://www.forrester.com/Topic+Overview+Business+Intelligence/fulltext/-/E-RES39218
A Data Warehouse is needed no matter which technology you’ll decide to use for your BI/DSS solution, since it is the spine of it!
Deliver Quickly: make BI a key asset for the company right from the beginning. The sooner people get data, the sooner they will learn more about their data. For example, it’s very easy to detect underestimated data quality or business process problems. BI can be a good help to start fixing and monitoring them, thus making the ROI tangible right from the start.
JEDUF: Just Enough Design Upfront
JITD: Just In Time Design
Unit Testing is a key topic in BI!
A little more detail on the sentence that states there is a lack of “Universal Rules”. The meaning is that it makes no sense to ask whether “this entity has been modeled correctly”. The answer is that the entity – let’s say, the Customer – has been modeled correctly if and only if it allows all the analysis that the business needs to do, in an efficient, fast and error-free way. It’s not possible to say that modeling the Customer with two or three tables is better than using just one table. It depends on the business needs, the amount of work required to implement that entity, the “friction” that such a model introduces (thus making changes harder), and so on.
On average the Kimball approach is the most used since it is:
Easy to understand
Easy to use
Efficient
Well supported by tools
Well known
But the idea of having one physical DWH is very good. Again, the advice is not to be too rigid: be willing to mix things or move from one approach to another… Be «Adaptive».
My «perfect solution» is an Inmon Data Warehouse used to generate Kimball Data Marts.
The solution will grow over time, so it may be created using one approach and then modified to another as time passes, in order to better serve business requirements. The idea of «change» is not something that has to be fought, but something that has to be «embraced». The BI solution must be able to accept changes.
“analyzing data from multiple perspectives”: this can also be rephrased as «analyzing data across all its possible categorizations»
“One solution is to move away from RDBMS for querying”: as usual this has pros and cons.
Pros:
An ad-hoc solution that gives the best performance
Very easy to use for the final user (a Data Analyst)
Cons:
It is another technology that people have to be trained on in order to use it effectively
More complex to use for the developer
Another solution is to stay with the RDBMS but optimize it for this purpose (Indexed Views, Parallel Data Warehouse, Column-Aligned Storage, …)
Focus on the end user: make life easier for whoever has to query the data for analytics purposes
Makes dimension updates and maintenance harder -> due to denormalization
Somewhat rigid -> again, due to denormalization it’s harder to update a dimension, since there is a lot of duplicate data that you have to deal with
SME = Subject Matter Experts
The fact table contains the Book dimension ID.
If a book is written by many authors we cannot create additional rows in the fact table, otherwise we would not correctly model reality and would get wrong results.
Sometimes the whole is not made of the sum of the single elements.
Keep security in mind right from the very first steps: we won’t go deep into security problems in this workshop but it’s very important to understand what kind of security requirements you have to follow
Underline that the mentioned point are exactly what’s needed to make a team working using an Agile approach
Information Hiding Principle: http://en.wikipedia.org/wiki/Information_hiding
Configuration:
Contains configuration objects
objects that add additional value to the data (e.g.: lookup tables)
objects that allow the BI solution to be configurable, like for which companies to load data
Staging:
Contains intermediate “volatile” data
Contains ETL procedures and support objects (like err tables)
Data Warehouse:
The final data store
Helper:
Contains objects that access the data from the OLTP database.
The dimension contains all the possible valid combinations of values in the three tables.
Type 3 is never used in reality.
“A hierarchy is a natural hierarchy when each attribute included in the user-defined hierarchy has a one-to-many relationship with the attribute immediately below it”
http://msdn.microsoft.com/en-us/library/ms174557.aspx
Don’t create too many dimensions (<20)
If you have a lot of attributes in a dimension and some are SCD1 and some SCD2, it may make sense to split the dimension in two
If a dimension becomes huge (>1M rows) it’s worth analyzing how to split it into two or more dimensions
Keep security in mind right from the very first steps
Since this may require you to change the way you model your Data Warehouse
Product Sales and Product Costs:
Shared dimensions: Product, Category
Non-shared dimensions: Customer
This allows, for example, calculating the gross margin
“Simple” means that you never need to use a temporary table to store intermediate data.
ALWAYS go through a view: this can also be read as “Views PREPARE data to be used by SSIS”
Other Data Sources => Excel, flat files, web services, etc.
Or even create a Data Mart out of the Data Warehouse: maybe you need specific aggregations, or to add specific data used by only one department