ITA BI Roundtable
    Dimensional Modeling: Organizing your Data for Analytics




    Jeff Block, Managing Consultant
    Jeff.Block@neudesic.com
    (847) 924-1317




1     © 2001-2010 Neudesic, LLC. All rights reserved.
Welcome to the ITA Business Intelligence Roundtable

Who am I?




                                                         Jeff Block, Neudesic
                                                       BI Roundtable Chairman




What are we talking about?


                                                       Today’s Agenda

          Brief Introduction
          Who’s in the room?
          Presentation:
              Organizing your Data for Analytics
          Discussion / Networking
          Coming up Next Month



Why are we here?
    What kind of session is this?

    • 2nd Tuesday of every month; 8-10 AM
            – Here at the ITA TechNexus unless there’s a good reason to change
              venues
    • Sometimes a presentation
            – My ideas, your ideas, case studies, best practices, panel discussions,
              new developments, etc
    • Sometimes an outside speaker
            – Love to have some of you step up to the plate
    • Always discussion
            – Collaboration is the whole point of this group
    • Always networking
            – Meet people who will be valuable connections


Why are we here?
    Topics and Target Audience

    • Business and technology leaders
            – Not going to spend much time deep in the technical weeds
    • Those who want to
            – Learn from each other
            – Collaborate on solutions
            – Network
      in the BI space
    • ITA members and their friends and their friends
      and …


Why are we here?
    In Scope

    • Business Intelligence
            – Vision and strategy
    • Planning and implementing BI initiatives
            –    High-level architecture
            –    Best practices / Anti-patterns
            –    Case studies
            –    Etc
    • What about data warehousing?
            – It’s in! (part of BI, in my world)




Introduction
    What is Business Intelligence?

      Business Intelligence is the art and science of turning
       corporate data into practical, accessible, actionable
        knowledge assets, and leveraging them to make
    empirically-based strategic or operational decisions which
     increase an organization’s capacity to fulfill its mission.

                                                   To this end, BI requires:
                                A disciplined, well-governed culture
                                A specialized, analytic engine
                                A well-designed data architecture


Why are we here?
     Classic BI Architecture

     Our focus is the stuff in this picture and the practices and
              processes that get it there effectively.
     [Diagram: source systems on either side feed the Data Warehouse through ETL;
      OLAP Services and four Data Marts sit above the warehouse, below the
      BI Presentation Components.]




Why are we here?
     Out of Scope

     • Other random stuff
             – No matter how cool Aunt Ruth’s cat is, she’s out of scope
     • Building the tech together
     • Arguing over low-level details
     • Generally, if we talk about
             – Project management / SDLC
             – Architecture and design
             – Business processes
             – Etc
        then it will be in the context of BI / DW / EDM


Why are we here?
     Some Quick Feedback




                                   How does this line up with
                                     your expectations?




Why are we here?
     A Few Logistics

     • Grab on the way in...
             – A nametag
             – You too can have a spiffy nametag; just pre-register. 
     • Let me know you’re here
             – Toss a card in the fish bowl
             – No spam policy
             – No card? No problem. Sign the list.
     • Join our LinkedIn group
             – http://www.linkedin.com/groups?gid=1801350
             – Don’t worry, we’ll send you an invite
     • Restrooms, etc…

Who’s in the room?
     Brief Introductions

     Please share with the group…

                                              • Name
                                              • Company
                                              • Role
                                              • What you want to get
                                                out of this session?




Where are you?

      When you talk about dimensional modeling, I …



      On a scale of 1 to 5:
        1: Think you’re talking about Star Trek
        3: Know enough to be dangerous
        5: Could model Aunt Ruth’s cat




Why a different model?
     What is Dimensional Modeling?

     Dimensional modeling is the art and science of modeling
     data for the purposes of fast, efficient and intuitive
     retrieval (typically from a data warehouse) for use in
     online analytic processing.





     • Completely different data modeling approach
             – Than most of us are used to
     • Two strategic goals:
             – Fast, efficient data retrieval
             – Intuitive interface to the data




Why a different model?
     Why a different model?




                                    Different Goals
                              Storage of Historic Records
                             Predictability of Requirements




… A Different Model
                                                         Why a different model?
     Different Goals

     • Transactional systems
             – An effective interface between a business process and a user
             – Effective execution of a single business transaction
     • OLAP systems
             – An effective interface between a corporate decision-maker and
               analytical data
             – Effective analysis of a set of business transactions


     • Note the absence of “efficient storage” goals. Why?




… A Different Model
                                                            Why a different model?
     Storage of Historic Records

     • Transactional systems
             – No need to know history
             – Optimized for the current transaction
     • OLAP systems
             – Business should be able to arbitrarily define the longevity of
               data
             – Optimized for consistent historic and predictive analysis


     • Why no history in operational systems?




… A Different Model
                                                            Why a different model?
     Predictability of Requirements

     • Transactional systems
             – Very predictable usage requirements
             – Every interaction follows the same transactional process
     • OLAP systems
             – Very unpredictable usage requirements
             – Ad-hoc / business-configured queries
             – Every interaction potentially follows a completely different
               pattern than the previous interaction


     • Why are OLAP queries so unpredictable?



Why a different model?
     How Data is Modeled

     • The dimensional model stores data in “star schemas”
     • Two core elements: “facts” and “dimensions”
     • Facts
             – Core data of a business event
             – The “verb” in the sentence describing the event
             – Also called a “measure”
     • Dimensions
             – Context in which the event (measurement) occurred
             – The “nouns” in the sentence




Why a different model?
     Two Kinds of “Facts”

     • Measuring a business event
             –    A customer ordered a widget
             –    A new book was published
             –    A relationship was established
             –    A lead was converted


     • Taking a snapshot of reality
             – Inventory looks like this at this time
             – Membership looks like this on this date
             – Current workflow is at this stage at this time




Why a different model?
     Seeing the Model in the Data

     • An example of a business event
             – “Sally purchased milk and eggs from Clerk 12 at Wal-Mart on
               Tuesday at 3:28PM”
     • Implies a dimensional model with…



                                                        You tell me

         Huddle up, and list the facts and
            dimensions in this event
Why a different model?
     Seeing the Model in the Data

               “Sally purchased milk and eggs from Clerk 12 at
                       Wal-Mart on Tuesday at 3:28PM”

     • Implies a dimensional model with
             – One fact
                       › “customer purchased items”
                       › Two rows written to the fact table, one for each item purchased
             – Several dimensions
                       ›    Customer “Sally”
                       ›    Inventory items “milk” and “eggs” with specific SKUs
                       ›    A particular “Wal-Mart” store with a specific identifier
                       ›    A particular clerk, identified as “Clerk 12”
                       ›    Date “Tuesday”
                       ›    Time “3:28PM”
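
     As a rough sketch only, here is how the star schema implied by this
     event might look in SQLite via Python. Every table and column name,
     the SKU values, and the store identifier are assumptions made for
     illustration; the slide names only the fact and its dimensions.

```python
import sqlite3

# Hypothetical names throughout: the slide lists the dimensions, not a schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_item     (item_key     INTEGER PRIMARY KEY, sku TEXT, name TEXT);
CREATE TABLE dim_store    (store_key    INTEGER PRIMARY KEY, chain TEXT, store_id TEXT);
CREATE TABLE dim_clerk    (clerk_key    INTEGER PRIMARY KEY, label TEXT);
CREATE TABLE dim_date     (date_key     INTEGER PRIMARY KEY, day_name TEXT);
CREATE TABLE dim_time     (time_key     INTEGER PRIMARY KEY, time_of_day TEXT);

-- The fact table holds one row per item purchased, keyed to every dimension.
CREATE TABLE fact_purchase (
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    item_key     INTEGER REFERENCES dim_item(item_key),
    store_key    INTEGER REFERENCES dim_store(store_key),
    clerk_key    INTEGER REFERENCES dim_clerk(clerk_key),
    date_key     INTEGER REFERENCES dim_date(date_key),
    time_key     INTEGER REFERENCES dim_time(time_key),
    quantity     INTEGER
);
""")

conn.execute("INSERT INTO dim_customer VALUES (1, 'Sally')")
conn.executemany(
    "INSERT INTO dim_item VALUES (?, ?, ?)",
    [(1, "SKU-MILK", "milk"), (2, "SKU-EGGS", "eggs")],  # made-up SKUs
)
conn.execute("INSERT INTO dim_store VALUES (1, 'Wal-Mart', 'STORE-001')")
conn.execute("INSERT INTO dim_clerk VALUES (12, 'Clerk 12')")
conn.execute("INSERT INTO dim_date VALUES (1, 'Tuesday')")
conn.execute("INSERT INTO dim_time VALUES (1528, '3:28PM')")

# One business event, two fact rows: one for milk, one for eggs.
conn.executemany(
    "INSERT INTO fact_purchase VALUES (1, ?, 1, 12, 1, 1528, 1)",
    [(1,), (2,)],
)

# Join the fact back to its dimensions to reconstruct the sentence.
rows = conn.execute("""
SELECT c.name, i.name, s.chain, k.label, d.day_name, t.time_of_day
FROM fact_purchase f
JOIN dim_customer c ON c.customer_key = f.customer_key
JOIN dim_item     i ON i.item_key     = f.item_key
JOIN dim_store    s ON s.store_key    = f.store_key
JOIN dim_clerk    k ON k.clerk_key    = f.clerk_key
JOIN dim_date     d ON d.date_key     = f.date_key
JOIN dim_time     t ON t.time_key     = f.time_key
ORDER BY f.item_key
""").fetchall()

for row in rows:
    print(row)
```

     The fact table carries only keys and the measured quantity; all of the
     descriptive context lives in the dimension tables.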

Why a different model?
     How to Use the Model

               “Sally purchased milk and eggs from Clerk 12 at
                       Wal-Mart on Tuesday at 3:28PM”

     • “Pivot” the context on the measurement taken
             – Offers various perspectives on the data
     • Aggregate many measures to produce an analytic report

     • If aggregated…
             – Hundreds, thousands, millions of times
     • What questions could you ask these data?



Why a different model?
     How to Use the Model

               “Sally purchased milk and eggs from Clerk 12 at
                       Wal-Mart on Tuesday at 3:28PM”

     • A few questions I thought of…
             – In what regions of the country do we sell the most dairy
               products in the first quarter?
             – Which three clerks sold the most impulse items in each Super
               Wal-Mart in the mid-west this year?
             – What is the correlation between the sale of milk and eggs in
               summer vs. winter months?
             – Who are our most loyal customers?
             – At what time of day do we typically not sell any dairy products?
             – Does staying open later on the weekends result in more dairy
               product sales?
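
     To make the first of these questions concrete, here is a minimal
     sketch of how it might be asked of a star schema. The tables, the
     "region", "category" and "quarter" attributes, and all of the sample
     rows are invented for illustration and do not come from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical dimensions, each carrying the attribute we want to slice by.
CREATE TABLE dim_store (store_key INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE dim_item  (item_key  INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_date  (date_key  INTEGER PRIMARY KEY, quarter INTEGER);
CREATE TABLE fact_purchase (
    store_key INTEGER, item_key INTEGER, date_key INTEGER, quantity INTEGER
);

INSERT INTO dim_store VALUES (1, 'Midwest'), (2, 'South');
INSERT INTO dim_item  VALUES (1, 'milk', 'dairy'), (2, 'eggs', 'grocery'),
                             (3, 'gum', 'impulse');
INSERT INTO dim_date  VALUES (20100105, 1), (20100610, 2);

-- Made-up purchases: (store, item, date, units).
INSERT INTO fact_purchase VALUES
    (1, 1, 20100105, 2),
    (1, 2, 20100105, 1),
    (2, 1, 20100105, 1),
    (2, 3, 20100105, 5),
    (1, 1, 20100610, 4);
""")

# "In what regions do we sell the most dairy products in the first quarter?"
dairy_q1_by_region = conn.execute("""
SELECT s.region, SUM(f.quantity) AS units
FROM fact_purchase f
JOIN dim_store s ON s.store_key = f.store_key
JOIN dim_item  i ON i.item_key  = f.item_key
JOIN dim_date  d ON d.date_key  = f.date_key
WHERE i.category = 'dairy' AND d.quarter = 1
GROUP BY s.region
ORDER BY units DESC
""").fetchall()

print(dairy_q1_by_region)
```

     Each of the other questions follows the same shape: filter and group
     on dimension attributes, then aggregate a measure from the fact table.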
An Example
     How to Build the Model

               “Sally purchased milk and eggs from Clerk 12 at
                       Wal-Mart on Tuesday at 3:28PM”
          Fact:        Customer Purchased Item
          Dimensions:  Customers, Items, Store, Clerks, Dates, Times



                                          See why they call it a “star schema”?
Why a different model?
     What’s an “Analytic Cube”?




     [Diagram: the Purchase measure connected to the Product, Store and Date dimensions]



     • Purchase measure is the pivot point
     • Joins 2 or more dimensions (context)







     • Extrapolate to a cube
             – Several measures sharing a set of dimensions
     • Pivot cube on any point to get different analytic views of the data
     • Really N-dimensional, but we mere mortals can’t visualize that
             – So it’s a cube
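
     One loose way to picture "pivoting" is to sum the same measure
     grouped by different choices of dimension. The sketch below does this
     in plain Python; the sample facts and the pivot() helper are
     hypothetical and stand in for what an OLAP engine would do.

```python
from collections import defaultdict

# Hypothetical purchase facts: (product, store, date, units).
facts = [
    ("milk", "Store A", "2010-02-01", 3),
    ("eggs", "Store A", "2010-02-01", 1),
    ("milk", "Store B", "2010-02-01", 2),
    ("milk", "Store A", "2010-02-02", 4),
]
DIMENSIONS = ("product", "store", "date")


def pivot(rows, *by):
    """Sum the units measure, grouped by the named dimensions."""
    positions = [DIMENSIONS.index(name) for name in by]
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[p] for p in positions)
        totals[key] += row[3]  # units is the last column
    return dict(totals)


# The same facts viewed from three different angles.
by_product = pivot(facts, "product")
by_store_and_date = pivot(facts, "store", "date")
grand_total = pivot(facts)

print(by_product)
print(by_store_and_date)
print(grand_total)
```

     Adding a fourth or fifth dimension changes nothing in the code, which
     is the "really N-dimensional" point in miniature.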


Why a different model?
     Selecting Appropriate Grain for Facts

     • The “grain” of a fact table is the most granular level of
       information that can be retrieved from the table.
     • Shoot for “Atomic” grain facts
             – Irreducible; cannot be subdivided
             – Dimensionally unconstrained
                       › Rolls up in any and all possible ways
                       › BI requires cutting through details in precise ways
             – Required for drilling into reports
                       › One of the core strengths of BI
             – Required for ad-hoc querying
             – Can always create other fact tables or business views with
               aggregations
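
     A small illustration of why atomic grain matters, using made-up rows
     and a hypothetical roll_up() helper: atomic facts can be rolled up
     along any dimension, but once data is stored pre-aggregated, the
     dropped detail cannot be recovered.

```python
from collections import Counter

# Atomic grain: one row per item purchased (sample data, invented).
atomic = [
    {"date": "2010-02-09", "store": "Store A", "item": "milk", "units": 1},
    {"date": "2010-02-09", "store": "Store A", "item": "eggs", "units": 1},
    {"date": "2010-02-09", "store": "Store B", "item": "milk", "units": 2},
]


def roll_up(rows, *dims):
    """Total the units measure for each combination of the given dimensions."""
    totals = Counter()
    for row in rows:
        totals[tuple(row[d] for d in dims)] += row["units"]
    return dict(totals)


# Atomic facts roll up in whichever direction the question requires.
daily_by_store = roll_up(atomic, "date", "store")
by_item = roll_up(atomic, "item")

# A table stored at the coarser "daily total per store" grain has lost "item".
aggregated = [
    {"date": d, "store": s, "units": u} for (d, s), u in daily_by_store.items()
]
try:
    roll_up(aggregated, "item")
    can_drill_to_item = True
except KeyError:
    can_drill_to_item = False

print(daily_by_store)
print(by_item)
print(can_drill_to_item)
```

     The aggregate table is still useful for speed, but only as an
     addition to the atomic facts, not a replacement for them.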


Why a different model?
     Kimball’s Dimensional Design Process

     • Step 1: Select business process to model
             – Natural business activity performed
             – Not a department or business function
     • Step 2: Declare grain of the business process
             – Level of detail associated with fact measurement
             – Define exactly what a fact table row represents
             – Atomic data is typically best
     • Step 3: Choose dimensions applying to each fact table row
             – Context in which we’re taking measurements
             – Answer: “How do businesspeople describe the data that results
               from the business process?”
             – List dimensions, then all attributes per dimension




     • Step 4: Identify numeric measures to populate fact tables
             –    Numeric fact info which will populate the rows of the fact table
             –    Answer: “What are we measuring?”
             –    Measure only in the determined grain
             –    Different grain requires different fact table
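
     The four steps can be captured as a simple design record before any
     tables are built. This is only a sketch: the FactTableDesign class,
     the needs_new_fact_table() helper, and the example values (including
     the "extended_price" measure) are assumptions, not part of Kimball's
     method or of these slides.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FactTableDesign:
    business_process: str  # Step 1: the business activity being modeled
    grain: str             # Step 2: what exactly one fact row represents
    dimensions: List[str]  # Step 3: the context for each measurement
    measures: List[str] = field(default_factory=list)  # Step 4: numeric facts


def needs_new_fact_table(design: FactTableDesign, requested_grain: str) -> bool:
    """A measurement at a different grain belongs in a different fact table."""
    return requested_grain != design.grain


# Hypothetical design for the retail purchase example.
design = FactTableDesign(
    business_process="retail checkout",
    grain="one row per item purchased",
    dimensions=["customer", "item", "store", "clerk", "date", "time"],
    measures=["quantity", "extended_price"],
)

same_grain = needs_new_fact_table(design, "one row per item purchased")
coarser_grain = needs_new_fact_table(design, "one row per store per day")

print(design)
print(same_grain, coarser_grain)
```

     Writing the grain down as an explicit sentence makes it easy to spot
     a proposed measure that does not fit the table.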




Why a different model?
     Dimensional Conformity

     • The power of the enterprise data warehouse is making
       a “single source of the truth” available to the business
             – Only possible with conformed dimensions
             – Kimball’s “enterprise bus” model favors this
     • Dimensions are nouns
             – “Product”, “Customer”, “Store”, “Person”, etc
             – If more than one definition of a noun, sentences start to have
               conflicting meanings
     • Only one definition of a dimension means it’s
       “conformed”
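
     As a hedged sketch of what conformity buys you: two fact tables that
     share a single product dimension can be reported side by side. The
     fact_sales and fact_returns tables and all sample rows are invented
     for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One definition of "product", shared by every fact table that needs it.
CREATE TABLE dim_product  (product_key INTEGER PRIMARY KEY, name TEXT);

-- Two hypothetical business processes measured against the same dimension.
CREATE TABLE fact_sales   (product_key INTEGER, units INTEGER);
CREATE TABLE fact_returns (product_key INTEGER, units INTEGER);

INSERT INTO dim_product  VALUES (1, 'milk'), (2, 'eggs');
INSERT INTO fact_sales   VALUES (1, 10), (1, 5), (2, 8);
INSERT INTO fact_returns VALUES (1, 1), (2, 2);
""")

# Because both facts use the same product keys, their totals line up per product.
sold_vs_returned = conn.execute("""
SELECT p.name,
       (SELECT COALESCE(SUM(s.units), 0) FROM fact_sales s
         WHERE s.product_key = p.product_key) AS sold,
       (SELECT COALESCE(SUM(r.units), 0) FROM fact_returns r
         WHERE r.product_key = p.product_key) AS returned
FROM dim_product p
ORDER BY p.product_key
""").fetchall()

print(sold_vs_returned)
```

     If sales and returns each kept their own product list, this report
     would first need a reconciliation step to decide which products match.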




… Dimensional Conformity
                                                              Why a different model?
     Beautiful if you have it…

     • Cross-functional view of data
     • Whole organization working in concert
             –    Trend analysis
             –    Predictive analysis
             –    Drilling down into the true root cause of problems
             –    Accurate and complete financial pictures




… Dimensional Conformity
                                                           Why a different model?
     Anarchy if you don’t…

     • Missed opportunities from siloed data
     • Nearly redundant departmental databases
             –    Nearly redundant data development
             –    Nearly redundant administration
             –    Nearly redundant storage
             –    Nearly redundant system development
             –    A lot of wasted time, energy and money
     • Even more waste comes from trying to reconcile slightly
       different versions of the truth




… Dimensional Conformity
                                                           Why a different model?
     But you can Restore Order

     • Three requirements
             1. Political clout
             2. Financial means
             3. Willingness / ability to challenge the status quo


     • Pick a silo where you can drive a stake in the ground
             – I call it “bedrock data”
     • Expand out from there
             – Analyze and graft other silos onto the bedrock
             – DO NOT start ANY initiative that creates a new center of data




Why a different model?
     Other (Advanced?) Topics

     •     Snowflakes
     •     Slowly Changing Dimensions
     •     Denormalized Dimensions
     •     Factless Fact Tables
     •     Degenerate Dimensions
     •     Master Data Management
     •     Much more

                                                       Interested in a follow-up?



Discussion Time




Why a different model?
     Coming Up…

     • March 9, 2010; 8-10 AM at the ITA
             – Topic: What Thomas Edison would do with
               your data
             – Speaker: Sarah Miller Caldicott
                       › Great grandniece of Thomas Edison
                       › Co-author: “Innovate Like Edison”
                       › Founder: The Power Patterns of Innovation
     • April 13, 2010; 8-10:30AM at the ITA
             – Topic: Grudge Match II – Another Smackdown
             – Proposed featured BI product vendors:
                       › Microsoft
                       › Oracle
                       › Infobright
                       › MicroStrategy
45    © 2001-2010 Neudesic, LLC. All rights reserved.
Effective Data Modeling

More Related Content

What's hot

The Essential Toolkit for Your: EDRM Renovation Australia 2017
The Essential Toolkit for Your: EDRM Renovation Australia 2017The Essential Toolkit for Your: EDRM Renovation Australia 2017
The Essential Toolkit for Your: EDRM Renovation Australia 2017
Steven Oest
 
NBS8053 Introduction 2012
NBS8053 Introduction 2012NBS8053 Introduction 2012
NBS8053 Introduction 2012
Lee Schlenker
 
For netapp haifa 2012 v3
For netapp haifa 2012 v3For netapp haifa 2012 v3
For netapp haifa 2012 v3
Pini Cohen
 
Greater China Awards 2013 Report - English
Greater China Awards 2013 Report - EnglishGreater China Awards 2013 Report - English
Greater China Awards 2013 Report - English
Bob Sharon
 

What's hot (19)

Emm introduction
Emm introductionEmm introduction
Emm introduction
 
The Essential Toolkit for Your: EDRM Renovation Australia 2017
The Essential Toolkit for Your: EDRM Renovation Australia 2017The Essential Toolkit for Your: EDRM Renovation Australia 2017
The Essential Toolkit for Your: EDRM Renovation Australia 2017
 
Promise notes
Promise notesPromise notes
Promise notes
 
Tdwi event summary
Tdwi event summaryTdwi event summary
Tdwi event summary
 
Architectural considerations
Architectural considerationsArchitectural considerations
Architectural considerations
 
NBS8053 Introduction 2012
NBS8053 Introduction 2012NBS8053 Introduction 2012
NBS8053 Introduction 2012
 
NISO Webinar: October Two-Part Webinar: Managing Data for Scholarly Communica...
NISO Webinar: October Two-Part Webinar: Managing Data for Scholarly Communica...NISO Webinar: October Two-Part Webinar: Managing Data for Scholarly Communica...
NISO Webinar: October Two-Part Webinar: Managing Data for Scholarly Communica...
 
Iasa North Welcome
Iasa North WelcomeIasa North Welcome
Iasa North Welcome
 
One2OneResearch Presentation
One2OneResearch PresentationOne2OneResearch Presentation
One2OneResearch Presentation
 
For netapp haifa 2012 v3
For netapp haifa 2012 v3For netapp haifa 2012 v3
For netapp haifa 2012 v3
 
Neuron Intellectual Property Management Presentation - October 2011
Neuron Intellectual Property Management Presentation - October 2011Neuron Intellectual Property Management Presentation - October 2011
Neuron Intellectual Property Management Presentation - October 2011
 
"How to create usless software... and distribute it" (Alto university lecture...
"How to create usless software... and distribute it" (Alto university lecture..."How to create usless software... and distribute it" (Alto university lecture...
"How to create usless software... and distribute it" (Alto university lecture...
 
Greater China Awards 2013 Report - English
Greater China Awards 2013 Report - EnglishGreater China Awards 2013 Report - English
Greater China Awards 2013 Report - English
 
My Discoveries
My DiscoveriesMy Discoveries
My Discoveries
 
Oracle Bi Foundation
Oracle Bi FoundationOracle Bi Foundation
Oracle Bi Foundation
 
Osbi Sesame?
Osbi Sesame?Osbi Sesame?
Osbi Sesame?
 
EA @ UCLan
EA @ UCLanEA @ UCLan
EA @ UCLan
 
True Drivers of MDM webinar
True Drivers of MDM webinarTrue Drivers of MDM webinar
True Drivers of MDM webinar
 
Constellation's Sneak Peak Into Social Business Trends
Constellation's Sneak Peak Into Social Business TrendsConstellation's Sneak Peak Into Social Business Trends
Constellation's Sneak Peak Into Social Business Trends
 

Similar to Effective Data Modeling

phiLogica Quick Profile
phiLogica Quick ProfilephiLogica Quick Profile
phiLogica Quick Profile
ozgurkuru
 
Module 3 Adapative Customer Experience Final
Module 3 Adapative Customer Experience FinalModule 3 Adapative Customer Experience Final
Module 3 Adapative Customer Experience Final
Vivastream
 
New ways of working & knowledge sharing - Dirk W. Bijl
New ways of working & knowledge sharing - Dirk W. BijlNew ways of working & knowledge sharing - Dirk W. Bijl
New ways of working & knowledge sharing - Dirk W. Bijl
Vlerick Business School
 
10. fri 1130 1230 soni - analytics in academia
10. fri 1130 1230 soni - analytics in academia10. fri 1130 1230 soni - analytics in academia
10. fri 1130 1230 soni - analytics in academia
Jon Hedlund
 
20101116 deckers
20101116 deckers20101116 deckers
20101116 deckers
CIONET
 

Similar to Effective Data Modeling (20)

Effective Portal Governance
Effective Portal GovernanceEffective Portal Governance
Effective Portal Governance
 
phiLogica Quick Profile
phiLogica Quick ProfilephiLogica Quick Profile
phiLogica Quick Profile
 
Big Data Warehousing: Pig vs. Hive Comparison
Big Data Warehousing: Pig vs. Hive ComparisonBig Data Warehousing: Pig vs. Hive Comparison
Big Data Warehousing: Pig vs. Hive Comparison
 
Module 3 Adapative Customer Experience Final
Module 3 Adapative Customer Experience FinalModule 3 Adapative Customer Experience Final
Module 3 Adapative Customer Experience Final
 
Chetan Karkhanis Profile
Chetan Karkhanis ProfileChetan Karkhanis Profile
Chetan Karkhanis Profile
 
SharePoint Conference Recap - BI
SharePoint Conference Recap - BISharePoint Conference Recap - BI
SharePoint Conference Recap - BI
 
ITCube Staffing Solutions
ITCube Staffing SolutionsITCube Staffing Solutions
ITCube Staffing Solutions
 
A Strategic View of Enterprise Reporting and Analytics: The Data Funnel
A Strategic View of Enterprise Reporting and Analytics: The Data FunnelA Strategic View of Enterprise Reporting and Analytics: The Data Funnel
A Strategic View of Enterprise Reporting and Analytics: The Data Funnel
 
New ways of working & knowledge sharing - Dirk W. Bijl
New ways of working & knowledge sharing - Dirk W. BijlNew ways of working & knowledge sharing - Dirk W. Bijl
New ways of working & knowledge sharing - Dirk W. Bijl
 
Big Data Analytics with Microsoft
Big Data Analytics with MicrosoftBig Data Analytics with Microsoft
Big Data Analytics with Microsoft
 
10. fri 1130 1230 soni - analytics in academia
10. fri 1130 1230 soni - analytics in academia10. fri 1130 1230 soni - analytics in academia
10. fri 1130 1230 soni - analytics in academia
 
Guru4Pro Data Vault Best Practices
Guru4Pro Data Vault Best PracticesGuru4Pro Data Vault Best Practices
Guru4Pro Data Vault Best Practices
 
InSource 2017 IIoT Roadshow: Evolution or Revolution
InSource 2017 IIoT Roadshow: Evolution or RevolutionInSource 2017 IIoT Roadshow: Evolution or Revolution
InSource 2017 IIoT Roadshow: Evolution or Revolution
 
Sentri SharePoint Performance webinar
Sentri SharePoint Performance webinarSentri SharePoint Performance webinar
Sentri SharePoint Performance webinar
 
20101116 deckers
20101116 deckers20101116 deckers
20101116 deckers
 
Tdwi march 2015 presentation
Tdwi march 2015 presentationTdwi march 2015 presentation
Tdwi march 2015 presentation
 
Actuarial Analytics in R
Actuarial Analytics in RActuarial Analytics in R
Actuarial Analytics in R
 
MS Business Intelligence with SQL Server 2005
MS Business Intelligence with SQL Server 2005MS Business Intelligence with SQL Server 2005
MS Business Intelligence with SQL Server 2005
 
Innovate Analytics with Oracle Data Mining & Oracle R
Innovate Analytics with Oracle Data Mining & Oracle RInnovate Analytics with Oracle Data Mining & Oracle R
Innovate Analytics with Oracle Data Mining & Oracle R
 
Self-Service BI Trends
Self-Service BI TrendsSelf-Service BI Trends
Self-Service BI Trends
 

Effective Data Modeling

  • 1. ITA BI Roundtable Dimensional Modeling: Organizing your Data for Analytics Jeff Block, Managing Consultant Jeff.Block@neudesic.com (847) 924-1317 1 © 2001-2010 Neudesic, LLC. All rights reserved.
  • 2. Welcome to the ITA Business Intelligence Roundtable 2 © 2001-2010 Neudesic, LLC. All rights reserved.
  • 3. Who am I? Jeff Block, Neudesic BI Roundtable Chairman 3 © 2001-2010 Neudesic, LLC. All rights reserved.
  • 4. What are we talking about? Today’s Agenda  Brief Introduction  Who’s in the room?  Presentation: Organizing your Data for Analytics  Discussion / Networking  Coming up Next Month 4 4 © 2001-2010 Neudesic, LLC. All rights reserved.
  • 5. What are we talking about? Today’s Agenda  Brief Introduction  Who’s in the room?  Presentation: Organizing your Data for Analytics  Discussion / Networking  Coming up Next Month 5 5 © 2001-2010 Neudesic, LLC. All rights reserved.
  • 6. Why are we here? What kind of session is this? • 2nd Tuesday of every month; 8-10 AM – Here at the ITA TechNexus unless there’s a good reason to change venues • Sometimes a presentation – My ideas, your ideas, case studies, best practices, panel discussions, new developments, etc • Sometimes an outside speaker – Love to have some of you step up to the plate • Always discussion – Collaboration is the whole point of this group • Always networking – Meet people who will be valuable connections 6 6 © 2001-2010 Neudesic, LLC. All rights reserved.
  • 7. Why are we here? Topics and Target Audience • Business and technology leaders – Not going to spend much time deep in the technical weeds • Those who want to – Learn from each other – Collaborate on solutions – Network in the BI space • ITA members and their friends and their friends and …
  • 8. Why are we here? In Scope • Business Intelligence – Vision and strategy • Planning and implementing BI initiatives – High-level architecture – Best practices / Anti-patterns – Case studies – Etc. • What about data warehousing? – It’s in! (part of BI, in my world)
  • 9. Introduction What is Business Intelligence? Business Intelligence is the art and science of turning corporate data into practical, accessible, actionable knowledge assets, and leveraging them to make empirically based strategic or operational decisions which increase an organization’s capacity to fulfill its mission. To this end, BI requires: • A disciplined, well-governed culture • A specialized, analytic engine • A well-designed data architecture
  • 10. Why are we here? Classic BI Architecture Our focus is the stuff in this picture and the practices and processes that get it there effectively. [Diagram components: Source Systems, ETL, Data Warehouse, ETL, four Data Marts, OLAP Services, BI Presentation Components]
  • 11. Why are we here? Out of Scope • Other random stuff – No matter how cool Aunt Ruth’s cat is, she’s out of scope • Building the tech together • Arguing over low-level details • Generally, if we talk about – Project management / SDLC – Architecture and design – Business processes – Etc. … then it will be in the context of BI / DW / EDM
  • 12. Why are we here? Some Quick Feedback How does this line up with your expectations?
  • 13. Why are we here? A Few Logistics • Grab on the way in... – A nametag – You too can have a spiffy nametag; just pre-register :) • Let me know you’re here – Toss a card in the fish bowl – No spam policy – No card? No problem. Sign the list. • Join our LinkedIn group – http://www.linkedin.com/groups?gid=1801350 – Don’t worry, we’ll send you an invite • Restrooms, etc…
  • 15. Who’s in the room? Brief Introductions Please share with the group… • Name • Company • Role • What you want to get out of this session
  • 17. Where are you? When you talk about dimensional modeling, I … (scale of 1 to 5) 1: Think you’re talking about Star Trek · 3: Know enough to be dangerous · 5: Could model Aunt Ruth’s cat
  • 20. Why a different model? What is Dimensional Modeling? Dimensional modeling is the art and science of modeling data for the purposes of fast, efficient and intuitive retrieval (typically from a data warehouse) for use in online analytic processing. • A completely different data modeling approach than most of us are used to • Two strategic goals: – Fast, efficient data retrieval – Intuitive interface to the data
  • 21. Why a different model? • Different Goals • Storage of Historic Records • Predictability of Requirements
  • 22. Why a different model? Different Goals • Transactional systems – An effective interface between a business process and a user – Effective execution of a single business transaction • OLAP systems – An effective interface between a corporate decision-maker and analytic data – Effective analysis of a set of business transactions • Note the absence of “efficient storage” goals. Why?
  • 23. Why a different model? Storage of Historic Records • Transactional systems – No need to know history – Optimized for the current transaction • OLAP systems – Business should be able to arbitrarily define the longevity of data – Optimized for consistent historic and predictive analysis • Why no history in operational systems?
  • 24. Why a different model? Predictability of Requirements • Transactional systems – Very predictable usage requirements – Every interaction follows the same transactional process • OLAP systems – Very unpredictable usage requirements – Ad-hoc / business-configured queries – Every interaction potentially follows a completely different pattern than the previous interaction • Why are OLAP queries so unpredictable?
  • 25. Why a different model? How Data is Modeled • The dimensional model stores data in “star schemas” • Two core elements: “facts” and “dimensions” • Facts – Core data of a business event – The “verb” in the sentence describing the event – Also called a “measure” • Dimensions – Context in which the event (measurement) occurred – The “nouns” in the sentence
  • 26. Why a different model? Two Kinds of “Facts” • Measuring a business event – A customer ordered a widget – A new book was published – A relationship was established – A lead was converted • Taking a snapshot of reality – Inventory looks like this at this time – Membership looks like this on this date – Current workflow is at this stage at this time
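The two kinds of facts can be sketched in a few lines of Python. This is only an illustration: the record types, field names, and sample values are assumptions, not something from the deck. The point it shows is that event measures can be summed across time, while snapshot measures generally cannot.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical record types; every field name and value here is illustrative.

@dataclass(frozen=True)
class OrderFact:
    """Event fact: one row per business event (a customer ordered a widget)."""
    order_date: date
    customer: str
    product: str
    quantity: int

@dataclass(frozen=True)
class InventorySnapshotFact:
    """Snapshot fact: one row per product per snapshot date."""
    snapshot_date: date
    product: str
    units_on_hand: int

orders = [
    OrderFact(date(2010, 2, 8), "Sally", "widget", 3),
    OrderFact(date(2010, 2, 9), "Bob", "widget", 2),
]
snapshots = [
    InventorySnapshotFact(date(2010, 2, 8), "widget", 97),
    InventorySnapshotFact(date(2010, 2, 9), "widget", 95),
]

# Event measures add up across time: total units ordered.
total_ordered = sum(o.quantity for o in orders)

# Snapshot measures do not add up across time (97 + 95 means nothing),
# so read the most recent snapshot instead.
latest_on_hand = max(snapshots, key=lambda s: s.snapshot_date).units_on_hand

print(total_ordered, latest_on_hand)
```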
  • 27. Why a different model? Seeing the Model in the Data • An example of a business event – “Sally purchased milk and eggs from Clerk 12 at Wal-Mart on Tuesday at 3:28PM” • Implies a dimensional model with… you tell me. Huddle up, and list the facts and dimensions in this event.
  • 28. Why a different model? Seeing the Model in the Data “Sally purchased milk and eggs from Clerk 12 at Wal-Mart on Tuesday at 3:28PM” • Implies a dimensional model with – One fact › “customer purchased items” › Two lines written to fact table; one for each item purchased – Several dimensions › Customer “Sally” › Inventory items “milk” and “eggs” with specific SKUs › A particular “Wal-Mart” store with a specific identifier › A particular clerk, identified as “Clerk 12” › Date “Tuesday” › Time “3:28PM”
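Here is a rough sketch of that one fact and six dimensions as a star schema, using Python's built-in sqlite3 module. The table names, column names, surrogate keys, SKU strings, and the `quantity` measure are all assumptions made for illustration; the slide only names the fact and the dimensions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: the "nouns" (all names are hypothetical).
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_item     (item_key     INTEGER PRIMARY KEY, sku TEXT, name TEXT);
CREATE TABLE dim_store    (store_key    INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_clerk    (clerk_key    INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_date     (date_key     INTEGER PRIMARY KEY, day_of_week TEXT);
CREATE TABLE dim_time     (time_key     INTEGER PRIMARY KEY, time_of_day TEXT);

-- Fact table: the "verb", one row per item purchased.
CREATE TABLE fact_purchase (
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    item_key     INTEGER REFERENCES dim_item(item_key),
    store_key    INTEGER REFERENCES dim_store(store_key),
    clerk_key    INTEGER REFERENCES dim_clerk(clerk_key),
    date_key     INTEGER REFERENCES dim_date(date_key),
    time_key     INTEGER REFERENCES dim_time(time_key),
    quantity     INTEGER  -- assumed measure; the values below are made up
);

INSERT INTO dim_customer VALUES (1, 'Sally');
INSERT INTO dim_item     VALUES (1, 'SKU-MILK', 'milk'), (2, 'SKU-EGGS', 'eggs');
INSERT INTO dim_store    VALUES (1, 'Wal-Mart (one particular store)');
INSERT INTO dim_clerk    VALUES (12, 'Clerk 12');
INSERT INTO dim_date     VALUES (1, 'Tuesday');
INSERT INTO dim_time     VALUES (1, '3:28PM');

-- One business event, two fact rows: one for each item purchased.
INSERT INTO fact_purchase VALUES (1, 1, 1, 12, 1, 1, 1);
INSERT INTO fact_purchase VALUES (1, 2, 1, 12, 1, 1, 1);
""")

# Join the fact to two of its dimensions to read the event back.
sally_rows = conn.execute("""
    SELECT i.name, f.quantity
    FROM fact_purchase AS f
    JOIN dim_customer AS c ON c.customer_key = f.customer_key
    JOIN dim_item     AS i ON i.item_key     = f.item_key
    WHERE c.name = 'Sally'
    ORDER BY i.name
""").fetchall()
print(sally_rows)
```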
  • 29. Why a different model? How to Use the Model “Sally purchased milk and eggs from Clerk 12 at Wal-Mart on Tuesday at 3:28PM” • “Pivot” the context on the measurement taken – Offers various perspectives on the data • Aggregate many measures to produce an analytic report • If aggregated… – Hundreds, thousands, millions of times • What questions could you ask of these data?
  • 30. Why a different model? How to Use the Model “Sally purchased milk and eggs from Clerk 12 at Wal-Mart on Tuesday at 3:28PM” • A few questions I thought of… – In what regions of the country do we sell the most dairy products in the first quarter? – Which three clerks sold the most impulse items in each Super Wal-Mart in the Midwest this year? – What is the correlation between the sale of milk and eggs in summer vs. winter months? – Who are our most loyal customers? – At what time of day do we typically not sell any dairy products? – Does staying open later on the weekends result in more dairy product sales?
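As a sketch of how one of these questions turns into a query, the snippet below answers a simplified "which regions sell the most dairy?" against a tiny star schema in sqlite3. Everything in it is assumed for illustration: the `region` and `category` attributes, the table names, the sample rows, and the choice to file eggs under a "dairy" category the way the questions above seem to.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical, cut-down schema: two dimensions and one fact table.
CREATE TABLE dim_store (store_key INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE dim_item  (item_key  INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE fact_purchase (store_key INTEGER, item_key INTEGER, quantity INTEGER);

INSERT INTO dim_store VALUES (1, 'Midwest'), (2, 'South');
INSERT INTO dim_item  VALUES (1, 'milk', 'dairy'), (2, 'eggs', 'dairy'), (3, 'gum', 'impulse');
INSERT INTO fact_purchase VALUES (1, 1, 2), (1, 2, 1), (2, 1, 1), (2, 3, 5);
""")

# Filter on one dimension (item category), group by another (store region),
# and aggregate the measure.
dairy_by_region = conn.execute("""
    SELECT s.region, SUM(f.quantity) AS units
    FROM fact_purchase AS f
    JOIN dim_store AS s ON s.store_key = f.store_key
    JOIN dim_item  AS i ON i.item_key  = f.item_key
    WHERE i.category = 'dairy'
    GROUP BY s.region
    ORDER BY units DESC
""").fetchall()
print(dairy_by_region)
```

Adding a date dimension and a quarter filter would follow the same pattern: one more join and one more condition in the WHERE clause.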
  • 31. An Example How to Build the Model “Sally purchased milk and eggs from Clerk 12 at Wal-Mart on Tuesday at 3:28PM” [Diagram: the fact “Customer Purchased Item” in the center, surrounded by the dimensions Customers, Clerks, Items, Dates, Store, Times] See why they call it a “star schema”?
  • 32. Why a different model? What’s an “Analytic Cube”? [Diagram: a Purchase measure at the intersection of the Product, Store and Date dimensions] • Purchase measure is the pivot point • Joins 2 or more dimensions (context)
  • 33. Why a different model? What’s an “Analytic Cube”? [Diagram: a cube with Product, Store and Date axes and Purchase measures inside] • Extrapolate to a cube – Several measures sharing a set of dimensions • Pivot cube on any point to get different analytic views of the data • Really N-dimensional, but we mere mortals can’t visualize that – So it’s a cube
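A toy version of "pivoting the cube" in plain Python, assuming made-up purchase rows and a hypothetical `pivot` helper: the same measure is summed over whichever subset of dimensions you ask for. A real OLAP engine does far more than this; the sketch only shows the idea.

```python
from collections import defaultdict

# Each row: (product, store, date, units). All values are made up.
purchases = [
    ("milk", "Store A", "2010-02-08", 2),
    ("eggs", "Store A", "2010-02-08", 1),
    ("milk", "Store B", "2010-02-08", 4),
    ("milk", "Store A", "2010-02-09", 3),
]
DIMENSIONS = ("product", "store", "date")

def pivot(rows, *dims):
    """Sum the units measure over the chosen dimensions (hypothetical helper)."""
    positions = [DIMENSIONS.index(d) for d in dims]
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[p] for p in positions)
        totals[key] += row[-1]
    return dict(totals)

# Two different views of the same cube.
by_product = pivot(purchases, "product")
by_store_and_date = pivot(purchases, "store", "date")
print(by_product)
print(by_store_and_date)
```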
  • 34. Why a different model? Selecting Appropriate Grain for Facts • The “grain” of a fact table is the most granular level of information that can be retrieved from the table. • Shoot for “Atomic” grain facts – Irreducibly complex; cannot be subdivided – Dimensionally unconstrained › Rolls up in any and all possible ways › BI requires cutting through details in precise ways – Required for drilling into reports › One of the core strengths of BI – Required for ad-hoc querying – Can always create other fact tables or business views with aggregations
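To make the grain argument concrete, here is a small Python sketch with assumed field names and made-up rows. Atomic rows can be rolled up to any coarser grain, but a table that was stored pre-aggregated by date and store has lost the item detail and cannot be drilled back down.

```python
from collections import Counter

# Atomic grain: one row per item per purchase (illustrative data).
atomic = [
    {"date": "2010-02-09", "store": "A", "item": "milk", "units": 1},
    {"date": "2010-02-09", "store": "A", "item": "eggs", "units": 1},
    {"date": "2010-02-09", "store": "B", "item": "milk", "units": 2},
]

def roll_up(rows, *dims):
    """Aggregate units to a coarser grain defined by dims (hypothetical helper)."""
    totals = Counter()
    for row in rows:
        totals[tuple(row[d] for d in dims)] += row["units"]
    return dict(totals)

# From atomic rows, any roll-up is available.
by_date_store = roll_up(atomic, "date", "store")
by_item = roll_up(atomic, "item")

# A fact table stored at (date, store) grain no longer carries the item,
# so the by-item view cannot be rebuilt from it.
aggregated = [
    {"date": d, "store": s, "units": u} for (d, s), u in by_date_store.items()
]
can_drill_to_item = all("item" in row for row in aggregated)
print(by_date_store, by_item, can_drill_to_item)
```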
  • 35. Why a different model? Kimball’s Dimensional Design Process • Step 1: Select business process to model – Natural business activity performed – Not a department or business function • Step 2: Declare grain of the business process – Level of detail associated with fact measurement – Define exactly what a fact table row represents – Atomic data is typically best • Step 3: Choose dimensions applying to each fact table row – Context in which we’re taking measurements – Answer: “How do businesspeople describe the data that results from the business process?” – List dimensions, then all attributes per dimension
  • 36. Why a different model? Kimball’s Dimensional Design Process • Step 4: Identify numeric measures to populate fact tables – Numeric fact info which will populate the rows of the fact table – Answer: “What are we measuring?” – Measure only in the determined grain – Different grain requires different fact table
  • 37. Why a different model? Dimensional Conformity • The power of the enterprise data warehouse is making a “single source of the truth” available to the business – Only possible with conformed dimensions – Kimball’s “enterprise bus” model favors this • Dimensions are nouns – “Product”, “Customer”, “Store”, “Person”, etc. – If there is more than one definition of a noun, sentences start to have conflicting meanings • Only one definition of a dimension means it’s “conformed”
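A minimal sketch of why one shared definition matters, in Python with invented data: two fact tables (sales and returns, both hypothetical) point at the same product dimension, so their totals line up by product and can be combined into a single number. If each table carried its own product list, the final step would need a reconciliation mapping first.

```python
# One conformed product dimension shared by both fact tables (made-up data).
dim_product = {1: "milk", 2: "eggs"}

# Each fact row: (product_key, units).
fact_sales = [(1, 10), (2, 6), (1, 5)]
fact_returns = [(1, 1), (2, 2)]

def total_by_product(fact_rows):
    """Sum units per product name using the shared dimension (hypothetical helper)."""
    totals = {}
    for product_key, units in fact_rows:
        name = dim_product[product_key]
        totals[name] = totals.get(name, 0) + units
    return totals

sold = total_by_product(fact_sales)
returned = total_by_product(fact_returns)

# Because both facts use the same product keys, the results combine directly.
return_rate = {name: returned.get(name, 0) / sold[name] for name in sold}
print(sold, returned, return_rate)
```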
  • 38. Why a different model? Dimensional Conformity: Beautiful if you have it… • Cross-functional view of data • Whole organization working in concert – Trend analysis – Predictive analysis – Drilling down into the true root cause of problems – Accurate and complete financial pictures
  • 39. Why a different model? Dimensional Conformity: Anarchy if you don’t… • Missed opportunities from siloed data • Nearly redundant departmental databases – Nearly redundant data development – Nearly redundant administration – Nearly redundant storage – Nearly redundant system development – A lot of wasted time, energy and money • Even more waste comes from trying to reconcile slightly different versions of the truth
  • 40. Why a different model? Dimensional Conformity: But you can Restore Order • Three requirements 1. Political clout 2. Financial means 3. Willingness / ability to challenge the status quo • Pick a silo where you can drive a stake in the ground – I call it “bedrock data” • Expand out from there – Analyze and graft other silos onto the bedrock – DO NOT start ANY initiative that creates a new center of data
  • 41. Why a different model? Other (Advanced?) Topics • Snowflakes • Slowly Changing Dimensions • Denormalized Dimensions • Factless Fact Tables • Degenerate Dimensions • Master Data Management • Much more. Interested in a follow-up?
  • 43. Discussion Time
  • 45. Why a different model? Coming Up… • March 9, 2010; 8-10 AM at the ITA – Topic: What Thomas Edison would do with your data – Speaker: Sarah Miller Caldicott › Great-grandniece of Thomas Edison › Co-author: “Innovate Like Edison” › Founder: The Power Patterns of Innovation • April 13, 2010; 8-10:30 AM at the ITA – Topic: Grudge Match II – Another Smackdown – Proposed featured BI product vendors: › Microsoft › Oracle › Infobright › MicroStrategy