Data Management & Warehousing




                                                              WHITE PAPER


                         Data Warehouse
                  Documentation Roadmap
                                                      DAVID M WALKER
                                                                           Version: 1.0
                                                                      Date: 05/04/2007




                      Data Management & Warehousing

   138 Finchampstead Road, Wokingham, Berkshire, RG41 2NU, United Kingdom

                          http://www.datamgmt.com
White Paper - Data Warehouse Documentation Roadmap




       Table of Contents

Table of Contents
Synopsis
Intended Audience
About Data Management & Warehousing
Introduction
Considerations
Documentation as a tool
Which tools and products to use
   What about a Wiki?
Put your documentation on the Internet!
Document Short Names
Overview Diagram
The Templates
   1    Concept
   2    Requirements
   3    Architecture
   4    Data Models
   5    Analysis
   6    Design
   7    Build
   8    Test
   9    Implementation
   10   Project Management
   11   Miscellaneous
Summary
Appendices
   Appendix 1 – Lifecycle of a bug
   Appendix 2 – Project Quick Start Infrastructure
References
   Web resources
Copyright




         © 2006 Data Management & Warehousing                                                                                         Page 2




Synopsis
All projects need documentation and many companies provide templates as part of a
methodology. This document describes the templates, tools and source documents used
by Data Management & Warehousing. It serves two purposes:

    •   For projects using other methodologies or creating their own set of documents to
        use as a checklist. This allows the project to ensure that the documentation
        covers the essential areas for describing the data warehouse.
    •   To demonstrate our approach to our clients by describing the templates and
        deliverables that are produced.

Documentation, methodologies and templates are inherently both incomplete and
flexible. Projects may wish to add, change, remove or ignore any part of any document.
Some may also believe that aspects of one document would sit better in another. If this
is the case then users of this document and these templates are encouraged to change
them to fit their needs.

Data Management & Warehousing believes that the approach or methodology for
building a data warehouse should be to use a series of guides and checklists. This
ensures that small teams of relatively skilled resources developing the system can cover
all aspects of the project whilst being free to deal with the specific issues of their
environment to deliver exceptional solutions, rather than a rigid methodology that
ensures that large teams of relatively unskilled staff can meet a minimum standard.

Intended Audience
   Reader                                             Recommended Reading
   Executive                                          Synopsis to Overview Diagram
   Business Users                                     Synopsis to Overview Diagram
   IT Management                                      Synopsis to Overview Diagram
   IT Strategy                                        Synopsis to Overview Diagram
   IT Project Management                              Entire Document
   IT Developers                                      Entire Document




About Data Management & Warehousing
Data Management & Warehousing is a specialist consultancy in data warehousing,
based in Wokingham, Berkshire in the United Kingdom. Founded in 1995 by David M
Walker, our consultants have worked for major corporations around the world including
the US, Europe, Africa and the Middle East. Our clients are invariably large organisations
with a pressing need for business intelligence. We have worked in many industry sectors
but have specialists in telcos, manufacturing, retail, financial and transport as well as
technical expertise in many of the leading technologies.

For further information visit our website at: http://www.datamgmt.com








    Introduction
    A data warehouse programme will often run for many years and produce much
    documentation. Data Management & Warehousing has identified three essential aspects
    for documentation:

        •   A roadmap that describes what documentation is required and how it fits
            together.

        •   Team members within the project to use the templates, create quality documents
            and store them to the project repositories.

        •   Easy access for people outside the project team to the documentation including
            publication or notification of changes, updates and new releases.

    This document provides the roadmap and looks at some of the issues associated with
    the distribution of information outside the project team. The processes and procedures
    required to create and store the information in the first place are a matter of project
    governance [1].

    The documents listed are the templates used by Data Management & Warehousing and
    we believe that they cover all the areas necessary for a major programme of work.
    Templates, however, are created to fulfil a need and should be adapted as required. By
    combining this document, the project plan and suitable governance a project will have
    developed a strong foundation for developing a successful data warehouse.




[1] Data Management & Warehousing have published a white paper on Data Warehouse
Governance which is available from the website at
http://www.datamgmt.com/index.php?module=article&view=78






    Considerations
    This document assumes that a data warehouse is a long-term investment by an
    organisation and as such will form a programme of work. This programme will be broken
    down into projects and where appropriate a project will have subsidiary phases.

    The document also assumes that the project will maintain tight change control. Each
    document should have:

        •   A consistent naming convention.

        •   A Version Number and Date.

        •   A draft, review, publish process that will allow a document version to be signed
            off.

        •   A process that over time allows a document to have many signed off versions.

        •   A configuration management tool that enforces good practice.

    Programmes that do not achieve this will find that the documentation becomes both
    contradictory and a burden in itself and this can become a risk factor in the success of
    the overall programme.

    Documentation as a tool
    Every project acknowledges the need to document itself [2]. However, this ranges from lip
    service and the production of some minimal notes to volumes of shelf-ware, paper that
    sits unread for years on end because no one dares throw it away. Neither of these
    outcomes is of any value.

    Here are some guidelines for when and how to produce documentation:

        •   Documents should only be produced when they serve a purpose.

        •   Documents should only be maintained whilst the information needs to be current.

        •   Documents should only be retained whilst they have value.

        •   Documents should refer to other documents rather than duplicate information.

        •   Documents should be under change/version control.

        •   Documents should be succinct.
        •   Poor grammar and bad writing are often signs of poor comprehension [3].

        •   Good documentation takes time.



[2] See also Agile Documentation: A Pattern Guide to Producing Lightweight Documents for
Software Projects (Wiley Software Patterns Series) by Andreas Rueping.
[3] From Redhat Magazine: How to write really good documentation: Four Rules and an Axiom.
http://www.redhatmagazine.com/2007/01/30/how-to-write-really-good-documentationfour-rules-and-an-axiom/





        •    Great expertise in a subject is not automatically a prerequisite for creation of
             good documentation.

        •    Do not let working cultures that put too great a premium on knowing everything
             dominate.

    Therefore pull from this roadmap what you need, do not produce everything just because
    it is there.



    Which tools and products to use
    The choice of tools and products to use for a given project is based largely on the
    standards of the organisation; most are standard office productivity tools. The table
    below lists some of the tools used in our example templates:


                   Type of Document                  Example Template
                   Code Repository [4]               CVS
                   Data Cleaning tools [5]
                   Data Models [6]
                   Data Profiling [7]
                   Diagram                           Microsoft Visio
                   Document                          Microsoft Word
                   Document Distribution             Adobe Acrobat
                   Issue Log [8]                     Bugzilla
                   Presentation                      Microsoft PowerPoint
                   Project Plan [9]                  Microsoft Project
                   System Testing [10]



       What about a Wiki?
       A Wiki is a web application that allows users to add and edit content in a
       collaborative fashion. This is an ideal alternative for many of the documents that
       are used in a data warehousing project and, if the organisation supports the use of
       a Wiki, it is often preferable to create these templates in the Wiki rather than as
       separate documents and allow widespread collaborative access to them. Throughout
       this document the ‘Works With Wikis’ logo has been added to those documents that
       Data Management & Warehousing believe work well on a Wiki.


[4] A list of configuration management tools can be found at:
http://www.cmcrossroads.com/cgi-bin/cmwiki/view/CM/WebHome
[5] A list of data quality tools can be found at:
http://mediaproducts.gartner.com/reprints/dataflux/137738.html
[6] A list of data modelling tools can be found at:
http://www.databaseanswers.com/modelling_tools.htm
[7] A list of data quality tools can be found at:
http://mediaproducts.gartner.com/reprints/dataflux/137738.html (March 2006)
[8] A list of issue tracking tools can be found at:
http://www.testingfaqs.org/
[9] A list of project management tools can be found at:
http://www.startwright.com/project1.htm
[10] A list of testing tools can be found at:
http://www.testingfaqs.org/






     Put your documentation on the Internet!
     The screams at this recommendation could be heard from the second it was first written
     but Data Management & Warehousing strongly recommend that you put as much, if not
     all, of your documentation on the web. Most organisations building a large data
     warehouse will have individuals on vendor or remote sites or support people who have to
     respond out of hours and may not have everything immediately available. To this end
     providing remote secure access [11] improves responsiveness for all involved and ensures
     collective ownership of this information. If the web is not an option for your organisation
     then at least consider publishing it on the corporate intranet.

     An example solution would include a Wiki for dynamic documentation, a file repository
     for distributed documents in Adobe Acrobat format, Bugzilla for issue tracking and
     CVSWeb to allow users to view (but not edit) the code held in CVS. All of this software is
     free and can be hosted on a single secure web server running Windows or Linux [12].



     Document Short Names
     Some of the document titles have a three-letter acronym (e.g. KDD besides Key Design
     Decisions or SSA besides Source Systems Analysis). This is because these documents
     are often numbered and a short code allows them to be easily identified.




[11] Whilst we recommend putting it on the internet, access should, as with any web application,
be controlled and secure.
[12] See Appendix 2 – Project Quick Start Infrastructure for a reference configuration.






Overview Diagram




Figure 1 - Overview Documentation Roadmap Diagram








     The Templates
     The templates are divided into eleven categories. Within each category the documents
     are numbered sequentially. Some templates depend on others (indicated by the arrows
     on the diagram), whilst others can be done at any time within the phase. Finally, category
     10 documents exist across the project lifecycle, whilst category 11 templates are just
     general ones that can be used as required.

          1     Concept
          The concept phase is about describing what the big idea is. The business may
          have a concept and the IT team will be able to describe the major components and
          concepts of a data warehouse.

           1.1 Business Concepts for the Data Warehouse

           In order to start the data warehouse project, a document is needed that describes the
           conceptual model of the information required by the business. This document
           describes subject areas and their broad relationships as well as key performance
           indicators used by the business. This document is a useful introduction but once
           read it is unlikely to be an ongoing reference source.

           1.2 Overview Architecture for Enterprise Data Warehouses

           The Overview Architecture for Enterprise Data Warehouses is a design pattern for
           data warehousing that describes the basic concepts of the data warehouse. As such,
           a project can simply download the completed document from the Data Management
           & Warehousing website [13]. Further terms used in this document relate to
           components described in that document.


          2     Requirements
          The requirements gathering phase of any data warehouse is one of the most
          difficult. The objective of these templates is to give breadth and depth to the
          requirements. Breadth is the ability to ensure that all truly required information
          would be covered, whilst depth is the amount of detail that is specified in the
          requirements to ensure that the developers have sufficient, unambiguous detail
          with which to develop.

          Requirements should have a programme-long life cycle. After the initial version of
          the requirements is developed a project can start the build; however, the business
          moves on, so whilst the build phase is occurring it is important that new
          versions of the requirements are also being developed. A project within a
          programme of work should have a fixed version of the requirements; however, each
          project may work with a different version of the requirements.




[13] This document is available at:
http://www.datamgmt.com/index.php?module=article&view=76






     2.1 Data Warehouse Business Requirements (WBR)

     The first template that Data Management & Warehousing use is called
     the Data Warehouse Business Requirements and it details the ‘soft’ requirements
     for business information according to a number of subject areas of interest to the
     business.

     A business requirement is something of the form: ‘Provide the average and total
     revenue for each product category by customer market segment for the last three
     years’. It is a requirement that is specified in business language and without
     regard for the practicalities of delivering it. These requirements should be used to
     get business users to underwrite the business benefit, i.e. if I could answer all of
     these questions then I would be able to increase margin by a given percentage for
     a given product.


     2.2 Data Warehouse Data Requirements (WDR)

     The second document details the ‘hard’ requirements for business information
     from the data perspective. This document goes a step deeper into understanding
     the requirements, but is still written from the business users’ perspective.

     This is the refinement of the business requirements in that the analysts can use
     the business requirement to drive out the data required to answer the questions.
     In the example above it is clear that both some part of the product hierarchy and
     of the customer hierarchy are required as well as a time dimension and
     information about revenue. It has also told the analyst that the minimum retention
     period is three years.

     Consequently, the analyst would start to build data requirements that may fulfil
     many business requirements and to add additional attributes to help make sense
     of the data. The data requirements lifecycle is similar to that of the business
     requirements i.e. fixed for a project and variable over the lifespan of a
     programme.


     2.3 Data Warehouse Query Requirements (WQR)

     The third document lists a number of potential queries to which the solution
     should be able to provide answers. This is not an exhaustive list, but rather
     represents the types of queries that are being asked by the business.

     It is used to test the relationship between the data requirements, the data model
     and the business requirements. A set of queries should be able to provide the
     data required to answer a business requirement. At the same time the data must
     be available as described in the data requirements and joined in such a way as to
     be usable in the data model.
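
To make the relationship concrete, the example business requirement from section 2.1 can be expressed as a query against a small data set. The following is only a minimal sketch using Python's built-in sqlite3 module; the table names, column names and figures are hypothetical and are not taken from any Data Management & Warehousing template.

```python
import sqlite3

# Hypothetical, simplified tables; all names and figures are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product  (product_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, market_segment TEXT);
CREATE TABLE revenue  (product_id INTEGER, customer_id INTEGER,
                       revenue_year INTEGER, amount REAL);
INSERT INTO product  VALUES (1, 'Handsets'), (2, 'Accessories');
INSERT INTO customer VALUES (10, 'Consumer'), (11, 'Business');
INSERT INTO revenue  VALUES
    (1, 10, 2005, 100.0), (1, 10, 2006, 300.0),
    (2, 11, 2006,  50.0), (1, 11, 2003, 999.0);
""")

def revenue_by_category_and_segment(conn, current_year, years=3):
    """Average and total revenue per category and segment for the last `years` years."""
    first_year = current_year - years + 1
    return conn.execute("""
        SELECT p.category, c.market_segment,
               AVG(r.amount) AS average_revenue,
               SUM(r.amount) AS total_revenue
        FROM revenue r
        JOIN product  p ON p.product_id  = r.product_id
        JOIN customer c ON c.customer_id = r.customer_id
        WHERE r.revenue_year BETWEEN ? AND ?
        GROUP BY p.category, c.market_segment
        ORDER BY p.category, c.market_segment
    """, (first_year, current_year)).fetchall()

# The 2003 row falls outside the three-year window and is excluded.
rows = revenue_by_category_and_segment(conn, current_year=2006)
```

If such a query cannot be written, then either the data requirements are missing an attribute or the data model does not join the data in a usable way, which is the gap this document is intended to expose.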








           2.4 Data Warehouse Technical Requirements (WTR)

           The fourth document details the functional and non-functional requirements that
           are expected of the solution. Again, these requirements are stated from the
           business perspective rather than the technical perspective. The document should
           include topics such as:

                •   The functionality required of the query tools.

                •   The general retention requirements for data.

                •   The performance characteristics.

                •   The systems availability expectations.

           2.5 Data Warehouse Interface Requirements (WIR)

           The fifth and final requirements document details the requirements for interfaces
           that feed from the data warehouse out to other systems. Often a data warehouse
           will be required to deliver information to downstream systems that have existing
           data interface specifications. These requirements have to be gathered to ensure
           that the interface demands are met as well as the user demands (derived from the
           Query Requirements).

           2.6 Business Definitions Dictionary (BDD)

           Throughout all of this process a number of business terms will be used. It is
           important that a common dictionary is developed and kept so that there is a
           common reference for words. This does not mean that there must be one
           definition for each term but there should be a definition for how a term is used
           within a given context. For example, to a customer support division a customer
           might be those who have an active support contract, whilst to the sales team it
           may be anyone who has ever bought a product. Both definitions are right within
           their context.
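
One way to picture such a dictionary is as a lookup keyed by both the term and the context in which it is used. The sketch below is an assumption about how this might be held, not a description of the BDD template; the function names are hypothetical and the two definitions paraphrase the customer example above.

```python
# Hypothetical dictionary keyed by (term, context); the definitions
# paraphrase the customer example given in the text.
definitions = {
    ("customer", "customer support"): "Anyone who has an active support contract.",
    ("customer", "sales"): "Anyone who has ever bought a product.",
}

def define(term, context):
    """Return the definition of a term as used in one business context."""
    try:
        return definitions[(term.lower(), context.lower())]
    except KeyError:
        raise KeyError(f"No definition of {term!r} for context {context!r}") from None

def contexts_for(term):
    """List every context in which a term has a definition."""
    return sorted(ctx for (t, ctx) in definitions if t == term.lower())
```

Keying on the pair rather than on the term alone allows both definitions of "customer" to coexist, and a missing pair shows where a definition still has to be agreed.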

          3     Architecture
          The architecture category contains a number of documents that describe how the
          system should be built. These provide a blueprint to developers on how to approach
          any particular problem by helping them select the appropriate tools, platforms and
          configurations to both meet their need and conform to the overall strategy.

           3.1 Technical Architecture

           The technical architecture describes the technical components that will be used to
           build the system. This will include the hardware, software and network
           configuration, along with specific versions where appropriate and standards as to
           which software product should be used for which job [14].




[14] As vendors broaden out their products they start to overlap in functionality. Ensuring that
two different products are not used for the same job should be covered in this document.






            3.2 Security Model

            The security model should describe all the required roles/groups etc that will be
            required for each component of the system. It needs to first set out the general
            policies and then list explicit permissions for each component on the system.
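
The two-level structure described here (a general policy first, then explicit permissions per component) could be captured as simply as the following sketch. The roles, components and actions are hypothetical, and a default of "deny" is an assumption made for the illustration rather than something the template prescribes.

```python
# Hypothetical roles, components and permissions, for illustration only.
DEFAULT_POLICY = "deny"  # general policy: anything not explicitly listed is refused

PERMISSIONS = {
    # component: {role: set of allowed actions}
    "staging": {"etl_developer": {"read", "write"}},
    "data_mart": {
        "business_user": {"read"},
        "etl_developer": {"read", "write"},
    },
}

def is_allowed(role, component, action):
    """Check explicit permissions first, then fall back to the general policy."""
    granted = PERMISSIONS.get(component, {}).get(role, set())
    if action in granted:
        return True
    return DEFAULT_POLICY == "allow"
```

Writing the permissions down in one table per component makes it easy to review them against the general policy and to spot roles that have been granted more than they need.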

            3.3 Resilience Plan
            The resilience plan [15] should describe how the system is made resilient; this
            should include (as required) the need for redundant hardware and networks,
            incremental, cumulative and full backups, restores of individual components or
            entire systems, how to back out records individually, as a group or as entire sets, and
            how disasters such as the loss of a data centre are managed.

            3.4 Data Quality Plan (DQP)

            The data quality plan should describe how data quality is managed. This will
            include the principles of where data is cleansed (in the source, in the staging, in
            the data warehouse itself, etc.), how it is profiled, what type of cleansing is carried
            out (e.g. rule based or heuristic [16]), what metrics are set and
            monitored for data improvement, etc.
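
The difference between rule based and heuristic cleansing can be shown with a short sketch. It uses approximate string matching from Python's standard difflib module as a stand-in for the fuzzy techniques mentioned in the footnote; the reference list of towns and the similarity cutoff of 0.8 are assumptions chosen only for the illustration.

```python
import difflib

# Hypothetical reference list of valid town names.
KNOWN_TOWNS = ["Wokingham", "Reading", "Bracknell"]

def rule_based_clean(value):
    """Rule based cleansing: deterministic rules (trim, collapse spaces, title case)."""
    return " ".join(value.split()).title()

def heuristic_clean(value, candidates=KNOWN_TOWNS, cutoff=0.8):
    """Heuristic cleansing: use the closest known value if it is similar enough."""
    tidy = rule_based_clean(value)
    matches = difflib.get_close_matches(tidy, candidates, n=1, cutoff=cutoff)
    # Leave the tidied value unchanged when nothing is close enough.
    return matches[0] if matches else tidy
```

For example, heuristic_clean("wokingam") would be corrected to "Wokingham", whereas a value with no close match in the reference list is only tidied. A data quality plan would also need to say who reviews the heuristic corrections, since they can be wrong.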


          4     Data Models
          Data models are (normally) graphical representations of the data that is required.
          These are normally created in special software that can also generate the DDL
          required to create the physical objects in the database.

            4.1 Data Modelling Standards

            This document describes the naming conventions of objects in the database, as
            well as any particular modelling methods (e.g., a hierarchy must always be
            modelled in a specific way and any exceptions noted along with a justification for
            the difference). This document should describe the standards for all three data
            models described below, including the modelling techniques for Logical and Physical models.

            4.2 Logical Model

            The logical data model is a model that represents the true structure of data used
            by the business, independent of software or hardware implementation constraints.
            Normally the model is closely related to the information described in Data
            Warehouse Business Requirements.




[15] This is sometimes called a Disaster Recovery Plan; however, as a name this does not
cover the full range of activities that are required.
[16] Heuristic data cleansing uses methods such as fuzzy logic, etc. to try to clean data. This is
most successful with data such as addresses where there is the opportunity for lots of human
error in the information.






            4.3 Repository Data Model
            The repository data model is a physical data model [17] of the main storage area
            within a data warehouse. This model will reflect the logical data model in overall
            structure but will have a number of compromises for the practical delivery of the
            solution. It normally closely reflects the information described in the Data
            Warehouse Data Requirements.

            4.4 Data Mart Data Model(s)

            The data mart data models are the physical models of the part of the system that
            the user will query. These are often, though not necessarily, star schemas [18] and
            closely reflect the Data Warehouse Query Requirements.

          It can be seen from the definitions above that the data models are derived from the
          requirements and that the combination of the six documents acts together to ensure
          completeness. This is highlighted in the diagram below:




          Figure 2 - The relationship between requirements and data models




[17] A physical data model is a representation of a data design which takes into account the
facilities and constraints of a given database management system. A complete physical data
model will include all the database artefacts required to create relationships between tables or
achieve performance goals.
Wikipedia: http://en.wikipedia.org/wiki/Physical_data_model
[18] A relational database schema that is used to represent multidimensional data. The data is
stored in a central fact table, with one or more tables holding information on each dimension.
Dimensions have levels and all levels are usually shown as columns in each dimension table.
OLAP Report: http://www.olapreport.com/glossary.htm






    5     Analysis
    The goal of the analysis phase is to identify the sources of the information required
    to populate the physical data models. The main goal should be to populate the
    repository data model as this is used as the source for all data in the data marts.
    This is achieved in a number of steps:

     5.1 Source Systems Analysis (SSA)

     The source system analysis is a high-level analysis that gathers information about
     available systems. Each system is a potential candidate as a source system and
     generates its own analysis document. Some systems may be documented and
     then rejected, e.g. because a system is a secondary source that only contains information
     created in another system that can itself be used as the source. The document covers
     hardware, software, network connectivity, availability and functional areas (e.g.
     CRM system containing customer data etc.).


     5.2 Data Profiling

     Data profiling is a process whereby an existing source system is examined in
     order to collect information and statistics about the data it holds. This allows
     sources for the data warehouse to be identified, the metadata held about the
     system to be validated and the data quality to be assessed.

     There are many commercial tools available for this process; however, a simple
     set of SQL scripts will often prove adequate. The SQL scripts allow an
     experienced DBA with knowledge of the system to quickly write and iteratively
     explore the data. A tool formalises the process but often performs
     unnecessary analysis and requires significant additional infrastructure and
     tool-specific knowledge, slowing the process down. Results from any profiling
     should be kept for future comparative analysis.
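
     As an illustration of how little is needed for a first pass, the sketch below
     runs one profiling query per column. It is a minimal example only: it uses an
     in-memory SQLite database, and the `customer` table, its columns and the helper
     names are hypothetical rather than taken from any real source system.

```python
import sqlite3


def profile_column(conn, table, column):
    """Return basic profiling statistics for one column.

    Table and column names are interpolated into the SQL text, so they must
    come from a trusted source such as the database catalogue.
    """
    sql = (
        f'SELECT COUNT(*), '
        f'SUM(CASE WHEN "{column}" IS NULL THEN 1 ELSE 0 END), '
        f'COUNT(DISTINCT "{column}"), MIN("{column}"), MAX("{column}") '
        f'FROM "{table}"'
    )
    rows, nulls, distinct, lowest, highest = conn.execute(sql).fetchone()
    return {
        "rows": rows,
        "nulls": nulls or 0,  # SUM returns NULL for an empty table
        "distinct": distinct,
        "min": lowest,
        "max": highest,
    }


def build_demo_db():
    """Create a tiny, made-up source table to profile."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (id INTEGER, country TEXT)")
    conn.executemany(
        "INSERT INTO customer VALUES (?, ?)",
        [(1, "UK"), (2, "UK"), (3, None), (4, "FR")],
    )
    return conn
```

     Storing the returned dictionaries alongside the run date is one simple way to
     keep results for later comparison.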

     5.3 Source Entity Analysis (SEA)

     The source entity analysis is the detailed documentation of the sources selected
     because data profiling has validated these sources as being useful for the data
     warehouse. This includes detailed information about every table and column,
     including data types and data quality metrics. A source entity analysis will be
     produced for each system that is to be used as a source.








           5.4 Target Orientated Analysis (TOA)

           The final document template of the analysis phase is the Target Orientated
           Analysis document that is used to describe which sources will be used to populate
           which target entities. This document is sometimes replaced by a ‘source to target’
           mapping document [19]. This document is used by the developers in the design and
           build of the ETL code [20]. A target-orientated analysis will be developed for each
           subject area in the data warehouse.
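
           The target-first question can be made concrete by recording the analysis as
           data keyed on the target. The sketch below is illustrative only; the table,
           column and source names are hypothetical, as are the two helper functions.

```python
# Hypothetical target-orientated mapping: every target column lists the
# source columns needed to populate it (system.table.column).
TARGET_MAP = {
    "customer": {
        "customer_id": ["crm.client.id"],
        "customer_name": ["crm.client.name", "billing.account.holder"],
        "credit_limit": [],  # no source identified yet
    },
}


def unpopulated_columns(target_map):
    """Return (table, column) pairs that have no source assigned."""
    return sorted(
        (table, column)
        for table, columns in target_map.items()
        for column, sources in columns.items()
        if not sources
    )


def source_systems_for(target_map, table):
    """Return the distinct source systems needed to populate one target table."""
    return sorted(
        {source.split(".")[0]
         for sources in target_map[table].values()
         for source in sources}
    )
```

           Listing the unpopulated columns answers directly whether a target table can
           be populated completely from the sources analysed so far.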

          The output dependencies for the analysis can be described as follows:




          Figure 3 - Analysis Dependencies




19
    Data Management & Warehousing prefer target orientated analysis which asks the
question ‘Which sources do I need in order to populate this target table completely?’ to the
source to target mapping method which asks the question ‘Which target entities do I need to
put data into from this source?’ This is because the thought process used in the first method
is geared towards the delivery of the information to the user rather than the extraction of the
data by the developer.
20
   ETL: Extract, Transform, Load – code written to move data from the sources to the data
warehouse






          6     Design
          The design phase concentrates on taking the analysis and creating a plan for the
          code build.

           6.1 ETL Execution Plan

           The ETL Execution Plan is a document that explains from the high level down to
           the low level how the ETL code will be put together. One of the most effective
           ways of doing this is as a series of ‘directed graphs’ [21]. For example, there may be
           a diagram that represents the overall flow. In this case each point would represent
           a subject area. There are then a number of additional graphs, one for each
           subject area, that represent the detailed flows within a subject area. This drilling
           down is repeated until the lowest level ETL mappings are described or the
           required level of detail is documented.
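
           A directed graph of this kind can also be written down as data and checked
           mechanically. The sketch below uses Python's standard `graphlib` module
           (Python 3.9 or later) to derive a valid execution order; the subject area
           names are hypothetical and stand in for the top-level flow.

```python
from graphlib import TopologicalSorter

# Hypothetical top-level flow: each subject area maps to the set of
# subject areas that must be loaded before it.
OVERALL_FLOW = {
    "reference_data": set(),
    "customer": {"reference_data"},
    "product": {"reference_data"},
    "sales": {"customer", "product"},
}


def execution_order(graph):
    """Return one valid load order; raises graphlib.CycleError on a cycle."""
    return list(TopologicalSorter(graph).static_order())
```

           The same structure can be repeated one level down, with a graph per subject
           area whose nodes are the individual ETL mappings.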

           6.2 Initial Capacity Plan

           The initial capacity plan describes the sizes of the databases and database
           objects required for the initial build. This should describe the sizing for a known
           period (e.g. for 1 year) and a number of environments (Development, Test,
           Production). A large amount of the information required for this can often be
           derived from the data-modelling tool used.
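
           The underlying arithmetic is simple enough to sketch. The example below is
           an assumption-laden illustration rather than a sizing method: the 30%
           allowance for indexes and free space and the environment percentages are
           invented figures that each project would replace with its own.

```python
# Hypothetical environment sizes as a percentage of production volume.
ENVIRONMENT_PCT = {"Development": 10, "Test": 50, "Production": 100}


def table_bytes(rows_per_day, avg_row_bytes, days=365, overhead_pct=30):
    """Estimate storage for one table over the planning period.

    overhead_pct is an assumed allowance for indexes and free space.
    Integer arithmetic keeps the result exact.
    """
    raw = rows_per_day * days * avg_row_bytes
    return raw * (100 + overhead_pct) // 100


def capacity_plan(tables, days=365):
    """Return {environment: total_bytes} for (rows_per_day, avg_row_bytes) pairs."""
    production = sum(table_bytes(rows, size, days) for rows, size in tables)
    return {env: production * pct // 100 for env, pct in ENVIRONMENT_PCT.items()}
```

           For a single table receiving 10,000 rows a day at an assumed 200 bytes per
           row, `table_bytes(10_000, 200)` gives 949,000,000 bytes for one year.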

           6.3 Coding Standards

           A document that describes the naming conventions for all objects that will be
           created, including but not limited to: database objects such as table and column
           names, ETL mapping names, script names etc. It will also describe and mandate
           or recommend any specific coding standards and/or algorithms.
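
           Naming conventions are easier to enforce when they can be checked
           automatically. The sketch below assumes a purely hypothetical convention
           (lower-case names with a type prefix); the prefixes and patterns are
           examples, not a recommendation.

```python
import re

# Hypothetical convention: lower-case snake_case with a type prefix.
NAMING_RULES = {
    "table": re.compile(r"^(dim|fact|stg)_[a-z][a-z0-9_]*$"),
    "mapping": re.compile(r"^m_[a-z][a-z0-9_]*$"),
}


def naming_violations(objects):
    """Return the names that break the convention for their object type.

    objects is an iterable of (object_type, name) pairs.
    """
    return [name for kind, name in objects if not NAMING_RULES[kind].match(name)]
```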

       The diagram below describes the output dependencies:




              Figure 4 - Design Dependencies

21
   This is a term taken from mathematics. Graph theory is the study of graphs, mathematical
structures used to model relations between objects in a given collection. A "graph" in this
context refers to a collection of mappings and their dependencies. The most famous graph
theory problem is known as ‘The Seven Bridges of Konigsberg’ and was solved by Leonhard
Euler. http://mathforum.org/isaac/problems/bridges1.html







       7      Build
       This white paper is a roadmap to the documentation that should be produced during a
       data warehouse project. It should follow the structure of most projects but it is not a
       substitute for a project plan.

           7.1 Code Repository

           A lot of the code will contain valuable documentation in the form of comments. It
           is also vital that the history of changes to code is recorded. Therefore, an
           important part of the documentation is the information held in the configuration
           management tool [22].

           7.2 Data Cleansing Integration

           The data profiling described above will have generated a number of rules that will
           have to be implemented in order to maintain data quality. These rules will have to
           be stored and integrated into the ETL. If a data-cleansing tool [23] is being used then
           these rules will be documented within the tool. Otherwise, the rules should be
           explicitly documented for future reference.
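
           Where no tool is used, one way to document the rules explicitly is to hold
           each rule as an identifier, a description and an executable check, so that
           the documentation and the ETL cannot drift apart. The rule identifiers and
           field names below are hypothetical.

```python
def _valid_country(record):
    """Country code must be a two-letter upper-case string."""
    code = record.get("country")
    return isinstance(code, str) and len(code) == 2 and code.isupper()


def _non_negative_amount(record):
    """Order amount must not be negative (a missing amount is treated as 0)."""
    return record.get("amount", 0) >= 0


# Hypothetical rules derived from profiling: (id, description, check).
RULES = [
    ("DQ001", "country code must be two upper-case letters", _valid_country),
    ("DQ002", "order amount must not be negative", _non_negative_amount),
]


def failed_rules(record):
    """Return the ids of every rule that the record breaks."""
    return [rule_id for rule_id, _description, check in RULES if not check(record)]
```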

       8      Test
              Testing software is operating the software under controlled conditions [24] in order to:

                   1. Verify that it behaves “as specified”

                       Verification is the checking or testing of items, including software, for
                       conformance and consistency by evaluating the results against
                       pre-specified requirements.

                   2. Detect errors

                       Testing should intentionally attempt to make things go wrong to
                       determine if things happen when they should not or things do not
                       happen when they should.

                       In this area it is important to test boundary conditions [25], e.g. what
                       happens with a percentage over 100% or less than 0%.

                   3. Validate that what has been specified is what the user actually
                      wanted

                       Validation looks at the system correctness, i.e. it is the process of
                       checking that what has been specified is what the user actually
                       wanted.
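
              The percentage example can be written as a small boundary test. In the
              sketch below the rule under test, `is_valid_percentage`, is a hypothetical
              stand-in for whatever the ETL or report actually enforces; the test cases
              sit just inside and just outside each boundary.

```python
def is_valid_percentage(value):
    """Hypothetical rule under test: a percentage must lie in [0, 100]."""
    return 0 <= value <= 100


# (input, expected result) pairs on either side of each boundary.
BOUNDARY_CASES = [
    (-0.01, False),
    (0, True),
    (100, True),
    (100.01, False),
]


def run_boundary_tests():
    """Return the cases whose actual result differs from the expected one."""
    return [
        (value, expected)
        for value, expected in BOUNDARY_CASES
        if is_valid_percentage(value) is not expected
    ]
```

              An empty list from `run_boundary_tests()` means every boundary case behaved
              as expected.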



22
   A list of configuration management tools can be found at:
http://www.cmcrossroads.com/cgi-bin/cmwiki/view/CM/WebHome
23
   A list of data quality tools can be found at:
http://mediaproducts.gartner.com/reprints/dataflux/137738.html (March 2006)
24
   Taken from: http://members.tripod.com/~bazman/
25
   A useful description of boundary values testing can be found at:
http://www.geocities.com/xtremetesting/BoundaryValues.html





                  Remember: The purpose of testing is verification, validation and error detection
                  in order to find problems – and the purpose of finding those problems is to get
                  them fixed.

                  Each type of test document below should have inclusions and exclusions, test
                  cycles, expected results, entrance and exit criteria, etc. If a tool [26] is used much
                  of this can be automated.

               8.1 Unit Testing

               These are tests that are designed to validate that an individual unit of
               development work (normally an ETL mapping, input screen or report) is
               functioning as expected.

               8.2 System Testing

               These are tests that are designed to check that a suite of newly developed or
               changed units work correctly together in the expected manner.

               8.3 Integration Testing

               These are tests that are designed to ensure that the suites of newly developed or
               changed units work with other suites that are already deployed on the system and
               do not damage the existing product environment.

               8.4 Performance Testing

               The final set of tests is designed to ensure the performance of the system. A
               system that is recording 10,000 transactions a day will be inserting into an empty
               table on the first day of operation and into a table of roughly 3.65M records after
               a year. The performance characteristics of this work vary dramatically from one
               database to the next. These tests must therefore examine the short-term and
               long-term performance impacts of any given change.
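
               The idea of timing the same load against a growing table can be sketched
               as follows. This is a scaled-down illustration using an in-memory SQLite
               database with a hypothetical `txn` table; a real performance test would
               run against the target database with production-sized volumes.

```python
import sqlite3
import time


def measure_daily_loads(days, rows_per_day):
    """Load one 'day' of rows at a time and time each load.

    Returns a list of (rows_already_in_table, seconds_taken) tuples, so
    early and late loads can be compared.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE txn (id INTEGER PRIMARY KEY, amount REAL)")
    conn.execute("CREATE INDEX txn_amount ON txn (amount)")
    results = []
    for day in range(days):
        rows_before = conn.execute("SELECT COUNT(*) FROM txn").fetchone()[0]
        batch = [(float(day * rows_per_day + i),) for i in range(rows_per_day)]
        start = time.perf_counter()
        with conn:  # commit the whole day's load as one transaction
            conn.executemany("INSERT INTO txn (amount) VALUES (?)", batch)
        results.append((rows_before, time.perf_counter() - start))
    conn.close()
    return results
```

               Keeping the timings from each run makes it possible to compare the effect
               of a change on both the empty-table and the full-table case.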

           9      Implementation
           After the development and testing are over the system has to be deployed into
           production and left operating. Implementation is often neglected on project plans. It
           requires considerable thought and time to document procedures that will be used for
           many years to come.

               9.1 Configuration Management Procedures

               The configuration management procedures should cover all aspects of the
               changes to the configuration from applying patches and new releases through to
               system software upgrades.




26
     A list of testing tools can be found at: http://www.testingfaqs.org/






              Historical Data Migration Plan

              When a data warehouse is deployed it is usual that some amount of historical
              data is required. There should be a plan that identifies what data is required, how
              far back in history it needs to go (one week, one year, etc.), how long this data will
              take to load, whether it will be loaded before or after go-live, and what impact the
              load will have on day-to-day operation whilst it is running.

              9.2 Operations Guide

              The operations guide is intended for those with responsibility for looking after the
              system on a day-to-day basis. These will not be the developers who originally
              created the system, and therefore a simple, clear guide is needed that sets out
              what needs to be done routinely, what needs to be checked regularly and the
              escalation procedures in case of exceptions and failures.

              9.3 Capacity Plan

              A document describing the Initial Capacity Plan will have already been produced.
              However, various external factors (e.g. sudden growth in sales, mergers and
              acquisitions, new product lines, more people making use of the system or a new
              version of some software component) can have a dramatic effect on the capacity
              of the system. As such, there should be a regularly updated capacity plan that
              monitors the available disk, CPU and memory resources of the solution, ensures
              that sufficient resource is available and ensures that any procurement of
              additional resource is done in line with the company budgetary cycle.

              9.4 Service Level Agreements (SLA)
              A Service Level Agreement [27] is a formal negotiated agreement between two
              parties (e.g. the business users of the data warehouse and the IT department, or
              the IT department and outsourced service providers). It documents the common
              understanding about services, priorities, responsibilities, guarantees, etc. Its
              main purpose is to agree on the level of service.

              For example, it may specify the levels of availability, serviceability, performance,
              operation or other attributes of the service like billing and even penalties in the
              case of violation of the Service Level Agreement. A Service Level Agreement is
              generally business oriented and does not go into much technical detail. Its
              technical specifications are commonly described through a series of appendices [28]
              known as Service Level Specifications (or SLS) that define the technical metrics
              required.

              9.5 Helpdesk Scripts

              The helpdesk will need to be able to handle support calls. This is normally done
              as a series of help desk scripts that provide the questions for the support desk
              operators to ask the user. The operator can then either give the resolution or ask
              subsequent questions (which are normally dependent on the result of previous
              questions).


27
     Definition in part from Wikipedia: http://en.wikipedia.org/wiki/Service_Level_Agreement
28
     Depending on size the Service Level Specifications may become separate documents.





             The helpdesk scripts are normally broken down into a number of categories such
             as:

                  •   Support with using the front-end tools.

                  •   Server and Operational Issues.

                  •   Data Quality Issues including availability and currency of data.

                  •   Ad hoc enquiries.
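
             Because each question depends on the previous answer, a helpdesk script is
             naturally a small decision tree. The sketch below shows one possible
             representation; the questions, answers and resolutions are invented for
             illustration.

```python
# Hypothetical script: each node holds a question and, for every answer,
# either the name of the next node or a resolution string.
SCRIPT = {
    "start": (
        "Can you log in to the reporting tool?",
        {"yes": "data", "no": "Reset the user's password."},
    ),
    "data": (
        "Is yesterday's data visible?",
        {
            "yes": "Log as an ad hoc enquiry.",
            "no": "Escalate to operations: the overnight load may have failed.",
        },
    ),
}


def run_script(script, answers, node="start"):
    """Follow the script with a list of answers; return the resolution.

    Returns None if the answers run out before a resolution is reached.
    """
    for answer in answers:
        _question, branches = script[node]
        node = branches[answer]
        if node not in script:
            return node  # not a node name, so it is a resolution
    return None
```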


             9.6 Training Plan
             The project will need to provide a training plan [29]. This is how users become
             competent enough to use the system. It normally consists of the steps:

                  •   Define the training goal:
                      The overall results or capabilities the user should attain.

                  •   Set the learning objectives:
                      What the user will be able to do as a result of the learning activities.

                  •   Learning methods and activities:
                      What the user will do in order to achieve the learning objectives.

                  •   Documentation and evidence of learning:
                      Produced by the user of the learning activities, these are the tangible
                      results.

                  •   Evaluation:
                      Assessment and judgment on quality of evidence in order to conclude
                      whether the user has achieved the learning objectives or not.

             Training plans should be created for all types of users and operators of the
             system.

             9.7 Operational Schedule

             The Operational Schedule is the list of tasks that must be performed each hour,
             day, week and month, etc. and any dependencies (e.g. must run after midnight,
             must only run if a previous job is successful etc.). It is not only the ETL code but
             also the backups and any maintenance windows that have to be included. These
             are often implemented in Job Scheduling Tools [30] that automate the process and
             send alerts if anything fails.
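
             The two kinds of dependency mentioned above, a time constraint and a
             successful predecessor, can be captured in a simple structure. The sketch
             below is not how any particular scheduling tool works; the job names and
             times are hypothetical.

```python
from datetime import time

# Hypothetical nightly schedule: earliest start time and prerequisite jobs.
SCHEDULE = {
    "extract_sales": {"not_before": time(0, 5), "after": []},
    "load_warehouse": {"not_before": time(0, 5), "after": ["extract_sales"]},
    "backup": {"not_before": time(3, 0), "after": ["load_warehouse"]},
}


def runnable_jobs(schedule, now, succeeded):
    """Return jobs that may start now.

    A job is runnable when it has not already succeeded, its earliest start
    time has been reached and every prerequisite job has succeeded.
    """
    return sorted(
        name
        for name, job in schedule.items()
        if name not in succeeded
        and now >= job["not_before"]
        and all(dep in succeeded for dep in job["after"])
    )
```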

             9.8 System Monitoring Plan

             The system monitoring plan is the list of system components that are going to be
             monitored, along with the thresholds at which warnings and errors are signalled. It
             should also include the way in which each message is communicated (e.g.
             audible or visible alert in a control room, SMS or text message, e-mail, etc.).



29
     Adapted from: http://www.managementhelp.org/trng_dev/gen_plan.htm
30
     See: http://www.jobschedulingtools.com/





          Systems monitoring should also deal with “heartbeat” messages, i.e. messages
          that tell you that the monitoring is still working. Monitoring information should be
          retained so that it can be used to manage Service Level Agreements and to
          provide information for Capacity Plans.
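
          A monitoring plan of this kind reduces to a table of thresholds plus a
          heartbeat check. The sketch below illustrates the idea; the components,
          threshold percentages and heartbeat timeout are assumed values, not
          recommendations.

```python
# Hypothetical thresholds as percentage used: (warning level, error level).
THRESHOLDS = {"disk": (80, 95), "cpu": (70, 90), "memory": (75, 90)}

# Assumed maximum gap, in seconds, between heartbeat messages.
HEARTBEAT_TIMEOUT_S = 300


def classify(component, pct_used):
    """Return 'ok', 'warning' or 'error' for one reading."""
    warning_level, error_level = THRESHOLDS[component]
    if pct_used >= error_level:
        return "error"
    if pct_used >= warning_level:
        return "warning"
    return "ok"


def monitoring_alive(last_heartbeat_s, now_s):
    """Return True if a heartbeat was received within the timeout."""
    return now_s - last_heartbeat_s <= HEARTBEAT_TIMEOUT_S
```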

          The documents described in the implementation category interact as follows:




       Figure 5 - Implementation Flow


       10 Project Management
       Up to now this document has described documents required for individual phases of
       the project. There are a number of tools and templates required for the effective
       governance [31] of the programme or project. Project management should have
       minimal impact on the process of development whilst ensuring that control over
       resources, finances and scope is maintained. This category describes documents
       used to control or assist in the management of a project.

          10.1          Documentation Roadmap

          The Documentation Roadmap is the document that describes all the documents
          that should be produced for each of the phases of a project. The document you
          are reading is an example of this document.

          10.2          Project Plan

          The project plan is the list of tasks and activities with timescales, resources and
          dependencies that must be performed to deliver the solution. A project plan is
          base-lined and regularly updated throughout the life of the project. It is important
          that project plans have sufficient detail for short-term tasks without trying to
          micro-manage them, whilst having larger objectives with less detailed activities
          for the longer-term aspects of the project. The plan is then updated as sufficient
          detail to plan later tasks becomes available.



31
   White Paper – Data Warehouse Governance
http://www.datamgmt.com/index.php?module=article&view=78






              10.3          ‘DRIVE’ Statements

              A drive statement is a short, one-page template that helps a project manager assess
              whether a project or work package should be undertaken. It looks at five aspects
              in order to make the assessment:

                  •   Dependencies:
                      What is required before this work can start?

                  •   Risks & Issues:
                      What can go wrong with doing this and how will it affect the overall
                      business, this deliverable and/or other deliverables?

                  •   Imperative:
                      Why do we have to do this? What makes it so important?

                  •   Value:
                      What value will the business, team or overall project get from doing this?

                  •   Exploitation:
                      Once we have this solution how will we be able to take advantage of it?

              10.4          ‘SWOT’ Analysis
              A SWOT analysis [32] is often used in data warehouse projects as a way of
              comparing different approaches to a problem. It does this by looking at the
              following attributes of each approach:

                  •   Strengths

                  •   Weaknesses

                  •   Opportunities

                  •   Threats


              10.5          ‘MoSCoW’ Analysis
              The MoSCoW analysis [33] is a method of prioritising a list of requirements or
              features of the system by breaking the list down into the following groups:

                  •   Must have in order to meet a minimum requirement.

                  •   Should have in order to get real value from the development.

                  •   Could have if there was available time or resources.

                  •   Would have if there were no limits on the development.




32
     Further information at: http://en.wikipedia.org/wiki/SWOT_analysis
33
     Further information at: http://en.wikipedia.org/wiki/MoSCoW_Method






            10.6         Change Requests (CR)

            The change request is a critical component of any project and is vital to data
            warehouse projects. At the outset of this document the requirements gathering
            process was discussed; however, during the lifecycle of the project the
            requirements (and other aspects of the project) will change. The change request
            is the template that documents a change from the original requirement to what is
            now required. Change requests can be accepted or rejected as appropriate and
            should be encouraged as a way to prevent uncontrolled and un-scoped
            development from occurring.

            10.7         Risk Register

            The risk register is a list of events that may happen. If the event occurs then it will
            have some negative impact on the project in terms of cost, resource or time. This
            is contrasted with an issue, which is something that has already happened and
            therefore needs to be managed.

            A risk can be described in two dimensions:

                •   The first is the probability of it happening, which is a measure of how
                    likely it is that a risk will become an issue.

                •   The second is the impact, a measure of the cost it will have in terms of
                    resource, time or scope.

            The ‘hotter’ the risk the more attention should be paid to it.

            Figure 6 - Risk Assessment

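
            One common way to rank risks by ‘heat’ is to score each dimension and
            multiply the two. This scoring convention is an assumption made for the
            sketch below, not something this paper prescribes, and the example risks
            and their 1 to 5 scores are invented.

```python
# Hypothetical register entries, each scored from 1 (low) to 5 (high).
RISKS = [
    {"id": "R1", "desc": "Source system upgrade changes table layouts",
     "probability": 2, "impact": 5},
    {"id": "R2", "desc": "Key DBA leaves the project",
     "probability": 3, "impact": 3},
    {"id": "R3", "desc": "Test environment delivered late",
     "probability": 4, "impact": 2},
]


def heat(risk):
    """Assumed scoring: probability multiplied by impact."""
    return risk["probability"] * risk["impact"]


def hottest_first(risks):
    """Return the risks ordered from highest to lowest heat."""
    return sorted(risks, key=heat, reverse=True)
```
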
            10.8         Issue Log (BUG)

            The issue log is the active management of issues that have arisen. This is best
            managed with an issue-tracking tool [34] that supports the allocation of work to
            resources and tracks the history of actions taken in response to an issue. Each
            issue has a lifecycle [35] that starts with its being reported and ends in resolution.




34
   A list of issue tracking tools can be found at: http://www.testingfaqs.org/
35
   Bugzilla has one of the best descriptions of the lifecycle of an issue. This can be found at:
http://www.bugzilla.org/docs/3.0/html/lifecycle.html and is reproduced in Appendix 1 of this
document






     10.9          Key Design Decisions (KDD)

     The key design decision is a template to record significant design decisions. It
     records the issue, the chosen option, any rejected options and the rationale behind
     the decision.

     Examples of when to use a Key Design Decision might include the choice of a
     specific tool for a specific function, the choice of data model style, etc. This is
     important for long-term programmes and projects as some decisions are
     questioned when reviewed at a later stage. This is often done without context and
     justification for how the original decision was made and sometimes without the
     original decision makers being available.

     The template helps to prevent project managers from constantly returning to
     resolved issues. The document should contain the justification for the decision and
     any rejected opposing arguments.


 11 Miscellaneous
 The final category of this document describes some general-purpose documents that
 a project will find useful. The direct impact of the data warehouse project will not
 always be visible to business users, who may see it as a large budget line with little
 benefit. It is therefore important to market the data warehouse to the wider business
 audience.

 Business users should understand how and when they are getting information from
 the data warehouse rather than other sources and see the impact of data quality
 initiatives, etc. Therefore, Data Management & Warehousing recommend that all
 documents use a consistent set of templates and that where a set of templates are
 used they are branded with the name of the project rather than any third party that is
 contributing to the project.

     11.1          General Purpose Document

     A standard look and feel document with the required categories for any project
     document required.

     11.2          General Purpose Presentation

     This document is a presentation with a standard look and feel. This should be
     used for all presentations that are given either inside or outside the team.

     11.3          Meeting Agenda

     A standard agenda template for meetings.

     11.4          Memo

     This document provides a standard memo format for anyone who is recording
     formal aspects of the project outside the documentation roadmap.








Summary
Many data warehousing projects are both long running and poorly documented. This
does not mean that there is not a lot of documentation, just a lack of the right
documentation in the right place. It is the quality and availability of the documentation
that leads to an understanding of what is available and hence to the value and reputation
of the data warehouse itself.

This white paper has looked at a consistent set of documents developed over fifteen
years of project experience. It reflects a desire to develop the right amount of
documentation at the right time in the project lifecycle and to store it in the right place. Doing
so means moving some documents held on project shared drives to web based media
and publishing documentation to a wider audience, whilst replacing some documents
with online tools. It is essential to the success of a data warehouse project that a culture
of open access is fostered and that the documentation is seen as the entry point to the
data warehouse.

Data Management & Warehousing has identified three aspects to essential
documentation:

    •   A roadmap that describes what documentation is required and how it fits
        together.

    •   Team members within the project who use the templates, create quality documents
        and store them in the project repositories.

    •   Easy access for people outside the project team to the documentation including
        publication or notification of changes, updates and new releases.

This white paper has provided the documentation roadmap with both explanations and a
significant number of examples. It has also looked at some of the issues associated with
the distribution of information outside the project team. It has highlighted that the
processes and procedures required to create and store the information in the first place
are a matter of good project governance.

Data Management & Warehousing believe that the documents described here cover all
the areas necessary for a major programme of work. However, this is only a guide and a
set of templates and these should be adjusted to meet the needs of the programme. By
combining the documentation roadmap, the project plan and suitable governance a
project will have developed a strong foundation for the real work of developing a
successful data warehouse.








       Appendices
           Appendix 1 – Lifecycle of a bug
           The lifecycle of a bug (or issue) is taken from the Bugzilla documentation [36].




36
     The original can be found at: http://www.bugzilla.org/docs/3.0/html/lifecycle.html






       Appendix 2 – Project Quick Start Infrastructure
       Readers of this document and some of our other white papers [37] will be concerned
       about just how much effort is required to get a data warehouse project under way.
       Data Management & Warehousing do not recommend products or individual vendors
       unless specifically requested to do so for a particular project or need. However, there
       is a need to suggest a basic infrastructure for organisations that do not have anything
       in place. The basic infrastructure is neither exhaustive nor exclusive but a guideline
       configuration. It is very cheap and sufficient to support a very large organisation for
       several years:
        •    A small server [38]
             For example a dual core 2GHz CPU, 2GB memory, two (for mirroring) 200GB
             disks.

        •    Network connectivity
             Normally two network cards, one for the LAN and the other for the internet, IP
             Addresses and server names.

        •    Remote backup capability
             For example a location on the SAN where files can be backed up to every day or
             more frequently if required.

        •    Linux
             A distribution such as Red Hat, SUSE, CentOS, Debian, etc.

        •    Apache Web Server
             To provide all web services.

        •    Bugzilla
             To provide issue tracking.

        •    Samba
             To provide a Microsoft compatible shared file system.

        •    CVS
             To provide source code control.

        •    CVSWeb
             To provide a web interface to CVS.

        •    Perl
             Pre-requisite language for Bugzilla.

        •    CPAN Bundle::Bugzilla
             Pre-requisite language modules for Bugzilla.
        •    A Content Management System (CMS) package [39]
             Data Management & Warehousing use phpWebsite but any will do.



[37] Overview Architecture for Enterprise Data Warehouses and Data Warehouse Governance
are both available from http://www.datamgmt.com.
[38] Many organisations will have a server being de-commissioned from some other project that
could be re-used for this; it does not have to be very powerful.
[39] List available from http://www.cmsmatrix.org/






   •    A Wiki
        Data Management & Warehousing use the one included with phpWebsite but
        any will do.

   •    PHP
        Pre-requisite language for phpWebsite.

   •    MySQL
        Pre-requisite database for Bugzilla and phpWebsite.
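
The remote backup capability listed above could be met in many ways. As a minimal sketch, assuming the SAN location is already mounted as an ordinary directory, the hypothetical `backup` function below writes a dated, compressed archive of one directory (for example the CVS repository). The function name and any paths passed to it are illustrative assumptions, not part of the recommended configuration.

```python
import tarfile
from datetime import date
from pathlib import Path


def backup(source_dir, backup_root, today=None):
    """Write source_dir to a dated .tar.gz under backup_root; return its path."""
    source = Path(source_dir)
    day = today or date.today()
    target = Path(backup_root) / f"{source.name}-{day.isoformat()}.tar.gz"
    # Create the backup location if it does not exist yet.
    target.parent.mkdir(parents=True, exist_ok=True)
    with tarfile.open(str(target), "w:gz") as archive:
        # Store entries relative to the directory name, not the full path.
        archive.add(str(source), arcname=source.name)
    return target
```

A scheduler such as cron could call a script like this once a day, or more often if required.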

   From a technical point of view this server can be built, and the software downloaded,
   installed, configured, secured, put on the internet and shared onto the LAN, very
   quickly. Normally an experienced Linux Systems Administrator could configure a
   virtually maintenance-free solution within a couple of days and, as the software is all
   free, the only costs incurred will be for time and hardware. This is also the basic
   configuration of the Data Management & Warehousing website.
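
Once the server is built, a short script can confirm that the main components are present. The following is a sketch under stated assumptions: the command names in `COMPONENTS` are typical but vary between distributions, and server daemons are often installed in directories that are not on an ordinary user's PATH, so a "not found" result is a prompt to look further rather than proof of a missing package. The function names are hypothetical.

```python
import shutil

# Assumed component -> candidate command names; adjust for the distribution.
COMPONENTS = {
    "Apache Web Server": ["httpd", "apache2"],
    "Samba": ["smbd"],
    "CVS": ["cvs"],
    "Perl": ["perl"],
    "PHP": ["php"],
    "MySQL": ["mysqld", "mysql"],
}


def find_components(components=COMPONENTS, which=shutil.which):
    """Map each component to the path of the first command found, or None."""
    found = {}
    for name, commands in components.items():
        paths = (which(cmd) for cmd in commands)
        found[name] = next((p for p in paths if p), None)
    return found


def missing_components(found):
    """Return the sorted names of components for which no command was found."""
    return sorted(name for name, path in found.items() if path is None)


if __name__ == "__main__":
    for name, path in find_components().items():
        print(f"{name}: {path or 'NOT FOUND'}")
```

Passing a different `which` function makes the check easy to exercise without touching the real system.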

   In addition it is recommended that the following desktop software be provided:

   •    Office product
        For example Microsoft Office, StarOffice, etc.

   •    CVS Client
        Data Management & Warehousing normally use WinCVS.

   This list is for a server for the project governance and documentation of a data
   warehouse project and does not include the development, test and production
   environments for the data warehouse itself. Its simplicity, fast setup and low cost
   demonstrate low impact governance of a data warehouse project.

References
The section below lists some useful resources for those considering building a
data warehouse solution.

   Web resources

  Organisation                               Website
  Data Management & Warehousing              http://www.datamgmt.com
  Configuration Management Wiki              http://www.cmcrossroads.com/
  Data Quality Tools                         http://mediaproducts.gartner.com/
  Software Testing Tools                     http://www.testingfaqs.org/
  Job Scheduling Tools                       http://www.jobschedulingtools.com/
  Data Modelling Tools                       http://www.databaseanswers.com/
  Project Management Tools                   http://www.startwright.com/
  CMS Tools                                  http://www.cmsmatrix.org/
  Bugzilla                                   http://www.bugzilla.org



Copyright
© 2007 Data Management & Warehousing. All rights reserved. Reproduction not
permitted without written authorisation. References to other companies and their
products use trademarks owned by the respective companies and are for reference
purposes only.




DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 

White Paper - Data Warehouse Documentation Roadmap

  • 1. Data Management & Warehousing WHITE PAPER Data Warehouse Documentation Roadmap DAVID M WALKER Version: 1.0 Date: 05/04/2007 Data Management & Warehousing 138 Finchampstead Road, Wokingham, Berkshire, RG41 2NU, United Kingdom http://www.datamgmt.com
  • 2. Table of Contents
      Synopsis
      Intended Audience
      About Data Management & Warehousing
      Introduction
      Considerations
      Documentation as a tool
      Which tools and products to use
          What about a Wiki?
      Put your documentation on the Internet!
      Document Short Names
      Overview Diagram
      The Templates
          1 Concept
          2 Requirements
          3 Architecture
          4 Data Models
          5 Analysis
          6 Design
          7 Build
          8 Test
          9 Implementation
          10 Project Management
          11 Miscellaneous
      Summary
      Appendices
          Appendix 1 – Lifecycle of a bug
          Appendix 2 – Project Quick Start Infrastructure
      References
          Web resources
      Copyright
    © 2006 Data Management & Warehousing Page 2
  • 3. Synopsis
    All projects need documentation and many companies provide templates as part of a methodology. This document describes the templates, tools and source documents used by Data Management & Warehousing. It serves two purposes:
      • To give projects that use other methodologies, or that create their own set of documents, a checklist. This allows the project to ensure that the documentation covers the essential areas for describing the data warehouse.
      • To demonstrate our approach to our clients by describing the templates and deliverables that are produced.
    Documentation, methodologies and templates are inherently both incomplete and flexible. Projects may wish to add, change, remove or ignore any part of any document. Some may also believe that aspects of one document would sit better in another. If this is the case then users of this document and these templates are encouraged to change them to fit their needs.
    Data Management & Warehousing believes that the approach or methodology for building a data warehouse should be to use a series of guides and checklists. This lets small teams of relatively skilled staff cover all aspects of the project whilst remaining free to deal with the specific issues of their environment and deliver exceptional solutions. The alternative, a rigid methodology, merely ensures that large teams of relatively unskilled staff can meet a minimum standard.

    Intended Audience
      Reader                    Recommended Reading
      Executive                 Synopsis to Overview Diagram
      Business Users            Synopsis to Overview Diagram
      IT Management             Synopsis to Overview Diagram
      IT Strategy               Synopsis to Overview Diagram
      IT Project Management     Entire Document
      IT Developers             Entire Document

    About Data Management & Warehousing
    Data Management & Warehousing is a specialist consultancy in data warehousing, based in Wokingham, Berkshire in the United Kingdom. Founded in 1995 by David M Walker, our consultants have worked for major corporations around the world including the US, Europe, Africa and the Middle East. Our clients are invariably large organisations with a pressing need for business intelligence. We have worked in many industry sectors but have specialists in telcos, manufacturing, retail, financial and transport as well as technical expertise in many of the leading technologies.
    For further information visit our website at: http://www.datamgmt.com
  • 4. Introduction
    A data warehouse programme will often run for many years and produce much documentation. Data Management & Warehousing has identified three essential aspects for documentation:
      • A roadmap that describes what documentation is required and how it fits together.
      • Team members within the project who use the templates, create quality documents and store them in the project repositories.
      • Easy access to the documentation for people outside the project team, including publication or notification of changes, updates and new releases.
    This document provides the roadmap and looks at some of the issues associated with the distribution of information outside the project team. The processes and procedures required to create and store the information in the first place are a matter of project governance. [1]
    The documents listed are the templates used by Data Management & Warehousing and we believe that they cover all the areas necessary for a major programme of work. Templates, however, are created to fulfil a need and should be adapted as required.
    By combining this document, the project plan and suitable governance a project will have a strong foundation for developing a successful data warehouse.
    [1] Data Management & Warehousing have published a white paper on Data Warehouse Governance which is available from the website at http://www.datamgmt.com/index.php?module=article&view=78
  • 5. Considerations
    This document assumes that a data warehouse is a long-term investment by an organisation and as such will form a programme of work. This programme will be broken down into projects and where appropriate a project will have subsidiary phases.
    The document also assumes that the project will maintain tight change control. Each document should have:
      • A consistent naming convention.
      • A version number and date.
      • A draft, review, publish process that will allow a document version to be signed off.
      • A process that over time allows a document to have many signed-off versions.
      • A configuration management tool that enforces good practice.
    Programmes that do not achieve this will find that the documentation becomes both contradictory and a burden in itself, and this can become a risk factor in the success of the overall programme.

    Documentation as a tool
    Every project acknowledges the need to document itself. [2] However, this ranges from lip service and the production of some minimal notes to volumes of shelf-ware: paper that sits unread for years on end because no one dares throw it away. Neither of these outcomes is of any value. Here are some guidelines for when and how to produce documentation:
      • Documents should only be produced when they serve a purpose.
      • Documents should only be maintained whilst the information needs to be current.
      • Documents should only be retained whilst they have value.
      • Documents should refer to other documents rather than duplicate information.
      • Documents should be under change/version control.
      • Documents should be succinct.
      • Poor grammar and bad writing are often signs of poor comprehension. [3]
      • Good documentation takes time.
    [2] See also Agile Documentation: A Pattern Guide to Producing Lightweight Documents for Software Projects (Wiley Software Patterns Series) by Andreas Rueping
    [3] From Redhat Magazine: How to write really good documentation: Four Rules and an Axiom. http://www.redhatmagazine.com/2007/01/30/how-to-write-really-good-documentationfour-rules-and-an-axiom/
  • 6. Documentation as a tool (continued)
      • Great expertise in a subject is not automatically a prerequisite for creation of good documentation.
      • Do not let working cultures that put too great a premium on knowing everything dominate.
    Therefore pull from this roadmap what you need; do not produce everything just because it is there.

    Which tools and products to use
    The choice of tools and products to use for a given project is based largely on the standards of the organisation; most are standard office productivity tools. The table below lists some of the tools used in our example templates:
      Type of Document          Example Template
      Code Repository           CVS [4]
      Data Cleaning tools       see [5]
      Data Models               see [6]
      Data Profiling            see [7]
      Diagram                   Microsoft Visio
      Document                  Microsoft Word
      Document Distribution     Adobe Acrobat
      Issue Log                 Bugzilla [8]
      Presentation              Microsoft PowerPoint
      Project Plan              Microsoft Project [9]
      System Testing            see [10]

    What about a Wiki?
    A Wiki is a web application that allows users to add and edit content in a collaborative fashion. This is an ideal alternative for many of the documents that are used in a data warehousing project. If the organisation supports the use of a Wiki then it is often preferable to create these templates in the Wiki rather than as separate documents, and to allow widespread collaborative access to them.
    Throughout this document the ‘Works With Wikis’ logo has been added to those documents that Data Management & Warehousing believe work well on a Wiki.
    [4] A list of configuration management tools can be found at: http://www.cmcrossroads.com/cgi-bin/cmwiki/view/CM/WebHome
    [5] A list of data quality tools can be found at: http://mediaproducts.gartner.com/reprints/dataflux/137738.html
    [6] A list of data modelling tools can be found at: http://www.databaseanswers.com/modelling_tools.htm
    [7] A list of data quality tools can be found at: http://mediaproducts.gartner.com/reprints/dataflux/137738.html (March 2006)
    [8] A list of issue tracking tools can be found at: http://www.testingfaqs.org/
    [9] A list of project management tools can be found at: http://www.startwright.com/project1.htm
    [10] A list of testing tools can be found at: http://www.testingfaqs.org/
  • 7. Put your documentation on the Internet!
    The screams at this recommendation could be heard from the second it was first written, but Data Management & Warehousing strongly recommend that you put as much, if not all, of your documentation on the web. Most organisations building a large data warehouse will have individuals on vendor or remote sites, or support people who have to respond out of hours and may not have everything immediately available. To this end, providing remote secure access [11] improves responsiveness for all involved and ensures collective ownership of this information. If the web is not an option for your organisation then at least consider publishing the documentation on the corporate intranet.
    An example solution would include a Wiki for dynamic documentation, a file repository for distributed documents in Adobe Acrobat format, Bugzilla for issue tracking and CVSWeb to allow users to view (but not edit) the code held in CVS. All of this software is free and can be hosted on a single secure web server running Windows or Linux. [12]

    Document Short Names
    Some of the document titles have a three-letter acronym (e.g. KDD beside Key Design Decisions or SSA beside Source Systems Analysis). This is because these documents are often numbered and a short code allows them to be easily identified.
    [11] Whilst we recommend putting it on the internet, access should, as with any web application, be controlled and secure.
    [12] See Appendix 2 – Project Quick Start Infrastructure for a reference configuration.
  • 8. Overview Diagram
    [Figure 1 - Overview Documentation Roadmap Diagram]
  • 9. The Templates
    The templates are divided into eleven categories. Within each category the documents are numbered sequentially. Some templates depend on others (indicated by the arrows on the diagram) whilst others can be done at any time within the phase. Finally, category 10 documents exist across the project lifecycle, whilst category 11 templates are general ones that can be used as required.

    1 Concept
    The concept phase is about describing what the big idea is. The business may have a concept and the IT team will be able to describe the major components and concepts of a data warehouse.

    1.1 Business Concepts for the Data Warehouse
    In order to start the data warehouse project a document is needed that describes the conceptual model of the information required by the business. This document describes subject areas and their broad relationships as well as key performance indicators used by the business. It is a useful introduction, but once read it is unlikely to be an ongoing reference source.

    1.2 Overview Architecture for Enterprise Data Warehouses
    The Overview Architecture for Enterprise Data Warehouses is a design pattern for data warehousing that describes the basic concepts of the data warehouse. As such, a project can just download the completed document from the Data Management & Warehousing website. [13] Further terms used in this document relate to components described in that document.

    2 Requirements
    The requirements gathering phase of any data warehouse is one of the most difficult. The objective of these templates is to give breadth and depth to the requirements. Breadth is the assurance that all truly required information is covered, whilst depth is the amount of detail specified in the requirements to ensure that the developers have sufficient, unambiguous detail with which to develop.
    Requirements should have a programme-long life cycle. After the initial version of the requirements is developed a project can start the build. However, the business moves on, and therefore whilst the build phase is occurring it is important that new versions of the requirements are also being developed. A project within a programme of work should have a fixed version of the requirements; however, each project may work with a different version of the requirements.
    [13] This document is available at: http://www.datamgmt.com/index.php?module=article&view=76
  • 10. 2.1 Data Warehouse Business Requirements (WBR)
    The first template that Data Management & Warehousing use is called the Data Warehouse Business Requirements and it details the ‘soft’ requirements for business information according to a number of subject areas of interest to the business. A business requirement is something of the form:
      ‘Provide the average and total revenue for each product category by customer market segment for the last three years’
    It is a requirement that is specified in business language and without regard for the practicalities of delivering it. These requirements should be used to get business users to underwrite the business benefit, i.e. ‘if I could answer all of these questions then I would be able to increase margin by a given percentage for a given product’.

    2.2 Data Warehouse Data Requirements (WDR)
    The second document details the ‘hard’ requirements for business information from the data perspective. This document goes a step deeper into understanding the requirements, but is still written from the business users’ perspective. It refines the business requirements: the analysts use each business requirement to drive out the data required to answer the question.
    In the example above it is clear that some part of both the product hierarchy and the customer hierarchy is required, as well as a time dimension and information about revenue. It has also told the analyst that the minimum retention period is three years. Consequently, the analyst would start to build data requirements that may fulfil many business requirements and to add additional attributes to help make sense of the data.
    The data requirements lifecycle is similar to that of the business requirements, i.e. fixed for a project and variable over the lifespan of a programme.

    2.3 Data Warehouse Query Requirements (WQR)
    The third document lists a number of potential queries to which the solution should be able to provide answers. This is not an exhaustive list, but rather represents the types of queries that are being asked by the business. It is used to test the relationship between the data requirements, the data model and the business requirements.
    A set of queries should be able to provide the data required to answer a business requirement. At the same time the data must be available as described in the data requirements and joined in such a way as to be usable in the data model.
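    As an illustration of how a query requirement can test a business requirement against the data, the example from section 2.1 might be expressed as the query below. This is a minimal sketch only: the table and column names (product, customer, sales, market_segment and so on), the sample rows and the function name are all hypothetical, and an in-memory SQLite database stands in for the real platform.

```python
import sqlite3

# Hypothetical tables and sample rows; all names here are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product  (product_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, market_segment TEXT);
CREATE TABLE sales    (product_id INTEGER, customer_id INTEGER,
                       sale_year INTEGER, revenue REAL);
INSERT INTO product  VALUES (1, 'Handsets'), (2, 'Accessories');
INSERT INTO customer VALUES (10, 'Consumer'), (20, 'Business');
INSERT INTO sales VALUES
  (1, 10, 2004, 100.0), (1, 10, 2005, 300.0),
  (1, 20, 2006, 500.0), (2, 10, 2006, 40.0),
  (2, 10, 2002, 999.0);  -- falls outside the three-year window
""")

def revenue_by_category_and_segment(conn, last_year, years=3):
    """Average and total revenue per product category and market segment
    for the `years` calendar years ending in `last_year`."""
    first_year = last_year - years + 1
    return conn.execute("""
        SELECT p.category, c.market_segment,
               AVG(s.revenue), SUM(s.revenue)
        FROM sales s
        JOIN product  p ON p.product_id  = s.product_id
        JOIN customer c ON c.customer_id = s.customer_id
        WHERE s.sale_year BETWEEN ? AND ?
        GROUP BY p.category, c.market_segment
        ORDER BY p.category, c.market_segment
    """, (first_year, last_year)).fetchall()

rows = revenue_by_category_and_segment(conn, last_year=2006)
for row in rows:
    print(row)
```

    The query only works because the product hierarchy, the customer hierarchy, a time attribute and revenue are all present and joinable, which is exactly the check that the query requirements are intended to make against the data requirements and the data model.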
  • 11. 2.4 Data Warehouse Technical Requirements (WTR)
    The fourth document details the functional and non-functional requirements that are expected of the solution. Again, these requirements are stated from the business perspective rather than the technical perspective. The document should include topics such as:
      • The functionality required of the query tools.
      • The general retention requirements for data.
      • The performance characteristics.
      • The systems availability expectations.

    2.5 Data Warehouse Interface Requirements (WIR)
    The fifth and final requirements document details the requirements for interfaces that feed from the data warehouse out to other systems. Often a data warehouse will be required to deliver information to downstream systems that have existing data interface specifications. These requirements have to be gathered to ensure that the interface demands are met as well as the user demands (derived from the Query Requirements).

    2.6 Business Definitions Dictionary (BDD)
    Throughout all of this process a number of business terms will be used. It is important that a common dictionary is developed and kept so that there is a common reference for words. This does not mean that there must be one definition for each term, but there should be a definition for how a term is used within a given context. For example, to a customer support division a customer might be anyone who has an active support contract, whilst to the sales team it may be anyone who has ever bought a product. Both definitions are right within their context.

    3 Architecture
    The architecture category contains a number of documents that describe how the system should be built. These provide a blueprint for developers on how to approach any particular problem by helping them select the appropriate tools, platforms and configurations to both meet their need and conform to the overall strategy.

    3.1 Technical Architecture
    The technical architecture describes the technical components that will be used to build the system. This will include the hardware, software and network configuration, along with specific versions where appropriate and standards as to which software product should be used for which job. [14]
    [14] As vendors broaden out their products they start to overlap in functionality. Ensuring that two different products are not used for the same job in the solution should be covered in this document.
  • 12. 3.2 Security Model
    The security model should describe all the roles, groups, etc. that will be required for each component of the system. It needs first to set out the general policies and then to list explicit permissions for each component of the system.

    3.3 Resilience Plan
    The resilience plan [15] should describe how the system is made resilient. This should include (as required) the need for redundant hardware and networks; incremental, cumulative and full backups; restores of individual components or entire systems; how to back out records individually, as a group or as entire sets; and how disasters such as the loss of a data centre are managed.

    3.4 Data Quality Plan (DQP)
    The data quality plan should describe how data quality is managed. This will include the principles of where data is cleansed (in the source, in the staging area, in the data warehouse itself, etc.), how it is profiled, what type of cleansing is carried out (e.g. rule based or heuristic [16]), what metrics are set and monitored for data improvement, etc.

    4 Data Models
    Data models are (normally) graphical representations of the data that is required. These are normally created in special software that can also generate the DDL required to create the physical objects in the database.

    4.1 Data Modelling Standards
    This document describes the naming conventions of objects in the database, as well as any particular modelling methods (e.g. a hierarchy must always be modelled in a specific way and any exceptions noted along with a justification for the difference). This document should describe the standards for all three data models, including the modelling techniques for logical and physical models.

    4.2 Logical Model
    The logical data model is a model that represents the true structure of data used by the business, independent of software or hardware implementation constraints. Normally the model is closely related to the information described in the Data Warehouse Business Requirements.
    [15] This is sometimes called a Disaster Recovery Plan; however, as a name this does not cover the full range of activities that are required.
    [16] Heuristic data cleansing uses methods such as fuzzy logic to try to clean data. This is most successful with data such as addresses, where there is the opportunity for lots of human error in the information.
  • 13. 4.3 Repository Data Model
    The repository data model is a physical data model [17] of the main storage area within a data warehouse. This model will reflect the logical data model in overall structure but will have a number of compromises for the practical delivery of the solution. It normally closely reflects the information described in the Data Warehouse Data Requirements.

    4.4 Data Mart Data Model(s)
    The data mart data models are the physical models of the part of the system that the user will query. These are often, though not necessarily, star schemas [18] and closely reflect the Data Warehouse Query Requirements.
    It can be seen from the definitions above that the data models are derived from the requirements and that the combination of the six documents acts together to ensure completeness. This is highlighted in the diagram below:
    [Figure 2 - The relationship between requirements and data models]
    [17] A physical data model is a representation of a data design which takes into account the facilities and constraints of a given database management system. A complete physical data model will include all the database artefacts required to create relationships between tables or achieve performance goals. Wikipedia: http://en.wikipedia.org/wiki/Physical_data_model
    [18] A relational database schema that is used to represent multidimensional data. The data is stored in a central fact table, with one or more tables holding information on each dimension. Dimensions have levels and all levels are usually shown as columns in each dimension table. OLAP Report: http://www.olapreport.com/glossary.htm
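    To make the star schema definition concrete, the sketch below creates a very small data mart of that shape: one central fact table with one table per dimension, and each dimension level held as a column. All table and column names (fact_sales, dim_date, dim_product and so on) are hypothetical, and SQLite is used only because it needs no infrastructure; in practice the DDL would normally be generated by the data modelling tool.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: each level of the hierarchy is a column.
CREATE TABLE dim_date (
    date_key      INTEGER PRIMARY KEY,
    calendar_date TEXT,
    month_name    TEXT,
    year_number   INTEGER
);
CREATE TABLE dim_product (
    product_key   INTEGER PRIMARY KEY,
    product_name  TEXT,
    category      TEXT
);
-- Central fact table: one foreign key per dimension plus the measures.
CREATE TABLE fact_sales (
    date_key    INTEGER NOT NULL REFERENCES dim_date(date_key),
    product_key INTEGER NOT NULL REFERENCES dim_product(product_key),
    revenue     REAL    NOT NULL
);
""")

# List the dimensions that the fact table points at
# (column 2 of PRAGMA foreign_key_list is the referenced table).
dimensions = sorted(
    {row[2] for row in conn.execute("PRAGMA foreign_key_list(fact_sales)")}
)
print(dimensions)
```

    Listing the foreign keys in this way is one simple check that every dimension named in the Data Warehouse Query Requirements is actually reachable from the fact table.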
5 Analysis

The goal of the analysis phase is to identify the sources of the information required to populate the physical data models. The main goal should be to populate the repository data model, as this is used as the source for all data in the data marts. This is achieved in a number of steps:

5.1 Source Systems Analysis (SSA)

The source systems analysis is a high-level analysis that gathers information about available systems. Each system is a potential candidate as a source system and generates its own analysis document. Some systems may be documented and then rejected, e.g. because a system is a secondary source and only contains information created in another system that can be used as the source instead. The document covers hardware, software, network connectivity, availability and functional areas (e.g. a CRM system containing customer data).

5.2 Data Profiling

Data profiling is a process whereby an existing source system is examined in order to collect information and statistics about the data held. This allows sources for the data warehouse to be identified, the metadata held about the system to be validated and the data quality to be assessed.

There are many commercial tools available for this process; however, a simple set of SQL scripts will often prove adequate. The SQL scripts allow an experienced DBA with knowledge of the system to quickly write queries and iteratively explore the data. Using a tool formalises the process but often performs unnecessary analysis and requires significant additional infrastructure and tool-specific knowledge, slowing the process down. Results from any profiling should be kept for future comparative analysis.

5.3 Source Entity Analysis (SEA)

The source entity analysis is the detailed documentation of the sources selected because data profiling has validated these sources as being useful for the data warehouse. This includes detailed information about every table and column, including data types and data quality metrics. A source entity analysis will be produced for each system that is to be used as a source.
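The ‘simple set of SQL scripts’ approach to data profiling described in section 5.2 can be sketched as follows. This is a minimal illustration only, not a Data Management & Warehousing template: it assumes Python's built-in sqlite3 module and an in-memory database, and the table `customer`, its columns and the helper `profile_column` are hypothetical names invented for the example.

```python
import sqlite3


def profile_column(conn, table, column):
    """Return basic profiling statistics for one column.

    Table and column names are interpolated into the SQL text, so they
    must come from a trusted source (e.g. the database catalogue), never
    from user input.
    """
    sql = (
        f"SELECT COUNT(*), "
        f"COUNT({column}), "
        f"COUNT(DISTINCT {column}), "
        f"MIN({column}), MAX({column}) "
        f"FROM {table}"
    )
    total, non_null, distinct, lowest, highest = conn.execute(sql).fetchone()
    return {
        "table": table,
        "column": column,
        "rows": total,
        "nulls": total - non_null,
        "distinct": distinct,
        "min": lowest,
        "max": highest,
    }


# Hypothetical source table used only to demonstrate the profiling query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER, postcode TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [(1, "RG41 2NU"), (2, None), (3, "RG41 2NU"), (4, "SW1A 1AA")],
)

postcode_profile = profile_column(conn, "customer", "postcode")
print(postcode_profile)
```

Storing each returned dictionary together with a run date is one possible way of keeping profiling results for the future comparative analysis recommended above.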
5.4 Target Orientated Analysis (TOA)

The final document template of the analysis phase is the Target Orientated Analysis document that is used to describe which sources will be used to populate which target entities. This document is sometimes replaced by a ‘source to target’ mapping document[19]. This document is used by the developers in the design and build of the ETL[20] code. A target-orientated analysis will be developed for each subject area in the data warehouse.

The output dependencies for the analysis can be described as follows:

Figure 3 - Analysis Dependencies

[19] Data Management & Warehousing prefer target orientated analysis, which asks the question ‘Which sources do I need in order to populate this target table completely?’, to the source to target mapping method, which asks the question ‘Which target entities do I need to put data into from this source?’ This is because the thought process used in the first method is geared towards the delivery of the information to the user rather than the extraction of the data by the developer.

[20] ETL: Extract, Transform, Load – code written to move data from the sources to the data warehouse.
6 Design

The design phase concentrates on taking the analysis and creating a plan for the code build.

6.1 ETL Execution Plan

The ETL Execution Plan is a document that explains from the high level down to the low level how the ETL code will be put together. One of the most effective ways of doing this is as a series of ‘directed graphs’[21]. For example, there may be a diagram that represents the overall flow. In this case each point would represent a subject area. There are then a number of additional graphs, one for each subject area, that represent the detailed flows within a subject area. This drilling down is repeated until the lowest level ETL mappings are described or the required level of detail is documented.

6.2 Initial Capacity Plan

The initial capacity plan describes the sizes of the databases and database objects required for the initial build. This should describe the sizing for a known period (e.g. for 1 year) and a number of environments (Development, Test, Production). A large amount of the information required for this can often be derived from the data-modelling tool used.

6.3 Coding Standards

A document that describes the naming conventions for all objects that will be created, including but not limited to: database objects such as table and column names, ETL mapping names, script names etc. It will also describe and mandate or recommend any specific coding standards and/or algorithms.

The diagram below describes the output dependencies:

Figure 4 - Design Dependencies

[21] This is a term taken from mathematics. Graph theory is the study of graphs, mathematical structures used to model relations between objects in a given collection. A "graph" in this context refers to a collection of mappings and their dependencies. The most famous graph theory problem is known as ‘The Seven Bridges of Konigsberg’ and was solved by Leonhard Euler. http://mathforum.org/isaac/problems/bridges1.html
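The ‘directed graphs’ idea in section 6.1 can be made concrete with a small sketch. It assumes Python 3.9 or later for the standard-library graphlib module, and the subject area names and their dependencies are invented purely for illustration. The top-level graph is held as a mapping from each subject area to the areas that must be loaded first, and a topological sort gives one valid execution order.

```python
from graphlib import TopologicalSorter

# Hypothetical top-level graph: each key is a subject area and each value
# is the set of subject areas that must be loaded before it.
subject_area_dependencies = {
    "customer": set(),
    "product": set(),
    "sales": {"customer", "product"},
    "sales_data_mart": {"sales"},
}

sorter = TopologicalSorter(subject_area_dependencies)
# static_order() raises graphlib.CycleError if the plan contains a cycle.
execution_order = list(sorter.static_order())
print(execution_order)
```

The same structure could be repeated for each subject area to hold the more detailed graphs that the drill-down describes, down to individual ETL mappings.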
7 Build

This white paper is a roadmap to the documentation that should be produced during a data warehouse project. It should follow the structure of most projects but it is not a substitute for a project plan.

7.1 Code Repository

A lot of the code will contain valuable documentation in the form of comments. It is also vital that the history of changes to code is recorded. Therefore, an important part of the documentation is the information held in the configuration management tool[22].

7.2 Data Cleansing Integration

The data profiling described above will have generated a number of rules that will have to be implemented in order to maintain data quality. These rules will have to be stored and integrated into the ETL. If a data-cleansing tool[23] is being used then these rules will be documented within the tool. Otherwise, the rules should be explicitly documented for future reference.

8 Test

Testing software is operating the software under controlled conditions[24], to:

1. Verify that it behaves “as specified”
   Verification is the checking or testing of items, including software, for conformance and consistency by evaluating the results against pre-specified requirements.

2. Detect errors
   Testing should intentionally attempt to make things go wrong to determine if things happen when they should not or things do not happen when they should. In this area it is important to test boundary conditions[25], e.g. what happens with a percentage over 100% or less than 0%.

3. Validate that what has been specified is what the user actually wanted
   Validation looks at the system correctness, i.e. it is the process of checking that what has been specified is what the user actually wanted.

[22] A list of configuration management tools can be found at: http://www.cmcrossroads.com/cgi-bin/cmwiki/view/CM/WebHome

[23] A list of data quality tools can be found at: http://mediaproducts.gartner.com/reprints/dataflux/137738.html (March 2006)

[24] Taken from: http://members.tripod.com/~bazman/

[25] A useful description of boundary values testing can be found at: http://www.geocities.com/xtremetesting/BoundaryValues.html
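The boundary condition example in section 8 (a percentage over 100% or less than 0%) can be sketched as a small set of boundary-value checks. The rule `is_valid_percentage` is a hypothetical data quality rule, and the assumption that both 0 and 100 are themselves valid is ours; a real rule would come from the documented data cleansing rules.

```python
def is_valid_percentage(value):
    """Hypothetical data quality rule: a percentage must lie in [0, 100]."""
    return 0 <= value <= 100


# Boundary-value cases: values on, just inside and just outside each limit.
boundary_cases = {
    -0.01: False,
    0: True,
    0.01: True,
    99.99: True,
    100: True,
    100.01: False,
}

results = {value: is_valid_percentage(value) for value in boundary_cases}
failures = [v for v, expected in boundary_cases.items() if results[v] != expected]
print("failed cases:", failures)
```

Listing the expected result next to each input keeps the test data readable as documentation, which fits the aim of recording expected results in each test document.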
Remember: The purpose of testing is verification, validation and error detection in order to find problems – and the purpose of finding those problems is to get them fixed.

Each type of test document below should have inclusions and exclusions, test cycles, expected results, entrance and exit criteria, etc. If a tool[26] is used much of this can be automated.

8.1 Unit Testing

These are tests that are designed to validate that an individual unit of development work (normally an ETL mapping, input screen or report) is functioning as expected.

8.2 System Testing

These are tests that are designed to check that a suite of newly developed or changed units work correctly together in the expected manner.

8.3 Integration Testing

These are tests that are designed to ensure that the suites of newly developed or changed units work with other suites that are already deployed on the system and do not damage the existing product environment.

8.4 Performance Testing

The final set of tests is designed to ensure the performance of the system. A system that is recording 10,000 transactions a day will be inserting into an empty table on the first day of operation and into a table of around 3.5M records after a year. The performance characteristics of this work vary dramatically from one database to the next. These tests must therefore examine the short-term and long-term performance impacts of any given change.

9 Implementation

After the development and testing are over the system has to be deployed into production and left operating. Implementation is often neglected on project plans. It requires considerable thought and time to document procedures that will be used for many years to come.

9.1 Configuration Management Procedures

The configuration management procedures should cover all aspects of the changes to the configuration, from applying patches and new releases through to system software upgrades.

[26] A list of testing tools can be found at: http://www.testingfaqs.org/
Historical Data Migration Plan

When a data warehouse is deployed it is usual that some amount of historical data is required. There should be a plan that identifies what data is required, how far back in history it needs to go (one week, one year, etc.), how long this data will take to load, whether it will be loaded before or after go-live and the impact on day-to-day operation whilst it is loading.

9.2 Operations Guide

The operations guide is intended for those with responsibility for looking after the system on a day-to-day basis. These will not be the developers who originally created the system, and therefore a simple, clear guide needs to be created covering what needs to be done routinely, what needs to be checked regularly and the escalation procedures in case of exceptions and failures.

9.3 Capacity Plan

A document describing the Initial Capacity Plan will have already been produced. Various factors will, however, change the capacity requirements (e.g. sudden growth in sales, mergers and acquisitions or new product lines, or simply more people making use of the system or a new version of some software component), and these can have a dramatic effect on the capacity of the system. As such, there should be a regularly updated capacity plan that monitors the available disk, CPU and memory resources of the solution and ensures that sufficient resource is available and that any procurement of additional resource is done in line with the company budgetary cycle.

9.4 Service Level Agreements (SLA)

A Service Level Agreement[27] is a formal negotiated agreement between two parties (e.g. the business users of the data warehouse and the IT department, or the IT department and outsourced service providers). It documents the common understanding about services, priorities, responsibilities, guarantees, etc., with the main purpose being to agree on the level of service. For example, it may specify the levels of availability, serviceability, performance, operation or other attributes of the service like billing, and even penalties in the case of violation of the Service Level Agreement. A Service Level Agreement is generally business oriented and does not go into much technical detail. Its technical specifications are commonly described through a series of appendices[28] known as Service Level Specifications (or SLS) that define the technical metrics required.

9.5 Helpdesk Scripts

The helpdesk will need to be able to handle support calls. This is normally done as a series of helpdesk scripts that provide the questions for the support desk operators to ask the user. The operator can then either give the resolution or ask subsequent questions (which are normally dependent on the result of previous questions).

[27] Definition in part from Wikipedia: http://en.wikipedia.org/wiki/Service_Level_Agreement

[28] Depending on size the Service Level Specifications may become separate documents.
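As an illustration of the kind of technical metric a Service Level Specification for section 9.4 might contain, the sketch below converts an availability target into the downtime it permits. The function name, the 30-day period and the target percentages are all assumptions made for the example; a real agreement would define its own measurement period and exclusions such as planned maintenance windows.

```python
def allowed_downtime_minutes(availability_percent, period_days=30):
    """Minutes of downtime permitted in a period at a given availability."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_percent / 100)


# Hypothetical availability targets, compared over an assumed 30-day period.
for target in (99.0, 99.5, 99.9):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% availability allows {minutes:.1f} minutes per 30 days")
```

Expressing the target as minutes of permitted downtime can make the business-oriented agreement easier to relate to the operational schedule and maintenance windows.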
The helpdesk scripts are normally broken down into a number of categories such as:

• Support with using the front-end tools.
• Server and Operational Issues.
• Data Quality Issues including availability and currency of data.
• Ad hoc enquiries.

9.6 Training Plan

The project will need to provide a training plan[29]. This is how users become competent enough to use the system. It normally consists of the steps:

• Define the training goal: The overall results or capabilities the user should attain.
• Set the learning objectives: What the user will be able to do as a result of the learning activities.
• Learning methods and activities: What the user will do in order to achieve the learning objectives.
• Documentation and evidence of learning: The tangible results produced by the user from the learning activities.
• Evaluation: Assessment and judgment on quality of evidence in order to conclude whether the user has achieved the learning objectives or not.

Training plans should be created for all types of users and operators of the system.

9.7 Operational Schedule

The Operational Schedule is the list of tasks that must be performed each hour, day, week and month, etc. and any dependencies (e.g. must run after midnight, must only run if a previous job is successful etc.). It is not only the ETL code but also the backups and any maintenance windows that have to be included. These are often implemented in Job Scheduling Tools[30] that automate the process and send alerts if anything fails.

9.8 System Monitoring Plan

The system monitoring plan is the list of system components that are going to be monitored, along with the thresholds at which warnings and errors are signalled. It should also include the way in which each message is communicated (e.g. audible or visible alert in a control room, SMS or text message, e-mail, etc.).

[29] Adapted from: http://www.managementhelp.org/trng_dev/gen_plan.htm

[30] See: http://www.jobschedulingtools.com/
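The contents of a system monitoring plan as described in section 9.8 (components, warning and error thresholds, and the communication channel for each) can be sketched as a simple data structure. The component names, threshold values and channels below are invented for illustration, and treating each threshold as a ‘percentage used’ figure is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MonitoredComponent:
    name: str
    warning_threshold: float  # percentage used at which a warning is raised
    error_threshold: float    # percentage used at which an error is raised
    channel: str              # how the message is communicated


def classify(component, percent_used):
    """Return 'error', 'warning' or 'ok' for a measured usage figure."""
    if percent_used >= component.error_threshold:
        return "error"
    if percent_used >= component.warning_threshold:
        return "warning"
    return "ok"


# Hypothetical monitoring plan entries; all values are illustrative only.
monitoring_plan = [
    MonitoredComponent("warehouse_disk", 80.0, 95.0, "e-mail"),
    MonitoredComponent("etl_server_cpu", 70.0, 90.0, "SMS"),
]

measurements = {"warehouse_disk": 85.0, "etl_server_cpu": 92.5}
alerts = [
    (c.name, classify(c, measurements[c.name]), c.channel)
    for c in monitoring_plan
]
print(alerts)
```

Keeping the plan as data rather than as logic means the same list can serve both as the documented plan and as the input to whatever monitoring tool is chosen.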
Systems monitoring should also deal with “heartbeat” messages, i.e. messages that tell you that the monitoring is still working. Monitoring information should be retained so that it can be used to manage Service Level Agreements and provide information for Capacity Plans.

The documents described in the implementation category interact as follows:

Figure 5 - Implementation Flow

10 Project Management

Up to now this document has described documents required for individual phases of the project. There are a number of tools and templates required for the effective governance[31] of the programme or project. Project management should have the minimal impact on the process of development whilst ensuring that control over resources, finances and scope is maintained. This category describes documents used to control or assist in the management of a project.

10.1 Documentation Roadmap

The Documentation Roadmap is the document that describes all the documents that should be produced for each of the phases of a project. The document you are reading is an example of this document.

10.2 Project Plan

The project plan is the list of tasks and activities with timescales, resources and dependencies that must be performed to deliver the solution. A project plan is base-lined and regularly updated throughout the life of the project. It is important that project plans have sufficient detail without trying to micro-manage tasks in the short term, whilst having larger objectives with less detailed activities for the longer-term aspects of the project. The plan is then updated as sufficient detail to plan later tasks becomes available.

[31] White Paper – Data Warehouse Governance http://www.datamgmt.com/index.php?module=article&view=78
10.3 ‘DRIVE’ Statements

A drive statement is a short one-page template that helps a project manager assess whether a project or work package should be undertaken. It looks at five aspects in order to make the assessment:

• Dependencies: What is required before this work can start?
• Risks & Issues: What can go wrong with doing this and how will it affect the overall business, this deliverable and/or other deliverables?
• Imperative: Why do we have to do this? What makes it so important?
• Value: What value will the business, team or overall project get from doing this?
• Exploitation: Once we have this solution how will we be able to take advantage of it?

10.4 ‘SWOT’ Analysis

A SWOT analysis[32] is often used in data warehouse projects as a way of comparing different approaches to a problem. It does this by looking at the following attributes of each approach:

• Strengths
• Weaknesses
• Opportunities
• Threats

10.5 ‘MoSCoW’ Analysis

The MoSCoW analysis[33] is a method of prioritising a list of requirements or features of the system by breaking the list down into the following groups:

• Must have in order to meet a minimum requirement.
• Should have in order to get real value from the development.
• Could have if there was available time or resources.
• Would have if there were no limits on the development.

[32] Further information at: http://en.wikipedia.org/wiki/SWOT_analysis

[33] Further information at: http://en.wikipedia.org/wiki/MoSCoW_Method
10.6 Change Requests (CR)

The change request is a critical component of any project and is vital to data warehouse projects. At the outset of this document the requirements gathering process was discussed; however, during the lifecycle of the project the requirements (and other aspects of the project) will change. The change request is the template that documents a change from the original requirement to what is now required. Change requests can be accepted or rejected as appropriate and should be encouraged as a way to prevent uncontrolled and un-scoped development from occurring.

10.7 Risk Register

The risk register is a list of events that may happen. If the event occurs then it will have some negative impact on the project in terms of cost, resource or time. This is contrasted with an issue, which is something that has happened and therefore needs to be managed. A risk can be described in two dimensions:

• The first is the probability of it happening, which is a measure of how likely it is that a risk will become an issue.
• The second is the impact, a measure of the cost it will have in terms of resource, time or scope.

The ‘hotter’ the risk the more attention should be paid to it.

Figure 6 - Risk Assessment

10.8 Issue Log (BUG)

The issue log is the active management of issues that have arisen. This is best managed with an issue-tracking tool[34] that supports the allocation of work to resources and tracks the history of actions taken in response to an issue. Each issue has a lifecycle[35] that starts with its being reported and ends in resolution.

[34] A list of issue tracking tools can be found at: http://www.testingfaqs.org/

[35] Bugzilla has one of the best descriptions of the lifecycle of an issue. This can be found at: http://www.bugzilla.org/docs/3.0/html/lifecycle.html and is reproduced in Appendix 1 of this document.
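The two dimensions of a risk described in section 10.7 can be combined into a single ranking so that the ‘hottest’ risks are reviewed first. The white paper does not prescribe a scoring method; the sketch below assumes one common convention, scoring probability and impact from 1 to 5 and multiplying them. The `Risk` class and the register entries are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    description: str
    probability: int  # 1 (unlikely) to 5 (almost certain); assumed scale
    impact: int       # 1 (minor) to 5 (severe); assumed scale

    @property
    def score(self):
        """Assumed 'heat' measure: probability multiplied by impact."""
        return self.probability * self.impact


# Hypothetical register entries for illustration only.
risk_register = [
    Risk("Key source system is replaced mid-project", 2, 5),
    Risk("Historical data load overruns its window", 4, 3),
    Risk("Reporting tool licence renewal is delayed", 1, 2),
]

# The 'hottest' risks come first so they receive the most attention.
ranked = sorted(risk_register, key=lambda r: r.score, reverse=True)
for risk in ranked:
    print(risk.score, risk.description)
```

A multiplied score is only one option; some projects prefer to keep the two dimensions separate and read the ranking from a grid, as a risk assessment diagram would show.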
10.9 Key Design Decisions (KDD)

The key design decision is a template to record significant design decisions. It records the issue, the chosen option, any rejected options and the rationale behind the decision. Examples of when to use a Key Design Decision might include the choice of a specific tool for a specific function, the choice of data model style, etc. This is important for long-term programmes and projects as some decisions are questioned when reviewed at a later stage. This is often done without context and justification for how the original decision was made and sometimes without the original decision makers being available. The template helps prevent project managers from constantly returning to resolved issues. The document should contain the justification for the decision and any rejected opposing arguments.

11 Miscellaneous

The final category of this document describes some general-purpose documents that a project will find useful. The direct impact of the data warehouse project will not always be visible to business users, who may see it as a large budget line with little benefit. It is therefore important to market the data warehouse to the wider business audience. Business users should understand how and when they are getting information from the data warehouse rather than other sources and see the impact of data quality initiatives, etc. Therefore, Data Management & Warehousing recommend that all documents use a consistent set of templates and that where a set of templates is used they are branded with the name of the project rather than any third party that is contributing to the project.

11.1 General Purpose Document

A standard look and feel document with the required categories for any project document required.

11.2 General Purpose Presentation

This document is a presentation with a standard look and feel. This should be used for all presentations that are given either inside or outside the team.

11.3 Meeting Agenda

A standard agenda template for meetings.

11.4 Memo

This document provides a standard memo format for anyone who is recording formal aspects of the project outside the documentation roadmap.
Summary

Many data warehousing projects are both long running and poorly documented. This does not mean that there is not a lot of documentation, just a lack of the right documentation in the right place. It is the quality and availability of the documentation that leads to an understanding of what is available and hence to the value and reputation of the data warehouse itself.

This white paper has looked at a consistent set of documents developed over fifteen years of project experience. It reflects a desire to develop the right amount of documentation at the right time in the project lifecycle and to store it in the right place. Doing so means moving some documents held on project shared drives to web-based media and publishing documentation to a wider audience, whilst replacing some documents with online tools. It is essential to the success of a data warehouse project that a culture of open access is fostered and that the documentation is seen as the entry point to the data warehouse.

Data Management & Warehousing has identified three aspects to essential documentation:

• A roadmap that describes what documentation is required and how it fits together.
• Team members within the project to use the templates, create quality documents and store them in the project repositories.
• Easy access for people outside the project team to the documentation, including publication or notification of changes, updates and new releases.

This white paper has provided the documentation roadmap with both explanations and a significant number of examples. It has also looked at some of the issues associated with the distribution of information outside the project team. It has highlighted that the processes and procedures required to create and store the information in the first place are a matter of good project governance.

Data Management & Warehousing believe that the documents described here cover all the areas necessary for a major programme of work. However, this is only a guide and a set of templates, and these should be adjusted to meet the needs of the programme. By combining the documentation roadmap, the project plan and suitable governance a project will have developed a strong foundation for the real work of developing a successful data warehouse.
Appendices

Appendix 1 – Lifecycle of a bug

The lifecycle of a bug (or issue) is taken from the Bugzilla documentation[36].

[36] The original can be found at: http://www.bugzilla.org/docs/3.0/html/lifecycle.html
Appendix 2 – Project Quick Start Infrastructure

Readers of this document and some of our other white papers[37] will be concerned about just how much effort is required to get a data warehouse project under way. Data Management & Warehousing do not recommend products or individual vendors unless specifically requested to do so for a particular project or need. However, there is a need to suggest a basic infrastructure for organisations that do not have anything in place.

The basic infrastructure is neither exhaustive nor exclusive but a guideline configuration. It is very cheap and sufficient to support a very large organisation for several years:

• A small server[38]
  For example a dual core 2GHz CPU, 2GB memory, two (for mirroring) 200GB disks.
• Network connectivity
  Normally two network cards, one for the LAN and the other for the internet, IP addresses and server names.
• Remote backup capability
  For example a location on the SAN where files can be backed up to every day, or more frequently if required.
• Linux
  A version such as Red Hat, SuSE, CentOS, Debian, etc.
• Apache Web Server
  To provide all web services.
• Bugzilla
  To provide issue tracking.
• Samba
  To provide a Microsoft compatible shared file system.
• CVS
  To provide the source code control.
• CVSWeb
  To provide a web interface to CVS.
• Perl
  Pre-requisite language for Bugzilla.
• CPAN Bundle::Bugzilla
  Pre-requisite language modules for Bugzilla.
• A Content Management System (CMS) package[39]
  Data Management & Warehousing use phpWebsite but any will do.

[37] Overview Architecture for Enterprise Data Warehouses and Data Warehouse Governance are both available from http://www.datamgmt.com.

[38] Many organisations will have a server being de-commissioned from some other project that could be re-used for this; it does not have to be very powerful.

[39] List available from http://www.cmsmatrix.org/
• A Wiki
  Data Management & Warehousing use the one included with phpWebsite but any will do.
• PHP
  Pre-requisite language for phpWebsite.
• MySQL
  Pre-requisite database for Bugzilla and phpWebsite.

From a technical point of view this server can be built and the software downloaded, installed, configured, secured, put on the internet and shared onto the LAN very quickly. Normally an experienced Linux Systems Administrator could configure a virtually maintenance-free solution within a couple of days, and as the software is all free the only costs incurred will be for time and hardware. This is also the basic configuration of the Data Management & Warehousing website.

In addition it is recommended that the following desktop software be provided:

• Office product
  e.g. Microsoft Office, Star Office, etc.
• CVS Client
  Data Management & Warehousing normally use WinCVS.

This list is for a server for the project governance and documentation of a data warehouse project and does not include the development, test and production environments for the data warehouse itself. Its simplicity, fast setup and low cost are a demonstration of low impact governance of a data warehouse project.

References

The section below represents some useful resources for those considering building a data warehouse solution.

Web resources

Organisation                      Website
Data Management & Warehousing     http://www.datamgmt.com
Configuration Management Wiki     http://www.cmcrossroads.com/
Data Quality Tools                http://mediaproducts.gartner.com/
Software Testing Tools            http://www.testingfaqs.org/
Job Scheduling Tools              http://www.jobschedulingtools.com/
Data Modelling Tools              http://www.databaseanswers.com/
Project Management Tools          http://www.startwright.com/
CMS Tools                         http://www.cmsmatrix.org/
Bugzilla                          http://www.bugzilla.org

Copyright © 2007 Data Management & Warehousing. All rights reserved. Reproduction not permitted without written authorisation. References to other companies and their products use trademarks owned by the respective companies and are for reference purposes only.