Data Quality Engineering

This presentation provides guidance to organizations considering or preparing for data quality initiatives. This talk will illustrate how organizations with chronic business challenges often can trace the root of the problem to poor data quality. Showing how data quality can be engineered provides a useful framework in which to develop an organizational approach. This in turn will allow organizations to more quickly identify data problems caused by structural issues versus practice-oriented defects. Participants will also learn the importance of practicing data quality engineering quantification.

Date: October 9, 2012
Time: 2:00 PM ET
Presented by: Dr. Peter Aiken

[Figure: data quality engineering life cycle]
  • Metadata Creation: Define Data Architecture; Define Data Model Structures
  • Metadata Structuring: Implement Data Model Views; Populate Data Model Views
  • Data Creation: Create Data; Verify Data Values
  • Data Utilization: Inspect Data; Present Data
  • Data Manipulation: Manipulate Data; Update Data
  • Data Assessment: Assess Data Values; Assess Metadata
  • Data Refinement: Correct Data Value Defects; Re-store Data Values
  • Metadata Refinement: Correct Structural Defects; Update Implementation
  • Central store: Metadata & Data Storage
  • Flow labels: architecture refinements; data architecture; data architecture and data models; corrected data; facts & meanings; data performance metadata; shared data; updated data
  • Entry points: Starting point for new system development; Starting point for existing systems

PRODUCED BY DATA BLUEPRINT 10124-C W. BROAD ST, GLEN ALLEN, VA 23060   CLASSIFICATION: EDUCATION   DATE: 10/09/12   SLIDE: 1
10/04/12           © Copyright this and previous years by Data Blueprint - all rights reserved!
Get Social With Us!
  • Live Twitter Feed
      – Join the conversation! Follow us: @datablueprint, @paiken
      – Ask questions and submit your comments: #dataed
  • Like Us on Facebook
      – www.facebook.com/datablueprint
      – Post questions and comments
      – Find industry news, insightful content and event updates.
  • Join the Group
      – Data Management & Business Intelligence
      – Ask questions, gain insights and collaborate with fellow data management professionals

Meet Your Presenter: Dr. Peter Aiken

  • Internationally recognized thought-
    leader in the data management
    field - 30 years of experience
                  – Recipient of multiple international
                    awards
                  – Founder, Data Blueprint
                    (http://datablueprint.com)
  • 7 books and dozens of articles
  • Experienced w/ 500+ data
    management practices in 20
    countries
  • Multi-year immersions with
    organizations as diverse as the
    US DoD, Deutsche Bank, Nokia,
    Wells Fargo, the Commonwealth
    of Virginia and Walmart
                      Outline
           1. Data Management Introduction

           2. Data Quality Definitions & Overview

           3. DQM Cycle

           4. DQ Awareness & Requirements

           5. DQ Dimensions

           6. Data Quality Tools

           7. Guiding Principles
           8. References and Q&A

           The DAMA Guide to the Data Management Body of Knowledge
           Published by DAMA International
           •        The professional association for Data Managers (40 chapters worldwide)
           The DMBoK is organized around:
           •        Primary data management functions focused on data delivery to the organization
           •        Several environmental elements

           [Figure: Data Management Functions]
           Amazon: http://www.amazon.com/DAMA-Guide-Management-Knowledge-DAMA-DMBOK/dp/0977140083
           Or enter the terms "dama dm bok" at the Amazon search engine

           [Figure: Environmental Elements]
                     What is the CDMP?
            • Certified Data Management
              Professional
            • DAMA International and ICCP
            • Membership in a distinct group made
              up of your fellow professionals
            • Recognition for your specialized
              knowledge in a choice of 17 specialty
              areas
            • Series of 3 exams
            • For more information, please visit:
                         – http://www.dama.org/i4a/pages/index.cfm?pageid=3399
                         – http://iccp.org/certification/designations/cdmp
Data Management
  • Data Program Coordination: Manage data coherently.
  • Organizational Data Integration: Share data across boundaries.
  • Data Stewardship: Assign responsibilities for data.
  • Data Development: Engineer data delivery systems.
  • Data Support Operations: Maintain data availability.



                                       Overview: Data Quality Engineering




                                                                                   from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International

                      Definitions
            Data Quality Management
            • Planning, implementation and control activities that
              apply quality management techniques to measure,
              assess, improve, and ensure the fitness of data for
              use
            • Entails the establishment and deployment of roles and
              responsibilities concerning the acquisition,
              maintenance, dissemination, and disposition of
              data (http://www2.sas.com/proceedings/sugi29/098-29.pdf)
            • Critical support process in organizational change management
            • Continuous process for defining the parameters for specifying
              acceptable levels of data quality to meet business needs and for
              ensuring that data quality meets these levels
            Data Quality
            • Synonymous with information quality, since poor data quality results
              in inaccurate information and poor business performance
                                                                                       from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
                      Overview: DQM Concepts and Activities
            1)              Data Quality Management Approach
            2)              Develop and promote data quality awareness
            3)              Define data quality requirements
            4)              Profile, analyze and assess data quality
            5)              Define data quality metrics
            6)              Define data quality business rules
            7)              Test and validate data quality requirements
            8)              Set and evaluate data quality service levels
            9)              Measure and monitor data quality
            10)             Manage data quality issues
            11)             Clean and correct data quality defects
            12)             Design and implement operational DQM procedures
            13)             Monitor operational DQM procedures and performance
                                                                                       from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
                      Concepts and Activities
            • Data quality expectations provide the inputs
              necessary to define the data quality framework:
                         – Requirements
                         – Inspection policies
                         – Measures and monitors that reflect changes in data
                           quality and performance
            • The data quality framework requirements reflect 3
              aspects of business data expectations:
                         1) A manner to record the expectation in business rules
                         2) A way to measure the quality of data within that
                            dimension
                         3) An acceptability threshold
                                                                                       from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
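
The three aspects listed above (a recorded business rule, a way to measure against it, and an acceptability threshold) can be pictured together in one small structure. The following is only a minimal sketch under stated assumptions: the `Expectation` class, its field names, the sample `customers` records, and the 0.9 threshold are all hypothetical and do not come from the DAMA Guide or the presentation.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Expectation:
    """One data quality expectation (hypothetical structure, for illustration only)."""
    name: str
    rule: Callable[[dict], bool]   # 1) business rule, applied to a single record
    threshold: float               # 3) minimum acceptable share of passing records

    def measure(self, records: Iterable[dict]) -> float:
        """2) Measure: fraction of records that satisfy the rule (1.0 if there are none)."""
        records = list(records)
        if not records:
            return 1.0
        passed = sum(1 for r in records if self.rule(r))
        return passed / len(records)

    def is_acceptable(self, records: Iterable[dict]) -> bool:
        """Compare the measured score with the acceptability threshold."""
        return self.measure(records) >= self.threshold


# Hypothetical sample data: one of four customers has no email address.
customers = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": ""},
    {"id": 3, "email": "c@example.com"},
    {"id": 4, "email": "d@example.com"},
]

email_present = Expectation(
    name="email is populated",
    rule=lambda r: bool(r.get("email")),
    threshold=0.9,
)

score = email_present.measure(customers)             # 3 of 4 records pass
acceptable = email_present.is_acceptable(customers)  # 0.75 is below 0.9
```

Keeping the rule, the measure, and the threshold in one object is just one possible design; it makes it easy to see that a score on its own says little until it is compared with what the business has agreed is acceptable.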
                      The DQM Cycle
            The general approach to DQM is a version of the Deming cycle.
            Deming proposes a problem-solving model known as
            “plan-do-study-act” or “plan-do-check-act”.

            The cycle begins by:
              1) Identifying data issues that are critical to the
                 achievement of business objectives
              2) Defining business requirements for data quality
              3) Identifying key data quality dimensions
              4) Defining business rules critical to ensuring high-quality data
                                                                                       from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
                      The DQM Cycle: (1) Plan
            Plan for the assessment of
            the current state and
            identification of key metrics
            for measuring quality
            • The data quality team
              assesses the scope of
              known issues
            • This involves:
                          – Determining cost and
                            impact
                          – Evaluating alternatives for
                            addressing them
                                                                                       from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
                      The DQM Cycle: (2) Deploy
            Deploy processes for
            measuring and improving
            the quality of data:
            • Data profiling
            • Institute inspections and
              monitors to identify data issues
              when they occur
            • Fix flawed processes that are
              the root cause of data errors or
              correct errors downstream
            • When it is not possible to
              correct errors at their source,
              correct them at their earliest
              point in the data flow
                                                                                       from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
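
As a rough illustration of the data profiling step named above, the sketch below summarizes a single column of a small in-memory data set. It is an assumption-laden example rather than any particular tool's method: the `profile_column` function, the `orders` records, and the choice to treat `None` and the empty string as missing are all hypothetical.

```python
from collections import Counter


def profile_column(rows, column):
    """Summarize one column: row count, missing values, distinct values, most common value."""
    values = [row.get(column) for row in rows]
    # Assumption: None and the empty string both count as missing.
    present = [v for v in values if v not in (None, "")]
    counts = Counter(present)
    return {
        "rows": len(values),
        "missing": len(values) - len(present),
        "distinct": len(counts),
        "most_common": counts.most_common(1)[0] if counts else None,
    }


# Hypothetical sample data with two missing states and one inconsistent spelling.
orders = [
    {"order_id": 1, "state": "VA"},
    {"order_id": 2, "state": "VA"},
    {"order_id": 3, "state": ""},
    {"order_id": 4, "state": "va"},
    {"order_id": 5, "state": None},
]

state_profile = profile_column(orders, "state")
```

Even a summary this simple surfaces two candidate issues in the sample data: missing values, and "VA" versus "va" being counted as distinct values, which may point to a flawed entry process rather than a one-off error.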
                      The DQM Cycle: (3) Monitor
            Monitor the quality of data as
            measured against the defined
            business rules
            • If data quality meets defined
              thresholds for acceptability,
              the processes are in control
              and the level of data quality
              meets the business
              requirements
            • If data quality falls below
              acceptability thresholds,
              notify data stewards so they
              can take action during the
              next stage
                                                                                       from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
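
The monitoring logic described above can be sketched as a comparison of each measured score with the acceptability threshold, with anything below the threshold turned into a message for the data stewards. This is a minimal sketch only: the `monitor` function, the per-day scores, and the 0.95 threshold are hypothetical, and a real notification channel is replaced here by a plain list of strings.

```python
def monitor(measurements, threshold):
    """Split (batch_id, score) pairs into in-control batches and steward alerts."""
    in_control, alerts = [], []
    for batch_id, score in measurements:
        if score >= threshold:
            # Quality meets the defined threshold: the process is in control.
            in_control.append(batch_id)
        else:
            # Quality falls below the threshold: record a message for the data stewards.
            alerts.append(
                f"batch {batch_id}: score {score:.2f} below threshold {threshold:.2f}"
            )
    return in_control, alerts


# Hypothetical daily quality scores for one data set.
daily_scores = [("2012-10-07", 0.97), ("2012-10-08", 0.91), ("2012-10-09", 0.99)]
in_control, alerts = monitor(daily_scores, threshold=0.95)
```

The alerts collected here would feed the next stage of the cycle, where the stewards act on the identified issues.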
                      The DQM Cycle: (4) Act
            Act to resolve any
              identified issues to
              improve data quality and
              better meet business
              expectations
            • New cycles begin as
              new data sets come
              under investigation or as
              new data quality
              requirements are
              identified for existing
              data sets
                                                                                       from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
           PRODUCED	
  BY                                                                                                                 CLASSIFICATION   DATE         SLIDE
           DATA BLUEPRINT 10124-C W. BROAD ST, GLEN ALLEN, VA 23060                                                                       EDUCATION        10/09/12             23
10/04/12           © Copyright this and previous years by Data Blueprint - all rights reserved!
TITLE
                      Outline
           1. Data Management Introduction

           2. Data Quality Definitions & Overview

           3. DQM Cycle

           4. DQ Awareness & Requirements

           5. DQ Dimensions

           6. Data Quality Tools

           7. Guiding Principles
           8. References and Q&A

                      Develop and Promote DQ Awareness
            •          Promoting data quality awareness is
                       essential to ensure buy-in of necessary
                       stakeholders in the organization
            •          Ensure that the right people in the
                       organization are aware of the existence
                       of data quality issues
            •          Awareness increases the chance of
                       success of any DQM program
            •          Awareness includes:
                          – Relating material impacts to data issues
                          – Ensuring systematic approaches to
                            regulators and oversight of the quality
                            of organizational data
                          – Socializing the concept that data quality
                            problems cannot be solely addressed by
                            technology solutions
                                                                                       from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
                      Polling Question #1
            Which is not a step to promote data quality
            awareness?
                        a) Training on the core concepts of data quality
                        b) Establish data governance framework for data quality
                        c) Create a data architecture map




                      Develop and Promote DQ Awareness: Steps
            1) Training on the core
               concepts of data quality
            2) Establish data governance
               framework for data quality
            3) Create a data quality
               oversight board that has a
               reporting hierarchy
               associated with the
               different data governance
               roles
                                                                                       from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
                      Define DQ Requirements
            • Data quality must be understood within the context of ‘fitness for
              use’
            • Data quality requirements are often hidden within defined
              business policies
            • Incremental detailed review and iterative refinement of business
              policies help to identify the information requirements that
              become data quality rules
            • Steps for incremental detailed review:
                          – Identify key data components associated with business policies
                          – Determine how identified data assertions affect the business
                          – Evaluate how data errors are categorized within a set of data quality
                            dimensions
                          – Specify the business rules that measure the occurrence of data
                            errors
                          – Provide a means for implementing measurement processes that
                            assess conformance to those business rules
                                                                                       from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
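The last two review steps (specify business rules that measure the occurrence of data errors, then provide a means of measuring conformance to them) can be illustrated with a minimal sketch. The `BusinessRule` structure, the email policy, and the sample records are all hypothetical and are not taken from the DAMA Guide; they only show one way a policy statement might become a measurable rule.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Record = Dict[str, object]


@dataclass
class BusinessRule:
    """A data quality rule derived from a business policy (hypothetical structure)."""
    name: str
    dimension: str                   # e.g. "completeness", "validity"
    check: Callable[[Record], bool]  # returns True when the record conforms


def measure_conformance(records: List[Record], rule: BusinessRule) -> float:
    """Return the fraction of records that conform to the rule (1.0 if there are none)."""
    if not records:
        return 1.0
    passed = sum(1 for r in records if rule.check(r))
    return passed / len(records)


# Illustrative policy: "every customer must have a contact email".
email_present = BusinessRule(
    name="customer_email_present",
    dimension="completeness",
    check=lambda r: bool(r.get("email")),
)

# Made-up sample data: two of the four records lack an email.
customers = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": ""},
    {"id": 3, "email": "c@example.com"},
    {"id": 4},
]

conformance = measure_conformance(customers, email_present)
print(f"{email_present.name}: {conformance:.0%} conformant")
```

Keeping the rule's name and dimension next to its check makes it easier to report results per data quality dimension later.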
                      Data Quality Dimensions




                                                                                       from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
                      Profile, Analyze and Assess DQ
            Data assessment using 2 different approaches:
                          1) Bottom-up
                          2) Top-down
            Bottom-up assessment:
                                        • Inspection and evaluation of the data sets
                                        • Highlight potential issues based on the results of automated
                                          processes


            Top-down assessment:
                                        • Engage business users to document their business processes
                                          and the corresponding critical data dependencies
                                        • Understand how their processes consume data and which
                                          data elements are critical to the success of the business
                                          application
                                                                                       from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
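As a rough illustration of bottom-up assessment, the sketch below profiles a single column and surfaces potential issues (missing values, unexpected variants) without any business input. The function name and the sample values are assumptions made for this example; commercial profiling tools report far more than this.

```python
from collections import Counter
from typing import Dict, List, Optional


def profile_column(values: List[Optional[str]]) -> Dict[str, object]:
    """Summarize one column: row count, nulls, distinct values, most common value."""
    non_null = [v for v in values if v not in (None, "")]
    counts = Counter(non_null)
    return {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(counts),
        "most_common": counts.most_common(1)[0] if counts else None,
    }


# Hypothetical sample: state codes with one missing value and one lowercase variant.
states = ["VA", "VA", "va", None, "MD", "VA"]
profile = profile_column(states)
print(profile)
```

Here the lowercase "va" shows up as an extra distinct value, which is the kind of anomaly an analyst would then take to business users during top-down assessment.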

                      Define DQ Metrics
            • Metrics development occurs as part of the
              strategy/design/plan step
            • Process for defining data quality metrics:
                          1) Select one of the identified critical business impacts
                          2) Evaluate the dependent data elements, create and
                             update processes associated with that business
                             impact
                          3) List any associated data requirements
                          4) Specify the associated dimension of data quality and
                             one or more business rules to use to determine
                             conformance of the data to expectations
                          5) Describe the process for measuring conformance
                          6) Specify an acceptability threshold
                                                                                       from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
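One possible way to record the outcome of the six steps is a small structure with one field per step. This is only a sketch: the field names, the shipping example, and the 98% threshold are invented for illustration and do not come from the DAMA Guide.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class DataQualityMetric:
    """One metric definition; each field corresponds to a step in the process."""
    business_impact: str            # step 1: the critical business impact
    data_elements: Tuple[str, ...]  # step 2: dependent data elements
    requirement: str                # step 3: associated data requirement
    dimension: str                  # step 4: data quality dimension
    rule: str                       # step 4: business rule, stated in words
    measurement: str                # step 5: how conformance is measured
    threshold: float                # step 6: acceptability threshold (0.0 to 1.0)

    def is_acceptable(self, score: float) -> bool:
        """Compare a measured conformance score against the threshold."""
        return score >= self.threshold


# Hypothetical metric for a shipping process.
address_metric = DataQualityMetric(
    business_impact="Returned shipments",
    data_elements=("street", "city", "zip"),
    requirement="Every order carries a deliverable address",
    dimension="completeness",
    rule="street, city and zip are all populated",
    measurement="share of conforming order records in the nightly batch",
    threshold=0.98,
)

print(address_metric.is_acceptable(0.95))
print(address_metric.is_acceptable(0.99))
```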
                      Test and Validate DQ Requirements
            • Data profiling tools
              analyze data to find
              potential anomalies

            • Use the same tools
              for rule validation

            • Rules discovered or defined during the
              data quality assessment phase are
              referenced in measuring conformance as
              part of the operational process
                                                                                       from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
                      Set and Evaluate DQ Service Levels
            • Data quality inspection and monitoring are used to
              measure and monitor compliance with defined data
              quality rules
            • Data quality SLAs specify the organization’s expectations
              for response and remediation
            • Operational data quality control defined in data quality
              SLAs includes:
                          – Data elements covered by the agreement
                          – Business impacts associated with data flaws
                          – Data quality dimensions associated with each data element
                          – Quality expectations for each data element of the identified
                            dimensions in each application or system in the value chain
                          – Methods for measuring against those expectations
                          – (…)
                                                                                       from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
                      Measure and Monitor DQ
            • DQM procedures depend on available data
              quality measuring and monitoring services
            • 2 contexts for control/measurement of
              conformance to data quality business rules exist:
                          – In-stream: collect in-stream measurements while
                            creating data
                          – In batch: perform batch activities on collections of data
                            instances assembled in a data set
            • Apply measurements at 3 levels of granularity:
                          – Data element value
                          – Data instance or record
                          – Data set
                                                                                       from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
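The two measurement contexts and three levels of granularity can be sketched in a few lines. Everything here (the required fields, the "populated" rule, the sample records) is a simplifying assumption chosen for illustration, not a prescribed implementation.

```python
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

Record = Dict[str, Optional[str]]
REQUIRED_FIELDS = ("id", "name", "zip")  # hypothetical required elements


def element_ok(value: Optional[str]) -> bool:
    """Data element level: a single value is populated."""
    return value is not None and value.strip() != ""


def record_ok(record: Record) -> bool:
    """Record level: every required element in the record is populated."""
    return all(element_ok(record.get(f)) for f in REQUIRED_FIELDS)


def dataset_score(records: List[Record]) -> float:
    """Data set level (batch): share of conforming records in the set."""
    if not records:
        return 1.0
    return sum(record_ok(r) for r in records) / len(records)


def in_stream(records: Iterable[Record]) -> Iterator[Tuple[Record, bool]]:
    """In-stream: check each record as it is created, one at a time."""
    for r in records:
        yield r, record_ok(r)


# Made-up records: the second has an empty name, the third has no zip.
batch = [
    {"id": "1", "name": "Ann", "zip": "23060"},
    {"id": "2", "name": "", "zip": "23060"},
    {"id": "3", "name": "Cy", "zip": None},
    {"id": "4", "name": "Di", "zip": "23233"},
]

stream_flags = [ok for _, ok in in_stream(batch)]
print(stream_flags)
print(dataset_score(batch))
```

The same record-level check serves both contexts; only the point at which it runs differs.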
                      Manage DQ Issues
            • Supporting the enforcement of the data quality SLA requires a
              mechanism for reporting and tracking data quality incidents
              and activities for researching and resolving those incidents
            • A data quality incident reporting system can provide this
              capability
            • It can log the evaluation, initial diagnosis, and actions
              associated with data quality events
                      Clean & Correct DQ Defects
            Perform data correction in 3 ways:
              1) Automated correction
              2) Manual directed correction
              3) Manual correction
                                                                                    from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
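A data quality incident reporting system could be as simple as a record that accumulates log entries for evaluation, diagnosis, and actions. The sketch below is hypothetical: the class, its fields, and the sample incident are invented here to illustrate the idea, and a real system would add assignment, severity, and SLA timers.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List


@dataclass
class Incident:
    """A single data quality incident with a time-stamped history."""
    incident_id: int
    description: str
    classification: str           # e.g. the affected data quality dimension
    status: str = "open"
    history: List[str] = field(default_factory=list)

    def log(self, stage: str, note: str) -> None:
        """Append a time-stamped entry for a stage such as 'evaluation' or 'diagnosis'."""
        stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
        self.history.append(f"{stamp} [{stage}] {note}")

    def resolve(self, note: str) -> None:
        """Record the corrective action and close the incident."""
        self.log("action", note)
        self.status = "resolved"


# Made-up incident for illustration.
incident = Incident(1, "Duplicate customer records in billing feed", "uniqueness")
incident.log("evaluation", "412 duplicate pairs found in nightly load")
incident.log("diagnosis", "Merge key ignores trailing whitespace")
incident.resolve("Key normalized; duplicates merged via manual directed correction")

for entry in incident.history:
    print(entry)
```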
Manage DQ Issues: Example




             Data quality incident tracking focuses on training staff to recognize
             when data issues appear and how they are to be classified, logged and
             tracked according to the data quality SLA
                                                                                       from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
                      Design and Implement Operational DQM Procedures
            1) Inspection and monitoring
            2) Diagnosis and evaluation of remediation alternatives
            3) Resolve issues
            4) Reporting
                      Monitor Operational DQM Procedures and Performance
            1) Accountability is critical to governance protocols overseeing
               data quality control
            2) All issues must be assigned
            3) The tracking process should specify and document the ultimate
               issue accountability


                                                                                    from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
                      Example: Data Quality Interview Session Summary
               • During mid-February, the Data Governance Team and Data
                 Blueprint conducted ten qualitative interview sessions with groups
                 of individuals who interact with data on a regular basis
               • A series of patterns emerged as participants shared stories about
                 the impact of poor data quality on the client, its products, and its
                 customers
               • These patterns highlight gaps in best
                 practices for ensuring data quality,
                 i.e. the extent to which data is
                 “fit for use”
               • Our preliminary analysis evaluated
                 these stories against attributes of four
                 data quality dimensions
               • At this early stage of the post-interview
                 process, we are seeking confirmation of
                 our assumptions and method
                      Which Activities Support Quality Data?
               • Data quality best practices depend on both
                 – Practice-oriented activities
                 – Structure-oriented activities



                        Practice-oriented activities focus on the capture and
                        manipulation of data
                        Structure-oriented activities focus on the data
                        implementation
                        [Diagram: both kinds of activity support Quality Data]




                      Quality Dimensions
            Practice-oriented causes
            • Stem from a failure to apply rigorous controls when
              capturing and manipulating data, such as:
                          – Edit masking
                          – Range checking of input data
                          – CRC-checking of transmitted data
            Structure-oriented causes
            • Occur because data and metadata have been arranged
              imperfectly. For example:
                          – When the data is in the system but we just can't access it;
                          – When a correct data value is provided as the wrong response to
                            a query; or
                          – When data is not provided because it is unavailable or
                            inaccessible to the customer
            • Also arise when developers focus within system boundaries
              instead of within organization boundaries
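The three practice-oriented controls named above (edit masking, range checking, and CRC-checking) can each be sketched in a few lines. The ZIP code pattern, the age bounds, and the payload are assumptions for illustration; a regular expression stands in for an edit mask here, and `zlib.crc32` from the Python standard library stands in for whatever checksum a real transmission protocol would use.

```python
import re
import zlib
from typing import Tuple

# Hypothetical edit mask: US ZIP code, 5 digits with optional 4-digit extension.
ZIP_MASK = re.compile(r"\d{5}(-\d{4})?")


def matches_mask(value: str) -> bool:
    """Edit masking: accept input only if the whole value fits the mask."""
    return ZIP_MASK.fullmatch(value) is not None


def in_range(value: float, low: float, high: float) -> bool:
    """Range checking: accept input only inside the inclusive bounds."""
    return low <= value <= high


def with_crc(payload: bytes) -> Tuple[bytes, int]:
    """Sender side: attach a CRC-32 checksum to the payload."""
    return payload, zlib.crc32(payload)


def crc_ok(payload: bytes, expected_crc: int) -> bool:
    """Receiver side: recompute the checksum and compare."""
    return zlib.crc32(payload) == expected_crc


print(matches_mask("23060"), matches_mask("2306A"))
print(in_range(42, 0, 120), in_range(130, 0, 120))

payload, crc = with_crc(b"order=42")
print(crc_ok(payload, crc), crc_ok(b"order=43", crc))
```

Each check rejects bad data at the point of capture or transfer, which is where practice-oriented defects originate.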
                      Practice-Oriented Activities
             • Affect the Data Value Quality and Data Representation
               Quality
             • Examples of improper practice-oriented activities:
                – Allowing imprecise or incorrect data to be collected when
                  requirements specify otherwise
                – Presenting data out of sequence
             • Typically diagnosed in a bottom-up manner: find and fix the
               resulting problem
             • Addressed by imposing more rigorous data-handling
               governance

             [Diagram: Practice-oriented activities, spanning Quality of Data Values and Quality of Data Representation]


                      Structure-Oriented Activities
             • Affect the Data Model Quality and Data Architecture Quality
             • Examples of improper structure-oriented activities:
                – Providing a correct response but incomplete data to a
                  query because the user did not comprehend the system
                  data structure
                – Costly maintenance of inconsistent data used by
                  redundant systems
             • Typically diagnosed in a top-down manner: root cause fixes
             • Addressed through fundamental data structure governance


             [Diagram: Structure-oriented activities, spanning Quality of Data Models and Quality of Data Architecture]


                      4 Dimensions of Data Quality
            An organization’s overall data quality is a function of four
            distinct components, each with its own attributes:
            Practice-oriented:
            • Data Value: the quality of data as stored & maintained in
              the system
            • Data Representation: the quality of representation for
              stored values; perfect data values stored in a system that
              are inappropriately represented can be harmful
            Structure-oriented:
            • Data Model: the quality of data logically representing
              user requirements related to data entities, associated
              attributes, and their relationships; essential for effective
              communication among data suppliers and consumers
            • Data Architecture: the coordination of data
              management activities in cross-functional system
              development and operations

                      Effective Data Quality Engineering
               • Data quality engineering has been focused on operational
                 problem correction
                 – Directing attention to practice-oriented data imperfections
               • Data quality engineering is more effective when also focused
                 on structure-oriented causes
                 – Ensuring the quality of shared data across system
                    boundaries
               Four kinds of quality, ordered from closer to the user to closer to the architect:
                 – Data Representation Quality: as presented to the user
                 – Data Value Quality: as maintained in the system
                 – Data Model Quality: as understood by developers
                 – Data Architecture Quality: as an organizational asset

                                          Full Set of Data Quality Attributes




                      Data Value Quality




                      Data Representation Quality




                      Data Model Quality




                      Data Architecture Quality




             Extended data life cycle model with metadata sources and uses
             Phases shown in the diagram (all connected through Metadata & Data Storage):
             • Metadata Creation
                           – Define Data Architecture
                           – Define Data Model Structures
             • Metadata Structuring
                           – Implement Data Model Views
                           – Populate Data Model Views
             • Data Creation
                           – Create Data
                           – Verify Data Values
             • Data Utilization
                           – Inspect Data
                           – Present Data
             • Data Manipulation
                           – Manipulate Data
                           – Update Data
             • Data Assessment
                           – Assess Data Values
                           – Assess Metadata
             • Data Refinement
                           – Correct Data Value Defects
                           – Re-store Data Values
             • Metadata Refinement
                           – Correct Structural Defects
                           – Update Implementation
             Flows labeled in the diagram: data architecture, architecture refinements,
             data architecture and data models, facts & meanings, shared data,
             updated data, corrected data, data performance metadata
             Starting points marked in the diagram: one for new system development,
             one for existing systems


                  Data Quality Engineering



                                                                                   from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
Goals and Principles
            • To measurably improve the quality of data in relation to defined
              business expectations
            • To define requirements and specifications for integrating data quality
              control into the system development life cycle
            • To provide defined processes for measuring, monitoring, and reporting
              conformance to acceptable levels of data quality
                                                                                   from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
                   Activities
            •       Develop and Promote Data Quality Awareness
            •       Set and Evaluate Data Quality Service Levels
            •       Test and Validate Data Quality Requirements
            •       Profile, Analyze, and Assess Data Quality
            •       Continuously Measure and Monitor Data Quality
            •       Monitor Operational DQM Procedures and Performance
            •       Define Data Quality Business Rules
            •       Define Data Quality Metrics
            •       Manage Data Quality Issues
            •       Clean and Correct Data Quality Defects
            •       Define Data Quality Requirements
            •       Design and Implement Operational DQM Procedures
                                                                                    from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
                  Primary Deliverables
            • Improved Quality Data
            • Data Management Operational Analysis
            • Data Profiles
            • Data Quality Certification Reports
            • Data Quality Service Level Agreements
                                                                                   from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International

                  Roles and Responsibilities
            Suppliers:
                     • External Sources
                     • Regulatory Bodies
                     • Business Subject Matter Experts
                     • Information Consumers
                     • Data Producers
                     • Data Architects
                     • Data Modelers
                     • Data Stewards
            Participants:
                     • Data Quality Analysts
                     • Data Analysts
                     • Database Administrators
                     • Data Stewards
                     • Other Data Professionals
                     • DRM Director
                     • Data Stewardship Council
            Consumers:
                     • Data Stewards
                     • Data Professionals
                     • Other IT Professionals
                     • Knowledge Workers
                     • Managers and Executives
                     • Customers
                                                                                   from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
                      Polling Question #2
            What is one guiding principle for data quality?
                          a. Business process owners will agree to and abide by
                             data quality SLAs

                          b. Identify a blue record for all data elements

                          c. Upstream data consumers specify data quality
                             expectations


                      Outline
           1. Data Management Introduction

           2. Data Quality Definitions & Overview

           3. DQM Cycle

           4. DQ Awareness & Requirements

           5. DQ Dimensions

           6. Data Quality Tools

           7. Guiding Principles
           8. References and Q&A

                      Technology
            •          Data Profiling Tools
            •          Statistical Analysis Tools
            •          Data Cleansing Tools
            •          Data Integration Tools
            •          Issue and Event Management Tools




                                                                                       from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
                      Overview: Data Quality Tools
            4 categories of activities:
                        1) Analysis
                        2) Cleansing
                        3) Enhancement
                        4) Monitoring
            Principal tools:
                        1) Data Profiling
                        2) Parsing and Standardization
                        3) Data Transformation
                        4) Identity Resolution and Matching
                        5) Enhancement
                        6) Reporting




                                                                                       from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
                      DQ Tool #1: Data Profiling
            •          Data profiling is the assessment of
                       value distribution and clustering of
                       values into domains
            •          Need to be able to distinguish
                       between good and bad data before
                       making any improvements
            •          Data profiling is a set of algorithms
                       for 2 purposes:
                          – Statistical analysis and assessment of the quality of data values
                            within a data set
                          – Exploring relationships that exist between value collections within and
                            across data sets
            •          At its most advanced, data profiling takes a series of prescribed rules
                       from data quality engines. It then assesses the data, annotates and
                       tracks violations to determine if they comprise new or inferred data
                       quality rules
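
            The two purposes above can be sketched in a few lines of Python. This is
            only an illustration under stated assumptions: the function names
            (profile_column, rule_violations), the sample state codes, and the
            "two upper-case letters" rule are all hypothetical and do not come from
            any particular profiling tool.

```python
from collections import Counter

def profile_column(values):
    """Summarize one column: size, null rate, distinct count, top values."""
    total = len(values)
    non_null = [v for v in values if v not in (None, "")]
    distribution = Counter(non_null)  # value distribution of the column
    return {
        "total": total,
        "null_rate": (total - len(non_null)) / total if total else 0.0,
        "distinct": len(distribution),
        "most_common": distribution.most_common(3),
    }

def rule_violations(values, rule):
    """Return (index, value) pairs for non-null values that fail a prescribed rule."""
    return [(i, v) for i, v in enumerate(values)
            if v not in (None, "") and not rule(v)]

# Hypothetical sample column of state codes.
states = ["VA", "VA", "va", "MD", None, "Virginia", "VA", ""]

profile = profile_column(states)
# Hypothetical prescribed rule: a state code is exactly two upper-case letters.
violations = rule_violations(states, lambda v: len(v) == 2 and v.isupper())
```

            Tracking the violations separately from the distribution keeps the
            "what does the data look like" question apart from the "which rules does
            it break" question.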
                      DQ Tool #1: Data Profiling, cont’d
            • Data profiling vs. data quality: business context and
              semantic/logical layers
                          – Data quality is concerned with proscriptive rules
                          – Data profiling looks for patterns both when rules are adhered to
                            and when rules are violated; it can provide input into the
                            business context layer
            • It is incumbent on data profiling services to notify all concerned
              parties of whatever is discovered
            • Profiling can be used to…
                          – …notify the help desk that valid changes in the data are about
                            to cause an avalanche of “skeptical user” calls
                          – …notify business analysts of precisely where they should be
                            working today in terms of shifts in the data
                      DQ Tool #2: Parsing & Standardization
            • Data parsing tools enable the definition
              of patterns that feed into a rules engine
              used to distinguish between valid
              and invalid data values
            • Actions are triggered upon matching
              a specific pattern
            • When an invalid pattern is recognized,
              the application may attempt to
              transform the invalid value into one that meets expectations
            • Data standardization is the process of conforming to a set of
              business rules and formats that are set up by data stewards
              and administrators
            • Data standardization example:
                          – Bringing all the different formats of “street” (e.g. “STR”,
                            “ST.”, “STRT”, “STREET”) into a single format
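
            A minimal sketch of parsing plus standardization for the street example,
            assuming a simple "number, name, street type" address layout. The regular
            expression, the choice of "ST" as the target format, and the function name
            standardize_address are illustrative assumptions, not rules taken from a
            real data quality product.

```python
import re

# Hypothetical pattern: house number, street name, street-type token.
ADDRESS_PATTERN = re.compile(r"^(\d+[-\w]*)\s+(.+?)\s+([A-Za-z]+)\.?$")

# Variants from the slide mapped to one target format ("ST" is an assumption).
STREET_VARIANTS = {"STR": "ST", "ST": "ST", "STRT": "ST", "STREET": "ST"}

def standardize_address(raw):
    """Return the standardized address, or None if the value is not recognized."""
    match = ADDRESS_PATTERN.match(raw.strip())
    if match is None:
        return None  # invalid pattern: leave for manual review
    number, name, street_type = match.groups()
    standard_type = STREET_VARIANTS.get(street_type.upper())
    if standard_type is None:
        return None  # street type not covered by this rule set
    return f"{number} {name.upper()} {standard_type}"
```

            For example, "12 Main Street", "12 Main Strt", and "12 Main Str." all map
            to "12 MAIN ST", while "Main Street" (no house number) is reported as
            unrecognized.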
                      DQ Tool #3: Data Transformation
            • Upon identification of data errors, trigger data rules to transform the
              flawed data
            • Perform standardization and guide rule-based transformations by mapping
              data values in their original formats and patterns into a target
              representation
            • Parsed components of a pattern are subjected to rearrangement,
              corrections, or any changes as directed by the rules in the knowledge
              base
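
            As a rough illustration of rule-based transformation, the sketch below
            keeps a small "knowledge base" of pattern/template pairs and rearranges
            the parsed components of a date into one target representation. The rule
            list, the ISO-style target format, and the assumption that slash-separated
            dates are month-first are all hypothetical choices made for the example.

```python
import re

# Hypothetical knowledge base: each rule pairs a source pattern with a
# template that rearranges the parsed components into the target form.
DATE_RULES = [
    (re.compile(r"^(\d{2})/(\d{2})/(\d{4})$"), r"\3-\1-\2"),  # MM/DD/YYYY (assumed month-first)
    (re.compile(r"^(\d{4})(\d{2})(\d{2})$"), r"\1-\2-\3"),    # YYYYMMDD
    (re.compile(r"^(\d{4})-(\d{2})-(\d{2})$"), r"\1-\2-\3"),  # already in target form
]

def transform_date(value):
    """Map a date string to YYYY-MM-DD, or return None if no rule applies."""
    candidate = value.strip()
    for pattern, template in DATE_RULES:
        match = pattern.match(candidate)
        if match:
            # Rearrange the captured components as the rule directs.
            return match.expand(template)
    return None
```

            Keeping the rules in data rather than in code means new source formats can
            be added to the knowledge base without touching the transformation logic.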
                      DQ Tool #4: Identity Resolution & Matching
            •  Data matching enables analysts to identify relationships between
               records for de-duplication or group-based processing
            • Matching is central to maintaining data consistency and integrity
               throughout the enterprise
            • The matching process should be used in the initial migration of data
               into a single repository
            2 basic approaches to matching:
            •          Deterministic
                          – Relies on defined patterns/rules for assigning
                            weights and scores to determine similarity
                          – Predictable
                          – Dependent on what the rule developers anticipate
            •          Probabilistic
                          – Relies on statistical techniques for assessing the probability that any pair of records
                            represents the same entity
                          – Not reliant on rules
                          – Probabilities can be refined based on experience, so matchers can improve
                            precision as more data is analyzed
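
            The contrast between the two approaches can be sketched as follows. The
            deterministic scorer sums fixed rule weights; the probabilistic scorer
            uses a Fellegi-Sunter style log-likelihood weight, which is one common
            statistical technique and is named here as an assumption, since the slide
            does not specify a method. Field names, weights, and the m/u probabilities
            are made-up illustrative values.

```python
import math

def deterministic_score(a, b, rules):
    """Sum the weights of the rules (fields) on which two records agree."""
    return sum(weight for field, weight in rules if a[field] == b[field])

def probabilistic_weight(a, b, field_probs):
    """Fellegi-Sunter style match weight.

    For each field, m is the probability of agreement when the records truly
    match and u is the probability of agreement when they do not.
    """
    total = 0.0
    for field, (m, u) in field_probs.items():
        if a[field] == b[field]:
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total

# Hypothetical rule weights and m/u estimates.
RULES = [("last_name", 40), ("zip", 30), ("birth_year", 30)]
FIELD_PROBS = {
    "last_name": (0.95, 0.01),
    "zip": (0.90, 0.05),
    "birth_year": (0.98, 0.02),
}

rec_a = {"last_name": "SMITH", "zip": "23060", "birth_year": 1975}
rec_b = {"last_name": "SMITH", "zip": "23059", "birth_year": 1975}
rec_c = {"last_name": "JONES", "zip": "23060", "birth_year": 1980}
```

            With these values, rec_a and rec_b score 70 deterministically and receive
            a positive probabilistic weight, while rec_a and rec_c score 30 and receive
            a negative weight. Re-estimating m and u from reviewed pairs is how a
            probabilistic matcher can improve as more data is analyzed.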
                      DQ Tool #5: Enhancement
            Definition:
            • A method for adding value to information by accumulating additional
              information about a base set of entities and then merging all the sets
              of information to provide a focused view. Improves master data.
            Benefits:
            • Enables use of third-party data sources
            • Allows you to take advantage of the information and research carried
              out by external data vendors to make data more meaningful and useful
            Examples of data enhancements:
            • Time/date stamps
            • Auditing information
            • Contextual information
            • Geographic information
            • Demographic information
            • Psychographic information
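
            A small sketch of enhancement as a merge, assuming the base entities are
            customer records keyed by ZIP code and the additional information comes
            from a third-party lookup table. The enhance function, the field names,
            the "vendor_x" source label, and the reference values are hypothetical.

```python
from datetime import datetime, timezone

def enhance(base_records, reference, source_name, now=None):
    """Return copies of the base records merged with matching reference attributes."""
    stamp = (now or datetime.now(timezone.utc)).isoformat()
    enhanced = []
    for record in base_records:
        merged = dict(record)  # never modify the base record in place
        extra = reference.get(record["zip"], {})
        if extra:
            for key, value in extra.items():
                merged.setdefault(key, value)      # do not overwrite base attributes
            merged["enhanced_at"] = stamp          # time/date stamp
            merged["enhanced_from"] = source_name  # auditing information
        enhanced.append(merged)
    return enhanced

# Hypothetical third-party attributes keyed by ZIP code.
REFERENCE = {"23060": {"region": "Region A", "segment": "Suburban"}}

customers = [
    {"name": "Acme Corp", "zip": "23060"},
    {"name": "Zed LLC", "zip": "99999"},
]
result = enhance(customers, REFERENCE, "vendor_x",
                 now=datetime(2012, 10, 9, tzinfo=timezone.utc))
```

            Stamping each enhanced record with when and from where it was enriched
            makes the added attributes auditable later.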

                      DQ Tool #6: Reporting
            • Good reporting supports:
                          – Inspection and monitoring of conformance to data quality expectations
                          – Monitoring performance of data stewards conforming to data quality
                            SLAs
                          – Workflow processing for data quality incidents
                          – Manual oversight of data cleansing and correction
            • Data quality tools provide dynamic reporting and
              monitoring capabilities
            • Enables analysts and data stewards to support and drive
              the methodology for ongoing DQM and improvement with
              a single, easy-to-use solution
            • Associate report results with:
                          – Data quality measurement
                          – Metrics
                          – Activity
Guiding Principles
            1)  Manage data as a core organizational asset.
            2)  Identify a gold record for all data elements
            3)  All data elements will have a standardized data definition, data type, and
                acceptable value domain
            4) Leverage data governance for the control and performance of DQM
            5) Use industry and international data standards whenever possible
            6) Downstream data consumers specify data quality expectations
            7) Define business rules to assert conformance to data quality expectations
            8) Validate data instances and data sets against defined business rules
            9) Business process owners will agree to and abide by data quality SLAs
            10) Apply data corrections at the original source if possible
            11) If it is not possible to correct data at the source, forward data corrections to
                the owner of the original source. Influence on data brokers to conform to
                local requirements may be limited
            12) Report measured levels of data quality to appropriate data stewards,
                business process owners, and SLA managers
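
            Principles 7, 8, and 12 can be illustrated together: business rules are
            written as checks, data instances are validated against them, and the
            measured conformance is reported against a service level. This is a
            sketch only; the rule names, the sample records, and the 0.75 threshold
            are hypothetical.

```python
# Hypothetical business rules: each maps a rule name to a check on one record.
BUSINESS_RULES = {
    "customer_id is present": lambda r: bool(r.get("customer_id")),
    "state is a 2-letter code": lambda r: (
        isinstance(r.get("state"), str)
        and len(r["state"]) == 2
        and r["state"].isupper()
    ),
    "age is between 0 and 120": lambda r: (
        isinstance(r.get("age"), int) and 0 <= r["age"] <= 120
    ),
}

def measure_conformance(records, rules):
    """Return the fraction of records that satisfy each rule."""
    report = {}
    for name, rule in rules.items():
        passed = sum(1 for record in records if rule(record))
        report[name] = passed / len(records) if records else 1.0
    return report

def sla_breaches(report, threshold):
    """List the rules whose measured conformance falls below the SLA threshold."""
    return sorted(name for name, rate in report.items() if rate < threshold)

records = [
    {"customer_id": "C1", "state": "VA", "age": 34},
    {"customer_id": "C2", "state": "va", "age": 51},
    {"customer_id": "", "state": "MD", "age": 29},
    {"customer_id": "C4", "state": "Virginia", "age": 130},
]
report = measure_conformance(records, BUSINESS_RULES)
breaches = sla_breaches(report, threshold=0.75)
```

            The report dictionary is what would be forwarded to data stewards and
            business process owners; the breach list is what an SLA manager would
            act on.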

                                                                                       from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
               Interdependencies - Tools alone cannot do the job!
            • Education and Training (People)
            • Data Cleansing and Prevention (Process)
            • Data Quality Tools (Technology)
                                      Summary: Data Quality Engineering




                                                                                   from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
                      Recommended Reading




Data-Ed Engineering Solutions to Data Quality Challenges

  • 1. Data Quality Engineering
      This presentation provides guidance to organizations considering data quality initiatives or preparing for data quality initiatives. This talk will illustrate how organizations with chronic business challenges often can trace the root of the problem to poor data quality. Showing how data quality can be engineered provides a useful framework in which to develop an organizational approach. This in turn will allow organizations to more quickly identify data problems caused by structural issues versus practice-oriented defects. Participants will also learn the importance of practicing data quality engineering quantification.
      Date: October 9, 2012
      Time: 2:00 PM ET
      Presented by: Dr. Peter Aiken
      [Figure: data life cycle diagram, transcribed in full on slide 51]
      Produced by: DATA BLUEPRINT, 10124-C W. BROAD ST, GLEN ALLEN, VA 23060. Classification: EDUCATION. Date: 10/09/12. Slide: 1. 10/04/12 © Copyright this and previous years by Data Blueprint - all rights reserved!
  • 2. Get Social With Us!
      Live Twitter Feed
      - Join the conversation! Follow us: @datablueprint, @paiken
      - Ask questions and submit your comments: #dataed
      Like Us on Facebook
      - www.facebook.com/datablueprint
      - Post questions and comments
      - Find industry news, insightful content and event updates.
      Join the Group
      - Data Management & Business Intelligence
      - Ask questions, gain insights and collaborate with fellow data management professionals
  • 3. Meet Your Presenter: Dr. Peter Aiken
      - Internationally recognized thought leader in the data management field, with 30 years of experience
          - Recipient of multiple international awards
          - Founder, Data Blueprint (http://datablueprint.com)
      - 7 books and dozens of articles
      - Experienced w/ 500+ data management practices in 20 countries
      - Multi-year immersions with organizations as diverse as the US DoD, Deutsche Bank, Nokia, Wells Fargo, the Commonwealth of Virginia and Walmart
  • 4. Data Quality Engineering
  • 5. Outline
      1. Data Management Introduction
      2. Data Quality Definitions & Overview
      3. DQM Cycle
      4. DQ Awareness & Requirements
      5. DQ Dimensions
      6. Data Quality Tools
      7. Guiding Principles
      8. References and Q&A
      Tweeting now: #dataed
  • 6. The DAMA Guide to the Data Management Body of Knowledge
      Published by DAMA International
      - The professional association for Data Managers (40 chapters worldwide)
      DMBoK organized around
      - Primary data management functions focused around data delivery to the organization
      - Several environmental elements
      [Figure: Data Management Functions]
  • 7. The DAMA Guide to the Data Management Body of Knowledge
      - Amazon: http://www.amazon.com/DAMA-Guide-Management-Knowledge-DAMA-DMBOK/dp/0977140083
      - Or enter the terms "dama dm bok" at the Amazon search engine
      [Figure: Environmental Elements]
  • 8. What is the CDMP?
      - Certified Data Management Professional
      - DAMA International and ICCP
      - Membership in a distinct group made up of your fellow professionals
      - Recognition for your specialized knowledge in a choice of 17 specialty areas
      - Series of 3 exams
      - For more information, please visit:
          - http://www.dama.org/i4a/pages/index.cfm?pageid=3399
          - http://iccp.org/certification/designations/cdmp
  • 9. Data Management
  • 10. Data Management
      - Data Program Coordination: Manage data coherently.
      - Organizational Data Integration: Share data across boundaries.
      - Data Stewardship: Assign responsibilities for data.
      - Data Development: Engineer data delivery systems.
      - Data Support Operations: Maintain data availability.
  • 11. Data Management
  • 12. Overview: Data Quality Engineering
      Source: The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
  • 13. Overview: Data Quality Engineering
      Source: The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
  • 15. Definitions
      Data Quality Management
      - Planning, implementation and control activities that apply quality management techniques to measure, assess, improve, and ensure the fitness of data for use
      - Entails the establishment and deployment of roles and responsibilities concerning the acquisition, maintenance, dissemination, and disposition of data (http://www2.sas.com/proceedings/sugi29/098-29.pdf)
      - Critical support process in organizational change management
      - Continuous process for defining the parameters for specifying acceptable levels of data quality to meet business needs and for ensuring that data quality meets these levels
      Data Quality
      - Synonymous with information quality, since poor data quality results in inaccurate information and poor business performance
      Source: The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
  • 16. Overview: DQM Concepts and Activities
      1) Data Quality Management Approach
      2) Develop and promote data quality awareness
      3) Define data quality requirements
      4) Profile, analyze and assess data quality
      5) Define data quality metrics
      6) Define data quality business rules
      7) Test and validate data quality requirements
      8) Set and evaluate data quality service levels
      9) Measure and monitor data quality
      10) Manage data quality issues
      11) Clean and correct data quality defects
      12) Design and implement operational DQM procedures
      13) Monitor operational DQM procedures and performance
      Source: The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
  • 17. Concepts and Activities
      - Data quality expectations provide the inputs necessary to define the data quality framework:
          - Requirements
          - Inspection policies
          - Measures and monitors that reflect changes in data quality and performance
      - The data quality framework requirements reflect 3 aspects of business data expectations:
          1) A manner to record the expectation in business rules
          2) A way to measure the quality of data within that dimension
          3) An acceptability threshold
      Source: The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
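The three aspects on slide 17 (a recorded business rule, a way to measure, an acceptability threshold) can be sketched in a few lines of Python. This is a minimal illustration, not something from the deck: the `QualityRule` class, the `email_present` rule and the 0.75 threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class QualityRule:
    """Hypothetical container for one data quality expectation."""
    name: str                      # label for the rule
    check: Callable[[dict], bool]  # business rule: True when a record conforms
    threshold: float               # acceptability threshold, as a rate in 0..1

    def conformance(self, records: Iterable[dict]) -> float:
        """Measure: the share of records that satisfy the business rule."""
        records = list(records)
        if not records:
            return 1.0  # assumption: an empty set has nothing to violate
        passed = sum(1 for r in records if self.check(r))
        return passed / len(records)

    def is_acceptable(self, records: Iterable[dict]) -> bool:
        """Compare the measured conformance with the threshold."""
        return self.conformance(records) >= self.threshold


# Illustrative data: three of four records carry an email address.
rule = QualityRule("email_present", lambda r: bool(r.get("email")), threshold=0.75)
sample = [
    {"email": "a@example.com"},
    {"email": ""},
    {"email": "b@example.com"},
    {"email": "c@example.com"},
]
print(rule.conformance(sample), rule.is_acceptable(sample))
```

Keeping the rule, the measure and the threshold in one object makes it easy to report all three together when a rule fails.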
  • 19. The DQM Cycle
      The general approach to DQM is a version of the Deming cycle. Deming proposes a problem-solving model known as “plan-do-study-act” or “plan-do-check-act”.
      The cycle begins by:
      1) Identifying data issues that are critical to the achievement of business objectives
      2) Defining business requirements for data quality
      3) Identifying key data quality dimensions
      4) Defining business rules critical to ensuring high quality data
      Source: The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
  • 20. The DQM Cycle: (1) Plan
      Plan for the assessment of the current state and identification of key metrics for measuring quality
      - The data quality team assesses the scope of known issues
      - This involves:
          - Determining cost and impact
          - Evaluating alternatives for addressing them
      Source: The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
  • 21. The DQM Cycle: (2) Deploy
      Deploy processes for measuring and improving the quality of data:
      - Data profiling
      - Institute inspections and monitors to identify data issues when they occur
      - Fix flawed processes that are the root cause of data errors or correct errors downstream
      - When it is not possible to correct errors at their source, correct them at their earliest point in the data flow
      Source: The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
  • 22. The DQM Cycle: (3) Monitor
      Monitor the quality of data as measured against the defined business rules
      - If data quality meets defined thresholds for acceptability, the processes are in control and the level of data quality meets the business requirements
      - If data quality falls below acceptability thresholds, notify data stewards so they can take action during the next stage
      Source: The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
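The monitor step on slide 22 compares measurements with thresholds and notifies data stewards when quality falls short. A minimal sketch in Python follows. The function names, the metric names and the notification channel (a plain callback) are assumptions made for illustration; the deck does not prescribe any of them.

```python
from typing import Callable, Dict, List


def find_breaches(measurements: Dict[str, float],
                  thresholds: Dict[str, float]) -> List[str]:
    """Return the names of metrics that fall below their acceptability threshold.

    Assumption: a metric with no threshold defined is never reported.
    """
    return sorted(
        name for name, value in measurements.items()
        if name in thresholds and value < thresholds[name]
    )


def notify_stewards(breaches: List[str],
                    send: Callable[[str], None] = print) -> int:
    """Send one message per breached metric; `send` stands in for a real channel."""
    for name in breaches:
        send(f"Data quality below threshold: {name}")
    return len(breaches)


# Hypothetical measurements from one monitoring run.
measured = {"completeness": 0.98, "validity": 0.90}
limits = {"completeness": 0.95, "validity": 0.97}
notify_stewards(find_breaches(measured, limits))
```

When `find_breaches` returns an empty list, the process is in control in the sense used on the slide and no steward is contacted.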
  • 23. The DQM Cycle: (4) Act
      Act to resolve any identified issues to improve data quality and better meet business expectations
      - New cycles begin as new data sets come under investigation or as new data quality requirements are identified for existing data sets
      Source: The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
  • 25. Develop and Promote DQ Awareness
      - Promoting data quality awareness is essential to ensure buy-in of necessary stakeholders in the organization
      - Ensure that the right people in the organization are aware of the existence of data quality issues
      - Awareness increases the chance of success of any DQM program
      - Awareness includes:
          - Relating material impacts to data issues
          - Ensuring systematic approaches to regulators
          - Oversight of the quality of organizational data
          - Socializing the concept that data quality problems cannot be solely addressed by technology solutions
      Source: The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
  • 26. Polling Question #1
      Which is not a step to promote data quality awareness?
      a) Training on the core concepts of data quality
      b) Establish data governance framework for data quality
      c) Create a data architecture map
  • 27. Develop and Promote DQ Awareness: Steps
      1) Training on the core concepts of data quality
      2) Establish data governance framework for data quality
      3) Create a data quality oversight board that has a reporting hierarchy associated with the different data governance roles
      Source: The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
  • 28. Define DQ Requirements
      - Data quality must be understood within the context of ‘fitness for use’
      - Data quality requirements are often hidden within defined business policies
      - Incremental detailed review and iterative refinement of business policies helps to identify those information requirements which become data quality rules
      - Steps for incremental detailed review:
          - Identify key data components associated with business policies
          - Determine how identified data assertions affect the business
          - Evaluate how data errors are categorized within a set of data quality dimensions
          - Specify the business rules that measure the occurrence of data errors
          - Provide a means for implementing measurement processes that assess conformance to those business rules
      Source: The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
  • 29. Data Quality Dimensions
      Source: The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
  • 30. Profile, Analyze and Assess DQ
      Data assessment using 2 different approaches: 1) Bottom-up 2) Top-down
      Bottom-up assessment:
      - Inspection and evaluation of the data sets
      - Highlight potential issues based on the results of automated processes
      Top-down assessment:
      - Engage business users to document their business processes and the corresponding critical data dependencies
      - Understand how their processes consume data and which data elements are critical to the success of the business application
      Source: The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
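As a rough illustration of the bottom-up approach on slide 30, an automated process can summarize a column and leave the interpretation to a reviewer. The sketch below is an assumption about what such a profile might contain (row count, null rate, distinct values, most frequent value); `profile_column` is a hypothetical helper, not a tool named in the deck.

```python
from collections import Counter
from typing import Any, Dict, List


def profile_column(values: List[Any]) -> Dict[str, Any]:
    """Summarize one column so that potential issues stand out.

    Assumption: None and the empty string both count as missing.
    """
    non_null = [v for v in values if v is not None and v != ""]
    counts = Counter(non_null)
    return {
        "count": len(values),
        "null_rate": (1 - len(non_null) / len(values)) if values else 0.0,
        "distinct": len(counts),
        "most_common": counts.most_common(1)[0] if counts else None,
    }


# Illustrative column of state codes with one missing value.
states = ["VA", "VA", None, "MD"]
print(profile_column(states))
```

A high null rate or an unexpected number of distinct values is only a prompt for investigation; the top-down conversation with business users decides whether it matters.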
  • 31. Define DQ Metrics
      - Metrics development occurs as part of the strategy/design/plan step
      - Process for defining data quality metrics:
          1) Select one of the identified critical business impacts
          2) Evaluate the dependent data elements and the create and update processes associated with that business impact
          3) List any associated data requirements
          4) Specify the associated dimension of data quality and one or more business rules to use to determine conformance of the data to expectations
          5) Describe the process for measuring conformance
          6) Specify an acceptability threshold
      Source: The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
  • 32. Test and Validate DQ Requirements
      - Data profiling tools analyze data to find potential anomalies
      - Use the same tools for rule validation
      - Rules discovered or defined during the data quality assessment phase are referenced in measuring conformance as part of the operational process
      Source: The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
  • 33. Set and Evaluate DQ Service Levels
      - Data quality inspection and monitoring are used to measure and monitor compliance with defined data quality rules
      - Data quality SLAs specify the organization’s expectations for response and remediation
      - Operational data quality control defined in data quality SLAs includes:
          - Data elements covered by the agreement
          - Business impacts associated with data flaws
          - Data quality dimensions associated with each data element
          - Quality expectations for each data element, for each of the identified dimensions, in each application or system in the value chain
          - Methods for measuring against those expectations
          - (…)
      Source: The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
  • 34. Measure and Monitor DQ
      - DQM procedures depend on available data quality measuring and monitoring services
      - 2 contexts for control/measurement of conformance to data quality business rules exist:
          - In-stream: collect in-stream measurements while creating data
          - In batch: perform batch activities on collections of data instances assembled in a data set
      - Apply measurements at 3 levels of granularity:
          - Data element value
          - Data instance or record
          - Data set
      Source: The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
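The three levels of granularity on slide 34 can be made concrete with one check per level. The rules below (non-negative quantity, ship date not before order date, unique order id) and the field names are hypothetical examples chosen for this sketch; dates are assumed to be ISO 8601 strings, which compare chronologically as text.

```python
from typing import Any, Dict, List


def element_ok(quantity: Any) -> bool:
    """Data element value level: a single field must be a non-negative number."""
    return isinstance(quantity, (int, float)) and quantity >= 0


def record_ok(record: Dict[str, Any]) -> bool:
    """Record level: a cross-field rule within one data instance."""
    return (element_ok(record.get("quantity"))
            and record["order_date"] <= record["ship_date"])


def dataset_ok(records: List[Dict[str, Any]]) -> bool:
    """Data set level: a rule that only makes sense across all records."""
    ids = [r["order_id"] for r in records]
    return len(ids) == len(set(ids))


# Illustrative batch: the second record ships before it was ordered.
orders = [
    {"order_id": 1, "quantity": 3, "order_date": "2012-10-01", "ship_date": "2012-10-04"},
    {"order_id": 2, "quantity": 1, "order_date": "2012-10-05", "ship_date": "2012-10-02"},
]
print([record_ok(o) for o in orders], dataset_ok(orders))
```

The element and record checks are cheap enough to run in-stream as each record is created, while the data set check needs the assembled batch.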
  • 35. Manage DQ Issues / Clean & Correct DQ Defects
      Manage DQ Issues
      - Supporting the enforcement of the data quality SLA requires a mechanism for reporting and tracking data quality incidents and activities for researching and resolving those incidents
      - A data quality incident reporting system can provide this capability
      - It can log the evaluation, initial diagnosis, and actions associated with data quality events
      Clean & Correct DQ Defects
      Perform data correction in 3 ways:
      1) Automated correction
      2) Manual directed correction
      3) Manual correction
      Source: The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
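A data quality incident record of the kind slide 35 describes might look like the following Python sketch. The `Incident` class and its fields are hypothetical; the only parts taken from the slide are the items to be logged (evaluation, initial diagnosis, actions) and the three correction modes.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import List, Optional


class Correction(Enum):
    """The three ways of performing data correction listed on the slide."""
    AUTOMATED = "automated correction"
    MANUAL_DIRECTED = "manual directed correction"
    MANUAL = "manual correction"


@dataclass
class Incident:
    """Hypothetical log entry for one data quality event."""
    description: str
    evaluation: str = ""
    diagnosis: str = ""
    correction: Optional[Correction] = None
    actions: List[str] = field(default_factory=list)
    opened_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    def log_action(self, note: str) -> None:
        """Append an action so the history of the incident is preserved."""
        self.actions.append(note)


incident = Incident("Duplicate customer records in billing extract")
incident.diagnosis = "Two source systems assign customer ids independently"
incident.correction = Correction.MANUAL_DIRECTED
incident.log_action("Matched candidate duplicates; sent list to steward for review")
```

An append-only action list keeps the trail of who did what, which is what tracking against an SLA needs.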
  • 36. Manage DQ Issues: Example
      Data quality incident tracking focuses on training staff to recognize when data issues appear and how they are to be classified, logged and tracked according to the data quality SLA
      Source: The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
  • 37. Design and Implement Operational DQM Procedures / Monitor Operational DQM Procedures and Performances
      Design and Implement Operational DQM Procedures
      1) Inspection and monitoring
      2) Diagnosis and evaluation of remediation alternatives
      3) Resolve issues
      4) Reporting
      Monitor Operational DQM Procedures and Performances
      1) Accountability is critical to governance protocols overseeing data quality control
      2) All issues must be assigned
      3) The tracking process should specify and document the ultimate issue accountability
      Source: The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
  • 39. Example: Data Quality Interview Session Summary
      - During mid-February, the Data Governance Team and Data Blueprint conducted ten qualitative interview sessions with groups of individuals who interact with data on a regular basis
      - A series of patterns emerged as participants shared stories about the impact of poor data quality on the client, its products, and its customers
      - These patterns highlight gaps in best practices for ensuring data quality, i.e. the extent to which data is “fit for use”
      - Our preliminary analysis evaluated these stories against attributes of four data quality dimensions
      - At this early stage of the post-interview process, we are seeking confirmation of our assumptions and method
  • 40. Which Activities Support Quality Data?
      - Data quality best practices depend on both
          - Practice-oriented activities
          - Structure-oriented activities
      [Figure: Quality Data, supported by two kinds of activity]
      - Practice-oriented activities focus on the capture and manipulation of data
      - Structure-oriented activities focus on the data implementation
  • 41. Quality Dimensions
      Practice-oriented causes
      - Stem from a failure to apply rigor when capturing and manipulating data, that is, from omitting controls such as:
          - Edit masking
          - Range checking of input data
          - CRC-checking of transmitted data
      Structure-oriented causes
      - Occur because of data and metadata that have been arranged imperfectly. For example:
          - When the data is in the system but we just can't access it;
          - When a correct data value is provided as the wrong response to a query; or
          - When data is not provided because it is unavailable or inaccessible to the customer
      - Developers focus within system boundaries instead of within organization boundaries
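The three practice-oriented controls named on slide 41 can each be sketched with the Python standard library. The ZIP-code mask, the age range and the helper names are assumptions chosen for the example; `re.fullmatch` and `zlib.crc32` are standard library calls.

```python
import re
import zlib
from typing import Tuple

# Edit mask (assumed format): a US ZIP code, optionally with a +4 suffix.
ZIP_MASK = re.compile(r"\d{5}(-\d{4})?")


def matches_mask(text: str) -> bool:
    """Edit masking: accept input only if the whole string fits the mask."""
    return ZIP_MASK.fullmatch(text) is not None


def in_range(value: float, low: float, high: float) -> bool:
    """Range checking: accept input only inside the inclusive bounds."""
    return low <= value <= high


def with_crc(payload: bytes) -> Tuple[bytes, int]:
    """Sender side: attach a CRC-32 checksum to the payload."""
    return payload, zlib.crc32(payload)


def crc_ok(payload: bytes, checksum: int) -> bool:
    """Receiver side: recompute the CRC-32 and compare with the one sent."""
    return zlib.crc32(payload) == checksum


data, checksum = with_crc(b"GLEN ALLEN, VA 23060")
print(matches_mask("23060"), in_range(42, 0, 120), crc_ok(data, checksum))
```

All three checks reject bad values at the point of capture or transmission, which is the bottom-up "find and fix" style the deck associates with practice-oriented quality. A CRC detects accidental corruption only; it is not a defense against deliberate tampering.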
  • 42. Practice-Oriented Activities
      - Affect the Data Value Quality and Data Representation Quality
      - Examples of improper practice-oriented activities:
          - Allowing imprecise or incorrect data to be collected when requirements specify otherwise
          - Presenting data out of sequence
      - Typically diagnosed in bottom-up manner: find and fix the resulting problem
      - Addressed by imposing more rigorous data-handling governance
      [Figure: practice-oriented activities support Quality of Data Values and Quality of Data Representation]
  • 43. Structure-Oriented Activities
      - Affect the Data Model Quality and Data Architecture Quality
      - Examples of improper structure-oriented activities:
          - Providing a correct but incomplete response to a query because the user did not comprehend the system data structure
          - Costly maintenance of inconsistent data used by redundant systems
      - Typically diagnosed in top-down manner: root cause fixes
      - Addressed through fundamental data structure governance
      [Figure: structure-oriented activities support Quality of Data Models and Quality of Data Architecture]
  • 44. 4 Dimensions of Data Quality
      An organization’s overall data quality is a function of four distinct components, each with its own attributes:
      Practice-oriented
      - Data Value: the quality of data as stored & maintained in the system
      - Data Representation: the quality of representation for stored values; perfect data values stored in a system that are inappropriately represented can be harmful
      Structure-oriented
      - Data Model: the quality of data logically representing user requirements related to data entities, associated attributes, and their relationships; essential for effective communication among data suppliers and consumers
      - Data Architecture: the coordination of data management activities in cross-functional system development and operations
  • 45. Effective Data Quality Engineering
      - Data quality engineering has been focused on operational problem correction
          - Directing attention to practice-oriented data imperfections
      - Data quality engineering is more effective when also focused on structure-oriented causes
          - Ensuring the quality of shared data across system boundaries
      [Figure: the four dimensions arranged from "closer to the user" to "closer to the architect"]
      - Data Representation Quality: as presented to the user
      - Data Value Quality: as maintained in the system
      - Data Model Quality: as understood by developers
      - Data Architecture Quality: as an organizational asset
  • 46. Full Set of Data Quality Attributes
  • 47. Data Value Quality
  • 48. Data Representation Quality
  • 49. Data Model Quality
  • 50. Data Architecture Quality
  • 51. Extended data life cycle model with metadata sources and uses [Diagram. Phases: Metadata Creation (Define Data Architecture; Define Data Model Structures), Metadata Refinement (Correct Structural Defects; Update Implementation), Metadata Structuring (Implement Data Model Views; Populate Data Model Views), Data Creation (Create Data; Verify Data Values), Data Utilization (Inspect Data; Present Data), Data Manipulation (Manipulate Data; Update Data), Data Assessment (Assess Data Values; Assess Metadata), Data Refinement (Correct Data Value Defects; Re-store Data Values), arranged around Metadata & Data Storage. Flow labels: architecture refinements, data architecture, data architecture and data models, facts & meanings, shared data, updated data, data performance metadata, corrected data. Separate starting points are marked for new system development and for existing systems.]
  • 52. Data Quality Engineering [Figure; only its check marks survive in this transcript] from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
  • 53. Goals and Principles § To measurably improve the quality of data in relation to defined business expectations § To define requirements and specifications for integrating data quality control into the system development life cycle § To provide defined processes for measuring, monitoring, and reporting conformance to acceptable levels of data quality from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
  • 54. Activities • Develop and Promote Data Quality Awareness • Set and Evaluate Data Quality Service Levels • Test and Validate Data Quality Requirements • Profile, Analyze, and Assess Data Quality • Continuously Measure and Monitor Data Quality • Monitor Operational DQM Procedures and Performance • Define Data Quality Business Rules • Define Data Quality Metrics • Manage Data Quality Issues • Clean and Correct Data Quality Defects • Define Data Quality Requirements • Design and Implement Operational DQM Procedures from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
  • 55. Primary Deliverables • Improved Quality Data • Data Management Operational Analysis • Data Profiles • Data Quality Certification Reports • Data Quality Service Level Agreements from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
  • 56. Roles and Responsibilities Suppliers: § External Sources § Regulatory Bodies § Business Subject Matter Experts § Information Consumers § Data Producers § Data Architects § Data Modelers § Data Stewards Participants: § Data Quality Analysts § Data Analysts § Database Administrators § Data Stewards § Other Data Professionals § DRM Director § Data Stewardship Council Consumers: § Data Stewards § Data Professionals § Other IT Professionals § Knowledge Workers § Managers and Executives § Customers from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
  • 57. Polling Question #2 What is one guiding principle for data quality? a. Business process owners will agree to and abide by data quality SLAs b. Identify a blue record for all data elements c. Upstream data consumers specify data quality expectations
  • 58. Outline 1. Data Management Introduction 2. Data Quality Definitions & Overview 3. DQM Cycle 4. DQ Awareness & Requirements 5. DQ Dimensions 6. Data Quality Tools 7. Guiding Principles 8. References and Q&A Tweeting now: #dataed
  • 59. Technology • Data Profiling Tools • Statistical Analysis Tools • Data Cleansing Tools • Data Integration Tools • Issue and Event Management Tools from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
  • 60. Overview: Data Quality Tools 4 categories of activities: 1) Analysis 2) Cleansing 3) Enhancement 4) Monitoring Principal tools: 1) Data Profiling 2) Parsing and Standardization 3) Data Transformation 4) Identity Resolution and Matching 5) Enhancement 6) Reporting from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
  • 61. DQ Tool #1: Data Profiling • Data profiling is the assessment of value distribution and clustering of values into domains • Need to be able to distinguish between good and bad data before making any improvements • Data profiling is a set of algorithms for 2 purposes: – Statistical analysis and assessment of the quality of data values within a data set – Exploring relationships that exist between value collections within and across data sets • At its most advanced, data profiling takes a series of prescribed rules from data quality engines. It then assesses the data, and annotates and tracks violations to determine whether they comprise new or inferred data quality rules
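The first profiling purpose on this slide (statistical assessment of the values in a data set) can be illustrated with a minimal Python sketch. This is an illustration under stated assumptions, not the method of any particular profiling tool: the function name `profile_column`, the shape notation (digits become `9`, letters become `A`), and the sample ZIP codes are all hypothetical.

```python
from collections import Counter
import re


def profile_column(values):
    """Summarize one column: null rate, distinct count, top values, value shapes."""
    # Treat None and empty strings as missing (an assumption of this sketch).
    non_null = [v for v in values if v not in (None, "")]
    # Reduce each value to a shape so that unusual formats stand out:
    # every digit becomes "9" and every letter becomes "A".
    patterns = Counter(
        re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", str(v))) for v in non_null
    )
    return {
        "count": len(values),
        "null_rate": 1 - len(non_null) / len(values) if values else 0.0,
        "distinct": len(set(non_null)),
        "top_values": Counter(non_null).most_common(3),
        "patterns": patterns.most_common(),
    }


# Hypothetical sample: five-digit ZIP codes with a few defects mixed in.
zip_codes = ["23060", "23060", "2306", None, "VA230", "23229"]
report = profile_column(zip_codes)
print(report["patterns"])
```

In this sample the dominant shape is five digits, so the four-digit and letter-prefixed values surface as candidates for review before any cleansing rule is written.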
  • 62. DQ Tool #1: Data Profiling, cont’d • Data profiling vs. data quality: business context and semantic/logical layers – Data quality is concerned with proscriptive rules – Data profiling looks for patterns when rules are adhered to and when rules are violated; able to provide input into the business context layer • It is incumbent on data profiling services to notify all concerned parties of whatever is discovered • Profiling can be used to… – …notify the help desk that valid changes in the data are about to cause an avalanche of “skeptical user” calls – …notify business analysts of precisely where they should be working today in terms of shifts in the data
  • 63. DQ Tool #2: Parsing & Standardization • Data parsing tools enable the definition of patterns that feed into a rules engine used to distinguish between valid and invalid data values • Actions are triggered upon matching a specific pattern • When an invalid pattern is recognized, the application may attempt to transform the invalid value into one that meets expectations • Data standardization is the process of conforming to a set of business rules and formats that are set up by data stewards and administrators • Data standardization example: – Bringing all the different formats of “street” into a single format, e.g. “STR”, “ST.”, “STRT”, “STREET”, etc.
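The "street" example on this slide can be sketched in a few lines of Python. The variant table, the choice of "ST" as the single target format, and the function name `standardize_street` are assumptions made for illustration; in practice data stewards would own the rule set.

```python
# Hypothetical variant table; the target form "ST" is an assumed convention.
STREET_VARIANTS = {"STR", "ST", "STRT", "STREET"}
STREET_TARGET = "ST"


def standardize_street(address):
    """Upper-case an address and map every known street variant to one form."""
    out = []
    for tok in address.upper().split():
        bare = tok.rstrip(".,")  # compare without trailing punctuation
        if bare in STREET_VARIANTS:
            # Keep a trailing comma so the address parts stay separated.
            tail = "," if tok.endswith(",") else ""
            out.append(STREET_TARGET + tail)
        else:
            out.append(tok)
    return " ".join(out)


print(standardize_street("12 Main Strt."))
print(standardize_street("12 Main St., Richmond"))
```

A token-level lookup like this cannot tell "St." for street from "St." for saint, which is one reason real standardization tools parse the address into components first.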
  • 64. DQ Tool #3: Data Transformation • Upon identification of data errors, trigger data rules to transform the flawed data • Perform standardization and guide rule-based transformations by mapping data values in their original formats and patterns into a target representation • Parsed components of a pattern are subjected to rearrangement, corrections, or any changes as directed by the rules in the knowledge base
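A rule-based transformation of this kind can be sketched as a small "knowledge base" of patterns, each paired with a template that rearranges the parsed components into the target representation. The phone number formats, the target form `NNN-NNN-NNNN`, and the names `RULES` and `transform_phone` are assumptions chosen for illustration.

```python
import re

# Target representation (assumed): NNN-NNN-NNNN.
TARGET = re.compile(r"^\d{3}-\d{3}-\d{4}$")

# Hypothetical knowledge base: source pattern -> template over parsed components.
RULES = [
    (re.compile(r"^\((\d{3})\)\s*(\d{3})-(\d{4})$"), r"\1-\2-\3"),
    (re.compile(r"^(\d{3})\.(\d{3})\.(\d{4})$"), r"\1-\2-\3"),
    (re.compile(r"^(\d{3})(\d{3})(\d{4})$"), r"\1-\2-\3"),
]


def transform_phone(value):
    """Return (value, status); status is 'valid', 'transformed', or 'unresolved'."""
    value = value.strip()
    if TARGET.match(value):
        return value, "valid"
    for pattern, template in RULES:
        match = pattern.match(value)
        if match:
            # Rearrange the parsed components as the rule directs.
            return match.expand(template), "transformed"
    # No rule applies: leave the value untouched and flag it for review.
    return value, "unresolved"


print(transform_phone("(804) 555-0100"))
print(transform_phone("555-0100"))
```

Returning a status alongside the value keeps unresolved cases visible instead of silently passing flawed data through.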
  • 65. DQ Tool #4: Identity Resolution & Matching • Data matching enables analysts to identify relationships between records for de-duplication or group-based processing • Matching is central to maintaining data consistency and integrity throughout the enterprise • The matching process should be used in the initial migration of data into a single repository 2 basic approaches to matching: • Deterministic – Relies on defined patterns/rules for assigning weights and scores to determine similarity – Predictable – Dependent on the rule developers’ anticipations • Probabilistic – Relies on statistical techniques for assessing the probability that any pair of records represents the same entity – Not reliant on rules – Probabilities can be refined based on experience -> matchers can improve precision as more data is analyzed
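The two approaches can be contrasted in a short Python sketch. The slide does not name a specific algorithm, so the probabilistic half below borrows log-likelihood agreement weights in the style of Fellegi-Sunter record linkage as a stand-in. All field names, rule weights, and m/u probabilities are made-up illustrative values.

```python
import math


def normalize(text):
    """Lower-case and keep only letters and digits before comparing."""
    return "".join(ch for ch in text.lower() if ch.isalnum())


# Deterministic: fixed rule weights chosen in advance (assumed values).
RULE_WEIGHTS = {"name": 5, "zip": 2, "phone": 3}


def deterministic_score(a, b):
    """Sum the weights of the fields on which the two records agree."""
    return sum(
        weight
        for field, weight in RULE_WEIGHTS.items()
        if normalize(a[field]) == normalize(b[field])
    )


# Probabilistic: m = P(field agrees | same entity),
#                u = P(field agrees | different entities). Assumed values.
M_U = {"name": (0.95, 0.01), "zip": (0.90, 0.10), "phone": (0.85, 0.001)}


def probabilistic_weight(a, b):
    """Sum log2 likelihood ratios; positive totals favor 'same entity'."""
    total = 0.0
    for field, (m, u) in M_U.items():
        if normalize(a[field]) == normalize(b[field]):
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


r1 = {"name": "Jane Doe", "zip": "23060", "phone": "804-555-0100"}
r2 = {"name": "JANE DOE", "zip": "23060", "phone": "(804) 555 0100"}
r3 = {"name": "John Roe", "zip": "23060", "phone": "804-555-0199"}

print(deterministic_score(r1, r2), deterministic_score(r1, r3))
print(probabilistic_weight(r1, r2), probabilistic_weight(r1, r3))
```

The deterministic score changes only when someone edits the rules, which is what makes it predictable. The m and u values, by contrast, can be re-estimated from reviewed match decisions, which is how a probabilistic matcher can improve as more data is analyzed.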
  • 66. DQ Tool #5: Enhancement Definition: • A method for adding value to information by accumulating additional information about a base set of entities and then merging all the sets of information to provide a focused view. Improves master data. Examples of data enhancements: • Time/date stamps • Auditing information • Contextual information • Geographic information • Demographic information • Psychographic information Benefits: • Enables use of third party data sources • Allows you to take advantage of the information and research carried out by external data vendors to make data more meaningful and useful
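As a rough illustration of enhancement, the Python sketch below merges a base record with a hypothetical third-party geographic reference set and adds time stamp and auditing fields. The reference table, the field names `enhanced_at` and `enhanced_from`, and the vendor label are assumptions, not part of any real vendor feed.

```python
from datetime import datetime, timezone

# Hypothetical third-party reference set keyed by ZIP code.
GEO_REFERENCE = {"23060": {"city": "Glen Allen", "state": "VA"}}


def enhance(record, source, now=None):
    """Return a copy of the record with geographic and audit fields merged in."""
    enriched = dict(record)  # never mutate the base record
    # Geographic information: add whatever the reference set knows, if anything.
    enriched.update(GEO_REFERENCE.get(record.get("zip"), {}))
    # Time/date stamp and auditing information.
    enriched["enhanced_at"] = (now or datetime.now(timezone.utc)).isoformat()
    enriched["enhanced_from"] = source
    return enriched


base = {"name": "Data Blueprint", "zip": "23060"}
result = enhance(base, "geo_vendor", now=datetime(2012, 10, 9, tzinfo=timezone.utc))
print(result)
```

Recording where and when each enhancement came from keeps externally sourced values traceable if the vendor data later proves wrong.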
  • 67. DQ Tool #6: Reporting • Good reporting supports: – Inspection and monitoring of conformance to data quality expectations – Monitoring performance of data stewards conforming to data quality SLAs – Workflow processing for data quality incidents – Manual oversight of data cleansing and correction • Data quality tools provide dynamic reporting and monitoring capabilities • These capabilities enable analysts and data stewards to support and drive the methodology for ongoing DQM and improvement with a single, easy-to-use solution • Associate report results with: – Data quality measurement – Metrics – Activity
  • 68. Outline 1. Data Management Introduction 2. Data Quality Definitions & Overview 3. DQM Cycle 4. DQ Awareness & Requirements 5. DQ Dimensions 6. Data Quality Tools 7. Guiding Principles 8. References and Q&A Tweeting now: #dataed
  • 69. Guiding Principles 1) Manage data as a core organizational asset 2) Identify a gold record for all data elements 3) All data elements will have a standardized data definition, data type, and acceptable value domain 4) Leverage data governance for the control and performance of DQM 5) Use industry and international data standards whenever possible 6) Downstream data consumers specify data quality expectations 7) Define business rules to assert conformance to data quality expectations 8) Validate data instances and data sets against defined business rules 9) Business process owners will agree to and abide by data quality SLAs 10) Apply data corrections at the original source if possible 11) If it is not possible to correct data at the source, forward data corrections to the owner of the original source. Influence on data brokers to conform to local requirements may be limited 12) Report measured levels of data quality to appropriate data stewards, business process owners, and SLA managers from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
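Principles 7, 8, and 12 (define business rules, validate data against them, report measured levels) fit together naturally in code. The Python sketch below assumes three illustrative rules and a hypothetical customer record layout; none of these come from the DAMA guide.

```python
# Hypothetical business rules: each maps a rule name to a predicate on a record.
BUSINESS_RULES = {
    "customer_id is present": lambda r: bool(r.get("customer_id")),
    "state is a two-letter code": lambda r: (
        isinstance(r.get("state"), str)
        and len(r["state"]) == 2
        and r["state"].isalpha()
    ),
    "age is between 0 and 120": lambda r: (
        isinstance(r.get("age"), int) and 0 <= r["age"] <= 120
    ),
}


def validate(records):
    """Return (conformance rate per rule, list of (record index, failed rule))."""
    passed = {name: 0 for name in BUSINESS_RULES}
    failures = []
    for index, record in enumerate(records):
        for name, rule in BUSINESS_RULES.items():
            if rule(record):
                passed[name] += 1
            else:
                failures.append((index, name))
    if not records:
        return {}, failures
    conformance = {name: passed[name] / len(records) for name in BUSINESS_RULES}
    return conformance, failures


customers = [
    {"customer_id": "C1", "state": "VA", "age": 34},
    {"customer_id": "", "state": "Virginia", "age": 34},
    {"customer_id": "C3", "state": "NY", "age": 150},
    {"customer_id": "C4", "state": "MD", "age": 51},
]
conformance, failures = validate(customers)
print(conformance)
print(failures)
```

The per-rule conformance rates are the kind of measured level that could be reported to stewards and compared against an SLA threshold, while the failure list identifies the records to correct at the source.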
  • 70. Interdependencies - Tools alone cannot do the job! [Diagram of three interdependent elements: Education and Training (People), Data Cleansing and Prevention (Process), Data Quality Tools (Technology)]
  • 71. Summary: Data Quality Engineering from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
  • 72. Outline 1. Data Management Introduction 2. Data Quality Definitions & Overview 3. DQM Cycle 4. DQ Awareness & Requirements 5. DQ Dimensions 6. Data Quality Tools 7. Guiding Principles 8. References and Q&A Tweeting now: #dataed
  • 73. Recommended Reading