A McKeel Research LLC White Paper

455 Newport Way Suite 103
Issaquah, Washington 98027
(425) 996-0427

Data Quality Process Design
for Ad-Hoc Reporting

By Jim Atwater, Principal Consultant
Management Analytics Practice

September 2008
Data Quality Process Design for Ad-Hoc Reporting
                                                        McKeel Research, LLC
                                                              All rights reserved


Contents
Introduction
Problem Statement
Previous Options
Our Solution
Implementation
Summary

Introduction
This white paper provides an overview of some of the key objects in a baseline data-cleansing subsystem for ad-hoc reporting solutions, whether relational, dimensional, or somewhere in between. The key scenario is based on experience with enterprise sales and marketing workgroups responsible for metrics and analytics.


Problem Statement
Business organizations have come to realize the value of dimensional data modeling, particularly the “one version of the truth” rigor that such systems bring to data quality. Unfortunately, the complexity inherent in a proper data warehouse implementation puts this approach out of reach for many sales and marketing workgroups, even in large enterprise organizations. Barriers include a lack of skilled resources, the time and commitment required in the analysis phase, and the expense compared with legacy relational ad-hoc reporting solutions.


Previous Options
Legacy relational solutions typically build reports directly on source-system data. Data cleansing and auditing are compiled after the fact by analysts, as footnotes to the reports. This practice wastes time, introduces errors, and leaves a rich source of analytical information untapped. As such workgroups evolve, the most common errors surface through sheer repetition and lead to “fixes” in the reports themselves, usually computations embedded in the reports that only serve to obscure the source data.




Our Solution
Our solution is to leverage key data quality aspects of the transform procedures detailed by the Kimball Group for enterprise data warehousing solutions. This approach provides three key benefits:
• More robust data quality
• Integrity of the source system data
• A “glide path” toward the data warehouse

Data Quality Benefits
The basis of our solution lies in a metadata store of specific screens, each of which quantifies specific aspects of each data record. Screens can enforce column properties within each record, the structural relation of columns to each other, or logical business rules that check individual or aggregate data values. The upshot is a data quality score applied to each record.
The added value is that the data quality metrics are themselves an authentic data source. They guide both report owners and report producers to concentrate data cleansing efforts on the source systems, where those efforts belong.

Source System Data Integrity
Data integrity is preserved in a pristine state by keeping the source-system data separate from the QA screen metrics. Chiefly, the QA metrics take the form of an audit dimension whose columns can either be integrated into existing report queries or be delivered separately in the resulting workbook or deck. This simple separation preserves one version of the truth while still conveying an informed level of trust in the data, information that would otherwise be mixed into the reporting data stream itself.

Data Warehouse “Glide Path”
By implementing the accepted best practice for data quality in the data warehousing field, workgroups arm themselves with metadata that data warehouse implementers readily understand. More importantly, they buy themselves a “seat at the table” in future cost containment and report centralization efforts.

Implementation
The solution is designed to fit into the existing workflow of a typical sales or marketing analytics team. Automating the existing reports and the standard “what decisions do you make using this data?” kinds of analysis make up the normal weekly workflow, and these efforts produce the screen definitions. The definitions are then put into effect by the baseline data quality code, built with the Microsoft SQL Server Integration Services (SSIS) toolset. Once the codebase is in place, the screens are brought to bear, and the key error and audit deliverables mature naturally over time.

Summary
Every ad-hoc reporting system has to deal with data quality at some point; ideally, before your V.P. pitches a fit in the middle of a big meeting. By building in a metadata-driven data screening facility, this solution adds auditing and error handling to the existing reporting and pays tangible dividends going forward.
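The paper describes screens kept in a metadata store, a per-record data quality score, and an audit dimension held apart from the source data, but it gives no code, and its actual implementation lives in SSIS. The Python sketch below only illustrates the idea under stated assumptions: the `Screen` record, the `run_screens` function, the scoring rule (100 minus the severities of the failed screens, floored at zero), and the sample order rules are all hypothetical, loosely following the column, structure, and business-rule screen types the Kimball Group describes.

```python
from dataclasses import dataclass
from datetime import date
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]


@dataclass(frozen=True)
class Screen:
    """One row of a hypothetical screen metadata store."""
    screen_id: int
    screen_type: str                  # "column", "structure", or "business_rule"
    description: str
    severity: int                     # points deducted from the score on failure
    passes: Callable[[Record], bool]  # True when the record satisfies the screen


def run_screens(
    records: List[Record], screens: List[Screen], key: str, max_score: int = 100
) -> Tuple[List[Record], List[Record]]:
    """Score each record without modifying it.

    Returns (audit_rows, error_events). Both are new lists keyed by the
    source record's key, so the source data itself stays untouched.
    """
    audit_rows: List[Record] = []
    error_events: List[Record] = []
    for rec in records:
        failed = [s for s in screens if not s.passes(rec)]
        # One error event per failed screen (assumed layout, for illustration).
        for s in failed:
            error_events.append(
                {"record_key": rec[key], "screen_id": s.screen_id, "severity": s.severity}
            )
        audit_rows.append(
            {
                "record_key": rec[key],
                # Floor at zero so many failures cannot produce a negative score.
                "quality_score": max(0, max_score - sum(s.severity for s in failed)),
                "failed_screens": ",".join(str(s.screen_id) for s in failed) or "none",
            }
        )
    return audit_rows, error_events


# Illustrative screens for sales orders (hypothetical rules and severities).
SCREENS = [
    Screen(1, "column", "region must be populated", 20,
           lambda r: bool(r.get("region"))),
    Screen(2, "structure", "ship_date must not precede order_date", 30,
           lambda r: r["ship_date"] >= r["order_date"]),
    Screen(3, "business_rule", "discount may not exceed 30%", 40,
           lambda r: r["discount_pct"] <= 0.30),
]

# Made-up sample rows standing in for source-system data.
ORDERS = [
    {"order_id": 1, "region": "West", "order_date": date(2008, 9, 1),
     "ship_date": date(2008, 9, 3), "discount_pct": 0.10},
    {"order_id": 2, "region": "", "order_date": date(2008, 9, 2),
     "ship_date": date(2008, 9, 1), "discount_pct": 0.05},
    {"order_id": 3, "region": "East", "order_date": date(2008, 9, 4),
     "ship_date": date(2008, 9, 5), "discount_pct": 0.45},
]

if __name__ == "__main__":
    audit, events = run_screens(ORDERS, SCREENS, key="order_id")
    for row in audit:
        print(row)
```

Running it prints one audit row per order, with scores of 100, 50 (order 2 fails screens 1 and 2), and 60 (order 3 fails screen 3). The sketch writes QA results to separate `audit_rows` and `error_events` lists rather than altering `ORDERS`, mirroring the separation the paper credits with preserving source-system integrity. Aggregate screens, which need a pass over the whole batch, are left out for brevity.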
