Copyright 2013 by Data Blueprint
Unlock Business Value through Data Quality Engineering
Organizations must realize what it means to utilize data
quality management in support of business strategy. This
webinar focuses on obtaining business value from data
quality initiatives. I will illustrate how organizations with
chronic business challenges often can trace the root of the
problem to poor data quality. Showing how data quality
should be engineered provides a useful framework in which
to develop an effective approach. This in turn allows
organizations to more quickly identify business problems as
well as data problems caused by structural issues versus
practice-oriented defects and prevent these from re-
occurring.
Date: April 8, 2014
Time: 2:00 PM ET/11:00 AM PT
Presenter: Peter Aiken, Ph.D.
Time:
• timeliness
• currency
• frequency
• time period
Form:
• clarity
• detail
• order
• presentation
• media
Content:
• accuracy
• relevance
• completeness
• conciseness
• scope
• performance
Peter Aiken, PhD
• 25+ years of experience in data management
• Multiple international awards & recognition
• Founder, Data Blueprint (datablueprint.com)
• Associate Professor of IS, VCU (vcu.edu)
• President, DAMA International (dama.org)
• 8 books and dozens of articles
• Experienced w/ 500+ data management practices in 20 countries
• Multi-year immersions with organizations as diverse as the US DoD, Nokia, Deutsche Bank, Wells Fargo, and the Commonwealth of Virginia
Outline
1. Data Management Overview
2. DQE Definitions (w/ example)
3. DQE Cycle & Contextual Complications
4. DQ Causes and Dimensions
5. Quality and the Data Life Cycle
6. DQE Tools
7. Takeaways and Q&A
Organizational DM Practices and their Inter-relationships
[Figure: diagram of five practice areas (Data Program Coordination, Organizational Data Integration, Data Stewardship, Data Development, Data Support Operations). Other labels: Organizational Strategies, Goals, Business Data, Business Value, Direction, Guidance, Feedback, Standard Data, Integrated Models, Application Models & Designs, Implementation, Data Asset Use.]
Organizational DM Practices and their Inter-relationships
• Data Program Coordination: Defining, coordinating, resourcing, implementing, and monitoring organizational data program strategies, policies, plans, etc. as a coherent set of activities.
• Organizational Data Integration: Identifying, modeling, coordinating, organizing, distributing, and architecting data shared across business areas or organizational boundaries.
• Data Stewardship: Ensuring that specific individuals are assigned the responsibility for the maintenance of specific data as organizational assets, and that those individuals are provided the requisite knowledge, skills, and abilities to accomplish these goals in conjunction with other data stewards in the organization.
• Data Development: Specifying and designing appropriately architected data assets that are engineered to be capable of supporting organizational needs.
• Data Support Operations: Initiation, operation, tuning, maintenance, backup/recovery, archiving and disposal of data assets in support of organizational activities.
Five Integrated DM Practice Areas
[Figure: the practice-area diagram annotated with the following callouts.]
• Leverage data in organizational activities
• Data management processes and infrastructure
• Combining multiple assets to produce extra value
• Organizational-entity subject area data integration
• Provide reliable data access
• Achieve sharing of data within a business area
Five Integrated DM Practice Areas
• Data Program Coordination: Manage data coherently.
• Organizational Data Integration: Share data across boundaries.
• Data Stewardship: Assign responsibilities for data.
• Data Development: Engineer data delivery systems.
• Data Support Operations: Maintain data availability.
Data Management Practices Hierarchy (after Maslow)
• The 5 Data Management Practice Areas (Data Management Basics) are necessary but insufficient prerequisites to organizational data leveraging applications (that is, Self-Actualizing Data or Advanced Data Practices)
Basic Data Management Practices
– Data Program Management
– Organizational Data Integration
– Data Stewardship
– Data Development
– Data Support Operations
Advanced Data Practices
• Cloud
• MDM
• Mining
• Analytics
• Warehousing
• Big Data
Image source: http://3.bp.blogspot.com/-ptl-9mAieuQ/T-idBt1YFmI/AAAAAAAABgw/Ib-nVkMmMEQ/s1600/maslows_hierarchy_of_needs.png
DAMA DM BoK & CDMP
• Data Management Body of Knowledge (DMBoK)
– Published by DAMA International, the professional association for Data Managers (40 chapters worldwide)
– Organized around primary data management functions focused on data delivery to the organization (dama.org)
– Organized around several environmental elements
• CDMP
– Certified Data Management Professional
– DAMA International and ICCP
– Membership in a distinct group made up of your fellow professionals
– Recognition for your specialized knowledge in a choice of 17 specialty areas
– Series of 3 exams
– For more information, please visit:
• http://www.dama.org/i4a/pages/index.cfm?pageid=3399
• http://iccp.org/certification/designations/cdmp
Copyright 2013 by Data Blueprint
Overview: Data Quality Engineering
A Model Specifying Relationships Among Important Terms
[Built on definition by Dan Appleton 1983]
[Figure labels: Fact, Meaning, Data, Request, Information, Use, Intelligence.]
1. Each FACT combines with one or more MEANINGS.
2. Each specific FACT and MEANING combination is referred to as a DATUM.
3. An INFORMATION is one or more DATA that are returned in response to a specific REQUEST.
4. INFORMATION REUSE is enabled when one FACT is combined with more than one MEANING.
5. INTELLIGENCE is INFORMATION associated with its USES.
Wisdom & knowledge are often used synonymously.
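The five statements in this model can be sketched as a small data model. This is an illustrative sketch only: the class names, the sample fact "42", and the two meanings are assumptions made for the example, not part of the original model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Datum:
    """One FACT combined with one MEANING (statement 2)."""
    fact: str
    meaning: str


@dataclass(frozen=True)
class Information:
    """One or more DATA returned in response to a specific REQUEST (statement 3)."""
    request: str
    data: tuple


@dataclass(frozen=True)
class Intelligence:
    """INFORMATION associated with its USES (statement 5)."""
    information: Information
    uses: tuple


# Hypothetical example: the same fact carries two different meanings.
fact = "42"
as_age = Datum(fact, "customer age in years")
as_stock = Datum(fact, "units in stock")

info = Information(request="How old is customer 17?", data=(as_age,))
intel = Intelligence(info, uses=("eligibility check",))

# Statement 4: reuse is possible when one fact has more than one meaning.
meanings_for_fact = {d.meaning for d in (as_age, as_stock) if d.fact == fact}
reuse_enabled = len(meanings_for_fact) > 1
```

The point of the sketch is that a bare fact ("42") is not usable until a meaning is attached, which is where many data quality problems start.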
Definitions
• Quality Data
– Fit for use: meets the requirements of its authors, users, and administrators (adapted from Martin Eppler)
– Synonymous with information quality, since poor data quality results in inaccurate information and poor business performance
• Data Quality Management
– Planning, implementation and control activities that apply quality management techniques to measure, assess, improve, and ensure data quality
– Entails the "establishment and deployment of roles, responsibilities concerning the acquisition, maintenance, dissemination, and disposition of data" http://www2.sas.com/proceedings/sugi29/098-29.pdf
✓ Critical supporting process in change management
✓ Continuous process for defining acceptable levels of data quality to meet business needs and for ensuring that data quality meets these levels
• Data Quality Engineering
– Recognition that data quality solutions cannot be managed but must be engineered
– Engineering is the application of scientific, economic, social, and practical knowledge in order to design, build, and maintain solutions to data quality challenges
– Engineering concepts are generally not known and understood within IT or business!
Spinach/Popeye story from http://it.toolbox.com/blogs/infosphere/spinach-how-a-data-quality-mistake-created-a-myth-and-a-cartoon-character-10166
Improving Data Quality during System Migration
• Challenge
– Millions of NSN/SKUs maintained in a catalog
– Key and other data stored in clear text/comment fields
– Original suggestion was manual approach to text extraction
– Left the data structuring problem unsolved
• Solution
– Proprietary, improvable text extraction process
– Converted non-tabular data into tabular data
– Saved a minimum of $5 million
– Literally person-centuries of work

Week #   Unmatched Items (% Total)   Ignorable Items (% Total)   Items Matched (% Total)
1        31.47%                      1.34%                       N/A
2        21.22%                      6.97%                       N/A
3        20.66%                      7.49%                       N/A
4        32.48%                      11.99%                      55.53%
…        …                           …                           …
14       9.02%                       22.62%                      68.36%
15       9.06%                       22.62%                      68.33%
16       9.53%                       22.62%                      67.85%
17       9.50%                       22.62%                      67.88%
18       7.46%                       22.62%                      69.92%
Determining Diminishing Returns
Time needed to review all NSNs once over the life of the project:
– NSNs: 2,000,000
– Average time to review & cleanse (in minutes): 5
– Total time (in minutes): 10,000,000
Time available per resource over a one-year period:
– Work weeks in a year: 48
– Work days in a week: 5
– Work hours in a day: 7.5
– Work minutes in a day: 450
– Total work minutes/year: 108,000
Person-years required to cleanse each NSN once prior to migration:
– Minutes needed: 10,000,000
– Minutes available per person/year: 108,000
– Total person-years: 92.6
Resource cost to cleanse NSNs prior to migration:
– Avg salary for SME year (not including overhead): $60,000.00
– Projected years required to cleanse / total DLA person-years saved: 93
– Total cost to cleanse / total DLA savings to cleanse NSNs: $5.5 million
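The arithmetic on this slide can be reproduced in a few lines. The inputs below are the slide's own figures; the variable names are chosen for this sketch. The slide rounds the results to 93 person-years and $5.5 million.

```python
# Inputs taken from the slide.
NSN_COUNT = 2_000_000
MINUTES_PER_NSN = 5
WORK_WEEKS_PER_YEAR = 48
WORK_DAYS_PER_WEEK = 5
WORK_MINUTES_PER_DAY = 450  # 7.5 hours
AVG_SME_SALARY = 60_000     # per year, not including overhead

# Total effort to review every NSN once.
total_minutes = NSN_COUNT * MINUTES_PER_NSN

# Capacity of one person over one year.
minutes_per_person_year = (
    WORK_WEEKS_PER_YEAR * WORK_DAYS_PER_WEEK * WORK_MINUTES_PER_DAY
)

# Effort and cost of the manual approach.
person_years = total_minutes / minutes_per_person_year
total_cost = person_years * AVG_SME_SALARY

print(f"Person-years required: {person_years:.1f}")
print(f"Cost of manual cleansing: ${total_cost:,.0f}")
```

Changing any single input (for example, 3 minutes per NSN instead of 5) shows quickly how sensitive the business case is to the review-time estimate.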
Copyright 2013 by Data Blueprint
20
Quantitative Benefits
Data Quality Misconceptions
1. You can fix the data
2. Data quality is an IT problem
3. The problem is in the data sources or data entry
4. The data warehouse will provide a single version of the truth
5. The new system will provide a single version of the truth
6. Standardization will eliminate the problem of different "truths" represented in the reports or analysis
Source: Business Intelligence solutions, Athena Systems
The Blind Men and the Elephant
• It was six men of Indostan,
To learning much inclined,
Who went to see the Elephant
(Though all of them were blind),
That each by observation
Might satisfy his mind.
• The First approached the Elephant,
And happening to fall
Against his broad and sturdy side,
At once began to bawl:
"God bless me! but the Elephant
Is very like a wall!"
• The Second, feeling of the tusk
Cried, "Ho! what have we here,
So very round and smooth and sharp?
To me 'tis mighty clear
This wonder of an Elephant
Is very like a spear!"
• The Third approached the animal,
And happening to take
The squirming trunk within his hands,
Thus boldly up he spake:
"I see," quoth he, "the Elephant
Is very like a snake!"
• The Fourth reached out an eager hand,
And felt about the knee:
"What most this wondrous beast is like
Is mighty plain," quoth he;
"'Tis clear enough the Elephant
Is very like a tree!"
• The Fifth, who chanced to touch the ear,
Said: "E'en the blindest man
Can tell what this resembles most;
Deny the fact who can,
This marvel of an Elephant
Is very like a fan!"
• The Sixth no sooner had begun
About the beast to grope,
Than, seizing on the swinging tail
That fell within his scope,
"I see," quoth he, "the Elephant
Is very like a rope!"
• And so these men of Indostan
Disputed loud and long,
Each in his own opinion
Exceeding stiff and strong,
Though each was partly in the right,
And all were in the wrong!
(Source: John Godfrey Saxe's (1816-1887) version of the famous Indian legend)
No universal conception of data quality exists; instead, many differing perspectives compete.
• Problem:
– Most organizations approach data quality problems in the same way that the blind men approached the elephant: people tend to see only the data that is in front of them
– Little cooperation across boundaries, just as the blind men were unable to convey their impressions of the elephant to one another and recognize the entire entity
– Leads to confusion, disputes and narrow views
• Solution:
– Data quality engineering can help achieve a more complete picture and facilitate cross-boundary communications
Structured Data Quality Engineering
1. Allow the form of the problem to guide the form of the solution
2. Provide a means of decomposing the problem
3. Feature a variety of tools simplifying system understanding
4. Offer a set of strategies for evolving a design solution
5. Provide criteria for evaluating the quality of the various solutions
6. Facilitate development of a framework for developing organizational knowledge
Infamous Data Quality Example
Mizuho Securities
• Wanted to sell 1 share for 600,000 yen
• Sold 600,000 shares for 1 yen
• $347 million loss
• In-house system did not have limit checking
• Tokyo Stock Exchange system did not have limit checking ...
• … and doesn't allow order cancellations
CLUMSY typing cost a Japanese bank at least £128 million and staff their Christmas bonuses yesterday, after a trader mistakenly sold 600,000 more shares than he should have. The trader at Mizuho Securities, who has not been named, fell foul of what is known in financial circles as "fat finger syndrome" where a dealer types incorrect details into his computer. He wanted to sell one share in a new telecoms company called J Com, for 600,000 yen (about £3,000).
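The missing "limit checking" can be illustrated with a minimal sketch. The limits used here (a maximum order size and a maximum deviation from a reference price) are assumptions for the example; they do not describe how Mizuho's or the exchange's systems actually work.

```python
from dataclasses import dataclass


@dataclass
class Order:
    shares: int
    price_yen: float


def limit_violations(order, reference_price_yen, max_shares,
                     max_price_deviation=0.3):
    """Return the reasons an order should be held for review (empty if none)."""
    problems = []
    # Quantity limit: hypothetical cap on shares per order.
    if order.shares > max_shares:
        problems.append(f"quantity {order.shares} exceeds limit {max_shares}")
    # Price limit: hypothetical cap on deviation from a reference price.
    deviation = abs(order.price_yen - reference_price_yen) / reference_price_yen
    if deviation > max_price_deviation:
        problems.append(f"price deviates {deviation:.0%} from reference")
    return problems


intended = Order(shares=1, price_yen=600_000)
mistyped = Order(shares=600_000, price_yen=1)

ok = limit_violations(intended, reference_price_yen=600_000, max_shares=10_000)
held = limit_violations(mistyped, reference_price_yen=600_000, max_shares=10_000)
```

Either check alone would have flagged the transposed order; the design point is that the check runs before the order leaves the system, not after.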
Four ways to make your data sparkle!
1. Prioritize the task
– Cleaning data is costly and time consuming
– Identify mission critical/non-mission critical data
2. Involve the data owners
– Seek input of business units on what constitutes "dirty" data
3. Keep future data clean
– Incorporate processes and technologies that check every zip code and area code
4. Align your staff with business
– Align IT staff with business units
(Source: CIO JULY 1 2004)
The DQE Cycle
• Deming cycle
• "Plan-do-study-act" or "plan-do-check-act"
1. Identifying data issues that are critical to the achievement of business objectives
2. Defining business requirements for data quality
3. Identifying key data quality dimensions
4. Defining business rules critical to ensuring high quality data
The DQE Cycle: (1) Plan
• Plan for the assessment of the current state and identification of key metrics for measuring quality
• The data quality engineering team assesses the scope of known issues
– Determining cost and impact
– Evaluating alternatives for addressing them
The DQE Cycle: (2) Deploy
• Deploy processes for measuring and improving the quality of data:
• Data profiling
– Institute inspections and monitors to identify data issues when they occur
– Fix flawed processes that are the root cause of data errors or correct errors downstream
– When it is not possible to correct errors at their source, correct them at their earliest point in the data flow
The DQE Cycle: (3) Monitor
• Monitor the quality of data as measured against the defined business rules
• If data quality meets defined thresholds for acceptability, the processes are in control and the level of data quality meets the business requirements
• If data quality falls below acceptability thresholds, notify data stewards so they can take action during the next stage
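The monitoring step can be sketched as a comparison of measured conformance against an acceptability threshold. The daily rates and the 0.95 threshold below are invented for illustration.

```python
def out_of_control(rates_by_period, threshold):
    """Return the periods whose measured quality falls below the threshold."""
    return [period for period, rate in rates_by_period.items()
            if rate < threshold]


# Hypothetical share of records conforming to the business rules, per day.
daily_rates = {"Mon": 0.99, "Tue": 0.97, "Wed": 0.91, "Thu": 0.98}

# Periods for which data stewards should be notified.
to_notify = out_of_control(daily_rates, threshold=0.95)
```

An empty result means the process is in control; anything else is handed to the Act stage.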
The DQE Cycle: (4) Act
• Act to resolve any identified issues to improve data quality and better meet business expectations
• New cycles begin as new data sets come under investigation or as new data quality requirements are identified for existing data sets
DQE Context & Engineering Concepts
• Can rules be implemented stating that no data can be corrected unless the source of the error has been discovered and addressed?
• All data must be 100% perfect?
• Pareto
– 80/20 rule
– Not all data is of equal importance
• Scientific, economic, social, and practical knowledge
Copyright 2013 by Data Blueprint
Data quality is now acknowledged as a major source
of organizational risk by certified risk professionals!
34
Two Distinct Activities Support Quality Data
• Data quality best practices depend on both
– Practice-oriented activities, which focus on the capture and manipulation of data
– Structure-oriented activities, which focus on the data implementation
Practice-Oriented Activities
• Practice-oriented defects stem from a failure to apply rigor when capturing/manipulating data. Examples of such rigor:
– Edit masking
– Range checking of input data
– CRC-checking of transmitted data
• Affect the Data Value Quality and Data Representation Quality
• Examples of improper practice-oriented activities:
– Allowing imprecise or incorrect data to be collected when requirements specify otherwise
– Presenting data out of sequence
• Typically diagnosed in bottom-up manner: find and fix the resulting problem
• Addressed by imposing more rigorous data-handling governance
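The three controls named on this slide (edit masking, range checking, CRC checking) can each be sketched in a few lines. The phone-number mask, the 0 to 24 hour range, and the sample payloads are assumptions chosen for the example; `zlib.crc32` is Python's standard CRC-32 function.

```python
import re
import zlib

# Edit mask: hypothetical input format 999-999-9999.
PHONE_MASK = re.compile(r"\d{3}-\d{3}-\d{4}")


def matches_mask(value):
    """Edit masking: accept only input that fits the expected pattern."""
    return PHONE_MASK.fullmatch(value) is not None


def in_range(value, low, high):
    """Range checking: accept only values inside the allowed interval."""
    return low <= value <= high


def with_crc(payload):
    """Sender side: attach a CRC-32 checksum to the transmitted bytes."""
    return payload, zlib.crc32(payload)


def crc_ok(payload, checksum):
    """Receiver side: recompute the checksum and compare."""
    return zlib.crc32(payload) == checksum


sent, checksum = with_crc(b"qty=1")
intact = crc_ok(b"qty=1", checksum)
corrupted = crc_ok(b"qty=7", checksum)  # one byte changed in transit
```

All three are cheap to apply at the point of capture or transfer, which is why their absence is treated as a practice problem and not a structural one.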
Structure-Oriented Activities
• Structure-oriented defects occur because data and metadata have been arranged imperfectly. For example:
– When the data is in the system but we just can't access it;
– When a correct data value is provided as the wrong response to a query; or
– When data is not provided because it is unavailable or inaccessible to the customer
• Developers focus within system boundaries instead of within organization boundaries
• Affect the Data Model Quality and Data Architecture Quality
• Examples of improper structure-oriented activities:
– Providing a correct response but incomplete data to a query because the user did not comprehend the system data structure
– Costly maintenance of inconsistent data used by redundant systems
• Typically diagnosed in top-down manner: root cause fixes
• Addressed through fundamental data structure governance
Copyright 2013 by Data Blueprint
Quality Dimensions
39
A congratulations letter from another bank
Problems
• Bank did not know it made an error
• Tools alone could not have prevented this error
• Lost confidence in the ability of the bank to manage customer funds
4 Dimensions of Data Quality
An organization's overall data quality is a function of four distinct components, each with its own attributes:
• Data Value (practice-oriented): the quality of data as stored & maintained in the system
• Data Representation (practice-oriented): the quality of representation for stored values; perfect data values stored in a system that are inappropriately represented can be harmful
• Data Model (structure-oriented): the quality of data logically representing user requirements related to data entities, associated attributes, and their relationships; essential for effective communication among data suppliers and consumers
• Data Architecture (structure-oriented): the coordination of data management activities in cross-functional system development and operations
Effective Data Quality Engineering
The four dimensions, ordered from closer to the user to closer to the architect:
• Data Representation Quality: as presented to the user
• Data Value Quality: as maintained in the system
• Data Model Quality: as understood by developers
• Data Architecture Quality: as an organizational asset
• Data quality engineering has been focused on operational problem correction
– Directing attention to practice-oriented data imperfections
• Data quality engineering is more effective when also focused on structure-oriented causes
– Ensuring the quality of shared data across system boundaries
Copyright 2013 by Data Blueprint
Full Set of Data Quality Attributes
43
Copyright 2013 by Data Blueprint
Difficult to obtain leverage at the bottom of the falls
44
Copyright 2013 by Data Blueprint
Frozen Falls
45
New York Turns to Big Data to Solve Big Tree Problem
• NYC
– 2,500,000 trees
• 11 months from 2009 to 2010
– 4 people were killed or seriously injured by falling tree limbs in Central Park alone
• Belief
– Arborists believe that pruning and otherwise maintaining trees can keep them healthier and make them more likely to withstand a storm, decreasing the likelihood of property damage, injuries and deaths
• Until recently
– No research or data to back it up
http://www.computerworld.com/s/article/9239793/New_York_Turns_to_Big_Data_to_Solve_Big_Tree_Problem?source=CTWNLE_nlt_datamgmt_2013-06-05
NYC's Big Tree Problem
• Question
– Does pruning trees in one year reduce the number of hazardous tree conditions in the following year?
• Lots of data but granularity challenges
– Pruning data recorded block by block
– Cleanup data recorded at the address level
– Trees have no unique identifiers
• After downloading, cleaning, merging, analyzing and intensive modeling
– Pruning trees for certain types of hazards caused a 22 percent reduction in the number of times the department had to send a crew for emergency cleanups
• The best data analysis
– Generates further questions
• NYC cannot prune each block every year
– Building block risk profiles: number of trees, types of trees, whether the block is in a flood zone or storm zone
http://www.computerworld.com/s/article/9239793/New_York_Turns_to_Big_Data_to_Solve_Big_Tree_Problem?source=CTWNLE_nlt_datamgmt_2013-06-05
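The granularity challenge (pruning recorded per block, cleanups recorded per address) can be sketched as rolling the finer-grained records up to the coarser level before comparing. Everything below is invented for illustration: the block ids, the addresses, and the assumption that each address has already been mapped to a block. The numbers do not reproduce the study's 22 percent result.

```python
from collections import Counter

# Hypothetical block-level pruning data for year 1.
pruned_blocks = {"B1", "B3"}
all_blocks = ["B1", "B2", "B3", "B4"]

# Hypothetical address-level cleanup data for year 2, already mapped to blocks.
cleanups = [
    {"address": "12 Elm St", "block": "B1"},
    {"address": "40 Elm St", "block": "B2"},
    {"address": "44 Elm St", "block": "B2"},
    {"address": "9 Oak Ave", "block": "B4"},
]

# Roll address-level records up to block level so both data sets align.
cleanups_per_block = Counter(c["block"] for c in cleanups)


def mean_cleanups(blocks):
    """Average number of cleanups per block (blocks with none count as 0)."""
    blocks = list(blocks)
    return sum(cleanups_per_block[b] for b in blocks) / len(blocks)


pruned_mean = mean_cleanups(b for b in all_blocks if b in pruned_blocks)
unpruned_mean = mean_cleanups(b for b in all_blocks if b not in pruned_blocks)
```

Because trees have no unique identifiers, the block is the finest level at which the two data sets can be joined at all; that constraint shapes the question the analysis can answer.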
Letter from the Bank
… so please continue to open your mail from either Chase or Bank One
P.S. Please be on the lookout for any upcoming communications from either Chase or Bank One regarding your Bank One credit card and any other Bank One product you may have.
Problems
• I initially discarded the letter!
• I became upset after reading it
• It proclaimed that Chase has data quality challenges
Traditional Quality Life Cycle
[Figure: data acquisition activities, data storage, data usage activities.]
Data Life Cycle Model: Products
[Figure: phases Metadata Creation, Metadata Structuring, Metadata Refinement, Data Creation, Data Storage, Data Manipulation, Data Utilization, Data Assessment, Data Refinement. Products shown between phases: data architecture & models, populated data models and storage locations, data values, data, restored data, value defects, structure defects, architecture refinements, model refinements.]
Data Life Cycle Model: Quality Focus
[Figure: the same phases labeled with their quality focus: architecture quality, data architecture & model quality, model quality, value quality, representation quality.]
Extended data life cycle model with metadata sources and uses
• Metadata Creation
– Define Data Architecture
– Define Data Model Structures
• Metadata Structuring
– Implement Data Model Views
– Populate Data Model Views
• Metadata Refinement
– Correct Structural Defects
– Update Implementation
• Data Creation
– Create Data
– Verify Data Values
• Data Manipulation
– Manipulate Data
– Update Data
• Data Utilization
– Inspect Data
– Present Data
• Data Assessment
– Assess Data Values
– Assess Metadata
• Data Refinement
– Correct Data Value Defects
– Re-store Data Values
[Figure labels: starting point for new system development; starting point for existing systems; Metadata & Data Storage; data performance metadata; data architecture; data architecture and data models; shared data; updated data; corrected data; architecture refinements; facts & meanings.]
Profile, Analyze and Assess DQ
• Data assessment using 2 different approaches:
– Bottom-up
– Top-down
• Bottom-up assessment:
– Inspection and evaluation of the data sets
– Highlight potential issues based on the results of automated processes
• Top-down assessment:
– Engage business users to document their business processes and the corresponding critical data dependencies
– Understand how their processes consume data and which data elements are critical to the success of the business applications
Define DQ Measures
• Measures development occurs as part of the strategy/design/plan step
• Process for defining data quality measures:
1. Select one of the identified critical business impacts
2. Evaluate the dependent data elements and the create and update processes associated with that business impact
3. List any associated data requirements
4. Specify the associated dimension of data quality and one or more business rules to use to determine conformance of the data to expectations
5. Describe the process for measuring conformance
6. Specify an acceptability threshold
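The six steps above can be captured as one record per measure. This is a minimal sketch: the `DataQualityMeasure` class, the postal-code rule, the business impact text, and the 0.98 threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DataQualityMeasure:
    business_impact: str           # step 1
    data_element: str              # steps 2 and 3
    dimension: str                 # step 4
    rule: Callable[[dict], bool]   # step 4: business rule as a predicate
    threshold: float               # step 6

    def measure(self, records):
        """Step 5: share of records that conform to the business rule."""
        if not records:
            return 1.0
        return sum(1 for r in records if self.rule(r)) / len(records)

    def acceptable(self, records):
        return self.measure(records) >= self.threshold


postal_completeness = DataQualityMeasure(
    business_impact="returned mail increases shipping cost",
    data_element="postal_code",
    dimension="completeness",
    rule=lambda r: bool(r.get("postal_code", "").strip()),
    threshold=0.98,
)

sample = [
    {"postal_code": "23219"},
    {"postal_code": " "},
    {},
    {"postal_code": "10001"},
]
score = postal_completeness.measure(sample)
```

Keeping the business impact next to the rule makes it harder to define measures that nobody in the business has asked for.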
Set and Evaluate DQ Service Levels
• Data quality inspection and monitoring are used to measure and monitor compliance with defined data quality rules
• Data quality SLAs specify the organization's expectations for response and remediation
• Operational data quality control defined in data quality SLAs includes:
– Data elements covered by the agreement
– Business impacts associated with data flaws
– Data quality dimensions associated with each data element
– Quality expectations for each data element of the identified dimensions in each application or system in the value chain
– Methods for measuring against those expectations
– (…)
Measure, Monitor & Manage DQ
• DQM procedures depend on available data quality measuring and monitoring services
• 2 contexts for control/measurement of conformance to data quality business rules exist:
– In-stream: collect in-stream measurements while creating data
– In batch: perform batch activities on collections of data instances assembled in a data set
• Apply measurements at 3 levels of granularity:
– Data element value
– Data instance or record
– Data set
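The two contexts and three levels of granularity can be sketched together. The rules used here (a positive integer quantity and a positive unit price) are assumptions for the example.

```python
def element_ok(value):
    """Element level: hypothetical rule that a quantity is a positive integer."""
    return isinstance(value, int) and value > 0


def record_ok(record):
    """Record level: the element rule holds and the unit price is positive."""
    return element_ok(record.get("quantity")) and record.get("unit_price", 0) > 0


def data_set_conformance(records):
    """Data set level (batch): share of conforming records in an assembled set."""
    return sum(map(record_ok, records)) / len(records)


def in_stream(records):
    """In-stream: check each record as it is created and yield the rejects."""
    for record in records:
        if not record_ok(record):
            yield record


batch = [
    {"quantity": 1, "unit_price": 600_000},
    {"quantity": 0, "unit_price": 5},
    {"quantity": 3, "unit_price": 0},
    {"quantity": 2, "unit_price": 10},
]
rejects = list(in_stream(batch))
conformance = data_set_conformance(batch)
```

The same rule functions serve both contexts; only the moment of application differs.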
Overview: Data Quality Tools
4 categories of activities:
1) Analysis
2) Cleansing
3) Enhancement
4) Monitoring
Principal tools:
1) Data Profiling
2) Parsing and Standardization
3) Data Transformation
4) Identity Resolution and Matching
5) Enhancement
6) Reporting
DQ Tool #1: Data Profiling
• Data profiling is the assessment of value distribution and clustering of values into domains
• Need to be able to distinguish between good and bad data before making any improvements
• Data profiling is a set of algorithms for 2 purposes:
– Statistical analysis and assessment of the data quality values within a data set
– Exploring relationships that exist between value collections within and across data sets
• At its most advanced, data profiling takes a series of prescribed rules from data quality engines. It then assesses the data, annotates and tracks violations to determine if they comprise new or inferred data quality rules
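A very small profiler illustrates the first purpose, statistical assessment of the values in a column. This is a sketch using only the standard library, not a description of any commercial profiling tool; the sample zip codes are invented.

```python
import re
from collections import Counter


def pattern(value):
    """Reduce a value to its format: digits become 9, letters become A."""
    return re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", str(value)))


def profile(values):
    """Summarize a column: size, nulls, distinct values, and format clusters."""
    non_null = [v for v in values if v not in (None, "")]
    return {
        "count": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
        "patterns": Counter(pattern(v) for v in non_null),
    }


zip_codes = ["23219", "23220", "2321", None, "23219-1234", "VA232", ""]
report = profile(zip_codes)
```

Rare patterns in the report (here `9999` and `AA999`) are candidates for review; whether they are errors is a business decision the profiler cannot make.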
DQ Tool #1: Data Profiling, cont'd
• Data profiling vs. data quality: business context and semantic/logical layers
– Data quality is concerned with proscriptive rules
– Data profiling looks for patterns when rules are adhered to and when rules are violated; able to provide input into the business context layer
• It is incumbent on data profiling services to notify all concerned parties of whatever is discovered
• Profiling can be used to…
– …notify the help desk that valid changes in the data are about to cause an avalanche of "skeptical user" calls
– …notify business analysts of precisely where they should be working today in terms of shifts in the data
Copyright 2013 by Data Blueprint
Courtesy GlobalID.com
DQ Tool #2: Parsing & Standardization
• Data parsing tools enable the definition of patterns that feed into a rules engine used to distinguish between valid and invalid data values
• Actions are triggered upon matching a specific pattern
• When an invalid pattern is recognized, the application may attempt to transform the invalid value into one that meets expectations
• Data standardization is the process of conforming to a set of business rules and formats that are set up by data stewards and administrators
• Data standardization example:
– Bringing all the different formats of “street” into a single format, e.g. “STR”, “ST.”, “STRT”, “STREET”, etc.
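The "street" example above could look roughly like the following. This is a sketch under stated assumptions: the variant set, the target form "Street", and the function name `standardize_street` are hypothetical stand-ins for rules a data steward would actually maintain, and only the last token of the address is examined.

```python
# Hypothetical steward-maintained rule: known variants map to one target form.
STREET_VARIANTS = {"STR", "ST", "STRT", "STREET"}
TARGET_FORM = "Street"


def standardize_street(address):
    """Replace a trailing street-type token with the standard form, if recognized."""
    tokens = address.split()
    if not tokens:
        return address
    # Compare case-insensitively and ignore a trailing period ("St." -> "ST")
    last = tokens[-1].rstrip(".").upper()
    if last in STREET_VARIANTS:
        tokens[-1] = TARGET_FORM
    return " ".join(tokens)


samples = ["12 Main STRT", "400 Oak St.", "7 Elm street", "5 Pine Avenue"]
standardized = [standardize_street(s) for s in samples]
```

Restricting the rule to the final token is a deliberate simplification: "St." at the start of a name can also mean "Saint", which is the kind of ambiguity a real rules engine has to handle with more context.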
DQ Tool #3: Data Transformation
• Upon identification of data errors, trigger data rules to transform the flawed data
• Perform standardization and guide rule-based transformations by mapping data values in their original formats and patterns into a target representation
• Parsed components of a pattern are subjected to rearrangement, corrections, or any changes as directed by the rules in the knowledge base
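As an illustration of mapping several source patterns into one target representation, the sketch below parses phone numbers into components and re-emits them in a single format. The three source patterns, the target format, and the name `to_target_format` are assumptions chosen for the example; a real knowledge base would hold many more rules.

```python
import re

# Assumed source patterns for 10-digit phone numbers (illustrative only).
PHONE_PATTERNS = [
    re.compile(r"^\((\d{3})\)\s*(\d{3})-(\d{4})$"),        # (804) 555-0100
    re.compile(r"^(\d{3})[.\-\s](\d{3})[.\-\s](\d{4})$"),  # 804.555.0100
    re.compile(r"^(\d{3})(\d{3})(\d{4})$"),                # 8045550100
]


def to_target_format(raw):
    """Parse raw into (area, exchange, line) and re-emit as AAA-EEE-LLLL.

    Returns None when no known pattern matches, so the value can be
    flagged for review instead of being silently changed.
    """
    value = raw.strip()
    for pattern in PHONE_PATTERNS:
        match = pattern.match(value)
        if match:
            return "-".join(match.groups())
    return None


converted = [to_target_format(v) for v in
             ["(804) 555-0100", "804.555.0100", "8045550100", "555-0100"]]
```

Returning None for unrecognized input keeps the transformation conservative: only values that match a known pattern are rewritten.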
DQ Tool #4: Identity Resolution & Matching
• Data matching enables analysts to identify relationships between records for de-duplication or group-based processing
• Matching is central to maintaining data consistency and integrity throughout the enterprise
• The matching process should be used in the initial migration of data into a single repository
• 2 basic approaches to matching:
• Deterministic
– Relies on defined patterns/rules for assigning weights and scores to determine similarity
– Predictable
– Dependent on what the rule developers anticipated
• Probabilistic
– Relies on statistical techniques for assessing the probability that any pair of records represents the same entity
– Not reliant on rules
– Probabilities can be refined based on experience, so matchers can improve precision as more data is analyzed
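The contrast between the two approaches can be sketched as follows. Everything here is illustrative: the field names, the sample records, and especially the per-field probabilities are assumptions (in practice they would be estimated from data). The deterministic function is simplified to a yes/no rule, and the probabilistic score sums log-likelihood ratios in the style of Fellegi-Sunter record linkage.

```python
import math


def normalize(text):
    """Lowercase and keep only letters and digits so formatting differences vanish."""
    return "".join(ch for ch in text.lower() if ch.isalnum())


def deterministic_match(a, b):
    """Rule-based: match on same email, or on same name and same ZIP."""
    if a["email"] and normalize(a["email"]) == normalize(b["email"]):
        return True
    return normalize(a["name"]) == normalize(b["name"]) and a["zip"] == b["zip"]


# Assumed probabilities per field:
#   m = P(field agrees | same entity), u = P(field agrees | different entities)
FIELD_PROBS = {"name": (0.95, 0.01), "zip": (0.90, 0.05), "email": (0.85, 0.001)}


def probabilistic_score(a, b):
    """Sum of log2 likelihood ratios; higher means more likely the same entity."""
    score = 0.0
    for field, (m, u) in FIELD_PROBS.items():
        agrees = a[field] != "" and normalize(a[field]) == normalize(b[field])
        score += math.log2(m / u) if agrees else math.log2((1 - m) / (1 - u))
    return score


r1 = {"name": "Jane Doe", "zip": "23060", "email": "jane@example.com"}
r2 = {"name": "JANE DOE", "zip": "23060", "email": ""}
r3 = {"name": "John Roe", "zip": "90210", "email": "john@example.com"}
```

With these sample records, r1 and r2 satisfy the name-and-ZIP rule and also score well above zero, while r1 and r3 fail the rules and score below zero. Refining the m and u values as more pairs are reviewed is one way the "probabilities can be refined based on experience" point plays out.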
DQ Tool #5: Enhancement
• Definition:
– A method for adding value to information by accumulating additional information about a base set of entities and then merging all the sets of information to provide a focused view. Improves master data.
• Benefits:
– Enables use of third party data sources
– Allows you to take advantage of the information and research carried out by external data vendors to make data more meaningful and useful
• Examples of data enhancements:
– Time/date stamps
– Auditing information
– Contextual information
– Geographic information
– Demographic information
– Psychographic information
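A minimal sketch of the definition above: additional attribute sets are merged onto a base set of entities, and a time stamp is added as audit information. The function name `enhance`, the key `enhanced_at`, the rule that base values win on conflict, and the sample data are all assumptions for illustration.

```python
from datetime import datetime, timezone


def enhance(base, *sources, now=None):
    """Merge extra attributes (keyed by entity id) into each base record.

    Base values take precedence on conflict, and every record receives
    an audit time stamp. Entities absent from the base set are ignored.
    """
    stamp = (now or datetime.now(timezone.utc)).isoformat()
    enhanced = {}
    for key, record in base.items():
        merged = {}
        for source in sources:
            merged.update(source.get(key, {}))
        merged.update(record)  # base attributes override external ones
        merged["enhanced_at"] = stamp
        enhanced[key] = merged
    return enhanced


# Hypothetical base set plus geographic and demographic sources.
base = {"c1": {"name": "Jane Doe"}}
geographic = {"c1": {"region": "Virginia"}}
demographic = {"c1": {"age_band": "35-44"}, "c2": {"age_band": "18-24"}}

result = enhance(base, geographic, demographic,
                 now=datetime(2014, 4, 8, tzinfo=timezone.utc))
```

Letting base values win is one possible policy; an organization that trusts a vendor source more than its own records would reverse the merge order.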
DQ Tool #6: Reporting
• Good reporting supports:
– Inspection and monitoring of conformance to data quality expectations
– Monitoring performance of data stewards conforming to data quality SLAs
– Workflow processing for data quality incidents
– Manual oversight of data cleansing and correction
• Data quality tools provide dynamic reporting and monitoring capabilities
• These enable analysts and data stewards to support and drive the methodology for ongoing DQM and improvement with a single, easy-to-use solution
• Associate report results with:
– Data quality measurement
– Metrics
– Activity
Outline
1. Data Management Overview
2. DQE Definitions (w/ example)
3. DQE Cycle & Contextual Complications
4. DQ Causes and Dimensions
5. Quality and the Data Life Cycle
6. DQE Tools
7. Takeaways and Q&A
Tweeting now: #dataed
Overview: DQE Concepts and Activities
• Develop and promote data quality awareness
• Define data quality requirements
• Profile, analyze and assess data quality
• Define data quality metrics
• Define data quality business rules
• Test and validate data quality requirements
• Set and evaluate data quality service levels
• Measure and monitor data quality
• Manage data quality issues
• Clean and correct data quality defects
• Design and implement operational DQM procedures
• Monitor operational DQM procedures and performance
Concepts and Activities
• Data quality expectations provide the inputs necessary to define the data quality framework:
– Requirements
– Inspection policies
– Measures and monitors that reflect changes in data quality and performance
• The data quality framework requirements reflect 3 aspects of business data expectations:
1. A manner to record the expectation in business rules
2. A way to measure the quality of data within that dimension
3. An acceptability threshold
From The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
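The three aspects listed above (a recorded rule, a measurement, and an acceptability threshold) map naturally onto a small data structure. This is a sketch under assumptions, not the DAMA framework itself: the class name `Expectation`, the conformance-ratio measure, the 95% threshold, and the ZIP rule are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Expectation:
    name: str
    rule: Callable[[dict], bool]  # 1. the expectation recorded as a business rule
    threshold: float              # 3. minimum acceptable share of conforming records

    def measure(self, records):
        """2. Measure quality as the fraction of records that satisfy the rule."""
        if not records:
            return 1.0  # assumption: an empty set has no violations
        return sum(1 for r in records if self.rule(r)) / len(records)

    def acceptable(self, records):
        return self.measure(records) >= self.threshold


def is_five_digit_zip(record):
    zip_code = str(record.get("zip", ""))
    return len(zip_code) == 5 and zip_code.isdigit()


zip_expectation = Expectation("ZIP is 5 digits", is_five_digit_zip, threshold=0.95)
records = [{"zip": "23060"}, {"zip": "2306"}, {"zip": "23219"}, {"zip": "90210"}]
score = zip_expectation.measure(records)
```

Here three of four records conform, so the measured level is 0.75 and falls below the assumed 0.95 threshold; that shortfall is the kind of result a data quality report would surface to stewards.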
Summary: Data Quality Engineering
1/26/2010 © Copyright this and previous years by Data Blueprint - all rights reserved!
10124 W. Broad Street, Suite C
Glen Allen, Virginia 23060
804.521.4056
Questions?
It’s your turn!
Use the chat feature or Twitter (#dataed) to submit
your questions to Peter now.
Upcoming Events
Developing a Data-centric Strategy & Roadmap
Enterprise Data World
April 28, 2014 @ 8:30 AM CT
Data Architecture Requirements
May 13, 2014 @ 2:00 PM ET/11:00 AM PT
Monetizing Data Management
June 10, 2014 @ 2:00 PM ET/11:00 AM PT
Sign up here:
www.datablueprint.com/webinar-schedule
or www.dataversity.net
References & Recommended Reading
• The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
• http://www2.sas.com/proceedings/sugi29/098-29.pdf
Data Quality Dimensions
Data Value Quality
Data Representation Quality
Data Model Quality
Data Architecture Quality
Guiding Principles
• Manage data as a core organizational asset
• Identify a gold record for all data elements
• All data elements will have a standardized data definition, data type, and acceptable value domain
• Leverage data governance for the control and performance of DQM
• Use industry and international data standards whenever possible
• Downstream data consumers specify data quality expectations
• Define business rules to assert conformance to data quality expectations
• Validate data instances and data sets against defined business rules
• Business process owners will agree to and abide by data quality SLAs
• Apply data corrections at the original source if possible
• If it is not possible to correct data at the source, forward data corrections to the owner of the original source. Influence on data brokers to conform to local requirements may be limited
• Report measured levels of data quality to appropriate data stewards, business process owners, and SLA managers
Goals and Principles
• To measurably improve the quality of data in relation to defined business expectations
• To define requirements and specifications for integrating data quality control into the system development life cycle
• To provide defined processes for measuring, monitoring, and reporting conformance to acceptable levels of data quality
Primary Deliverables
• Improved Quality Data
• Data Management Operational Analysis
• Data profiles
• Data Quality Certification Reports
• Data Quality Service Level Agreements
Roles and Responsibilities
Suppliers:
• External Sources
• Regulatory Bodies
• Business Subject Matter Experts
• Information Consumers
• Data Producers
• Data Architects
• Data Modelers
• Data Stewards
Participants:
• Data Quality Analysts
• Data Analysts
• Database Administrators
• Data Stewards
• Other Data Professionals
• DRM Director
• Data Stewardship Council
Consumers:
• Data Stewards
• Data Professionals
• Other IT Professionals
• Knowledge Workers
• Managers and Executives
• Customers

More Related Content

What's hot

Data Architecture for Data Governance
Data Architecture for Data GovernanceData Architecture for Data Governance
Data Architecture for Data GovernanceDATAVERSITY
Ā 
LDM Slides: Conceptual Data Models - How to Get the Attention of Business Use...
LDM Slides: Conceptual Data Models - How to Get the Attention of Business Use...LDM Slides: Conceptual Data Models - How to Get the Attention of Business Use...
LDM Slides: Conceptual Data Models - How to Get the Attention of Business Use...DATAVERSITY
Ā 
The Business Value of Metadata for Data Governance
The Business Value of Metadata for Data GovernanceThe Business Value of Metadata for Data Governance
The Business Value of Metadata for Data GovernanceRoland Bullivant
Ā 
Building a Data Governance Strategy
Building a Data Governance StrategyBuilding a Data Governance Strategy
Building a Data Governance StrategyAnalytics8
Ā 
DAS Slides: Data Governance - Combining Data Management with Organizational ...
DAS Slides: Data Governance -  Combining Data Management with Organizational ...DAS Slides: Data Governance -  Combining Data Management with Organizational ...
DAS Slides: Data Governance - Combining Data Management with Organizational ...DATAVERSITY
Ā 
Data Governance Takes a Village (So Why is Everyone Hiding?)
Data Governance Takes a Village (So Why is Everyone Hiding?)Data Governance Takes a Village (So Why is Everyone Hiding?)
Data Governance Takes a Village (So Why is Everyone Hiding?)DATAVERSITY
Ā 
The Importance of Metadata
The Importance of MetadataThe Importance of Metadata
The Importance of MetadataDATAVERSITY
Ā 
Data Governance Powerpoint Presentation Slides
Data Governance Powerpoint Presentation SlidesData Governance Powerpoint Presentation Slides
Data Governance Powerpoint Presentation SlidesSlideTeam
Ā 
Activate Data Governance Using the Data Catalog
Activate Data Governance Using the Data CatalogActivate Data Governance Using the Data Catalog
Activate Data Governance Using the Data CatalogDATAVERSITY
Ā 
How to Strengthen Enterprise Data Governance with Data Quality
How to Strengthen Enterprise Data Governance with Data QualityHow to Strengthen Enterprise Data Governance with Data Quality
How to Strengthen Enterprise Data Governance with Data QualityDATAVERSITY
Ā 
Data-Ed Webinar: Data Quality Success Stories
Data-Ed Webinar: Data Quality Success StoriesData-Ed Webinar: Data Quality Success Stories
Data-Ed Webinar: Data Quality Success StoriesDATAVERSITY
Ā 
Who Should Own Data Governance ā€“ IT or Business?
Who Should Own Data Governance ā€“ IT or Business?Who Should Own Data Governance ā€“ IT or Business?
Who Should Own Data Governance ā€“ IT or Business?DATAVERSITY
Ā 
Improving Data Literacy Around Data Architecture
Improving Data Literacy Around Data ArchitectureImproving Data Literacy Around Data Architecture
Improving Data Literacy Around Data ArchitectureDATAVERSITY
Ā 
Data Quality Best Practices
Data Quality Best PracticesData Quality Best Practices
Data Quality Best PracticesDATAVERSITY
Ā 
Building a Data Strategy ā€“ Practical Steps for Aligning with Business Goals
Building a Data Strategy ā€“ Practical Steps for Aligning with Business GoalsBuilding a Data Strategy ā€“ Practical Steps for Aligning with Business Goals
Building a Data Strategy ā€“ Practical Steps for Aligning with Business GoalsDATAVERSITY
Ā 
Enterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data ArchitectureEnterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data ArchitectureDATAVERSITY
Ā 
The Role of Data Governance in a Data Strategy
The Role of Data Governance in a Data StrategyThe Role of Data Governance in a Data Strategy
The Role of Data Governance in a Data StrategyDATAVERSITY
Ā 
Data Governance
Data GovernanceData Governance
Data GovernanceRob Lux
Ā 
Glossaries, Dictionaries, and Catalogs Result in Data Governance
Glossaries, Dictionaries, and Catalogs Result in Data GovernanceGlossaries, Dictionaries, and Catalogs Result in Data Governance
Glossaries, Dictionaries, and Catalogs Result in Data GovernanceDATAVERSITY
Ā 

What's hot (20)

Data Architecture for Data Governance
Data Architecture for Data GovernanceData Architecture for Data Governance
Data Architecture for Data Governance
Ā 
LDM Slides: Conceptual Data Models - How to Get the Attention of Business Use...
LDM Slides: Conceptual Data Models - How to Get the Attention of Business Use...LDM Slides: Conceptual Data Models - How to Get the Attention of Business Use...
LDM Slides: Conceptual Data Models - How to Get the Attention of Business Use...
Ā 
The Business Value of Metadata for Data Governance
The Business Value of Metadata for Data GovernanceThe Business Value of Metadata for Data Governance
The Business Value of Metadata for Data Governance
Ā 
Building a Data Governance Strategy
Building a Data Governance StrategyBuilding a Data Governance Strategy
Building a Data Governance Strategy
Ā 
DAS Slides: Data Governance - Combining Data Management with Organizational ...
DAS Slides: Data Governance -  Combining Data Management with Organizational ...DAS Slides: Data Governance -  Combining Data Management with Organizational ...
DAS Slides: Data Governance - Combining Data Management with Organizational ...
Ā 
Data Governance Takes a Village (So Why is Everyone Hiding?)
Data Governance Takes a Village (So Why is Everyone Hiding?)Data Governance Takes a Village (So Why is Everyone Hiding?)
Data Governance Takes a Village (So Why is Everyone Hiding?)
Ā 
DMBOK and Data Governance
DMBOK and Data GovernanceDMBOK and Data Governance
DMBOK and Data Governance
Ā 
The Importance of Metadata
The Importance of MetadataThe Importance of Metadata
The Importance of Metadata
Ā 
Data Governance Powerpoint Presentation Slides
Data Governance Powerpoint Presentation SlidesData Governance Powerpoint Presentation Slides
Data Governance Powerpoint Presentation Slides
Ā 
Activate Data Governance Using the Data Catalog
Activate Data Governance Using the Data CatalogActivate Data Governance Using the Data Catalog
Activate Data Governance Using the Data Catalog
Ā 
How to Strengthen Enterprise Data Governance with Data Quality
How to Strengthen Enterprise Data Governance with Data QualityHow to Strengthen Enterprise Data Governance with Data Quality
How to Strengthen Enterprise Data Governance with Data Quality
Ā 
Data-Ed Webinar: Data Quality Success Stories
Data-Ed Webinar: Data Quality Success StoriesData-Ed Webinar: Data Quality Success Stories
Data-Ed Webinar: Data Quality Success Stories
Ā 
Who Should Own Data Governance ā€“ IT or Business?
Who Should Own Data Governance ā€“ IT or Business?Who Should Own Data Governance ā€“ IT or Business?
Who Should Own Data Governance ā€“ IT or Business?
Ā 
Improving Data Literacy Around Data Architecture
Improving Data Literacy Around Data ArchitectureImproving Data Literacy Around Data Architecture
Improving Data Literacy Around Data Architecture
Ā 
Data Quality Best Practices
Data Quality Best PracticesData Quality Best Practices
Data Quality Best Practices
Ā 
Building a Data Strategy ā€“ Practical Steps for Aligning with Business Goals
Building a Data Strategy ā€“ Practical Steps for Aligning with Business GoalsBuilding a Data Strategy ā€“ Practical Steps for Aligning with Business Goals
Building a Data Strategy ā€“ Practical Steps for Aligning with Business Goals
Ā 
Enterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data ArchitectureEnterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data Architecture
Ā 
The Role of Data Governance in a Data Strategy
The Role of Data Governance in a Data StrategyThe Role of Data Governance in a Data Strategy
The Role of Data Governance in a Data Strategy
Ā 
Data Governance
Data GovernanceData Governance
Data Governance
Ā 
Glossaries, Dictionaries, and Catalogs Result in Data Governance
Glossaries, Dictionaries, and Catalogs Result in Data GovernanceGlossaries, Dictionaries, and Catalogs Result in Data Governance
Glossaries, Dictionaries, and Catalogs Result in Data Governance
Ā 

Viewers also liked

Data-Ed Engineering Solutions to Data Quality Challenges
Data-Ed Engineering Solutions to Data Quality ChallengesData-Ed Engineering Solutions to Data Quality Challenges
Data-Ed Engineering Solutions to Data Quality ChallengesDATAVERSITY
Ā 
Data-Ed: Unlock Business Value through Data Quality Engineering
Data-Ed: Unlock Business Value through Data Quality EngineeringData-Ed: Unlock Business Value through Data Quality Engineering
Data-Ed: Unlock Business Value through Data Quality EngineeringDATAVERSITY
Ā 
Salesforce1 data gov lunch toronto deck
Salesforce1 data gov lunch toronto deckSalesforce1 data gov lunch toronto deck
Salesforce1 data gov lunch toronto deckBeth Fitzpatrick
Ā 
Introduction to DCAM, the Data Management Capability Assessment Model
Introduction to DCAM, the Data Management Capability Assessment ModelIntroduction to DCAM, the Data Management Capability Assessment Model
Introduction to DCAM, the Data Management Capability Assessment ModelElement22
Ā 
Data quality management Basic
Data quality management BasicData quality management Basic
Data quality management BasicKhaled Mosharraf
Ā 
2014 dqe handouts
2014 dqe handouts2014 dqe handouts
2014 dqe handoutsData Blueprint
Ā 
Data Quality Presentation
Data Quality PresentationData Quality Presentation
Data Quality PresentationStephen McCarthy
Ā 
How to identify the correct Master Data subject areas & tooling for your MDM...
How to identify the correct Master Data subject areas & tooling for your MDM...How to identify the correct Master Data subject areas & tooling for your MDM...
How to identify the correct Master Data subject areas & tooling for your MDM...Christopher Bradley
Ā 
Maturity in Data Management - Why do I need it?
Maturity in Data Management - Why do I need it?Maturity in Data Management - Why do I need it?
Maturity in Data Management - Why do I need it?Kingland
Ā 
Le Data Quality
Le Data QualityLe Data Quality
Le Data Qualitywdmmdp
Ā 

Viewers also liked (10)

Data-Ed Engineering Solutions to Data Quality Challenges
Data-Ed Engineering Solutions to Data Quality ChallengesData-Ed Engineering Solutions to Data Quality Challenges
Data-Ed Engineering Solutions to Data Quality Challenges
Ā 
Data-Ed: Unlock Business Value through Data Quality Engineering
Data-Ed: Unlock Business Value through Data Quality EngineeringData-Ed: Unlock Business Value through Data Quality Engineering
Data-Ed: Unlock Business Value through Data Quality Engineering
Ā 
Salesforce1 data gov lunch toronto deck
Salesforce1 data gov lunch toronto deckSalesforce1 data gov lunch toronto deck
Salesforce1 data gov lunch toronto deck
Ā 
Introduction to DCAM, the Data Management Capability Assessment Model
Introduction to DCAM, the Data Management Capability Assessment ModelIntroduction to DCAM, the Data Management Capability Assessment Model
Introduction to DCAM, the Data Management Capability Assessment Model
Ā 
Data quality management Basic
Data quality management BasicData quality management Basic
Data quality management Basic
Ā 
2014 dqe handouts
2014 dqe handouts2014 dqe handouts
2014 dqe handouts
Ā 
Data Quality Presentation
Data Quality PresentationData Quality Presentation
Data Quality Presentation
Ā 
How to identify the correct Master Data subject areas & tooling for your MDM...
How to identify the correct Master Data subject areas & tooling for your MDM...How to identify the correct Master Data subject areas & tooling for your MDM...
How to identify the correct Master Data subject areas & tooling for your MDM...
Ā 
Maturity in Data Management - Why do I need it?
Maturity in Data Management - Why do I need it?Maturity in Data Management - Why do I need it?
Maturity in Data Management - Why do I need it?
Ā 
Le Data Quality
Le Data QualityLe Data Quality
Le Data Quality
Ā 

Similar to Data-Ed Webinar: Data Quality Engineering

Data-Ed: Unlock Business Value through Data Quality Engineering
Data-Ed: Unlock Business Value through Data Quality Engineering Data-Ed: Unlock Business Value through Data Quality Engineering
Data-Ed: Unlock Business Value through Data Quality Engineering Data Blueprint
Ā 
Data-Ed: Business Value From MDM
Data-Ed: Business Value From MDM Data-Ed: Business Value From MDM
Data-Ed: Business Value From MDM Data Blueprint
Ā 
Data-Ed Online Webinar: Business Value from MDM
Data-Ed Online Webinar: Business Value from MDMData-Ed Online Webinar: Business Value from MDM
Data-Ed Online Webinar: Business Value from MDMDATAVERSITY
Ā 
Data-Ed Webinar: Data Modeling Fundamentals
Data-Ed Webinar: Data Modeling FundamentalsData-Ed Webinar: Data Modeling Fundamentals
Data-Ed Webinar: Data Modeling FundamentalsDATAVERSITY
Ā 
MDM & BI Strategy For Large Enterprises
MDM & BI Strategy For Large EnterprisesMDM & BI Strategy For Large Enterprises
MDM & BI Strategy For Large EnterprisesMark Schoeppel
Ā 
CDMP SLIDE TRAINER .pptx
CDMP SLIDE TRAINER .pptxCDMP SLIDE TRAINER .pptx
CDMP SLIDE TRAINER .pptxssuser65981b
Ā 
OAUG 05-2009-MDM-1683-A Fiteni CPA, CMA
OAUG 05-2009-MDM-1683-A Fiteni CPA, CMAOAUG 05-2009-MDM-1683-A Fiteni CPA, CMA
OAUG 05-2009-MDM-1683-A Fiteni CPA, CMAAlex Fiteni
Ā 
Workable Enteprise Data Governance
Workable Enteprise Data GovernanceWorkable Enteprise Data Governance
Workable Enteprise Data GovernanceBhavendra Chavan
Ā 
Getting Data Quality Right
Getting Data Quality RightGetting Data Quality Right
Getting Data Quality RightDATAVERSITY
Ā 
Data-Ed Slides: Data Modeling Strategies - Getting Your Data Ready for the Ca...
Data-Ed Slides: Data Modeling Strategies - Getting Your Data Ready for the Ca...Data-Ed Slides: Data Modeling Strategies - Getting Your Data Ready for the Ca...
Data-Ed Slides: Data Modeling Strategies - Getting Your Data Ready for the Ca...DATAVERSITY
Ā 
Data Systems Integration & Business Value Pt. 2: Cloud
Data Systems Integration & Business Value Pt. 2: CloudData Systems Integration & Business Value Pt. 2: Cloud
Data Systems Integration & Business Value Pt. 2: CloudData Blueprint
Ā 
Data Systems Integration & Business Value Pt. 2: Cloud
Data Systems Integration & Business Value Pt. 2: CloudData Systems Integration & Business Value Pt. 2: Cloud
Data Systems Integration & Business Value Pt. 2: CloudDATAVERSITY
Ā 
Data Governance Maturity Model
Data Governance Maturity ModelData Governance Maturity Model
Data Governance Maturity ModelBasuki Rahmad
Ā 
Increasing Your Business Data & Analytics Maturity
Increasing Your Business Data & Analytics MaturityIncreasing Your Business Data & Analytics Maturity
Increasing Your Business Data & Analytics MaturityMario Faria
Ā 
Data-Ed Webinar: Design & Manage Data Structures
Data-Ed Webinar: Design & Manage Data Structures Data-Ed Webinar: Design & Manage Data Structures
Data-Ed Webinar: Design & Manage Data Structures DATAVERSITY
Ā 
Data-Ed: Design and Manage Data Structures
Data-Ed: Design and Manage Data Structures Data-Ed: Design and Manage Data Structures
Data-Ed: Design and Manage Data Structures Data Blueprint
Ā 
RungananW-DA&DG 201701 V2.0
RungananW-DA&DG 201701 V2.0RungananW-DA&DG 201701 V2.0
RungananW-DA&DG 201701 V2.0Runganan Wankundee
Ā 
Data-Ed Slides: Best Practices in Data Stewardship (Technical)
Data-Ed Slides: Best Practices in Data Stewardship (Technical)Data-Ed Slides: Best Practices in Data Stewardship (Technical)
Data-Ed Slides: Best Practices in Data Stewardship (Technical)DATAVERSITY
Ā 
Best Practices with the DMM
Best Practices with the DMMBest Practices with the DMM
Best Practices with the DMMDATAVERSITY
Ā 
Increasing Your Business Data and Analytics Maturity
Increasing Your Business Data and Analytics MaturityIncreasing Your Business Data and Analytics Maturity
Increasing Your Business Data and Analytics MaturityDATAVERSITY
Ā 

Similar to Data-Ed Webinar: Data Quality Engineering (20)

Data-Ed: Unlock Business Value through Data Quality Engineering
Data-Ed: Unlock Business Value through Data Quality Engineering Data-Ed: Unlock Business Value through Data Quality Engineering
Data-Ed: Unlock Business Value through Data Quality Engineering
Ā 
Data-Ed: Business Value From MDM
Data-Ed: Business Value From MDM Data-Ed: Business Value From MDM
Data-Ed: Business Value From MDM
Ā 
Data-Ed Online Webinar: Business Value from MDM
Data-Ed Online Webinar: Business Value from MDMData-Ed Online Webinar: Business Value from MDM
Data-Ed Online Webinar: Business Value from MDM
Ā 
Data-Ed Webinar: Data Modeling Fundamentals
Data-Ed Webinar: Data Modeling FundamentalsData-Ed Webinar: Data Modeling Fundamentals
Data-Ed Webinar: Data Modeling Fundamentals
Ā 
MDM & BI Strategy For Large Enterprises
MDM & BI Strategy For Large EnterprisesMDM & BI Strategy For Large Enterprises
MDM & BI Strategy For Large Enterprises
Ā 
CDMP SLIDE TRAINER .pptx
CDMP SLIDE TRAINER .pptxCDMP SLIDE TRAINER .pptx
CDMP SLIDE TRAINER .pptx
Ā 
OAUG 05-2009-MDM-1683-A Fiteni CPA, CMA
OAUG 05-2009-MDM-1683-A Fiteni CPA, CMAOAUG 05-2009-MDM-1683-A Fiteni CPA, CMA
OAUG 05-2009-MDM-1683-A Fiteni CPA, CMA
Ā 
Workable Enteprise Data Governance
Workable Enteprise Data GovernanceWorkable Enteprise Data Governance
Workable Enteprise Data Governance
Ā 
Getting Data Quality Right
Getting Data Quality RightGetting Data Quality Right
Getting Data Quality Right
Ā 
Data-Ed Slides: Data Modeling Strategies - Getting Your Data Ready for the Ca...
Data-Ed Slides: Data Modeling Strategies - Getting Your Data Ready for the Ca...Data-Ed Slides: Data Modeling Strategies - Getting Your Data Ready for the Ca...
Data-Ed Slides: Data Modeling Strategies - Getting Your Data Ready for the Ca...
Ā 
Data Systems Integration & Business Value Pt. 2: Cloud
Data Systems Integration & Business Value Pt. 2: CloudData Systems Integration & Business Value Pt. 2: Cloud
Data Systems Integration & Business Value Pt. 2: Cloud
Ā 
Data Systems Integration & Business Value Pt. 2: Cloud
Data Systems Integration & Business Value Pt. 2: CloudData Systems Integration & Business Value Pt. 2: Cloud
Data Systems Integration & Business Value Pt. 2: Cloud
Ā 
Data Governance Maturity Model
Data Governance Maturity ModelData Governance Maturity Model
Data Governance Maturity Model
Ā 
Increasing Your Business Data & Analytics Maturity
Increasing Your Business Data & Analytics MaturityIncreasing Your Business Data & Analytics Maturity
Increasing Your Business Data & Analytics Maturity
Ā 
Data-Ed Webinar: Design & Manage Data Structures
Data-Ed Webinar: Design & Manage Data Structures Data-Ed Webinar: Design & Manage Data Structures
Data-Ed Webinar: Design & Manage Data Structures
Ā 
Data-Ed: Design and Manage Data Structures
Data-Ed: Design and Manage Data Structures Data-Ed: Design and Manage Data Structures
Data-Ed: Design and Manage Data Structures
Ā 
RungananW-DA&DG 201701 V2.0
RungananW-DA&DG 201701 V2.0RungananW-DA&DG 201701 V2.0
RungananW-DA&DG 201701 V2.0
Ā 
Data-Ed Slides: Best Practices in Data Stewardship (Technical)
Data-Ed Slides: Best Practices in Data Stewardship (Technical)Data-Ed Slides: Best Practices in Data Stewardship (Technical)
Data-Ed Slides: Best Practices in Data Stewardship (Technical)
Ā 
Best Practices with the DMM
Best Practices with the DMMBest Practices with the DMM
Best Practices with the DMM
Ā 
Increasing Your Business Data and Analytics Maturity
Increasing Your Business Data and Analytics MaturityIncreasing Your Business Data and Analytics Maturity
Increasing Your Business Data and Analytics Maturity
Ā 

More from DATAVERSITY

Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...
Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...
Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...DATAVERSITY
Ā 
Data at the Speed of Business with Data Mastering and Governance
Data at the Speed of Business with Data Mastering and GovernanceData at the Speed of Business with Data Mastering and Governance
Data at the Speed of Business with Data Mastering and GovernanceDATAVERSITY
Ā 
Exploring Levels of Data Literacy
Exploring Levels of Data LiteracyExploring Levels of Data Literacy
Exploring Levels of Data LiteracyDATAVERSITY
Ā 
Make Data Work for You
Make Data Work for YouMake Data Work for You
Make Data Work for YouDATAVERSITY
Ā 
Data Catalogs Are the Answer ā€“ What Is the Question?
Data Catalogs Are the Answer ā€“ What Is the Question?Data Catalogs Are the Answer ā€“ What Is the Question?
Data Catalogs Are the Answer ā€“ What Is the Question?DATAVERSITY
Ā 
Data Modeling Fundamentals
Data Modeling FundamentalsData Modeling Fundamentals
Data Modeling FundamentalsDATAVERSITY
Ā 
Showing ROI for Your Analytic Project
Showing ROI for Your Analytic ProjectShowing ROI for Your Analytic Project
Showing ROI for Your Analytic ProjectDATAVERSITY
Ā 
How a Semantic Layer Makes Data Mesh Work at Scale
How a Semantic Layer Makes  Data Mesh Work at ScaleHow a Semantic Layer Makes  Data Mesh Work at Scale
How a Semantic Layer Makes Data Mesh Work at ScaleDATAVERSITY
Ā 
Is Enterprise Data Literacy Possible?
Is Enterprise Data Literacy Possible?Is Enterprise Data Literacy Possible?
Is Enterprise Data Literacy Possible?DATAVERSITY
Ā 
The Data Trifecta ā€“ Privacy, Security & Governance Race from Reactivity to Re...
The Data Trifecta ā€“ Privacy, Security & Governance Race from Reactivity to Re...The Data Trifecta ā€“ Privacy, Security & Governance Race from Reactivity to Re...
The Data Trifecta ā€“ Privacy, Security & Governance Race from Reactivity to Re...DATAVERSITY
Ā 
Emerging Trends in Data Architecture ā€“ Whatā€™s the Next Big Thing?
Emerging Trends in Data Architecture ā€“ Whatā€™s the Next Big Thing?Emerging Trends in Data Architecture ā€“ Whatā€™s the Next Big Thing?
Emerging Trends in Data Architecture ā€“ Whatā€™s the Next Big Thing?DATAVERSITY
Ā 
Data Governance Trends - A Look Backwards and Forwards
Data Governance Trends - A Look Backwards and ForwardsData Governance Trends - A Look Backwards and Forwards
Data Governance Trends - A Look Backwards and ForwardsDATAVERSITY
Ā 
Data Governance Trends and Best Practices To Implement Today
Data Governance Trends and Best Practices To Implement TodayData Governance Trends and Best Practices To Implement Today
Data Governance Trends and Best Practices To Implement TodayDATAVERSITY
Ā 
2023 Trends in Enterprise Analytics
2023 Trends in Enterprise Analytics2023 Trends in Enterprise Analytics
2023 Trends in Enterprise AnalyticsDATAVERSITY
Ā 
Data Strategy Best Practices
Data Strategy Best PracticesData Strategy Best Practices
Data Strategy Best PracticesDATAVERSITY
Ā 
Data Management Best Practices
Data Management Best PracticesData Management Best Practices
Data Management Best PracticesDATAVERSITY
Ā 
MLOps ā€“ Applying DevOps to Competitive Advantage
MLOps ā€“ Applying DevOps to Competitive AdvantageMLOps ā€“ Applying DevOps to Competitive Advantage
MLOps ā€“ Applying DevOps to Competitive AdvantageDATAVERSITY
Ā 
Keeping the Pulse of Your Data ā€“ Why You Need Data Observability to Improve D...
Keeping the Pulse of Your Data ā€“ Why You Need Data Observability to Improve D...Keeping the Pulse of Your Data ā€“ Why You Need Data Observability to Improve D...
Keeping the Pulse of Your Data ā€“ Why You Need Data Observability to Improve D...DATAVERSITY
Ā 
Empowering the Data Driven Business with Modern Business Intelligence
Empowering the Data Driven Business with Modern Business IntelligenceEmpowering the Data Driven Business with Modern Business Intelligence
Empowering the Data Driven Business with Modern Business IntelligenceDATAVERSITY
Ā 
Enterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data ArchitectureEnterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data ArchitectureDATAVERSITY
Ā 

Data-Ed Webinar: Data Quality Engineering

  • 1. Copyright 2013 by Data Blueprint. Unlock Business Value through Data Quality Engineering. Organizations must realize what it means to utilize data quality management in support of business strategy. This webinar focuses on obtaining business value from data quality initiatives. I will illustrate how organizations with chronic business challenges often can trace the root of the problem to poor data quality. Showing how data quality should be engineered provides a useful framework in which to develop an effective approach. This in turn allows organizations to more quickly identify business problems as well as data problems caused by structural issues versus practice-oriented defects and prevent these from reoccurring. Date: April 8, 2014 Time: 2:00 PM ET/11:00 AM PT Presenter: Peter Aiken, Ph.D. Time: • timeliness • currency • frequency • time period Form: • clarity • detail • order • presentation • media Content: • accuracy • relevance • completeness • conciseness • scope • performance
  • 2. Get Social With Us! Live Twitter Feed: Join the conversation! Follow us: @datablueprint @paiken. Ask questions and submit your comments: #dataed. Like Us on Facebook: www.facebook.com/datablueprint. Post questions and comments. Find industry news, insightful content and event updates. Join the Group: Data Management & Business Intelligence. Ask questions, gain insights and collaborate with fellow data management professionals.
  • 3. Peter Aiken, PhD • 25+ years of experience in data management • Multiple international awards & recognition • Founder, Data Blueprint (datablueprint.com) • Associate Professor of IS, VCU (vcu.edu) • President, DAMA International (dama.org) • 8 books and dozens of articles • Experienced w/ 500+ data management practices in 20 countries • Multi-year immersions with organizations as diverse as the US DoD, Nokia, Deutsche Bank, Wells Fargo, and the Commonwealth of Virginia
  • 4. Unlock Business Value through Data Quality Engineering Presented by Peter Aiken, Ph.D.
  • 5. Outline: 1. Data Management Overview 2. DQE Definitions (w/ example) 3. DQE Cycle & Contextual Complications 4. DQ Causes and Dimensions 5. Quality and the Data Life Cycle 6. DDE Tools 7. Takeaways and Q&A. Tweeting now: #dataed
  • 6. Outline: 1. Data Management Overview 2. DQE Definitions (w/ example) 3. DQE Cycle & Contextual Complications 4. DQ Causes and Dimensions 5. Quality and the Data Life Cycle 6. DDE Tools 7. Takeaways and Q&A. Tweeting now: #dataed
  • 7. Organizational DM Practices and their Inter-relationships [diagram: Data Program Coordination, Organizational Data Integration, Data Stewardship, Data Development, and Data Support Operations, connected by flows labeled Organizational Strategies, Goals, Guidance, Direction, Feedback, Standard Data, Integrated Models, Application Models & Designs, Implementation, Business Data, Business Value, and Data Asset Use]
  • 8. Organizational DM Practices and their Inter-relationships [same diagram, with a definition for each practice area] Data Program Coordination: Defining, coordinating, resourcing, implementing, and monitoring organizational data program strategies, policies, plans, etc. as a coherent set of activities. Organizational Data Integration: Identifying, modeling, coordinating, organizing, distributing, and architecting data shared across business areas or organizational boundaries. Data Stewardship: Ensuring that specific individuals are assigned the responsibility for the maintenance of specific data as organizational assets, and that those individuals are provided the requisite knowledge, skills, and abilities to accomplish these goals in conjunction with other data stewards in the organization. Data Development: Specifying and designing appropriately architected data assets that are engineered to be capable of supporting organizational needs. Data Support Operations: Initiation, operation, tuning, maintenance, backup/recovery, archiving and disposal of data assets in support of organizational activities.
  • 9. Five Integrated DM Practice Areas [same diagram, with annotations] Leverage data in organizational activities. Data management processes and infrastructure. Combining multiple assets to produce extra value. Organizational-entity subject area data integration. Provide reliable data access. Achieve sharing of data within a business area.
  • 10. Five Integrated DM Practice Areas: Manage data coherently (Data Program Coordination). Share data across boundaries (Organizational Data Integration). Assign responsibilities for data (Data Stewardship). Engineer data delivery systems (Data Development). Maintain data availability (Data Support Operations).
  • 11. • 5 Data Management Practice Areas / Data Management Basics • Are necessary but insufficient prerequisites to organizational data leveraging applications (that is, Self Actualizing Data or Advanced Data Practices) Basic Data Management Practices – Data Program Management – Organizational Data Integration – Data Stewardship – Data Development – Data Support Operations http://3.bp.blogspot.com/-ptl-9mAieuQ/T-idBt1YFmI/AAAAAAAABgw/Ib-nVkMmMEQ/s1600/maslows_hierarchy_of_needs.png Advanced Data Practices • Cloud • MDM • Mining • Analytics • Warehousing • Big Data Management Practices Hierarchy (after Maslow)
  • 12. Data Management Body of Knowledge: Data Management Functions
  • 13. DAMA DM BoK & CDMP • Published by DAMA International – The professional association for Data Managers (40 chapters worldwide) – DMBoK organized around • Primary data management functions focused around data delivery to the organization (dama.org) • Organized around several environmental elements • CDMP – Certified Data Management Professional – DAMA International and ICCP – Membership in a distinct group made up of your fellow professionals – Recognition for your specialized knowledge in a choice of 17 specialty areas – Series of 3 exams – For more information, please visit: • http://www.dama.org/i4a/pages/index.cfm?pageid=3399 • http://iccp.org/certification/designations/cdmp
  • 14. Overview: Data Quality Engineering
  • 15. Outline: 1. Data Management Overview 2. DQE Definitions (w/ example) 3. DQE Cycle & Contextual Complications 4. DQ Causes and Dimensions 5. Quality and the Data Life Cycle 6. DDE Tools 7. Takeaways and Q&A. Tweeting now: #dataed
  • 16. A Model Specifying Relationships Among Important Terms [Built on definition by Dan Appleton 1983] [diagram labels: Fact, Meaning, Data, Request, Information, Use, Intelligence] 1. Each FACT combines with one or more MEANINGS. 2. Each specific FACT and MEANING combination is referred to as a DATUM. 3. An INFORMATION is one or more DATA that are returned in response to a specific REQUEST. 4. INFORMATION REUSE is enabled when one FACT is combined with more than one MEANING. 5. INTELLIGENCE is INFORMATION associated with its USES. Wisdom & knowledge are often used synonymously.
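The five statements on slide 16 can be read as a small data model. The following is a minimal sketch in Python, not something taken from the deck: the class names mirror the slide's terms, while the field names and the sample values (a postal code, a warehouse bin) are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    value: str


@dataclass(frozen=True)
class Meaning:
    description: str


@dataclass(frozen=True)
class Datum:
    """A specific FACT and MEANING combination (statement 2)."""
    fact: Fact
    meaning: Meaning


@dataclass(frozen=True)
class Information:
    """One or more DATA returned in response to a specific REQUEST (statement 3)."""
    request: str
    data: tuple

    def __post_init__(self):
        if not self.data:
            raise ValueError("information needs at least one datum")


@dataclass(frozen=True)
class Intelligence:
    """INFORMATION associated with its USES (statement 5)."""
    information: Information
    uses: tuple


# Hypothetical sample values: one fact combined with two meanings (statement 4).
fact = Fact("23219")
as_zip = Datum(fact, Meaning("customer postal code"))
as_bin = Datum(fact, Meaning("warehouse bin number"))

info = Information("Where does the customer live?", (as_zip,))
reuse = Information("Which bin holds the part?", (as_bin,))
intel = Intelligence(info, ("delivery route planning",))
```

The same `Fact` appears in two different `Datum` objects, which is what the slide calls information reuse.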
  • 17. Definitions • Quality Data – Fit for use: meets the requirements of its authors, users, and administrators (adapted from Martin Eppler) – Synonymous with information quality, since poor data quality results in inaccurate information and poor business performance • Data Quality Management – Planning, implementation and control activities that apply quality management techniques to measure, assess, improve, and ensure data quality – Entails the "establishment and deployment of roles, responsibilities concerning the acquisition, maintenance, dissemination, and disposition of data" http://www2.sas.com/proceedings/sugi29/098-29.pdf ✓ Critical supporting process from change management ✓ Continuous process for defining acceptable levels of data quality to meet business needs and for ensuring that data quality meets these levels • Data Quality Engineering – Recognition that data quality solutions cannot be managed but must be engineered – Engineering is the application of scientific, economic, social, and practical knowledge in order to design, build, and maintain solutions to data quality challenges – Engineering concepts are generally not known and understood within IT or business! Spinach/Popeye story from http://it.toolbox.com/blogs/infosphere/spinach-how-a-data-quality-mistake-created-a-myth-and-a-cartoon-character-10166
  • 18. Improving Data Quality during System Migration • Challenge – Millions of NSN/SKUs maintained in a catalog – Key and other data stored in clear text/comment fields – Original suggestion was manual approach to text extraction – Left the data structuring problem unsolved • Solution – Proprietary, improvable text extraction process – Converted non-tabular data into tabular data – Saved a minimum of $5 million – Literally person centuries of work
  • 19. Determining Diminishing Returns
      Week #   Unmatched Items (% Total)   Ignorable Items (% Total)   Items Matched (% Total)
      1        31.47%                      1.34%                       N/A
      2        21.22%                      6.97%                       N/A
      3        20.66%                      7.49%                       N/A
      4        32.48%                      11.99%                      55.53%
      …        …                           …                           …
      14       9.02%                       22.62%                      68.36%
      15       9.06%                       22.62%                      68.33%
      16       9.53%                       22.62%                      67.85%
      17       9.50%                       22.62%                      67.88%
      18       7.46%                       22.62%                      69.92%
  • 20. Quantitative Benefits
      Time needed to review all NSNs once over the life of the project:
        NSNs: 2,000,000
        Average time to review & cleanse (in minutes): 5
        Total Time (in minutes): 10,000,000
      Time available per resource over a one year period of time:
        Work weeks in a year: 48
        Work days in a week: 5
        Work hours in a day: 7.5
        Work minutes in a day: 450
        Total Work minutes/year: 108,000
      Person years required to cleanse each NSN once prior to migration:
        Minutes needed: 10,000,000
        Minutes available person/year: 108,000
        Total Person-Years: 92.6
      Resource Cost to cleanse NSN's prior to migration:
        Avg Salary for SME year (not including overhead): $60,000.00
        Projected Years Required to Cleanse/Total DLA Person Year Saved: 93
        Total Cost to Cleanse/Total DLA Savings to Cleanse NSN's: $5.5 million
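The arithmetic behind slide 20 can be reproduced in a few lines. This is only a sketch of the calculation using the figures shown on the slide; the function and variable names are invented here, and the result is left unrounded, whereas the slide rounds up to 93 person-years and reports roughly $5.5 million.

```python
# Figures taken from the slide; names are illustrative only.
NSN_COUNT = 2_000_000
MINUTES_PER_NSN = 5
WORK_WEEKS_PER_YEAR = 48
WORK_DAYS_PER_WEEK = 5
WORK_HOURS_PER_DAY = 7.5
AVG_SME_SALARY = 60_000  # dollars per year, not including overhead


def person_years_to_cleanse(items, minutes_per_item, weeks, days, hours):
    """Person-years needed to review every item once."""
    total_minutes = items * minutes_per_item
    minutes_per_person_year = weeks * days * hours * 60
    return total_minutes / minutes_per_person_year


person_years = person_years_to_cleanse(
    NSN_COUNT,
    MINUTES_PER_NSN,
    WORK_WEEKS_PER_YEAR,
    WORK_DAYS_PER_WEEK,
    WORK_HOURS_PER_DAY,
)
cost = person_years * AVG_SME_SALARY

print(f"{person_years:.1f} person-years, about ${cost / 1e6:.2f} million")
```

Changing the minutes-per-item assumption shows how sensitive the estimate is: at 10 minutes per NSN the manual approach would take twice as many person-years.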
  • 21. Data Quality Misconceptions 1. You can fix the data 2. Data quality is an IT problem 3. The problem is in the data sources or data entry 4. The data warehouse will provide a single version of the truth 5. The new system will provide a single version of the truth 6. Standardization will eliminate the problem of different "truths" represented in the reports or analysis. Source: Business Intelligence solutions, Athena Systems
  • 22. The Blind Men and the Elephant
      • It was six men of Indostan, To learning much inclined, Who went to see the Elephant (Though all of them were blind), That each by observation Might satisfy his mind.
      • The First approached the Elephant, And happening to fall Against his broad and sturdy side, At once began to bawl: "God bless me! but the Elephant Is very like a wall!"
      • The Second, feeling of the tusk Cried, "Ho! what have we here, So very round and smooth and sharp? To me 'tis mighty clear This wonder of an Elephant Is very like a spear!"
      • The Third approached the animal, And happening to take The squirming trunk within his hands, Thus boldly up he spake: "I see," quoth he, "the Elephant Is very like a snake!"
      • The Fourth reached out an eager hand, And felt about the knee: "What most this wondrous beast is like Is mighty plain," quoth he; "'Tis clear enough the Elephant Is very like a tree!"
      • The Fifth, who chanced to touch the ear, Said: "E'en the blindest man Can tell what this resembles most; Deny the fact who can, This marvel of an Elephant Is very like a fan!"
      • The Sixth no sooner had begun About the beast to grope, Than, seizing on the swinging tail That fell within his scope. "I see," quoth he, "the Elephant Is very like a rope!"
      • And so these men of Indostan Disputed loud and long, Each in his own opinion Exceeding stiff and strong, Though each was partly in the right, And all were in the wrong!
      (Source: John Godfrey Saxe's (1816-1887) version of the famous Indian legend)
  • 23. No universal conception of data quality exists; instead, many differing perspectives compete. • Problem: – Most organizations approach data quality problems in the same way that the blind men approached the elephant: people tend to see only the data that is in front of them – Little cooperation across boundaries, just as the blind men were unable to convey their impressions about the elephant to recognize the entire entity – Leads to confusion, disputes and narrow views • Solution: – Data quality engineering can help achieve a more complete picture and facilitate cross-boundary communications
  • 24. Structured Data Quality Engineering 1. Allow the form of the problem to guide the form of the solution 2. Provide a means of decomposing the problem 3. Feature a variety of tools simplifying system understanding 4. Offer a set of strategies for evolving a design solution 5. Provide criteria for evaluating the quality of the various solutions 6. Facilitate development of a framework for developing organizational knowledge.
  • 25. Outline: 1. Data Management Overview 2. DQE Definitions (w/ example) 3. DQE Cycle & Contextual Complications 4. DQ Causes and Dimensions 5. Quality and the Data Life Cycle 6. DDE Tools 7. Takeaways and Q&A. Tweeting now: #dataed
  • 26. Infamous Data Quality Example: Mizuho Securities • Wanted to sell 1 share for 600,000 yen • Sold 600,000 shares for 1 yen • $347 million loss • In-house system did not have limit checking • Tokyo stock exchange system did not have limit checking ... • … and doesn't allow order cancellations. "CLUMSY typing cost a Japanese bank at least £128 million and staff their Christmas bonuses yesterday, after a trader mistakenly sold 600,000 more shares than he should have. The trader at Mizuho Securities, who has not been named, fell foul of what is known in financial circles as “fat finger syndrome” where a dealer types incorrect details into his computer. He wanted to sell one share in a new telecoms company called J Com, for 600,000 yen (about £3,000)."
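The slide attributes the loss to missing limit checking. As a rough illustration of what such a check might look like, here is a minimal Python sketch. Everything in it is an assumption for the example: the `SellOrder` type, the reference price, the share cap of 10,000, and the 20% price band are hypothetical and do not describe the actual Mizuho or Tokyo Stock Exchange systems.

```python
from dataclasses import dataclass


@dataclass
class SellOrder:
    shares: int
    price_yen: float


def limit_check(order, reference_price_yen, max_shares, band=0.2):
    """Return the reasons to reject an order (an empty list means it passes)."""
    problems = []
    if order.shares <= 0 or order.shares > max_shares:
        problems.append(f"share count {order.shares} outside 1..{max_shares}")
    low = reference_price_yen * (1 - band)
    high = reference_price_yen * (1 + band)
    if not low <= order.price_yen <= high:
        problems.append(f"price {order.price_yen} outside {low:.0f}..{high:.0f} yen")
    return problems


# Hypothetical limits chosen only for this example.
REFERENCE_PRICE_YEN = 600_000
MAX_SHARES = 10_000

intended = SellOrder(shares=1, price_yen=600_000)
mistyped = SellOrder(shares=600_000, price_yen=1)

intended_problems = limit_check(intended, REFERENCE_PRICE_YEN, MAX_SHARES)
mistyped_problems = limit_check(mistyped, REFERENCE_PRICE_YEN, MAX_SHARES)
```

Under these assumed limits the intended order passes, while the transposed order is flagged on both quantity and price before it could reach the market.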
  • 27. Four ways to make your data sparkle! 1. Prioritize the task – Cleaning data is costly and time consuming – Identify mission critical/non-mission critical data 2. Involve the data owners – Seek input of business units on what constitutes "dirty" data 3. Keep future data clean – Incorporate processes and technologies that check every zip code and area code 4. Align your staff with business – Align IT staff with business units (Source: CIO, July 1, 2004)
  • 28. The DQE Cycle • Deming cycle • "Plan-do-study-act" or "plan-do-check-act" 1. Identifying data issues that are critical to the achievement of business objectives 2. Defining business requirements for data quality 3. Identifying key data quality dimensions 4. Defining business rules critical to ensuring high quality data
  • 29. The DQE Cycle: (1) Plan • Plan for the assessment of the current state and identification of key metrics for measuring quality • The data quality engineering team assesses the scope of known issues – Determining cost and impact – Evaluating alternatives for addressing them
  • 30. The DQE Cycle: (2) Deploy • Deploy processes for measuring and improving the quality of data: • Data profiling – Institute inspections and monitors to identify data issues when they occur – Fix flawed processes that are the root cause of data errors or correct errors downstream – When it is not possible to correct errors at their source, correct them at their earliest point in the data flow
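Data profiling, named on the Deploy slide, usually starts with simple per-column counts. The sketch below shows one way this might look using only the Python standard library; the `profile` function, the column names, and the sample rows are all made up for illustration and are not part of the deck or of any particular profiling tool.

```python
from collections import Counter


def is_missing(value):
    """Treat None and blank strings as missing."""
    return value is None or str(value).strip() == ""


def profile(rows, columns):
    """Count rows, missing values, distinct values, and the top value per column."""
    report = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        present = [v for v in values if not is_missing(v)]
        report[col] = {
            "rows": len(values),
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
            "most_common": Counter(present).most_common(1),
        }
    return report


# Hypothetical catalog rows with a blank zip, a missing SKU, and a short zip.
rows = [
    {"sku": "A-100", "zip": "23219"},
    {"sku": "A-101", "zip": ""},
    {"sku": "A-100", "zip": "23219"},
    {"sku": None, "zip": "2321"},
]
report = profile(rows, ["sku", "zip"])
```

A report like this does not fix anything by itself; it points the team at the columns where inspections and monitors are most likely to pay off.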
  • 31. The DQE Cycle: (3) Monitor • Monitor the quality of data as measured against the defined business rules • If data quality meets defined thresholds for acceptability, the processes are in control and the level of data quality meets the business requirements • If data quality falls below acceptability thresholds, notify data stewards so they can take action during the next stage
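The Monitor step amounts to comparing measured quality against agreed thresholds and alerting data stewards when a measure falls short. A minimal sketch follows, assuming quality is expressed as a score between 0 and 1 per metric; the metric names, the threshold values, and the `notify` callback are placeholders for illustration, not something the deck prescribes.

```python
def check_quality(measured, thresholds, notify):
    """Notify stewards about metrics below threshold; return True if in control."""
    out_of_control = {
        name: score
        for name, score in measured.items()
        if score < thresholds[name]
    }
    for name, score in sorted(out_of_control.items()):
        notify(f"{name}: {score:.1%} is below the {thresholds[name]:.1%} threshold")
    return not out_of_control


# Hypothetical thresholds and measurements.
thresholds = {"completeness": 0.95, "validity": 0.98}
measured = {"completeness": 0.97, "validity": 0.91}

messages = []  # stands in for an e-mail or ticket sent to data stewards
in_control = check_quality(measured, thresholds, messages.append)
```

Here completeness meets its threshold and validity does not, so the process is reported as out of control and one message is queued for the stewards to act on in the next stage.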
  • 32. Copyright 2013 by Data Blueprint The DQE Cycle: (4) Act ā€¢ Act to resolve any identified issues to improve data quality and better meet business expectations ā€¢ New cycles begin as new data sets come under investigation or as new data quality requirements are identified for existing data sets 32
  • 33. DQE Context & Engineering Concepts • Can rules be implemented stating that no data can be corrected unless the source of the error has been discovered and addressed? • Must all data be 100% perfect? • Pareto – 80/20 rule – Not all data is of equal importance • Scientific, economic, social, and practical knowledge
  • 34. Data quality is now acknowledged as a major source of organizational risk by certified risk professionals!
  • 35. Outline 1. Data Management Overview 2. DQE Definitions (w/ example) 3. DQE Cycle & Contextual Complications 4. DQ Causes and Dimensions 5. Quality and the Data Life Cycle 6. DQE Tools 7. Takeaways and Q&A Tweeting now: #dataed
  • 36. Two Distinct Activities Support Quality Data • Data quality best practices depend on both – Practice-oriented activities – Structure-oriented activities • Practice-oriented activities focus on the capture and manipulation of data • Structure-oriented activities focus on the data implementation
  • 37. Practice-Oriented Activities • Stem from a failure to apply rigor when capturing/manipulating data, such as: – Edit masking – Range checking of input data – CRC-checking of transmitted data • Affect the Data Value Quality and Data Representation Quality • Examples of improper practice-oriented activities: – Allowing imprecise or incorrect data to be collected when requirements specify otherwise – Presenting data out of sequence • Typically diagnosed in bottom-up manner: find and fix the resulting problem • Addressed by imposing more rigorous data-handling governance
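The three capture-time checks named on this slide (edit masking, range checking, CRC checking) can be sketched in Python. This is only an illustration under stated assumptions: the function names, the ZIP-code mask, and the 4-byte big-endian CRC trailer are hypothetical choices, not something the presentation prescribes. `re.fullmatch` and `zlib.crc32` are standard-library calls.

```python
import re
import zlib

ZIP_MASK = r"\d{5}(-\d{4})?"  # hypothetical edit mask for US ZIP codes


def matches_mask(text: str, mask: str) -> bool:
    """Edit masking: accept input only if the whole string fits the mask."""
    return re.fullmatch(mask, text) is not None


def in_range(value: float, low: float, high: float) -> bool:
    """Range checking: accept input only inside the closed interval."""
    return low <= value <= high


def add_crc(payload: bytes) -> bytes:
    """Append a CRC-32 of the payload as a 4-byte big-endian trailer."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def crc_ok(message: bytes) -> bool:
    """CRC checking: recompute the CRC-32 and compare it with the trailer."""
    payload, trailer = message[:-4], message[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == trailer
```

A sender would call `add_crc` before transmission and the receiver would call `crc_ok` on arrival; the mask and range checks belong at the point of data entry.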
  • 38. Structure-Oriented Activities • Occur because of data and metadata that have been arranged imperfectly. For example: – When the data is in the system but we just can't access it; – When a correct data value is provided as the wrong response to a query; or – When data is not provided because it is unavailable or inaccessible to the customer • Developer focus within system boundaries instead of within organization boundaries • Affect the Data Model Quality and Data Architecture Quality • Examples of improper structure-oriented activities: – Providing a correct response but incomplete data to a query because the user did not comprehend the system data structure – Costly maintenance of inconsistent data used by redundant systems • Typically diagnosed in top-down manner: root cause fixes • Addressed through fundamental data structure governance
  • 39. Quality Dimensions
  • 40. A congratulations letter from another bank • Problems: – Bank did not know it made an error – Tools alone could not have prevented this error – Lost confidence in the ability of the bank to manage customer funds
  • 41. 4 Dimensions of Data Quality • An organization's overall data quality is a function of four distinct components, each with its own attributes: • Data Value (practice-oriented) – the quality of data as stored & maintained in the system • Data Representation (practice-oriented) – the quality of representation for stored values; perfect data values stored in a system that are inappropriately represented can be harmful • Data Model (structure-oriented) – the quality of data logically representing user requirements related to data entities, associated attributes, and their relationships; essential for effective communication among data suppliers and consumers • Data Architecture (structure-oriented) – the coordination of data management activities in cross-functional system development and operations
  • 42. Effective Data Quality Engineering • Four quality components, ordered from closer to the user to closer to the architect: – Data Representation Quality: as presented to the user – Data Value Quality: as maintained in the system – Data Model Quality: as understood by developers – Data Architecture Quality: as an organizational asset • Data quality engineering has been focused on operational problem correction – Directing attention to practice-oriented data imperfections • Data quality engineering is more effective when also focused on structure-oriented causes – Ensuring the quality of shared data across system boundaries
  • 43. Full Set of Data Quality Attributes
  • 44. Difficult to obtain leverage at the bottom of the falls
  • 45. Frozen Falls
  • 46. New York Turns to Big Data to Solve Big Tree Problem • NYC – 2,500,000 trees • 11 months from 2009 to 2010 – 4 people were killed or seriously injured by falling tree limbs in Central Park alone • Belief – Arborists believe that pruning and otherwise maintaining trees can keep them healthier and make them more likely to withstand a storm, decreasing the likelihood of property damage, injuries and deaths • Until recently – No research or data to back it up http://www.computerworld.com/s/article/9239793/New_York_Turns_to_Big_Data_to_Solve_Big_Tree_Problem?source=CTWNLE_nlt_datamgmt_2013-06-05
  • 47. NYC's Big Tree Problem • Question – Does pruning trees in one year reduce the number of hazardous tree conditions in the following year? • Lots of data but granularity challenges – Pruning data recorded block by block – Cleanup data recorded at the address level – Trees have no unique identifiers • After downloading, cleaning, merging, analyzing and intensive modeling – Pruning trees for certain types of hazards caused a 22 percent reduction in the number of times the department had to send a crew for emergency cleanups • The best data analysis – Generates further questions • NYC cannot prune each block every year – Building block risk profiles: number of trees, types of trees, whether the block is in a flood zone or storm zone http://www.computerworld.com/s/article/9239793/New_York_Turns_to_Big_Data_to_Solve_Big_Tree_Problem?source=CTWNLE_nlt_datamgmt_2013-06-05
  • 49. Letter from the Bank • "… so please continue to open your mail from either Chase or Bank One. P.S. Please be on the lookout for any upcoming communications from either Chase or Bank One regarding your Bank One credit card and any other Bank One product you may have." • Problems: – I initially discarded the letter! – I became upset after reading it – It proclaimed that Chase has data quality challenges
  • 51. Traditional Quality Life Cycle [diagram: Data acquisition activities, Data storage, Data usage activities]
  • 52. Data Life Cycle Model Products [diagram: stages Metadata Creation, Metadata Structuring, Metadata Refinement, Data Creation, Data Storage, Data Manipulation, Data Utilization, Data Assessment, Data Refinement; products: data architecture & models, populated data models and storage locations, data values, restored data, value defects, structure defects, architecture refinements, model refinements]
  • 53. Data Life Cycle Model: Quality Focus [diagram: the same life cycle stages annotated with architecture quality, model quality, value quality, and representation quality]
  • 54. Extended data life cycle model with metadata sources and uses • Metadata Creation – Define Data Architecture – Define Data Model Structures • Metadata Structuring – Implement Data Model Views – Populate Data Model Views • Metadata Refinement – Correct Structural Defects – Update Implementation • Data Creation – Create Data – Verify Data Values • Data Manipulation – Manipulate Data – Update Data • Data Utilization – Inspect Data – Present Data • Data Assessment – Assess Data Values – Assess Metadata • Data Refinement – Correct Data Value Defects – Re-store Data Values • Diagram labels: Metadata & Data Storage; starting point for new system development; starting point for existing systems; data performance metadata; data architecture; data architecture and data models; shared data; updated data; corrected data; architecture refinements; facts & meanings
  • 56. Profile, Analyze and Assess DQ • Data assessment using 2 different approaches: – Bottom-up – Top-down • Bottom-up assessment: – Inspection and evaluation of the data sets – Highlight potential issues based on the results of automated processes • Top-down assessment: – Engage business users to document their business processes and the corresponding critical data dependencies – Understand how their processes consume data and which data elements are critical to the success of the business applications
  • 57. Define DQ Measures • Measures development occurs as part of the strategy/design/plan step • Process for defining data quality measures: 1. Select one of the identified critical business impacts 2. Evaluate the dependent data elements and the create and update processes associated with that business impact 3. List any associated data requirements 4. Specify the associated dimension of data quality and one or more business rules to use to determine conformance of the data to expectations 5. Describe the process for measuring conformance 6. Specify an acceptability threshold
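The six-step process above can be sketched as a small data structure: one measure ties a data element to a quality dimension, a business rule, and an acceptability threshold. The class name, field names, and the email completeness rule are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Measure:
    data_element: str             # element the business impact depends on (step 2)
    dimension: str                # data quality dimension (step 4)
    rule: Callable[[dict], bool]  # business rule applied to one record (step 4)
    threshold: float              # minimum acceptable conformance (step 6)

    def conformance(self, records: list[dict]) -> float:
        """Step 5: share of records that satisfy the business rule."""
        if not records:
            return 1.0
        return sum(1 for r in records if self.rule(r)) / len(records)

    def acceptable(self, records: list[dict]) -> bool:
        """True when measured conformance meets the threshold."""
        return self.conformance(records) >= self.threshold


# Hypothetical measure: email must be present in at least 95% of records.
email_complete = Measure(
    data_element="email",
    dimension="completeness",
    rule=lambda r: bool(r.get("email")),
    threshold=0.95,
)
```

When `acceptable` returns `False`, the Monitor stage of the DQE cycle would notify the data stewards.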
  • 58. Set and Evaluate DQ Service Levels • Data quality inspection and monitoring are used to measure and monitor compliance with defined data quality rules • Data quality SLAs specify the organization's expectations for response and remediation • Operational data quality control defined in data quality SLAs includes: – Data elements covered by the agreement – Business impacts associated with data flaws – Data quality dimensions associated with each data element – Quality expectations for each data element of the identified dimensions in each application or system in the value chain – Methods for measuring against those expectations – (…)
  • 59. Measure, Monitor & Manage DQ • DQM procedures depend on available data quality measuring and monitoring services • 2 contexts for control/measurement of conformance to data quality business rules exist: – In-stream: collect in-stream measurements while creating data – In batch: perform batch activities on collections of data instances assembled in a data set • Apply measurements at 3 levels of granularity: – Data element value – Data instance or record – Data set
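The three levels of granularity can be illustrated with one check per level. The field names (`id`, `age`, `start_year`, `end_year`) and the rules are hypothetical. The element and record checks could run in-stream as each record is created, while the data set report is a batch activity.

```python
def element_ok(age) -> bool:
    """Data element value: a single value checked against its domain."""
    return isinstance(age, int) and 0 <= age <= 120


def record_ok(rec: dict) -> bool:
    """Data instance or record: several fields checked together."""
    return element_ok(rec["age"]) and rec["start_year"] <= rec["end_year"]


def dataset_report(records: list[dict]) -> dict:
    """Data set: properties that only exist across records."""
    if not records:
        return {"record_conformance": 1.0, "ids_unique": True}
    ids = [r["id"] for r in records]
    return {
        "record_conformance": sum(record_ok(r) for r in records) / len(records),
        "ids_unique": len(ids) == len(set(ids)),
    }
```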
  • 60. Overview: Data Quality Tools • 4 categories of activities: 1) Analysis 2) Cleansing 3) Enhancement 4) Monitoring • Principal tools: 1) Data Profiling 2) Parsing and Standardization 3) Data Transformation 4) Identity Resolution and Matching 5) Enhancement 6) Reporting
  • 61. DQ Tool #1: Data Profiling • Data profiling is the assessment of value distribution and clustering of values into domains • Need to be able to distinguish between good and bad data before making any improvements • Data profiling is a set of algorithms for 2 purposes: – Statistical analysis and assessment of the quality of data values within a data set – Exploring relationships that exist between value collections within and across data sets • At its most advanced, data profiling takes a series of prescribed rules from data quality engines. It then assesses the data, annotates and tracks violations to determine if they comprise new or inferred data quality rules
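A minimal sketch of the first purpose, statistical assessment of the values in one column, might look like the following. The function name and the choice of statistics are assumptions; real profiling tools report far more (patterns, data types, cross-column dependencies).

```python
from collections import Counter


def profile(values: list) -> dict:
    """Summarize the value distribution of a single column."""
    non_null = [v for v in values if v not in (None, "")]
    counts = Counter(non_null)
    return {
        "count": len(values),
        "null_rate": 1 - len(non_null) / len(values) if values else 0.0,
        "distinct": len(counts),
        "most_common": counts.most_common(3),  # candidate value domain
    }
```

A column whose `distinct` count is small relative to `count` suggests a coded domain; a high `null_rate` points to a completeness issue worth investigating.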
  • 62. DQ Tool #1: Data Profiling, cont'd • Data profiling vs. data quality: business context and semantic/logical layers – Data quality is concerned with proscriptive rules – Data profiling looks for patterns when rules are adhered to and when rules are violated; able to provide input into the business context layer • It is incumbent on data profiling services to notify all concerned parties of whatever is discovered • Profiling can be used to… – …notify the help desk that valid changes in the data are about to cause an avalanche of "skeptical user" calls – …notify business analysts of precisely where they should be working today in terms of shifts in the data
  • 63. Courtesy GlobalID.com
  • 64. DQ Tool #2: Parsing & Standardization • Data parsing tools enable the definition of patterns that feed into a rules engine used to distinguish between valid and invalid data values • Actions are triggered upon matching a specific pattern • When an invalid pattern is recognized, the application may attempt to transform the invalid value into one that meets expectations • Data standardization is the process of conforming to a set of business rules and formats that are set up by data stewards and administrators • Data standardization example: – Bringing all the different formats of "street" into a single format, e.g. "STR", "ST.", "STRT", "STREET", etc.
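The "street" example can be sketched as a token-level rewrite. The variant list comes from the slide; the target form `ST`, the upper-casing, and the function name are assumptions made for illustration.

```python
STREET_VARIANTS = {"STR", "ST", "STRT", "STREET"}


def standardize_street(address: str, target: str = "ST") -> str:
    """Rewrite every known spelling of "street" to one target form."""
    out = []
    for token in address.upper().split():
        bare = token.rstrip(".,")  # compare without trailing punctuation
        if bare in STREET_VARIANTS:
            # Keep a trailing comma, drop the abbreviation period.
            out.append(target + token[len(bare):].replace(".", ""))
        else:
            out.append(token)
    return " ".join(out)
```

For instance, `standardize_street("10124 W. Broad Street, Suite C")` yields `"10124 W. BROAD ST, SUITE C"`. A rule this simple would also rewrite the "St." in "St. Louis", which is why production tools parse the address into components before standardizing.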
  • 65. DQ Tool #3: Data Transformation • Upon identification of data errors, trigger data rules to transform the flawed data • Perform standardization and guide rule-based transformations by mapping data values in their original formats and patterns into a target representation • Parsed components of a pattern are subjected to rearrangement, corrections, or any changes as directed by the rules in the knowledge base
  • 66. DQ Tool #4: Identity Resolution & Matching • Data matching enables analysts to identify relationships between records for de-duplication or group-based processing • Matching is central to maintaining data consistency and integrity throughout the enterprise • The matching process should be used in the initial migration of data into a single repository • 2 basic approaches to matching: • Deterministic – Relies on defined patterns/rules for assigning weights and scores to determine similarity – Predictable – Dependent on rule developers' anticipations • Probabilistic – Relies on statistical techniques for assessing the probability that any pair of records represents the same entity – Not reliant on rules – Probabilities can be refined based on experience -> matchers can improve precision as more data is analyzed
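The two approaches can be contrasted in a short sketch. The deterministic scorer sums hand-assigned field weights. The probabilistic scorer uses Fellegi-Sunter style log-odds, which is one common statistical technique and is not named in the presentation. All field names, weights, and probabilities are hypothetical.

```python
import math


def deterministic_score(a: dict, b: dict, weights: dict) -> int:
    """Sum the hand-assigned weights of the fields on which two records agree."""
    return sum(w for field, w in weights.items() if a.get(field) == b.get(field))


def probabilistic_weight(a: dict, b: dict, m: dict, u: dict) -> float:
    """Log-odds that two records match.

    m[field] = P(fields agree | same entity)
    u[field] = P(fields agree | different entities)
    """
    total = 0.0
    for field in m:
        if a.get(field) == b.get(field):
            total += math.log2(m[field] / u[field])
        else:
            total += math.log2((1 - m[field]) / (1 - u[field]))
    return total
```

In the deterministic case the weights come from the rule developer. In the probabilistic case the `m` and `u` values can be re-estimated as more record pairs are reviewed, which is how a matcher's precision improves with experience.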
  • 67. DQ Tool #5: Enhancement • Definition: – A method for adding value to information by accumulating additional information about a base set of entities and then merging all the sets of information to provide a focused view. Improves master data. • Benefits: – Enables use of third party data sources – Allows you to take advantage of the information and research carried out by external data vendors to make data more meaningful and useful • Examples of data enhancements: – Time/date stamps – Auditing information – Contextual information – Geographic information – Demographic information – Psychographic information
  • 68. DQ Tool #6: Reporting • Good reporting supports: – Inspection and monitoring of conformance to data quality expectations – Monitoring performance of data stewards conforming to data quality SLAs – Workflow processing for data quality incidents – Manual oversight of data cleansing and correction • Data quality tools provide dynamic reporting and monitoring capabilities • Enables analysts and data stewards to support and drive the methodology for ongoing DQM and improvement with a single, easy-to-use solution • Associate report results with: – Data quality measurement – Metrics – Activity
  • 70. Overview: DQE Concepts and Activities • Develop and promote data quality awareness • Define data quality requirements • Profile, analyze and assess data quality • Define data quality metrics • Define data quality business rules • Test and validate data quality requirements • Set and evaluate data quality service levels • Measure and monitor data quality • Manage data quality issues • Clean and correct data quality defects • Design and implement operational DQM procedures • Monitor operational DQM procedures and performance
  • 71. Concepts and Activities • Data quality expectations provide the inputs necessary to define the data quality framework: – Requirements – Inspection policies – Measures and monitors that reflect changes in data quality and performance • The data quality framework requirements reflect 3 aspects of business data expectations: 1. A manner to record the expectation in business rules 2. A way to measure the quality of data within that dimension 3. An acceptability threshold (from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International)
  • 72. Summary: Data Quality Engineering 1/26/2010 © Copyright this and previous years by Data Blueprint - all rights reserved!
  • 73. 10124 W. Broad Street, Suite C Glen Allen, Virginia 23060 804.521.4056
  • 74. Questions? It's your turn! Use the chat feature or Twitter (#dataed) to submit your questions to Peter now.
  • 75. Upcoming Events • Developing a Data-centric Strategy & Roadmap, Enterprise Data World, April 28, 2014 @ 8:30 AM CT • Data Architecture Requirements, May 13, 2014 @ 2:00 PM ET/11:00 AM PT • Monetizing Data Management, June 10, 2014 @ 2:00 PM ET/11:00 AM PT • Sign up here: www.datablueprint.com/webinar-schedule or www.dataversity.net
  • 76. References & Recommended Reading • The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International • http://www2.sas.com/proceedings/sugi29/098-29.pdf
  • 77. Data Quality Dimensions
  • 78. Data Value Quality
  • 79. Data Representation Quality
  • 80. Data Model Quality
  • 81. Data Architecture Quality
  • 82. Guiding Principles • Manage data as a core organizational asset • Identify a gold record for all data elements • All data elements will have a standardized data definition, data type, and acceptable value domain • Leverage data governance for the control and performance of DQM • Use industry and international data standards whenever possible • Downstream data consumers specify data quality expectations • Define business rules to assert conformance to data quality expectations • Validate data instances and data sets against defined business rules • Business process owners will agree to and abide by data quality SLAs • Apply data corrections at the original source if possible • If it is not possible to correct data at the source, forward data corrections to the owner of the original source. Influence on data brokers to conform to local requirements may be limited • Report measured levels of data quality to appropriate data stewards, business process owners, and SLA managers
  • 83. Goals and Principles • To measurably improve the quality of data in relation to defined business expectations • To define requirements and specifications for integrating data quality control into the system development life cycle • To provide defined processes for measuring, monitoring, and reporting conformance to acceptable levels of data quality
  • 84. Primary Deliverables • Improved Quality Data • Data Management Operational Analysis • Data profiles • Data Quality Certification Reports • Data Quality Service Level Agreements
  • 85. Roles and Responsibilities • Suppliers: – External Sources – Regulatory Bodies – Business Subject Matter Experts – Information Consumers – Data Producers – Data Architects – Data Modelers – Data Stewards • Participants: – Data Quality Analysts – Data Analysts – Database Administrators – Data Stewards – Other Data Professionals – DRM Director – Data Stewardship Council • Consumers: – Data Stewards – Data Professionals – Other IT Professionals – Knowledge Workers – Managers and Executives – Customers