Managing Blind
A Data Quality and Data Governance Vade Mecum
By Peter R. Benson
Project Leader for ISO 8000, the International
Standard for Data Quality
Edited by Melissa M. Hildebrand
rev 2012.06.28
Copyright 2012 by Peter R. Benson
ECCMA Edition
ECCMA Edition License Notes:
This eBook is licensed for your personal enjoyment only. This
eBook may not be re-sold or given away to other people. If you
would like to share this eBook with another person, please
purchase an additional copy for each recipient. If you’re
reading this eBook and did not purchase it, or it was not
purchased for your use only, then please visit eccma.org and
purchase your own copy. It is also available at
Smashwords.com. Thank you for respecting the hard work of
this author.
***~~~***
Table of Contents
Preface
Basic principles
Chapter 1: Show me the money
Chapter 2: The law of unintended consequences
Chapter 3: Defining data and information
Chapter 4: The characteristics of data and information
Chapter 5: A simplified taxonomy of data
Chapter 6: Defining data quality
Chapter 7: Stating requirements for data
Chapter 8: Building a corporate business language
Chapter 9: Classifications
Chapter 10: Master data record duplication
Chapter 11: Data governance
Chapter 12: Where do we go from here?
Appendix 1: Managing a data cleansing process for assets, materials or services
Further readings
***~~~***
Chapter 1: Show me the money
Business is about profit. In the short term, profit is generated by reducing cost and increasing revenue; in the longer term, it is generated by managing risk.
Risk management is fundamental to the finance and insurance industries, where the ability to “predict” is at the core of the business. The difference between an actuary and a gambler is data. The actuary promotes their ability to record and analyze data, while the gambler must hide any such ability or risk being asked to leave the casino.
It is not surprising that data plays a key role in risk
management. Taking a “calculated risk” implies there is some
data upon which you can actually perform the calculation.
Other than in the finance and insurance industries, risk management is a hard sell to all but the most sophisticated managers. Cost reduction is a management favorite and an easier sell, but if you can associate data quality and governance with revenue growth you have hit a home run.
Most recorded examples of failures due to missing or incorrect data fall into the catastrophic loss category. This is only because of the enormity of the loss compared with the ease with which the error was made, or the tiny amount of data involved. There are whole websites devoted to listing the financial consequences of data errors. Some of my favorites include:
Timo Elliott’s YouTube account of a simple error in the property tax records that resulted in school budget cutbacks, as well as the Mars Climate Orbiter. The Mars Climate Orbiter was a $327 million project that came to an untimely end because of what has become known as the “metric mix-up.” The software on the Mars Climate Orbiter used the metric system, while the ground crew was entering data using the imperial system. There is also the story of Napoleon’s army, which was able to force the surrender of the Austrian army at Ulm when the Russians failed to turn up as scheduled, purportedly because they were using the Julian calendar and not the Gregorian calendar used by the Austrians; now that is what I call being stood up!
We all have personal stories about having to deal with the consequences of data errors, but my absolute personal favorite, at least in hindsight, involves the IRS. It all began one morning when I was handed a crisp envelope from the IRS. Inside the envelope was a letter explaining that I was going to be audited.
This sort of letter sends chills up your spine. When I recovered
and mustered the courage to call the number on the letter, I
was surprised to be speaking to an eminently reasonable
inspector. She asked me to confirm that I was claiming a
deduction for alimony paid to my ex-wife. Not exactly the sort
of thing you wanted to be reminded of, but I was happy to
confirm that this was indeed the case. “According to our
records you have been claiming this deduction for over ten
years,” again not something I cared to be reminded of, but the
answer was an easy “yes”. There was a worrying silence,
followed by, “I am afraid this is not possible.” The chills quickly
rolled up my spine again. “The social security number you have
entered on your tax return belongs to a fourteen-year-old female living in Utah.” To my utter surprise, and after a long exhale, I was glad to be able to correct the error, which turned out to be no more than a reversal of two digits in the social security number. You have to be impressed by the ability of the
IRS to connect the dots. I know I was, and I should have quit
while I was ahead. There had been recent news reports about
child brides in Utah, so my reply was “Well, at least she was from Utah.” It did not impress the IRS agent, who reminded me that the IRS office I was speaking to was in Utah; apparently humor is not a requirement for an IRS agent.
What jumps out from these examples is the multiplier effect. A simple data error can easily, and all too often does, mushroom into larger, far-reaching and lasting economic fallout. Data errors are rarely benign; more often than not they are catastrophic.
As a general rule, most managers are natural risk takers, and unless you are in the insurance industry, it is an uphill struggle to associate data quality and governance with meaningful value in the form of risk management or loss mitigation, with one notable exception. By focusing on resolving frequent small losses, rather than larger catastrophic losses, it is usually possible to correlate data quality and governance with reduced loss. Examples include reducing production downtime and delivery delays. These are most often considered to be revenue
generation and not cost reduction. The correlation between data quality and delivered production capacity or on-time delivery is generally accepted, and the calculation of the additional revenue generated is straightforward.
The role quality data plays in reducing cost is also generally accepted, although the specifics are poorly understood. There is clear evidence that simple vendor rationalization or group purchasing will drive down price. However, this can easily be overdone, to the point of exchanging a short-term price advantage for long-term reliance on larger suppliers able to reclaim the price advantage over the longer term. The ultimate
goal is to commoditize goods and services to the point where
there are many competing suppliers. This requires excellent
vendor, material and service master data. The rewards can be
huge, not only in highly competitive pricing but also in a
flexible and resilient supply chain.
As a general rule, most companies can save 10% of their total expenditure on materials and services simply through good procurement practices, which include maintaining up-to-date material and service masters supported by negotiated contracts. The challenge is to maintain that discipline in the face of urgent and unpredictable requirements for goods or services. Most companies make it difficult and time consuming to add a new item to their material or service masters, and the result is “free text” or “maverick” spend. These are off-contract purchases where the item purchased is not in the material or service master; instead, a “free text” description is entered in
the purchase order. Free text descriptions are rarely accompanied by rigorous classification, and as a result management reports start to lose accuracy as an ever increasing percentage of spend appears under the “miscellaneous” or “unclassified” headings, hardly a management confidence builder. It is interesting that most ERP systems require the absolute, unambiguous identification of the party to be paid, on the pretext that it is required by law, which it is, but they do not require the unambiguous identification of the items purchased. As many have found out to their considerable expense, the law also requires the identification and unambiguous description of the goods or services purchased. As federal and state governments go on the hunt for more tax revenue, we can expect to see greater scrutiny of purchase order line item descriptions to determine what is and what is not accepted as an “ordinary and necessary” business expense.
The most common scenario is a big effort to rationalize
procurement, which is then accompanied by a substantial drop
in free text spend. A big part of this effort is the identification
of duplicates. Vendor master duplicates are actually rare in
terms of the identification of the legal entity that needs to be
paid, but less rare is a lack of understanding of the relationship
between suppliers and how this impacts pricing. Customer
record duplication is actually surprisingly common, and worst of
all is material master duplication. Material master record
duplication all by itself can easily be responsible for up to a 30% price differential. Chapter 10 deals specifically with the issue of the identification and resolution of duplicate records, but suffice it to say it is not as straightforward an issue as many believe. Duplication is a matter of perspective and timing.
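To make the perspective problem concrete, the following is a minimal sketch, in Python, of the naive first pass many teams attempt: normalize free-form descriptions and compare them. The records and field values are invented for illustration; real duplicate resolution, as Chapter 10 explains, is far less straightforward.

import re
from collections import defaultdict

def normalize(description):
    # Lowercase, keep only letters, digits and fraction slashes,
    # then sort the tokens so word order is ignored.
    tokens = re.findall(r"[a-z0-9/]+", description.lower())
    return " ".join(sorted(tokens))

# Invented material master records, for illustration only.
records = {
    "MAT-001": "Bolt, hex head, 1/2 x 2, steel, zinc plated",
    "MAT-002": "Steel hex head bolt 1/2 x 2 zinc plated",
    "MAT-003": "Bolt, hex head, 1/2 x 2, steel",
}

groups = defaultdict(list)
for mat_id, description in records.items():
    groups[normalize(description)].append(mat_id)

for ids in groups.values():
    if len(ids) > 1:
        print("Possible duplicates:", ids)  # flags MAT-001 and MAT-002

Note that MAT-003 is not flagged even though it may describe the same bolt; whether an under-specified record counts as a duplicate depends on whether the missing plating attribute matters to the buyer, which is exactly the point about perspective.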
Without good data governance that keeps the master data up to date, data quality degrades and free text purchasing rises again. Free text spend is actually a great indicator of the success of a data quality and data governance program: the lower the free text spend, the more successful the program. It is not hard to justify a data quality and data governance program based on the initial measurable savings, but it is harder to maintain a program as a cost avoidance initiative.
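As a back-of-the-envelope illustration of the indicator, here is a minimal sketch that computes free text spend as a share of total spend from purchase order line items. The field names and amounts are assumptions invented for the example, not any particular ERP’s schema.

def free_text_share(line_items):
    # Fraction of total spend made up of free text (off-contract) purchases.
    total = sum(item["amount"] for item in line_items)
    free_text = sum(item["amount"] for item in line_items if item["free_text"])
    return free_text / total if total else 0.0

# Invented purchase order lines: amount in dollars, plus a free text flag.
line_items = [
    {"amount": 120_000, "free_text": False},  # cataloged item, on contract
    {"amount": 15_000, "free_text": True},    # free text description only
    {"amount": 65_000, "free_text": False},
]

print(f"Free text spend: {free_text_share(line_items):.1%}")  # prints 7.5%

Tracked month over month, a falling percentage is direct, measurable evidence that the governance program is holding.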
The ultimate goal is to associate a data quality and governance
program with revenue growth, preferably profitable revenue
growth. This can appear challenging but in reality it is not.
In 2010, The Economist Intelligence Unit’s editorial team conducted a survey of 602 senior executives. Of those surveyed, 96% considered data either “extremely” (69%) or “somewhat” (27%) valuable in creating and maintaining a competitive advantage.
Debra D'Agostino, Managing Editor of Business Research at the Economist Intelligence Unit and editor of the report, also states, “It’s not enough to merely collect the data; companies need to create strategies to ensure they can use information to get ahead of their competitors.”
How do you use data, let alone data quality and governance, as a competitive advantage? The most common answer is to look inwards and consider data as a source of knowledge to be mined for business intelligence. This has been done with phenomenal success, from targeting customers with highly contextual and relevant offers, to cutting-edge logistics, to product customization and everything in between.
Wal-Mart can rightly be said to be an information company that
uses retail to generate revenue and not a retail outlet that uses
information to maximize revenue. Data itself has value and
many companies have successfully turned their data into a
revenue source.
Roger Ehrenberg states it well when he says, “In today's world,
every business generates potentially valuable data. The
question is, are there ways of turning passive data into an
active asset to increase the value of the business by making its
products better, delivering a better customer experience, or
creating a data stream that can be licensed to someone for
whom it is most valuable?”
I have found that you can often convincingly calculate the value of data by identifying the data that is essential to a specific business process. Without the data, the process may not fail, but it would slow down, revenue would be lost and costs would increase. Data is rarely the only contributing factor to the efficiency of a specific process; however, by looking at how data contributes to the efficiency of the process, you can measure the value of the data.
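A minimal worked sketch of that calculation, with every figure invented for the example: compare the process with and without quality data, and attribute the difference in margin and rework cost to the data itself.

# Hypothetical figures for a single order-fulfillment process.
orders_per_day_with_data = 500   # throughput when master data is reliable
orders_per_day_without = 420     # throughput when staff must resolve bad records
margin_per_order = 40.0          # contribution margin per order, in dollars
rework_cost_per_day = 1_200.0    # extra labor spent correcting records

lost_margin = (orders_per_day_with_data - orders_per_day_without) * margin_per_order
value_of_data_per_day = lost_margin + rework_cost_per_day
print(f"Estimated value of the data: ${value_of_data_per_day:,.0f} per day")
# 80 orders x $40 + $1,200 = $4,400 per day attributable to the data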
Of course, there is nothing like a crisis to focus attention and liberate financial resources quickly. In order to sell a data quality or data governance program, it helps if you can find a burning bridge, and if you cannot find one that is actually on fire, it is not unknown to find one you can set on fire, or at the very least to point to the enormous and imminent risk of fire. It really does work; ask any politician.
Any good data quality or data governance specialist will tell you
“Show me the data and I will show you the money.”
***~~~***