CLEANER DATA. BETTER DECISIONS.
8 Ways to NOT Screw Up Your Data Quality Project
Let’s face it: if data quality were easy, everyone would have good data and it wouldn’t be such a hot topic. Yet despite all the tools and advice out there, selecting and implementing a comprehensive data quality solution still presents some hefty challenges. So how does a newly appointed Data Steward NOT mess up the data quality project? Here are a few pointers on how to avoid failure.



1. DON’T FORGET THE LITTLE PEOPLE
As with other IT projects, the top challenge for data quality projects is securing business stakeholder engagement throughout the process. But stakeholders are not just C-level executives. A data quality initiative should also involve department managers and even end-users within the company who must deal with the consequences of bad data as well as the impact of system changes. Marketing, for example, relies on accurate data to reach the correct audience and maintain a positive image. Customer Service depends on complete, accurate records to meet its KPIs. Finance, logistics and even manufacturing may need the data for effective operations or to inform future decisions. When it comes to obtaining business buy-in, it is critical for Data Stewards to think outside the box about how the organization uses (or could use) the data and then seek input from the relevant team members. The instinct might be to avoid decision by committee, but in the end it’s not worth the risk of developing a solution that does not meet business expectations.

2. BEWARE OF THE “KITCHEN SINK” SOLUTION
The appeal of an ‘umbrella’ data management solution, with the ease and convenience of one-stop shopping, can lure both managers and IT experts. In fact, contact data quality is often an add-on toolset that a major MDM or BI vendor offers simply to check the box. However, when your main concern is contact data, be sure to measure all your options against a best-of-breed standard before deciding on a vendor. That means understanding the difference between match quality and match quantity, determining the intrinsic value (for your organization) of integrated data quality processes, and not overlooking features (or quality) that might seem like nice-to-haves now but that, down the line, can make or break the success of your overall solution. Once you know the standard you are looking for in contact deduplication, address validation and single customer view, you can effectively evaluate whether those larger-scale solutions have the granularity needed to achieve the best possible contact data cleansing for your company. Building that broader data strategy is a worthy goal, but now is the time to make sure you don’t throw the data quality out with the proverbial bathwater.




                                                                                                      www.helpit.com
3. JUST BECAUSE YOU CAN, DOESN’T MEAN YOU SHOULD
When it comes to identifying the right contact data quality solution, most companies not only compare vendors to one another but also consider developing a solution in-house. If you have a reasonably well-equipped IT department (or consultant team), an in-house solution may well appear cheaper to develop, and several factors can cause organizations to ‘lean’ in that direction, including the desire to have ‘more control’ over the data or to eliminate security and privacy concerns.


There is a flip side to these perceived advantages, however, and it deserves consideration before you jump in. First, ask yourself: does your team really have the knowledge AND the bandwidth necessary to pull this off? Contact data cleansing is both art and science. Best-of-breed applications have been developed over years of trial and error and come with very deep knowledge bases and sophisticated match algorithms that can take a data quality project from 80% accuracy to 95% or greater. When you are dealing with millions or even billions of records, those extra percentage points matter. Keep in mind that even the best-intentioned developers may be all too eager to prove they can build a data quality solution, without much thought as to whether they should. Even if the initial investment is lower than for a purchased solution, how much revenue is lost (or not gained) by diverting resources to this initiative rather than to something more profitable? In-house solutions can be viable, as long as they are chosen for the right reasons and nothing is sacrificed in the long run.
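To put the gap between 80% and 95% accuracy in concrete terms, here is a back-of-the-envelope calculation. It is only a sketch: it assumes “accuracy” means the share of records a tool handles correctly, and the 10-million-record file and the `records_affected` helper are hypothetical. Whole-number percentages keep the arithmetic exact.

```python
def records_affected(total_records: int, low_pct: int, high_pct: int) -> int:
    """Return how many more records are handled correctly when accuracy
    rises from low_pct to high_pct (both whole-number percentages)."""
    if not 0 <= low_pct <= high_pct <= 100:
        raise ValueError("expected 0 <= low_pct <= high_pct <= 100")
    # Integer arithmetic avoids floating-point rounding on large record counts.
    return total_records * (high_pct - low_pct) // 100


# Hypothetical 10-million-record database, 80% vs. 95% accuracy.
print(records_affected(10_000_000, 80, 95))  # 1500000
```

On those assumptions the 15-point gap covers 1.5 million records; on a billion-record file it would be 150 million.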

4. NEVER USE SOMEONE ELSE’S YARDSTICK
Every vendor you evaluate will, naturally, tell you to measure by the benchmarks on which it performs best. So the only way to make a truly unbiased decision is to know ALL the benchmarks, decide for yourself which matter most to your company, and not be fooled by the fine print. For example:


   •	 The number of duplicates found is often touted as a key measure of an application’s efficacy, but that figure means something only if they are TRUE duplicates. Check this in a trial on your own data and choose the tool that finds the most TRUE duplicates while minimizing false matches.
   •	 Speed matters too, but make sure you know the run speeds on your data and on your equipment.
   •	 More ‘versatile’ solutions are great, as long as your users will really be able to take advantage of all the bells and whistles.
   •	 Likewise, the volume of records the solution can process should cover you today and for what you expect to be processing in the next two to five years, since this is not something you want to implement and then replace within a short time frame. Scalability matters as well.
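One way to apply the first point during a trial is to have a reviewer hand-verify the true duplicate pairs in a sample, then score each vendor’s output against that list. The sketch below assumes each tool’s results can be exported as pairs of record IDs; the record IDs, tool names and the `score_duplicates` helper are hypothetical, and precision and recall are used as standard measures rather than any vendor’s own metric.

```python
from typing import Dict, Iterable, Set, Tuple

Pair = Tuple[str, str]  # two record IDs the tool says are duplicates


def normalize(pairs: Iterable[Pair]) -> Set[Pair]:
    """Order each pair so (a, b) and (b, a) count as the same match."""
    return {(a, b) if a <= b else (b, a) for a, b in pairs}


def score_duplicates(reported: Iterable[Pair], verified: Iterable[Pair]) -> Dict[str, float]:
    """Compare a tool's reported pairs with pairs a reviewer confirmed as TRUE duplicates."""
    reported_set = normalize(reported)
    verified_set = normalize(verified)
    true_matches = reported_set & verified_set
    return {
        "reported": len(reported_set),
        "true_matches": len(true_matches),
        "false_matches": len(reported_set - verified_set),
        "missed": len(verified_set - reported_set),
        # Precision: share of reported pairs that really are duplicates.
        "precision": len(true_matches) / len(reported_set) if reported_set else 0.0,
        # Recall: share of the verified duplicates the tool actually found.
        "recall": len(true_matches) / len(verified_set) if verified_set else 0.0,
    }


# Hypothetical trial: three pairs confirmed by a reviewer, two tools' outputs.
verified = [("A1", "A2"), ("B1", "B2"), ("C1", "C2")]
tool_x = [("A1", "A2"), ("B2", "B1"), ("D1", "D2"), ("E1", "E2"), ("F1", "F2")]
tool_y = [("A2", "A1"), ("B1", "B2"), ("C1", "C2")]

for name, pairs in (("Tool X", tool_x), ("Tool Y", tool_y)):
    s = score_duplicates(pairs, verified)
    print(name, s["reported"], s["true_matches"], s["false_matches"])
# Tool X 5 2 3
# Tool Y 3 3 0
```

Here Tool X reports more duplicates than Tool Y, but three of its five pairs are false matches, which is exactly what a raw duplicate count hides.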


So use your own data file, test several software options and compare the results in your own environment, with your own users. And remember the intangibles, such as how long it will take to get the solution up and running and users trained, the quality of the reports, and so on. These targeted parameters, not what anyone else dictates, should be the measure of success for your chosen solution.




5. MIND YOUR OWN BUSINESS (TEST CASES, THAT IS)
Not all matching software is created equal, and the only way to determine which software will address your specific needs is to develop test cases that are relevant, representative examples of the data quality issues your organization is experiencing. Use them as the litmus test for which applications can best resolve those issues. Be detailed in developing these test cases so you can get down to the granular software features that address them. Here are a few examples to consider:


  •	 Do you have contact records with phonetic variations in their names?
  •	 Are certain fields prone to missing or incorrect data?
  •	 Do your datasets consistently have data in the wrong fields (e.g. names in address lines, postal codes in city fields)?
  •	 Is business name matching a major priority?
  •	 Do customers often have multiple addresses?


Once you have identified a specific list of recurring challenges within your data, pull several real-world examples from your actual database and include them in any data sample you send to vendors for trial cleansing. When reviewing the results, make sure the solutions you are considering actually find these matches in the trial. Each test case will require specific features and strengths that not all data quality software offers. Without this granular level of information about the names, addresses, emails, ZIP codes and phone numbers in your system, you will not be able to fully evaluate whether a given package can resolve them.
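Recurring problems like these can be captured as a small, repeatable set of test cases that every candidate tool must pass. The sketch below is illustrative only: the name pairs are invented, and `names_match` uses a compact version of the classic American Soundex phonetic code purely as a stand-in for whatever matcher a vendor supplies. It is not a recommendation, nor any product’s actual algorithm.

```python
# Hypothetical stand-in matcher: a compact American Soundex code.
_CODES = {ch: digit
          for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                                 ("L", "4"), ("MN", "5"), ("R", "6"))
          for ch in letters}


def soundex(name: str) -> str:
    """Return a 4-character Soundex code, e.g. 'Robert' -> 'R163'."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    result = [letters[0]]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = _CODES.get(ch, "")
        if code and code != prev:
            result.append(code)
        # Vowels reset the previous code; H and W do not, so same-coded
        # letters separated by H or W are coded only once.
        if ch not in "HW":
            prev = code
    return ("".join(result) + "000")[:4]


def names_match(a: str, b: str) -> bool:
    return soundex(a) == soundex(b)


# Each test case: (name A, name B, should they match?, problem it represents).
TEST_CASES = [
    ("Smith", "Smyth", True, "phonetic spelling variant"),
    ("Jon", "John", True, "silent letter"),
    ("Catherine", "Kathryn", True, "variant with a different first letter"),
    ("Robert", "Rupert", False, "different people, similar sound"),
]


def run_test_cases(cases):
    """Return the cases where the matcher's answer differs from the expected one."""
    return [(a, b, note) for a, b, expected, note in cases
            if names_match(a, b) != expected]


for a, b, note in run_test_cases(TEST_CASES):
    print(f"FAIL: {a} / {b} ({note})")
```

Run as written, the sketch reports two failures: Soundex misses “Catherine”/“Kathryn” because it always keeps the first letter, and it wrongly pairs “Robert” with “Rupert”. Surfacing both kinds of failure is the point of a good test set.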


6. REMEMBER IT’S NOT ALL BLACK AND WHITE
Contact data quality solutions are often presented as binary: they either find the match or they don’t. In fact, as mentioned earlier, some vendors tout the number of matches found as the key benchmark of efficacy. The problem with this view is that matching is not black and white. There is always a gray area of matches that ‘might be the same, but you can’t really be sure without inspecting each match pair’, so it is important to anticipate how large your gray area will be and have a plan for addressing it. This is where the false match/true match discussion comes into play.


True matches are just what they sound like, while false matches are contact records that look and sound alike to the matching engine but are, in fact, different. While it’s great when a software package can find lots of matches, the scary part is deciding what to do with them. Do you merge and purge them all? What if some are false matches? Which one do you treat as the master record? What information will you lose? What other consequences could flow from an incorrect decision?


The bottom line: know how your chosen data quality vendor or solution addresses the gray area. Ideally, you’ll want a solution that lets the user set the threshold of match strictness. A mass marketing mailing may err on the side of removing records in the gray area to minimize the risk of mailing duplicates, whereas customer data integration may require manual review of gray-area records to ensure they are all correct. If a solution doesn’t mention the gray area or offer a way of addressing it, that’s a red flag that the vendor does not understand data quality.
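The two-threshold idea can be sketched as follows: pairs scoring at or above an upper threshold are treated as matches, pairs below a lower threshold as non-matches, and everything in between is the gray area sent for review. The `difflib` ratio is only a crude stand-in for a real match score, and the policy names and threshold values are hypothetical examples, not recommendations.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Crude 0-to-1 similarity score; a stand-in for a real matching engine."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def classify(score: float, auto_match_at: float, review_at: float) -> str:
    """Place a pair in 'match', 'review' (the gray area) or 'no match'."""
    if not 0.0 <= review_at <= auto_match_at <= 1.0:
        raise ValueError("expected 0 <= review_at <= auto_match_at <= 1")
    if score >= auto_match_at:
        return "match"
    if score >= review_at:
        return "review"
    return "no match"


# Hypothetical policies: (auto_match_at, review_at).
POLICIES = {
    # Mass mailing: treat most gray-area pairs as duplicates to avoid mailing dupes.
    "mass_mailing": (0.75, 0.60),
    # Customer data integration: only near-identical pairs merge automatically.
    "data_integration": (0.95, 0.60),
}

score = similarity("Jon Smith, 12 High St", "Jonathan Smith, 12 High Street")
for policy, (auto_match_at, review_at) in POLICIES.items():
    print(policy, round(score, 2), classify(score, auto_match_at, review_at))
# mass_mailing 0.82 match
# data_integration 0.82 review
```

With the same score, the looser mailing policy treats the pair as a duplicate, while the stricter integration policy routes it to manual review.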




7. DON’T FORGET ABOUT FORMAT
Most companies do not have the luxury of one nice, cleanly formatted database where everyone follows the rules of entry. In fact, most companies have data stored in a variety of places, with incoming files muddying the waters on a daily basis. Users and customers are creative in entering information. Legacy systems often have inflexible data structures. Ultimately, every company has a variety of formatting anomalies that need to be considered when exploring data cleansing tools. To avoid finding out too late, pull together data samples from all your sources and run them during your trial. The data quality solution needs to combine data from systems with different structures and standards; otherwise, inconsistencies will migrate and continue to cause systemic quality problems.
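As a small illustration of combining data from different structures, the sketch below maps records from two hypothetical sources onto one shared layout and repairs one common anomaly, a US ZIP code typed into the city field. The source names, field names and the `to_common_layout` helper are all assumptions; a real solution would have to handle far more formats and countries.

```python
import re
from typing import Dict

# Hypothetical field mappings for two source systems with different layouts.
FIELD_MAPS = {
    "crm": {"FullName": "name", "Addr1": "address", "Town": "city", "Zip": "postal_code"},
    "webform": {"name": "name", "street": "address", "city": "city", "postcode": "postal_code"},
}

# A 5-digit US ZIP code, optionally followed by a ZIP+4 extension.
US_ZIP = re.compile(r"\b(\d{5}(?:-\d{4})?)\b")


def to_common_layout(record: Dict[str, str], source: str) -> Dict[str, str]:
    """Rename a source record's fields to a shared layout and fix one anomaly."""
    mapping = FIELD_MAPS[source]
    out = {target: record.get(field, "").strip() for field, target in mapping.items()}
    # Anomaly: ZIP code typed into the city field while the postal-code field is empty.
    match = US_ZIP.search(out["city"])
    if match and not out["postal_code"]:
        out["postal_code"] = match.group(1)
        out["city"] = (out["city"][:match.start()] + out["city"][match.end():]).strip(" ,")
    return out


crm_record = {"FullName": "Ann Lee", "Addr1": "5 Elm St", "Town": "Springfield", "Zip": "62704"}
web_record = {"name": "Ann Lee", "street": "5 Elm St", "city": "Springfield, 62704", "postcode": ""}

print(to_common_layout(crm_record, "crm"))
print(to_common_layout(web_record, "webform"))
```

Both calls produce the same normalized record, so the two sources can now be compared field by field.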


8. DON’T BE SHORT-SIGHTED
Wouldn’t it be nice if, once data is cleansed, the record set remained clean and static? It would be nice, but it isn’t realistic. Information constantly evolves, even in the most closed-loop system. Contact records represent real people with changing lives and, as a result, decay by at least 4 percent per year through deaths, moves, name changes, postal address changes or even contact preference updates. Business-side changes such as mergers and acquisitions, system changes, upgrades and staff turnover also drive data decay. The post-acquisition company often faces the task of either hybridizing systems or migrating data into the chosen solution. Project teams must not only consider record integrity but also update the business rules and filters that can affect data format and cleansing standards.
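Because decay compounds, even the 4 percent minimum adds up quickly. A minimal sketch, assuming a constant annual rate applied to the records that are still valid and no re-cleansing in between (the `share_still_valid` helper is hypothetical):

```python
def share_still_valid(annual_decay: float, years: int) -> float:
    """Fraction of records still accurate after `years`, assuming a constant
    annual decay rate applied to the records that remained valid."""
    if not 0.0 <= annual_decay < 1.0:
        raise ValueError("annual_decay must be in [0, 1)")
    return (1.0 - annual_decay) ** years


for years in (1, 3, 5):
    print(years, f"{share_still_valid(0.04, years):.1%}")
# 1 96.0%
# 3 88.5%
# 5 81.5%
```

On those assumptions, a file cleansed once and then left alone would be roughly 18.5 percent out of date after five years, and since 4 percent is a floor, the real figure may be higher.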


Valid data entered during the normal course of business (by customer service reps or by customers themselves) also contributes to ongoing changes within the data. Marketing may add new forms and data elements that will need to be accounted for in the database. Incoming lists and big data sources will muddy the water. Sales expansion will bring new audiences and languages, providing data in formats you haven’t anticipated. Remember, the only constant in data quality is change. If you begin with this assumption, you greatly improve your project’s likelihood of success. Identify the ways your data changes over time so you can plan ahead and establish a solution, or a set of business processes, that will scale with your business.


Data quality is hard. Unfortunately, there is no one-size-fits-all approach, and there isn’t even a single vendor that can solve all your data quality problems. However, by being aware of the common pitfalls and thoroughly evaluating any vendors involved, you can get your initiative off to the right start and give yourself the best possible chance of success.


If you are interested in learning more about helpIT systems’ data quality tools, please feel free to contact us for a Free Consultation and Trial.




  For the past 20 years, helpIT systems has been tightly focused on developing and
  delivering data quality technology that generates tangible and accurate results.
  With over 2,000 clients in 30 countries across 5 continents, helpIT is consistently
  raising the bar on data quality success.

  Americas, Australia, New Zealand: 866.332.7132
  UK, Europe, Asia: 011 +44 (0) 1372 225 900
  www.helpit.com

Más contenido relacionado

Último

Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 

Último (20)

Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 

Destacado

PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024Neil Kimberley
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)contently
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024Albert Qian
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsKurio // The Social Media Age(ncy)
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Search Engine Journal
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summarySpeakerHub
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next Tessa Mero
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentLily Ray
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best PracticesVit Horky
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project managementMindGenius
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...RachelPearson36
 
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Applitools
 
12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at WorkGetSmarter
 
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...DevGAMM Conference
 

Destacado (20)

Skeleton Culture Code
Skeleton Culture CodeSkeleton Culture Code
Skeleton Culture Code
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search Intent
 
How to have difficult conversations
How to have difficult conversations How to have difficult conversations
How to have difficult conversations
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best Practices
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
 
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
 
12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work
 
ChatGPT webinar slides
ChatGPT webinar slidesChatGPT webinar slides
ChatGPT webinar slides
 
More than Just Lines on a Map: Best Practices for U.S Bike Routes
More than Just Lines on a Map: Best Practices for U.S Bike RoutesMore than Just Lines on a Map: Best Practices for U.S Bike Routes
More than Just Lines on a Map: Best Practices for U.S Bike Routes
 
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
 

8 Ways to NOT Screw Up Your Data Quality Project

  • 1. systems save CLEANER DATA. BETTER DECISIONS. ^ 8 Ways to screw up Your Data Quality Project Let’s face it, if data quality were easy, everyone would have good data and it wouldn’t be such a hot topic. On the contrary, despite all the tools and advice out there, selecting and implementing a comprehensive data quality solution still presents some hefty challenges. So how does a newly appointed Data Steward NOT mess up the data quality project? Here are a few pointers on how to avoid failure. 1. Don’t FoRGet the little PeoPle As with other IT projects, the top challenge for data quality projects is securing business stakeholder engagement throughout the process. But this doesn’t just mean C-level executives. Stakeholders for a data quality initiative should also include department managers and even end-users within the company who must deal with the consequences of bad data as well as the impact of system changes. Marketing, for example, relies on data accuracy to reach the correct audience and maintain a positive image. Customer Service depends on completeness and accuracy of a record to meet their specific KPIs. Finance, logistics and even manufacturing may need to leverage the data for effective operations or even to feed future decisions. When it comes to obtaining business buy-in, it is critical for Data Stewards to think outside the box regarding how the organization uses (or could use) the data and then seek input from the relevant team members. While the instinct might be to avoid decision by committee, in the end, it’s not worth the risk of developing a solution that does not meet business expectations. 2. BeWaRe oF the “kitchen sink” solution The appeal of an ‘umbrella’ data management solution can lure both managers and IT experts, offering the ease and convenience of one-stop shopping. In fact, contact data quality can often be an add-on toolset offered by a major MDM or BI vendor - simply to check the box. 
However, when your main concern is contact data, be sure to measure all your options against a best-of-breed standard before deciding on a vendor. That means understanding the difference between match quality and match quantity, determining the intrinsic value (for your organization) of integrated data quality processes, and not overlooking features (or quality) that may seem like nice-to-haves now but can make or break the success of your overall solution down the line. Once you know the standard you are looking for with regard to contact deduplication, address validation and single customer view, you can effectively evaluate whether those larger-scale solutions have the granularity needed to achieve the best possible contact data cleansing for your company. While building that broader data strategy is a worthy goal, take care not to throw data quality out with the proverbial bathwater.

www.helpit.com
3. Just Because You Can Doesn’t Mean You Should

When it comes to identifying the right contact data quality solution, most companies not only compare vendors to one another but also consider developing a solution in-house. In fact, if you have a reasonably well-equipped IT department (or consultant team), an in-house solution may well appear cheaper to develop, and several factors can cause organizations to ‘lean’ in that direction, including the desire to have ‘more control’ over the data or to eliminate security and privacy concerns.

There is a flip side to these perceived advantages, however, that deserves consideration before jumping in. First, ask yourself: does your team really have the knowledge AND bandwidth necessary to pull this off? Contact data cleansing is both art and science. Best-of-breed applications have been developed over years of trial and error and come with very deep knowledge bases and sophisticated match algorithms that can take a data quality project from 80% accuracy to 95% or greater. When you are dealing with millions or even billions of records, those extra percentage points matter: on a file of one million records, the gap between 80% and 95% accuracy is 150,000 records. Keep in mind that even the best-intentioned developers may be all too eager to prove they can build a data quality solution, without much thought as to whether they should. And even if the initial investment is lower than for a purchased solution, how much revenue is lost (or not gained) by diverting resources to this initiative rather than to something more profitable? In-house solutions can be viable, as long as they are chosen for the right reasons and nothing is sacrificed in the long run.

4. Never Use Someone Else’s Yardstick

Every vendor you evaluate will encourage you to measure by the benchmarks on which it performs best.
So the only way to make a truly unbiased decision is to know ALL the benchmarks, decide for yourself which matter most to your company, and not be fooled by the fine print. For example:

• The number of duplicates found is often touted as a key measure of an application’s efficacy, but that figure is only valuable if they are all TRUE duplicates. Check this in an actual trial on your own data, and choose the tool that delivers the most TRUE duplicates while minimizing false matches.
• Speed matters too, but make sure you know the run speeds on your data and on your equipment.
• More ‘versatile’ solutions are great, as long as your users will really be able to take advantage of all the bells and whistles.
• Likewise, the volume of records the solution can process should cover you for today and for what you expect to be processing in the next two to five years, since this is not a solution you will want to implement and then replace within a short time frame. Hence, scalability matters as well.

So, use your own data file, test several software options, and compare the results in your own environment, with your own users. Also remember intangibles such as how long it will take to get the solution up and running and users trained, the quality of reports, and so on. These very targeted parameters, not what anyone else dictates, should be the measure of success for your chosen solution.
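One way to apply the “TRUE duplicates” check above is to hand-review a sample of your own records, write down which pairs really are duplicates, and score each vendor’s trial output against that answer key. The sketch below assumes each tool’s results can be exported as pairs of record IDs; the function names, record IDs and vendor outputs are hypothetical, chosen only to show how a tool that reports more duplicates can still be the weaker one. It uses the standard precision (share of reported pairs that are genuine) and recall (share of genuine duplicates found) measures.

```python
from typing import Dict, Iterable, Set, Tuple

# A candidate duplicate is a pair of record IDs from your own trial file.
Pair = Tuple[str, str]


def normalize_pairs(pairs: Iterable[Pair]) -> Set[Pair]:
    """Treat (a, b) and (b, a) as the same candidate duplicate pair."""
    return {(min(a, b), max(a, b)) for a, b in pairs}


def score_duplicates(reported: Iterable[Pair], true_pairs: Iterable[Pair]) -> Dict[str, float]:
    """Compare a tool's reported duplicate pairs with a hand-reviewed answer key."""
    found = normalize_pairs(reported)
    truth = normalize_pairs(true_pairs)
    true_hits = len(found & truth)    # reported pairs that really are duplicates
    false_hits = len(found - truth)   # false matches: alike to the engine, but different
    missed = len(truth - found)       # genuine duplicates the tool did not report
    return {
        "reported": len(found),
        "true": true_hits,
        "false": false_hits,
        "missed": missed,
        # Share of reported pairs that are genuine (0.0 if nothing was reported).
        "precision": true_hits / len(found) if found else 0.0,
        # Share of genuine duplicates the tool actually found.
        "recall": true_hits / len(truth) if truth else 0.0,
    }


# Hypothetical answer key from manually reviewing a sample of your own records.
answer_key = [("101", "102"), ("103", "104"), ("105", "106")]

# Hypothetical trial output from two vendors on that same sample.
tool_a = [("101", "102"), ("103", "104"), ("107", "108"), ("109", "110"), ("102", "105")]
tool_b = [("102", "101"), ("103", "104"), ("105", "106")]

for name, output in (("Tool A", tool_a), ("Tool B", tool_b)):
    s = score_duplicates(output, answer_key)
    print(f"{name}: reported={s['reported']} true={s['true']} false={s['false']} "
          f"missed={s['missed']} precision={s['precision']:.2f} recall={s['recall']:.2f}")
```

Here Tool A “finds” more duplicates (five pairs against three), but only two are genuine and it misses one, giving precision 0.40 and recall 0.67; Tool B reports every real duplicate with no false matches.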
5. Mind Your Own Business (Test Cases, That Is)

Not all matching software is created equal, and the only way to determine which software will address your specific needs is to develop test cases that are relevant, representative examples of the data quality issues your organization is experiencing. Use these as the litmus test for which applications can best resolve those issues, and make them detailed enough to expose the specific software features that address them. Here are a few examples to consider:

• Do you have contact records with phonetic variations in their names?
• Are certain fields prone to missing or incorrect data?
• Do your datasets consistently have data in the wrong fields (e.g., names in address lines or postal codes in city fields)?
• Is business name matching a major priority?
• Do customers often have multiple addresses?

Once you have identified a specific list of recurring challenges within your data, pull several real-world examples from your actual database and include them in any data sample you send to vendors for trial cleansing. When reviewing the results, make sure the solutions you are considering actually find these matches. Each test case will require specific features and strengths that not all data quality software offers. Without this granular level of information about the names, addresses, emails, zip codes and phone numbers in your system, you will not be able to fully evaluate whether a given product can resolve them.

6. Remember It’s Not All Black and White

Contact data quality solutions are often presented as binary: they either find the match or they don’t. In fact, as mentioned earlier, some vendors tout the number of matches found as the key benchmark for efficiency.
The problem with this perception is that matching is not black and white. There is always a gray area of matches that ‘might be the same, but you can’t really be sure without inspecting each match pair,’ so it is important to anticipate how large your gray area will be and to have a plan for addressing it. This is where the false match/true match discussion comes into play. True matches are just what they sound like, while false matches are contact records that look and sound alike to the matching engine but are, in fact, different. While it’s great when a software package can find lots of matches, the scary part is deciding what to do with them. Do you merge and purge them all? What if they are false matches? Which one do you treat as the master record? What information will you lose? What other consequences will flow from an incorrect decision?

The bottom line: know how your chosen data quality vendor or solution will address the gray area. Ideally, you’ll want a solution that lets the user set how strict the match threshold is. A mass marketing mailing may err on the side of removing records in the gray area to minimize the risk of mailing duplicates, whereas customer data integration may require manual review of gray records to ensure they are all correct. If a solution doesn’t mention the gray area or have a way of addressing it, that’s a red flag that the vendor does not understand data quality.
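The adjustable threshold described above can be pictured as two cut-offs on a match score: pairs at or above the upper cut-off are matches, pairs below the lower cut-off are non-matches, and everything in between is the gray area, resolved differently depending on the business use. In this minimal sketch the score comes from Python’s standard-library `difflib.SequenceMatcher`, used only as a crude stand-in for a real matching engine; the cut-offs (0.75 and 0.92), the policy names and the sample pair are illustrative assumptions, not recommended settings.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lower-case and collapse whitespace so trivial differences don't lower the score."""
    return " ".join(text.lower().split())


def match_score(a: str, b: str) -> float:
    """Stand-in similarity score in [0, 1]; a real matching engine is far more nuanced."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def classify(score: float, lower: float = 0.75, upper: float = 0.92) -> str:
    """Place a pair in one of three bands. The default cut-offs are illustrative only."""
    if not 0.0 <= lower <= upper <= 1.0:
        raise ValueError("expected 0 <= lower <= upper <= 1")
    if score >= upper:
        return "match"
    if score >= lower:
        return "gray"
    return "non-match"


def decide(score: float, policy: str) -> str:
    """Resolve the gray area according to a hypothetical business policy."""
    band = classify(score)
    if band != "gray":
        return band
    if policy == "mass_mailing":
        # Err on the side of suppressing likely duplicates rather than mailing twice.
        return "match"
    if policy == "customer_integration":
        # Route to a person: merging a false match can lose customer information.
        return "manual review"
    raise ValueError(f"unknown policy: {policy}")


score = match_score("Jon Smith, 12 High St", "John Smith, 12 High Street")
print(f"score={score:.2f}")
print("mass mailing:", decide(score, "mass_mailing"))
print("customer integration:", decide(score, "customer_integration"))
```

For the sample pair the stand-in score is about 0.89, inside the gray band, so the mailing policy suppresses it as a duplicate while the integration policy sends it for manual review. Widening or narrowing the band is how a user would trade review workload against the risk of false merges.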
7. Don’t Forget About Format

Most companies do not have the luxury of one nice, cleanly formatted database where everyone follows the rules of entry. In fact, most companies have data stored in a variety of places, with incoming files muddying the waters on a daily basis. Users and customers are creative in entering information. Legacy systems often have inflexible data structures. Ultimately, every company has a variety of formatting anomalies that need to be considered when exploring data cleansing tools. To avoid finding out too late, pull together data samples from all your sources and run them during your trial. The data quality solution needs to handle combining data from systems with different structures and standards; otherwise, inconsistencies will migrate and continue to cause systemic quality problems.

8. Don’t Be Short-Sighted

Wouldn’t it be nice if, once data is cleansed, the record set remained clean and static? It would, but it isn’t realistic. Information constantly evolves, even in the most closed-loop system. Contact records represent real people with changing lives and, as a result, decay by at least 4 percent per year through deaths, moves, name changes, postal address changes or even contact preference updates. Business-side changes such as acquisitions/mergers, system changes, upgrades and staff turnover also drive data decay. The post-acquisition company often faces the task of either hybridizing systems or migrating data into the chosen solution. Project teams must not only consider record integrity but also update business rules and filters that can affect data format and cleansing standards. Valid data entered into the system during the normal course of business (by customer service reps or by customers themselves) also contributes to ongoing changes within the data. Marketing may add new forms and data elements that will need to be accounted for in the database.
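To put the 4 percent decay figure in perspective, the short calculation below estimates how much of a freshly cleansed file goes stale over the two-to-five-year horizon discussed under item 4. It assumes, for illustration only, that decay compounds: each year, 4 percent of the records still accurate at the start of that year go out of date. Because the source gives 4 percent as a floor (“at least”), real figures could be higher. The function name and the one-million-record file are hypothetical.

```python
def stale_records(total: int, annual_decay: float, years: int) -> float:
    """Expected number of records gone stale after `years`, assuming compounding decay.

    Assumption: each year, `annual_decay` of the records that were still accurate at
    the start of that year become out of date, so the accurate share after n years
    is (1 - annual_decay) ** n.
    """
    if not 0.0 <= annual_decay <= 1.0:
        raise ValueError("annual_decay must be between 0 and 1")
    return total * (1.0 - (1.0 - annual_decay) ** years)


# Hypothetical file of one million cleansed contact records, decaying at 4% a year.
for years in (1, 2, 5):
    stale = stale_records(1_000_000, 0.04, years)
    print(f"after {years} year(s): ~{stale:,.0f} records stale ({stale / 10_000:.1f}%)")
```

Under this assumption, roughly 40,000 records are stale after one year, 78,400 after two and about 184,600 (18.5 percent) after five, before counting any of the business-side changes listed above.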
Incoming lists or big data sources will muddy the water further. Sales expansion will bring new audiences and languages, providing data in formats you haven’t anticipated. Remember, the only constant in data quality is change. If you begin with this assumption, you greatly improve your project’s likelihood of success. Identify the ways your data changes over time so you can plan ahead and establish a solution or set of business processes that will scale with your business.

Data quality is hard. Unfortunately, there is no one-size-fits-all approach, and there isn’t even a single vendor that can solve all your data quality problems. However, by being aware of some of the common pitfalls and doing a thorough and comprehensive evaluation of any vendors involved, you can get your initiative off to the right start and give yourself the best possible chance of success.

If you are interested in learning more about helpIT systems’ data quality tools, please feel free to contact us for a Free Consultation and Trial.

Cleaner Data. Better Decisions.

For the past 20 years, helpIT systems has been tightly focused on developing and delivering data quality technology that generates tangible and accurate results. With over 2,000 clients in 30 countries across 5 continents, helpIT is consistently raising the bar on data quality success.

Americas, Australia, New Zealand: 866.332.7132
UK, Europe, Asia: 011 +44 (0) 1372 225 900
www.helpit.com