Data-Ed Online: Approaching Data Quality

Good data is like good water: best served fresh, and ideally well-filtered. Data Management strategies can produce tremendous procedural improvements and increased profit margins across the board, but only if the data being managed is of high quality. This webinar will illustrate how organizations with chronic business challenges can often trace the root of the problem to poor Data Quality. Determining how Data Quality should be engineered provides a useful framework for using Data Quality management effectively in support of business strategy. This, in turn, allows organizations to identify business problems more quickly, to distinguish data problems caused by structural issues from those caused by practice-oriented defects, and to prevent both from recurring.

Learning Objectives:

Help you understand foundational Data Quality concepts based on the DAMA Guide to the Data Management Body of Knowledge (DAMA DMBoK), as well as guiding principles, best practices, and steps for improving Data Quality at your organization
Demonstrate how chronic business challenges for organizations are often rooted in poor Data Quality
Share case studies illustrating the hallmarks and benefits of Data Quality success

Data-Ed Online: Approaching Data Quality

  1. Approaching Data Quality
     Engineering Business Success Stories
     Peter Aiken, PhD, peter.aiken@anythingawesome.com, +1.804.382.5957
     © Copyright 2021 by Peter Aiken, https://anythingawesome.com
     Peter Aiken, Ph.D.
     • I've been doing this a long time
     • My work is recognized as useful
     • Associate Professor of IS (vcu.edu)
     • Institute for Defense Analyses (ida.org)
     • DAMA International (dama.org)
     • MIT CDO Society (iscdo.org)
     • Anything Awesome (plusanythingawesome.com)
     • Experienced w/ 500+ data management practices worldwide
     • Multi-year immersions
       – US DoD (DISA/Army/Marines/DLA)
       – Nokia
       – Deutsche Bank
       – Wells Fargo
       – Walmart …
     • 12 books and dozens of articles
     • DAMA International President 2009-2013/2018/2020
     • DAMA International Achievement Award 2001 (with Dr. E. F. "Ted" Codd)
     • DAMA International Community Award 2005
  2. Approaching Data Quality
     What Data Quality Leaders are thinking and doing about it...
     Gareth Shercliff, Director, Delivery Innovation, Talend
  3. What most organizations are looking for
     • Control Costs with IoT and Automation
     • Turning Data into Revenue with Analytics & AI
     • Transform Customer Experience with a Customer 360 View
     • Accelerate Innovation with the Cloud
     • Minimize Risks with Compliance and Privacy
     You’re not the only one to struggle with it
     • Data complexity: data sources increase 5x in 5 years [1]
     • Efficiency crisis: 67% of time spent searching & preparing data [2]
     • Financial waste: poor data costs an average of $12.8 million per year [3]
     Sources
     [1] IDC Data Age 2025 for Seagate
     [2] IDC, End-User Survey Results Deployment and Data Intelligence in 2019, doc #US45652419
     [3] Survey Analysis: 12 Actions to Improve Your Data Quality, Gartner 2021
  4. Why it’s so hard
     • Siloed everywhere
     • Data & Tools complexity
     • Too few Experts
     • Data Velocity
     • Exponential Demand for analytics
  5. Just hiring someone will not scale at speed
     • Existing Model: too few technical experts cannot deliver trusted data at the speed of demand
     • Target Model: immediate expertise, platform and best practices; high quality data at speed
     [Figure: the Existing Model and the Target Model, each rated from low to high on costs, scalability, and data quality]
  6. What if you could add the scale and speed your business requires without hiring full-time technical experts?
  7. Talend Data Quality Service: our unique and fastest way to operationalize quality data at scale
     • A Proven, Flexible Data Quality Framework: accelerate data trust with a pre-built rules library, custom rules, and ready-to-use Data Quality Dashboards
     • Data Quality Experts: engage skilled practitioners for data quality analysis and cleansing work
     • Consistent Data Quality Insight: manage data quality continuously to enable data as a competitive differentiator
  8. Talend Data Quality as a Service in motion
     • Discover: search, find and profile data to understand structure and typology through various analyses and indicators
     • Standardize: convert, format, validate, enrich, mask
     • Consolidate: match, deduplicate... and get the golden record through survivorship activities
     • Operationalize: collaborate, leverage business knowledge and have IT industrialize it
     • Monitor (monitoring/alerting): measure, analyse, control and improve
     Continuous improvement: combining unique expertise, practices & tools to deliver a shorter time to value
  9. Start getting insights into the quality of your data today
     Take advantage of Talend’s Data Quality Service
     • Accelerate your efficiency: benefit immediately from Talend’s DQS team expertise to get high quality data at speed and fuel effective business decisions across your organization.
     • Ensure continuous and consistent data quality insights: get the peace of mind that your senior leadership, business users and data owners are operating based on known data quality and accuracy.
     • Get the scalability required by your business demands: no matter the data volumes or number of datasets, DQS helps you scale along with your growing business demands.
  10. Healthy data, healthy business. https://www.talend.com/contact-sales/
  11. Program
      • Approaching Data Quality
        – Definitions
        – Causes can be difficult to discern
        – Data quality challenges are the root cause of most IT and business failures
        – Must be built on leverage
        – Requires a programmatic approach to be most effective
        – Early business cases often have a dual purpose
        – High quality data requires architecture and engineering
      • What do we need to get better at?
        – Systems thinking
        – Not looking at data quality in isolation
        – Understanding data ROT
        – Not underestimating the role of culture
        – Developing repeatable capabilities/core data quality expertise
      • How do we get better?
        – Refocus the request around business outcomes
        – Leadership
        – Program focus
        – Math (cost or investment?)
        – Storytelling/Practice
      • Takeaways and Q&A
      https://www.youtube.com/watch?v=uL2PsmlGn9g
  12. Definitions
      • Quality Data
        – Fit for purpose: meets the requirements of its authors, users, and administrators (from Martin Eppler)
        – Synonymous with information quality, since poor data quality results in inaccurate information and poor performance
      • Data Quality Management
        – "Planning, implementation and control activities that apply quality management techniques to measure, assess, improve, and ensure data quality"
        – Encompasses life cycle activities
        – Includes supporting processes from change management, etc.
        – Continuous improvement process requiring core capabilities
      • Data Quality Engineering
        – Recognition that data quality solutions cannot be managed but must be engineered
        – Data quality engineering concepts are generally not known and understood within IT or business!
      Spinach/Popeye story from http://it.toolbox.com/blogs/infosphere/spinach-how-a-data-quality-mistake-created-a-myth-and-a-cartoon-character-10166
      DQ Effort Pattern
      [Figure: 80% / 20% split of time spent; from The DAMA Guide to the Data Management Body of Knowledge © 2017 by DAMA International]
  13. Hidden Data Factories
      https://hbr.org/2016/09/bad-data-costs-the-u-s-3-trillion-per-year
      https://en.wikipedia.org/wiki/Theory_of_constraints
      • Work products are delivered to customers
      • Knowledge workers: 80% looking for stuff, 20% doing useful work
      • Department A
      • Department B
        1. Check A's work
        2. Make any corrections
        3. Complete B's work
        4. Deliver to Department C
      • Department C
        1. Check B's work
        2. Make any corrections
        3. Complete C's work
        4. Deliver to Customer
        5. Deal with consequences
      Blind Persons and the Elephant
      http://www.dailymirror.lk/print/opinion/editorial-we-need-to-become-channels-of-peace/172-27164
      • It is like a fan!
      • It is like a snake!
      • It is like a wall!
      • It is like a rope!
      • It is like a tree!
  14. Events are not always recognized as data quality challenges?
      • Letters from a bank
      • A very expensive, very small data rounding error
      • Health data story
      • The chocolate story
      • Covid-19
      A congratulations letter from another bank
      Problems
      • Bank did not know it made an error
      • Tools alone could not have prevented this error
      • Lost confidence in the ability of the bank to manage customer funds
  15. Port of Seattle: 2.52" ➜ 2.5"
      • Needed trench for electrical cable 2.52", delivered 2.5"
      • $1M required to rent other facilities while new cable is obtained
      • Either rounding or truncation could explain
        – "We need to get a summary on all of this," he said. "How did the mistake occur? Who's at fault? What are the damages? And how is money going to be recovered?"
      Why Britain has 17,000 pregnant men
      This research, published as a letter this week in the British Medical Journal, was meant to draw attention to how much data gets entered incorrectly in the country’s medical system. These guys weren’t turning up at the doctor for pregnancy-related services. Instead, they were at their doctor for procedures that had medical codes similar to those of midwifery and obstetric services. With a misplaced keystroke here or there, an annual physical could become a consultation with a midwife.
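The Port of Seattle point, that either rounding or truncation could explain the 2.52" to 2.5" change, can be checked with a minimal sketch. It assumes the specification was reduced to one decimal place somewhere along the way (the slide does not say where), and the helper name `to_one_decimal` is hypothetical.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP


def to_one_decimal(value: str, mode: str) -> Decimal:
    """Reduce a measurement to one decimal place by rounding or truncating.

    Hypothetical helper for illustration; `mode` is "round" or "truncate".
    """
    rounding = ROUND_HALF_UP if mode == "round" else ROUND_DOWN
    return Decimal(value).quantize(Decimal("0.1"), rounding=rounding)


required = "2.52"  # trench width from the slide, in inches
rounded = to_one_decimal(required, "round")
truncated = to_one_decimal(required, "truncate")

# Both operations produce the same delivered value, so the delivered value
# alone cannot tell us which one happened.
shortfall = Decimal(required) - rounded
print(rounded, truncated, shortfall)
```

Because both paths give the same result, finding the cause means tracing where the value lost precision, not inspecting the final number.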
  16. Are you gonna tell us the chocolate story again?
      Why using Microsoft's tool caused Covid-19 results to be lost
      https://www.bbc.com/news/technology-54423988?es_p=12801491
  17. https://www.bbc.com/news/technology-54423988?es_p=12801491
      • Since 2007 should have been forced to use .xlsx (1,000,000+ rows)
      • Used .xls (65,000 rows)
      • Additional data was dropped without notification
      Data quality best practices depend on both
      • Practice-Oriented: practice-oriented activities focus on the capture and manipulation of data
        – Failure in rigor when capturing/manipulating data
        – Allowing imprecise or incorrect data to be collected when requirements specify otherwise
        – Presenting data out of sequence
      • Structure-Oriented: structure-oriented activities focus on the data implementation
        – Data and metadata arranged imperfectly
        – Data is captured but inaccessible
        – When incorrect data is provided as the correct response
      Quality Data: "Fit for purpose"
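The failure described above, rows dropped without notification, is one a simple guard can surface. Here is a minimal sketch, assuming the row count is known before export. The exact worksheet limits (65,536 rows for .xls and 1,048,576 for .xlsx) are the published Excel limits behind the slide's rounded figures; the function and exception names are hypothetical.

```python
XLS_MAX_ROWS = 65_536       # legacy .xls worksheet row limit
XLSX_MAX_ROWS = 1_048_576   # .xlsx worksheet row limit


class RowLimitExceeded(Exception):
    """Raised instead of silently dropping rows (hypothetical name)."""


def check_fits(row_count: int, file_format: str) -> int:
    """Return the remaining row capacity, or raise if the data would not fit."""
    limits = {"xls": XLS_MAX_ROWS, "xlsx": XLSX_MAX_ROWS}
    limit = limits[file_format]
    if row_count > limit:
        raise RowLimitExceeded(
            f"{row_count} rows exceed the .{file_format} limit of {limit}"
        )
    return limit - row_count


print(check_fits(65_000, "xls"))   # fits, with a small margin left
print(check_fits(70_000, "xlsx"))  # fits comfortably in the newer format
```

Raising an error rather than truncating turns a silent practice-oriented defect into a visible one.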
  18. Poor data manifests as multifaceted organizational challenges
      Root cause analysis is required to diagnose
      [Figure: business challenges arising from IT systems, IT processes, business systems, and business processes, all leading to poor results]
  19. Many DQ challenges are unique and/or context specific!
      Burning Bridge
      • Something bad happened
        – Imperfect data was to blame
      • Someone needs to fix
        – Poor quality data
      • You currently have management's attention
        – It is wise to ensure you also have their understanding
      • "Do something" often leads to "Buy something"
        – Mostly technology-based
      • Get data quality-ing!
        – A fool with a tool is still a fool
      • Something is accomplished
        – Most often all the funding is used up
      • Early cases have a dual purpose
        – Make the case that this will fix the immediate challenge
        – Illustrate why a programmatic approach is preferable
  20. Leverage is an Engineering Concept
      • Using proper engineering techniques, a human can lift a bulk that weighs much more than the human
      [Figure: lever with weights labeled 1 kg, 10 kg, and 11 kg]
      A holistic approach to obtaining data leverage
      https://www.computerhope.com/jargon/f/framework.htm
      • Organizational Data
      • People: knowledge workers supplemented by data professionals
      • Process: guided by strategy
      • Technology
      Reducing ROT increases data leverage
  21. Data Leverage is a multi-use concept
      • Permits organizations to better manage their data
        – Within the organization, and
        – With organizational data exchange partners
        – In support of the organizational mission
      • Leverage
        – Obtained by implementation of data-centric technologies, processes, and human skill sets
        – Focus on the non-ROT data
      • The bigger the organization, the greater potential leverage exists
      • Treating data more asset-like simultaneously
        – Lowers organizational IT costs and
        – Increases organizational knowledge worker productivity
      Concrete example of data leverage
      • Reference: controls accessible data values
        – Countries where we do business?
        – Types of accounts available?
        – Controlled vocabulary items
      • Master: controls access to system capabilities
        – Are you a member of our premium club?
        – Authorizing uses/users?
        – Common/standard data structures
      • Transaction: instances of values
        – $5
        – Authorized
        – Like!
      • Without it
        – Cannot do business overseas?
        – Cannot determine product origin?
        – Cannot add a foreign language to the website?
        – Cannot select a valid menu item?
      Example based on: Dr. Christopher Bradley of DMAdvisors. He has more; ping him at chris.bradley@dmadvisors.co.uk
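The reference/master/transaction split above can be illustrated with a small sketch. Everything in it is assumed for illustration: the country codes, the customer records, and the `authorize` function are hypothetical and not taken from the slides.

```python
# Reference data: controls which data values are acceptable (hypothetical values).
REFERENCE_COUNTRIES = {"US", "GB", "DE"}

# Master data: controls access to system capabilities (hypothetical customers).
MASTER_CUSTOMERS = {
    "c1": {"premium": True},
    "c2": {"premium": False},
}


def authorize(customer_id: str, country: str, amount: int) -> str:
    """Build a transaction result from one reference check and one master check."""
    if country not in REFERENCE_COUNTRIES:
        return "rejected: country not in reference data"
    customer = MASTER_CUSTOMERS.get(customer_id)
    if customer is None or not customer["premium"]:
        return "rejected: customer not authorized"
    # Transaction data: a single instance of values.
    return f"authorized: ${amount}"


print(authorize("c1", "US", 5))
print(authorize("c1", "FR", 5))
print(authorize("c2", "US", 5))
```

The leverage comes from reuse: one maintained reference list and one master record serve every transaction that passes through.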
  22. Simple Math
      If X is invested in Y then outcome Z will result (Z > X)
      • At the beginning of a project,
      • Where the parties know the least about each other
      • All are expected to agree on the meaning of price, timing, and functionalities
      • Define X (some resources)
      • Define Y (cleaning 1 set of data)
      • Define Z (that data will be clean)
      Simple Math
      If $100 is invested in cleaning 1 set of data then outcome $1000 will result
      • Define X ($100)
      • Define Y (cleaning 1 set of data)
      • Define Z ($1000)
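The arithmetic behind the X/Y/Z framing can be written out as a minimal sketch, using the slide's own figures of $100 invested and a $1000 outcome. The function names are hypothetical.

```python
def net_gain(invested: float, outcome: float) -> float:
    """Z - X: what the organization is ahead by after the investment."""
    return outcome - invested


def return_multiple(invested: float, outcome: float) -> float:
    """Z / X: how many times over the investment is returned."""
    return outcome / invested


x = 100.0    # X: resources invested (from the slide)
z = 1000.0   # Z: outcome of cleaning one set of data (from the slide)

# The investment framing only holds when Z > X.
is_investment = z > x
print(net_gain(x, z), return_multiple(x, z), is_investment)
```

Stating X and Z explicitly is what lets the request be discussed as an investment rather than a cost.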
  23. Data is not a Project
      • Durable asset
        – An asset that has a usable life of more than one year
      • Reasonable project deliverables
        – 90 day increments
        – Data evolution is measured in years
      • Data
        – Evolves; it is not created
        – Significantly more stable
      • Readymade data architectural components
        – Prerequisite to agile development
      • Only alternative is to create additional data silos!
      Differences between Programs and Projects
      • Programs are Ongoing, Projects End
        – Managing a program involves long-term strategic planning and continuous process improvement, which are not required of a project
      • Programs are Tied to the Financial Calendar
        – Program managers are often responsible for delivering results tied to the organization's financial calendar
      • Program Management is Governance Intensive
        – Programs are governed by a senior board that provides direction, oversight, and control, while projects tend to be less governance-intensive
      • Programs Have Greater Scope of Financial Management
        – Projects typically have a straightforward budget, and project financial management is focused on spending to budget, while program planning, management and control are significantly more complex
      • Program Change Management is an Executive Leadership Capability
        – Projects employ a formal change management process, while at the program level change management requires executive leadership skills; program change is driven more by an organization's strategy and is subject to market conditions and changing business goals
      Adapted from http://top.idownloadnew.com/program_vs_project/ and http://management.simplicable.com/management/new/program-management-vs-project-management
      Your data quality program must last at least as long as your HR program!
  24. Making a Better Quality Data Sandwich
      [Figure: sandwich layers labeled Data supply, Data literacy, and Standard data]
      Leverage point: high performance automation
  25. Leverage point: high performance automation
      [Figure: sandwich layers labeled Standard data, Data supply, and Data literacy]
      • This cannot happen without engineering and architecture!
      • Quality engineering/architecture work products do not happen accidentally!
  26. Leverage point: high performance automation
      [Figure: sandwich layers labeled Data supply, Data literacy, and Standard data]
      • This cannot happen without data engineering and architecture!
      • Quality data engineering/architecture work products do not happen accidentally!
      USS Midway & Pancakes
      Why is this an excellent example of engineering?
      • It is tall
      • It has a clutch
      • It was built in 1942
      • It is cemented to the floor
      • It is still in regular use!
  27. Our barn had to pass a foundation inspection
      • Before further construction could proceed
      • No IT equivalent
      What does "data quality program" mean?
      https://blog.ducenit.com/data-quality-management
      • Ongoing commitment
        – Permits evolutionary improvement of the approach
      • Governance
        – Senior level coordination, direction, and control
      • Executive leadership capabilities
        – Change and risk management
      • Data quality approach inherits (above)
        – Budget, strategic priorities
        – Senior level attention and improving topical facility
        – Reasonable timelines/expectations
  28. Systems Thinking
      http://victorianscandal.wordpress.com/picturesque/rachel-olshausen/
      • A framework that is based on the belief that the component parts of a system can best be understood in the context of relationships with other systems, rather than in isolation.
      • The only way to fully understand why a problem or element occurs and persists is to understand the part in relation to the whole.
      Capra, F. (1996) The web of life: a new scientific understanding of living systems (1st Anchor Books ed). New York: Anchor Books. p. 30
  29. Input ➜ Process ➜ Output Diagram
      [Figure: input/process/output example with the labels Dough, Water, Make Crust, Pizza Crust, Make Pizza, and Pizza]
      Data Steward Quality Responsibilities
      • Inputs
        – Where does each of the data items I am responsible for come from?
        – Why are they produced?
        – What level of quality is required by 'my processes'?
      • Process
        – What business processes use the data within my fiduciary responsibility?
        – For what business purpose do they use each data item?
        – What role does quality play for my processes to contribute?
      • Output
        – What downstream business processes consume data that was under my fiduciary care?
        – For what purpose is each data item consumed?
        – What quality attributes are required by each downstream consumer?
  30. Interdependencies
      [Figure: Data Governance, ERP, and Data Quality shown as interdependent]
      Data Management Body of Knowledge (DM BoK V2) Practice Areas
      from The DAMA Guide to the Data Management Body of Knowledge 2E © 2017 by DAMA International
  31. Perfecting operations in 3 data management practice areas
      [Figure: practice areas labeled Data Strategy, Data Governance, BI/Warehouse, Metadata, and Data Quality, with three marked 1X; from The DAMA Guide to the Data Management Body of Knowledge 2E © 2017 by DAMA International]
      Separating the Wheat from the Chaff
  32. Separating the Wheat from the Chaff
      Is well organized data worth more?
      Pre-Information Age Metadata
      • Examples of information architecture achievements that happened well before the information age:
        – Page numbering
        – Alphabetical order
        – Table of contents
        – Indexes
        – Lexicons
        – Maps
        – Diagrams
      Example from: How to make sense of any mess by Abby Covert (2014) ISBN: 1500615994
      "While we can arrange things with the intent to communicate certain information, we can't actually make information. Our users do that for us."
      https://www.youtube.com/watch?v=60oD1TDzAXQ&feature=emb_logo
      https://www.youtube.com/watch?v=r10Sod44rME&t=1s
      https://www.youtube.com/watch?v=XD2OkDPAl6s
  33. Remove the structure and things fall apart rapidly
      • Better organized data increases in value
      Separating the Wheat from the Chaff
      • Data that is better organized increases in value
      • Poor data management practices are costing organizations money/time/effort
      • 80% of organizational data is ROT
        – Redundant
        – Obsolete
        – Trivial
      • The question is which data to eliminate?
        – Most enterprise data is never analyzed
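One way to make "which data to eliminate?" concrete is to label records as Redundant, Obsolete, or Trivial. This is a minimal sketch: the slides do not define tests for the three categories, so the rules here (duplicate payload, last use before a cutoff date, not used by any process) are assumptions, as are the `Record` type and sample data.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Record:
    key: str
    payload: str
    last_used: date
    used_by_process: bool  # hypothetical flag: does any business process read it?


def classify_rot(records: list[Record], cutoff: date) -> dict[str, str]:
    """Label each record as redundant, obsolete, trivial, or keep (assumed rules)."""
    seen_payloads: set[str] = set()
    labels: dict[str, str] = {}
    for record in records:
        if record.payload in seen_payloads:
            labels[record.key] = "redundant"   # same content already held
        elif record.last_used < cutoff:
            labels[record.key] = "obsolete"    # not used since the cutoff
        elif not record.used_by_process:
            labels[record.key] = "trivial"     # nothing depends on it
        else:
            labels[record.key] = "keep"
        seen_payloads.add(record.payload)
    return labels


sample = [
    Record("a", "x", date(2021, 1, 1), True),
    Record("b", "x", date(2021, 1, 1), True),
    Record("c", "y", date(2010, 1, 1), True),
    Record("d", "z", date(2021, 1, 1), False),
    Record("e", "w", date(2021, 1, 1), True),
]
labels = classify_rot(sample, cutoff=date(2018, 1, 1))
rot_share = sum(label != "keep" for label in labels.values()) / len(labels)
print(labels, rot_share)
```

In practice the hard part is agreeing on the rules with the business, which is why the slide frames elimination as a question.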
  34. Multiple Sources of Master/Reference Data
      • Payroll Application (3rd GL); Payroll Data (database)
      • R&D Applications (researcher supported, no documentation); R&D Data (raw)
      • Mfg. Applications (contractor supported); Mfg. Data (home grown database)
      • Marketing Application (4th GL, query facilities, no reporting, very large); Marketing Data (external database)
      • Finance Application (3rd GL, batch system, no source); Finance Data (indexed)
      • Personnel App. (20 years old, un-normalized data); Personnel Data (database)
      Culture's impact
      https://www.forbes.com/sites/ciocentral/2019/01/02/what-we-learned-from-top-execs-about-their-big-data-and-ai-initiatives/
      • 2019 challenges
        – 5% technology
        – 95% people/process
      • 2020 challenges
        – 10% technology
        – 90% people/process
  35. Change Management & Leadership
      Diagnosing Organizational Readiness
      • Culture is the biggest impediment to a shift in organizational thinking about data!
      adapted from the Managing Complex Change model by Lippitt, 1987
  36. Consistency Encourages Quality Analysis
      [Figure: business challenges arising from IT systems, IT processes, business systems, and business processes]
      Eliminating data debt requires a team with specialized skills deployed to create a repeatable process and develop sustained organizational skillsets
      Programmatic Data Quality Engineering
      1. Allow the form of the problem to guide the form of the solution
      2. Provide a means of decomposing the problem
      3. Feature a variety of tools simplifying system understanding
      4. Offer a set of strategies for evolving a design solution
      5. Provide criteria for evaluating the quality of the various solutions
      6. Facilitate development of a framework for developing organizational knowledge
  37. Structured Approaches to Data Quality
      • Use organizational challenges to guide the form of quality remediation
      • Decompose implementation in a manner that will be seen by all as helping to address specific challenges
      • Aid the implementation using a variety of techniques (not just tools)
      • Develop a series of progressively stronger strategies for addressing the challenges
      • Provide meaningful feedback on progress
      • Facilitate development of a data-centric framework for institutionalizing organizational data quality knowledge
      Theory of Constraints (TOC)
      https://en.wikipedia.org/wiki/Theory_of_constraints
      • A management paradigm that views any manageable system as being limited in achieving more of its goals by a small number of constraints (Eliyahu M. Goldratt)
      • There is always at least one constraint, and TOC uses a focusing process to identify the constraint and restructure the rest of the organization to address it
      • TOC adopts the common idiom "a chain is no stronger than its weakest link": processes, organizations, etc., are vulnerable because the weakest component can damage or break them or at least adversely affect the outcome
  38. The DQE Cycle
      • Deming cycle
      • "Plan-do-study-act" or "plan-do-check-act"
        – Identifying data issues that are critical to the achievement of business objectives
        – Defining business requirements for data quality
        – Identifying key data quality dimensions
        – Defining business rules critical to ensuring high quality data
      The DQE Cycle: (1) Plan
      • Plan for the assessment of the current state and identification of key metrics for measuring quality
      • The data quality engineering team assesses the scope of known issues
        – Determining cost and impact
        – Evaluating alternatives for addressing them
  39. The DQE Cycle: (2) Deploy
      • Deploy processes for measuring and improving the quality of data:
      • Data profiling
        – Institute inspections and monitors to identify data issues when they occur
        – Fix flawed processes that are the root cause of data errors or correct errors downstream
        – When it is not possible to correct errors at their source, correct them at their earliest point in the data flow
      The DQE Cycle: (3) Monitor
      • Monitor the quality of data as measured against the defined business rules
      • If data quality meets defined thresholds for acceptability, the processes are in control and the level of data quality meets the business requirements
      • If data quality falls below acceptability thresholds, notify data stewards so they can take action during the next stage
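The Monitor step, measuring data against defined business rules and notifying data stewards when quality falls below a threshold, can be sketched as follows. The rules, thresholds, field names, and sample rows are all hypothetical. Stewards are "notified" simply by returning the failing rules.

```python
from typing import Callable

Rule = Callable[[dict], bool]


def pass_rate(rows: list[dict], rule: Rule) -> float:
    """Share of rows that satisfy a business rule (1.0 for an empty data set)."""
    if not rows:
        return 1.0
    return sum(1 for row in rows if rule(row)) / len(rows)


def monitor(rows: list[dict], rules: dict[str, Rule],
            thresholds: dict[str, float]) -> list[tuple[str, float]]:
    """Return (rule name, pass rate) for every rule below its acceptability threshold."""
    alerts = []
    for name, rule in rules.items():
        rate = pass_rate(rows, rule)
        if rate < thresholds[name]:
            alerts.append((name, rate))
    return alerts


# Hypothetical business rules and thresholds.
rules = {
    "country_present": lambda row: bool(row.get("country")),
    "amount_non_negative": lambda row: row["amount"] >= 0,
}
thresholds = {"country_present": 0.9, "amount_non_negative": 0.7}

rows = [
    {"country": "US", "amount": 5},
    {"country": "", "amount": 10},
    {"country": "GB", "amount": -1},
    {"country": "DE", "amount": 3},
]

alerts = monitor(rows, rules, thresholds)
print(alerts)  # rules the data stewards should act on in the next stage
```

Keeping rules and thresholds as data rather than code lets the Plan step own them while the Monitor step only evaluates them.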
  40. The DQE Cycle: (4) Act
  • Act to resolve any identified issues to improve data quality and better meet business expectations
  • New cycles begin as new data sets come under investigation or as new data quality requirements are identified for existing data sets
  [Figure: Extended data life cycle model with metadata sources and uses]
  • Metadata Creation: Define Data Architecture; Define Data Model Structures
  • Metadata Structuring: Implement Data Model Views; Populate Data Model Views
  • Metadata Refinement: Correct Structural Defects; Update Implementation
  • Data Creation: Create Data; Verify Data Values
  • Data Manipulation: Manipulate Data; Update Data
  • Data Utilization: Inspect Data; Present Data
  • Data Assessment: Assess Data Values; Assess Metadata
  • Data Refinement: Correct Data Value Defects; Re-store Data Values
  • Central store: Metadata & Data Storage
  • Flows between phases: data performance metadata; data architecture; data architecture and data models; shared data; updated data; corrected data; architecture refinements; facts & meanings
  • Entry points marked on the figure: starting point for new system development; starting point for existing systems
  41. Data Quality Attributes
  Approaching Data Quality Engineering Success Stories Program
  • Approaching Data Quality
  – Definitions
  – Causes can be difficult to discern
  – Data quality challenges are the root cause of most IT and business failures
  – Must be built on leverage
  – Requires a programmatic approach to be most effective
  – Early business cases often have a dual purpose
  – High quality data requires architecture and engineering
  • What do we need to get better at?
  – Systems thinking
  – Not looking at data quality in isolation
  – Understanding data ROT
  – Not underestimating the role of culture
  – Developing repeatable capabilities/core data quality expertise
  • How do we get better?
  – Refocus the request around business outcomes
  – Leadership
  – Program focus
  – Math (cost or investment?)
  – Storytelling/Practice
  • Takeaways and Q&A
  42. Compare the utility of data quality conversation topics
  Engineers say: | Business wants to hear:
  Clean some data | Decrease the number of undeliverable targeted marketing ads
  Reorganize the database | Increase the ability of the salesforce to perform their own analyses
  Develop a taxonomy | Create a common vocabulary for the organization
  Optimize a query | Shave 1 second off a task that runs a billion times a day
  Reverse engineer the legacy system | Understand what was good about the old system so it can be formally preserved, and what was bad so it can be improved
  CDO Agenda
  • Inventory Data -> uncovering assets & decreasing ROT
  • Develop the first version of an organizational data strategy
  • Monetize your organization's data
  The CDO's goal is to better manage data as an organizational asset in support of the organizational mission!
  43. Data Asset Inventory (Implementation)
  1. Purpose is the goal of understanding, not definitions
  – Definitions are passive; purpose statements incorporate strategic elements, the rationale, and justification based on the need for data
  2. Inventoried data assets are categorized by how they are shared:
  A. Data items that are shared with external organizations
  B. Data items that are shared within the organization
  C. Data items that are not shared but are used to derive shared data items
  D. Data items not shared outside but used to support workgroup activities
  E. Organizational data ROT
  3. Assign each inventoried data asset to an existing subject area from which that data item best supports the organizational mission (e.g., PAY is part of BACK OFFICE OPERATIONS)
  – Based on (refine-able) purpose statements, primary subject-area allegiance is posited
  4. Identify, de-dupe, and harmonize data assets participating in synonym/homonym/other challenges; ensure only one item is designated as a (current) golden source
  5. Identify which data items are deemed to be sensitive or personal data items and what specific controls need to be in place
  6. Document all mapping rules for data items in categories 2A and 2B above
  Note: this exercise cannot be comprehensively performed in a single cycle, so, equally as important as the exercise itself, a process needs to be established so that this inventory can be easily updated as other data items are inevitably discovered
  What is Strategy?
  • Current use derived from military - a pattern in a stream of decisions [Henry Mintzberg]
  A thing
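Returning to the Data Asset Inventory steps above, the sharing categories (step 2), the golden-source rule (step 4), and the mapping-rule scope (step 6) can be sketched as a small data structure. This is only an illustration under stated assumptions: the class and function names, the `concept` field used to group synonyms, and the sample assets are hypothetical. Treating a concept with no golden source as a violation is also an assumption; the slide only says that no more than one item should hold that designation.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Sharing(Enum):
    EXTERNAL = "2A"        # shared with external organizations
    INTERNAL = "2B"        # shared within the organization
    DERIVES_SHARED = "2C"  # not shared, but used to derive shared items
    WORKGROUP = "2D"       # not shared outside, supports workgroup activities
    ROT = "2E"             # organizational data ROT


@dataclass
class DataAsset:
    name: str
    purpose: str           # purpose statement, not a passive definition
    sharing: Sharing
    subject_area: str
    concept: str           # hypothetical key that groups synonyms together
    golden_source: bool = False
    sensitive: bool = False


def golden_source_violations(assets):
    """Concepts that do not have exactly one golden source (step 4)."""
    counts = defaultdict(int)
    for asset in assets:
        counts[asset.concept] += int(asset.golden_source)
    return sorted(concept for concept, n in counts.items() if n != 1)


def needs_mapping_rules(assets):
    """Assets in categories 2A and 2B, whose mapping rules must be documented (step 6)."""
    shared = (Sharing.EXTERNAL, Sharing.INTERNAL)
    return [asset.name for asset in assets if asset.sharing in shared]


# Hypothetical inventory entries, for illustration only.
inventory = [
    DataAsset("EMP_PAY_AMT", "Pay employees accurately", Sharing.INTERNAL,
              "BACK OFFICE OPERATIONS", "pay amount",
              golden_source=True, sensitive=True),
    DataAsset("SALARY", "Local payroll reporting", Sharing.WORKGROUP,
              "BACK OFFICE OPERATIONS", "pay amount", golden_source=True),
    DataAsset("VENDOR_ID", "Identify suppliers on invoices", Sharing.EXTERNAL,
              "BACK OFFICE OPERATIONS", "vendor id", golden_source=True),
]

print("Golden-source conflicts:", golden_source_violations(inventory))
print("Mapping rules needed for:", needs_mapping_rules(inventory))
```

Because the inventory cannot be completed in one cycle, a structure like this is meant to be re-run as new data items are discovered and added.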
  44. [Figure: data quality focus quadrants; axes: Improve Operations, Innovation]
  • Q1: Organizations without a formalized data quality focus
  • Q2: Data Quality Focus: Increase organizational efficiencies/effectiveness
  • Q3: Data Quality Focus: Use data to create strategic opportunities
  • Q4: Data Quality Focus: both, simultaneously
  • Initially pick one or the other but not both
  Math
  • VCU
  – $5m 35 year faculty member
  – +$20 million in grants/funded research projects/student supplemental salaries
  • Collaborations
  – Range: $0 ← (range) → +$1.5 billion documented savings
  • My introduction is often:
  – Peter is a professor with a positive cash flow!
  45. A musical analogy that works for both practice and storytelling
  https://www.youtube.com/watch?v=4n1GT-VjjVs&frags=pl%2Cwn
  Pandemic Math Question? (a very bad week)
  [Chart: beds occupied by day, Monday through Friday: 3, 6, 12, 24, 48]
  • If demand at a 48-bed hospital facility is doubling daily …
  • … at what point does anyone notice that the hospital beds are becoming scarce?
  – Monday: 3 beds occupied
  – Tuesday: 6
  – Wednesday: ¾ of all beds were available
  – Yesterday: ½ of all beds were available
  – Today: zero beds available
  – Tomorrow …???
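The doubling arithmetic in the pandemic question above can be checked with a short sketch. The 48-bed capacity and the 3-bed Monday start come from the slide. The function name `occupancy_by_day` and the extra "Saturday" row (standing in for the slide's "Tomorrow") are assumptions added for illustration.

```python
def occupancy_by_day(capacity, initial_demand, days):
    """Return (day, beds demanded, beds free) with demand doubling each day."""
    rows = []
    demand = initial_demand
    for day in days:
        free = capacity - min(demand, capacity)  # free beds cannot go below zero
        rows.append((day, demand, free))
        demand *= 2
    return rows


CAPACITY = 48
days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"]
rows = occupancy_by_day(CAPACITY, 3, days)

for day, demand, free in rows:
    print(f"{day:<10} demand={demand:>3}  free={free:>2}  ({free / CAPACITY:.0%} available)")
```

Run as written, the Wednesday row shows 12 beds in demand with 36 (three quarters) still free, Friday shows none free, and the Saturday row shows demand for 96 beds, twice the facility's capacity. The point of the slide holds in the numbers: half the beds were still empty one day before the facility was full.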
  46. Approaching Data Quality Engineering Success Stories Program
  • Approaching Data Quality
  – Definitions
  – Causes can be difficult to discern
  – Data quality challenges are the root cause of most IT and business failures
  – Must be built on leverage
  – Requires a programmatic approach to be most effective
  – Early business cases often have a dual purpose
  – High quality data requires architecture and engineering
  • What do we need to get better at?
  – Systems thinking
  – Not looking at data quality in isolation
  – Understanding data ROT
  – Not underestimating the role of culture
  – Developing repeatable capabilities/core data quality expertise
  • How do we get better?
  – Refocus the request around business outcomes
  – Leadership
  – Program focus
  – Math (cost or investment?)
  – Storytelling/Practice
  • Takeaways and Q&A
  Famous 1990's Words?
  • Question:
  – Why haven't organizations taken a more proactive approach to data quality?
  • Answer:
  – Fixing data quality problems is not easy
  – It is dangerous -- they'll come after you
  – Your efforts are likely to be misunderstood
  – You could make things worse
  – Now you get to fix it
  • A single data quality issue can grow into a significant, unexpected investment
  47. High Quality Data is Critical
  • Information transparency
  • Analytics
  • Business Intelligence
  • Increasing efficiencies
  • Decreasing costs
  • Driving holistic decision-making across the organization
  Not Helpful
  • Information transparency $
  • Analytics $
  • Business Intelligence $
  • Increasing efficiencies $
  • Decreasing costs $
  • Driving holistic decision-making across the organization $
  Data Quality Dimensions
  48. Data Value Quality
  Data Representation Quality
  49. Data Model Quality
  Data Architecture Quality
  50. Upcoming Events
  • Essential Metadata Strategies (12 October 2021)
  • Necessary Prerequisites to Data Success: Exorcising the Seven Deadly Data Sins (9 November 2021)
  • Data Management vs. Data Governance Program (14 December 2021)
  Time: 19:00 UTC (2:00 PM NYC) | Presented by: Peter Aiken, PhD
  Note: In this .pdf, clicking any webinar title opens the registration link
  Event Pricing
  • 20% off directly from the publisher on select titles
  • My Book Store @ http://plusanythingawesome.com
  • Enter the code "anythingawesome" at the Technics bookstore checkout where it says to "Apply Coupon"
  51. Thank You!
  Peter.Aiken@AnythingAwesome.com +1.804.382.5957
  Book a call with Peter to discuss anything - https://anythingawesome.com/OfficeHours.html
  Data Things Happen
  Organizational Things Happen
  This approach only works if
  • We know where the data that needs to be fixed resides
  • We can communicate precisely and correctly amongst team members
  • We are adept with the correct technological support
  • …
  52. Winning Cards for data quality program success
  1. The project needs to be small
  – Projects should not be allowed to begin unless the data requirements for the entire project are verified
  2. The product owner or sponsor must be highly skilled
  – Few in IT have the requisite data skills and knowledge
  3. The process must be agile
  – Agile is a construction technique; data requires more planning before construction
  4. The agile team must be highly skilled in both the agile process and the technology
  – Few agile teams have requisite levels of data skills
  5. The organization must be highly skilled at emotional maturity
  – Few organizations understand data stuff
