Clarity Cloudworks
illuminating issues before they become problems
Development and Operations are not the only groups in IT
Data Teams
• Are focused on urgent, unplanned work
• Traditionally operate the systems they develop, because they don’t perceive that hand-off is possible
• Scant theory; what little writing exists is technology-focused
The DataOps Manifesto
Whether referred to as data science,
data engineering, data management,
big data, business intelligence, or the
like, through our work we have come to
value in analytics:
https://www.dataopsmanifesto.org/
Individuals and interactions
over
processes and tools 
Working analytics
over
comprehensive documentation
Customer collaboration
over
contract negotiation
Experimentation, iteration, and feedback
over
extensive upfront design
Cross-functional ownership of operations
over
siloed responsibilities
Do you have Bad Data?
In the absence of information, rumour becomes widely believed.
Rumour is biased toward emotion, which in workplaces tends to be negative.
What problems does data quality cause?
• Data / ETL pipelines crash, resulting in unavailable, stale, or incorrect data
• > 80% of Data Scientists’ time spent collecting data
• Incorrect data is used for decisions or published
• Doubts about data hurt morale and discourage evidence-based decision making
What is Data Quality?
Data quality is good when people who inspect data see what they expect.
Data quality is bad when people are surprised by the data they see.
Jadqfbfa ??
A BC A
Document data characteristics and train people to know them
If you only learn one thing today:
In the absence of training and documentation most people will be surprised by the data even when nothing is wrong.
What do we want?

Evidence Based Decision Making

When do we want it?

After Peer Review
Data Testing
• Accuracy, Consistency, Completeness Tests
• On records and relationships
• Relationship Consistency Tests
Test Objectives
• Accuracy - is it true?
• Consistent - does it obey the rules?
• Complete - what is missing?
Data Test Scopes
• Within a record (SQL row, NoSQL document, etc.)
• Within a set (SQL Table, etc.)
• Within an Application (HRIS, ERP, etc.)*
• Across the organisation*
* - combinatorial
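The two narrowest scopes can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's implementation: the field names (`id`, `status`, `start_date`, `end_date`), the permitted status values, and the uniqueness rule are all hypothetical, and dates are assumed to be ISO-8601 strings so that plain string comparison orders them correctly.

```python
# Hypothetical rule: the values a status field is allowed to take.
PERMITTED_STATUS = {"active", "inactive", "pending"}


def check_record(record: dict) -> list[str]:
    """Record scope: checks that need nothing but the record itself."""
    problems = []
    # Complete: what is missing? (required fields are an assumption)
    for field in ("id", "status"):
        if record.get(field) is None:
            problems.append(f"missing {field}")
    # Permissible values for a single field.
    status = record.get("status")
    if status is not None and status not in PERMITTED_STATUS:
        problems.append(f"status {status!r} not permitted")
    # Consistent: a rule between two fields of the same record.
    start, end = record.get("start_date"), record.get("end_date")
    if start is not None and end is not None and end < start:
        problems.append("end_date before start_date")
    return problems


def check_set(records: list[dict]) -> list[str]:
    """Set scope: a rule that only makes sense across records (unique ids)."""
    seen, problems = set(), []
    for record in records:
        record_id = record.get("id")
        if record_id in seen:
            problems.append(f"duplicate id {record_id}")
        seen.add(record_id)
    return problems
```

Application and organisation scopes follow the same pattern, but the number of possible rules grows combinatorially with the number of record types and applications involved.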
Monitoring
Monitor Data as if it is Infrastructure
What | When | Where | Who
Code | Event driven: commit / PR | Test | Developers fix errors
Infrastructure | Constantly at tight intervals | Production | Automated repair, failover to Ops
Data | Constantly | Production | Automated repair, failover to data steward
[Diagram: the Value Pipeline (Data, Production, Value) and the Innovation Pipeline (Idea, Development), with continuous data monitoring, continuous application monitoring, and periodic code testing.]
Pipelines
• Monitor each step in the pipeline
• If steps are idempotent, kill and retry once any step whose measures are anomalous
• Raise an incident if the retry is also anomalous
• Insert data quality gates between steps in test design and in response to incidents
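The retry-once rule above could look roughly like the following sketch. The names `run_step_with_retry`, `PipelineIncident`, and the `is_anomalous` callback are hypothetical, and the sketch assumes each step returns a dictionary of its measures. Killing a step that is still running is left out for brevity; here the anomaly is judged after the step finishes.

```python
from typing import Callable


class PipelineIncident(Exception):
    """Raised when a step cannot be retried or is still anomalous after one retry."""


def run_step_with_retry(
    step: Callable[[], dict],
    is_anomalous: Callable[[dict], bool],
    idempotent: bool,
) -> dict:
    """Run a step, retry it once if its measures look anomalous, else escalate."""
    measures = step()
    if not is_anomalous(measures):
        return measures
    if not idempotent:
        # Re-running a non-idempotent step could duplicate or corrupt data.
        raise PipelineIncident(f"anomalous and not safe to retry: {measures}")
    measures = step()  # the single retry
    if is_anomalous(measures):
        raise PipelineIncident(f"anomalous after retry: {measures}")
    return measures
```

In practice `is_anomalous` might compare the measures against a historical baseline; the rule used to decide what counts as anomalous is outside the scope of this sketch.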
Pipeline Measures
For each step in a data pipeline:
• Duration
• Cost (BUFFER_GETS, PAGE_READS, CPU Seconds)
• Records in
• Records out
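As an illustration, these four measures can be captured by wrapping each step. The sketch below is an assumption-laden example: `StepMeasures` and `measure_step` are hypothetical names, steps are assumed to take and return in-memory lists, and CPU seconds of the current process stand in for cost. Database counters such as BUFFER_GETS or PAGE_READS would have to come from the database itself.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class StepMeasures:
    name: str
    duration_s: float  # wall-clock duration
    cpu_s: float       # CPU seconds used by this process (a stand-in for cost)
    records_in: int
    records_out: int


def measure_step(
    name: str, fn: Callable[[list], Iterable], records: list
) -> tuple[list, StepMeasures]:
    """Run one pipeline step and record its duration, cost and record counts."""
    wall_start, cpu_start = time.perf_counter(), time.process_time()
    output = list(fn(records))
    measures = StepMeasures(
        name=name,
        duration_s=time.perf_counter() - wall_start,
        cpu_s=time.process_time() - cpu_start,
        records_in=len(records),
        records_out=len(output),
    )
    return output, measures
```

Storing one `StepMeasures` row per step per run gives the history needed to tell a normal run from an anomalous one.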
Quality Measures
• Accuracy and completeness checks are measured as the number of errors and the error % for every scope and time period
• Consistency checks are measured as the number of errors and the error % for each rule and time period
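One possible way to compute these measures for consistency rules is sketched below. The function name `quality_measures`, the `period` field, and the example rule in the usage note are hypothetical; each rule is assumed to be a function that returns True when a record passes.

```python
from collections import defaultdict
from typing import Callable


def quality_measures(
    records: list[dict],
    rules: dict[str, Callable[[dict], bool]],
    period_key: str = "period",
) -> dict:
    """Return the error count and error % for each (rule, time period) pair."""
    counts = defaultdict(lambda: {"checked": 0, "errors": 0})
    for record in records:
        for rule_name, passes in rules.items():
            cell = counts[(rule_name, record[period_key])]
            cell["checked"] += 1
            if not passes(record):
                cell["errors"] += 1
    return {
        key: {
            "errors": cell["errors"],
            "error_pct": 100.0 * cell["errors"] / cell["checked"],
        }
        for key, cell in counts.items()
    }
```

For example, with a rule `{"amount_non_negative": lambda r: r["amount"] >= 0}` and two records in period "2024-01", one of them negative, the measure for that rule and period is 1 error and an error % of 50.0.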
How to Test
Scope | Real World Accuracy | Cache Accuracy | Complete | Consistent
Record | Talk to people (call centre verification) | Compare to system of record | Permissible values | Rules within the record
Set | n/a | Compare to system of record | Reconciliation | Rules within the set
Application | n/a | n/a | n/a | Rules between types
Organisation | n/a | n/a | n/a | Rules between applications
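Two of the techniques named in this table, comparing a copy to its system of record and reconciling a set, can be sketched as follows. Both functions and their parameters are hypothetical, data is assumed to fit in memory as dictionaries keyed by record id, and totals are assumed to be integers (for example cents) so that they compare exactly.

```python
def compare_to_system_of_record(cache: dict, system_of_record: dict) -> dict:
    """Cache accuracy: keyed comparison of a copy against its source."""
    return {
        # In the source but absent from the copy.
        "missing": sorted(k for k in system_of_record if k not in cache),
        # In the copy but unknown to the source.
        "unexpected": sorted(k for k in cache if k not in system_of_record),
        # Present in both, but with different values.
        "mismatched": sorted(
            k for k in cache
            if k in system_of_record and cache[k] != system_of_record[k]
        ),
    }


def reconcile(rows: list[dict], expected_count: int, expected_total: int, field: str) -> dict:
    """Set reconciliation: compare row count and a control total with the source."""
    return {
        "count_diff": len(rows) - expected_count,
        "total_diff": sum(row[field] for row in rows) - expected_total,
    }
```

A reconciliation with both differences at zero does not prove every record is right, but a non-zero difference is cheap evidence that something was lost or duplicated.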
When to Test
Scope | Real World Accuracy | Cache Accuracy | Complete | Consistent
Record | Infrequent | Regular | Every read | Every read
Set | n/a | Regular | Regular | Regular
Application | n/a | n/a | Regular | Regular
Organisation | n/a | n/a | n/a | Regular
The journey of a thousand applications starts with a single test.
steven@claritycloudworks.com
+64 27 620 1237
claritycloudworks.com
Steven Ensslen

Measuring Data Quality with DataOps
