Data Quality Challenges and Solution Approaches in Yahoo!'s Massive Data Environment
Data Quality Manager: Dan Defend
Data Quality Architect: Aparna Vani
DataVersity Webinar, September 29, 2011
Abstract: By applying industry principles and techniques, the Data Quality program has provided proactive and reactive system solutions to Audience data issues and their root causes. It does so by addressing the technical challenges of data quality at scale and by engaging the rest of the organization in the solution: from product teams, through the data stack (data sourcing, ETL, aggregations and analytics), to the analysts and sciences teams who consume the data. This methodology is now being scaled to all data across Yahoo!, including Search and Display Advertising.
© 2011 Yahoo!. All rights reserved.
Unlocking the Power of Data
MEDIA Technology
The Anatomy of a Yahoo! Web Page: Buzz, Targeted Content, Apps, Ads, Content, Y! links
What Does Yahoo! Do With Its Data?
Analytics & Business Insights: data-driven decisions
How many people visited Home Page today and what did they click on?
What impact did the Japan tsunami have on News and global engagement?
Targeting
What products are you interested in based on your recent web usage?
Advertisers pay a lot of $$ for good targeting.
Targeted content means better user engagement.
Experimentation
“Live user testing”
What layout do users like best? Which are most profitable?
Which is the Better Home Page?
Which Ad Position Makes More Money?
Yahoo! Has a LOT of Data
Leading Internet Portal and Software Supplier [1]
Serves 640 MM users or 84.5% of US internet users
Top ranked site in Mail, Messenger, Home Page, and more
Collects over 25 terabytes of behavioral data per day
2 U.S. Library of Congress equivalents every day
[1] US Yahoo! Audience Measurement Report. comScore, Jan 2011
Overview of Yahoo! Data Pipelines
Processes data from all Yahoo! properties web server logs and delivers audience engagement metrics
Display Advertising
Analytics and billing
Guaranteed and Non-guaranteed delivery ad campaigns
Exchange networks leverage other advertisers and publishers
Search Advertising
Analytics and billing
Yahoo! Data Pipeline (diagram): Web Pages, Ad Servers, Data Extraction, Extract / Transform / Load, Data Warehouse, Business Insights, Targeting, Reporting systems
Dimensions of Yahoo! Data Quality (DQ Team diagram): abuse traffic, invalid events, metadata integrity, external source agreement, events uncollected, key metric agreement
DATA QUALITY = BUSINESS UP
Impacts of poor quality of data to Yahoo!
$$ Loss: Revenue, Refunds
Wasted resources
Sciences frustration and attrition (“QA the data”)
Incorrect insights
Suboptimal targeting
Credibility loss: customers don’t trust the data
External impact
REAL $$!!! … if managed reactively
Audience DQ Solution Path
Circa 2007: Significant Opportunities for Improvement in Audience Pipeline
Property sourcing:
Is this a page view?  No standard
Tagging & server errors
Data dropped in collection system
Data discrepancies found by customer end of month or quarter
Abuse and robots skew metrics due to minimal traffic protection
Data sources that should agree, don’t
Internal customers don’t trust the data
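One of the items above, data sources that should agree but do not, lends itself to an automated check. The following is a minimal sketch and not the method described in the deck: the function names, the 2% tolerance, and the page-view counts are all hypothetical and chosen only for illustration.

```python
def relative_gap(a: float, b: float) -> float:
    """Relative difference between two measurements of the same metric."""
    larger = max(abs(a), abs(b))
    return 0.0 if larger == 0 else abs(a - b) / larger


def sources_agree(a: float, b: float, tolerance: float = 0.02) -> bool:
    """True when the two sources differ by no more than `tolerance`.

    The 2% default is an assumed threshold, not a value from the deck.
    """
    return relative_gap(a, b) <= tolerance


# Hypothetical daily page-view counts for one property from two systems.
web_log_page_views = 1_000_000
warehouse_page_views = 985_000

gap = relative_gap(web_log_page_views, warehouse_page_views)
agree = sources_agree(web_log_page_views, warehouse_page_views)
print(f"gap={gap:.2%} agree={agree}")
```

A check like this could run daily per key metric, with disagreements raised as tickets before customers find them at month or quarter end.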
We Started to Measure It
Source: ticket volumes and root cause analysis
Key Finding: >80% of data issues come from the source
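A measurement like the key finding above can be derived by tallying tickets by the pipeline stage where their root cause sits. The sketch below assumes a hypothetical ticket format of (stage, root cause) pairs. The sample tickets are invented for illustration and are not Yahoo! data.

```python
from collections import Counter


def share_by_stage(tickets):
    """Return the fraction of tickets attributed to each pipeline stage."""
    counts = Counter(stage for stage, _root_cause in tickets)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {stage: n / total for stage, n in counts.items()}


# Invented sample tickets: (stage, root cause). Illustration only.
tickets = [
    ("source", "tagging error"),
    ("source", "server error"),
    ("source", "no page view standard"),
    ("source", "tagging error"),
    ("source", "data dropped in collection"),
    ("etl", "transform bug"),
]

shares = share_by_stage(tickets)
for stage, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{stage}: {share:.0%}")
```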
Root Causes Differ Per Stage
Source: root cause analysis
Insights into point fixes and system solutions
Solution: Identification of issues followed by quick wins and system solutions
Developed DQ Methodology for large data systems
Improvements & Results
Central DQ Team Structure
DQ Proactive Standards: Building Data Quality into Products
Property DQ Standards: Sourcing and Consuming Clean Data
Diagram categories: Proactive, Reactive, Customer-Driven
Diagram labels: Data Issues; DQ Champs; Classification, triage, drive fixes; Metric/monitor priorities, pain points; Server setup cookbook and validation; Instrumentation validation: PV, URL, CSC
DQ Standards Overview (Proactive)
Support for DQ in the QE Cycle: Data Validation
Test Environment
E2E data validation tests covering major customer use cases in pre-release QE cycle
Note: Specific tools are not currently part of the DQ standard, but partnership in this area may make sense
Data Validation Coverage in QE: Checklist and Examples
Checklist:
Compare results from legacy system or previous version of system (with production data)
Suggest organizing per DQ dimensions: completeness, accuracy, validity, consistency, integrity
Examples:
Completeness: Include coverage to validate that the volume sent equals the volume received, processed and output.
Accuracy: Test that the data input equals the data output. If data is requested for a specific day in one time zone but fetched in another, the data will not be accurate.
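The completeness and accuracy examples above can be sketched as two small checks. This is an illustration under stated assumptions, not the deck's tooling: the function names are hypothetical, stage counts are assumed to match exactly (a real pipeline may legitimately filter records), and Pacific time is modeled as a fixed UTC-8 offset that ignores daylight saving.

```python
from datetime import date, datetime, time, timedelta, timezone

# Assumption: fixed UTC-8 offset stands in for Pacific time (no DST).
PACIFIC = timezone(timedelta(hours=-8))


def check_completeness(sent, received, processed, output):
    """Return the stage transitions whose record counts do not match."""
    counts = [("sent", sent), ("received", received),
              ("processed", processed), ("output", output)]
    return [f"{a}->{b}: {x} != {y}"
            for (a, x), (b, y) in zip(counts, counts[1:]) if x != y]


def day_window(day: date, tz: timezone):
    """Start (inclusive) and end (exclusive) of `day` in time zone `tz`."""
    start = datetime.combine(day, time.min, tzinfo=tz)
    return start, start + timedelta(days=1)


def in_day(event: datetime, day: date, tz: timezone) -> bool:
    """True when a timezone-aware event falls inside `day` as seen in `tz`."""
    start, end = day_window(day, tz)
    return start <= event < end


# Completeness: two records were lost between receive and process.
print(check_completeness(sent=100, received=100, processed=98, output=98))

# Accuracy: the same event belongs to different calendar days per zone.
event = datetime(2011, 9, 29, 2, 0, tzinfo=timezone.utc)
print(in_day(event, date(2011, 9, 29), timezone.utc))  # UTC day
print(in_day(event, date(2011, 9, 29), PACIFIC))       # Pacific day
```

The time zone case shows why a request for one day in one zone, served from data bucketed in another, returns a different set of events.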
Support for DQ in the QE Cycle: QE Coverage of DQ Features
Functional test coverage for built-in DQ features, e.g., in-line DQ checks
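As an illustration of what an in-line DQ check and its functional test might look like, here is a minimal sketch. The required-field schema, the function name, and the reject-reason format are all assumptions made for this example and are not taken from the deck.

```python
# Hypothetical schema: fields every event must carry with a non-empty value.
REQUIRED_FIELDS = ("event_id", "timestamp", "url")


def split_valid_invalid(events):
    """In-line check: pass complete events on, set aside the rest with a reason."""
    valid, rejected = [], []
    for event in events:
        missing = [f for f in REQUIRED_FIELDS if not event.get(f)]
        if missing:
            rejected.append((event, "missing: " + ", ".join(missing)))
        else:
            valid.append(event)
    return valid, rejected


def test_inline_check_rejects_incomplete_events():
    """Functional test covering the in-line check itself."""
    good = {"event_id": "e1", "timestamp": "2011-09-29T00:00:00Z",
            "url": "http://example.com/"}
    bad = {"event_id": "e2", "timestamp": "", "url": None}
    valid, rejected = split_valid_invalid([good, bad])
    assert valid == [good]
    assert rejected == [(bad, "missing: timestamp, url")]


test_inline_check_rejects_incomplete_events()
```

Keeping rejected events together with a reason, rather than dropping them silently, lets the rejects feed the same ticket and root-cause analysis described earlier.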
Data Sourcing Case Study
Problem not discovered for over 2 days.  Rollback occurred on the 3rd day.

Más contenido relacionado

La actualidad más candente

Data Quality
Data QualityData Quality
Data QualityVijaya K
 
Data quality and data profiling
Data quality and data profilingData quality and data profiling
Data quality and data profilingShailja Khurana
 
Horizons 2014 - Enterprise Solutions
Horizons 2014 - Enterprise SolutionsHorizons 2014 - Enterprise Solutions
Horizons 2014 - Enterprise SolutionsKeyMark
 
Data Quality Best Practices
Data Quality Best PracticesData Quality Best Practices
Data Quality Best PracticesDATAVERSITY
 
Oracle Enterprise Staffing Solutions
Oracle Enterprise Staffing SolutionsOracle Enterprise Staffing Solutions
Oracle Enterprise Staffing SolutionsBOSS Technologies
 
Corporate Data Quality Management Research and Services Overview
Corporate Data Quality Management Research and Services OverviewCorporate Data Quality Management Research and Services Overview
Corporate Data Quality Management Research and Services OverviewBoris Otto
 
TDWI Spotlight: Enabling Data Self-Service with Security, Governance, and Reg...
TDWI Spotlight: Enabling Data Self-Service with Security, Governance, and Reg...TDWI Spotlight: Enabling Data Self-Service with Security, Governance, and Reg...
TDWI Spotlight: Enabling Data Self-Service with Security, Governance, and Reg...Denodo
 
Virtual Governance in a Time of Crisis Workshop
Virtual Governance in a Time of Crisis WorkshopVirtual Governance in a Time of Crisis Workshop
Virtual Governance in a Time of Crisis WorkshopCCG
 
The what, why, and how of master data management
The what, why, and how of master data managementThe what, why, and how of master data management
The what, why, and how of master data managementMohammad Yousri
 
China data-mngnt-solution-market-report
China data-mngnt-solution-market-reportChina data-mngnt-solution-market-report
China data-mngnt-solution-market-reportssuser7709011
 
Sap information steward
Sap information stewardSap information steward
Sap information stewardytrhvk
 
Predictions for the Future of Graph Database
Predictions for the Future of Graph DatabasePredictions for the Future of Graph Database
Predictions for the Future of Graph DatabaseNeo4j
 
MDM Strategy & Roadmap
MDM Strategy & RoadmapMDM Strategy & Roadmap
MDM Strategy & Roadmapvictorlbrown
 
Real Time Analytics
Real Time AnalyticsReal Time Analytics
Real Time AnalyticsMohsin Hakim
 
Lean Master Data Management
Lean Master Data ManagementLean Master Data Management
Lean Master Data Managementnnorthrup
 
Master data management (mdm) & plm in context of enterprise product management
Master data management (mdm) & plm in context of enterprise product managementMaster data management (mdm) & plm in context of enterprise product management
Master data management (mdm) & plm in context of enterprise product managementTata Consultancy Services
 
Overall Approach to Data Quality ROI
Overall Approach to Data Quality ROIOverall Approach to Data Quality ROI
Overall Approach to Data Quality ROIFindWhitePapers
 

La actualidad más candente (20)

Data Quality
Data QualityData Quality
Data Quality
 
Bi&dw methodology
Bi&dw methodologyBi&dw methodology
Bi&dw methodology
 
Data quality and data profiling
Data quality and data profilingData quality and data profiling
Data quality and data profiling
 
Horizons 2014 - Enterprise Solutions
Horizons 2014 - Enterprise SolutionsHorizons 2014 - Enterprise Solutions
Horizons 2014 - Enterprise Solutions
 
Data Quality Best Practices
Data Quality Best PracticesData Quality Best Practices
Data Quality Best Practices
 
Oracle Enterprise Staffing Solutions
Oracle Enterprise Staffing SolutionsOracle Enterprise Staffing Solutions
Oracle Enterprise Staffing Solutions
 
Corporate Data Quality Management Research and Services Overview
Corporate Data Quality Management Research and Services OverviewCorporate Data Quality Management Research and Services Overview
Corporate Data Quality Management Research and Services Overview
 
5 Steps To Master Data Management
5 Steps To Master Data Management5 Steps To Master Data Management
5 Steps To Master Data Management
 
TDWI Spotlight: Enabling Data Self-Service with Security, Governance, and Reg...
TDWI Spotlight: Enabling Data Self-Service with Security, Governance, and Reg...TDWI Spotlight: Enabling Data Self-Service with Security, Governance, and Reg...
TDWI Spotlight: Enabling Data Self-Service with Security, Governance, and Reg...
 
Virtual Governance in a Time of Crisis Workshop
Virtual Governance in a Time of Crisis WorkshopVirtual Governance in a Time of Crisis Workshop
Virtual Governance in a Time of Crisis Workshop
 
The what, why, and how of master data management
The what, why, and how of master data managementThe what, why, and how of master data management
The what, why, and how of master data management
 
China data-mngnt-solution-market-report
China data-mngnt-solution-market-reportChina data-mngnt-solution-market-report
China data-mngnt-solution-market-report
 
Sap information steward
Sap information stewardSap information steward
Sap information steward
 
Predictions for the Future of Graph Database
Predictions for the Future of Graph DatabasePredictions for the Future of Graph Database
Predictions for the Future of Graph Database
 
MDM Strategy & Roadmap
MDM Strategy & RoadmapMDM Strategy & Roadmap
MDM Strategy & Roadmap
 
Real Time Analytics
Real Time AnalyticsReal Time Analytics
Real Time Analytics
 
Data Warehouse 102
Data Warehouse 102Data Warehouse 102
Data Warehouse 102
 
Lean Master Data Management
Lean Master Data ManagementLean Master Data Management
Lean Master Data Management
 
Master data management (mdm) & plm in context of enterprise product management
Master data management (mdm) & plm in context of enterprise product managementMaster data management (mdm) & plm in context of enterprise product management
Master data management (mdm) & plm in context of enterprise product management
 
Overall Approach to Data Quality ROI
Overall Approach to Data Quality ROIOverall Approach to Data Quality ROI
Overall Approach to Data Quality ROI
 

Similar a Data Quality Challenges & Solution Approaches in Yahoo!’s Massive Data

Data Analytics & Hospital Asset Managemenr
Data Analytics & Hospital Asset ManagemenrData Analytics & Hospital Asset Managemenr
Data Analytics & Hospital Asset ManagemenrDoctor's Bazaar
 
State of the Market - Data Quality in 2023
State of the Market - Data Quality in 2023State of the Market - Data Quality in 2023
State of the Market - Data Quality in 2023RTTS
 
Optimize Your Healthcare Data Quality Investment: Three Ways to Accelerate Ti...
Optimize Your Healthcare Data Quality Investment: Three Ways to Accelerate Ti...Optimize Your Healthcare Data Quality Investment: Three Ways to Accelerate Ti...
Optimize Your Healthcare Data Quality Investment: Three Ways to Accelerate Ti...Health Catalyst
 
Building a Robust Big Data QA Ecosystem to Mitigate Data Integrity Challenges
Building a Robust Big Data QA Ecosystem to Mitigate Data Integrity ChallengesBuilding a Robust Big Data QA Ecosystem to Mitigate Data Integrity Challenges
Building a Robust Big Data QA Ecosystem to Mitigate Data Integrity ChallengesCognizant
 
Software Productivity Framework
Software Productivity Framework Software Productivity Framework
Software Productivity Framework Zinnov
 
The Role of Audit Analysis in CyberSecurity
The Role of Audit Analysis in CyberSecurityThe Role of Audit Analysis in CyberSecurity
The Role of Audit Analysis in CyberSecurityTyrone Grandison
 
Sample Risk Assessment Report- QuantumBanking.pdf
Sample Risk Assessment Report- QuantumBanking.pdfSample Risk Assessment Report- QuantumBanking.pdf
Sample Risk Assessment Report- QuantumBanking.pdfSathishKumar960827
 
OberservePoint - The Digital Data Quality Playbook
OberservePoint - The Digital Data Quality  PlaybookOberservePoint - The Digital Data Quality  Playbook
OberservePoint - The Digital Data Quality PlaybookObservePoint
 
From Data to Insights: How IT Operations Data Can Boost Quality
From Data to Insights: How IT Operations Data Can Boost QualityFrom Data to Insights: How IT Operations Data Can Boost Quality
From Data to Insights: How IT Operations Data Can Boost QualityCognizant
 
Observability in highly distributed systems
Observability in highly distributed systemsObservability in highly distributed systems
Observability in highly distributed systemsDevOps Indonesia
 
Leverage Big Data Analytics to Enhance Clinical Trials from Planning to Execu...
Leverage Big Data Analytics to Enhance Clinical Trials from Planning to Execu...Leverage Big Data Analytics to Enhance Clinical Trials from Planning to Execu...
Leverage Big Data Analytics to Enhance Clinical Trials from Planning to Execu...Saama
 
Achieve Excellence through Customer Experience
Achieve Excellence through Customer ExperienceAchieve Excellence through Customer Experience
Achieve Excellence through Customer ExperienceNaveen Agarwal
 
Keeping the Pulse of Your Data: Why You Need Data Observability to Improve D...
Keeping the Pulse of Your Data:  Why You Need Data Observability to Improve D...Keeping the Pulse of Your Data:  Why You Need Data Observability to Improve D...
Keeping the Pulse of Your Data: Why You Need Data Observability to Improve D...Precisely
 
Yuriy Gaiduchok: The Quest for Product Non-Functionality (UA)
Yuriy Gaiduchok: The Quest for Product Non-Functionality (UA)Yuriy Gaiduchok: The Quest for Product Non-Functionality (UA)
Yuriy Gaiduchok: The Quest for Product Non-Functionality (UA)Lviv Startup Club
 
Leveraging Automated Data Validation to Reduce Software Development Timeline...
Leveraging Automated Data Validation  to Reduce Software Development Timeline...Leveraging Automated Data Validation  to Reduce Software Development Timeline...
Leveraging Automated Data Validation to Reduce Software Development Timeline...Cognizant
 
593 Managing Enterprise Data Quality Using SAP Information Steward
593 Managing Enterprise Data Quality Using SAP Information Steward593 Managing Enterprise Data Quality Using SAP Information Steward
593 Managing Enterprise Data Quality Using SAP Information StewardVinny (Gurvinder) Ahuja
 
28 - Panorama Necto 14 support - visualization & data discovery solution
28 - Panorama Necto 14 support - visualization & data discovery solution28 - Panorama Necto 14 support - visualization & data discovery solution
28 - Panorama Necto 14 support - visualization & data discovery solutionPanorama Software
 

Similar a Data Quality Challenges & Solution Approaches in Yahoo!’s Massive Data (20)

dq_fail.pdf
dq_fail.pdfdq_fail.pdf
dq_fail.pdf
 
Data Analytics & Hospital Asset Managemenr
Data Analytics & Hospital Asset ManagemenrData Analytics & Hospital Asset Managemenr
Data Analytics & Hospital Asset Managemenr
 
State of the Market - Data Quality in 2023
State of the Market - Data Quality in 2023State of the Market - Data Quality in 2023
State of the Market - Data Quality in 2023
 
Optimize Your Healthcare Data Quality Investment: Three Ways to Accelerate Ti...
Optimize Your Healthcare Data Quality Investment: Three Ways to Accelerate Ti...Optimize Your Healthcare Data Quality Investment: Three Ways to Accelerate Ti...
Optimize Your Healthcare Data Quality Investment: Three Ways to Accelerate Ti...
 
Building a Robust Big Data QA Ecosystem to Mitigate Data Integrity Challenges
Building a Robust Big Data QA Ecosystem to Mitigate Data Integrity ChallengesBuilding a Robust Big Data QA Ecosystem to Mitigate Data Integrity Challenges
Building a Robust Big Data QA Ecosystem to Mitigate Data Integrity Challenges
 
Software Productivity Framework
Software Productivity Framework Software Productivity Framework
Software Productivity Framework
 
The Role of Audit Analysis in CyberSecurity
The Role of Audit Analysis in CyberSecurityThe Role of Audit Analysis in CyberSecurity
The Role of Audit Analysis in CyberSecurity
 
Sample Risk Assessment Report- QuantumBanking.pdf
Sample Risk Assessment Report- QuantumBanking.pdfSample Risk Assessment Report- QuantumBanking.pdf
Sample Risk Assessment Report- QuantumBanking.pdf
 
OberservePoint - The Digital Data Quality Playbook
OberservePoint - The Digital Data Quality  PlaybookOberservePoint - The Digital Data Quality  Playbook
OberservePoint - The Digital Data Quality Playbook
 
Data Science and Analytics
Data Science and Analytics Data Science and Analytics
Data Science and Analytics
 
From Data to Insights: How IT Operations Data Can Boost Quality
From Data to Insights: How IT Operations Data Can Boost QualityFrom Data to Insights: How IT Operations Data Can Boost Quality
From Data to Insights: How IT Operations Data Can Boost Quality
 
Observability in highly distributed systems
Observability in highly distributed systemsObservability in highly distributed systems
Observability in highly distributed systems
 
Leverage Big Data Analytics to Enhance Clinical Trials from Planning to Execu...
Leverage Big Data Analytics to Enhance Clinical Trials from Planning to Execu...Leverage Big Data Analytics to Enhance Clinical Trials from Planning to Execu...
Leverage Big Data Analytics to Enhance Clinical Trials from Planning to Execu...
 
Achieve Excellence through Customer Experience
Achieve Excellence through Customer ExperienceAchieve Excellence through Customer Experience
Achieve Excellence through Customer Experience
 
Keeping the Pulse of Your Data: Why You Need Data Observability to Improve D...
Keeping the Pulse of Your Data:  Why You Need Data Observability to Improve D...Keeping the Pulse of Your Data:  Why You Need Data Observability to Improve D...
Keeping the Pulse of Your Data: Why You Need Data Observability to Improve D...
 
Teja Resume (1)
Teja Resume (1)Teja Resume (1)
Teja Resume (1)
 
Yuriy Gaiduchok: The Quest for Product Non-Functionality (UA)
Yuriy Gaiduchok: The Quest for Product Non-Functionality (UA)Yuriy Gaiduchok: The Quest for Product Non-Functionality (UA)
Yuriy Gaiduchok: The Quest for Product Non-Functionality (UA)
 
Leveraging Automated Data Validation to Reduce Software Development Timeline...
Leveraging Automated Data Validation  to Reduce Software Development Timeline...Leveraging Automated Data Validation  to Reduce Software Development Timeline...
Leveraging Automated Data Validation to Reduce Software Development Timeline...
 
593 Managing Enterprise Data Quality Using SAP Information Steward
593 Managing Enterprise Data Quality Using SAP Information Steward593 Managing Enterprise Data Quality Using SAP Information Steward
593 Managing Enterprise Data Quality Using SAP Information Steward
 
28 - Panorama Necto 14 support - visualization & data discovery solution
28 - Panorama Necto 14 support - visualization & data discovery solution28 - Panorama Necto 14 support - visualization & data discovery solution
28 - Panorama Necto 14 support - visualization & data discovery solution
 

Más de DATAVERSITY

Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...
Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...
Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...DATAVERSITY
 
Data at the Speed of Business with Data Mastering and Governance
Data at the Speed of Business with Data Mastering and GovernanceData at the Speed of Business with Data Mastering and Governance
Data at the Speed of Business with Data Mastering and GovernanceDATAVERSITY
 
Exploring Levels of Data Literacy
Exploring Levels of Data LiteracyExploring Levels of Data Literacy
Exploring Levels of Data LiteracyDATAVERSITY
 
Building a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business GoalsBuilding a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business GoalsDATAVERSITY
 
Make Data Work for You
Make Data Work for YouMake Data Work for You
Make Data Work for YouDATAVERSITY
 
Data Catalogs Are the Answer – What is the Question?
Data Catalogs Are the Answer – What is the Question?Data Catalogs Are the Answer – What is the Question?
Data Catalogs Are the Answer – What is the Question?DATAVERSITY
 
Data Catalogs Are the Answer – What Is the Question?
Data Catalogs Are the Answer – What Is the Question?Data Catalogs Are the Answer – What Is the Question?
Data Catalogs Are the Answer – What Is the Question?DATAVERSITY
 
Data Modeling Fundamentals
Data Modeling FundamentalsData Modeling Fundamentals
Data Modeling FundamentalsDATAVERSITY
 
Showing ROI for Your Analytic Project
Showing ROI for Your Analytic ProjectShowing ROI for Your Analytic Project
Showing ROI for Your Analytic ProjectDATAVERSITY
 
How a Semantic Layer Makes Data Mesh Work at Scale
How a Semantic Layer Makes  Data Mesh Work at ScaleHow a Semantic Layer Makes  Data Mesh Work at Scale
How a Semantic Layer Makes Data Mesh Work at ScaleDATAVERSITY
 
Is Enterprise Data Literacy Possible?
Is Enterprise Data Literacy Possible?Is Enterprise Data Literacy Possible?
Is Enterprise Data Literacy Possible?DATAVERSITY
 
The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...
The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...
The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...DATAVERSITY
 
Emerging Trends in Data Architecture – What’s the Next Big Thing?
Emerging Trends in Data Architecture – What’s the Next Big Thing?Emerging Trends in Data Architecture – What’s the Next Big Thing?
Emerging Trends in Data Architecture – What’s the Next Big Thing?DATAVERSITY
 
Data Governance Trends - A Look Backwards and Forwards
Data Governance Trends - A Look Backwards and ForwardsData Governance Trends - A Look Backwards and Forwards
Data Governance Trends - A Look Backwards and ForwardsDATAVERSITY
 
Data Governance Trends and Best Practices To Implement Today
Data Governance Trends and Best Practices To Implement TodayData Governance Trends and Best Practices To Implement Today
Data Governance Trends and Best Practices To Implement TodayDATAVERSITY
 
2023 Trends in Enterprise Analytics
2023 Trends in Enterprise Analytics2023 Trends in Enterprise Analytics
2023 Trends in Enterprise AnalyticsDATAVERSITY
 
Data Strategy Best Practices
Data Strategy Best PracticesData Strategy Best Practices
Data Strategy Best PracticesDATAVERSITY
 
Who Should Own Data Governance – IT or Business?
Who Should Own Data Governance – IT or Business?Who Should Own Data Governance – IT or Business?
Who Should Own Data Governance – IT or Business?DATAVERSITY
 
Data Management Best Practices
Data Management Best PracticesData Management Best Practices
Data Management Best PracticesDATAVERSITY
 
MLOps – Applying DevOps to Competitive Advantage
MLOps – Applying DevOps to Competitive AdvantageMLOps – Applying DevOps to Competitive Advantage
MLOps – Applying DevOps to Competitive AdvantageDATAVERSITY
 

Más de DATAVERSITY (20)

Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...
Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...
Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...
 
Data at the Speed of Business with Data Mastering and Governance
Data at the Speed of Business with Data Mastering and GovernanceData at the Speed of Business with Data Mastering and Governance
Data at the Speed of Business with Data Mastering and Governance
 
Exploring Levels of Data Literacy
Exploring Levels of Data LiteracyExploring Levels of Data Literacy
Exploring Levels of Data Literacy
 
Building a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business GoalsBuilding a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business Goals
 
Make Data Work for You
Make Data Work for YouMake Data Work for You
Make Data Work for You
 
Data Catalogs Are the Answer – What is the Question?
Data Catalogs Are the Answer – What is the Question?Data Catalogs Are the Answer – What is the Question?
Data Catalogs Are the Answer – What is the Question?
 
Data Catalogs Are the Answer – What Is the Question?
Data Catalogs Are the Answer – What Is the Question?Data Catalogs Are the Answer – What Is the Question?
Data Catalogs Are the Answer – What Is the Question?
 
Data Modeling Fundamentals
Data Modeling FundamentalsData Modeling Fundamentals
Data Modeling Fundamentals
 
Showing ROI for Your Analytic Project
Showing ROI for Your Analytic ProjectShowing ROI for Your Analytic Project
Showing ROI for Your Analytic Project
 
How a Semantic Layer Makes Data Mesh Work at Scale
How a Semantic Layer Makes  Data Mesh Work at ScaleHow a Semantic Layer Makes  Data Mesh Work at Scale
How a Semantic Layer Makes Data Mesh Work at Scale
 
Is Enterprise Data Literacy Possible?
Is Enterprise Data Literacy Possible?Is Enterprise Data Literacy Possible?
Is Enterprise Data Literacy Possible?
 
The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...
The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...
The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...
 
Emerging Trends in Data Architecture – What’s the Next Big Thing?
Emerging Trends in Data Architecture – What’s the Next Big Thing?Emerging Trends in Data Architecture – What’s the Next Big Thing?
Emerging Trends in Data Architecture – What’s the Next Big Thing?
 
Data Governance Trends - A Look Backwards and Forwards
Data Governance Trends - A Look Backwards and ForwardsData Governance Trends - A Look Backwards and Forwards
Data Governance Trends - A Look Backwards and Forwards
 
Data Governance Trends and Best Practices To Implement Today
Data Governance Trends and Best Practices To Implement TodayData Governance Trends and Best Practices To Implement Today
Data Governance Trends and Best Practices To Implement Today
 
2023 Trends in Enterprise Analytics
2023 Trends in Enterprise Analytics2023 Trends in Enterprise Analytics
2023 Trends in Enterprise Analytics
 
Data Strategy Best Practices
Data Strategy Best PracticesData Strategy Best Practices
Data Strategy Best Practices
 
Who Should Own Data Governance – IT or Business?
Who Should Own Data Governance – IT or Business?Who Should Own Data Governance – IT or Business?
Who Should Own Data Governance – IT or Business?
 
Data Management Best Practices
Data Management Best PracticesData Management Best Practices
Data Management Best Practices
 
MLOps – Applying DevOps to Competitive Advantage
MLOps – Applying DevOps to Competitive AdvantageMLOps – Applying DevOps to Competitive Advantage
MLOps – Applying DevOps to Competitive Advantage
 

Último

SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Editor's notes

  1. “Riding Giants” is only possible by using a recently discovered method: tow-in surfing. Show video: http://www.youtube.com/watch?v=LhKFTqxn6qs. 70’ wave = power of data. UDA DSI = jet-ski method (unlocking the data to harness the 70’ wave). UDA DQ = getting the GPS coordinates correct so you are in the right place to catch it. Without high-quality data we miss the wave altogether!
  2. Yahoo business model = advertising