SlideShare una empresa de Scribd logo
1 de 8
Descargar para leer sin conexión
page 1copyright Real-Time Technology Solutions, Inc., updated January, 2015
Enterprise Business Intelligence
& Data Warehousing:
The Data Quality Conundrum
A study by RTTS
For Business and IT Professionals
By Bill Hayduk
page 2copyright Real-Time Technology Solutions, Inc., updated January, 2015
Enterprise BI & Data Warehousing: the Data Quality Conundrum
Poor data quality is a huge and exponentially growing problem. Big Data is causing massive increases in
the volume (amount of data), velocity (speed of data in and out), and variety (range of data types and
sources) of data. Therefore, concern for poor data quality or ‘bad data’ is now a critical issue.
According to Gartner, “the average organization loses $8.2 million annually through poor data quality,
with 22% estimated their annual losses resulting from bad data at $20 million and 4% put that figure as
high as an astounding $100 million”. And InformationWeek found that “46% of companies cite data
quality as a barrier for adopting Business Intelligence products”.
We recently performed a study that included responses from over 200 companies interested in
improving the data quality in their Business Intelligence, ETL software and Enterprise Data Warehouse
implementations. Below are our findings, along with context around these results.
SECTION I.
Our first section polled customers on their architecture: specifically on their data warehouse, ETL and
business intelligence software and vendors.
Enterprise Data Warehouse Software
The top data warehouse vendors are
Oracle (plus MySQL and Exadata) as
number one and Teradata (and Aster Data)
as number two and every other one far
behind. This is in sync with research by
most analyst firms that track this platform.
Oracle and Teradata dominate the space
with both their loyal customer base and
their innovation.
Analyst firm Gartner, in its 2013 Magic
Quadrant for Data Warehouse Database
Management Systems report, projected a
10% growth in the database management
system market and it pinpointed a
significant increase in organizations
seeking to deploy data warehouses for the
first time.
page 3copyright Real-Time Technology Solutions, Inc., updated January, 2015
Business Intelligence Software
IBM leads the BI space with Cognos,
followed by the surprising performance of
Microsoft at second and Oracle’s combined
offering at 3rd. Unexpectedly, ‘other
vendors’ were chosen 18% of the time,
meaning there is still room for growth in the
BI space.
Analyst firm IDC stated that the market is
now forecast to continue to grow at a 9.8%
compound annual growth rate through
2016. One key observation they made was
“the media attention on Big Data has put
broader business analytics on the agenda of
more senior executives.”
ETL (Extract/Transform/Load) Software
Here we see another surprising result of our
survey. Microsoft finished first in the ETL
Vendor software survey. Informatica
PowerCenter, the most widely known vendor
finished second, closely followed by IBM’s
combined offering at third. It is interesting to
note that ‘other’ came in first above
Microsoft. There is still a large contingent of
companies using home-grown and open
source software in the marketplace.
According to analyst firm Forrester, “The
enterprise ETL market continues to grow at a
healthy pace as more enterprises replace
manual scripts with packaged ETL solutions.
This migration…toward ETL tools is driven by
the need to support growing and increasingly
complex data management requirements.”
page 4copyright Real-Time Technology Solutions, Inc., updated January, 2015
Current Data Warehouse Size
We inquired as to the current size of the data warehouse implementations. We discovered that 91%
were less than 100 Terabytes and 52% were less than 500 Gigabytes. Interestingly, the largest sector
(33%) of implementations was between 1 Terabyte and 100 Terabytes. This is a significant increase in
data size when the largest sector in our poll 2 years ago was measured in Gigabytes.
SECTION II.
In Section II, we analyzed the current quality situation of firms to determine their effectiveness.
page 5copyright Real-Time Technology Solutions, Inc., updated January, 2015
Current Testing Strategy
We inquired as to which test strategy was
being implemented: (1) testing across ETL legs
(source-to-target DWH, DWH to data mart, etc.), (2)
utilizing Minus Queries (see full definition of minus
queries here http://bit.ly/13lmp8N), and/or (3)
comparing row counts from source-to-target.
The preferred strategy is to utilize a
combination of the three, depending upon the
role (tester, ETL developer, operations). But if
one were going to choose a single strategy to
check for data quality, it would be (1). We
found that 30% of companies polled only verify
row counts and only 7% implemented all 3 as
their preferred strategy. Many singled out a
lack of automated testing and a lack of testing
resources as reasons they did not deploy a
more rigorous testing strategy.
Current Test Execution Method
When surveying customers on their current
test execution method, we found that 60% of
testing is currently performed manually.
Manual testing consists of extracting data from
the source databases, files and XML and also
extracting data from the data warehouse after
it goes through the ETL process and then
comparing these data sets manually, by eye.
This is quite extraordinary when considering
that the average data warehouse is measured
in gigabytes and typical tests return millions of
rows and upwards of hundreds of columns,
meaning millions of sets of data to compare.
Therefore, testers can only sample the
comparisons for practical purposes. Vendor
tools come in second and a home-grown
finished third.
Data Quality and Testing Challenges
When customers were asked about their biggest challenges when it came to testing the data for
accuracy, the clear top choice was ‘No Automation’. This goes hand-in-hand with the 2nd biggest
challenge – that testing manually is very time consuming. This was followed by the lack of a test
management process and/or tool, not enough data coverage and no reporting on the testing effort.
page 6copyright Real-Time Technology Solutions, Inc., updated January, 2015
Percent of data coverage by current test process
When determining the amount of
data coverage that companies’
current test process provides, it is
clear that there is not much data
coverage at all. Of those companies
surveyed, 84% had less than 50
percent data coverage, 58% had less
than 25 percent coverage, 33% had
less than 5 percent and 29% of
companies had less than 1 percent.
The 14% who had 100 percent
coverage had data warehouse
implementations less than 500GB in
size and were interested in finding a
data warehouse testing tool that could speed up the testing cycle. Also, the coverage represents the
amount of data brought back by SQL queries, not the amount compared. Since comparisons (as noted
above) are typically performed by visually reconciling the 2 data sets, it is impractical for more than 5-
10% of the data to be compared.
Ratio of Developers to Testers
Average ratio: 2.1 to 1
Median ratio: 1.7 to 1
Highest ratio: 20 to 1
Lowest ratio: 2 to 3
We found that, for the most part, the ratio of developer to tester was standard in principle. The issue
that most firms surveyed stated was that they could not get enough data coverage. The reason: ETL
developers are utilizing some form of tool to make their jobs faster and more efficient while most
testers are not.
page 7copyright Real-Time Technology Solutions, Inc., updated January, 2015
Effects of Bad Data
We surveyed customers on the impact of bad data on their organizations. Of the ones who answered,
100 percent said that they experienced some form of bad data in their data warehouses. Below are
samplings of their free-form answers on the effects that bad data caused them.
 Incorrect business intelligence reports
 Poor delivery quality & customer dissatisfaction resulting in re-work
 Missing revenue opportunities
 Critical business decisions rely on underlying bad data
 Bad quality of projects
 Negative business Impact
 SLA Issue with our customers
 Major embarrassments to our team
 Long working hours to fix bad data
Conclusion
Many companies are using Business Intelligence (BI) to make strategic decisions in the hope of gaining a
competitive advantage in a tough business landscape. But Bad Data will cause them to make decisions that
will cost their firms millions of dollars. It is clear from the results of this survey that companies are not
providing the level of data quality that C-level executives need to make a strategic decisions. Most firms
test far less than 10% of their data by sampling the data and the comparisons. Therefore, at least 90% of
data remains untested. Since bad data exists in all databases, firms need to test closer to 100% of their data
and guarantee that this critical information is accurate.
There is no practical way for testers to verify this level of coverage without the use of automated testing
tools. An automated testing solution will speed up the process, provide much more data coverage,
perform comparisons automated, and provide reports for audit trails. It is clear that as data grows
exponentially, a more complete solution is needed to keep enterprise-level data clean.
page 8copyright Real-Time Technology Solutions, Inc., updated January, 2015
About The Author
Bill Hayduk
Founder, CEO, President
Bill founded software and services firm RTTS in 1996. Under Bill's
guidance, RTTS has supported over 600 projects at over 400
corporations, from Fortune 500 to midsize firms. Bill is also the
business leader on QuerySurge, RTTS’ industry-leading data warehouse
testing tool.
Bill holds an MS in Computer Information Systems from the Zicklin
School of Business (Baruch College) and a BA in Economics from
Villanova University.
References
 Gartner: Magic Quadrant for Data Warehouse Database Management Systems (January 31, 2013)
 IDC: Worldwide Business Analytics Software 2012-2016 Forecast and 2011 Vendor Shares (June 2012)
 The Forrester Wave™: Enterprise ETL, Q1 2012 (February 27, 2012)
 InformationWeek: 2012 BI and Information Management Trends (November 2011)
 RTTS 2013 Client Survey on BI and Data Warehouse Quality

Más contenido relacionado

La actualidad más candente

NI Automated Test Outlook 2016
NI Automated Test Outlook 2016NI Automated Test Outlook 2016
NI Automated Test Outlook 2016
Hank Lydick
 

La actualidad más candente (19)

Survey Results Age Of Unbounded Data June 03 10
Survey Results Age Of Unbounded Data June 03 10Survey Results Age Of Unbounded Data June 03 10
Survey Results Age Of Unbounded Data June 03 10
 
The Work Ahead in Intelligent Automation: Coping with Complexity in a Post-Pa...
The Work Ahead in Intelligent Automation: Coping with Complexity in a Post-Pa...The Work Ahead in Intelligent Automation: Coping with Complexity in a Post-Pa...
The Work Ahead in Intelligent Automation: Coping with Complexity in a Post-Pa...
 
Analytics in-action-survey
Analytics in-action-surveyAnalytics in-action-survey
Analytics in-action-survey
 
"Big data in western europe today" Forrester / Xerox 2015
"Big data in western europe today" Forrester / Xerox 2015"Big data in western europe today" Forrester / Xerox 2015
"Big data in western europe today" Forrester / Xerox 2015
 
Ibm business trends
Ibm business trendsIbm business trends
Ibm business trends
 
Datacenter industry survey 2015
Datacenter industry survey 2015Datacenter industry survey 2015
Datacenter industry survey 2015
 
Analytics solution
Analytics solutionAnalytics solution
Analytics solution
 
NI Automated Test Outlook 2016
NI Automated Test Outlook 2016NI Automated Test Outlook 2016
NI Automated Test Outlook 2016
 
Utilities Can Ramp Up CX with a Customer Data Platform
Utilities Can Ramp Up CX with a Customer Data PlatformUtilities Can Ramp Up CX with a Customer Data Platform
Utilities Can Ramp Up CX with a Customer Data Platform
 
2019 Data Trends Survey Results
2019 Data Trends Survey Results2019 Data Trends Survey Results
2019 Data Trends Survey Results
 
Augmented Data Management
Augmented Data ManagementAugmented Data Management
Augmented Data Management
 
Big Data Alchemy: How can Banks Maximize the Value of their Customer Data?
Big Data Alchemy: How can Banks Maximize the Value of their Customer Data?Big Data Alchemy: How can Banks Maximize the Value of their Customer Data?
Big Data Alchemy: How can Banks Maximize the Value of their Customer Data?
 
Accenture - GE Industrial Internet Changing Competitive Landscape Industries ...
Accenture - GE Industrial Internet Changing Competitive Landscape Industries ...Accenture - GE Industrial Internet Changing Competitive Landscape Industries ...
Accenture - GE Industrial Internet Changing Competitive Landscape Industries ...
 
Innovating with analytics
Innovating with analyticsInnovating with analytics
Innovating with analytics
 
An Analysis of Big Data Computing for Efficiency of Business Operations Among...
An Analysis of Big Data Computing for Efficiency of Business Operations Among...An Analysis of Big Data Computing for Efficiency of Business Operations Among...
An Analysis of Big Data Computing for Efficiency of Business Operations Among...
 
To Become a Data-Driven Enterprise, Data Democratization is Essential
To Become a Data-Driven Enterprise, Data Democratization is EssentialTo Become a Data-Driven Enterprise, Data Democratization is Essential
To Become a Data-Driven Enterprise, Data Democratization is Essential
 
Unlocking big data
Unlocking big dataUnlocking big data
Unlocking big data
 
The Work Ahead: Transportation and Logistics Delivering on the Digital-Physic...
The Work Ahead: Transportation and Logistics Delivering on the Digital-Physic...The Work Ahead: Transportation and Logistics Delivering on the Digital-Physic...
The Work Ahead: Transportation and Logistics Delivering on the Digital-Physic...
 
Impact of Data Analytics in Changing the Future of Business and Challenges Fa...
Impact of Data Analytics in Changing the Future of Business and Challenges Fa...Impact of Data Analytics in Changing the Future of Business and Challenges Fa...
Impact of Data Analytics in Changing the Future of Business and Challenges Fa...
 

Similar a Enterprise Business Intelligence & Data Warehousing: The Data Quality Conundrum

Modernizing Data Quality & Governance: Unlocking Performance & Reducing Risk
Modernizing Data Quality & Governance: Unlocking Performance & Reducing RiskModernizing Data Quality & Governance: Unlocking Performance & Reducing Risk
Modernizing Data Quality & Governance: Unlocking Performance & Reducing Risk
Worldwide Business Research
 

Similar a Enterprise Business Intelligence & Data Warehousing: The Data Quality Conundrum (20)

Going Big : Why Companies Need to Focus on Operational Analytics
Going Big : Why Companies Need to Focus on Operational Analytics Going Big : Why Companies Need to Focus on Operational Analytics
Going Big : Why Companies Need to Focus on Operational Analytics
 
Big data-analytics-2013-peer-research-report
Big data-analytics-2013-peer-research-reportBig data-analytics-2013-peer-research-report
Big data-analytics-2013-peer-research-report
 
Leveraging Automated Data Validation to Reduce Software Development Timeline...
Leveraging Automated Data Validation  to Reduce Software Development Timeline...Leveraging Automated Data Validation  to Reduce Software Development Timeline...
Leveraging Automated Data Validation to Reduce Software Development Timeline...
 
Big Data Management: Work Smarter Not Harder
Big Data Management: Work Smarter Not HarderBig Data Management: Work Smarter Not Harder
Big Data Management: Work Smarter Not Harder
 
Integrated_Insights_Platform_web
Integrated_Insights_Platform_webIntegrated_Insights_Platform_web
Integrated_Insights_Platform_web
 
Big Data and Analytics: The New Underpinning for Supply Chain Success? - 17 F...
Big Data and Analytics: The New Underpinning for Supply Chain Success? - 17 F...Big Data and Analytics: The New Underpinning for Supply Chain Success? - 17 F...
Big Data and Analytics: The New Underpinning for Supply Chain Success? - 17 F...
 
Building an Effective Data Management Strategy
Building an Effective Data Management StrategyBuilding an Effective Data Management Strategy
Building an Effective Data Management Strategy
 
Amcis2015 erf strategic_informationsystemsevaluationtrack_improvingenterprise...
Amcis2015 erf strategic_informationsystemsevaluationtrack_improvingenterprise...Amcis2015 erf strategic_informationsystemsevaluationtrack_improvingenterprise...
Amcis2015 erf strategic_informationsystemsevaluationtrack_improvingenterprise...
 
Amcis2015 erf strategic_informationsystemsevaluationtrack_improvingenterprise...
Amcis2015 erf strategic_informationsystemsevaluationtrack_improvingenterprise...Amcis2015 erf strategic_informationsystemsevaluationtrack_improvingenterprise...
Amcis2015 erf strategic_informationsystemsevaluationtrack_improvingenterprise...
 
Modernizing Data Quality & Governance: Unlocking Performance & Reducing Risk
Modernizing Data Quality & Governance: Unlocking Performance & Reducing RiskModernizing Data Quality & Governance: Unlocking Performance & Reducing Risk
Modernizing Data Quality & Governance: Unlocking Performance & Reducing Risk
 
The Digital Procurement Era
The Digital Procurement EraThe Digital Procurement Era
The Digital Procurement Era
 
dq_fail.pdf
dq_fail.pdfdq_fail.pdf
dq_fail.pdf
 
AI-Led-Cognitive-Data-Quality.pdf
AI-Led-Cognitive-Data-Quality.pdfAI-Led-Cognitive-Data-Quality.pdf
AI-Led-Cognitive-Data-Quality.pdf
 
Infographic Things You Should Know About Big Data Testing
Infographic Things You Should Know About Big Data TestingInfographic Things You Should Know About Big Data Testing
Infographic Things You Should Know About Big Data Testing
 
Infographic | Quality of Data & Cost of Bad Data | Sapience Analytics
Infographic | Quality of Data & Cost of Bad Data | Sapience AnalyticsInfographic | Quality of Data & Cost of Bad Data | Sapience Analytics
Infographic | Quality of Data & Cost of Bad Data | Sapience Analytics
 
The Face of the New Enterprise
The Face of the New EnterpriseThe Face of the New Enterprise
The Face of the New Enterprise
 
Data Observability- The Next Frontier of Data Engineering Pdf.pdf
Data Observability- The Next Frontier of Data Engineering Pdf.pdfData Observability- The Next Frontier of Data Engineering Pdf.pdf
Data Observability- The Next Frontier of Data Engineering Pdf.pdf
 
Move It Don't Lose It: Is Your Big Data Collecting Dust?
Move It Don't Lose It: Is Your Big Data Collecting Dust?Move It Don't Lose It: Is Your Big Data Collecting Dust?
Move It Don't Lose It: Is Your Big Data Collecting Dust?
 
Data Trends for 2019: Extracting Value from Data
Data Trends for 2019: Extracting Value from DataData Trends for 2019: Extracting Value from Data
Data Trends for 2019: Extracting Value from Data
 
Activating Big Data: The Key To Success with Machine Learning Advanced Analyt...
Activating Big Data: The Key To Success with Machine Learning Advanced Analyt...Activating Big Data: The Key To Success with Machine Learning Advanced Analyt...
Activating Big Data: The Key To Success with Machine Learning Advanced Analyt...
 

Más de RTTS

How to Automate your Enterprise Application / ERP Testing
How to Automate your  Enterprise Application / ERP TestingHow to Automate your  Enterprise Application / ERP Testing
How to Automate your Enterprise Application / ERP Testing
RTTS
 
Data Warehouse Testing in the Pharmaceutical Industry
Data Warehouse Testing in the Pharmaceutical IndustryData Warehouse Testing in the Pharmaceutical Industry
Data Warehouse Testing in the Pharmaceutical Industry
RTTS
 
QuerySurge for DevOps
QuerySurge for DevOpsQuerySurge for DevOps
QuerySurge for DevOps
RTTS
 

Más de RTTS (20)

Automated Testing of Microsoft Power BI Reports
Automated Testing of Microsoft Power BI ReportsAutomated Testing of Microsoft Power BI Reports
Automated Testing of Microsoft Power BI Reports
 
QuerySurge AI webinar
QuerySurge AI webinarQuerySurge AI webinar
QuerySurge AI webinar
 
State of the Market - Data Quality in 2023
State of the Market - Data Quality in 2023State of the Market - Data Quality in 2023
State of the Market - Data Quality in 2023
 
TestGuild and QuerySurge Presentation -DevOps for Data Testing
TestGuild and QuerySurge Presentation -DevOps for Data TestingTestGuild and QuerySurge Presentation -DevOps for Data Testing
TestGuild and QuerySurge Presentation -DevOps for Data Testing
 
Creating a Project Plan for a Data Warehouse Testing Assignment
Creating a Project Plan for a Data Warehouse Testing AssignmentCreating a Project Plan for a Data Warehouse Testing Assignment
Creating a Project Plan for a Data Warehouse Testing Assignment
 
RTTS Postman and API Testing Webinar Slides.pdf
RTTS Postman and API Testing Webinar  Slides.pdfRTTS Postman and API Testing Webinar  Slides.pdf
RTTS Postman and API Testing Webinar Slides.pdf
 
How to Automate your Enterprise Application / ERP Testing
How to Automate your  Enterprise Application / ERP TestingHow to Automate your  Enterprise Application / ERP Testing
How to Automate your Enterprise Application / ERP Testing
 
QuerySurge Slide Deck for Big Data Testing Webinar
QuerySurge Slide Deck for Big Data Testing WebinarQuerySurge Slide Deck for Big Data Testing Webinar
QuerySurge Slide Deck for Big Data Testing Webinar
 
Webinar - QuerySurge and Azure DevOps in the Azure Cloud
 Webinar - QuerySurge and Azure DevOps in the Azure Cloud Webinar - QuerySurge and Azure DevOps in the Azure Cloud
Webinar - QuerySurge and Azure DevOps in the Azure Cloud
 
Creating a Data validation and Testing Strategy
Creating a Data validation and Testing StrategyCreating a Data validation and Testing Strategy
Creating a Data validation and Testing Strategy
 
Implementing Azure DevOps with your Testing Project
Implementing Azure DevOps with your Testing ProjectImplementing Azure DevOps with your Testing Project
Implementing Azure DevOps with your Testing Project
 
An introduction to QuerySurge webinar
An introduction to QuerySurge webinarAn introduction to QuerySurge webinar
An introduction to QuerySurge webinar
 
Data Warehouse Testing in the Pharmaceutical Industry
Data Warehouse Testing in the Pharmaceutical IndustryData Warehouse Testing in the Pharmaceutical Industry
Data Warehouse Testing in the Pharmaceutical Industry
 
Completing the Data Equation: Test Data + Data Validation = Success
Completing the Data Equation: Test Data + Data Validation = SuccessCompleting the Data Equation: Test Data + Data Validation = Success
Completing the Data Equation: Test Data + Data Validation = Success
 
the Data World Distilled
the Data World Distilledthe Data World Distilled
the Data World Distilled
 
QuerySurge for DevOps
QuerySurge for DevOpsQuerySurge for DevOps
QuerySurge for DevOps
 
Leveraging HPE ALM & QuerySurge to test HPE Vertica
Leveraging HPE ALM & QuerySurge to test HPE VerticaLeveraging HPE ALM & QuerySurge to test HPE Vertica
Leveraging HPE ALM & QuerySurge to test HPE Vertica
 
Big Data Testing : Automate theTesting of Hadoop, NoSQL & DWH without Writing...
Big Data Testing : Automate theTesting of Hadoop, NoSQL & DWH without Writing...Big Data Testing : Automate theTesting of Hadoop, NoSQL & DWH without Writing...
Big Data Testing : Automate theTesting of Hadoop, NoSQL & DWH without Writing...
 
Whitepaper: Volume Testing Thick Clients and Databases
Whitepaper:  Volume Testing Thick Clients and DatabasesWhitepaper:  Volume Testing Thick Clients and Databases
Whitepaper: Volume Testing Thick Clients and Databases
 
Query Wizards - data testing made easy - no programming
Query Wizards - data testing made easy - no programmingQuery Wizards - data testing made easy - no programming
Query Wizards - data testing made easy - no programming
 

Último

%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
masabamasaba
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
masabamasaba
 
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Medical / Health Care (+971588192166) Mifepristone and Misoprostol tablets 200mg
 
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
masabamasaba
 
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
chiefasafspells
 
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
VictoriaMetrics
 

Último (20)

WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
 
What Goes Wrong with Language Definitions and How to Improve the Situation
What Goes Wrong with Language Definitions and How to Improve the SituationWhat Goes Wrong with Language Definitions and How to Improve the Situation
What Goes Wrong with Language Definitions and How to Improve the Situation
 
WSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go PlatformlessWSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go Platformless
 
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
 
%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto
 
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
 
WSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
WSO2Con2024 - Enabling Transactional System's Exponential Growth With SimplicityWSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
WSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
 
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
 
Artyushina_Guest lecture_YorkU CS May 2024.pptx
Artyushina_Guest lecture_YorkU CS May 2024.pptxArtyushina_Guest lecture_YorkU CS May 2024.pptx
Artyushina_Guest lecture_YorkU CS May 2024.pptx
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
 
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
 
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
 
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
 
WSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open Source
WSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open SourceWSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open Source
WSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open Source
 
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
 
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
 
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
 
Announcing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareAnnouncing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK Software
 
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
 

Enterprise Business Intelligence & Data Warehousing: The Data Quality Conundrum

  • 1. page 1copyright Real-Time Technology Solutions, Inc., updated January, 2015 Enterprise Business Intelligence & Data Warehousing: The Data Quality Conundrum A study by RTTS For Business and IT Professionals By Bill Hayduk
  • 2. page 2copyright Real-Time Technology Solutions, Inc., updated January, 2015 Enterprise BI & Data Warehousing: the Data Quality Conundrum Poor data quality is a huge and exponentially growing problem. Big Data is causing massive increases in the volume (amount of data), velocity (speed of data in and out), and variety (range of data types and sources) of data. Therefore, concern for poor data quality or ‘bad data’ is now a critical issue. According to Gartner, “the average organization loses $8.2 million annually through poor data quality, with 22% estimated their annual losses resulting from bad data at $20 million and 4% put that figure as high as an astounding $100 million”. And InformationWeek found that “46% of companies cite data quality as a barrier for adopting Business Intelligence products”. We recently performed a study that included responses from over 200 companies interested in improving the data quality in their Business Intelligence, ETL software and Enterprise Data Warehouse implementations. Below are our findings, along with context around these results. SECTION I. Our first section polled customers on their architecture: specifically on their data warehouse, ETL and business intelligence software and vendors. Enterprise Data Warehouse Software The top data warehouse vendors are Oracle (plus MySQL and Exadata) as number one and Teradata (and Aster Data) as number two and every other one far behind. This is in sync with research by most analyst firms that track this platform. Oracle and Teradata dominate the space with both their loyal customer base and their innovation. Analyst firm Gartner, in its 2013 Magic Quadrant for Data Warehouse Database Management Systems report, projected a 10% growth in the database management system market and it pinpointed a significant increase in organizations seeking to deploy data warehouses for the first time.
  • 3. page 3copyright Real-Time Technology Solutions, Inc., updated January, 2015 Business Intelligence Software IBM leads the BI space with Cognos, followed by the surprising performance of Microsoft at second and Oracle’s combined offering at 3rd. Unexpectedly, ‘other vendors’ were chosen 18% of the time, meaning there is still room for growth in the BI space. Analyst firm IDC stated that the market is now forecast to continue to grow at a 9.8% compound annual growth rate through 2016. One key observation they made was “the media attention on Big Data has put broader business analytics on the agenda of more senior executives.” ETL (Extract/Transform/Load) Software Here we see another surprising result of our survey. Microsoft finished first in the ETL Vendor software survey. Informatica PowerCenter, the most widely known vendor finished second, closely followed by IBM’s combined offering at third. It is interesting to note that ‘other’ came in first above Microsoft. There is still a large contingent of companies using home-grown and open source software in the marketplace. According to analyst firm Forrester, “The enterprise ETL market continues to grow at a healthy pace as more enterprises replace manual scripts with packaged ETL solutions. This migration…toward ETL tools is driven by the need to support growing and increasingly complex data management requirements.”
  • 4. page 4copyright Real-Time Technology Solutions, Inc., updated January, 2015 Current Data Warehouse Size We inquired as to the current size of the data warehouse implementations. We discovered that 91% were less than 100 Terabytes and 52% were less than 500 Gigabytes. Interestingly, the largest sector (33%) of implementations was between 1 Terabyte and 100 Terabytes. This is a significant increase in data size when the largest sector in our poll 2 years ago was measured in Gigabytes. SECTION II. In Section II, we analyzed the current quality situation of firms to determine their effectiveness.
  • 5. page 5copyright Real-Time Technology Solutions, Inc., updated January, 2015 Current Testing Strategy We inquired as to which test strategy was being implemented: (1) testing across ETL legs (source-to-target DWH, DWH to data mart, etc.), (2) utilizing Minus Queries (see full definition of minus queries here http://bit.ly/13lmp8N), and/or (3) comparing row counts from source-to-target. The preferred strategy is to utilize a combination of the three, depending upon the role (tester, ETL developer, operations). But if one were going to choose a single strategy to check for data quality, it would be (1). We found that 30% of companies polled only verify row counts and only 7% implemented all 3 as their preferred strategy. Many singled out a lack of automated testing and a lack of testing resources as reasons they did not deploy a more rigorous testing strategy. Current Test Execution Method When surveying customers on their current test execution method, we found that 60% of testing is currently performed manually. Manual testing consists of extracting data from the source databases, files and XML and also extracting data from the data warehouse after it goes through the ETL process and then comparing these data sets manually, by eye. This is quite extraordinary when considering that the average data warehouse is measured in gigabytes and typical tests return millions of rows and upwards of hundreds of columns, meaning millions of sets of data to compare. Therefore, testers can only sample the comparisons for practical purposes. Vendor tools come in second and a home-grown finished third. Data Quality and Testing Challenges When customers were asked about their biggest challenges when it came to testing the data for accuracy, the clear top choice was ‘No Automation’. This goes hand-in-hand with the 2nd biggest challenge – that testing manually is very time consuming. This was followed by the lack of a test management process and/or tool, not enough data coverage and no reporting on the testing effort.
  • 6. page 6copyright Real-Time Technology Solutions, Inc., updated January, 2015 Percent of data coverage by current test process When determining the amount of data coverage that companies’ current test process provides, it is clear that there is not much data coverage at all. Of those companies surveyed, 84% had less than 50 percent data coverage, 58% had less than 25 percent coverage, 33% had less than 5 percent and 29% of companies had less than 1 percent. The 14% who had 100 percent coverage had data warehouse implementations less than 500GB in size and were interested in finding a data warehouse testing tool that could speed up the testing cycle. Also, the coverage represents the amount of data brought back by SQL queries, not the amount compared. Since comparisons (as noted above) are typically performed by visually reconciling the 2 data sets, it is impractical for more than 5- 10% of the data to be compared. Ratio of Developers to Testers Average ratio: 2.1 to 1 Median ratio: 1.7 to 1 Highest ratio: 20 to 1 Lowest ratio: 2 to 3 We found that, for the most part, the ratio of developer to tester was standard in principle. The issue that most firms surveyed stated was that they could not get enough data coverage. The reason: ETL developers are utilizing some form of tool to make their jobs faster and more efficient while most testers are not.
  • 7. page 7copyright Real-Time Technology Solutions, Inc., updated January, 2015 Effects of Bad Data We surveyed customers on the impact of bad data on their organizations. Of the ones who answered, 100 percent said that they experienced some form of bad data in their data warehouses. Below are samplings of their free-form answers on the effects that bad data caused them.  Incorrect business intelligence reports  Poor delivery quality & customer dissatisfaction resulting in re-work  Missing revenue opportunities  Critical business decisions rely on underlying bad data  Bad quality of projects  Negative business Impact  SLA Issue with our customers  Major embarrassments to our team  Long working hours to fix bad data Conclusion Many companies are using Business Intelligence (BI) to make strategic decisions in the hope of gaining a competitive advantage in a tough business landscape. But Bad Data will cause them to make decisions that will cost their firms millions of dollars. It is clear from the results of this survey that companies are not providing the level of data quality that C-level executives need to make a strategic decisions. Most firms test far less than 10% of their data by sampling the data and the comparisons. Therefore, at least 90% of data remains untested. Since bad data exists in all databases, firms need to test closer to 100% of their data and guarantee that this critical information is accurate. There is no practical way for testers to verify this level of coverage without the use of automated testing tools. An automated testing solution will speed up the process, provide much more data coverage, perform comparisons automated, and provide reports for audit trails. It is clear that as data grows exponentially, a more complete solution is needed to keep enterprise-level data clean.
  • 8. page 8copyright Real-Time Technology Solutions, Inc., updated January, 2015 About The Author Bill Hayduk Founder, CEO, President Bill founded software and services firm RTTS in 1996. Under Bill's guidance, RTTS has supported over 600 projects at over 400 corporations, from Fortune 500 to midsize firms. Bill is also the business leader on QuerySurge, RTTS’ industry-leading data warehouse testing tool. Bill holds an MS in Computer Information Systems from the Zicklin School of Business (Baruch College) and a BA in Economics from Villanova University. References  Gartner: Magic Quadrant for Data Warehouse Database Management Systems (January 31, 2013)  IDC: Worldwide Business Analytics Software 2012-2016 Forecast and 2011 Vendor Shares (June 2012)  The Forrester Wave™: Enterprise ETL, Q1 2012 (February 27, 2012)  InformationWeek: 2012 BI and Information Management Trends (November 2011)  RTTS 2013 Client Survey on BI and Data Warehouse Quality