Data Validation Overview

What is Data Validation?

Identifying errors in data sets that have been Moved or Transformed to ensure they are Complete and Accurate and meet Expectations or Requirements.

How most companies test today

•   Many companies perform testing manually by writing SQL
    scripts, using Excel, or hand coding testing logic into their
    integration processes

•   Reconciliation is done manually (if done at all)
    − basic SQL scripts (row counts, aggregates) or manually written mappings/logic

•   Customers estimate data testing SHOULD take 25-30% of all
    hours spent on Data Integration
    − Most customers admit they do not do enough data validation, resulting in
      poorer data quality and higher project risk

•   PowerCenter upgrades can take weeks or months to
    complete due to manual testing effort
    − Upgrading the ETL software itself takes one day
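
As an illustration of the "basic SQL scripts (row counts, aggregates)" mentioned above, here is a minimal sketch of a hand-written reconciliation check. It uses Python's standard sqlite3 module with an in-memory database. The table names (src_orders, tgt_orders), the amount column and the reconcile helper are hypothetical and are not taken from the slides or from any Informatica product.

```python
import sqlite3


def reconcile(conn, source, target, amount_col):
    """Compare the row count and the SUM of one numeric column between two tables.

    Returns {check_name: (source_value, target_value, passed)}.
    Table and column names are trusted inputs in this sketch.
    """
    checks = {}
    for name, expr in (("row_count", "COUNT(*)"), ("sum", f"SUM({amount_col})")):
        src = conn.execute(f"SELECT {expr} FROM {source}").fetchone()[0]
        tgt = conn.execute(f"SELECT {expr} FROM {target}").fetchone()[0]
        checks[name] = (src, tgt, src == tgt)
    return checks


# Hypothetical sample data: the target is missing one row from the source.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_orders (id INTEGER PRIMARY KEY, amount INTEGER);
    CREATE TABLE tgt_orders (id INTEGER PRIMARY KEY, amount INTEGER);
    INSERT INTO src_orders VALUES (1, 100), (2, 250), (3, 75);
    INSERT INTO tgt_orders VALUES (1, 100), (2, 250);
""")

result = reconcile(conn, "src_orders", "tgt_orders", "amount")
for name, (src, tgt, ok) in result.items():
    print(f"{name}: source={src} target={tgt} {'PASS' if ok else 'FAIL'}")
```

A check like this shows that something is missing but not which rows, which is one reason manual reconciliation tends to end in searching through results by hand.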



Problems with Manual Testing
• Takes a long time and is expensive
   − Time is spent writing queries and waiting for them to run and
     then searching through the results

• Error-prone manual process
   − “Stare and compare”

• Cannot perform thorough testing
   − Time/Cost pressure leads to “try it here and there” approach
   − Testing ends when the deadline is reached, done or not

• Usual problems associated with writing custom code
   − No audit trail
   − No reuse
   − No methodology

Current Approach: Like a Photo Hunt




Current Approach: Stare and Compare
      Data Set #1           Data Set #2




What is the Data Validation Option?

DVO is an independent (black box) testing solution that provides automation, repeatability and auditability to virtually any data testing or reconciliation process.

Some DVO use cases
Data Being Transformed
  •   ETL Reconciliation
  •   Data Masking
  •   ETL Testing
  •   Application Migration

Data is Identical
  •   ETL version upgrade
  •   ETL Migration
  •   Database migration
  •   Application Retirement

Two Value Propositions for DVO
Ensure the integrity of data as it moves through the IT environment.

Development & Test
  •   Provide automation for unit and regression testing of integration logic.
  •   Ensure that data produced by DI code meets requirements and expectations.

Production Reconciliation
  •   Protect the integrity of data that is loaded into production systems.
  •   Erroneous data due to failed loads, faulty logic or operational issues is caught in a proactive, automated manner and can be addressed as needed.

How DVO works with PowerCenter

[Diagram. Data Validation Option: DVO Clients (Define Tests, Execute Tests, Results); Repository & Warehouse; Database Views (V_Summary, V_Tests, V_Results); Reports. PowerCenter: Repository; Repository and Integration Services; Enterprise Data (Data Accessed).]

Key Features of DVO
• Broad data connectivity
 • DBMS (Oracle, SQL Server, DB2, Sybase, Teradata, Netezza)
 • Mainframe (DB2 z/OS, DB2 AS/400, IMS, Adabas, MF Flat files, VSAM)
 • SalesForce.com, SAP transparent tables, SAS, ODBC and flat files

• Numerous built-in tests
 • COUNT, COUNT_DISTINCT, COUNT_ROWS, MIN, MAX, AVG, SUM
 • SET AinB, SET BinA, SET AeqB
 • VALUE, OUTER VALUE, Expressions

• Model ETL constructs
 • LOOKUPs, Arbitrary SQL Relationships

• Other
 • Run from GUI or CLI (DVOCmd)
 • Built-in reporting
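
To make the built-in test names above more concrete, the following is a minimal sketch of what set-style and value-style comparisons could look like in plain Python. It does not show how DVO implements these tests. The semantics are assumed from the names alone: "A in B" is read as "every key of A also appears in B", "A eq B" as "both sides hold the same keys", and VALUE as a comparison of values for keys present on both sides. All function names and sample rows are hypothetical.

```python
def set_a_in_b(a_keys, b_keys):
    """Return keys of A that are missing from B (an empty set means the check passes)."""
    return set(a_keys) - set(b_keys)


def set_a_eq_b(a_keys, b_keys):
    """Return keys present on only one side (an empty set means the check passes)."""
    return set(a_keys) ^ set(b_keys)


def value_mismatches(a_rows, b_rows):
    """For keys present on both sides, return {key: (a_value, b_value)} where values differ."""
    shared = a_rows.keys() & b_rows.keys()
    return {k: (a_rows[k], b_rows[k]) for k in shared if a_rows[k] != b_rows[k]}


# Hypothetical source and target rows, keyed by id.
source = {1: "Alice", 2: "Bob", 3: "Carol"}
target = {1: "Alice", 2: "Robert", 4: "Dave"}

missing_in_target = set_a_in_b(source, target)   # assumed "A in B" reading
missing_in_source = set_a_in_b(target, source)   # assumed "B in A" reading
symmetric = set_a_eq_b(source, target)           # assumed "A eq B" reading
mismatches = value_mismatches(source, target)    # assumed VALUE-style check

print("missing in target:", missing_in_target)
print("missing in source:", missing_in_source)
print("keys on one side only:", symmetric)
print("value mismatches:", mismatches)
```

Unlike a row count or aggregate, these checks identify the specific keys and values that disagree, which is the information a tester needs to track down the faulty load or logic.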



Comparing DVO with Manual Testing




Technology Company
Development and Test: Reduced data testing time by 80% with Data Validation Option

KEY BUSINESS IMPERATIVE AND IT INITIATIVE
SaaS provider of sales compensation and analytics
• Data absolutely has to be correct as it affects people’s paychecks
• Very high visibility of the data with users
• Trust in the data is key

THE CHALLENGE
• New release every ~1 month
• 1 full week of data testing by QA team per release
• Developers wrote SQL for testing the data
• Testers would execute the SQL, track errors and work with developers to resolve
• And who was testing the SQL to make sure it was correct?

INFORMATICA ADVANTAGE
• With DVO they are able to test 100,000s of rows of data in regression tests
• Developers no longer required to write SQL
• Testers are now empowered and independent of developers

RESULTS/BENEFITS
• Have created a test suite of over 1000 tests
• Testers can manage the testing environment
• Can test large volumes of data
• Testing time reduced from 1 week to 1 day (80% less)
• Spend “free time” on higher-level tasks

Informatica Confidential – Under NDA

Financial Services Company
Production Reconciliation: Ensures DW is Complete and Accurate with Data Validation Option

KEY BUSINESS IMPERATIVE AND IT INITIATIVE
Good data is essential to good business decisions. Their calculations of portfolio risk and value must be correct.
• Spends “hundreds of millions” purchasing troubled debt in the USA
• The data and risk calculations on those assets must be correct.
• Bad data could cost them “millions” and put them out of business.

THE CHALLENGE
• Business users were complaining about missing data in the systems.
• Data errors can lead to very costly bad business decisions.
• They were doing manual testing via developer-written mappings and PL/SQL
• Other products available today could not meet their requirements

INFORMATICA ADVANTAGE
• With DVO they are able to perform detailed reconciliations across source and target systems.
• With DVO, they have a complete audit trail.

RESULTS/BENEFITS
• DVO found where data was missing
• Found thousands of missing records due to bad coding and improperly rerun failed jobs
• Reloaded all missing data in two weeks
• They are looking to implement ongoing incremental validation for all new data loaded into tables

Mid-size Technology Company
Production Reconciliation: Reconciling MDM data using Data Validation Option

KEY BUSINESS IMPERATIVE AND IT INITIATIVE
Customer and contact hub is pivotal to efficient business operations
Millions of records processed across various systems
Ensure BAs, line managers and customers had access to accurate and complete data based on their needs

THE CHALLENGE
• No easy way to reconcile data in systems to identify bad data or identify extent of errors
• Incorrectly augmented data in systems
• Gold record data didn’t always match across systems
• Faulty records propagated downstream.

INFORMATICA ADVANTAGE
• DVO reconciled data across systems (e.g. SalesForce and Hub) and found:
   − 1000s of missing records between systems
   − Incorrectly augmented D&B data
   − Improperly coded golden records

RESULTS/BENEFITS
• Identified errors due to faulty DI logic, and error handling process
• Ensured incorrect records no longer being used in marketing campaigns
• Bad customer data no longer reaching customer in portal