3. What is Data Validation?
Identifying errors in data sets that have
been Moved or Transformed to ensure
they are Complete and Accurate and
meet Expectations or Requirements.
4. How most companies test today
• Many companies test manually by writing SQL scripts, using Excel,
or hand-coding testing logic into their integration processes
• Reconciliation is done manually (if done at all)
− basic SQL scripts (row counts, aggregates) or manually written mappings/logic
• Customers estimate data testing SHOULD take 25-30% of all
hours spent on Data Integration
− Most customers admit they do not do enough data validation, resulting in
poorer data quality and higher project risk
• PowerCenter upgrades can take weeks or months to
complete because of the manual testing effort
− Upgrading the ETL software itself takes one day
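The kind of basic reconciliation script mentioned above (row counts and aggregates) can be sketched as follows. This is a minimal illustration, not anything from the deck: it assumes an in-memory SQLite database, and the table names `src_orders` and `tgt_orders`, the column `amount`, and the helper `reconcile` are all hypothetical.

```python
import sqlite3


def reconcile(conn, source, target, amount_col):
    """Compare COUNT(*) and SUM(amount_col) between two tables.

    Table and column names are interpolated directly, so this sketch
    is only suitable for trusted, hard-coded identifiers.
    """
    query = "SELECT COUNT(*), COALESCE(SUM({col}), 0) FROM {tbl}"
    src = conn.execute(query.format(col=amount_col, tbl=source)).fetchone()
    tgt = conn.execute(query.format(col=amount_col, tbl=target)).fetchone()
    return {
        "row_count_match": src[0] == tgt[0],
        "sum_match": src[1] == tgt[1],
        "source": src,
        "target": tgt,
    }


# Hypothetical data: the target is missing one row after a load.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_orders (id INTEGER, amount INTEGER);
    CREATE TABLE tgt_orders (id INTEGER, amount INTEGER);
    INSERT INTO src_orders VALUES (1, 100), (2, 250), (3, 75);
    INSERT INTO tgt_orders VALUES (1, 100), (2, 250);
""")

result = reconcile(conn, "src_orders", "tgt_orders", "amount")
print(result)
```

A check like this reports that something is missing but not which rows, which is one reason hand-written scripts tend to end in "stare and compare".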
5. Problems with Manual Testing
• Takes a long time and is expensive
− Time is spent writing queries and waiting for them to run and
then searching through the results
• Error-prone manual process
− “Stare and compare”
• Cannot perform thorough testing
− Time/Cost pressure leads to a “try it here and there” approach
− Testing ends when the deadline is reached, done or not
• Usual problems associated with writing custom code
− No audit trail
− No reuse
− No methodology
8. What is the Data Validation Option?
DVO is an independent (black box)
testing solution that provides
automation, repeatability and
auditability to virtually any data
testing or reconciliation process.
9. Some DVO use cases
Data Being Transformed
• ETL Reconciliation
• Data Masking
• ETL Testing
• Application Migration
Data is Identical
• ETL version upgrade
• ETL Migration
• Database migration
• Application Retirement
10. Two Value Propositions for DVO
Ensure the integrity of data as it moves through
the IT environment.
Development & Test
• Provide automation for unit and regression testing of integration logic.
• Ensure that data produced by DI code meets requirements and expectations.
Production Reconciliation
• Protect the integrity of data that is loaded into production systems.
• Erroneous data due to failed loads, faulty logic or operational issues is
caught in a proactive, automated manner and can be addressed as needed.
12. How DVO works with PowerCenter
[Architecture diagram. Labels recoverable from the slide: DVO Clients define
tests, which are held in the Data Validation Option Repository & Warehouse;
PowerCenter Repository and Integration Services execute the tests against
Enterprise Data; test results are exposed through database report views
(V_Summary, V_Tests, V_Results).]
13. Key Features of DVO
• Broad data connectivity
− DBMS (Oracle, SQL Server, DB2, Sybase, Teradata, Netezza)
− Mainframe (DB2 z/OS, DB2 AS/400, IMS, Adabas, MF Flat files, VSAM)
− SalesForce.com, SAP transparent tables, SAS, ODBC and Flat files
• Numerous built-in tests
− COUNT, COUNT_DISTINCT, COUNT_ROWS, MIN, MAX, AVG, SUM
− SET AinB, SET BinA, SET AeqB
− VALUE, OUTER VALUE, Expressions
• Model ETL constructs
− LOOKUPs, Arbitrary SQL Relationships
• Other
− Run from GUI or CLI (DVOCmd)
− Built-in reporting
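To make the set and value test names above concrete, here is a conceptual sketch in plain Python. It is not DVO's implementation or API. It assumes the names mean what they suggest: the SET tests check containment or equality of key sets, VALUE compares values only for keys present on both sides, and OUTER VALUE also flags keys missing from either side. All function and variable names are hypothetical.

```python
_MISSING = object()  # sentinel for a key that is absent on one side


def set_a_in_b(a, b):
    """True if every value in A also appears in B."""
    return set(a) <= set(b)


def set_b_in_a(a, b):
    """True if every value in B also appears in A."""
    return set(b) <= set(a)


def set_a_eq_b(a, b):
    """True if A and B contain exactly the same values."""
    return set(a) == set(b)


def value_test(a_rows, b_rows, outer=False):
    """Return the sorted keys whose values differ between A and B.

    a_rows and b_rows map a join key to a value. With outer=False only
    keys present on both sides are compared (inner join); with
    outer=True keys missing from either side are reported as well.
    """
    if outer:
        keys = set(a_rows) | set(b_rows)
    else:
        keys = set(a_rows) & set(b_rows)
    return sorted(
        k for k in keys
        if a_rows.get(k, _MISSING) != b_rows.get(k, _MISSING)
    )


# Hypothetical source and target: key 2 was altered, key 3 was dropped.
source = {1: "Ann", 2: "Bob", 3: "Cy"}
target = {1: "Ann", 2: "Rob"}

print(set_a_in_b(source, target))              # is every source key in target?
print(value_test(source, target))              # mismatches among joined keys
print(value_test(source, target, outer=True))  # also counts missing keys
```

The inner comparison only sees the altered row, while the outer comparison also surfaces the dropped row, which is the distinction that matters when reconciling a load that may have lost records.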
15. Technology Company
Development and Test
Reduced data testing time by 80% with Data Validation Option
KEY BUSINESS IMPERATIVE AND IT INITIATIVE
SaaS provider of sales compensation and analytics
• Data absolutely has to be correct as it affects people's
paychecks
• Very high visibility of the data with users
• Trust in the data is key
THE CHALLENGE
• New release every ~1 month
• 1 full week of data testing by QA team per release
• Developers wrote SQL for testing the data
• Testers would execute the SQL, track errors and work with Developers to resolve
• And who was testing the SQL to make sure it was correct?
INFORMATICA ADVANTAGE
• With DVO they are able to test 100,000s of rows of data in regression tests
• Developers no longer required to write SQL
• Testers are now empowered and independent of developers
RESULTS/BENEFITS
• Have created a test suite of over 1000 tests
• Testers can manage the testing environment
• Can test large volumes of data
• Testing time reduced from 1 week to 1 day (80% less)
• Spend “free time” on higher level tasks
Informatica Confidential – Under NDA 15
16. Financial Services Company
Production Reconciliation
Ensures DW is Complete and Accurate with Data Validation Option
KEY BUSINESS IMPERATIVE AND IT INITIATIVE
Good data is essential to good business decisions. Their calculations of portfolio risk
and value must be correct.
• Spends “hundreds of millions” purchasing troubled debt in the USA
• The data and risk calculations on those assets must be correct.
• Bad data could cost them “millions” and put them out of business.
THE CHALLENGE
• Business users were complaining about missing data in the systems.
• Data errors can lead to very costly bad business decisions.
• They were doing manual testing via developer-written mappings and PL/SQL
• Other products available today could not meet their requirements
INFORMATICA ADVANTAGE
• With DVO they are able to perform detailed reconciliations across source
and target systems.
• With DVO, they have a complete audit trail.
RESULTS/BENEFITS
• DVO found where data was missing
• Found thousands of missing records due to bad coding and improperly rerun
failed jobs
• Reloaded all missing data in two weeks
• They are looking to implement ongoing incremental validation for all new
data loaded into tables
17. Mid-size Technology Company
Production Reconciliation
Reconciling MDM data using Data Validation Option
KEY BUSINESS IMPERATIVE AND IT INITIATIVE
Customer and contact hub is pivotal to efficient business operations
Millions of records processed across various systems
Ensure BAs, line managers and customers had access to accurate and
complete data based on their needs
THE CHALLENGE
• No easy way to reconcile data in systems to identify bad data or identify
the extent of errors
• Incorrectly augmented data in systems
• Gold record data didn’t always match across systems
• Faulty records propagated downstream.
INFORMATICA ADVANTAGE
• DVO reconciled data across systems (e.g. SalesForce and Hub) and found:
− 1000s of missing records between systems
− Incorrectly augmented D&B data
− Improperly coded golden records
RESULTS/BENEFITS
• Identified errors due to faulty DI logic and the error handling process
• Ensured incorrect records are no longer being used in marketing campaigns
• Bad customer data no longer reaching customers in the portal