1. Best Practices in Database Archiving
and Information Lifecycle Management
An InformationWeek Webcast
Sponsored by
3. Today’s Presenter
Carl Olofson,
Research Vice President,
Application Development and Deployment,
IDC
4. Best Practices in Database Archiving
and Information Lifecycle Management
How ILM Saves Money, Reduces Risk
Carl Olofson
Research Vice President
IDC
May 2011
Copyright IDC. Reproduction is forbidden unless authorized. All rights reserved.
5. Agenda
The Problem
Unchecked database growth
Hidden costs of large databases
Security and privacy in test data
Information Lifecycle Management
What is ILM?
Database archiving
– Requirements of database archiving
– Benefits of database archiving
Test data masking
– How data is masked
– Benefits of data masking
Conclusions / Recommendations
© IDC Visit us at IDC.com and follow us on Twitter: @IDC Source:/Notes: May-11 5
6. Unchecked Database Growth
As a database grows…
It requires larger indices
It consumes more storage
It requires specialized administration to tune
It needs more processor power to execute queries and updates
The hidden costs include
More storage administration
More downtime for reorgs
Larger batch windows for backups
7. Polling Question #1
How rapidly is your main production database growing?
Under 10% per year
10% per year
25% per year
Over 25% per year
8. Elements of Test Data Management
Selecting the data
Must be a referentially complete subset of the database
Must reflect realistic patterns of data to ensure valid testing
Protecting sensitive data
Sensitive data must be masked to prevent unauthorized viewing
Masked data needs to make sense to the test system.
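A "referentially complete" subset can be illustrated with a small sketch: pick a set of parent rows, then copy every child row that depends on them, so that no foreign key in the subset points at a missing row. The example below uses Python's built-in sqlite3 module; the table names (customer, orders, order_line), the demo data, and the function names are assumptions made for illustration and do not come from the presentation.

```python
import sqlite3

SCHEMA = """
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         customer_id INTEGER REFERENCES customer(id));
    CREATE TABLE order_line (id INTEGER PRIMARY KEY,
                             order_id INTEGER REFERENCES orders(id),
                             sku TEXT);
"""

def make_demo_source() -> sqlite3.Connection:
    """Build a tiny hypothetical 'production' database in memory."""
    src = sqlite3.connect(":memory:")
    src.executescript(SCHEMA)
    src.executemany("INSERT INTO customer VALUES (?, ?)",
                    [(1, "Ann"), (2, "Bob")])
    src.executemany("INSERT INTO orders VALUES (?, ?)",
                    [(10, 1), (11, 1), (12, 2)])
    src.executemany("INSERT INTO order_line VALUES (?, ?, ?)",
                    [(100, 10, "A"), (101, 11, "B"), (102, 12, "C")])
    src.commit()
    return src

def extract_subset(src: sqlite3.Connection, customer_ids) -> sqlite3.Connection:
    """Copy the chosen customers plus all of their orders and order lines."""
    ids = list(customer_ids)
    marks = ",".join("?" * len(ids))  # one placeholder per selected id
    dst = sqlite3.connect(":memory:")
    dst.executescript(SCHEMA)

    # Parents first, then each level of children that hangs off them.
    customers = src.execute(
        f"SELECT id, name FROM customer WHERE id IN ({marks})", ids).fetchall()
    orders = src.execute(
        f"SELECT id, customer_id FROM orders WHERE customer_id IN ({marks})",
        ids).fetchall()
    lines = src.execute(
        f"""SELECT id, order_id, sku FROM order_line
            WHERE order_id IN
              (SELECT id FROM orders WHERE customer_id IN ({marks}))""",
        ids).fetchall()

    dst.executemany("INSERT INTO customer VALUES (?, ?)", customers)
    dst.executemany("INSERT INTO orders VALUES (?, ?)", orders)
    dst.executemany("INSERT INTO order_line VALUES (?, ?, ?)", lines)
    dst.commit()
    return dst
```

Calling `extract_subset(make_demo_source(), [1])` yields a test database in which every order has its customer and every order line has its order. A real schema would need the same walk across every foreign-key path, which is one reason the selection step is harder than it first appears.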
9. Security and Privacy in Test Data
Normal Security Is Often Suspended for Test Data
Confidential data could be compromised
Privacy requirements could be breached
Corporate policies may be violated
Contractual requirements and government regulations could lead
to legal culpability
In-House Masking Is Inadequate
Simplistic results create unrealistic test data
Code must be changed as the database changes, an
unreasonable burden on in-house IT
10. Polling Question #2
In what role is the person in your organization primarily
responsible for refreshing test data?
DBA
Development Manager
Project Leader
Developer
Other
12. The Basic Elements of ILM
Definition
Policies governing data creation, management, removal
Security
Encryption and access control at a granular level
Protection
Blocking access to sensitive data, including test data
Test data protection is done through data masking
Archiving
Removal of inactive data from the live database
Storage in a compressed, read-only datastore
13. The Data Masking Challenge
Application testing requirements
Using simple XXXX or #### or “Lorem ipsum” is usually not
adequate for robust application testing.
Data must be representative of actual data in value range and
distribution.
Masked data must “make sense”; zip codes correlate to city and
state, for instance.
Secured information, such as personal identification, should not
be inferable from the masked data.
The fake data should be consistent.
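Some of these requirements can be sketched in a few lines. The example below is a minimal illustration, not a description of any product: it uses a keyed hash (HMAC-SHA256 from Python's standard library) so that the same input always maps to the same fake value (consistency), the original cannot be recovered without the key (not inferable), and city, state, and zip code are replaced as one unit so they stay correlated. The lookup tables, field names, and function names are hypothetical, and the sketch makes no attempt to preserve the value distribution of the real data.

```python
import hashlib
import hmac

# Hypothetical lookup tables; a real tool would ship far larger ones.
FAKE_NAMES = ["Alex Morgan", "Jamie Lee", "Sam Carter", "Robin Diaz"]
FAKE_PLACES = [  # (city, state, zip) kept together so they stay correlated
    ("Springfield", "IL", "62701"),
    ("Portland", "OR", "97201"),
    ("Austin", "TX", "78701"),
]

def _pick(key: bytes, value: str, choices):
    """Deterministically map value to one of choices using a keyed hash."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()
    return choices[int.from_bytes(digest[:8], "big") % len(choices)]

def mask_customer(row: dict, key: bytes) -> dict:
    """Return a masked copy of a customer row (illustrative field names)."""
    masked = dict(row)  # leave the caller's row untouched
    masked["name"] = _pick(key, row["name"], FAKE_NAMES)
    # Replace the whole address tuple at once so the zip matches city/state.
    city, state, zip_code = _pick(key, row["zip"], FAKE_PLACES)
    masked.update(city=city, state=state, zip=zip_code)
    return masked
```

Because the mapping depends only on the key and the input value, a customer masked in one table receives the same fake name in every other table, which keeps joins in the test system meaningful.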
14. Archiving: Types of Data
Reference
Created in response to a stand-alone event.
Randomly retrieved without requiring context
Active until a special event
Examples: Customer, Patient, Product
Transactional
Created at the start of a business process.
Retrieved in the context of a transaction
Deactivated at the end of a business process.
Examples: Sales order, treatment, shipment
Streaming
Created at reception of a streamed item
Inactive immediately (cannot be updated)
15. Classes of Data
Active
Data that is still being updated.
Includes reference and transactional data.
Inactive
Data no longer active, but retained for query and reporting
Includes historical and streamed data
Historical data is inactive transaction data
– Sales order completed, revenue recognized
– Inventory item sold and picked up
– Patient treatment completed, patient discharged
16. Buildup of Inactive Data
Hypothetical Example
Suppose we have a sales order table
We start the year with 10,000 orders per month
Orders grow at 1% per month
Each order takes 60 days to complete (recognize revenue)
Orders in process are active data
Completed orders are inactive data
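The hypothetical can be worked through numerically. The sketch below assumes that a 60-day completion time means orders placed in the current and previous month are still active, and that completed orders are never removed from the table; both are simplifying assumptions, and the function and parameter names are illustrative.

```python
def simulate_buildup(first_month_orders=10_000, monthly_growth=0.01,
                     months=12, active_months=2):
    """Return (month, active_rows, inactive_rows, inactive_pct) per month."""
    placed = []   # orders placed in each month so far
    results = []
    for m in range(months):
        placed.append(round(first_month_orders * (1 + monthly_growth) ** m))
        active = sum(placed[-active_months:])    # still in process
        inactive = sum(placed[:-active_months])  # completed, never removed
        total = active + inactive
        results.append((m + 1, active, inactive, 100.0 * inactive / total))
    return results

if __name__ == "__main__":
    for month, active, inactive, pct in simulate_buildup():
        print(f"month {month:2d}: active={active:6d} "
              f"inactive={inactive:7d} inactive%={pct:5.1f}")
```

Under these assumptions the active row count stays close to 20,000 all year, while inactive rows accumulate month after month and make up more than 80% of the table by December.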
17. Buildup of Inactive Transaction Data
[Chart: Sales Order Table. Active and inactive rows by month, Jan through Dec; y-axis: rows, 0 to 160,000; inactive % shown as an additional series.]
18. Inactive Data Clogs the Database
DBMS Overhead
Big Indexes
Storage demand
Slower queries
Slower transaction processing
Operational Overhead
DBA tuning
Disruption for unload/reload and reorg
Longer backup batch windows
19. Polling Question # 3
Think of transaction data that you retain. What is your required
retention period?
3-5 years
6-10 years
Over 10 years
We don’t have a retention policy
20. Approaches to “Aging Out” Data
Partitioning
Move data to low frequency partition on 2nd or 3rd tier storage
Use local partition indexes to avoid growth of global table indexes
Perform maintenance operations by physical partition
Problem: this approach impacts the whole table, and creates a complex
operational and management challenge that extends across the
database
Archiving
Select referentially complete subsets of inactive data
Move the inactive data to an archiving system outside the database
Ensure that the archive can support SQL and that queries can, if
necessary, be executed in an integrated manner with those of the live
database.
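The archiving approach can be sketched with Python's built-in sqlite3 module. In this illustration an attached in-memory database named archive stands in for the external archiving system; a real archive would be a separate, compressed, read-only store. Completed orders and their order lines are moved together in one transaction so the archived subset stays referentially complete, and a UNION ALL query shows live and archived rows being read in an integrated manner. The schema, the 'completed' status value, and all function names are assumptions for the example.

```python
import sqlite3

COMPLETED = "SELECT id FROM main.orders WHERE status = 'completed'"

def open_demo() -> sqlite3.Connection:
    """Create a hypothetical live database with an attached archive."""
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ':memory:' AS archive")
    for schema in ("main", "archive"):
        conn.execute(f"CREATE TABLE {schema}.orders "
                     "(id INTEGER PRIMARY KEY, status TEXT)")
        conn.execute(f"CREATE TABLE {schema}.order_line "
                     "(id INTEGER PRIMARY KEY, order_id INTEGER, sku TEXT)")
    conn.executemany("INSERT INTO main.orders VALUES (?, ?)",
                     [(1, "completed"), (2, "open"), (3, "completed")])
    conn.executemany("INSERT INTO main.order_line VALUES (?, ?, ?)",
                     [(10, 1, "A"), (11, 2, "B"), (12, 3, "C")])
    conn.commit()
    return conn

def archive_completed_orders(conn: sqlite3.Connection) -> int:
    """Move completed orders and their lines to the archive; return the count."""
    moved = conn.execute(
        "SELECT COUNT(*) FROM main.orders WHERE status = 'completed'"
    ).fetchone()[0]
    with conn:  # one transaction: copy everything first, then delete
        conn.execute(f"INSERT INTO archive.order_line "
                     f"SELECT * FROM main.order_line WHERE order_id IN ({COMPLETED})")
        conn.execute("INSERT INTO archive.orders "
                     "SELECT * FROM main.orders WHERE status = 'completed'")
        # Delete children before parents so the subquery still finds them.
        conn.execute(f"DELETE FROM main.order_line WHERE order_id IN ({COMPLETED})")
        conn.execute("DELETE FROM main.orders WHERE status = 'completed'")
    return moved

def all_orders(conn: sqlite3.Connection):
    """Query live and archived orders together."""
    return conn.execute(
        "SELECT id, status FROM main.orders "
        "UNION ALL SELECT id, status FROM archive.orders ORDER BY id"
    ).fetchall()
```

After `archive_completed_orders(open_demo())` the live tables hold only the open order and its line, while `all_orders` still returns all three orders. If any statement fails, the `with conn:` block rolls the whole move back, so a row is never deleted from the live database without having been copied.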
21. Benefits of Archiving
Database benefits
Faster queries
Less index maintenance overhead
Smaller dataspaces and simpler schema than partitioning option
Requires less CPU; license/maintenance savings for DB and
applications
Operational benefits
Less schema maintenance than partitioning option
Stable backup windows
Much less data reorganization
22. Application Retirement
Inactive Applications
Applications become inactive when they are no longer used, and their
functions have been migrated elsewhere.
They commonly still have data that must be retained for corporate
policy or legal reasons.
For this reason, enterprises keep them running, maintaining them, and
paying fees for them even though they are inactive.
Retiring Inactive Applications
All their data is inactive, so it may be archived altogether
The archiving system must retain the ability to report on the data.
The savings in servers, storage, software, and operations costs can be
very significant.
23. Critical Requirements of Database
Archiving
DBMS Support
Must support ongoing versions of major RDBMS including DB2,
Informix, Oracle, Sybase ASE, Microsoft SQL Server, and
MySQL
Must record schema and schema changes to support data
retrieval even after data definitions have changed.
Must support SQL and ODBC/JDBC used by applications.
Technical requirements
Random data retrieval
Compressed, optimized based on read-only access
Reasonable performance on 2nd and 3rd tier storage
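The requirement to record schema and schema changes can be illustrated as follows. This is a minimal sketch under stated assumptions, not how any particular archiving product works: each archived batch stores the column definitions that were in force when the rows were archived, so the rows can still be interpreted after the live table has changed. It uses Python's sqlite3 and json modules; the batch table layout and the function names are hypothetical, and table names are assumed to come from trusted code because they are interpolated into SQL.

```python
import json
import sqlite3

def snapshot_schema(conn: sqlite3.Connection, table: str) -> str:
    """Return the table's current column definitions as a JSON string."""
    cols = conn.execute(f"PRAGMA table_info({table})").fetchall()
    # Each row is (cid, name, type, notnull, dflt_value, pk).
    return json.dumps([{"name": c[1], "type": c[2], "pk": bool(c[5])}
                       for c in cols])

def archive_batch(conn: sqlite3.Connection, archive: sqlite3.Connection,
                  table: str) -> int:
    """Copy all rows of table into the archive along with its schema."""
    archive.execute(
        "CREATE TABLE IF NOT EXISTS batch (id INTEGER PRIMARY KEY, "
        "table_name TEXT, schema_json TEXT, rows_json TEXT)")
    rows = conn.execute(f"SELECT * FROM {table}").fetchall()
    cur = archive.execute(
        "INSERT INTO batch (table_name, schema_json, rows_json) "
        "VALUES (?, ?, ?)",
        (table, snapshot_schema(conn, table), json.dumps(rows)))
    archive.commit()
    return cur.lastrowid

def read_batch(archive: sqlite3.Connection, batch_id: int):
    """Return archived rows as dicts keyed by the column names recorded
    at archive time, regardless of what the live table looks like now."""
    schema_json, rows_json = archive.execute(
        "SELECT schema_json, rows_json FROM batch WHERE id = ?",
        (batch_id,)).fetchone()
    names = [c["name"] for c in json.loads(schema_json)]
    return [dict(zip(names, row)) for row in json.loads(rows_json)]
```

If a column is added to the live table between two calls to `archive_batch`, the earlier batch still reads back with its original columns and the later batch with the new one, because each carries its own schema snapshot.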
24. Data Governance
Purpose is to ensure that data is trustworthy
Data is well defined, and maintenance is rational
Original source is known
Sequence and agents of update are known (provenance)
Data is valid and consistent
No unauthorized access has happened
No sensitive data is visible to unauthorized personnel
Data is retained as required without compromising performance
Business Benefits
Database development and management addresses known business needs
Trade secrets are not exposed and confidences are not compromised
Ensures compliance with contractual and legal requirements
Reduces risk of actual or opportunity cost due to data-driven application error
25. ILM and Data Governance
[Diagram: Data Governance hierarchy, listed top to bottom]
Data Governance
Uniform Data Definition & Policy Management
Information Lifecycle Management; Trust Management
Managed Data Selection & Retention; Data Protection; Validity and Consistency Assurance; Security & Monitoring
Database Subsetting; Database Archiving; Test Data Masking; Data Quality and Profiling; Data Cleansing; Provenance Tracking; Access Control and Encryption; Access Log Analysis
26. ILM and Database Development and
Management Tools
Database Development and Management Tools (DDMT)
Software used by DBAs and data managers to manage the size,
performance, and reliability/recoverability of databases
Includes DBA tools, database replication software, development
and optimization software, and database archiving / ILM.
The ILM Segment of the DDMT Market
Just 4.6% of the DDMT market in 2009, but the fastest growing segment; the only
segment to show positive growth in that tough economic year.
Projected to show the greatest growth of all DDMT segments to
2014, with a forecast CAGR of 9.9% from $90 m to $188 m.
27. What’s IBM’s Share in the ILM Market
Segment
[Pie chart: Revenue ($M), share by vendor]
IBM: 56%
Informatica: 13%
Other: 12%
HP: 11%
CA: 4%
Solix: 4%
Total = $89.9 Million
Source: IDC, 2010
28. Conclusions and Recommendations
Conclusions
Data governance is critical because the utility and trustworthiness of
enterprise data cannot be left to chance.
ILM addresses the key dimension of data size management in relation to
data retention, and test data management.
These functions cannot be developed and maintained in-house.
Recommendations
Users should carefully review their data access and retention policies and
ensure that those policies are carried out.
In most cases, the best approach to ensuring data retention without
bloating the databases is to employ database archiving.
Test data management is not trivial; find professionally developed data
masking and subsetting tools.
IBM’s InfoSphere Optim leads the market in addressing these key ILM
requirements.
30. Information Management
IBM InfoSphere Optim solutions
Managing data throughout its lifecycle in heterogeneous environments
Discover
Speed understanding and project time through relationship discovery within and across data sources
Understand sensitive data to protect and secure it
Test Data Management
Easily refresh & maintain right sized non-production environments, while reducing storage costs
Improve application quality and deploy new functionality more quickly
Data Masking
Protect sensitive information from misuse & fraud
Prevent data breaches and associated fines
Data Growth Management
Reduce hardware, storage & maintenance costs
Streamline application upgrades and improve application performance
Application Retirement
Safely retire legacy & redundant applications while retaining the data
Ensure application-independent access to archive data
[Diagram labels: Discover, Understand, Classify, Subset, Mask, Archive, Retire; environments: Production, Development, Test, Training]
© 2011 IBM Corporation
31. Information Management
Managing Data Across its Lifecycle
Discover & Define: discover where data resides; classify & define data and relationships; define policies
Develop & Test: develop database structures & code; create & refresh test data; validate test results
Optimize, Archive & Access: enhance performance; manage data growth; report & retrieve archived data
Consolidate & Retire: rationalize application portfolio; enable compliance with retention & e-discovery
Information Governance
Quality Management – Lifecycle – Security & Privacy
32. Information Management
You can’t govern what you don’t understand
(Phase: Discover & Define)
Define business objects for archival and test data applications
– Automation of manual activities accelerates time to value
Discover data transformation rules and heterogeneous relationships
– Business insight into data relationships reduces project risk
Identify hidden sensitive data for privacy
– Provides consistency across information agenda projects
[Diagram: Distributed Data Landscape]
33. Information Management
Employ effective test data management practices
(Phase: Develop & Test)
• Create targeted, right-sized test environments
• Substitute sensitive data with fictionalized yet contextually accurate data
• Easily refresh, reset and maintain test environments
• Compare data to pinpoint and resolve application defects faster
• Accelerate release schedules
[Diagram: a 2 TB Production or Production Clone database passes through Subset & Mask into Development, Unit Test, Training, and Integration Test environments of 25 GB to 100 GB each]
34. Information Management
Archive historical data for data growth management
(Phase: Optimize, Archive & Access)
[Diagram: Production Data (current, historical, and reference data) is archived to Archives; historical data can be retrieved as Restored Data, and archived data records can be selectively restored. Universal Access to Application Data: Application, Mashup Center, Data Find, ODBC / JDBC, XML, Report Writer.]
Data Archiving is an intelligent process for moving inactive or infrequently
accessed data that still has value, while providing the ability to search and
retrieve the data
35. Information Management
Retire redundant and legacy applications
(Phase: Consolidate & Retire)
Preserve application data in its business context
– Capture all related data, including transaction details, reference data & associated
metadata
– Capture any related reference data that may reside in other application databases
Retire out-of-date packaged applications as well as legacy custom applications
– Leverage out-of-box support of packaged applications to quickly identify & extract the
complete business object
Shut down legacy system without a replacement
– Provide fast and easy retrieval of data for research and reporting, as well as audits
and e-discovery requests
[Diagram: "Infrastructure before Retirement" shows three separate User, Application, Database, Data stacks; "Archived Data after Consolidation" shows the users reaching Archive Data through a single Archive Engine.]
36. Information Management
Resources to Learn More!
InfoSphere Optim Solutions page:
http://www-01.ibm.com/software/data/optim/
– IDC Worldwide Database Development and Management Tools 2009 Vendor and Segment Analysis Report
– Whitepaper: Control Application Data Growth Before It Controls Your Business
– Whitepaper: Enterprise Strategies to Improve Application Testing
– InfoSphere Optim Solutions for Custom and Packaged Applications Solution Brief