2
Marc Hebert
Chief Operating Officer
Estuate
510-468-7132
marc@estuate.com
Jeff Tuck
IBM Optim Product Manager
720-395-6032
jtuck@us.ibm.com
Peter Costigan
IBM Optim Product Manager
408-656-9161
costigan@us.ibm.com
3
• The Hadoop Data Management Challenge
• Using Hadoop for Test Data Management
• Making Archive Data Available to Big Data Analytics
• Summary
4
The Hadoop Data Management Challenge
• Many IT shops are using Hadoop for serious analytic applications, and accumulating large amounts of data
• Hadoop is fast becoming a standard platform for analytics and other uses
• As a result, managing data in and with Hadoop will pose challenges in the next few years
• Hadoop can be a very useful tool for managing test data in an overall test data management context
• Hadoop will also likely become a data archive repository of choice for many IT shops that have application archiving and retirement initiatives
• Subsetting, masking and archiving of Hadoop data itself also needs attention
• IBM’s Optim platform leverages Hadoop for these purposes
5
Diagram labels: Channels, Big data, Enterprise Applications
Discover, Subset, Mask, Refresh, Analyze
Discover
• Identify sensitive data
• Understand data relationships
• Identify proper test data
Subset
• Automatically extract test data required for each test case
• Test only on the required values to keep environments efficient
Mask
• Enforce data integrity while masking
• Support context- & application-aware masking
Refresh & Analyze
• On-demand access & refresh of test data
• Automate test result comparisons to reduce errors
Diagram labels: Divisional, customer, Network, Billing, analysis
6
Requirements
• Protect confidential data used in big data platforms
• Mask data on screen in applications and reports
• Implement proven data masking techniques
• Support compliance with privacy regulations
Benefits
• Protect sensitive information from misuse and fraud
• Prevent data breaches and associated fines
• Achieve better information governance
De-identify sensitive information with realistic but fictional data
Personally identifiable information is masked with realistic but fictional data
Example: JASON MICHAELS → ROBERT SMITH
Mask data on demand
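The masking idea on this slide, replacing a real name with a realistic but fictional one, can be illustrated with a minimal Python sketch. This is not how Optim implements masking; the lookup lists, the `salt` parameter and the function names are assumptions made for illustration. The sketch hashes the input so that the same source value always maps to the same fictional value, which is one common way to keep masked data consistent across tables.

```python
import hashlib

# Hypothetical lookup tables of fictional replacement values.
FIRST_NAMES = ["ROBERT", "MARIA", "DAVID", "LINDA", "JAMES", "SUSAN"]
LAST_NAMES = ["SMITH", "JONES", "BROWN", "TAYLOR", "WILSON", "CLARK"]


def _pick(value: str, choices: list[str], salt: str) -> str:
    """Deterministically map a value to one of the fictional choices."""
    digest = hashlib.sha256((salt + value.upper()).encode("utf-8")).digest()
    return choices[int.from_bytes(digest[:8], "big") % len(choices)]


def mask_name(full_name: str, salt: str = "demo-salt") -> str:
    """Replace a 'FIRST LAST' name with a fictional but realistic one."""
    first, _, last = full_name.strip().partition(" ")
    masked_first = _pick(first, FIRST_NAMES, salt + ":first")
    masked_last = _pick(last, LAST_NAMES, salt + ":last")
    return f"{masked_first} {masked_last}"


if __name__ == "__main__":
    # The same input always yields the same masked output.
    print(mask_name("JASON MICHAELS"))
```

Because the mapping depends only on the input and the salt, a customer name masked in one table matches the same name masked in another, so joins on masked values still line up.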
Diagram labels: InfoSphere Optim, InfoSphere BigInsights, BigSheets, Dev, QA, Integration
Scalable and Cost Effective
• Leverage the scalability of Hadoop to grow your test environment to support all test data needs
• Benefit from high performance at a low TCO
Trusted
• Mask sensitive data on the way in and out
• Process test data as a complete business object and maintain relationship integrity
Open
• Leverages the open and flexible Hadoop architecture
• Built-in connectors to move data in and out
• Query and analyze test data using Big SQL & Hive
• Visualize test data with BigSheets, Watson Explorer or other Hadoop analytic tools
A fully functional test data management offering for Hadoop
– Supports Hive as a native source and target data store
– Optim Primary Keys and Relationships
– Access Definitions, Table Maps and Column Maps
– Extract, Convert and Load
  » New Load service designed specifically for Hadoop
  » Insert not supported due to Hive limitations
– Browse, Edit, Compare and Create
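Extracting a subset while keeping primary-key relationships intact can be sketched in a few lines of Python. This is only an illustration of the concept, not Optim's Extract service or its Access Definitions; the `customers` and `orders` tables and the `customer_id` key are hypothetical.

```python
def extract_business_objects(customers, orders, customer_ids):
    """Return the chosen customers together with all of their orders.

    Parent rows and their child rows are extracted as one unit, so the
    subset never contains an order whose customer is missing.
    """
    wanted = set(customer_ids)
    subset_customers = [c for c in customers if c["customer_id"] in wanted]
    subset_orders = [o for o in orders if o["customer_id"] in wanted]
    return {"customers": subset_customers, "orders": subset_orders}


if __name__ == "__main__":
    customers = [
        {"customer_id": 1, "name": "A"},
        {"customer_id": 2, "name": "B"},
    ]
    orders = [
        {"order_id": 10, "customer_id": 1},
        {"order_id": 11, "customer_id": 2},
        {"order_id": 12, "customer_id": 1},
    ]
    print(extract_business_objects(customers, orders, [1]))
```

Filtering both tables on the same key set is what keeps the extracted rows referentially consistent; a real tool would follow every declared relationship rather than a single hard-coded one.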
A Test Data Management solution that uses Hadoop as a test data management warehouse to store, analyze, search and retrieve structured test data, satisfying testing data requirements throughout all phases of the application development lifecycle
Business Objectives
1. Store and catalog data in Hadoop and use it as a test data warehouse
2. Explore cataloged data residing in a Hadoop test data warehouse
3. Search for cataloged data residing in a Hadoop test data warehouse
4. Retrieve cataloged data from a Hadoop test data warehouse
5. Store cataloged data from a Hadoop test data warehouse into other, non-Hadoop relational data stores
New Capabilities and Benefits
• Hadoop as a test data landing zone
  – Highly scalable at low cost
  – More data can be under control of testers
  – Higher agility to adjust & create test data sets
  – Open to access & manipulate data
• BigSheets (BigInsights tooling*)
  – Visualization and manipulation of data
• BigSQL (BigInsights tooling*)
  – Rich SQL + standard access + security … and more
*restricted license
Expanded control for developers & testers to retrieve & create test data
Diagram labels: Production DB, Optim Technology, Optim v11.3, (Subset) & Mask, Load/Refresh, Load, Hadoop, BigInsights, Test Data, open + managed
11
Making Archive Data Available to Big Data Analytics
• Optim enables clients to take historical data from production systems and place that data into an archive file
• That archive file can have retention applied in support of corporate and regulatory compliance requirements
• Data from archive files can be easily made available in Hadoop in support of analytic initiatives, while the Optim archive file remains the system of record
12
Benefits of Using Optim Archive as the System of Record
• Ensures data is kept in the original business context without modification
• Provides the ability to restore information to production systems (selectively if required), including recreation of schemas and database objects as needed
• Enables retention and disposition of information based on legal and corporate policy (for example, delete after 7 years)
• Enables eDiscovery and Legal Hold workflows
• Imposes access controls on archived data for data consumers
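A retention rule such as "delete after 7 years", combined with a legal hold, can be sketched as a simple check in Python. This is an assumption-laden illustration, not Optim's policy engine: the `ArchiveRecord` class, its fields and the `is_disposable` function are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ArchiveRecord:
    archive_id: str
    archived_on: date
    legal_hold: bool = False  # a hold always blocks disposal


def is_disposable(record: ArchiveRecord, today: date, retention_years: int = 7) -> bool:
    """Return True when the retention period has passed and no hold applies."""
    if record.legal_hold:
        return False
    target_year = record.archived_on.year + retention_years
    try:
        expires = record.archived_on.replace(year=target_year)
    except ValueError:
        # Archived on Feb 29 and the target year is not a leap year.
        expires = record.archived_on.replace(year=target_year, day=28)
    return today >= expires


if __name__ == "__main__":
    rec = ArchiveRecord("A-1", date(2010, 1, 15))
    print(is_disposable(rec, date(2017, 1, 15)))
```

Checking the hold before the date comparison reflects the slide's pairing of retention with Legal Hold: an expired record under hold must still be kept.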
13
Considerations when Leveraging Hadoop as a System of Record
• How will data access mechanisms be secured?
• Are audit records required for data access?
• Will the data set stored in Hadoop be immutable and guaranteed not to be altered?
• How will retention policies be executed and explained when required?
• In the event of audit requests, are there processes in place to leverage Hadoop as a source?
Apply Retention / Hold Policies
Capture complete business object
Preserve Data Integrity
Preserve Schema Metadata
Load data into Hadoop as needed
Archive Cold Data
Query-able analytical data store, using Hadoop
Archive & Purge Data
InfoSphere Optim
Compressed, immutable, auditable & restorable archives
Database, IMS, VSAM, More…
Archive files
Hadoop
Optim – Hadoop Integration:
• Optim Hadoop Loader to convert Optim archive file into CSV & load into HDFS
• Data accessible via query engines like BigSQL, Hive, or Impala (depending on Hadoop distribution)
Diagram labels: Application, Database, Optim Data Archive, Data Archive files (complete object), Optim Hadoop Loader, CSV Files, Hadoop Cluster, Hive warehouse, HCatalog metadata, BigSQL .. query processing
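The flow on this slide, rows written out as CSV and then exposed to Hive, can be approximated with a short Python sketch. This is not the Optim Hadoop Loader and does not use any Optim API; the function names, table name, column types and HDFS path are hypothetical. The generated statement uses standard Hive DDL for an external table over delimited text files.

```python
import csv
import io


def rows_to_csv(rows) -> str:
    """Serialize rows to CSV text without a header line.

    A header is omitted because a plain delimited Hive table would
    otherwise read it as a data row.
    """
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


def hive_external_table_ddl(table: str, columns, hdfs_dir: str) -> str:
    """Build a CREATE EXTERNAL TABLE statement for CSV files in hdfs_dir.

    `columns` is a list of (name, hive_type) pairs. Note that
    ROW FORMAT DELIMITED does not interpret quoted fields, so values
    containing commas would need a CSV SerDe instead.
    """
    cols = ",\n".join(f"  `{name}` {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS `{table}` (\n{cols}\n)\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        "STORED AS TEXTFILE\n"
        f"LOCATION '{hdfs_dir}'"
    )


if __name__ == "__main__":
    print(rows_to_csv([(1, "Alice"), (2, "Bob")]))
    print(hive_external_table_ddl(
        "orders_archive",
        [("order_id", "INT"), ("customer", "STRING")],
        "/archive/orders",
    ))
```

An external table leaves the CSV files in place in HDFS, which fits the slide's model where the archive file stays the system of record and Hadoop holds a query-able copy.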
16
Manage Your Hadoop Data with Help from Your Friends: Estuate and IBM Optim
• Estuate is the world’s leading specialist in IBM Optim
  – Deep product development relationship with IBM
  – Over 250 Optim implementations
• IBM Optim is the world’s leading data archiving platform with 76% market share, per Gartner
• Optim customers are starting to leverage Hadoop platforms in their Information Lifecycle Governance initiatives
• Estuate brings deep Optim and Hadoop experience and best practices to help you advance your Hadoop strategy and projects
• You can do this with either an on-premises or a hosted service
17
IBM Optim: A Platform for Enterprise Data Management
Integrated Data Management
Production Databases; Test & Development Databases
Data Growth Solution
Value: Improve Application Performance, Reduce Infrastructure Costs & Improve Compliance
• Retain only needed data, move the rest to archives
• Deploy Tiered Storage Strategies
• Retain Data According to Value
• Simplify Infrastructure
Decommissioning Solution
Value: Reduce Infrastructure Cost & Compliance
• Decommission redundant or obsolete applications
• Retain Access to historical data
Data Privacy Solution
Value: Risk Management
• Protect PII Data
• Apply Single Data Masking Solution
• Leverage realistic data
Test Data Management Solution
Value: Speed Application Delivery
• Create realistic and manageable test environments
• Speed application delivery
• Improve Test Coverage
• Improve Quality
IBM InfoSphere Discovery
Value: Automates analysis of data and data relationships for complete understanding of data assets
• Discover undocumented business rules used to transform data from existing systems
• Prototype and test new transformations for the target system
• Define the business objects for archiving and subsetting
• Identify all instances of private data so that they can be fully protected
18
Enterprise Architecture
InfoSphere Optim
An integrated, modular environment to manage enterprise application data and optimize data-driven applications from requirements to retirement across heterogeneous environments.
Discovery
Data Growth, Data Privacy, Test Data Management, Application Retirement
19
Summary: The Benefits of Hadoop for Enterprise Data Management
• Hadoop is state-of-the-art as a Test Data Management platform
  – Makes testing more agile and nimble
  – Leverages the power of Optim Data Privacy as well for PCI compliance
• Hadoop will gradually become a powerful repository for corporate archived data
  – Supporting ILG initiatives and compliance
20
Marc Hebert
Chief Operating Officer
Estuate
510-468-7132
marc@estuate.com
Q & A

Using Hadoop for Enterprise Data Management

  • 1.
  • 2. 2 Marc Hebert Chief Operating Officer Estuate 510-468-7132 marc@estuate.com Jeff Tuck IBM Optim Product Manager 720-395-6032 jtuck@us.ibm.com Peter Costigan IBM Optim Product Manager 408-656-9161 costigan@us.ibm.com
  • 3. 3  The Hadoop Data Management Challenge  Using Hadoop for Test Data Management  Making Archive Data Available to Big Data Analytics  Summary
  • 4. 4 The Hadoop Data Management Challenge  Many IT shops are using Hadoop for serious analytic applications, and accumulating large amounts of data  Hadoop is fast becoming a standard platform for analytics and other uses  And so, managing data in and with Hadoop will pose management challenges in the next few years  Hadoop can be a very useful tool for managing test data in an overall test data management context  Hadoop will also likely become an data archive repository of choice for many IT shops that have application archiving and retirement initiatives  And, subsetting, masking and archiving Hadoop data itself needs attention  IBM’s Optim platform leverages Hadoop for these purposes
  • 5. 5 Channels Big data Enterprise Applications Discover Mask AnalyzeRefreshSubset Discover  Identify sensitive data  Understand data relationships  Identify proper test data Subset  Automatically extract test data required for each test case  Test only on the required values to keep environments efficient Mask  Enforce data integrity while masking  Support context & application aware masking Refresh & Analyze  On demand access & refresh of test data  Automate test result comparisons to reduce errors Divisional customer Network Billing analysis
  • 6. 6  Protect sensitive information from misuse and fraud  Prevent data breaches and associated fines  Achieve better information governance  Protect confidential data used in big data platforms  Mask data on screen in applications and reports  Implement proven data masking techniques  Support compliance with privacy regulations Requirements Benefits De-identify sensitive information with realistic but fictional data Personally identifiable information is masked with realistic but fictional data JASON MICHAELS ROBERT SMITH Mask data on demand
  • 7. InfoSphere Optim InfoSphere BigInsights BigSheets Dev QA Integration Scalable and Cost Effective • Leverage the scalability of Hadoop to grow your test environment to support all test data needs • Benefit from high performance at a low TCO Trusted • Mask sensitive data on the way in and out • Process test data as a complete business object and maintain relationship integrity Open •Leverages the Hadoop open and flexible architecture •Built-in connectors to move data in and out •Query and analyze test data using Big SQL & Hive • Visualize test data with BigSheets, Watson Explorer or other Hadoop analytic tools InfoSphere Optim
  • 8. A fully functional test data management offering for Hadoop – Supports Hive as a native source and target data store – Optim Primary Keys and Relationships – Access Definitions, Table Maps and Column Maps – Extract, Convert and Load » New Load service designed specifically for Hadoop » Insert not supported due to Hive limitations – Browse, Edit, Compare and Create
  • 9. A Test Data Management solution that utilizes Hadoop as a test data management warehouse to store, analyze, search and retrieve structured test data to satisfy all testing use case data requirements throughout all phases of the application development lifecycle Business Objectives 1. Store and catalog data into Hadoop and utilize it as a test data warehouse 2. Explore cataloged data residing in a Hadoop test data warehouse 3. Search for cataloged data residing in a Hadoop test data warehouse 4. Retrieve cataloged data from a Hadoop test data warehouse 5. Store cataloged data in a Hadoop test data warehouse into other non Hadoop relational data stores
  • 10. other New Capabilities Benefits • Hadoop as a test data landing zone  highly scalable at low cost  more data can be under control of testers  higher agility to adjust & create test data sets  open to access & manipulate data • BigSheets (BigInsights tooling*)  Visualization and manipulation of data • BigSQL (BigInsights tooling*)  Rich SQL + standard access + security … and more Test Data Test Data Test Data PROD DB Production DB Optim Technology Expanded control for developers & testers to retrieve & create test data (Subset) & Mask Load/Refresh open + managed Optim v11.3 Load Hadoop BigInsights *restricted license
  • 11. Making Archive Data Available to Big Data Analytics • Optim enables clients to take historical data from production systems and place that data into an archive file • That archive file can have retention applied in support of corporate and regulatory compliance requirements • Data from archive files can easily be made available in Hadoop in support of analytic initiatives, while the Optim archive file remains the system of record
  • 12. Benefits of Using Optim Archive as the System of Record • Ensures data is kept in the original business context without modification • Provides the ability to restore information to production systems (selectively if required), including recreation of schemas and database objects as needed • Enables retention and disposition of information based on legal and corporate policy (for example, delete after 7 years) • Enables eDiscovery and Legal Hold workflows • Imposes access controls on archived data for data consumers
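The retention and legal hold points on slide 12 can be sketched as a small Python check. This is only an illustration of a "delete after 7 years" rule combined with a legal hold flag, not how Optim implements retention. The function names, the hold flag, and the choice to map 29 February to 28 February in non-leap years are assumptions.

```python
from datetime import date


def disposition_date(archived_on: date, retention_years: int = 7) -> date:
    """Earliest date on which an archive may be disposed of."""
    target_year = archived_on.year + retention_years
    try:
        return archived_on.replace(year=target_year)
    except ValueError:
        # 29 February has no counterpart in a non-leap target year;
        # falling back to 28 February is an assumption of this sketch.
        return archived_on.replace(year=target_year, day=28)


def is_disposable(archived_on: date, today: date,
                  on_legal_hold: bool = False,
                  retention_years: int = 7) -> bool:
    """A legal hold always blocks disposal, even after retention has expired."""
    if on_legal_hold:
        return False
    return today >= disposition_date(archived_on, retention_years)
```

For example, an archive created on 1 March 2015 becomes disposable on 1 March 2022 unless a legal hold is active.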
  • 13. Considerations when Leveraging Hadoop as a System of Record • How will data access mechanisms be secured? • Are audit records required for data access? • Will the data set stored in Hadoop be immutable and guaranteed not to be altered? • How will retention policies be executed and explained when required? • In the event of audit requests, are there processes in place to leverage Hadoop as a source?
  • 14. InfoSphere Optim: Archive & Purge Data. [Diagram labels: Sources: Database, IMS, VSAM, More…; Archive files: compressed, immutable, auditable & restorable archives; Hadoop: query-able analytical data store] Capabilities: • Apply Retention / Hold Policies • Capture complete business object • Preserve Data Integrity • Preserve Schema Metadata • Load data into Hadoop as needed • Archive Cold Data
  • 15. Optim – Hadoop Integration: • Optim Hadoop Loader converts the Optim archive file into CSV & loads it into HDFS • Data is accessible via query engines like BigSQL, Hive, or Impala (depending on Hadoop distribution) [Diagram labels: Application; Database; Complete object; Optim Data Archive; Data Archive files; Optim Hadoop Loader; CSV Files; Hadoop Cluster; Hive warehouse; HCatalog metadata; BigSQL .. query processing]
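Once CSV files are in HDFS, one common way to make them queryable from Hive is an external table over the directory. The Python sketch below builds such a DDL statement. The slides do not say whether the Optim Hadoop Loader creates tables this way, so this is an assumption, and the helper name, table name, columns, and HDFS path are all hypothetical.

```python
from typing import List, Tuple


def hive_external_table_ddl(table: str,
                            columns: List[Tuple[str, str]],
                            location: str,
                            delimiter: str = ",") -> str:
    """Build a Hive CREATE EXTERNAL TABLE statement over delimited text files."""
    column_defs = ",\n  ".join(
        f"`{name}` {hive_type}" for name, hive_type in columns
    )
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n"
        f"  {column_defs}\n"
        f")\n"
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}'\n"
        f"STORED AS TEXTFILE\n"
        f"LOCATION '{location}'"
    )


# Hypothetical archived table and HDFS directory, for illustration only.
ddl = hive_external_table_ddl(
    table="archive.orders",
    columns=[("order_id", "BIGINT"),
             ("customer_id", "STRING"),
             ("order_date", "STRING")],
    location="/data/optim/archive/orders",
)
print(ddl)
```

An external table leaves the CSV files in place and only registers metadata, so dropping the table would not delete the loaded data. Plain delimited text does not handle quoted fields that contain commas; such files would need a CSV-aware SerDe instead.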
  • 16. Manage Your Hadoop Data with Help from Your Friends: Estuate and IBM Optim • Estuate is the world’s leading specialist in IBM Optim: deep product development relationship with IBM; over 250 Optim implementations • IBM Optim is the world’s leading data archiving platform with 76% market share, per Gartner • Optim customers are starting to leverage Hadoop platforms in their Information Lifecycle Governance initiatives • Estuate brings deep Optim and Hadoop experience and best practices to help you advance your Hadoop strategy and projects • And you can do this with either an on-premises or hosted service
  • 17. IBM Optim: A Platform for Enterprise Data Management. Integrated Data Management: Production Databases, Test & Development Databases. Data Growth Solution (Value: Improve Application Performance, Reduce Infrastructure Costs & Improve Compliance) • Retain only needed data, move the rest to archives • Deploy Tiered Storage Strategies • Retain Data According to Value • Simplify Infrastructure. Decommissioning Solution (Value: Reduce Infrastructure Cost & Compliance) • Decommission redundant or obsolete applications • Retain Access to historical data. Data Privacy Solution (Value: Risk Management) • Protect PII Data • Apply Single Data Masking Solution • Leverage realistic data. Test Data Management Solution (Value: Speed Application Delivery) • Create realistic and manageable test environments • Speed application delivery • Improve Test Coverage • Improve Quality. IBM InfoSphere Discovery (Value: Automates analysis of data and data relationships for complete understanding of data assets) • Discover undocumented business rules used to transform data from existing systems • Prototype and test new transformations for the target system • Define the business objects for archiving and subsetting • Identify all instances of private data so that they can be fully protected
  • 18. Enterprise Architecture. InfoSphere Optim: an integrated, modular environment to manage enterprise application data and optimize data-driven applications from requirements to retirement across heterogeneous environments. Components: Discovery, Data Growth, Data Privacy, Test Data Management, Application Retirement
  • 19. Summary: The Benefits of Hadoop for Enterprise Data Management • Hadoop is state-of-the-art as a Test Data Management platform: makes testing more agile and nimble; also leverages the power of Optim Data Privacy for PCI compliance • Hadoop will gradually become a powerful repository for corporate archived data, supporting ILG initiatives and compliance
  • 20. Q & A: Marc Hebert, Chief Operating Officer, Estuate, 510-468-7132, marc@estuate.com