New Innovations in Information Management for Big Data - Smarter Business 2013

New Innovations in Information Integration &
Governance (IIG) for Big Data
David Corrigan
Director of Product Marketing, InfoSphere

Data Confidence Is Essential
If you want to find new insights
from big data . . .
and ACT on those insights . . .
you need confidence in the data
used for insight
Information Integration & Governance (IIG)
• Make decisions with greater certainty
• Analyze rapidly while providing necessary controls
• Increase the value of data

Building Big Data Confidence is Essential
3x 77% 80%
Organizations with IIG outperform their competitors
Outperform Competitors
Organizations rated their decision making as good or excellent
Transform the Front Office Experience
Establish Trusted Information
Organizations establish high or very high level of trust in data

IIG Evolves for the Era of Big Data
1. Automated Integration
How do I get access to new big data sources?
Business users need rapid data provisioning among the zones
2. Visual Context
How do I digest all of this new information?
Categorize, index, and find big data to optimize its usage
3. Agile Governance
How do I manage all of this new data?
Ensure appropriate actions based on the value of the data

Six Innovations that Build Big Data Confidence
Automated Integration
• Data Click: Self-service data provisioning for big data repositories
• Big Match: Integration of master records from big data with probabilistic matching powered by Hadoop
Visual Context
• Big Data Catalogue: Categorize metadata on all big data sources
• Information Governance Dashboard: Visual context to give immediate status on governance policies
Agile Governance
• Big Data Privacy & Security: Monitor and mask sensitive big data in Hadoop, NoSQL, & relational systems *
• MDM for Big Data: Rapid mastering of new big data sources and extension of 360° view with unstructured big data
* Statement of Direction
*
*

InfoSphere Data Click
Self-service Data Provisioning
Innovation
• Two-click data provisioning designed for business
users
• Integration of more big data sources – JSON,
NoSQL, Hadoop, JDBC
Value
• Rapid provisioning of ad-hoc repositories
• Faster time to insight
• Self service to eliminate the IT bottleneck
Usage
• Enables rapid analysis of big data sources
Automated Integration
2-Click Data Access
Data provisioning in 1/5000th the time of the traditional approach
* Source: IBM performance lab testing, showing JDBC inserts at 5.8% to 74% faster

Big Match
Find & Integrate Master Data in Big Data Sources
MDM
BigInsights
Big Match Engine
Match millions of records
Automated Integration
How It Works
• Probabilistic matching on big data platform
(BigInsights-Hadoop)
• Matching at a higher volume
• Matching of a wider variety of data sets
Client Value
• Find master data within big data sources
• Get an answer faster – enable real-time matching
at big data volumes
Usage
• Provides more context by detecting master
entities faster
* Source: IBM InfoSphere performance
team test results
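
The probabilistic matching described on this slide can be illustrated with a minimal sketch in the Fellegi-Sunter style: each field that agrees or disagrees adds a log-likelihood weight, and the total is compared against two thresholds. The field names, the m/u probabilities, the thresholds, and the function names are all illustrative assumptions; this is not the Big Match engine's actual algorithm or API.

```python
import math

# Hypothetical per-field probabilities (illustrative values only):
#   m = P(field agrees | both records describe the same entity)
#   u = P(field agrees | records describe different entities)
FIELD_PROBS = {
    "name": (0.95, 0.01),
    "email": (0.90, 0.001),
    "city": (0.85, 0.10),
}


def normalize(value):
    """Lower-case and trim so trivial differences do not block a match."""
    return str(value or "").strip().lower()


def match_score(rec_a, rec_b):
    """Sum log2 likelihood ratios over the fields present in both records."""
    score = 0.0
    for field, (m, u) in FIELD_PROBS.items():
        a = normalize(rec_a.get(field))
        b = normalize(rec_b.get(field))
        if not a or not b:
            continue  # a missing value contributes no evidence either way
        if a == b:
            score += math.log2(m / u)              # agreement weight (positive)
        else:
            score += math.log2((1 - m) / (1 - u))  # disagreement weight (negative)
    return score


def classify(score, upper=8.0, lower=0.0):
    """Map a score to a decision; both thresholds are assumed values."""
    if score >= upper:
        return "match"
    if score <= lower:
        return "non-match"
    return "review"


master = {"name": "Ann Lee", "email": "ann@example.com", "city": "Oslo"}
candidate = {"name": "ann lee ", "email": "ANN@example.com", "city": "oslo"}
print(classify(match_score(master, candidate)))
```

On a Hadoop platform the pairwise scoring would be distributed across the cluster, typically after grouping candidate records so that only plausible pairs are compared; the scoring logic itself stays the same.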

Big Data Catalogue
Find Big Data More Easily
Visual Context
170x improvement in metadata import performance*
Innovation
• Stores metadata on every available big data
source
• Provides structure to the Hadoop landing zone so
data may be easily found and leveraged
• Classifies data (origin, lineage, source, value….)
Value
• Find data more easily within a growing Hadoop
landing zone and a complex zone architecture
• Rapidly leverage new big data sources
Usage
• Enables optimal usage of big data
* Source: IBM internal performance results, where three test runs with the latest version averaged 11.46 seconds vs 1,964 seconds with the previous release
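
As a quick check, the 170x figure follows from the two timings quoted in the source note. The variable names below are only for illustration.

```python
# Timings taken from the source note on this slide.
previous_release_s = 1964.0
latest_release_s = 11.46

speedup = previous_release_s / latest_release_s
reduction_pct = (1 - latest_release_s / previous_release_s) * 100

print(f"{speedup:.1f}x faster")                   # 171.4x faster
print(f"{reduction_pct:.1f}% less elapsed time")  # 99.4% less elapsed time
```

The slide rounds the ratio down to 170x.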

Information Governance Dashboard
Visualize and Control Governance
Visual Context
Innovation
• Measurements for policies and KPIs
• Rapid creation of tailored dashboards
Value
• Immediate insight into governance policy status
• Interception of issues when they start, right at the
source
Usage
• Raises data confidence with visual governance
status
1000s of data points and policies visualized

Big Data Privacy and Security
Protect a Wider Variety of Sources
InfoSphere Optim
InfoSphere Guardium
Agile Governance
80% faster activity monitoring*
Innovation
• Data activity monitoring of more NoSQL, Hadoop,
and Relational Systems
• Masking of sensitive data used in Hadoop
Value
• Protection is a pre-requisite for the fundamental
assumption of big data – sharing data for new
insight
• Automation enables protection without inhibiting
speed
Usage
• Ensures sensitive data is protected and secure
RDBMS, Hadoop, NoSQL, Data Warehouses, Application Data and Files
* Source: IBM internal benchmarks of InfoSphere Guardium V9 p50
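
To make the masking idea on this slide concrete, here is a minimal sketch of two common techniques: keyed pseudonymization, so masked values can still be joined across data sets, and partial masking of digits. The key, field names, and function names are hypothetical; this does not represent how InfoSphere Optim or Guardium work internally.

```python
import hashlib
import hmac

SECRET_KEY = b"demo-key-change-me"  # hypothetical key; keep real keys in a vault


def pseudonymize(value):
    """Replace a value with a keyed hash: same input, same token."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def mask_digits(value, keep=4):
    """Replace every digit except the last `keep` digits with '*'."""
    to_mask = max(sum(ch.isdigit() for ch in value) - keep, 0)
    out = []
    for ch in value:
        if ch.isdigit() and to_mask > 0:
            out.append("*")
            to_mask -= 1
        else:
            out.append(ch)
    return "".join(out)


# Hypothetical mapping from field name to masking rule.
MASKING_RULES = {"email": pseudonymize, "card": mask_digits}


def mask_record(record):
    """Return a masked copy of the record; unlisted fields pass through."""
    return {
        field: MASKING_RULES[field](value) if field in MASKING_RULES else value
        for field, value in record.items()
    }


row = {"email": "ann@example.com", "card": "4111-1111-1111-1234", "city": "Oslo"}
print(mask_record(row)["card"])  # ****-****-****-1234
```

Applying such rules before data lands in a shared zone is one way to read the slide's point that protection is a prerequisite for sharing data.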

MDM for Big Data
The Complete 360° View of Important Data
MDM Data Explorer
Agile
Governance
21K customer-centric transactions per second*
How It Works
• Extend the master view with federated,
unstructured big data
• Hybrid styles enable linking source records or
consolidating based on confidence
Client Value
• Visualize every related data item in the 360° view
• Rapidly onboard new big data sources
• MDM adapts to the source
Usage
• Provides a complete understanding of the
customer or master entity
* Source: InfoSphere MDM with DB2 pureScale
achieves: 21,000 customer-centric transactions a
second, 2X transaction rate of Oracle MDM on
Exalogic/Exadata using ½ the number of cores
Note to U.S. Government Users Restricted Rights --
Use, duplication or disclosure restricted by GSA ADP
Schedule Contract with IBM Corp.
Approved Claim in US/Canada only.
Results valid as of 10/21/2012.

InfoSphere Delivers Data Confidence
For Big Data Use Cases
Big Data Exploration
Enhanced 360° View of the Customer
Operations Analysis
Data Warehouse Augmentation
Security/Intelligence Extension
• Understand confidence
• Determine risk
• Establish master record
• Extend to all sources
• Automatic data protection
• Mask sensitive information
• High volume data integration
• Automatic data protection
• High volume data integration
• Agile big data archiving and retrieval

Use Case Spotlight: Enhanced 360° View
MDM and Big Data
Deliver the Complete 360° View
Capabilities Required to
Be Successful
1. Combine structured MDM and
unstructured big data
2. Rapidly onboard uncertain data
sources in a registry style to
separate low and high confidence
data
3. Find and match master data
entities within big data sources
MDM
Integration & Quality
Data Explorer
Single Version of the Truth
Extended View of Master Data

Use Case Spotlight: Data Warehouse Augmentation
Improve your data warehouse
by improving data confidence
Integration & Quality
Data Warehouse
High performance data loads
MDM
Archiving
Security & Privacy
Test Data Management
Automated Archiving
Automated Data Protection
Self-service Testing
More Accurate Analysis
Capabilities Required to
Be Successful
1. Self-service integration for ad-hoc
requests
2. Understand context of all available
big data with a single metadata
repository and business glossary
3. Mask any variety of sensitive data
before ingestion
4. Automatically protect big data with
activity monitoring
5. Store and analyze archive files on
Hadoop

A Busy Year of Innovation within the Labs
Literally dozens of
innovations that raise
confidence in big data
Two highlights:
1. BLU Acceleration
2. PureData System
for Hadoop

BLU Acceleration
IBM Research & Development Lab Innovations
Dynamic In-Memory
In-memory columnar processing with
dynamic movement of unused data to storage
Actionable Compression
Industry’s first data compression that preserves order
so that the data can be used without decompressing
Parallel Vector Processing
Multi-core and SIMD parallelism
(Single Instruction Multiple Data)
Data Skipping
Skips unnecessary processing of irrelevant data
Super Fast, Super Easy—
Create, Load and Go!
No indexes, No aggregates,
No tuning, No SQL changes,
No schema changes
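
Two of the ideas listed on this slide, order-preserving compression and data skipping, can be illustrated with a toy sketch. It encodes a column with a sorted dictionary, so comparisons work directly on the codes without decompressing, and keeps a min/max synopsis per block so irrelevant blocks are never scanned. All names and the block size are assumptions for illustration; this is not DB2's implementation of BLU Acceleration.

```python
from bisect import bisect_left


def build_dictionary(values):
    """Sorted distinct values; a code is a position, so code order equals value order."""
    return sorted(set(values))


def encode(dictionary, values):
    """Replace each value with its position in the sorted dictionary."""
    return [bisect_left(dictionary, v) for v in values]


def encode_bound(dictionary, bound):
    """Smallest code whose value is >= bound (bound need not be in the dictionary)."""
    return bisect_left(dictionary, bound)


def build_synopsis(codes, block_size):
    """One (start, min_code, max_code) entry per block of rows."""
    return [
        (i, min(codes[i:i + block_size]), max(codes[i:i + block_size]))
        for i in range(0, len(codes), block_size)
    ]


def count_at_least(codes, synopsis, block_size, min_code):
    """Count rows with code >= min_code, skipping blocks that cannot qualify."""
    matched = scanned_blocks = 0
    for start, _lo, hi in synopsis:
        if hi < min_code:
            continue  # data skipping: no row in this block can match
        scanned_blocks += 1
        matched += sum(1 for c in codes[start:start + block_size] if c >= min_code)
    return matched, scanned_blocks


values = [10, 12, 11, 13, 50, 52, 51, 53, 90, 95, 91, 99]
dictionary = build_dictionary(values)
codes = encode(dictionary, values)
synopsis = build_synopsis(codes, block_size=4)

# Evaluate "value >= 60" entirely on compressed codes.
matched, scanned = count_at_least(codes, synopsis, 4, encode_bound(dictionary, 60))
print(matched, scanned)  # 4 1
```

Only one of the three blocks is scanned here, and the predicate is never evaluated against decoded values.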

BLU Acceleration: Customers are Seeing Great Results
“100x speed up with literally no tuning!”
“Converting this row-organized uncompressed table to a column-organized table in DB2 10.5 delivered a massive 15.4x savings!”
“With BLU Acceleration, we’ve been able to reduce the time spent on pre-aggregation by 30x, from one hour to two minutes! BLU Acceleration is truly amazing.”
Iqbal Goralwalla, Head of DB2 Managed Services, Triton
Lennart Henäng, IT Architect
Yong Zhou, Sr. Manager of Data Warehouse & Business Intelligence Dept.

PureData System for Hadoop
Bringing big data to the enterprise
• Simplify the delivery of unstructured data to the enterprise
• Integrate Hadoop with the data warehouse
• Leverage Hadoop for data archive
• Provide best in class security
• Provide data exploration across structured and unstructured data
• Accelerate insight with machine data
• Accelerate insight with social data

Confidence Is Essential for Actionable Insight
• Make decisions with greater certainty
• Analyze rapidly while providing necessary
controls
• Increase the value of data
Visual Context
Agile Governance
Automated Integration

Understanding Your Data is the Basis for Confidence
