Oracle Data Integration Platform is a cornerstone of Oracle's big data solutions, providing five core capabilities: business continuity, data movement, data transformation, data governance, and streaming data handling. It comprises eight core products that can operate in the cloud or on-premises, and it is considered the most innovative in areas such as real-time/streaming integration and extract-load-transform (E-LT) with big data technologies. The platform offers a comprehensive architecture covering data ingestion, preparation, streaming integration, parallel connectivity, and governance.
4. Five Core Capabilities
1. Business Continuity: DATA ALWAYS AVAILABLE
2. Data Movement: DATA ANYWHERE IT'S NEEDED
3. Data Transformation: DATA ACCESSIBLE IN ANY FORMAT
4. Data Governance: DATA THAT CAN BE TRUSTED
5. Streaming Data: DATA IN MOTION OR AT REST
6. Most Innovative Technology
#1 Realtime / Streaming Data Integration Tool
#1 Pushdown / E-LT Data Integration Tool
1st to certify replication with Streaming Big Data
1st to certify an E-LT tool with Apache Spark/Python
1st to power Data Preparation with ML + NLP + Graph Data
1st to offer a Self-Service & Hybrid Cloud solution
9. Oracle GoldenGate
Realtime Performance | Extensible & Flexible | Proven & Reliable
Oracle GoldenGate provides low-impact capture, routing, transformation, and delivery of database transactions across homogeneous and heterogeneous environments in real time, with no distance limitations.
[Diagram: data events and transaction streams flowing between most databases, cloud DBs, and big data platforms]
Supports databases, big data, and NoSQL.
* The most popular enterprise integration tool in history
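The capture-route-deliver flow described above is change data capture: committed operations are read in order from the source and replayed against the target. A minimal sketch of the replay (apply) step, using an illustrative operation format rather than GoldenGate's actual trail format:

```python
# Sketch of a CDC apply step: replay captured operations, in commit order,
# against a target table (here just a dict keyed on the primary key).
# The operation format is illustrative, not GoldenGate's trail format.

def apply_changes(target, ops):
    for op in ops:
        key = op["key"]
        if op["op"] in ("insert", "update"):
            target[key] = op["row"]     # upsert keeps replay idempotent
        elif op["op"] == "delete":
            target.pop(key, None)       # tolerate already-deleted rows
    return target

ops = [
    {"op": "insert", "key": 1, "row": {"name": "Alice", "city": "Brighton"}},
    {"op": "update", "key": 1, "row": {"name": "Alice", "city": "London"}},
    {"op": "insert", "key": 2, "row": {"name": "Bob", "city": "Leeds"}},
    {"op": "delete", "key": 2},
]

table = apply_changes({}, ops)
```

Upserting on insert/update and tolerating missing rows on delete keeps the replay idempotent, which matters when a feed is restarted mid-stream.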
11. Self-Service | Better Recommendations | Built-in Data Graph
Oracle Data Preparation
Self-Service: zero software to install, an easy-to-use browser-based interface
Better Recommendations: better automation and less grunt work for humans
Built-in Data Graph: a graph database of real-world facts used for enrichment
[Diagram: data flowing from files and applications through Oracle Data Preparation to reporting and ETL]
Oracle Data Preparation is a self-service tool that makes it simple to transform, prepare, enrich and standardize business data. It can help IT accelerate solutions for the business by giving control of data formatting directly to data analysts.
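The kind of standardise-and-enrich step the tool automates can be sketched in a few lines. The field names and the city-to-country lookup (a stand-in for its graph of real-world facts) are illustrative:

```python
# Illustrative self-service prep step: standardise text columns and enrich
# rows from a small reference lookup (a stand-in for Data Preparation's
# graph of real-world facts). Field names are hypothetical.

CITY_FACTS = {"brighton": "UK", "austin": "US"}   # tiny enrichment "graph"

def prepare(rows):
    cleaned = []
    for row in rows:
        city = row.get("city", "").strip().lower()
        cleaned.append({
            "name": row.get("name", "").strip().title(),
            "city": city.title(),
            "country": CITY_FACTS.get(city, "Unknown"),  # enrichment
        })
    return cleaned

rows = prepare([{"name": " mark ", "city": "BRIGHTON "},
                {"name": "dana", "city": "austin"}])
```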
16. Business Friendly | Extreme Performance | Spatial Awareness
Oracle Stream Analytics
[Diagram: data and transaction streams from databases, web and devices flowing through Oracle Stream Analytics to downstream systems such as Hadoop]
Oracle Stream Analytics is a powerful analytic toolkit designed to work directly on data in motion: simple data correlations, complex event processing, geo-fencing, and advanced dashboards run on millions of events per second.
• Innovative dual model, running on Apache Spark or the Coherence grid
• Simple-to-use spatial and geo-fencing features, an industry first
• Includes Oracle GoldenGate for streaming transactions
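Geo-fencing, one of the spatial features mentioned, reduces to testing each event's coordinates against a fence. A minimal sketch, with a plain list standing in for the event stream and an equirectangular distance approximation (fine for small fences):

```python
# Sketch of the geo-fencing idea: flag events whose coordinates fall inside
# a circular fence. Stream Analytics evaluates this continuously over event
# streams; here a plain list stands in for the stream.
import math

def in_fence(lat, lon, fence):
    # Equirectangular approximation of distance; adequate for small fences.
    dlat = math.radians(lat - fence["lat"])
    dlon = math.radians(lon - fence["lon"]) * math.cos(math.radians(fence["lat"]))
    dist_km = 6371 * math.hypot(dlat, dlon)
    return dist_km <= fence["radius_km"]

fence = {"lat": 50.8225, "lon": -0.1372, "radius_km": 5}  # central Brighton
events = [{"id": 1, "lat": 50.83, "lon": -0.14},   # ~1 km away: inside
          {"id": 2, "lat": 51.51, "lon": -0.13}]   # London: outside

alerts = [e["id"] for e in events if in_fence(e["lat"], e["lon"], fence)]
```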
19. Business Glossary | End-to-End Lineage | 100+ Supported Systems
Oracle Metadata Management
Oracle Metadata Management provides an integrated toolkit that combines business glossary, workflow, metadata harvesting and rich data-steward collaboration features.
Supports databases, big data, ETL tools, BI tools, etc.
Lineage types: BI report lineage, taxonomy lineage, data model lineage
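End-to-end lineage of the kind the tool harvests boils down to a dependency-graph walk. A toy version with hard-coded edges (a real tool harvests these from database, ETL and BI metadata; the dataset names are illustrative):

```python
# Toy end-to-end lineage: datasets as nodes, "derived from" edges, and a
# walk that answers "what does this BI report ultimately depend on?".
# A metadata tool harvests these edges; here they are hard-coded.

LINEAGE = {                      # child -> parents it was derived from
    "sales_report": ["sales_mart"],
    "sales_mart": ["orders_clean", "customers_clean"],
    "orders_clean": ["orders_raw"],
    "customers_clean": ["crm_extract"],
}

def upstream(node, graph):
    seen, stack = set(), [node]
    while stack:
        for parent in graph.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

sources = upstream("sales_report", LINEAGE)
```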
30. THOUGHTS ON ORACLE DATA INTEGRATION FOR BIG DATA: A PRACTITIONER'S VIEW
Mark Rittman, Oracle ACE Director
T: @markrittman
ORACLE OPENWORLD 2016, SAN FRANCISCO
31. About the Presenter
(C) Mark Rittman 2016 W: http://www.rittman.co.uk T: @markrittman
•Oracle ACE Director, blogger + ODTUG member
•Regular columnist for Oracle Magazine
•Past ODTUG Executive Board Member
•Author of two books on Oracle BI
•Co-founder of Rittman Mead, now independent analyst
•15+ years in Oracle BI, DW, ETL + now Big Data
•Based in Brighton, UK
32. Big Data Technology Core to Modern BI Platforms
•Every engagement and customer discussion has Big Data central to the project
• Hadoop extending traditional DWs through scalability, flexibility, cost, and RDBMS compatibility
• Hadoop as the ETL engine, driven by ODI Big Data KMs
• New datatypes and methods of analysis enabled by Hadoop schema-on-read
• Project innovation driven by machine learning, streaming, and the ability to store + keep *all* data
•And what is driving the interest in these projects…?
[Diagram: a data reservoir architecture. Data streams (operational data and transactions, customer master data, event/social/unstructured data, voice + chat transcripts) feed a Data Factory built on OGG for Big Data 12c, Oracle Stream Analytics, ODI12c and Oracle Data Preparation. Raw customer data, stored in the original format (usually files) such as SS7, ASN.1, JSON etc., is mapped and transformed into mapped customer data and an enriched customer profile used for modeling and scoring; marketing/sales applications consume the resulting models, machine-learning output and segments. Everything runs on the Oracle Big Data Appliance (starter rack + expansion: Cloudera CDH + Oracle software; 18 high-spec Hadoop nodes with InfiniBand switches for internal Hadoop traffic, optimised for network throughput; 1 Cisco management switch; a single place for H/W + S/W support), alongside Oracle Big Data Discovery and Oracle Data Visualization in a safe and secure discovery and development environment.]
35. The Big Data Secret? It's All About Data Integration
•Data from all the sources will need to be integrated to create the single customer view
• Hadoop technologies (Flume, Kafka, Storm) can be used to ingest events and log data
• Files can be loaded “as is” into the HDFS filesystem
• Oracle/DB data can be bulk-loaded using Sqoop
• GoldenGate for trickle-feeding transactional data
•But the nature of the new data sources brings challenges
• May be semi-structured or have an unknown schema
• Joining schema-free datasets
•Need to consider quality and resolve incorrect, incomplete, and inconsistent customer data
[Diagram: heterogeneous enterprise and web sources, streaming sources with JSON payloads, and chat/ML data (the “who”, “what”, “how” and “why”) feeding an enriched customer profile and single customer view. Data from structured and schema-on-read sources needs integrating; raw and semi-structured data needs a schema applied; some inputs require preparation and obfuscation.]
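The ingestion options above amount to a routing decision keyed on source type. A trivial sketch of that decision, with illustrative type names:

```python
# The ingestion choices above as a simple dispatch: pick the right tool
# for each source type. The type names and routing table are illustrative.

ROUTES = {
    "event": "ingest via Flume/Kafka/Storm",
    "file": "load as-is into HDFS",
    "rdbms": "bulk-load with Sqoop",
    "transactions": "trickle-feed with GoldenGate",
}

def ingestion_route(source):
    return ROUTES.get(source["type"], "needs manual review")

plan = [ingestion_route(s) for s in
        [{"type": "file"}, {"type": "rdbms"}, {"type": "transactions"}]]
```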
36. Landing, Preparing and Securing Raw Data is *Hard*
•Finding raw data is easy; then the real work needs to be done, which can be more than 90% of the project
•Four main tasks to land, prepare and integrate raw data to turn it into a customer profile:
1. Ingest it in real time into the data reservoir
2. Apply Schema to Raw and Semi-Structured Data
3. Remove Sensitive Data from Any Input Files
4. Transform and map into your Customer 360-degree profile
37. Oracle Big Data Preparation Cloud Service
•Data enrichment tool aimed at domain experts, not programmers
•Uses machine learning to automate data classification + profiling steps
•Automatically highlights sensitive data, and offers to redact or obfuscate it
•Dramatically reduces the time required to onboard new data sources
•Hosted in Oracle Cloud for zero install
• File upload and download from the browser
• Automate for production data loads
[Diagram: voice + chat transcripts arrive as raw data, stored in the original format (usually files) such as SS7, ASN.1, JSON etc., and leave as mapped data sets produced by mapping and transforming the raw data]
38. Step 2: Apply Schema to Raw and Semi-Structured Data
[Diagram: raw text is loaded from blog entries and reviews. Batch loads from files and databases are easy; streaming from APIs and HTTP is moderate. NLP extracts entities and embedded information from unstructured text; the challenges are no reliable patterns, invalid and missing data (e.g. invalid emails), and sensitive data.]
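Applying schema on read amounts to parsing, type coercion and validation at load time, with rows that do not fit routed to a reject pile. A minimal sketch, with an illustrative schema and email check:

```python
# Minimal "apply schema on read": parse raw JSON lines into typed records,
# coercing and validating as we go, and routing rows that do not fit to a
# reject pile. The schema and email check are illustrative.
import json
import re

SCHEMA = {"user": str, "age": int, "email": str}
EMAIL = re.compile(r"^[^@\s]+@[^@\s]+\.[A-Za-z]+$")

def read_with_schema(lines):
    good, rejects = [], []
    for line in lines:
        try:
            rec = json.loads(line)
            row = {k: t(rec[k]) for k, t in SCHEMA.items()}  # coerce types
            if not EMAIL.match(row["email"]):
                raise ValueError("invalid email")
            good.append(row)
        except (ValueError, KeyError, TypeError) as exc:
            rejects.append((line, str(exc)))
    return good, rejects

raw = ['{"user": "mark", "age": "49", "email": "m@example.com"}',
       '{"user": "bob", "age": "??", "email": "bob@example.com"}',
       '{"user": "eve", "age": "30", "email": "not-an-email"}']
good, rejects = read_with_schema(raw)
```

Keeping the rejects, rather than silently dropping them, is what makes the quality problems on this slide visible downstream.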
39. Step 3: Remove Sensitive Data from Any Input Files
•Automatically profile and analyse datasets
•Use machine learning to spot and obfuscate sensitive data automatically
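A crude stand-in for the ML-driven detection described above is pattern-based redaction. Real classification is far more robust; the regexes here are illustrative only:

```python
# Sketch of step 3: pattern-based redaction of obviously sensitive values
# (the real service uses ML-driven classification; plain regexes stand in).
import re

PATTERNS = [
    (re.compile(r"[^@\s]+@[^@\s]+\.[A-Za-z]+"), "<EMAIL>"),
    (re.compile(r"\b(?:\d[ -]?){13,16}\b"), "<CARD>"),   # crude card match
]

def redact(text):
    for pattern, token in PATTERNS:
        text = pattern.sub(token, text)
    return text

clean = redact("Contact mark@example.com, card 4111 1111 1111 1111.")
```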
40. Step 4: Transform, Join + Map into Polyglot Data Stores
•Oracle Data Integration offers a wider set of products for managing Customer 360 data:
• Oracle GoldenGate
• Oracle Enterprise Data Quality
• Oracle Data Integrator
• Oracle Enterprise Metadata Management
•All Hadoop-enabled
•Works across Big Data, relational and Cloud
41. Future-Proof Big Data Integration Platform
•Projects built yesterday using MapReduce need to be rewritten in Spark today
• Then Spark needs to be upgraded to Spark Streaming + Kafka for real time…
• Upgrades, and replatforming onto the latest tech, can bring “fragile” initiatives to a halt
•ODI’s pluggable KM approach to big data integration makes tech upgrades simple
•Focus time + investment on new big data initiatives
• Not on rewriting fragile hand-coded scripts
[Diagram: a big data management platform running natively under Hadoop. YARN provides cluster resource management over HDFS, the cluster filesystem holding the raw data; Hive + Pig handle log processing and UDFs; Spark provides in-memory data processing; Kafka + Spark Streaming (and perhaps Apache Beam) handle streams. Discovery & development labs give a safe and secure environment with data sets, samples, models and programs; curated data in the data warehouse provides a historical, business-aligned view, accessed via the ODI desktop client; the enriched customer profile feeds modeling and scoring.]
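The pluggable-KM idea can be illustrated with a small strategy pattern: the mapping definition stays declarative while the engine that executes it is swapped by configuration. The Mapping class and engine functions below are hypothetical, not ODI's API:

```python
# The "pluggable KM" idea in miniature: the mapping logic stays declarative,
# and the execution engine behind it is swapped without rewriting the
# mapping. Engine names and the Mapping class are illustrative, not ODI's API.

class Mapping:
    def __init__(self, source, target, transform):
        self.source, self.target, self.transform = source, target, transform

def run_hive(m):
    # Generate a HiveQL-style plan for the mapping.
    return f"INSERT INTO {m.target} SELECT {m.transform} FROM {m.source}"

def run_spark(m):
    # Generate a PySpark-style plan for the same mapping.
    return (f"spark.table('{m.source}').selectExpr('{m.transform}')"
            f".write.saveAsTable('{m.target}')")

ENGINES = {"hive": run_hive, "spark": run_spark}   # swapped by configuration

m = Mapping("web_logs", "sessions", "upper(user_id)")
hive_plan = ENGINES["hive"](m)
spark_plan = ENGINES["spark"](m)
```

Replatforming from Hive to Spark then touches only the engine table, not the mapping, which is the future-proofing claim on this slide.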
42. And the Next Challenge: Data Quality + Provenance
•Big data projects have had it “easy” so far in terms of data quality + data provenance
• Innovation labs + schema-on-read prioritise discovery + insight, not accuracy and audit trails
• But a data reservoir without any cleansing, management + data quality = data cesspool
• … and nobody knows where all the contamination came from, or who made it worse
43. Data Governance: Why I Recommend Oracle DI Tools
•From my perspective, this is what makes Oracle Data Integration my Hadoop DI platform of choice
•Most vendors can load and transform data in Hadoop (not as well, but it is a basic capability)
•Only Oracle have the tools to tackle tomorrow’s Big Data challenge: Data Quality + Data Governance
• Oracle Enterprise Data Quality
• Oracle Enterprise Metadata Management
•Seamlessly integrated with ODI
•Brings enterprise “smarts” to less mature Big Data projects