Organizations with data warehouses are increasingly looking at big data technologies to extend the capacity of their platform, offload simple ETL and data processing tasks, and add new capabilities to store and process unstructured data alongside their existing relational datasets. In this presentation we’ll look at what’s involved in adding Hadoop and other big data technologies to your data warehouse platform, see how tools such as Oracle Data Integrator and Oracle Business Intelligence can be used to process and analyze new “big data” sources, and look at what’s involved in creating a single query and metadata layer over both sources of data.
Video of presentation from 2-11-2015 at https://www.youtube.com/watch?v=AG3yIKgcn_8
Adding Hadoop and Big Data to Extend Data Warehouse Capabilities
1. T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or
+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
Adding Hadoop and Big Data to
Extend Data Warehouse Capabilities
Mark Rittman, CTO, Rittman Mead
November 2015
About the Speaker
•Mark Rittman, Co-Founder of Rittman Mead
•Oracle ACE Director, specialising in Oracle BI&DW
•14 Years’ Experience with Oracle Technology
•Regular columnist for Oracle Magazine
•Author of two Oracle Press Oracle BI books
•Oracle Business Intelligence Developers Guide
•Oracle Exalytics Revealed
•Writer for Rittman Mead Blog :
http://www.rittmanmead.com/blog
•Email : mark.rittman@rittmanmead.com
•Twitter : @markrittman
About Rittman Mead
•Oracle BI and DW Gold partner
•Winner of five UKOUG Partner of the Year awards in 2013 - including BI
•World leading specialist partner for technical excellence,
solutions delivery and innovation in Oracle BI
•Approximately 80 consultants worldwide
•All expert in Oracle BI and DW
•Offices in US (Atlanta), Europe, Australia and India
•Skills in broad range of supporting Oracle tools:
‣OBIEE, OBIA, ODIEE
‣Big Data, Hadoop, NoSQL & Big Data Discovery
‣Essbase, Oracle OLAP
‣GoldenGate
‣Endeca
15+ Years in Oracle BI and Data Warehousing
•Started back in 1997 on a bank Oracle DW project
•Our tools were Oracle 7.3.4, SQL*Plus, PL/SQL
and shell scripts
•Went on to use Oracle Developer/2000 and Designer/2000
•Our initial users queried the DW using SQL*Plus
•And later on, we rolled-out Discoverer/2000 to everyone else
•And life was fun…
The Oracle-Centric DW Architecture
•Over time, this data warehouse architecture developed
•Added Oracle Warehouse Builder to
automate and model the DW build
•Oracle 9i Application Server (yay!)
to deliver reports and web portals
•Data Mining and OLAP in the database
•Oracle 9i for in-database ETL (and RAC)
•Data was typically loaded from
Oracle RDBMS and EBS
•It was turtles Oracle all the way down…
Traditional Three-Layer Relational Data Warehouses
[Diagram: traditional structured data sources are loaded into Staging, then moved by ETL into the Foundation / ODS layer and the Performance / Dimensional layer; a BI tool (OBIEE) with a metadata layer reads the data directly, while an OLAP / in-memory tool loads data into its own database]
Traditional Relational Data Warehouse
•Three-layer architecture - staging, foundation and access/performance
•All three layers stored in a relational database (Oracle)
•ETL used to move data from layer-to-layer
Combined Inmon and Kimball Data Modelling Approaches
• Kimball-style star schemas are good for providing information access
• Inmon-style 3NF / CIF designs are good for storing and preserving data
• Ideal approach is to combine the two
‣ 3NF “atomic storage layer” to preserve a
process-neutral view of data
‣ Star Schema “access and performance layer”
to provide optimized access to data
• All data flows through the three layers, typically using ETL tools
‣ Into staging
‣ Through ODS/Foundation
‣ Into Star Schema / Dimensional model
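To make the three-layer flow concrete, here is a minimal sketch that uses Python’s built-in sqlite3 module as a stand-in for the Oracle database. The table names (stg_sales, fnd_customer, fnd_order, dim_customer, fct_sales) and sample columns are hypothetical, and a real warehouse would use an ETL tool such as ODI or OWB rather than hand-written SQL like this.

```python
import sqlite3


def build_three_layer_dw(rows):
    """Load raw sales rows through staging, foundation (3NF) and star-schema layers.

    rows: iterable of (order_id, customer_name, customer_city, amount) tuples.
    All table and column names are illustrative only.
    """
    db = sqlite3.connect(":memory:")

    # Staging layer: land the incoming data exactly as received
    db.execute("CREATE TABLE stg_sales (order_id INTEGER, customer_name TEXT, "
               "customer_city TEXT, amount REAL)")
    db.executemany("INSERT INTO stg_sales VALUES (?, ?, ?, ?)", rows)

    # Foundation layer: normalised, process-neutral storage of customers and orders
    db.execute("CREATE TABLE fnd_customer (customer_id INTEGER PRIMARY KEY, "
               "name TEXT UNIQUE, city TEXT)")
    db.execute("INSERT INTO fnd_customer (name, city) "
               "SELECT customer_name, MAX(customer_city) FROM stg_sales GROUP BY customer_name")
    db.execute("CREATE TABLE fnd_order (order_id INTEGER PRIMARY KEY, "
               "customer_id INTEGER REFERENCES fnd_customer, amount REAL)")
    db.execute("INSERT INTO fnd_order SELECT s.order_id, c.customer_id, s.amount "
               "FROM stg_sales s JOIN fnd_customer c ON c.name = s.customer_name")

    # Access and performance layer: a customer dimension plus an order-grain fact table
    db.execute("CREATE TABLE dim_customer AS "
               "SELECT customer_id AS customer_key, name, city FROM fnd_customer")
    db.execute("CREATE TABLE fct_sales AS "
               "SELECT order_id, customer_id AS customer_key, amount FROM fnd_order")

    # Staging data is thrown away once it has been processed
    db.execute("DELETE FROM stg_sales")
    db.commit()
    return db


def sales_by_city(db):
    """A typical star-schema query: join the fact to its dimension and aggregate."""
    return db.execute(
        "SELECT d.city, SUM(f.amount) FROM fct_sales f "
        "JOIN dim_customer d ON d.customer_key = f.customer_key "
        "GROUP BY d.city ORDER BY d.city"
    ).fetchall()


if __name__ == "__main__":
    dw = build_three_layer_dw([(1, "Acme", "Leeds", 100.0),
                               (2, "Acme", "Leeds", 50.0),
                               (3, "Bolt", "York", 75.0)])
    print(sales_by_city(dw))  # [('Leeds', 150.0), ('York', 75.0)]
```

The foundation layer keeps customers and orders in normalised form, independent of any one report, while the star schema is shaped purely for fast, simple queries.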
ETL Largely Batch-Based and with Single Route through DW
•All data lands in Staging layer, is processed and then thrown away
‣Too expensive to store all incoming granular data online - selected data stored as summary
•Processed through Foundation layer and then Access and Performance
•ETL development an expensive, manual task
‣Built for the long-term
‣Industrialised data loading routines
‣Team development
‣Not all that scalable
But Users Are Demanding More … and Budgets get Smaller
•Users want more data stored in the DW, but budgets for IT are getting smaller
•It’s no longer feasible to store all archive data in expensive Teradata-style DWs
•New forms of analysis require tools and storage beyond SQL and database tables
•And it all still needs to be governed, managed and secured!
And Now … Everyone’s Talking About Big Data
•Explosion in volume and variety of data that’s now available
•New, cheap and open-source technology
makes it economic to store + process it
•New businesses are being built, and
existing ones disrupted, by rise of big data
•IT departments are using Hadoop + other
technologies to complement, and replace,
proprietary databases, storage etc
•So … can Hadoop actually help us here,
as well as being useful for data scientists?
Introducing Hadoop
•A new approach to data processing and data storage
•Rather than a small number of large, powerful servers, it spreads processing over
large numbers of small, cheap, redundant servers
•Spreads the data you’re processing over
lots of distributed nodes
•Has scheduling/workload process that sends
parts of a job to each of the nodes
- a bit like Oracle Parallel Execution
•And does the processing where the data sits
- a bit like Exadata storage servers
•Shared-nothing architecture
•Low-cost and highly horizontally scalable
[Diagram: a Job Tracker sends parts of a job to Task Trackers, which run alongside the Data Nodes holding the data]
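The “do the processing where the data sits” idea can be illustrated with a toy scheduler. This is a simplified sketch, not the actual Hadoop JobTracker or YARN scheduling logic: it assumes each block’s replica locations are already known, prefers the least-loaded node that holds a copy of the block, and otherwise falls back to any node (which then has to pull the data over the network). Block and node names are hypothetical.

```python
def schedule_map_tasks(block_locations, nodes):
    """Assign one map task per data block, preferring nodes that hold a replica.

    block_locations: dict mapping block id -> list of nodes storing a replica.
    nodes: list of live worker nodes.
    Returns dict mapping block id -> (assigned node, whether the read is local).
    """
    load = {node: 0 for node in nodes}
    assignment = {}
    for block, replicas in sorted(block_locations.items()):
        local_candidates = [n for n in replicas if n in load]
        if local_candidates:
            # Data-local: run the task on the least-busy node that already has the block
            node = min(local_candidates, key=lambda n: (load[n], n))
            is_local = True
        else:
            # No live node holds the block: run anywhere and pull the data over the network
            node = min(nodes, key=lambda n: (load[n], n))
            is_local = False
        load[node] += 1
        assignment[block] = (node, is_local)
    return assignment


if __name__ == "__main__":
    blocks = {"blk_1": ["node1", "node2"], "blk_2": ["node2", "node3"],
              "blk_3": ["node1", "node3"], "blk_4": ["node5"]}
    # blk_4's only replica is on node5, which is not live, so it runs remotely on node4
    for block, placement in schedule_map_tasks(blocks, ["node1", "node2", "node3", "node4"]).items():
        print(block, placement)
```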
Hadoop Tenets : Simplified Distributed Processing
•Hadoop, through MapReduce, breaks processing down into simple stages
‣Map : select the columns and values you’re interested in, pass through as key/value pairs
‣Reduce : aggregate the results
•Most ETL jobs can be broken down into filtering,
projecting and aggregating
•Hadoop then automatically runs job on cluster
‣Shared-nothing small chunks of work
‣Run the job on the node where the data is
‣Handle faults etc
‣Gather the results back in
[Diagram: Mappers (filter, project) feed Reducers (aggregate); the output is one HDFS file per reducer, in a directory]
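The map, shuffle and reduce stages described above can be sketched in a few lines of single-process Python. This is only a teaching model: real Hadoop jobs are typically written against the Java MapReduce API and run across a cluster. The partitioning step imitates the idea of hash-partitioning keys across reducers, and the log-line format in the example is hypothetical.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer, num_reducers=2):
    """Run a toy, single-process MapReduce job.

    mapper(record) yields (key, value) pairs: this is where filtering and projection happen.
    reducer(key, values) returns one aggregated value per key.
    Returns one dict per reducer, echoing Hadoop's one-output-file-per-reducer layout.
    """
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for record in records:                       # map phase
        for key, value in mapper(record):
            # Shuffle: every value for a given key goes to the same reducer
            partitions[hash(key) % num_reducers][key].append(value)
    # Reduce phase: each reducer aggregates the keys in its own partition
    return [{key: reducer(key, values) for key, values in sorted(part.items())}
            for part in partitions]


def successful_page_hits(line):
    """Mapper: keep only HTTP 200 requests (filter) and emit the page (project)."""
    _date, _method, page, status = line.split()
    if status == "200":
        yield page, 1


def total(_key, values):
    """Reducer: sum the counts for one key."""
    return sum(values)


if __name__ == "__main__":
    logs = ["2015-11-02 GET /index.html 200",
            "2015-11-02 GET /about.html 404",
            "2015-11-03 GET /index.html 200",
            "2015-11-03 GET /blog.html 200"]
    outputs = map_reduce(logs, successful_page_hits, total)
    merged = {page: hits for part in outputs for page, hits in part.items()}
    print(merged)
```

Which reducer a key lands on depends on its hash, but the merged result is the same either way: each key is aggregated by exactly one reducer.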
Flexible, Low-Cost Resilient Storage : Hadoop Distributed FS
•The filesystem behind Hadoop, used to store data for Hadoop analysis
‣Unix-like, uses commands such as ls, mkdir, chown, chmod
•Fault-tolerant, with rapid fault detection and recovery
•High-throughput, with streaming data access and large block sizes
•Designed for data locality, placing data close to where it is processed
•Accessed from the command line, via hdfs:// URIs, GUI tools etc
[oracle@bigdatalite mapreduce]$ hadoop fs -mkdir /user/oracle/my_stuff
[oracle@bigdatalite mapreduce]$ hadoop fs -ls /user/oracle
Found 5 items
drwx------ - oracle hadoop 0 2013-04-27 16:48 /user/oracle/.staging
drwxrwxrwx - oracle hadoop 0 2012-09-18 17:02 /user/oracle/moviedemo
drwxrwxrwx - oracle hadoop 0 2012-10-17 15:58 /user/oracle/moviework
drwxrwxrwx - oracle hadoop 0 2013-05-03 17:49 /user/oracle/my_stuff
drwxrwxrwx - oracle hadoop 0 2012-08-10 16:08 /user/oracle/stage
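The effect of large block sizes and replication on storage is simple arithmetic. The sketch below assumes the common Hadoop 2.x defaults of a 128 MB block size (dfs.blocksize) and a replication factor of 3 (dfs.replication); both are configurable per cluster and per file, so treat the figures as illustrative. With the default input format, each block typically becomes one input split, and therefore one map task.

```python
import math


def hdfs_footprint(file_size_mb, block_size_mb=128, replication=3):
    """Estimate how HDFS lays out a single file.

    Defaults assume dfs.blocksize = 128 MB and dfs.replication = 3.
    The final block only uses as much disk as the data it holds,
    so raw storage is simply file size x replication.
    """
    blocks = math.ceil(file_size_mb / block_size_mb)
    return {
        "blocks": blocks,                        # also a rough guide to the number of map tasks
        "block_replicas": blocks * replication,  # copies spread across DataNodes
        "raw_storage_mb": file_size_mb * replication,
    }


if __name__ == "__main__":
    print(hdfs_footprint(1000))  # {'blocks': 8, 'block_replicas': 24, 'raw_storage_mb': 3000}
```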
Apache Hive : SQL Access + Table Metadata Over HDFS
•Apache Hive provides a SQL layer over Hadoop, once we understand the structure (schema)
of the data we’re working with
•Exposes HDFS and other Hadoop data as tables and columns
•Provides a simple SQL dialect for queries called HiveQL
•SQL queries are turned into MapReduce jobs under-the-covers
•JDBC and ODBC drivers provide
access to BI and ETL tools
•Hive metastore (data dictionary)
leveraged by many other Hadoop tools
‣Apache Pig
‣Cloudera Impala
‣etc
SELECT a, sum(b)
FROM myTable
WHERE a<100
GROUP BY a
[Diagram: the query runs as parallel Map tasks feeding Reduce tasks, which produce the result]
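To show how the example query breaks down into map and reduce work, the sketch below evaluates it twice: once as explicit filter/project (map) and group/sum (reduce) steps, and once with an ordinary SQL engine. Python’s sqlite3 module stands in for the SQL engine purely for comparison; it is not Hive, and a real Hive plan also involves partitioners, combiners and many tasks per stage. ORDER BY a is added so the two results compare deterministically, and the sample rows are made up.

```python
import sqlite3
from collections import defaultdict

QUERY = "SELECT a, sum(b) FROM myTable WHERE a<100 GROUP BY a ORDER BY a"


def as_map_reduce(rows):
    """Evaluate the query as explicit map (filter + project) and reduce (sum) steps."""
    # Map: apply WHERE a<100 and emit (a, b) as key/value pairs
    pairs = [(a, b) for a, b in rows if a < 100]
    # Shuffle: gather all values for each key together
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    # Reduce: SUM per key, returned in key order to match ORDER BY a
    return sorted((key, sum(values)) for key, values in groups.items())


def as_sql(rows):
    """Evaluate the same query with a SQL engine for comparison."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE myTable (a INTEGER, b INTEGER)")
    db.executemany("INSERT INTO myTable VALUES (?, ?)", rows)
    return db.execute(QUERY).fetchall()


if __name__ == "__main__":
    sample = [(1, 10), (2, 5), (1, 7), (150, 99), (2, 1)]
    print(as_map_reduce(sample))  # [(1, 17), (2, 6)]
    print(as_sql(sample))         # [(1, 17), (2, 6)]
```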
NoSQL Databases
•Family of database types that reject tabular storage,
SQL access and ACID compliance
•Focus is on scalability, speed and schema-on-read
‣Oracle NoSQL Database - speed and scalability
‣Apache HBase - speed, scalability and Hadoop
‣MongoDB - native storage of JSON documents
•May or may not run on Hadoop, but associated with it
•Great choice for high-velocity data capture
•CRUD approach vs write-once/read many in HDFS
Why Are Hadoop and Big Data Technologies of Interest to Us?
•Gives us an ability to store more data, at more detail, for longer
•Provides a cost-effective way to analyse vast amounts of data
•Hadoop & NoSQL technologies can give us “schema-on-read” capabilities
•There are vast amounts of innovation in this area we can harness
•And it’s very complementary to Oracle BI & DW
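“Schema-on-read” means raw data is stored as-is and a structure is imposed only when it is read, much as a Hive external table can be defined over files already sitting in HDFS. A minimal Python sketch, assuming a hypothetical comma-separated event feed and a hypothetical four-column schema:

```python
import csv
import io
from datetime import date

# Raw events land untouched; nothing checks their structure at write time
RAW_EVENTS = """2015-11-02,web,cust-17,19.99
2015-11-02,mobile,cust-42,5.00
2015-11-03,web,cust-17,not-a-number
"""

# Hypothetical schema: (column name, function that converts the raw text)
SCHEMA = [("event_date", date.fromisoformat), ("channel", str),
          ("customer_id", str), ("amount", float)]


def read_with_schema(raw_text, schema):
    """Apply a schema while reading, returning (parsed rows, rejected rows)."""
    parsed, rejected = [], []
    for fields in csv.reader(io.StringIO(raw_text)):
        if len(fields) != len(schema):
            rejected.append(fields)
            continue
        try:
            parsed.append({name: convert(value)
                           for (name, convert), value in zip(schema, fields)})
        except ValueError:
            # Rows that don't fit this schema are set aside; the raw data itself is unchanged
            rejected.append(fields)
    return parsed, rejected


if __name__ == "__main__":
    rows, bad = read_with_schema(RAW_EVENTS, SCHEMA)
    print(len(rows), "parsed,", len(bad), "rejected")  # 2 parsed, 1 rejected
```

Because nothing was enforced on write, another consumer could read the same RAW_EVENTS with a different schema (for example, keeping amount as text) without reloading anything.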
Using Hadoop to Extend Data Warehouse Storage
•Use Hadoop to store older, less-important data in the warehouse (archive data)
•Offers benefits of traditional SAN-based file storage, but with ability to process data too
•Original IT-led use-case - clear TCO benefits vs expensive Teradata-style DW
[Diagram: Hadoop (online, scalable, flexible, cost-effective) alongside the data warehouse and data marts, all feeding business intelligence]
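One common way to put this into practice is to age out older partitions on a rolling window. The sketch below only decides which partitions qualify; it assumes hypothetical monthly (year, month) partitions and a 24-month retention window, and leaves the actual data movement (for example with Sqoop or ODI) out of scope.

```python
from datetime import date


def partitions_to_archive(partitions, today, keep_months=24):
    """Select monthly partitions old enough to move from the warehouse to Hadoop.

    partitions: iterable of (year, month) tuples present in the warehouse.
    The current month and the previous keep_months months stay in the warehouse;
    everything older is returned, oldest first.
    """
    def month_index(year, month):
        return year * 12 + (month - 1)

    cutoff = month_index(today.year, today.month) - keep_months
    return sorted(p for p in partitions if month_index(*p) < cutoff)


if __name__ == "__main__":
    existing = [(2012, 1), (2013, 10), (2013, 11), (2015, 11)]
    print(partitions_to_archive(existing, date(2015, 11, 2)))  # [(2012, 1), (2013, 10)]
```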
Offloading Simple ETL and Data Processing to Hadoop
•Special use-case : offloading low-value, simple ETL work to a Hadoop cluster
‣Receiving, aggregating, filtering and pre-processing data for an RDBMS data warehouse
‣Potentially free-up high-value DW resources for analytic work
Initially Required Separate Hadoop Dev Skills and Tools
•Original data processing model was MapReduce, a distributed processing algorithm
‣Based on functional programming ideas, typically written in Java
•Other languages include Python, R, Scala etc
•Separate tools for data loading (Sqoop),
ingestion (Flume, Kafka), processing (Pig etc)
•Hard to find staff, and hard to integrate with rest
of enterprise (security, governance, data dictionary)
[Diagram: Mappers (filter, project) feed Reducers (aggregate); the output is one HDFS file per reducer, in a directory]
Today : Data Integration Tools now Cover Hadoop & Big Data
•ODI provides an excellent framework for running Hadoop ETL jobs
‣ELT approach pushes transformations down to Hadoop - leveraging power of cluster
•Hive, HBase, Sqoop and OLH/ODCH KMs provide native Hadoop loading / transformation
‣Whilst still preserving RDBMS push-down
‣Extensible to cover Pig, Spark etc
•Process orchestration
•Data quality / error handling
•Metadata and model-driven
Oracle Big Data Connectors
•Oracle-licensed utilities to connect Hadoop to Oracle RDBMS
‣Bulk-extract data from Hadoop to Oracle, or expose HDFS / Hive data as external tables
‣Run R analysis and processing on Hadoop
‣Leverage Hadoop compute resources to offload ETL and other work from Oracle RDBMS
‣Enable Oracle SQL to access and load Hadoop data
Initial Hadoop Use-Case : Low-Cost Staging Layer for DW
•Use Hadoop as a low-cost, horizontally-scalable DW archive
•Use Hadoop, Hive and MapReduce for low-cost ETL staging
•Extend the DW with new data sources, datatypes, detail-level data
Today’s Oracle Information Management Ref Architecture
[Diagram: input events and data (structured enterprise data and other data) flow through an Event Engine, Data Reservoir, Data Factory and Enterprise Information Store into Reporting and a Discovery Lab, producing actionable events, actionable information and actionable insights across execution, innovation and discovery]
Next-Generation Layered Data Warehouse Architecture
•Data sources: structured data sources (operational data, COTS data, master & reference data, streaming & BAM) plus data engines and poly-structured sources (content, docs, web & social media, SMS)
•Data ingestion into the Raw Data Reservoir: an immutable raw data reservoir, where raw data at rest is not interpreted
•Foundation Data Layer: immutable modelled data in business-process-neutral form, abstracted from business process changes
•Access & Performance Layer: past, current and future interpretation of enterprise data, structured to support agile access & navigation
•Discovery Lab Sandboxes: project-based data stores to support specific discovery objectives
•Rapid Development Sandboxes: project-based data stores to facilitate rapid content / presentation delivery
•Information interpretation and delivery through virtualization & query federation, enterprise performance management, pre-built & ad-hoc BI assets, information services and data science
Combining Oracle RDBMS with Hadoop + NoSQL
•High-value, high-density data goes into Oracle RDBMS
•Better support for fast queries, summaries, referential integrity etc
•Lower-value, lower-density data goes into Hadoop + NoSQL
‣Also provides flexible schema, more agile development
•Successful next-generation BI+DW projects combine both - neither on its own is sufficient
Introducing the Concept of the “Data Reservoir”
•Immutable Data Reservoir provides assured
System of Reference
•Can include relational as well as non-relational
•“If you touch it, take it all” is best practice
•Flow rate into reservoir driven by needs
‣A wide range of technology choices can deliver
near-real-time ingestion
‣Minimal integration, enrichment and DQ in primary flow
•Probable long term value in abstracting above specific Hadoop technologies because of
tooling maturity
•Simplify / rationalise and reduce costs of current MFT, ETL, data integration estate
•Enables agile development and discovery, as the data is always available
Reconciling Data Modeling and Agile Development
•Only model logically what you have to, and implement just what you need to
Oracle’s Engineered System Data Reservoir Platform
Data Layers - Cost, Quality and Concurrency Trade-off
But … We Now Have Separate Silos of Information
•Hadoop typically accessed via MapReduce and other Java frameworks
•NoSQL databases use APIs
•Data Warehouses use SQL
ETL Considerations : Using Hive vs. Regular Oracle SQL
•Not all join types are available in Hive - joins must be equality joins
•No sequences, no primary keys on tables
•Generally need to stage Oracle or other external data into Hive before joining to it
•Hive latency - not good for small microbatch-type work
‣But other alternatives exist - Spark, Impala etc
•Hive is INSERT / APPEND only - no updates, deletes etc
‣But HBase may be suitable for CRUD-type loading
•Don’t assume that HiveQL == Oracle SQL
‣Test assumptions before committing to platform
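Without sequences, a common workaround in Hive is to generate surrogate keys by offsetting a row number from the current maximum key, for example using Hive’s ROW_NUMBER() windowing function. The sketch below shows the same logic in plain Python, assuming a hypothetical customer dimension keyed by natural key. Note that this approach is not safe for concurrent loads into the same table, which is exactly the kind of assumption worth testing before committing to the platform.

```python
def assign_surrogate_keys(existing, incoming_natural_keys):
    """Generate surrogate keys without a sequence: max(existing key) + row number.

    existing: dict mapping natural key -> surrogate key already in the dimension.
    incoming_natural_keys: natural keys in this load (may repeat or already exist).
    Returns a new dict covering both existing and newly keyed members.
    """
    max_key = max(existing.values(), default=0)
    # Only distinct, previously unseen natural keys get a new key; sorting makes
    # the numbering repeatable, like ROW_NUMBER() OVER (ORDER BY natural_key)
    new_keys = sorted(set(incoming_natural_keys) - set(existing))
    assigned = dict(existing)
    for row_number, natural_key in enumerate(new_keys, start=1):
        assigned[natural_key] = max_key + row_number
    return assigned


if __name__ == "__main__":
    dim = {"C001": 1, "C002": 2}
    print(assign_surrogate_keys(dim, ["C003", "C001", "C004", "C003"]))
    # {'C001': 1, 'C002': 2, 'C003': 3, 'C004': 4}
```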
Oracle Big Data SQL : Extending SQL and Exadata to Hadoop
•Part of Oracle Big Data 4.0 (BDA-only)
‣Also requires Oracle Database 12c, Oracle Exadata Database Machine
•Extends Oracle Data Dictionary to cover Hive
•Extends Oracle SQL and SmartScan to Hadoop
•Extends Oracle Security Model over Hadoop
‣Fine-grained access control
‣Data redaction, data masking
[Diagram: SQL queries on the Exadata Database Server are passed through Oracle Big Data SQL to SmartScan on both the Exadata Storage Servers and the Hadoop cluster]
Adding A Data Reservoir Can Add Complexity to ETL Though
•Many organisations are implementing “data lakes” or “data reservoirs”
•Traditionally built separately from the data warehouse, and used to store event, social and web data
•Separate tools and processes for loading the data reservoir compared to the main DW
•Limited scope for combining
data reservoir and RDBMS
data as two separate stores
Adding Database Transaction Ingestion to Data Reservoir
•Hadoop data loading processes also bring database transactions into data reservoir
•Common data loading process, staging and archive area
•Simpler data flow, reduced latency, more opportunities for integration
•… But RDBMS transaction ingestion needs to leverage Flume, HDFS, Hive, Oozie etc
Apache Flume : Core Technology for Real-Time Event Ingest
•Apache Flume is the standard way to transport log files from source through to target
•Initial use-case was webserver log files, but can transport any file from A to B
•Does not do data transformation, but can send to multiple targets / target types
•Mechanisms and checks to ensure successful transport of entries
•Has a concept of “agents”, “sinks” and “channels”
•Agents collect and forward log data
•Sinks write it to its final destination
•Channels buffer log data en route
•Simple configuration through Java properties files
•Handled outside of ODI12c
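The division of labour between these components can be modelled in a few lines. This is a conceptual sketch only: the Channel and Agent classes are hypothetical and are not Flume’s (Java) API. Within a real Flume agent, sources receive events, channels buffer them and sinks write them out; fan-out to multiple targets is done by copying each event into several channels, each drained by its own sink, which is what this sketch imitates. An event is removed from a channel only after its sink has written it, echoing the delivery checks mentioned above.

```python
from collections import deque


class Channel:
    """Bounded buffer that holds events en route from a source to a sink."""

    def __init__(self, capacity=100):
        self.capacity = capacity
        self.events = deque()

    def put(self, event):
        if len(self.events) >= self.capacity:
            raise OverflowError("channel full")
        self.events.append(event)


class Agent:
    """Toy agent: one source replicated into several (channel, sink) routes."""

    def __init__(self, routes):
        self.routes = routes  # list of (Channel, sink) pairs; a sink is any callable

    def collect(self, event):
        """Source side: accept one log line and buffer a copy in every channel."""
        for channel, _sink in self.routes:
            channel.put(event)

    def deliver(self):
        """Sink side: write buffered events, dequeuing each only after a successful write."""
        delivered = 0
        for channel, sink in self.routes:
            while channel.events:
                sink(channel.events[0])   # if this raises, the event stays buffered
                channel.events.popleft()  # acknowledge only once the write succeeded
                delivered += 1
        return delivered


if __name__ == "__main__":
    hdfs_target, hbase_target = [], []  # hypothetical stand-ins for two sink destinations
    agent = Agent([(Channel(), hdfs_target.append), (Channel(), hbase_target.append)])
    for line in ["GET /index.html 200", "GET /about.html 404"]:
        agent.collect(line)
    print(agent.deliver(), hdfs_target == hbase_target)  # 4 True
```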
Oracle GoldenGate 12c for Big Data
•Oracle GoldenGate 12c for Big Data can replicate database transactions into Hadoop
•Load directly into Hive / HDFS, or feed transactions into Apache Flume as Flume events
•Provides a way to replicate Oracle + other RDBMS data into the data reservoir
‣Works with Flume to provide a single streaming route into the data reservoir
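Because Hive tables in this kind of setup are append-only, a common pattern is to land every replicated change record and then derive the table’s current state from them. A minimal Python sketch, assuming a hypothetical record layout with an operation code (I, U or D), the row’s key, a commit sequence number and the row’s values after the change; this layout is illustrative and is not GoldenGate’s actual output format.

```python
def current_state(change_records):
    """Rebuild current rows from an append-only log of insert/update/delete records.

    Each record is a dict with:
      op   - 'I' (insert), 'U' (update) or 'D' (delete)
      key  - the source table's primary key value
      seq  - commit order in the source database
      row  - the row's values after the change (None for deletes)
    """
    latest = {}
    for record in sorted(change_records, key=lambda r: r["seq"]):
        latest[record["key"]] = record  # a later change supersedes earlier ones
    # Keys whose most recent change is a delete no longer exist in the source
    return {key: rec["row"] for key, rec in latest.items() if rec["op"] != "D"}


if __name__ == "__main__":
    changes = [
        {"op": "U", "key": 1, "seq": 3, "row": {"name": "Acme", "city": "Brighton"}},
        {"op": "I", "key": 1, "seq": 1, "row": {"name": "Acme", "city": "Leeds"}},
        {"op": "I", "key": 2, "seq": 2, "row": {"name": "Bolt", "city": "York"}},
        {"op": "D", "key": 2, "seq": 4, "row": None},
    ]
    print(current_state(changes))  # {1: {'name': 'Acme', 'city': 'Brighton'}}
```

Keeping the full change history in the reservoir and computing current state on read means nothing is ever updated in place, which suits write-once HDFS storage.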
Hadoop Platform Enables Innovation
•Graph analysis for understanding social and customer networks
‣Who’s most influential? Which supplier do I depend on the most?
What is the right product mix for millennials?
Most Popular Business-Driven Use Case : Customer 360
Oracle Business Analytics and Big Data Sources
•OBIEE 11g can also make use of big data sources
‣OBIEE 11.1.1.7+ supports Hive/Hadoop as a data source
‣Oracle R Enterprise can expose R models through DB functions, columns
‣Oracle Exalytics has InfiniBand connectivity to Oracle BDA
•Endeca Information Discovery can analyze unstructured and semi-structured sources
‣Increasingly tighter-integration between
OBIEE and Endeca
But … Still Not Easy to Get Analytic Value at Fast Pace
•Tool complexity: early Hadoop tools only for experts; existing BI tools not designed for Hadoop; emerging solutions lack broad capabilities
•Data uncertainty: not familiar and overwhelming; potential value not obvious; requires significant manipulation
•80% of effort typically spent on evaluating and preparing data
•Overly dependent on scarce and highly skilled resources
Oracle Big Data Discovery
•“The Visual Face of Hadoop” - cataloging, analysis and discovery for the data reservoir
•Runs on Cloudera CDH5.3+ (Hortonworks support coming soon)
•Combines Endeca Server + Studio technology with Hadoop-native (Spark) transformations
Finally … What Keeps the CIO Awake at Night
•Security and Privacy Regulations
‣Are we analysing and sharing data in compliance with privacy regulations?
-And if we are - would customers think our use of it is ethical?
‣Do I know if the data in my Hadoop cluster is *really* secure?
Hadoop Security “By Default”
•Connections between Hadoop services, and by users to services, aren’t authenticated
•Security is fragmented : HDFS, Hive, OS user accounts, Hue, CM all separate models
•No single place to define security policies, groups, access rights
•No single tool to audit access and permissions
•By default, everything is open and trusted - reflects roots in academia, R&D, marketing depts
Oracle Big Data SQL : Single RDBMS/Hadoop Security Model
•Potential to extend Oracle security model over Hadoop (Hive) data
‣Masking / Redaction
‣VPD
‣FGAC
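The policy types listed above act differently: VPD-style fine-grained access control decides which rows a user sees, while redaction masks column values within the rows they can see. The sketch below models both in plain Python, assuming a hypothetical user record with roles and a region, and a mask that keeps only the last four characters. Oracle itself implements these through VPD policies and Data Redaction inside the database, not with application code like this.

```python
def redact(value):
    """Mask all but the last four characters of a value."""
    text = str(value)
    return "X" * max(len(text) - 4, 0) + text[-4:]


def apply_policies(rows, user, row_policy, protected_columns):
    """Apply a row-level (VPD-style) filter, then column redaction.

    row_policy(user, row) returns True if the user may see the row.
    protected_columns maps column name -> set of roles allowed to see it unmasked.
    """
    result = []
    for row in rows:
        if not row_policy(user, row):
            continue  # row is invisible to this user, as if filtered by an added predicate
        result.append({
            col: value
            if (col not in protected_columns or user["roles"] & protected_columns[col])
            else redact(value)
            for col, value in row.items()
        })
    return result


def same_region(user, row):
    """Hypothetical row policy: users only see customers in their own region."""
    return row["region"] == user["region"]


if __name__ == "__main__":
    customers = [
        {"customer": "Acme", "region": "UK", "card": "4111111111111111"},
        {"customer": "Bolt", "region": "US", "card": "5500000000000004"},
    ]
    analyst = {"name": "alice", "roles": {"analyst"}, "region": "UK"}
    print(apply_policies(customers, analyst, same_region, {"card": {"fraud_team"}}))
    # [{'customer': 'Acme', 'region': 'UK', 'card': 'XXXXXXXXXXXX1111'}]
```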
Summary
•Hadoop and Oracle Big Data Appliance are increasingly appearing in BI+DW Projects
•Gives DW projects the ability to store more data, cheaper and more flexibly than before
•Enables non-relational (non-SQL) query tools and analysis techniques (R, Spark etc)
•Extends BI’s capability to report and analyze across wider data sources
•Maturity varies widely, both in the tools themselves and in Oracle’s integration with Hadoop
•Trend is for Oracle to “productize” big data, creating tools + products around Oracle BDA
•We are probably at early stages - but very interesting times to be an Oracle BI+DW dev!
Thank You for Attending!
•Thank you for attending this presentation; more information can be found at http://www.rittmanmead.com
•Contact us at info@rittmanmead.com or mark.rittman@rittmanmead.com
•Look out for our book, “Oracle Business Intelligence Developers Guide” out now!
•Follow us on Twitter (@rittmanmead) or Facebook (facebook.com/rittmanmead)