Big Data Analytics – Scaling R to Enterprise Data
useR! 2013 – Albacete Spain #useR2013
Luis Campos, Big Data Solutions Lead, Oracle EMEA, @luigicampos
Mark Hornick, Director, Oracle Database Advanced Analytics, @MarkHornick
Copyright © 2013, Oracle and/or its affiliates. All rights reserved.

The girl with all the questions!
“The real innovation here is that we can ask questions and get the answer back before we have forgotten why we asked the question in the first place.”
– Hilary Mason, Chief Scientist, Bit.ly, and member of NYC Mayor Bloomberg’s Technology and Innovation Advisory Council

Nexus of Forces, Platform 3.0, Four Pillars
What are analysts and groups saying?

New Information Challenges
Data Explosion
[Figure: A Decade of Digital Universe Growth: Storage in Exabytes. Source: IDC’s Digital Universe Study, June 2011]
Combinatory Explosion
Dimension Explosion

Big Data Solution = Data + Analytics + Tools
Source: McKinsey study “Big data: What’s your plan?” (March 2013)
http://www.mckinsey.com/insights/business_technology/big_data_whats_your_plan

DATA: Any Data, Any Source
ANALYTICS: Out-of-the-box Analytics, New Models
TOOLS: Self Service, Data Discovery
On Premise, On Cloud, On Mobile

Oracle Complete Business Analytics Solution

BIG DATA APPLIANCE, BIG DATA CONNECTORS, NoSQL DB
Oracle Advanced Analytics (DATA MINING, ORACLE R Ent.), SPATIAL, GRAPH, Real Time Decisions (RTD)
OBIEE, ENDECA, Collective Intellect (CI)
On Premise, Oracle Cloud, On Mobile

Apply Advanced Analytics on All Data
Visualise it with any BI Tool

[Diagram labels: Hadoop, HDFS, Relational Data, BI Tools]

Oracle R Advantages
1. Keep the R tools
2. Keep the data where it sits (Relational or HDFS)
3. Keep the SQL Based BI Tools
4. Scale to LARGE data sets
R workspace console
Function push-down – data transformation & statistics
Oracle statistics engine
OBIEE, Web Services
Stages: Development, Production, Consumption

Oracle’s Advanced Analytics Strategic Offerings
Deliver enterprise-level advanced analytics in the Database
• Oracle in-Database Data Mining algorithms
– Access through Free GUI from SQL Developer or programmatically from SQL, PL/SQL, R or Java
– Predictive model APIs for the Oracle R Enterprise
– Exadata architecture advantages for up to 5x improvement with Smart Scan
• Oracle R Distribution
– Free download, pre-installed on Oracle Big Data Appliance, bundled with Oracle Linux
– Enhanced linear algebra performance: Intel’s Math Kernel Library, AMD’s Core Math Library (Windows and Linux), SUN Solaris and IBM AIX
– Enterprise support for customers of Oracle Advanced Analytics, Big Data Appliance, and Oracle Linux

Oracle’s Advanced Analytics Strategic Offerings
Deliver enterprise-level R in the Database or Hadoop
• Oracle R Enterprise
– Transparent access to database-resident data from R
– Embedded R script execution through database managed R engines
– Statistics engine
– Enhanced support for high-speed Exadata scoring
• Oracle R Connector for Hadoop [ORCH] (Part of Oracle Big Data Connectors)
– R interface to Oracle Hadoop Cluster on BDA and non-Oracle Hadoop clusters
– Access and manipulate data in HDFS, database, and file system
– Write MapReduce functions using R and execute through natural R interface
– Predictive models with execution in-Cluster against Hadoop-stored data

Oracle R Components
Component layout

Analyst Laptop
Oracle Database
Oracle R Distribution
Oracle R Enterprise Server Components
Oracle R Distribution
Oracle R Connector for Hadoop Client
Oracle R Enterprise Client Packages
Optional with ORCH
Oracle R Distribution
Oracle R Connector for Hadoop
Oracle R Enterprise Client Packages
Big Data Appliance
Oracle R Enterprise Client Packages
Exadata

Knowledge Exploitation Process
Typical stages in a Big Data Project
Stages shown around the Data Scientist: Business Understanding, Data Selection, Data Preparation, Discovery, Model Building, Evaluation, Deployment

Data Loading with Oracle R Enterprise
R> library(ORE)
R> df <- data.frame(A=1:26, B=letters[1:26])
R> dim(df)
[1] 26 2
R> class(df)
[1] "data.frame"
R> ore.create(df, table="DF_TABLE")
R> ore.ls()
[1] "DF_TABLE"
R> class(DF_TABLE)
[1] "ore.frame"
attr(,"package")
[1] "OREbase"
R> dim(DF_TABLE)
[1] 26 2

Discovery with Oracle R in-DB and HDFS
library(ORE)
ore.ls()          # list tables in DB
class(MY_TABLE)   # ore.frame
dim(MY_TABLE)
# overloaded R functions
head(MY_TABLE)
sample(MY_TABLE)
summary(MY_TABLE)

library(ORCH)
hdfs.ls()
hdfs.dim("myHDFSdata")
hdfs.head("myHDFSdata")
hdfs.sample("myHDFSdata")
hdfs.toHive("myHDFSdata", tablename="my_hive_data")
summary(my_hive_data)

Data Prep with Oracle R in-DB and HDFS
library(ORE) / library(ORCH)
# join
merge(MY_TABLE1, MY_TABLE2, by.x="x1", by.y="x2")

# project columns
df <- MY_TABLE[,c("X","Y","Z")]

# filter rows
df <- df[df$Z<=4.3 | df$A=="B",1:3]

# binning
IRIS_TAB <- ore.push(iris[1:4])
IRIS_TAB$PetalBins =
  ifelse(IRIS_TAB$Petal.Length < 2.0, "SMALL PETALS",
  ifelse(IRIS_TAB$Petal.Length < 4.0, "MEDIUM PETALS", "LARGE PETALS"))

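The data preparation steps above can be tried locally before pointing them at database tables. The following is a minimal sketch in base R, assuming small in-memory data frames as stand-ins for the `ore.frame` objects; the table contents and the lower-case variable names are hypothetical and not from the talk.

```r
# Hypothetical local stand-ins for database tables (not from the talk)
MY_TABLE1 <- data.frame(x1 = 1:4, X = c("A", "B", "C", "D"),
                        stringsAsFactors = FALSE)
MY_TABLE2 <- data.frame(x2 = c(2, 3, 4), Z = c(5.1, 4.0, 6.2))

# join: keep rows where x1 equals x2
joined <- merge(MY_TABLE1, MY_TABLE2, by.x = "x1", by.y = "x2")

# project columns
proj <- joined[, c("x1", "X", "Z")]

# filter rows: small Z, or X equal to "B"
filtered <- proj[proj$Z <= 4.3 | proj$X == "B", ]

# binning with nested ifelse, mirroring the slide
iris_tab <- iris[1:4]
iris_tab$PetalBins <-
  ifelse(iris_tab$Petal.Length < 2.0, "SMALL PETALS",
  ifelse(iris_tab$Petal.Length < 4.0, "MEDIUM PETALS", "LARGE PETALS"))
bin_counts <- table(iris_tab$PetalBins)
print(filtered)
print(bin_counts)
```

The syntax is deliberately the same as on the slide; the only difference in this sketch is that the objects live in client memory rather than in the database.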
“Densifying” data: custom MapReduce jobs
Count occurrence of hash tags in tweets per customer for select tags
mapHashTags <- function (k,v) {
  x <- strsplit(v$text, " ")
  x <- x[x!='']
  importantTags <- tolower(importantTags)
  for(twt in 1:length(x)) {
    for(tag in x[[twt]]) {
      if(substr(tag,1,1) == "#") {
        tagL <- tolower(tag)
        if(tagL %in% importantTags) {
          orch.keyval(v[twt,"screenName"],tagL)
        }
      }
    }
  }
}
reduceHashTags <- function(k,vals) { # k = screenName, vals = vector(tags)
  importantTags <- tolower(importantTags)
  vals <- factor(vals$val,levels=importantTags)
  x <- as.data.frame(t(as.matrix(table(vals))))
  orch.keyval(k,x) # k = screenName, x = df(importantTags as cols) with counts
}

ORCH: Create your own MapReduce jobs
Count occurrence of hash tags in tweets per customer for select tags
importantTags <- c("#bigdata","#database","#oracle","#sql")
tag.summary <- hadoop.exec(tweets.id,
  mapper=mapHashTags,
  reducer=reduceHashTags,
  export=orch.export(importantTags=importantTags),
  config=new("mapred.config",
    job.name      = "TwitterScreenNameHashTags",
    reduce.tasks  = 5,
    map.output    = data.frame(key='a', val='a'),
    reduce.output = data.frame(key='a', bigdata=0,
                               database=0, oracle=0, sql=0)))

> hdfs.get(tag.summary)
  key            bigdata database oracle sql
1 twitter.user.1       4        7     37  91
2 twitter.user.2      15       19      1  32
3 twitter.user.3     104       57      8   0
4 twitter.user.4       0       64    549   0

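To see what the mapper and reducer compute without a Hadoop cluster, the map, shuffle and reduce steps can be simulated in base R. This is only a sketch under stated assumptions: the `tweets` data frame and the function names `map_hash_tags` and `reduce_hash_tags` are hypothetical, and `split()` stands in for the shuffle that Hadoop performs between the two phases. It does not call ORCH.

```r
importantTags <- c("#bigdata", "#database", "#oracle", "#sql")

# Hypothetical tweets standing in for the HDFS input
tweets <- data.frame(
  screenName = c("user.1", "user.1", "user.2"),
  text = c("Loving #BigData and #Oracle", "#oracle #sql tips",
           "#database migration #R"),
  stringsAsFactors = FALSE
)

# Map step: emit one (screenName, tag) pair per important hashtag
map_hash_tags <- function(v) {
  words <- strsplit(v$text, " ", fixed = TRUE)
  pairs <- lapply(seq_along(words), function(i) {
    tags <- tolower(words[[i]][startsWith(words[[i]], "#")])
    tags <- tags[tags %in% importantTags]
    data.frame(key = rep(v$screenName[i], length(tags)), val = tags,
               stringsAsFactors = FALSE)
  })
  do.call(rbind, pairs)
}

# Reduce step: one row per key, one count column per important tag
reduce_hash_tags <- function(k, vals) {
  counts <- table(factor(vals, levels = importantTags))
  data.frame(key = k, t(as.matrix(counts)), check.names = FALSE,
             stringsAsFactors = FALSE)
}

mapped  <- map_hash_tags(tweets)
grouped <- split(mapped$val, mapped$key)   # shuffle: group values by key
tag_summary <- do.call(rbind, Map(reduce_hash_tags, names(grouped), grouped))
rownames(tag_summary) <- NULL
print(tag_summary)
```

The result has the same "densified" shape as the job output on the slide: one row per screen name and a fixed set of tag columns, with zero counts filled in for tags a user never used.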
Modelling with Oracle R in-DB and HDFS
# Clustering with ORE
X <- ore.push(data.frame(x))
km.mod1 <- ore.odmKMeans(~., X, num.centers=2, num.bins=5)
summary(km.mod1)
rules(km.mod1)
clusterhists(km.mod1)

# Regression with ORCH
mod.lm <- orch.lm(myFormula, myData, nReducers = 2)
summary(mod.lm)
pred <- predict.orch.lm(mod.lm, newdata = myData)
res.pred <- hdfs.get(pred)
head(res.pred)

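For comparison, the same two modelling steps can be sketched with base R's `kmeans()` and `lm()` on in-memory data. This is a local analogue only, assuming synthetic data generated here for illustration; it does not use ORE or ORCH, and the in-database algorithms may differ in details such as binning and initialisation.

```r
set.seed(42)  # arbitrary seed so the sketch is reproducible

# Hypothetical two-cluster data
x <- rbind(matrix(rnorm(100, mean = 0, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 3, sd = 0.3), ncol = 2))
colnames(x) <- c("x1", "x2")
X <- data.frame(x)

# Clustering: local counterpart of the k-means step
km <- kmeans(X, centers = 2, nstart = 5)
print(table(km$cluster))

# Regression: build a linear model, then score it
myData <- data.frame(x = 1:100)
myData$y <- 2 * myData$x + 1 + rnorm(100, sd = 0.1)
mod  <- lm(y ~ x, data = myData)
pred <- predict(mod, newdata = myData)
print(coef(mod))
```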
In-database performance advantage
R lm vs. ORE ore.lm
Data: 500k to 1.5m records, 3 predictors
Performance: 2x-3x improvement for build, 4x improvement for scoring
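
A client-side baseline for this kind of comparison can be measured with `system.time()`. The sketch below assumes synthetic data at the smaller size quoted above (500k records, 3 predictors) and times only base R's `lm()` and `predict()`; it does not reproduce the ORE measurements, and the timings will vary by machine.

```r
set.seed(1)  # arbitrary seed
n <- 500000

# Hypothetical data: three predictors and a linear response with noise
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
dat$y <- 1 + 2 * dat$x1 - dat$x2 + 0.5 * dat$x3 + rnorm(n)

# Time the model build and the scoring pass separately
build_time <- system.time(fit <- lm(y ~ x1 + x2 + x3, data = dat))
score_time <- system.time(pred <- predict(fit, newdata = dat))

print(build_time[["elapsed"]])
print(score_time[["elapsed"]])
```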

In-database performance advantage – lm

More tests at http://blogs.oracle.com/R/entry/oracle_r_enterprise_1_32
Deploying with Oracle R Enterprise
Load R scripts into ORE script repository
Invoke R scripts by name from SQL

Store R objects directly in Oracle Database (no separate files)
Optional return values:
• Data frame consumable by any SQL-ready application
• XML containing structured data, complex R objects, PNG images
• PNG table with BLOB column containing images for immediate consumption
Schedule for automatic execution

Oracle Advanced Analytics: Embedded R Execution
SQL interface rqEval – generate XML string for graphic output
Oracle PL/SQL (wrapping the R Language function)
begin
  sys.rqScriptCreate('Example6',
    'function(){
       res <- 1:10
       plot( 1:100, rnorm(100), pch = 21,
             bg = "red", cex = 2 )
       res
     }');
end;
/
Oracle SQL
select value
from table(rqEval(NULL,'XML','Example6'));

Consumer shown on the slide: Oracle BI Publisher

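Before registering a script like Example6 in the database, its R function can be checked locally. The sketch below assumes nothing beyond base R: it runs the same function body against a null PDF device so that the `plot()` call succeeds without a display. The name `example6` is only a local label for this sketch, and the XML packaging done by `rqEval` is not reproduced.

```r
# Same function body as in the rqScriptCreate call on the slide
example6 <- function() {
  res <- 1:10
  plot(1:100, rnorm(100), pch = 21, bg = "red", cex = 2)
  res
}

pdf(NULL)          # null graphics device: the plot is drawn but no file is written
res <- example6()  # the function's return value is what SQL would receive as data
invisible(dev.off())
print(res)
```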
Summary
Oracle R Enterprise (ORE)
• A comprehensive, database-centric environment for end-to-end analytical processes in R with immediate deployment to production environments
• Wide range of in-database advanced analytics algorithms exposed through R
• Eliminate R client memory limits

Oracle R Connector for Hadoop (ORCH)
• A collection of R packages enabling Big Data analytics from an R environment
• Allows R users to leverage a Hadoop Cluster with HDFS and MapReduce from R
• Prepackaged advanced analytics algorithms
• Transparent manipulation of HIVE data

• Enable R users to conduct Big Data projects from R
• Eliminate client R engine memory barrier
• Scale to large data sets
• Deploy R-based solutions without translation to other languages or environments

Resources
• http://www.oracle.com/goto/R
• Blog: https://blogs.oracle.com/R/
• Forum: https://forums.oracle.com/forums/forum.jspa?forumID=1397
• Oracle R Distribution: http://www.oracle.com/technetwork/indexes/downloads/r-distribution-1532464.html
• ROracle: http://cran.r-project.org/web/packages/ROracle
• Oracle R Enterprise: http://www.oracle.com/technetwork/database/options/advanced-analytics/r-enterprise
• Oracle R Connector for Hadoop: http://www.oracle.com/us/products/database/big-data-connectors/overview

Más contenido relacionado

La actualidad más candente

Talend For Big Data : Secret Key to Hadoop
Talend For Big Data  : Secret Key to HadoopTalend For Big Data  : Secret Key to Hadoop
Talend For Big Data : Secret Key to HadoopEdureka!
 
Best Practices for the Hadoop Data Warehouse: EDW 101 for Hadoop Professionals
Best Practices for the Hadoop Data Warehouse: EDW 101 for Hadoop ProfessionalsBest Practices for the Hadoop Data Warehouse: EDW 101 for Hadoop Professionals
Best Practices for the Hadoop Data Warehouse: EDW 101 for Hadoop ProfessionalsCloudera, Inc.
 
Big Data/Hadoop Option Analysis
Big Data/Hadoop Option AnalysisBig Data/Hadoop Option Analysis
Big Data/Hadoop Option Analysiszafarali1981
 
Moving and Transforming Data with Pentaho Data Integration 5.0 CE (aka Kettle)
Moving and Transforming Data with Pentaho Data Integration 5.0 CE (aka Kettle)Moving and Transforming Data with Pentaho Data Integration 5.0 CE (aka Kettle)
Moving and Transforming Data with Pentaho Data Integration 5.0 CE (aka Kettle)Roland Bouman
 
Apache Hadoop Crash Course
Apache Hadoop Crash CourseApache Hadoop Crash Course
Apache Hadoop Crash CourseDataWorks Summit
 
PGQL: A Language for Graphs
PGQL: A Language for GraphsPGQL: A Language for Graphs
PGQL: A Language for GraphsJean Ihm
 
Build Knowledge Graphs with Oracle RDF to Extract More Value from Your Data
Build Knowledge Graphs with Oracle RDF to Extract More Value from Your DataBuild Knowledge Graphs with Oracle RDF to Extract More Value from Your Data
Build Knowledge Graphs with Oracle RDF to Extract More Value from Your DataJean Ihm
 
Teradata - Presentation at Hortonworks Booth - Strata 2014
Teradata - Presentation at Hortonworks Booth - Strata 2014Teradata - Presentation at Hortonworks Booth - Strata 2014
Teradata - Presentation at Hortonworks Booth - Strata 2014Hortonworks
 
Innovate Analytics with Oracle Data Mining & Oracle R
Innovate Analytics with Oracle Data Mining & Oracle RInnovate Analytics with Oracle Data Mining & Oracle R
Innovate Analytics with Oracle Data Mining & Oracle RCapgemini
 
How To Model and Construct Graphs with Oracle Database (AskTOM Office Hours p...
How To Model and Construct Graphs with Oracle Database (AskTOM Office Hours p...How To Model and Construct Graphs with Oracle Database (AskTOM Office Hours p...
How To Model and Construct Graphs with Oracle Database (AskTOM Office Hours p...Jean Ihm
 
Hedrich_Michael_Resume_NT
Hedrich_Michael_Resume_NTHedrich_Michael_Resume_NT
Hedrich_Michael_Resume_NTMichael Hedrich
 
Oracle Unified Information Architeture + Analytics by Example
Oracle Unified Information Architeture + Analytics by ExampleOracle Unified Information Architeture + Analytics by Example
Oracle Unified Information Architeture + Analytics by ExampleHarald Erb
 
Oracle Spatial Studio: Fast and Easy Spatial Analytics and Maps
Oracle Spatial Studio:  Fast and Easy Spatial Analytics and MapsOracle Spatial Studio:  Fast and Easy Spatial Analytics and Maps
Oracle Spatial Studio: Fast and Easy Spatial Analytics and MapsJean Ihm
 
Pentaho Data Integration Introduction
Pentaho Data Integration IntroductionPentaho Data Integration Introduction
Pentaho Data Integration Introductionmattcasters
 
An Introduction to Graph: Database, Analytics, and Cloud Services
An Introduction to Graph:  Database, Analytics, and Cloud ServicesAn Introduction to Graph:  Database, Analytics, and Cloud Services
An Introduction to Graph: Database, Analytics, and Cloud ServicesJean Ihm
 
When Graphs Meet Machine Learning
When Graphs Meet Machine LearningWhen Graphs Meet Machine Learning
When Graphs Meet Machine LearningJean Ihm
 
How To Visualize Graphs
How To Visualize GraphsHow To Visualize Graphs
How To Visualize GraphsJean Ihm
 

La actualidad más candente (20)

Madhu
MadhuMadhu
Madhu
 
Talend For Big Data : Secret Key to Hadoop
Talend For Big Data  : Secret Key to HadoopTalend For Big Data  : Secret Key to Hadoop
Talend For Big Data : Secret Key to Hadoop
 
Best Practices for the Hadoop Data Warehouse: EDW 101 for Hadoop Professionals
Best Practices for the Hadoop Data Warehouse: EDW 101 for Hadoop ProfessionalsBest Practices for the Hadoop Data Warehouse: EDW 101 for Hadoop Professionals
Best Practices for the Hadoop Data Warehouse: EDW 101 for Hadoop Professionals
 
Big Data/Hadoop Option Analysis
Big Data/Hadoop Option AnalysisBig Data/Hadoop Option Analysis
Big Data/Hadoop Option Analysis
 
Moving and Transforming Data with Pentaho Data Integration 5.0 CE (aka Kettle)
Moving and Transforming Data with Pentaho Data Integration 5.0 CE (aka Kettle)Moving and Transforming Data with Pentaho Data Integration 5.0 CE (aka Kettle)
Moving and Transforming Data with Pentaho Data Integration 5.0 CE (aka Kettle)
 
Apache Hadoop Crash Course
Apache Hadoop Crash CourseApache Hadoop Crash Course
Apache Hadoop Crash Course
 
PGQL: A Language for Graphs
PGQL: A Language for GraphsPGQL: A Language for Graphs
PGQL: A Language for Graphs
 
Build Knowledge Graphs with Oracle RDF to Extract More Value from Your Data
Build Knowledge Graphs with Oracle RDF to Extract More Value from Your DataBuild Knowledge Graphs with Oracle RDF to Extract More Value from Your Data
Build Knowledge Graphs with Oracle RDF to Extract More Value from Your Data
 
Teradata - Presentation at Hortonworks Booth - Strata 2014
Teradata - Presentation at Hortonworks Booth - Strata 2014Teradata - Presentation at Hortonworks Booth - Strata 2014
Teradata - Presentation at Hortonworks Booth - Strata 2014
 
Innovate Analytics with Oracle Data Mining & Oracle R
Innovate Analytics with Oracle Data Mining & Oracle RInnovate Analytics with Oracle Data Mining & Oracle R
Innovate Analytics with Oracle Data Mining & Oracle R
 
How To Model and Construct Graphs with Oracle Database (AskTOM Office Hours p...
How To Model and Construct Graphs with Oracle Database (AskTOM Office Hours p...How To Model and Construct Graphs with Oracle Database (AskTOM Office Hours p...
How To Model and Construct Graphs with Oracle Database (AskTOM Office Hours p...
 
Hedrich_Michael_Resume_NT
Hedrich_Michael_Resume_NTHedrich_Michael_Resume_NT
Hedrich_Michael_Resume_NT
 
Oracle Unified Information Architeture + Analytics by Example
Oracle Unified Information Architeture + Analytics by ExampleOracle Unified Information Architeture + Analytics by Example
Oracle Unified Information Architeture + Analytics by Example
 
Oracle Spatial Studio: Fast and Easy Spatial Analytics and Maps
Oracle Spatial Studio:  Fast and Easy Spatial Analytics and MapsOracle Spatial Studio:  Fast and Easy Spatial Analytics and Maps
Oracle Spatial Studio: Fast and Easy Spatial Analytics and Maps
 
Pentaho Data Integration Introduction
Pentaho Data Integration IntroductionPentaho Data Integration Introduction
Pentaho Data Integration Introduction
 
An Introduction to Graph: Database, Analytics, and Cloud Services
An Introduction to Graph:  Database, Analytics, and Cloud ServicesAn Introduction to Graph:  Database, Analytics, and Cloud Services
An Introduction to Graph: Database, Analytics, and Cloud Services
 
HDP Next: Governance
HDP Next: GovernanceHDP Next: Governance
HDP Next: Governance
 
When Graphs Meet Machine Learning
When Graphs Meet Machine LearningWhen Graphs Meet Machine Learning
When Graphs Meet Machine Learning
 
How To Visualize Graphs
How To Visualize GraphsHow To Visualize Graphs
How To Visualize Graphs
 
Aster getting started
Aster getting startedAster getting started
Aster getting started
 

Similar a User 2013-oracle-big-data-analytics-1971985

Microsoft Data Science Technologies 201608
Microsoft Data Science Technologies 201608Microsoft Data Science Technologies 201608
Microsoft Data Science Technologies 201608Mark Tabladillo
 
Big data oracle_introduccion
Big data oracle_introduccionBig data oracle_introduccion
Big data oracle_introduccionFran Navarro
 
LT Infotech_Amit_Kurani_10621681_CV
LT Infotech_Amit_Kurani_10621681_CVLT Infotech_Amit_Kurani_10621681_CV
LT Infotech_Amit_Kurani_10621681_CVAmit Kurani
 
Microsoft Technologies for Data Science 201612
Microsoft Technologies for Data Science 201612Microsoft Technologies for Data Science 201612
Microsoft Technologies for Data Science 201612Mark Tabladillo
 
Expand a Data warehouse with Hadoop and Big Data
Expand a Data warehouse with Hadoop and Big DataExpand a Data warehouse with Hadoop and Big Data
Expand a Data warehouse with Hadoop and Big Datajdijcks
 
Big Data Predictive Analytics with Revolution R Enterprise (Gartner BI Summit...
Big Data Predictive Analytics with Revolution R Enterprise (Gartner BI Summit...Big Data Predictive Analytics with Revolution R Enterprise (Gartner BI Summit...
Big Data Predictive Analytics with Revolution R Enterprise (Gartner BI Summit...Revolution Analytics
 
Web Briefing: Unlock the power of Hadoop to enable interactive analytics
Web Briefing: Unlock the power of Hadoop to enable interactive analyticsWeb Briefing: Unlock the power of Hadoop to enable interactive analytics
Web Briefing: Unlock the power of Hadoop to enable interactive analyticsKognitio
 
ETL_Developer_Resume_Shipra_7_02_17
ETL_Developer_Resume_Shipra_7_02_17ETL_Developer_Resume_Shipra_7_02_17
ETL_Developer_Resume_Shipra_7_02_17Shipra Jaiswal
 
Munir_Database_Developer
Munir_Database_DeveloperMunir_Database_Developer
Munir_Database_DeveloperMunir Muhammad
 
ODI11g, Hadoop and "Big Data" Sources
ODI11g, Hadoop and "Big Data" SourcesODI11g, Hadoop and "Big Data" Sources
ODI11g, Hadoop and "Big Data" SourcesMark Rittman
 
Meetup Oracle Database MAD: 2.1 Data Management Trends: SQL, NoSQL y Big Data
Meetup Oracle Database MAD: 2.1 Data Management Trends: SQL, NoSQL y Big Data Meetup Oracle Database MAD: 2.1 Data Management Trends: SQL, NoSQL y Big Data
Meetup Oracle Database MAD: 2.1 Data Management Trends: SQL, NoSQL y Big Data avanttic Consultoría Tecnológica
 
Agile Data Science 2.0
Agile Data Science 2.0Agile Data Science 2.0
Agile Data Science 2.0Russell Jurney
 
Solution Use Case Demo: The Power of Relationships in Your Big Data
Solution Use Case Demo: The Power of Relationships in Your Big DataSolution Use Case Demo: The Power of Relationships in Your Big Data
Solution Use Case Demo: The Power of Relationships in Your Big DataInfiniteGraph
 
6 enriching your data warehouse with big data and hadoop
6 enriching your data warehouse with big data and hadoop6 enriching your data warehouse with big data and hadoop
6 enriching your data warehouse with big data and hadoopDr. Wilfred Lin (Ph.D.)
 
2013 05 Oracle big_dataapplianceoverview
2013 05 Oracle big_dataapplianceoverview2013 05 Oracle big_dataapplianceoverview
2013 05 Oracle big_dataapplianceoverviewjdijcks
 
Database@Home : The Future is Data Driven
Database@Home : The Future is Data DrivenDatabase@Home : The Future is Data Driven
Database@Home : The Future is Data DrivenTammy Bednar
 
Data Integration for Big Data (OOW 2016, Co-Presented With Oracle)
Data Integration for Big Data (OOW 2016, Co-Presented With Oracle)Data Integration for Big Data (OOW 2016, Co-Presented With Oracle)
Data Integration for Big Data (OOW 2016, Co-Presented With Oracle)Rittman Analytics
 

Similar a User 2013-oracle-big-data-analytics-1971985 (20)

Microsoft Data Science Technologies 201608
Microsoft Data Science Technologies 201608Microsoft Data Science Technologies 201608
Microsoft Data Science Technologies 201608
 
Big data oracle_introduccion
Big data oracle_introduccionBig data oracle_introduccion
Big data oracle_introduccion
 
LT Infotech_Amit_Kurani_10621681_CV
LT Infotech_Amit_Kurani_10621681_CVLT Infotech_Amit_Kurani_10621681_CV
LT Infotech_Amit_Kurani_10621681_CV
 
Microsoft Technologies for Data Science 201612
Microsoft Technologies for Data Science 201612Microsoft Technologies for Data Science 201612
Microsoft Technologies for Data Science 201612
 
Expand a Data warehouse with Hadoop and Big Data
Expand a Data warehouse with Hadoop and Big DataExpand a Data warehouse with Hadoop and Big Data
Expand a Data warehouse with Hadoop and Big Data
 
Big Data Predictive Analytics with Revolution R Enterprise (Gartner BI Summit...
Big Data Predictive Analytics with Revolution R Enterprise (Gartner BI Summit...Big Data Predictive Analytics with Revolution R Enterprise (Gartner BI Summit...
Big Data Predictive Analytics with Revolution R Enterprise (Gartner BI Summit...
 
Web Briefing: Unlock the power of Hadoop to enable interactive analytics
Web Briefing: Unlock the power of Hadoop to enable interactive analyticsWeb Briefing: Unlock the power of Hadoop to enable interactive analytics
Web Briefing: Unlock the power of Hadoop to enable interactive analytics
 
ETL_Developer_Resume_Shipra_7_02_17
ETL_Developer_Resume_Shipra_7_02_17ETL_Developer_Resume_Shipra_7_02_17
ETL_Developer_Resume_Shipra_7_02_17
 
Munir_Database_Developer
Munir_Database_DeveloperMunir_Database_Developer
Munir_Database_Developer
 
ODI11g, Hadoop and "Big Data" Sources
ODI11g, Hadoop and "Big Data" SourcesODI11g, Hadoop and "Big Data" Sources
ODI11g, Hadoop and "Big Data" Sources
 
Meetup Oracle Database MAD: 2.1 Data Management Trends: SQL, NoSQL y Big Data
Meetup Oracle Database MAD: 2.1 Data Management Trends: SQL, NoSQL y Big Data Meetup Oracle Database MAD: 2.1 Data Management Trends: SQL, NoSQL y Big Data
Meetup Oracle Database MAD: 2.1 Data Management Trends: SQL, NoSQL y Big Data
 
Meetup Oracle Database BCN: 2.1 Data Management Trends
Meetup Oracle Database BCN: 2.1 Data Management TrendsMeetup Oracle Database BCN: 2.1 Data Management Trends
Meetup Oracle Database BCN: 2.1 Data Management Trends
 
Agile Data Science 2.0
Agile Data Science 2.0Agile Data Science 2.0
Agile Data Science 2.0
 
Solution Use Case Demo: The Power of Relationships in Your Big Data
Solution Use Case Demo: The Power of Relationships in Your Big DataSolution Use Case Demo: The Power of Relationships in Your Big Data
Solution Use Case Demo: The Power of Relationships in Your Big Data
 
6 enriching your data warehouse with big data and hadoop
6 enriching your data warehouse with big data and hadoop6 enriching your data warehouse with big data and hadoop
6 enriching your data warehouse with big data and hadoop
 
2013 05 Oracle big_dataapplianceoverview
2013 05 Oracle big_dataapplianceoverview2013 05 Oracle big_dataapplianceoverview
2013 05 Oracle big_dataapplianceoverview
 
Database@Home : The Future is Data Driven
Database@Home : The Future is Data DrivenDatabase@Home : The Future is Data Driven
Database@Home : The Future is Data Driven
 
jagadeesh updated
jagadeesh updatedjagadeesh updated
jagadeesh updated
 
Data Integration for Big Data (OOW 2016, Co-Presented With Oracle)
Data Integration for Big Data (OOW 2016, Co-Presented With Oracle)Data Integration for Big Data (OOW 2016, Co-Presented With Oracle)
Data Integration for Big Data (OOW 2016, Co-Presented With Oracle)
 
Resume
ResumeResume
Resume
 

Más de OUGTH Oracle User Group in Thailand

Más de OUGTH Oracle User Group in Thailand (18)

Quarterly leader-call-dec-2014
Quarterly leader-call-dec-2014Quarterly leader-call-dec-2014
Quarterly leader-call-dec-2014
 
Oracle Database Monitoring with AAS
Oracle Database Monitoring with AASOracle Database Monitoring with AAS
Oracle Database Monitoring with AAS
 
How oracle 12c flexes its muscles against oracle 11g r2
How oracle 12c flexes its muscles against oracle 11g r2How oracle 12c flexes its muscles against oracle 11g r2
How oracle 12c flexes its muscles against oracle 11g r2
 
Presentation joelperez thailand2014
Presentation joelperez thailand2014Presentation joelperez thailand2014
Presentation joelperez thailand2014
 
How to-work-with-the-oracle-user-group-team
How to-work-with-the-oracle-user-group-teamHow to-work-with-the-oracle-user-group-team
How to-work-with-the-oracle-user-group-team
 
Apouc 2014-java-8-create-the-future
Apouc 2014-java-8-create-the-futureApouc 2014-java-8-create-the-future
Apouc 2014-java-8-create-the-future
 
Apouc 2014-enterprise-manager-12c
Apouc 2014-enterprise-manager-12cApouc 2014-enterprise-manager-12c
Apouc 2014-enterprise-manager-12c
 
Apouc 2014-learn-from-oracle-support
Apouc 2014-learn-from-oracle-supportApouc 2014-learn-from-oracle-support
Apouc 2014-learn-from-oracle-support
 
Apouc 2014-business-analytics-and-big-data
Apouc 2014-business-analytics-and-big-dataApouc 2014-business-analytics-and-big-data
Apouc 2014-business-analytics-and-big-data
 
Apouc 2014-oracle-applications-update
Apouc 2014-oracle-applications-updateApouc 2014-oracle-applications-update
Apouc 2014-oracle-applications-update
 
Apouc 2014-oracle mobile platform
Apouc 2014-oracle mobile platformApouc 2014-oracle mobile platform
Apouc 2014-oracle mobile platform
 
Apouc 2014-oracle-ace-program
Apouc 2014-oracle-ace-programApouc 2014-oracle-ace-program
Apouc 2014-oracle-ace-program
 
Apouc 2014-oracle-community-programs
Apouc 2014-oracle-community-programsApouc 2014-oracle-community-programs
Apouc 2014-oracle-community-programs
 
Apouc 2014-oracle-cloud-strategy
Apouc 2014-oracle-cloud-strategyApouc 2014-oracle-cloud-strategy
Apouc 2014-oracle-cloud-strategy
 
Apouc 2014-wrapup
Apouc 2014-wrapupApouc 2014-wrapup
Apouc 2014-wrapup
 
How to install oracle 12c release 1
How to install oracle 12c release 1How to install oracle 12c release 1
How to install oracle 12c release 1
 
Session 307 ravi pendekanti engineered systems
Session 307  ravi pendekanti engineered systemsSession 307  ravi pendekanti engineered systems
Session 307 ravi pendekanti engineered systems
 
Session 203 iouc summit database
Session 203 iouc summit databaseSession 203 iouc summit database
Session 203 iouc summit database
 

User 2013-oracle-big-data-analytics-1971985

  • 1. Big Data Analytics – Scaling R to Enterprise Data useR! 2013 – Albacete Spain #useR2013 Luis Campos Mark Hornick Big Data Solutions Lead, Oracle EMEA @luigicampos 1 Copyright © 2013, Oracle and/or its affiliates. All rights reserved. Director, Oracle Database Advanced Analytics @MarkHornick Copyright © 2013, Oracle and/or its affiliates. All rights reserved.
  • 2. 2 Copyright © 2013, Oracle and/or its affiliates. All rights reserved.
  • 3. The girl with all the questions! “The real innovation here is that we can and get the ask questions answer back before we have forgotten why we asked the question in the first place .” – Hilary Mason, Chief Scientist Bit.ly + member of NYC Mayor Bloomberg’s Technology and Innovation Advisory Council 3 Copyright © 2013, Oracle and/or its affiliates. All rights reserved.
  • 4. Nexus of Forces, Platform 3.0, Four Pillars What Analysts/groups are saying? 4 Copyright © 2013, Oracle and/or its affiliates. All rights reserved.
  • 5. New Information Challenges Data Explosion A Decade of Digital Universe Growth: Storage in Exabytes (Source: IDC’s Digital Universe Study, June 2011) Combinatory Explosion Dimension Explosion 5 Copyright © 2013, Oracle and/or its affiliates. All rights reserved.
  • 6. Big Data Solution = Data + Analytics + Tools Source: McKinsey study “Big data: What’s your plan?” (March 2013) http://www.mckinsey.com/insights/business_technology/big_data_whats_your_plan DATA Any Data, Any Source 6 ANALYTICS Out-of-the box Analytics, New Models Copyright © 2013, Oracle and/or its affiliates. All rights reserved. TOOLS Self Service Data Discovery On Premise, On Cloud, On Mobile
  • 7. Oracle Complete Business Analytics Solution BIG DATA APPLIANCE BIG DATA CONNECTORS NoSQL DB 7 Oracle Advanced DATA MINING Analytics ORACLE R Ent. SPATIAL,GRAPH Real Time Decisions (RTD) Copyright © 2013, Oracle and/or its affiliates. All rights reserved. OBIEE ENDECA Collective Intellect (CI) On Premise, Oracle Cloud, On Mobile
  • 8. Apply Advanced Analytics on All Data Visualise it with any BI Tool Hadoop Relational HDFS Data BI Tools 8 Copyright © 2013, Oracle and/or its affiliates. All rights reserved.
  • 9. Oracle R Advantages 1. Keep the R tools 2. Keep the data where it sits (Relational or HDFS) 3. Keep the SQL Based BI Tools 4. Scale to LARGE data sets R workspace console Function push-down – data transformation & Oracle statistics engine statistics Development 9 Copyright © 2013, Oracle and/or its affiliates. All rights reserved. Production OBIEE, Web Services Consumption
  • 10. Oracle’s Advanced Analytics Strategic Offerings. Deliver enterprise-level advanced analytics in the Database
      • Oracle in-Database Data Mining algorithms
          – Access through free GUI from SQL Developer or programmatically from SQL, PL/SQL, R or Java
          – Predictive model APIs for Oracle R Enterprise
          – Exadata architecture advantages for up to 5x improvement with Smart Scan
      • Oracle R Distribution
          – Free download, pre-installed on Oracle Big Data Appliance, bundled with Oracle Linux
          – Enhanced linear algebra performance: Intel’s Math Kernel Library, AMD’s Core Math Library (Windows and Linux), SUN Solaris and IBM AIX
          – Enterprise support for customers of Oracle Advanced Analytics, Big Data Appliance, and Oracle Linux
  • 11. Oracle’s Advanced Analytics Strategic Offerings. Deliver enterprise-level R in the Database or Hadoop
      • Oracle R Enterprise
          – Transparent access to database-resident data from R
          – Embedded R script execution through database-managed R engines
          – Statistics engine
          – Enhanced support for high-speed Exadata scoring
      • Oracle R Connector for Hadoop [ORCH] (part of Oracle Big Data Connectors)
          – R interface to Oracle Hadoop Cluster on BDA and non-Oracle Hadoop clusters
          – Access and manipulate data in HDFS, database, and file system
          – Write MapReduce functions using R and execute through natural R interface
          – Predictive models with execution in-Cluster against Hadoop-stored data
  • 12. Oracle R Components: component layout. Diagram labels: Analyst Laptop; Oracle Database; Exadata; Big Data Appliance; Oracle R Distribution; Oracle R Enterprise Server Components; Oracle R Enterprise Client Packages; Oracle R Connector for Hadoop; Oracle R Connector for Hadoop Client; “Optional with ORCH”
  • 13. Knowledge Exploitation Process: typical stages in a Big Data project. Stages shown around the Data Scientist: Business Understanding, Data Selection, Discovery, Data Preparation, Model Building, Evaluation, Deployment
  • 14. Data Loading with Oracle R Enterprise
      R> library(ORE)
      R> df <- data.frame(A=1:26, B=letters[1:26])
      R> dim(df)
      [1] 26  2
      R> class(df)
      [1] "data.frame"
      R> ore.create(df, table="DF_TABLE")
      R> ore.ls()
      [1] "DF_TABLE"
      R> class(DF_TABLE)
      [1] "ore.frame"
      attr(,"package")
      [1] "OREbase"
      R> dim(DF_TABLE)
      [1] 26  2
  • 15. Discovery with Oracle R in-DB and HDFS
      library(ORE)
      ore.ls()             # list tables in DB
      class(MY_TABLE)      # ore.frame
      dim(MY_TABLE)        # overloaded R functions
      head(MY_TABLE)
      sample(MY_TABLE)
      summary(MY_TABLE)

      library(ORCH)
      hdfs.ls()
      hdfs.dim("myHDFSdata")
      hdfs.head("myHDFSdata")
      hdfs.sample("myHDFSdata")
      hdfs.toHive("myHDFSdata", tablename="my_hive_data")
      summary(my_hive_data)
  • 16. Data Prep with Oracle R in-DB and HDFS, using library(ORE) / library(ORCH)
      # join
      merge(MY_TABLE1, MY_TABLE2, by.x="x1", by.y="x2")

      # project columns (A is kept so that the row filter below can reference it)
      df <- MY_TABLE[,c("X","Y","Z","A")]

      # filter rows, keeping the first three columns
      df <- df[df$Z<=4.3 | df$A=="B",1:3]

      # binning
      IRIS_TAB <- ore.push(iris[1:4])
      IRIS_TAB$PetalBins = ifelse(IRIS_TAB$Petal.Length < 2.0, "SMALL PETALS",
                           ifelse(IRIS_TAB$Petal.Length < 4.0, "MEDIUM PETALS",
                                  "LARGE PETALS"))
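Since slide 15 notes that ORE overloads standard R functions for `ore.frame` objects, the preparation idioms on slide 16 can be tried first on ordinary in-memory data frames. The following is only a local, base-R analogue: `t1`, `t2` and their columns are made-up stand-ins for `MY_TABLE1` and `MY_TABLE2`, and nothing in it connects to a database or to HDFS.

```r
# Local base-R analogue of the slide's join / project / filter / binning steps.
# t1 and t2 are hypothetical stand-ins for MY_TABLE1 and MY_TABLE2.
t1 <- data.frame(x1 = 1:5,
                 X = c(0.5, 1.2, 3.3, 4.8, 6.1),
                 Z = c(1.0, 4.3, 5.0, 2.2, 9.9),
                 stringsAsFactors = FALSE)
t2 <- data.frame(x2 = c(2, 3, 4, 6),
                 A = c("B", "C", "B", "B"),
                 stringsAsFactors = FALSE)

# join: keep rows where t1$x1 equals t2$x2 (inner join)
joined <- merge(t1, t2, by.x = "x1", by.y = "x2")

# project columns
proj <- joined[, c("X", "Z", "A")]

# filter rows, keeping only the first two columns
filt <- proj[proj$Z <= 4.3 | proj$A == "B", 1:2]

# binning with nested ifelse, as on the slide
iris_tab <- iris[1:4]
iris_tab$PetalBins <- ifelse(iris_tab$Petal.Length < 2.0, "SMALL PETALS",
                      ifelse(iris_tab$Petal.Length < 4.0, "MEDIUM PETALS",
                             "LARGE PETALS"))
print(table(iris_tab$PetalBins))
```

With ORE loaded and connected, the slide's point is that the same `merge`, `[` and `ifelse` calls are applied to `ore.frame` objects instead, so the work runs in the database rather than in client memory.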
  • 17. “Densifying” data: custom MapReduce jobs. Count occurrences of hash tags in tweets per customer for selected tags
      mapHashTags <- function (k,v) {
        x <- strsplit(v$text, " ")
        x <- x[x!='']
        importantTags <- tolower(importantTags)
        for(twt in 1:length(x)) {
          for(tag in x[[twt]]) {
            if(substr(tag,1,1) == "#") {
              tagL <- tolower(tag)
              if(tagL %in% importantTags) {
                orch.keyval(v[twt,"screenName"],tagL)
              }
            }
          }
        }
      }

      reduceHashTags <- function(k,vals) {
        # k = screenName, vals = vector(tags)
        importantTags <- tolower(importantTags)
        vals <- factor(vals$val,levels=importantTags)
        x <- as.data.frame(t(as.matrix(table(vals))))
        orch.keyval(k,x)   # k = screenName, x = df(importantTags as cols) with counts
      }
  • 18. ORCH: Create your own MapReduce jobs. Count occurrences of hash tags in tweets per customer for selected tags
      importantTags <- c("#bigdata","#database","#oracle","#sql")
      tag.summary <- hadoop.exec(tweets.id,
          mapper=mapHashTags,
          reducer=reduceHashTags,
          export=orch.export(importantTags=importantTags),
          config=new("mapred.config",
              job.name = "TwitterScreenNameHashTags",
              reduce.tasks = 5,
              map.output = data.frame(key='a', val='a'),
              reduce.output = data.frame(key='a', bigdata=0, database=0, oracle=0, sql=0)))

      > hdfs.get(tag.summary)
                   key bigdata database oracle sql
      1 twitter.user.1       4        7     37  91
      2 twitter.user.2      15       19      1  32
      3 twitter.user.3     104       57      8   0
      4 twitter.user.4       0       64    549   0
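The map and reduce logic of slides 17 and 18 can be checked on a laptop before it is submitted to a cluster. The sketch below simulates the two phases in plain R with no Hadoop or ORCH calls at all; the `tweets` data frame and the helper names `map_tags` and `reduce_tags` are invented for illustration and are not part of ORCH.

```r
# Plain-R simulation of the map and reduce phases; no Hadoop or ORCH involved.
# The tweets data frame is invented sample data.
importantTags <- c("#bigdata", "#database", "#oracle", "#sql")

tweets <- data.frame(
  screenName = c("user.a", "user.a", "user.b", "user.c"),
  text = c("Loving #BigData and #Oracle", "#oracle #sql tips",
           "#SQL beats #cats", "no tags here"),
  stringsAsFactors = FALSE
)

# Map: emit one (screenName, tag) pair per important hash tag in each tweet.
# Every entry of `tags` starts with "#", so no separate "#" check is needed.
map_tags <- function(tweets, tags) {
  pairs <- lapply(seq_len(nrow(tweets)), function(i) {
    words <- tolower(strsplit(tweets$text[i], " ", fixed = TRUE)[[1]])
    hits <- words[words %in% tags]
    data.frame(key = rep(tweets$screenName[i], length(hits)), val = hits,
               stringsAsFactors = FALSE)
  })
  do.call(rbind, pairs)
}

# Reduce: for each key, count occurrences of every important tag,
# producing one "dense" row per screen name.
reduce_tags <- function(pairs, tags) {
  counts <- table(pairs$key, factor(pairs$val, levels = tags))
  out <- as.data.frame.matrix(counts)
  data.frame(key = rownames(out), out, row.names = NULL, check.names = FALSE)
}

pairs <- map_tags(tweets, importantTags)
tag_summary <- reduce_tags(pairs, importantTags)
print(tag_summary)
```

Users with no matching tags (here `user.c`) emit no key-value pairs and therefore do not appear in the summary, which matches the behaviour of a mapper that only calls `orch.keyval` for important tags.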
  • 19. Modelling with Oracle R in-DB and HDFS
      # Clustering with ORE
      X <- ore.push(data.frame(x))
      km.mod1 <- ore.odmKMeans(~., X, num.centers=2, num.bins=5)
      summary(km.mod1)
      rules(km.mod1)
      clusterhists(km.mod1)

      # Regression with ORCH
      mod.lm <- orch.lm(myFormula, myData, nReducers = 2)
      summary(mod.lm)
      pred <- predict.orch.lm(mod.lm, newdata = myData)
      res.pred <- hdfs.get(pred)
      head(res.pred)
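For readers without an ORE or ORCH installation, the same two modelling steps can be rehearsed locally. The sketch below uses base R's `kmeans` and `lm` as stand-ins for `ore.odmKMeans` and `orch.lm`; it is not equivalent to the in-database or in-cluster algorithms, and the data are synthetic.

```r
# Local base-R stand-ins for the slide's two modelling steps (no ORE or ORCH).
# All data are synthetic and exist only to make the sketch runnable.
set.seed(42)

# Clustering: two well-separated groups, k-means with 2 centers
x <- rbind(matrix(rnorm(100, mean = 0, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 5, sd = 0.3), ncol = 2))
km <- kmeans(x, centers = 2, nstart = 5)
print(km$size)

# Regression: fit, summarise, and score
myData <- data.frame(x1 = runif(200), x2 = runif(200))
myData$y <- 1 + 2 * myData$x1 - 3 * myData$x2 + rnorm(200, sd = 0.1)
mod <- lm(y ~ x1 + x2, data = myData)
print(summary(mod)$coefficients)
pred <- predict(mod, newdata = myData)
print(head(pred))
```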
  • 20. In-database performance advantage: R lm vs. ORE ore.lm. Data: 500k to 1.5m records, 3 predictors. Performance: 2x-3x improvement for build, 4x improvement for scoring
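The client-side half of this comparison is easy to reproduce. The sketch below times base R `lm` build and scoring on synthetic data of the smallest size quoted on the slide (500k records, 3 predictors); the coefficients and the seed are arbitrary choices, and the `ore.lm` side is not shown because it needs an ORE-enabled database.

```r
# Client-side baseline only: times base R lm() build and scoring on synthetic data.
# n and the coefficients are arbitrary; no ORE call is made here.
set.seed(1)
n <- 500000
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
dat$y <- 2 + 0.5 * dat$x1 - 1.5 * dat$x2 + 0.25 * dat$x3 + rnorm(n)

build_time <- system.time(fit <- lm(y ~ x1 + x2 + x3, data = dat))
score_time <- system.time(pred <- predict(fit, newdata = dat))

print(build_time[["elapsed"]])   # seconds to build the model
print(score_time[["elapsed"]])   # seconds to score all rows
```

Timings depend on hardware and on the BLAS library R is linked against, so any comparison with the slide's figures is only indicative.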
  • 21. In-database performance advantage – lm. More tests at http://blogs.oracle.com/R/entry/oracle_r_enterprise_1_32
  • 22. Deploying with Oracle R Enterprise
      • Load R scripts into the ORE script repository
      • Invoke R scripts by name from SQL
      • Store R objects directly in Oracle Database (no separate files)
      • Optional return values:
          – Data frame consumable by any SQL-ready application
          – XML containing structured data, complex R objects, PNG images
          – PNG table with BLOB column containing images for immediate consumption
      • Schedule for automatic execution
  • 23. Oracle Advanced Analytics: Embedded R Execution, SQL interface
      rqEval – generate XML string for graphic output (the slide also shows Oracle BI Publisher)
      Oracle PL/SQL, with the R Language function passed as a string:
        begin
          sys.rqScriptCreate('Example6',
            'function(){
               res <- 1:10
               plot( 1:100, rnorm(100), pch = 21, bg = "red", cex = 2 )
               res
             }');
        end;
        /
      Oracle SQL:
        select value from table(rqEval(NULL,'XML','Example6'));
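Before a function body is registered with `sys.rqScriptCreate`, it can be useful to run it as an ordinary R function on the client. The sketch below does that for the `Example6` body; the name `example6` and the use of a null PDF device are choices made here for illustration, not part of the ORE workflow.

```r
# Run the slide's script body as an ordinary R function before storing it.
# pdf(NULL) opens a graphics device that writes no file, so plot() also works
# in a session without a display.
example6 <- function() {
  res <- 1:10
  plot(1:100, rnorm(100), pch = 21, bg = "red", cex = 2)
  res
}

grDevices::pdf(NULL)
out <- example6()
invisible(grDevices::dev.off())
print(out)
```

Checking the return value locally catches syntax and logic errors early; quoting problems inside the PL/SQL string literal still have to be checked in the database.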
  • 24. Summary: Oracle R Enterprise (ORE) and Oracle R Connector for Hadoop (ORCH)
      • A comprehensive, database-centric environment for end-to-end analytical processes in R with immediate deployment to production environments
      • Wide range of in-database advanced analytics algorithms exposed through R
      • Eliminate R client memory limits
      • A collection of R packages enabling Big Data analytics from an R environment
      • Allows R users to leverage a Hadoop Cluster with HDFS and MapReduce from R
      • Prepackaged advanced analytics algorithms
      • Transparent manipulation of Hive data
      • Enable R users to conduct Big Data projects from R
      • Eliminate client R engine memory barrier
      • Scale to large data sets
      • Deploy R-based solutions without translation to other languages or environments
  • 25. Resources
      • Blog: http://www.oracle.com/goto/R https://blogs.oracle.com/R/
      • Forum: https://forums.oracle.com/forums/forum.jspa?forumID=1397
      • Oracle R Distribution: http://www.oracle.com/technetwork/indexes/downloads/r-distribution-1532464.html
      • ROracle: http://cran.r-project.org/web/packages/ROracle
      • Oracle R Enterprise: http://www.oracle.com/technetwork/database/options/advanced-analytics/r-enterprise
      • Oracle R Connector for Hadoop: http://www.oracle.com/us/products/database/big-data-connectors/overview