Revolution Confidential
Revolution Analytics & Cloudera Confidential
R + Hadoop
Ask Bigger (and new) Questions
and Get Better, Faster Answers
Michele Chambers
Chief Strategy Officer & VP Product Mgmt
Jai Ranganathan
Director Product Mgmt & Strategy
Period of Disruption
1st Generation Predictive Analytics
Today’s Challenge:
Accelerating Business Cadence
Changing Business Environment
• Fact Based Decisions Require More Data
• Need to Understand Tradeoffs and Best Course of Action
• Predictive Models Need to Continually Deliver Lift
• Reduced Shelf Life for Predictive Models
Faster Time to Value
• Reduce Analytic Cycle Time
• Build & Deploy Models Faster
• Eliminate Time Consuming Data Movements
Rapid Customer Facing Decisions
• Score More Frequently
• Need to Make Best Decision in Real Time
Big Data
2nd Generation Modern Analytics
Machine Learning
Quick to Fail
Lift
Typical Technology Challenges
Our Customers Face
Big Data
• New Data Sources
• Data Variety & Velocity
• Fine Grain Control
• Data Movement, Memory Limits
Complex Computation
• Experimentation
• Many Small Models
• Ensemble Models
• Simulation
Enterprise Readiness
• Heterogeneous Landscape
• Write Once, Deploy Anywhere
• Skill Shortage
• Production Support
Production Efficiency
• Shorter Model Shelf Life
• Volume of Models
• Long End-to-End Cycle Time
• Pace of Decision Accelerated
Big Data Big Analytics is different
y=ax+b
y=ax+b
y=ax+b
y=ax+b
y=ax+b
y=ax+b
y=ax+b
y=ax+b
y=ax+b
New model
Existing model
[Chart: Accuracy (60% to 100%) versus False Positives (0% to 30%), comparing "Existing model" with "Add unstructured data"]
Big Data Big Analytics Use Cases
One Big Model
• Build predictive models with (very) large datasets
• More rows/observations and/or more columns/features
• Tend to use dimension reduction, machine learning and/or ensemble techniques
Big Data Scoring
• Score and predict with (very) large datasets with previously built model
• Score in batch or individual transactions
• Previously built model may be exported from model build to model deployment env.
Many Small Models
• Model factories build predictive models in quantity
• Automated building of individualized models and/or parallel individualized model execution
Scoring Many Models
• Score and predict with many individualized models
• Production model factories require model management
Computationally Intensive Analytics
• Analytic models that are mathematically intense
• May not use large data sets but generate a lot of interim calculations
• May include vectorization, simulation, optimization
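The "Many Small Models" use case can be illustrated with a minimal Python sketch: one y=ax+b line is fitted per segment, and each record is later scored with its own segment's model. The segment names, the sample rows and the helper names (`fit_line`, `build_models`, `score`) are hypothetical and chosen only for illustration; this is not Revolution R Enterprise code.

```python
from collections import defaultdict


def fit_line(points):
    """Ordinary least squares fit of y = a*x + b to a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx  # assumes each segment has at least two distinct x values
    b = mean_y - a * mean_x
    return a, b


def build_models(rows):
    """Model factory: group (segment, x, y) rows and fit one model per segment."""
    by_segment = defaultdict(list)
    for segment, x, y in rows:
        by_segment[segment].append((x, y))
    return {segment: fit_line(points) for segment, points in by_segment.items()}


def score(models, segment, x):
    """Score one observation with the model that belongs to its segment."""
    a, b = models[segment]
    return a * x + b


# Hypothetical data: two segments with different underlying relationships.
rows = [
    ("store_a", 1.0, 3.0), ("store_a", 2.0, 5.0), ("store_a", 3.0, 7.0),
    ("store_b", 1.0, 0.5), ("store_b", 2.0, 0.0), ("store_b", 3.0, -0.5),
]
models = build_models(rows)
print(models)
print(score(models, "store_b", 4.0))
```

Because every segment is fitted independently, the per-segment fits could be distributed across workers, which is what makes this pattern a natural match for a parallel platform.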
Big Data Big Analytics
Specialized Use Cases
Time Series Analytics
• Build forecasts with time-sequenced data
• For Big Data, tend to be many small models, especially for machine data
• Typical Big Data volumes make model management a requirement
Text and Document Analytics
• Use of unstructured, free text
• For Big Data, typically used to enhance structured predictive analytics
• Minimally requires text processing tools and may also require natural language processing
Mining Data Streams
• Analyzing continuous, high speed data flows for patterns and acting upon the patterns in real-time
• Requires specialized sampling and filtering techniques
• Uses distinct discovery analytics methods such as frequent itemsets or clustering
Zero Latency
• No separation of model building and model scoring
• As real-time data becomes more widely available, this emerging category reduces time-to-insight with little or no separation between model building and scoring
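As one example of the "specialized sampling" that stream mining calls for, the sketch below uses reservoir sampling (Algorithm R), a standard technique for keeping a fixed-size uniform sample of a stream whose length is not known in advance. The deck does not say which sampling method is used, so this choice is an assumption, and the function name `reservoir_sample` is hypothetical.

```python
import random


def reservoir_sample(stream, k, rng):
    """Keep a uniform random sample of k items from a stream of unknown length."""
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


# Hypothetical stream of 10,000 event ids, sampled down to 5.
sample = reservoir_sample(iter(range(10_000)), 5, random.Random(42))
print(sample)
```

Only k items are held in memory at any time, so the sample can be maintained while the stream is still arriving.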
Analytic Reference Architecture
Decision: Analytic Applications
Integration: Middleware
Data: Hadoop, Data Warehouse, Other Data Sources
Analytics: Analytics Development Tools & Platforms
Architectural Approaches to Analytics
Beside Architecture
Decision: Analytic Applications
Integration: Middleware
Analytics: Analytics Development Tools & Platforms, Local Data Mart
Data: Data Sources
Inside Architecture
Decision: Analytic Applications
Integration: Middleware
Data+Analytics: Analytics Development Tools & Platforms, Data Sources
Pros & Cons of Architectural Approaches
Beside Architecture
• Analytic workflow tasks performed in a separate analytics environment outside of the source database
• Pros: Segregates analytic workload
• Cons: Doesn’t leverage the powerful production platform for transformations; introduces scoring latencies
Inside Architecture
• Analytic workflow tasks performed inside the source database with embedded analytics
• Pros: Eliminates data movement, reduces model latency, allows exploration of all data
• Cons: IT governance on production, potential new skills
Hybrid Architecture
• Some analytic workflow tasks performed inside the source database & others performed in a separate analytics environment
• Pros: Leverages strengths of each architecture
• Cons: Multiple environments to maintain
Building & Deploying Analytic Models
[Diagram: model build and deploy workflow under each architecture; the numbered steps are not recoverable from the transcript]
Beside Architecture: Analytics (Analytics Development Tools & Platforms, Local Data Mart) beside Data (Data Sources)
Inside Architecture: Data+Analytics (Analytics Development Tools & Platforms, Data Sources)
Hybrid Architecture: Analytics (Analytics Development Tools & Platforms, Local Data Mart) plus Data+Analytics (Analytics Development Tools & Platforms, Data Sources)
Legend: Model Build, Model Deploy, Model Recode / PMML, Update Data, Data Prep / Marshaling
Our platform vision
Lower cost per TB
Avoid data copying
Minimize big data movement
Simplify the IT and user
experience
Organizations bring their applications to
Hadoop data
©2013 Cloudera, Inc. All Rights Reserved. Confidential. Reproduction or
redistribution without written permission is prohibited.
Revolution Confidential
Traditional workloads in Hadoop
WORKLOADS IN HADOOP
Search
Analytics
Self-service BI
Data Processing (ELT)
In Cloudera
• 2-10X the performance
• 1/10th the cost
In Cloudera
• Integrated R support for deep analytics
• Takes advantage of entire cluster for high performance
• More granular datasets with more model features
In Cloudera
• Data exploration on the full fidelity data
• Faster lifecycle from source data to mini-mart
• 1/10th the cost
OLAP reporting
Enterprise-Grade Solutions for Big Data
Key Characteristics
Cloudera Manager & R integration
Seamless cluster administration for Revolution R Enterprise
1. Deploy: Deploy Revolution R Enterprise quickly and easily onto your CDH cluster
2. Configure & Optimize: Ensure optimal settings are configured for performance of Revolution R Enterprise
3. Monitor, Diagnose & Report: Identify resource controls, monitor performance, debug and diagnose issues through a single consolidated interface
What is the R Language?
• A Platform…
  - A Procedural Language for Stats, Math and Data Science
  - A Complete Data Visualization Framework
  - Provided as Open Source
• A Community…
  - 2M+ Users with the Skill to Tackle Big Data Statistical and Numerical Analysis and Machine Learning Projects
  - Active User Groups Across the World
• An Ecosystem
  - CRAN: 4500+ Freely Available Algorithms, Test Data and Evaluations
Revolution R Enterprise
Revolution R Enterprise is the only enterprise big data big analytics platform based on the open source R statistical computing language
Portable Across Enterprise Platforms
High Performance, Scalable Analytics
Easier to Build & Deploy
R is open source and drives analytic innovation but… has some limitations for Enterprises

                           Open Source R                        Revolution R Enterprise
Big Data                   In memory bound                      Disk based scalability
Speed of Analysis          Single threaded                      Parallel threading
Enterprise Readiness       Community support                    Commercial support
Analytic Breadth & Depth   4500+ innovative analytic packages   Leverage open source packages plus Big Data ready packages
Commercial Viability       Risk of deployment of open source    Commercial License
Introducing Revolution R Enterprise
The Big Data Big Analytics Platform
Layers: Language Interpreter and Standard R Algorithm Suites; Development & Deployment Tooling; Big Data Distributed Execution Platform
Components: R+CRAN, RevoR, DistributedR, ConnectR, ScaleR, DevelopR, DeployR
Big Data Speed @ Scale
with Revolution R Enterprise
Fast Math Libraries
Parallelized Algorithms
In-Database Execution
Multi-Threaded Execution
Multi-Core Processing
In-Hadoop Execution
Memory Management
Parallelized User Code
First, we enhance and accelerate the Open Source R interpreter.
Open Source R performance:
Multi-threaded Math
Computation (4-core laptop)         Open Source R   Revolution R Enterprise   Speedup
Linear Algebra (1)
  Matrix Multiply                   176 sec         9.3 sec                   18x
  Cholesky Factorization            25.5 sec        1.3 sec                   19x
  Linear Discriminant Analysis      189 sec         74 sec                    3x
General R Benchmarks (2)
  R Benchmarks (Matrix Functions)   22 sec          3.5 sec                   5x
  R Benchmarks (Program Control)    5.6 sec         5.4 sec                   Not appreciable
1. http://www.revolutionanalytics.com/why-revolution-r/benchmarks.php
2. http://r.research.att.com/benchmarks/
Customers report 3-50x performance improvements compared to Open Source R, without changing any code
Big Data Speed @ Scale
with Revolution R Enterprise
Second, we built a platform for hosting R with Big Data on a variety of massively parallel platforms.
Revolution R Enterprise DistributedR
Innovative Memory Management, Multi-Threaded Execution, Multi-Core Processing
• A Revolution R Enterprise ScaleR analytic is provided a data source as input.
• The analytic loops over the data, reading a block at a time.
• Blocks of data are read by a separate worker thread (Thread 0).
• Worker threads (Threads 1..n) process the data block from the previous iteration of the data loop and update intermediate results objects in memory.
• When all of the data is processed, a master results object is created from the intermediate results objects.
COMBINE INTERMEDIATE RESULTS
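The block-at-a-time loop described above can be sketched in plain Python. This is an illustration of the pattern only, not the DistributedR implementation: one reader thread hands out blocks, worker threads fold each block into their own intermediate result, and the intermediate results are combined at the end. The function names, the block size and the choice of mean and variance as the statistic are all assumptions made for the example.

```python
import queue
import threading


def read_blocks(values, block_size, out_queue, n_workers):
    """Reader thread: emit one block at a time, then one stop marker per worker."""
    for start in range(0, len(values), block_size):
        out_queue.put(values[start:start + block_size])
    for _ in range(n_workers):
        out_queue.put(None)


def process_blocks(in_queue, partial):
    """Worker thread: fold each block into this worker's intermediate result."""
    while True:
        block = in_queue.get()
        if block is None:
            return
        partial["n"] += len(block)
        partial["sum"] += sum(block)
        partial["sum_sq"] += sum(v * v for v in block)


def chunked_mean_variance(values, block_size=4, n_workers=2):
    # Bounded queue: the reader stays only a few blocks ahead of the workers.
    blocks = queue.Queue(maxsize=n_workers)
    partials = [{"n": 0, "sum": 0.0, "sum_sq": 0.0} for _ in range(n_workers)]
    reader = threading.Thread(
        target=read_blocks, args=(values, block_size, blocks, n_workers))
    workers = [threading.Thread(target=process_blocks, args=(blocks, p))
               for p in partials]
    for thread in [reader] + workers:
        thread.start()
    for thread in [reader] + workers:
        thread.join()
    # Combine the intermediate results into one master result.
    n = sum(p["n"] for p in partials)
    total = sum(p["sum"] for p in partials)
    total_sq = sum(p["sum_sq"] for p in partials)
    mean = total / n
    variance = total_sq / n - mean * mean  # population variance
    return n, mean, variance


result = chunked_mean_variance([float(v) for v in range(1, 11)])
print(result)
```

The approach works for any statistic whose intermediate results can be merged, such as counts, sums and cross products; only one block per worker needs to be in memory at a time.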
Revolution R Enterprise ScaleR
Performance and Capacity
SAS HPA Benchmarking comparison*
Logistic Regression

                 SAS HPA          Revolution R
Rows of data     1 billion        1 billion
Parameters       “just a few”     7
Time             80 seconds       44 seconds
Data location    In memory        On disk
Nodes            32               5
Cores            384              20
RAM              1,536 GB         80 GB
Revolution R is faster on the same amount of data, despite using approximately a 20th as many cores, a
20th as much RAM, a 6th as many nodes, and not pre-loading data into RAM.
*As published by SAS in HPC Wire, April 21, 2011
Slide callouts (Revolution R relative to SAS HPA): Parameters: Double; Time: 45%; Nodes: 1/6th; Cores: 5%; RAM: 5%
Revolution R Enterprise Delivers Performance at 2% of the Cost
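The resource ratios quoted in the comparison can be checked with a few lines of arithmetic. The sketch below uses only the node, core, RAM and time figures from the comparison above; the dictionary names are arbitrary, and the cost claim is not derived here because the deck gives no pricing inputs.

```python
# Figures taken from the benchmark comparison above.
sas_hpa = {"nodes": 32, "cores": 384, "ram_gb": 1536, "seconds": 80}
revolution_r = {"nodes": 5, "cores": 20, "ram_gb": 80, "seconds": 44}

# How many times more of each resource the SAS HPA run used.
ratios = {key: sas_hpa[key] / revolution_r[key] for key in sas_hpa}

for key, value in ratios.items():
    print(f"{key}: {value:.1f}x")
```

The core and RAM ratios both come out at 19.2, which matches "approximately a 20th", and the node ratio of 6.4 matches "a 6th".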
Revolution R Enterprise ScaleR:
High Performance Big Data Analytics
Data Prep, Distillation & Descriptive Analytics
R Data Step
• Data import: Delimited, Fixed, SAS, SPSS, ODBC
• Variable creation & transformation
• Recode variables
• Factor variables
• Missing value handling
• Sort
• Merge
• Split
• Aggregate by category (means, sums)
Descriptive Statistics
• Min / Max
• Mean
• Median (approx.)
• Quantiles (approx.)
• Standard Deviation
• Variance
• Correlation
• Covariance
• Sum of Squares (cross product matrix for set variables)
• Pairwise Cross tabs
• Risk Ratio & Odds Ratio
• Cross-Tabulation of Data (standard tables & long form)
• Marginal Summaries of Cross Tabulations
Statistical Tests
• Chi Square Test
• Kendall Rank Correlation
• Fisher’s Exact Test
• Student’s t-Test
Sampling
• Subsample (observations & variables)
• Random Sampling
Revolution R Enterprise ScaleR:
High Performance Big Data Analytics
Statistical Modeling
Predictive Models
• Sum of Squares (cross product matrix for set variables)
• Multiple Linear Regression
• Generalized Linear Models (GLM): all exponential family distributions (binomial, Gaussian, inverse Gaussian, Poisson, Tweedie); standard link functions including cauchit, identity, log, logit, probit; user defined distributions & link functions
• Covariance & Correlation Matrices
• Logistic Regression
• Classification & Regression Trees
• Predictions/scoring for models
• Residuals for all models
Data Visualization
• Histogram
• Line Plot
• Scatter Plot
• Lorenz Curve
• ROC Curves (actual data and predicted values)
Variable Selection
• Stepwise Regression (for linear reg)
Simulation
• Monte Carlo
Machine Learning
Cluster Analysis
• K-Means
Classification
• Decision Trees
Unparalleled Big Data Big Analytics
Scale, Performance & Innovation
1 + 1 = 1000’s
[Diagram: Value versus Performance. Open Source R Analytic Packages (R Language) + Big Data Distributed & Parallel Processing & Analytic Package (Performance Enhanced R) = Revolution R Enterprise]
Leveraging CRAN with DistributedR & ScaleR
• Big Data Distillation
  - Allows an R programmer to use RRE ScaleR to reduce dimensionality first and then feed the reduced data set into open source packages, so that the computationally intensive portion is sped up with RRE ScaleR techniques and any of the many open source packages can still be used
• Big Data Threading
  - Allows an R programmer to use RRE ScaleR to execute algorithms designed for SMP environments in parallel using DistributedR (e.g., Monte Carlo simulation)
• Supercharge Open Source package with RRE
  - Allows an R programmer to re-engineer a CRAN routine by replacing an Open Source function inside an R based algorithm with the equivalent ScaleR function(s)
• High Performance Custom Algorithm
  - Allows an R programmer to use the RRE high throughput extreme data format (XDF) to apply any combination of Open Source functions and logic while chunking through an XDF file to overcome the Open Source R memory limitations
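The Monte Carlo case mentioned under Big Data Threading splits naturally into independent tasks whose results are combined at the end. The Python sketch below shows that structure only, under stated assumptions: it estimates pi by random sampling, uses a thread pool as a stand-in for DistributedR (Python threads illustrate the task layout but give little CPU speedup), and the function names `count_hits` and `estimate_pi` are hypothetical.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def count_hits(seed, n_draws):
    """One independent simulation task with its own seeded random generator."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_draws):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:  # point falls inside the quarter circle
            hits += 1
    return hits


def estimate_pi(n_tasks=8, draws_per_task=20_000):
    """Run the tasks on a pool of workers and combine their hit counts."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        hits = list(pool.map(count_hits, range(n_tasks),
                             [draws_per_task] * n_tasks))
    return 4.0 * sum(hits) / (n_tasks * draws_per_task)


pi_estimate = estimate_pi()
print(pi_estimate)
```

Giving each task its own seed keeps the tasks independent and makes the combined estimate reproducible regardless of the order in which workers finish.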
WODA:
Write Once – Deploy Anywhere
Big Analytics on Big Data in Hadoop
• 100% R on Hadoop
  - Full Skill Transfer: No Java needed
  - Use 4500+ CRAN Packages
  - Blend / Combine R & Other Tools / Methods
• 100% Portability
  - Build Once, Deploy Many
  - Track Evolution of Hadoop
  - Protect Against Platform Uncertainty
  - Avoid Platform Lock-ins
• Hadoop Performance & Scale
  - Leverage Hadoop Parallelism Easily
  - Analyze Data Without Moving It
[Diagram: Data, Analytics and Applications layers on Hadoop (HDFS, HBase, Hive), combining scalable compute and parallel storage. Callouts: 100% R. Portability. Big Data Scale.]
Revolution R Enterprise + Cloudera Propels
Enterprises into the Future
Decision: Analytic Applications
Integration: Middleware
Data: Cloudera Data Management Platform
Analytics: Revolution R Enterprise Big Data Big Analytics Platform
Revolution R Enterprise Powers
Write Once, Deploy Anywhere
[Diagram: model build and deploy workflow with Revolution R Enterprise and Cloudera under each architecture; the numbered steps are not recoverable from the transcript]
Beside Architecture: Analytics (Revolution R Enterprise, Local Data Mart) beside Data (Cloudera)
Inside Architecture: Data+Analytics (Revolution R Enterprise, Cloudera)
Hybrid Architecture: Analytics (Revolution R Enterprise, Local Data Mart) plus Data+Analytics (Revolution R Enterprise, Cloudera)
Legend: Model Build, Model Deploy, Model Recode / PMML, Update Data, Data Prep / Marshaling, Direct Connector
Bottom Line: Save Time, Save Money, Get Insights Faster
• Direct connectors access data without data movement
• Push-down analytics analyze data without moving it
• Use the same R script on any platform without recoding
• Use the right architecture for the job!
Revolution R Enterprise Inside Cloudera
Consumption
• Business Analysts (Alteryx, Tableau, QlikView, Cognos, Microstrategy, Datameer, etc.)
• Power Analysts (R Studio, DevelopR, etc.)
• Line of Business users (Analytic Apps, Rules Engines, etc.)
Cloudera
• Revolution R Enterprise: R+CRAN, RevoR, DistributedR, ConnectR, ScaleR, DeployR
• Big Data Big Analytics: Data Transformation, Model Building & Scoring
Data Sources
• Machine Data
• New Data Sources
• Data Suppliers
• Traditional Sources
• IBM Mainframe
QuickStart Programs Deliver Value Quickly
• Offered by both Cloudera and Revolution Analytics
• Combine Software, Services and Training
• Cloudera can help you get started with Hadoop in a few ways
• Revolution Analytics helps you realize value from R + Hadoop
Summary
Revolution R Enterprise and Cloudera Hadoop bring together best-of-breed technologies to deliver:
• Highly scalable, high performance machine learning on data residing in Hadoop
• The familiar R programming environment, which makes analytics at scale accessible and easy for R users
• The ability to integrate disparate data sources in one repository, so that full lifecycle analytics, from ad-hoc analysis to production analytics, are available in one managed environment
• Deep integration of Revolution R Enterprise with Cloudera, which will provide a seamless operational experience for managing both products
Thank You
Visit us @ Strata NYC Oct 28
Questions
Revolution Analytics: info@revolutionanalytics.com
Cloudera: info@cloudera.com

More Related Content

What's hot

Citizens Bank: Data Lake Implementation – Selecting BigInsights ViON Spark/Ha...
Citizens Bank: Data Lake Implementation – Selecting BigInsights ViON Spark/Ha...Citizens Bank: Data Lake Implementation – Selecting BigInsights ViON Spark/Ha...
Citizens Bank: Data Lake Implementation – Selecting BigInsights ViON Spark/Ha...Seeling Cheung
 
Empower Splunk and other SIEMs with the Databricks Lakehouse for Cybersecurity
Empower Splunk and other SIEMs with the Databricks Lakehouse for CybersecurityEmpower Splunk and other SIEMs with the Databricks Lakehouse for Cybersecurity
Empower Splunk and other SIEMs with the Databricks Lakehouse for CybersecurityDatabricks
 
MongoDB San Francisco 2013:Geo Searches for Healthcare Pricing Data presented...
MongoDB San Francisco 2013:Geo Searches for Healthcare Pricing Data presented...MongoDB San Francisco 2013:Geo Searches for Healthcare Pricing Data presented...
MongoDB San Francisco 2013:Geo Searches for Healthcare Pricing Data presented...MongoDB
 
Hortonworks Hybrid Cloud - Putting you back in control of your data
Hortonworks Hybrid Cloud - Putting you back in control of your dataHortonworks Hybrid Cloud - Putting you back in control of your data
Hortonworks Hybrid Cloud - Putting you back in control of your dataScott Clinton
 
Optimized Data Management with Cloudera 5.7: Understanding data value with Cl...
Optimized Data Management with Cloudera 5.7: Understanding data value with Cl...Optimized Data Management with Cloudera 5.7: Understanding data value with Cl...
Optimized Data Management with Cloudera 5.7: Understanding data value with Cl...Cloudera, Inc.
 
Necessity of Data Lakes in the Financial Services Sector
Necessity of Data Lakes in the Financial Services SectorNecessity of Data Lakes in the Financial Services Sector
Necessity of Data Lakes in the Financial Services SectorDataWorks Summit
 
How can a quality engineering and assurance consultancy keep you ahead of others
How can a quality engineering and assurance consultancy keep you ahead of othersHow can a quality engineering and assurance consultancy keep you ahead of others
How can a quality engineering and assurance consultancy keep you ahead of othersgreyaudrina
 
Seagate: Sensor Overload! Taming The Raging Manufacturing Big Data Torrent
Seagate: Sensor Overload! Taming The Raging Manufacturing Big Data TorrentSeagate: Sensor Overload! Taming The Raging Manufacturing Big Data Torrent
Seagate: Sensor Overload! Taming The Raging Manufacturing Big Data TorrentSeeling Cheung
 
Open Source Data Management for Industry 4.0
Open Source Data Management for Industry 4.0Open Source Data Management for Industry 4.0
Open Source Data Management for Industry 4.0DataWorks Summit
 
Threat Detection and Response at Scale with Dominique Brezinski
Threat Detection and Response at Scale with Dominique BrezinskiThreat Detection and Response at Scale with Dominique Brezinski
Threat Detection and Response at Scale with Dominique BrezinskiDatabricks
 
MongoDB Case Study in Healthcare
MongoDB Case Study in HealthcareMongoDB Case Study in Healthcare
MongoDB Case Study in HealthcareMongoDB
 
Data summit connect fall 2020 - rise of data ops
Data summit connect fall 2020 - rise of data opsData summit connect fall 2020 - rise of data ops
Data summit connect fall 2020 - rise of data opsRyan Gross
 
Data Analytics in your IoT Solution Fukiat Julnual, Technical Evangelist, Mic...
Data Analytics in your IoT SolutionFukiat Julnual, Technical Evangelist, Mic...Data Analytics in your IoT SolutionFukiat Julnual, Technical Evangelist, Mic...
Data Analytics in your IoT Solution Fukiat Julnual, Technical Evangelist, Mic...BAINIDA
 
Southwest Power Pool big data case study
Southwest Power Pool big data case study Southwest Power Pool big data case study
Southwest Power Pool big data case study Seeling Cheung
 
Harnessing the Power of Big Data at Freddie Mac
Harnessing the Power of Big Data at Freddie MacHarnessing the Power of Big Data at Freddie Mac
Harnessing the Power of Big Data at Freddie MacDataWorks Summit
 
Modern Data Management for Federal Modernization
Modern Data Management for Federal ModernizationModern Data Management for Federal Modernization
Modern Data Management for Federal ModernizationDenodo
 
Testing the Data Warehouse—Big Data, Big Problems
Testing the Data Warehouse—Big Data, Big ProblemsTesting the Data Warehouse—Big Data, Big Problems
Testing the Data Warehouse—Big Data, Big ProblemsTechWell
 
MongoDB at Agilysys: A Case Study
MongoDB at Agilysys: A Case StudyMongoDB at Agilysys: A Case Study
MongoDB at Agilysys: A Case StudyMongoDB
 

What's hot (20)

Citizens Bank: Data Lake Implementation – Selecting BigInsights ViON Spark/Ha...
Citizens Bank: Data Lake Implementation – Selecting BigInsights ViON Spark/Ha...Citizens Bank: Data Lake Implementation – Selecting BigInsights ViON Spark/Ha...
Citizens Bank: Data Lake Implementation – Selecting BigInsights ViON Spark/Ha...
 
TESTING IN BIG DATA WORLD
TESTING IN BIG DATA  WORLDTESTING IN BIG DATA  WORLD
TESTING IN BIG DATA WORLD
 
Empower Splunk and other SIEMs with the Databricks Lakehouse for Cybersecurity
Empower Splunk and other SIEMs with the Databricks Lakehouse for CybersecurityEmpower Splunk and other SIEMs with the Databricks Lakehouse for Cybersecurity
Empower Splunk and other SIEMs with the Databricks Lakehouse for Cybersecurity
 
MongoDB San Francisco 2013:Geo Searches for Healthcare Pricing Data presented...
MongoDB San Francisco 2013:Geo Searches for Healthcare Pricing Data presented...MongoDB San Francisco 2013:Geo Searches for Healthcare Pricing Data presented...
MongoDB San Francisco 2013:Geo Searches for Healthcare Pricing Data presented...
 
Hortonworks Hybrid Cloud - Putting you back in control of your data
Hortonworks Hybrid Cloud - Putting you back in control of your dataHortonworks Hybrid Cloud - Putting you back in control of your data
Hortonworks Hybrid Cloud - Putting you back in control of your data
 
Big Data Proof of Concept
Big Data Proof of ConceptBig Data Proof of Concept
Big Data Proof of Concept
 
Optimized Data Management with Cloudera 5.7: Understanding data value with Cl...
Optimized Data Management with Cloudera 5.7: Understanding data value with Cl...Optimized Data Management with Cloudera 5.7: Understanding data value with Cl...
Optimized Data Management with Cloudera 5.7: Understanding data value with Cl...
 
Necessity of Data Lakes in the Financial Services Sector
Necessity of Data Lakes in the Financial Services SectorNecessity of Data Lakes in the Financial Services Sector
Necessity of Data Lakes in the Financial Services Sector
 
How can a quality engineering and assurance consultancy keep you ahead of others
How can a quality engineering and assurance consultancy keep you ahead of othersHow can a quality engineering and assurance consultancy keep you ahead of others
How can a quality engineering and assurance consultancy keep you ahead of others
 
Seagate: Sensor Overload! Taming The Raging Manufacturing Big Data Torrent
Seagate: Sensor Overload! Taming The Raging Manufacturing Big Data TorrentSeagate: Sensor Overload! Taming The Raging Manufacturing Big Data Torrent
Seagate: Sensor Overload! Taming The Raging Manufacturing Big Data Torrent
 
Open Source Data Management for Industry 4.0
Open Source Data Management for Industry 4.0Open Source Data Management for Industry 4.0
Open Source Data Management for Industry 4.0
 
Threat Detection and Response at Scale with Dominique Brezinski
Threat Detection and Response at Scale with Dominique BrezinskiThreat Detection and Response at Scale with Dominique Brezinski
Threat Detection and Response at Scale with Dominique Brezinski
 
MongoDB Case Study in Healthcare
MongoDB Case Study in HealthcareMongoDB Case Study in Healthcare
MongoDB Case Study in Healthcare
 
Data summit connect fall 2020 - rise of data ops
Data summit connect fall 2020 - rise of data opsData summit connect fall 2020 - rise of data ops
Data summit connect fall 2020 - rise of data ops
 
Data Analytics in your IoT Solution Fukiat Julnual, Technical Evangelist, Mic...
Data Analytics in your IoT SolutionFukiat Julnual, Technical Evangelist, Mic...Data Analytics in your IoT SolutionFukiat Julnual, Technical Evangelist, Mic...
Data Analytics in your IoT Solution Fukiat Julnual, Technical Evangelist, Mic...
 
Southwest Power Pool big data case study
Southwest Power Pool big data case study Southwest Power Pool big data case study
Southwest Power Pool big data case study
 
Harnessing the Power of Big Data at Freddie Mac
Harnessing the Power of Big Data at Freddie MacHarnessing the Power of Big Data at Freddie Mac
Harnessing the Power of Big Data at Freddie Mac
 
Modern Data Management for Federal Modernization
Modern Data Management for Federal ModernizationModern Data Management for Federal Modernization
Modern Data Management for Federal Modernization
 
Testing the Data Warehouse—Big Data, Big Problems
Testing the Data Warehouse—Big Data, Big ProblemsTesting the Data Warehouse—Big Data, Big Problems
Testing the Data Warehouse—Big Data, Big Problems
 
MongoDB at Agilysys: A Case Study
MongoDB at Agilysys: A Case StudyMongoDB at Agilysys: A Case Study
MongoDB at Agilysys: A Case Study
 

Viewers also liked

January 2015 HUG: Apache Flink: Fast and reliable large-scale data processing
January 2015 HUG: Apache Flink:  Fast and reliable large-scale data processingJanuary 2015 HUG: Apache Flink:  Fast and reliable large-scale data processing
January 2015 HUG: Apache Flink: Fast and reliable large-scale data processingYahoo Developer Network
 
Real Time Data Processing Using Spark Streaming
Real Time Data Processing Using Spark StreamingReal Time Data Processing Using Spark Streaming
Real Time Data Processing Using Spark StreamingHari Shreedharan
 
BIG Data Science: A Path Forward
BIG Data Science:  A Path ForwardBIG Data Science:  A Path Forward
BIG Data Science: A Path ForwardDan Mallinger
 
Big Analytics: Building Lasting Value
Big Analytics: Building Lasting ValueBig Analytics: Building Lasting Value
Big Analytics: Building Lasting ValueDan Mallinger
 
More Than Websites: PHP And The Firehose @DataSift (2013)
More Than Websites: PHP And The Firehose @DataSift (2013)More Than Websites: PHP And The Firehose @DataSift (2013)
More Than Websites: PHP And The Firehose @DataSift (2013)Stuart Herbert
 
Apachecon Europe 2012: Operating HBase - Things you need to know
Apachecon Europe 2012: Operating HBase - Things you need to knowApachecon Europe 2012: Operating HBase - Things you need to know
Apachecon Europe 2012: Operating HBase - Things you need to knowChristian Gügi
 
Advanced Analytics in Hadoop
Advanced Analytics in HadoopAdvanced Analytics in Hadoop
Advanced Analytics in HadoopAnalyticsWeek
 
R + 15 minutes = Hadoop cluster
R + 15 minutes = Hadoop clusterR + 15 minutes = Hadoop cluster
R + 15 minutes = Hadoop clusterJeffrey Breen
 
Are You Ready for Big Data Big Analytics?
Are You Ready for Big Data Big Analytics? Are You Ready for Big Data Big Analytics?
Are You Ready for Big Data Big Analytics? Revolution Analytics
 
HBase and Impala Notes - Munich HUG - 20131017
HBase and Impala Notes - Munich HUG - 20131017HBase and Impala Notes - Munich HUG - 20131017
HBase and Impala Notes - Munich HUG - 20131017larsgeorge
 
High Performance Predictive Analytics in R and Hadoop
High Performance Predictive Analytics in R and HadoopHigh Performance Predictive Analytics in R and Hadoop
High Performance Predictive Analytics in R and HadoopRevolution Analytics
 

Viewers also liked (12)

R+Hadoop - Ask Bigger (and New) Questions and Get Better, Faster Answers

  • 1. Revolution Confidential. Revolution Analytics & Cloudera Confidential. R + Hadoop: Ask Bigger (and new) Questions and Get Better, Faster Answers. Michele Chambers, Chief Strategy Officer & VP Product Mgmt; Jai Ranganathan, Director Product Mgmt & Strategy.
  • 2. Period of Disruption. 1st Generation Predictive Analytics
  • 3. Today’s Challenge: Accelerating Business Cadence
      Changing Business Environment
        • Fact Based Decisions Require More Data
        • Need to Understand Tradeoffs and Best Course of Action
        • Predictive Models Need to Continually Deliver Lift
        • Reduced Shelf Life for Predictive Models
      Faster Time to Value
        • Reduce Analytic Cycle Time
        • Build & Deploy Models Faster
        • Eliminate Time Consuming Data Movements
      Rapid Customer Facing Decisions
        • Score More Frequently
        • Need to Make Best Decision in Real Time
  • 4. 2nd Generation Modern Analytics: Big Data; Machine Learning; Quick to Fail; Lift
  • 5. Typical Technology Challenges Our Customers Face
      Big Data
        • New Data Sources
        • Data Variety & Velocity
        • Fine Grain Control
        • Data Movement, Memory Limits
      Complex Computation
        • Experimentation
        • Many Small Models
        • Ensemble Models
        • Simulation
      Enterprise Readiness
        • Heterogeneous Landscape
        • Write Once, Deploy Anywhere
        • Skill Shortage
        • Production Support
      Production Efficiency
        • Shorter Model Shelf Life
        • Volume of Models
        • Long End-to-End Cycle Time
        • Pace of Decision Accelerated
  • 6. Big Data Big Analytics is different
  • 11. [Chart: Accuracy versus False Positives, comparing "Existing model" with "Add unstructured data"]
  • 12. Big Data Big Analytics Use Cases
      One Big Model
        • Build predictive models with (very) large datasets
        • More rows/observations and/or more columns/features
        • Tend to use dimension reduction, machine learning and/or ensemble techniques
      Big Data Scoring
        • Score and predict with (very) large datasets with previously built model
        • Score in batch or individual transactions
        • Previously built model may be exported from model build to model deployment env.
      Many Small Models
        • Model factories build predictive models in quantity
        • Automated building of individualized models and/or parallel individualized model execution
      Scoring Many Models
        • Score and predict with many individualized models
        • Production model factories require model management
      Computationally Intensive Analytics
        • Analytic models that are mathematically intense
        • May not use large data sets but generate a lot of interim calculations
        • May include vectorization, simulation, optimization
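The "Many Small Models" and "Scoring Many Models" use cases can be illustrated with a minimal sketch: one least-squares line y = ax + b is fitted per group, and scoring then looks up the model for that group. This is only an illustration in plain Python, not the deck's R or Hadoop tooling; the group keys (`store_a`, `store_b`), the toy data, and the function names are assumptions made up for the example.

```python
from collections import defaultdict
from statistics import mean


def fit_line(points):
    """Ordinary least squares fit of y = a*x + b to a list of (x, y) pairs.

    Assumes the group has at least two distinct x values.
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    a = sxy / sxx
    b = y_bar - a * x_bar
    return a, b


def build_models(rows):
    """Group (key, x, y) rows by key and fit one small model per group."""
    groups = defaultdict(list)
    for key, x, y in rows:
        groups[key].append((x, y))
    return {key: fit_line(points) for key, points in groups.items()}


def score(models, key, x):
    """Score one observation with the model that belongs to its group."""
    a, b = models[key]
    return a * x + b


# Hypothetical toy data: two groups with different underlying lines.
rows = [
    ("store_a", 1.0, 3.0), ("store_a", 2.0, 5.0), ("store_a", 3.0, 7.0),
    ("store_b", 1.0, 10.0), ("store_b", 2.0, 9.0), ("store_b", 3.0, 8.0),
]
models = build_models(rows)
prediction = score(models, "store_a", 4.0)
```

Because each group is fitted independently, the loop in `build_models` is the part a model factory would run in parallel, and the resulting dictionary is the simplest possible stand-in for the model management the slide calls for.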
  • 13. Big Data Big Analytics Specialized Use Cases
      Time Series Analytics
        • Build forecasts with time sequenced data
        • For Big Data, tend to be many small models esp. machine data
        • Due to typical Big Data volume requires model management
      Text and Document Analytics
        • Use of unstructured, free text
        • For Big Data, typically used to enhance structured predictive analytics
        • Minimally requires text processing tools and may also require natural language processing
      Mining Data Streams
        • Analyzing continuous, high speed data flows for patterns and acting upon the patterns in real-time
        • Requires specialized sampling and filtering techniques
        • Uses distinct discovery analytics methods such as frequent itemsets or clustering
      Zero Latency
        • No separation of model building and model scoring
        • As real-time data becomes more widely available, this emerging category reduces time-to-insight with little or no separation between model building and scoring
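The slide does not say which "specialized sampling" techniques it has in mind for mining data streams. One widely used option is reservoir sampling, which keeps a uniform random sample of fixed size from a stream of unknown length in a single pass. The sketch below is an assumption chosen for illustration, not a method named in the deck; the function name and parameters are hypothetical.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Return a uniform random sample of up to k items from an iterable.

    Single pass, O(k) memory: the first k items fill the reservoir, and
    item number i (0-based, i >= k) replaces a random slot with
    probability k / (i + 1).
    """
    if rng is None:
        rng = random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


# Sample 5 readings from a simulated stream without storing the stream.
sample = reservoir_sample(iter(range(10_000)), k=5, rng=random.Random(42))
short = reservoir_sample(range(3), k=5)
```

Passing a seeded `random.Random` makes a run repeatable, which helps when a sampled stream feeds a model that later has to be audited.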
  • 14. Analytic Reference Architecture
      [Diagram layers]
        • Decision: Analytic Applications
        • Integration: Middleware
        • Data: Hadoop; Data Warehouse; Other Data Sources
        • Analytics: Analytics Development Tools & Platforms
  • 15. Architectural Approaches to Analytics
      Beside Architecture
        • Decision: Analytic Applications
        • Integration: Middleware
        • Analytics: Analytics Development Tools & Platforms; Local Data Mart
        • Data: Data Sources
      Inside Architecture
        • Decision: Analytic Applications
        • Integration: Middleware
        • Data+Analytics: Analytics Development Tools & Platforms; Data Sources
  • 16. Pros & Cons of Architectural Approaches
      Beside Architecture
        • Analytic workflow tasks performed in a separate analytics environment outside of the source database
        • Pros: Segregates analytic workload
        • Cons: Doesn’t leverage powerful production for transformations, introduces scoring latencies,
      Inside Architecture
        • Analytics workflow tasks performed inside the source database with embedded analytics
        • Pros: Eliminates data movement, reduces model latency, allows exploration of all data
        • Cons: IT governance on production, potential new skills
      Hybrid Architecture
        • Some analytic workflow tasks performed inside the source database & others performed in a separate analytics environment
        • Pros: Leverages strengths of each architecture
        • Cons: Maintain multiple environments
  • 17. Building & Deploying Analytic Models
      [Diagram: numbered workflow steps for the Beside, Inside, and Hybrid architectures, drawn over the Data Sources, Local Data Mart, and Analytics Development Tools & Platforms components]
      Legend: Model Build; Model Deploy; Model Recode / PMML; Update Data; Data Prep / Marshaling
  • 19. Our platform vision
      Organizations bring their applications to Hadoop data
        • Lower cost per TB
        • Avoid data copying
        • Minimize big data movement
        • Simplify the IT and user experience
      ©2013 Cloudera, Inc. All Rights Reserved. Confidential. Reproduction or redistribution without written permission is prohibited.
  • 20. Traditional workloads in Hadoop
      Workloads in Hadoop: Search; Analytics; Self-service BI; Data Processing (ELT); OLAP reporting
      In Cloudera
        • 2-10X the performance
        • 1/10th the cost
      In Cloudera
        • Integrated R support for deep analytics
        • Takes advantage of entire cluster for high performance
        • More granular datasets with more model features
      In Cloudera
        • Data exploration on the full fidelity data
        • Faster lifecycle from source data to mini-mart
        • 1/10th the cost
  • 21. Enterprise-Grade Solutions for Big Data: Key Characteristics
  • 22. Cloudera Manager & R integration
      Seamless cluster administration for Revolution R Enterprise
        1. Deploy: Deploy Revolution R Enterprise quickly and easily onto your CDH cluster
        2. Configure & Optimize: Ensure optimal settings are configured for performance of Revolution R Enterprise
        3. Monitor, Diagnose & Report: Identify resource controls, monitor performance, debug and diagnose issues through a single consolidated interface
  • 24. What is the R Language?
      • A Platform: A Procedural Language for Stats, Math and Data Science; A Complete Data Visualization Framework; Provided as Open Source
      • A Community: 2M+ Users with the Skill to Tackle Big Data Statistical and Numerical Analysis and Machine Learning Projects; Active User Groups Across the World
      • An Ecosystem: CRAN: 4500+ Freely Available Algorithms, Test Data and Evaluations
  • 25. Revolution R Enterprise: Revolution R Enterprise is the only enterprise big data big analytics platform based on the open source R statistical computing language. Portable Across Enterprise Platforms; High Performance, Scalable Analytics; Easier to Build & Deploy.
  • 26. R is open source and drives analytic innovation but… has some limitations for Enterprises
      Area                     | Open source R                        | Revolution R Enterprise
      Big Data                 | In memory bound                      | Disk based scalability
      Speed of Analysis        | Single threaded                      | Parallel threading
      Enterprise Readiness     | Community support                    | Commercial support
      Analytic Breadth & Depth | 4500+ innovative analytic packages   | Leverage open source packages plus Big Data ready packages
      Commercial Viability     | Risk of deployment of open source    | Commercial License
  • 27. Introducing Revolution R Enterprise: The Big Data Big Analytics Platform [Diagram of Revolution R Enterprise components R+CRAN, RevoR, DistributedR, ConnectR, ScaleR, DevelopR and DeployR, grouped under the labels Language Interpreter and Standard R Algorithm Suites; Development & Deployment Tooling; Big Data Distributed Execution Platform.]
  • 28. Big Data Speed @ Scale with Revolution R Enterprise: Fast Math Libraries; Parallelized Algorithms; In-Database Execution; Multi-Threaded Execution; Multi-Core Processing; In-Hadoop Execution; Memory Management; Parallelized User Code. First, we enhance and accelerate the Open Source R interpreter.
  • 29. Open Source R performance: Multi-threaded Math
      Computation (4-core laptop)       | Open Source R | Revolution R | Speedup
      Linear Algebra [1]
        Matrix Multiply                 | 176 sec       | 9.3 sec      | 18x
        Cholesky Factorization          | 25.5 sec      | 1.3 sec      | 19x
        Linear Discriminant Analysis    | 189 sec       | 74 sec       | 3x
      General R Benchmarks [2]
        R Benchmarks (Matrix Functions) | 22 sec        | 3.5 sec      | 5x
        R Benchmarks (Program Control)  | 5.6 sec       | 5.4 sec      | Not appreciable
      [1] http://www.revolutionanalytics.com/why-revolution-r/benchmarks.php
      [2] http://r.research.att.com/benchmarks/
      Customers report 3-50x performance improvements compared to Open Source R, without changing any code.
  • 30. Big Data Speed @ Scale with Revolution R Enterprise: Fast Math Libraries; Parallelized Algorithms; In-Database Execution; Multi-Threaded Execution; Multi-Core Processing; In-Hadoop Execution; Memory Management; Parallelized User Code. Second, we built a platform for hosting R with Big Data on a variety of massively parallel platforms.
  • 31. Revolution R Enterprise DistributedR: Innovative Memory Management, Multi-Threaded Execution, Multi-Core Processing
      • A Revolution R Enterprise ScaleR analytic is given a data source as input.
      • The analytic loops over the data, reading one block at a time.
      • Blocks of data are read by a separate worker thread (Thread 0).
      • Worker threads (Threads 1..n) process the data block from the previous iteration of the data loop and update intermediate results objects in memory.
      • When all of the data has been processed, a master results object is created from the intermediate results objects (combine intermediate results).
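The block loop on slide 31 can be illustrated with a minimal Python sketch. This is not ScaleR's implementation: the names `Partial`, `read_blocks`, `process_blocks` and `chunked_mean` are hypothetical, an in-memory list stands in for the data source, and whole blocks are handed to workers rather than split across them. It only shows the pattern of one reader thread, several worker threads with their own intermediate results, and a final combine step.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass
class Partial:
    """Hypothetical intermediate results object: running count and sum."""
    n: int = 0
    total: float = 0.0

    def update(self, block):
        self.n += len(block)
        self.total += sum(block)


def read_blocks(data, block_size, out_q, n_workers):
    """Reader ("Thread 0"): hand over one block at a time."""
    for start in range(0, len(data), block_size):
        out_q.put(data[start:start + block_size])
    for _ in range(n_workers):
        out_q.put(None)  # one stop signal per worker


def process_blocks(in_q, partial):
    """Worker ("Threads 1..n"): fold each block into its own partial result."""
    while True:
        block = in_q.get()
        if block is None:
            break
        partial.update(block)


def chunked_mean(data, block_size=4, n_workers=3):
    """Mean of `data`, computed block by block and combined at the end."""
    q = queue.Queue(maxsize=2)  # bounded, so only a few blocks are held at once
    partials = [Partial() for _ in range(n_workers)]
    threads = [threading.Thread(target=read_blocks,
                                args=(data, block_size, q, n_workers))]
    threads += [threading.Thread(target=process_blocks, args=(q, p))
                for p in partials]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Combine step: build the master result from the intermediate results.
    n = sum(p.n for p in partials)
    if n == 0:
        raise ValueError("no data")
    return sum(p.total for p in partials) / n


print(chunked_mean(list(range(1, 101))))
```

Because each worker owns its own `Partial`, no locking is needed until the combine step; the bounded queue is what keeps memory use independent of the total data size.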
  • 32. Revolution R Enterprise ScaleR: Performance and Capacity
  • 33. SAS HPA Benchmarking comparison*: Logistic Regression
                      | SAS HPA       | Revolution R
      Rows of data    | 1 billion     | 1 billion
      Parameters      | “just a few”  | 7
      Time            | 80 seconds    | 44 seconds
      Data location   | In memory     | On disk
      Nodes           | 32            | 5
      Cores           | 384           | 20
      RAM             | 1,536 GB      | 80 GB
      Callouts on the slide: Double; 45%; 1/6th; 5%; 5%.
      Revolution R is faster on the same amount of data, despite using approximately a 20th as many cores, a 20th as much RAM, a 6th as many nodes, and not pre-loading data into RAM.
      *As published by SAS in HPC Wire, April 21, 2011
      Revolution R Enterprise Delivers Performance at 2% of the Cost
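The fractions quoted on slide 33 ("a 20th", "a 6th") follow directly from the figures in the comparison. A short Python check, using only the numbers on the slide (the dictionary keys are illustrative names, not anything from either product):

```python
# Figures as printed on the slide.
sas_hpa = {"time_s": 80, "nodes": 32, "cores": 384, "ram_gb": 1536}
revolution_r = {"time_s": 44, "nodes": 5, "cores": 20, "ram_gb": 80}

# Revolution R figure as a fraction of the SAS HPA figure.
ratios = {key: revolution_r[key] / sas_hpa[key] for key in sas_hpa}

for key, ratio in ratios.items():
    print(f"{key}: {ratio:.1%} of the SAS HPA figure")
```

Cores and RAM both come to about 5% (roughly a 20th), nodes to about 16% (roughly a 6th), and run time to 55%, which is a 45% reduction. The "2% of the Cost" claim cannot be derived from these figures alone, since the slide gives no prices.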
  • 34. Revolution R Enterprise ScaleR: High Performance Big Data Analytics. Data Prep, Distillation & Descriptive Analytics
      • R Data Step: Data import (Delimited, Fixed, SAS, SPSS, ODBC); Variable creation & transformation; Recode variables; Factor variables; Missing value handling; Sort; Merge; Split; Aggregate by category (means, sums)
      • Descriptive Statistics: Min / Max; Mean; Median (approx.); Quantiles (approx.); Standard Deviation; Variance; Correlation; Covariance; Sum of Squares (cross product matrix for set variables); Pairwise Cross tabs; Risk Ratio & Odds Ratio; Cross-Tabulation of Data (standard tables & long form); Marginal Summaries of Cross Tabulations
      • Statistical Tests: Chi Square Test; Kendall Rank Correlation; Fisher’s Exact Test; Student’s t-Test
      • Sampling: Subsample (observations & variables); Random Sampling
  • 35. Revolution R Enterprise ScaleR: High Performance Big Data Analytics. Statistical Modeling
      • Predictive Models: Sum of Squares (cross product matrix for set variables); Multiple Linear Regression; Generalized Linear Models (GLM), covering all exponential family distributions (binomial, Gaussian, inverse Gaussian, Poisson, Tweedie), standard link functions (including cauchit, identity, log, logit, probit), and user defined distributions & link functions; Covariance & Correlation Matrices; Logistic Regression; Classification & Regression Trees; Predictions/scoring for models; Residuals for all models
      • Data Visualization: Histogram; Line Plot; Scatter Plot; Lorenz Curve; ROC Curves (actual data and predicted values)
      • Machine Learning: Cluster Analysis (K-Means); Classification (Decision Trees)
      • Simulation: Monte Carlo
      • Variable Selection: Stepwise Regression (for linear reg)
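Slide 35 lists the cross product (sum of squares) matrix next to linear regression. One reason the two go together is that a least-squares fit needs only a handful of sums, and sums can be accumulated chunk by chunk. The Python sketch below fits y = ax + b this way; it is an illustration of the general idea under that assumption, not ScaleR's algorithm, and the function names are hypothetical.

```python
def chunk_sums(chunk):
    """Cross-product sums for one chunk of (x, y) pairs."""
    n = len(chunk)
    sx = sum(x for x, _ in chunk)
    sy = sum(y for _, y in chunk)
    sxx = sum(x * x for x, _ in chunk)
    sxy = sum(x * y for x, y in chunk)
    return (n, sx, sy, sxx, sxy)


def combine(a, b):
    """Merge two sets of sums; order does not matter."""
    return tuple(p + q for p, q in zip(a, b))


def fit_line(chunks):
    """Least-squares slope a and intercept b for y = a*x + b."""
    total = (0, 0.0, 0.0, 0.0, 0.0)
    for chunk in chunks:  # only one chunk is examined at a time
        total = combine(total, chunk_sums(chunk))
    n, sx, sy, sxx, sxy = total
    denom = n * sxx - sx * sx
    if denom == 0:
        raise ValueError("need at least two distinct x values")
    a = (n * sxy - sx * sy) / denom
    b = (sy - a * sx) / n
    return a, b


# Ten points on y = 2x + 1, split into chunks of three.
points = [(x, 2 * x + 1) for x in range(10)]
chunks = [points[i:i + 3] for i in range(0, len(points), 3)]
a, b = fit_line(chunks)
print(a, b)
```

Since `combine` is associative and commutative, the per-chunk sums could equally be computed on different threads or nodes and merged afterwards.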
  • 36. Unparalleled Big Data Big Analytics Scale, Performance & Innovation: 1 + 1 = 1000’s [Diagram with axes Performance and Value, showing Revolution R Enterprise as the combination of Performance Enhanced R (R Language), Open Source R Analytic Packages, and the Big Data Distributed & Parallel Processing & Analytic Package.]
  • 37. Leveraging CRAN with DistributedR & ScaleR
      • Big Data Distillation: Allows an R programmer to use RRE ScaleR to reduce dimensionality first and then feed the reduced data set into open source packages, so that the computationally intensive portion is sped up with RRE ScaleR techniques while any of the many open source packages can still be used.
      • Big Data Threading: Allows an R programmer to use RRE ScaleR to execute algorithms designed for SMP environments in parallel using DistributedR (e.g., Monte Carlo simulation).
      • Supercharge Open Source package with RRE: Allows an R programmer to re-engineer a CRAN routine by replacing an Open Source function inside an R based algorithm with the equivalent ScaleR function(s).
      • High Performance Custom Algorithm: Allows an R programmer to use the RRE high throughput extreme data format (XDF) to apply any combination of Open Source functions and logic while chunking through an XDF file, overcoming the Open Source R memory limitations.
  • 39. Big Analytics on Big Data in Hadoop
      • 100% R on Hadoop: Full Skill Transfer - No Java needed; Use 4500+ CRAN Packages; Blend / Combine R & Other Tools / Methods
      • 100% Portability: Build Once – Deploy Many; Track Evolution of Hadoop; Protect Against Platform Uncertainty; Avoid Platform Lock-ins
      • Hadoop Performance & Scale: Leverage Hadoop Parallelism Easily; Analyze Data Without Moving It
      [Diagram labels: Data, Analytics, Applications; Hadoop + Scalable Compute; HDFS, HBase, Hive; Parallel Storage; 100% R; Portability; Big Data Scale.]
  • 40. Revolution R Enterprise + Cloudera Propels Enterprises into the Future [Layer diagram. Decision: Analytic Applications. Integration: Middleware. Data: Cloudera Data Management Platform. Analytics: Revolution R Enterprise Big Data Big Analytics Platform.]
  • 41. Revolution R Enterprise Powers Write Once, Deploy Anywhere [Diagram comparing three architectures, linked by a Direct Connector. Beside Architecture: Analytics (Revolution R Enterprise, Local Data Mart) alongside Data (Cloudera). Inside Architecture: Data+Analytics (Revolution R Enterprise, Cloudera). Hybrid Architecture: Analytics (Revolution R Enterprise, Local Data Mart) together with Data+Analytics (Revolution R Enterprise, Cloudera). Legend for the numbered steps: Model Build; Model Deploy; Model Recode / PMML; Update Data; Data Prep / Marshaling.]
      Bottom Line: Save Time, Save Money, Get Insights Faster
      • Direct connectors access data without data movement
      • Push down analyzing data without movement
      • Use same R script on any platform without recoding
      • Use right architecture for the job!
  • 42. Revolution R Enterprise Inside Cloudera [Diagram. Labels on the data side: Machine Data; New Data Sources; Data Suppliers; Traditional Sources; IBM Mainframe; Data Sources. Cloudera, with Revolution R Enterprise (R+CRAN, RevoR, DistributedR, ConnectR, ScaleR, DeployR): Big Data Big Analytics Data Transformation, Model Building & Scoring. Consumption: Business Analysts (Alteryx, Tableau, QlikView, Cognos, Microstrategy, Datameer etc.); Power Analysts (R Studio, DevelopR, etc.); Line of Business users (Analytic Apps, Rules Engines, etc.).]
  • 43. QuickStart Programs Deliver Value Quickly
      • Offered by both Cloudera and Revolution Analytics
      • Combine Software, Services and Training
      • Cloudera can help you get started with Hadoop in a few ways
      • Revolution Analytics helps you realize value from R + Hadoop
  • 44. Summary: Revolution R Enterprise and Cloudera Hadoop bring together best-of-breed technologies to deliver:
      • Highly scalable, high performance machine learning on data residing in Hadoop
      • Analytics at scale that is accessible and easy for R users, through the familiar R programming environment
      • Full lifecycle analytics, from ad-hoc analysis to production analytics, in one managed environment, with the ability to integrate disparate data sources in one repository
      • A seamless operational experience for managing both products, through the deep integration of Revolution R Enterprise with Cloudera
  • 46. Questions. Revolution Analytics: info@revolutionanalytics.com; Cloudera: info@cloudera.com