SlideShare a Scribd company logo
1 of 21
Revolution Confidential 
Revolution Analytics 
R 
and 
Data Science 
Joseph B Rickert 
September 25, 2014
Revolution Confidential What is R? 
 Most widely used data analysis 
software 
 Used by 2M+ data scientists, 
statisticians and analysts 
 Most powerful statistical 
programming language 
 Flexible, extensible and 
comprehensive for productivity 
 Platform for beautiful and unique 
data visualizations 
 As seen in New York Times, Twitter 
and Flowing Data 
 Thriving open-source community 
 Leading edge of analytics research 
www.revolutionanalytics.com/what-r
OPEN SOURCE R
Revolution Confidential 
4 
R’s popularity is growing rapidly 
R Usage Growth 
Rexer Data Miner Survey, 2007-2013 
• Rexer Data Miner Survey • IEEE Spectrum, July 2014 
#9: R 
Language Popularity 
IEEE Spectrum Top Programming Languages
Revolution Confidential Poll Question #1 
 What are the statistical programming 
languages/platforms you are most familiar 
with? (choose all that apply) 
 A) R 
 B) SAS 
 C) SPSS 
 D) KXEN 
 E) Statistica 
5
Revolution Confidential Tools for Data Science 
Source: O’Reilly Data Science Survey 
6
Revolution Confidential 
7 
R is among the highest-paid IT skills in the 
US 
Dice Tech Salary Survey, January 2014 O’Reilly Strata 2013 Data Science Salary Survey
Revolution Confidential 
8 
Photo by Ksayer1 on flickr.
Revolution Confidential Why R for Data Science? 
X <- if (!is.empty.model(mt)) 
model.matrix(mt, mf, contrasts) 
else matrix(, NROW(Y), 0L) 
weights <- as.vector(model.weights(mf)) 
if (!is.null(weights) && !is.numeric(weights)) 
stop("'weights' must be a numeric vector") 
if (!is.null(weights) && any(weights < 0)) 
stop("negative weights not allowed") 
offset <- as.vector(model.offset(mf)) 
if (!is.null(offset)) { 
if (length(offset) != NROW(Y)) 
stop(gettextf("number of offsets is %d should equal %d (number of observations)", 
length(offset), NROW(Y)), domain = NA) 
} 
mustart <- model.extract(mf, "mustart") 
etastart <- model.extract(mf, "etastart") 
fit <- eval(call(if (is.function(method)) "method" else method, 
Algorithms 
x = X, y = Y, weights = weights, start = start, etastart = etastart, 
mustart = mustart, offset = offset, family = family, 
control = control, intercept = attr(mt, "intercept") > 
0L)) 
if (length(offset) && attr(mt, "intercept") > 0L) { 
fit2 <- eval(call(if (is.function(method)) "method" else method, 
x = X[, "(Intercept)", drop = FALSE], y = Y, weights = weights, 
offset = offset, family = family, control = control, 
intercept = TRUE)) 
if (!fit2$converged) 
warning("fitting to calculate the null deviance did not converge -- increase 'maxit'?") 
fit$null.deviance <- fit2$deviance 
} 
if (model) 
fit$model <- mf 
fit$na.action <- attr(mf, "na.action") 
if (x) 
fit$x <- X 
if (!y) 
fit$y <- NULL 
fit <- c(fit, list(call = call, formula = formula, terms = mt, 
data = data, offset = offset, control = control, method = method, 
contrasts = attr(X, "contrasts"), xlevels = .getXlevels(mt, 
mf))) 
class(fit) <- c(fit$class, c("glm", "lm")) 
fit 
9 
Task Views
Revolution Confidential R Growth 
Put this astonishing growth in 
perspective: 
 SAS.V 9.3S contains ~ 
1,200 commands that are 
roughly equivalent to R 
functions 
 R packages contain a 
median of 5 functions 
 Therefore R has ~ 36,820 
functions 
 During 2013 alone, R added 
more functions than SAS 
Institute has written in its 
entire history! 
Bob Muenchen 
10 
5882 packages 9/25/14
Revolution Confidential Why R for Data Science? 
Visualizations 
11
Revolution Confidential Why R for Data Science? 
 Scripting 
 Functional programming 
 Parallel programming 
 Data structures 
 Objects 
 Data Types 
 Regular expressions 
 Data connections 
 Interfaces to other 
Programming 
languages 
12
Revolution Confidential Why R for Data Science? 
Data Manipulation 
13 
“It's often said that 80% of the effort of analysis is spent just getting the data 
ready to analyse, the process of data cleaning. Data cleaning is not only a 
vital first step, but it is often repeated multiple times over the course of an 
analysis as new problems come to light.” Hadley Wickham Tidy Data
Revolution Confidential Why R for Data Science? 
R Integrates 
 Web applications 
 Internet graphics 
 D3 
 Potly 
 Other Languages 
 C, C++ 
 Java 
 BI Tools 
 Data bases 
 SQL 
 MongoDB 
14
Revolution Confidential Poll Question #2 
 What are the data platforms that you are 
connecting to regularly? (choose all that 
apply) 
 A) Hadoop 
 B) Spark 
 C) Cloud-based (Azure/AWS/Google) 
 D) Data Warehouses 
 E) Servers (Grid or Cluster) 
15
Revolution Confidential Why R for Data Science 
Hadoop 
Servers & 
Clusters 
Data 
Warehouses 
R Scales
Revolution Confidential Poll Question #3 
 What are the types of models that you are 
working with most? (choose all that apply) 
 A) Linear models / Regression / GLM 
 B) Decision Trees / Random Forests 
 C) Survival Models 
 D) GBM 
 E) Time Series models 
17
Let’s look at some 
code. 
www.revolutionanalytics.com 
1.855.GET.REVO 
Twitter: @RevolutionR
Revolution Confidential 
19 
Why is R Right for Data Science? 
 R is open source 
 R is a powerful language 
 Data Manipulation 
 Computational Statistics 
 Machine Learning 
 R is an innovation engine 
 R has a rich and expanding ecosystem
Revolution Confidential 
20 
Q&A / Resources 
R Code and Markdown Files 
https://github.com/joseph-rickert/DataScienceRWebinar 
What is R? 
revolutionanalytics.com/what-is-r 
Companies using R 
revolutionanalytics.com/companies-using-r 
AcademyR training 
revolutionanalytics.com/AcademyR 
AcademyR Certification 
revolutionanalytics.com/AcademyR-certification 
Contact Revolution Analytics 
revolutionanalytics.com/contact-us
Thank you 
Revolution Analytics is the leading commercial 
provider of software and support for the 
popular open source R statistics language. 
www.revolutionanalytics.com, 1.855.GET.REVO, Twitter: @RevolutionR 21

More Related Content

What's hot

Lecture 18: Gaussian Mixture Models and Expectation Maximization
Lecture 18: Gaussian Mixture Models and Expectation MaximizationLecture 18: Gaussian Mixture Models and Expectation Maximization
Lecture 18: Gaussian Mixture Models and Expectation Maximizationbutest
 
Big Data - Applications and Technologies Overview
Big Data - Applications and Technologies OverviewBig Data - Applications and Technologies Overview
Big Data - Applications and Technologies OverviewSivashankar Ganapathy
 
TYPES OF IMAGE FILE FORMAT - MATHANKUMAR.S - VMKVEC
TYPES OF IMAGE FILE FORMAT - MATHANKUMAR.S - VMKVECTYPES OF IMAGE FILE FORMAT - MATHANKUMAR.S - VMKVEC
TYPES OF IMAGE FILE FORMAT - MATHANKUMAR.S - VMKVECMathankumar S
 
DATASCIENCE vs BUSINESS INTELLIGENCE.pptx
DATASCIENCE vs BUSINESS INTELLIGENCE.pptxDATASCIENCE vs BUSINESS INTELLIGENCE.pptx
DATASCIENCE vs BUSINESS INTELLIGENCE.pptxOTA13NayabNakhwa
 
Machine Learning Final presentation
Machine Learning Final presentation Machine Learning Final presentation
Machine Learning Final presentation AyanaRukasar
 
Feature Engineering in Machine Learning
Feature Engineering in Machine LearningFeature Engineering in Machine Learning
Feature Engineering in Machine LearningKnoldus Inc.
 
Survey on data mining techniques in heart disease prediction
Survey on data mining techniques in heart disease predictionSurvey on data mining techniques in heart disease prediction
Survey on data mining techniques in heart disease predictionSivagowry Shathesh
 
Data Mining: Application and trends in data mining
Data Mining: Application and trends in data miningData Mining: Application and trends in data mining
Data Mining: Application and trends in data miningDataminingTools Inc
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessingankur bhalla
 
Guru4Pro Data Vault Best Practices
Guru4Pro Data Vault Best PracticesGuru4Pro Data Vault Best Practices
Guru4Pro Data Vault Best PracticesCGI
 
Supervised learning and Unsupervised learning
Supervised learning and Unsupervised learning Supervised learning and Unsupervised learning
Supervised learning and Unsupervised learning Usama Fayyaz
 
Introduction To Data Science Using R
Introduction To Data Science Using RIntroduction To Data Science Using R
Introduction To Data Science Using RANURAG SINGH
 

What's hot (20)

Lecture 18: Gaussian Mixture Models and Expectation Maximization
Lecture 18: Gaussian Mixture Models and Expectation MaximizationLecture 18: Gaussian Mixture Models and Expectation Maximization
Lecture 18: Gaussian Mixture Models and Expectation Maximization
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Bayesian learning
Bayesian learningBayesian learning
Bayesian learning
 
Big data ppt
Big data pptBig data ppt
Big data ppt
 
Big Data - Applications and Technologies Overview
Big Data - Applications and Technologies OverviewBig Data - Applications and Technologies Overview
Big Data - Applications and Technologies Overview
 
Stroke Prediction
Stroke PredictionStroke Prediction
Stroke Prediction
 
TYPES OF IMAGE FILE FORMAT - MATHANKUMAR.S - VMKVEC
TYPES OF IMAGE FILE FORMAT - MATHANKUMAR.S - VMKVECTYPES OF IMAGE FILE FORMAT - MATHANKUMAR.S - VMKVEC
TYPES OF IMAGE FILE FORMAT - MATHANKUMAR.S - VMKVEC
 
Unit 2
Unit 2Unit 2
Unit 2
 
DATASCIENCE vs BUSINESS INTELLIGENCE.pptx
DATASCIENCE vs BUSINESS INTELLIGENCE.pptxDATASCIENCE vs BUSINESS INTELLIGENCE.pptx
DATASCIENCE vs BUSINESS INTELLIGENCE.pptx
 
Machine Learning Final presentation
Machine Learning Final presentation Machine Learning Final presentation
Machine Learning Final presentation
 
Feature Engineering in Machine Learning
Feature Engineering in Machine LearningFeature Engineering in Machine Learning
Feature Engineering in Machine Learning
 
Survey on data mining techniques in heart disease prediction
Survey on data mining techniques in heart disease predictionSurvey on data mining techniques in heart disease prediction
Survey on data mining techniques in heart disease prediction
 
Data Mining: Application and trends in data mining
Data Mining: Application and trends in data miningData Mining: Application and trends in data mining
Data Mining: Application and trends in data mining
 
Applications of Big Data
Applications of Big DataApplications of Big Data
Applications of Big Data
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessing
 
Big data ppt
Big data pptBig data ppt
Big data ppt
 
Machine Learning in R
Machine Learning in RMachine Learning in R
Machine Learning in R
 
Guru4Pro Data Vault Best Practices
Guru4Pro Data Vault Best PracticesGuru4Pro Data Vault Best Practices
Guru4Pro Data Vault Best Practices
 
Supervised learning and Unsupervised learning
Supervised learning and Unsupervised learning Supervised learning and Unsupervised learning
Supervised learning and Unsupervised learning
 
Introduction To Data Science Using R
Introduction To Data Science Using RIntroduction To Data Science Using R
Introduction To Data Science Using R
 

Viewers also liked

A Workshop on R
A Workshop on RA Workshop on R
A Workshop on RAjay Ohri
 
Data Analytics with R and SQL Server
Data Analytics with R and SQL ServerData Analytics with R and SQL Server
Data Analytics with R and SQL ServerStéphane Fréchette
 
Training in Analytics, R and Social Media Analytics
Training in Analytics, R and Social Media AnalyticsTraining in Analytics, R and Social Media Analytics
Training in Analytics, R and Social Media AnalyticsAjay Ohri
 
Introduction to Data Analytics with R
Introduction to Data Analytics with RIntroduction to Data Analytics with R
Introduction to Data Analytics with RWei Zhong Toh
 
Tata consultancy services final
Tata consultancy services finalTata consultancy services final
Tata consultancy services finalWasim Akram
 

Viewers also liked (6)

A Workshop on R
A Workshop on RA Workshop on R
A Workshop on R
 
RHadoop
RHadoopRHadoop
RHadoop
 
Data Analytics with R and SQL Server
Data Analytics with R and SQL ServerData Analytics with R and SQL Server
Data Analytics with R and SQL Server
 
Training in Analytics, R and Social Media Analytics
Training in Analytics, R and Social Media AnalyticsTraining in Analytics, R and Social Media Analytics
Training in Analytics, R and Social Media Analytics
 
Introduction to Data Analytics with R
Introduction to Data Analytics with RIntroduction to Data Analytics with R
Introduction to Data Analytics with R
 
Tata consultancy services final
Tata consultancy services finalTata consultancy services final
Tata consultancy services final
 

Similar to R and Data Science

Revolution R: 100% R and more
Revolution R: 100% R and moreRevolution R: 100% R and more
Revolution R: 100% R and moreMasayoshi Ootsuka
 
Revolution R Enterprise - Portland R User Group, November 2013
Revolution R Enterprise - Portland R User Group, November 2013Revolution R Enterprise - Portland R User Group, November 2013
Revolution R Enterprise - Portland R User Group, November 2013Revolution Analytics
 
Big Data Analytics with R
Big Data Analytics with RBig Data Analytics with R
Big Data Analytics with RGreat Wide Open
 
High Performance Predictive Analytics in R and Hadoop
High Performance Predictive Analytics in R and HadoopHigh Performance Predictive Analytics in R and Hadoop
High Performance Predictive Analytics in R and HadoopRevolution Analytics
 
R as supporting tool for analytics and simulation
R as supporting tool for analytics and simulationR as supporting tool for analytics and simulation
R as supporting tool for analytics and simulationAlvaro Gil
 
High Performance Predictive Analytics in R and Hadoop
High Performance Predictive Analytics in R and HadoopHigh Performance Predictive Analytics in R and Hadoop
High Performance Predictive Analytics in R and HadoopRevolution Analytics
 
Revolution Analytics
Revolution AnalyticsRevolution Analytics
Revolution Analyticstempledf
 
An introduction to R is a document useful
An introduction to R is a document usefulAn introduction to R is a document useful
An introduction to R is a document usefulssuser3c3f88
 
Big Data Predictive Analytics with Revolution R Enterprise (Gartner BI Summit...
Big Data Predictive Analytics with Revolution R Enterprise (Gartner BI Summit...Big Data Predictive Analytics with Revolution R Enterprise (Gartner BI Summit...
Big Data Predictive Analytics with Revolution R Enterprise (Gartner BI Summit...Revolution Analytics
 
In-Database Analytics Deep Dive with Teradata and Revolution
In-Database Analytics Deep Dive with Teradata and RevolutionIn-Database Analytics Deep Dive with Teradata and Revolution
In-Database Analytics Deep Dive with Teradata and RevolutionRevolution Analytics
 
Robert Luong: Analyse prédictive dans Excel
Robert Luong: Analyse prédictive dans ExcelRobert Luong: Analyse prédictive dans Excel
Robert Luong: Analyse prédictive dans ExcelMSDEVMTL
 
microsoft r server for distributed computing
microsoft r server for distributed computingmicrosoft r server for distributed computing
microsoft r server for distributed computingBAINIDA
 
MMDS 2014: Myria (and Scalable Graph Clustering with RelaxMap)
MMDS 2014: Myria (and Scalable Graph Clustering with RelaxMap)MMDS 2014: Myria (and Scalable Graph Clustering with RelaxMap)
MMDS 2014: Myria (and Scalable Graph Clustering with RelaxMap)University of Washington
 
Creating Value That Scales with Revolution Analytics & Alteryx
Creating Value That Scales with Revolution Analytics & AlteryxCreating Value That Scales with Revolution Analytics & Alteryx
Creating Value That Scales with Revolution Analytics & AlteryxRevolution Analytics
 
Are You Ready for Big Data Big Analytics?
Are You Ready for Big Data Big Analytics? Are You Ready for Big Data Big Analytics?
Are You Ready for Big Data Big Analytics? Revolution Analytics
 
Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)
Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)
Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)Serban Tanasa
 
6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...
6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...
6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...Jürgen Ambrosi
 
#rstats lessons for #measure
#rstats lessons for #measure#rstats lessons for #measure
#rstats lessons for #measureMark Edmondson
 
Microsoft and Revolution Analytics -- what's the add-value? 20150629
Microsoft and Revolution Analytics -- what's the add-value? 20150629Microsoft and Revolution Analytics -- what's the add-value? 20150629
Microsoft and Revolution Analytics -- what's the add-value? 20150629Mark Tabladillo
 

Similar to R and Data Science (20)

Revolution R: 100% R and more
Revolution R: 100% R and moreRevolution R: 100% R and more
Revolution R: 100% R and more
 
Revolution R Enterprise - Portland R User Group, November 2013
Revolution R Enterprise - Portland R User Group, November 2013Revolution R Enterprise - Portland R User Group, November 2013
Revolution R Enterprise - Portland R User Group, November 2013
 
Big Data Analytics with R
Big Data Analytics with RBig Data Analytics with R
Big Data Analytics with R
 
High Performance Predictive Analytics in R and Hadoop
High Performance Predictive Analytics in R and HadoopHigh Performance Predictive Analytics in R and Hadoop
High Performance Predictive Analytics in R and Hadoop
 
R as supporting tool for analytics and simulation
R as supporting tool for analytics and simulationR as supporting tool for analytics and simulation
R as supporting tool for analytics and simulation
 
High Performance Predictive Analytics in R and Hadoop
High Performance Predictive Analytics in R and HadoopHigh Performance Predictive Analytics in R and Hadoop
High Performance Predictive Analytics in R and Hadoop
 
Revolution Analytics
Revolution AnalyticsRevolution Analytics
Revolution Analytics
 
An introduction to R is a document useful
An introduction to R is a document usefulAn introduction to R is a document useful
An introduction to R is a document useful
 
Big Data Predictive Analytics with Revolution R Enterprise (Gartner BI Summit...
Big Data Predictive Analytics with Revolution R Enterprise (Gartner BI Summit...Big Data Predictive Analytics with Revolution R Enterprise (Gartner BI Summit...
Big Data Predictive Analytics with Revolution R Enterprise (Gartner BI Summit...
 
In-Database Analytics Deep Dive with Teradata and Revolution
In-Database Analytics Deep Dive with Teradata and RevolutionIn-Database Analytics Deep Dive with Teradata and Revolution
In-Database Analytics Deep Dive with Teradata and Revolution
 
Robert Luong: Analyse prédictive dans Excel
Robert Luong: Analyse prédictive dans ExcelRobert Luong: Analyse prédictive dans Excel
Robert Luong: Analyse prédictive dans Excel
 
microsoft r server for distributed computing
microsoft r server for distributed computingmicrosoft r server for distributed computing
microsoft r server for distributed computing
 
MMDS 2014: Myria (and Scalable Graph Clustering with RelaxMap)
MMDS 2014: Myria (and Scalable Graph Clustering with RelaxMap)MMDS 2014: Myria (and Scalable Graph Clustering with RelaxMap)
MMDS 2014: Myria (and Scalable Graph Clustering with RelaxMap)
 
Creating Value That Scales with Revolution Analytics & Alteryx
Creating Value That Scales with Revolution Analytics & AlteryxCreating Value That Scales with Revolution Analytics & Alteryx
Creating Value That Scales with Revolution Analytics & Alteryx
 
Are You Ready for Big Data Big Analytics?
Are You Ready for Big Data Big Analytics? Are You Ready for Big Data Big Analytics?
Are You Ready for Big Data Big Analytics?
 
Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)
Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)
Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)
 
6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...
6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...
6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...
 
#rstats lessons for #measure
#rstats lessons for #measure#rstats lessons for #measure
#rstats lessons for #measure
 
Microsoft and Revolution Analytics -- what's the add-value? 20150629
Microsoft and Revolution Analytics -- what's the add-value? 20150629Microsoft and Revolution Analytics -- what's the add-value? 20150629
Microsoft and Revolution Analytics -- what's the add-value? 20150629
 
Revolution Analytics Podcast
Revolution Analytics PodcastRevolution Analytics Podcast
Revolution Analytics Podcast
 

More from Revolution Analytics

Speeding up R with Parallel Programming in the Cloud
Speeding up R with Parallel Programming in the CloudSpeeding up R with Parallel Programming in the Cloud
Speeding up R with Parallel Programming in the CloudRevolution Analytics
 
Migrating Existing Open Source Machine Learning to Azure
Migrating Existing Open Source Machine Learning to AzureMigrating Existing Open Source Machine Learning to Azure
Migrating Existing Open Source Machine Learning to AzureRevolution Analytics
 
Speed up R with parallel programming in the Cloud
Speed up R with parallel programming in the CloudSpeed up R with parallel programming in the Cloud
Speed up R with parallel programming in the CloudRevolution Analytics
 
Predicting Loan Delinquency at One Million Transactions per Second
Predicting Loan Delinquency at One Million Transactions per SecondPredicting Loan Delinquency at One Million Transactions per Second
Predicting Loan Delinquency at One Million Transactions per SecondRevolution Analytics
 
The Value of Open Source Communities
The Value of Open Source CommunitiesThe Value of Open Source Communities
The Value of Open Source CommunitiesRevolution Analytics
 
Building a scalable data science platform with R
Building a scalable data science platform with RBuilding a scalable data science platform with R
Building a scalable data science platform with RRevolution Analytics
 
The Business Economics and Opportunity of Open Source Data Science
The Business Economics and Opportunity of Open Source Data ScienceThe Business Economics and Opportunity of Open Source Data Science
The Business Economics and Opportunity of Open Source Data ScienceRevolution Analytics
 
Taking R Analytics to SQL and the Cloud
Taking R Analytics to SQL and the CloudTaking R Analytics to SQL and the Cloud
Taking R Analytics to SQL and the CloudRevolution Analytics
 
The Network structure of R packages on CRAN & BioConductor
The Network structure of R packages on CRAN & BioConductorThe Network structure of R packages on CRAN & BioConductor
The Network structure of R packages on CRAN & BioConductorRevolution Analytics
 
The network structure of cran 2015 07-02 final
The network structure of cran 2015 07-02 finalThe network structure of cran 2015 07-02 final
The network structure of cran 2015 07-02 finalRevolution Analytics
 
Simple Reproducibility with the checkpoint package
Simple Reproducibilitywith the checkpoint packageSimple Reproducibilitywith the checkpoint package
Simple Reproducibility with the checkpoint packageRevolution Analytics
 

More from Revolution Analytics (20)

Speeding up R with Parallel Programming in the Cloud
Speeding up R with Parallel Programming in the CloudSpeeding up R with Parallel Programming in the Cloud
Speeding up R with Parallel Programming in the Cloud
 
Migrating Existing Open Source Machine Learning to Azure
Migrating Existing Open Source Machine Learning to AzureMigrating Existing Open Source Machine Learning to Azure
Migrating Existing Open Source Machine Learning to Azure
 
R in Minecraft
R in Minecraft R in Minecraft
R in Minecraft
 
The case for R for AI developers
The case for R for AI developersThe case for R for AI developers
The case for R for AI developers
 
Speed up R with parallel programming in the Cloud
Speed up R with parallel programming in the CloudSpeed up R with parallel programming in the Cloud
Speed up R with parallel programming in the Cloud
 
The R Ecosystem
The R EcosystemThe R Ecosystem
The R Ecosystem
 
R Then and Now
R Then and NowR Then and Now
R Then and Now
 
Predicting Loan Delinquency at One Million Transactions per Second
Predicting Loan Delinquency at One Million Transactions per SecondPredicting Loan Delinquency at One Million Transactions per Second
Predicting Loan Delinquency at One Million Transactions per Second
 
Reproducible Data Science with R
Reproducible Data Science with RReproducible Data Science with R
Reproducible Data Science with R
 
The Value of Open Source Communities
The Value of Open Source CommunitiesThe Value of Open Source Communities
The Value of Open Source Communities
 
The R Ecosystem
The R EcosystemThe R Ecosystem
The R Ecosystem
 
R at Microsoft (useR! 2016)
R at Microsoft (useR! 2016)R at Microsoft (useR! 2016)
R at Microsoft (useR! 2016)
 
Building a scalable data science platform with R
Building a scalable data science platform with RBuilding a scalable data science platform with R
Building a scalable data science platform with R
 
R at Microsoft
R at MicrosoftR at Microsoft
R at Microsoft
 
The Business Economics and Opportunity of Open Source Data Science
The Business Economics and Opportunity of Open Source Data ScienceThe Business Economics and Opportunity of Open Source Data Science
The Business Economics and Opportunity of Open Source Data Science
 
Taking R Analytics to SQL and the Cloud
Taking R Analytics to SQL and the CloudTaking R Analytics to SQL and the Cloud
Taking R Analytics to SQL and the Cloud
 
The Network structure of R packages on CRAN & BioConductor
The Network structure of R packages on CRAN & BioConductorThe Network structure of R packages on CRAN & BioConductor
The Network structure of R packages on CRAN & BioConductor
 
The network structure of cran 2015 07-02 final
The network structure of cran 2015 07-02 finalThe network structure of cran 2015 07-02 final
The network structure of cran 2015 07-02 final
 
Simple Reproducibility with the checkpoint package
Simple Reproducibilitywith the checkpoint packageSimple Reproducibilitywith the checkpoint package
Simple Reproducibility with the checkpoint package
 
R at Microsoft
R at MicrosoftR at Microsoft
R at Microsoft
 

Recently uploaded

Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 

Recently uploaded (20)

Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 

R and Data Science

  • 1. Revolution Confidential Revolution Analytics R and Data Science Joseph B Rickert September 25, 2014
  • 2. Revolution Confidential What is R?  Most widely used data analysis software  Used by 2M+ data scientists, statisticians and analysts  Most powerful statistical programming language  Flexible, extensible and comprehensive for productivity  Platform for beautiful and unique data visualizations  As seen in New York Times, Twitter and Flowing Data  Thriving open-source community  Leading edge of analytics research www.revolutionanalytics.com/what-r
  • 4. Revolution Confidential 4 R’s popularity is growing rapidly R Usage Growth Rexer Data Miner Survey, 2007-2013 • Rexer Data Miner Survey • IEEE Spectrum, July 2014 #9: R Language Popularity IEEE Spectrum Top Programming Languages
  • 5. Revolution Confidential Poll Question #1  What are the statistical programming languages/platforms you are most familiar with? (choose all that apply)  A) R  B) SAS  C) SPSS  D) KXEN  E) Statistica 5
  • 6. Revolution Confidential Tools for Data Science Source: O’Reilly Data Science Survey 6
  • 7. Revolution Confidential 7 R is among the highest-paid IT skills in the US Dice Tech Salary Survey, January 2014 O’Reilly Strata 2013 Data Science Salary Survey
  • 8. Revolution Confidential 8 Photo by Ksayer1 on flickr.
  • 9. Revolution Confidential Why R for Data Science? X <- if (!is.empty.model(mt)) model.matrix(mt, mf, contrasts) else matrix(, NROW(Y), 0L) weights <- as.vector(model.weights(mf)) if (!is.null(weights) && !is.numeric(weights)) stop("'weights' must be a numeric vector") if (!is.null(weights) && any(weights < 0)) stop("negative weights not allowed") offset <- as.vector(model.offset(mf)) if (!is.null(offset)) { if (length(offset) != NROW(Y)) stop(gettextf("number of offsets is %d should equal %d (number of observations)", length(offset), NROW(Y)), domain = NA) } mustart <- model.extract(mf, "mustart") etastart <- model.extract(mf, "etastart") fit <- eval(call(if (is.function(method)) "method" else method, Algorithms x = X, y = Y, weights = weights, start = start, etastart = etastart, mustart = mustart, offset = offset, family = family, control = control, intercept = attr(mt, "intercept") > 0L)) if (length(offset) && attr(mt, "intercept") > 0L) { fit2 <- eval(call(if (is.function(method)) "method" else method, x = X[, "(Intercept)", drop = FALSE], y = Y, weights = weights, offset = offset, family = family, control = control, intercept = TRUE)) if (!fit2$converged) warning("fitting to calculate the null deviance did not converge -- increase 'maxit'?") fit$null.deviance <- fit2$deviance } if (model) fit$model <- mf fit$na.action <- attr(mf, "na.action") if (x) fit$x <- X if (!y) fit$y <- NULL fit <- c(fit, list(call = call, formula = formula, terms = mt, data = data, offset = offset, control = control, method = method, contrasts = attr(X, "contrasts"), xlevels = .getXlevels(mt, mf))) class(fit) <- c(fit$class, c("glm", "lm")) fit 9 Task Views
  • 10. Revolution Confidential R Growth Put this astonishing growth in perspective:  SAS.V 9.3S contains ~ 1,200 commands that are roughly equivalent to R functions  R packages contain a median of 5 functions  Therefore R has ~ 36,820 functions  During 2013 alone, R added more functions than SAS Institute has written in its entire history! Bob Muenchen 10 5882 packages 9/25/14
  • 11. Revolution Confidential Why R for Data Science? Visualizations 11
  • 12. Revolution Confidential Why R for Data Science?  Scripting  Functional programming  Parallel programming  Data structures  Objects  Data Types  Regular expressions  Data connections  Interfaces to other Programming languages 12
  • 13. Revolution Confidential Why R for Data Science? Data Manipulation 13 “It's often said that 80% of the effort of analysis is spent just getting the data ready to analyse, the process of data cleaning. Data cleaning is not only a vital first step, but it is often repeated multiple times over the course of an analysis as new problems come to light.” Hadley Wickham Tidy Data
  • 14. Revolution Confidential Why R for Data Science? R Integrates  Web applications  Internet graphics  D3  Potly  Other Languages  C, C++  Java  BI Tools  Data bases  SQL  MongoDB 14
  • 15. Revolution Confidential Poll Question #2  What are the data platforms that you are connecting to regularly? (choose all that apply)  A) Hadoop  B) Spark  C) Cloud-based (Azure/AWS/Google)  D) Data Warehouses  E) Servers (Grid or Cluster) 15
  • 16. Revolution Confidential Why R for Data Science Hadoop Servers & Clusters Data Warehouses R Scales
  • 17. Revolution Confidential Poll Question #3  What are the types of models that you are working with most? (choose all that apply)  A) Linear models / Regression / GLM  B) Decision Trees / Random Forests  C) Survival Models  D) GBM  E) Time Series models 17
  • 18. Let’s look at some code. www.revolutionanalytics.com 1.855.GET.REVO Twitter: @RevolutionR
  • 19. Revolution Confidential 19 Why is R Right for Data Science?  R is open source  R is a powerful language  Data Manipulation  Computational Statistics  Machine Learning  R is an innovation engine  R has a rich and expanding ecosystem
  • 20. Revolution Confidential 20 Q&A / Resources R Code and Markdown Files https://github.com/joseph-rickert/DataScienceRWebinar What is R? revolutionanalytics.com/what-is-r Companies using R revolutionanalytics.com/companies-using-r AcademyR training revolutionanalytics.com/AcademyR AcademyR Certification revolutionanalytics.com/AcademyR-certification Contact Revolution Analytics revolutionanalytics.com/contact-us
  • 21. Thank you Revolution Analytics is the leading commercial provider of software and support for the popular open source R statistics language. www.revolutionanalytics.com, 1.855.GET.REVO, Twitter: @RevolutionR 21

Editor's Notes

  1. Image reference: http://www.facebook.com/notes/facebook-engineering/visualizing-friendships/469716398919
  2. Dice Tech Salary Survey, January 2014 O’Reilly Strata 2013 Data Science Salary Survey