Through the firewall with miniCRAN
Andrie de Vries 
andrie@revolutionanalytics.com 
@RevoAndrie
OUR COMPANY 
The leading provider of advanced analytics software and services based on open source R, since 2007 
OUR SOFTWARE 
The only Big Data, Big Analytics software platform based on the data science language R 
SOME KUDOS 
Visionary, Gartner Magic Quadrant for Advanced Analytics Platforms, 2014
Through the firewall
Overview 
 Situation 
– CRAN and other package repositories are a wonderful source of innovation 
– It is very easy to create a complete mirror of CRAN 
 Complication 
– Organisations want to control what sits behind the firewall 
– Rationale: licensing as well as security concerns 
 Critical question 
– How to manage an internally consistent set of packages in your organisation?
Enterprise requires CRAN behind the firewall 
 Security 
 Separated from Internet 
 Virus and/or malicious code detection 
 License compliance 
 Subset of approved packages 
[Diagram: CRAN → (mirror) → local CRAN mirror → (scan, virus check and quarantine) → internally approved subset of CRAN mirror → (publish internally) → R users]
Solutions: 
 Use rsync to create a full mirror 
– Described in the Revolution R installation manual 
 OR 
 Use the miniCRAN package 
– Specify packages 
– Download to local repository 
– Create additional repository files to support available.packages() and install.packages() 
– Repeat for each version of R 
[Diagram: CRAN → (mirror) → partial CRAN mirror → (scan, virus check and quarantine) → internally approved subset of CRAN mirror → (publish internally) → R users]
miniCRAN
Terminology 
 Repository 
– Specific file structure with package source and/or binaries as well as PACKAGES metadata 
– For example CRAN or BioConductor 
 Library 
– A folder on your machine containing packages 
– Separate folder for each installed version of R 
 Package 
– The actual package 
– For example ggplot2 or MASS
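The three terms can be told apart from inside an R session. The following is a minimal sketch using only base R; the variable names are illustrative and the printed values depend on the local installation.

```r
# Repository: where packages are downloaded from (a named vector of URLs).
repos <- getOption("repos")
print(repos)

# Library: folder(s) on this machine that hold installed packages.
libs <- .libPaths()
print(libs)

# Packages: what is actually installed across those library folders.
installed <- rownames(installed.packages())
head(installed)
```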
Anatomy of a CRAN mirror 
 A repository contains packages in both source and binary format (for 
multiple versions of R) 
Root 
∟ src 
    ∟ contrib (source packages) 
∟ bin (binary packages; multiple folders for each R version) 
    ∟ windows/contrib/ 
    ∟ macosx/contrib/ 
    ∟ macosx/mavericks/contrib 
    ∟ macosx/leopard/contrib 
∟ PACKAGES (index file, one per contrib folder)
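To make the layout concrete, here is a small sketch that builds a toy source-only repository in a temporary folder with base R. The package name toypkg and the folder names are hypothetical, chosen only for illustration; a real repository would hold real package tarballs.

```r
# Build a toy repository skeleton in a temporary folder.
# "toypkg" is a hypothetical package used only for illustration.
repo <- file.path(tempdir(), "toy-repo")
src_contrib <- file.path(repo, "src", "contrib")
dir.create(src_contrib, recursive = TRUE, showWarnings = FALSE)

# A source package is a tarball named <package>_<version>.tar.gz that
# contains <package>/DESCRIPTION (among other files).
pkg_parent <- file.path(tempdir(), "toy-src")
dir.create(file.path(pkg_parent, "toypkg"), recursive = TRUE,
           showWarnings = FALSE)
writeLines(c("Package: toypkg", "Version: 0.1", "Title: Toy Package",
             "Description: Illustration only.", "License: GPL-2"),
           file.path(pkg_parent, "toypkg", "DESCRIPTION"))
old_wd <- setwd(pkg_parent)
tar(file.path(src_contrib, "toypkg_0.1.tar.gz"), files = "toypkg",
    compression = "gzip", tar = "internal")
setwd(old_wd)

# Write the PACKAGES index that turns the folder into a repository.
n_indexed <- tools::write_PACKAGES(src_contrib, type = "source")
list.files(src_contrib)
```

The index is written next to the tarballs in src/contrib; binary folders under bin would each need their own index in the same way.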
Step by step guide 
 List desired packages 
 Determine all dependencies 
– (recursively) 
 Download source and binaries 
– For every version of R you want to support 
 Create index file: PACKAGES 
 Make your local repo available in the organisation
Dependency explosion 
[Figure: dependency graph for ggplot2, data.table, lattice, xts, knitr, shiny and plyr. Edge types: Imports, Depends, Suggests, Enhances, LinkingTo. The full set of packages is listed in the pkgDep() output below.]
> pkgDep(c("ggplot2", "data.table", 
"lattice","xts", "knitr", "shiny", "plyr")) 
[1] "ggplot2" "data.table" "lattice" 
[4] "xts" "knitr" "shiny" 
[7] "plyr" "digest" "gtable" 
[10] "reshape2" "scales" "proto" 
[13] "MASS" "Rcpp" "stringr" 
[16] "RColorBrewer" "dichromat" "munsell" 
[19] "labeling" "colorspace" "zoo" 
[22] "evaluate" "formatR" "highr" 
[25] "markdown" "mime" "httpuv" 
[28] "caTools" "RJSONIO" "xtable" 
[31] "htmltools" "bitops" "SparseM" 
[34] "survival" "Formula" "latticeExtra" 
[37] "cluster" "rpart" "nnet" 
[40] "acepack" "foreign" "maps" 
[43] "sp" "mvtnorm" "TH.data" 
[46] "sandwich" "nlme" "Matrix" 
[49] "bit" "timeDate" "quadprog" 
[52] "Hmisc" "BH" "codetools" 
[55] "iterators" "quantreg" "mapproj" 
[58] "hexbin" "maptools" "multcomp" 
[61] "testthat" "mgcv" "chron" 
[64] "reshape" "fastmatch" "bit64" 
[67] "KernSmooth" "timeSeries" "tseries" 
[70] "its" "fts" "tis" 
[73] "testit" "rgl" "XML" 
[76] "RCurl" "Cairo" "abind" 
[79] "foreach" "doMC" "itertools"
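The recursive expansion behind this list can be illustrated offline with base R. This sketch uses tools::package_dependencies() rather than miniCRAN's pkgDep(), and runs it on a made-up package database; the package names A to D are hypothetical. It follows only Depends, Imports and LinkingTo, whereas the output above appears to include suggested packages as well.

```r
# A toy package database with the same columns as available.packages().
# Package names A-D are hypothetical and exist only for this illustration.
db <- rbind(
  c(Package = "A", Depends = NA, Imports = "B", LinkingTo = NA,
    Suggests = NA, Enhances = NA),
  c(Package = "B", Depends = "C (>= 1.0)", Imports = NA, LinkingTo = NA,
    Suggests = NA, Enhances = NA),
  c(Package = "C", Depends = NA, Imports = NA, LinkingTo = NA,
    Suggests = NA, Enhances = NA),
  c(Package = "D", Depends = NA, Imports = NA, LinkingTo = NA,
    Suggests = "A", Enhances = NA)
)
rownames(db) <- db[, "Package"]

# Follow hard dependencies recursively, starting from package A.
deps <- tools::package_dependencies(
  "A", db = db,
  which = c("Depends", "Imports", "LinkingTo"),
  recursive = TRUE
)

# The set to download is the requested package plus everything it pulls in.
full_set <- union("A", deps[["A"]])
print(full_set)
```

Passing the result of available.packages() as db would run the same calculation against a real repository.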
Using miniCRAN 
 The miniCRAN package is available at: 
– CRAN 
– github (development version) 
library(miniCRAN) 
vignette("miniCRAN")
Using miniCRAN 
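library(miniCRAN) 
# Specify list of packages to download 
pkgs <- c("foreach") 
# Specify CRAN mirror to use 
revolution <- c(CRAN="http://cran.revolutionanalytics.com") 
# Determine the packages and all their dependencies 
pkgList <- pkgDep(pkgs, repos=revolution, type="source") 
# Folder for the local repository (here a temporary folder) 
dir.create(pth <- file.path(tempdir(), "miniCRAN")) 
# Make repo for source and win.binary 
makeRepo(pkgList, path=pth, repos=revolution, type="source") 
makeRepo(pkgList, path=pth, repos=revolution, type="win.binary")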
Referring to repo on local file system 
 Use file:/// 
install.packages("ggplot2", 
repos="file:///path/to/file/")
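It can help to see where R will actually look inside such a repository. The sketch below uses utils::contrib.url() to expand a repository root into the folder that is searched for each package type; the file:/// path is a placeholder, and the binary folder depends on the running R version.

```r
# Placeholder repository root; replace with the path to your own repo.
repo <- "file:///path/to/repo"

# Folder searched for source packages.
src_url <- utils::contrib.url(repo, type = "source")
print(src_url)

# Folder searched for Windows binaries (ends in the R major.minor version).
win_url <- utils::contrib.url(repo, type = "win.binary")
print(win_url)
```

If install.packages() cannot find a package in a local repository, checking that a PACKAGES file exists in the folder printed here is a reasonable first step.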
Making it stick 
 Configure Rprofile for every user 
 To set your repo permanently, add 
options(repos=c(CRAN="file:///path/to/repo"))
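A slightly more defensive variant changes only the CRAN entry and leaves any other configured repositories in place. This is a sketch of lines that could go into an Rprofile file (for example a site-wide Rprofile.site); the file:/// path is a placeholder.

```r
# Lines for an Rprofile file. The file:/// path is a placeholder;
# point it at your own repository.
local({
  r <- getOption("repos")
  r["CRAN"] <- "file:///path/to/repo"
  options(repos = r)
})

# Check the setting that install.packages() will use by default.
getOption("repos")
```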
Example code 
 Find an example session at gist 
– https://gist.github.com/andrie/d68834d68f4724432929
Reproducibility
Package Hell 
"I heard you need to create a TPS Report. Here, I’ve got an R script that does that already. Oh, you need to download these 5 packages first." 
"I did, and it still doesn’t work!" 
"Well, it worked when I wrote it 3 weeks ago." 
"Grr. Package updates…"
Sharing a script reproducibly … and simply 
# Run with R 3.1.0 
require(RRT) 
checkpoint(snapshot="2014-06-27") 
# find packages used in this project 
# install packages in checkpoint folder 
# set library path to use checkpointed packages 
require(ggplot2) 
require(data.table) 
require(knitr) 
...
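The last commented step, setting the library path, can be illustrated with base R alone. This is a sketch of the general idea rather than of how RRT implements it; the folder name checkpoint-lib is an assumption made for the example.

```r
# Hypothetical project-specific library folder (the name is an assumption).
project_lib <- file.path(tempdir(), "checkpoint-lib")
dir.create(project_lib, recursive = TRUE, showWarnings = FALSE)

# Put the project library first, so packages installed there are found
# before any copies in the default libraries.
old_paths <- .libPaths()
.libPaths(c(project_lib, old_paths))
.libPaths()[1]
```

Packages installed while this setting is active go into the project folder by default, which keeps them separate from the user's usual library.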
The R reproducibility toolkit 
 Server-side solution: MRAN 
 Client side R package: RRT 
[Diagram: CRAN → MRAN (daily snapshots) → RRT package] 
require(RRT) 
checkpoint("2014-06-27")
MRAN and RRT are actively developed 
 Try the development version by installing from github: 
install.packages("devtools") 
library("devtools") 
devtools::install_github("RevolutionAnalytics/RRT") 
library("RRT")
Conclusion
Conclusion 
 To use a private version of CRAN in your organisation, either: 
– Use rsync to create a full CRAN mirror 
– Use the miniCRAN package to selectively create a mini version of CRAN 
 For reproducible research, look out for imminent announcements about 
RRT and MRAN
Thank you. 
www.revolutionanalytics.com 
Twitter: @RevolutionR
