4. Internal Product 2:
Automated reports
Thursday morning:
Automated Business Reporting with R (Zhengying (Doro) Lour)
R + bash + email
R + markdown + web server
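The "R + bash + email" pattern can be sketched as a small script that a scheduler runs non-interactively with Rscript. This is only an illustration: the file name, the cron line in the comment, and the sales data are hypothetical and do not come from the talk.

```r
# Hypothetical report script, e.g. saved as daily_report.R and scheduled
# from cron with a line such as:
#   0 7 * * * Rscript /path/to/daily_report.R
# (the path and the schedule are placeholders)

# Made-up sales data standing in for a real query result
sales <- data.frame(
  region = c("North", "South", "North", "South"),
  amount = c(120, 80, 100, 95)
)

# Build the report as a character vector, one line per region
build_report <- function(df) {
  totals <- aggregate(amount ~ region, data = df, FUN = sum)
  c(
    paste("Sales report for", format(Sys.Date(), "%Y-%m-%d")),
    sprintf("%s: %.2f", as.character(totals$region), totals$amount)
  )
}

report_lines <- build_report(sales)

# Write the report to a text file; a shell wrapper could then mail it
report_file <- file.path(tempdir(), "daily_report.txt")
writeLines(report_lines, report_file)
```

Keeping the report logic in a function makes it easy to test separately from the scheduling and delivery steps, which stay in the shell.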
5. Internal Product 3:
The Internal R package
• Data APIs
• Business specific metrics
• Custom plotting functions
• Custom data manipulation utilities
Thursday morning:
An R tools platform in Cosmetic Industry (Jean-Francois Collin)
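As a rough sketch of what such an internal package might contain, the functions below show one business-specific metric and one plotting wrapper with shared defaults. The names growth_rate and plot_metric, and the revenue figures, are hypothetical.

```r
# Hypothetical business metric: period-over-period growth rate.
# In an internal package this might live in R/metrics.R and be exported.
growth_rate <- function(x) {
  stopifnot(is.numeric(x), length(x) >= 2)
  # (current - previous) / previous; the first period has no predecessor
  c(NA_real_, diff(x) / head(x, -1))
}

# Hypothetical plotting wrapper that applies shared house defaults
plot_metric <- function(x, title = "Metric") {
  plot(x, type = "b", main = title, xlab = "Period", ylab = "Value",
       col = "steelblue", pch = 19)
  invisible(x)
}

# Made-up revenue series
revenue <- c(100, 110, 121)
growth <- growth_rate(revenue)
```

An analyst would then call plot_metric(growth, title = "Revenue growth") instead of repeating the styling arguments in every script.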
7. External Product 1:
Customer facing web app
Wednesday afternoon:
Rapid Prototyping with R/Shiny at McKinsey (Aaron Horowitz)
http://www.showmeshiny.com/
8. External Product 2:
analytical back-end
Wednesday afternoon:
Deploying R into Business Intelligence and Real-time Applications (Louis Bajuk-Yorgan)
Zillow’s Big Data and Real-time Services in R (Yeng Bun)
10. More good example applications:
• http://blog.revolutionanalytics.com/2014/06/how-data-driven-companies-use-r-to-compete.html
11. Ops: Managing an R Environment
• Overall: not complex, but there are pain points:
• R library management
• CRAN, non-CRAN and internal packages
• Version management
• Dependency management (pulling all dependencies)
• Non-R dependencies (especially C++ and Java)
• Hardware specifications: How much RAM is enough?
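One way to keep version management visible is a startup check that compares installed package versions against a required minimum. The sketch below uses only base R functions (requireNamespace, packageVersion, package_version). The list of required packages is hypothetical and uses packages that ship with R so that it runs anywhere.

```r
# Hypothetical list of required packages and minimum versions.
# "stats" and "utils" ship with R, so this runs on any installation.
required <- c(stats = "3.0.0", utils = "3.0.0")

check_packages <- function(required) {
  # Look up the installed version of each package, or NA if it is missing
  installed <- vapply(names(required), function(pkg) {
    if (requireNamespace(pkg, quietly = TRUE)) {
      as.character(packageVersion(pkg))
    } else {
      NA_character_
    }
  }, character(1), USE.NAMES = FALSE)

  # A package passes if it is installed at or above the required version
  ok <- mapply(function(inst, req) {
    !is.na(inst) && package_version(inst) >= package_version(req)
  }, installed, unname(required), USE.NAMES = FALSE)

  data.frame(
    package   = names(required),
    required  = unname(required),
    installed = installed,
    ok        = ok,
    stringsAsFactors = FALSE
  )
}

status <- check_packages(required)
print(status)
```

A check like this does not resolve dependencies or non-R libraries; it only reports drift between the environment and what a script expects.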
12. Conclusion: Why R?
• Plotting
• Rich analytical library
• More than a DSL: end-to-end functionality from data APIs to web apps
• Solid IDE support
• Sturdy, stable, easy-to-support platform
• Rapid prototyping
15. Tools: data manipulation
• Base R features
  • Data structures: the data.frame
  • Vectorized data manipulation: apply, tapply, lapply…
  • Data structures: ts
  • Comprehensive, elegant missing data handling (NA)
• Packages
  • Wickham school: reshape2/plyr/dplyr/tidyr
  • data.table
  • Time series: zoo, xts, lubridate
  • Spatial data tools: sp/maptools
  • The ‘G’ school: gdata
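The base R features listed above (data.frame, vectorized operations, tapply, NA handling) can be illustrated with a few lines. The order data are made up for the example.

```r
# Made-up order data with one missing value
orders <- data.frame(
  customer = c("a", "b", "a", "c", "b"),
  amount   = c(10, NA, 30, 5, 20)
)

# NA propagates through arithmetic unless it is handled explicitly
total_with_na <- sum(orders$amount)
total_no_na   <- sum(orders$amount, na.rm = TRUE)

# Grouped summary with tapply: total amount per customer, ignoring NA
totals <- tapply(orders$amount, orders$customer, sum, na.rm = TRUE)

# Vectorized transformation: no explicit loop is needed
orders$amount_with_tax <- orders$amount * 1.2

# Keep only the rows with no missing values
complete <- orders[complete.cases(orders), ]
```

Packages such as dplyr or data.table express the same grouped summary more compactly, but the base functions are always available.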
19. Tools: parallel computing
• parallel: many features formerly distributed among separate packages have recently been collected into this base R package
• Revolution Analytics
• Map-Reduce: rmr/RHadoop
• H2O (0xdata)
• SparkR (not on CRAN yet, look on GitHub)
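A minimal sketch of the base parallel package, assuming a machine that can start two local worker processes. The boot_mean function and its inputs are hypothetical. Fixing the seed inside each task is what makes the parallel and sequential results comparable.

```r
library(parallel)

# Hypothetical CPU-bound task: mean of one bootstrap resample
boot_mean <- function(seed, values) {
  set.seed(seed)
  mean(sample(values, replace = TRUE))
}

values <- c(2, 4, 6, 8, 10)
seeds <- 1:8

# Start two local worker processes (works on all platforms)
cl <- makeCluster(2)
par_result <- parLapply(cl, seeds, boot_mean, values = values)
stopCluster(cl)

# The same computation run sequentially, for comparison
seq_result <- lapply(seeds, boot_mean, values = values)
same <- identical(par_result, seq_result)
```

On Unix-like systems mclapply is a fork-based alternative with a simpler call, but it cannot use more than one core on Windows.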
20. Tools: big or out of memory computing
• dplyr: supports database-backed data structures
• ff: supports file-based data
• biglm/bigmemory: shared memory matrices
• HadoopStreaming
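The general idea behind file-based tools can be shown in base R by streaming a file in fixed-size chunks so that only one chunk is in memory at a time. This is not the API of ff, bigmemory, or any other package listed above. The function name chunked_mean and the sample file are hypothetical.

```r
# Write a small sample file; a real input would be too large for memory
path <- file.path(tempdir(), "values.txt")
writeLines(as.character(1:1000), path)

# Compute the mean by reading fixed-size chunks through a connection
chunked_mean <- function(path, chunk_size = 100) {
  con <- file(path, open = "r")
  on.exit(close(con))
  total <- 0
  n <- 0
  repeat {
    lines <- readLines(con, n = chunk_size)
    if (length(lines) == 0) break   # end of file reached
    chunk <- as.numeric(lines)
    total <- total + sum(chunk)
    n <- n + length(chunk)
  }
  total / n
}

result <- chunked_mean(path)
```

This works for statistics that can be accumulated incrementally, such as sums and counts. Quantiles or model fits need the specialised packages.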
Introduce self
State goal of presentation: overview of the ways that R is being used
Define ‘product’ for the non-business folks (deliverable)
Bread and butter for many; everyone does some of this; even non-primary R users often turn to R for this
Why R: R has always tried to be a platform for statistical analysis
R fits neatly into this kind of pipeline; there are useful command-line utilities
This product is basically an extension of the automated reporting idea.