1. Executive Intro to R
William M. Cohee
November 2016
Prepared using Apache OpenOffice 4.1.2
2. Presenter Bio
● 15+ years of Wall Street Technology
experience
● Expertise in front-office Fixed Income
Systems, Analytics, Pricing, Instrument,
& Entity Reference Data Management
● BA, Computer Science
● MS, Information Systems Engineering
● Certified Bloomberg Specialist
● Currently in the Chief Data Office
@ HSBC
● www.linkedin.com/in/billcohee
3. Topic
● Tool of choice for Statisticians, Data Analysts, & Data Scientists
● Popularity and use of R is on the rise
● R Community is vibrant & the talent pool is growing rapidly
● R is evolving from its statistical computing roots into a development
platform for robust, reusable software
● Many commercial, third-party systems are adding support for R
● Oracle and Microsoft are becoming big players
● R can be used to manage & analyze data in Hadoop
● A growing ecosystem is accelerating industry acceptance/adoption
● R-savvy IT leaders can deliver more effective, lower-cost solutions
4. Agenda
● What is R [slides 5-8]
● What can R be used for [slides 9-10]
● Recap & where to learn more [slides 11-12]
5. R – What is it?
● A powerful computing environment for Data Analysis & Statistics
● 'R' proper is an open-source programming language
● Developed as a dialect of 'S'
● S was developed at Bell Labs, c.1976, to 'turn ideas into software,
quickly and faithfully'
● strong desire at the time for an alternative to writing FORTRAN
subroutines for analyzing data
● Ross Ihaka and Robert Gentleman recognized as original creators
of R while professors at the University of Auckland in New Zealand
c.1995
● v1.0.0 was released in February 2000
6. R – What is it?
● Traditional user base consists of
● Researchers
● Statisticians
● Academia
● 'New wave' R users
● Wall Street Desk Quants
● Risk Analysts & Financial Modelers
● Data Scientists
● The advent of Big Data and the nascent field of Data Science are serving
as catalysts for the sudden rise of this 16+-year-old technology
7. R – What is it?
● When people speak of R, they are usually referring to the broader
ecosystem, not the language
● R for Windows, Microsoft R Open – command line interpreters
● RStudio, R Tools for Visual Studio – IDEs (Integrated Development Environments)
● user-friendly, robust, graphical front-ends for working with R
● CRAN and MRAN
● Comprehensive R Archive Network
● Microsoft R Application Network
● repositories of open-source extensions to R known as 'Packages'
● think of a Package as a pre-built library of functions & data
8. R – What is it?
● R was not created with 'coders' in mind
● Creators were focused on how to make Data Analysis easier on the
users of data
● Geared toward the power-user who has to work with large amounts
of data while avoiding coding as much as practically possible
● Why is it called R?
● the co-creators were Ross & Robert!
● it was trendy to give languages letter names (B, C, S, etc.)
● As R becomes more mainstream, it may have everyday applications
for people in roles requiring them to work with or 'be in the data'
9. R – What can it be used for?
● For presenting & solving data-oriented problems
● Exploratory Analysis
● discovering data about the data
● clustering & visualizing data
● quickly building summaries of the data being worked with
● Wrangling/Munging & re-shaping data
● working with structured & unstructured data
● sub-setting, filtering, and merging data
● making data 'tidy' – datasets that facilitate some kind of analysis
● dplyr & tidyr Packages popular
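As a rough illustration of the wrangling steps on this slide, here is a minimal sketch in base R (no add-on Packages required). The data frames `trades` and `issuers`, their columns, and their values are hypothetical, invented purely for this example; dplyr offers comparable verbs for the same steps.

```r
# Hypothetical toy data, invented purely for illustration
trades <- data.frame(
  issuer_id = c(1, 2, 2, 3),
  notional  = c(100, 250, 50, 400)
)
issuers <- data.frame(
  issuer_id = c(1, 2, 3),
  sector    = c("Bank", "Energy", "Bank")
)

# Filtering: keep only trades with a notional of at least 100
large <- subset(trades, notional >= 100)

# Merging: attach each issuer's sector (a join on issuer_id)
joined <- merge(large, issuers, by = "issuer_id")

# Summarizing: total notional per sector
totals <- aggregate(notional ~ sector, data = joined, FUN = sum)
print(totals)
```

Each step returns a new data frame, so intermediate results such as `large` and `joined` can be inspected before moving on.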
10. R – What can it be used for?
● Predictive Analytics & Machine Learning
● modeling, sampling, forecasting, trending, regression
● caret, h2o, quantmod Packages popular
● Data Visualization
● powerful, publication-quality graphing & plotting Packages
● ggplot2, leaflet, and shiny Packages popular
● shiny example: Where are the so-called 'SuperZIPs'?
● US postal codes scored on a scale of 0-100, 100 being highest
● score is a function of median household income and education level
● Top 5% are deemed the 'SuperZIPs'
● the result is presented as an R + shiny powered interactive data map
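The 'top 5%' rule in the SuperZIP example can be sketched in a few lines of base R. This is only an illustration under stated assumptions: the scores below are random placeholder numbers, not real postal-code data, and the actual scoring formula (based on income and education) is not reproduced here. The names `scores`, `cutoff`, and `super_zip` are hypothetical.

```r
set.seed(42)  # make the placeholder data reproducible

# Placeholder data: 200 made-up postal codes with random scores from 0 to 100
scores <- data.frame(
  zip   = sprintf("%05d", 1:200),
  score = runif(200, min = 0, max = 100)
)

# The score that marks the 95th percentile
cutoff <- quantile(scores$score, probs = 0.95)

# Flag every postal code at or above the cutoff as a 'SuperZIP'
scores$super_zip <- scores$score >= cutoff

# How many codes made the cut
print(sum(scores$super_zip))
```

A shiny application would wrap logic of this kind in an interactive front-end; that part is omitted to keep the sketch self-contained.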
11. Recap & Resources
● R is an open-source environment that can be used for complex Data
'work'
● essential part of a Data Scientist's Toolbox
● Also a functional programming language
● can be used to create programs to automate routine, repetitive data
tasks and for general software development
● Becoming a mainstream tool
● benefiting from increased commercial support
● maturing ecosystem of Packages
● Agility, flexibility, growing talent pool, & low cost of ownership all a
part of R's appeal
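To make the point about automating routine, repetitive data tasks concrete, here is a minimal base R sketch. The helper `summarize_column` and the `daily` values are hypothetical, made up for this example: one small function is written once and then applied to every dataset in a list.

```r
# Hypothetical helper: summarize one numeric vector
summarize_column <- function(x) {
  c(
    mean    = mean(x, na.rm = TRUE),  # average, ignoring missing values
    max     = max(x, na.rm = TRUE),   # largest value, ignoring missing values
    missing = sum(is.na(x))           # count of missing values
  )
}

# Toy data standing in for two daily extracts (values are invented)
daily <- list(
  mon = c(10, 12, NA, 15),
  tue = c(9, 11, 14, 13)
)

# Apply the same summary to every extract in one call
report <- sapply(daily, summarize_column)
print(report)
```

Adding another day's extract to `daily` needs no change to the summary logic, which is the kind of reuse the recap refers to.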
12. Recap & Resources
● Where to learn more...
● The R Homepage: https://www.r-project.org
● RStudio: https://www.rstudio.com/products/RStudio
● CRAN: https://cran.r-project.org
● Oracle and R: http://bit.ly/2dUC24a
● Microsoft and R: http://bit.ly/2e5CT5m
● The R Consortium: https://www.r-consortium.org
● Playlist of R video tutorials: http://bit.ly/1iRcgyn
● Free Courses
● https://www.coursera.org/learn/r-programming
● https://www.datacamp.com/courses/free-introduction-to-r