3. For everyone?
● Manual/one-off analyses
○ Not production-level code
● Personal interest
● No deep learning
Survey:
● PhD [student]?
● >1 stats/ML class?
● Can code?
14. R libraries: Hadleyverse
Step                 Libraries
Get data             rvest, xml2, readxl
Clean                dplyr, tidyr, stringr
Explore / visualise  ggplot2
Publish
... and many others from Hadley Wickham.
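The clean and explore steps from the table can be sketched roughly as follows. This is a minimal illustration, not the author's pipeline: the `raw` data frame, its column names and its values are made up, and the "get data" step (rvest, xml2 or readxl in practice) is replaced by an inline data frame so the snippet runs on its own.

```r
library(dplyr)
library(tidyr)
library(stringr)
library(ggplot2)

# Hypothetical raw listings; in practice this step would use rvest, xml2 or readxl
raw <- data.frame(
  listing = c("Kesklinn | 2 rooms", "Annelinn | 1 rooms", "Kesklinn | 3 rooms"),
  price   = c(" 450 EUR", "280 EUR ", "600 EUR")
)

# Clean: split one column into two (tidyr), pull numbers out of strings (stringr),
# and convert them to numeric types inside mutate() (dplyr)
clean <- raw %>%
  separate(listing, into = c("district", "rooms"), sep = " \\| ") %>%
  mutate(
    rooms = as.integer(str_extract(rooms, "\\d+")),
    price = as.numeric(str_extract(str_trim(price), "\\d+"))
  )

# Explore / visualise: price against number of rooms, coloured by district
p <- ggplot(clean, aes(x = rooms, y = price, colour = district)) +
  geom_point()
print(p)
```

Each package handles one narrow job, and the `%>%` pipe chains them so the script reads top to bottom in the same order as the steps in the table.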
15. Alternatives
Excel / Google Sheets
● External data sources
● Google Apps Script
○ Google Translate API
○ Sending e-mail
○ …
● Not easy to reproduce analyses
Tons of other software
R & Python worth the learning curve
16. R is easy: reading data
# Read the semicolon-separated apartment rent data for Tartu
apartments <- read.csv2("data/apartment_rent_tartu.csv",
                        sep=";", header=TRUE)
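A self-contained sketch of the same kind of call, assuming a made-up three-row file written to a temporary path (the rows and the temp file are hypothetical, not the author's data). It illustrates that `read.csv2` already defaults to `sep=";"` and `header=TRUE`, and that it reads a comma as the decimal mark.

```r
# Hypothetical sample rows; the real file is data/apartment_rent_tartu.csv
path <- tempfile(fileext = ".csv")
writeLines(c(
  "Linnaosa;Tube;HindKohandatud",
  "Kesklinn;2;450,5",
  "Annelinn;1;280,0",
  "Karlova;3;520,0"
), path)

# read.csv2 defaults: sep = ";", dec = ",", header = TRUE
toy <- read.csv2(path)

# Inspect column types: HindKohandatud should come out numeric, not character
str(toy)
```

If a file uses a decimal point instead of a decimal comma, `read.csv` (comma-separated) or an explicit `dec="."` argument would be the usual alternative.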
17. R is easy: dplyr
library(dplyr)
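# Average adjusted price (HindKohandatud) per city district (Linnaosa),
# sorted from most to least expensive; KeskmineHind = average price
apartments %>%
  group_by(Linnaosa) %>%
  summarise(KeskmineHind=mean(HindKohandatud)) %>%
  arrange(desc(KeskmineHind))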
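To try the same pattern without the original file, here is a minimal sketch on a toy data frame. The values are invented, and the extra `Kuulutusi` column (a listing count per district) is a hypothetical addition, not part of the slides.

```r
library(dplyr)

# Hypothetical toy data using the same column names as the slides
toy <- data.frame(
  Linnaosa       = c("Kesklinn", "Kesklinn", "Annelinn", "Annelinn", "Karlova"),
  HindKohandatud = c(500, 400, 300, 250, 350)
)

avg_by_district <- toy %>%
  group_by(Linnaosa) %>%
  summarise(
    KeskmineHind = mean(HindKohandatud),  # average price per district
    Kuulutusi    = n()                    # number of listings (hypothetical column)
  ) %>%
  arrange(desc(KeskmineHind))

print(avg_by_district)
```

Reporting the group size next to the mean makes it easier to spot districts whose average rests on only a handful of listings.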
18. R is easy: linear regression
# Fit a linear model: adjusted price (HindKohandatud) explained by number of rooms (Tube)
fit <- lm(HindKohandatud ~ Tube, data=apartments)
summary(fit)
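Beyond `summary()`, the fitted model can be queried directly. The sketch below uses a made-up four-row data frame (the numbers are hypothetical, chosen only so the snippet runs on its own) to show how to read off the coefficients and predict the price of an unseen apartment.

```r
# Hypothetical toy data: number of rooms (Tube) and adjusted price (HindKohandatud)
toy <- data.frame(
  Tube           = c(1, 2, 3, 4),
  HindKohandatud = c(250, 360, 440, 550)
)

fit_toy <- lm(HindKohandatud ~ Tube, data = toy)

# Intercept and slope; the slope is the estimated price change per extra room
coef(fit_toy)

# Predicted price for a hypothetical 5-room apartment
pred <- predict(fit_toy, newdata = data.frame(Tube = 5))
print(pred)
```

`predict()` expects `newdata` to contain columns named exactly like the predictors in the model formula, here `Tube`.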
22. Easy to use vs. hard to use
[Chart. Axes: easy to use / hard to use; limited / powerful. Tools plotted: ggplot2, D3.js, D3 derivatives, Excel, GSheets, AI]
23. How to reach an audience
● Social media
● Start a blog
○ stat24.ee
○ pungas.ee
● Offer free content
○ Newspapers (tip lines)
○ Guest posts on blogs
● Push to Estonian data science community
○ TODO: FB group? Community blog?
25. Examples
Apartment prices: R + D3.js, 18k hits
Salaries of public servants: R + D3.js, 38k hits
Study data: R + D3.js, 3k hits
Election promise calculator: D3.js, 42k hits
Bondora: R
Alcohol deaths: Illustrator
26. News & inspiration
Mailing lists:
Information is Beautiful, Data Science Weekly, Data Elixir
Blogs:
FiveThirtyEight, R-bloggers, Stat24, Mike Bostock