2. 01 Introduction to the R and Measure communities
02 Overview of the Tidyverse
03 Environments and version control
04 Tidy data
05 Data activation
3. Mark’s #measure #rstats timeline
2006, 2007, 2010, 2012, 2015, 2016, 2018
SEO reporting: Pulling Google Analytics and Search Console into Excel dashboards
Move to Denmark: Move into Google Analytics consultancy, start using BigQuery and Google Cloud
#measure
Open source: Start creating open-source R packages, blogging at code.markedmondson.me
Python: Start using Python to create SEO tools
R: Start using R to create attribution and forecasting models on Google Analytics data
#rstats
Google Developer Expert: Publish Shiny applications, become a GDE for Google Analytics/Cloud. Develop R for Google Cloud resources including dartistics.com and googleAnalyticsR
Data Engineering: Creating data pipelines for machine learning using Google Cloud, R, Python
4. #RSTATS COMMUNITY
• R has been working on the process of data analysis since 1993
• 2 million users estimated in 2012 (5 million users now?)
• 10,000+ packages
• @rstudio has 60k followers
• 1,984 R jobs at www.indeed.nl
5. #MEASURE COMMUNITY
• Since ???? (1993 WebTrends?)
• MeasureCamps all over the world
• @googleanalytics has 1.05M followers on Twitter
• 1,770 Google/Adobe Analytics jobs at indeed.nl
9. THE PIT OF SUCCESS
Q: What is the vision behind the “tidyverse”?
A: My long term goal is to create a pit of success where the default path leads to a great result.
> Hadley Wickham, Quora Q&A, 2016
Where “flow” and magic happen
21. Happy families are all alike; every unhappy family is unhappy in its own way
— Leo Tolstoy
Tidy datasets are all alike but every messy dataset is messy in its own way
— Hadley Wickham
Tidy Data
https://cran.r-project.org/web/packages/tidyr/vignettes/tidy-data.html
22. http://garrettgman.github.io/tidying/
Tidy datasets are easy to manipulate, model and visualize, and have a specific structure: each variable is a column, each observation is a row, and each type of observational unit is a table.
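A minimal sketch of what "each variable is a column, each observation is a row" means in practice. It reshapes a wide table (one column per year) into a tidy one using only the Python standard library. The helper name to_tidy, the column names and all values are made up for illustration; in R the tidyr package covers this kind of reshaping.

```python
# Illustrative wide ("untidy") table: the year variable is spread across column names.
# Country names and counts are made-up example values.
wide_rows = [
    {"country": "A", "1999": 10, "2000": 20},
    {"country": "B", "1999": 30, "2000": 40},
]


def to_tidy(rows, id_column, variable_name, value_name, key_type=str):
    """Return one row per (id, variable) pair so that each variable gets its own column."""
    tidy = []
    for row in rows:
        for key, value in row.items():
            if key == id_column:
                continue  # the identifier is repeated on every tidy row, not melted
            tidy.append(
                {
                    id_column: row[id_column],
                    variable_name: key_type(key),  # former column name becomes a value
                    value_name: value,
                }
            )
    return tidy


tidy_rows = to_tidy(wide_rows, "country", "year", "cases", key_type=int)

for tidy_row in tidy_rows:
    print(tidy_row)
```

Each printed row is now a single observation: one country, one year, one count.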
23. UNTIDY DATA - FAMILY A
http://garrettgman.github.io/tidying/
24. UNTIDY DATA - FAMILY B
Cases
Population
http://garrettgman.github.io/tidying/
27. TIDY RULES
• Each variable is a column
• Each row is an observation
• One table per dataset
• No totals included in columns
• No data as formatting (e.g. colour of cells…)
• Data types are defined, e.g. string, numeric, float, boolean, date
• Missing values are defined, e.g. NA, NULL or 0
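Two of the rules above (defined data types, defined missing values) can be checked automatically. The sketch below is one possible way to do that with the Python standard library. The schema, the column names, the sample rows and the choice of None as the agreed missing-value marker are all assumptions made for illustration.

```python
from datetime import date

# Hypothetical schema: column name -> expected Python type.
SCHEMA = {"date": date, "channel": str, "sessions": int}


def check_rows(rows, schema):
    """Return a list of (row_index, column, problem) tuples for rows that break the schema."""
    problems = []
    for index, row in enumerate(rows):
        for column, expected in schema.items():
            if column not in row:
                problems.append((index, column, "column missing"))
                continue
            value = row[column]
            if value is None:
                continue  # None is the agreed, explicit missing-value marker
            if not isinstance(value, expected):
                problems.append(
                    (
                        index,
                        column,
                        f"expected {expected.__name__}, got {type(value).__name__}",
                    )
                )
    return problems


# Made-up sample rows: the third row stores a date and a number as text.
rows = [
    {"date": date(2018, 1, 1), "channel": "organic", "sessions": 120},
    {"date": date(2018, 1, 2), "channel": "email", "sessions": None},
    {"date": "2018-01-03", "channel": "organic", "sessions": "95"},
]

problems = check_rows(rows, SCHEMA)
for problem in problems:
    print(problem)
```

Only the third row is reported: its date and sessions values are strings rather than the declared types, while the explicit missing value in the second row passes.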
28. Data Storage
• Easy access for whoever needs it
• Able to cope with your data volume and velocity
• GDPR compliant
• Can handle different types of data such as video, images
For us on Google Cloud Platform that means: Cloud Storage and BigQuery
33. I’ve spent 100,000 EUR on my analytics infrastructure. Where are my insights?
34. Data Activation
[Chart. Axes: Time to action, Business Impact. Labels: Bad Dashboards, Data APIs, Good Dashboards, Analysis, Report, Business Alerts]
35. R visualisation options
• Base R plotting: Quick data exploration
• RMarkdown: HTML, Word, PDF etc.
• Shiny: Interactive R code
36. Tool         Time to Learn   Customisation   Reproducibility   Cost
    Data Studio  Easy            Good            Poor              Free
    PowerBI      Medium          OK              Poor              Minimal
    Tableau      Medium          OK              Poor              Expensive
    RMarkdown    Medium          Excellent       High              Free
    Shiny        Hard            Excellent       High              Minimal
    Excel        Medium          Good            Terrible          Free
41. R TAUGHT ME…
• Stable work environment and version control = Reproducibility
• Tidy data before analysis
• Low friction data analysis encourages “flow”
• Think deeply about the final format of the data
• Activated data is the ultimate goal
42. Copenhagen
Artillerivej 86
2300 København S
+45 70 20 29 19
Stockholm
Strandvägen 7A
114 56 Stockholm
+45 70 20 29 19
London
43A Southend Road
BR3 1SP. London
+45 70 20 29 19
Oslo
CJ Hambros plass 2
N-0164 Oslo
+45 70 20 29 19
THANK YOU
Twitter @HoloMarkeD
Blog code.markedmondson.me
Email mark@iihnordic.com