The aim of this presentation is to:
1. Encourage web data analysts to use R instead of spreadsheets.
2. Help those wanting to learn R to get started.
3. Build interest amongst the R community in developing packages for web analytics.
The presentation briefly discusses what web analytics is, why R should be used instead of spreadsheets for web data analysis, ways to learn R, and how to put R into practice in web analytics.
In this presentation, Johann shares his experience of creating his first open-source R package, ganalytics, which is used for accessing Google Analytics data. Reflecting on his journey so far in learning R, Johann gives newcomers tips to help them succeed in using R for their day-to-day work and in creating their own packages. A demonstration of ganalytics is included, along with an invitation to the community to get involved in its future development.
The R script used in the demonstration is available in the following gist: https://gist.github.com/jdeboer/6569077
2. Purpose of this presentation
1. Encourage web data analysts to move to R and away from Excel!
2. Help those wanting to learn R to get started.
3. Build interest amongst the R community in developing packages for web analytics.
3. What is web analytics?
Measurement and analysis of (aggregated) internet data for the purpose of optimising website usage.
6. Web analytics data
Scope | Dimensions | Metrics
User | Custom dimensions, e.g. existing customer flag | Unique users, avg. revenue per user
Device | Browser and OS, isTablet, isMobile | Unique devices, total visits
Session | Traffic source, date and time of visit, landing page | Time on site, pageviews per visit, goal completions
Hit | Page URL, page title, event type, product name | Time on page, page loading time, transaction amount
7. Metrics
Unique Visitors, New Visits, % New Visits, Visits, Bounces, Bounce Rate, Bounce
Rate, Visit Duration, Avg. Visit Duration, Organic Searches, Impressions, Clicks,
Cost, CPM, CPC, CTR, Cost per Transaction, Cost per Goal Conversion, Cost per
Conversion, RPC, ROI, Margin, Goal 1 Starts, Goal Starts, Goal 1 Completions,
Goal Completions, Goal 1 Value, Goal Value, Per Visit Goal Value, Goal 1
Conversion Rate, Goal Conversion Rate, Goal 1 Abandoned Funnels, Abandoned
Funnels, Goal 1 Abandonment Rate, Total Abandonment Rate, Data Hub Activities,
Page Value, Entrances, Entrances / Pageviews, Pageviews, Pages / Visit, Unique
Pageviews, Time on Page, Avg. Time on Page, Exits, % Exit, Results Pageviews,
Total Unique Searches, Results Pageviews / Search, Visits with Search, % Visits
with Search, Search Depth, Search Depth, Search Refinements, % Search
Refinements, Time after Search, Time after Search, Search Exits, % Search Exits,
Goal 1 Conversion Rate, Goal Conversion Rate, Per Search Goal Value, Page Load
Time (ms), Page Load Sample, Avg. Page Load Time (sec), Domain Lookup Time
(ms), Avg. Domain Lookup Time (sec), Page Download Time (ms), Avg. Page
Download Time (sec), Redirection Time (ms), Avg. Redirection Time (sec), Server
Connection Time (ms), Avg. Server Connection Time (sec), Server Response Time
(ms), Avg. Server Response Time (sec), Speed Metrics Sample, Document
Interactive Time (ms), Avg. Document Interactive Time (sec), Document Content
Loaded Time (ms), Avg. Document Content Loaded Time (sec), DOM Latency
Metrics Sample, Screen Views, Screen Views, Unique Screen Views, Unique
Screen Views, Screens / Session, Screens / Session, Time on Screen, Avg. Time
on Screen, Total Events, Unique Events, Event Value, Avg. Value, Visits with Event,
Events / Visit, Transactions, Ecommerce Conversion Rate, Revenue, Average
Value, Per Visit Value, Shipping, Tax, Total Value, Quantity, Unique Purchases,
Average Price, Product Revenue, Average QTY, Local Revenue, Local Shipping,
Local Tax, Local Product Revenue, Social Actions, Unique Social Actions, Actions
Per Social Visit, User Timing (ms), User Timing Sample, Avg. User Timing (sec),
Exceptions, Exceptions / Screen, Crashes, Crashes / Screen, Custom Metric Value
Dimensions
Visitor Type, Count of Visits, Days Since Last Visit, User Defined Value, Visit
Duration, Referral Path, Full Referrer, Campaign, Source, Medium, Source /
Medium, Keyword, Ad Content, Social Network, Social Source Referral, Ad Group,
Ad Slot, Ad Slot Position, Ad Distribution Network, Query Match Type, Matched
Search Query, Placement Domain, Placement URL, Ad Format, Targeting Type,
Placement Type, Display URL, Destination URL, AdWords Customer ID, AdWords
Campaign ID, AdWords Ad Group ID, AdWords Creative ID, AdWords Criteria ID,
Goal Completion Location, Goal Previous Step - 1, Goal Previous Step - 2, Goal
Previous Step - 3, Browser, Browser Version, Operating System, Operating System
Version, Mobile (Including Tablet), Tablet, Mobile Device Branding, Mobile Device
Model, Mobile Input Selector, Mobile Device Info, Mobile Device Marketing Name,
Device Category, Continent, Sub Continent Region, Country / Territory, Region,
Metro, City, Latitude, Longitude, Network Domain, Service Provider, Flash Version,
Java Support, Language, Screen Colors, Screen Resolution, Endorsing URL,
Display Name, Social Activity Post, Social Activity Timestamp, Social User Handle,
User Photo URL, User Profile URL, Shared URL, Social Tags Summary, Originating
Social Action, Social Network and Action, Hostname, Page, Page path level 1, Page
path level 2, Page path level 3, Page path level 4, Page Title, Landing Page,
Second Page, Exit Page, Previous Page Path, Next Page Path, Page Depth, Site
Search Status, Search Term, Refined Keyword, Site Search Category, Start Page,
Destination Page, App Installer ID, App Version, App Name, App ID, Screen Name,
Screen Depth, Landing Screen, Exit Screen, Event Category, Event Action, Event
Label, Transaction, Affiliation, Visits to Transaction, Days to Transaction, Product
SKU, Product, Product Category, Currency Code, Social Source, Social Action,
Social Source and Action, Social Entity, Social Type, Timing Category, Timing
Label, Timing Variable, Exception Description, Experiment ID, Variation, Custom
Dimension, Custom Variable (Key 1), Custom Variable (Value 01), Date, Year,
Month of the year, Week of the year, Day of the month, Hour, Month, Week, Day,
Day of week, Day of week name, Hour of Day, Month of Year, Week of Year, ISO
week of the year
265 dimensions and metrics in Google Analytics and growing!
11. Google Analytics APIs
● Data collection
○ Collection APIs and SDKs
● Data extraction
○ Core Reporting API
■ Metadata API
○ Real-time Reporting API
○ Multi-Channel Funnels Reporting API
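As a rough illustration of what a data-extraction request looks like, the sketch below assembles a Core Reporting API (v3) query URL in base R. The view ID `ga:12345678` and the date range are placeholders, the helper name `build_query_url` is made up for this example, and a real request would also need an OAuth 2.0 access token, which is not handled here.

```r
# Sketch only: builds the URL for a Core Reporting API (v3) request.
# The view ID and dates are placeholders; authentication is not handled.
build_query_url <- function(params) {
  endpoint <- "https://www.googleapis.com/analytics/v3/data/ga"
  # Percent-encode each value so characters such as ':' and ',' are URL-safe
  encoded <- vapply(params, URLencode, character(1), reserved = TRUE)
  paste0(endpoint, "?",
         paste(names(params), encoded, sep = "=", collapse = "&"))
}

query_url <- build_query_url(c(
  ids          = "ga:12345678",   # hypothetical view (profile) ID
  `start-date` = "2013-09-01",
  `end-date`   = "2013-09-30",
  metrics      = "ga:visits",
  dimensions   = "ga:deviceCategory,ga:hour"
))
print(query_url)
```

Packages such as those discussed later in the talk exist so that analysts do not have to assemble these requests by hand.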
16. 7 reasons to use R instead of Excel
1. Captures each step in your analysis
2. Makes it easier to review and fix your mistakes
3. Easy to reproduce and reuse analyses
4. Join datasets from multiple sources
5. Powerful ways to analyse and visualise your data
6. Automate retrieval of your data via the Core Reporting API
7. Saves time!
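To make reason 4 concrete, here is a minimal base R sketch of joining two sources on a shared key. Both data frames and all of their values are invented for illustration; in practice one might come from Google Analytics and the other from a finance or CRM system.

```r
# Hypothetical data: visits by campaign (as might come from Google Analytics)
ga_data <- data.frame(
  campaign = c("spring_sale", "newsletter", "brand"),
  visits   = c(1200, 450, 800),
  stringsAsFactors = FALSE
)

# Hypothetical data: spend by campaign (as might come from a finance system)
cost_data <- data.frame(
  campaign = c("spring_sale", "newsletter", "brand"),
  cost     = c(600, 90, 400),
  stringsAsFactors = FALSE
)

# Join the two sources on the shared key, then derive a new metric
combined <- merge(ga_data, cost_data, by = "campaign")
combined$cost_per_visit <- combined$cost / combined$visits
print(combined)
```

Because the join and the derived column are written down as code, the same steps can be re-run unchanged when next month's data arrives, which also covers reasons 1 and 3.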
21. Google Analytics packages for R
● r-google-analytics
○ By Google, but it was not working for a long time
● rga
○ By Bror Skardhamar, popular and active
● ganalytics
○ Written by me to provide a layer of abstraction over the Core Reporting API protocol
23. Make querying GA data from R an easy and interactive experience
● Queries are manipulated on the fly
● Defining filter and segmentation expressions is easy
● Checks queries for errors before sending
○ corrects them automatically in some cases too!
● Creates a level of abstraction from the Core Reporting API - easier to extend functionality
24. Query expressions
ga:keyword=@buy
(search traffic keywords containing “buy”)
A single expression comprises:
● a variable - a dimension or metric
● an operator - e.g. equals, contains, regular expression, greater than, does not equal, ...
● an operand - a number (metric) or a character string (dimension)
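The three parts of an expression can be illustrated with a few lines of base R. This is only a sketch of the string format used by the Core Reporting API filter syntax, not how ganalytics is implemented; the names `operators` and `make_expression` are made up for this example, and only a handful of operators are listed.

```r
# Sketch only: format one filter expression as variable + operator + operand.
# The operator symbols follow the Core Reporting API filter syntax.
operators <- c(
  equals         = "==",
  does_not_equal = "!=",
  contains       = "=@",
  matches_regex  = "=~",
  greater_than   = ">"
)

make_expression <- function(variable, operator, operand) {
  # Reject operator names that are not in the lookup table above
  stopifnot(operator %in% names(operators))
  paste0("ga:", variable, operators[[operator]], operand)
}

keyword_expr   <- make_expression("keyword", "contains", "buy")
pageviews_expr <- make_expression("pageviews", "greater_than", 10)
print(keyword_expr)    # "ga:keyword=@buy"
print(pageviews_expr)  # "ga:pageviews>10"
```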
25. Combining expressions
● Expressions can be joined using OR and AND.
● OR always takes precedence over AND, and expressions cannot be grouped.
ga:keyword=@buy;ga:city=~^(Sydney|Melbourne)$,ga:isTablet==Yes
(search traffic keywords containing “buy” AND [city is [Sydney OR Melbourne] OR via a tablet])
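In the filter syntax a comma means OR and a semicolon means AND, so combining expressions amounts to joining strings with the right separator. The base R sketch below shows this; `or_of` and `and_of` are hypothetical helper names, and the sketch does not check the rule that an AND can never be nested inside an OR.

```r
# Sketch only: combine already-formatted expressions.
# In the filter syntax, "," means OR and ";" means AND.
or_of  <- function(...) paste(c(...), collapse = ",")
and_of <- function(...) paste(c(...), collapse = ";")

filter_string <- and_of(
  "ga:keyword=@buy",
  or_of("ga:city=~^(Sydney|Melbourne)$", "ga:isTablet==Yes")
)
print(filter_string)
# "ga:keyword=@buy;ga:city=~^(Sydney|Melbourne)$,ga:isTablet==Yes"
```

Since OR binds more tightly than AND, the only shape that can be expressed is an AND of OR-groups, which is why the OR is built first and then passed to the AND.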
26. Writing expressions with ganalytics
● Filter to pass to the Core Reporting API
ga:keyword=@buy;ga:city=~^(Sydney|Melbourne)$,ga:isTablet==Yes
● Using ganalytics to write this
GaAnd(
  GaExpr('keyword', '@', 'buy'),
  GaOr(
    GaExpr('city', '~', '^(Sydney|Melbourne)$'),
    GaExpr('isTablet', '=', 'Yes')
  )
)
28. How does traffic from desktop, mobile and tablet users change throughout the day and over the week?
Average number of visits per hour and day - split by desktop, mobile and tablet
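The summary behind a chart like this can be sketched in base R with `aggregate()`. The data below is entirely made up (two weeks, two days, two hours) so that the example is self-contained; in the demonstration the figures come from Google Analytics, and the column names used here are assumptions rather than the ones in the original script.

```r
# Hypothetical hourly visit counts over two weeks; real data would come
# from a Google Analytics query by date, hour and device category.
hourly <- expand.grid(
  week           = 1:2,
  dayOfWeek      = c("Mon", "Tue"),
  hour           = c(9, 21),
  deviceCategory = c("desktop", "mobile", "tablet"),
  stringsAsFactors = FALSE
)

# Made-up pattern: desktop peaks in the morning, mobile and tablet at night
is_peak <- (hourly$deviceCategory == "desktop") == (hourly$hour == 9)
hourly$visits <- ifelse(is_peak, 100, 40) + 10 * hourly$week

# Average visits for each day-of-week / hour / device combination
avg_visits <- aggregate(
  visits ~ dayOfWeek + hour + deviceCategory,
  data = hourly,
  FUN  = mean
)
print(avg_visits)
```

The resulting data frame has one row per day, hour and device category, which is the shape a plotting package needs to draw one panel or line per device.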
31. Package development
● Use RStudio with Git version control
○ Open a free GitHub account
○ Use roxygen2 to generate your documentation and NAMESPACE file
○ RStudio integrates with Git, roxygen2 and Rtools to make the package build process easy
● Hadley Wickham is a great help
○ devtools package - great for installing straight from a GitHub repository
○ Read his online book “Advanced R Programming” - easy-to-follow package development steps
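For newcomers, this is roughly what a roxygen2-documented function looks like inside a package's R/ directory. The function `bounce_rate` is a hypothetical helper invented for this sketch and is not part of ganalytics; the `#'` comments are ordinary R comments, so the code runs as is.

```r
#' Format a bounce rate as a percentage
#'
#' Hypothetical helper used only to illustrate roxygen2 comments.
#'
#' @param bounces Number of single-page visits.
#' @param visits Total number of visits (must be greater than zero).
#' @return A character string with one decimal place and a percent sign.
#' @export
bounce_rate <- function(bounces, visits) {
  stopifnot(all(visits > 0))
  sprintf("%.1f%%", 100 * bounces / visits)
}

example_rate <- bounce_rate(50, 200)
print(example_rate)  # "25.0%"
```

Running `roxygen2::roxygenise()` (or `devtools::document()`) in the package directory turns these comments into the help page and the export entry in NAMESPACE. A package hosted on GitHub can then typically be installed with `devtools::install_github("jdeboer/ganalytics")`, substituting the relevant repository.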
32. Learn more...
● Google Analytics: #ganalytics
○ Video lessons: google.com.au/analytics/iq.html
○ Reference: developers.google.com/analytics
● Learn R: #rstats
○ Presciient: presciient.com/courses
○ Code School: tryr.codeschool.com
○ Coursera: coursera.org/course/compdata
○ Intro to R videos by Google: t.co/FQ8DvZEdRW
● Package development: adv-r.had.co.nz
● ganalytics: github.com/jdeboer/ganalytics
● Follow me on Twitter: @johannux
Editor's Notes
Digital Analytics Manager at Open Universities Australia
Focused on web usability and analytics
Background in computer systems
Learning and using R for around 2 years
I'll briefly explain:
what Web Analytics is
why R should be used instead of Excel for web data analysis
ways to learn R, and
how to put R into practice in web analytics
Eugene Dubossarsky
Jeff Leek and Roger Peng of the Johns Hopkins School of Public Health
Hadley Wickham
R in a Nutshell by Joseph Adler
Quick-R by Rob Kabacoff
Try R at Code School
R Podcast by Eric Nantz
#rstats