Contrasting approaches to using data to answer business questions from web analysts and data scientists - and how that is changing the web analytics industry
How Snowplow and data scientists are transforming the web analytics industry (and creating a new event analytics industry)
1. Web analytics is dead!
Long live event analytics
How data scientists and big data tech are killing one
industry and creating another
What role Snowplow plays
2. Web analytics is a big industry
• Spend in the US alone on web analytics software (Adobe
Sitecatalyst, Webtrends, Google Analytics Premium etc.) was estimated at $500m
in 2011, growing 17 – 20% p.a.*
• Likely that the same amount again is spent on consulting services related to the
use of web analytics data
• Whole industry of web consultants e.g.:
• Semphonic in the USA (bought by Ernst and Young)
• Logan Tod in the UK (bought by PwC)
• Big 4 accounting firms only buy businesses they can sell into (tens of) thousands of
companies
• Whole ecosystem around web analytics
• “Digital analytics professionals” – it is a career path (retailers, media agencies)
• Events, books, organisations geared towards web analysts
*Source: Quora
http://www.quora.com/Web-Analytics-what-is-the-size-of-the-web-analytics-market
3. Web analytics is an old industry, predating the recent wave in
big data technology
Web analytics
1990: Web is born
1993: Log file based web analytics
1996 / 1997: Javascript tagging
…
Big data
2004: Google publishes MapReduce paper
2006: Hadoop project split out of Nutch
2008: Facebook develops Hive
2010: Google publishes Dremel paper
2011: Twitter open sources Storm
4. Two problems with web analytics, that stem from the fact web
analytics came of age in the 1990s
The web was static, hyperlinked documents
• The entities and events that web analytics
programmes understand are limited
• Page views, link clicks, transactions, goals,
sessions, visitors
Hard to model the rich interactions in
today’s interactive webapps
Tech to handle massive data sets was
prohibitively expensive
• Web analytics programmes aggregate raw data
to reduce data volumes
• This requires specifying in advance how data
can be analysed, so that the data can be ‘precut’
Web analytics reporting is very inflexible
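To illustrate the inflexibility of ‘precut’ data, here is a minimal Python sketch. The event fields, users and page names are made up for illustration and are not Snowplow's schema. It shows a question that the raw events can answer but the aggregated page view counts cannot.

```python
from collections import Counter

# Hypothetical raw event log: one record per page view, in time order.
raw_events = [
    {"user": "u1", "page": "/home"},
    {"user": "u1", "page": "/pricing"},
    {"user": "u2", "page": "/pricing"},
    {"user": "u2", "page": "/home"},
    {"user": "u3", "page": "/home"},
]

# What an aggregating tool keeps: page view counts, precut by page.
page_views = Counter(e["page"] for e in raw_events)


def users_who_viewed_in_order(events, first, then):
    """Return the users who viewed `first` and later viewed `then`."""
    seen_first = set()
    matched = set()
    for e in events:
        # Check `then` before recording `first`, so a single view never matches itself.
        if e["page"] == then and e["user"] in seen_first:
            matched.add(e["user"])
        if e["page"] == first:
            seen_first.add(e["user"])
    return matched


# Answerable from raw_events, but not from page_views alone.
home_then_pricing = users_who_viewed_in_order(raw_events, "/home", "/pricing")
```

The counts in `page_views` are identical whether u2 saw the home page before or after pricing, so the ordering question can only be answered if the raw events are kept.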
5. In particular, web analytics' insistence on aggregating data is
anathema to data scientists
Data scientist approach
Give me the data and I’ll figure out
how to answer the question
Web analytics approach
You can’t get your answer from one of
our pre-canned reports? Have a go
with our “advanced report-builder”
What if I want to: build a model? Understand underlying causality? Use the data in
my web application? Dynamically optimize spend / content?
6. We built Snowplow to address the two weaknesses in the web
analytics approach
Describe web events in much richer
grammar and vocabulary
Liberate your data
• Where you store your data has a big
impact on what types of analyses you can
quickly run on it
7. Snowplow is an event data collection and warehousing platform
Snowplow data pipeline
• Sources: website / webapp, mobile apps, other applications (e.g. on games
consoles, connected TVs, desktops, connected devices)
• Processing: collect, then transform and enrich
• Storage: Amazon S3, Amazon Redshift / PostgreSQL, other (Neo4J, BigQuery…)
Snowplow delivers your complete, granular event data in
your own data warehouse(s), so you can plug in any tool
to analyse it
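As a rough illustration of what having granular event data in your own warehouse makes possible, the sketch below runs an ad hoc SQL query over an event table. It assumes an in-memory SQLite database as a stand-in for Redshift or PostgreSQL, and the `events` table and its columns are hypothetical rather than Snowplow's actual table definition.

```python
import sqlite3

# In-memory SQLite stands in for a warehouse such as Redshift or PostgreSQL.
conn = sqlite3.connect(":memory:")

# Hypothetical event table: one row per event, nothing pre-aggregated.
conn.execute("CREATE TABLE events (user_id TEXT, event_type TEXT, ts INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("u1", "page_view", 1),
        ("u1", "add_to_basket", 2),
        ("u1", "transaction", 3),
        ("u2", "page_view", 4),
        ("u2", "add_to_basket", 5),
        ("u3", "page_view", 6),
    ],
)

# An ad hoc question decided after collection: how many distinct users
# reached each type of event?
rows = conn.execute(
    "SELECT event_type, COUNT(DISTINCT user_id) "
    "FROM events GROUP BY event_type ORDER BY event_type"
).fetchall()
users_per_event = dict(rows)
conn.close()
```

Because the rows are stored unaggregated, a different question only needs a different query, not a change to how the data was collected.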
8. Snowplow is composed of a set of loosely coupled
subsystems, architected to be robust and scalable
1. Trackers: generate event data
• Examples: Javascript tracker; Ruby / Lua / No-JS / Arduino tracker
2. Collectors: receive data from trackers and put it in a queue
• Examples: Cloudfront collector; Clojure collector for Amazon EB
3. Enrich: clean and enrich raw data
• Built on Scalding / Cascading / Hadoop and powered by Amazon EMR
4. Storage: store data ready for analysis
• Examples: Amazon Redshift; PostgreSQL; Amazon S3
5. Analytics
A to D: standardised data protocols between the subsystems
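The loose coupling between the subsystems above can be sketched in a few lines of Python. This is a toy model, not Snowplow's code: the function names, the JSON payload with `e` and `uid` keys, and the list used as storage are all assumptions made for illustration. Each stage only depends on the format of the data handed to it, which is the role the standardised protocols play.

```python
import json
from queue import Queue


def track(event_type, user_id):
    """Tracker: generate an event as a JSON string (hypothetical wire format)."""
    return json.dumps({"e": event_type, "uid": user_id})


def collect(queue, payload):
    """Collector: receive a payload from a tracker and put it in a queue."""
    queue.put(payload)


def enrich(payload):
    """Enrich: parse and clean a raw payload; return None if it is malformed."""
    try:
        raw = json.loads(payload)
    except json.JSONDecodeError:
        return None
    if not isinstance(raw, dict) or "e" not in raw or "uid" not in raw:
        return None
    return {"event_type": raw["e"].strip().lower(), "user_id": raw["uid"]}


def run_pipeline(payloads):
    """Wire the stages together; a plain list stands in for the storage subsystem."""
    queue = Queue()
    for payload in payloads:
        collect(queue, payload)
    storage = []
    while not queue.empty():
        row = enrich(queue.get())
        if row is not None:
            storage.append(row)
    return storage


stored = run_pipeline(
    [track(" Page_View ", "u1"), "not json", track("transaction", "u2")]
)
```

Any stage can be swapped for another implementation (a different tracker, a different storage target) as long as it keeps to the same payload format.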
9. Snowplow is open source and cloud-based
• Open source but easy to deploy via integration with Amazon Web Services
(cloud infrastructure)
• Our technology is free!
• Collecting massive quantities of digital event data should be easy and cheap…
• … so that we can focus time and effort on using the data productively
• We charge for Professional Services on top of our platform
• More value in how you use the data, than in collecting / storing it
• Lots of scope to build applications on top of our platform going forwards
11. …use our tech to solve some of their most intractable problems
• What is the impact of different ad campaigns and creative on the way users
subsequently behave? What is the return on that ad spend?
• How do visitors use social channels (Facebook / Twitter) to interact around video
content? How can we predict which content will “go viral”?
• How do updates to our product change the “stickiness” of our service? ARPU?
Does that vary by customer segment?
12. We believe that event data is one of the most exciting data
sources to work with, today
13. We are only at the beginning of figuring out how to use this
data…
• How do we represent different types of event sequence?
• What makes journeys similar and what makes them different? How can we
cluster them?
• How can we “spot” those events that are predictive of future events? Of
consumer value? Of consumer interest?
• How can we unpick the effects of marketing / digital products from users’
predispositions on the way sequences of events unfold?
• How best should we model different users at different points on different types
of journeys?
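One possible starting point for the first two questions, offered as a sketch rather than an answer: represent each user's journey as an ordered list of event types and compare journeys with a generic sequence similarity measure. The example below uses `difflib.SequenceMatcher` from the Python standard library. The event tuples and event type names are hypothetical, and this measure is just one of many that could feed a clustering step.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def build_journeys(events):
    """Group (user, timestamp, event_type) tuples into per-user ordered sequences."""
    journeys = defaultdict(list)
    for user, _ts, event_type in sorted(events, key=lambda e: (e[0], e[1])):
        journeys[user].append(event_type)
    return dict(journeys)


def journey_similarity(a, b):
    """Similarity in [0, 1] based on the matching blocks shared by two sequences."""
    return SequenceMatcher(None, a, b).ratio()


# Hypothetical events, deliberately out of time order.
events = [
    ("u1", 3, "view"), ("u1", 1, "view"), ("u1", 2, "search"), ("u1", 4, "buy"),
    ("u2", 1, "view"), ("u2", 2, "search"), ("u2", 3, "view"),
    ("u3", 1, "buy"),
]

journeys = build_journeys(events)
sim_u1_u2 = journey_similarity(journeys["u1"], journeys["u2"])
sim_u1_u3 = journey_similarity(journeys["u1"], journeys["u3"])
```

Here u1 and u2 share a long common prefix, so they score as more similar to each other than u1 and u3 do. Whether that matches what "similar" should mean for a given business question is exactly the open problem.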
14. We hope people like you will use our tech to do amazing things
with the data!
Questions?
More information
• Snowplow repo: https://github.com/snowplow/snowplow
• Twitter: @SnowPlowData
• Website: http://snowplowanalytics.com
• My LinkedIn:
• My Twitter: