The document provides an overview of a case study analyzing big data from soccer events like the Champions League and World Cup to understand fan engagement. It discusses:
1) The goals of understanding fan communities and passions, and analyzing fan interactions to help brands engage fans.
2) The methodology used including collecting Twitter data around events, establishing keyword/hashtag seeds, and analyzing tweets to understand engagement logics.
3) The process of data collection including modifying scripts, watching events for patterns, deciding what to analyze, and preparing for future events.
4) Plans to match fan survey and ethnographic data with Twitter data to gain insights for brands around campaigns and measuring success.
2. Agenda
What is Big Data?
• Some Definitions
• Mixed Methods Approach
Champions League & World Cup Case Study
• Process
• Results and Usage
• Pitfalls and Learnings
Moving Forward
• Data Approach Flow
• Caveats
• Organization and Communication
3. What is Big Data?
So many different definitions… nobody quite
agrees….
… except that it’s definitely a buzzword
4. What is Big Data?
It is just generally agreed upon that it’s messy and complex. This
is an opportunity and challenge for us to innovate.
“an all-encompassing term for any collection of data sets so large and complex
that it becomes difficult to process using on-hand data management tools or
traditional data processing applications.”
“Big data is a buzzword, or catch-phrase, used to describe a massive
volume of both structured and unstructured data that is
so large that it's difficult to process using traditional database and
software techniques. In most enterprise scenarios the data is too big or it moves
too fast or it exceeds current processing capacity. Big data has the
potential to help companies improve operations and
make faster, more intelligent decisions.”
“Volume, Variety, Velocity, Variability, Complexity”
Quotes from:
http://www.forbes.com/sites/gilpress/2014/09/03/12-big-data-definitions-whats-yours/2/
http://www.webopedia.com/TERM/B/big_data.html
http://en.wikipedia.org/wiki/Big_data
5. What Do We Need to Solve Big
Data?
… for leveraging engagement at least.
6. … for leveraging engagement at least.
• Determine Right Questions and Goals for Data
• Interdisciplinary Approach
• Iterative Refinement
“Combining the what (quantitative) with the why (qualitative) can
be exponentially powerful. It is also critical to our ability to take all our
clickstream data and truly analyze it, to find insights that drive
meaningful website changes that will improve our customers’
experiences.” – Avinash Kaushik
Answer:
Mixed Methods and Innovation
Quote from: Web Analytics: An Hour a Day by Avinash Kaushik
9. Sports Fan and Engagement
Study Overall Goals for HAVAS
• to identify and define communities of sports fans
based around passion points (A)
• to analyze fan interactions with those passions (B)
• to position HAVAS Sports & Entertainment to more
effectively advise brands on how to meaningfully
engage with sports fans by leveraging passion-based
communities (C)
10. Big Data Research Objectives
External, for Havas
• Discover a mixed methodology framework for sports
and entertainment fan engagement
Internal, for Lab
• Justify our fan logic topology in relation to
Twitter conversations through natural language processing
11. Initial Data Collection Steps
1) Modify the data collection process to fit live
soccer events, using the Champions League as
a test run
2) Establish a methodology for seeding the initial
pool of users, keywords, and hashtags
3) Analyze tweets and how they fit into
logics of engagement
4) Establish a methodology for gaining
insight from Twitter conversations
12. “Analyzing Big Data is a BIG JOB
with Many People” – Jake
Inputs & Equipment
• Keywords, hashtags, and user clusters file in a txt document
• Dedicated server system collecting information
Engineering
• Run and modify Python script
• Register for the Public Streaming API
• Parse for results
Live Viewing Team
• Team to watch game and look for patterns
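The seed inputs listed on this slide (keywords, hashtags, and user clusters kept in a txt document) could be loaded and applied roughly as follows. This is a minimal sketch, not the Lab's actual script: the one-term-per-line layout, the `#` and `@` prefixes, and the function names `load_seeds` and `matches_seed` are all assumptions made for illustration.

```python
import re


def load_seeds(text):
    """Split seed-file text into keywords, hashtags, and user handles.

    Assumed (hypothetical) layout: one term per line, hashtags start
    with '#', handles start with '@', blank lines are ignored.
    """
    seeds = {"keywords": set(), "hashtags": set(), "users": set()}
    for raw in text.splitlines():
        term = raw.strip().lower()
        if not term:
            continue
        if term.startswith("#"):
            seeds["hashtags"].add(term)
        elif term.startswith("@"):
            seeds["users"].add(term)
        else:
            seeds["keywords"].add(term)
    return seeds


def matches_seed(tweet_text, seeds):
    """Return True if the tweet text contains any seed term."""
    text = tweet_text.lower()
    tokens = set(re.findall(r"[#@]?\w+", text))
    if tokens & (seeds["hashtags"] | seeds["users"]):
        return True
    # Keywords may be multi-word phrases, so fall back to substring matching.
    return any(keyword in text for keyword in seeds["keywords"])
```

Keeping the seeds in a plain text file lets the live viewing team add new terms during a match without touching the script.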
13. Data Collection Process
1) Engineering & Team: Tech and Data Set-Up
2) Engineer: Run Script with Seed File
3) Team: Watch Event for Patterns and Additional Seeds
4) Team: Decide Data to Analyze
5) Engineer: Parse Data into User-Friendly Format
6) Team: Look at Data and prepare for next event
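The "Parse Data into User-Friendly Format" step might look something like the sketch below, which flattens raw tweet JSON (one object per line) into CSV that the wider team can open in a spreadsheet. The slides do not show the real parser; the function name `tweets_to_csv` is hypothetical, and the field names (`created_at`, `text`, `user.screen_name`, `entities.hashtags`) are assumed to follow the classic tweet JSON layout.

```python
import csv
import io
import json


def tweets_to_csv(json_lines):
    """Flatten raw tweet JSON (one object per line) into CSV text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["created_at", "screen_name", "hashtags", "text"])
    for line in json_lines:
        line = line.strip()
        if not line:
            continue  # collected output may contain blank lines
        tweet = json.loads(line)
        if "text" not in tweet:
            continue  # skip records that are not tweets (assumed possible)
        hashtags = [h["text"] for h in tweet.get("entities", {}).get("hashtags", [])]
        writer.writerow([
            tweet.get("created_at", ""),
            tweet.get("user", {}).get("screen_name", ""),
            " ".join(hashtags),
            tweet["text"].replace("\n", " "),  # keep one tweet per row
        ])
    return out.getvalue()
```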
18. Sponsors
Sponsors will often have official hashtags promoted during
sporting events to cross-promote their brand and the sporting
event.
• Official Hashtags
• Sponsors
• Team Names
• Key Terms
• Key Players
20. Initial Data Seed Scoping Caveats
• Twitter caps the Public API at a couple of
thousand tweets per second
• Tweets received through the Public API do not
appear to be affected by location-based
factors the way individual user feeds are
• Twitter chunks these tweets by a mysterious
algorithm, according to what it deems
important
• The number of tweets scraped renders these
factors nominal in terms of
large-scale user behavior
22. What kind of Tweets or tone in
tweets fit into logics of
engagement?
*Informed by survey and ethnography
• Entertainment
• Immersion
• Social Connection
• Identification
• Mastery
• Pride
• Play
• Advocacy
23. Operational Process
Plan for World Cup & Modeling with Beacon Capabilities
See how conversations analyzed from a big data perspective fit and build on the
logics of engagement model
Determine what data frameworks worked in capturing useful information
Initial qualitative look at data
27. Big Data
Basic Methods of Analysis
Textual
• Text processing of tweets and plotting using algorithms into
agglomerative clusters (aka cool visuals)
• Frequency of terms, associations, and word clouds fall under
here
• Goal: Find texts of what spurred the most conversation
Networks
• A way to visually see social connection data
• Understand forms of bonds and the connections between
individual data points worth exploring
• Goal: Detecting communities (our clusters, brands)
Sentiment
• Toolkits (such as Hootsuite) that measure “sentiment” using
positive and negative language
• Can be used to see if an initiative performed well
• Goal: Measure success of a campaign at different times
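To make the three methods on this slide concrete, here is a small sketch of the simplest version of each: term frequencies (textual), mention edges that could feed a network tool (networks), and a positive-minus-negative word count (sentiment). It does not attempt agglomerative clustering or community detection, and it is not how any named toolkit works. The stopword list, both word lists, and all function names are hypothetical.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "to", "of", "and", "is", "in", "for", "rt"}
POSITIVE = {"love", "great", "win", "amazing"}    # hypothetical lexicon
NEGATIVE = {"hate", "awful", "lose", "terrible"}  # hypothetical lexicon


def tokenize(text):
    """Lowercase the text and keep words, hashtags, and handles."""
    return [t for t in re.findall(r"[#@]?\w+", text.lower()) if t not in STOPWORDS]


def term_frequencies(tweets):
    """Textual: count how often each term appears across tweets."""
    counts = Counter()
    for tweet in tweets:
        counts.update(tokenize(tweet))
    return counts


def mention_edges(authored_tweets):
    """Networks: count (author, mentioned handle) pairs as weighted edges."""
    edges = Counter()
    for author, text in authored_tweets:
        for handle in re.findall(r"@\w+", text.lower()):
            edges[(author, handle)] += 1
    return edges


def sentiment_score(text):
    """Sentiment: positive word count minus negative word count."""
    tokens = tokenize(text)
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
```

The output of `term_frequencies` is the kind of table a word cloud is drawn from, and the edge counts from `mention_edges` can be exported to a graph tool for community detection.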
31. Questions and Hypotheses
• Survey Twitter Handles
– See if their online behavior matches survey logics
– What does the content they’re sharing look like?
– Trends by cluster, gender, other data points
• Match Data
– Look for clusters of behavior tied to events in games
– See popularity of brand campaigns and behavioral response to brand stories
– Gain insight from bursts of activity and real-time marketing
– See what the characteristics of influencers are
• Brand Data
– Identify how these strategies were executed in online conversations and responses
– Identify types of interactions/content/other markers around brands on Twitter
– Do influential brands mean consistent users interacting across brands? Why are people
interacting in this way? How can we categorize these interactions according to our logic
clusters?
– Was the content agile?
– See how users responded by the logics to different types of content
– Look for differences in fan response and fan-initiated behavior to the brands
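One way to "gain insight from bursts of activity" is to count tweets per minute of a match and flag minutes that sit well above the average. The sketch below is an assumption about how that could be done, not the study's method: the function name `find_bursts`, the per-minute bucketing, and the cutoff of mean plus two standard deviations are all illustrative choices.

```python
from collections import Counter
from statistics import mean, pstdev


def find_bursts(minute_stamps, threshold=2.0):
    """Return the minutes whose tweet count exceeds mean + threshold * std dev.

    minute_stamps holds one entry per tweet, giving the match minute
    (or any hashable time bucket) in which the tweet was posted.
    """
    counts = Counter(minute_stamps)
    if len(counts) < 2:
        return []  # not enough buckets to compare against
    values = list(counts.values())
    cutoff = mean(values) + threshold * pstdev(values)
    return sorted(minute for minute, count in counts.items() if count > cutoff)
```

Flagged minutes can then be compared with the live viewing team's notes to see which game events or brand moments drove the spike.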
32. What We Planned To Do
• Steps
• Define interesting WC fan moments and brand moments
• Examine moments in time and certain brand campaigns
• Investigate possible Natural Language Processing tools
• Formulated Questions
• Timeline
• Created a timeline assigning roles to each person
• Deliverables
• TBD, likely looking at clusters of behavior around brand campaigns.
• Sentiment analysis may tie in here
33. Approaching with Mixed Methods
Ethnographic Report
- What did people say about the brand or the logics they used?
Survey Data
- Under this brand logic utilized, what is the intensity and who are the clusters?
Big Data
- How did audiences respond online to actions by the brand?
34. Exercise: Group Datasets
Figure out what insight you might be able
to get from each piece of data and how
would you apply mixed methods.
36. The Future of Social Media
Analytics
“We will be moving beyond key-word based
queries into machine-learning algorithms.
Influencers whom I have [spoken] with echo
similar ideas about the increasing use and
refine[ment] of latent semantic indexing (or some
variant of it) and other machine-learning
algorithms in order to improve social
listening, automatic categorization of
content, and the ability to take action on
data” - Marshall Sponder
39. The Dashboard Build Process
1) Pulled 250 Retweeted Tweets with Verification from BigSheets
2) Coded Tweets According to Logic for Testing Data
3) Built Dictionary According to Sample Tweets, Ethnography, Survey
4) Created Natural Language Processing and Machine Learning Algorithms
5) Fan Engagement Dashboard Prototype
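The dictionary step in this build process can be pictured as a lookup from cue words to logics of engagement. The sketch below is a deliberately simple illustration and not the Lab's classifier: the cue words are invented, only three of the logics are shown, and `classify_tweet` is a hypothetical name. A real dictionary would come from the sample tweets, ethnography, and survey.

```python
import re

# Hypothetical dictionary: these cue words are invented for illustration.
LOGIC_DICTIONARY = {
    "Pride": {"proud", "champions", "glory"},
    "Mastery": {"stats", "tactics", "formation"},
    "Social Connection": {"together", "friends", "party"},
}


def classify_tweet(text, dictionary=LOGIC_DICTIONARY):
    """Return the logic with the most cue-word hits, or None if there are none.

    Ties go to whichever logic appears first in the dictionary.
    """
    tokens = set(re.findall(r"\w+", text.lower()))
    best_logic, best_hits = None, 0
    for logic, cues in dictionary.items():
        hits = len(tokens & cues)
        if hits > best_hits:
            best_logic, best_hits = logic, hits
    return best_logic
```

A dictionary baseline like this is easy for non-engineers to inspect and extend, and it gives a machine learning model something to be compared against.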
41. Annenberg Innovation Lab Fan
Engagement Dashboard built through
collaboration and mixed methods
learning.
67% Accuracy in classifying tweets by
Logic of Engagement leading to
actionable insight and business intelligence
for Leveraging Fan Engagement.
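An accuracy figure like the one on this slide is commonly the share of hand-coded tweets for which the classifier's label agrees with the human coder's. The slides do not say how the Lab computed its number, so the sketch below assumes plain agreement over a coded test set; the function name `accuracy` and the sample labels are hypothetical.

```python
def accuracy(predicted, coded):
    """Share of tweets where the predicted logic equals the hand-coded logic."""
    if len(predicted) != len(coded):
        raise ValueError("predicted and coded must be the same length")
    if not coded:
        raise ValueError("need at least one coded tweet")
    correct = sum(p == c for p, c in zip(predicted, coded))
    return correct / len(coded)


# Hypothetical labels: three of the four predictions match the human codes.
example = accuracy(
    ["Pride", "Play", "Mastery", "Advocacy"],
    ["Pride", "Play", "Pride", "Advocacy"],
)
```

With eight logics to choose from, it helps to report accuracy alongside how often each logic occurs in the coded sample, since a skewed sample can flatter the number.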
43. The Process End-to-End
• Collecting and Managing Data
• Data Back Up
• Data Clean Up
• Run Models
• Gain Insights
• Refine Models
• Learn Actionable Insights
• Communicate Insights (Reports, Infographic Blueprints)
• Create Initial Dictionary for Natural Language Processing
• Annotate/Code Tweets for Training Data for Machine Learning
• Created Dashboard
• Improve on Design
45. Moving Forward
Your Challenge
• Your data will be different
client-to-client
• Twitter is just the beginning
• You will get to be creative
and work on collaborative
cross-functional teams to
dive into the data
• *This will be both rewarding
and potentially difficult
Tasks Ahead
• Begin thinking about
what you can learn from
data to help our sponsors
reach their goals
• Start thinking about how
your fans behave in your
approach to figuring out
what questions to ask the
data
46. Most Basic Steps
1) Determine Goals
2) Capture Data
3) Curate Data
4) Merge Datasets and Bring Together Methodologies if Necessary
5) Additional Data Processing to Usable Form
6) Deliver Insight to the Client
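The "Merge Datasets" step can be illustrated by joining survey respondents to their tweets through the Twitter handle, so that tweets can be read cluster by cluster. This is a sketch under assumed data shapes: the keys `handle`, `cluster`, `screen_name`, and `text`, and the function name `tweets_by_cluster`, are hypothetical.

```python
from collections import defaultdict


def tweets_by_cluster(survey_rows, tweets):
    """Group tweet texts by the survey cluster of the account that posted them.

    survey_rows: dicts with 'handle' and 'cluster' keys (assumed shape).
    tweets: dicts with 'screen_name' and 'text' keys (assumed shape).
    """
    # Normalize handles so '@FanOne' and 'fanone' refer to the same account.
    cluster_of = {
        row["handle"].lstrip("@").lower(): row["cluster"] for row in survey_rows
    }
    grouped = defaultdict(list)
    for tweet in tweets:
        cluster = cluster_of.get(tweet["screen_name"].lower())
        if cluster is not None:  # ignore accounts that are not in the survey
            grouped[cluster].append(tweet["text"])
    return dict(grouped)
```

Because this links survey answers to named accounts, it should only be run on handles that respondents agreed to share.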
48. Bumps in the Road Ahead
• Privacy Issues and
Respecting the Fans
• Company layers and
politics – releasing data
from companies is
fraught with back and
forth
• Getting data into a
usable form
• Assumptions were wrong
or have to be redefined
– it’s ok to fail fast – but
be ready to keep moving
• Working in cross-
functional groups
51. Bring it Together
Draw connections between the data sets
and how they could relate to the eight
logics and situational triggers.
“While social media data are always interesting in
themselves (at least, for an analyst), when business
owners are able to combine data and layer them
efficiently, the information will become more useful
and actionable.” – Marshall Sponder