Attention Economics in Social Web Systems

ATTENTION ECONOMICS
IN SOCIAL WEB SYSTEMS
DR MATTHEW ROWE

@MROWEBOT
M.ROWE@LANCASTER.AC.UK
WWW.MATTHEW-ROWE.COM
WWW.LANCS.AC.UK/STAFF/ROWEM/

Digital Futures Seminar – 25th October 2012

Outline
1

¨  Background (About Me)
¨  Preamble:
¤  Social Networks
¤  The Evolution of the Web
¤  Attention Economics

¨  The Nitty-Gritty: Research
¤  Content Attention Patterns
¤  Follower Prediction
¤  Churn Prediction

¨  Summary


From… a small town (not so) far away
2


Studied…
3

¨  2002 – 2006: M.Eng. in Software Engineering at the University of Sheffield
¤  Developed an interest in:
n  Information Extraction
n  Machine Learning
n  Semantic Web
¨  2006 – 2010: Ph.D. in Computer Science at the University of Sheffield
¤  ‘Disambiguating Identity Web References with Social Data’

¤  Researched:
n  Social networks
n  Digital Identity
n  Disambiguation techniques
n  Semantic Web techngologies


Worked…
4

¨  April 2010 – August 2010: Research Associate at the University of
Sheffield
¤  Information Extraction
¤  Linked Data for the Semantic Web
¤  Unsupervised clustering methods for person disambiguation
¨  September 2010 – August 2012: Research Associate at the Knowledge
Media Institute, OU
¤  Social networks and churn
¤  Behaviour modelling
¤  Community evolution
¤  Forecasting and prediction methods


Interested in…
5

¨  1. Data
¤  Semantics – how data is connected together
¤  Social networks – how people are connected together
¤  Digital Identity – how people present themselves
¨  2. Prediction
¤  Forecasting and classification
¤  Disambiguation
¨  3. Machines
¤  Automation of processes
¤  Modelling social systems for machines
¤  Artificial Intelligence


6
Preamble: Social Networks, the Web,
and Attention Economics


What is a social network?
7

¨  A social network consists of:
¤  Nodes: users in the social network
¤  Edges: connections between users
¨  Networks can be built using various mechanisms:
¤  Explicitly:
n  Undirected edge: User A friends user B A B

n  Directed edge: User A followers user B A B
¤  Implicitly:
n  User A replies to user B in a community forum P
n  User A ‘likes’ user B’s content
A B
¨  Properties of social networks can be measured using:
¤  Network-measures:
n  Clustering coefficient: how connected users in the social network are
¤  Node-measures:
n  Degree: the number of users connected to a given user
n  Centrality: how central a user is to the network, important for information flow


Social Network Theories
8

¨  Homophily: “birds of a feather flock together”
¤  Nodes in a network tend to group with similar nodes
n  Structural: users who share many common friends are likely to be friends
n  Behavioural: users who exhibit similar behaviour are likely to be friends
n  Congruent with ‘Social Identity’: a user select friends as definition of their intentional identity
¨  Small-world:
¤  Social networks form ‘small worlds’ where two users can be indirectly connected
by a small number of steps
¨  Self-affirmation and self-efficacy:
¤  Users construct their social network to affirm themselves
n  E.g. Action: discuss a problem. Reaction: support is offered from peers
n  E.g. Action: announce successful outcome. Reaction: congratulations from peers
¨  Social Contagion:
¤  Users in a network are influenced by their peers. Influence grows with tie
strength
n  E.g. A buys a product. B sees that A has bought the product. B buys the product.


Social Network Analysis
9

¨  Rooted in sociology
¤  Understanding how people are connected together, their grouping and clustering
¨  Stanley Milgram: small-world experiment
¤  Forwarded postcards onto direct acquaintances. Found 5.7 degrees.
n  Led to ‘Six degrees of separation’
¤  Backstrom et al. ‘Four Degrees of Separation’. Web Science 2012.
n  Nodes=731m, edges=69b. Found 3.74 degrees.
¨  Paul Erdős: one of the most widely published mathematicians
¤  Erdős number: the degree of separation between Paul Erdős and you
¤  The Kevin Bacon game: try to connect any actor to Kevin Bacon in 6 steps
¨  Robin Dunbar: formulated that the maximum number of ties = 150
¤  Repeatedly found to be the same for average social network sizes

¨  The explicit ‘socialisation’ of the Web has made social network analysis
possible at large scale…


The Evolution of the Web
10

¨  Web 1.0 – the document web
¤  Communication medium: Bulletin-boards, Email, IM (ICQ)
¤  Documents are connected to one another via hyperlinks
¤  Web presence: restricted to the technologically savvy
¨  Web 2.0 – the social web
¤  Platforms provide APIs and open up data
¤  Users become central to the web (User-generated content: Wikipedia)
¤  Social networking sites: mediation through social objects
¤  Web presence: blogs (cult of the amateur)
¨  Web 3.0 – the semantic web
¤  Big and open data: Machines are now crunching large-scale datasets
¤  Rise in lightweight semantics: Google Rich Snippets, Facebook Open
Graph
¤  Links have meaning! :Matthew foaf:knows :Jon


Defining Social Web Systems
11

¨  Social web sites are in essence applications:
¤  Offer a range of functionalities and features, aside from just information
¨  Idea: Model social web sites as systems, define system properties:
¤  Actors = users
¤  Processes = social behaviour
¤  Structure = social network
¤  Input/Output = data
¨  Social web systems can evolve and change over time:
¤  User behaviour may impact the behaviour of others
¤  Shared content may spread through the system
¤  Systems are susceptible to:
n  Viruses (trolling, nefarious content)
n  Stimuli (external events, key actors, content injection)

¨  Attention economics is also a salient factor in social web systems…

Attention: Awareness → Attention → Action

12

First, some background on attention…
¨  Attention: the middle ground between ‘awareness’ and
‘action’
¤  It’s what motivates us to respond, read, like, comment, share
¨  Attention is the new currency:
¤  Rise in ‘Attention Culture’:
n  Reflected in media programming: TOWIE, Made in Chelsea,
X-Factor
n  Fame is now pursued and celebrity is a marker of ‘success’
¤  Follows an economic structure:
n  Demand: attention from others (i.e. to my presence, content)
n  Supply: attention to others (i.e. reply to content, share content)
n  Attention is a limited commodity, only so much can be given to others


13

¨  “What counts now is what is most scarce now, namely attention.”
Michael H. Goldhaber, 1997

¨  Rise of the Information Economy has made attention economics pertinent:
¤  Web 3.0 has led to masses of data being released = Information Overload
¤  “…what is the most precious resource in our new information economy? Certainly not
information, for we are drowning in it. No, what we are short of is the attention to make
sense of that information.”
Richard A. Lanham, 2006

¨  Social web systems are the setting for the Battle for Attention:
¤  Content publishers and creators: want to maximise content exposure
¤  Government policy makers: want feedback on the policy/issue discussions they initiate
¤  Digital marketing firms: maximise client’s audience, draw attention to client’s product
¨  The battle for attention has created various careers and issues:
¤  The ‘Social Media Professional’
¤  Digital Marketing – social media campaigns
¤  Like Farms – generating artificial attention

Attention Economics in Social Web Systems:
Research Challenges
14

¨  How do social network theories relate to attention economics?

¨  What causes users’ behaviour to change?

¨  Who influences whom?
¤  Can we effectively model social contagion?

¨  How can I maximise attention to my content?

¨  How do social networks grow over time?

¨  Why do people subscribe to me? And then unsubscribe?!


16 The Nitty-Gritty: Research (I)
Content Attention Patterns


Content Attention Patterns
17

¨  Content publishers want people to:
¤  Share their content
¤  Discuss their content
¨  Government policy makers want to:
¤  Enable public engagement
¤  Get policy feedback


¨  Need to:
¤  Model features associated with shared content
¤  Predict which pieces of content will achieve high attention levels
¤  Identify the feature patterns of high attention content
¤  Learn how these patterns differ between social web systems


Content Attention Patterns:
18
Model Derivation
Wish to capture features associated with published content…
¨  User features:
¤  Number of followers, number of followees: social-network based
¤  Number of posts, age in the system, post rate: activity-based
¨  Content features:
¤  Post length, referral count, time in day: surface features of the post
¤  Complexity: cumulative entropy of terms in the post
¤  Readability: Gunning Fog index of the post
¤  Informativeness: TF-IDF measure of terms within the post
¤  Polarity: average sentiment of terms in the post
¨  Topic features:
¤  Topic entropy: the concentration of the author across community forums
n  Higher entropy indicates a wider spread of forum activity
¤  Topic Likelihood: the likelihood that a user posts in a specific forum given their post
history
n  Measures the affinity that a user has with a given forum


19
Predicting Attention
¨  Two-stage process:
¤  1. Seed post identification
n  Pick out the posts (seeds) which elicit a response from those that don’t (non-seeds)
n  Identify the features of seed posts: How do they differ from non-seeds?
n  Task: binary classification using supervised classifiers
n  Train on one sample (80%), test on another sample (20%)
n  Class labels: positive (seed) and negative (non-seed)
¤  2. Attention-level prediction
n  Predict which posts will get the post attention (i.e. number of replies)
n  Identify the features of high-attention posts
n  Task: regression using linear regression
n  Train on sample (80%), test on another sample (20%)
n  Predict the number of replies

¨  How do the patterns from (1) and (2) differ between social web systems?
¨  Are there differences in the patterns within the same social web system?
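The first stage can be sketched as labelling, splitting and scoring. The snippet below is an assumption-laden illustration: a post counts as a seed if it received at least one reply, the 80/20 split is a simple shuffle, and precision, recall and F1 are computed for the positive (seed) class only. The classifiers themselves (Naive Bayes, Max Ent, J48) are not reproduced here, and all names are hypothetical.

```python
import random


def label_seeds(reply_counts):
    """Positive (1) if a post elicited a response, negative (0) otherwise."""
    return [1 if n > 0 else 0 for n in reply_counts]


def split_80_20(items, seed=0):
    """Shuffle and split into 80% training and 20% testing portions."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = (len(shuffled) * 4) // 5
    return shuffled[:cut], shuffled[cut:]


def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for the positive (seed) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical reply counts for four posts and a made-up set of predictions.
labels = label_seeds([0, 3, 1, 0])
train, test = split_80_20(range(10))
print(labels, len(train), len(test))
print(precision_recall_f1([1, 1, 0, 0], [1, 0, 1, 0]))
```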

20
Datasets
¨  Boards.ie
¤  Largest community-message board in Ireland
¤  Covers a range of topics and subjects in dedicated forums
¤  Analysed all posts and forums in 2006
¤  Attention measure: number of posts in a thread
¤  1.9m posts, 90k seeds, 21k non-seeds, 30k users
¨  Twitter
¤  Subscription-network social web system
n  Users subscribe (follow) other users, then read their content
¤  Collected a random subset over 24-hour period
¤  Attention measure: length of @reply chain
¤  1.4m posts, 144k seeds, 930k non-seeds, 766k users

¨  High class imbalance in each dataset!
¤  i.e. seeds and non-seeds are far from evenly balanced


Experiment 1 – General Patterns
21

Began by examining the general patterns in the dataset…

¨  1. Identification of Seed Posts
¤  Which model performs best?

Table II. Results from the classification of seed posts on Twitter

Features  Classifier   P      R      F1     ROC
User      Naive Bayes  0.780  0.859  0.805  0.558
          Max Ent      0.749  0.866  0.803  0.566
          J48          0.855  0.866  0.806  0.537
Content   Naive Bayes  0.772  0.866  0.803  0.664
          Max Ent      0.801  0.863  0.808  0.777
          J48          0.826  0.866  0.810  0.671
All       Naive Bayes  0.802  0.746  0.770  0.677
          Max Ent      0.807  0.864  0.810  0.781
          J48          0.837  0.870  0.831  0.775

Twitter
Table II. Results from the classification of seed posts on Twitter, where content features outperform user features and all features achieves the optimum performance

Features  Classifier   P      R      F1     ROC
User      Naive Bayes  0.803  0.862  0.809  0.603
          Max Ent      0.823  0.865  0.805  0.612
          J48          0.833  0.866  0.811  0.636
Content   Naive Bayes  0.811  0.850  0.823  0.651
          Max Ent      0.874  0.870  0.814  0.697
          J48          0.888  0.882  0.841  0.666
All       Naive Bayes  0.833  0.868  0.820  0.680
          Max Ent      0.853  0.870  0.820  0.733
          J48          0.869  0.883  0.851  0.726

Boards.ie
Table III. Results from the classification of seed posts on Boards.ie

Features         Classifier   P      R      F1     ROC
User             Naive Bayes  0.691  0.767  0.719  0.540
                 Max Ent      0.776  0.806  0.722  0.556
                 J48          0.778  0.809  0.734  0.582
Content          Naive Bayes  0.730  0.794  0.740  0.616
                 Max Ent      0.758  0.806  0.730  0.678
                 J48          0.795  0.822  0.783  0.617
Focus            Naive Bayes  0.710  0.737  0.722  0.588
                 Max Ent      0.649  0.805  0.719  0.586
                 J48          0.649  0.805  0.719  0.500
User + Content   Naive Bayes  0.712  0.772  0.732  0.593
                 Max Ent      0.767  0.807  0.734  0.671
                 J48          0.795  0.821  0.779  0.675
User + Focus     Naive Bayes  0.699  0.778  0.724  0.585
                 Max Ent      0.771  0.806  0.722  0.607
                 J48          0.777  0.810  0.742  0.617
Content + Focus  Naive Bayes  0.732  0.787  0.746  0.658
                 Max Ent      0.762  0.807  0.731  0.692
                 J48          0.798  0.823  0.787  0.662
All features     Naive Bayes  0.724  0.780  0.740  0.637
                 Max Ent      0.768  0.808  0.733  0.688
                 J48          0.798  0.824  0.792  0.692

Combining the features together yields our best performing model with the J48 classifier.

Boards.ie. For solitary feature sets on Boards.ie, Table III demonstrates that content features provide the best features, outperforming user features and focus features. Focus features perform poorly on their own, suggesting that such information is insufficient for identifying seeds. When we combine the feature sets together we notice improved performance; for example, combining content features with focus features produces the best performing model in terms of 2 feature sets combined, demonstrating the utility of focus features when used in conjunction with content quality metrics. In each case of combining feature sets we observe improvements.

Experiment 1 – General Patterns
22

¤  How do features correlate with seed posts?
Table IV. Reduction in F1 levels as individual features are dropped from the J48 classifier

Feature Dropped   Twitter     Boards.ie
-                 0.862       0.815
Post Count        0.864       0.815
In-Degree         0.861 .     0.811 *
Out-Degree        0.858 ***   0.811 *
User Age          0.863       0.807 ***
Post Rate         0.863       0.815
Topic Entropy     -           0.815
Topic Likelihood  -           0.798 ***
Post Length       0.861       0.810 **
Complexity        0.862       0.811 **
Readability       0.857 ***   0.802 ***
Referral Count    0.862       0.793 ***
Time in Day       0.842 ***   0.810 **
Informativeness   0.861 .     0.801 ***
Polarity          0.860 **    0.808 ***
Signif. codes: p-value < 0.001 *** 0.01 ** 0.05 * 0.1 .

5.3.1. Twitter. Time-in-day yielded the greatest reduction in the F1 level when it was removed from the best performing model - J48 using all features.

23

¤  How do features correlate with seed posts?

Twitter

Boards.ie


Experiment 1 – General Patterns
24

¨  2. Prediction of Attention Levels
¤  Which model performs best?

We now describe our results from the model selection stage before breaking down the best performing model for each system and analysing its coefficients.

Table V. Averaged nDCG@k levels for different datasets and feature sets

Feature set      Twitter  Boards.ie  SCN    Digg
User             0.376    0.646      0.592  0.594
Content          0.790    0.433      0.522  0.647
Focus            -        0.587      0.564  0.824
User + Content   0.554    0.547      0.676  0.812
User + Focus     -        0.660      0.583  0.559
Content + Focus  -        0.756      0.573  0.848
All              -        0.687      0.569  0.831
Average          0.573    0.617      0.583  0.731

6.2. Results: Model Selection
6.2.1. Twitter. Table V presents the results from each of the tested systems. We first focus on the results obtained for the microblogging platform Twitter, where we test user and content features and their combination together. The results show that content features performed best, far outperforming the use of user features on their own.
25

¨  2. Prediction of Attention Levels
¤  How do features correlate with heightened attention?

Twitter High Attention =
•  Shorter posts
•  Denser vocabulary
•  Fewer hyperlinks
•  Earlier in the day!

Boards.ie High Attention =
•  Concentrated topics
•  Longer posts
•  Wider vocabulary
•  Fewer referrals
•  Negative sentiment

Table VI. Summary of coefficients from Linear Regression Models induced from best performing features and their significance levels

Feature           Twitter      Boards.ie     SCN             Digg
Post Count        -            -             -5.689E 04      -
Out-degree        -            -             -2.520E 02 ***  -
In-degree         -            -             5.013E 02 ***   -
User Age          -            -             6.665E 08       -
Post Rate         -            -             1.227E 01       -
Topic Entropy     -            -0.2441 ***   -               -16.369 **
Topic Likelihood  -            60.0807 ***   -               -33.286 .
Post Length       -0.0092      0.0369 ***    2.414E 02 ***   7.131 *
Complexity        -1.9664 ***  2.4775 ****   3.610E 01 **    -30.592 ***
Readability       0.0043 **    0.0024 ***    -1.846E 03      -0.018
Referral Count    -0.5842 ***  -0.1236 **    2.147E 02 .     -
Time in Day       -0.0028 ***  7.98 5        -2.340E 05      0.012 **
Informativeness   0.0035       -0.0093 ****  -4.773E 03 ***  -1.146 *
Polarity          0.0309       -4.0863 ***   -1.094E 01      -3.464
Signif. codes: p-value < 0.001 *** 0.01 ** 0.05 * 0.1 . 1

In our case we are exploring attention measured through the length of reply chains and comments attributed to a given piece of content; for other measures of attention, such as retweets on Twitter, user features could play a greater role.
Looking at the average results obtained from our model selection task, we find that our method achieves its best performance over the Digg dataset, possibly due to the dataset being skewed towards more popular content. The next highest performance is achieved on Boards.ie, where content and focus features provide the best model for prediction. We achieve poor performance for Twitter when using all features from the tested models, where the use of user features harms the predictive performance of content features given that accuracy worsens. In this case the content of the tweet contains vital indicators of the attention that we can expect to yield.

Experiment 2 – Specific Patterns
26

Examining community-specific patterns in Boards.ie. Added additional features
to capture community-dependencies
¨  1. Identification of Seed Posts (Over 9 randomly sampled forums):
¤  267 (Astronomy and Space) = content features alone performs best
¤  221 (Spanish) = title features and user features performs best

Table II. F1 score and Matthews correlation coefficient (MCC) for different forums when performing seed post identification. The best performing model for each forum is marked in bold.

forumid  User           Focus          Content        Community      Title          All
         MCC    F1      MCC     F1     MCC     F1     MCC    F1      MCC    F1      MCC    F1
10       0.0    0.75    0.0     0.75   0.071   0.76   0.0    0.75    0.0    0.75    0.1    0.766
607      0.332  0.839   0.0     0.802  0.0     0.802  0.0    0.802   0.0    0.802   0.359  0.857
343      0.0    0.769   0.0     0.769  0.093   0.782  0.0    0.769   0.0    0.769   0.148  0.789
267      0.078  0.609   -0.132  0.531  0.242   0.673  0.078  0.609   0.0    0.549   0.181  0.643
865      0.0    0.533   0.0     0.533  0.0     0.533  0.0    0.533   0.0    0.533   0.632  0.815
544      0.0    0.818   0.0     0.818  -0.052  0.809  0.0    0.818   0.0    0.818   0.109  0.828
55       0.0    0.913   0.0     0.913  0.0     0.913  0.0    0.913   0.0    0.913   0.144  0.918
221      0.447  0.625   -0.447  0.25   0.0     0.486  0.0    0.333   0.707  0.829   0.0    0.333
630      0.0    0.678   0.0     0.678  -0.044  0.675  0.0    0.678   0.0    0.678   0.109  0.686

¤  In support communities: new users to the topic = more likely to get replies
¤  Specificity of community’s subject has an effect:
n  Work and Jobs forum is very general: post does not have to fit the forum
n  Golf forum topic is very specific: distance between post and community must be minimised

… but complex posts which have been authored by newbies are most likely to catch the attention of this community.
Another support and advice oriented community is the community around forum 343 (Golf). The topic of this community is more specific than the topic of the previous community. In this community the content of a post needs to be rather complex (coef = 2.261 and p < 0.01) and should also not contain links (coef = 0.586 and p < 0.05) in order to attract attention.
The main purpose of the community around forum 267 (Astronomy & Space) is to share information and content and to engage in discussions. Long posts (coef = 0.083 and p < 0.05) which do not contain many novel terms (informativeness coef = 0.029 and p < 0.05) but are positive in their sentiment (polarity’s coef = 4.556 and p < 0.05) are very likely to attract the attention of this community.
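The forum results pair F1 with the Matthews correlation coefficient (MCC). A small sketch of the standard MCC formula is given below. The confusion-matrix counts are hypothetical and only illustrate one way a classifier can show a respectable F1 alongside an MCC of 0.0: by labelling every post as a seed.

```python
import math


def matthews_corrcoef(tp, fp, fn, tn):
    """Standard MCC from confusion-matrix counts.

    By convention the value is 0.0 when any marginal total is zero.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def f1_positive(tp, fp, fn):
    """F1 for the positive (seed) class."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical forum with 60 seeds and 40 non-seeds, everything predicted "seed".
print(f1_positive(60, 40, 0), matthews_corrcoef(60, 40, 0, 0))
```

Because MCC uses all four cells of the confusion matrix, it is less flattering than F1 when classes are imbalanced, which is a plausible reason for reporting both.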

Experiment 2 – Specific Patterns
27

¨  2. Prediction of Attention Levels
¤  Golf forum (343):
n  Seed post identification = content and community features
n  Prediction of attention levels = focus features
¤  Satellite forum (55):
n  Seed post identification = all features
n  Prediction of attention levels = title features only works best.

Table III. Averaged Normalised Discounted Cumulative Gain nDCG@k values using a linear regression model with different feature sets. A nDCG@k of 1 indicates that the predicted ranking of posts perfectly matches their real ranking. Posts are ranked by the number of replies they got.

Forum  User   Focus  Content  Commun’  Title  All
10     0.599  0.561  0.452    0.516    0.418  0.616
221    0.887  0.954  0.863    0.954    0.88   0.985
267    0.63   0.703  0.773    0.6      0.75   0.685
343    0.558  0.727  0.612    0.634    0.572  0.636
544    0.5    0.514  0.607    0.684    0.461  0.574
55     0.574  0.42   0.655    0.671    0.73   0.692
607    0.77   0.632  0.814    0.48     0.686  0.842
630    0.707  0.459  0.635    0.547    0.485  0.762
865    0.673  0.612  0.85     0.643    0.771  0.796
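Since attention-level prediction is scored with nDCG@k, a compact sketch of that measure may help. It assumes the common formulation with linear gain (the true reply count) and a log2 position discount; the variant and the set of k values averaged in the work are not stated on the slides, so treat this as illustrative only.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k ranked items."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(predicted_scores, true_replies, k):
    """nDCG@k: DCG of the predicted ranking divided by DCG of the ideal one.

    Posts are ranked by predicted score; relevance is the real reply count.
    """
    order = sorted(range(len(true_replies)),
                   key=lambda i: predicted_scores[i], reverse=True)
    ranked = [true_replies[i] for i in order]
    ideal = sorted(true_replies, reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    return dcg_at_k(ranked, k) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical posts with 10, 5 and 1 replies.
replies = [10, 5, 1]
perfect = ndcg_at_k([3.0, 2.0, 1.0], replies, 3)   # predicted order matches
inverted = ndcg_at_k([1.0, 2.0, 3.0], replies, 3)  # predicted order reversed
print(perfect, inverted)
```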

28
Summary
¨  Key differences in Content Attention Patterns:
¤  Between social web systems
n  i.e. language complexity
¤  Within communities in the same social web system
n  Purpose and specificity of the community impact attention
¨  Currently exploring:
¤  Content attention patterns across different systems
¤  The relation between content attention patterns and:
n  Topical-specificity of community/network-cluster or group
n  Community purpose

Ignorance isn't Bliss: An Empirical Analysis of Attention Patterns in Online Communities. C Wagner, M Rowe, M
Strohmaier and H Alani. To appear in the proceedings of the Fourth IEEE International Conference on Social Computing.
Amsterdam, The Netherlands. (2012)
What catches your attention? An empirical study of attention patterns in community forums. C Wagner, M Rowe, M
Strohmaier and H Alani. In the proceedings of the International Conference on Weblogs and Social Media. Dublin,
Ireland. (2012).
Anticipating Discussion Activity on Community Forums. M Rowe, S Angeletou and H Alani. The Third IEEE International
Conference on Social Computing. Boston, USA. (2011)

29 The Nitty-Gritty: Research (II)
Follower Prediction


Follower Prediction
30

¨  Digital marketing firms want to:
¨  Maximise a client’s audience
¨  Draw attention to client’s product
¨  Maintain the reputation of their clients
¨  Content publishers want to:
¨  Ensure as many people view their content as possible

¨  Why do people subscribe to me?

¨  Need to:
¤  Profile users, how they behave and the content they share
¤  Predict who will follow whom in a social network
¤  Identify how people differ in their decision to follow others
¤  Understand how follow patterns differ between social web systems

Follower Prediction:
31
Task Formulation
¨  Formulating the problem:
¨  User A is given a set of recommendations of who to follow: R(A)
¨  Given R(A), which users will A actually follow?
¨  Goal: learn a function f which when given A and R(A) can accurately predict follower
decisions.
¨  Model this problem as a binary classification task:
¨  Predict whether A will follow B (positive), or not (negative)
¨  Constrains the task to modelling pairwise similarities between A and B across
different follow-factors:
¨  Social = similarities in the social network of A and B
¨  Topical = topical-similarity between the content of A and B
¨  Visibility = visibility of B’s presence to A

¨  Once pairwise similarities have been measured we can:
¨  1. Learn a general model to predict who will follow whom
¨  2. Learn behaviour-specific models to identify divergent follow-patterns


32
Social Factors
¨  The decision of A to follow B might be based on common relationships
between A and B
¤  Based on the principle of ‘homophily’
¨  Implement existing network-topology measures from the literature:
¤  Mutual Followers Count:
n  Overlap of the sets of followers of A and B
¤  Mutual Followees Count:
n  Overlap of the sets of followees of A and B
¤  Mutual Friends Count:
n  Overlap of the sets of friends of A and B
n  Friend of A is both a follower and a followee of A (directed)
¤  Mutual Neighbours Count:
n  Overlap of sets of followees or followers
n  Ignores direction


Topical Factors
33

¨  The decision of user A to follow user B might be based on the content that B has
shared
¨  Implement topical affinity measures based on different models:
¤  Tag Vectors
n  Cosine similarity: between the content tag vectors of A and B
¤  Concept Bags
n  Generated using concept extraction over the content of A and B
n  Disambiguated reference (e.g. “football” = ex:association_football)
n  Cosine similarity: between the concept bags of A and B
n  Jenson-Shannon divergence: between prob’ dist’ of the concept bags of A and B
¤  Concept Graphs
n  Concepts are connected together in a semantic web (Google ‘DBPedia’)
n  db:Lancaster_University dbprop:city db:Lancaster !
n  Measure average d(c1,c2) between the concepts of content from A and B
n  Shortest Path: between c1 and c2 in the concept graph
n  Hitting Time: steps taken by random walker from c1 to c2
n  Commute Time: steps taken by random walker to go from c1 to c2 and back

34
Visibility Factors
¨  The decision of user A to follow user B might be based on user A noticing
user B’s presence
¨  Implement visibility measures that capture presence potential:
¤  Retweet Count:
n  Number of times a followee of A has retweeted content from B
¤  Mention Count:
n  Number of times a followee of A has mentioned B (e.g. @B)

¤  Comment Count:
n  Number of times a followee of A has commented on content from B

¤  Influence-weighted Counts:
n  Weight each of the above by the influence of followee of A on A
n  Measured by the number of times the followee has been replied to by A
n  Related to our earlier theory of ‘Social Contagion’
n  Derive weighted versions of the three measures


35
Dataset + Experimental Setup
¨  Knowledge Discovery and Data Mining (KDD) Cup 2012 Dataset
¤  Task: follower prediction! Ideal ;)
[Figure: Frequency (c(n)) plotted against recommendations (n)]
¨  1. General follower prediction
¤  Learn a general followee-decision model (10% of users)
¤  For 10%: built features based on recommendations
¤  Divided dataset up into: training (80%) and testing (20%)
¨  2. Binned follower prediction: Topical-focus
¤  Learn models of users who differ in their topical focus
¤  For each user: measured concept-entropy, derived equal-frequency bins, selected users in the lowest
and highest bins
¤  For selected users: built features based on recommendations
¤  Divided 2 datasets (low & high) into: training (80%) and testing (20%)
¨  3. Binned follower prediction: Degree
¤  Learn models of users who differ in their popularity (i.e. follower count)
¤  For each user: measured the degree, derived equal-frequency bins, selected users in the lowest and
highest bins
¤  For selected users: built features based on recommendations
¤  Divided 2 datasets (low & high) into: training (80%) and testing (20%)
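The binning step can be sketched as follows. The number of bins is not given on the slide, so it is a parameter here, and the score dictionary (concept-entropy or degree per user) is hypothetical example data.

```python
def equal_frequency_bins(scores, n_bins):
    """Split users into n_bins groups of (near-)equal size, lowest scores first.

    scores: {user: value}, e.g. concept-entropy or follower count.
    """
    ranked = sorted(scores, key=lambda u: (scores[u], u))
    n = len(ranked)
    return [ranked[i * n // n_bins:(i + 1) * n // n_bins] for i in range(n_bins)]


def lowest_and_highest(scores, n_bins):
    """Return the users in the lowest and the highest bin."""
    bins = equal_frequency_bins(scores, n_bins)
    return bins[0], bins[-1]


# Hypothetical concept-entropy values for six users.
entropy = {"u1": 0.1, "u2": 0.9, "u3": 0.5, "u4": 0.3, "u5": 0.7, "u6": 0.2}
low, high = lowest_and_highest(entropy, 3)
print(low, high)
```

Each of the two selected groups would then get its own 80/20 training and testing split.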


36
Accuracy
¨  General Model:
¤  Topical-information provides the best solitary-factor set performance
n  Outperforms existing topological approaches from the state of the art!
¤  All features performs best
¨  Binned Models:
¤  Topical-focus: Low entropy users = topical features, High entropy users = social features
¤  Degree: Low degree and high degree users = topical features
n  Expected high-degree users to be driven by social factors
[Figure: bar chart (y-axis 0.0 to 1.0) comparing the Social, Topical, Visibility and All feature sets for the Full, Entropy − Low, Entropy − High, Degree − Low and Degree − High models]

37
Follower-Decision Patterns
¨  Used a logistic regression classifier for experiments:
¤  Provides log-odds ratio diagnostics: explaining how a change in feature value affects
follow-likelihood
¨  Connections are formed when…
¤  In the general model:
n  Users are closers topically (greater tag vector cosine, lower shortest path, hitting time
and commute time)
¤  In the topical-focus model:
n  For low entropy users: same as the general model (greater topical affinity)
n  For high entropy users: users share more mutual followers, reduced tag vector similarity
but reduced hitting time
¤  In the degree model:
n  For low degree users: topical affinity is greater (same as general model)
n  For high degree users: more mutual followees present (i.e. they follow more of the same
people), similar topical effects as the general model
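How a logistic regression coefficient translates into a change in follow-likelihood can be shown in a few lines. The coefficient and intercept values below are invented for illustration and are not the fitted values from the experiments.

```python
import math


def odds_ratio(coefficient, delta=1.0):
    """Multiplicative change in the odds of following for a feature change of delta."""
    return math.exp(coefficient * delta)


def follow_probability(intercept, coefficients, features):
    """Logistic model: probability that A follows B given pairwise features."""
    z = intercept + sum(coefficients[name] * features[name] for name in coefficients)
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical model: positive weight on tag vector cosine,
# negative weight on shortest path length in the concept graph.
coefficients = {"tag_cosine": 2.0, "shortest_path": -0.5}
near = follow_probability(-1.0, coefficients, {"tag_cosine": 0.9, "shortest_path": 1})
far = follow_probability(-1.0, coefficients, {"tag_cosine": 0.1, "shortest_path": 4})
print(odds_ratio(2.0, 0.1), near, far)
```

A positive coefficient gives an odds ratio above 1 (the feature raises follow-likelihood); a negative one gives a ratio below 1.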


38
Summary
¨  Homophily shown to play a crucial role in users following one another
¤  Existing work used network-topology methods
¤  Presented work utilises the semantic web to gauge topical affinity
¨  Staying on topic will gain you followers within those topics
¤  Highlighted by the low-entropy users
¨  Follow-decisions are based upon user behaviour:
¤  Differences in the follow-decisions based on the focus and popularity of users
¨  Current work:
¤  Examining followee-decision patterns on Twitter
n  Overhead of data gathering (as I will explain next)
¤  Can we use the same approach to predict churners?...

Who will follow whom? Exploiting Semantics for Link Prediction in Attention-Information Networks. M Rowe, M
Stankovic and H Alani. To appear in the proceedings of the International Semantic Web Conference 2012. Boston, US.
(2012)


39 The Nitty-Gritty: Research (III)
Churn Prediction


Churn Prediction
40

The complement of Follower Prediction…
¨  Same motivation as link prediction, but with an emphasis on maintenance of
subscribers
¨  Digital marketing firms want to:
¨  Draw attention to client’s product
¨  Maintain the audience of their clients


¨  Need to:
¤  Profile users, how they behave and the content they share
¤  Predict who is going to ‘unfollow’ whom
n  i.e. churn from their social network
¤  Identify how people differ in their behaviour and decisions
¤  Understand how churn patterns differ between social web systems


Churn Prediction:
Hypotheses
41

¨  Churn on Twitter:
¤  (Kwak et al., 2012) – More common followers and tags = less likely to churn
¤  (Kwak et al., 2011) – Uninteresting topics, mundane details = more likely to churn
¤  (Kivran-Swaine et al., 2011) – If followee is more important/powerful than follower =
churn
¨  Churn on Facebook:
¤  (Sibona and Walczak, 2011) – Unimportant, inappropriate and polarising posts = churn
¤  (Quercia et al., 2012) – Follower is neurotic and introverted = churn

¨  H1: Churn is topically-driven
¤  Intuition: people follow me for work topics (#semanticweb, #socialnetworks), if I
talk about football then I experience churn!
¨  H2: Topically-focussed users experience churn when they diverge
¨  H3: General-discussion users experience less churn than topically-focussed
users

Churn Prediction:
42
Data Acquisition Problem
¨  Predicting churners and followers on Twitter requires comparing social networks at
consecutive time steps
¨  Topical-Homophily is important for: a) link prediction, b) hypotheses
¨  Therefore we need to capture, at regular time steps for a given collection of seed
users S:
¤  A) Follower network of each user (s) and each follower in the follower network of s
¤  B) Content published by each user (s) and each follower in the follower network of s!
¨  We are also restricted by API limits. L = max number of requests per day:
¤  Twitter: w/o whitelisting; L=1,440. W/ whitelisting; 480k!

¨  Goal: derive S such that we:
¤  Maximise the size of S
¤  Account for growth in the follower network of each member of S
¤  Account for the growth of the follower network of each follower of each member of S
¤  A member of S has no more than 5k followers (upper limit of the API response)
¤  Remain within the API limits! (i.e. requests < L/2 per day)
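One way to reason about the seed-set goal is as a budgeting problem. The sketch below rests on several assumptions that are not in the slides: one collection pass per day, one request per follower list and one per content timeline, a fixed percentage allowance for follower growth, and a greedy smallest-first selection to maximise the size of S. All names and numbers are hypothetical.

```python
def requests_per_step(follower_count):
    """Assumed cost of one pass over a seed user.

    One request for the seed and one per follower, doubled because both the
    follower network (A) and the published content (B) are collected.
    """
    return 2 * (1 + follower_count)


def select_seeds(candidates, daily_limit, growth_pct=20, max_followers=5000):
    """Greedily pick seed users, smallest follower counts first.

    candidates: {user: current follower count}
    Keeps total requests strictly below daily_limit / 2 and skips users
    whose follower count exceeds the API response limit.
    """
    budget = daily_limit / 2
    selected, used = [], 0
    for user, count in sorted(candidates.items(), key=lambda kv: (kv[1], kv[0])):
        if count > max_followers:
            continue
        # Integer ceiling of count * (1 + growth_pct / 100) to allow for growth.
        projected = (count * (100 + growth_pct) + 99) // 100
        cost = requests_per_step(projected)
        if used + cost >= budget:
            break  # costs only increase from here, so no later user fits
        selected.append(user)
        used += cost
    return selected, used


# Hypothetical candidates, using the non-whitelisted limit L = 1,440.
candidates = {"a": 10, "b": 100, "c": 300, "d": 50, "e": 6000}
seeds, used = select_seeds(candidates, daily_limit=1440)
print(seeds, used)
```

Choosing the cheapest users first maximises how many fit in the budget, at the cost of biasing S towards accounts with few followers.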

