Gave this talk at SSSW'13, the 10th Summer School on Ontology Engineering and the Semantic Web,
7-13 July 2013, Cercedilla, Spain. http://sssw.org/2013/
Social Media Analytics with a pinch of semantics
1. Social Media Analytics
with a pinch of semantics
Harith Alani
http://people.kmi.open.ac.uk/harith/
@halani
harith-alani
2. Outline of my talk
§ I’ll start talking
§ Then I’ll finish talking
§ You’ll wonder what you’ve learned!
§ You will clap regardless
§ You’ll be convinced you learned nothing
§ You could be right!
§ But you’re wrong of course
§ We go to the bar tonight and forget all about the talk!
3. • Why social media analytics?
– It’s where everyone is!
– Real time information
– Low cost
– Much of it
Survey of 3800 marketers on how they use
social media to grow their business
Social Media for
Businesses
4. § “they can't be forced to use social apps, they must opt-in”
§ “need a detailed understanding of social networks: how people are currently working,
who they work with and what their needs are”
9. Facebook Insights
• Provides measurements
on FB Page
performance
• Provides demographic
data about visitors, and
their engagement with
posts
• “Experiment with
different types of posts
to see what your
audience responds to
best.”
10. Social Media Challenges • Integration
– How to represent and
connect this data?
• Behaviour
– How can we measure and
predict behaviour?
– Which behaviours are good/
bad in which community
type?
• Change
– Can we influence behaviour
change?
• Community Health
– What health signs should we
look for?
– How to predict them?
• Engagement
– How can we maximise
engagement?
• Sentiment
– How to measure it? track it?
– Can we predict sentiment
towards entities (brands,
people, events)?
14. Semantically-Interlinked Online
Communities (SIOC)
• SIOC aims to enable the integration of online community information.
• SIOC provides a Semantic Web ontology for representing rich data from the Social Web
in RDF
sioc-project.org
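As a rough illustration of what representing community data with SIOC in RDF can look like, the sketch below serialises a single forum post as N-Triples. The SIOC namespace and the terms `sioc:Post`, `sioc:has_creator`, `sioc:has_container`, `sioc:content` and `sioc:reply_of` come from the SIOC ontology; the function name `post_to_ntriples` and the example.org URIs are assumptions made up for this example, not something shown in the talk.

```python
SIOC = "http://rdfs.org/sioc/ns#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def post_to_ntriples(post_uri, creator_uri, forum_uri, content, reply_of=None):
    """Serialise one forum post as a list of N-Triples lines using SIOC terms."""
    # Escape backslashes, quotes and newlines so the literal stays valid.
    literal = (
        content.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    )
    triples = [
        f"<{post_uri}> <{RDF_TYPE}> <{SIOC}Post> .",
        f"<{post_uri}> <{SIOC}has_creator> <{creator_uri}> .",
        f"<{post_uri}> <{SIOC}has_container> <{forum_uri}> .",
        f'<{post_uri}> <{SIOC}content> "{literal}" .',
    ]
    if reply_of is not None:
        # Link a reply to the post it answers.
        triples.append(f"<{post_uri}> <{SIOC}reply_of> <{reply_of}> .")
    return triples


if __name__ == "__main__":
    # Hypothetical URIs, for illustration only.
    for line in post_to_ntriples(
        "http://example.org/post/2",
        "http://example.org/user/7",
        "http://example.org/forum/1",
        "Thanks, that fixed it!",
        reply_of="http://example.org/post/1",
    ):
        print(line)
```

Once posts from different platforms are expressed with the same terms, they can be loaded into one triple store and queried together, which is the integration point the slide makes.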
17. Why monitor behaviour?
§ Understand impact of behaviour on community evolution
§ Forecast community future
§ Learn when intervention might be needed
§ Learn which behaviour should be encouraged or
discouraged
§ Find what could trigger certain behaviours
§ What is the best mix of behaviour to increase
engagement in the community
§ To see which users need more support, which ones
should be confined, and which ones should be promoted
18. Behaviour analysis in Social Media
§ Bottom Up analysis
§ Every community member
is classified into a “role”
§ Unknown roles might be
identified
§ Copes with role changes
over time
initiators
lurkers
followers
leaders
Structural, social network,
reciprocity, persistence, participation
Feature levels change with the
dynamics of the community
Associations of roles with a collection of
feature-to-level mappings
e.g. in-degree -> high, out-degree -> high
Run rules over each user’s features
and derive the community role composition
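The rule-based step described above can be sketched roughly as follows. This is a minimal sketch, not the rules used in the talk: the role names echo the slide, but the concrete rules in `ROLE_RULES`, the feature names, and the fixed cut-off thresholds are assumptions chosen for illustration (the slide notes that feature levels actually shift with the dynamics of the community).

```python
from collections import Counter

# Hypothetical rules: each role is a collection of feature -> level mappings.
ROLE_RULES = {
    "leader": {"in_degree": "high", "out_degree": "high"},
    "initiator": {"initiation": "high", "in_degree": "low"},
    "lurker": {"out_degree": "low", "initiation": "low"},
}


def to_level(value, low_cut, high_cut):
    """Bin a raw feature value into low / mid / high."""
    if value < low_cut:
        return "low"
    if value >= high_cut:
        return "high"
    return "mid"


def infer_role(levels):
    """Return the first role whose feature-to-level mappings all match."""
    for role, constraints in ROLE_RULES.items():
        if all(levels.get(f) == lvl for f, lvl in constraints.items()):
            return role
    return "unclassified"


def role_composition(users, cuts):
    """Run the rules over every user and return the share of each role."""
    roles = Counter()
    for features in users.values():
        levels = {f: to_level(v, *cuts[f]) for f, v in features.items()}
        roles[infer_role(levels)] += 1
    total = sum(roles.values())
    if total == 0:
        return {}
    return {role: count / total for role, count in roles.items()}


if __name__ == "__main__":
    cuts = {"in_degree": (2, 7), "out_degree": (2, 7), "initiation": (2, 7)}
    users = {
        "a": {"in_degree": 9, "out_degree": 8, "initiation": 5},
        "b": {"in_degree": 0, "out_degree": 0, "initiation": 0},
    }
    print(role_composition(users, cuts))
```

Recomputing the cut-offs per time window, rather than fixing them as done here, would be one way to let the levels follow the community's dynamics.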
21. Clustering for identifying emerging roles
– Map the distribution of each
feature in each cluster to a
level (i.e. low, mid, high)
– Align the mapping patterns
with role labels
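One plausible way to map a feature's distribution in each cluster to a level is sketched below. The slide does not say how the mapping was done, so the approach here is an assumption: the median of the feature within a cluster is compared against the tercile boundaries of the same feature pooled over all clusters. The function name `cluster_levels` and the toy values are hypothetical.

```python
from statistics import median, quantiles


def cluster_levels(values_by_cluster):
    """Map each cluster to L, M or H for one feature.

    values_by_cluster: {cluster_id: [feature value of each member]}
    """
    pooled = [v for vals in values_by_cluster.values() for v in vals]
    # Two cut points that split the pooled values into three equal parts.
    low_cut, high_cut = quantiles(pooled, n=3)
    levels = {}
    for cluster, vals in values_by_cluster.items():
        m = median(vals)
        if m <= low_cut:
            levels[cluster] = "L"
        elif m > high_cut:
            levels[cluster] = "H"
        else:
            levels[cluster] = "M"
    return levels


if __name__ == "__main__":
    # Toy "initiation" values for three clusters.
    print(cluster_levels({0: [1, 2, 3], 1: [10, 11, 12], 2: [20, 21, 22]}))
```

Repeating this for every feature gives one row of L/M/H levels per cluster, which can then be aligned with role labels.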
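[Table fragment: 4 x 4 correlation matrix; row and column labels lost in extraction, first column restored from symmetry]
1.000 0.274 0.086 0.909**
0.274 1.000 -0.059 0.513
0.086 -0.059 1.000 0.065
0.909** 0.513 0.065 1.000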
Table 2: Mapping of cluster dimensions to levels
Cluster Dispersion Initiation Quality Popularity
0 L M H L
1 L L L L
2 M H L H
3 H H H H
4 L H H M
5,7 H H L H
6 L H M M
8,9 M H H H
10 L H M H
• 1 - Focussed Novice: focussed within a few
select forums but does not provide good quality
content.
• 2 - Mixed Novice: a novice across a medium
range of topics
• 3 - Distributed Expert: an expert on a variety of
topics who participates across many different
forums
• 4 - Focussed Expert Initiator: similar to cluster
0 in that this type of user is focussed on certain
topics and is an expert on those, but to a large
extent starts discussions and threads, indicating that
his/her shared content is useful to the community
• 5,7 - Distributed Novice: participates across a
range of forums but is not knowledgeable on any
….
22. Correlation of behaviour with community
activity
§ How does the existence of certain behaviour roles impact activity in an online
community?
23. Online Community Health Analytics
[Figure: ROC curves (TPR against FPR) for four health indicators: Churn Rate, User Count, Seeds / Non-seeds Prop, Clustering Coefficient]
• Machine learning models to predict
community health based on compositions and
evolution of user behaviour
• Churn rate: proportion of community leavers in a
given time segment.
• User count: number of users who posted at least
once.
• Seeds to Non-seeds ratio: proportion of posts that get
responses to those that don’t
• Clustering coefficient: extent to which the community
forms a clique.
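The four health indicators defined above could be computed for one time segment along these lines. This is a sketch under stated assumptions, not the pipeline from the talk: a "leaver" is taken to be a user who posted in this segment but not in the next one, the clustering coefficient is the average local coefficient over an undirected user-to-user reply graph, and the post records (`id`, `author`, `reply_to`) are a made-up format.

```python
from itertools import combinations


def health_metrics(posts, next_segment_authors):
    """Compute churn rate, user count, seeds/non-seeds ratio and clustering."""
    authors = {p["author"] for p in posts}
    user_count = len(authors)

    # Assumption: a leaver posted in this segment but not in the next one.
    leavers = authors - set(next_segment_authors)
    churn_rate = len(leavers) / user_count if user_count else 0.0

    # Seeds are posts that received at least one reply.
    replied_to = {p["reply_to"] for p in posts if p["reply_to"] is not None}
    seeds = sum(1 for p in posts if p["id"] in replied_to)
    non_seeds = len(posts) - seeds
    seed_ratio = seeds / non_seeds if non_seeds else None

    # Undirected reply graph between users.
    author_of = {p["id"]: p["author"] for p in posts}
    neighbours = {a: set() for a in authors}
    for p in posts:
        parent = author_of.get(p["reply_to"])
        if parent is not None and parent != p["author"]:
            neighbours[p["author"]].add(parent)
            neighbours[parent].add(p["author"])

    # Local clustering: share of a user's neighbour pairs that are linked.
    local = []
    for nbrs in neighbours.values():
        pairs = list(combinations(nbrs, 2))
        if not pairs:
            local.append(0.0)
            continue
        linked = sum(1 for a, b in pairs if b in neighbours[a])
        local.append(linked / len(pairs))
    clustering = sum(local) / len(local) if local else 0.0

    return {
        "churn_rate": churn_rate,
        "user_count": user_count,
        "seeds_to_non_seeds": seed_ratio,
        "clustering_coefficient": clustering,
    }


if __name__ == "__main__":
    posts = [
        {"id": 1, "author": "a", "reply_to": None},
        {"id": 2, "author": "b", "reply_to": 1},
        {"id": 3, "author": "c", "reply_to": 1},
    ]
    print(health_metrics(posts, next_segment_authors={"a", "b"}))
```

Computed per time segment, these values form the series that a health prediction model would be trained and evaluated on.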
Health
categories
The fewer Focused Experts in the
community, the more posts will
receive a reply!
There is no “one size fits all” model!
25. Community types
§ Do communities of different types behave differently?
§ Analysed IBM Connections communities to study participation,
activity, and behaviour of users
§ Help us to know what is normal and healthy in a community, and
what is not!
§ Compare exhibited community behaviour with what users
say they use the community for
§ Does macro behaviour match micro needs?
26. Community types
Community
Wiki
Page
Blog
Post
Forum
Thread
Wiki
Edit
Blog
Comment
Forum
Reply
Bookmark
Tag
File
§ Data consists of non-
private info on IBM
Connections Intranet
deployment
§ Communities:
§ ID
§ Creation date
§ Members
§ Used applications
(blogs, Wikis, forums)
§ Forums:
§ Discussion threads
§ Comments
§ Dates
§ Authors and
responders
27. Community types
§ Muller, M. (CHI 2012) identified five distinct community
types in IBM Connections:
§ Communities of Practice (CoP): for sharing information and
networking
§ Teams: shared goal for a particular project or client
§ Technical Support: support for a specific technology
§ Idea Labs Communities: for focused brainstorming
§ Recreation Communities: recreational activities unrelated to work.
§ Our data consisted of the 186 most active
communities:
§ 100 CoPs, 72 Teams, and 14 Tech communities
§ No Idea Labs or Recreation communities
28. Behaviour in different community types
• Members of Team communities are
more engaged, popular, and initiate
more discussions
• Tech users are mostly active in a few
communities, and don’t initiate or
contribute much
• CoP users disperse their activity
across many communities, and
contribute more
Mean and Standard Deviation (in brackets) of the distribution of micro features within the
different community types
Need an ontology
and inference
engine of
community types
Matthew Rowe, Miriam Fernandez, Harith Alani, Inbal Ronen, Conor Hayes and Marcel Karnstedt: Behaviour Analysis across different
types of Enterprise Online Communities. ACM WebSci 2012
30. [Figure: survey pie charts; segment percentages per item, response-scale labels not preserved]
[Quality of content]: 41%, 47%, 8%, 3%, 1%
[Number of members]: 18%, 46%, 26%, 8%, 2%
[Diversity of expertise]: 31%, 53%, 13%, 2%, 1%
[Level of entertainment]: 2%, 15%, 30%, 30%, 23%
[Provides accurate answers to questions]: 44%, 50%, 4%, 2%
[Contributes good quality and well presented content]: 38%, 55%, 5%, 2%
[Provides quick answers to questions]: 21%, 60%, 14%, 5%
[Has good expertise in a domain]: 38%, 49%, 8%, 5%
[Contributes content frequently]: 11%, 58%, 25%, 6%
[Has many contacts (e.g. Facebook friends)]: 1%, 17%, 34%, 30%, 18%
[Has many fans (e.g. Twitter followers, positive replies to posts)]: 2%, 14%, 32%, 31%, 21%
Community Value
Community Member Value
Value of community features
Measurements of value and
needs satisfaction
• Assessing user engagement and needs
satisfaction
• Measuring value of individual users to
their communities
• Measuring value of communities to
their members
33. Mapping Maslow’s hierarchy of needs to
social media communities
Self-actualisation:
Altruistic behaviour:
helping others, replying
to queries, giving ratings
Self-Esteem: Need to be rated
and ranked higher in the
community, promotion of roles
from novice to active member to
expert and moderator
Social Belongingness: Need to be part of the
community, groups, need for interaction and
engagement
Security: Need for privacy, security from identity theft,
security from online abuse, trolling and bullying
Physical: Need for Hardware, Software, Information, Internet access.
34. User groups based on ‘needs’
High Helping Need
• Reply a lot
• Last 17% longer in system
• Contribute to many forums
• High and consistent
engagement
• (Self-actualisation)
High Information Need
• Contribute 70% less
• Don’t care about ‘points’
and ‘reputation’
• Don’t stay for long
• Engage with very few users
• (Basic needs)
High Social Need
• High level of social
interaction
• Moderate reputation scores
• High contribution level
• Low information needs
• (Social belongingness)
Recognition Need
• High ‘reputation’
• Moderate contribution level
• High engagement
• (Self-esteem)
~90% of users are happily staying at the lower levels of the ‘needs hierarchy’
35. Behaviour evolution patterns
[Figure labels: experts-to-be; about to churn; on right path to leadership]
§ Can we predict future behaviour role?
§ Who’s on the path to become a
leader? an expert? a churner?
§ Which users do we want to encourage
to stay or to leave?
… into becoming an expert; however, this development only occurs 4 times
[Figure: progression pattern panels P28 to P807]
Fig. 6. Progression patterns where users progress from a novice to an expert role over
time
38. Tweet recipe for generating engagement
§ Identifying seed posts
Top features: Time in Day, Readability,
Out-Degree, Polarity, Informativeness
Top features: Referral Count, Topic
Likelihood, Informativeness,
Readability, User Age
For both datasets:
• Content features play a greater
role than user features
• The combination of all features
provides the best results
• Predicting discussion activity
Top features: Referral Count(-),
Complexity(-)
Top features: URLs(-), Polarity(-), Topic
Likelihood(+), Complexity (+)
For both, a decrease in URLs is
associated with max activity.
Language and terminology are more
significant for Boards.ie.
39. Engagement in different
communities
§ How do the results differ:
§ from one community type to another
§ from random datasets to topic-
based ones
§ from related experiments in the
literature
§ Experimented with 7 datasets, from:
§ Boards.ie
§ Twitter
§ SAP
§ Server Fault
§ Facebook
40. Impact of features on engagement
[Figure: β coefficient of each feature, one bar chart per dataset: Boards.ie, Twitter Random, Twitter Haiti, Twitter Union, Server Fault, SAP, Facebook]
Features: In-degree, Out-degree, Post Count, Age, Post Rate, Post Length, Referrals Count, Polarity, Complexity, Readability, Readability Fog, Informativeness, EF-IPF, CF-IPF, Entity Entropy, Concept Entropy, Entity Degree Centrality, Concept Degree Centrality, Entity Network Entropy, Concept Network Entropy
Effects of individual social, content, and semantic features on the response variable
(i.e. whether the post seeds engagement or not).
42. Semantic sentiment analysis on social media
§ Offers fast and cheap access to the public’s
feelings towards brands, businesses, people, etc.
§ A range of features and statistical classifiers
have been used in recent years
§ Semantics are often neglected
§ We add semantics as additional features
into the training set for sentiment analysis
§ Measure the correlation of the
representative concept with negative/
positive sentiment
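A minimal sketch of the idea of adding semantic concepts as extra features is given below. It is an illustration under assumptions rather than the method from the paper: the `ENTITY_CONCEPTS` lookup is a hand-written stand-in for the output of a concept extraction tool, the `concept=` prefix is an arbitrary naming choice, and the per-label counts are only a crude proxy for the concept-sentiment correlation mentioned above.

```python
from collections import Counter, defaultdict

# Hypothetical entity -> concept lookup, standing in for an extraction tool.
ENTITY_CONCEPTS = {"iphone": "PRODUCT", "ipad": "PRODUCT", "obama": "PERSON"}


def add_concept_features(tokens):
    """Append one concept feature for every known entity among the tokens."""
    concepts = [
        f"concept={ENTITY_CONCEPTS[t]}" for t in tokens if t in ENTITY_CONCEPTS
    ]
    return tokens + concepts


def concept_sentiment_counts(labelled_tweets):
    """Count how often each concept occurs under each sentiment label."""
    counts = defaultdict(Counter)
    for text, label in labelled_tweets:
        for token in add_concept_features(text.lower().split()):
            if token.startswith("concept="):
                counts[token][label] += 1
    return counts


if __name__ == "__main__":
    tweets = [
        ("I love my iPhone", "positive"),
        ("my iPad rocks", "positive"),
        ("Obama speech was dull", "negative"),
    ]
    for concept, by_label in concept_sentiment_counts(tweets).items():
        print(concept, dict(by_label))
```

The augmented token lists can be fed to any bag-of-words classifier, so a tweet mentioning an unseen product name still shares the concept feature with products seen in training.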
43. Sentiment Analysis
hate negative
honest positive
inefficient negative
Love positive
…
Sentiment Lexicon
I hate the iPhone
I really love the iPhone
Lexical-Based Approach
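The lexicon-based approach on this slide can be sketched in a few lines. The four lexicon entries and the two example sentences are the ones shown above; the scoring rule (count positive words minus negative words, with ties reported as neutral) and the function name `lexicon_sentiment` are assumptions for illustration.

```python
# Lexicon entries taken from the slide.
LEXICON = {
    "hate": "negative",
    "honest": "positive",
    "inefficient": "negative",
    "love": "positive",
}


def lexicon_sentiment(text):
    """Label a text by counting positive and negative lexicon words."""
    score = 0
    for word in text.lower().split():
        polarity = LEXICON.get(word.strip(".,!?"))
        if polarity == "positive":
            score += 1
        elif polarity == "negative":
            score -= 1
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    print(lexicon_sentiment("I hate the iPhone"))
    print(lexicon_sentiment("I really love the iPhone"))
```

A text with no lexicon words comes out as neutral here, which is exactly the case where lexical clues are missing.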
Learn Model
Apply Model
Naïve Bayes, SVM, MaxEnt, etc.
Training Set
Test Set
Model
Machine Learning Approach
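The learn-then-apply flow of the machine learning approach can be illustrated with a small multinomial Naïve Bayes classifier, one of the model families named on the slide. This is a toy sketch: the training sentences, the whitespace tokenisation, the add-one smoothing and the choice to ignore unseen words are all assumptions, and a real experiment would use a proper library and a much larger training set.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(training_set):
    """Learn class and word counts from (text, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in training_set:
        class_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab


def classify(model, text):
    """Apply the model: return the label with the highest log-probability."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are ignored
                # Add-one (Laplace) smoothing avoids zero probabilities.
                count = word_counts[label][word] + 1
                score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    training_set = [
        ("I hate the iPhone", "negative"),
        ("this phone is inefficient", "negative"),
        ("I really love the iPhone", "positive"),
        ("an honest review", "positive"),
    ]
    model = train_naive_bayes(training_set)
    print(classify(model, "I really love it"))
```

Unlike the lexicon lookup, the word weights here are estimated from labelled examples, so the quality of the result depends on the training set.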
44. Semantic Concept Extraction
§ Extract semantic concepts from tweets data and incorporate them
into the supervised classifier training.
… OpenCalais and Zemanta. Their experimental results showed that AlchemyAPI
performs best for entity extraction and semantic concept mapping. Our datasets consist of
informal tweets, and hence are intrinsically different from those used in [10].
Therefore we conducted our own evaluation, and randomly selected 500 tweets from the STS
corpus and asked 3 evaluators to evaluate the semantic concept extraction outputs
generated from AlchemyAPI, OpenCalais and Zemanta.
Extraction Tool   No. of Concepts Extracted   Entity-Concept Mapping Accuracy (%)
                                              Evaluator 1   Evaluator 2   Evaluator 3
AlchemyAPI        108                         73.97         73.8          72.8
Zemanta           70                          71            71.8          70.4
OpenCalais        65                          68            69.1          68.7
Table 2. Evaluation results of AlchemyAPI, Zemanta and OpenCalais.
The assessment of the outputs was based on (1) the correctness of the extracted
entities; and (2) the correctness of the entity-concept mappings. The evaluation results
presented in Table 2 show that AlchemyAPI extracted the most number of concepts
and it also has the highest entity-concept mapping accuracy compared to OpenCalais
and Zemanta. As such, we chose AlchemyAPI to extract the semantic concepts from
our three datasets. Table 3 lists the total number of entities extracted and the number of
semantic concepts mapped against them for each dataset.
                  STS     HCR   OMD
No. of Entities   15139   723   1194
No. of Concepts   29      17    14
Table 3. Entity/concept extraction statistics of STS, OMD and HCR using AlchemyAPI.
45. Likely sentiment for a concept
§ Semantic concepts
can help determine
sentiment even when
no good lexical clues
are present
46. Impact of adding semantic features
§ Incorporating semantics increases accuracy by 6.5% for negative
sentiment, and 4.8% for positive sentiment
§ F = 75.95%, with 77.18% Precision and 75.33% Recall
§ Using baselines of unigrams and part-of-speech features
§ More to-dos:
§ Semantic Concepts Extraction: Explore more fine-grained approach
for the entity extraction and the entity-concept mapping
§ Selective Method: Interpolate semantic concepts based on their
contribution to the classification performance
Saif, Hassan; He, Yulan and Alani, Harith (2012). Semantic sentiment analysis of twitter. In: The 11th International Semantic Web
Conference (ISWC 2012), 11-15 November 2012, Boston, MA, USA
48. OUSocials
§ Many FB groups exist for students of OU
courses
§ Created and used by students to discuss and
share opinions on courses and get support
Behaviour Analysis
Sentiment Analysis
Topic Analysis
Course tutors
Real time monitoring
• How are opinion and sentiment towards a course evolving?
• Who’s providing positive/negative support?
• What topics are emerging? How do they change over time?
• Do students get the answers and support they need?
49. Analytics over FB groups
§ Compare findings to
course performance,
and student
performance
51. Problem Summary
• Fragmented digital selves don’t support social learning
and individual empowerment
• Need to enable:
– Digital empowerment
– Improved understanding and social cohesion
– Informed decision making (for individuals)
– Informed policy making (for organisations)
– Facilitating creative participation
– Co-curating of digital personhoods
53. Changing energy consumption behaviour
A Decarbonisation Platform for Citizen
Empowerment and Translating Collective
Awareness into Behavioural Change
August 2012
57. Thanks to ..
Matthew Rowe
(now at Uni Lancaster)
Sofia Angeletou
(now at BBC)
Gregoire Burel
Miriam Fernandez
Smitashree Choudhury
Hassan Saif
58. Papers http://oro.open.ac.uk/view/person/ha2294.html
§ Rowe, Matthew; Fernandez, Miriam; Angeletou, Sofia and Alani, Harith (2012). Community analysis through semantic rules and role composition
derivation. Journal of Web Semantics, 18(1)
§ Rowe, Matthew; Fernandez, Miriam; Alani, Harith; Ronen, Inbal; Hayes, Conor and Karnstedt, Marcel (2012). Behaviour analysis across
different types of Enterprise Online Communities. In: ACM Web Science Conference 2012 (WebSci12), 22-24 June 2012, Evanston, U.S.A.
§ Rowe, Matthew; Stankovic, Milan and Alani, Harith (2012). Who will follow whom? Exploiting semantics for link prediction in attention-information
networks. In: 11th International Semantic Web Conference (ISWC 2012), 11-15 November 2012, Boston, USA
§ Rowe, Matthew and Alani, Harith (2012). What makes communities tick? Community health analysis using role compositions. In: 4th IEEE
International Conference on Social Computing, 3-6 September 2012, Amsterdam, The Netherlands
§ Wagner, Claudia ; Rowe, Matthew; Strohmaier, Markus and Alani, Harith (2012). Ignorance isn't bliss: an empirical analysis of attention patterns
in online communities. In: 4th IEEE International Conference on Social Computing, 3-6 September 2012, Amsterdam, The Netherlands
§ Saif, Hassan; He, Yulan and Alani, Harith (2012). Semantic sentiment analysis of twitter. In: The 11th International Semantic Web Conference
(ISWC 2012), 11-15 November 2012, Boston, MA, USA.
§ Rowe, Matthew; Angeletou, Sofia and Alani, Harith (2011). Predicting discussions on the social semantic web. In: 8th Extended Semantic Web
Conference (ESWC 2011), 29 May - 2 June 2011, Heraklion, Greece.
§ Rowe, Matthew; Angeletou, Sofia and Alani, Harith (2011). Anticipating discussion activity on community forums. In: Third IEEE International
Conference on Social Computing (SocialCom2011) , 9-11 October 2011, Boston, MA, USA.
§ Angeletou, Sofia; Rowe, Matthew and Alani, Harith (2011). Modelling and analysis of user behaviour in online communities. In: 10th International
Semantic Web Conference (ISWC 2011), 23 - 27 Oct 2011, Bonn, Germany.
§ Karnstedt, Marcel ; Rowe, Matthew; Chan, Jeff ; Alani, Harith and Hayes, Conor (2011). The Effect of User Features on Churn in Social
Networks. In: ACM Web Science Conference 2011 (WebSci2011), 14 - 17 June 2011, Koblenz, Germany.