This document presents a framework for analyzing the structure, interaction, and evolution of online social networks around real-world social phenomena using Twitter data. It describes discretizing interaction data into timeslots, detecting communities with the Louvain method, and tracking how those communities evolve over time. Communities and users are ranked using features such as persistence, stability, and centrality, yielding the most influential users and communities together with popular hashtags. The framework was applied to a Twitter dataset collected with Greek and English political keywords and hashtags, extracting meaningful information about influential discussions. Future work aims to speed up the community similarity search and to incorporate retweets.
Community Structure, Interaction and Evolution Analysis of Online Social Networks around Real-World Social Phenomena
1. PCI13 Thessaloniki, 19 Sep 2013
Konstantinos Konstantinidis, Symeon Papadopoulos, Yiannis Kompatsiaris
3. Motivation
• Social Networks
– Used to be small (Grevy's zebra dataset)
– Easy to organize
• Online Social Networks (Twitter)
– Hold an immense amount of data
– Incredibly difficult to organize and to mine for useful information
• Ways to monitor activity in OSNs:
– Keywords (produce too much information and fail when lexical variations are used)
– Newshounds and persons of interest (may result in loss of information)
• Proposal to leverage:
– Time
– Communities formed by users interested in a specific topic
– The behavior of these communities in time
• Provide the user with info regarding:
– Temporal user activity per topic
– Influential, Stable and Persistent Communities
– Users worth following (possibility of new newshounds)
– Content worth monitoring
4. Framework overview
• Input: Twitter Data (mentions and hashtags in time)
• Evolution Detection Process:
– Pre-processing (Information Extraction)
– Interaction Data Discretization
– Temporal Adjacency Matrix Creation
– Community Detection (Louvain)
– Community Evolution Detection
• Ranking Process:
– Features: Persistence, Stability, Centrality* (PageRank), Community Size
– Feature Fusion
• Output: Most influential users and communities + popular hashtags; Evolution Heatmap
*Ongoing work
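The pre-processing stage of the overview (extracting mentions and hashtags in time) could be sketched roughly as follows. This is only an illustration: the tweet dictionary keys (`user`, `text`, `timestamp`), the regular expressions, and the function name `extract_interactions` are assumptions, not part of the presented framework.

```python
import re
from collections import defaultdict

# Simplified patterns; real Twitter entity rules are stricter (assumption).
MENTION_RE = re.compile(r"@(\w+)")
HASHTAG_RE = re.compile(r"#(\w+)")


def extract_interactions(tweets):
    """Turn raw tweets into timestamped mention edges and per-user hashtags.

    Each tweet is assumed to be a dict with the hypothetical keys
    'user', 'text' and 'timestamp'.
    """
    edges = []                    # (timestamp, author, mentioned_user)
    hashtags = defaultdict(list)  # author -> [(timestamp, hashtag), ...]
    for tweet in tweets:
        author = tweet["user"].lower()
        for mentioned in MENTION_RE.findall(tweet["text"]):
            edges.append((tweet["timestamp"], author, mentioned.lower()))
        for tag in HASHTAG_RE.findall(tweet["text"]):
            hashtags[author].append((tweet["timestamp"], tag.lower()))
    return edges, dict(hashtags)


sample = [{"user": "Alice", "text": "@Bob look at #GoldenDawn", "timestamp": 100}]
edges, tags = extract_interactions(sample)
print(edges)  # [(100, 'alice', 'bob')]
```

The mention edges, grouped by timeslot, would then feed the temporal adjacency matrices on which community detection runs.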
5. Interaction data discretization
• Studying community evolution requires analysis in timeslots
• Tweeting activity indicates whether users are active and whether something interesting is happening (or has happened)
• In this framework, the timeslots are delimited by the local minima of the overall activity
• Peaks and positive slopes indicate that users are interested in some phenomenon or are involved in a conversation
• Minima and negative slopes indicate that the users’ interest is diminishing
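A minimal sketch of cutting the activity curve at its local minima is given below. It assumes the tweets have already been counted into equal-width bins (the bin width is not stated on the slide), and the function names are hypothetical. Only strict minima are used; plateaus and any smoothing the authors may apply are ignored here.

```python
def local_minima(activity):
    """Indices whose count is strictly lower than both neighbours."""
    return [i for i in range(1, len(activity) - 1)
            if activity[i - 1] > activity[i] < activity[i + 1]]


def split_into_timeslots(activity):
    """Cut the series at each local minimum.

    Returns (start, end) bin-index pairs with 'end' exclusive, so each
    timeslot starts at a minimum and runs up to the next one.
    """
    cuts = [0] + local_minima(activity) + [len(activity)]
    return [(cuts[k], cuts[k + 1]) for k in range(len(cuts) - 1)]


# Hypothetical tweet counts per bin.
counts = [5, 9, 14, 8, 3, 7, 12, 6, 2, 4, 10]
print(local_minima(counts))          # [4, 8]
print(split_into_timeslots(counts))  # [(0, 4), (4, 8), (8, 11)]
```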
8. Louvain Community Detection
A popular greedy modularity optimization approach.
The following two steps are repeated iteratively until a maximum of modularity is attained and a hierarchy of communities is produced:
a) Small community detection by local modularity optimization
b) Aggregation of nodes belonging to the same community and creation of a network with the communities as nodes
It was selected for its:
• Speed
• Accuracy when dealing with ad-hoc networks
• Hierarchical structure, which allows communities to be examined at different resolutions
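To make the optimization target concrete, the sketch below computes modularity, the quantity that Louvain increases greedily, for a given partition of an undirected, unweighted graph. It is not an implementation of Louvain itself (in practice a library implementation would normally be used), and it assumes the graph has at least one edge. The function and variable names are illustrative.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Newman modularity Q of a partition.

    edges: list of (u, v) pairs of an undirected, unweighted graph.
    community_of: dict mapping each node to its community label.
    """
    m = len(edges)
    internal = defaultdict(int)  # edges with both ends in the community
    degree = defaultdict(int)    # sum of node degrees per community
    for u, v in edges:
        degree[community_of[u]] += 1
        degree[community_of[v]] += 1
        if community_of[u] == community_of[v]:
            internal[community_of[u]] += 1
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Two triangles joined by a single bridge edge (c, d).
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
merged = {node: 0 for node in split}
print(modularity(edges, split) > modularity(edges, merged))  # True
```

Splitting the two triangles gives Q = 5/14, while putting every node in one community gives Q = 0, which is why a modularity-driven method prefers the split.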
9. Community evolution detection
[Figure: grid of detected communities]
C11 C21 C31 C41 C51 C61 C71 C81 C91
C12 C22 C32 C42 C52 C62 C72 C82 C92
C13 C23 C33 C43 C53 C63 C73 C83 C93
C14 C24 C34 C44 C54 C64 C74 C84 C94
C15 C25 C35 C45 C55 C65 C75 C85 C95
Communities from each row are compared to communities from past rows using the Jaccard Index.
[Figure: communities matched across rows]
T11 T21 T41 T61 T81 T91
T11 T41 T52 T91
T11 T21 T52 T81 T91
T21 T52 T74 T91
T41 T52 T74 T81 T91
Community similarity according to:
• Jaccard Index
• Adaptive threshold
Adaptive threshold:
• Relative to size
• Range: [0.7,0.1]
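The matching step can be illustrated with a short sketch. The Jaccard index is standard, but the slide only states that the threshold is relative to community size and ranges between 0.7 and 0.1. The linear interpolation, its direction (stricter for small communities), the size bounds `small` and `large`, and the function names below are all assumptions made for illustration.

```python
def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two member collections."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def adaptive_threshold(size, small=3, large=1000, high=0.7, low=0.1):
    """Hypothetical size-dependent threshold, interpolated linearly
    from 'high' (communities of size <= small) to 'low' (size >= large)."""
    if size <= small:
        return high
    if size >= large:
        return low
    frac = (size - small) / (large - small)
    return high - frac * (high - low)


def match_to_past(community, past_communities):
    """Best past community whose similarity reaches the threshold.

    past_communities: dict of label -> set of members.
    Returns (label, score), or None if nothing is similar enough.
    """
    threshold = adaptive_threshold(len(community))
    best = None
    for label, members in past_communities.items():
        score = jaccard(community, members)
        if score >= threshold and (best is None or score > best[1]):
            best = (label, score)
    return best


past = {"T11": {"a", "b", "c", "d", "e"}, "T21": {"x", "y", "z"}}
print(match_to_past({"a", "b", "c", "d"}, past))  # ('T11', 0.8)
```

A community that finds no match above the threshold would start a new evolving community, which is one way to read the labels that first appear in later rows of the slide.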
10. Single timeslot graph example
Searching through a single timeslot (i.e. approximately 24 hours) can be time consuming.
Imagine browsing through months of data!
Indexing is clearly a necessity.
11. Evolution features, fusion & ranking
• Inputs: Centrality, Persistence, Stability, Community Evolution
• Dynamic Community Ranking
• Outputs to the User Interface:
– Ranked Communities (All Users)
– Ranked Users in Communities based on Centrality
– Content (txt) from timeslots of interest
• Persistence: overall appearances / total number of timeslots
• Stability: overall consecutive appearances / total number of timeslots
• PageRank Centrality: a rough estimate of how important a node is, obtained by counting the number and quality of its links
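The three features can be sketched as below, under stated assumptions. Persistence follows the slide's definition directly. For stability, counting pairs of adjacent timeslots is only one possible reading of "consecutive appearances". The PageRank routine is a plain power iteration with an assumed damping factor of 0.85 and a fixed iteration count, not necessarily what the framework uses.

```python
def persistence(appearances, total_timeslots):
    """Fraction of all timeslots in which the community appears."""
    return len(set(appearances)) / total_timeslots


def stability(appearances, total_timeslots):
    """Appearances in adjacent timeslots over total timeslots
    (one possible reading of the slide's definition)."""
    slots = sorted(set(appearances))
    consecutive = sum(1 for a, b in zip(slots, slots[1:]) if b - a == 1)
    return consecutive / total_timeslots


def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank on directed (source, target) edges.

    Rank held by nodes without outgoing edges is spread uniformly.
    """
    nodes = sorted({n for edge in edges for n in edge})
    out = {n: [] for n in nodes}
    for src, dst in edges:
        out[src].append(dst)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        dangling = sum(rank[n] for n in nodes if not out[n])
        base = (1 - damping + damping * dangling) / len(nodes)
        new = {n: base for n in nodes}
        for n in nodes:
            for dst in out[n]:
                new[dst] += damping * rank[n] / len(out[n])
        rank = new
    return rank


# Timeslot appearances of the top-ranked community in the results slide.
rank1_slots = [1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 13]
print(round(persistence(rank1_slots, 28), 6))  # 0.392857
```

With 28 timeslots this reproduces the reported persistence of 0.392857 (11/28). Under the adjacent-pair reading the same community has a stability of 8/28, whereas the results slide reports 0.310344, which equals 9/29, so the authors' exact counting convention appears to differ from this sketch.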
12. Pros and Cons
Dynamic Community and User Ranking
• Advantages
– Saves user time (manually searching for news is extremely time consuming)
– Enables browsing through the most important information
– Provides a sense of user importance over time (users worth following for future investigations)
• Disadvantages
– Community Detection and Community Evolution Detection are slow processes
– No semantic ranking (lack of content consideration) renders the framework susceptible to error
13. Framework application example
Application on a dataset extracted from the Twitter OSN.
• Dataset Characteristics:
– Period: 32 days
– Keywords: 40 (English and Greek)
– Unique users: 857K
– Messages: 880K
– Edges: 1.07M
• Greek hashtags: #Xryshaygh, #GoldenDawn, #Kasidiaris
• Greek keywords: Michaloliakos, Kasidiaris, golden dawn, xrysh aygh, illegal immigrants
• Global hashtags: #nazi, #extremeright, #farright
• Global keywords: nazi, far right, extreme right, Hitler, Swastica
14. Framework application example
• Results
– Total number of communities: 232K
– Final number of communities (excluding self loops & communities < 3): 89K
– Total evolution steps: 7K
– Total evolving communities: 1.1K
– Number of timeslots: 28
• Light shades signify small communities
• Dark shades signify large communities
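15. Framework application example (results)
• Rank 1
– Community Id: 1,122
– Timeslot appearance: 1,2,3,4,5,6,7,8,9,11,13
– Size/slot: 16,15,8,5,7,28,4,8,9,8,30
– Persistence: 0.392857
– Stability: 0.310344
– Centrality: 0.635401
– Popular tags (ranked): Indiebooks, bcn, madrid, andalucía, españa
– Topic: Spanish book on Hitler: El Legado
• Rank 2
– Community Id: 13,2044
– Timeslot appearance: 13,15,16,17,18,19,20,22,23,25
– Size/slot: 3,4,9,4,6,6,5,4,7,5
– Persistence: 0.357142
– Stability: 0.241379
– Centrality: 0.801170
– Popular tags (ranked): keepmovingforward
– Topic: Pakistani person named Nazi
• Rank 3
– Community Id: 10,404
– Timeslot appearance: 10,11,12,15,16,17,18,19
– Size/slot: 6,5,4,4,9,5,3,3
– Persistence: 0.285714
– Stability: 0.241379
– Centrality: 0.817923
– Popular tags (ranked): Israel, ashkenazi, ptsd, 2rrf
– Topic: Israeli anti-nazi posts
• Rank 4
– Community Id: 18,89
– Timeslot appearance: 18,19,20,21,22,23,25
– Size/slot: 36,137,323,281,64,146,139
– Persistence: 0.25
– Stability: 0.206896
– Centrality: 0.820052
– Popular tags (ranked): Jamaat, nazi, shahbag, taliban, sayeedi
– Topic: Associating Jamaat (Bangladesh) to nazi
• Rank 5
– Community Id: 22,2
– Timeslot appearance: 22,23,24,25,26,27
– Size/slot: 977,1129,942,946,1251,2054
– Persistence: 0.214285
– Stability: 0.206896
– Centrality: 0.797400
– Popular tags (ranked): 1,01,31,4,2
– Topic: Videogame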
16. Framework application example (Greek interest)
Group of interconnected foreign and Greek communities surrounded by an abundance of groups and single users.
A Greek community commenting on a poll that presented the GGD party as the most popular amongst unemployed citizens
17. Future Work
• Enhance community similarity search (speedup)
• Framework enrichment by incorporating retweets as a feature
• Introduce the framework to journalists for constructive criticism
• Extended framework (diagram):
– Input: Twitter Data (Retweets in time)
– Mention, Retweet & Timestamp Information Extraction
– Community Detection
– Community Evolution Detection
– Features: Community Size, Total # of Mentions, Degree of mentions, Persistence, Stability, Centrality
– Fusion
– Output: Most influential users and communities + Popular hashtags
– Open points: Could they be used as a Ground Truth Set? Provide a baseline. Query Correction & Improvement via Relevance Feedback?
18. Conclusions
• A framework for extracting information from evolving communities in dynamic social networks.
• Significant information can be retrieved by studying the evolution of communities of OSNs (e.g. Twitter).
• Existence of a large number of dynamic communities with various evolutionary characteristics.