PCI13 Thessaloniki, 19 Sep 2013
Community Structure, Interaction and Evolution Analysis of Online Social Networks around Real-World Social Phenomena
Konstantinos Konstantinidis, Symeon Papadopoulos, Yiannis Kompatsiaris
Problem
#2
Online Social Networks (OSNs) are immense!
#3
Motivation
• Social Networks
– Used to be small (Grevy's zebra dataset)
– Easy to organize
• Online Social Networks (Twitter)
– Have an immense amount of data
– Incredibly difficult to organize and extract useful information
• Ways to monitor activity in OSNs:
– Keywords (produce too much info and fail when lexical variations are used)
– Newshounds and Persons of Interest (may result in loss of info)
• Proposal to leverage:
– Time
– Communities formed by users interested in a specific topic
– The behavior of these communities in time
• Provide the user with info regarding:
– Temporal user activity per topic
– Influential, Stable and Persistent Communities
– Users worth following (possibility of new newshounds)
– Content worth monitoring
#4
Framework overview
[Diagram: framework pipeline]
• Input: Twitter Data (mentions and hashtags in time)
• Evolution Detection Process: Pre-processing (Information Extraction), Interaction Data Discretization, Temporal Adjacency Matrix Creation, Community Detection (Louvain), Community Evolution Detection
• Ranking Process: Community Size, Persistence, Stability and Centrality* (PageRank), combined by Feature Fusion
• Output: Most influential users and communities + popular hashtags; Evolution Heatmap
*Ongoing work
#5
Interaction data discretization
• Community evolution study requires timeslot analysis
• Tweeting activity shows whether the users are active and whether
something interesting is happening (or has happened)
• In this framework, the timeslots are created using the local
minima of the overall activity
• Peaks and positive slopes inform us that the users are
interested in some phenomenon or are involved in a
conversation
• Minima and negative slopes show us that the users’ interest is
diminishing
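
The discretization idea above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the overall activity has already been counted into equal-width bins (for example tweets per hour), and the function names `local_minima` and `timeslots_from_minima` are hypothetical. Plateaus are not handled.

```python
from typing import List, Tuple


def local_minima(activity: List[int]) -> List[int]:
    """Indices whose activity is strictly lower than both neighbours.

    Simplification for this sketch: flat valleys (plateaus) are ignored.
    """
    return [
        i
        for i in range(1, len(activity) - 1)
        if activity[i] < activity[i - 1] and activity[i] < activity[i + 1]
    ]


def timeslots_from_minima(activity: List[int]) -> List[Tuple[int, int]]:
    """Cut the bin range into half-open (start, end) timeslots at each local minimum."""
    cuts = [0] + local_minima(activity) + [len(activity)]
    return list(zip(cuts, cuts[1:]))


# Hypothetical activity counts per bin (e.g. tweets per hour).
activity = [3, 8, 5, 2, 6, 9, 4, 1, 7]
slots = timeslots_from_minima(activity)  # [(0, 3), (3, 7), (7, 9)]
```

Each resulting timeslot then starts at a lull in activity and spans one rise and fall of interest.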
#6
Interaction data discretization example
#7
Community detection & evolution
[Figure: Sequential Adjacency Matrices, one per timeslot: Timeslot (n-2), Timeslot (n-1), Timeslot (n), Timeslot (n+1)]
Louvain Community Detection Method (V. D. Blondel, J.-L. Guillaume, R. Lambiotte, and E. Lefebvre. Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008(10):P10008 (12pp), 2008.)
[Figure: Evolving Communities. Time-evolving communities T1 to T5 across timeslots n-1, n, n+1, built from per-timeslot communities C1(n-1), C1n, C1(n+1); C2(n-1), C2n, C2(n+1); C3(n-1), C3n, C3(n+1); C4(n-1), C4(n+1); C5n, C5(n+1); C6(n-1)]
• Timeslots [1,…,n-1,n,n+1,…]
• Communities C = {C1n, C2n, ..., Ckn}
• Time-Evolving Communities Ti
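
A rough sketch of how one weighted adjacency matrix per timeslot could be assembled from mention records. The record layout `(timestamp, author, mentioned_user)`, the function names and the slot boundaries are all assumptions made for illustration. Dropping self-mentions is also an assumption of this sketch.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical record layout: (timestamp, author, mentioned_user).
Mention = Tuple[float, str, str]


def slot_index(timestamp: float, boundaries: List[float]) -> int:
    """Index i of the slot [boundaries[i], boundaries[i+1]) holding timestamp, or -1."""
    for i in range(len(boundaries) - 1):
        if boundaries[i] <= timestamp < boundaries[i + 1]:
            return i
    return -1


def adjacency_per_timeslot(
    mentions: Iterable[Mention], boundaries: List[float]
) -> Dict[int, Counter]:
    """Sparse weighted adjacency per slot: {slot: Counter({(author, mentioned): count})}."""
    slots: Dict[int, Counter] = defaultdict(Counter)
    for timestamp, author, mentioned in mentions:
        i = slot_index(timestamp, boundaries)
        # Skip records outside every slot and (by assumption) self-mentions.
        if i >= 0 and author != mentioned:
            slots[i][(author, mentioned)] += 1
    return dict(slots)


mentions = [
    (1, "a", "b"), (2, "a", "b"), (3, "b", "c"),  # fall into slot 0
    (12, "c", "a"), (13, "a", "a"),               # slot 1 (self-mention dropped)
    (25, "x", "y"),                               # outside all slots
]
matrices = adjacency_per_timeslot(mentions, boundaries=[0, 10, 20])
```

Storing only the non-zero entries keeps each matrix small, which matters when most user pairs never interact.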
Louvain Community Detection
A popular greedy modularity optimization approach.
The two following steps are repeated iteratively until a maximum of
modularity is attained and a hierarchy of communities is produced:
a) Small community detection by local modularity optimization
b) Aggregation of nodes belonging to the same community and
creation of a network with the communities as nodes
It was selected for its:
• Speed
• Accuracy when dealing with ad-hoc networks
• Hierarchical structure, which allows communities to be examined at
different resolutions
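
For reference, the quantity that Louvain greedily optimizes is the modularity Q of a partition. The sketch below only evaluates Q for a given partition of an undirected weighted graph without self-loops; it is not the Louvain algorithm itself, and the function name and edge-list format are assumptions.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple

Edge = Tuple[Hashable, Hashable, float]  # (u, v, weight), undirected, no self-loops


def modularity(edges: Iterable[Edge], community: Dict[Hashable, Hashable]) -> float:
    """Q = sum over communities c of  L_c / m - (d_c / 2m)^2.

    m   : total edge weight
    L_c : weight of edges with both ends in c
    d_c : summed weighted degree of the nodes in c
    """
    m = 0.0
    intra = defaultdict(float)
    degree = defaultdict(float)
    for u, v, w in edges:
        m += w
        degree[community[u]] += w
        degree[community[v]] += w
        if community[u] == community[v]:
            intra[community[u]] += w
    if m == 0:
        return 0.0
    return sum(intra[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1, 1), (1, 2, 1), (0, 2, 1), (3, 4, 1), (4, 5, 1), (3, 5, 1), (2, 3, 1)]
split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
merged = {n: "A" for n in range(6)}
q_split = modularity(edges, split)    # 6/7 - 1/2, about 0.357
q_merged = modularity(edges, merged)  # 0.0
```

The two-triangle partition scores higher than the single-community one, which is the kind of gain the local optimization step looks for when moving nodes between communities.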
#8
[Figure: time-evolving communities labelled T11, T21, T41, T52, T61, T74, T81, T91]
#9
Community evolution detection
C11 C21 C31 C41 C51 C61 C71 C81 C91
C12 C22 C32 C42 C52 C62 C72 C82 C92
C13 C23 C33 C43 C53 C63 C73 C83 C93
C14 C24 C34 C44 C54 C64 C74 C84 C94
C15 C25 C35 C45 C55 C65 C75 C85 C95
Communities in each row are compared to communities from
past rows using the Jaccard Index
Community similarity
according to:
• Jaccard Index
• Adaptive threshold
Adaptive threshold:
• Relative to size
• Range: [0.7,0.1]
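
A minimal sketch of the matching step, assuming communities are sets of user ids. The Jaccard index is standard, but the slides do not give the threshold formula: `adaptive_threshold` below is a hypothetical linear interpolation from 0.7 for small communities down to 0.1 for large ones, and the `small` and `large` size limits are invented for illustration.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """|a ∩ b| / |a ∪ b|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def adaptive_threshold(size: int, small: int = 3, large: int = 100,
                       high: float = 0.7, low: float = 0.1) -> float:
    """Hypothetical size-relative threshold: `high` at or below `small`
    members, `low` at or above `large`, linear in between."""
    if size <= small:
        return high
    if size >= large:
        return low
    frac = (size - small) / (large - small)
    return high - frac * (high - low)


def match_community(current: Set[str], past: Dict[str, Set[str]]) -> List[str]:
    """Ids of past communities similar enough to `current` to count as its predecessors."""
    threshold = adaptive_threshold(len(current))
    return [cid for cid, members in past.items() if jaccard(current, members) >= threshold]


past = {"C11": {"a", "b", "c"}, "C21": {"a", "x", "y"}}
matches = match_community({"a", "b", "c"}, past)  # ["C11"]
```

Comparing every community against all communities in past timeslots is quadratic in the number of communities, which is consistent with the slides listing evolution detection as a slow process.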
#10
Single timeslot graph example
Searching through a single
timeslot (i.e. approximately 24
hours) can be time consuming.
Imagine browsing through
months of data!
Indexing is clearly a necessity.
#11
Evolution features, fusion & ranking
[Diagram: Community Evolution yields the Centrality, Persistence and Stability features, which feed Dynamic Community Ranking. Outputs passed to the User Interface: Ranked Communities (All Users), Ranked Users in Communities based on Centrality, and Content (txt) from timeslots of interest.]
• Persistence: overall appearances / total number of timeslots
• Stability: overall consecutive appearances / total number of timeslots
• PageRank Centrality: a rough estimate of how important a node is, obtained by
counting the number and quality of links to it
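
The first two features can be sketched directly from the definitions above. This is one possible reading, not the authors' code: "consecutive appearances" is interpreted here as the number of timeslots that are immediately followed by another appearance, and both function names are hypothetical.

```python
from typing import Iterable


def persistence(slots: Iterable[int], total_timeslots: int) -> float:
    """Appearances divided by the total number of timeslots."""
    return len(set(slots)) / total_timeslots


def stability(slots: Iterable[int], total_timeslots: int) -> float:
    """Consecutive appearances divided by the total number of timeslots.

    Assumption: a consecutive appearance is a slot s where the community
    is also present in slot s + 1.
    """
    present = set(slots)
    consecutive = sum(1 for s in present if s + 1 in present)
    return consecutive / total_timeslots


# Timeslot appearances of the top-ranked community in the results table.
appearances = [1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 13]
p = persistence(appearances, total_timeslots=28)  # 11/28
s = stability(appearances, total_timeslots=28)    # 8/28 under this reading
```

With 28 timeslots, the persistence of 11/28 ≈ 0.392857 matches the value reported for that community. This reading of stability gives 8/28 ≈ 0.2857 rather than the reported 0.310344, so the authors' implementation presumably counts or normalises consecutive appearances differently.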
Pros and Cons
#12
Dynamic Community and User Ranking
• Advantages
– Saves user time (manually searching for news is extremely time
consuming)
– Enables browsing through the most important information
– Provides a sense of user importance over time (users worth following
for future investigations)
• Disadvantages
– Community Detection and Community Evolution Detection are slow
processes
– No semantic ranking (lack of content consideration) renders the
framework susceptible to error
Framework application example
Application on a dataset extracted from the Twitter OSN.
• Dataset Characteristics:
– Period: 32 days
– Keywords: 40 (English and Greek)
– Unique users: 857K
– Messages: 880K
– Edges: 1.07M
#13
Greek Hashtags | Greek Keywords | Global Hashtags | Global Keywords
 | Michaloliakos | | nazi
#Xryshaygh | Kasidiaris | #nazi | far right
#GoldenDawn | golden dawn | #extremeright | extreme right
#Kasidiaris | xrysh aygh | #farright | Hitler
 | illegal immigrants | | Swastica
Framework application example
• Results
– Total number of communities: 232K
– Final number of communities (excluding self loops & communities<3): 89K
– Total evolution steps: 7K
– Total evolving communities: 1.1K
– Number of Timeslots: 28
#14
• Light Shades signify Small communities
• Dark Shades signify Large Communities
Framework application example (results)
Rank | 1 | 2 | 3 | 4 | 5
Community Id | 1,122 | 13,2044 | 10,404 | 18,89 | 22,2
Timeslot appearance | 1,2,3,4,5,6,7,8,9,11,13 | 13,15,16,17,18,19,20,22,23,25 | 10,11,12,15,16,17,18,19 | 18,19,20,21,22,23,25 | 22,23,24,25,26,27
Size/slot | 16,15,8,5,7,28,4,8,9,8,30 | 3,4,9,4,6,6,5,4,7,5 | 6,5,4,4,9,5,3,3 | 36,137,323,281,64,146,139 | 977,1129,942,946,1251,2054
Persistence | 0.392857 | 0.357142 | 0.285714 | 0.25 | 0.214285
Stability | 0.310344 | 0.241379 | 0.241379 | 0.206896 | 0.206896
Centrality | 0.635401 | 0.801170 | 0.817923 | 0.820052 | 0.797400
Popular Tags (ranked) | Indiebooks, bcn, madrid, andalucía, españa | keepmovingforward | Israel, ashkenazi, ptsd, 2rrf | Jamaat, nazi, shahbag, taliban, sayeedi | 1,01,31,4,2
Topic | Spanish book on Hitler: El Legado | Pakistani person named Nazi | Israeli anti-nazi posts | Associating Jamaat (Bangladesh) to nazi | Videogame
#15
Framework application example (Greek interest)
Group of interconnected foreign and
Greek communities surrounded by an
abundance of groups and single users.
#16
A Greek community commenting on a
poll that presented the GGD party as
the most popular amongst unemployed
citizens
Future Work
• Enhance community similarity search (speedup)
• Framework enrichment by incorporating retweets as a feature
• Introduce to journalists for constructive criticism
#17
[Diagram: proposed extended framework]
• Input: Twitter Data (retweets in time)
• Mention, Retweet & Timestamp Information Extraction, Community Detection, Community Evolution Detection
• Features: Community Size, Total # of Mentions, Degree of mentions, Persistence, Stability, Centrality
• Could they be used as a Ground Truth Set? Provide a baseline
• Fusion: Most influential users and communities + popular hashtags
• Query Correction & Improvement via Relevance Feedback?
Conclusions
• A framework for extracting information from
evolving communities in dynamic social networks.
• Significant information can be retrieved by studying
the evolution of communities of OSNs (e.g. Twitter).
• Existence of a large number of dynamic communities
with various evolutionary characteristics.
#18
Thank you!
Questions?
#19
Data and code are available at:
https://github.com/socialsensor/community-evolution-analysis/

Artificial Intelligence: Facts and Myths
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 

Community Structure, Interaction and Evolution Analysis of Online Social Networks around Real-World Social Phenomena

  • 1. PCI13 Thessaloniki, 19 Sep 2013 Community Structure, Interaction and Evolution Analysis of Online Social Networks around Real-World Social Phenomena Konstantinos Konstantinidis, Symeon Papadopoulos, Yiannis Kompatsiaris
  • 2. Problem #2 Online Social Networks (OSNs) are immense!
  • 3. #3 Motivation • Social Networks – Used to be small (Grevy's zebra dataset) – Easy to organize • Online Social Networks (Twitter) – Have an immense amount of data – Incredibly difficult to organize and extract useful information • Ways to monitor activity in OSNs: – Keywords (Produces too much info, doesn’t work when lexical variations are used) – Newshounds and Persons of Interest (may result in loss of info) • Proposal to leverage: – Time – Communities formulated by users interested in a specific topic – The behavior of these communities in time • Provide the user with info regarding: – Temporal user activity per topic – Influential, Stable and Persistent Communities – Users worth following (possibility of new newshounds) – Content worth monitoring
  • 4. #4 Framework overview [Diagram labels] Input: Twitter Data (mentions and hashtags in time). Evolution Detection Process: Pre-processing (Information Extraction); Interaction Data Discretization; Temporal Adjacency Matrix Creation; Community Detection (Louvain); Community Evolution Detection. Ranking Process: Persistence; Stability; Centrality* (PageRank); Community Size; Feature Fusion. Outputs: Evolution Heatmap; Most influential users and communities + Popular hashtags. *Ongoing work
  • 5. #5 Interaction data discretization • Community evolution study requires timeslot analysis • Tweeting activity provides information on whether or not the users are active as well as if something interesting is happening (has happened) • In this framework, the timeslots are created using the local minima of the overall activity • Peaks and positive slopes inform us that the users are interested in some phenomenon or are involved in a conversation • Minima and negative slopes show us that the users’ interest is diminishing
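A minimal sketch of the discretization described on slide 5, assuming the overall activity is already available as a list of message counts per fixed-length bin (for example, per hour). The function names and the sample counts are illustrative and are not taken from the authors' code; plateaus are not treated as minima in this simplified version.

```python
def local_minima(activity):
    """Return indices whose count is strictly lower than both neighbours."""
    return [
        i
        for i in range(1, len(activity) - 1)
        if activity[i] < activity[i - 1] and activity[i] < activity[i + 1]
    ]


def timeslots(activity):
    """Cut the activity series at its local minima.

    Each timeslot is a half-open (start, end) range of bin indices.
    """
    cuts = [0] + local_minima(activity) + [len(activity)]
    return [(cuts[k], cuts[k + 1]) for k in range(len(cuts) - 1)]


# Hypothetical message counts per bin (not real data).
activity = [3, 8, 5, 2, 6, 9, 4, 1, 7]
print(timeslots(activity))  # [(0, 3), (3, 7), (7, 9)]
```

Cutting at minima keeps each burst of activity (a rise followed by a decline) inside a single timeslot instead of splitting it at an arbitrary clock boundary.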
  • 7. #7 Community detection & evolution [Figure: sequential adjacency matrices for Timeslot (n-2), Timeslot (n-1), Timeslot (n) and Timeslot (n+1), and the evolving communities T1 to T5 traced across timeslots n-1, n and n+1.] Timeslots [1,…,n-1,n,n+1,…]; Communities C = {C1n,C2n, ...,Ckn}; Time-Evolving Communities Ti. Louvain Community Detection Method (V. D. Blondel, J.-L. Guillaume, R. Lambiotte, and E. Lefebvre. Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008(10):P10008 (12pp), 2008.)
  • 8. #8 Louvain Community Detection A popular greedy modularity optimization approach. The following two steps are repeated iteratively until modularity reaches a maximum and a hierarchy of communities is produced: a) Detection of small communities by local modularity optimization b) Aggregation of the nodes belonging to the same community and creation of a new network with the communities as nodes. It was selected for its: • Speed • Accuracy when dealing with ad-hoc networks • Hierarchical structure, which allows communities to be examined at different resolutions
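The quantity that Louvain greedily maximizes on slide 8 is modularity. The sketch below is not the Louvain algorithm itself; it only evaluates the modularity of a given partition, assuming an undirected weighted graph without self-loops, stored as a list of `(u, v, weight)` tuples. The toy graph and the function name are illustrative.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Modularity Q of a partition of an undirected weighted graph.

    Q = sum over communities c of  in_c / m - (deg_c / (2 m))^2
    where m is the total edge weight, in_c the weight of edges inside c,
    and deg_c the summed weighted degree of the nodes in c.
    """
    m = sum(w for _, _, w in edges)
    internal = defaultdict(float)
    degree = defaultdict(float)
    for u, v, w in edges:
        cu, cv = community_of[u], community_of[v]
        degree[cu] += w
        degree[cv] += w
        if cu == cv:
            internal[cu] += w
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Toy graph: two triangles joined by a single edge.
edges = [(0, 1, 1), (1, 2, 1), (0, 2, 1),
         (3, 4, 1), (4, 5, 1), (3, 5, 1),
         (2, 3, 1)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {n: "a" for n in range(6)}

print(modularity(edges, two_groups))  # 5/14, about 0.357
print(modularity(edges, one_group))   # 0.0
```

Step (a) of the method moves nodes between communities whenever the move increases this value; step (b) then collapses each community into a single node and repeats.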
  • 9. #9 Community evolution detection [Figure: five rows of nine communities each (C11 … C95), with matched communities linked into time-evolving communities T.] Comparing the communities from each row to communities from past rows using the Jaccard Index. Community similarity according to: • Jaccard Index • Adaptive threshold. Adaptive threshold: • Relative to size • Range: [0.7,0.1]
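A hedged sketch of the matching step on slide 9. The Jaccard index is standard, but the slides give only the range of the adaptive threshold ([0.7, 0.1]) and say it is relative to community size. The linear interpolation below, its direction (0.7 for the smallest communities, 0.1 for the largest) and the size bounds `min_size` and `max_size` are assumptions made for illustration, not the authors' formula.

```python
def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two collections of user ids."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def adaptive_threshold(size, min_size=3, max_size=100, high=0.7, low=0.1):
    """Assumed rule: interpolate linearly from `high` down to `low` with size."""
    clamped = max(min_size, min(size, max_size))
    frac = (clamped - min_size) / (max_size - min_size)
    return high - frac * (high - low)


def best_match(community, past_communities):
    """Return (key, similarity) of the most similar past community that
    passes the size-dependent threshold, or None if none qualifies."""
    threshold = adaptive_threshold(len(community))
    best = None
    for key, members in past_communities.items():
        sim = jaccard(community, members)
        if sim >= threshold and (best is None or sim > best[1]):
            best = (key, sim)
    return best


# Hypothetical communities from an earlier timeslot.
past = {"C1": {"ann", "bob", "cat", "dan"}, "C2": {"xia", "yul", "zed"}}
print(best_match({"ann", "bob", "cat"}, past))  # ('C1', 0.75)
print(best_match({"ann", "xia", "quy"}, past))  # None
```

A community with a match continues an existing evolving community; one without a match would start a new one.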
  • 10. #10 Single timeslot graph example Searching through a single timeslot (i.e. approximately 24 hours) can be time consuming. Imagine browsing through months of data! Indexing is clearly a necessity.
  • 11. #11 Evolution features, fusion & ranking [Diagram labels: Community Evolution; Centrality, Persistence, Stability; Dynamic Community Ranking; Ranked Communities (All Users); Ranked Users in Communities based on Centrality; Content (txt) from timeslots of interest; User Interface.] • Persistence: overall appearances / total number of timeslots • Stability: overall consecutive appearances / total number of timeslots • PageRank Centrality: a rough estimate of how important a node is, obtained by counting the number and quality of the links to it
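The two ratio features on slide 11 can be sketched as follows. Persistence follows the slide's definition directly. For stability, the slides do not say exactly how "consecutive appearances" are counted, so the version below counts every appearance that directly follows an appearance in the previous timeslot; that reading is an assumption, and values computed this way need not match those reported in the results table.

```python
def persistence(appearances, total_timeslots):
    """Share of timeslots in which the evolving community appears."""
    return len(set(appearances)) / total_timeslots


def stability(appearances, total_timeslots):
    """Share of timeslots in which the community appears immediately
    after also appearing in the previous timeslot (assumed reading)."""
    slots = set(appearances)
    consecutive = sum(1 for t in slots if t - 1 in slots)
    return consecutive / total_timeslots


# Timeslot appearances of the top-ranked community on slide 15,
# with the 28 timeslots reported on slide 14.
rank1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 13]
print(round(persistence(rank1, 28), 6))  # 0.392857
print(stability(rank1, 28))
```

Persistence rewards communities that keep reappearing, while stability rewards those that do so without gaps, which is why the two are kept as separate features before fusion.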
  • 12. Pros and Cons #12 Dynamic Community and User Ranking • Advantages – Saves user time (manually searching for news is extremely time consuming) – Enables browsing through the most important information – Provides a sense of user importance over time (users worth following for future investigations) • Disadvantages – Community Detection and Community Evolution Detection are slow processes – No semantic ranking (lack of content consideration) renders the framework susceptible to error
  • 13. #13 Framework application example Application on a dataset extracted from the Twitter OSN. • Dataset Characteristics: – Period: 32 days – Keywords: 40 (English and Greek) – Unique users: 857K – Messages: 880K – Edges: 1.07M
      – Greek, Hashtags: #Xryshaygh, #GoldenDawn, #Kasidiaris
      – Greek, Keywords: Michaloliakos, Kasidiaris, golden dawn, xrysh aygh, illegal immigrants
      – Global, Hashtags: #nazi, #extremeright, #farright
      – Global, Keywords: nazi, far right, extreme right, Hitler, Swastica
  • 14. Framework application example • Results – Total number of communities: 232K – Final number of communities (excluding self loops & communities<3): 89K – Total evolution steps: 7K – Total evolving communities: 1.1K – Number of Timeslots: 28 #14 • Light Shades signify Small communities • Dark Shades signify Large Communities
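  • 15. #15 Framework application example (results)
      – Rank 1: Community Id: 1,122; Timeslot appearance: 1,2,3,4,5,6,7,8,9,11,13; Size/slot: 16,15,8,5,7,28,4,8,9,8,30; Persistence: 0.392857; Stability: 0.310344; Centrality: 0.635401; Popular Tags (ranked): Indiebooks, bcn, madrid, andalucía, españa; Topic: Spanish book on Hitler: El Legado
      – Rank 2: Community Id: 13,2044; Timeslot appearance: 13,15,16,17,18,19,20,22,23,25; Size/slot: 3,4,9,4,6,6,5,4,7,5; Persistence: 0.357142; Stability: 0.241379; Centrality: 0.801170; Popular Tags (ranked): keepmovingforward; Topic: Pakistani person named Nazi
      – Rank 3: Community Id: 10,404; Timeslot appearance: 10,11,12,15,16,17,18,19; Size/slot: 6,5,4,4,9,5,3,3; Persistence: 0.285714; Stability: 0.241379; Centrality: 0.817923; Popular Tags (ranked): Israel, ashkenazi, ptsd, 2rrf; Topic: Israeli anti-nazi posts
      – Rank 4: Community Id: 18,89; Timeslot appearance: 18,19,20,21,22,23,25; Size/slot: 36,137,323,281,64,146,139; Persistence: 0.25; Stability: 0.206896; Centrality: 0.820052; Popular Tags (ranked): Jamaat, nazi, shahbag, taliban, sayeedi; Topic: Associating Jamaat (Bangladesh) to nazi
      – Rank 5: Community Id: 22,2; Timeslot appearance: 22,23,24,25,26,27; Size/slot: 977,1129,942,946,1251,2054; Persistence: 0.214285; Stability: 0.206896; Centrality: 0.797400; Popular Tags (ranked): 1,01,31,4,2; Topic: Videogame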
  • 16. Framework application example (Greek interest) Group of interconnected foreign and Greek communities surrounded by an abundance of groups and single users. #16 A Greek community commenting on a poll that presented the GGD party as the most popular amongst unemployed citizens
  • 17. #17 Future Work • Enhance community similarity search (speedup) • Framework enrichment by incorporating retweets as a feature • Introduce to journalists for constructive criticism [Diagram labels: Twitter Data (Retweets in time); Mention, Retweet & Timestamp Information Extraction; Community Detection; Community Evolution Detection; Community Size, Total # of Mentions, Degree of mentions, Persistence, Stability, Centrality; Fusion; Most influential users and communities + Popular hashtags. Annotations: Could they be used as a Ground Truth Set? Provide a base line. Query Correction & Improvement via Relevance Feedback?]
  • 18. Conclusions • A framework for extracting information from evolving communities in dynamic social networks. • Significant information can be retrieved by studying the evolution of communities of OSNs (e.g. Twitter). • Existence of a large number of dynamic communities with various evolutionary characteristics. #18
  • 19. Thank you! Questions? #19 Data and code are available at: https://github.com/socialsensor/community-evolution-analysis/