Identifying Communities on
Twitter: Time, Topics & Clusters




    Pascal Jürgens (@pascal)
    Dept. of Communication, U of Mainz, Germany


                                                  1
Overview

Relevance / Why it’s interesting

The Basic Idea / Why it works

Limitations / When it works

Algorithms / How it works

Evaluations / How to tell whether it works




                                                                    2
What Science are we in,
anyways?
“
The antireductionist catch-phrase, “the whole is more than the
sum of its parts,” takes on increasing significance as new
sciences such as chaos, systems biology, evolutionary
economics, and network theory move beyond reductionism to
explain how complex behavior can arise from large collections
of simpler components.
”

Mitchell, 2009 — Complexity: A Guided Tour



                                                                 3
What Science are we in,
anyways?
 Interdisciplinary territory with distinct influences

 20th century Sociology — small-scale social network analysis

 Econometrics — time-series analysis, predictions & forecasting

 Mass Communication — media effects

 Theoretical Physics — abstract, high-level descriptions of
 networks; large-scale network analysis (Why is this even here?)




                                                                   4
What is Community Detection?
“
Communities are groups of vertices which probably share common
properties and/or play similar roles within the graph.
”

Fortunato & Castellano 2009 — Community Structure in Graphs in the
Encyclopedia of Complexity and Systems Science

An exploratory method for partitioning a network into smaller pieces.
In many ways it is comparable to cluster analysis.




                                                                        5
(Caveat Emptor)
 Community detection (CD) is a complex, fairly new set of statistical
 methods for building groups from data in an exploratory way

 So why not use simpler, better-known methods such as
 clustering?

 By all means, use simple methods!
 (but they do something different)




                                                              6
Relevance
 Networks are a fundamental structure of the world

 There are global properties of networks (diameter &c.)

 There are properties of nodes (centrality &c.)



 However: networks are almost never homogeneous!

 There is a structure hidden within the whole




                                                          7
[Slide 8, figure with three labeled groups: Group A, Group B, Group C]
Applications
 Identify separate groups within relevant population for further
 description

 Captures “public sphere” better than aggregates such as
 #hashtags
 (Users who share a #tag might have nothing in common)

 Investigate relationship of communities (mesoscopic graph)

 In general: more accurate, delivers more details




                                                                   9
Terminology
 Graph: A network, consisting of

 nodes (or vertices) such as Twitter users,
 with degree = number of connections

 links (or edges) such as relationships via @-messages,
 with weight = intensity of the link

 Partition: one way to split a network into a set of communities

 (k-) Clique: a set of k completely connected nodes
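
To make the terms concrete, here is a minimal sketch in plain Python. The user names and message counts are invented for illustration, and @-messages are treated as undirected links, which is a simplifying assumption.

```python
from itertools import combinations

# Hypothetical toy data: (user, user, number of @-messages exchanged).
mentions = [
    ("alice", "bob", 3),
    ("bob", "carol", 1),
    ("alice", "carol", 2),
    ("carol", "dave", 5),
]

# Undirected weighted graph as a dict of dicts: graph[u][v] = weight.
graph = {}
for u, v, w in mentions:
    graph.setdefault(u, {})[v] = w
    graph.setdefault(v, {})[u] = w


def degree(graph, node):
    """Degree = number of connections of a node."""
    return len(graph[node])


def is_clique(graph, nodes):
    """True if every pair among the given nodes is linked."""
    return all(v in graph[u] for u, v in combinations(nodes, 2))


# A partition assigns every node to exactly one community.
partition = [{"alice", "bob", "carol"}, {"dave"}]
```

In this toy graph, alice, bob and carol form a 3-clique, while dave hangs on through a single heavy link to carol.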



                                                                   10
The Basic Idea
 Communities: Local structures within a network that differ in their
 structure from the surroundings

 A good starting point: communities are better connected among
 themselves than with other communities

 Opens up two obvious methods:

  Add links between close nodes until some condition is met

  Remove links between distant nodes until some condition is met
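
The starting point above can be checked by simply counting links. A small sketch, assuming an unweighted, undirected edge list; the network and the function name `internal_external` are made up for illustration.

```python
def internal_external(edges, community):
    """Count links inside a community and links that leave it."""
    inside = sum(1 for u, v in edges if u in community and v in community)
    leaving = sum(1 for u, v in edges if (u in community) != (v in community))
    return inside, leaving


# Hypothetical toy network: two triangles joined by one bridge (c-d).
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]

good = internal_external(edges, {"a", "b", "c"})  # one triangle
poor = internal_external(edges, {"c", "d"})       # just the bridge ends
```

A plausible community such as one triangle has more links inside than leaving it; the pair around the bridge shows the opposite pattern.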




                                                                       11
[Slide 12, figure: possible partitions]
The Edge Betweenness
Algorithm (Girvan / Newman)
  Edge betweenness: the number of shortest paths between any
  two nodes that go through one edge

  High EB: the link is very important to fast information flow

  Low EB: the link can easily be replaced by another route

  The algorithm simply eliminates the links with the highest EB step
  by step

  An optimal cut can be selected from the sequence of partitions
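
The procedure can be sketched in plain Python for small, unweighted, undirected graphs. This is an illustrative sketch rather than a reference implementation: the function names are invented here, and it stops at the first split instead of producing the full sequence of partitions.

```python
from collections import deque


def edge_betweenness(adj):
    """Edge betweenness via breadth-first search from every node."""
    eb = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        dist, sigma = {s: 0}, {n: 0 for n in adj}
        sigma[s] = 1
        preds = {n: [] for n in adj}
        order, queue = [], deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]   # count shortest paths
                    preds[w].append(v)
        # Walk back from the farthest nodes, handing down path shares.
        delta = {n: 0.0 for n in adj}
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                eb[frozenset((v, w))] += share
                delta[v] += share
    # Every node pair was counted from both of its endpoints.
    return {e: b / 2.0 for e, b in eb.items()}


def components(adj):
    """Connected components as a list of node sets."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            n = stack.pop()
            if n not in comp:
                comp.add(n)
                stack.extend(adj[n] - comp)
        seen |= comp
        comps.append(comp)
    return comps


def girvan_newman_split(adj):
    """Remove highest-EB links until the network splits once more."""
    adj = {n: set(nb) for n, nb in adj.items()}  # work on a copy
    start = len(components(adj))
    while len(components(adj)) == start and any(adj.values()):
        eb = edge_betweenness(adj)
        u, v = max(eb, key=eb.get)
        adj[u].discard(v)
        adj[v].discard(u)
    return components(adj)


# Hypothetical toy network: two triangles joined by the bridge c-d.
adj = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
communities = girvan_newman_split(adj)
```

Recomputing the betweenness after every removal is what makes the method slow on larger networks; calling the split function again on each resulting piece would give the next levels of the hierarchy.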



                                                                       13
[Slide 14, figure: small network example]
[Slide 15, figure: small network example, edge betweenness clusters]
Limitations — Technical
 The number of potential ways to divide a network grows
 super-exponentially with the number of nodes (!)

 Two critical performance parameters of algorithms: runtime
 (“Big-O” notation) and memory

 Networks up to 100s of nodes and/or edges — usually OK

 Networks up to 10 000s of nodes and/or edges — buy a lot of
 memory (8GB upwards) and prepare to wait

 Bigger networks: Ask a computer scientist
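
To get a feeling for the growth: the number of ways to split n nodes into non-empty groups is the Bell number B(n). A short sketch using the Bell triangle; the function name is chosen here for illustration.

```python
def bell_numbers(n_max):
    """Bell numbers B(0)..B(n_max), computed with the Bell triangle."""
    bells = [1]
    row = [1]
    for _ in range(n_max):
        # Each new row starts with the last entry of the previous row.
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
        bells.append(row[0])
    return bells


for n, b in enumerate(bell_numbers(10)):
    print(n, b)
```

Ten nodes already allow 115975 different partitions, which is why no algorithm simply tries them all.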


                                                                     16
Limitations — Methodological
 Quality of partitions — algorithms don’t guarantee best results

 Instability of partitions — algorithms can be non-deterministic and
 very sensitive to small changes

 Evaluation / Comparison of partitions is near-impossible

 Sometimes result is not one best but a whole set of partitions

 Nodes can only belong to one community (!)




                                                                       17
[Slide 18, figure: large network example, edge betweenness clusters]
Notable Algorithms
 The Edge Betweenness Algorithm (Girvan / Newman)

 Markov Cluster Algorithm (MCL, van Dongen; this one is in
 Gephi)

 Clique Percolation (CPM, Palla et al.)

 Information-theoretic algorithm (Rosvall & Bergstrom; handles
 hierarchies and works with communities of very different sizes)




                                                                   19
A Word about MCL
 Marked as experimental in Gephi and hard to use (the clustering
 panel needs to be open before loading the dataset), but the only
 clustering algorithm available there

 Based on the probability of link use: simulates flow through the
 network

 Often less-than-stellar results

 Connection probabilities seem an odd predictor for empirical
 connection habits

 DEMO
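
The flow simulation behind MCL alternates two steps, expansion and inflation. The following is a rough sketch for tiny graphs, not the Gephi plugin or van Dongen's implementation; the function name, the default inflation of 2.0, the fixed iteration count and the 1e-6 cut-off are all assumptions made for illustration.

```python
def mcl(adj, inflation=2.0, iterations=50):
    """Tiny Markov Cluster sketch on a dense matrix (small graphs only)."""
    nodes = sorted(adj)
    n = len(nodes)
    idx = {node: i for i, node in enumerate(nodes)}
    # Adjacency matrix with self-loops added to every node.
    m = [[0.0] * n for _ in range(n)]
    for u in nodes:
        m[idx[u]][idx[u]] = 1.0
        for v in adj[u]:
            m[idx[u]][idx[v]] = 1.0

    def normalize(mat):
        # Make each column sum to 1, so entries act as flow probabilities.
        for j in range(n):
            total = sum(mat[i][j] for i in range(n))
            for i in range(n):
                mat[i][j] /= total
        return mat

    m = normalize(m)
    for _ in range(iterations):
        # Expansion: let the flow spread along two-step random walks.
        m = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
        # Inflation: boost strong flows and let weak flows fade.
        m = normalize([[x ** inflation for x in row] for row in m])
    # Rows that still hold flow are attractors; the columns they
    # attract form one cluster.
    clusters = set()
    for i in range(n):
        members = frozenset(nodes[j] for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return clusters
```

Here `adj` maps each node to the set of its neighbours, and the result is a set of frozensets. Raising `inflation` tends to produce more and smaller clusters.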

                                                                     20
Clique Percolation
 One of several newer algorithms that address these shortcomings

 Intuitive mechanism

 Nodes can be in several communities!

 Works rather well for dense networks!




                                                              21
Clique Percolation
 Idea: Find k-cliques in the network

 Try to “move” the cliques until they reach a bottleneck that they
 can’t fit through

 All the nodes covered by this “trail” are assigned to a community

 Rather easy to implement in software (igraph) plus free
 implementation available (CFinder, cfinder.org)

 DEMO
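
The idea can be sketched in a few lines of plain Python: two k-cliques count as adjacent when they share k-1 nodes, and a community is the union of all cliques reachable through such steps. This is a brute-force illustration for toy graphs, not CFinder's or igraph's implementation; the function name and the example network are invented.

```python
from itertools import combinations


def clique_percolation(adj, k=3):
    """Communities as unions of k-cliques that share k-1 nodes."""
    # Brute-force k-clique search: fine for toy graphs, slow for big ones.
    cliques = [frozenset(c) for c in combinations(sorted(adj), k)
               if all(v in adj[u] for u, v in combinations(c, 2))]
    # Union-find over cliques: merge cliques that overlap in k-1 nodes.
    parent = list(range(len(cliques)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) == k - 1:
            parent[find(i)] = find(j)
    communities = {}
    for i, clique in enumerate(cliques):
        communities.setdefault(find(i), set()).update(clique)
    return list(communities.values())


# Hypothetical toy network: triangles a-b-c and b-c-d share a link,
# triangle d-e-f touches them only at node d.
adj = {
    "a": {"b", "c"}, "b": {"a", "c", "d"}, "c": {"a", "b", "d"},
    "d": {"b", "c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
communities = clique_percolation(adj, k=3)
```

In this example node d ends up in both communities, which illustrates the overlap that partition-based methods cannot express.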




                                                                     22
Clique Percolation by Example




                                23
Evaluation
 Exploratory methods are notoriously difficult to assess (beyond
 rule-of-thumb judgements).

 Two ways allow rigorous examination:

 Comparison of two partitions

 Comparison of a partition against a baseline model (null model)

 Effectively unfeasible for non-mathematicians: Pick a good
 algorithm and treat results with care
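
Both kinds of comparison can be illustrated with standard measures. The talk does not name specific ones, so the choice here is an assumption: the Rand index for comparing two partitions, and Newman-Girvan modularity for comparing a partition against a random baseline with the same node degrees.

```python
from itertools import combinations


def rand_index(part_a, part_b):
    """Share of node pairs on which two partitions agree (1.0 = identical)."""
    label_a = {n: i for i, c in enumerate(part_a) for n in c}
    label_b = {n: i for i, c in enumerate(part_b) for n in c}
    pairs = list(combinations(sorted(label_a), 2))
    agree = sum((label_a[u] == label_a[v]) == (label_b[u] == label_b[v])
                for u, v in pairs)
    return agree / len(pairs)


def modularity(edges, partition):
    """Observed share of internal links minus the share expected
    if links were placed at random with the same node degrees."""
    m = len(edges)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    q = 0.0
    for community in partition:
        inside = sum(1 for u, v in edges if u in community and v in community)
        total_degree = sum(degree.get(n, 0) for n in community)
        q += inside / m - (total_degree / (2 * m)) ** 2
    return q


# Hypothetical toy network: two triangles joined by one bridge (c-d).
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
triangles = [{"a", "b", "c"}, {"d", "e", "f"}]
skewed = [{"a", "b"}, {"c", "d", "e", "f"}]
everything = [{"a", "b", "c", "d", "e", "f"}]
```

A modularity near zero means the partition is no better than the random baseline; putting all nodes into one community always scores exactly zero.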




                                                                  24
What about user attributes?
 What happens when we use empirical attributes to group users?

 Example of the German General Election 2009: Measured party
 affiliation (wahlgetwitter hashtag +/- convention)

 Turns out, users don’t cluster by party affiliation

 But careful: this approach means measuring with two loose ends

 Clustering baseline needs to be really, really solid




                                                                  25
Takeaway
 Community detection used to be hard but is pretty usable now

 Think about the design and scope of a collected network
 beforehand! (timeframe, directed, size/scope etc.)

 Watch the outliers (Justin Bieber will sink your analysis)

 Choose an algorithm that

  uses directed & weighted links, is understandable, robust

  and one that produces meaningful, simple results!



                                                                26
Thanks!




          27
Literature
 Fortunato, Santo and Castellano, Claudio (2009): Community
 Structure in Graphs. In: Meyers, Robert A. (Ed.): Encyclopedia of
 Complexity and Systems Science. Springer.

 Lancichinetti, Andrea and Fortunato, Santo: Community detection
 algorithms: a comparative analysis. Phys. Review E.

 Mitchell, Melanie (2009): Complexity: A Guided Tour

 Palla, Gergely; Barabási, Albert-László and Vicsek, Tamás ():
 Quantifying social group evolution



                                                                     28

Jürgens diata12-communities

  • 1. Identifying Communities on Twitter: Time, Topics & Clusters Pascal Jürgens (@pascal) Dept. of Communication, U of Mainz, Germany 1
  • 2. Overview Overview Relevance / Why it’s interesting The Basic Idea / Why it works Limitations / When it works Algorithms / How it works Evaluations / How to tell whether it works 2
  • 3. What Science are we in, anyways? “ The antireductionist catch-phrase, “the whole is more than the sum of its parts,” takes on increasing significance as new sciences such as chaos, systems biology, evolutionary economics, and network theory move beyond reductionism to explain how complex behavior can arise from large collections of simpler components. ” Mitchell, 2009 — Complexity: A Guided Tour 3
  • 4. What Science are we in, anyways? Interdisciplinary territory with distinct influences 20th century Sociology — small-scale social network analysis Econometrics — time-series analysis, predictions & forecasting Mass Communication — media effects Theoretical Physics — abstract, high-level descriptions of networks; large-scale network analysis (Why is this even here?) 4
  • 5. What is Community Detection? “ Communities are groups of vertices which probably share common properties and/or play similar roles within the graph. ” Fortunato & Castellano 2009 — Community Structure in Graphs in the Encyclopedia of Complexity and Systems Science An exploratory method for partitioning a network into smaller pieces. In many ways it is comparable to cluster analysis. 5
  • 6. (Caveat Emptor) CD is a complex, fairly new set of statistical methods for exploratively building groups from data So why not use simpler, better-known methods such as clustering? By all means, use simple methods! (but they do something different) 6
  • 7. Relevance Networks are a fundamental structure of the world There are global properties of networks (diameter &c.) There are properties of nodes (centrality &c.) However — Networks are almost never homogenous! There is a structure hidden within the whole 7
  • 8. Group A Group B Group C 8
  • 9. Applications Identify separate groups within relevant population for further description Captures “public sphere” better than aggregates such as #hashtags (Users who share a #tag might have nothing in common) Investigate relationship of communities (mesoscopic graph) In general: more accurate, delivers more details 9
  • 10. Terminology Graph: A network, consisting of nodes (or vertices) such as twitter users - with degree = number of connections links (or edges) such as relationships via @-messages with weight = intensity of links Partition: one way to split a network into a set of communities (k-) Clique: a set of k completely connected nodes 10
  • 11. The Basic Idea Communities: Local structures within a network that differ in their structure from the surroundings A good starting point: communities are better connected among themselves than with other communities Opens up two obvious methods: Add links between close nodes until some condition is met Remove links between distant nodes until some condition is met 11
  • 13. The Edge Betweenness Algorithm (Girvan / Newman). Edge betweenness (EB): the number of shortest paths between pairs of nodes that run through a given edge. High EB: the link is very important to fast information flow. Low EB: the link can easily be replaced by another route. The algorithm simply eliminates the links with the highest EB step by step, recomputing EB after each removal. An optimal cut can then be selected from the resulting sequence of partitions.
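The removal procedure can be sketched in plain Python. This is a minimal illustration for small, unweighted, undirected graphs without self-loops, not the authors' implementation: edge betweenness is computed with a breadth-first search from every node, and the highest-scoring link is removed until the network falls into one more piece. All function names are assumptions made for this example.

```python
from collections import deque


def edge_betweenness(adj):
    """Edge betweenness for an unweighted, undirected graph (dict of sets)."""
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for source in adj:
        # Breadth-first search, counting shortest paths (sigma) to each node
        dist = {source: 0}
        sigma = {n: 0 for n in adj}
        sigma[source] = 1
        preds = {n: [] for n in adj}
        order = []
        queue = deque([source])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        # Walk back from the farthest nodes, crediting each edge its path share
        delta = {n: 0.0 for n in adj}
        for v in reversed(order):
            for u in preds[v]:
                share = sigma[u] / sigma[v] * (1 + delta[v])
                score[frozenset((u, v))] += share
                delta[u] += share
    # Every node pair was visited from both ends, so halve the totals
    return {edge: s / 2 for edge, s in score.items()}


def components(adj):
    """Connected components as a list of node sets."""
    seen, parts = set(), []
    for start in adj:
        if start in seen:
            continue
        part, stack = set(), [start]
        while stack:
            n = stack.pop()
            if n not in part:
                part.add(n)
                stack.extend(adj[n] - part)
        seen |= part
        parts.append(part)
    return parts


def split_once(adj):
    """Remove highest-EB links until the graph gains one more component."""
    adj = {n: set(nb) for n, nb in adj.items()}  # work on a copy
    start = len(components(adj))
    while len(components(adj)) == start:
        eb = edge_betweenness(adj)  # recomputed after every removal
        if not eb:
            break
        u, v = max(eb, key=eb.get)
        adj[u].discard(v)
        adj[v].discard(u)
    return components(adj)


# Two triangles joined by the bridge 3-4
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
print(edge_betweenness(adj)[frozenset((3, 4))])        # 9.0
print(sorted(sorted(part) for part in split_once(adj)))  # [[1, 2, 3], [4, 5, 6]]
```

All nine shortest paths between the two triangles cross the bridge, so it is removed first. Calling `split_once` repeatedly on the resulting pieces would produce the sequence of partitions from which a cut is chosen.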
  • 15. [Figure: small network example, edge betweenness clustering]
  • 16. Limitations (Technical). The number of potential ways to divide a network grows super-exponentially with the number of nodes (!). Two critical performance parameters of algorithms: runtime (“Big-O” notation) and memory. Networks of up to 100s of nodes and/or edges: usually OK. Networks of up to 10,000s of nodes and/or edges: buy a lot of memory (8 GB upwards) and prepare to wait. Bigger networks: ask a computer scientist.
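To get a feel for that growth: the number of ways to split n nodes into non-empty groups is the Bell number B(n). The sketch below computes it with the Bell triangle; it ignores the link structure entirely and only counts possible partitions, which is the search space an exhaustive method would face.

```python
def bell_numbers(n_max):
    """Return [B(0), ..., B(n_max)], computed with the Bell triangle."""
    bells = [1]
    row = [1]
    for _ in range(n_max):
        # Each new row starts with the last entry of the previous row
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
        bells.append(row[0])
    return bells


bells = bell_numbers(10)
for n in (3, 5, 10):
    print(n, bells[n])
# 3 5
# 5 52
# 10 115975
```

Ten nodes already allow 115,975 different partitions, which is why practical algorithms rely on heuristics instead of trying every split.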
  • 17. Limitations (Methodological). Quality of partitions: algorithms don’t guarantee the best results. Instability of partitions: algorithms can be non-deterministic and very sensitive to small changes. Evaluation and comparison of partitions is near-impossible. Sometimes the result is not one best partition but a whole set of partitions. Nodes can only belong to one community (!).
  • 18. [Figure: large network example, edge betweenness clustering]
  • 19. Notable Algorithms: The Edge Betweenness Algorithm (Girvan / Newman). Markov Cluster Algorithm (MCL, van Dongen; this one is in Gephi). Clique Percolation (CPM, Palla et al.). Information-theoretic algorithm (Rosvall & Bergstrom; handles hierarchies and works with communities of very different sizes).
  • 20. A Word about MCL. Marked as experimental in Gephi and hard to use (the clustering panel needs to be open before loading the dataset), but it is the only clustering algorithm available there. Based on the probability of link use: it simulates flow through the network. Results are often less than stellar. Connection probabilities seem an odd predictor for empirical connection habits. DEMO
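The flow simulation behind MCL can be sketched in a few lines. This is a simplified illustration of the general expansion/inflation scheme, not Gephi's plugin and not van Dongen's reference implementation; the parameter values (`inflation=2.0`, a fixed 50 iterations, a 1e-6 cut-off) are assumptions chosen for the toy example.

```python
def mcl(adj_matrix, inflation=2.0, iterations=50):
    """Simplified Markov clustering on a 0/1 adjacency matrix (list of lists)."""
    n = len(adj_matrix)

    def normalize(m):
        # Make every column sum to 1 (column = where flow from that node goes)
        for j in range(n):
            total = sum(m[i][j] for i in range(n))
            for i in range(n):
                m[i][j] /= total
        return m

    # Add self-loops so every node keeps some of its own flow
    m = normalize([[float(adj_matrix[i][j]) + (1.0 if i == j else 0.0)
                    for j in range(n)] for i in range(n)])

    for _ in range(iterations):
        # Expansion: let the flow take two steps (matrix squared)
        m = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
        # Inflation: strengthen strong flows, weaken weak ones
        m = normalize([[x ** inflation for x in row] for row in m])

    # Rows that still carry flow act as attractors; their columns form a cluster
    clusters = []
    for row in m:
        members = frozenset(j for j, x in enumerate(row) if x > 1e-6)
        if members and members not in clusters:
            clusters.append(members)
    return clusters


# Two triangles (nodes 0-2 and 3-5) joined by the link 2-3
bridged = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
clusters = mcl(bridged)
print([sorted(c) for c in clusters])
```

Raising `inflation` tends to produce more, smaller clusters, which is one reason results depend heavily on parameter choice.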
  • 21. Clique Percolation. One among several new algorithms that address these shortcomings. Intuitive mechanism. Nodes can be in several communities! Works rather well for dense networks!
  • 22. Clique Percolation. Idea: find the k-cliques in the network. Try to “move” the cliques until they reach a bottleneck that they can’t fit through. All the nodes covered by this “trail” are assigned to one community. Rather easy to implement in software (igraph), and a free implementation is available (CFinder, cfinder.org). DEMO
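The "moving clique" idea can be sketched as follows: two k-cliques count as adjacent when they share k-1 nodes, and a community is the union of all cliques reachable through such steps. This brute-force version enumerates every k-subset, so it only suits tiny graphs; it is an illustration under those assumptions, not the CFinder implementation, and the function names are made up.

```python
from itertools import combinations


def k_cliques(adj, k):
    """All sets of k nodes in which every pair is linked (brute force)."""
    return [frozenset(c) for c in combinations(sorted(adj), k)
            if all(v in adj[u] for u, v in combinations(c, 2))]


def clique_percolation(adj, k):
    """Communities = unions of k-cliques chained via k-1 shared nodes."""
    cliques = k_cliques(adj, k)
    parent = list(range(len(cliques)))  # union-find over clique indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        # The clique can "roll" from i to j if only one node changes
        if len(cliques[i] & cliques[j]) == k - 1:
            parent[find(i)] = find(j)

    communities = {}
    for i, clique in enumerate(cliques):
        communities.setdefault(find(i), set()).update(clique)
    return list(communities.values())


# Triangles 1-2-3 and 2-3-4 share a link; triangle 4-5-6 shares only node 4;
# node 7 is attached to node 6 by a single link.
adj = {
    1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2, 4}, 4: {2, 3, 5, 6},
    5: {4, 6}, 6: {4, 5, 7}, 7: {6},
}
print(sorted(sorted(c) for c in clique_percolation(adj, 3)))
# [[1, 2, 3, 4], [4, 5, 6]]
```

Node 4 ends up in both communities, and node 7, which sits in no triangle, ends up in none.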
  • 23. Clique Percolation by Example
  • 24. Evaluation. Exploratory methods are notoriously difficult to assess (beyond rule-of-thumb judgements). Two approaches allow rigorous examination: comparison of two partitions, and comparison of a partition against a baseline model (null model). Both are effectively infeasible for non-mathematicians: pick a good algorithm and treat the results with care.
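As an illustration of the first approach, two partitions of the same nodes can be compared pair by pair. The sketch below uses the Rand index, which is one common choice and is not named in the slides: it is the share of node pairs that both partitions treat the same way (together in both, or apart in both). The example labels are made up.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Share of node pairs on which two partitions agree (0.0 to 1.0)."""
    agree = total = 0
    for u, v in combinations(sorted(labels_a), 2):
        same_a = labels_a[u] == labels_a[v]
        same_b = labels_b[u] == labels_b[v]
        agree += same_a == same_b  # both "together" or both "apart"
        total += 1
    return agree / total


# Two hypothetical partitions of the same six nodes (node -> community label)
partition_a = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}
partition_b = {1: "X", 2: "X", 3: "Y", 4: "Y", 5: "Y", 6: "Y"}

print(rand_index(partition_a, partition_a))  # 1.0
print(round(rand_index(partition_a, partition_b), 3))  # 0.667
```

The plain Rand index does not correct for agreement expected by chance, so it is only a starting point; comparison against a null model is not covered by this sketch.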
  • 25. What about user attributes? What happens when we use empirical attributes to group users? Example: the German General Election 2009, with party affiliation measured via the wahlgetwitter hashtag +/- convention. It turns out that users don’t cluster by party affiliation. But careful: this approach means measuring with two loose ends. The clustering baseline needs to be really, really solid.
  • 26. Takeaway. Community detection used to be hard but is pretty usable now. Think about the design and scope of a collected network beforehand (timeframe, directed or not, size/scope etc.)! Watch the outliers (Justin Bieber will sink your analysis). Choose an algorithm that uses directed and weighted links, is understandable and robust, and produces meaningful, simple results!
  • 27. Thanks!
  • 28. Literature. Fortunato, Santo and Castellano, Claudio (2009): Community Structure in Graphs. In: Meyers, Robert A. (Ed.): Encyclopedia of Complexity and Systems Science. Springer. Lancichinetti, Andrea and Fortunato, Santo: Community detection algorithms: a comparative analysis. Physical Review E. Mitchell, Melanie (2009): Complexity: A Guided Tour. Palla, Gergely; Barabási, Albert-László and Vicsek, Tamás (): Quantifying social group evolution.