SlideShare a Scribd company logo
1 of 21
Download to read offline
THE SEMANTIC
EVOLUTION OF
ONLINE COMMUNITIES
MATTHEW ROWE1 AND MARKUS STROHMAIER2
1. LANCASTER UNIVERSITY, LANCASTER, UK
@MROWEBOT | M.ROWE@LANCASTER.AC.UK
2. UNIVERSITY OF KOBLENZ AND GESIS, COLOGNE, GERMANY
@MSTROHM | MARKUS.STROHMAIER@GESIS.ORG
World Wide Web Conference 2014
Seoul, South Korea
Studies of Online Community Evolution
The Semantic Evolution of Online Communities
1
¨  Prior work has examined online community
development based on:
¤  Social network properties
n  (Gong et al., 2012), (Mislove et al., 2007)
¤  Social group formation
n  (Backstrom et al., 2006)
¤  Lexical term usage and uptake
n  (Danescu et al., 2013)
Organic fixie Pitchfork, fingerstache
fashion axe 8-bit ethical. Neutra
shabby chic brunch, mustache vegan
twee typewriter dreamcatcher try-hard
organic church-key!
subsequent interpretation, of churne
nities. Churners present a serious
managers and hosts as the leaving of
a detrimental effect on the communi
a question-answering community ca
unanswered queries).
In this section we define churn
classification task and use the previo
tors of lifecycle trajectories to predi
churner or not. As we confine user
the start of their lifecycle to the end
mined from this period to characteri
We define churners as any user who
before the final 10% of the time w
cutoff points are: 2012-07-09 for Fac
SAP, and 2010-12-23 for ServerFaul
following form: D = {(xi, yi)}, whe
label of the user from one of two
while xi denotes an 11-element R-va
either a Facebook or SAP user, and
= {w1,w2,…,wn}
The Semantic Evolution of Online Communities
2
Work has yet to examine the semantic evolution of online
communities
1.  Understand how semantic concepts emerge
2.  Model the development of semantic structures over time
3.  Examine how communities differ from one another in their evolution
Assessing Semantic Evolution
The Semantic Evolution of Online Communities
3
Define: Community Semantics!
Entities, and their classes, discussed within a
community over a given time-period, and the
structure connecting those concepts!
Time
…
Assessing Semantic Evolution
The Semantic Evolution of Online Communities
4
¨  Assessing community semantics enables:
1.  Characterisation of semantic evolution dynamics
2.  Comparison based on community evolution
3.  Forecasting churn rates from evolution signals
Our Contributions
Define: Community Semantics!
Entities, and their classes, discussed within a
community over a given time-period, and the
structure connecting those concepts!
The Semantic Evolution of Online Communities
Approach: Examining Semantic
Evolution5
Retrieve
Posts and
Extract
Entities
Construct
Time-
delimited
Semantic
Graphs
Inspect
Macro
Evolution
Induce
Community-
Specific
Evolution
Models
Apply
Mined
Semantic
Evolution
Dynamics
We have two flavours:
(i)  concept graphs
(ii)  entity graphs
Applying graph
measures over cumulative
time-sensitive graphs
Inducing logistic
population models for
each community and
graph measure
Using model dynamics as
community motifs to:
(i)  inspect communities
(ii)  predict churn rate
Experiment Dataset: Boards.ie
All posts during 2005-2008
93 Communities: online forums
Extracting Entities from Online
Community Posts
The Semantic Evolution of Online Communities
6
¨  We are provided with a set of post quadruples:
¨  We constrain content to within a given interval [t’,t’’)
¨  Extract each post’s entities within the time interval using TextRazor
¤  No quota limit, and good prior performance (Derczynski et al., 2013)
¨  Result: time-sensitive community-discussed entities (DBPedia URIs)
Retrieve
Posts and
Extract
Entities
specialised topics of discussion over A. Posts are provided
as a set of quadruples <u, s, t, f> 2 P, where user u posted
message s at time t in forum f. A message (s) is com-
posed of terms that we use to build the semantic models
for individual communities. The information discussed in
a community, and thus its semantics, can change and alter
over time, therefore we constrain a community’s model to
specific time snapshots - e.g. t0
! t00
where t0
< t00
- for
this we use the following construct that filters through all
relevant posts’ contents within the allotted time window:
St0
t00
= {s : <u, s, t, f> 2 P, t0
 t < t00
} (1)
Information discussed within online communities can be
represented in terms of its semantics, using information from
either the schema-level (i.e. ontological classes and relations
between them) or the data level (i.e. using entities and how
they are related to one another). For the former we consider
concepts to be classes found within the DBPedia Ontology,
that is: the types of entities that users are discussing (e.g.
people, locations, etc.), while for the latter case we con-
Pa
ide
ite
rea
for
{c,
riv
fro
3.2
A
wh
con
fro
as
- w
ing
pro
NITIES WITH SEMANTIC GRAPHS
For our experiments we used data from the Irish commu-
nity message board Boards.ie.2
This is a general-discussion
community message board that includes a set of hierarchi-
cally nested forums (F) in which posts are made - i.e. fo-
rum A can be a parent of forum B, and thus B contains
specialised topics of discussion over A. Posts are provided
as a set of quadruples <u, s, t, f> 2 P, where user u posted
message s at time t in forum f. A message (s) is com-
posed of terms that we use to build the semantic models
for individual communities. The information discussed in
a community, and thus its semantics, can change and alter
over time, therefore we constrain a community’s model to
specific time snapshots - e.g. t0
! t00
where t0
< t00
- for
this we use the following construct that filters through all
relevant posts’ contents within the allotted time window:
St0
t00
= {s : <u, s, t, f> 2 P, t0
 t < t00
} (1)
In or
the se
that
conne
such
(i.e. w
Path
ident
iterat
reach
forme
{c, p,
rived
from
3.2
An
User u posted content s at time t in forum f
Building Semantic Resource
Graphs
The Semantic Evolution of Online Communities
7
1.  Concept graphs
¤  Nodes: Union of rootpaths from each entity’s class to
owl:Thing class
¤  Edges: Between nodes within the DBPedia Ontology
2.  Entity graphs
¤  Nodes: Entities provided by time-specific post extractions
¤  Edges: Relations between entity pairs within the DBPedia linked
data graph
Construct
Time-
delimited
Semantic
Graphs
e - i.e. fo-
B contains
e provided
er u posted
s) is com-
tic models
iscussed in
e and alter
s model to
< t00
- for
through all
window:
} (1)
ties can be
mation from
d relations
es and how
we consider
Ontology,
ussing (e.g.
se we con-
(e.g. dbpe-
ents, St0
t00
,
such concepts, and derive a fully connected concept graph
(i.e. with no disconnected components) we extract the Root
Path Graph as follows: From each concept (c 2 RC ) we
identify the parent concept (<c rdfs:subClassOf p>) and
iteratively move up the concept graph until the root node is
reached (owl:Thing), thereby returning a set of nodes that
formed the path from c to the root node: rootpath(c) =
{c, p, ...,owl:Thing}. The graph’s vertices are therefore de-
rived by taking the union of all paths to the root returned
from each concept within the seed set:
Vf =
[
c2RC
rootpath(c) (3)
3.2 Entity Graphs
An entity graph (GE) is a type of semantic resource graph
where the vertices (V ) are entities and the set of edges (E)
connecting these entities are relations between them derived
from the Web of Linked Data. We define the entity graph
as GE[f, t0
, t00
] = hVf , Ef i, such that GE[f, t0
, t00
] ⇢ Gentity
- where Gentity denotes the DBPedia entity graph contain-
ing relations between entities at the data level. As we are
provided with a collection of time-delimited entities RE for
a given forum, we query DBPedia for links between entity
pairs and add such edges to the graph where such a link
exists:
nd char-
Google+.
cial net-
oo! An-
node ar-
ng times
ed term
on how
lthough
cial net-
rs evolve
MMU-
HS
commu-
scussion
ierarchi-
i.e. fo-
contains
provided
u posted
is com-
resource URIs, is typed according to one or more classes
from the DBPedia ontology.3
Therefore to construct the set
of concepts that are cited within a given forum f over the
allotted time period we retrieve the classes that each en-
tity is a type of and store these in the following set: RC .
From this set we generate a time-dependent concept graph:
GC [f, t0
, t00
] = hVf , Ef i, such that GC [f, t0
, t00
] ⇢ Gtype -
where Gtype denotes the DBPedia type graph formed from
the class structure of the DBPedia ontology. In this con-
text the set of concepts denotes the seed set and is used to
populate the vertices in the graph and then construct edges
between the vertices based on existing links between the
concepts in the DBPedia type graph (Gtype):
Ef = {(ci, cj) : ci, cj 2 Vf , (ci, cj) 2 Etype} (2)
In order to derive the set of vertices we must consider how
the seed set can be used for this process as it is often the case
that the set is comprised of concepts which are not directly
connected to one another in the concept graph. To connect
such concepts, and derive a fully connected concept graph
(i.e. with no disconnected components) we extract the Root
Path Graph as follows: From each concept (c 2 RC ) we
identify the parent concept (<c rdfs:subClassOf p>) and
iteratively move up the concept graph until the root node is
c models
cussed in
and alter
model to
< t00
- for
rough all
window:
(1)
es can be
tion from
relations
and how
e consider
Ontology,
sing (e.g.
we con-
.g. dbpe-
nts, St0
t00
,
a time pe-
t0
t00
using
of entities
returned
reached (owl:Thing), thereby returning a set of nodes that
formed the path from c to the root node: rootpath(c) =
{c, p, ...,owl:Thing}. The graph’s vertices are therefore de-
rived by taking the union of all paths to the root returned
from each concept within the seed set:
Vf =
[
c2RC
rootpath(c) (3)
3.2 Entity Graphs
An entity graph (GE) is a type of semantic resource graph
where the vertices (V ) are entities and the set of edges (E)
connecting these entities are relations between them derived
from the Web of Linked Data. We define the entity graph
as GE[f, t0
, t00
] = hVf , Ef i, such that GE[f, t0
, t00
] ⇢ Gentity
- where Gentity denotes the DBPedia entity graph contain-
ing relations between entities at the data level. As we are
provided with a collection of time-delimited entities RE for
a given forum, we query DBPedia for links between entity
pairs and add such edges to the graph where such a link
exists:
Ef = {(ri, rj) : ri, rj 2 Vf , (ri, rj) 2 Eentity} (4)
Given this edge construction mechanism we only look for
relations one-hop away in the entity graph, that is: given
Measuring Graph Dynamics
The Semantic Evolution of Online Communities
8
1.  Node Count: size of the graph
2.  Diameter: breadth of the graph
3.  Specialisation Count: class specialisations
4.  Graph Entropy: density of the graph
5.  Clustering Coefficient: cliquishness of the graph
Inspect
Macro
Evolution
Computation of the measures is described within the paper
The Semantic Evolution of Online Communities
9
Macro Evolution
Cumulative Time Interval
Entropy of the
Concept Graph
Showing the mean graph entropy across all
communities and the 95% Confidence Interval
0 20 40 60 80 100 120
150200250300
Timestep
(a) Node Count
0 20 40 60 80 100 120
4.04.55.05.5
Timestep
H(G)
(b) Graph Entropy
0 20 40 60 80 100
150200250
Timestep
Specialisations
(c) Specialisations
: Concept graphs’ evolution based on node counts, graph entropy and spec
ty of the graph grows as more entities are of a given online community based on
Inspect
Macro
Evolution
Inspect
Macro
Evolution
The Semantic Evolution of Online Communities
10
Macro Evolution: Concept Graphs
0 20 40 60 80 100 120
150200250300
Timestep
|V|
(a) Node Count
0 20 40 60 80 100 120
4.04.55.05.5 Timestep
H(G)
(b) Graph Entropy
0 20 40 60 80 100 120
150200250
Timestep
Specialisations
(c) Specialisations
Figure 1: Concept graphs’ evolution based on node counts, graph entropy and specialisations.
ing that the density of the graph grows as more entities are
added and thus more connections are possible between them.
Summary: We can summarise the following salient find-
ings: (i) for concept graphs: node count, specialisation count
and density (graph entropy) tend to converge to limit; (ii)
for entity graphs, the diameter, graph entropy and cluster-
ing coe cient tend to converge to a limit, while the node
count (number of entities) increases linearly; (iii) despite
new entities arriving at a constant linear rate, on average,
of a given online community based on a single graph mea-
sure. We exclude the derivation of the equations from the
paper, but it is su cient to conclude that given |T| time
steps we would have a single equation for each time step
(t 2 T): Rtr 1
+ PtE 1
= 1. We can then solve for the
unknown variables r and E using the QR-decomposition of
a matrix: expressing the lefthand side of the simultaneous
equations as a |T| ⇥ 2 matrix and the righthand side as a
|T|-element vector where each element is 1. We induced
logistic population models for each of the graph measures
Node count, graph entropy, and specialisations converge to a limit
Inspect
Macro
Evolution
The Semantic Evolution of Online Communities
11
Macro Evolution: Entity Graphs
0 20 40 60 80 100 120
050010001500
Timestep
|V|
Linear Model (β=9.932)
(a) Node Count
0 20 40 60 80 100 120
1234567
Timestep
Diameter
(b) Diameter
0 20 40 60 80 100 120
1234
Timestep
H(G)
(c) Graph Entropy
0 20 40 60 80 100 120
0.020.060.100.14
Timestep
CC
(d) Clustering Coe cient
Figure 2: Entity graphs’ evolution based on node counts, diameter, graph entropy, and clustering coe cient.
and entity graphs defined within a single vector for a given
community (f): mf = {m1, m2, . . . , m13}. We began by de-
riving a single matrix M 2 R93⇥13
containing the 93 commu-
nities (rows) under analysis together with their 13 evolution
measures (columns), and performing principal component
analysis over this matrix. The result of this clustering is
shown in Fig. 3 where we have colour-coded the di↵erent
community forums by their hierarchical level in the plat-
form (level 2 = most general, level 4 = most specific). We
found that several of the more general forums appeared as
dynamics of a given community at time step t to predict
the churn rate of community members at time t + 1. We
defined the churn rate of a community as the proportion of
active users during a given time period (i.e. week segment)
that post for the last time. We used the semantic dynamics
motifs from the prior experiment (as listed within Fig. 3)
and also included graph measures at a given time period:
i.e. graph entropy at time t, specialisation count at time t,
etc. We derived these features for every time step for each
community and derived the response variable as the churn
Diameter, graph entropy, and clustering coefficient converge to a limit
Modelling the Semantic Evolution
of Individual Communities
The Semantic Evolution of Online Communities
12
¨  For each community and measure, induce a logistic
population model:
1.  Get the set of measure increase time steps (T)
2.  Derive the proportionate growth rate for each step
3.  Use the set of proportionate growth rates to solve for
unknown community-specific evolution variables (r & E)
T = {a : a 2 [1, 119], m(G1,a+1) > m(G1,a)}
Deriving the set of change time steps for a given com
allows the proportionate growth rate for a given t
(t 2 T) to be derived: Rt = (Pt+1 Pt)/Pt. Th
is equivalent to the following equation which defi
proportionate growth rate Rt in terms of the comm
growth rate (r) and carrying capacity (E), our u
variables: Rt = r(1 Pt/E). Therefore if we mea
proportionate growth rate over the |T| distinct tim
then we can derive, via simultaneous equations, the
rate of the graph and its carrying capacity, the ve
sures that we can use to characterise the semantic e
Induce
Community-
Specific
Evolution
Models
Proportionate
Growth Rate @ t
Population Growth Rate
Population (measure) size @ t
Carrying Capacity
(Equilibrium)
All communities’ models fit with χ2-test at α<0.01
The Semantic Evolution of Online Communities
13
Semantic Evolution… what can we actually
use it for?
The Semantic Evolution of Online Communities
Application: Community Evolution
Analysis14
Apply
Mined
Semantic
Evolution
Dynamics
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
−60 −40 −20 0 20 40 60
−40−20020
PC1
PC2
7
9
10
11 12
18
19
20
21
22
23
24
25
31
34
37
38
47
52
54
55
56
6064
68
82
86
93
99
105
107
108
109
116
120
124
125
126
127
136
137
151
171
177
227
232
237
246
252
259
264
267
269
271
333
343
346
370
382
388
389
392
410
411 443
446
453
464
468
471
474
475
476
478
481
482
483
490
495
503
506 512
514
518
522
529
532
542544
545
547
554
556
●
●
●
Level 2
Level 3
Level 4
E
EG − CSemantic Evolution Motif:
●
●
●
●
●
● ●
●
●
●
●
● ●
●
●
●●
●
●
●
●
20 40 6022
24
5
31
34
37
54
56
82
93
105
109
177
227
2
264
269
333
392 443
446468
4
3
490
495
6 512
514522
529
542544
554
556
●
●
●
Level 2
Level 3
Level 4 CG − Node Count − Rate
CG − Node Count − Equilbrium
CG − Graph Entropy − Rate
CG − Graph Entropy − Eq
CG − Specialisation Count − Rate
CG − Specialisation − Equilibrium
EG − Node Count − Slope
EG − Diameter − Rate
EG − Diameter − Equilibrium
EG − Entropy − Rate
EG − Entropy − Equilibrium
EG − Clustering Coefficient − Rate
EG − Clustering Coefficient − Equilibrium f7 − After Hours
f227 − Television
f554 − Wanted Motors
10−1
100
101
102
103
he communities based on their semantic motifs (left) where level 4 forums are clustered
lues for the concept graphs (CG) and entity graphs (EG) for the three outlier forums
right).
hat concept and entity graph den-
row linearly (unlike in social net-
erges on a limit, which we charac-
Conference on Advances in Social Networks Analysis
and Mining, 0:724–725, 2012.
[4] Cristian Danescu-Niculescu-Mizil, Robert West, Dan
20 40 60 80 100 120
Timestep
Linear Model (β=9.932)
(a) Node Count
0 20 40 60 80 100 1201234567
Timestep
Diameter
(b) Diameter
0 20 40 60 80 100 1
1234
Timestep
H(G)
(c) Graph Entropy
2: Entity graphs’ evolution based on node counts, diameter, graph entro
ity graphs defined within a single vector for a given
nity (f): mf = {m1, m2, . . . , m13}. We began by de-
single matrix M 2 R93⇥13
containing the 93 commu-
ows) under analysis together with their 13 evolution
es (columns), and performing principal component
over this matrix. The result of this clustering is
dynamics of a given com
the churn rate of commu
defined the churn rate of
active users during a give
that post for the last time
motifs from the prior exp
40 60 80 100 120
Timestep
Linear Model (β=9.932)
Node Count
0 20 40 60 80 100 120
1234
Timestep
Diameter
(b) Diameter
0 20
123
H(G)
(c) G
Entity graphs’ evolution based on node counts, diame
raphs defined within a single vector for a given
(f): mf = {m1, m2, . . . , m13}. We began by de-
le matrix M 2 R93⇥13
containing the 93 commu-
) under analysis together with their 13 evolution
dynami
the chu
defined
active u
Application: Community Evolution
Analysis
The Semantic Evolution of Online Communities
15
Apply
Mined
Semantic
Evolution
Dynamics
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
0 20 40 60
PC1
10
22
24
25
31
34
37
38
54
56
60
68
82
86
93
105
107
109
177
227
232
237
6
259
264
269
271
333
43
346
382
388
389
392411 443
446
453
464
468
471
474
476
478
481
482
483
490
495
3
506 512
514
518
522
529
542544
545
554
556
●
●
●
Level 2
Level 3
Level 4 CG − Node Count − Rate
CG − Node Count − Equilbrium
CG − Graph Entropy − Rate
CG − Graph Entropy − Eq
CG − Specialisation Count − Rate
CG − Specialisation − Equilibrium
EG − Node Count − Slope
EG − Diameter − Rate
EG − Diameter − Equilibrium
EG − Entropy − Rate
EG − Entropy − Equilibrium
EG − Clustering Coefficient − Rate
EG − Clustering Coefficient − Equilibrium f7 − After Hours
f227 − Television
f554 − Wanted Motors
10−1
100
101
102
103
t of the communities based on their semantic motifs (left) where level 4 forums are clustered
el values for the concept graphs (CG) and entity graphs (EG) for the three outlier forums
els (right).
Entity Graph: fastest growth in the random forum
Concept graph: slowest growth in random forum, largest equilibrium
Application: Churn Rate Prediction
The Semantic Evolution of Online Communities
16
¨  Task: forecast community churn rate at t+1
¤  Based on semantic evolution up to t
¨  Trained a ridge regression model
¨  Root Mean Square Error: Predicted vs. Actual churn
rate
Apply
Mined
Semantic
Evolution
Dynamics
Training Test
0 120 150
Training
Test
dynamics of a given community at time step t to predict
the churn rate of community members at time t + 1. We
defined the churn rate of a community as the proportion of
active users during a given time period (i.e. week segment)
that post for the last time. We used the semantic dynamics
motifs from the prior experiment (as listed within Fig. 3)
and also included graph measures at a given time period:
i.e. graph entropy at time t, specialisation count at time t,
etc. We derived these features for every time step for each
community and derived the response variable as the churn
rate at the following time step. We then compiled a train-
ing dataset (up to week 120) and a test dataset (from week
120). Each dataset had the following form: D = {(xi, yi)},
where xi contained a 21-element time-delimited feature vec-
tor for a given community and yi was the churn rate of the
community at the following time step. We trained a ridge
regression model ( ) using Dtrain and applied it to Dtest,
testing the performance of: a) just concept graph features,
b) just entity graph features, and c) all features. An autore-
gressive model was used as the baseline - using the churn
rate at time t as a single predictor variable for the churn
rate at time t + 1. Performance was evaluated using the
Root Mean Square Error (RMSE).
Table 1: Root Mean Square Error when predicting
churn rates using: an Autoregressive model (R2
=
0.341) and a Ridge Regression model using Concept
Application: Churn Rate Prediction
The Semantic Evolution of Online Communities
17
sions
: we
te of
that
t the
con-
size.
aphs:
h but
g to-
ount
After
ums,
ty of
ecific
The
and
neral
ntity
gressive model was used as the baseline - using the churn
rate at time t as a single predictor variable for the churn
rate at time t + 1. Performance was evaluated using the
Root Mean Square Error (RMSE).
Table 1: Root Mean Square Error when predicting
churn rates using: an Autoregressive model (R2
=
0.341) and a Ridge Regression model using Concept
Graph, Entity Graph, and all features.
Baseline Concept Graph Entity Graph All Features
7.310 ⇥10 3
5.315 ⇥10 3
5.301 ⇥10 3
4.941 ⇥10 3
Table 1 presents the results from our prediction our exper-
iment. We found that for all tested models (concept graph,
entity graph, all features) we significantly outperformed the
baseline - tested using the sign test (↵ = 0.001). Entity
graph features outperform concept graphs but not signifi-
cantly, while our best model is the use of all features together
in a single model. These results empirically demonstrate the
Significant reduction in error (Sign test with α=0.001)
Baseline: Autoregressive model with churn
rate @ t as a predictor for t+1
Apply
Mined
Semantic
Evolution
Dynamics
Findings and Conclusions
The Semantic Evolution of Online Communities
18
¨  Semantic graphs of online communities do not grow linearly:
instead, they evolve to a limit
¤  Unlike in social networks [Lekovec et al., 2008]
¤  Exception: entity graph size
¨  A finite number of topics are discussed within communities
¤  Variation between communities
¨  Our use of logistic population models has enabled:
1.  Characterisation of community-specific evolution dynamics
2.  Community analysis to inspect how communities evolved
differently
3.  Churn prediction based on semantic evolution
Future Work
The Semantic Evolution of Online Communities
19
1.  Expanded to cover other online communities
¤  Question-answering, mined communities
2.  User profiling
¤  Capturing user-specific semantic evolution
In-edge weight
distribution
(ServerFault)
●
●
● ●
●
1 2 3 4 5
0.50.60.70.80.91.01.11.2 k= 5
Lifecycle Stage
H
●
●
● ●
●
●
●
●
●
●
●
●
● ●
●
● ●
● ●
●
2 4 6 8 10
0.30.40.50.60.7
k= 10
Lifecycle Stage
H
●
●
● ●
●
● ●
● ●
●
● ●
●
●
●
● ●
●
●
●
0.050.150.250.35
H
Non-churners
Churners
Matthew Rowe
@mrowebot
m.rowe@lancaster.ac.uk
http://www.lancaster.ac.uk/staff/rowem/
Markus Strohmaier
@mstrohm
markus.strohmaier@gesis.org
http://markusstrohmaier.info/
Questions?20
The Semantic Evolution of Online Communities

More Related Content

Similar to The Semantic Evolution of Online Communities

Dynamic Data Community Discovery
Dynamic Data Community DiscoveryDynamic Data Community Discovery
Dynamic Data Community Discovery
Sarang Rakhecha
 
Rostislav Yavorsky - Research Challenges of Dynamic Socio-Semantic Networks
Rostislav Yavorsky - Research Challenges of Dynamic Socio-Semantic NetworksRostislav Yavorsky - Research Challenges of Dynamic Socio-Semantic Networks
Rostislav Yavorsky - Research Challenges of Dynamic Socio-Semantic Networks
Witology
 
Entity-Based Semantics Emerging from Personal Awareness Streams
Entity-Based Semantics Emerging from Personal Awareness Streams Entity-Based Semantics Emerging from Personal Awareness Streams
Entity-Based Semantics Emerging from Personal Awareness Streams
Amparo Elizabeth Cano Basave
 
Predicting_new_friendships_in_social_networks
Predicting_new_friendships_in_social_networksPredicting_new_friendships_in_social_networks
Predicting_new_friendships_in_social_networks
Anvardh Nanduri
 
I Link Search And Routing In Social Networks
I Link Search And Routing In Social NetworksI Link Search And Routing In Social Networks
I Link Search And Routing In Social Networks
nkaf61
 
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKINGINTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
dannyijwest
 

Similar to The Semantic Evolution of Online Communities (20)

Poster presentation 5th BENet (Belgium Network Research Meting), Namur
Poster presentation 5th BENet (Belgium Network Research Meting), Namur Poster presentation 5th BENet (Belgium Network Research Meting), Namur
Poster presentation 5th BENet (Belgium Network Research Meting), Namur
 
Hierarchical topics in texts generated by a stream
Hierarchical topics in texts generated by a streamHierarchical topics in texts generated by a stream
Hierarchical topics in texts generated by a stream
 
Hierarchical topics in texts generated by a stream
Hierarchical topics in texts generated by a streamHierarchical topics in texts generated by a stream
Hierarchical topics in texts generated by a stream
 
Tweet Cloud
Tweet CloudTweet Cloud
Tweet Cloud
 
Dynamic Data Community Discovery
Dynamic Data Community DiscoveryDynamic Data Community Discovery
Dynamic Data Community Discovery
 
Rostislav Yavorsky - Research Challenges of Dynamic Socio-Semantic Networks
Rostislav Yavorsky - Research Challenges of Dynamic Socio-Semantic NetworksRostislav Yavorsky - Research Challenges of Dynamic Socio-Semantic Networks
Rostislav Yavorsky - Research Challenges of Dynamic Socio-Semantic Networks
 
Informatics systems
Informatics systemsInformatics systems
Informatics systems
 
Entity-Based Semantics Emerging from Personal Awareness Streams
Entity-Based Semantics Emerging from Personal Awareness Streams Entity-Based Semantics Emerging from Personal Awareness Streams
Entity-Based Semantics Emerging from Personal Awareness Streams
 
Finding prominent features in communities in social networks using ontology
Finding prominent features in communities in social networks using ontologyFinding prominent features in communities in social networks using ontology
Finding prominent features in communities in social networks using ontology
 
Measuring the Topical Specificity of Online Communities
Measuring the Topical Specificity of Online CommunitiesMeasuring the Topical Specificity of Online Communities
Measuring the Topical Specificity of Online Communities
 
Predicting_new_friendships_in_social_networks
Predicting_new_friendships_in_social_networksPredicting_new_friendships_in_social_networks
Predicting_new_friendships_in_social_networks
 
Mining the Social Web - Lecture 2 - T61.6020
Mining the Social Web - Lecture 2 - T61.6020Mining the Social Web - Lecture 2 - T61.6020
Mining the Social Web - Lecture 2 - T61.6020
 
I Link Search And Routing In Social Networks
I Link Search And Routing In Social NetworksI Link Search And Routing In Social Networks
I Link Search And Routing In Social Networks
 
Visualising data: Seeing is Believing - CS Forum 2012
Visualising data: Seeing is Believing - CS Forum 2012Visualising data: Seeing is Believing - CS Forum 2012
Visualising data: Seeing is Believing - CS Forum 2012
 
Practice of Streaming Processing of Dynamic Graphs: Concepts, Models, and Sys...
Practice of Streaming Processing of Dynamic Graphs: Concepts, Models, and Sys...Practice of Streaming Processing of Dynamic Graphs: Concepts, Models, and Sys...
Practice of Streaming Processing of Dynamic Graphs: Concepts, Models, and Sys...
 
DPG 2014: "Context Sensitive and Time Dependent Relevance of Wikipedia Articles"
DPG 2014: "Context Sensitive and Time Dependent Relevance of Wikipedia Articles"DPG 2014: "Context Sensitive and Time Dependent Relevance of Wikipedia Articles"
DPG 2014: "Context Sensitive and Time Dependent Relevance of Wikipedia Articles"
 
Structural Analysis of Hacktivism on Twitter
Structural Analysis of Hacktivism on TwitterStructural Analysis of Hacktivism on Twitter
Structural Analysis of Hacktivism on Twitter
 
Temporal networks - Alain Barrat
Temporal networks - Alain BarratTemporal networks - Alain Barrat
Temporal networks - Alain Barrat
 
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKINGINTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
 
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED  ON SEMANTIC TAG RANKINGINTELLIGENT SOCIAL NETWORKS MODEL BASED  ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
 

More from Matthew Rowe

Existing Research and Future Research Agenda
Existing Research and Future Research AgendaExisting Research and Future Research Agenda
Existing Research and Future Research Agenda
Matthew Rowe
 
Modelling and Analysis of User Behaviour in Online Communities
Modelling and Analysis of User Behaviour in Online CommunitiesModelling and Analysis of User Behaviour in Online Communities
Modelling and Analysis of User Behaviour in Online Communities
Matthew Rowe
 
Anticipating Discussion Activity on Community Forums
Anticipating Discussion Activity on Community ForumsAnticipating Discussion Activity on Community Forums
Anticipating Discussion Activity on Community Forums
Matthew Rowe
 
Semantic Technologies: Representing Semantic Data
Semantic Technologies: Representing Semantic DataSemantic Technologies: Representing Semantic Data
Semantic Technologies: Representing Semantic Data
Matthew Rowe
 
Forecasting Audience Increase on Youtube
Forecasting Audience Increase on YoutubeForecasting Audience Increase on Youtube
Forecasting Audience Increase on Youtube
Matthew Rowe
 
Predicting Discussions on the Social Semantic Web
Predicting Discussions on the Social Semantic WebPredicting Discussions on the Social Semantic Web
Predicting Discussions on the Social Semantic Web
Matthew Rowe
 
PhD Viva - Disambiguating Identity Web References using Social Data
PhD Viva - Disambiguating Identity Web References using Social DataPhD Viva - Disambiguating Identity Web References using Social Data
PhD Viva - Disambiguating Identity Web References using Social Data
Matthew Rowe
 
Integrating and Interpreting Social Data from Heterogeneous Sources
Integrating and Interpreting Social Data from Heterogeneous SourcesIntegrating and Interpreting Social Data from Heterogeneous Sources
Integrating and Interpreting Social Data from Heterogeneous Sources
Matthew Rowe
 
Inferring Web Citations using Social Data and SPARQL Rules
Inferring Web Citations using Social Data and SPARQL RulesInferring Web Citations using Social Data and SPARQL Rules
Inferring Web Citations using Social Data and SPARQL Rules
Matthew Rowe
 
The Credibility of Digital Identity Information on the Social Web: A User Study
The Credibility of Digital Identity Information on the Social Web: A User StudyThe Credibility of Digital Identity Information on the Social Web: A User Study
The Credibility of Digital Identity Information on the Social Web: A User Study
Matthew Rowe
 
Data.dcs: Converting Legacy Data into Linked Data
Data.dcs: Converting Legacy Data into Linked DataData.dcs: Converting Legacy Data into Linked Data
Data.dcs: Converting Legacy Data into Linked Data
Matthew Rowe
 

More from Matthew Rowe (20)

Mining User Lifecycles from Online Community Platforms and their Application ...
Mining User Lifecycles from Online Community Platforms and their Application ...Mining User Lifecycles from Online Community Platforms and their Application ...
Mining User Lifecycles from Online Community Platforms and their Application ...
 
From User Needs to Community Health: Mining User Behaviour to Analyse Online ...
From User Needs to Community Health: Mining User Behaviour to Analyse Online ...From User Needs to Community Health: Mining User Behaviour to Analyse Online ...
From User Needs to Community Health: Mining User Behaviour to Analyse Online ...
 
Changing with Time: Modelling and Detecting User Lifecycle Periods in Online ...
Changing with Time: Modelling and Detecting User Lifecycle Periods in Online ...Changing with Time: Modelling and Detecting User Lifecycle Periods in Online ...
Changing with Time: Modelling and Detecting User Lifecycle Periods in Online ...
 
Identity: Physical, Cyber, Future
Identity: Physical, Cyber, FutureIdentity: Physical, Cyber, Future
Identity: Physical, Cyber, Future
 
Who will follow whom? Exploiting Semantics for Link Prediction in Attention-I...
Who will follow whom? Exploiting Semantics for Link Prediction in Attention-I...Who will follow whom? Exploiting Semantics for Link Prediction in Attention-I...
Who will follow whom? Exploiting Semantics for Link Prediction in Attention-I...
 
Attention Economics in Social Web Systems
Attention Economics in Social Web SystemsAttention Economics in Social Web Systems
Attention Economics in Social Web Systems
 
What makes communities tick? Community health analysis using role compositions
What makes communities tick? Community health analysis using role compositionsWhat makes communities tick? Community health analysis using role compositions
What makes communities tick? Community health analysis using role compositions
 
Existing Research and Future Research Agenda
Existing Research and Future Research AgendaExisting Research and Future Research Agenda
Existing Research and Future Research Agenda
 
Tutorial: Social Semantics
Tutorial: Social SemanticsTutorial: Social Semantics
Tutorial: Social Semantics
 
Modelling and Analysis of User Behaviour in Online Communities
Modelling and Analysis of User Behaviour in Online CommunitiesModelling and Analysis of User Behaviour in Online Communities
Modelling and Analysis of User Behaviour in Online Communities
 
Using Behaviour Analysis to Detect Cultural Aspects in Social Web Systems
Using Behaviour Analysis to Detect Cultural Aspects in Social Web SystemsUsing Behaviour Analysis to Detect Cultural Aspects in Social Web Systems
Using Behaviour Analysis to Detect Cultural Aspects in Social Web Systems
 
Anticipating Discussion Activity on Community Forums
Anticipating Discussion Activity on Community ForumsAnticipating Discussion Activity on Community Forums
Anticipating Discussion Activity on Community Forums
 
Semantic Technologies: Representing Semantic Data
Semantic Technologies: Representing Semantic DataSemantic Technologies: Representing Semantic Data
Semantic Technologies: Representing Semantic Data
 
Forecasting Audience Increase on Youtube
Forecasting Audience Increase on YoutubeForecasting Audience Increase on Youtube
Forecasting Audience Increase on Youtube
 
Predicting Discussions on the Social Semantic Web
Predicting Discussions on the Social Semantic WebPredicting Discussions on the Social Semantic Web
Predicting Discussions on the Social Semantic Web
 
PhD Viva - Disambiguating Identity Web References using Social Data
PhD Viva - Disambiguating Identity Web References using Social DataPhD Viva - Disambiguating Identity Web References using Social Data
PhD Viva - Disambiguating Identity Web References using Social Data
 
Integrating and Interpreting Social Data from Heterogeneous Sources
Integrating and Interpreting Social Data from Heterogeneous SourcesIntegrating and Interpreting Social Data from Heterogeneous Sources
Integrating and Interpreting Social Data from Heterogeneous Sources
 
Inferring Web Citations using Social Data and SPARQL Rules
Inferring Web Citations using Social Data and SPARQL RulesInferring Web Citations using Social Data and SPARQL Rules
Inferring Web Citations using Social Data and SPARQL Rules
 
The Credibility of Digital Identity Information on the Social Web: A User Study
The Credibility of Digital Identity Information on the Social Web: A User StudyThe Credibility of Digital Identity Information on the Social Web: A User Study
The Credibility of Digital Identity Information on the Social Web: A User Study
 
Data.dcs: Converting Legacy Data into Linked Data
Data.dcs: Converting Legacy Data into Linked DataData.dcs: Converting Legacy Data into Linked Data
Data.dcs: Converting Legacy Data into Linked Data
 

Recently uploaded

Capstone slidedeck for my capstone project part 2.pdf
Capstone slidedeck for my capstone project part 2.pdfCapstone slidedeck for my capstone project part 2.pdf
Capstone slidedeck for my capstone project part 2.pdf
eliklein8
 
DickinsonSlides teeeeeeeeeeessssssssssst.pptx
DickinsonSlides teeeeeeeeeeessssssssssst.pptxDickinsonSlides teeeeeeeeeeessssssssssst.pptx
DickinsonSlides teeeeeeeeeeessssssssssst.pptx
ednyonat
 
+971565801893>> ORIGINAL CYTOTEC ABORTION PILLS FOR SALE IN DUBAI AND ABUDHABI<<
+971565801893>> ORIGINAL CYTOTEC ABORTION PILLS FOR SALE IN DUBAI AND ABUDHABI<<+971565801893>> ORIGINAL CYTOTEC ABORTION PILLS FOR SALE IN DUBAI AND ABUDHABI<<
+971565801893>> ORIGINAL CYTOTEC ABORTION PILLS FOR SALE IN DUBAI AND ABUDHABI<<
Health
 
Call Girls in Chattarpur (delhi) call me [9953056974] escort service 24X7
Call Girls in Chattarpur (delhi) call me [9953056974] escort service 24X7Call Girls in Chattarpur (delhi) call me [9953056974] escort service 24X7
Call Girls in Chattarpur (delhi) call me [9953056974] escort service 24X7
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 

Recently uploaded (20)

VIP Call Girls Morena 9332606886 Free Home Delivery 5500 Only
VIP Call Girls Morena 9332606886 Free Home Delivery 5500 OnlyVIP Call Girls Morena 9332606886 Free Home Delivery 5500 Only
VIP Call Girls Morena 9332606886 Free Home Delivery 5500 Only
 
Film the city investagation powerpoint :)
Film the city investagation powerpoint :)Film the city investagation powerpoint :)
Film the city investagation powerpoint :)
 
Capstone slidedeck for my capstone project part 2.pdf
Capstone slidedeck for my capstone project part 2.pdfCapstone slidedeck for my capstone project part 2.pdf
Capstone slidedeck for my capstone project part 2.pdf
 
College & House wife Call Girls in Paharganj 9634446618 -Best Escort call gi...
College & House wife  Call Girls in Paharganj 9634446618 -Best Escort call gi...College & House wife  Call Girls in Paharganj 9634446618 -Best Escort call gi...
College & House wife Call Girls in Paharganj 9634446618 -Best Escort call gi...
 
Interpreting the brief for the media IDY
Interpreting the brief for the media IDYInterpreting the brief for the media IDY
Interpreting the brief for the media IDY
 
Call Girls In South Ex. Delhi O9654467111 Women Seeking Men
Call Girls In South Ex. Delhi O9654467111 Women Seeking MenCall Girls In South Ex. Delhi O9654467111 Women Seeking Men
Call Girls In South Ex. Delhi O9654467111 Women Seeking Men
 
Unlock the power of Instagram with SocioCosmos. Start your journey towards so...
Unlock the power of Instagram with SocioCosmos. Start your journey towards so...Unlock the power of Instagram with SocioCosmos. Start your journey towards so...
Unlock the power of Instagram with SocioCosmos. Start your journey towards so...
 
Ready to get noticed? Partner with Sociocosmos
Ready to get noticed? Partner with SociocosmosReady to get noticed? Partner with Sociocosmos
Ready to get noticed? Partner with Sociocosmos
 
Pondicherry Call Girls Book Now 8617697112 Top Class Pondicherry Escort Servi...
Pondicherry Call Girls Book Now 8617697112 Top Class Pondicherry Escort Servi...Pondicherry Call Girls Book Now 8617697112 Top Class Pondicherry Escort Servi...
Pondicherry Call Girls Book Now 8617697112 Top Class Pondicherry Escort Servi...
 
DickinsonSlides teeeeeeeeeeessssssssssst.pptx
DickinsonSlides teeeeeeeeeeessssssssssst.pptxDickinsonSlides teeeeeeeeeeessssssssssst.pptx
DickinsonSlides teeeeeeeeeeessssssssssst.pptx
 
BVG BEACH CLEANING PROJECTS- ORISSA , ANDAMAN, PORT BLAIR
BVG BEACH CLEANING PROJECTS- ORISSA , ANDAMAN, PORT BLAIRBVG BEACH CLEANING PROJECTS- ORISSA , ANDAMAN, PORT BLAIR
BVG BEACH CLEANING PROJECTS- ORISSA , ANDAMAN, PORT BLAIR
 
Craft Your Legacy: Invest in YouTube Presence from Sociocosmos"
Craft Your Legacy: Invest in YouTube Presence from Sociocosmos"Craft Your Legacy: Invest in YouTube Presence from Sociocosmos"
Craft Your Legacy: Invest in YouTube Presence from Sociocosmos"
 
Film show production powerpoint for site
Film show production powerpoint for siteFilm show production powerpoint for site
Film show production powerpoint for site
 
Film show investigation powerpoint for the site
Film show investigation powerpoint for the siteFilm show investigation powerpoint for the site
Film show investigation powerpoint for the site
 
CASH PAYMENT ON GIRL HAND TO HAND HOUSEWIFE
CASH PAYMENT ON GIRL HAND TO HAND HOUSEWIFECASH PAYMENT ON GIRL HAND TO HAND HOUSEWIFE
CASH PAYMENT ON GIRL HAND TO HAND HOUSEWIFE
 
The Butterfly Effect
The Butterfly EffectThe Butterfly Effect
The Butterfly Effect
 
+971565801893>> ORIGINAL CYTOTEC ABORTION PILLS FOR SALE IN DUBAI AND ABUDHABI<<
+971565801893>> ORIGINAL CYTOTEC ABORTION PILLS FOR SALE IN DUBAI AND ABUDHABI<<+971565801893>> ORIGINAL CYTOTEC ABORTION PILLS FOR SALE IN DUBAI AND ABUDHABI<<
+971565801893>> ORIGINAL CYTOTEC ABORTION PILLS FOR SALE IN DUBAI AND ABUDHABI<<
 
Vellore Call Girls Service ☎ ️82500–77686 ☎️ Enjoy 24/7 Escort Service
Vellore Call Girls Service ☎ ️82500–77686 ☎️ Enjoy 24/7 Escort ServiceVellore Call Girls Service ☎ ️82500–77686 ☎️ Enjoy 24/7 Escort Service
Vellore Call Girls Service ☎ ️82500–77686 ☎️ Enjoy 24/7 Escort Service
 
Call Girls in Chattarpur (delhi) call me [9953056974] escort service 24X7
Call Girls in Chattarpur (delhi) call me [9953056974] escort service 24X7Call Girls in Chattarpur (delhi) call me [9953056974] escort service 24X7
Call Girls in Chattarpur (delhi) call me [9953056974] escort service 24X7
 
Enjoy Night⚡Call Girls Palam Vihar Gurgaon >༒8448380779 Escort Service
Enjoy Night⚡Call Girls Palam Vihar Gurgaon >༒8448380779 Escort ServiceEnjoy Night⚡Call Girls Palam Vihar Gurgaon >༒8448380779 Escort Service
Enjoy Night⚡Call Girls Palam Vihar Gurgaon >༒8448380779 Escort Service
 

The Semantic Evolution of Online Communities

  • 1. THE SEMANTIC EVOLUTION OF ONLINE COMMUNITIES MATTHEW ROWE1 AND MARKUS STROHMAIER2 1. LANCASTER UNIVERSITY, LANCASTER, UK @MROWEBOT | M.ROWE@LANCASTER.AC.UK 2. UNIVERSITY OF KOBLENZ AND GESIS, COLOGNE, GERMANY @MSTROHM | MARKUS.STROHMAIER@GESIS.ORG World Wide Web Conference 2014 Seoul, South Korea
  • 2. Studies of Online Community Evolution The Semantic Evolution of Online Communities 1 ¨  Prior work has examined online community development based on: ¤  Social network properties n  (Gong et al., 2012), (Mislove et al., 2007) ¤  Social group formation n  (Backstrom et al., 2006) ¤  Lexical term usage and uptake n  (Danescu et al., 2013) Organic fixie Pitchfork, fingerstache fashion axe 8-bit ethical. Neutra shabby chic brunch, mustache vegan twee typewriter dreamcatcher try-hard organic church-key! subsequent interpretation, of churne nities. Churners present a serious managers and hosts as the leaving of a detrimental effect on the communi a question-answering community ca unanswered queries). In this section we define churn classification task and use the previo tors of lifecycle trajectories to predi churner or not. As we confine user the start of their lifecycle to the end mined from this period to characteri We define churners as any user who before the final 10% of the time w cutoff points are: 2012-07-09 for Fac SAP, and 2010-12-23 for ServerFaul following form: D = {(xi, yi)}, whe label of the user from one of two while xi denotes an 11-element R-va either a Facebook or SAP user, and = {w1,w2,…,wn}
  • 3. The Semantic Evolution of Online Communities 2 Work has yet to examine the semantic evolution of online communities 1.  Understand how semantic concepts emerge 2.  Model the development of semantic structures over time 3.  Examine how communities differ from one another in their evolution
  • 4. Assessing Semantic Evolution The Semantic Evolution of Online Communities 3 Define: Community Semantics! Entities, and their classes, discussed within a community over a given time-period, and the structure connecting those concepts! Time …
  • 5. Assessing Semantic Evolution The Semantic Evolution of Online Communities 4 ¨  Assessing community semantics enables: 1.  Characterisation of semantic evolution dynamics 2.  Comparison based on community evolution 3.  Forecasting churn rates from evolution signals Our Contributions Define: Community Semantics! Entities, and their classes, discussed within a community over a given time-period, and the structure connecting those concepts!
  • 6. The Semantic Evolution of Online Communities Approach: Examining Semantic Evolution5 Retrieve Posts and Extract Entities Construct Time- delimited Semantic Graphs Inspect Macro Evolution Induce Community- Specific Evolution Models Apply Mined Semantic Evolution Dynamics We have two flavours: (i)  concept graphs (ii)  entity graphs Applying graph measures over cumulative time-sensitive graphs Inducing logistic population models for each community and graph measure Using model dynamics as community motifs to: (i)  inspect communities (ii)  predict churn rate Experiment Dataset: Boards.ie All posts during 2005-2008 93 Communities: online forums
  • 7. Extracting Entities from Online Community Posts The Semantic Evolution of Online Communities 6 ¨  We are provided with a set of post quadruples: ¨  We constrain content to within a given interval [t’,t’’) ¨  Extract each post’s entities within the time interval using TextRazor ¤  No quota limit, and good prior performance (Derczynski et al., 2013) ¨  Result: time-sensitive community-discussed entities (DBPedia URIs) Retrieve Posts and Extract Entities specialised topics of discussion over A. Posts are provided as a set of quadruples <u, s, t, f> 2 P, where user u posted message s at time t in forum f. A message (s) is com- posed of terms that we use to build the semantic models for individual communities. The information discussed in a community, and thus its semantics, can change and alter over time, therefore we constrain a community’s model to specific time snapshots - e.g. t0 ! t00 where t0 < t00 - for this we use the following construct that filters through all relevant posts’ contents within the allotted time window: St0 t00 = {s : <u, s, t, f> 2 P, t0  t < t00 } (1) Information discussed within online communities can be represented in terms of its semantics, using information from either the schema-level (i.e. ontological classes and relations between them) or the data level (i.e. using entities and how they are related to one another). For the former we consider concepts to be classes found within the DBPedia Ontology, that is: the types of entities that users are discussing (e.g. people, locations, etc.), while for the latter case we con- Pa ide ite rea for {c, riv fro 3.2 A wh con fro as - w ing pro NITIES WITH SEMANTIC GRAPHS For our experiments we used data from the Irish commu- nity message board Boards.ie.2 This is a general-discussion community message board that includes a set of hierarchi- cally nested forums (F) in which posts are made - i.e. fo- rum A can be a parent of forum B, and thus B contains specialised topics of discussion over A. Posts are provided as a set of quadruples <u, s, t, f> 2 P, where user u posted message s at time t in forum f. A message (s) is com- posed of terms that we use to build the semantic models for individual communities. The information discussed in a community, and thus its semantics, can change and alter over time, therefore we constrain a community’s model to specific time snapshots - e.g. t0 ! t00 where t0 < t00 - for this we use the following construct that filters through all relevant posts’ contents within the allotted time window: St0 t00 = {s : <u, s, t, f> 2 P, t0  t < t00 } (1) In or the se that conne such (i.e. w Path ident iterat reach forme {c, p, rived from 3.2 An User u posted content s at time t in forum f
  • 8. Building Semantic Resource Graphs The Semantic Evolution of Online Communities 7 1.  Concept graphs ¤  Nodes: Union of rootpaths from each entity’s class to owl:Thing class ¤  Edges: Between nodes within the DBPedia Ontology 2.  Entity graphs ¤  Nodes: Entities provided by time-specific post extractions ¤  Edges: Relations between entity pairs within the DBPedia linked data graph Construct Time- delimited Semantic Graphs e - i.e. fo- B contains e provided er u posted s) is com- tic models iscussed in e and alter s model to < t00 - for through all window: } (1) ties can be mation from d relations es and how we consider Ontology, ussing (e.g. se we con- (e.g. dbpe- ents, St0 t00 , such concepts, and derive a fully connected concept graph (i.e. with no disconnected components) we extract the Root Path Graph as follows: From each concept (c 2 RC ) we identify the parent concept (<c rdfs:subClassOf p>) and iteratively move up the concept graph until the root node is reached (owl:Thing), thereby returning a set of nodes that formed the path from c to the root node: rootpath(c) = {c, p, ...,owl:Thing}. The graph’s vertices are therefore de- rived by taking the union of all paths to the root returned from each concept within the seed set: Vf = [ c2RC rootpath(c) (3) 3.2 Entity Graphs An entity graph (GE) is a type of semantic resource graph where the vertices (V ) are entities and the set of edges (E) connecting these entities are relations between them derived from the Web of Linked Data. We define the entity graph as GE[f, t0 , t00 ] = hVf , Ef i, such that GE[f, t0 , t00 ] ⇢ Gentity - where Gentity denotes the DBPedia entity graph contain- ing relations between entities at the data level. As we are provided with a collection of time-delimited entities RE for a given forum, we query DBPedia for links between entity pairs and add such edges to the graph where such a link exists: nd char- Google+. cial net- oo! An- node ar- ng times ed term on how lthough cial net- rs evolve MMU- HS commu- scussion ierarchi- i.e. fo- contains provided u posted is com- resource URIs, is typed according to one or more classes from the DBPedia ontology.3 Therefore to construct the set of concepts that are cited within a given forum f over the allotted time period we retrieve the classes that each en- tity is a type of and store these in the following set: RC . From this set we generate a time-dependent concept graph: GC [f, t0 , t00 ] = hVf , Ef i, such that GC [f, t0 , t00 ] ⇢ Gtype - where Gtype denotes the DBPedia type graph formed from the class structure of the DBPedia ontology. In this con- text the set of concepts denotes the seed set and is used to populate the vertices in the graph and then construct edges between the vertices based on existing links between the concepts in the DBPedia type graph (Gtype): Ef = {(ci, cj) : ci, cj 2 Vf , (ci, cj) 2 Etype} (2) In order to derive the set of vertices we must consider how the seed set can be used for this process as it is often the case that the set is comprised of concepts which are not directly connected to one another in the concept graph. To connect such concepts, and derive a fully connected concept graph (i.e. with no disconnected components) we extract the Root Path Graph as follows: From each concept (c 2 RC ) we identify the parent concept (<c rdfs:subClassOf p>) and iteratively move up the concept graph until the root node is c models cussed in and alter model to < t00 - for rough all window: (1) es can be tion from relations and how e consider Ontology, sing (e.g. we con- .g. dbpe- nts, St0 t00 , a time pe- t0 t00 using of entities returned reached (owl:Thing), thereby returning a set of nodes that formed the path from c to the root node: rootpath(c) = {c, p, ...,owl:Thing}. The graph’s vertices are therefore de- rived by taking the union of all paths to the root returned from each concept within the seed set: Vf = [ c2RC rootpath(c) (3) 3.2 Entity Graphs An entity graph (GE) is a type of semantic resource graph where the vertices (V ) are entities and the set of edges (E) connecting these entities are relations between them derived from the Web of Linked Data. We define the entity graph as GE[f, t0 , t00 ] = hVf , Ef i, such that GE[f, t0 , t00 ] ⇢ Gentity - where Gentity denotes the DBPedia entity graph contain- ing relations between entities at the data level. As we are provided with a collection of time-delimited entities RE for a given forum, we query DBPedia for links between entity pairs and add such edges to the graph where such a link exists: Ef = {(ri, rj) : ri, rj 2 Vf , (ri, rj) 2 Eentity} (4) Given this edge construction mechanism we only look for relations one-hop away in the entity graph, that is: given
  • 9. Measuring Graph Dynamics The Semantic Evolution of Online Communities 8 1.  Node Count: size of the graph 2.  Diameter: breadth of the graph 3.  Specialisation Count: class specialisations 4.  Graph Entropy: density of the graph 5.  Clustering Coefficient: cliquishness of the graph Inspect Macro Evolution Computation of the measures is described within the paper
  • 10. The Semantic Evolution of Online Communities 9 Macro Evolution Cumulative Time Interval Entropy of the Concept Graph Showing the mean graph entropy across all communities and the 95% Confidence Interval 0 20 40 60 80 100 120 150200250300 Timestep (a) Node Count 0 20 40 60 80 100 120 4.04.55.05.5 Timestep H(G) (b) Graph Entropy 0 20 40 60 80 100 150200250 Timestep Specialisations (c) Specialisations : Concept graphs’ evolution based on node counts, graph entropy and spec ty of the graph grows as more entities are of a given online community based on Inspect Macro Evolution
  • 11. Inspect Macro Evolution The Semantic Evolution of Online Communities 10 Macro Evolution: Concept Graphs 0 20 40 60 80 100 120 150200250300 Timestep |V| (a) Node Count 0 20 40 60 80 100 120 4.04.55.05.5 Timestep H(G) (b) Graph Entropy 0 20 40 60 80 100 120 150200250 Timestep Specialisations (c) Specialisations Figure 1: Concept graphs’ evolution based on node counts, graph entropy and specialisations. ing that the density of the graph grows as more entities are added and thus more connections are possible between them. Summary: We can summarise the following salient find- ings: (i) for concept graphs: node count, specialisation count and density (graph entropy) tend to converge to limit; (ii) for entity graphs, the diameter, graph entropy and cluster- ing coe cient tend to converge to a limit, while the node count (number of entities) increases linearly; (iii) despite new entities arriving at a constant linear rate, on average, of a given online community based on a single graph mea- sure. We exclude the derivation of the equations from the paper, but it is su cient to conclude that given |T| time steps we would have a single equation for each time step (t 2 T): Rtr 1 + PtE 1 = 1. We can then solve for the unknown variables r and E using the QR-decomposition of a matrix: expressing the lefthand side of the simultaneous equations as a |T| ⇥ 2 matrix and the righthand side as a |T|-element vector where each element is 1. We induced logistic population models for each of the graph measures Node count, graph entropy, and specialisations converge to a limit
  • 12. Inspect Macro Evolution The Semantic Evolution of Online Communities 11 Macro Evolution: Entity Graphs 0 20 40 60 80 100 120 050010001500 Timestep |V| Linear Model (β=9.932) (a) Node Count 0 20 40 60 80 100 120 1234567 Timestep Diameter (b) Diameter 0 20 40 60 80 100 120 1234 Timestep H(G) (c) Graph Entropy 0 20 40 60 80 100 120 0.020.060.100.14 Timestep CC (d) Clustering Coe cient Figure 2: Entity graphs’ evolution based on node counts, diameter, graph entropy, and clustering coe cient. and entity graphs defined within a single vector for a given community (f): mf = {m1, m2, . . . , m13}. We began by de- riving a single matrix M 2 R93⇥13 containing the 93 commu- nities (rows) under analysis together with their 13 evolution measures (columns), and performing principal component analysis over this matrix. The result of this clustering is shown in Fig. 3 where we have colour-coded the di↵erent community forums by their hierarchical level in the plat- form (level 2 = most general, level 4 = most specific). We found that several of the more general forums appeared as dynamics of a given community at time step t to predict the churn rate of community members at time t + 1. We defined the churn rate of a community as the proportion of active users during a given time period (i.e. week segment) that post for the last time. We used the semantic dynamics motifs from the prior experiment (as listed within Fig. 3) and also included graph measures at a given time period: i.e. graph entropy at time t, specialisation count at time t, etc. We derived these features for every time step for each community and derived the response variable as the churn Diameter, graph entropy, and clustering coefficient converge to a limit
  • 13. Modelling the Semantic Evolution of Individual Communities The Semantic Evolution of Online Communities 12 ¨  For each community and measure, induce a logistic population model: 1.  Get the set of measure increase time steps (T) 2.  Derive the proportionate growth rate for each step 3.  Use the set of proportionate growth rates to solve for unknown community-specific evolution variables (r & E) T = {a : a 2 [1, 119], m(G1,a+1) > m(G1,a)} Deriving the set of change time steps for a given com allows the proportionate growth rate for a given t (t 2 T) to be derived: Rt = (Pt+1 Pt)/Pt. Th is equivalent to the following equation which defi proportionate growth rate Rt in terms of the comm growth rate (r) and carrying capacity (E), our u variables: Rt = r(1 Pt/E). Therefore if we mea proportionate growth rate over the |T| distinct tim then we can derive, via simultaneous equations, the rate of the graph and its carrying capacity, the ve sures that we can use to characterise the semantic e Induce Community- Specific Evolution Models Proportionate Growth Rate @ t Population Growth Rate Population (measure) size @ t Carrying Capacity (Equilibrium) All communities’ models fit with χ2-test at α<0.01
  • 14. The Semantic Evolution of Online Communities 13 Semantic Evolution… what can we actually use it for?
  • 15. The Semantic Evolution of Online Communities Application: Community Evolution Analysis14 Apply Mined Semantic Evolution Dynamics ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● −60 −40 −20 0 20 40 60 −40−20020 PC1 PC2 7 9 10 11 12 18 19 20 21 22 23 24 25 31 34 37 38 47 52 54 55 56 6064 68 82 86 93 99 105 107 108 109 116 120 124 125 126 127 136 137 151 171 177 227 232 237 246 252 259 264 267 269 271 333 343 346 370 382 388 389 392 410 411 443 446 453 464 468 471 474 475 476 478 481 482 483 490 495 503 506 512 514 518 522 529 532 542544 545 547 554 556 ● ● ● Level 2 Level 3 Level 4 E EG − CSemantic Evolution Motif: ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● 20 40 6022 24 5 31 34 37 54 56 82 93 105 109 177 227 2 264 269 333 392 443 446468 4 3 490 495 6 512 514522 529 542544 554 556 ● ● ● Level 2 Level 3 Level 4 CG − Node Count − Rate CG − Node Count − Equilbrium CG − Graph Entropy − Rate CG − Graph Entropy − Eq CG − Specialisation Count − Rate CG − Specialisation − Equilibrium EG − Node Count − Slope EG − Diameter − Rate EG − Diameter − Equilibrium EG − Entropy − Rate EG − Entropy − Equilibrium EG − Clustering Coefficient − Rate EG − Clustering Coefficient − Equilibrium f7 − After Hours f227 − Television f554 − Wanted Motors 10−1 100 101 102 103 he communities based on their semantic motifs (left) where level 4 forums are clustered lues for the concept graphs (CG) and entity graphs (EG) for the three outlier forums right). hat concept and entity graph den- row linearly (unlike in social net- erges on a limit, which we charac- Conference on Advances in Social Networks Analysis and Mining, 0:724–725, 2012. [4] Cristian Danescu-Niculescu-Mizil, Robert West, Dan 20 40 60 80 100 120 Timestep Linear Model (β=9.932) (a) Node Count 0 20 40 60 80 100 1201234567 Timestep Diameter (b) Diameter 0 20 40 60 80 100 1 1234 Timestep H(G) (c) Graph Entropy 2: Entity graphs’ evolution based on node counts, diameter, graph entro ity graphs defined within a single vector for a given nity (f): mf = {m1, m2, . . . , m13}. We began by de- single matrix M 2 R93⇥13 containing the 93 commu- ows) under analysis together with their 13 evolution es (columns), and performing principal component over this matrix. The result of this clustering is dynamics of a given com the churn rate of commu defined the churn rate of active users during a give that post for the last time motifs from the prior exp 40 60 80 100 120 Timestep Linear Model (β=9.932) Node Count 0 20 40 60 80 100 120 1234 Timestep Diameter (b) Diameter 0 20 123 H(G) (c) G Entity graphs’ evolution based on node counts, diame raphs defined within a single vector for a given (f): mf = {m1, m2, . . . , m13}. We began by de- le matrix M 2 R93⇥13 containing the 93 commu- ) under analysis together with their 13 evolution dynami the chu defined active u
  • 16. Application: Community Evolution Analysis The Semantic Evolution of Online Communities 15 Apply Mined Semantic Evolution Dynamics ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0 20 40 60 PC1 10 22 24 25 31 34 37 38 54 56 60 68 82 86 93 105 107 109 177 227 232 237 6 259 264 269 271 333 43 346 382 388 389 392411 443 446 453 464 468 471 474 476 478 481 482 483 490 495 3 506 512 514 518 522 529 542544 545 554 556 ● ● ● Level 2 Level 3 Level 4 CG − Node Count − Rate CG − Node Count − Equilbrium CG − Graph Entropy − Rate CG − Graph Entropy − Eq CG − Specialisation Count − Rate CG − Specialisation − Equilibrium EG − Node Count − Slope EG − Diameter − Rate EG − Diameter − Equilibrium EG − Entropy − Rate EG − Entropy − Equilibrium EG − Clustering Coefficient − Rate EG − Clustering Coefficient − Equilibrium f7 − After Hours f227 − Television f554 − Wanted Motors 10−1 100 101 102 103 t of the communities based on their semantic motifs (left) where level 4 forums are clustered el values for the concept graphs (CG) and entity graphs (EG) for the three outlier forums els (right). Entity Graph: fastest growth in the random forum Concept graph: slowest growth in random forum, largest equilibrium
  • 17. Application: Churn Rate Prediction The Semantic Evolution of Online Communities 16 ¨  Task: forecast community churn rate at t+1 ¤  Based on semantic evolution up to t ¨  Trained a ridge regression model ¨  Root Mean Square Error: Predicted vs. Actual churn rate Apply Mined Semantic Evolution Dynamics Training Test 0 120 150 Training Test dynamics of a given community at time step t to predict the churn rate of community members at time t + 1. We defined the churn rate of a community as the proportion of active users during a given time period (i.e. week segment) that post for the last time. We used the semantic dynamics motifs from the prior experiment (as listed within Fig. 3) and also included graph measures at a given time period: i.e. graph entropy at time t, specialisation count at time t, etc. We derived these features for every time step for each community and derived the response variable as the churn rate at the following time step. We then compiled a train- ing dataset (up to week 120) and a test dataset (from week 120). Each dataset had the following form: D = {(xi, yi)}, where xi contained a 21-element time-delimited feature vec- tor for a given community and yi was the churn rate of the community at the following time step. We trained a ridge regression model ( ) using Dtrain and applied it to Dtest, testing the performance of: a) just concept graph features, b) just entity graph features, and c) all features. An autore- gressive model was used as the baseline - using the churn rate at time t as a single predictor variable for the churn rate at time t + 1. Performance was evaluated using the Root Mean Square Error (RMSE). Table 1: Root Mean Square Error when predicting churn rates using: an Autoregressive model (R2 = 0.341) and a Ridge Regression model using Concept
  • 18. Application: Churn Rate Prediction The Semantic Evolution of Online Communities 17 sions : we te of that t the con- size. aphs: h but g to- ount After ums, ty of ecific The and neral ntity gressive model was used as the baseline - using the churn rate at time t as a single predictor variable for the churn rate at time t + 1. Performance was evaluated using the Root Mean Square Error (RMSE). Table 1: Root Mean Square Error when predicting churn rates using: an Autoregressive model (R2 = 0.341) and a Ridge Regression model using Concept Graph, Entity Graph, and all features. Baseline Concept Graph Entity Graph All Features 7.310 ⇥10 3 5.315 ⇥10 3 5.301 ⇥10 3 4.941 ⇥10 3 Table 1 presents the results from our prediction our exper- iment. We found that for all tested models (concept graph, entity graph, all features) we significantly outperformed the baseline - tested using the sign test (↵ = 0.001). Entity graph features outperform concept graphs but not signifi- cantly, while our best model is the use of all features together in a single model. These results empirically demonstrate the Significant reduction in error (Sign test with α=0.001) Baseline: Autoregressive model with churn rate @ t as a predictor for t+1 Apply Mined Semantic Evolution Dynamics
  • 19. Findings and Conclusions The Semantic Evolution of Online Communities 18 ¨  Semantic graphs of online communities do not grow linearly: instead, they evolve to a limit ¤  Unlike in social networks [Lekovec et al., 2008] ¤  Exception: entity graph size ¨  A finite number of topics are discussed within communities ¤  Variation between communities ¨  Our use of logistic population models has enabled: 1.  Characterisation of community-specific evolution dynamics 2.  Community analysis to inspect how communities evolved differently 3.  Churn prediction based on semantic evolution
  • 20. Future Work The Semantic Evolution of Online Communities 19 1.  Expanded to cover other online communities ¤  Question-answering, mined communities 2.  User profiling ¤  Capturing user-specific semantic evolution In-edge weight distribution (ServerFault) ● ● ● ● ● 1 2 3 4 5 0.50.60.70.80.91.01.11.2 k= 5 Lifecycle Stage H ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 2 4 6 8 10 0.30.40.50.60.7 k= 10 Lifecycle Stage H ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0.050.150.250.35 H Non-churners Churners