The document discusses life-cycles and cross-community effects in scientific communities. It analyzes co-citation networks of publications from two fields, semantic web and information retrieval, over multiple time periods. Key findings include identifying a community shift between the two fields focused on topics like personalization and business processes. Another community specialized over time from general semantic web topics to focus on semantic web services. A community merge between two information retrieval communities centered around different IR topics was also found. Various measures are used to analyze community structure, topic evolution, and cross-community dynamics.
Life-Cycles and Mutual Effects of Scientific Communities: ASNA 2010
1. Life-Cycles and Mutual Effects of Scientific
Communities
Václav Belák, Marcel Karnstedt, Conor Hayes
Digital Enterprise Research Institute
NUI Galway
ASNA 2010, Zürich
2. Introduction Methodology Data-Set Results Conclusion and FW
Motivation
Progress in science is often measured by citation measures,
which are relatively static
Detection and explanation of evolution and life-cycles provides
better arguments for the progress
Previous approaches focused mainly on analysing co-citation
graphs or textual clustering
Little work on analysis of cross-community effects
Kuhn [5] claimed the development of scientific knowledge
proceeds in discrete steps:
Pre-paradigm period
Paradigm period—normal science
Crisis
Reaction to the crisis—paradigm shift
3. Cross-Community Effects I
Expected Phenomena
[Figure: two panels. (a) Community shift; (b) Community merge (with community shift). Labels in the figure: "Paradigm shift", "Paradigm merge", "Clique: Graph & Network Analysis Cluster".]
4. Cross-Community Effects II
Although inspired by Kuhn, we expected the evolution of
communities to take a rather milder form
Instead of paradigm shift, we were looking for community
shift
Community merge is a complementary phenomenon, but
a rather uninteresting one
We therefore investigated combinations of shifts with
subsequent merges, i.e. community shift/merges
Instead of paradigm articulation, we were looking for
community specialization
Co-citation networks of two big camps in CS were analysed:
Semantic Web (solution-driven) and Information Retrieval
(problem-driven) [1]
5. Outline
1 Methodology
2 Data-Sets
3 Results
4 Conclusion and Future Work
6. Initial Expectations & Requirements
The methodology was developed with a set of certain requirements
arising from the nature of the problem:
1 Dynamic data-set represented by snapshots of several
consecutive time-steps
2 Communities have to be identified in the network in each
time-step
3 Authors (nodes in general) have to be uniquely identified
among all time-steps
4 For topical analysis, meta-data (topics) describing the nodes
are necessary
7. Community Detection
We identified communities using three popular algorithms:
Infomap [7]
Louvain [2]
WT [8]
All have publicly available implementations, are able to
operate over weighted networks, and produce non-overlapping
communities
In each time-step $t$, we identified a clustering $C^t$ of $n$
communities: $C^t = \{c_1^t, c_2^t, \ldots, c_n^t\}$, where $n$ is determined
automatically for each time-step
8. Tracking of Dynamic Communities
Communities are identified independently for each time-step.
It is thus necessary to track the evolution of each community
in further time-steps
Communities were matched according to the highest Jaccard
coefficient:
$$\mathrm{match}(c_i^t) = \arg\max_{c_j^{t+1} \in C^{t+1}} \frac{|c_i^t \cap c_j^{t+1}|}{|c_i^t \cup c_j^{t+1}|}$$
Important ancestors and descendants were identified by a
modified Jaccard coefficient:
$$\mathrm{ancestor}(c_i^t, c_j^{t+1}) = \frac{|c_i^t \cap c_j^{t+1}|}{|c_j^{t+1}|}, \qquad \mathrm{descendant}(c_i^t, c_j^{t+1}) = \frac{|c_i^t \cap c_j^{t+1}|}{|c_i^t|}$$
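The matching and the two modified coefficients can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes each community is a plain set of author identifiers, and the function names and toy data are hypothetical.

```python
from typing import Dict, Hashable, Optional, Set

# Assumed representation: community id -> set of author ids.
Clustering = Dict[Hashable, Set[str]]


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard coefficient |a ∩ b| / |a ∪ b| (0.0 if both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match(c_t: Set[str], next_clustering: Clustering) -> Optional[Hashable]:
    """Id of the community in step t+1 with the highest Jaccard coefficient.

    Returns None when no community in t+1 shares any author with c_t.
    """
    best_id, best_score = None, 0.0
    for cid, members in next_clustering.items():
        score = jaccard(c_t, members)
        if score > best_score:
            best_id, best_score = cid, score
    return best_id


def ancestor(c_t: Set[str], c_next: Set[str]) -> float:
    """Share of c_next's members that come from c_t."""
    return len(c_t & c_next) / len(c_next) if c_next else 0.0


def descendant(c_t: Set[str], c_next: Set[str]) -> float:
    """Share of c_t's members that moved to c_next."""
    return len(c_t & c_next) / len(c_t) if c_t else 0.0


# Toy data (hypothetical author ids).
step_t = {"c5": {"e", "f"}, "c6": {"a", "b", "c", "d"}}
step_t1 = {"c6": {"a", "b", "c"}, "c26": {"d", "e", "x"}}

print(match(step_t["c6"], step_t1))
print(ancestor(step_t["c6"], step_t1["c26"]))
print(descendant(step_t["c6"], step_t1["c26"]))
```

Normalising by only one of the two sets makes the relation directional: `ancestor` asks how much of the new community came from the old one, `descendant` how much of the old community went to the new one.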
9. Visualization
To compare and inspect the state of the network in different
time-steps, a proper visualization is very helpful
Nodes that appeared previously should have similar positions
Colours denoting the affiliation of the node to its cluster
should be preserved
As we did not find any existing tool implementing these
requirements, we built our own based on JUNG
Another tool based on Graphviz was built to automatically
create diagrams of ancestors and descendants based on
the respective relations
10. Topic Detection I
We mined keywords using NLP techniques [3] from the
abstracts or full-texts for almost 70% of the underlying articles
Tokenised and stemmed [6] keywords were then assigned to
each author
The ability of keywords to discriminate authors was ranked
according to their frequency (TF) and uniqueness in the
corpus (IAF): TF-IAF
Each author $a$ in time-step $t$ was thus described by a
bag-of-words vector $k_a^t$
The topical description of a cluster $c$ was obtained as the centroid of
its members
Cosine similarity was used to determine the topical similarity of
two clusters
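The weighting, centroid and similarity steps above can be sketched as follows. The slides do not give the exact TF-IAF formula, so this sketch assumes, by analogy with TF-IDF, that IAF is the logarithm of the number of authors divided by the number of authors using the keyword. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List

Vector = Dict[str, float]  # keyword -> weight


def tf_iaf(author_keywords: Dict[str, List[str]]) -> Dict[str, Vector]:
    """Weight each author's keywords by TF * IAF.

    Assumption: IAF(k) = log(N / number of authors using k), analogous to IDF.
    """
    n_authors = len(author_keywords)
    # Count, for every keyword, how many distinct authors use it.
    author_freq = Counter(k for kws in author_keywords.values() for k in set(kws))
    vectors = {}
    for author, kws in author_keywords.items():
        tf = Counter(kws)
        vectors[author] = {
            k: tf[k] * math.log(n_authors / author_freq[k]) for k in tf
        }
    return vectors


def centroid(vectors: List[Vector]) -> Vector:
    """Component-wise mean of sparse vectors (topical description of a cluster)."""
    if not vectors:
        return {}
    total = defaultdict(float)
    for vec in vectors:
        for k, w in vec.items():
            total[k] += w
    return {k: w / len(vectors) for k, w in total.items()}


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two sparse vectors (0.0 if either has zero norm)."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Toy data: stemmed keywords per author (hypothetical).
vectors = tf_iaf({
    "a1": ["ontolog", "ontolog", "search"],
    "a2": ["search", "rank"],
    "a3": ["ontolog", "rdf"],
})
cluster_x = centroid([vectors["a1"], vectors["a3"]])
cluster_y = centroid([vectors["a2"]])
print(cosine(cluster_x, cluster_y))
```

Under this assumed weighting, a keyword used by every author gets weight zero, which matches the stated goal of ranking keywords by their ability to discriminate authors.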
11. Topic Detection II
Interpretation of a cluster’s topic was based on characterizing
keywords—a union of:
20 highest ranked keywords
20 most frequent keywords
We were particularly interested in cross-community activity
between IR and SW camps
Whether a community counted as IR- or SW-related was
decided from frequent patterns mined from the publications
Any event detected by community topic evolution measures
associated with both IR- and SW-related communities was
then considered an instance of inter-camp dynamics
Meta-data was used to assess the quality of clusterings; WT
was omitted from further analysis
12. Measures
Overlap measures induce a huge number of interactions
between communities
The solution is to apply more specific measures or to use the
simple ones in combination
We developed and/or used two categories of measures
1 community life-cycle measures for measuring and
explaining the state and evolution of a community
2 community topic evolution measures for revealing
cross-community phenomena such as community shift
13. Community Life-Cycle Measures
Structural perspective:
size $S$
average vertex betweenness $B$, $R_B \in \mathbb{R}^+$
relative density $\rho$, $R_\rho \in [0, 1]$
author entropy $A$, $R_A \in [0, 1]$
Topical perspective:
topic drift $T$, $R_T \in [0, 1]$
cluster content ratio $H$, $R_H \in \mathbb{R}^+$
14. Community Topic Evolution Measures
We looked for parallel changes of structure and topic of
communities
Structural and topical measures were combined by
multiplication for simplicity and because the range remains
within [0, 1]
Community shift $P_S$ may be detected as the emergence of a
new community topically distinct from its ancestor:
$$P_S(c_i^t, c_j^{t+1}) = \mathrm{dissim}(c_i^t, c_j^{t+1}) \times \mathrm{ancestor}(c_i^t, c_j^{t+1})$$
15. Community Topic Evolution Measures II
Community shift/merge $P_{S/M}$ may be detected as a merge of
two topically distinct communities:
$$P_{S/M}(c_i^t, c_j^{t+1}) = \mathrm{dissim}(c_i^t, c_j^{t+1}) \times \mathrm{descendant}(c_i^t, c_j^{t+1})$$
Note that both $P_S$ and $P_{S/M}$ are defined only for two
different communities, i.e. only if $i \neq j$
Community topic change $P_C$ expresses a change of topic of a
structurally stable community:
$$P_C(c_i^t) = \mathrm{dissim}(c_i^t, c_i^{t+1}) \times (1 - A(c_i^{t+1}))$$
Only events with values > 0.5 and with a minimal overlap of
10 authors were selected for deeper analysis
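The three topic evolution measures and the selection rule can be sketched as below. The slides define neither dissim nor the author entropy A, so the sketch takes both as precomputed numbers in [0, 1]; one natural choice for dissim, assumed here only in the comments, is one minus the cosine similarity of the cluster centroids. Function names and example values are hypothetical.

```python
def community_shift(dissim: float, ancestor_share: float) -> float:
    """P_S: topical dissimilarity times the ancestor coefficient.

    dissim is assumed to be precomputed, e.g. as 1 - cosine similarity.
    """
    return dissim * ancestor_share


def shift_merge(dissim: float, descendant_share: float) -> float:
    """P_S/M: topical dissimilarity times the descendant coefficient."""
    return dissim * descendant_share


def topic_change(dissim: float, author_entropy_next: float) -> float:
    """P_C: dissimilarity of a community to its own next state,
    weighted by (1 - A) of the next state. A is taken as a given number."""
    return dissim * (1.0 - author_entropy_next)


def is_selected(value: float, shared_authors: int,
                threshold: float = 0.5, min_overlap: int = 10) -> bool:
    """Selection rule from the slide: value > 0.5 and at least 10 shared authors."""
    return value > threshold and shared_authors >= min_overlap


# Hypothetical event: the new community is topically quite different
# (dissim = 0.75) and 80% of its members come from the candidate ancestor.
p_s = community_shift(dissim=0.75, ancestor_share=0.8)
print(p_s, is_selected(p_s, shared_authors=12))
```

Because every factor lies in [0, 1], each product also stays in [0, 1], and a high value requires both a large topical difference and a strong structural link.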
16. Data-Set
We first picked a set of major conferences in both fields
We then selected publications from these conferences from
DBLP for 2000–2009
A co-citation network of 5772 authors and 817642 edges over
all years was extracted
3-year time-steps with 2-year overlap: 2000–2002,
2001–2003, . . .
The total number of articles was 39314, for which we were able to
scrape 22975 abstracts and 3740 full-texts
Nearly 70% coverage by content
We scraped 18313 author-provided keywords for 4102 distinct
articles
Coverage by these high-quality meta-data was 10%
We mined 263742 keywords from abstracts and full-texts
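The windowing scheme and the two coverage figures above can be checked with a short calculation. This sketch assumes that only full 3-year windows are generated and that the scraped abstracts and full-texts belong to different articles, so that their counts can be added; neither assumption is stated on the slide, and the function name is hypothetical.

```python
from typing import List, Tuple


def sliding_windows(first_year: int, last_year: int,
                    length: int = 3, step: int = 1) -> List[Tuple[int, int]]:
    """Overlapping time-steps as (start_year, end_year), full windows only."""
    return [(start, start + length - 1)
            for start in range(first_year, last_year - length + 2, step)]


# 3-year windows shifted by one year, i.e. a 2-year overlap.
windows = sliding_windows(2000, 2009)
print(windows)

# Counts taken from the slide.
articles = 39314
abstracts = 22975
full_texts = 3740
articles_with_author_keywords = 4102

# Assumption: abstracts and full-texts cover disjoint sets of articles.
content_coverage = (abstracts + full_texts) / articles
keyword_coverage = articles_with_author_keywords / articles
print(f"content coverage: {content_coverage:.1%}")
print(f"author-keyword coverage: {keyword_coverage:.1%}")
```

Under these assumptions the content coverage comes out just under 68% and the author-keyword coverage just over 10%, in line with the "nearly 70%" and "10%" quoted on the slide.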
17. Shift of Louvain Community 26
The emergence of Louvain community 26 was identified as an
inter-camp community shift $P_S \doteq 0.62$ in 2006
It was formed by 80% of community 6 “web IR” and by 20%
of community 5 “SW”
Keywords in 2006 such as “navigation”, “personalization”,
and “semantic web” suggest transdisciplinary topics
Massive influence of community 15 “SW and IR” in 2007 and
a change of topic towards “SW and business processes”
Observed as a low topic drift $T \doteq 0.29$
IR-related keywords appeared again among characterizing
keywords in 2008
Topic then stabilized: $T \doteq 0.65$
18. Evolution of Louvain Community 26
Communities 6 “web information retrieval”, 5 “semantic web”,
15 “semantic web and information retrieval” and their descendant
community 26
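[Figure: ancestor/descendant diagram of communities c5, c6, c15 and c26 over the time-steps 2005–2007, 2006–2008, 2007–2009 and 2008–2009. Edge labels as extracted, row by row: 20, 2.8, 48.6 (c5 row); 80, 4.7, 51.4 (c6/c26 row); 90.6, 8.3 (c15 row).]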
19. Position of Louvain Community 26 in 2006 and 2007
Communities 6 “web information retrieval” (pink), 5 “semantic
web” (red), 15 “semantic web and information retrieval” (violet)
and their descendant community 26 (green)
20. Specialization of Infomap Community 9
Initially oriented towards general and core SW-related topics in 2000
Between 2002–2004 we identified 3 shifts
One of these shifts was community 99 “semantic desktop and
personalization”
The community itself then specialized in “SW services”
$S$, $T$, and $H$ provided valuable insights
$\rho$, $B$, and $A$ did not seem to provide any further insights
21. Life-Cycle Measures of Infomap Community 9
[Figure: life-cycle measures of Infomap community 9 over time, 2000–2008. Left axis: H, T, A, ρ (0 to 2); right axis: B, S (0 to 4500).]
22. Life-Cycle Measures of Infomap Community 99
[Figure: life-cycle measures of Infomap community 99 over time, 2003–2008. Left axis: H, T, A, ρ (0.2 to 1.6); right axis: B, S (0 to 1000).]
23. Shift/Merge of Community 86
We identified a shift/merge $P_{S/M} \doteq 0.91$ of community 86
with community 0
Both communities were concerned with IR-related topics, but
each had its specific theme:
86 being more focused on “development”, “engine”, and
“system”
0 being more focused on “question answering”
90.9% of authors from 86 moved to community 0
Relative density $\rho \doteq 0.47$ and a high cluster content ratio
$H \doteq 1.91$ suggest it was topically coherent, but structurally
weak
It is not possible to generalize the suitability of any life-cycle
measures as we have identified only one shift/merge
25. Change of topic of Infomap community 54
An inter-camp community topic change $P_C \doteq 0.58$ was identified
for Infomap community 54 between 2005 and 2006
The topic changed from “knowledge management” and
“information extraction” towards “knowledge querying” and
“semantic web”
Zero author entropy A suggests this might have been caused
by new members joining the community
34.5% were completely new, i.e. they did not come from any
previous community
20.7% coming from 54 “knowledge management and
information extraction”
17.2% coming from 29 “ontologies and SW”
6.9% coming from 70 “ontologies and folksonomies”
6.9% coming from 112 “semantic web services”
27. Emergence of Intermediary Louvain Community 15
The most complex scenario we investigated
It first emerged as a descendant of community 4 “IR” with
topic “cross-language IR”, which was identified as a
community shift $P_S \doteq 0.55$ in 2003
From 2004, this community was under massive influence from
community 5 “SW”, which caused a change towards
SW-related topics ($P_C \doteq 0.31$)
From 2005, IR-related keywords appeared again among
characterizing keywords, while those keywords disappeared in
community 5
Therefore, whereas community 5 kept its focus on the core
SW-related topics, it contributed substantially to forming a new
interdisciplinary community
28. Betweenness of Louvain Community 15
Despite still being focused mainly on SW-related topics,
community 15 worked as an intermediary between both camps
This hypothesis is supported by high average author
betweenness B
                 2004–2006            2007–2009
                 S      B             S      B
c15              444    1591.01659    445    2535.02
entire network   2776   2066.70764    2190   2192.85117
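One way to read the table is to express the community's average betweenness relative to the network-wide average. The short calculation below does this with the values from the table; the ratio itself is an illustrative device and not a measure defined on the slides.

```python
# Average betweenness B per time-step, copied from the table.
avg_betweenness = {
    "2004–2006": {"c15": 1591.01659, "entire network": 2066.70764},
    "2007–2009": {"c15": 2535.02, "entire network": 2192.85117},
}

# Ratio > 1 means c15's authors lie on more shortest paths than the
# average author in the network (illustrative ratio, not from the slides).
relative_b = {
    step: values["c15"] / values["entire network"]
    for step, values in avg_betweenness.items()
}

for step, ratio in relative_b.items():
    print(f"{step}: {ratio:.2f}")
```

On these numbers, community 15 is below the network average in 2004–2006 and above it in 2007–2009, so the support for the intermediary hypothesis comes mainly from the later time-step.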
29. Position of Louvain Community 15 in 2004 and 2007
Community 5 “SW” (red—left side), “IR” communities 0, 4, 6 and
9 (grey, beige, pink and red—right side, respectively) and their
intermediary community 15 (violet)
30. Conclusion and Future Work I
We presented a general and scalable methodology for analysis
of cross-community phenomena uniquely combining
topological and content analysis and supported by special
visualization techniques
Three community topic evolution measures tailored for
identifying phenomena like community shift, shift/merge, and
change of topic were proposed and successfully assessed
Community shift and topic change were detected quite
commonly, which suggests that they are part of many
community life-cycles
Community shift/merge was detected very rarely, which either
means we have to improve the measure or that this is simply a
rare phenomenon
We proposed life-cycle measures characterising the states and
evolution of communities
31. Conclusion and Future Work II
The assessment showed that average vertex betweenness,
relative density, cluster content ratio, and topic drift offered
valuable insights into the phenomena revealed by community
topic evolution measures
We observed strong shifts $P_S \to 1$ when the shifted
community disappeared in the next time-step
These strong shifts usually had very different but coherent
topics
They might have been the initial sources of new topics or even
research streams
Frequently, a newly emerged community had quite weak
structure (low ρ, high A) and/or topic (low T ), while these
characteristics then improved in the subsequent time-steps
B seems to be a good measure for identification of
intermediary communities
32. Conclusion and Future Work III
We intend to cluster the community life-cycles by the
characteristic events expressed by all the measures
We expect this to provide an automated way of extracting
life-cycle taxonomies
The combination of content and structural analysis allowed us
to assess the quality of clustering revealed only by inspection
of structure of the network
We consider this original approach fertile ground for
future research
We plan to use other algorithms, e.g. co-clustering
of both content and objects [4]
We will extend the whole work to a larger data-set
33. References I
[1] R. Baeza-Yates, P. Mika, and H. Zaragoza.
Search, Web 2.0, and the Semantic Web.
IEEE Intelligent Systems, 23(1):80–82, 2008.
[2] Vincent D. Blondel, Jean-Loup Guillaume, Renaud Lambiotte,
and Etienne Lefebvre.
Fast unfolding of communities in large networks.
Journal of Statistical Mechanics: Theory and Experiment,
P10008, 2008.
[3] Georgeta Bordea.
The Semantic Web: Research and Applications, chapter
Concept Extraction Applied to the Task of Expert Finding,
pages 451–456.
Springer, 2010.
34. References II
[4] Derek Greene and Pádraig Cunningham.
Spectral Co-Clustering for Dynamic Bipartite Graphs.
Technical report, School of Computer Science & Informatics,
UCD, 2010.
[5] Th. S. Kuhn.
The Structure of Scientific Revolutions.
University of Chicago Press, December 1996.
[6] Martin F. Porter.
An algorithm for suffix stripping.
Program, 14:130–137, 1980.
35. References III
[7] Martin Rosvall and Carl T. Bergstrom.
Maps of random walks on complex networks reveal community
structure.
Proceedings of the National Academy of Sciences USA, 105:1118–1123, 2008.
[8] Ken Wakita and Toshiyuki Tsurumi.
Finding community structure in a mega-scale social networking
service.
In IADIS International Conference on WWW/Internet 2007,
pages 153–162, 2007.