A Hybrid Semantic Approach to Building Dynamic Maps of Research Communities
by F. Osborne, G. Scavo, E. Motta
URL: http://oro.open.ac.uk/41083/
In earlier papers we characterised the notion of diachronic topic-based communities, i.e., communities of people who work on semantically related topics at the same time. These communities are important to enable topic-centred analyses of the dynamics of the research world. In this paper we present an innovative algorithm, called Research Communities Map Builder (RCMB), which is able to automatically link diachronic topic-based communities over subsequent time intervals to identify significant events. These include topic shifts within a research community; the appearance and fading of a community; communities splitting, merging, spawning other communities; and others. The output of our algorithm is a map of research communities, annotated with the detected events, which provides a concise visual representation of the dynamics of a research area. In contrast with existing approaches, RCMB enables a much more fine-grained understanding of the evolution of research communities, with respect to both the granularity of the events and the granularity of the topics. This improved understanding can, for example, inform the research strategies of funders and researchers alike. We illustrate our approach with two case studies, highlighting the main communities and events that characterized the World Wide Web and Semantic Web areas in the 2000–2010 decade.
EKAW2014 - A Hybrid Semantic Approach to Building Dynamic Maps of Research Communities
Francesco Osborne, Beppe Scavo, Enrico Motta
KMi, The Open University, United Kingdom
November 27th 2014
3. We need to understand how scientific communities
adapt and cooperate to turn visions into
concrete technologies.
4. Research communities
Communities of academic authors are usually identified by using
standard community detection algorithms, which typically
exploit co-authorship or citation graphs.
5. Temporal topic-based communities (TTC)
A different type of community we investigated is formed by the
set of researchers who, at a given time, are following a shared
research trajectory, i.e., they are working on the same topics at
the same time.
Osborne, F., Scavo, G., & Motta, E. (2014). Identifying diachronic topic-based research
communities by clustering shared research trajectories. In The Semantic Web: Trends and
Challenges (pp. 114-129). Springer International Publishing.
6. Research Communities Map Builder
• RCMB is able to automatically link diachronic topic-based
communities over subsequent time intervals to
identify significant events.
• These include topic shifts within a community; the
appearance and fading of a community; communities
splitting, merging, spawning other communities; etc.
• The output of RCMB is a map of research
communities, annotated with the detected events,
which provides a concise visual representation of the
dynamics of a research area.
7. RCMB steps:
1. Applies the Temporal Semantic Topic-Based
Clustering (TST) algorithm to find temporal topic-based
communities in different time intervals;
2. Detects topic shifts;
3. Links communities in different years;
4. Detects key events.
8. Step 1: Temporal Semantic Topic-Based Clustering (TST)
Osborne, F., Scavo, G., & Motta, E. (2014). Identifying diachronic topic-based
research communities by clustering shared research trajectories. In The
Semantic Web: Trends and Challenges (pp. 114-129). Springer International
Publishing.
9. TST in short
1. It augments the topics semantically using an automatically
generated OWL ontology and represents each author as a
semantic topic distribution over subsequent years.
2. It weighs each topic according to its relationship with the
main topic, to highlight the communities strongly
related to the main topic.
3. It clusters authors using the ATTS (Adjusted Temporal
Topic Similarity), which is computed by averaging the
cosine similarities of the topic vectors over progressively
smaller intervals of time.
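A minimal sketch of the topic weighting and ATTS steps follows. The slide does not say how the intervals shrink, how the per-topic weights are computed, or how yearly vectors are combined, so this sketch assumes (hypothetically) that each interval drops the earliest year, that the weights arrive as a precomputed list, and that yearly vectors are concatenated in chronological order. `weighted` and `atts` are illustrative names, not the authors' code.

```python
import math
from typing import List, Sequence

Trajectory = Sequence[Sequence[float]]  # one topic vector per year, same topic order


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return 0.0 if nu == 0 or nv == 0 else dot / (nu * nv)


def weighted(vec: Sequence[float], weights: Sequence[float]) -> List[float]:
    """Hypothetical weighting: scale each topic by its (precomputed) relevance to the main topic."""
    return [x * w for x, w in zip(vec, weights)]


def atts(a: Trajectory, b: Trajectory, weights: Sequence[float]) -> float:
    """Hypothetical ATTS: mean cosine similarity over progressively shorter suffixes of the years."""
    if len(a) != len(b) or not a:
        raise ValueError("trajectories must cover the same non-empty span of years")
    sims = []
    for start in range(len(a)):
        # Concatenate the weighted yearly vectors from `start` to the last year,
        # so the comparison respects the order of the research trajectory.
        va = [x for year in a[start:] for x in weighted(year, weights)]
        vb = [x for year in b[start:] for x in weighted(year, weights)]
        sims.append(cosine(va, vb))
    return sum(sims) / len(sims)
```

Because later years appear in more of the averaged suffixes, this variant gives recent behaviour more influence; dropping the latest year instead would favour the early part of the trajectory. The resulting similarity could then feed any standard clustering algorithm.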
10. Detecting Topic Shifts
We use a sliding window algorithm that checks for a topic shift
by comparing the initial topic distribution in time t with the topic
distributions in time t+1, t+2… t+n.
Example: the Information Extraction/Semantic Annotation community
2002: Information Extraction 26%, Natural Language 17%, Named Entity 12%, Machine Learning 9%, Knowledge Base 9%
2006: Semantic Annotation 25%, Knowledge Base 15%, Semantic Wiki 11%, Information Extraction 10%, Semantic Information 8%, Natural Language 6%, Information Retrieval 6%
2010: Linked Data 16%, Natural Language 15%, Semantic Annotation 15%, SW Technology 10%, Information Retrieval 10%, Knowledge Base 9%, Semantic Wiki 9%
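The sliding window can be sketched as below: the distribution at the anchor year t is compared with each later year, and after a shift is found the window restarts from the shifted year (that restart rule is an assumption). The shift test is passed in as a function; the paper uses a chi-square test, and the total variation threshold here is only a hypothetical stand-in to keep the sketch short. The toy data are invented, not taken from the case study.

```python
from typing import Callable, Dict, List, Tuple

Distribution = Dict[str, float]  # topic -> share of the community's output


def total_variation(p: Distribution, q: Distribution) -> float:
    """Half the L1 distance between two topic distributions (0 means identical)."""
    topics = set(p) | set(q)
    return 0.5 * sum(abs(p.get(t, 0.0) - q.get(t, 0.0)) for t in topics)


def detect_shifts(
    series: List[Distribution],
    is_shift: Callable[[Distribution, Distribution], bool],
) -> List[Tuple[int, int]]:
    """Compare the anchor year with each later year; re-anchor after a shift."""
    shifts = []
    anchor = 0
    for later in range(1, len(series)):
        if is_shift(series[anchor], series[later]):
            shifts.append((anchor, later))
            anchor = later  # assumption: the window restarts at the shifted year
    return shifts


# Hypothetical toy data: one topic distribution per year.
years = [
    {"A": 0.60, "B": 0.40},
    {"A": 0.55, "B": 0.45},
    {"A": 0.20, "B": 0.30, "C": 0.50},
    {"A": 0.20, "B": 0.25, "C": 0.55},
]
print(detect_shifts(years, lambda p, q: total_variation(p, q) > 0.3))  # [(0, 2)]
```

Comparing against the anchor rather than only against the previous year lets slow drifts accumulate until they become a detectable shift.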
11. Detecting Topic Shifts
We define a topic shift as a statistically significant change (detected
via a chi-square test) in the topic distribution of a community
that occurs within a certain time interval.
To detect which topics were the main protagonists of this shift,
we apply the same test, excluding a different topic each time,
and select the topic whose absence yields the largest
increase in the p-value.
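A self-contained sketch of this test and of the leave-one-out search for the protagonist topic follows. It assumes the inputs are hypothetical per-topic paper counts at two times, treated as a 2 x k contingency table; the chi-square p-value is computed with the standard series and continued-fraction expansions of the regularized incomplete gamma function so that no third-party library is required. Function names are illustrative.

```python
import math
from typing import Dict, Optional, Tuple

Counts = Dict[str, int]  # topic -> number of papers (hypothetical unit)


def _lower_gamma_series(s: float, z: float) -> float:
    """Regularized lower incomplete gamma P(s, z) via its power series."""
    term = total = 1.0 / s
    n = s
    for _ in range(1000):
        n += 1.0
        term *= z / n
        total += term
        if abs(term) < abs(total) * 1e-15:
            break
    return total * math.exp(-z + s * math.log(z) - math.lgamma(s))


def _upper_gamma_cf(s: float, z: float) -> float:
    """Regularized upper incomplete gamma Q(s, z) via Lentz's continued fraction."""
    tiny = 1e-300
    b = z + 1.0 - s
    c, d = 1.0 / tiny, 1.0 / b
    h = d
    for i in range(1, 1000):
        an = -i * (i - s)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return math.exp(-z + s * math.log(z) - math.lgamma(s)) * h


def chi2_sf(x: float, dof: int) -> float:
    """P(X >= x) for a chi-square variable with `dof` degrees of freedom."""
    if x <= 0:
        return 1.0
    s, z = dof / 2.0, x / 2.0
    if z < s + 1.0:
        return max(0.0, 1.0 - _lower_gamma_series(s, z))
    return _upper_gamma_cf(s, z)


def chi_square_test(before: Counts, after: Counts) -> Tuple[float, float]:
    """Chi-square test of homogeneity between two topic-count vectors."""
    topics = sorted(t for t in set(before) | set(after)
                    if before.get(t, 0) + after.get(t, 0) > 0)
    n_before = sum(before.get(t, 0) for t in topics)
    n_after = sum(after.get(t, 0) for t in topics)
    dof = len(topics) - 1
    if dof < 1 or n_before == 0 or n_after == 0:
        return 0.0, 1.0  # nothing to compare
    total = n_before + n_after
    stat = 0.0
    for t in topics:
        col = before.get(t, 0) + after.get(t, 0)
        for row_total, observed in ((n_before, before.get(t, 0)), (n_after, after.get(t, 0))):
            expected = row_total * col / total
            stat += (observed - expected) ** 2 / expected
    return stat, chi2_sf(stat, dof)


def main_protagonist(before: Counts, after: Counts) -> Tuple[Optional[str], float]:
    """Topic whose exclusion raises the p-value the most, and by how much."""
    _, base_p = chi_square_test(before, after)
    best_topic, best_gain = None, float("-inf")
    for topic in sorted(set(before) | set(after)):
        b = {t: c for t, c in before.items() if t != topic}
        a = {t: c for t, c in after.items() if t != topic}
        _, p = chi_square_test(b, a)
        if p - base_p > best_gain:
            best_topic, best_gain = topic, p - base_p
    return best_topic, best_gain
```

For example, with `before = {"A": 50, "B": 50, "C": 50}` and `after = {"A": 50, "B": 50, "C": 150}`, the shift is significant and excluding `C` restores a p-value of 1, so `C` is reported as the protagonist. In practice `scipy.stats.chi2_contingency` can replace the hand-written test.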
12. Community linking
We are interested in two types of links between communities:
• The strong link is defined as a link that connects the same
community in subsequent timeframes.
• The weak link is defined as the link that connects community
C1 with community C2 in a subsequent timeframe, if C1 has an
impact on C2 in terms of migrating authors and/or topics.
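A minimal sketch of link classification follows, restricted to migrating authors. It assumes (hypothetically) that "the same community" means an author overlap (Jaccard index) of at least a threshold `ts`, and that a weak link means at least a fraction `tw` of C1's authors reappear in C2; the actual similarity measures behind RCMB's links are not given on these slides, and topic migration is ignored here.

```python
from typing import Dict, List, Set, Tuple

Community = Set[str]  # author identifiers
Link = Tuple[str, str, str]  # (community at t, community at t+1, "strong" or "weak")


def link_communities(
    prev: Dict[str, Community], curr: Dict[str, Community], ts: float, tw: float
) -> List[Link]:
    """Hypothetical linking rule based only on shared authors."""
    links = []
    for i, c1 in prev.items():
        for j, c2 in curr.items():
            shared = len(c1 & c2)
            if shared == 0:
                continue
            jaccard = shared / len(c1 | c2)  # proxy for "same community"
            migration = shared / len(c1)  # share of C1's authors that moved to C2
            if jaccard >= ts:
                links.append((i, j, "strong"))
            elif migration >= tw:
                links.append((i, j, "weak"))
    return links


prev = {"C1": {"a", "b", "c", "d"}, "C2": {"e", "f"}}
curr = {"D1": {"a", "b", "c", "g"}, "D2": {"d", "e", "f"}}
print(link_communities(prev, curr, ts=0.5, tw=0.2))
# [('C1', 'D1', 'strong'), ('C1', 'D2', 'weak'), ('C2', 'D2', 'strong')]
```

A topic-based variant would follow the same pattern, replacing author sets with topic distributions and overlap with a distribution similarity.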
14. Community linking
We take the minimum values of ts
and tw that minimize the MEF using
the Nelder-Mead algorithm.
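The slides do not define the MEF, so the sketch below substitutes a hypothetical quadratic stand-in with its minimum at ts = 0.6 and tw = 0.3, purely to show how a plain Nelder-Mead search (reflection, expansion, contraction, shrink) tunes two parameters without gradients. In practice one could call `scipy.optimize.minimize(f, x0, method="Nelder-Mead")` instead of this hand-written version.

```python
from typing import Callable, List, Sequence, Tuple


def nelder_mead(
    f: Callable[[Sequence[float]], float],
    x0: Sequence[float],
    step: float = 0.1,
    tol: float = 1e-12,
    max_iter: int = 2000,
) -> Tuple[List[float], float]:
    """Minimize f with the Nelder-Mead simplex method (standard coefficients)."""
    alpha, gamma, rho, sigma = 1.0, 2.0, 0.5, 0.5
    n = len(x0)
    simplex = [list(x0)] + [[x + (step if k == i else 0.0) for k, x in enumerate(x0)]
                            for i in range(n)]
    values = [f(x) for x in simplex]
    for _ in range(max_iter):
        order = sorted(range(n + 1), key=lambda k: values[k])
        simplex = [simplex[k] for k in order]
        values = [values[k] for k in order]
        if abs(values[-1] - values[0]) < tol:
            break
        centroid = [sum(x[i] for x in simplex[:-1]) / n for i in range(n)]
        worst = simplex[-1]
        xr = [c + alpha * (c - w) for c, w in zip(centroid, worst)]  # reflection
        fr = f(xr)
        if values[0] <= fr < values[-2]:
            simplex[-1], values[-1] = xr, fr
        elif fr < values[0]:
            xe = [c + gamma * (r - c) for c, r in zip(centroid, xr)]  # expansion
            fe = f(xe)
            simplex[-1], values[-1] = (xe, fe) if fe < fr else (xr, fr)
        else:
            xc = [c + rho * (w - c) for c, w in zip(centroid, worst)]  # contraction
            fc = f(xc)
            if fc < values[-1]:
                simplex[-1], values[-1] = xc, fc
            else:  # shrink every vertex towards the best one
                best = simplex[0]
                simplex = [best] + [[b + sigma * (x - b) for b, x in zip(best, v)]
                                    for v in simplex[1:]]
                values = [values[0]] + [f(v) for v in simplex[1:]]
    best_index = min(range(n + 1), key=lambda k: values[k])
    return simplex[best_index], values[best_index]


def hypothetical_mef(x: Sequence[float]) -> float:
    """Stand-in objective over (ts, tw); NOT the MEF used by RCMB."""
    ts, tw = x
    return (ts - 0.6) ** 2 + (tw - 0.3) ** 2


(ts, tw), value = nelder_mead(hypothetical_mef, [0.5, 0.5])
```

Nelder-Mead returns a single local minimizer; selecting the smallest values among several equally good ones, as the slide describes, would need an extra tie-breaking step.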
15. Key Events detection
If a community has no strong links with any community in the
preceding interval, we detect the appearance of a
community.
[Diagram: communities C1, C2 and C3 across the 2006 and 2007 intervals]
16. Key Events detection
If a community has no strong links with any community in the
subsequent interval, we detect the fading of a community.
[Diagram: communities C1, C2 and C3 across the 2006 and 2007 intervals]
17. Key Events detection
If a community is linked to more than one community in the
subsequent interval and one of the links is a strong one we
detect the forking of one or more communities out of the
community characterized by the strong link.
[Diagram: communities C1 and C2 across the 2006 and 2007 intervals]
18. Key Events detection
If a community is linked to more than one community in the
subsequent interval and none of the links is a strong one we
detect the splitting of a community into multiple communities.
[Diagram: communities C1, C2 and C3 across the 2006 and 2007 intervals]
19. Key Events detection
If two or more communities are linked to one community in
the subsequent interval and one of the inlinks is a strong link,
we detect the assimilation of one or more communities into
the community C characterized by the strong link.
[Diagram: communities C1 and C2 across the 2006 and 2007 intervals]
If the communities fade after the event, they are labelled
as absorbed into C.
20. Key Events detection
If two or more communities are linked to one community in
the subsequent interval and none of the inlinks is a strong
link, we detect the merging of two or more communities into a
new community C.
[Diagram: communities C1, C2 and C3 across the 2006 and 2007 intervals]
If the communities fade after the event, they are labelled
as merged into C.
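The six rules above can be sketched as a single classifier over the links between two consecutive intervals. This is an illustrative reading of the slides, not the authors' implementation; in particular it assumes that a community "fades after the event" when it has no strong outlink in the same step, and the function name and event labels are hypothetical.

```python
from collections import defaultdict
from typing import List, Sequence, Tuple

Link = Tuple[str, str, str]  # (community at t, community at t+1, "strong" or "weak")
Event = Tuple[str, str, Tuple[str, ...]]  # (event type, community, related communities)


def detect_events(prev: Sequence[str], curr: Sequence[str], links: Sequence[Link]) -> List[Event]:
    out, inn = defaultdict(list), defaultdict(list)
    for src, dst, kind in links:
        out[src].append((dst, kind))
        inn[dst].append((src, kind))

    def has_strong(edges) -> bool:
        return any(kind == "strong" for _, kind in edges)

    events: List[Event] = []
    for c in curr:  # appearance: no strong link from the preceding interval
        if not has_strong(inn[c]):
            events.append(("appearance", c, ()))
    for c in prev:  # fading: no strong link to the subsequent interval
        if not has_strong(out[c]):
            events.append(("fading", c, ()))
    for c in prev:  # one source, several targets
        if len(out[c]) > 1:
            if has_strong(out[c]):
                events.append(("forking", c, tuple(d for d, k in out[c] if k != "strong")))
            else:
                events.append(("splitting", c, tuple(d for d, _ in out[c])))
    for c in curr:  # several sources, one target
        if len(inn[c]) > 1:
            if has_strong(inn[c]):
                others = tuple(s for s, k in inn[c] if k != "strong")
                events.append(("assimilation", c, others))
                faded = tuple(s for s in others if not has_strong(out[s]))
                if faded:
                    events.append(("absorbed", c, faded))
            else:
                sources = tuple(s for s, _ in inn[c])
                events.append(("merging", c, sources))
                faded = tuple(s for s in sources if not has_strong(out[s]))
                if faded:
                    events.append(("merged", c, faded))
    return events


links = [("P1", "N1", "strong"), ("P1", "N2", "weak"),
         ("P2", "N1", "weak"), ("P3", "N3", "strong")]
events = detect_events(["P1", "P2", "P3"], ["N1", "N2", "N3"], links)
```

Here N2 forks out of P1 (and appears as a new community), while P2 is assimilated into N1 and, since it fades, is labelled as absorbed.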
22. Case study
We applied RCMB to two research areas: World Wide Web
(WWW) and Semantic Web (SW).
Our study was based on a dataset built from data retrieved
through the API provided by Microsoft Academic Search.
We first retrieved authors and papers labelled with WWW
and SW or with their top 150 co-occurring topics. We then
ran RCMB on WWW and SW in the 2000-2010 time interval
with a granularity of 3. The average number of authors
selected in each year was 932 for WWW and 646 for SW.
25. Future Work
• Automatically generate comprehensive explanations for
the identified dynamics.
• Forecasting topic shifts and key events, e.g., estimating
the probability that a new topic will emerge in a certain
community or that two communities will merge in the
coming years.
26. Questions?
Interested in scholarly data?
SAVE-SD 2015
Semantics, Analytics, Visualisation: Enhancing Scholarly Data
Workshop at 24th International World Wide Web Conference
May 19, 2015 - Florence, Italy
Site: cs.unibo.it/save-sd