Life-Cycles and Mutual Effects of Scientific Communities
   Václav Belák, Marcel Karnstedt, Conor Hayes
           Digital Enterprise Research Institute
                       NUI Galway

                ASNA 2010, Zürich
Introduction         Methodology     Data-Set            Results   Conclusion and FW


Motivation

               Progress in science is often measured by citation measures,
               which are relatively static
               Detecting and explaining evolution and life-cycles provides
               better arguments for progress
               Previous approaches focused mainly on analysing co-citation
               graphs or textual clustering
                   Little work on analysis of cross-community effects
               Kuhn [5] claimed the development of scientific knowledge
               proceeds in discrete steps:
                   Pre-paradigm period
                   Paradigm period—normal science
                   Crisis
                   Reaction to the crisis—paradigm shift




Cross-Community Effects I
               Expected Phenomena
               Clique: Graph & Network Analysis Cluster

               [Figure, two panels: (a) Community shift (paradigm shift);
               (b) Community merge with community shift (paradigm merge)]






Cross-Community Effects II

               Although inspired by Kuhn, we expected communities to
               evolve in a rather milder form
               Instead of a paradigm shift, we were looking for a community
               shift
                   Community merge is a complementary phenomenon, but a
                   rather uninteresting one
                   Thus, combinations of shifts with subsequent merges,
                   i.e. community merge/shifts, were investigated instead
               Instead of paradigm articulation, we were looking for
               community specialization
               Co-citation networks of two big camps in CS were analysed:
               Semantic Web (solution-driven) and Information Retrieval
               (problem-driven) [1]




Outline




          1    Methodology
          2    Data-Sets
          3    Results
          4    Conclusion and Future Work






Initial Expectations & Requirements


       The methodology was developed with a set of certain requirements
       arising from the nature of the problem:
          1    Dynamic data-set represented by snapshots of several
               consecutive time-steps
          2    Communities have to be identified in the network in each
               time-step
          3    Authors (nodes in general) have to be uniquely identified
               among all time-steps
          4    For topical analysis, meta-data (topics) describing the nodes
               are necessary






Community Detection


               We identified communities using three popular algorithms:
                    Infomap [7]
                    Louvain [2]
                    WT [8]
               All have publicly available implementations, are able to
               operate over weighted networks, and produce non-overlapping
               communities
               In each time-step $t$, we identified a clustering $C^t$ of $n$
               communities: $C^t = \{c_1^t, c_2^t, \ldots, c_n^t\}$, where $n$ is
               determined automatically for each time-step






Tracking of Dynamic Communities
               Communities are identified independently for each time-step.
               It is thus necessary to track the evolution of each community
               in further time-steps
               Communities were matched according to the highest Jaccard
               coefficient:
               $$\mathrm{match}(c_i^t) = \arg\max_{c_j^{t+1} \in C^{t+1}} \frac{|c_i^t \cap c_j^{t+1}|}{|c_i^t \cup c_j^{t+1}|}$$

               Important ancestors and descendants were identified by
               modified Jaccard coefficient:
               $$\mathrm{ancestor}(c_i^t, c_j^{t+1}) = \frac{|c_i^t \cap c_j^{t+1}|}{|c_j^{t+1}|}, \qquad \mathrm{descendant}(c_i^t, c_j^{t+1}) = \frac{|c_i^t \cap c_j^{t+1}|}{|c_i^t|}$$
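
The tracking step can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: communities are assumed to be plain sets of author identifiers, a clustering is assumed to be a dictionary from a community label to its member set, and all function names are hypothetical.

```python
from typing import Dict, Hashable, Optional, Set

Community = Set[Hashable]          # assumption: a community is a set of author ids
Clustering = Dict[str, Community]  # assumption: community label -> members


def jaccard(a: Community, b: Community) -> float:
    """|a ∩ b| / |a ∪ b|; taken to be 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match(c_t: Community, next_clustering: Clustering) -> Optional[str]:
    """Label of the community in C^{t+1} with the highest Jaccard coefficient.

    Ties are resolved in favour of the first label encountered.
    """
    if not next_clustering:
        return None
    return max(next_clustering, key=lambda label: jaccard(c_t, next_clustering[label]))


def ancestor(c_t: Community, c_next: Community) -> float:
    """Share of c_next's members that were already in c_t."""
    return len(c_t & c_next) / len(c_next) if c_next else 0.0


def descendant(c_t: Community, c_next: Community) -> float:
    """Share of c_t's members that ended up in c_next."""
    return len(c_t & c_next) / len(c_t) if c_t else 0.0
```

With a toy community `{"a", "b", "c", "d"}` and a successor `{"a", "b", "c", "d", "e"}`, `ancestor` is 4/5 and `descendant` is 4/4, which shows how the two modified coefficients normalise the same overlap by different community sizes.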



Visualization


               To compare and inspect the state of the network in different
               time-steps, a proper visualization is very helpful
                   Nodes that appeared previously should have similar positions
                   Colours denoting the affiliation of the node to its cluster
                   should be preserved
               As we did not find any existing tool implementing these
               requirements, we built our own, based on JUNG
               Another tool, based on Graphviz, was built to automatically
               create diagrams of ancestors and descendants from the
               respective relations






Topic Detection I

               We mined keywords using NLP techniques [3] from the
               abstracts or full-texts for almost 70% of the underlying articles
               Tokenised and stemmed [6] keywords were then assigned to
               each author
               Ability of keywords to discriminate authors was ranked
               according to their frequency (TF) and uniqueness in the
               corpus (IAF): TF-IAF
               Each author $a$ in time-step $t$ was thus described by a
               bag-of-words vector $k_a^t$
               Topical description of cluster c was obtained by a centroid of
               its members
               Cosine similarity was used for determining topical similarity of
               two clusters
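
The centroid and cosine-similarity steps can be illustrated as follows. This is a sketch under stated assumptions: keyword vectors are represented as sparse dictionaries from keyword to weight, the weights (e.g. TF-IAF scores) are assumed to be computed elsewhere, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Iterable

Vector = Dict[str, float]  # assumption: keyword -> weight, e.g. a TF-IAF score


def centroid(vectors: Iterable[Vector]) -> Vector:
    """Component-wise mean of the members' keyword vectors."""
    members = list(vectors)
    if not members:
        return {}
    total: Counter = Counter()
    for vec in members:
        total.update(vec)  # Counter.update adds the weights key by key
    return {keyword: weight / len(members) for keyword, weight in total.items()}


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either has zero norm."""
    dot = sum(weight * v.get(keyword, 0.0) for keyword, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)
```

Two clusters would then be compared with `cosine(centroid(members_1), centroid(members_2))`, where each argument is the list of author vectors of one cluster.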



Topic Detection II
               Interpretation of a cluster’s topic was based on characterizing
               keywords—a union of:
                   20 highest ranked keywords
                   20 most frequent keywords
               We were particularly interested in cross-community activity
               between IR and SW camps
               Whether a community counted as IR- or SW-related was
               decided by frequent patterns mined from the publications
               Any event detected by community topic evolution measures
               associated with both IR- and SW-related communities was
               then considered an instance of inter-camp dynamics
               Meta-data was used to assess the quality of clusterings—WT
               was omitted from further analysis
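
The set of characterizing keywords described above is a union of two top-20 lists. A minimal sketch, assuming each cluster comes with one dictionary of keyword ranks (higher is better) and one of keyword frequencies; both dictionaries and the function name are hypothetical:

```python
from typing import Dict, Set


def characterizing_keywords(rank: Dict[str, float],
                            freq: Dict[str, int],
                            k: int = 20) -> Set[str]:
    """Union of the k highest-ranked and the k most frequent keywords."""
    top_ranked = sorted(rank, key=rank.get, reverse=True)[:k]
    top_frequent = sorted(freq, key=freq.get, reverse=True)[:k]
    return set(top_ranked) | set(top_frequent)
```

Because the two lists may overlap, the result holds between `k` and `2 * k` keywords when both dictionaries have at least `k` entries.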




Measures


               Overlap measures induce a huge number of interactions
               between communities
               A solution is to apply more specific measures or to use the
               simple ones in combination
               We developed and/or used two categories of measures
                 1   community life-cycle measures for measuring and
                     explaining the state and the evolution of a community
                 2   community topic evolution measures for revealing
                     cross-community phenomena like community shift






Community Life-Cycle Measures


       Structural perspective:
               size $S$
               average vertex betweenness $B$, $R_B \in \mathbb{R}^+$
               relative density $\rho$, $R_\rho \in [0, 1]$
               author entropy $A$, $R_A \in [0, 1]$
       Topical perspective:
               topic drift $T$, $R_T \in [0, 1]$
               cluster content ratio $H$, $R_H \in \mathbb{R}^+$






Community Topic Evolution Measures


               We looked for parallel changes of structure and topic of
               communities
               Structural and topical measures were combined by
               multiplication for simplicity and because the range remains
               within [0, 1]
               Community shift $P_S$ may be detected as an emergence of a
               new community topically distinct from its ancestor:

               $$P_S(c_i^t, c_j^{t+1}) = \mathrm{dissim}(c_i^t, c_j^{t+1}) \times \mathrm{ancestor}(c_i^t, c_j^{t+1})$$






Community Topic Evolution Measures II

               Community shift/merge $P_{S/M}$ may be detected as a merge of
               two topically distinct communities:

               $$P_{S/M}(c_i^t, c_j^{t+1}) = \mathrm{dissim}(c_i^t, c_j^{t+1}) \times \mathrm{descendant}(c_i^t, c_j^{t+1})$$

               Note that both $P_S$ and $P_{S/M}$ are defined only for two
               different communities, i.e. only if $i \neq j$
               Community topic change $P_C$ expresses a change of topic of a
               structurally stable community:

               $$P_C(c_i^t) = \mathrm{dissim}(c_i^t, c_i^{t+1}) \times (1 - A(c_i^{t+1}))$$

               Only events with values > 0.5 and with a minimal overlap of
               10 authors were selected for deeper analysis
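
The three topic evolution measures and the selection rule can be written out as below. This is a sketch rather than the authors' code: the slides do not define dissim explicitly, so it is assumed here to be one minus the cosine similarity of the two cluster centroids. The overlap and community sizes are passed in as plain numbers, and every function name is hypothetical.

```python
def dissim(cosine_similarity: float) -> float:
    """Assumption: topical dissimilarity is 1 - cosine similarity."""
    return 1.0 - cosine_similarity


def community_shift(cos_sim: float, overlap: int, size_next: int) -> float:
    """P_S = dissim * ancestor, where ancestor = overlap / |c_j^{t+1}|."""
    return dissim(cos_sim) * overlap / size_next


def community_shift_merge(cos_sim: float, overlap: int, size_prev: int) -> float:
    """P_{S/M} = dissim * descendant, where descendant = overlap / |c_i^t|."""
    return dissim(cos_sim) * overlap / size_prev


def topic_change(cos_sim: float, author_entropy_next: float) -> float:
    """P_C = dissim * (1 - A), with A the author entropy at t+1 in [0, 1]."""
    return dissim(cos_sim) * (1.0 - author_entropy_next)


def is_selected(value: float, overlap: int,
                threshold: float = 0.5, min_overlap: int = 10) -> bool:
    """Selection rule from the slide: value > 0.5 and at least 10 shared authors."""
    return value > threshold and overlap >= min_overlap
```

Since both factors lie in [0, 1], each product also lies in [0, 1] and is large only when the structural and the topical signal are strong at the same time.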




Data-Set
               We first picked a set of major conferences in both fields
               We then selected publications from these conferences from
               DBLP for 2000–2009
               A co-citation network of 5772 authors and 817642 edges over
               all years was extracted
               3-year time-steps with 2-year overlap: 2000–2002,
               2001–2003, . . .
               The total number of articles was 39314, for which we were able
               to scrape 22975 abstracts and 3740 full-texts
                   Nearly 70% coverage by content
               We scraped 18313 author-provided keywords for 4102 distinct
               articles
                   Coverage by these high-quality meta-data was 10%
               We mined 263742 keywords from abstracts and full-texts
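
The 3-year time-steps with 2-year overlap amount to a sliding window that advances by one year. A minimal sketch, with a hypothetical function name and parameters, assuming the last window must end no later than the final year of the data:

```python
from typing import List, Tuple


def time_steps(first_year: int, last_year: int,
               length: int = 3, stride: int = 1) -> List[Tuple[int, int]]:
    """Inclusive (start, end) year windows of the given length.

    A stride of 1 with length 3 gives the 2-year overlap between
    consecutive windows.
    """
    last_start = last_year - length + 1
    return [(start, start + length - 1)
            for start in range(first_year, last_start + 1, stride)]


windows = time_steps(2000, 2009)
print(windows[:2])   # [(2000, 2002), (2001, 2003)]
print(len(windows))  # 8
```

Under these assumptions the period 2000–2009 yields eight windows, the last being 2007–2009.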



Shift of Louvain Community 26

               Emergence of Louvain community 26 was identified as an
               inter-camp community shift $P_S \doteq 0.62$ in 2006
               It was formed by 80% of community 6 “web IR” and by 20%
               of community 5 “SW”
               Keywords in 2006 like “navigation”, “personalization”,
               and “semantic web” suggest transdisciplinary topics
               Massive influence of community 15 “SW and IR” in 2007 and
               a change of topic towards “SW and business processes”
                   Observed as a low topic drift $T \doteq 0.29$
               IR-related keywords appeared again among characterizing
               keywords in 2008
                   Topic then stabilized: $T \doteq 0.65$





Evolution of Louvain Community 26
       Communities 6 “web information retrieval”, 5 “semantic web”,
       15 “semantic web and information retrieval” and their descendant
       community 26
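       [Figure: ancestor/descendant diagram over the time-steps 2005–2007,
       2006–2008, 2007–2009 and 2008–2009 with nodes c5, c6, c15 and c26;
       edge labels are percentages: c6 → c26: 80 and c5 → c26: 20 in the
       first step; later edge labels: 4.7, 90.6, 2.8, 51.4, 8.3, 48.6]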






Position of Louvain Community 26 in 2006 and 2007




         Communities 6 “web information retrieval” (pink), 5 “semantic
         web” (red), 15 “semantic web and information retrieval” (violet)
                   and their descendant community 26 (green)



Specialization of Infomap Community 9



               Initially oriented towards general and core SW-related topics in 2000
               Between 2002 and 2004 we identified 3 shifts
                   One of these shifts was community 99 “semantic desktop and
                   personalization”
               The community itself then specialized in “SW services”
               S, T, and H provided valuable insights
               ρ, B, and A did not seem to provide any further insights






Life-Cycle Measures of Infomap Community 9

       [Figure: life-cycle measures of Infomap community 9 over time (2000–2008);
       left axis: H, T, A, ρ (0 to 2); right axis: B, S (0 to 4500)]





Life-Cycle Measures of Infomap Community 99

       [Figure: life-cycle measures of Infomap community 99 over time (2003–2008);
       left axis: H, T, A, ρ (0.2 to 1.6); right axis: B, S (0 to 1000)]





Shift/Merge of Community 86
               We identified a shift/merge $P_{S/M} \doteq 0.91$ of community 86
               with community 0
               Both communities were concerned with IR-related topics, but
               each had its specific theme:
                    86 being more focused on “development”, “engine”, and
                    “system”
                    0 being more focused on “question answering”
               90.9% of authors from 86 moved to community 0
               Relative density $\rho \doteq 0.47$ and high cluster content ratio
               $H \doteq 1.91$ suggest it was topically coherent, but structurally
               weak
               It is not possible to generalize about the suitability of any
               life-cycle measure, as we identified only one shift/merge



Tag Clouds of Communities 86 and 0

         community      characterising keywords
         c86 (2002)     intuitive, development, ir, retrieval, control, implemented,
                        describing, high-dimensional, reducing, engine, execution,
                        advanced, information, system, multi-dimensional, image, usin,
                        accurate, time, precise, features, queries, service, dataset,
                        document, analysis, large, structure, cluster, and, web,
                        processing
         c0 (2003)      resolution, evaluation, passages, architecture, question, qa,
                        patterns, definitions, development, trec, mit, candidates,
                        linguistic, retrieval, answering, system, analysis, javelin,
                        modules, advanced, methods, science, information, approaches,
                        processing, using, computer, language, techniques






Change of topic of Infomap community 54
               Inter-camp community topic change $P_C \doteq 0.58$ was identified
               for Infomap community 54 between 2005 and 2006
               The topic changed from “knowledge management” and
               “information extraction” towards “knowledge querying” and
               “semantic web”
               Zero author entropy A suggests this might have been caused
               by new members joining the community
                   34.5% were completely new, i.e. they did not come from any
                   previous community
                   20.7% coming from 54 “knowledge management and
                   information extraction”
                   17.2% coming from 29 “ontologies and SW”
                   6.9% coming from 70 “ontologies and folksonomies”
                   6.9% coming from 112 “semantic web services”




Tag Clouds of Infomap community 54

         community      characterising keywords
         c54 (2005)     organizational, kms, sw, capturing, environment, working, ie,
                        acquisition, wikifactory, legacy, manager, goal, semantic,
                        tool, cooperative, layers, healthier, defining, quantitative,
                        knowledge, web, text, learning, techniques, computer,
                        supporting, science, machine, documents, information, system
         c54 (2006)     ontologies, language, query, specification, knowledge,
                        manager, semantic, pure, capturing, data, search, keyword,
                        layers, keyword-based, hybrid, architecture, spreadsheet, web,
                        ie, application, information, modelling, approach, algorithm,
                        using, methodic, retrieval, service, system, structures






Emergence of Intermediary Louvain Community 15

               The most complex scenario we investigated
               It first emerged as a descendant of community 4 “IR” with
               topic “cross-language IR”, which was identified as a
               community shift $P_S \doteq 0.55$ in 2003
               From 2004, this community was under a massive influence of
               community 5 “SW”, which caused a change towards
               SW-related topics $P_C \doteq 0.31$
               From 2005, IR-related keywords appeared again among
               characterizing keywords, while those keywords disappeared in
               community 5
               Therefore, while community 5 kept its focus on the core
               SW-related topics, it contributed substantially to the formation
               of a new interdisciplinary community



Betweenness of Louvain Community 15


               Despite still being focused mainly on SW-related topics,
               community 15 worked as an intermediary between both camps
               This hypothesis is supported by high average author
               betweenness B

                                    2004–2006                2007–2009
                                    S      B                 S      B
               c15                  444    1591.01659        445    2535.02
               entire network       2776   2066.70764        2190   2192.85117






Position of Louvain Community 15 in 2004 and 2007




       Community 5 “SW” (red—left side), “IR” communities 0, 4, 6 and
       9 (grey, beige, pink and red—right side, respectively) and their
       intermediary community 15 (violet)




Conclusion and Future Work I

               We presented a general and scalable methodology for analysis
               of cross-community phenomena uniquely combining
               topological and content analysis and supported by special
               visualization techniques
               Three community topic evolution measures tailored for
               identifying phenomena like community shift, shift/merge, and
               change of topic were proposed and successfully assessed
                   Community shift and topic change were detected quite
                   commonly, which suggests that they are part of many
                   community life-cycles
                   Community shift/merge was detected very rarely, which either
                   means we have to improve the measure or that this is simply a
                   rare phenomenon
               We proposed life-cycle measures characterising the states and
               evolution of communities



Conclusion and Future Work II
                   The assessment showed that average vertex betweenness,
                   relative density, cluster content ratio, and topic drift offered
                   valuable insights into the phenomena revealed by community
                   topic evolution measures
               We observed strong shifts $P_S \to 1$ when the shifted
               community disappeared in the next time-step
                   These strong shifts usually had very different but coherent
                   topics
                   They might have been the initial sources of new topics or even
                   research streams
               Frequently, a newly emerged community had quite weak
               structure (low ρ, high A) and/or topic (low T ), while these
               characteristics then improved in the subsequent time-steps
               B seems to be a good measure for identification of
               intermediary communities



Conclusion and Future Work III

               We intend to cluster the community life-cycles by the
               characteristic events expressed by all the measures
                   We expect this to provide an automated way of extracting
                   life-cycle taxonomies
               The combination of content and structural analysis allowed us
               to assess the quality of clustering revealed only by inspection
               of structure of the network
                   We consider this original approach fertile ground for
                   future research
               We plan to use other algorithms, e.g. a co-clustering algorithm
               for both content and objects [4]
               We will extend the whole work to a larger data-set





References I

               [1] R. Baeza-Yates, P. Mika, and H. Zaragoza.
                   Search, Web 2.0, and the Semantic Web.
                   IEEE Intelligent Systems, 23(1):80–82, 2008.
               [2] Vincent D. Blondel, Jean-Loup Guillaume, Renaud Lambiotte,
                   and Etienne Lefebvre.
                   Fast unfolding of communities in large networks.
                   Journal of Statistical Mechanics: Theory and Experiment,
                   P10008, 2008.
               [3] Georgeta Bordea.
                   The Semantic Web: Research and Applications, chapter
                   Concept Extraction Applied to the Task of Expert Finding,
                   pages 451–456.
                   Springer, 2010.



References II


               [4] Derek Greene and Pádraig Cunningham.
                   Spectral Co-Clustering for Dynamic Bipartite Graphs.
                   Technical report, School of Computer Science & Informatics,
                   UCD, 2010.
               [5] Th. S. Kuhn.
                   The Structure of Scientific Revolutions.
                   University of Chicago Press, December 1996.
               [6] Martin F. Porter.
                   An algorithm for suffix stripping.
                   Program, 14:130–137, 1980.






References III


               [7] Martin Rosvall and Carl T. Bergstrom.
                   Maps of random walks on complex networks reveal community
                   structure.
                   Proceedings of the National Academy of Sciences USA,
                   volume 105, pages 1118–1123, 2008.
               [8] Ken Wakita and Toshiyuki Tsurumi.
                   Finding community structure in a mega-scale social networking
                   service.
                   In IADIS international conference on WWW/Internet 2007,
                   pages 153–162, 2007.




Life-Cycles and Mutual Effects of Scientific Communities: ASNA 2010

  • 1. Life-Cycles and Mutual Effects of Scientific Communities. Václav Belák, Marcel Karnstedt, Conor Hayes. Digital Enterprise Research Institute, NUI Galway. ASNA 2010, Zürich.
  • 2. Motivation. Progress in science is often measured by citation measures, which are relatively static. Detecting and explaining the evolution and life-cycles of communities provides better arguments for progress. Previous approaches focused mainly on analysing co-citation graphs or on textual clustering, with little work on cross-community effects. Kuhn [5] claimed that the development of scientific knowledge proceeds in discrete steps: (1) pre-paradigm period; (2) paradigm period (normal science); (3) crisis; (4) reaction to the crisis (paradigm shift).
  • 3. Cross-Community Effects I: Expected Phenomena. [Figure: (a) community shift; (b) community merge (with community shift).]
  • 4. Cross-Community Effects II. Although inspired by Kuhn, we expected the evolution of communities to take a milder form. Instead of paradigm shift, we looked for community shift. Community merge is a complementary phenomenon, but a rather uninteresting one on its own, so we investigated combinations of shifts with subsequent merges, i.e. community shift/merge. Instead of paradigm articulation, we looked for community specialization. We analysed co-citation networks of two big camps in CS: Semantic Web (solution-driven) and Information Retrieval (problem-driven) [1].
  • 5. Outline: (1) Methodology; (2) Data-Sets; (3) Results; (4) Conclusion and Future Work.
  • 6. Initial Expectations & Requirements. The methodology was developed to meet a set of requirements arising from the nature of the problem: (1) the dynamic data-set is represented by snapshots of several consecutive time-steps; (2) communities have to be identified in the network in each time-step; (3) authors (nodes in general) have to be uniquely identified across all time-steps; (4) for topical analysis, meta-data (topics) describing the nodes are necessary.
  • 7. Community Detection. We identified communities using three popular algorithms: Infomap [7], Louvain [2], and WT [8]. All have publicly available implementations, can operate on weighted networks, and produce non-overlapping communities. In each time-step $t$, we identified a clustering $C^t$ of $n$ communities, $C^t = \{c_1^t, c_2^t, \ldots, c_n^t\}$, where $n$ is determined automatically for each time-step.
  • 8. Tracking of Dynamic Communities. Communities are identified independently for each time-step, so it is necessary to track the evolution of each community across subsequent time-steps. Communities were matched according to the highest Jaccard coefficient:
      $\mathrm{match}(c_i^t) = \arg\max_{c_j^{t+1} \in C^{t+1}} \frac{|c_i^t \cap c_j^{t+1}|}{|c_i^t \cup c_j^{t+1}|}$
    Important ancestors and descendants were identified by a modified Jaccard coefficient:
      $\mathrm{ancestor}(c_i^t, c_j^{t+1}) = \frac{|c_i^t \cap c_j^{t+1}|}{|c_j^{t+1}|}, \quad \mathrm{descendant}(c_i^t, c_j^{t+1}) = \frac{|c_i^t \cap c_j^{t+1}|}{|c_i^t|}$
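
The matching rule and the ancestor/descendant ratios from the Tracking of Dynamic Communities slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: communities are modelled as plain sets of author identifiers, and the function names and the `Clustering` alias are hypothetical, not taken from the authors' implementation.

```python
from typing import Dict, Hashable, Optional, Set

# Assumed representation: community id -> set of author ids for one time-step.
Clustering = Dict[Hashable, Set[str]]


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard coefficient |a & b| / |a | b| (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match(c_t: Set[str], next_clustering: Clustering) -> Optional[Hashable]:
    """Id of the community at t+1 with the highest Jaccard overlap with c_t."""
    if not next_clustering:
        return None
    return max(next_clustering, key=lambda j: jaccard(c_t, next_clustering[j]))


def ancestor(c_t: Set[str], c_next: Set[str]) -> float:
    """Share of c_next's members that were already in c_t."""
    return len(c_t & c_next) / len(c_next) if c_next else 0.0


def descendant(c_t: Set[str], c_next: Set[str]) -> float:
    """Share of c_t's members that moved on to c_next."""
    return len(c_t & c_next) / len(c_t) if c_t else 0.0


# Toy example with made-up author ids.
c_t = {"a", "b", "c", "d"}
step_next = {"A": {"a", "b", "c", "x"}, "B": {"d", "y"}}
best = match(c_t, step_next)  # "A": Jaccard 3/5 beats 1/5
```

Note that ties in `match` are resolved by dictionary order here, a choice the slides do not specify.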
  • 9. Visualization. A proper visualization is very helpful for comparing and inspecting the state of the network in different time-steps. Nodes that appeared previously should keep similar positions, and the colours denoting a node's cluster affiliation should be preserved. As we did not find any existing tool meeting these requirements, we built our own, based on JUNG. Another tool, based on Graphviz, was built to automatically create diagrams of ancestors and descendants from the respective relations.
  • 10. Topic Detection I. We mined keywords using NLP techniques [3] from the abstracts or full-texts of almost 70% of the underlying articles. Tokenised and stemmed [6] keywords were then assigned to each author. The ability of keywords to discriminate between authors was ranked according to their frequency (TF) and their uniqueness in the corpus (IAF): TF-IAF. Each author $a$ in time-step $t$ was thus described by a bag-of-words vector $k_a^t$. The topical description of a cluster $c$ was obtained as the centroid of its members. Cosine similarity was used to determine the topical similarity of two clusters.
  • 11. Topic Detection II. The interpretation of a cluster's topic was based on its characterizing keywords, defined as the union of the 20 highest-ranked keywords and the 20 most frequent keywords. We were particularly interested in cross-community activity between the IR and SW camps. Whether a community counted as IR- or SW-related was decided from frequent patterns mined from the publications. Any event detected by the community topic evolution measures and associated with both IR- and SW-related communities was then considered an instance of inter-camp dynamics. Meta-data were used to assess the quality of the clusterings; WT was omitted from further analysis.
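
The topic pipeline described on the two Topic Detection slides (keyword weighting, cluster centroid, cosine similarity) might look roughly as follows. The slides do not give the TF-IAF formula, so this sketch assumes an analogue of TF-IDF in which IAF is the logarithm of the number of authors divided by the number of authors using the keyword; that formula and all function names are assumptions for illustration.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List

Vec = Dict[str, float]  # sparse keyword -> weight vector


def tf_iaf(author_keywords: Dict[str, List[str]]) -> Dict[str, Vec]:
    """Weight each author's keywords by TF * log(N / author frequency) (assumed form)."""
    n_authors = len(author_keywords)
    # Number of authors that use each keyword at least once.
    author_freq = Counter(k for kws in author_keywords.values() for k in set(kws))
    vectors = {}
    for author, kws in author_keywords.items():
        tf = Counter(kws)
        vectors[author] = {k: tf[k] * math.log(n_authors / author_freq[k]) for k in tf}
    return vectors


def centroid(vectors: List[Vec]) -> Vec:
    """Component-wise mean of sparse vectors: the topical description of a cluster."""
    if not vectors:
        return {}
    total = defaultdict(float)
    for v in vectors:
        for k, w in v.items():
            total[k] += w
    return {k: w / len(vectors) for k, w in total.items()}


def cosine(u: Vec, v: Vec) -> float:
    """Cosine similarity of two sparse vectors (0.0 if either has zero norm)."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
```

With this weighting, a keyword used by every author receives weight zero, which matches the stated intent of ranking keywords by how well they discriminate between authors.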
  • 12. Measures. Overlap measures induce a huge number of interactions between communities. The solution is to apply more specific measures or to use the simple ones in combination. We developed and/or used two categories of measures: (1) community life-cycle measures, for measuring and explaining the state and evolution of a community; (2) community topic evolution measures, for revealing cross-community phenomena such as community shift.
  • 13. Community Life-Cycle Measures. Structural perspective: size $S$; average vertex betweenness $B$, $R_B \in \mathbb{R}^+$; relative density $\rho$, $R_\rho \in [0, 1]$; author entropy $A$, $R_A \in [0, 1]$. Topical perspective: topic drift $T$, $R_T \in [0, 1]$; cluster content ratio $H$, $R_H \in \mathbb{R}^+$.
  • 14. Community Topic Evolution Measures. We looked for parallel changes in the structure and topic of communities. Structural and topical measures were combined by multiplication, for simplicity and because the range remains within [0, 1]. Community shift $P_S$ may be detected as the emergence of a new community topically distinct from its ancestor:
      $P_S(c_i^t, c_j^{t+1}) = \mathrm{dissim}(c_i^t, c_j^{t+1}) \times \mathrm{ancestor}(c_i^t, c_j^{t+1})$
  • 15. Community Topic Evolution Measures II. Community shift/merge $P_{S/M}$ may be detected as a merge of two topically distinct communities:
      $P_{S/M}(c_i^t, c_j^{t+1}) = \mathrm{dissim}(c_i^t, c_j^{t+1}) \times \mathrm{descendant}(c_i^t, c_j^{t+1})$
    Note that both $P_S$ and $P_{S/M}$ are defined only for two different communities, i.e. only if $i \neq j$. Community topic change $P_C$ expresses a change of topic of a structurally stable community:
      $P_C(c_i^t) = \mathrm{dissim}(c_i^t, c_i^{t+1}) \times (1 - A(c_i^{t+1}))$
    Only events with values > 0.5 and with a minimal overlap of 10 authors were selected for deeper analysis.
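
As a rough illustration of how the three community topic evolution measures combine a topical and a structural factor, the sketch below computes them for sets of authors and sparse topic vectors. Two things are assumed rather than taken from the slides: `dissim` is implemented as one minus cosine similarity, and the author entropy $A$ is passed in as a precomputed number because its definition is not given here. All names are hypothetical.

```python
import math
from typing import Dict, Set

Vec = Dict[str, float]  # sparse keyword -> weight vector


def dissim(u: Vec, v: Vec) -> float:
    """Topical dissimilarity, assumed here to be 1 - cosine similarity."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    cos = dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
    return max(0.0, 1.0 - cos)  # clamp tiny negative rounding errors


def p_shift(c_t: Set[str], c_next: Set[str], topic_t: Vec, topic_next: Vec) -> float:
    """P_S = dissim * ancestor, for two different communities."""
    anc = len(c_t & c_next) / len(c_next) if c_next else 0.0
    return dissim(topic_t, topic_next) * anc


def p_shift_merge(c_t: Set[str], c_next: Set[str], topic_t: Vec, topic_next: Vec) -> float:
    """P_S/M = dissim * descendant, for two different communities."""
    desc = len(c_t & c_next) / len(c_t) if c_t else 0.0
    return dissim(topic_t, topic_next) * desc


def p_topic_change(topic_t: Vec, topic_next: Vec, author_entropy_next: float) -> float:
    """P_C = dissim * (1 - A), for the same community in consecutive time-steps."""
    return dissim(topic_t, topic_next) * (1.0 - author_entropy_next)


def is_candidate(score: float, overlap: int,
                 min_score: float = 0.5, min_overlap: int = 10) -> bool:
    """Selection rule from the slide: value > 0.5 and at least 10 shared authors."""
    return score > min_score and overlap >= min_overlap
```

Because both factors lie in [0, 1], each product also stays in [0, 1], which is the reason the slides give for combining the measures by multiplication.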
  • 16. Data-Set. We first picked a set of major conferences in both fields and then selected the publications from these conferences in DBLP for 2000–2009. A co-citation network of 5772 authors and 817642 edges over all years was extracted. We used 3-year time-steps with a 2-year overlap: 2000–2002, 2001–2003, ... The total number of articles was 39314, for which we were able to scrape 22975 abstracts and 3740 full-texts, giving nearly 70% coverage by content. We scraped 18313 author-provided keywords for 4102 distinct articles; coverage by these high-quality meta-data was 10%. We mined 263742 keywords from abstracts and full-texts.
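
The windowing scheme on the Data-Set slide (3-year time-steps with a 2-year overlap over 2000–2009) can be generated with a short helper. The function name and its parameters are hypothetical, and the sketch assumes that only complete windows inside the covered period are used.

```python
from typing import List, Tuple


def time_steps(first_year: int, last_year: int,
               length: int = 3, overlap: int = 2) -> List[Tuple[int, int]]:
    """Inclusive (start, end) year windows of the given length and overlap."""
    if not 0 <= overlap < length:
        raise ValueError("overlap must be non-negative and smaller than length")
    stride = length - overlap  # years by which consecutive windows are shifted
    return [(start, start + length - 1)
            for start in range(first_year, last_year - length + 2, stride)]


windows = time_steps(2000, 2009)
# First window is (2000, 2002), last is (2007, 2009): eight windows in total.
```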
  • 17. Shift of Louvain Community 26. The emergence of Louvain community 26 was identified as an inter-camp community shift, $P_S \doteq 0.62$, in 2006. 80% of its members came from community 6 “web IR” and 20% from community 5 “SW”. Keywords in 2006 such as “navigation”, “personalization”, and “semantic web” suggest transdisciplinary topics. In 2007 there was a massive influence of community 15 “SW and IR” and a change of topic towards “SW and business processes”, observed as a low topic drift $T \doteq 0.29$. IR-related keywords appeared again among the characterizing keywords in 2008, and the topic then stabilized: $T \doteq 0.65$.
  • 18. Evolution of Louvain Community 26. [Figure: ancestor/descendant diagram across consecutive time-steps for communities 6 “web information retrieval”, 5 “semantic web”, 15 “semantic web and information retrieval” and their descendant community 26.]
  • 19. Position of Louvain Community 26 in 2006 and 2007. [Figure: network layouts showing communities 6 “web information retrieval” (pink), 5 “semantic web” (red), 15 “semantic web and information retrieval” (violet) and their descendant community 26 (green).]
  • 20. Specialization of Infomap Community 9. In 2000 the community was oriented towards general and core SW-related topics. Between 2002 and 2004 we identified 3 shifts; one of these was the emergence of community 99 “semantic desktop and personalization”. Community 9 itself then specialized in “SW services”. $S$, $T$, and $H$ provided valuable insights; $\rho$, $B$, and $A$ did not seem to provide any further insights.
  • 21. Life-Cycle Measures of Infomap Community 9. [Figure: $\rho$, $H$, $B$, $S$, $A$, and $T$ plotted over time, 2000–2008; left axis: $H$, $T$, $A$, $\rho$; right axis: $B$, $S$.]
  • 22. Life-Cycle Measures of Infomap Community 99. [Figure: $\rho$, $H$, $B$, $S$, $A$, and $T$ plotted over time, 2003–2008; left axis: $H$, $T$, $A$, $\rho$; right axis: $B$, $S$.]
  • 23. Shift/Merge of Community 86. We identified a shift/merge, $P_{S/M} \doteq 0.91$, of community 86 with community 0. Both communities were concerned with IR-related topics, but each had its specific theme: 86 was more focused on “development”, “engine”, and “system”, while 0 was more focused on “question answering”. 90.9% of the authors from 86 moved to community 0. The relative density $\rho \doteq 0.47$ and the high cluster content ratio $H \doteq 1.91$ suggest that community 86 was topically coherent but structurally weak. As we identified only one shift/merge, it is not possible to generalize about the suitability of any life-cycle measure.
  • 24. Tag Clouds of Communities 86 and 0 (characterising keywords).
      c86 (2002): intuitive, development, ir, retrieval, control, implemented, describing, high-dimensional, reducing, engine, execution, advanced, information, system, multi-dimensional, image, usin, accurate, time, precise, features, queries, service, dataset, document, analysis, large, structure, cluster, and, web, processing
      c0 (2003): resolution, evaluation, passages, architecture, question, qa, patterns, definitions, development, trec, mit, candidates, linguistic, retrieval, answering, system, analysis, javelin, modules, advanced, methods, science, information, approaches, processing, using, computer, language, techniques
  • 25. Change of Topic of Infomap Community 54. An inter-camp community topic change, $P_C \doteq 0.58$, was identified for Infomap community 54 between 2005 and 2006. The topic changed from “knowledge management” and “information extraction” towards “knowledge querying” and “semantic web”. Zero author entropy $A$ suggests this might have been caused by new members joining the community: 34.5% were completely new, i.e. they did not come from any previous community; 20.7% came from 54 “knowledge management and information extraction”; 17.2% came from 29 “ontologies and SW”; 6.9% came from 70 “ontologies and folksonomies”; 6.9% came from 112 “semantic web services”.
  • 26. Tag Clouds of Infomap Community 54 (characterising keywords).
      c54 (2005): organizational, kms, sw, capturing, environment, working, ie, acquisition, wikifactory, legacy, manager, goal, semantic, tool, cooperative, layers, healthier, defining, quantitative, knowledge, web, text, learning, techniques, computer, supporting, science, machine, documents, information, system
      c54 (2006): ontologies, language, query, specification, knowledge, manager, semantic, pure, capturing, data, search, keyword, layers, keyword-based, hybrid, architecture, spreadsheet, web, ie, application, information, modelling, approach, algorithm, using, methodic, retrieval, service, system, structures
  • 27. Emergence of Intermediary Louvain Community 15. This was the most complex scenario we investigated. The community first emerged as a descendant of community 4 “IR” with the topic “cross-language IR”, which was identified as a community shift, $P_S \doteq 0.55$, in 2003. From 2004, it was under a massive influence of community 5 “SW”, which caused a change towards SW-related topics, $P_C \doteq 0.31$. From 2005, IR-related keywords appeared again among its characterizing keywords, while those keywords disappeared from community 5. Therefore, whereas community 5 kept its focus on the core SW-related topics, it largely participated in forming a new interdisciplinary community.
  • 28. Betweenness of Louvain Community 15. Despite still being focused mainly on SW-related topics, community 15 worked as an intermediary between the two camps. This hypothesis is supported by its high average author betweenness $B$:
                          2004–2006               2007–2009
                          S       B               S       B
      c15                 444     1591.01659      445     2535.02
      entire network      2776    2066.70764      2190    2192.85117
  • 29. Position of Louvain Community 15 in 2004 and 2007. Figure: community 5 “SW” (red, left side), the “IR” communities 0, 4, 6 and 9 (grey, beige, pink and red, right side, respectively) and their intermediary community 15 (violet).
  • 30. Conclusion and Future Work I. We presented a general and scalable methodology for analysing cross-community phenomena that uniquely combines topological and content analysis and is supported by special visualization techniques. Three community topic evolution measures tailored for identifying phenomena like community shift, shift/merge, and change of topic were proposed and successfully assessed. Community shift and topic change were detected quite commonly, which suggests that they are part of many community life-cycles. Community shift/merge was detected very rarely, which means either that we have to improve the measure or that this is simply a rare phenomenon. We proposed life-cycle measures characterising the states and evolution of communities.
  • 31. Conclusion and Future Work II. The assessment showed that average vertex betweenness, relative density, cluster content ratio, and topic drift offered valuable insights into the phenomena revealed by the community topic evolution measures. We observed strong shifts (P_S → 1) when the shifted community disappeared in the next time-step. These strong shifts usually had very different but coherent topics. They might have been the initial sources of new topics or even research streams. Frequently, a newly emerged community had a quite weak structure (low ρ, high A) and/or topic (low T), while these characteristics then improved in the subsequent time-steps. B seems to be a good measure for identifying intermediary communities.
  • 32. Conclusion and Future Work III. We intend to cluster the community life-cycles by the characteristic events expressed by all the measures. We expect this to provide an automated way of extracting life-cycle taxonomies. The combination of content and structural analysis allowed us to assess the quality of a clustering obtained solely by inspecting the structure of the network. We consider this original approach fertile ground for future research. We plan to use other algorithms, e.g. a co-clustering algorithm for both content and objects [4]. We will extend the whole work to a larger data-set.
  • 33. References I. [1] R. Baeza-Yates, P. Mika, and H. Zaragoza. Search, Web 2.0, and the Semantic Web. IEEE Intelligent Systems, 23(1):80–82, 2008. [2] Vincent D. Blondel, Jean-Loup Guillaume, Renaud Lambiotte, and Etienne Lefebvre. Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, P10008, 2008. [3] Georgeta Bordea. Concept Extraction Applied to the Task of Expert Finding. In The Semantic Web: Research and Applications, pages 451–456. Springer, 2010.
  • 34. References II. [4] Derek Greene and Pádraig Cunningham. Spectral Co-Clustering for Dynamic Bipartite Graphs. Technical report, School of Computer Science & Informatics, UCD, 2010. [5] Th. S. Kuhn. The Structure of Scientific Revolutions. University of Chicago Press, December 1996. [6] Martin F. Porter. An algorithm for suffix stripping. Program, 14:130–137, 1980.
  • 35. References III. [7] Martin Rosvall and Carl T. Bergstrom. Maps of random walks on complex networks reveal community structure. Proceedings of the National Academy of Sciences USA, volume 105, pages 1118–1123, 2008. [8] Ken Wakita and Toshiyuki Tsurumi. Finding community structure in a mega-scale social networking service. In IADIS International Conference on WWW/Internet 2007, pages 153–162, 2007.