On cluster stability


To ensure that publications are assigned to clusters in a meaningful way, we introduce the notion of stable clusters. Essentially, a cluster is stable if it is insensitive to small changes in the underlying data. Bootstrapping is used to make small changes in the data. It is shown that if we want to have an accurate and detailed clustering, we need to be satisfied with a clustering that doesn’t comprehensively cover all publications. Publications that do not clearly belong to one of the main topics in a field cannot be assigned to a cluster.

On cluster stability
Nees Jan van Eck
Centre for Science and Technology Studies (CWTS), Leiden University
15th International Conference on Scientometrics & Informetrics
Istanbul, Turkey, June 30, 2015

Introduction
• A clustering technique can be used to obtain highly detailed clustering results (i.e., a large number of clusters)
• A clustering technique can be used to force each publication to be assigned to a cluster
• However, in a highly detailed clustering, is the assignment of publications to clusters still meaningful?

Cluster stability
• To ensure that publications are assigned to clusters in a meaningful way, we introduce the notion of stable clusters
• Essentially, a cluster is stable if it is insensitive to small changes in the underlying data
• Bootstrapping is used to make small changes in the data

Identification of stable clusters: Step 1
• Collect the citation network of publications
• Create a large number (e.g., 100) of bootstrap citation networks:
– A bootstrap citation network is a weighted variant of the original citation network in which each edge has an integer weight drawn from a Poisson distribution with mean 1 (cf. Rosvall & Bergstrom, 2009)
• In each bootstrap citation network, perform clustering
• For each pair of publications, calculate the proportion of the bootstrap clustering results in which the publications are in the same cluster
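The Step 1 procedure can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation behind the slides: the function names (`poisson1`, `bootstrap_network`, `toy_clustering`, `co_clustering_proportions`) and the four-publication example are hypothetical, and `toy_clustering` is only a stand-in (connected components over edges with positive weight) for whatever clustering technique is actually applied. With mean 1, a Poisson weight equals 0 with probability e^-1 (about 0.37), so each bootstrap network effectively drops roughly a third of the edges and reinforces others.

```python
import math
import random
from collections import defaultdict
from itertools import combinations


def poisson1(rng):
    """Draw one integer from a Poisson distribution with mean 1 (Knuth's method)."""
    limit = math.exp(-1.0)
    k, product = 0, rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k


def bootstrap_network(edges, rng):
    """Give every original edge an integer weight drawn from Poisson(1)."""
    return {edge: poisson1(rng) for edge in edges}


def toy_clustering(nodes, weighted_edges):
    """Stand-in clustering (an assumption, not the method used in the slides):
    connected components over edges whose bootstrap weight is positive."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for (a, b), weight in weighted_edges.items():
        if weight > 0:
            parent[find(a)] = find(b)
    return {n: find(n) for n in nodes}


def co_clustering_proportions(nodes, edges, n_bootstraps=100,
                              cluster=toy_clustering, seed=0):
    """Proportion of bootstrap clusterings in which each pair shares a cluster.
    Pairs that never share a cluster are omitted (their proportion is 0)."""
    rng = random.Random(seed)
    counts = defaultdict(int)
    for _ in range(n_bootstraps):
        labels = cluster(nodes, bootstrap_network(edges, rng))
        # All-pairs loop: fine for a toy example, too costly for large networks.
        for a, b in combinations(sorted(nodes), 2):
            if labels[a] == labels[b]:
                counts[(a, b)] += 1
    return {pair: count / n_bootstraps for pair, count in counts.items()}


# Hypothetical four-publication chain a-b-c-d.
nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("b", "c"), ("c", "d")]
proportions = co_clustering_proportions(nodes, edges, n_bootstraps=100)
```

Passing a different callable as `cluster` swaps in another clustering technique without changing the bootstrap loop. For networks of realistic size, the pair loop would presumably be restricted to pairs that can plausibly co-occur (for example, pairs linked by a citation).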

[Figure: Original network → Bootstrap networks (integer edge weights) → Clustering → Clustered bootstrap networks → Weighted network (edge weights between 0.1 and 1.0)]

Identification of stable clusters: Step 2
• Create a network of publications with an edge between two publications if the publications are in the same cluster in at least a certain proportion (e.g., 0.9) of the bootstrap clustering results
• Identify connected components in the newly created network
• Each connected component represents a stable cluster
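Step 2 amounts to thresholding the pairwise proportions and finding connected components. The sketch below is one way to write it, assuming the proportions are available as a dictionary keyed by publication pairs; the function name `stable_clusters`, the six publications, and their proportions are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def stable_clusters(nodes, proportions, threshold=0.9):
    """Connected components of the binary network that keeps a pair
    only if its co-clustering proportion is at least `threshold`."""
    adjacency = defaultdict(set)
    for (a, b), proportion in proportions.items():
        if proportion >= threshold:
            adjacency[a].add(b)
            adjacency[b].add(a)

    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        component, stack = [], [start]
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            component.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(sorted(component))
    return components


# Hypothetical proportions for six publications.
nodes = ["a", "b", "c", "d", "e", "f"]
proportions = {
    ("a", "b"): 1.0,
    ("b", "c"): 0.9,
    ("c", "d"): 0.4,
    ("d", "e"): 0.9,
    ("e", "f"): 0.1,
}
clusters = stable_clusters(nodes, proportions, threshold=0.9)
# clusters == [["a", "b", "c"], ["d", "e"], ["f"]]
```

In this example, publication "f" ends up alone in its component because none of its proportions reaches the threshold. Raising the threshold can only split components further, never merge them.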

[Figure: Weighted network → Binary network → Connected components → Stable clusters]

Data
• Library & Information Sciences (LIS):
– Time period: 1996-2013
– Publications: 31,534
– Citation links: 131,266
• Astrophysics (Berlin dataset):
– Time period: 2003-2010
– Publications: 101,828
– Citation links: 924,171

Conclusions
• If we want to have an accurate and detailed clustering, we need to be satisfied with a clustering that doesn’t comprehensively cover all publications
• Publications that do not clearly belong to one of the main topics in a field cannot be assigned to a cluster
• Cluster stability analysis can be used to distinguish between meaningful and non-meaningful assignments of publications to clusters
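The trade-off between detail and coverage can be made concrete with a small calculation. The sketch below assumes, for illustration only, that a publication counts as assigned when its stable cluster has at least `min_size` members, so that singleton components are treated as unassigned; this rule, the function name `coverage`, and the example clusters are hypothetical and are not stated in the slides.

```python
def coverage(clusters, min_size=2):
    """Share of publications that fall in a cluster with at least `min_size` members."""
    total = sum(len(cluster) for cluster in clusters)
    assigned = sum(len(cluster) for cluster in clusters if len(cluster) >= min_size)
    return assigned / total if total else 0.0


# Hypothetical stable clusters: two real clusters and one isolated publication.
clusters = [["a", "b", "c"], ["d", "e"], ["f"]]
share = coverage(clusters)  # 5 of 6 publications are assigned
```

Under this assumption, a stricter stability threshold or a more detailed clustering would be expected to lower the share, which is the coverage cost described in the first conclusion.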

References
Rosvall, M., & Bergstrom, C.T. (2009). Mapping change in large networks. PLoS ONE, 5(1), e8694. http://dx.doi.org/10.1371/journal.pone.0008694
Waltman, L., & Van Eck, N.J. (2012). A new methodology for constructing a publication-level classification system of science. JASIST, 63(12), 2378-2392. http://dx.doi.org/10.1002/asi.22748
Waltman, L., & Van Eck, N.J. (2013). A smart local moving algorithm for large-scale modularity-based community detection. European Physical Journal B, 86(11), 471. http://dx.doi.org/10.1140/epjb/e2013-40829-0


Figure-only slides (images not reproduced)
• Slide 3: Example: Waltman and Van Eck (2012)
• Slide 10: Cluster stability LIS
• Slides 11-12: Stable clusters LIS (resolution 2)
• Slide 13: Cluster stability Berlin
• Slide 14: Cluster stability, LIS and Berlin
