The CSO Classifier: Ontology-Driven Detection of
Research Topics in Scholarly Articles
Angelo A. Salatino, Francesco Osborne, Thiviyan Thanapalasingam, Enrico Motta
@angelosalatino
Knowledge Media Institute
The Open University
United Kingdom
pip install cso-classifier
Classifying Research Papers with their Topics
Annotating research papers allows us to:
• categorise proceedings in digital libraries
• semantically enhance the metadata of scientific publications
• generate recommendations
• produce smart analytics
• detect research trends
• …
Classifying Research Papers with their Topics
1) Topic detection methods
• Clustering approaches based on citations, titles, keywords
• Topic models
• Latent Dirichlet Allocation
• Author-topic models
• Supervised classifiers
2) Vocabulary-driven
• Computing Classification System (CCS)
• JEL Classification System
• Australian and New Zealand Standard Research Classification (ANZSRC)
• CSO Classifier
CSO Classifier
The Computer Science Ontology
• Ontology of research areas*, automatically generated by the Klink-2** algorithm from a dataset of 16 million publications, mainly in Computer Science
• The current version of CSO includes 14K topics and 143K relationships
• Main roots include Computer Science, Linguistics, Mathematics, Geometry, Semantics, and others
• Download CSO from https://cso.kmi.open.ac.uk
* Angelo A. Salatino, Thiviyan Thanapalasingam, Andrea Mannocci, Francesco Osborne, and Enrico Motta. "The Computer Science Ontology: A Large-Scale Taxonomy of Research Areas." In ISWC 2018, Monterey, CA (USA).
** Francesco Osborne and Enrico Motta. "Klink-2: integrating multiple web sources to generate semantic topic networks." In ISWC 2015, Bethlehem, PA (USA).
Syntactic Module
• We split the text into unigrams, bigrams and trigrams
• For each n-gram, we measure its Levenshtein similarity with the topics in CSO
• We select the CSO topics whose similarity with an n-gram is greater than or equal to 0.94
• This helps handle plurals and hyphenated topics, such as:
• “knowledge based systems” and “knowledge-based systems”
• “database” and “databases”
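The matching step above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the package's own code: tokenization is a simple regex that keeps hyphens, every n-gram is compared with every topic by brute force, and the similarity uses the normalization of the `ratio` function in the python-Levenshtein package (lengths summed, a substitution counted as two edits). Whether the CSO Classifier uses exactly this normalization is an assumption, and the helper names (`ngrams`, `similarity`, `syntactic_topics`) are hypothetical.

```python
from __future__ import annotations

import re


def ngrams(tokens: list[str], max_n: int = 3) -> list[str]:
    """Return all unigrams, bigrams and trigrams as space-joined strings."""
    grams = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def indel_distance(a: str, b: str) -> int:
    """Edit distance where insertions/deletions cost 1 and a substitution costs 2."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            substitute = prev[j - 1] + (0 if ca == cb else 2)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, substitute))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Ratio-style similarity in [0, 1]; 1.0 means identical strings."""
    total = len(a) + len(b)
    if total == 0:
        return 1.0
    return (total - indel_distance(a, b)) / total


def syntactic_topics(text: str, cso_topics: list[str], threshold: float = 0.94) -> set[str]:
    """Select CSO topics whose similarity with at least one n-gram is >= threshold."""
    # Keep hyphens so that "knowledge-based" stays a single token.
    tokens = re.findall(r"[a-z0-9-]+", text.lower())
    found = set()
    for gram in ngrams(tokens):
        for topic in cso_topics:
            if similarity(gram, topic) >= threshold:
                found.add(topic)
    return found


topics = ["database", "knowledge based systems", "semantic web"]
print(sorted(syntactic_topics("We study databases and knowledge-based systems.", topics)))
# → ['database', 'knowledge based systems']
```

Under this normalization "database" versus "databases" scores 16/17 ≈ 0.941, just above the 0.94 threshold, which illustrates how a threshold this close to 1 can still absorb plurals and hyphenation while rejecting most other variants.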
Semantic Module
Word Embedding model
• We used titles and abstracts from 4.5M papers in Computer Science
• Pre-processed text:
• Topic replacement – “digital libraries” → “digital_libraries”
• Collocation analysis – “highest_accuracies”, “highly_cited_journals”
• Trained a word2vec model with the following parameters:
• method: skip-gram
• embedding size: 128
• window size: 10
• negative samples: 5
• max iterations: 5
• min-count cutoff: 10
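The topic-replacement step and the training parameters can be sketched as follows. This is an illustrative sketch, not the authors' pipeline: the collocation analysis is omitted, the topic list is assumed to be given, and the mapping of the slide's parameters onto gensim 4.x keyword names (`sg`, `vector_size`, `window`, `negative`, `epochs`, `min_count`) is our own assumption about how one might reproduce the setup.

```python
from __future__ import annotations

import re

# Training parameters from the slide, expressed with gensim 4.x keyword names.
# If gensim is installed, one could train with:
#   from gensim.models import Word2Vec
#   model = Word2Vec(sentences, **W2V_PARAMS)
W2V_PARAMS = {
    "sg": 1,             # 1 = skip-gram
    "vector_size": 128,  # embedding size
    "window": 10,        # window size
    "negative": 5,       # negative samples
    "epochs": 5,         # max iterations
    "min_count": 10,     # min-count cutoff
}


def replace_topics(text: str, topics: list[str]) -> str:
    """Join multi-word topics with underscores so word2vec treats them as one token."""
    out = text.lower()
    # Longest topics first, so "semantic web services" wins over "semantic web".
    for topic in sorted(topics, key=len, reverse=True):
        if " " not in topic:
            continue
        pattern = r"\b" + re.escape(topic) + r"\b"
        out = re.sub(pattern, topic.replace(" ", "_"), out)
    return out


print(replace_topics("Search in Digital Libraries and the Semantic Web",
                     ["digital libraries", "semantic web"]))
# → search in digital_libraries and the semantic_web
```

Replacing the longest topics first keeps nested topics from being split: once "semantic web services" becomes one token, the shorter pattern "semantic web" no longer matches inside it.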
Semantic Module
Entity Extraction
• POS tagger and grammar-based chunk parser with the pattern <JJ.*>*<NN.*>+ (zero or more adjectives followed by one or more nouns)
• e.g., “digital libraries”
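The chunk pattern <JJ.*>*<NN.*>+ can be illustrated without any NLP library. A real pipeline would take tags from a POS tagger (for instance NLTK's `pos_tag`) and could pass the same grammar to NLTK's `RegexpParser`; to keep this sketch self-contained, the Penn Treebank tags below are supplied by hand and the pattern is matched by a small hypothetical scanner, `chunk_noun_phrases`.

```python
from __future__ import annotations


def chunk_noun_phrases(tagged: list[tuple[str, str]]) -> list[str]:
    """Extract chunks matching <JJ.*>*<NN.*>+ from (word, POS tag) pairs."""
    chunks = []
    i = 0
    while i < len(tagged):
        j = i
        # Zero or more adjectives (JJ, JJR, JJS)...
        while j < len(tagged) and tagged[j][1].startswith("JJ"):
            j += 1
        k = j
        # ...followed by one or more nouns (NN, NNS, NNP, NNPS).
        while k < len(tagged) and tagged[k][1].startswith("NN"):
            k += 1
        if k > j:
            chunks.append(" ".join(word for word, _ in tagged[i:k]))
            i = k
        else:
            i += 1
    return chunks


sentence = [("we", "PRP"), ("study", "VBP"), ("digital", "JJ"), ("libraries", "NNS"),
            ("and", "CC"), ("semantic", "JJ"), ("web", "NN"), ("services", "NNS")]
print(chunk_noun_phrases(sentence))
# → ['digital libraries', 'semantic web services']
```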
CSO concept identification
• For each resulting n-gram, selects all CSO topics found among its top-10 most similar words in the word2vec model (with cosine similarity > 0.7)
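The concept-identification rule (top-10 neighbours, cosine similarity above 0.7, kept only if they are CSO topics) can be sketched with toy vectors. With a trained gensim model the neighbour list would typically come from `model.wv.most_similar(token, topn=10)`; here cosine similarity is computed by hand so the example runs without dependencies. The sketch assumes each n-gram is looked up as a single underscore-joined token and does not show how out-of-vocabulary n-grams are handled; the vectors and the helper `identify_topics` are hypothetical.

```python
from __future__ import annotations

import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def identify_topics(token: str, vectors: dict[str, list[float]], cso_topics: set[str],
                    top_k: int = 10, min_sim: float = 0.7) -> set[str]:
    """CSO topics among the top_k nearest neighbours of `token` with similarity > min_sim."""
    if token not in vectors:
        return set()
    query = vectors[token]
    neighbours = sorted(
        ((word, cosine(query, vec)) for word, vec in vectors.items() if word != token),
        key=lambda pair: pair[1],
        reverse=True,
    )[:top_k]
    return {word for word, sim in neighbours if sim > min_sim and word in cso_topics}


# Toy 2-d "embeddings", purely illustrative.
vectors = {
    "digital_libraries": [1.0, 0.1],
    "digital_library": [0.95, 0.15],
    "information_retrieval": [0.8, 0.6],
    "cooking": [0.0, 1.0],
}
topics = {"digital_library", "information_retrieval", "cooking"}
print(sorted(identify_topics("digital_libraries", vectors, topics)))
# → ['digital_library', 'information_retrieval']
```

"cooking" is a CSO topic in this toy set but is discarded because its similarity (about 0.10) falls below the 0.7 cutoff.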
Semantic Module
Concept ranking
• We assign a score to each identified topic:
• Frequency – number of times it was inferred
• Diversity – number of unique n-grams from which it was inferred
Concept Selection
• Elbow method
CSO topic | Score
domain ontologies | 40
semantic web | 40
ontology learning | 40
data mining | 40
heterogeneous resources | 24
semantics | 24
world wide web | 10
network architecture | 6
scholarly communication | 6
ontology matching | 6
… | …
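The ranking and selection steps can be sketched as below. The slide names the two signals but not how they are combined, so this sketch assumes, for illustration only, that the score is frequency × diversity. The elbow is located with one common heuristic (the ranked point farthest from the straight line joining the first and last scores); the exact elbow rule used by the classifier is not stated on the slide. Function names are hypothetical.

```python
from __future__ import annotations

from collections import Counter, defaultdict


def rank_topics(inferences: list[tuple[str, str]]) -> dict[str, int]:
    """Score topics from (source n-gram, inferred topic) pairs.

    frequency = number of times the topic was inferred
    diversity = number of distinct n-grams it was inferred from
    score     = frequency * diversity   (assumed combination)
    """
    frequency = Counter(topic for _, topic in inferences)
    sources: dict[str, set[str]] = defaultdict(set)
    for ngram, topic in inferences:
        sources[topic].add(ngram)
    return {topic: frequency[topic] * len(sources[topic]) for topic in frequency}


def elbow_select(scores: dict[str, int]) -> list[str]:
    """Keep topics up to the point farthest from the line joining first and last score."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    if len(ranked) < 3:
        return [topic for topic, _ in ranked]
    x1, y1 = 0, ranked[0][1]
    x2, y2 = len(ranked) - 1, ranked[-1][1]

    def distance(i: int) -> float:
        # Unnormalized point-to-line distance; the constant denominator
        # does not change which index is farthest.
        return abs((y2 - y1) * i - (x2 - x1) * ranked[i][1] + x2 * y1 - y2 * x1)

    cut = max(range(len(ranked)), key=distance)
    return [topic for topic, _ in ranked[:cut + 1]]


table = {"domain ontologies": 40, "semantic web": 40, "ontology learning": 40, "data mining": 40,
         "heterogeneous resources": 24, "semantics": 24, "world wide web": 10,
         "network architecture": 6, "scholarly communication": 6, "ontology matching": 6}
print(elbow_select(table))
# → ['domain ontologies', 'semantic web', 'ontology learning', 'data mining']
```

Applied to the slide's example scores (truncated at the ellipsis), this particular elbow rule keeps the four topics scored 40; a different elbow heuristic could cut elsewhere.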
Post Processing
Combination of the syntactic and semantic module outputs
Semantic enhancement
• We use the superTopicOf relationship in CSO to enhance the output set
• E.g., if “machine learning” is detected, “artificial intelligence” is added as well
• Provides wider context for the analysed paper
• Enables analytics on high-level abstract topics (e.g., digital libraries)
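Post-processing can be sketched as a set union followed by an upward walk over superTopicOf edges. Two assumptions are made here: "combination" is taken to mean the union of the two modules' outputs, and enhancement adds all ancestors transitively (adding only direct super-topics would be an equally plausible reading of the slide). The tiny hierarchy and the function names are hypothetical.

```python
from __future__ import annotations


def enhance(topics: set[str], super_topic_of: dict[str, set[str]]) -> set[str]:
    """Add every ancestor reachable through superTopicOf (parent -> children) edges."""
    # Invert the relation into child -> parents for upward traversal.
    parents: dict[str, set[str]] = {}
    for parent, children in super_topic_of.items():
        for child in children:
            parents.setdefault(child, set()).add(parent)
    result = set(topics)
    stack = list(topics)
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in result:
                result.add(parent)
                stack.append(parent)
    return result


def post_process(syntactic: set[str], semantic: set[str],
                 super_topic_of: dict[str, set[str]]) -> set[str]:
    """Combine both modules' outputs (union assumed) and add their super-topics."""
    return enhance(syntactic | semantic, super_topic_of)


hierarchy = {
    "artificial intelligence": {"machine learning"},
    "machine learning": {"neural networks"},
}
print(sorted(post_process({"neural networks"}, {"semantic web"}, hierarchy)))
# → ['artificial intelligence', 'machine learning', 'neural networks', 'semantic web']
```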
Evaluation
• We evaluated the CSO Classifier against alternative approaches:
• TF-IDF
• LDA (with an increasing number of topics)
• previous versions of the CSO Classifier
• The evaluation used a gold standard of 70 papers:
Field | # papers
Semantic Web | 23
Natural Language Processing | 23
Data Mining | 24
Total | 70
Gold Standard
• We asked 21 domain experts to annotate 10 papers each, so that every paper was annotated by three experts
• For each paper, candidate topics were produced by 3 classifiers:
• Syntactic module
• Semantic module
• Window-based word2vec classifier
• Experts assessed whether each candidate topic was relevant to the annotated paper
• On average, experts selected 18 relevant topics out of 42 candidate topics per paper (average Fleiss’ kappa: 0.45)
• The gold standard was built by majority vote: a topic is kept if at least two of the three experts marked it as relevant
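The agreement statistic and the majority rule above can be made concrete. The function below computes Fleiss' kappa for one set of items rated by the same number of raters, using the standard formula κ = (P̄ − P̄e) / (1 − P̄e); how the study aggregated kappa across papers (the slide reports an average of 0.45) is not stated, and the votes used here are invented for illustration. Function names are hypothetical.

```python
from __future__ import annotations


def fleiss_kappa(counts: list[list[int]]) -> float:
    """Fleiss' kappa; counts[i][j] = number of raters assigning item i to category j.

    Every row must sum to the same number of raters n (here n = 3).
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    # Observed agreement per item, then averaged over items.
    p_items = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
               for row in counts]
    p_bar = sum(p_items) / n_items
    # Expected agreement from the overall category proportions.
    n_categories = len(counts[0])
    p_cat = [sum(row[j] for row in counts) / (n_items * n_raters) for j in range(n_categories)]
    p_e = sum(p * p for p in p_cat)
    return (p_bar - p_e) / (1 - p_e)


def majority_gold_standard(votes: dict[str, int], n_raters: int = 3) -> set[str]:
    """Keep a candidate topic if more than half of the raters marked it relevant."""
    return {topic for topic, yes in votes.items() if yes > n_raters / 2}


# Columns: [relevant, not relevant] for four hypothetical candidate topics.
print(round(fleiss_kappa([[3, 0], [0, 3], [2, 1], [1, 2]]), 3))  # → 0.333
print(sorted(majority_gold_standard({"semantic web": 3, "ontologies": 2, "databases": 1})))
# → ['ontologies', 'semantic web']
```

With three raters, "more than half" is the same as "at least two", which is the majority rule described on the slide.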
Evaluation
Classifier | Description | Precision | Recall | F1
TF-IDF | TF-IDF | 16.7% | 24.0% | 19.7%
TF-IDF-M | TF-IDF mapped to CSO concepts | 40.4% | 24.1% | 30.1%
LDA100 | LDA with 100 topics | 5.9% | 11.9% | 7.9%
LDA500 | LDA with 500 topics | 4.2% | 12.5% | 6.3%
LDA1000 | LDA with 1000 topics | 3.8% | 5.0% | 4.3%
LDA100-M | LDA with 100 topics mapped to CSO | 9.4% | 19.3% | 12.6%
LDA500-M | LDA with 500 topics mapped to CSO | 9.6% | 21.2% | 13.2%
LDA1000-M | LDA with 1000 topics mapped to CSO | 12.0% | 11.5% | 11.7%
W2V-W | W2V on windows of words | 41.2% | 16.7% | 23.8%
STM | Syntactic module, msm = 1 | 80.8% | 58.2% | 67.6%
SYN | Syntactic module, msm = 0.94 | 78.3% | 63.8% | 70.3%
SEM | Semantic module | 70.8% | 72.2% | 71.5%
INT | Intersection of SYN and SEM | 79.3% | 59.1% | 67.7%
CSO-C | The CSO Classifier | 73.0% | 75.3% | 74.1%
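The F1 column is the harmonic mean of precision and recall, F1 = 2PR / (P + R). As a quick check, the sketch below recomputes it for three rows of the table from the rounded percentages.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


for name, p, r in [("CSO-C", 73.0, 75.3), ("SEM", 70.8, 72.2), ("SYN", 78.3, 63.8)]:
    print(name, round(f1(p, r), 1))
# CSO-C 74.1
# SEM 71.5
# SYN 70.3
```

A few rows (for example TF-IDF-M and STM) come out 0.1 higher when recomputed this way, presumably because the reported F1 was computed from unrounded precision and recall.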
CSO Classifier adoption so far …
Since its introduction, several industrial and academic partners have started processing their data with the CSO Classifier:
Industry
• Springer Nature
• Dimensions.ai
Universities
• CSET - Georgetown University (USA)
• FIZ Karlsruhe (DE)
• Paris 13 (FR)
• University of Trento (IT)
• University of Campinas (BR)
Smart Topic Miner
The Smart Topic Miner* (STM) is a semantic
application that supports the Springer Nature
editorial team in classifying scholarly
publications in the field of Computer Science.
Try me: http://stm-demo.kmi.open.ac.uk
*Angelo Salatino, Francesco Osborne, Aliaksandr Birukou, and
Enrico Motta. "Improving Editorial Workflow and Metadata
Quality at Springer Nature." In ISWC 2019. Auckland, New
Zealand.
Smart Topic Miner
Since adopting STM, Springer Nature has seen three main benefits:
• halved the time needed to classify a proceedings book (30 min → 10-15 min)
• reduced costs by 75%
• better classification has increased the discoverability of their books (+9M downloads in 3 years)
[Chart: Average number of yearly downloads for books in SpringerLink, 2009-2018. Series: downloads (CS Proceedings), expected downloads (CS Proceedings), downloads (CS Proceedings) with STM, downloads (other books in CS), downloads (overall).]
Dimensions.ai
• Fields of Research (FoR) from ANZSRC
Issues:
• dates back to 2008
• coarse-grained
CSO Classifier Web Demo
Try me: https://cso.kmi.open.ac.uk/classify
Future Work
• Working on a better-performing classifier
• Using up-to-date NLP technologies: ELMo, BERT
• Large-scale evaluation (a high number of papers and different fields)
• Method for classifying papers when there is limited data (e.g., using citations)
• Collaboration with FIZ Karlsruhe (Leibniz)
• Creating graph embeddings to support the current word2vec model
• Collaboration with University of Trento
• Using CSO Classifier on biomedical data
Thank you
Angelo Salatino
angelo.salatino@open.ac.uk
@angelosalatino
https://salatino.org
… and get in touch
References
• Angelo A. Salatino, Francesco Osborne, Aliaksandr Birukou, and Enrico Motta. "Improving Editorial Workflow and Metadata Quality at Springer Nature." In ISWC 2019, Auckland, New Zealand.
• Angelo A. Salatino, Thiviyan Thanapalasingam, Andrea Mannocci, Francesco Osborne, and Enrico Motta. "The Computer Science Ontology: A Large-Scale Taxonomy of Research Areas." In ISWC 2018, Monterey, CA (USA).
• Francesco Osborne and Enrico Motta. "Klink-2: integrating multiple web sources to generate semantic topic networks." In ISWC 2015, Bethlehem, PA (USA).

Thermal Engineering -unit - III & IV.pptDineshKumar4165
 
💚Trustworthy Call Girls Pune Call Girls Service Just Call 🍑👄6378878445 🍑👄 Top...
💚Trustworthy Call Girls Pune Call Girls Service Just Call 🍑👄6378878445 🍑👄 Top...💚Trustworthy Call Girls Pune Call Girls Service Just Call 🍑👄6378878445 🍑👄 Top...
💚Trustworthy Call Girls Pune Call Girls Service Just Call 🍑👄6378878445 🍑👄 Top...vershagrag
 
Employee leave management system project.
Employee leave management system project.Employee leave management system project.
Employee leave management system project.Kamal Acharya
 
Standard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power PlayStandard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power PlayEpec Engineered Technologies
 
Online electricity billing project report..pdf
Online electricity billing project report..pdfOnline electricity billing project report..pdf
Online electricity billing project report..pdfKamal Acharya
 
AIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech studentsAIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech studentsvanyagupta248
 
Hostel management system project report..pdf
Hostel management system project report..pdfHostel management system project report..pdf
Hostel management system project report..pdfKamal Acharya
 

Último (20)

School management system project Report.pdf
School management system project Report.pdfSchool management system project Report.pdf
School management system project Report.pdf
 
Thermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VThermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - V
 
Double Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueDouble Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torque
 
Design For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startDesign For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the start
 
Thermal Engineering Unit - I & II . ppt
Thermal Engineering  Unit - I & II . pptThermal Engineering  Unit - I & II . ppt
Thermal Engineering Unit - I & II . ppt
 
Orlando’s Arnold Palmer Hospital Layout Strategy-1.pptx
Orlando’s Arnold Palmer Hospital Layout Strategy-1.pptxOrlando’s Arnold Palmer Hospital Layout Strategy-1.pptx
Orlando’s Arnold Palmer Hospital Layout Strategy-1.pptx
 
457503602-5-Gas-Well-Testing-and-Analysis-pptx.pptx
457503602-5-Gas-Well-Testing-and-Analysis-pptx.pptx457503602-5-Gas-Well-Testing-and-Analysis-pptx.pptx
457503602-5-Gas-Well-Testing-and-Analysis-pptx.pptx
 
Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...
Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...
Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...
 
scipt v1.pptxcxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx...
scipt v1.pptxcxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx...scipt v1.pptxcxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx...
scipt v1.pptxcxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx...
 
Digital Communication Essentials: DPCM, DM, and ADM .pptx
Digital Communication Essentials: DPCM, DM, and ADM .pptxDigital Communication Essentials: DPCM, DM, and ADM .pptx
Digital Communication Essentials: DPCM, DM, and ADM .pptx
 
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptxHOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
 
Unleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leapUnleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leap
 
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
COST-EFFETIVE  and Energy Efficient BUILDINGS ptxCOST-EFFETIVE  and Energy Efficient BUILDINGS ptx
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
 
Thermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptThermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.ppt
 
💚Trustworthy Call Girls Pune Call Girls Service Just Call 🍑👄6378878445 🍑👄 Top...
💚Trustworthy Call Girls Pune Call Girls Service Just Call 🍑👄6378878445 🍑👄 Top...💚Trustworthy Call Girls Pune Call Girls Service Just Call 🍑👄6378878445 🍑👄 Top...
💚Trustworthy Call Girls Pune Call Girls Service Just Call 🍑👄6378878445 🍑👄 Top...
 
Employee leave management system project.
Employee leave management system project.Employee leave management system project.
Employee leave management system project.
 
Standard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power PlayStandard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power Play
 
Online electricity billing project report..pdf
Online electricity billing project report..pdfOnline electricity billing project report..pdf
Online electricity billing project report..pdf
 
AIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech studentsAIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech students
 
Hostel management system project report..pdf
Hostel management system project report..pdfHostel management system project report..pdf
Hostel management system project report..pdf
 

The CSO Classifier: Ontology-Driven Detection of Research Topics in Scholarly Articles

  • 1. The CSO Classifier: Ontology-Driven Detection of Research Topics in Scholarly Articles
      Angelo A. Salatino, Francesco Osborne, Thiviyan Thanapalasingam, Enrico Motta (@angelosalatino)
      Knowledge Media Institute, The Open University, United Kingdom
  • 3. Classifying Research Papers with their Topics
      Annotating research papers allows us to:
      • categorise proceedings in digital libraries
      • semantically enhance the metadata of scientific publications
      • generate recommendations
      • produce smart analytics
      • detect research trends
      • …
  • 4. Classifying Research Papers with their Topics
      1) Topic detection methods
         • Clustering approaches based on citations, titles and keywords
         • Topic models
            • Latent Dirichlet Allocation (LDA)
            • Author-topic models
         • Supervised classifiers
      2) Vocabulary-driven methods
         • Computing Classification System (CCS)
         • JEL Classification System
         • Australian and New Zealand Standard Research Classification (ANZSRC)
  • 6. The Computer Science Ontology
      • Ontology of research areas*, automatically generated by applying the Klink-2** algorithm to a dataset of 16 million publications, mainly in Computer Science
      • The current version of CSO includes 14K topics and 143K relationships
      • Main roots include Computer Science, Linguistics, Mathematics, Geometry, Semantics, and so on
      • Download CSO from https://cso.kmi.open.ac.uk
      * Angelo A. Salatino, Thiviyan Thanapalasingam, Andrea Mannocci, Francesco Osborne, and Enrico Motta. "The Computer Science Ontology: A Large-Scale Taxonomy of Research Areas." In ISWC 2018, Monterey, CA (USA).
      ** Francesco Osborne and Enrico Motta. "Klink-2: Integrating Multiple Web Sources to Generate Semantic Topic Networks." In ISWC 2015, Bethlehem, PA (USA).
  • 8. Syntactic Module
      • We split the text into unigrams, bigrams and trigrams
      • For each n-gram, we measure its Levenshtein similarity with the topics in CSO
      • We select the CSO topics whose similarity with an n-gram is greater than or equal to 0.94
      • This helps handle plurals and hyphenated topics, such as:
         • “knowledge based systems” and “knowledge-based systems”
         • “database” and “databases”
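A minimal Python sketch of the matching step described on this slide follows. The slide does not state how the Levenshtein distance is turned into a similarity score; the sketch assumes an indel-based normalisation, 2·LCS(a, b) / (|a| + |b|) (the form computed by, e.g., `Levenshtein.ratio` in the python-Levenshtein package), under which both example pairs reach the 0.94 threshold. The function names and the three-topic vocabulary are hypothetical illustrations, not the `cso-classifier` API.

```python
# Sketch of the syntactic module: n-gram extraction + string similarity.
# Assumption: similarity is the indel-normalised ratio 2*LCS / (|a| + |b|).


def ngrams(tokens: list[str], n_max: int = 3) -> list[str]:
    """Return all contiguous n-grams of length 1..n_max, joined by spaces."""
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Indel-based similarity in [0, 1]; 1.0 means identical strings."""
    if not a and not b:
        return 1.0
    return 2 * lcs_length(a, b) / (len(a) + len(b))


def syntactic_match(text: str, topics: list[str], threshold: float = 0.94) -> set[str]:
    """Return the topics whose similarity with some n-gram of the text is >= threshold."""
    tokens = text.lower().split()
    found = set()
    for gram in ngrams(tokens):
        for topic in topics:
            if similarity(gram, topic) >= threshold:
                found.add(topic)
    return found


# Hypothetical mini-vocabulary standing in for the 14K CSO topics.
TOPICS = ["knowledge based systems", "database", "ontology"]
print(syntactic_match("We study knowledge-based systems and databases", TOPICS))
```

With a plain edit distance divided by the longer string's length, “database” vs “databases” would score 1 - 1/9 ≈ 0.89 and fall below 0.94, so the choice of normalisation matters for the plural case.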
  • 10. Semantic Module
      Word embedding model
      • We used titles and abstracts from 4.5M papers in Computer Science
      • Pre-processed text:
         • Topic replacement: “digital libraries” → “digital_libraries”
         • Collocation analysis: “highest_accuracies”, “highly_cited_journals”
      • Trained word2vec model:
         method: skip-gram | embedding size: 128 | window size: 10 | negative: 5 | max iterations: 5 | min-count cutoff: 10
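The topic-replacement step can be sketched as a greedy longest-match merge, so that each multi-word CSO topic becomes a single token that word2vec learns one vector for. This is an illustrative sketch, not the authors' pre-processing code; `replace_topics` is a hypothetical helper, and the parameter names in `W2V_PARAMS` assume gensim 4.x's `Word2Vec` constructor, since the slide does not say which library was used.

```python
# Sketch of "topic replacement": multi-word CSO topics are merged into single
# tokens so that word2vec learns one vector per topic.

# Hyperparameters from the slide. Keyword names follow gensim 4.x's Word2Vec
# constructor (assumption: the slide does not name the library), e.g.
#   gensim.models.Word2Vec(sentences, **W2V_PARAMS)
W2V_PARAMS = {
    "sg": 1,             # skip-gram
    "vector_size": 128,  # embedding size
    "window": 10,        # window size
    "negative": 5,       # negative samples
    "epochs": 5,         # max iterations
    "min_count": 10,     # min-count cutoff
}


def replace_topics(tokens: list[str], topics: list[str], max_len: int = 3) -> list[str]:
    """Greedily merge the longest matching multi-word topic into one underscore-joined token."""
    topic_set = {tuple(t.split()) for t in topics}
    out = []
    i = 0
    while i < len(tokens):
        for n in range(max_len, 1, -1):  # try trigrams first, then bigrams
            candidate = tuple(tokens[i:i + n])
            if len(candidate) == n and candidate in topic_set:
                out.append("_".join(candidate))
                i += n
                break
        else:  # no multi-word topic starts here: keep the token as is
            out.append(tokens[i])
            i += 1
    return out


sentence = "we study digital libraries and the semantic web".split()
print(replace_topics(sentence, ["digital libraries", "semantic web"]))
# ['we', 'study', 'digital_libraries', 'and', 'the', 'semantic_web']
```

Trying longer topics first means a trigram topic wins over any bigram topic nested inside it.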
  • 11. Semantic Module
      Entity extraction
      • POS tagger and grammar-based chunk parser with the pattern <JJ.*>*<NN.*>+ (zero or more adjectives followed by one or more nouns), e.g. “digital libraries”
      CSO concept identification
      • Selects all CSO topics found among the top-10 most similar words of each resulting n-gram (with cosine similarity > 0.7)
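Both steps on this slide can be sketched in plain Python: a chunker for the <JJ.*>*<NN.*>+ pattern over already-tagged tokens, and a nearest-neighbour filter that keeps CSO topics among the top-10 neighbours with cosine similarity above 0.7. In practice a tagger such as NLTK's `pos_tag` and gensim's `most_similar(..., topn=10)` could supply these pieces; the functions below, the 2-d toy vectors and the tiny topic set are hypothetical stand-ins, not the classifier's code.

```python
import math

# Sketch of entity extraction + CSO concept identification.
# Input is assumed to be already POS-tagged with Penn Treebank tags.


def chunk_noun_phrases(tagged: list[tuple[str, str]]) -> list[str]:
    """Find maximal spans matching <JJ.*>*<NN.*>+ (adjectives, then one or more nouns)."""
    chunks = []
    i, n = 0, len(tagged)
    while i < n:
        j = i
        while j < n and tagged[j][1].startswith("JJ"):
            j += 1  # consume zero or more adjectives
        k = j
        while k < n and tagged[k][1].startswith("NN"):
            k += 1  # consume one or more nouns
        if k > j:
            chunks.append(" ".join(word for word, _ in tagged[i:k]))
            i = k
        else:
            i += 1  # no noun follows: restart from the next token
    return chunks


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def identify_concepts(term, embeddings, cso_topics, top_n=10, min_sim=0.7):
    """Return CSO topics among the top_n neighbours of term with cosine > min_sim."""
    key = term.replace(" ", "_")  # multi-word terms are single tokens in the model
    if key not in embeddings:
        return set()
    neighbours = sorted(
        ((cosine(embeddings[key], vec), word) for word, vec in embeddings.items() if word != key),
        reverse=True,
    )[:top_n]
    return {word for sim, word in neighbours if sim > min_sim and word in cso_topics}


tagged = [("we", "PRP"), ("study", "VBP"), ("digital", "JJ"), ("libraries", "NNS"),
          ("and", "CC"), ("large", "JJ"), ("scholarly", "JJ"), ("knowledge", "NN"), ("graphs", "NNS")]
print(chunk_noun_phrases(tagged))
# ['digital libraries', 'large scholarly knowledge graphs']

# Hypothetical 2-d vectors; the model on the previous slide has 128 dimensions.
toy = {"digital_libraries": [1.0, 0.0], "scholarly_communication": [0.9, 0.1],
       "bookshelf": [0.8, 0.6], "semantic_web": [0.0, 1.0]}
print(identify_concepts("digital libraries", toy, {"scholarly_communication", "semantic_web"}))
# {'scholarly_communication'}
```

Here “bookshelf” clears the 0.7 threshold but is dropped because it is not a CSO topic, which is the filtering role the ontology plays in this step.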
  • 12. Semantic Module
      Concept ranking
      • We assign a score to each identified topic, based on:
         • Frequency: the number of times it was inferred
         • Diversity: the number of unique n-grams from which it was inferred
      Concept selection
      • Elbow method
         CSO topic                 | Score
         domain ontologies         | 40
         semantic web              | 40
         ontology learning         | 40
         data mining               | 40
         heterogeneous resources   | 24
         semantics                 | 24
         world wide web            | 10
         network architecture      | 6
         scholarly communication   | 6
         ontology matching         | 6
         …                         | …
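A sketch of ranking and selection under two stated assumptions: the score is frequency multiplied by diversity (the slide names both factors but not how they are combined), and the elbow is taken as the point of the sorted score curve farthest from the straight line joining its first and last points, one common formulation of elbow detection. The example applies it to the ten scores visible on the slide; the slide's list is truncated, so the resulting cut is illustrative only.

```python
from collections import defaultdict

# Sketch of concept ranking and selection.
# Assumptions: score = frequency * diversity, and the elbow is the point of the
# sorted score curve farthest from the line joining its first and last points.


def rank_topics(inferences: list[tuple[str, str]]) -> list[tuple[str, int]]:
    """inferences: (cso_topic, source_ngram) pairs, one per time a topic was inferred."""
    frequency = defaultdict(int)
    sources = defaultdict(set)
    for topic, ngram in inferences:
        frequency[topic] += 1
        sources[topic].add(ngram)
    scores = {t: frequency[t] * len(sources[t]) for t in frequency}
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


def elbow_index(scores: list[float]) -> int:
    """Index of the point farthest from the chord between the first and last scores."""
    n = len(scores)
    if n < 3:
        return n - 1  # too few points for an elbow: keep everything
    x1, y1, x2, y2 = 0, scores[0], n - 1, scores[-1]
    best_i, best_d = 0, -1.0
    for i, y in enumerate(scores):
        # Numerator of the point-to-line distance; the denominator is identical
        # for every point, so it does not change the argmax.
        d = abs((y2 - y1) * i - (x2 - x1) * y + x2 * y1 - y2 * x1)
        if d > best_d:
            best_i, best_d = i, d
    return best_i


def select_topics(ranked: list[tuple[str, int]]) -> list[str]:
    """Keep every topic up to and including the elbow."""
    cut = elbow_index([score for _, score in ranked])
    return [topic for topic, _ in ranked[: cut + 1]]


slide_scores = [("domain ontologies", 40), ("semantic web", 40), ("ontology learning", 40),
                ("data mining", 40), ("heterogeneous resources", 24), ("semantics", 24),
                ("world wide web", 10), ("network architecture", 6),
                ("scholarly communication", 6), ("ontology matching", 6)]
print(select_topics(slide_scores))
# ['domain ontologies', 'semantic web', 'ontology learning', 'data mining']
```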
  • 14. Post-Processing
      Combination of the outputs of the syntactic and semantic modules
      Semantic enhancement
      • We use the superTopicOf relationship to enhance the output set
      • E.g., if “machine learning” is found, “artificial intelligence” is added as well
      • Provides wider context for the analysed paper
      • Enables analytics on high-level, abstract topics (e.g., digital libraries)
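The post-processing step can be sketched as a set union followed by a walk up the superTopicOf hierarchy. Two assumptions here are not stated on the slide: that the module outputs are combined by union, and that enhancement follows superTopicOf transitively up to the roots (the slide's example shows a single step). The ontology fragment and function names are hypothetical.

```python
# Sketch of post-processing: merge module outputs, then add super-topics.
# Assumptions: outputs are combined by set union, and enhancement follows
# superTopicOf transitively (the slide only shows one step).


def build_super_topics(super_topic_of: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Invert (parent, child) superTopicOf pairs into child -> {parents}."""
    parents: dict[str, set[str]] = {}
    for parent, child in super_topic_of:
        parents.setdefault(child, set()).add(parent)
    return parents


def enhance(topics: set[str], parents: dict[str, set[str]]) -> set[str]:
    """Return the topics plus all their ancestors along superTopicOf."""
    result = set(topics)
    stack = list(topics)
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in result:
                result.add(parent)
                stack.append(parent)
    return result


# Hypothetical fragment of the ontology: (parent, child) pairs.
pairs = [
    ("computer science", "artificial intelligence"),
    ("artificial intelligence", "machine learning"),
    ("machine learning", "neural networks"),
]
syntactic = {"machine learning"}
semantic = {"neural networks"}
combined = syntactic | semantic
print(sorted(enhance(combined, build_super_topics(pairs))))
# ['artificial intelligence', 'computer science', 'machine learning', 'neural networks']
```

The `result` check also stops the walk from looping if the hierarchy ever contained a cycle.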
  • 15. Evaluation
      • We evaluated the CSO Classifier against other state-of-the-art algorithms:
         • TF-IDF
         • LDA (with an increasing number of topics)
         • previous versions of the CSO Classifier
      • Using a gold standard of 70 papers:
         Field                         | # papers
         Semantic Web                  | 23
         Natural Language Processing   | 23
         Data Mining                   | 24
         Total                         | 70
  • 16. Gold Standard
      • We asked 21 domain experts to annotate 10 papers each (so each paper was annotated by three experts)
      • Candidate topics for each paper were produced by 3 classifiers:
         • Syntactic module
         • Semantic module
         • Window-based word2vec classifier
      • Experts were asked to assess whether each candidate topic was relevant to the annotated paper
      • On average, experts selected 18 of the 42 candidate topics per paper (average Fleiss’ kappa: 0.45)
      • The gold standard was built using a majority-rule approach
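A sketch of the two computations this slide relies on: building a paper's gold standard by majority vote over its three annotators, and Fleiss' kappa for inter-annotator agreement. The majority rule is assumed to mean "marked relevant by more than half of the paper's annotators"; the expert sets and the rating counts in the example are hypothetical, and the resulting kappa has nothing to do with the 0.45 reported on the slide.

```python
# Sketch of gold-standard construction and inter-annotator agreement.
# Assumption: a topic enters the gold standard when more than half of the
# experts who annotated the paper marked it as relevant.


def majority_gold(annotations: list[set[str]]) -> set[str]:
    """annotations: one set of relevant topics per expert, for a single paper."""
    votes: dict[str, int] = {}
    for relevant in annotations:
        for topic in relevant:
            votes[topic] = votes.get(topic, 0) + 1
    return {t for t, v in votes.items() if v > len(annotations) / 2}


def fleiss_kappa(counts: list[list[int]]) -> float:
    """counts[i][j]: number of raters assigning item i to category j.

    Every item must be rated by the same number of raters n >= 2.
    """
    N = len(counts)
    n = sum(counts[0])
    k = len(counts[0])
    # Share of all assignments that fall in each category.
    p = [sum(row[j] for row in counts) / (N * n) for j in range(k)]
    # Per-item agreement: fraction of rater pairs that agree.
    P = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts]
    P_bar = sum(P) / N
    P_e = sum(pj * pj for pj in p)
    return (P_bar - P_e) / (1 - P_e)


experts = [{"semantic web", "ontologies"}, {"semantic web"}, {"semantic web", "ontologies", "nlp"}]
print(sorted(majority_gold(experts)))  # ['ontologies', 'semantic web']

# Four candidate topics, three raters, categories (relevant, not relevant).
print(round(fleiss_kappa([[3, 0], [0, 3], [2, 1], [1, 2]]), 3))  # 0.333
```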
  • 17. Evaluation
      Classifier | Description                          | Prec.  | Rec.   | F1
      TF-IDF     | TF-IDF                               | 16.7%  | 24.0%  | 19.7%
      TF-IDF-M   | TF-IDF mapped to CSO concepts        | 40.4%  | 24.1%  | 30.1%
      LDA100     | LDA with 100 topics                  | 5.9%   | 11.9%  | 7.9%
      LDA500     | LDA with 500 topics                  | 4.2%   | 12.5%  | 6.3%
      LDA1000    | LDA with 1000 topics                 | 3.8%   | 5.0%   | 4.3%
      LDA100-M   | LDA with 100 topics mapped to CSO    | 9.4%   | 19.3%  | 12.6%
      LDA500-M   | LDA with 500 topics mapped to CSO    | 9.6%   | 21.2%  | 13.2%
      LDA1000-M  | LDA with 1000 topics mapped to CSO   | 12.0%  | 11.5%  | 11.7%
      W2V-W      | W2V on windows of words              | 41.2%  | 16.7%  | 23.8%
      STM        | Syntactic module, msm=1              | 80.8%  | 58.2%  | 67.6%
      SYN        | Syntactic module, msm=0.94           | 78.3%  | 63.8%  | 70.3%
      SEM        | Semantic module                      | 70.8%  | 72.2%  | 71.5%
      INT        | Intersection of SYN and SEM          | 79.3%  | 59.1%  | 67.7%
      CSO-C      | The CSO Classifier                   | 73.0%  | 75.3%  | 74.1%
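The three columns of this table can be computed per paper by comparing a classifier's predicted topics with the gold standard. The sketch below shows the standard definitions; the slide does not say how per-paper scores were aggregated over the 70 papers, so averaging is left out. As a consistency check, the F1 in the CSO-C row follows from its precision and recall.

```python
# Sketch of the metrics in the table for a single paper: predicted topics vs
# gold-standard topics. How scores were averaged over papers is not stated.


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(predicted: set[str], gold: set[str]) -> tuple[float, float, float]:
    tp = len(predicted & gold)  # topics both predicted and in the gold standard
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall, f1(precision, recall)


print(round(f1(0.730, 0.753), 3))  # 0.741, matching the CSO-C row
print(precision_recall_f1({"a", "b", "c", "d"}, {"a", "b", "e"}))
```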
  • 18. CSO Classifier adoption so far …
      Since its introduction, many industrial and academic partners have started processing their data with the CSO Classifier:
      Industry
      • Springer Nature
      • Dimensions.ai
      Universities
      • CSET - Georgetown University (USA)
      • FIZ Karlsruhe (DE)
      • Paris 13 (FR)
      • University of Trento (IT)
      • University of Campinas (BR)
  • 19. Smart Topic Miner
      The Smart Topic Miner* (STM) is a semantic application that supports the Springer Nature editorial team in classifying scholarly publications in the field of Computer Science.
      Try me: http://stm-demo.kmi.open.ac.uk
      * Angelo Salatino, Francesco Osborne, Aliaksandr Birukou, and Enrico Motta. "Improving Editorial Workflow and Metadata Quality at Springer Nature." In ISWC 2019, Auckland, New Zealand.
  • 20. Smart Topic Miner
      Since adopting STM, Springer Nature has experienced three main benefits:
      • halved the time needed to classify a proceedings book (30 min → 10-15 min)
      • reduced costs by 75%
      • better classification increased the books' discoverability (+9M downloads in 3 years)
      [Figure: average number of yearly downloads for books in SpringerLink, 2009-2018. Series: downloads (CS Proceedings); expected downloads (CS Proceedings); downloads (CS Proceedings) with STM; downloads (other books in CS); downloads (overall).]
  • 21. Dimensions.ai
      Classifies with the Fields of Research (FoR) from ANZSRC. Issues:
      • Dates back to 2008
      • Coarse-grained
  • 22. CSO Classifier Web Demo Try me: https://cso.kmi.open.ac.uk/classify
  • 23. Future Work
      • Working on a better-performing classifier
         • Using up-to-date NLP technologies: ELMo, BERT
         • Large-scale evaluation (a high number of papers from different fields)
         • A method for classifying papers when limited data is available (e.g., using citations)
      • Collaboration with FIZ Karlsruhe (Leibniz)
         • Creating graph embeddings to support the current word2vec model
      • Collaboration with the University of Trento
         • Using the CSO Classifier on biomedical data
  • 24. Thank you
      Angelo Salatino, angelo.salatino@open.ac.uk, @angelosalatino, https://salatino.org
      … and get in touch
      References
      • Angelo Salatino, Francesco Osborne, Aliaksandr Birukou, and Enrico Motta. "Improving Editorial Workflow and Metadata Quality at Springer Nature." In ISWC 2019, Auckland, New Zealand.
      • Angelo A. Salatino, Thiviyan Thanapalasingam, Andrea Mannocci, Francesco Osborne, and Enrico Motta. "The Computer Science Ontology: A Large-Scale Taxonomy of Research Areas." In ISWC 2018, Monterey, CA (USA).
      • Francesco Osborne and Enrico Motta. "Klink-2: Integrating Multiple Web Sources to Generate Semantic Topic Networks." In ISWC 2015, Bethlehem, PA (USA).