SlideShare una empresa de Scribd logo
1 de 13
Descargar para leer sin conexión
http://creativecommons.org/licenses/by-sa/3.0
Towards the automatic identification
of the nature of citations
Angelo Di Iorio – diiorio@cs.unibo.it
Andrea Giovanni Nuzzolese – nuzzoles@cs.unibo.it
Silvio Peroni – essepuntato@cs.unibo.it
Outline
• Interpretation of bibliographic citations
• CiTalO architecture
• Preliminary empirical evaluation
• Conclusions
Citations as tools
• Bibliographic citations can be seen as tools for:
✦ linking research: making pointers to related works, to source of experimental data, to
methods used, etc.
✦ disseminating research: conference proceedings, journals,Web platforms (e.g. blogs,
wikis), Semantic Publishing platforms and projects (e.g. OpenCitation,
OpenBibliography, Lucero)
✦ exploring research: new ways of browsing article through networks of citations (e.g.
CiteWiz, Citation Sensitive In-browser Summariser)
✦ evaluating research: measuring the importance of journals (e.g. impact factor) or the
scientific productivity of authors (e.g. h-index)
• This work is based on the assumption that all these activities can be
radically improved by exploiting the actual function of citations, i.e.
author’s reason for citing a given paper
✦ Could a paper that is cited many times negatively be given a high score?
✦ Could a paper containing several self-citations be given the same score of a paper with
heterogeneous citations?
Intended vs. interpreted
meaning of citation functions
It extends the research outlined in earlier work [3].
Intended meaning
Interpreted meaning
The author of the text
A reader of the text Another reader of the text
“Text as a medium through which an
author transmits knowledge”
“The reader decodes this meaning
and literally ‘make sense’ of it”
“The meaning is not
contained within the words
themselves but in the
minds of the participants.”
Quotations from From Proteins to Fairytales: Directions in Semantic Publishing by Anita De Waard, DOI: 10.1109/MIS.2010.49
I want to convey a
particular meaning through
this text
I recognise
a particular meaning by
reading that text
I recognise
a particular meaning by
reading that text
These three
“meanings” are
not the same
http://wit.istc.cnr.it:8080/tools/citalo
Our setting
It extends the research outlined in earlier work [3].Interpreted meaning
A reader of the text Another reader of the text
I recognise
a particular meaning by
reading that text
I recognise
a particular meaning by
reading that text
I recognise a particular
meaning by reading that text
CiTalO (English translation: cite it) is a tool
that recognises the function of citations by
exploiting Semantic Web technologies
(CiTO, FRED, SPARQL) and NLP techniques
(IMS,AlchemyAPI)
pipeline
It extends the research
outlined in earlier work X.
Ontology
learning
Citation type
extraction
Word-sense
disambiguation
Alignment
to CiTO
Sentiment
analysis
output:
cito:extends
Input: a sentence
containing a
reference to a
bibliographic entity
indicated by an “X”
Derive a logical (i.e.
an OWL ontology)
representation of
the sentence
through FRED
Extract candidate
types for the citation
by looking for
patterns in FRED
output via SPARQL
Gather the sense of
the candidate types
through IMS with
respect to
OntoWordNet
Capture the sentiment
polarity emerging
from the text through
AlchemyAPI
Assign CiTO types
to the citation
through SPARQL
CONSTRUCT
Ontology learning and
citation type extraction
FRED output
It is returned as an OWL ontology in RDF/XML format
SELECT ?type WHERE {
?subj ?prop fred:X; a ?typeTmp.
?typeTmp rdfs:subClassOf+ ?type}
SELECT ?type WHERE {
?subj ?prop fred:X; a ?type}
SELECT ?type WHERE {?subj a dul:Event;
boxer:patient ?patient. ?patient a ?type}
SELECT ?type WHERE {?subj a dul:Event,
?type. FILTER(?type != dul:Event)}
Candidates
extraction
Looking for patterns
through SPARQL
Word-sense disambiguation and
alignment to CiTO
• We use IMS (a word-sense disambiguator) to disambiguate the candidate
types found, with respect to OntoWordNet (OWN)
✦ EarlierWork and Work: own:synset-work-noun-1
✦ Extend: own:synset-prolong-verb-1
✦ Outline: own:synset-delineate-verb-3
✦ Research: own:synset-research-noun-1
It is possible to extend
the set of retrieved
synsets by adding the
proximal ones
automatically
Wordnetsynsets
retrievedbyIMS
CiTO2Wordnet is an
ontology we developed
to maps all the CiTO
properties defining
citations with related
Wordnet synsets
CiTO is an ontology
that defines
functions of citations
as object properties
• Finally we perform SPARQL
CONSTRUCT to align the synsets to
those defined in the CiTO2Wordnet
ontology, which imports CiTO
✦ Properties in CiTO having opposite polarity
(that is defined in the CiTOFunctions
ontology) to what is returned by the
sentiment analysis module are not
considered
✦ If no alignment is performed,
cito:citesForInformation is returned as default
cito:extends
skos:closeMatch
own:synset-prolong-verb-1 .
cito:extends
is selected!
A preliminary empirical evaluation
• Comparing the results of CiTalO with a human classification of citations
• The test bed we used for our experiments includes some scientific papers (written in
English) encoded in DocBook, containing citations of different types
• We automatically extracted citation sentences, through an XSLT document, from all
the papers published in the 7th volume of Balisage Proceedings
✦ 18 scientific papers written by different authors
✦ 377 citations (20.94 citations per paper)
• We filtered all the citation sentences from the selected articles
✦ The annotation of citation functions is an hard problem to address
✦ We kept only 106 citations, which were accompanied by verbs and/or other grammatical structures
carrying explicitly a particular citation function, as suggested by Teufel et al. [18]
• We marked (according to CiTO properties) these 106 citations, obtaining at least
one representative citation for each of the 18 paper used (5.89 citations per paper)
• We used 21 CiTO properties out of 38 to annotate all these citations
CiTalO setting
• We run CiTalO on these 106 citation sentences and compared
results with our annotations
• We also tested eight different configurations of CiTalO,
corresponding to all possible combinations of three options:
✦ activating or deactivating the sentiment-analysis module;
✦ applying or not the proximal synsets to the IMS output;
✦ using an extended version of CiTO2Wordnet that includes all the synsets
Wordnet retrieves for a particular string (including those synsets having the
gloss radically different to the CiTO property in consideration)
CiTO2Wordnet
does not link cito:extends to that synset
CiTO2Wordnet – extended
do link the synset with cito:extends
Consider the synset own:synset-credit-verb-3, having gloss
“accounting: enter as credit”
and the CiTO definition for cito:credits
“the citing entity acknowledges contributions made by the cited entity”
Results
Kinds and number of citations marked by us
Similarly to Teufel et al. [19] the most
neutral CiTO property,
citesForInformation, was the most
prevalent function in our dataset
too, as the second most used
property was usedMethodIn
Precision and recall of CiTalO according to different configurations
No
configuration
that emerges as
the absolutely
best one from
these data
Worst
configurations
were those that
took into
account all the
proximal synsets
Conclusions
• The implementation of CiTalO is still at an early stage
• Current experiments are not enough to fully validate this approach
• However, the goal of this work was to build such a modular
architecture, to perform some exploratory experiments and to
identify issues and possible developments of our approach
• Future works:
✦ Propose extensions to CiTO to cover more scenarios (e.g. cito:speculatesOn
and cito:citesAsPotentialSolution)
✦ Decrease the noise given by proximal synsets
✦ Identification of anaphoras (e.g. nouns, pronouns) speaking about the citations
✦ Perform exhaustive tests with a larger set of documents and users
Thanks for your attention
Please come to see CiTalO in action
during the ESWC Demo Session on Tuesday

Más contenido relacionado

Similar a Towards the automatic identification of the nature of citations

Towards the automatic identification of the nature of citations
Towards the automatic identification of the nature of citationsTowards the automatic identification of the nature of citations
Towards the automatic identification of the nature of citationsAndrea Nuzzolese
 
Overview of Bibliometrics - IAP Course version 1.1
Overview of Bibliometrics - IAP Course version 1.1Overview of Bibliometrics - IAP Course version 1.1
Overview of Bibliometrics - IAP Course version 1.1Micah Altman
 
Iot ontologies state of art$$$
Iot ontologies state of art$$$Iot ontologies state of art$$$
Iot ontologies state of art$$$Sof Ouni
 
The Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
The Computer Science Ontology: A Large-Scale Taxonomy of Research AreasThe Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
The Computer Science Ontology: A Large-Scale Taxonomy of Research AreasAngelo Salatino
 
The Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
The Computer Science Ontology:  A Large-Scale Taxonomy of Research AreasThe Computer Science Ontology:  A Large-Scale Taxonomy of Research Areas
The Computer Science Ontology: A Large-Scale Taxonomy of Research AreasAngelo Salatino
 
Lightning Talk Session - Connecting Altmetric (K. Capretta)
Lightning Talk Session - Connecting Altmetric (K. Capretta)Lightning Talk Session - Connecting Altmetric (K. Capretta)
Lightning Talk Session - Connecting Altmetric (K. Capretta)ORCID, Inc
 
Co-Extracting Opinions from Online Reviews
Co-Extracting Opinions from Online ReviewsCo-Extracting Opinions from Online Reviews
Co-Extracting Opinions from Online ReviewsEditor IJCATR
 
Extraction of Data Using Comparable Entity Mining
Extraction of Data Using Comparable Entity MiningExtraction of Data Using Comparable Entity Mining
Extraction of Data Using Comparable Entity Miningiosrjce
 
Digital Trails Dave King 1 5 10 Part 2 D3
Digital Trails   Dave King   1 5 10   Part 2   D3Digital Trails   Dave King   1 5 10   Part 2   D3
Digital Trails Dave King 1 5 10 Part 2 D3Dave King
 
CL8 Scientiometrics Module 6 RPE-Rijo TKMCE.pdf
CL8 Scientiometrics Module 6 RPE-Rijo TKMCE.pdfCL8 Scientiometrics Module 6 RPE-Rijo TKMCE.pdf
CL8 Scientiometrics Module 6 RPE-Rijo TKMCE.pdfssuserb76cdd
 
Library Search 3: Finding a journal article 2021
Library Search 3: Finding a journal article 2021Library Search 3: Finding a journal article 2021
Library Search 3: Finding a journal article 2021EISLibrarian
 
Presentation_euroCRIS_ES
Presentation_euroCRIS_ESPresentation_euroCRIS_ES
Presentation_euroCRIS_ESEd Simons
 
CITATION SEMANTIC BASED APPROACHES TO IDENTIFY ARTICLE QUALITY
CITATION SEMANTIC BASED APPROACHES TO IDENTIFY ARTICLE QUALITY CITATION SEMANTIC BASED APPROACHES TO IDENTIFY ARTICLE QUALITY
CITATION SEMANTIC BASED APPROACHES TO IDENTIFY ARTICLE QUALITY cscpconf
 
Citation semantic based approaches to identify article quality
Citation semantic based approaches to identify article qualityCitation semantic based approaches to identify article quality
Citation semantic based approaches to identify article qualitycsandit
 

Similar a Towards the automatic identification of the nature of citations (20)

Towards the automatic identification of the nature of citations
Towards the automatic identification of the nature of citationsTowards the automatic identification of the nature of citations
Towards the automatic identification of the nature of citations
 
OpenCitations
OpenCitationsOpenCitations
OpenCitations
 
Overview of Bibliometrics - IAP Course version 1.1
Overview of Bibliometrics - IAP Course version 1.1Overview of Bibliometrics - IAP Course version 1.1
Overview of Bibliometrics - IAP Course version 1.1
 
Iot ontologies state of art$$$
Iot ontologies state of art$$$Iot ontologies state of art$$$
Iot ontologies state of art$$$
 
The Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
The Computer Science Ontology: A Large-Scale Taxonomy of Research AreasThe Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
The Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
 
The Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
The Computer Science Ontology:  A Large-Scale Taxonomy of Research AreasThe Computer Science Ontology:  A Large-Scale Taxonomy of Research Areas
The Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
 
Lightning Talk Session - Connecting Altmetric (K. Capretta)
Lightning Talk Session - Connecting Altmetric (K. Capretta)Lightning Talk Session - Connecting Altmetric (K. Capretta)
Lightning Talk Session - Connecting Altmetric (K. Capretta)
 
Mathew.ppt
Mathew.pptMathew.ppt
Mathew.ppt
 
Citation indexing
Citation indexingCitation indexing
Citation indexing
 
Chapter 2 c#
Chapter 2 c#Chapter 2 c#
Chapter 2 c#
 
060429 Insna
060429 Insna060429 Insna
060429 Insna
 
Co-Extracting Opinions from Online Reviews
Co-Extracting Opinions from Online ReviewsCo-Extracting Opinions from Online Reviews
Co-Extracting Opinions from Online Reviews
 
Extraction of Data Using Comparable Entity Mining
Extraction of Data Using Comparable Entity MiningExtraction of Data Using Comparable Entity Mining
Extraction of Data Using Comparable Entity Mining
 
E017252831
E017252831E017252831
E017252831
 
Digital Trails Dave King 1 5 10 Part 2 D3
Digital Trails   Dave King   1 5 10   Part 2   D3Digital Trails   Dave King   1 5 10   Part 2   D3
Digital Trails Dave King 1 5 10 Part 2 D3
 
CL8 Scientiometrics Module 6 RPE-Rijo TKMCE.pdf
CL8 Scientiometrics Module 6 RPE-Rijo TKMCE.pdfCL8 Scientiometrics Module 6 RPE-Rijo TKMCE.pdf
CL8 Scientiometrics Module 6 RPE-Rijo TKMCE.pdf
 
Library Search 3: Finding a journal article 2021
Library Search 3: Finding a journal article 2021Library Search 3: Finding a journal article 2021
Library Search 3: Finding a journal article 2021
 
Presentation_euroCRIS_ES
Presentation_euroCRIS_ESPresentation_euroCRIS_ES
Presentation_euroCRIS_ES
 
CITATION SEMANTIC BASED APPROACHES TO IDENTIFY ARTICLE QUALITY
CITATION SEMANTIC BASED APPROACHES TO IDENTIFY ARTICLE QUALITY CITATION SEMANTIC BASED APPROACHES TO IDENTIFY ARTICLE QUALITY
CITATION SEMANTIC BASED APPROACHES TO IDENTIFY ARTICLE QUALITY
 
Citation semantic based approaches to identify article quality
Citation semantic based approaches to identify article qualityCitation semantic based approaches to identify article quality
Citation semantic based approaches to identify article quality
 

Más de University of Bologna

The Initiative for Open Citations and the OpenCitations Corpus
The Initiative for Open Citations and the OpenCitations CorpusThe Initiative for Open Citations and the OpenCitations Corpus
The Initiative for Open Citations and the OpenCitations CorpusUniversity of Bologna
 
A document-inspired way for tracking changes of RDF data - The case of the Op...
A document-inspired way for tracking changes of RDF data - The case of the Op...A document-inspired way for tracking changes of RDF data - The case of the Op...
A document-inspired way for tracking changes of RDF data - The case of the Op...University of Bologna
 
A Simplified Agile Methodology for Ontology Development
A Simplified Agile Methodology for Ontology DevelopmentA Simplified Agile Methodology for Ontology Development
A Simplified Agile Methodology for Ontology DevelopmentUniversity of Bologna
 
Freedom for bibliographic references: OpenCitations arise
Freedom for bibliographic references: OpenCitations ariseFreedom for bibliographic references: OpenCitations arise
Freedom for bibliographic references: OpenCitations ariseUniversity of Bologna
 
A pattern-based ontology for describing publishing workflows
A pattern-based ontology for describing publishing workflowsA pattern-based ontology for describing publishing workflows
A pattern-based ontology for describing publishing workflowsUniversity of Bologna
 
Semantic lenses to bring digital and semantic publishing together
Semantic lenses to bring digital and semantic publishing togetherSemantic lenses to bring digital and semantic publishing together
Semantic lenses to bring digital and semantic publishing togetherUniversity of Bologna
 
Zeri e LODE
: Extracting the Zeri photo archive to Linked Open Data: formaliz...
Zeri e LODE
: Extracting the Zeri photo archive to Linked Open Data: formaliz...Zeri e LODE
: Extracting the Zeri photo archive to Linked Open Data: formaliz...
Zeri e LODE
: Extracting the Zeri photo archive to Linked Open Data: formaliz...University of Bologna
 
Characterising citations in scholarly articles: an experiment
Characterising citations in scholarly articles: an experimentCharacterising citations in scholarly articles: an experiment
Characterising citations in scholarly articles: an experimentUniversity of Bologna
 
Bringing semantic publishing into TEI: ideas and pointers
Bringing semantic publishing into TEI: ideas and pointersBringing semantic publishing into TEI: ideas and pointers
Bringing semantic publishing into TEI: ideas and pointersUniversity of Bologna
 
Tracking Changes through EARMARK: a Theoretical Perspective and an Implementa...
Tracking Changes through EARMARK: a Theoretical Perspective and an Implementa...Tracking Changes through EARMARK: a Theoretical Perspective and an Implementa...
Tracking Changes through EARMARK: a Theoretical Perspective and an Implementa...University of Bologna
 
The Live OWL Documentation Environment: a tool for the automatic generation o...
The Live OWL Documentation Environment: a tool for the automatic generation o...The Live OWL Documentation Environment: a tool for the automatic generation o...
The Live OWL Documentation Environment: a tool for the automatic generation o...University of Bologna
 
Scholarly publishing and Linked Data: describing roles, statuses, temporal an...
Scholarly publishing and Linked Data: describing roles, statuses, temporal an...Scholarly publishing and Linked Data: describing roles, statuses, temporal an...
Scholarly publishing and Linked Data: describing roles, statuses, temporal an...University of Bologna
 
Embedding semantic annotations within texts: the FRETTA approach
Embedding semantic annotations within texts: the FRETTA approachEmbedding semantic annotations within texts: the FRETTA approach
Embedding semantic annotations within texts: the FRETTA approachUniversity of Bologna
 

Más de University of Bologna (16)

The Initiative for Open Citations and the OpenCitations Corpus
The Initiative for Open Citations and the OpenCitations CorpusThe Initiative for Open Citations and the OpenCitations Corpus
The Initiative for Open Citations and the OpenCitations Corpus
 
A document-inspired way for tracking changes of RDF data - The case of the Op...
A document-inspired way for tracking changes of RDF data - The case of the Op...A document-inspired way for tracking changes of RDF data - The case of the Op...
A document-inspired way for tracking changes of RDF data - The case of the Op...
 
A Simplified Agile Methodology for Ontology Development
A Simplified Agile Methodology for Ontology DevelopmentA Simplified Agile Methodology for Ontology Development
A Simplified Agile Methodology for Ontology Development
 
FOOD: FOod in Open Data
FOOD: FOod in Open DataFOOD: FOod in Open Data
FOOD: FOod in Open Data
 
Freedom for bibliographic references: OpenCitations arise
Freedom for bibliographic references: OpenCitations ariseFreedom for bibliographic references: OpenCitations arise
Freedom for bibliographic references: OpenCitations arise
 
A pattern-based ontology for describing publishing workflows
A pattern-based ontology for describing publishing workflowsA pattern-based ontology for describing publishing workflows
A pattern-based ontology for describing publishing workflows
 
Semantic lenses to bring digital and semantic publishing together
Semantic lenses to bring digital and semantic publishing togetherSemantic lenses to bring digital and semantic publishing together
Semantic lenses to bring digital and semantic publishing together
 
Zeri e LODE
: Extracting the Zeri photo archive to Linked Open Data: formaliz...
Zeri e LODE
: Extracting the Zeri photo archive to Linked Open Data: formaliz...Zeri e LODE
: Extracting the Zeri photo archive to Linked Open Data: formaliz...
Zeri e LODE
: Extracting the Zeri photo archive to Linked Open Data: formaliz...
 
Characterising citations in scholarly articles: an experiment
Characterising citations in scholarly articles: an experimentCharacterising citations in scholarly articles: an experiment
Characterising citations in scholarly articles: an experiment
 
Bringing semantic publishing into TEI: ideas and pointers
Bringing semantic publishing into TEI: ideas and pointersBringing semantic publishing into TEI: ideas and pointers
Bringing semantic publishing into TEI: ideas and pointers
 
Tracking Changes through EARMARK: a Theoretical Perspective and an Implementa...
Tracking Changes through EARMARK: a Theoretical Perspective and an Implementa...Tracking Changes through EARMARK: a Theoretical Perspective and an Implementa...
Tracking Changes through EARMARK: a Theoretical Perspective and an Implementa...
 
The Live OWL Documentation Environment: a tool for the automatic generation o...
The Live OWL Documentation Environment: a tool for the automatic generation o...The Live OWL Documentation Environment: a tool for the automatic generation o...
The Live OWL Documentation Environment: a tool for the automatic generation o...
 
Scholarly publishing and Linked Data: describing roles, statuses, temporal an...
Scholarly publishing and Linked Data: describing roles, statuses, temporal an...Scholarly publishing and Linked Data: describing roles, statuses, temporal an...
Scholarly publishing and Linked Data: describing roles, statuses, temporal an...
 
Embedding semantic annotations within texts: the FRETTA approach
Embedding semantic annotations within texts: the FRETTA approachEmbedding semantic annotations within texts: the FRETTA approach
Embedding semantic annotations within texts: the FRETTA approach
 
Dealing with Markup Semantics
Dealing with Markup SemanticsDealing with Markup Semantics
Dealing with Markup Semantics
 
Handling Markup Overlaps Using OWL
Handling Markup Overlaps Using OWLHandling Markup Overlaps Using OWL
Handling Markup Overlaps Using OWL
 

Último

2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024SynarionITSolutions
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 

Último (20)

2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 

Towards the automatic identification of the nature of citations

  • 1. http://creativecommons.org/licenses/by-sa/3.0 Towards the automatic identification of the nature of citations Angelo Di Iorio – diiorio@cs.unibo.it Andrea Giovanni Nuzzolese – nuzzoles@cs.unibo.it Silvio Peroni – essepuntato@cs.unibo.it
  • 2. Outline • Interpretation of bibliographic citations • CiTalO architecture • Preliminary empirical evaluation • Conclusions
  • 3. Citations as tools • Bibliographic citations can be seen as tools for: ✦ linking research: making pointers to related works, to source of experimental data, to methods used, etc. ✦ disseminating research: conference proceedings, journals,Web platforms (e.g. blogs, wikis), Semantic Publishing platforms and projects (e.g. OpenCitation, OpenBibliography, Lucero) ✦ exploring research: new ways of browsing article through networks of citations (e.g. CiteWiz, Citation Sensitive In-browser Summariser) ✦ evaluating research: measuring the importance of journals (e.g. impact factor) or the scientific productivity of authors (e.g. h-index) • This work is based on the assumption that all these activities can be radically improved by exploiting the actual function of citations, i.e. author’s reason for citing a given paper ✦ Could a paper that is cited many times negatively be given a high score? ✦ Could a paper containing several self-citations be given the same score of a paper with heterogeneous citations?
  • 4. Intended vs. interpreted meaning of citation functions It extends the research outlined in earlier work [3]. Intended meaning Interpreted meaning The author of the text A reader of the text Another reader of the text “Text as a medium through which an author transmits knowledge” “The reader decodes this meaning and literally ‘make sense’ of it” “The meaning is not contained within the words themselves but in the minds of the participants.” Quotations from From Proteins to Fairytales: Directions in Semantic Publishing by Anita De Waard, DOI: 10.1109/MIS.2010.49 I want to convey a particular meaning through this text I recognise a particular meaning by reading that text I recognise a particular meaning by reading that text These three “meanings” are not the same
  • 5. http://wit.istc.cnr.it:8080/tools/citalo Our setting It extends the research outlined in earlier work [3].Interpreted meaning A reader of the text Another reader of the text I recognise a particular meaning by reading that text I recognise a particular meaning by reading that text I recognise a particular meaning by reading that text CiTalO (English translation: cite it) is a tool that recognises the function of citations by exploiting Semantic Web technologies (CiTO, FRED, SPARQL) and NLP techniques (IMS,AlchemyAPI)
  • 6. pipeline It extends the research outlined in earlier work X. Ontology learning Citation type extraction Word-sense disambiguation Alignment to CiTO Sentiment analysis output: cito:extends Input: a sentence containing a reference to a bibliographic entity indicated by an “X” Derive a logical (i.e. an OWL ontology) representation of the sentence through FRED Extract candidate types for the citation by looking for patterns in FRED output via SPARQL Gather the sense of the candidate types through IMS with respect to OntoWordNet Capture the sentiment polarity emerging from the text through AlchemyAPI Assign CiTO types to the citation through SPARQL CONSTRUCT
  • 7. Ontology learning and citation type extraction FRED output It is returned as an OWL ontology in RDF/XML format SELECT ?type WHERE { ?subj ?prop fred:X; a ?typeTmp. ?typeTmp rdfs:subClassOf+ ?type} SELECT ?type WHERE { ?subj ?prop fred:X; a ?type} SELECT ?type WHERE {?subj a dul:Event; boxer:patient ?patient. ?patient a ?type} SELECT ?type WHERE {?subj a dul:Event, ?type. FILTER(?type != dul:Event)} Candidates extraction Looking for patterns through SPARQL
  • 8. Word-sense disambiguation and alignment to CiTO • We use IMS (a word-sense disambiguator) to disambiguate the candidate types found, with respect to OntoWordNet (OWN) ✦ EarlierWork and Work: own:synset-work-noun-1 ✦ Extend: own:synset-prolong-verb-1 ✦ Outline: own:synset-delineate-verb-3 ✦ Research: own:synset-research-noun-1 It is possible to extend the set of retrieved synsets by adding the proximal ones automatically Wordnetsynsets retrievedbyIMS CiTO2Wordnet is an ontology we developed to maps all the CiTO properties defining citations with related Wordnet synsets CiTO is an ontology that defines functions of citations as object properties • Finally we perform SPARQL CONSTRUCT to align the synsets to those defined in the CiTO2Wordnet ontology, which imports CiTO ✦ Properties in CiTO having opposite polarity (that is defined in the CiTOFunctions ontology) to what is returned by the sentiment analysis module are not considered ✦ If no alignment is performed, cito:citesForInformation is returned as default cito:extends skos:closeMatch own:synset-prolong-verb-1 . cito:extends is selected!
  • 9. A preliminary empirical evaluation • Comparing the results of CiTalO with a human classification of citations • The test bed we used for our experiments includes some scientific papers (written in English) encoded in DocBook, containing citations of different types • We automatically extracted citation sentences, through an XSLT document, from all the papers published in the 7th volume of Balisage Proceedings ✦ 18 scientific papers written by different authors ✦ 377 citations (20.94 citations per paper) • We filtered all the citation sentences from the selected articles ✦ The annotation of citation functions is an hard problem to address ✦ We kept only 106 citations, which were accompanied by verbs and/or other grammatical structures carrying explicitly a particular citation function, as suggested by Teufel et al. [18] • We marked (according to CiTO properties) these 106 citations, obtaining at least one representative citation for each of the 18 paper used (5.89 citations per paper) • We used 21 CiTO properties out of 38 to annotate all these citations
  • 10. CiTalO setting • We run CiTalO on these 106 citation sentences and compared results with our annotations • We also tested eight different configurations of CiTalO, corresponding to all possible combinations of three options: ✦ activating or deactivating the sentiment-analysis module; ✦ applying or not the proximal synsets to the IMS output; ✦ using an extended version of CiTO2Wordnet that includes all the synsets Wordnet retrieves for a particular string (including those synsets having the gloss radically different to the CiTO property in consideration) CiTO2Wordnet does not link cito:extends to that synset CiTO2Wordnet – extended do link the synset with cito:extends Consider the synset own:synset-credit-verb-3, having gloss “accounting: enter as credit” and the CiTO definition for cito:credits “the citing entity acknowledges contributions made by the cited entity”
  • 11. Results Kinds and number of citations marked by us Similarly to Teufel et al. [19] the most neutral CiTO property, citesForInformation, was the most prevalent function in our dataset too, as the second most used property was usedMethodIn Precision and recall of CiTalO according to different configurations No configuration that emerges as the absolutely best one from these data Worst configurations were those that took into account all the proximal synsets
  • 12. Conclusions • The implementation of CiTalO is still at an early stage • Current experiments are not enough to fully validate this approach • However, the goal of this work was to build such a modular architecture, to perform some exploratory experiments and to identify issues and possible developments of our approach • Future works: ✦ Propose extensions to CiTO to cover more scenarios (e.g. cito:speculatesOn and cito:citesAsPotentialSolution) ✦ Decrease the noise given by proximal synsets ✦ Identification of anaphoras (e.g. nouns, pronouns) speaking about the citations ✦ Perform exhaustive tests with a larger set of documents and users
  • 13. Thanks for your attention Please come to see CiTalO in action during the ESWC Demo Session on Tuesday