SemEval 2017 Task 10: ScienceIE – Extracting Keyphrases and Relations from Scientific Publications

Shared task summary for SemEval 2017 Task 10: ScienceIE – Extracting Keyphrases and Relations from Scientific Publications
Paper: https://arxiv.org/abs/1704.02853
Abstract:
We describe the SemEval task of extracting keyphrases and relations between them from scientific documents, which is crucial for understanding which publications describe which processes, tasks and materials. Although this was a new task, we had a total of 26 submissions across 3 evaluation scenarios. We expect the task and the findings reported in this paper to be relevant for researchers working on understanding scientific content, as well as the broader knowledge base population and information extraction communities.

1. SemEval 2017 Task 10: ScienceIE – Extracting Keyphrases and Relations from Scientific Publications
Isabelle Augenstein*#, Mrinal Das$, Sebastian Riedel*, Lakshmi Vikraman$, Andrew McCallum$
*University College London, #University of Copenhagen, $University of Massachusetts Amherst
4 August 2017
2. Motivation
3. Previous Tasks
SemEval 2010 Task 5 (Kim, Medelyan, Kan, Baldwin): Automatic Keyphrase Extraction from Scientific Articles
Extract a list of words/phrases representing key topics from scientific documents
-  context-independent
-  ranking evaluation
-  no relations
-  CS publications only
[Slide figure: a document (title, abstract, body text) mapped to a ranked list of keyphrases]
4. Extracting Keyphrases and Relations from Scientific Publications
Isabelle Augenstein, Mrinal Das, Sebastian Riedel, Lakshmi Vikraman, Andrew McCallum
Subtasks:
A) Mention-level keyphrase identification
B) Mention-level keyphrase classification:
   •  PROCESS (e.g. methods, equipment)
   •  TASK
   •  MATERIAL (e.g. corpora, physical materials)
C) Mention-level semantic relation extraction:
   •  HYPONYM-OF
   •  SYNONYM-OF
Example: "… addresses the task of named entity recognition (NER), a subtask of information extraction, using conditional random fields (CRF). Our method is evaluated on the CoNLL-2003 NER corpus."
Which papers present which processes/tasks/materials? How do they relate to one another? (An illustration of the subtasks on this example sentence follows below.)
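To make the three subtasks concrete, the sketch below spells out what they would produce for the example sentence above. These annotations are illustrative rather than official gold data: the type assignments follow the slide's own definitions (methods are PROCESS, corpora are MATERIAL), and the hyponym relation is signalled by "a subtask of" in the sentence itself.

```python
# Illustrative subtask outputs for the example sentence above (assumed, not the
# official gold annotations). Types follow the slide's definitions.

sentence = ("... addresses the task of named entity recognition (NER), "
            "a subtask of information extraction, using conditional random "
            "fields (CRF). Our method is evaluated on the CoNLL-2003 NER corpus.")

# Subtask A (identification) + Subtask B (classification)
keyphrases = {
    "named entity recognition": "Task",
    "information extraction": "Task",
    "conditional random fields": "Process",   # a method
    "CoNLL-2003 NER corpus": "Material",      # a corpus
}

# Subtask C: relations between keyphrases of the same type;
# "a subtask of" signals the hyponymy
relations = [("named entity recognition", "HYPONYM-OF", "information extraction")]
```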
  5. 5. Annotation & Dataset -  brat (Stenetorp, Pyysalo, Topić, Ohta, Ananiadou, Tsujii, 2012) -  *.ann stand-off format -  Hosted on AWS S3 -  Annotators work remotely -  500 paragraphs from CS, Phys, MS: 350 train, 50 dev, 100 test -  Sampling semi-automatically from keyphrase / relation-rich paragraphs -  Full article text given to participants as well for context
  6. 6. Annotation & Dataset -  13 paid science & engineering student annotators, 8 completed all assigned annotations -  Up to 38 instances per annotator -  Double-annotated by expert annotator given student annotations Student Annotator IAA (Cohen’s kappa) 1 0.85 2 0.66 3 0.63 4 0.60
7. Dataset Statistics
-  Labels: Material, Process, Task
-  Topics: Computer Science, Physics, Material Science
-  Number of all keyphrases: 5730
-  Number of unique keyphrases: 1697
-  % singleton keyphrases: 31%
-  % single-word mentions: 18%
-  % mentions, word length >= 3: 51%
-  % mentions, word length >= 5: 22%
-  % mentions, noun phrases: 93%
-  Most common keyphrases: 'Isogeometric analysis', 'samples', 'calibration process', 'Zirconium alloys'
8. Subtasks and Evaluation Scenarios
Subtasks
a)  Mention-level keyphrase identification
b)  Mention-level keyphrase classification (PROCESS, TASK, MATERIAL)
c)  Mention-level semantic relation extraction between keyphrases with the same keyphrase types (HYPONYM-OF, SYNONYM-OF)
Evaluation Scenarios
1)  Only plain text is given (Subtasks A, B, C)
2)  Plain text with manually annotated keyphrase boundaries is given (Subtasks B, C)
3)  Plain text with manually annotated keyphrases and their types is given (Subtask C)
(A sketch of the exact-match F1 scoring used across these scenarios follows below.)
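All three scenarios are scored with precision, recall, and F1 over exact matches of the annotated items. The snippet below is a simplified sketch of such exact-match micro-F1, not the official scorer; the (start, end, label) tuples are hypothetical.

```python
# Simplified sketch of exact-match micro-averaged F1 over annotations.
# Items are (start, end, label) triples; only exact matches score.
# This approximates, but is not, the official ScienceIE evaluation script.

def micro_f1(gold, predicted):
    """Micro-averaged precision, recall, and F1 over two annotation sets."""
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)                       # exact matches only
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

gold = {(12, 37, "Process"), (57, 81, "Task")}
pred = {(12, 37, "Process"), (57, 80, "Task")}       # off-by-one span gets no credit
print(micro_f1(gold, pred))                          # (0.5, 0.5, 0.5)
```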
9. Overall Participation
-  54 systems submitted in development phase
-  26 of those systems participated in test phase
-  Wide variety of approaches:
   -  Neural networks
   -  CRFs
   -  Supervised approaches with careful feature engineering
   -  Rule-based systems
   -  Ensembles
10. Results Scenario 1 (17 participating systems)
Team                                Overall F1   A      B      C
s2 end2end (Ammar et al., 2017)     0.43         0.55   0.44   0.28
TIAL UW                             0.42         0.56   0.44   –
TTI COIN (Tsujimura et al., 2017)   0.38         0.50   0.39   0.21
upper bound                         0.84         0.85   0.85   0.77
random                              0.00         0.03   0.01   0.00
11. Results Scenario 2 (4 participating systems)
Team                                 Overall F1   B      C
MayoNLP (Liu et al., 2017)           0.64         0.67   0.23
UKP/EELECTION (Eger et al., 2017)    0.63         0.66   –
LABDA (Segura-Bedmar et al., 2017)   0.48         0.51   –
upper bound                          0.84         0.85   0.77
random                               0.15         0.23   0.01
12. Results Scenario 3 (5 participating systems)
Team                             Overall F1 / C
MIT (Lee et al., 2017a)          0.64
s2_rel (Ammar et al., 2017)      0.54
NTNU-2 (Barik and Marsi, 2017)   0.50
upper bound                      0.84
random                           0.04
13. Summary
-  Most successful systems use RNNs (+ CRFs); see the BIO-tagging sketch after this list
-  However, the best system for Scenario 1 used an SVM with well-engineered features
-  Identifying keyphrases is the most challenging subtask
   -  The dataset contains many long and infrequent keyphrases
   -  Systems that rely on memorising lists of keyphrases do not perform well
-  Finding high-quality annotators for this task is hard – many student annotators dropped out
   -  Better recruitment, pilot annotation, picking only top annotators
-  Combining subtasks into evaluation scenarios caused confusion
   -  Many teams' systems did not tackle the relation extraction subtask – even though this hurt their overall F1
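Since the summary points to RNN (+ CRF) sequence taggers as the dominant approach, a brief sketch of how the stand-off annotations map onto the token-level BIO tags such taggers consume may be useful. Whitespace tokenisation and the example span are simplifying assumptions; real systems use proper tokenisers.

```python
# Minimal sketch: convert character-offset keyphrase annotations into token-level
# BIO tags, the input representation for RNN/CRF sequence taggers.
# Whitespace tokenisation and the example span are simplifying assumptions.

def to_bio(text, annotations):
    """annotations: list of (start, end, label) character spans."""
    tags, pos = [], 0
    for token in text.split():
        start = text.index(token, pos)   # character offset of this token
        end = start + len(token)
        pos = end
        tag = "O"
        for a_start, a_end, label in annotations:
            if a_start <= start and end <= a_end:
                tag = ("B-" if start == a_start else "I-") + label
                break
        tags.append((token, tag))
    return tags

text = "We evaluate conditional random fields on this task ."
print(to_bio(text, [(12, 37, "Process")]))
# [('We', 'O'), ('evaluate', 'O'), ('conditional', 'B-Process'),
#  ('random', 'I-Process'), ('fields', 'I-Process'), ('on', 'O'), ...]
```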
14. Relevant Papers at ACL
-  Isabelle Augenstein, Mrinal Das, Sebastian Riedel, Lakshmi Vikraman and Andrew McCallum. SemEval 2017 Task 10: ScienceIE – Extracting Keyphrases and Relations from Scientific Publications. SemEval 2017. https://arxiv.org/abs/1704.02853
-  Isabelle Augenstein, Anders Søgaard. Multi-Task Learning of Keyphrase Boundary Classification. ACL 2017 (short). https://arxiv.org/abs/1704.00514
-  Ed Collins, Isabelle Augenstein, Sebastian Riedel. A Supervised Approach to Extractive Summarisation of Scientific Papers. CoNLL 2017. https://arxiv.org/abs/1706.03946
15. Thank you!
isabelleaugenstein.github.io
augenstein@di.ku.dk
@iaugenstein
github.com/isabelleaugenstein
