Introduction           Reichenbach’s Model of Tense       RTMML       Building the corpus              Summary




           RTMBank: Capturing Verbs with Reichenbach’s
                         Tense Model

                            Leon Derczynski and Robert Gaizauskas

                                              University of Sheffield


                                                  20 July 2011




Leon Derczynski and Robert Gaizauskas                                                       University of Sheffield
RTMBank: Capturing Verbs with Reichenbach’s Tense Model




Outline


       1 Introduction

       2 Reichenbach’s Model of Tense

       3 RTMML

       4 Building the corpus

       5 Summary








Introduction to RTMBank




          We present RTMBank - an annotated corpus of tensed verbs and
           their relations - that helps improve automatic processing
                       of temporal information in text.








Temporal Annotation

               Why annotate temporal information?
               Relating time is an essential part of discourse.
               To automatically process time information in documents, we
               can record data with a temporal annotation.
               Data annotated might include: times, events, the relations
               between them.
               Existing standards - TIMEX, TimeML
               Existing corpora - TimeBank, AQUAINT TimeML corpus,
               WikiWars
               We focus on two sub-tasks: relating events, and processing
               temporal expressions.





Relating events


               How do we relate things in time?
               Temporal expressions and events may be seen as intervals.
               Defining relations between two intervals permits temporal
               ordering.
               Human readers can usually temporally relate events in a
               discourse using cues.
               Automatic labelling of temporal links is a difficult research
               problem.
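
A minimal sketch of the interval view above, assuming closed intervals over plain numeric time points and a deliberately coarse three-way relation set. The names Interval and relate are hypothetical and are not part of RTMBank or TimeML.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A closed interval [start, end] on an abstract numeric timeline."""
    start: float
    end: float

    def __post_init__(self):
        if self.start > self.end:
            raise ValueError("start must not be after end")


def relate(a: Interval, b: Interval) -> str:
    """Return a coarse temporal relation of a with respect to b."""
    if a.end < b.start:
        return "before"
    if b.end < a.start:
        return "after"
    # Any shared time point, including touching endpoints.
    return "overlaps"


walk = Interval(1, 2)
dinner = Interval(3, 5)
print(relate(walk, dinner))  # before
```

A fuller relation inventory would distinguish more cases (for example containment or shared endpoints); three labels are enough to show how interval relations yield an ordering.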








Temporal expressions



               What is a temporal expression?
               A temporal expression is text that describes a time or period.
               Wednesday; December 4, 1997; for two weeks
               The process of mapping temporal expressions to an absolute
               calendar is normalisation.
               Wednesday ⇒ 2011/01/12 T 00:00:00 - 2011/01/12 T 23:59:59
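
A minimal sketch of normalisation for the weekday case, assuming the expression resolves to the first matching day on or after a reference date. The function name normalise_weekday and the reference date 2011-01-10 are assumptions for illustration; a real normaliser has to decide the direction (past or future) from context.

```python
from datetime import date, datetime, time, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def normalise_weekday(expression: str, reference: date):
    """Map a weekday name to a (start, end) pair of datetimes.

    The pair covers the first such day on or after the reference date.
    """
    target = WEEKDAYS.index(expression.strip().lower())
    offset = (target - reference.weekday()) % 7
    day = reference + timedelta(days=offset)
    return (datetime.combine(day, time(0, 0, 0)),
            datetime.combine(day, time(23, 59, 59)))


# Assumed reference date: Monday 10 January 2011.
start, end = normalise_weekday("Wednesday", date(2011, 1, 10))
print(start.isoformat(), "-", end.isoformat())
# 2011-01-12T00:00:00 - 2011-01-12T23:59:59
```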








The Model



               When describing an event, we can call the time of the event E.
               The event is described at speech time S.
               “I will walk the dog.” ⇒ S < E
               Reichenbach introduces a reference point, an abstract time
               from which events are viewed.
               “I ran home” ⇒ E < S
               “I had run home” ⇒ E < R < S
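
The three example sentences can be written down as a small lookup, sketched here under the assumption that each point is a single position on a timeline. The table name TENSE_ORDERINGS and the function relation are hypothetical. R is filled in only where the deck states it: the past perfect ordering comes from this slide, and E = R for the simple past follows the er="=" attribute in the example markup later in the deck.

```python
# Rank of each point per tense; equal ranks mean the points coincide.
TENSE_ORDERINGS = {
    "simple past": {"E": 0, "R": 0, "S": 1},    # "I ran home": E = R < S
    "past perfect": {"E": 0, "R": 1, "S": 2},   # "I had run home": E < R < S
    "simple future": {"S": 0, "E": 1},          # "I will walk the dog": S < E
}


def relation(tense: str, a: str, b: str):
    """Return '<', '=' or '>' between points a and b, or None if unknown."""
    ranks = TENSE_ORDERINGS[tense]
    if a not in ranks or b not in ranks:
        return None
    if ranks[a] < ranks[b]:
        return "<"
    if ranks[a] > ranks[b]:
        return ">"
    return "="


print(relation("past perfect", "E", "R"))   # <
print(relation("simple past", "S", "E"))    # >
print(relation("simple future", "R", "S"))  # None (R not stated on the slide)
```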








Special properties of the reference point


               permanence: when sentences or clauses are combined,
               grammatical rules constrain the set of available tenses. These
               rules work such that R has the same position in all cases.
               “I had run home when I heard the noise”
               positional use: when a time is found in the same clause as a
               verbal event, R is bound to the time.
               “It was six o’clock and John had prepared” – the preparation,
               E, occurs before six o’clock, R.
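
A minimal sketch of how a shared reference point lets two event times be ordered, assuming events are treated as points and that both verbs use the same R (as permanence suggests for “I had run home when I heard the noise”). The function name compose is hypothetical.

```python
def compose(e1_vs_r: str, e2_vs_r: str):
    """Given E1 ? R and E2 ? R for a shared R, return E1 ? E2.

    Each argument is '<', '=' or '>'. Returns None when the shared
    reference point alone does not determine the order.
    """
    if e1_vs_r == "=":
        # E1 coincides with R, so E1 relates to E2 as R relates to E2.
        return {"<": ">", "=": "=", ">": "<"}[e2_vs_r]
    if e2_vs_r == "=":
        # E2 coincides with R, so E1 ? E2 is just E1 ? R.
        return e1_vs_r
    if e1_vs_r != e2_vs_r:
        # E1 and E2 lie on opposite sides of R.
        return e1_vs_r
    return None  # both events on the same side of R


# "had run": E1 < R; "heard": E2 = R; hence the running precedes the hearing.
print(compose("<", "="))  # <
```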








Motivation for using the model

                   How will this model help temporal annotation?
                   At least 10% of normalisation errors are due to an incorrect R [1].
                   There is a “need to develop sophisticated methods for
                   temporal focus tracking if we are to extend current
                   time-stamping technologies” [2].

                   If we can relate the S and R of multiple event verbs, we can
                   reason about and label the relation between event times.

                   Tracking S, E and R according to Reichenbach’s model helps
                   generate correct tense and aspect for natural language
                   generation (NLG) [3].

               [1] Mani & Wilson, 2000
               [2] Mazur & Dale, 2010
               [3] Elson & McKeown, 2010




What to annotate?


               RTMML is intended to describe Reichenbach’s verbal event
               structure.
               First primitive: verb groups describing an event (had snowed,
               is running, will have given).
               Second primitive: text describing a time (next two weeks, last
               April).
               We are more concerned with the relations between times than
               the absolute position of the times.








Annotation Schema


               What does the markup look like?
               RTMML is an XML-based standoff annotation standard that
               may be standalone or integrated with TimeML.
               <doc> defines document creation/utterance time, which is
               also the default speech time.
               <verb> denotes verb groups, including tense, and relations
               between S, E and R. These points may be expressed in terms
               of other times in the annotation or new labels.
               <timerefx> denotes a time-referring expression, which can be
               optionally specified in TIMEX3 format.






Example markup


       <rtmml>
       Yesterday, John ate well.
       <seg type="token" />
       <doc time="now" />
       <timerefx xml:id="t1" target="#token0" />
       <verb xml:id="v1" target="#token3"
       view="simple" tense="past"
       sr=">" er="=" se=">"
       r="t1" s="doc" />
       </rtmml>
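
A short sketch of reading this markup with Python's standard xml.etree.ElementTree module. The reading of sr=">" as "S is after R" is inferred from the example itself (simple past, with R before S) and is an assumption about the attribute semantics rather than a statement of the RTMML specification.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

RTMML = """<rtmml>
Yesterday, John ate well.
<seg type="token" />
<doc time="now" />
<timerefx xml:id="t1" target="#token0" />
<verb xml:id="v1" target="#token3"
      view="simple" tense="past"
      sr=">" er="=" se=">"
      r="t1" s="doc" />
</rtmml>"""

root = ET.fromstring(RTMML)
for verb in root.iter("verb"):
    print(verb.get(XML_ID), verb.get("view"), verb.get("tense"))
    # Attribute "xy" is read here as: point X <relation> point Y.
    for pair in ("sr", "er", "se"):
        print(" ", pair[0].upper(), verb.get(pair), pair[1].upper())
```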







RTMML links


               Although we can relate primitives with the current syntax,
               three common types of link are catered for with <rtmlink>:
               positions, describing positional use of the reference point;
               same timeframe, where events share a commonly situated
               reference point;
               reports, for reported speech.
       <rtmlink xml:id="l1" type="POSITIONS">
       <link source="#t1" />
       <link target="#v1" />
       </rtmlink>
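
A minimal sketch of producing such a link programmatically with xml.etree.ElementTree, assuming the element and attribute names shown in the example above. The helper make_rtmlink is hypothetical and is not part of any RTMML tooling.

```python
import xml.etree.ElementTree as ET

# Clark-notation name that ElementTree serialises as xml:id.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def make_rtmlink(link_id: str, link_type: str, source: str, target: str):
    """Build an <rtmlink> element joining a source and a target annotation."""
    rtmlink = ET.Element("rtmlink", {XML_ID: link_id, "type": link_type})
    ET.SubElement(rtmlink, "link", {"source": "#" + source})
    ET.SubElement(rtmlink, "link", {"target": "#" + target})
    return rtmlink


# Link the time expression t1 to the verb v1 whose reference point it positions.
link = make_rtmlink("l1", "POSITIONS", "t1", "v1")
print(ET.tostring(link, encoding="unicode"))
```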






Annotation Method



               How did we create the corpus?
               Collect 50 documents that overlap with TimeML text
               (TimeBank / AQUAINT TimeML corpus)
               Pre-process with GATE to identify VG (verb groups)
               Pre-process with TERNIP to identify times
               Add “reporting” links for some verbs (e.g. said)
               Curate annotations per-document








Annotation Guidelines



               What edge cases frequently emerged?
               Infinitives: ignore these, and use the dominating verb (they
               want to find out)
               Times as durations: only include those that can be absolutely
               normalised
               Modals: only include these in a verb annotation for future
               tense








Corpus Statistics



               What did we end up with?
               50 RTMML annotated documents, comprising
               22 000 tokens; in which are described
               5 000 events, and
               1 000 times; these are connected by
               temporal linkages.








Context



               How do events tend to relate to each other in discourse?
               Emmanuel had said “This will explode!”, but later changed
               his mind.
               Positions of S and R persist throughout each context.
               Changing the S − R relationship inside a context leads to
               awkward text;
               “By the time I ran, John will have arrived”.
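
A minimal sketch of checking the persistence of the S − R relationship inside one context. The dictionary layout and the function inconsistent_verbs are hypothetical. The sr values are assumptions that follow the example markup for the simple past (sr=">") and the standard Reichenbachian reading of the future perfect (S < E < R, so sr="<").

```python
def inconsistent_verbs(context):
    """Return the verbs whose S-R relation differs from the first verb's."""
    if not context:
        return []
    expected = context[0]["sr"]
    return [verb["text"] for verb in context if verb["sr"] != expected]


# "By the time I ran, John will have arrived"
context = [
    {"text": "ran", "sr": ">"},                # simple past: R before S
    {"text": "will have arrived", "sr": "<"},  # future perfect: S before R
]
print(inconsistent_verbs(context))  # ['will have arrived']
```

Taking the first verb as the baseline is a simplification; it only flags that the context mixes S − R relations, not which verb is the odd one out.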








Contexts in natural language



               What can we use context for?
               Sets of events and times are often found in “pools”, all
               belonging to the same temporal context
               These pools each describe a temporally-close set of events in
               the same realis or irrealis world
               Identifying the nature of the pools helps us perform temporal
               annotation (conditionals, reported speech)








Summary




               Have we accomplished our goals?
               We have created a corpus annotated with Reichenbach’s tense
               model
               We can use this corpus to observe temporal links in text in
               detail








Future Work



               Automatic annotation of events and links using Reichenbach’s
               model and RTMBank as training data
               Resolution of RTMML annotations for temporal
               reasoning
               Adapting the model to work outside of English, especially for
               languages with more complex aspectual systems (e.g. Russian)








Conclusion




               We have presented RTMBank, a corpus containing temporal
                  annotations based on Reichenbach’s model of tense.








Questions




                   Thank you for your time. Are there any questions?




Leon Derczynski and Robert Gaizauskas                                                   University of Sheffield
RTMBank: Capturing Verbs with Reichenbach’s Tense Model

Más contenido relacionado

Más de Leon Derczynski

Christmas Presentation at Aarhus: What I do
Christmas Presentation at Aarhus: What I doChristmas Presentation at Aarhus: What I do
Christmas Presentation at Aarhus: What I do
Leon Derczynski
 
Mining Social Media with Linked Open Data, Entity Recognition, and Event Extr...
Mining Social Media with Linked Open Data, Entity Recognition, and Event Extr...Mining Social Media with Linked Open Data, Entity Recognition, and Event Extr...
Mining Social Media with Linked Open Data, Entity Recognition, and Event Extr...
Leon Derczynski
 
Review of: Challenges of migrating to agile methodologies
Review of: Challenges of migrating to agile methodologiesReview of: Challenges of migrating to agile methodologies
Review of: Challenges of migrating to agile methodologies
Leon Derczynski
 

Más de Leon Derczynski (20)

Joint Rumour Stance and Veracity
Joint Rumour Stance and VeracityJoint Rumour Stance and Veracity
Joint Rumour Stance and Veracity
 
State of Tools for NLP in Danish: 2018
State of Tools for NLP in Danish: 2018State of Tools for NLP in Danish: 2018
State of Tools for NLP in Danish: 2018
 
RumourEval
RumourEvalRumourEval
RumourEval
 
Broad Twitter Corpus: A Diverse Named Entity Recognition Resource
Broad Twitter Corpus: A Diverse Named Entity Recognition ResourceBroad Twitter Corpus: A Diverse Named Entity Recognition Resource
Broad Twitter Corpus: A Diverse Named Entity Recognition Resource
 
Handling and Mining Linguistic Variation in UGC
Handling and Mining Linguistic Variation in UGCHandling and Mining Linguistic Variation in UGC
Handling and Mining Linguistic Variation in UGC
 
Efficient named entity annotation through pre-empting
Efficient named entity annotation through pre-emptingEfficient named entity annotation through pre-empting
Efficient named entity annotation through pre-empting
 
Leveraging the Power of Social Media
Leveraging the Power of Social MediaLeveraging the Power of Social Media
Leveraging the Power of Social Media
 
Corpus Annotation through Crowdsourcing: Towards Best Practice Guidelines
Corpus Annotation through Crowdsourcing: Towards Best Practice GuidelinesCorpus Annotation through Crowdsourcing: Towards Best Practice Guidelines
Corpus Annotation through Crowdsourcing: Towards Best Practice Guidelines
 
Passive-Aggressive Sequence Labeling with Discriminative Post-Editing for Rec...
Passive-Aggressive Sequence Labeling with Discriminative Post-Editing for Rec...Passive-Aggressive Sequence Labeling with Discriminative Post-Editing for Rec...
Passive-Aggressive Sequence Labeling with Discriminative Post-Editing for Rec...
 
Starting to Process Social Media
Starting to Process Social MediaStarting to Process Social Media
Starting to Process Social Media
 
Christmas Presentation at Aarhus: What I do
Christmas Presentation at Aarhus: What I doChristmas Presentation at Aarhus: What I do
Christmas Presentation at Aarhus: What I do
 
Recognising and Interpreting Named Temporal Expressions
Recognising and Interpreting Named Temporal ExpressionsRecognising and Interpreting Named Temporal Expressions
Recognising and Interpreting Named Temporal Expressions
 
TwitIE: An Open-Source Information Extraction Pipeline for Microblog Text
TwitIE: An Open-Source Information Extraction Pipeline for Microblog TextTwitIE: An Open-Source Information Extraction Pipeline for Microblog Text
TwitIE: An Open-Source Information Extraction Pipeline for Microblog Text
 
Twitter Part-of-Speech Tagging for All: Overcoming Sparse and Noisy Data
 Twitter Part-of-Speech Tagging for All:  Overcoming Sparse and Noisy Data Twitter Part-of-Speech Tagging for All:  Overcoming Sparse and Noisy Data
Twitter Part-of-Speech Tagging for All: Overcoming Sparse and Noisy Data
 
Mining Social Media with Linked Open Data, Entity Recognition, and Event Extr...
Mining Social Media with Linked Open Data, Entity Recognition, and Event Extr...Mining Social Media with Linked Open Data, Entity Recognition, and Event Extr...
Mining Social Media with Linked Open Data, Entity Recognition, and Event Extr...
 
Determining the Types of Temporal Relations in Discourse
Determining the Types of Temporal Relations in DiscourseDetermining the Types of Temporal Relations in Discourse
Determining the Types of Temporal Relations in Discourse
 
Microblog-genre noise and its impact on semantic annotation accuracy
Microblog-genre noise and its impact on semantic annotation accuracyMicroblog-genre noise and its impact on semantic annotation accuracy
Microblog-genre noise and its impact on semantic annotation accuracy
 
Towards Context-Aware Search and Analysis on Social Media Data
Towards Context-Aware Search and Analysis on Social Media DataTowards Context-Aware Search and Analysis on Social Media Data
Towards Context-Aware Search and Analysis on Social Media Data
 
TIMEN: An Open Temporal Expression Normalisation Resource
TIMEN: An Open Temporal Expression Normalisation ResourceTIMEN: An Open Temporal Expression Normalisation Resource
TIMEN: An Open Temporal Expression Normalisation Resource
 
Review of: Challenges of migrating to agile methodologies
Review of: Challenges of migrating to agile methodologiesReview of: Challenges of migrating to agile methodologies
Review of: Challenges of migrating to agile methodologies
 

Último

Último (20)

Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 

RTMBank: Capturing Verbs with Reichenbach's Tense Model

  • 1. Introduction Reichenbach’s Model of Tense RTMML Building the corpus Summary RTMBank: Capturing Verbs with Reichenbach’s Tense Model Leon Derczynski and Robert Gaizauskas University of Sheffield 20 July 2011 Leon Derczynski and Robert Gaizauskas University of Sheffield RTMBank: Capturing Verbs with Reichenbach’s Tense Model
  • 2. Introduction Reichenbach’s Model of Tense RTMML Building the corpus Summary Outline 1 Introduction 2 Reichenbach’s Model of Tense 3 RTMML 4 Building the corpus 5 Summary Leon Derczynski and Robert Gaizauskas University of Sheffield RTMBank: Capturing Verbs with Reichenbach’s Tense Model
  • 3. Introduction Reichenbach’s Model of Tense RTMML Building the corpus Summary Introduction to RTMBank We present RTMBank - a annotated corpus of tense verbs and their relations - that helps us better automatically process temporal information in text. Leon Derczynski and Robert Gaizauskas University of Sheffield RTMBank: Capturing Verbs with Reichenbach’s Tense Model
  • 4. Introduction Reichenbach’s Model of Tense RTMML Building the corpus Summary Temporal Annotation Why annotate temporal information? Relating time is an essential part of discourse. To automatically process time information in documents, we can record data with a temporal annotation. Data annotated might include: times, events, the relations between them. Existing standards - TIMEX, TimeML Existing corpora - TimeBank, AQUAINT TimeML corpus, WikiWars We focus on two sub-tasks: relating events, and processing temporal expressions. Leon Derczynski and Robert Gaizauskas University of Sheffield RTMBank: Capturing Verbs with Reichenbach’s Tense Model
  • 5. Introduction Reichenbach’s Model of Tense RTMML Building the corpus Summary Relating events How do we relate things in time? Temporal expressions and events may be seen as intervals. Defining relations between two intervals permits temporal ordering. Human readers can usually temporally relate events in a discourse using cues. Automatic labelling of temporal links is a difficult research problem. Leon Derczynski and Robert Gaizauskas University of Sheffield RTMBank: Capturing Verbs with Reichenbach’s Tense Model
  • 6. Introduction Reichenbach’s Model of Tense RTMML Building the corpus Summary Temporal expressions What is a temporal expression? A temporal expression is text that describes a time or period. Wednesday; December 4, 1997; for two weeks The process of mapping temporal expressions to an absolute calendar is normalisation. Wednesday ⇒ 2011/01/12 T 00:00:00 - 2011/01/12 T 23:59:59 Leon Derczynski and Robert Gaizauskas University of Sheffield RTMBank: Capturing Verbs with Reichenbach’s Tense Model
  • 7. Introduction Reichenbach’s Model of Tense RTMML Building the corpus Summary Outline 1 Introduction 2 Reichenbach’s Model of Tense 3 RTMML 4 Building the corpus 5 Summary Leon Derczynski and Robert Gaizauskas University of Sheffield RTMBank: Capturing Verbs with Reichenbach’s Tense Model
  • 8. Introduction Reichenbach’s Model of Tense RTMML Building the corpus Summary The Model When describing an event, we can call the time of the event E . The event is described at speech time S. “I will walk the dog.” ⇒ S < E Reichenbach introduces a reference point, an abstract time from which events are viewed. “I ran home” ⇒ E < S “I had run home” ⇒ E < R < S Leon Derczynski and Robert Gaizauskas University of Sheffield RTMBank: Capturing Verbs with Reichenbach’s Tense Model
  • 9. Introduction Reichenbach’s Model of Tense RTMML Building the corpus Summary Special properties of the reference point permanence: when sentences or clauses are combined, grammatical rules constrain the set of available tenses. These rules work such that R has the same position in all cases. “I had run home when I heard the noise” positional use: when a time is found in the same clause as a verbal event, R is bound to the time. “It was six o’clock and John had prepared” – the preparation, E , occurs before six o’clock, R. Leon Derczynski and Robert Gaizauskas University of Sheffield RTMBank: Capturing Verbs with Reichenbach’s Tense Model
  • 10. Introduction Reichenbach’s Model of Tense RTMML Building the corpus Summary Motivation for using the model How will this model help temporal annotation? At least 10% of normalisation errors are due to incorrect R 1 . There is a “need to develop sophisticated methods for temporal focus tracking if we are to extend current time-stamping technologies”2 . If can relate the S and R of multiple event verbs, we can reason about and label the relation between event times. Tracking S, E and R according to Reichenbach’s model helps generate correct tense and aspect for NLG3 . 1 Mani & Wilson, 2000 2 Mazur & Dale, 2010 3 Elson & McKeown, 2010 Leon Derczynski and Robert Gaizauskas University of Sheffield RTMBank: Capturing Verbs with Reichenbach’s Tense Model
  • 12. What to annotate? RTMML is intended to describe Reichenbach’s verbal event structure. First primitive: verb groups describing an event (“had snowed”, “is running”, “will have given”). Second primitive: text describing a time (“next two weeks”, “last April”). We are more concerned with the relations between times than with their absolute positions.
  • 13. Annotation Schema. What does the markup look like? RTMML is an XML-based standoff annotation standard that may be standalone or integrated with TimeML. <doc> defines the document creation/utterance time, which is also the default speech time. <verb> denotes verb groups, including tense, and the relations between S, E and R; these points may be expressed in terms of other times in the annotation or as new labels. <timerefx> denotes a time-referring expression, which can optionally be specified in TIMEX3 format.
  • 14. Example markup: <rtmml> Yesterday, John ate well. <seg type="token" /> <doc time="now" /> <timerefx xml:id="t1" target="#token0" /> <verb xml:id="v1" target="#token3" view="simple" tense="past" sr=">" er="=" se=">" r="t1" s="doc" /> </rtmml>
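To show how the standoff targets in this example might be resolved, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The tokenisation rule (words and punctuation marks as separate tokens, numbered from zero) is an assumption made so that `#token0` and `#token3` land on “Yesterday” and “ate”; the slide does not define how `<seg type="token" />` splits the text. The helper name `token_text` is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

DOC = """<rtmml>Yesterday, John ate well.
<seg type="token" />
<doc time="now" />
<timerefx xml:id="t1" target="#token0" />
<verb xml:id="v1" target="#token3" view="simple" tense="past"
      sr=">" er="=" se=">" r="t1" s="doc" />
</rtmml>"""

root = ET.fromstring(DOC)

# Assumed tokenisation: runs of word characters, or single punctuation marks.
tokens = re.findall(r"\w+|[^\w\s]", root.text)


def token_text(target: str) -> str:
    """Resolve a '#tokenN' reference to the Nth token of the text."""
    return tokens[int(target[len("#token"):])]


verb = root.find("verb")
time = root.find("timerefx")

print(verb.get(XML_ID), token_text(verb.get("target")), verb.get("tense"))
print(time.get(XML_ID), token_text(time.get("target")))
print({k: verb.get(k) for k in ("sr", "er", "se")})
```

The verb's `r="t1"` attribute points back at the `<timerefx>` element, so the reference point of “ate” is the time expressed by “Yesterday”.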
  • 15. RTMML links. Although we can relate primitives with the current syntax, three common types of link are catered for with <rtmlink>: positions, describing positional use of the reference point; same timeframe, where events share a commonly situated reference point; and reports, for reported speech. Example: <rtmlink xml:id="l1" type="POSITIONS"> <link source="#t1" /> <link target="#v1" /> </rtmlink>
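A link like the one on this slide could be read into a plain tuple as sketched below. The function name `read_link` is hypothetical, and allowing several sources or targets per link is an assumption; the slide shows only one of each.

```python
import xml.etree.ElementTree as ET

LINK = """<rtmlink xml:id="l1" type="POSITIONS">
  <link source="#t1" />
  <link target="#v1" />
</rtmlink>"""


def read_link(xml_text: str):
    """Return (type, source ids, target ids) for one <rtmlink> element."""
    el = ET.fromstring(xml_text)
    links = el.findall("link")
    # Strip the leading '#' so the ids match the xml:id values they refer to.
    sources = [l.get("source").lstrip("#") for l in links if l.get("source")]
    targets = [l.get("target").lstrip("#") for l in links if l.get("target")]
    return el.get("type"), sources, targets


print(read_link(LINK))
```

For the slide's example, this reads as time `t1` positioning the reference point of verb `v1`.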
  • 17. Annotation Method. How did we create the corpus? Collect 50 documents that overlap with TimeML text (TimeBank / ATC); pre-process with GATE to identify verb groups (VG); pre-process with TERNIP to identify times; add “reporting” links for some verbs (e.g. “said”); curate annotations per document.
  • 18. Annotation Guidelines. What edge cases frequently emerged? Infinitives: ignore these, and use the dominating verb (“they want to find out”). Times as durations: only include those that can be absolutely normalised. Modals: only include these in a verb annotation for future tense.
  • 19. Corpus Statistics. What did we end up with? 50 RTMML-annotated documents comprising 22 000 tokens, in which 5 000 events and 1 000 times are described; these are connected by temporal links.
  • 20. Context. How do events tend to relate to each other in discourse? Emmanuel had said “This will explode!”, but later changed his mind. The positions of S and R persist throughout each context. Changing the S-R relationship inside a context leads to awkward text, e.g. “By the time I ran, John will have arrived”.
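The persistence of the S-R relationship within a context can be checked mechanically once each verb carries an `sr` value. The sketch below is an assumption-laden illustration: the function name and the dictionary layout are hypothetical, and the `sr` values for the awkward example are assigned by hand (simple past “ran” with S after R, future perfect “will have arrived” with S before R).

```python
def sr_conflicts(verbs):
    """Return ids of verbs whose S-R relation differs from the context's first verb."""
    if not verbs:
        return []
    expected = verbs[0]["sr"]
    return [v["id"] for v in verbs[1:] if v["sr"] != expected]


# "By the time I ran, John will have arrived"
awkward = [
    {"id": "v1", "text": "ran", "sr": ">"},                # S after R
    {"id": "v2", "text": "will have arrived", "sr": "<"},  # S before R
]

# "By the time I ran, John had arrived"
consistent = [
    {"id": "v1", "text": "ran", "sr": ">"},
    {"id": "v2", "text": "had arrived", "sr": ">"},
]

print(sr_conflicts(awkward))
print(sr_conflicts(consistent))
```

A flagged verb does not have to be an error; in reported speech, for example, it may instead signal the start of a new context.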
  • 21. Contexts in natural language. What can we use context for? Sets of events and times are often found in “pools”, all belonging to the same temporal context. Each pool describes a temporally close set of events in the same realis or irrealis world. Identifying the nature of the pools helps us perform temporal annotation (conditionals, reported speech).
  • 23. Summary. Have we accomplished our goals? We have created a corpus annotated with Reichenbach’s tense model. We can use this corpus to observe temporal links in text in detail.
  • 24. Future Work. Automatic annotation of events and links, using Reichenbach’s model and RTMBank as training data. Resolution of RTMML annotations for temporal reasoning. Adapting the model to work outside of English, especially for languages with more complex aspectual systems (e.g. Russian).
  • 25. Conclusion. We have presented RTMBank, a corpus containing temporal annotations based on Reichenbach’s model of tense.
  • 26. Questions. Thank you for your time. Are there any questions?