1. TETI: a TimeML Compliant TimEx Tagger for Italian
Tommaso Caselli, Felice dell'Orletta and Irina Prodanof
Istituto di Linguistica Computazionale “A. Zampolli” - ILC-CNR Pisa
{firstName.secondName@ilc.cnr.it}
IMCSIT 2009 – CL-A09, Mrągowo, October 13
2. Outline:
Motivations
Extracting temporal expressions and the TIMEX3 tag
TETI:
− System architecture
− Demo
Evaluation
Conclusions & Future Work
3. Motivations
Recovering temporal relations in text/discourse is essential to improve the performance of many NLP systems (Open-Domain Question Answering, Text Mining, Summarization, Reasoning)
Most temporal information in text/discourse is only IMPLICITLY stated
We need procedures that make the most of the various sources of information
Temporal expressions represent a source of explicit temporal knowledge which can:
− Locate an eventuality in time, and thus be used to infer temporal relations between eventualities
− Measure the duration of an eventuality
4. Extracting Temporal Expressions
The extraction of timexes can be divided into four subtasks:
− Recognizing and bracketing the timex
− Feature extraction (type of time unit, referential status, presence of modifiers)
− Computing the interval of reference on the time line
− Resolving the timex, i.e. normalizing its value to a standard output format
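A minimal sketch of the four subtasks chained as a pipeline. It is not TETI's implementation: the toy lexicon `RELATIVE_DAYS`, the `Timex` record and the document creation time (DCT) are hypothetical, and only deictic day words such as "ieri" (yesterday) are handled.

```python
import re
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical toy lexicon: Italian deictic day words -> day offset from the DCT.
RELATIVE_DAYS = {"oggi": 0, "ieri": -1, "domani": 1}
TRIGGER_RE = re.compile(r"\b(?:" + "|".join(RELATIVE_DAYS) + r")\b", re.IGNORECASE)


@dataclass
class Timex:
    text: str
    start: int              # character offsets of the bracketed extent
    end: int
    unit: str = ""          # type of time unit
    relative: bool = False  # referential status: anchored to the DCT?
    value: str = ""         # normalized value


def recognize(text):
    """1. Recognize and bracket candidate timexes."""
    return [Timex(m.group(0), m.start(), m.end()) for m in TRIGGER_RE.finditer(text)]


def extract_features(tx):
    """2. Feature extraction (every entry of the toy lexicon is a deictic day word)."""
    tx.unit = "DAY"
    tx.relative = True
    return tx


def reference_interval(tx, dct):
    """3. Interval of reference on the time line, as a half-open [start, end) pair."""
    start = dct + timedelta(days=RELATIVE_DAYS[tx.text.lower()])
    return start, start + timedelta(days=1)


def normalize(tx, dct):
    """4. Resolve the timex to an ISO 8601 calendar date."""
    start, _ = reference_interval(tx, dct)
    tx.value = start.isoformat()
    return tx


dct = date(2009, 10, 13)  # hypothetical document creation time
found = [normalize(extract_features(t), dct)
         for t in recognize("Ieri il ministro ha parlato.")]
print([(t.text, t.value) for t in found])  # [('Ieri', '2009-10-12')]
```

Keeping each subtask in its own function mirrors the decomposition above: a tagger that stops after step 2 (bracketing plus features) can later be extended with steps 3 and 4.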
6. Temporal Expressions in TimeML: The TIMEX3 tag
The TIMEX3 tag extends and improves on previous tags for this task, namely TIMEX and TIDES TIMEX2
TIMEX3 is used to mark any time word, i.e. both absolute and relative timexes such as times of day (midnight, ...), dates of different granularity (yesterday, last spring, ...), calendar dates (01/12/1980, ...), durations (three hours, two years, ...) and sets of times (yearly, every day, ...)
The annotation process is based on:
− the constituent structure (NP, AdjP, AdvP, Time/Date Pattern)
− the granularity of the time units
− the relations between the timexes
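The sketch below serializes TIMEX3 elements for some of the classes listed above with Python's `xml.etree.ElementTree`. Only the core attributes `tid`, `type` and `value` are shown (other TimeML attributes are omitted); the values follow TimeML's ISO 8601-based conventions, and the document creation time used to resolve "yesterday" is an assumption.

```python
import xml.etree.ElementTree as ET


def timex3(tid, ttype, value, text):
    """Serialize one TIMEX3 element with the core TimeML attributes."""
    el = ET.Element("TIMEX3", {"tid": tid, "type": ttype, "value": value})
    el.text = text
    return ET.tostring(el, encoding="unicode")


# One example per class; the DATE value assumes a (hypothetical)
# document creation time of 2009-10-13.
examples = [
    ("t1", "DATE", "2009-10-12", "yesterday"),
    ("t2", "DURATION", "PT3H", "three hours"),
    ("t3", "DURATION", "P2Y", "two years"),
    ("t4", "SET", "P1D", "every day"),
]
for ex in examples:
    print(timex3(*ex))
```

Since Python 3.8, `tostring` keeps attributes in insertion order, so the output reads `<TIMEX3 tid="t2" type="DURATION" value="PT3H">three hours</TIMEX3>` and so on.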
7. TETI: Temporal Expression Tagger for Italian
Rule-based system
Main components:
− Input: chunked text
− TIMEX DETECTOR & TIMEX TAGGER
− Two external resources: a TimEx Trigger Dictionary and a Modifier Dictionary
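As a rough illustration of how the two components could share the two dictionaries, here is a Python sketch. The dictionary entries, the mapping of Italian modifiers to TimeML `mod` values and the functions `detect` and `tag` are all hypothetical; chunks are plain token lists.

```python
# Hypothetical entries: trigger lemma -> time unit; modifier -> TimeML mod value.
TRIGGER_DICT = {"giorno": "DAY", "settimana": "WEEK", "mese": "MONTH", "anno": "YEAR"}
MODIFIER_DICT = {"circa": "APPROX", "inizio": "START", "metà": "MID", "fine": "END"}


def detect(chunks):
    """TIMEX DETECTOR: indices of the chunks that contain a trigger."""
    return [i for i, toks in enumerate(chunks)
            if any(t.lower() in TRIGGER_DICT for t in toks)]


def tag(chunks, i):
    """TIMEX TAGGER: bracket chunk i, absorbing a preceding modifier chunk."""
    start, mod = i, None
    if i > 0:
        mods = [MODIFIER_DICT[t.lower()] for t in chunks[i - 1]
                if t.lower() in MODIFIER_DICT]
        if mods:
            start, mod = i - 1, mods[0]
    text = " ".join(t for chunk in chunks[start:i + 1] for t in chunk)
    attrs = f' mod="{mod}"' if mod else ""
    return f"<TIMEX3{attrs}>{text}</TIMEX3>"


chunks = [["la", "fine"], ["del", "mese"], ["il", "ministro"], ["è", "partito"]]
print([tag(chunks, i) for i in detect(chunks)])
# ['<TIMEX3 mod="END">la fine del mese</TIMEX3>']
```

Keeping detection and tagging separate lets the trigger lookup stay cheap while the tagger alone consults the Modifier Dictionary.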
10. TETI: Temporal Expression Tagger for Italian (2)
The chunker output approximates the extent of the TIMEX3 tag
The extent of a timex corresponds to regular patterns of chunk combinations
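One way to express regular patterns of chunk combinations is to encode each chunk as one letter and run a regular expression over the resulting string, so that match offsets are chunk indices. The chunk labels (`N_C`, `P_C`, `ADV_C`, `FV_C`), the trigger set and the `EXTENT` pattern are assumptions for illustration, not TETI's actual rules.

```python
import re

# Hypothetical chunk labels -> one-letter codes.
CODES = {"N_C": "n", "P_C": "p", "ADV_C": "a", "FV_C": "v"}
# Hypothetical timex triggers.
TRIGGERS = {"ieri", "oggi", "giorno", "settimana", "mese", "anno"}
# Hypothetical extent pattern: a trigger-bearing nominal, prepositional or
# adverbial chunk, followed by any number of trigger-bearing prepositional chunks.
EXTENT = re.compile(r"[NPA]P*")


def encode(chunks):
    """One letter per chunk; upper case when the chunk contains a trigger."""
    letters = []
    for label, tokens in chunks:
        code = CODES.get(label, "x")
        has_trigger = any(tok.lower() in TRIGGERS for tok in tokens)
        letters.append(code.upper() if has_trigger else code)
    return "".join(letters)


def timex_extents(chunks):
    """Half-open (start, end) chunk-index spans matched by EXTENT."""
    return [m.span() for m in EXTENT.finditer(encode(chunks))]


chunks = [("N_C", ["il", "primo", "giorno"]), ("P_C", ["della", "settimana"]),
          ("N_C", ["il", "ministro"]), ("FV_C", ["è", "partito"])]
print(encode(chunks), timex_extents(chunks))  # NPnv [(0, 2)]
```

The span `(0, 2)` covers the first two chunks, i.e. "il primo giorno della settimana".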
11. TETI: Temporal Expression Tagger for Italian (3)
− Analysis of the chunked text
− Lookup in the TimEx Trigger Dictionary
− Extraction of the features needed for bracketing
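A minimal sketch of the lookup step, assuming a trigger dictionary whose entries carry a time unit and a referential-status flag. `TriggerEntry`, `TRIGGER_DICT`, `DEICTIC_ADJ` and `chunk_features` are hypothetical names, and the feature set is reduced to two.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TriggerEntry:
    unit: str        # granularity of the time unit
    relative: bool   # True if the trigger itself is deictic


# Hypothetical fragment of the TimEx Trigger Dictionary.
TRIGGER_DICT = {
    "ieri": TriggerEntry("DAY", True),
    "oggi": TriggerEntry("DAY", True),
    "settimana": TriggerEntry("WEEK", False),
    "mese": TriggerEntry("MONTH", False),
    "ottobre": TriggerEntry("MONTH", False),
}
# Hypothetical deictic adjectives that make a timex relative ("la settimana scorsa").
DEICTIC_ADJ = {"scorso", "scorsa", "prossimo", "prossima"}


def chunk_features(tokens):
    """Look up each token; return bracketing features, or None if no trigger."""
    lowered = [t.lower() for t in tokens]
    hits = [TRIGGER_DICT[t] for t in lowered if t in TRIGGER_DICT]
    if not hits:
        return None
    return {
        "units": [h.unit for h in hits],
        "relative": any(h.relative for h in hits)
                    or any(t in DEICTIC_ADJ for t in lowered),
    }


print(chunk_features(["la", "settimana", "scorsa"]))  # {'units': ['WEEK'], 'relative': True}
print(chunk_features(["il", "ministro"]))             # None
```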
13. TETI: Temporal Expression Tagger for Italian (4)
Core element of the tagger: a general condition + a set of local conditions
If the conditions are true, the tagger activates the related rules and brackets the timex with TIMEX3
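The condition/rule mechanism can be sketched as follows, under stated assumptions: the general condition is taken to be "the chunk contains a trigger", and `RULES`, their local conditions, the chunk labels and `bracket` are hypothetical. Rules are tried in order; the first one whose local conditions all hold decides how many chunks are bracketed.

```python
from typing import Callable, List, Tuple

Chunk = Tuple[str, List[str]]                 # (chunk label, tokens)
Condition = Callable[[List[Chunk], int], bool]

TRIGGERS = {"giorno", "settimana", "mese", "anno", "ottobre"}  # hypothetical


def general_condition(chunks: List[Chunk], i: int) -> bool:
    """Hypothetical general condition: chunk i contains a timex trigger."""
    return any(t.lower() in TRIGGERS for t in chunks[i][1])


# Hypothetical rules: (name, local conditions, extra chunks to the right).
RULES: List[Tuple[str, List[Condition], int]] = [
    ("N+P", [lambda c, i: c[i][0] == "N_C",
             lambda c, i: (i + 1 < len(c) and c[i + 1][0] == "P_C"
                           and general_condition(c, i + 1))], 1),
    ("N", [lambda c, i: c[i][0] == "N_C"], 0),
]


def bracket(chunks: List[Chunk]) -> str:
    """Apply the first rule whose local conditions all hold and add TIMEX3 tags."""
    out, i = [], 0
    while i < len(chunks):
        extra = None
        if general_condition(chunks, i):
            for _name, local_conditions, width in RULES:
                if all(cond(chunks, i) for cond in local_conditions):
                    extra = width
                    break
        if extra is None:
            out.append(" ".join(chunks[i][1]))
            i += 1
            continue
        words = " ".join(t for _, toks in chunks[i:i + extra + 1] for t in toks)
        out.append(f"<TIMEX3>{words}</TIMEX3>")
        i += extra + 1
    return " ".join(out)


chunks = [("N_C", ["il", "primo", "giorno"]), ("P_C", ["della", "settimana"]),
          ("FV_C", ["è", "iniziato"])]
print(bracket(chunks))  # <TIMEX3>il primo giorno della settimana</TIMEX3> è iniziato
```

Ordering the more specific rule ("N+P") first ensures that the longest matching extent wins over the single-chunk fallback.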
17. TETI: Temporal Expression Tagger for Italian (5)
More complex timexes require a further lookup in the TimEx Trigger Dictionary to extract additional features (semantic relations) for correct bracketing
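A sketch of how semantic relations from the dictionary could inform bracketing, assuming `IS_A` and `PART_OF` links between Italian time lemmas (hypothetical entries) and assuming, purely for illustration, that two adjacent trigger chunks form a single timex when the first unit is part of the second.

```python
# Hypothetical semantic relations stored with the trigger entries.
IS_A = {"lunedì": "giorno", "ottobre": "mese"}
PART_OF = {"mattina": "giorno", "giorno": "settimana", "settimana": "mese", "mese": "anno"}


def unit_of(lemma):
    """Map an instance (e.g. a weekday or month name) to its time unit via IS_A."""
    return IS_A.get(lemma, lemma)


def finer_than(a, b):
    """True if the unit of a is transitively PART_OF the unit of b."""
    current, target = unit_of(a), unit_of(b)
    while current in PART_OF:
        current = PART_OF[current]
        if current == target:
            return True
    return False


# "la mattina / di lunedì": one timex; "due anni / in ottobre": two timexes.
print(finer_than("mattina", "lunedì"), finer_than("anno", "ottobre"))  # True False
```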
19. Evaluation
42 newspaper articles, manually annotated
367 timexes
Tag | Total | Correct | Missing | Incorrect | P (%) | R (%) | F (%)
TIMEX3 | 367 | 321 | 35 | 66 | 82.95 | 90.17 | 86.41
TIMEX3 with modifiers | 90 | 55 | 12 | 23 | 82.09 | 70.51 | 75.86
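The scores can be recomputed from the counts. The slide does not define the measures; the sketch assumes P = Correct / (Correct + Incorrect), R = Correct / (Correct + Missing) and F as their harmonic mean, which reproduces the figures of the TIMEX3 row.

```python
def prf(correct, missing, incorrect):
    """Precision, recall and F1 in percent, rounded to two decimals.

    Assumed definitions: P = correct / (correct + incorrect),
    R = correct / (correct + missing), F = 2PR / (P + R).
    """
    p = correct / (correct + incorrect)
    r = correct / (correct + missing)
    f = 2 * p * r / (p + r)
    return round(100 * p, 2), round(100 * r, 2), round(100 * f, 2)


print(prf(321, 35, 66))  # (82.95, 90.17, 86.41)
```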
20. Conclusion & Future Work
• Reduction of the number of false positives
• Implementation of the normalization phase (rule-based)
• Rewriting of the rules to comply with the KAF format (KYOTO Project)
• Release of the tool as a web service
21. Acknowledgments
Thanks to Roberto Bartolini for his help in the
development of the demo