Discovering temporal information is key to organising knowledge, and the task of extracting and representing temporal information from text has therefore received increasing interest. In this paper we focus on the discovery of temporal footprints from encyclopaedic descriptions. Temporal footprints are time-line periods associated with the existence of specific concepts. Our approach relies on extracting date mentions and predicting the lower and upper boundaries that define temporal footprints. We report on several experiments on persons’ pages from Wikipedia to illustrate the feasibility of the proposed methods.
Mining temporal footprints from Wikipedia
1. filannim@cs.man.ac.uk
School of Computer Science
presentation 1st AHA! Workshop, COLING 2014
Dublin, 23/08/2014
Mining temporal footprints from Wikipedia
Michele Filannino, Goran Nenadic
2. introduction
■ Temporal information is crucial for organising structured and unstructured data
■ Several temporal information extraction (TIE) systems are nowadays available
● thanks to TempEval challenge series
5. temporal footprint
A temporal footprint is a continuous period on the time-line that temporally defines the existence of a particular concept.
Immanuel Kant, Paul Guyer, and Allen W Wood. 1998. Critique of pure reason. Cambridge University Press.
6. problem
Can we predict temporal footprints from encyclopaedic descriptions of concepts?
■ input: textual description of a concept
■ output: prediction of a temporal interval
7. Examples of temporal footprints
[Figure: time-line from 1000 to 2000 showing the temporal footprints of example concepts, with a legend distinguishing Object, Person and Historical period. Concepts shown: Web, Cellphone, Computer, Car, Richard Feynman, Bicycle, Carl Friedrich Gauss, French revolution, Age of Enlightenment, Galileo Galilei, Leonardo da Vinci, Christopher Columbus, Renaissance, Arming sword, High Middle Ages, Genghis Khan.]
15. normal distribution fitting
[Figure: normal distribution fitting plot; x-axis: time (in years), 1360 to 1810; y-axis: freq, 0.000 to 0.050; annotation: α param.]
Alpha and Beta parameters control the size and offset of the gaussian bell.
17. normal distribution fitting
[Figure: normal distribution fitting plot; x-axis: time (in years), 1360 to 1810; y-axis: freq, 0.000 to 0.050; annotation: β param.]
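A minimal sketch of how the fitting step could turn year mentions into a footprint. The slides only say that Alpha and Beta control the size and offset of the bell; the sketch assumes that alpha scales the width in standard deviations and beta shifts the centre in years. The function name `fit_footprint` and the default values are hypothetical.

```python
from statistics import mean, pstdev


def fit_footprint(years, alpha=1.0, beta=0.0):
    """Predict (lower, upper) bounds from a normal fit to year mentions.

    Assumption (not stated in the slides): alpha is the half-width of the
    interval in standard deviations, beta is a shift of the centre in years.
    """
    if not years:
        raise ValueError("need at least one year mention")
    mu = mean(years)        # maximum-likelihood estimate of the mean
    sigma = pstdev(years)   # maximum-likelihood estimate of the std. deviation
    centre = mu + beta
    return (round(centre - alpha * sigma), round(centre + alpha * sigma))


mentions = [1500, 1510, 1520, 1530, 1540]
print(fit_footprint(mentions, alpha=2.0))             # (1492, 1548)
print(fit_footprint(mentions, alpha=2.0, beta=-5.0))  # (1487, 1543)
```

Under this reading, a larger alpha widens the predicted interval symmetrically, while beta moves both bounds by the same amount.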
19. error measure
[Figure: a gold interval and a predicted interval on the time-line, with their union and overlap marked.]
Fatima De Carvalho. 1996. Histogrammes et indices de proximité en analyse données symboliques. Actes de l’école d’été sur l’analyse des données symboliques. LISE-CEREMADE, Université de Paris IX Dauphine, pages 101–127.
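The slides name the union and the overlap of the gold and predicted intervals but do not spell out the formula. A minimal sketch, assuming the error is E = 1 - overlap / union, with the union taken as the span covering both intervals. This reading reproduces the E: 0.92 reported in this talk for the Christopher Columbus prediction. The function name `interval_error` is hypothetical.

```python
def interval_error(gold, predicted):
    """Return 1 - overlap/union for two (start_year, end_year) intervals.

    Assumption: 'union' is the span from the earliest start to the latest
    end, so any gap between disjoint intervals counts towards the union.
    """
    g_start, g_end = gold
    p_start, p_end = predicted
    overlap = max(0, min(g_end, p_end) - max(g_start, p_start))
    union = max(g_end, p_end) - min(g_start, p_start)
    if union == 0:
        return 0.0  # both intervals are the same single point
    return 1.0 - overlap / union


# Christopher Columbus: gold 1451-1506, predicted 1366-2057
print(round(interval_error((1451, 1506), (1366, 2057)), 2))  # 0.92
```

The measure is 0 for a perfect prediction and 1 when the two intervals do not overlap at all.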
21. strategies
A. RegEx
B. RegEx + Filtering
C. RegEx + Filtering + Gaussian fitting
D. HeidelTime + Filtering + Gaussian fitting
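The first two steps of these strategies can be sketched as follows. The slides do not give the regular expression or the filtering rule, so both are assumptions here: the pattern matches four-digit years from 1000 to 2099, the filter drops years that lie more than `k` median absolute deviations from the median, and the footprint is taken as the earliest and latest remaining year. All names are hypothetical.

```python
import re
from statistics import median

# Assumed pattern: four-digit years between 1000 and 2099.
YEAR_RE = re.compile(r"\b(1\d{3}|20\d{2})\b")


def extract_years(text):
    """Return every four-digit year mentioned in the text, in order."""
    return [int(y) for y in YEAR_RE.findall(text)]


def filter_outliers(years, k=3.0):
    """Drop years further than k median absolute deviations from the median.

    Assumed filtering rule; the slides do not describe the one actually used.
    """
    if not years:
        return []
    med = median(years)
    mad = median(abs(y - med) for y in years)
    if mad == 0:
        return list(years)  # no spread to scale by, keep everything
    return [y for y in years if abs(y - med) <= k * mad]


def min_max_footprint(years):
    """Baseline footprint: earliest and latest year mention."""
    return (min(years), max(years)) if years else None


sample = ("Galileo was born in 1564. He became a professor in 1589, "
          "published in 1610 and 1632, and died in 1642. "
          "A new biography appeared in 2010.")
years = extract_years(sample)
print(min_max_footprint(years))                   # (1564, 2010)
print(min_max_footprint(filter_outliers(years)))  # (1564, 1642)
```

In the sample, the publication year of a modern biography stretches the unfiltered footprint to 2010, and the filter removes it as an outlier.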
22. evaluation
■ subject: people
■ lived from 1000 AD to 2014
● text from Wikipedia web pages
● year of birth and death from DBpedia
■ 228,824 people collected
■ simple definition of temporal footprint
● birth and death dates
29. other types of temporal footprint?
■ Christopher Columbus will die in 2057?!
Prediction: 1366-2057 (gold: 1451-1506), E: 0.92
AHA!
33. physical existence vs. social coverage
■ Anne Frank’s footprint is shifted into the future
36. conclusions
■ how does the methodology behave on different languages? and on different sources?
■ oracle-like side-effect behaviour:
• Apple Inc. will be closed down this year
• Stanford University will be closed down in 2029
■ Future work
• mixture of normal distributions