Talk given at the Soeterbeeck eHumanities Workshop, 14 June 2013 (http://www.ru.nl/ehumanities/workshop-2013/home/), describing event and storyline modelling for Semantics of History, BiographyNet and NewsReader
2. • Events are things that happen at a certain place and time
• Events are a core building block of many information sources
• Events are at the heart of many eHumanities research domains
Image http://historymartinez.files.wordpress.com/2010/11/moon-flag.jpg
3. • Events are multidimensional objects
• Exact event boundaries and elements are difficult to define
• Sources reporting on events may not have complete information or may promote their own view
Image: http://www.cardinalpath.com/cpwp/wp-content/uploads/img_0001.jpg
4. • Semantics of History explores the temporal dimension of events
• BiographyNet explores relations between people and events as well as changes in perspective on these events
• NewsReader builds upon Semantics of History and BiographyNet and scales them up
Image: http://images.yourdictionary.com/images/science/AStesser.jpg
5. • One elderly British gentleman was walking around in a state of shock. His wife had been swimming when the waves struck. (BBC News, Sri Lanka, 26 December 2004)
• More than 300 people have died and 3,500 were injured after the massive sea surges caused by an earthquake smashed into Thailand's western coast. (BBC News, 27 December 2004)
• The UK government is to give at least £15m to help the victims of the Asian earthquake which is thought to have killed nearly 60,000 people. (BBC News, 28 December 2004)
• A huge quake off western Indonesia on 26 December 2004 caused a massive tsunami that killed around 230,000 people around the region. (BBC News, 4 January 2009)
11. Total daily stream of documents
Archives of decades
of news reports
Daily document intake of an individual
decision maker 50–3,000
±2,000,000 sources
±25,000,000,000 documents:
news, company reports, manager biographies
unknown volume:
events, sources and background data consulted
NewsReader: Zooming, Linking and Scaling up
Volumes beyond result list paradigm
Duplications, repetitions: new/old
Inconsistent and contradictory
Coloured and opinionated
Incomplete, piecemeal
Unauthorised
12. • the 7.7 magnitude quake (source: Xinhuanet)
• two quakes, measuring 7.6 and 7.4 (source: Bloomberg)
• One 7.3-magnitude tremor (source: Jakartapost)
Image: http://imgace.com/wp-content/uploads/2012/10/the-blue-button-is-true.jpg
13. • To link current to previous information, different ways of describing and registering events need to be interconnected
• To allow reasoning, domain knowledge needs to be captured
• To provide different perspectives on the same news story, the source of each piece of information needs to be tracked
Image: http://www.widescreen-wallpaper.eu/wallpapers/layers_of_color-1920x1080.jpg
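The third requirement, keeping track of where each piece of information came from, can be illustrated with a minimal Python sketch. It reuses the magnitude reports quoted on the previous slide. The `Claim` class, the `perspectives` function and the identifier `quake-1` are hypothetical names chosen for illustration, and treating all four reports as statements about a single event is a simplifying assumption: the sources do not even agree on the number of quakes.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    """One reported value, always stored together with its source."""
    event_id: str   # hypothetical identifier for the event being described
    attribute: str
    value: float
    source: str


# Magnitudes as reported on the slide; "quake-1" is an assumed shared id.
claims = [
    Claim("quake-1", "magnitude", 7.7, "Xinhuanet"),
    Claim("quake-1", "magnitude", 7.6, "Bloomberg"),
    Claim("quake-1", "magnitude", 7.4, "Bloomberg"),
    Claim("quake-1", "magnitude", 7.3, "Jakartapost"),
]


def perspectives(claims, event_id, attribute):
    """Group the reported values for one attribute by the source that made them."""
    by_source = defaultdict(list)
    for claim in claims:
        if claim.event_id == event_id and claim.attribute == attribute:
            by_source[claim.source].append(claim.value)
    return dict(by_source)


views = perspectives(claims, "quake-1", "magnitude")
for source, values in views.items():
    print(source, values)
```

Because no claim is ever merged or overwritten, contradictory values stay visible side by side and each can be traced back to the source that reported it.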
14. Grounded Annotation Framework (GAF)
• Keep event mentions separate from event instances
• Linguistic information captured in separate layer from semantic information
• Semantic layer can also import non-linguistic information, e.g. coming from sensors
• Provenance is captured through PROV-O
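The separation of mentions from instances can be sketched in a few lines of Python. This is an illustration under stated assumptions, not GAF itself: the class and field names (`Mention`, `EventInstance`, `published`) are hypothetical and are not GAF or PROV-O vocabulary, and the decision that both quoted phrases refer to the same event is assumed here rather than computed.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Mention:
    """A piece of text in a source that refers to an event (linguistic layer)."""
    text: str
    source: str
    published: str  # publication date as given on the slides


@dataclass
class EventInstance:
    """The event in the world that mentions refer to (semantic layer)."""
    label: str
    mentions: list = field(default_factory=list)

    def add_mention(self, mention):
        # The instance only points at its mentions; the text is never merged in.
        self.mentions.append(mention)

    def sources(self):
        # Minimal provenance: which sources talk about this instance.
        return sorted({m.source for m in self.mentions})


tsunami = EventInstance("2004 Indian Ocean tsunami")
tsunami.add_mention(Mention(
    "the massive sea surges caused by an earthquake",
    "BBC News", "27 December 2004"))
tsunami.add_mention(Mention(
    "a massive tsunami that killed around 230,000 people",
    "BBC News", "4 January 2009"))

for m in tsunami.mentions:
    print(m.published, "|", m.text)
```

Keeping the two layers apart means a later report can be attached to the same instance without overwriting what an earlier report said about it.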
15. [Diagram: two timelines, "changes in the world" and "publication of sources", spanning 2004–2013. Diagram labels: ANNOTATION (NAF, TAF); SEM-EVENT TEMBLOR; SEM-EVENT TSUNAMI; SEM-EVENT USS Jimmy Carter energy weapon; sensor data; direct event report; delayed event report; future event report; Tsunami alert system; future tsunami.]
• "The catastrophe four years ago devastated Indian Ocean community and killed more than 230,000 people, over 170,000 of them in Aceh at northern tip of Sumatra Island of Indonesia."
• "..., the vessel is the party responsible for the 2004 Indian Ocean tsunami that killed 230,000 people. Apparently, the submarine was able to trigger seismic activity via some kind of directed energy weapon."
20. • Events can be processed and presented in a myriad of ways → interdisciplinary problem
• To preserve context, perspective and provenance need to be presented → recognised in both humanities and computer science
• A representation framework needs to separate mentions from instances → GAF is a first step
Image: http://www.inhabitat.com/wp-content/uploads/fp_inhabitat2.jpg
22. NewsReader is funded by the European Union’s 7th Framework Programme (ICT-316404).
BiographyNet is funded by the Netherlands eScience Center. Partners in BiographyNet are Huygens/ING Institute of the Dutch Academy of Sciences and VU University Amsterdam.
Semantics of History is funded by the Network Institute.