PoliMedia presentation NOTaS meeting
1. Interlinking multimedia for the analysis of media coverage of political debates
Max Kemman & Henri Beunders
NOTaS meeting
www.polimedia.nl
2. Main goal
• Aimed at Humanities researchers
• Using CLARIN standard
25-6-2012 PoliMedia - NOTaS meeting 2
3. Main research question
What choices do different media make in the coverage
of people and topics while reporting on debates in the
Dutch parliament since the first televised evening news
in 1956 until 1995?
4. Historical research use case
• How did the European Monetary Union (EMU)
come to be in the 1990s?
• What events led to the creation of the EMU?
• How was this represented by the media at
that time?
5. Current approach
[Diagram: two sums of separate sources. First result: too much work. Second result: limited material and different systems.]
6. PoliMedia approach
PoliMedia Portal
- Browse: debate and date
- Search: debate and person
Linked collections:
• Staten-Generaal Digitaal (KB), 1818-1995
• Newspapers (KB), 1950-1995
• Television (Sound and Vision), 1956-1995
• Radio (KB), 1950-1984
8. Data sets
• Primary data set:
• The Dutch parliamentary debates (Handelingen der
Staten-Generaal, the Dutch Hansard)
• Available at the KB in raw format
• Made CLARIN compliant in the War in Parliament project
– chronological structure of consecutive speakers in a debate
• Secondary data set:
1. NISV Academia set (OAI protocol)
2. KB - newspapers (SRU protocol)
3. KB - radio bulletins (SRU protocol)
9. Current status of technical work
1. Extract structure of debates
2. Find named entities in debate texts: people,
organizations, locations.
3. Find links between debates and media.
12. 2. Named Entity Recognition in debates
• Fietstas: web services for processing textual
content
– http://fietstas.science.uva.nl/
• Lists of named entities (NEs) that appear in
specific documents or sets of documents
• Works well with the Dutch language (unlike other
popular services such as DBpedia Spotlight)
15. 3. Find links to newspapers and radio bulletins
We use the dates, topics, named entities and
speakers of the debates to query the media
archives.
Media document harvesting:
• SRU protocol (Search/Retrieve via URL)
• http://www.loc.gov/standards/sru/
• JSRU is a Java implementation of the SRU
protocol at the KB
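An SRU request is an ordinary HTTP GET whose parameters carry the operation and a CQL query. The following is a minimal sketch of how such a request URL could be assembled, not the project's harvesting code. The parameter names (version, operation, query, startRecord, maximumRecords) come from the SRU 1.2 standard; the base URL and the function name sru_url are hypothetical placeholders.

```python
from urllib.parse import urlencode


def sru_url(base_url, cql_query, start_record=1, max_records=10):
    """Build an SRU 1.2 searchRetrieve URL for a CQL query."""
    params = {
        "version": "1.2",
        "operation": "searchRetrieve",
        "query": cql_query,
        "startRecord": start_record,      # 1-based offset into the result set
        "maximumRecords": max_records,    # page size requested from the server
    }
    # urlencode percent-encodes the quotes and spaces in the CQL query.
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint; replace with the real SRU base URL of the archive.
url = sru_url("https://sru.example.org/sru", '"Princen" AND "Van Mierlo"')
print(url)
```

Paging through a large result set would repeat the call with startRecord increased by maximumRecords each time.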
16. Automatic Query Construction
• Persons, Locations and Organizations mentioned inside topics of the debate
• Speakers
[Diagram: debate structure]
Debate Metadata
Topic 1
  Speaker 1 / Content
  Speaker 2 / Content
  Speaker 3 / Content
Topic 2
  Speaker 1 / Content
[Diagram: query composition]
TopicList = PersonsInTopic, LocationsInTopic, Org.InTopic
Speaker n = ActorFromSegment
Query = TopicList + Speaker n + TimeFrame
Example query: give all the newspaper issues in the collection DDD_krantnr where the date value is between 01-01-1940 and 31-12-1945
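The composition of a query from topic entities, a speaker and a time frame can be sketched as follows. This is an illustration under stated assumptions, not the PoliMedia implementation: the class Topic, the functions quote and build_query, the index name date, the dd-mm-yyyy date format and the cap on the number of terms are all hypothetical. The boolean AND and the within relation are standard CQL; which indexes a given archive supports has to be checked against that archive. The assignment of the two names to the speaker and person roles is illustrative only.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Topic:
    """Named entities found inside one topic of a debate."""
    persons: list = field(default_factory=list)
    locations: list = field(default_factory=list)
    organizations: list = field(default_factory=list)

    def topic_list(self):
        # TopicList = PersonsInTopic + LocationsInTopic + Org.InTopic
        return self.persons + self.locations + self.organizations


def quote(term):
    # Wrap a term in double quotes so multi-word names stay one phrase.
    return '"' + term.replace('"', '\\"') + '"'


def build_query(topic, speaker, start, end, max_terms=3):
    # Keep the number of topic terms small to avoid over-specific queries.
    terms = [speaker] + topic.topic_list()[:max_terms]
    clause = " AND ".join(quote(t) for t in terms)
    # Assumption: the archive exposes a "date" index with dd-mm-yyyy values.
    return f'({clause}) AND date within "{start:%d-%m-%Y} {end:%d-%m-%Y}"'


# Role assignment below is illustrative, not taken from the slides.
query = build_query(
    Topic(persons=["Van Mierlo"]),
    "Princen",
    date(1994, 12, 21),
    date(1995, 1, 21),
)
print(query)
```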
19. The date of a debate and a media article
• We use the dates, topics, named
entities and speakers of the
debates to query the media
archives.
• A news item always appears on the same
day as the debate or later.
• How much time should we allow
between debate and media item?
• Current choice: 1 month.
Result 1-26 of 26 results for “Princen” AND “Van Mierlo”
Timeframe: one-month period:
• 26 articles in the period between 21/12 and 21/01
• 7 on the day of the debate, only 1 article 1 month later.
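The one-month window can be expressed as a simple date check. The sketch below assumes that "1 month" means the same day number in the following calendar month, clamped to the last day of shorter months; the slides do not specify this, and the function names add_months and in_window are hypothetical.

```python
import calendar
from datetime import date


def add_months(d, months=1):
    """Return d shifted by whole calendar months, clamping the day if needed."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def in_window(debate_date, article_date, months=1):
    # A news item is on the day of the debate or later, within the window.
    return debate_date <= article_date <= add_months(debate_date, months)


debate = date(1994, 12, 21)
print(add_months(debate))                     # end of the window
print(in_window(debate, date(1995, 1, 21)))   # last day still counts
print(in_window(debate, date(1994, 12, 20)))  # before the debate
```

Shortening the window only requires passing a different end date; a variant taking a number of days instead of months would use datetime.timedelta.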
20. Debate → Newspaper example
Dates between 21.12.1994 (debate date) and 21.01.1995
• Queries:
o Small numbers of topics (to avoid
overspecialization)
o Shorter timespan (fast media cycle)
22. PoliMedia+
• Elections in September
300 influential political Twitter accounts
23. What can you do with this?
• PoliMedia offers better insight into the relations
between politics and media
• What can speech and language technologists
do with it?
24. Contact
www.polimedia.nl
kemman@eshcc.eur.nl
Acknowledgements
• Rest of the team
– Laura Hollink (VU), Geert-Jan Houben, Damir Juric (TU
Delft), Johan Oomen, Jaap Blom (NISV), Martijn
Kleppe (EUR)
– KB
• War in Parliament
• CLARIN
– Arjan van Hessen
Editor's notes
Limited: not everything is in it, but more importantly no mark-up or pages
• Searching and browsing multimedia databases in a single interface
• Offering better insight into the relations between media items
• Allowing researchers to create their own interface on top of the infrastructure