Connecting political promises and actions through text analysis
1. PoliticalMashup 1
PoliticalMashup
Connecting the promises and actions of politicians and how
society reacts to them
Maarten Marx
Universiteit van Amsterdam
Groningen, α-informatica, 2011-03-11
2. PoliticalMashup 2
Content
• Overview PoliticalMashup project
• Zooming in on one cultural heritage dataset
• A few example applications
• Research ideas for NLP scientists
3. PoliticalMashup 3
Who am I?
• Political scientist turned computer scientist
• My field:
• Theory of XML Database Systems
• Semi-Structured Information Retrieval
• Cooperation with
• Tweede Kamer
• Koninklijke Bibliotheek
• Historians at NIOD and DNPP
4. PoliticalMashup 4
PoliticalMashup project
• Large-scale data integration project
• Two-year NWO-funded infrastructure project, 2010-2012
• Partners: U. Amsterdam, Groningen and Tilburg
• Ongoing with irregular funding since 2008
5. PoliticalMashup 5
Goal of PoliticalMashup
• Making huge amounts of textual data available for
large-scale automatic quantitative data and content analysis
done by scientists from the humanities and social sciences.
6. PoliticalMashup 6
Mashup of what and how?
• 4 data sources
Promises and actions of politicians
Reactions to those in the media and among the general public
• Connect data on
Political entities
Time
Topics
7. PoliticalMashup 7
Data sources
Promises
• Election manifestos, mostly scans, DNPP
• Party websites and blogs, Archipol
• Twitter of politicians
Actions
• Parliamentary proceedings, mostly scans, KB
Reactions
• News media
• User-generated content: forums, blogs, comments on news,
Twitter
8. PoliticalMashup 8
Used techniques
• Text analytics and XML DB and IR technology
• Named entity recognition and normalization
• Data mining, Machine Learning, hand-crafted rules
• Natural Language Processing, Language Models
Make implicit structure and information explicit.
16. PoliticalMashup 16
De Handelingen der Staten Generaal (Dutch
Hansards)
17. PoliticalMashup 17
About this collection
• Available metadata is very sparse
• Very rich “metadata” sits hidden inside the raw data
• Rich data model
• Meeting (1 Day)
• Topic
• Stage direction
• Scene
• Stage direction
• Speech
• Paragraph
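The data model above can be sketched as nested Python dataclasses. This is only an illustration: the nesting (a topic containing stage directions and scenes, a scene containing stage directions and speeches) is an assumption read off the order of the list, and all class and field names are hypothetical, not the project's actual XML schema.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Paragraph:
    text: str


@dataclass
class StageDirection:
    text: str


@dataclass
class Speech:
    speaker: str
    paragraphs: List[Paragraph] = field(default_factory=list)


@dataclass
class Scene:
    # Assumed: stage directions and speeches, in document order.
    items: List[Union[StageDirection, Speech]] = field(default_factory=list)


@dataclass
class Topic:
    title: str
    # Assumed: stage directions and scenes, in document order.
    items: List[Union[StageDirection, Scene]] = field(default_factory=list)


@dataclass
class Meeting:
    date: str  # one meeting covers one day
    topics: List[Topic] = field(default_factory=list)


def count_paragraphs(meeting: Meeting) -> int:
    """Count all paragraphs in all speeches of a meeting."""
    total = 0
    for topic in meeting.topics:
        for item in topic.items:
            if isinstance(item, Scene):
                for sub in item.items:
                    if isinstance(sub, Speech):
                        total += len(sub.paragraphs)
    return total


# Invented toy meeting, for illustration only.
example = Meeting(
    date="2000-01-01",
    topics=[
        Topic(
            title="Example topic",
            items=[
                StageDirection("The meeting is opened."),
                Scene(
                    items=[
                        Speech("Speaker A", [Paragraph("One."), Paragraph("Two.")]),
                        StageDirection("Applause."),
                        Speech("Speaker B", [Paragraph("Three.")]),
                    ]
                ),
            ],
        )
    ],
)
```

Making the levels explicit like this is what allows queries to be restricted to a speech, a scene or a single paragraph.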
18. PoliticalMashup 18
Same data: different views
• Raw data in PDF
• XML styled with stylesheet
• Machine readable XML format
20. PoliticalMashup 20
Content and structure search
• Combine IR style keyword search with restrictions on structure.
• E.g., return speeches by Wilders about Islam
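A minimal sketch of such a combined query, using Python's standard `xml.etree.ElementTree`. The element and attribute names (`speech`, `speaker`, `p`) and the sample document are invented for illustration and do not reflect the project's actual XML format or search engine.

```python
import re
import xml.etree.ElementTree as ET

# Invented toy document; speakers and text are placeholders.
SAMPLE = """
<meeting>
  <speech speaker="Speaker A"><p>A remark about islam.</p></speech>
  <speech speaker="Speaker A"><p>A remark about taxes.</p></speech>
  <speech speaker="Speaker B"><p>Another remark about islam.</p></speech>
</meeting>
"""


def search_speeches(root, speaker, keyword):
    """Return the text of speeches by `speaker` that contain `keyword`."""
    hits = []
    for speech in root.iter("speech"):
        # Structural restriction: only speeches by the requested speaker.
        if speech.get("speaker") != speaker:
            continue
        text = " ".join(p.text or "" for p in speech.iter("p"))
        # Content restriction: keyword must occur as a whole token.
        tokens = re.findall(r"\w+", text.lower())
        if keyword.lower() in tokens:
            hits.append(text)
    return hits


root = ET.fromstring(SAMPLE)
results = search_speeches(root, "Speaker A", "islam")
```

The structural filter and the keyword filter are independent, so either one can be swapped out, for example for a ranked retrieval function.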
21. PoliticalMashup 21
Exhaustive data collection
• Example query for NIOD historians
• Search for paragraphs about fascisme OR nazisme OR dictatuur
OR (nazi AND dictatuur) OR . . .
• Return a TSV file with, for each hit: date, speakername, speakerid,
speaker-party, . . .
• NIOD query
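The export step can be sketched as follows, assuming paragraphs are already available as dictionaries. The query is written as a list of AND-groups joined by OR and covers only the terms visible on the slide (the elided part is left out). The records are invented, and the column names are taken from the bullet above.

```python
import csv
import io
import re

# A paragraph is a hit if ALL terms of at least one group occur in it.
QUERY = [{"fascisme"}, {"nazisme"}, {"dictatuur"}, {"nazi", "dictatuur"}]
COLUMNS = ["date", "speakername", "speakerid", "speaker-party"]


def matches(text, query=QUERY):
    tokens = set(re.findall(r"\w+", text.lower()))
    return any(group <= tokens for group in query)


def export_hits(paragraphs, query=QUERY):
    """Return a TSV string with one header row and one row per hit."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(COLUMNS)
    for par in paragraphs:
        if matches(par["text"], query):
            writer.writerow([par[col] for col in COLUMNS])
    return out.getvalue()


# Invented toy records, for illustration only.
PARAGRAPHS = [
    {"date": "1900-01-01", "speakername": "Speaker A", "speakerid": "id-1",
     "speaker-party": "Party X", "text": "Een opmerking over dictatuur."},
    {"date": "1900-01-02", "speakername": "Speaker B", "speakerid": "id-2",
     "speaker-party": "Party Y", "text": "Een opmerking over belastingen."},
]

tsv = export_hits(PARAGRAPHS)
```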
22. PoliticalMashup 22
Link the proceedings to entities
• Who is speaking?
• Who says what to whom?
Applications
• Summary of one speaker
• On old OCRed data: Linking and resolving entities
23. PoliticalMashup 23
Application: Interruption graph (Attackogram)
• MP A interrupts B ⇐⇒ A speaks during the block of B.
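The definition above translates directly into code. The sketch below assumes each block is given as its owner plus the ordered list of speakers who take a turn inside it; this input format is hypothetical, and it does not filter out the chair, which a real application would probably want to do.

```python
from collections import Counter


def interruption_edges(blocks):
    """Count directed edges (interrupter, interrupted).

    `blocks` is a list of (owner, turns) pairs, where `turns` lists the
    speaker of every turn inside the owner's block, in order.
    """
    edges = Counter()
    for owner, turns in blocks:
        for speaker in turns:
            # Anyone other than the owner speaking inside the block interrupts.
            if speaker != owner:
                edges[(speaker, owner)] += 1
    return edges


# Invented toy debate with three speakers.
BLOCKS = [
    ("B", ["B", "A", "B", "C", "B", "A", "B"]),
    ("A", ["A", "B", "A"]),
]

edges = interruption_edges(BLOCKS)
```

The resulting weighted edge list can be handed to any graph drawing tool to produce the attackogram.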
25. PoliticalMashup 25
0) Topics
• Common European thesaurus http://eurovoc.europa.eu
• detection
• classification (sentence, paragraph, speech level)
26. PoliticalMashup 26
1) Populist language in parliament
• PhD Thesis Jan Jagers (2006).
27. PoliticalMashup 27
2) Automatically detecting promises (’toezegging’)
by ministers in Parliament
• https://zoek.officielebekendmakingen.nl/kst-103196.pdf
(page 56)
• Eerste Kamer has a nice database online
http://www.eerstekamer.nl/toezeggingen_2
28. PoliticalMashup 28
Example
The chair: I note that we have almost reached the end of this
meeting. We still have time to run through the promises
(toezeggingen). I ask everyone to check that nothing has been
overlooked. I will do this quickly, and after that we will briefly
discuss the follow-up. The promises.
After the summer, the bill will be before the House.
A letter will be sent to inform the House how the loss of expertise
will be prevented.
Minister Van Bijsterveldt-Vliegenthart: I did not promise that.
Definitely not. No, I will not do that, because I did not promise
it.
29. PoliticalMashup 29
3) Opinion detection
• Detect opinions expressed about entities and topics. (Speaker is
known)
• Detect reported speech.
30. PoliticalMashup 30
4) Detect type of speech
• Interruption, attack, answer, speech (“betoog”), ’stage-direction’,
...
• http://data.politicalmashup.nl/debates/nl/h-ek-19961997-37-58.1-tijdslijn.html
31. PoliticalMashup 31
5) Detect “bullshit”
• Tautologies . . .
• Regels zijn regels (“rules are rules”), Op is op (“gone is gone”)
• p→p
• het is wat het is (“it is what it is”)
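As a naive baseline, the surface patterns behind these examples can be caught with regular expressions. This is a sketch of one possible starting point, not a proposed solution: it only finds literal "X is X", "X zijn X" and "X is wat X is" forms, and the function name is hypothetical.

```python
import re

# "X is X" / "X zijn X", e.g. "Op is op", "Regels zijn regels".
X_IS_X = re.compile(r"\b(\w+) (?:is|zijn) \1\b", re.IGNORECASE)
# "X is wat X is", e.g. "het is wat het is".
X_IS_WAT_X_IS = re.compile(r"\b(\w+) is wat \1 is\b", re.IGNORECASE)


def is_tautology(sentence):
    """Return True if the sentence contains one of the literal patterns."""
    return bool(X_IS_X.search(sentence) or X_IS_WAT_X_IS.search(sentence))
```

Anything beyond literal repetition (paraphrased or logically implied tautologies) needs real linguistic analysis, which is where the research question starts.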
32. PoliticalMashup 32
6) Spelling normalization
• Dutch had many spelling reforms.
• Leads to lower recall.
• Search in new spelling, return results in old spellings.
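One simple way to realise this is query expansion: each query term in modern spelling is expanded with its known historical variants before matching. The sketch below uses a tiny hand-made variant table (modern "mens" and "zo" were spelled "mensch" and "zoo" before the 1934 reform); a real system would need a much larger, possibly rule-based, mapping. The function names and documents are invented for illustration.

```python
import re

# Tiny illustrative table: modern spelling -> older spellings.
OLD_VARIANTS = {
    "mens": ["mensch"],
    "zo": ["zoo"],
}


def expand_query(terms):
    """Add known old-spelling variants to each modern query term."""
    expanded = []
    for term in terms:
        expanded.append(term.lower())
        expanded.extend(OLD_VARIANTS.get(term.lower(), []))
    return expanded


def search(docs, terms, expand=True):
    """Return the documents containing at least one (expanded) term."""
    wanted = set(expand_query(terms) if expand else [t.lower() for t in terms])
    return [d for d in docs if wanted & set(re.findall(r"\w+", d.lower()))]


# Invented toy documents in old and new spelling.
DOCS = ["De mensch is vrij.", "De mens is vrij.", "Het weer is goed."]

with_expansion = search(DOCS, ["mens"])
without_expansion = search(DOCS, ["mens"], expand=False)
```

Comparing the two result lists shows the recall problem directly: without expansion, the document in old spelling is missed.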
33. PoliticalMashup 33
Lots of data available: happy to share
• Now: 15 years of Dutch Parliamentary Proceedings in rich XML
• Now: 200 years more in poorer XML, slowly getting richer.
• Parliamentary proceedings from EU (15y), UK (75y), Spain (40y),
Scandinavian countries, . . .
• Election manifestos (provincial elections 2007 and 2011)
• All tweets, blogs, Flickr and YouTube content of all Dutch national
politicians from the past 1.5 years.