PoliticalMashup                                            1




                     PoliticalMashup
  Connecting promises and actions of politicians and how
               society reacts to them

                             Maarten Marx

                      Universiteit van Amsterdam

                  Groningen, α-informatica, 2011-03-11


                            Content

• Overview of the PoliticalMashup project

• Zooming in on one cultural heritage dataset

• A few example applications

• Research ideas for NLP scientists.


                           Who am I?


• Political scientist turned computer scientist

• My field:
  • Theory of XML Database Systems
  • Semi-structured Information Retrieval

• Cooperation with
  • Tweede Kamer (Dutch House of Representatives)
  • Koninklijke Bibliotheek (National Library of the Netherlands)
  • historians at NIOD, DNPP


                  PoliticalMashup project

• Large-scale data integration project

• Two-year NWO-funded infrastructure project, 2010-2012

• Partners: U. Amsterdam, Groningen and Tilburg

• Ongoing since 2008, with irregular funding


                  Goal of PoliticalMashup

• Make huge amounts of textual data available

• for large-scale, automatic, quantitative data and content analysis

• by scientists from the humanities and social sciences.


                     Mashup of what and how?

• 4 data sources
        Promises and actions of politicians
        Reactions to these in the media and from the general public

• Connect data on
        Political entities
        Time
        Topics


                          Data sources

Promises
    • Election manifestos, mostly scans, DNPP
    • Party websites and blogs, Archipol
    • Politicians' Twitter accounts

Actions
    • Parliamentary proceedings, mostly scans, KB

Reactions
    • News media
    • User-generated content: fora, blogs, comments on news,
      Twitter


                      Techniques used

• Text analytics, XML database and IR technology

• Named entity recognition and normalization

• Data mining, Machine Learning, hand-crafted rules

• Natural Language Processing, Language Models


 Make implicit structure and information explicit.


                  Zoom in on one data corpus


                     Longitudinal data

• Weekly measurements for over 150 years

• very stable measurement procedure and data model


                  Data about human behaviour


                  Often rather boring


         But sometimes full of drama and excitement


                       Loads of measurement points

                  24,000 days, 450,000 topics, 7.5 million speeches


                  Digitally available


         De Handelingen der Staten-Generaal (the Dutch Hansard)


                    About this collection

• Very little metadata available

• Very rich “metadata” hidden inside the raw data

• Rich data model:
  • Meeting (1 day)
    • Topic
      • Stage direction
      • Scene
        • Stage direction
        • Speech
          • Paragraph
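The nesting above can be written down as a few Python dataclasses. This is a minimal sketch of the data model as listed on the slide, not the project's actual schema. The field names (`date`, `title`, `speaker`, `text`) and the helper `speeches` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Paragraph:
    text: str


@dataclass
class StageDirection:
    text: str  # e.g. "De vergadering wordt geopend." ("The meeting is opened.")


@dataclass
class Speech:
    speaker: str  # hypothetical: the speaker's name as printed
    paragraphs: List[Paragraph] = field(default_factory=list)


@dataclass
class Scene:
    # Stage directions and speeches, in document order
    items: List[Union[StageDirection, Speech]] = field(default_factory=list)


@dataclass
class Topic:
    title: str
    # Stage directions and scenes, in document order
    items: List[Union[StageDirection, Scene]] = field(default_factory=list)


@dataclass
class Meeting:
    date: str  # one meeting corresponds to one day (ISO date string)
    topics: List[Topic] = field(default_factory=list)


def speeches(meeting: Meeting) -> List[Speech]:
    """Return all speeches of a meeting in document order."""
    found: List[Speech] = []
    for topic in meeting.topics:
        for item in topic.items:
            if isinstance(item, Scene):
                found.extend(s for s in item.items if isinstance(s, Speech))
    return found
```

Stage directions can appear at two levels, directly under a topic and inside a scene. For that reason `Topic.items` and `Scene.items` are typed as unions that keep document order, rather than as separate lists.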


                  Same data: different views

• Raw data in PDF

• XML rendered with a stylesheet

• Machine-readable XML format
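The sketch below reads the machine-readable view with Python's standard `xml.etree.ElementTree`. The element and attribute names in the sample (`meeting`, `topic`, `scene`, `speech`, `p`, `speaker`, `party`) are illustrative assumptions, not the project's real XML schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: element and attribute names are illustrative only.
SAMPLE = """
<meeting date="2011-03-10">
  <topic title="Example topic">
    <stage-direction>De vergadering wordt geopend.</stage-direction>
    <scene>
      <speech speaker="Speaker A" party="Party X">
        <p>First paragraph.</p>
        <p>Second paragraph.</p>
      </speech>
    </scene>
  </topic>
</meeting>
"""


def speech_rows(xml_text):
    """Yield (date, speaker, party, number of paragraphs) for every speech."""
    root = ET.fromstring(xml_text.strip())
    date = root.get("date")
    for speech in root.iter("speech"):  # finds speeches at any depth
        yield (date, speech.get("speaker"), speech.get("party"),
               len(speech.findall("p")))


for row in speech_rows(SAMPLE):
    print(row)
```

Once the implicit structure is explicit markup, questions such as "who spoke, for which party, and how much" become tree traversals rather than text parsing.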


                  Some applications of this corpus


                  Content and structure search

• Combine IR-style keyword search with restrictions on structure.

• E.g., return speeches by Wilders about Islam
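Content and structure search can be sketched as a keyword filter combined with a structural restriction. In this hypothetical example, speeches are plain dicts with `speaker` and `text` keys. The ranking is a naive keyword count, not a real IR model. The slide's example would correspond to `search(speeches, ["islam"], speaker="Wilders")`.

```python
import re
from collections import Counter


def search(speeches, keywords, speaker=None):
    """Rank speeches by keyword frequency, optionally restricted to one speaker."""
    terms = {k.lower() for k in keywords}
    hits = []
    for s in speeches:
        if speaker is not None and s["speaker"] != speaker:
            continue  # structural restriction: wrong speaker
        tokens = Counter(re.findall(r"\w+", s["text"].lower()))
        score = sum(tokens[t] for t in terms)  # content part: keyword counts
        if score > 0:
            hits.append((score, s))
    hits.sort(key=lambda h: h[0], reverse=True)  # highest score first
    return [s for _, s in hits]
```

In a real system, the structural restriction would be pushed into the XML database query. That way only the matching speaker's speeches are scored at all.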


                  Exhaustive data collection

• Example query for NIOD historians

• Search for paragraphs about fascisme OR nazisme OR dictatuur
  OR (nazi AND dictatuur) OR . . .

• Return a TSV file with, for each hit, the date, speakername, speakerid,
  speaker-party, . . .

• NIOD query
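The Boolean query and the TSV export could look roughly as follows. The query is written in disjunctive normal form, as a list of AND-clauses. The paragraph dict keys are assumptions, while the column names follow the slide. The slide's query is elided ("OR . . ."), so only the clauses it lists appear here.

```python
import csv
import io
import re

# The slide's query in disjunctive normal form: a paragraph is a hit if it
# contains all terms of at least one clause. ({"nazi", "dictatuur"} is
# subsumed by {"dictatuur"} but is kept to mirror the slide.)
QUERY = [{"fascisme"}, {"nazisme"}, {"dictatuur"}, {"nazi", "dictatuur"}]


def matches(text, query):
    tokens = set(re.findall(r"\w+", text.lower()))
    return any(clause <= tokens for clause in query)


def export_tsv(paragraphs, query):
    """Return TSV text with one row per matching paragraph."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["date", "speakername", "speakerid", "speaker-party"])
    for p in paragraphs:
        if matches(p["text"], query):
            writer.writerow([p["date"], p["speakername"], p["speakerid"], p["party"]])
    return out.getvalue()
```

For exhaustive collection, recall matters more than ranking, which is why every hit is returned unranked. Historians can then filter the TSV in their own tools.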


                  Link the proceedings to entities

• Who is speaking?

• Who says what to whom?

Applications

• Summary of one speaker

• On old OCRed data: linking and resolving entities
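Linking a speech to an entity usually starts from the speaker label printed in the proceedings. The sketch below assumes labels like those in the example later in this deck (`De voorzitter:`, `Minister Van Bijsterveldt-Vliegenthart:`), or of the form `De heer Name (PARTY):`. The title list, the regex and the member register are hypothetical simplifications. Old OCRed text would need fuzzier matching.

```python
import re

# Hypothetical label pattern: a title, an optional name, an optional
# "(PARTY)", then a colon. Real labels vary far more than this.
LABEL = re.compile(
    r"^(?P<title>De voorzitter|De heer|Mevrouw|Minister-president|Minister|Staatssecretaris)"
    r"\s*(?P<name>[^(:]*?)\s*(?:\((?P<party>[^)]+)\))?:"
)


def parse_label(line):
    """Split a speaker label into title, name and party (None if absent)."""
    m = LABEL.match(line)
    if m is None:
        return None
    return {
        "title": m.group("title"),
        "name": m.group("name") or None,
        "party": m.group("party"),
    }


def resolve(label, register):
    """Map a parsed label to an identifier via a lowercased name lookup."""
    if label is None or label["name"] is None:
        return None
    return register.get(label["name"].lower())


# Hypothetical member register: lowercased name -> identifier
REGISTER = {"jansen": "member-001"}
```

A plain dictionary lookup is the simplest possible resolver. Disambiguating members who share a surname would additionally need party and date of the meeting.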


       Application: Interruption graph (Attackogram)

• MP A interrupts B ⇔ A speaks during the block of B.
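Under that definition, the attackogram is a directed graph built by counting, per block, who else speaks in it. A minimal sketch, assuming each block is given as its owner plus the ordered list of speakers within it. Weighting edges by number of turns and excluding the chair are choices made here, not taken from the slide.

```python
from collections import Counter


def interruption_graph(blocks, exclude=frozenset()):
    """Count directed edges (A, B): A speaks during the block owned by B.

    blocks: iterable of (owner, speakers) pairs, speakers in document order.
    exclude: speakers to ignore, e.g. the chair (an assumption made here).
    """
    edges = Counter()
    for owner, speakers in blocks:
        for speaker in speakers:
            if speaker != owner and speaker not in exclude:
                edges[(speaker, owner)] += 1  # one count per turn
    return edges


blocks = [
    ("B", ["B", "A", "B", "A", "voorzitter"]),
    ("A", ["A", "C"]),
]
graph = interruption_graph(blocks, exclude={"voorzitter"})
```

Counting once per block instead of once per turn only requires deduplicating `speakers` per block. Which weighting is more meaningful for "attacks" is an open question.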


                  NLP research topics


                            0) Topics

• Common European thesaurus: EuroVoc, http://eurovoc.europa.eu

• Topic detection

• Topic classification (at sentence, paragraph and speech level)
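A simple baseline for paragraph-level classification scores each thesaurus descriptor by keyword overlap, then lifts the labels to speech level by majority vote. The two descriptors and their Dutch cue words below are hypothetical stand-ins, not actual EuroVoc entries. A real system would more likely train a classifier.

```python
import re
from collections import Counter

# Hypothetical mini-thesaurus: descriptor label -> Dutch cue words.
THESAURUS = {
    "education": {"onderwijs", "school", "leraren"},
    "defence": {"defensie", "leger", "krijgsmacht"},
}


def classify(text, thesaurus=THESAURUS):
    """Paragraph level: the descriptor with most cue-word hits, or None."""
    tokens = re.findall(r"\w+", text.lower())
    scores = Counter()
    for label, cues in thesaurus.items():
        scores[label] = sum(1 for t in tokens if t in cues)
    label, score = scores.most_common(1)[0]
    return label if score > 0 else None


def classify_speech(paragraphs, thesaurus=THESAURUS):
    """Speech level: the most frequent paragraph-level label."""
    labels = [l for l in (classify(p, thesaurus) for p in paragraphs) if l]
    return Counter(labels).most_common(1)[0][0] if labels else None
```

The same `classify` function works at sentence level if it is given sentences instead of paragraphs. Only the unit of text changes between the three levels.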


                  1) Populist language in parliament

• PhD thesis of Jan Jagers (2006).


 2) Automatically detecting promises ('toezegging', commitment)
            by ministers in Parliament

• https://zoek.officielebekendmakingen.nl/kst-103196.pdf
  (page 56)

• The Eerste Kamer (Senate) has a good online database of commitments:
  http://www.eerstekamer.nl/toezeggingen_2


                             Example

De voorzitter: Ik constateer dat wij bijna aan het einde van deze
vergadering zijn gekomen. Wij hebben nog tijd om even de
toezeggingen langs te lopen. Ik vraag iedereen om op te letten of er
niets over het hoofd is gezien. Ik zal dit snel doen en daarna spreken
wij nog even over het vervolg. De toezeggingen.
Na de zomer ligt het wetsvoorstel bij de Kamer.
Er komt een brief om de Kamer erover te informeren op welke wijze
er voorkomen wordt dat er expertise verloren gaat.
Minister Van Bijsterveldt-Vliegenthart: Dat heb ik niet
toegezegd. Beslist niet. Nee, dat doe ik niet, want ik heb dat niet
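toegezegd.

English translation:
The chair: I note that we have almost reached the end of this
meeting. We still have time to go through the commitments. I ask
everyone to check that nothing has been overlooked. I will do this
quickly, and afterwards we will briefly discuss the follow-up. The
commitments. After the summer the bill will be before the House.
A letter will be sent informing the House how the loss of expertise
will be prevented.
Minister Van Bijsterveldt-Vliegenthart: I did not promise that.
Certainly not. No, I will not do that, because I did not promise it.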
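A first, hand-crafted baseline could flag sentences that mention a commitment and note whether they are negated. The cue and negation patterns are assumptions for illustration. The actual commitments in the example ("Na de zomer ligt het wetsvoorstel bij de Kamer.") contain no cue word at all. That is what makes this a research problem rather than a regex exercise.

```python
import re

# Illustrative cue patterns for Dutch commitment language (assumptions).
CUE = re.compile(r"\b(ik zeg toe|toe te zeggen|toegezegd|toezegging\w*)\b", re.IGNORECASE)
NEGATION = re.compile(r"\b(niet|geen|nooit)\b", re.IGNORECASE)


def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def commitment_cues(text):
    """Return (label, sentence) pairs; label is 'cue' or 'negated'."""
    found = []
    for sentence in split_sentences(text):
        if CUE.search(sentence):
            label = "negated" if NEGATION.search(sentence) else "cue"
            found.append((label, sentence))
    return found
```

On the minister's turn above, both sentences containing "toegezegd" are labelled `negated`. That flags the denial, but nothing recovers which commitment is being denied.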


                    3) Opinion detection

• Detect opinions expressed about entities and topics (the speaker is
  known).

• Detect reported speech.
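Reported speech matters for opinion detection because the opinion may belong to someone other than the known speaker. A crude sketch, assuming a Dutch reporting verb followed by `dat` ("that") and one sentence at a time. The verb list is a hypothetical, incomplete choice.

```python
import re

# Hypothetical, incomplete list of Dutch reporting verbs followed by "dat".
REPORTED = re.compile(
    r"^(?P<source>.+?)\s+(?:zegt|zei|stelt|stelde|beweert|beweerde)\s+dat\s+"
    r"(?P<content>.+?)[.!?]?$",
    re.IGNORECASE,
)


def reported_speech(sentence):
    """Return (source, reported content) if the sentence reports speech."""
    m = REPORTED.match(sentence.strip())
    return (m.group("source"), m.group("content")) if m else None
```

An opinion found inside `content` would then be attributed to `source` rather than to the speaker. Resolving `source` ("De minister") to an entity reuses the linking step described earlier.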


                  4) Detect type of speech

• Interruption, attack, answer, speech (“betoog”), stage direction,
  ...

• Example timeline:
  http://data.politicalmashup.nl/debates/nl/h-ek-19961997-37-58.1-tijdslijn.html
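Several of these types follow from a turn's position within a block, which suggests a rule-based baseline. This sketch assumes a block is given as its owner plus `(speaker, text)` turns, with `None` as the speaker for stage directions. The rules are assumptions. Telling an attack from an ordinary interruption would need the text itself and is not attempted.

```python
def label_turns(owner, turns, chair="De voorzitter"):
    """Label each (speaker, text) turn in the block owned by `owner`.

    Rules (assumptions, not from the slide):
    - no speaker            -> "stage-direction"
    - the chair             -> "chair"
    - anyone but the owner  -> "interruption"
    - owner's first turn    -> "speech" (the 'betoog')
    - owner's later turns   -> "answer"
    """
    labels = []
    owner_has_spoken = False
    for speaker, _text in turns:
        if speaker is None:
            labels.append("stage-direction")
        elif speaker == chair:
            labels.append("chair")
        elif speaker != owner:
            labels.append("interruption")
        elif not owner_has_spoken:
            labels.append("speech")
            owner_has_spoken = True
        else:
            labels.append("answer")
    return labels
```

Such positional labels could serve as cheap training data for a text-based classifier that handles the harder types.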


                       5) Detect “bullshit”

• Tautologies . . .

• Regels zijn regels ("rules are rules"), Op is op ("gone is gone")

• p→p

• het is wat het is ("it is what it is")
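The `p→p` pattern can be approximated on the surface with regular-expression backreferences. The two patterns below are illustrative assumptions that cover only the examples on this slide.

```python
import re

# Illustrative patterns: "X is/zijn X" and "het is wat het is".
# With IGNORECASE, the backreference \1 also matches case-insensitively.
TAUTOLOGY = [
    re.compile(r"\b(\w+) (?:is|zijn) \1\b", re.IGNORECASE),
    re.compile(r"\b(\w+) is wat \1 is\b", re.IGNORECASE),
]


def find_tautologies(text):
    """Return every span matched by one of the tautology patterns."""
    return [m.group(0) for p in TAUTOLOGY for m in p.finditer(text)]
```

Surface matching misses paraphrased tautologies and may fire on accidental repetitions. It is a recall-oriented first filter at best.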


                  6) Spelling normalization

• Dutch has gone through many spelling reforms.

• This lowers recall when searching in modern spelling.

• Goal: search in the new spelling, and also return results in the old spellings.
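One way to recover recall is query expansion: rewrite a modern query term into candidate older spellings and search for all of them. The two rules below reflect known pre-1947 spellings (`mensch`, `vleesch`, `zoo`). They are a tiny, illustrative subset and will also generate non-words. A real normalizer would need a lexicon or learned rules.

```python
import re

# Illustrative rewrite rules from modern to older spelling (incomplete):
# word-final "s" was often "sch" (mens -> mensch, vlees -> vleesch),
# word-final "o" was often "oo" (zo -> zoo).
RULES = [(re.compile(r"s$"), "sch"), (re.compile(r"o$"), "oo")]


def old_spelling_variants(word):
    """Return the word plus all variants produced by applying the rules."""
    variants = {word}
    for pattern, repl in RULES:
        # Apply each rule to every variant found so far, keeping the originals
        variants |= {pattern.sub(repl, v) for v in variants}
    return variants
```

Expanding the query, rather than normalizing the whole archive, leaves the historical text untouched. The trade-off is that query time grows with the number of variants.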


                  Lots of data available: happy to share

• Now: 15 years of Dutch Parliamentary Proceedings in rich XML

• Now: 200 more years in less rich XML, gradually being enriched.

• Parliamentary proceedings from the EU (15 years), UK (75 years), Spain (40 years),
  Scandinavian countries, . . .

• Election manifestos (provincial elections 2007 and 2011)

• All tweets, blog posts, Flickr and YouTube content of all Dutch national
  politicians from the past 1.5 years.


                      Thanks




                  maartenmarx@uva.nl
