Legislative document content extraction based
on Semantic Web technologies
A use case about processing the History of the Law in Chile
Francisco Cifuentes Silva
Library of Congress, Chile
PhD Student
WESO research group
Jose Emilio Labra Gayo
WESO research group
University of Oviedo, Spain
Chilean Library of Congress
In Spanish: BCN (Biblioteca del Congreso Nacional de Chile)
Political powers: Executive, Judiciary, Legislative
Independent body inside the Legislative power
Advises the parliament and provides services to citizens
http://www.bcn.cl
Two projects at the Library of Congress (BCN)
History of the Law
Parliamentary work
History of the Law (LeyChile)
Collect all documents generated during the legislative process of a law
Phases:
An initiative starts life as a draft bill
Subject to debates
Validity time (it is published)
Modifications, additions,...
Derogation
Goal:
Capture the spirit of the law
Traceability
https://www.bcn.cl/historiadelaley
Parliamentary work
Collect all legislative activity by each Member of Parliament
Retrieve all interventions made
Parliamentary motion
Session journal
Commission report
Ordered and categorised
https://www.bcn.cl/laborparlamentaria/
Both projects adopted semantic technologies
Some initial reasons:
Semantic technologies considered one pillar of strategic plan (in 2014)
Innovative action to generate new products
Improve interoperability mechanisms
Sem. Web aligned well with open & public data
Which semantic technologies?
Text mining and content enrichment
Entity extraction
Topic identification
Automatic markup
Classification
Machine readable info
XML & URIs
RDF
Ontologies
Linked Open Data
Workflow pipelines
3 main steps
Automatic XML Marker
RDF & Linked data generation
Content delivery
Workflow overview
[Diagram components: legislative documents from the National library (paper, which requires OCR, and text documents); Automatic XML marker; SVN repository with Akoma-Ntoso XML editor & tools; Publishing (RDF extraction from Akoma-Ntoso); Linked Open Data and query DB; Services layer; Content portals]
Automatic XML marker
Source: Text
Target: XML following Akoma-Ntoso
Automatic XML marker: 4 phases
[Diagram: Text -> Named Entity Recognizer -> Text with (Entity, Type) -> Mediator, backed by the Legal Knowledge Base -> Text with (Entity, Type, URI) -> Structural marker -> Internal XML representation -> Converter -> AKN XML]
1. Named Entity Recognizer
Detection of entities & types of entities
Web service implementing the Stanford NER with a CRF classifier
Evaluation in production: detects 97% of entities

Type          Some examples                                         # of entities
Person        Salvador Allende, Sebastián Piñera                    5.139
Organization  Ministerio de Salud, SERNATUR                         2.848
Location      Valparaíso, Santiago de Chile                         1.251
Document      Ley 20.000, Diario de sesión nº 12                    732.497
Role          Senador, Diputado, Alcalde                            428
Events        Nacimiento de Eduardo Frei, Sesión Nº 23              14.389
Law           Boletín 11536-04, Prohíbe fumar en espacios cerrados  12.737
Dates         27 de febrero de 2010, el próximo año, ...            20.632

[Diagram: Text -> Named Entity Recognizer -> Text with (Entity, Type)]
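The recognizer's contract (text in, typed entity spans out) can be illustrated with a toy stand-in. The sketch below does not use Stanford NER or a CRF model; it swaps in a few hand-written regular expressions for two of the entity types in the table. The patterns, the `Entity` tuple and the sample sentence are assumptions made purely for illustration.

```python
import re
from typing import NamedTuple


class Entity(NamedTuple):
    text: str
    type: str
    start: int
    end: int


# Hypothetical patterns; the production service learns entity boundaries
# with a CRF classifier instead of hand-written rules.
PATTERNS = {
    "Document": re.compile(r"Ley\s+\d{1,3}(?:\.\d{3})*"),
    "Dates": re.compile(r"\d{1,2} de [a-záéíóú]+ de \d{4}"),
}


def recognize(text: str) -> list[Entity]:
    """Return typed entity spans found in the text, ordered by position."""
    found = []
    for etype, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            found.append(Entity(m.group(), etype, m.start(), m.end()))
    return sorted(found, key=lambda e: e.start)


sample = "La Ley 20.000 fue publicada el 27 de febrero de 2010."
entities = recognize(sample)
```

Each result keeps character offsets, so a later phase can attach a URI to the exact span without re-scanning the text.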
2. Mediator
Entity linking and disambiguation
Text similarity algorithms
Based on Apache Lucene
In-house development
- Use of context information to narrow list of candidates
- Custom filters and association heuristics
- Specialized web services
[Diagram: Text with (Entity, Type) -> Mediator, backed by the Legal Knowledge Base -> Text with (Entity, Type, URI)]
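A minimal sketch of the linking step, under stated assumptions: it uses Python's `difflib.SequenceMatcher` as the text-similarity measure in place of Apache Lucene, and the knowledge-base entries, the `example.org` URIs, the 0.2 context bonus and the 0.5 threshold are all invented for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical knowledge base rows: (label, entity type, URI). URIs are made up.
KNOWLEDGE_BASE = [
    ("Salvador Allende Gossens", "Person", "http://example.org/persona/1"),
    ("Salvador Allende Castro", "Person", "http://example.org/persona/2"),
    ("Ministerio de Salud", "Organization", "http://example.org/organismo/1"),
]


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def link(mention: str, etype: str, context: str = "", threshold: float = 0.5):
    """Return the URI of the best candidate, or None if nothing is close enough."""
    context_words = set(context.lower().split())
    mention_words = set(mention.lower().split())
    best_uri, best_score = None, threshold
    for label, label_type, uri in KNOWLEDGE_BASE:
        # Filter: only candidates of the same entity type are considered.
        if label_type != etype:
            continue
        score = similarity(mention, label)
        # Context heuristic: reward labels whose extra words occur nearby.
        if (set(label.lower().split()) - mention_words) & context_words:
            score += 0.2
        if score > best_score:
            best_uri, best_score = uri, score
    return best_uri


uri = link("Salvador Allende", "Person", context="el presidente Allende Gossens habló")
```

Here the surrounding words decide between two near-identical labels, which is the kind of disambiguation that string similarity alone cannot settle.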
3. Structural marker
Detect structures in the text
Titles, subtitles, paragraphs, sections,...
Special structure for debates: participation
Regular expressions + custom rules
[Diagram: Text with (Entity, Type, URI) -> Structural marker -> Internal XML representation]
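The "regular expressions + custom rules" approach might look roughly like the following. The three patterns, the block names and the sample text (including the speaker name) are hypothetical; the slides do not show the actual rules.

```python
import re

# Hypothetical rules, tried in order; the first match decides the block kind.
RULES = [
    ("title", re.compile(r"^T[IÍ]TULO\s+[IVXLC]+\b")),
    ("article", re.compile(r"^Art[ií]culo\s+\d+")),
    # Debate turn: a speaker introduction such as "El señor PÉREZ.-"
    ("participation", re.compile(r"^(?:El señor|La señora)\s+[A-ZÁÉÍÓÚÑ]+")),
]


def mark_structure(text: str) -> list[tuple[str, str]]:
    """Label each non-empty line with a structural kind."""
    blocks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Custom rule: anything that matches no pattern is a plain paragraph.
        kind = next((name for name, rx in RULES if rx.match(line)), "paragraph")
        blocks.append((kind, line))
    return blocks


sample = """TÍTULO I
Artículo 1.- Se prohíbe fumar en espacios cerrados.
El señor PÉREZ.- Pido la palabra.
Texto sin marca especial."""
blocks = mark_structure(sample)
```

Keeping the rules in an ordered list makes precedence explicit, which matters when a line could match more than one pattern.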
4. XML converter to Akoma-Ntoso
Programmatic approach
Internal XML representation similar to DOM
Each node converted to text in AKN-XML
[Diagram: Internal XML representation -> Converter -> AKN XML]
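As a rough sketch of the programmatic approach, a DOM-like tree can be walked recursively with each node emitting its own markup as text. The `Node` class and the kind-to-element mapping are assumptions; the element names (`body`, `article`, `heading`, `content`, `p`) and the `eId` attribute exist in Akoma Ntoso, but the output below omits the namespace and document wrappers and is not claimed to be schema-valid.

```python
from dataclasses import dataclass, field
from xml.sax.saxutils import escape, quoteattr


@dataclass
class Node:
    """Hypothetical internal representation: a minimal DOM-like node."""
    kind: str
    text: str = ""
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


# Illustrative mapping from internal node kinds to Akoma-Ntoso element names.
AKN_TAGS = {
    "document": "body",
    "article": "article",
    "heading": "heading",
    "content": "content",
    "paragraph": "p",
}


def to_akn(node: Node) -> str:
    """Convert one node, and recursively its children, to AKN-style XML text."""
    tag = AKN_TAGS[node.kind]
    attrs = "".join(f" {k}={quoteattr(v)}" for k, v in node.attrs.items())
    inner = escape(node.text) + "".join(to_akn(c) for c in node.children)
    return f"<{tag}{attrs}>{inner}</{tag}>"


tree = Node("document", children=[
    Node("article", attrs={"eId": "art_1"}, children=[
        Node("heading", "Artículo 1"),
        Node("content", children=[
            Node("paragraph", "Se prohíbe fumar en espacios cerrados."),
        ]),
    ]),
])
akn_xml = to_akn(tree)
```

Escaping text and attribute values at the node level keeps the generated document well-formed even when the source text contains characters such as `&` or `<`.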
Human editing of AKN documents
Quality assurance by human analysts
They review the generated XML documents
2 editors:
Ad-hoc XML editor
Commercial editor: LegisPro (Xcential)
Linked data generation
The pilot project (2011) carefully defined a stable URI model
URIs have been maintained since then
URIs = IDs in the whole system
URIs are dereferenceable
Content negotiation
Custom linked data browser
Documentation (in Spanish)
http://datos.bcn.cl/es/documentacion
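Dereferenceable URIs with content negotiation mean that a client asks for the representation it wants through the HTTP `Accept` header. The sketch below only builds such a request with the standard library and does not send it. The document URI is taken from a later slide; which media types the server actually offers is an assumption to be checked against the documentation.

```python
from urllib.request import Request


def rdf_request(uri: str, mime: str = "application/rdf+xml") -> Request:
    """Build (but do not send) a GET request asking for an RDF representation."""
    return Request(uri, headers={"Accept": mime})


req = rdf_request("http://datos.bcn.cl/recurso/cl/documento/579095")
# To actually fetch the data one would call urllib.request.urlopen(req).read()
```

The same URI requested by a browser, with an HTML `Accept` header, would be expected to lead to the linked data browser page instead.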
AKN2RDF
RDF extraction from Akoma-Ntoso XML
● Custom-made converter (XSL discarded for perceived complexity)
● Each XML tag implemented in one Class
● Extracted data saved into multiple databases (Relational and RDF)
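The "one class per XML tag" design could be sketched as a small handler registry. Everything named here (the handler classes, the `example.org` base URI and vocabulary, the un-namespaced sample XML) is hypothetical; real Akoma-Ntoso documents are namespaced and the BCN ontology uses its own terms.

```python
import xml.etree.ElementTree as ET

VOCAB = "http://example.org/vocab#"  # hypothetical vocabulary


class ArticleHandler:
    tag = "article"

    def handle(self, element, subject):
        # An article becomes a new resource; its children hang off that URI.
        art_uri = f"{subject}/{element.get('eId')}"
        return [(subject, VOCAB + "hasArticle", art_uri)], art_uri


class HeadingHandler:
    tag = "heading"

    def handle(self, element, subject):
        # A heading is a literal attached to the enclosing resource.
        return [(subject, VOCAB + "heading", (element.text or "").strip())], subject


HANDLERS = {h.tag: h() for h in (ArticleHandler, HeadingHandler)}


def extract(xml_text: str, doc_uri: str) -> list[tuple[str, str, str]]:
    """Walk the XML tree and collect triples from the per-tag handlers."""
    triples = []

    def walk(element, subject):
        handler = HANDLERS.get(element.tag)
        if handler:
            new_triples, subject = handler.handle(element, subject)
            triples.extend(new_triples)
        for child in element:
            walk(child, subject)

    walk(ET.fromstring(xml_text), doc_uri)
    return triples


doc = "http://example.org/recurso/documento/1"
sample = '<body><article eId="art_1"><heading>Artículo 1</heading></article></body>'
triples = extract(sample, doc)
```

Supporting a new tag then means adding one class to the registry, which is one plausible reason to prefer this over a single large XSL stylesheet.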
Linked data generation
Source: AKN XML documents
Linked data browser (WESO-DESH)
Target: RDF data
http://datos.bcn.cl/recurso/cl/documento/579095/
http://datos.bcn.cl/recurso/cl/documento/579095.xml
SPARQL endpoint
RDF triples are published as a public SPARQL endpoint
Number of norms by municipality
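A query such as "number of norms by municipality" is a grouped count. The sketch below shows the general shape and how it would be sent as a GET request under the SPARQL protocol. The endpoint location and the `ex:Norm` and `ex:municipality` terms are placeholders, not the BCN ontology; the real class and property names should be taken from the documentation.

```python
from urllib.parse import urlencode

ENDPOINT = "http://datos.bcn.cl/sparql"  # assumed endpoint location

# ex:Norm and ex:municipality are placeholder terms for illustration only.
QUERY = """
PREFIX ex: <http://example.org/vocab#>
SELECT ?municipality (COUNT(?norm) AS ?norms)
WHERE {
  ?norm a ex:Norm ;
        ex:municipality ?municipality .
}
GROUP BY ?municipality
ORDER BY DESC(?norms)
"""


def query_url(endpoint: str, query: str) -> str:
    """Encode a query as a SPARQL-protocol GET URL (the 'query' parameter)."""
    return endpoint + "?" + urlencode({"query": query})


url = query_url(ENDPOINT, QUERY)
```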
Content delivery
Web portals using Open Source Technologies
CMS (Typo3)
Python/Java
Varnish
Apache Lucene
REST Web service layers which connect to RDF triplestore and DB
Data exports to PDF, Doc and XML formats
URIs of parliamentary profiles = URIs in triplestore
History of the Law portal
https://www.bcn.cl/historiadelaley
Links to Members of Parliament
Each article has a link
Different versions of a law
History of the Law portal
https://www.bcn.cl/historiadelaley
Compare different versions
Parliamentary Work
https://www.bcn.cl/laborparlamentaria
Show participation of each Member of Parliament
Some experimental visualizations
Relationships between laws
Historical Parliament
Parliamentary genealogy (family relationships)
Regions mentioned in laws (legislative hackathon)
Links between laws
Historical parliament
http://datos.bcn.cl/visualizaciones/genealogia-parlamentaria/
Parliamentary genealogy
http://datos.bcn.cl/visualizaciones/genealogia-parlamentaria/consulta.jsp
Regions mentioned by law
Result of a legislative hackathon
http://datos.bcn.cl/global-legislative-hackathon-2016/Hackaton/www/html/master.html
In 2010 there was an earthquake in the Biobío region
Some statistics
24.368 documents (nov. 2018)
Number of RDF triples: 28 million
According to Google analytics
Average browsing time: 2min 26s
Visits received 331,481 (nov. 2016-2017)  476,241 (nov. 2016-2017)
And some findings...
Question: why are there some valleys?
Dictatorship time
Session attendance by year
RDF triples generated by year
Some lessons learnt
RDF granularity & inference trade-off
RDF statements + inference (high running times...queries that didn't terminate)
A priori inferred triples added to triple store (high response times for large docs)
Small subset of RDF triples (structural parts of docs and metadata)
Performance problems in the XML editor when browsing long docs (> 1000 pages)
Low SPARQL endpoint usage by external apps
If we could start again, I would recommend ShEx
Personal note: this kind of data portal led to my interest in ShEx
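The inference trade-off in the lessons above can be made concrete with a toy materialization step: inferred triples are computed once and stored, so queries need no reasoning, at the cost of a larger store. This is a minimal sketch covering only `rdfs:subClassOf` typing, with made-up class names; it is not the reasoner or rule set used in the project.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def materialize(triples: set) -> set:
    """Add inferred rdf:type triples until no new triple appears."""
    closed = set(triples)
    changed = True
    while changed:
        changed = False
        subclass = {(s, o) for s, p, o in closed if p == SUBCLASS}
        for s, p, o in list(closed):
            if p != RDF_TYPE:
                continue
            for sub, sup in subclass:
                # If s is a 'sub' and sub is a subclass of sup, s is also a 'sup'.
                if o == sub and (s, RDF_TYPE, sup) not in closed:
                    closed.add((s, RDF_TYPE, sup))
                    changed = True
    return closed


# Hypothetical data: one law and a two-level class hierarchy.
data = {
    ("ex:Law", SUBCLASS, "ex:Norm"),
    ("ex:Norm", SUBCLASS, "ex:Document"),
    ("ex:ley20000", RDF_TYPE, "ex:Law"),
}
stored = materialize(data)
```

Three asserted triples become five stored ones; on documents with many fine-grained parts the same growth applies to every part, which is consistent with the decision to keep only a small subset of triples.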
Conclusions & future projects
Well designed URIs can act as a perfect glue for interoperability
Automatic workflow pipelines help long-term survival of LD-based projects
SPARQL endpoint since 2011
Future projects on top of existing ones
National Budget as Linked data
Diana Project: Members of Parliament linked to social network analysis
New portal: User customization & recommender systems
End of presentation
Acknowledgements:
David Vilches, Eridan Otto, Christian Sifaqui
