Vila LOD-innovacion- bib-semweb-redux
1. Linked Open Data and innovation: libraries and the Semantic Web
Daniel Vila Suero
dvila@fi.upm.es
03/11/2011
Ontology Engineering Group, Universidad Politécnica de Madrid
Acknowledgements: to the members of the OEG who contributed to
preparing these slides
2. Contents
• Linked Data
• Library Linked Data
- W3C Incubator Group
- IFLA
- Stanford Manifesto
- A Bibliographic Framework for the Digital Age
• Use cases, tools and demos
5. Smart Web, Dumb Web
• The Web is full of "intelligent" applications
(search engines, recommenders,
geolocation, etc.)
• However, there are also situations in which
the Web's response does not seem to match the
state of the technology
6. Smart Web, Dumb Web
• Common problems (user):
- Inconsistent information across seemingly related
services
- Need to visit multiple applications for a simple
task
- Difficulty finding very specific information
• Common problems (developer):
- Heterogeneous formats
- Proprietary or hard-to-process formats
- Lack of API documentation
- 1 API, 1 functionality, 1 access method => disconnected
APIs
KEY: Integration of data on the Web in a standard format
7. What is the Web of Linked Data?
• Ten years have passed since the original vision of the
Semantic Web.
• So far, few examples of real impact
• Technologies that were too complex (mature as of
today)
• In 2006 the Linked Data initiative appears
- An extension of the current Web where data is published and
consumed according to 4 principles
(http://www.w3.org/DesignIssues/LinkedData.html)
8. Linked Data principles
1. Use URIs to refer to things (resources)
2. Use the HTTP protocol to publish/retrieve resources
http://dbpedia.org/resource/Tim_Berners-Lee
http://geo.linkeddata.es/resource/Provincia/Navarra
3. Describe data in a standard format (RDF)
dbpedia:Tim_Berners-Lee rdf:type foaf:Person ;
foaf:surname "Berners-Lee"@en ;
foaf:givenName "Tim"@en .
4. Link to other resources through URIs
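To make principle 3 concrete, the following is a minimal sketch that expands the prefixed names from the RDF snippet above into full URIs and prints the three statements as N-Triples. It uses only the Python standard library; the helper names (`expand`, `to_ntriples`, `serialize`) are illustrative and not part of any tool mentioned in these slides.

```python
# Namespace IRIs behind the prefixes used in the slide's RDF snippet.
PREFIXES = {
    "dbpedia": "http://dbpedia.org/resource/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def expand(prefixed_name):
    """Expand a prefixed name such as 'foaf:Person' into a full IRI."""
    prefix, local = prefixed_name.split(":", 1)
    return PREFIXES[prefix] + local


def to_ntriples(subject, predicate, obj, lang=None):
    """Serialize one triple as an N-Triples line.

    When lang is given, obj is treated as a language-tagged literal;
    otherwise it is treated as a prefixed name and expanded to an IRI.
    """
    s = "<" + expand(subject) + ">"
    p = "<" + expand(predicate) + ">"
    if lang is not None:
        escaped = obj.replace("\\", "\\\\").replace('"', '\\"')
        o = '"' + escaped + '"@' + lang
    else:
        o = "<" + expand(obj) + ">"
    return s + " " + p + " " + o + " ."


# The three statements from the slide: (subject, predicate, object, language).
TRIPLES = [
    ("dbpedia:Tim_Berners-Lee", "rdf:type", "foaf:Person", None),
    ("dbpedia:Tim_Berners-Lee", "foaf:surname", "Berners-Lee", "en"),
    ("dbpedia:Tim_Berners-Lee", "foaf:givenName", "Tim", "en"),
]


def serialize(triples=TRIPLES):
    """Return one N-Triples line per input triple."""
    return [to_ntriples(*triple) for triple in triples]


if __name__ == "__main__":
    print("\n".join(serialize()))
```

Every subject and predicate in the output is an HTTP URI (principles 1 and 2), and the `rdf:type` statement points at a term from a shared vocabulary, which is the simplest form of the linking described in principle 4.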
21. Library Linked Data is here
• Growing interest in Linked Data:
- Stanford Manifesto
- IFLA Semantic Web Special Interest Group and RDFS/OWL
models
- W3C Incubator Group
- RDA vocabularies
- European national librarians' announcement of support for
open licensing
- LOC Bibliographic Framework Initiative
22. European national libraries: Open Data
• CENL (Conference of European National Librarians)
• 46 National Libraries voted to support open
licensing
• Data becomes more accessible and reusable
• Keys:
- Innovation for app development
- Enrichment of services like Wikipedia with highly curated
data
- Generation of relationships across datasets through LOD
23. Stanford Manifesto
• Manifesto for Linked Libraries - http://bit.ly/sldw-mf
1. Publishing data on the Web for discovery over
preserving it in dark archives.
2. Continuous improvement of data over waiting to
publish perfect data.
3. Semantically structured data over flat
unstructured data.
4. Use common vocabularies over rolling your own.
5. Collaboration over working alone.
6. Web standards over domain-specific standards.
7. Use of open, commonly understood licenses over
closed, local licenses.
24. LOC: A Bibliographic Framework for the Digital Age
• Bibliographic Framework Initiative
• 31st of October 2011
• APPROACH: Embrace the Web and Linked Data and
broadly adopted data models (RDF)
• GOAL: move the current library-technological
environment away from being a niche market unto
itself to one more readily understandable by present
and future
- data creators,
- data modelers,
- and software developers.
25. W3C incubator (XG) activity
• Short-lived working groups: around 1 year
• No delivery of W3C Recommendations, but "innovative
ideas for specifications, guidelines, and applications that
are not (or not yet) clear candidates as Web standards"
http://www.w3.org/2005/Incubator/
26. Library Linked Data incubator
• May 2010 – August 2011
• 51 participants
• 23 W3C member organizations
VU Amsterdam, INRIA, Library of Congress, JISC, Deutsche
Nationalbibliothek, DERI Galway, OCLC, Talis, LANL,
Helsinki University of Technology, University of Edinburgh,
Universidad Politécnica de Madrid, etc.
• Invited experts from other organizations
BnF, National Library of Latvia, German National Library of
Economics, etc.
27. W3C XG Participants
Alexander Haffner, András Micsik, Andrew Houghton, Anette Seiler,
Antoine Isaac, Asaf Bartov, Bernard Vatant, Carlo Meghini,
Dan Brickley, Daniel Vila Suero, Dickson Lukose, Ed Summers,
Emmanuelle Bermes, Felix Sasaki, Fumihiro Kato, Glen Newton,
Gordon Dunsire, Guenther Neher, Herbert Van De Sompel,
Hideaki Takeda, Ikki Ohmukai, Jeff Young, Joachim Neubert,
Jodi Schneider, Jon Phipps, Jonathan Rees, Kai Eckert, Karen Coyle,
Kevin Ford, Kim Viljanen, Kosuke Tanabe, Lars Svensson,
Laszlo Kovacs, Marcel Ruhl, Marcia Zeng, Mark van Assem,
Martin Malmsten, Michael Hausenblas, Michael Panzer, Monica Duke,
Nicolas Delaforge, Oreste Signore, Peter Murray, Ray Denenberg,
Ross Singer, Stu Weibel, Thomas Baker, Tod Matola, Uldis Bojars,
William Waites, Wolfgang Halb
Up-to-date list at http://www.w3.org/2000/09/dbwg/details?group=44833
28. W3C XG Mission
• To help increase global interoperability of library data
on the Web, by
- bringing together people involved in Semantic Web
activities—focusing on Linked Data—in the library
community and beyond,
- building on existing initiatives, and
- identifying collaboration tracks for the future.
29. W3C XG Results
• Loads of interesting discussions! See the public mailing
list archive: http://lists.w3.org/Archives/Public/public-lld/
• Final report (3 separate documents) 25/10/2011
1. Final report
2. Datasets, Value Vocabularies, and Metadata Element Sets
3. Use Cases report
• Translation into Spanish coming soon…
30. W3C XG Final report
• Available at
http://www.w3.org/2005/Incubator/lld/XGR-lld-20111025/
• Structure: Benefits, Current situation, Recommendations
31. W3C XG Final report: Benefits
Benefits for:
• Researchers, students, patrons
• Organizations
• Librarians, archivists and curators
• Developers and vendors
32. W3C XG Final report: Benefits
• Improved discovery and browsing of data
• Better visibility of library resources (SEO)
• Enriched (scientific) publications
33. W3C XG Final report: Benefits
• Bottom-up approach to data publication: more actors,
different views
• Wider choice of vendors and technologies, not only ILS
• More visibility and connectivity, lower infrastructure costs
• "The coolest thing to do with your data will be thought of by
someone else"
34. W3C XG Final report: Benefits
• Up-to-date resource descriptions directly citable by
catalogers thanks to URIs+RDF
• Reduce redundancy and duplication
• Catalogers' efforts focused on their domain of expertise
35. W3C XG Final report: Benefits
• Use of well-known Web standards and protocols
• More and more generic tools, not tied to library-specific
formats
• Welcomes a much larger developer community
36. W3C XG Final report: Current situation
Issues with traditional library data
1. Library data is not integrated with Web resources
2. Library standards are designed only for the library
community
3. Library data is expressed primarily as natural-language text
4. Library and SemWeb communities use different
terminology for similar metadata concepts
5. Library technology changes depend on vendor systems
development
37. W3C XG Final report: Current situation
Library Linked Data available today
1. Fewer bibliographic datasets than value vocabularies and
element sets
2. Variable quality and support
3. Cross-linking requires further effort and coordination
38. W3C XG Final report: Current situation
Rights issues
1. Rights ownership is complex
2. Data rights may be considered business assets
39. W3C XG Final report: Recommendations
Recommendations: Library leadership
1. Identify candidate data sets for early exposure
2. Foster discussion about Open Data and rights
40. W3C XG Final report: Recommendations
Recommendations: data and systems designers
1. Design and test user services based on LD capabilities
2. Develop policies for managing vocabularies and URIs
3. Create URIs for the items in library datasets
4. Reuse and map to existing LD vocabularies
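As an illustration of recommendations 2 and 3, the sketch below mints a stable HTTP URI for each item from its record type and local identifier. The base URI `http://example.org/resource/`, the `type/identifier` path pattern and the function name `mint_uri` are assumptions made for this example; the report does not prescribe a particular URI pattern.

```python
from urllib.parse import quote

# Hypothetical base URI; a real deployment would use a domain it controls.
BASE = "http://example.org/resource/"


def mint_uri(kind, local_id, base=BASE):
    """Build an HTTP URI from a record type and a local identifier.

    Both parts are percent-encoded so that spaces or slashes in a local
    identifier cannot change the path structure of the URI.
    """
    kind = kind.strip()
    local_id = local_id.strip()
    if not kind or not local_id:
        raise ValueError("kind and local_id must not be empty")
    return base + quote(kind, safe="") + "/" + quote(local_id, safe="")


if __name__ == "__main__":
    # Identifiers below are made up for the example.
    print(mint_uri("author", "XX1234"))
    print(mint_uri("work", "a b/c"))
```

Keeping the pattern in a single function is one simple way to apply a URI policy consistently: the same record always yields the same URI, which is what lets other datasets link to it.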
41. W3C XG Final report: Recommendations
Recommendations: librarians and archivists
1. Preserve LD element sets and value vocabularies
2. Apply library experience in curation and long-term
preservation to LD datasets
42. W3C XG Vocabs and Datasets report
• Available at
http://www.w3.org/2005/Incubator/lld/XGR-lld-vocabdataset-20111025/
• Datasets: British National Bibliography, Europeana LOD, data.bnf.fr, ...
• Value vocabularies: LCSH, VIAF, AGROVOC, ...
• Element sets
43. W3C XG Use cases report
• Available at
http://www.w3.org/2005/Incubator/lld/XGR-lld-usecase-20111025/
44. W3C XG Use cases report
• 8 Clusters
• 60 Individual use cases from XG participants and
community
• Generalized (Extracted) use cases for each cluster
• Good place to look for examples, fresh ideas, room for
innovation and research topics!
45. W3C XG Use cases report
(Tag cloud of the use cases, generated with TagCrowd)
48. Chronicling America
• Historic newspapers and select digitized newspaper pages
(+2.5 million), produced by the National Digital Newspaper
Program
• From 1690 to the present
• Nice example of Linked Data best practices and transparent
integration
• Linking and describing:
- DBpedia
- Dublin Core and DCMI Terms
- FRBR concepts in RDF
- GeoNames
- OAI-ORE (more about aggregations below)
- OWL
- RDA
- WorldCat
51. Datos.bne.es
• December 2011
• Catalog data from BNE MARC21 to RDF using IFLA
models
- Authority records: +5 million
- Bibliographic records: +8 million
• Release of the MARC2LOD tool (Open Source)
• Public announcement at the BNE:
14th December
53. MARC2LOD Tool
Flexible tool for transforming MARC21 records to RDF
Allows free selection of any RDFS/OWL set of terms
Easy to handle mappings
Composed of two modules:
MODULE 1: Mapping templates and report generation
MODULE 2: RDF Generation and linkage
Three main steps:
1) Mapping template generation
2) Mapping assignment by domain experts
3) RDF generation and linkage
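The three steps above can be sketched in a few lines of Python. This is not the MARC2LOD implementation: the toy records, the `tag$subfield` keys, the Dublin Core properties chosen in the mapping and the `http://example.org/resource/` base URI are all assumptions made for illustration, and the linkage part of step 3 is left out.

```python
from collections import Counter

# Toy records (made-up data): each maps "tag$subfield" to a value.
RECORDS = [
    {"id": "bib1", "245$a": "Don Quijote de la Mancha",
     "100$a": "Cervantes Saavedra, Miguel de"},
    {"id": "bib2", "245$a": "La Galatea"},
]

# Hypothetical base URI for the generated resources.
BASE = "http://example.org/resource/"


def mapping_template(records):
    """Step 1: list every field/subfield in use, with its frequency."""
    return dict(Counter(key for rec in records for key in rec if key != "id"))


# Step 2: a domain expert assigns an RDF property to each field/subfield.
# Dublin Core is used here only as an example target vocabulary.
MAPPING = {
    "245$a": "http://purl.org/dc/elements/1.1/title",
    "100$a": "http://purl.org/dc/elements/1.1/creator",
}


def generate_rdf(records, mapping, base=BASE):
    """Step 3: emit one N-Triples line per mapped field value."""
    lines = []
    for rec in records:
        subject = "<" + base + rec["id"] + ">"
        for field, value in rec.items():
            prop = mapping.get(field)
            if prop is None:
                continue  # unmapped fields (including "id") are skipped
            literal = value.replace("\\", "\\\\").replace('"', '\\"')
            lines.append(subject + " <" + prop + '> "' + literal + '" .')
    return lines


if __name__ == "__main__":
    print(mapping_template(RECORDS))
    print("\n".join(generate_rdf(RECORDS, MAPPING)))
```

Separating the mapping (data) from the generation step (code) mirrors the two-module split described above: the mapping can be edited by domain experts without touching the generator.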
58. map4rdf
http://oegdev.dia.fi.upm.es/projects/map4rdf/
map4rdf:
• Google maps viewer of RDF resources
• Resources with spatial information
• Extensible with Google plugins
• Used in other applications like AEMET, GoodRelations
• Architecture: map4rdf queries a triplestore via SPARQL
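A viewer of this kind needs to ask the triplestore for resources that carry coordinates. The sketch below builds (but does not send) such a request following the SPARQL protocol, with the query passed in the `query` parameter of an HTTP GET. The endpoint URL is a placeholder, and the use of the W3C Basic Geo vocabulary (`geo:lat`, `geo:long`) is an assumption: the slides do not say which vocabulary or queries map4rdf actually uses.

```python
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import Request

# Placeholder endpoint; replace with a real SPARQL endpoint URL.
ENDPOINT = "http://example.org/sparql"

# Resources with latitude and longitude (W3C Basic Geo vocabulary).
GEO_QUERY = """PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
SELECT ?resource ?lat ?long WHERE {
  ?resource geo:lat ?lat ;
            geo:long ?long .
} LIMIT 100"""


def build_sparql_request(endpoint, query):
    """Build an HTTP GET request carrying a SPARQL query.

    The request asks for results in the SPARQL JSON results format.
    Nothing is sent over the network here.
    """
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def query_of(request):
    """Decode the SPARQL query back out of a request URL (for checking)."""
    return parse_qs(urlsplit(request.full_url).query)["query"][0]


if __name__ == "__main__":
    req = build_sparql_request(ENDPOINT, GEO_QUERY)
    print(req.get_method(), req.full_url)
    # To run the query: urllib.request.urlopen(req), then parse the JSON body.
```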
64. Visor: A tool for end user data exploration
VISOR alpha v0.11
• http://visor.psi.enakting.org/
• Linked data browser from University of Southampton
• Multifaceted browsing
• Configurable for any SPARQL endpoint
• DEMO: http://visor.psi.enakting.org/visor
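The idea behind multifaceted browsing can be shown on a handful of in-memory triples: count the values of each property (the facets), then narrow the set of resources by picking one value. This is only a sketch of the general technique with made-up data and hypothetical function names; it does not reflect how Visor is implemented, which works against a SPARQL endpoint.

```python
from collections import Counter, defaultdict

# Made-up triples: (subject, predicate, object).
TRIPLES = [
    ("ex:book1", "dc:creator", "Cervantes"),
    ("ex:book1", "dc:language", "es"),
    ("ex:book2", "dc:creator", "Cervantes"),
    ("ex:book2", "dc:language", "en"),
    ("ex:book3", "dc:creator", "Lope de Vega"),
    ("ex:book3", "dc:language", "es"),
]


def facets(triples):
    """For each predicate, count how often each value occurs."""
    counts = defaultdict(Counter)
    for _subject, predicate, value in triples:
        counts[predicate][value] += 1
    return counts


def select(triples, predicate, value):
    """Return the sorted subjects that have the given facet value."""
    return sorted({s for s, p, o in triples if p == predicate and o == value})


if __name__ == "__main__":
    for predicate, values in facets(TRIPLES).items():
        print(predicate, dict(values))
    print(select(TRIPLES, "dc:language", "es"))
```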