
Echoes Project


Presentation of the European project ECHOES, given on 28 June 2018 in Leiden (the Netherlands), where CSUC showed the project's objectives and main features to Dutch technology companies.



  1. Empowering Communities with a Heritage Open Ecosystem. 28th June 2018. ECHOES PROJECT. Technological partner
  2. Agenda: 1. About Echoes • 2. Analysis • 3. Development • 4. Conclusions
  3. Agenda: 1. About Echoes • 1.1. Scope • 1.2. Design principles • 1.3. Interoperability network
  4. Project scope: ECHOES aims to • provide a modular IT architecture • based on open source • for heritage collection holders • that functions as a digital ecosystem for a broad range of user communities • allowing them to – take an active role – enrich digital collections
  5. Main goals: • OPEN: open your collections and link them to the world • MODULAR: modular and extensible architecture • INNOVATION: new ways of searching and displaying information
  6. Design principles: • STANDARD: use the EDM (Europeana Data Model) as the standard metadata schema • TRIPLES: data transformed into LOD/RDF triples • DEVELOPMENT: agile development methodologies • USER: user-centered design • MODULAR: block-by-block approach
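Slide 6 combines EDM as the metadata schema with data stored as LOD/RDF triples. A minimal sketch of that step, assuming a flat input record and a hypothetical base URI (`http://example.org/echoes/`), turns one record into N-Triples using the EDM and Dublin Core namespaces from the deck's SPARQL examples. Real EDM output would also need an `ore:Aggregation`, language tags and URI resources for places and agents, which are left out here.

```python
# Namespaces as used by the EDM / Dublin Core prefixes in the deck's SPARQL queries.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EDM = "http://www.europeana.eu/schemas/edm/"
DC = "http://purl.org/dc/elements/1.1/"

# Hypothetical base URI for ECHOES objects (an assumption, not the project's real one).
BASE = "http://example.org/echoes/"


def escape_literal(value: str) -> str:
    """Escape characters that may not appear raw inside an N-Triples string literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def record_to_ntriples(record_id: str, fields: dict[str, str]) -> list[str]:
    """Turn a flat {dc_element: value} record into N-Triples lines for one edm:ProvidedCHO."""
    subject = f"<{BASE}cho/{record_id}>"
    # First triple: declare the subject as an EDM cultural heritage object.
    lines = [f"{subject} <{RDF_TYPE}> <{EDM}ProvidedCHO> ."]
    # One literal-valued triple per Dublin Core element.
    for element, value in fields.items():
        lines.append(f'{subject} <{DC}{element}> "{escape_literal(value)}" .')
    return lines
```

For example, `record_to_ntriples("1", {"title": "Basílica"})` yields two lines: the `rdf:type edm:ProvidedCHO` statement and a `dc:title` literal. RDF 1.1 N-Triples is UTF-8, so accented values such as "Basílica" can stay unescaped.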
  7. Interoperability network: each "ECHOES hub" can • manage different collections • use all or only part of the functionalities
  8. Agenda: 2. Analysis • 2.2. Tasks • 2.3. Proposal • 2.4. Technical architecture
  9. Analysis phase: • Technical architecture – study possible technologies (data structures, interoperability) – component structure • Study some references – Europeana – LoCloud – Catalan Research Portal • Development proposal for each component – scope and functionalities – tools and technologies – risk analysis
  10. Proposal: • STANDARD: EDM metadata schema as the interoperability standard for inputs and enrichments • TECHNOLOGIES: proposed technologies and tools – DSpace – Apache Fuseki – MySQL – MINT, OntoWiki, Hub3... – GeoNames, DBpedia... – Zooniverse • MODULAR: four main modules – Data Sources – Enrichments – Data Lake – Data Retrieval and Visualization
  11. Proposed architecture. Vale Handen
  12. Agenda: 3. Development • 3.1. Methodology • 3.2. Data sources module • 3.3. Data Lake module • 3.4. Data retrieval and visualization module • 3.5. Enrichment module
  13. Methodology: • Agile development – develop a prototype with the minimum requirements – test the prototype – add new features and improvements in each iteration • Main goal: the developed product is better suited to the needs • 19 sprints, 7 releases, 1 MVP
  14. Project planning
  15. Agenda: 3. Development • 3.1. Methodology • 3.2. Data sources module • 3.3. Data Lake module • 3.4. Data retrieval and visualization module • 3.5. Enrichment module
  16. Data source module: • INPUTS: collections from different sources, defined in many metadata schemas • TOOLS: mapping and transformation tools to prepare the data. Data Source Module
  17. Inputs: collections from different sources, defined in many metadata schemas • ELO: 17 collections; Dublin Core, A2A, EAD and a custom metadata schema; sources: OAI, files • Tresoar: 32 collections; A2A; source: OAI • Gencat: 1 collection; custom metadata schema; source: Excel file • DIBA: 10 collections; custom metadata schema; source: Access file
  18. Inputs: we loaded the data from the different inputs into the system, and our plan was to work on an aggregated visualization. What actually happened: • too much data, difficult to explore with conventional tools • too much heterogeneous data, information silos • poor data quality (date formats, misspellings, different kinds of geolocation)
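Slide 18 lists inconsistent date formats among the data quality problems. A minimal standardization sketch, assuming a hypothetical list of source formats (the deck does not say which formats the ELO, Tresoar, Gencat and DIBA collections actually use) and day-first order for numeric dates, normalizes values to ISO 8601 and returns `None` for anything it cannot parse, so that those values can be flagged rather than guessed.

```python
from datetime import datetime
from typing import Optional

# Hypothetical source formats, tried in order; day-first is an assumption for European data.
CANDIDATE_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%d-%m-%Y", "%d.%m.%Y", "%Y%m%d"]


def normalize_date(raw: str) -> Optional[str]:
    """Return an ISO 8601 date (YYYY-MM-DD, or YYYY for year-only values), or None."""
    value = raw.strip()
    if len(value) == 4 and value.isdigit():
        return value  # year-only dates keep year precision, which ISO 8601 allows
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue  # not this format, try the next one
    return None  # unparseable: leave it to a human instead of guessing
```

The order of `CANDIDATE_FORMATS` is the key design choice: a value like `05/11/1882` is ambiguous, so the list encodes one explicit convention instead of letting each collection's habits leak into the Data Lake.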
  19. Inputs: we need to improve the data quality • Data profiling • Define data formats • Transform the data • Data cleansing • Data standardization • Data validation • First approach: inputs (examples) → direct mapping to EDM → Data Lake → chaos • Second approach: define a standard mapping for each format and create validators: inputs (examples) → validator → if OK → Data Lake → no chaos :)
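The second approach on slide 19 places a validator between the mapping and the Data Lake. A minimal sketch of that gate, assuming hypothetical rules (a few required fields and an ISO 8601 date pattern) and plain Python lists standing in for the Data Lake and for a queue of rejected records awaiting review:

```python
import re
from dataclasses import dataclass, field

# Hypothetical validation rules: required fields plus an ISO 8601 year or full-date pattern.
REQUIRED_FIELDS = ("id", "title", "dataProvider")
ISO_DATE = re.compile(r"^\d{4}(-\d{2}-\d{2})?$")


@dataclass
class GateResult:
    loaded: list = field(default_factory=list)    # records that passed: go to the Data Lake
    rejected: list = field(default_factory=list)  # (record, errors) pairs kept for review


def validate(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record is valid."""
    errors = [f"missing {name}" for name in REQUIRED_FIELDS if not record.get(name)]
    date = record.get("date")
    if date is not None and not ISO_DATE.match(date):
        errors.append(f"bad date: {date!r}")
    return errors


def gate(records: list[dict]) -> GateResult:
    """Only records that pass validation reach the Data Lake ("if OK -> Data Lake")."""
    result = GateResult()
    for record in records:
        errors = validate(record)
        if errors:
            result.rejected.append((record, errors))
        else:
            result.loaded.append(record)
    return result
```

Keeping the error list next to each rejected record, instead of just dropping it, gives the data-profiling step concrete feedback on which mappings or sources need fixing.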
  20. Tools: mapping and transformation tools to prepare the data • Transform inputs to EDM – create a mapping tool for each metadata schema – look at examples to decide the mapping • Metadata schemas transformed – Dublin Core – A2A – EAD – custom schema from Memorix – custom Catalan metadata schema – Topx (in progress)
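Slide 20 describes one mapping per source schema into EDM. A minimal sketch of that idea, assuming simplified flat records; the Catalan field names and the target property chosen for places are hypothetical (the real rules are in the project's mapping specification and the CSUC GitHub repositories):

```python
# One mapping table per source schema: source field -> EDM-compatible property (prefixed name).
# The "custom_catalan" field names and the dcterms:spatial choice are illustrative assumptions.
MAPPINGS = {
    "dublin_core": {
        "title": "dc:title",
        "creator": "dc:creator",
        "date": "dc:date",
        "coverage": "dc:coverage",
    },
    "custom_catalan": {
        "titol": "dc:title",
        "autor": "dc:creator",
        "data": "dc:date",
        "lloc": "dcterms:spatial",
    },
}


def map_to_edm(schema: str, record: dict) -> tuple[dict, list]:
    """Map a flat source record to EDM-style properties; also return the fields left unmapped."""
    table = MAPPINGS[schema]  # a KeyError means no mapping has been defined for this schema yet
    mapped: dict[str, list] = {}
    unmapped: list[str] = []
    for key, value in record.items():
        target = table.get(key)
        if target is None:
            unmapped.append(key)  # report it rather than silently dropping data
        else:
            mapped.setdefault(target, []).append(value)  # properties may repeat in EDM
    return mapped, unmapped
```

Returning the unmapped fields supports the "look at examples to decide the mapping" step: running real samples through the table shows which source fields still have no target.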
  21. Local Data Lake: document specifying how each metadata schema is mapped to EDM
  22. Deduplication of inputs, challenges: Collection 1 describes an object with WHERE, WHO and WHAT ("Basílica"); Collection 2 describes the same object with WHO, WHERE and WHEN ("1882-2026?"). Desired object in the Data Lake: one object with WHERE, WHO, WHAT ("Basílica") and WHEN ("1882-2026?")
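Slide 22 asks for a single Data Lake object built from partial descriptions held in two collections. A minimal merge sketch, assuming the records have already been matched as the same object (the deck does not say how matching is done, and that is the hard part) and that conflicting values are kept side by side for review rather than resolved automatically:

```python
def merge_records(*records: dict) -> dict:
    """Combine records describing the same object, keeping every distinct value per field."""
    merged: dict[str, list] = {}
    for record in records:
        for field_name, value in record.items():
            values = merged.setdefault(field_name, [])
            if value not in values:  # the same value coming from two sources is stored once
                values.append(value)
    return merged
```

With hypothetical WHERE/WHO values, merging `{"where": "place A", "who": "agent A", "what": "Basílica"}` with `{"who": "agent A", "where": "place A", "when": "1882-2026?"}` gives one object carrying WHERE, WHO, WHAT and WHEN, which is the "desired object" on the slide.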
  23. How to use the mapping tool: http://github.com/CSUC • ECHOES 1,2
  24. Mapping and validation tools: graphical user interface
  25. Agenda: 3. Development • 3.1. Methodology • 3.2. Data sources module • 3.3. Data Lake module • 3.4. Data retrieval and visualization module • 3.5. Enrichment module
  26. Data Lake module: DATA LAKE contains data from different sources in EDM
  27. Data Lake: • stores a large amount of data • data comes from different sources • all data is in the same format, EDM. Data Lake Module
  28. EDM metadata schema
  29. Data Lake: the analysis proposed using DSpace, but once we started working on it • the Dublin Core mapping worked well • A2A needs to store relations between data
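Slide 29 notes that A2A needs to store relations between data, which led to a graph database. A toy in-memory sketch of why triples suit this: a relation is just one more triple, and a query is a pattern with wildcards, much like a SPARQL variable. The identifiers and relation names are hypothetical, and this says nothing about how Blazegraph works internally.

```python
from typing import Optional

Triple = tuple[str, str, str]


class TripleStore:
    """Toy store: every fact, including a relation between records, is one (s, p, o) triple."""

    def __init__(self) -> None:
        self.triples: set[Triple] = set()

    def add(self, s: str, p: str, o: str) -> None:
        self.triples.add((s, p, o))

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> list[Triple]:
        """Return matching triples in sorted order; None acts as a wildcard."""
        return sorted(t for t in self.triples
                      if (s is None or t[0] == s)
                      and (p is None or t[1] == p)
                      and (o is None or t[2] == o))
```

Adding a new kind of relation (say, two people taking part in the same event) needs no schema change, whereas a relational design would typically need a new join table for each relation type.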
  30. Technologies: study and test graph database tools • behavior with real data • performance tests • API
  31. Graph database: Blazegraph • standards-based • high-performance • scalable • open source • written entirely in Java • supports Blueprints and the RDF/SPARQL 1.1 family of specifications
  32. Agenda: 3. Development • 3.1. Methodology • 3.2. Data sources module • 3.3. Data Lake module • 3.4. Data retrieval and visualization module • 3.5. Enrichment module
  33. Data retrieval and visualization module: • SPARQL ENDPOINT: open your collections and link them to the world • WEB PORTAL: modular and extensible architecture
  34. Data retrieval and visualization module: the analysis phase proposed a web portal and a SPARQL endpoint (the same as the Data Lake, with good integration with DSpace), but the Data Lake proposal was changed. Data retrieval and visualization Module
  35. Data retrieval and visualization module: SPARQL ENDPOINT: open your collections and link them to the world
  36. SPARQL endpoint: • Requirements – allow exporting data to the semantic web – APIs to export information to web pages or widgets – triple store database – SPARQL • Proposal: YASGUI suite – query editor YASQE – result set visualizer YASR
  37. SPARQL endpoint: example queries.
      EDM AGENT GENDER (http://blazegraph.pre.csuc.cat/echoes/short/Hk_GRIbxf):
        PREFIX rdaGr2: <http://rdvocab.info/ElementsGr2/>
        PREFIX dc: <http://purl.org/dc/elements/1.1/>
        PREFIX edm: <http://www.europeana.eu/schemas/edm/>
        SELECT ?gender ?institucio ?col (COUNT(?gender) AS ?Count)
        WHERE {
          ?agent a edm:Agent ;
                 rdaGr2:gender ?gender .
          ?provided a edm:ProvidedCHO ;
                    dc:contributor ?agent .
          ?aggregacio edm:aggregatedCHO ?provided ;
                      edm:dataProvider ?institucio ;
                      edm:intermediateProvider ?col
        }
        GROUP BY ?gender ?institucio ?col
        ORDER BY DESC(?Count)
      PLACES (http://blazegraph.pre.csuc.cat/echoes/short/S1xkWRZxG):
        PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        PREFIX edm: <http://www.europeana.eu/schemas/edm/>
        PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
        SELECT DISTINCT ?place
        WHERE {
          ?s a edm:Place ;
             skos:prefLabel ?place .
        }
        LIMIT 10
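The endpoint on slides 36 and 37 can also feed web pages or widgets from code, following the W3C SPARQL 1.1 Protocol: the query goes in the `query` parameter of an HTTP GET, and `application/sparql-results+json` is requested via the Accept header. A minimal standard-library sketch; the endpoint URL is a placeholder (the deck only gives short links to saved queries on a pre-production server), so it must be replaced with the real Blazegraph SPARQL URL.

```python
import json
import urllib.parse
import urllib.request

# Placeholder endpoint: an assumption, not the project's actual SPARQL URL.
ENDPOINT = "http://example.org/blazegraph/sparql"

PLACES_QUERY = """
PREFIX edm: <http://www.europeana.eu/schemas/edm/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT DISTINCT ?place WHERE { ?s a edm:Place ; skos:prefLabel ?place . } LIMIT 10
"""


def run_query(query: str, endpoint: str = ENDPOINT) -> dict:
    """Send a SELECT query with HTTP GET (SPARQL 1.1 Protocol) and return the decoded JSON."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def bindings_to_rows(results: dict) -> list[dict]:
    """Flatten SPARQL JSON results into {variable: value} dicts; unbound variables are omitted."""
    return [{var: cell["value"] for var, cell in binding.items()}
            for binding in results["results"]["bindings"]]
```

Usage would be `rows = bindings_to_rows(run_query(PLACES_QUERY))`. Parsing is kept separate from the network call so the same flattening works on results saved from YASGUI or any other SPARQL 1.1 endpoint.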
  38. Data retrieval and visualization module: WEB PORTAL: modular and extensible architecture
  39. Web portal: • Requirements – browse, access and search the contents – different visualization tools – advanced search functionalities • Proposal: WordPress – CMS capabilities to manage web pages – customizable by creating plugins – retrieves data using the Blazegraph API – visualization
  40. Visualization focus
  41. Echoes portal
  42. Echoes portal
  43. Echoes portal
  44. Data visualization: • PLACES (edm:Place) – metadata – relations – source – map • TIME (edm:TimeSpan) – metadata – relations – source – timeline – heat map • CULTURAL OBJECTS (edm:ProvidedCHO) – metadata – relations – source – map • PEOPLE (edm:Agent) – metadata – relations – source – graph
  45. Places visualization: map • Search – using keywords – also allows selecting a region – filter by type • Tooltip – information related to the place – "show more" option • Download results in JSON, CSV…
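Slides 45 and 46 offer downloading results as JSON and CSV. A minimal standard-library sketch, assuming the results are already flat rows (for instance, values taken from SPARQL JSON bindings) and that the CSV columns should be the union of keys across rows, since optional metadata is often missing for some objects:

```python
import csv
import io
import json


def to_json(rows: list[dict]) -> str:
    """Serialize rows as a JSON array; ensure_ascii=False keeps names like 'Basílica' readable."""
    return json.dumps(rows, ensure_ascii=False, indent=2)


def to_csv(rows: list[dict]) -> str:
    """Serialize rows as CSV; columns are the union of keys in first-seen order, blanks for gaps."""
    columns: list[str] = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)  # rows missing a column get restval ("") in that cell
    return buffer.getvalue()
```

Using `csv.DictWriter` rather than joining strings by hand takes care of quoting values that contain commas or quotes, which is common in free-text heritage metadata.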
  46. Cultural object visualization: graph • related information shown as a graph • Tooltip – detailed information – "show more" options • Download results in JSON, CSV…
  47. Timeline: • Search – between dates – period • Tooltip – information related to the place – "show more" option • Time span visualization
  48. Time span visualization: timeline • Search between dates • Periods shown under years • Banner – information related to the place – "show more" option – click to go to the next
  49. Time span visualization: heat map • Search between dates • On mouse-over, each day box shows the number of related ProvidedCHOs and a link to show them • The color darkens with the number of occurrences • Selecting a date in the calendar lists the related ProvidedCHOs
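The heat map on slide 49 counts related ProvidedCHOs per day and darkens each day box as the count grows. A minimal sketch of that binning, assuming ISO dates as input and a hypothetical five-step palette scaled against the busiest day in the selected range:

```python
from collections import Counter

# Hypothetical palette, lightest (no objects) to darkest (busiest day).
PALETTE = ["#eeeeee", "#c6dbef", "#6baed6", "#2171b5", "#08306b"]


def daily_counts(dates: list[str]) -> Counter:
    """Count ProvidedCHOs per ISO date (YYYY-MM-DD); missing days read as 0."""
    return Counter(dates)


def shade(count: int, max_count: int) -> str:
    """Map a day's count to a color: 0 is the lightest, the busiest day the darkest."""
    if count <= 0 or max_count <= 0:
        return PALETTE[0]
    steps = len(PALETTE) - 1
    index = -((-count * steps) // max_count)  # ceiling of count * steps / max_count, no floats
    return PALETTE[min(max(index, 1), steps)]
```

The per-day count is also what the mouse-over shows, so one `Counter` can drive both the color and the tooltip number.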
  50. Time span visualization: timespan • Search between dates • Show ProvidedCHOs by year – "show more" options • Tooltip – detailed information
  51. Agent visualization: graph showing agent relations in a ProvidedCHO (https://echoes.pre.csuc.cat/agents/demo/) • Uses the same library used to graph the relations in the CHO details, with icons, different types of relations and a pseudo-hierarchy • A tooltip with detailed information can be included
  52. Agent visualization: family tree • Agents shown as a simple hybrid graph/tree hierarchy depending on their relations • Colored by gender
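Slide 52 draws agents as a hybrid graph/tree hierarchy colored by gender. A minimal sketch of turning parent-child relations into the nested structure a front-end tree library usually expects, assuming relations arrive as (parent, child) pairs and a hypothetical gender-to-color table. Agents without a known parent become roots, and a set of visited names stops cycles, which real genealogical data can contain.

```python
# Hypothetical colors per gender value; unknown or missing values fall back to grey.
GENDER_COLORS = {"female": "#d6604d", "male": "#4393c3"}
DEFAULT_COLOR = "#999999"


def build_tree(relations: list[tuple[str, str]], genders: dict[str, str]) -> list[dict]:
    """Build nested {name, color, children} nodes from (parent, child) pairs."""
    children: dict[str, list[str]] = {}
    has_parent: set[str] = set()
    agents: list[str] = []  # first-seen order keeps the output stable
    for parent, child in relations:
        children.setdefault(parent, []).append(child)
        has_parent.add(child)
        for agent in (parent, child):
            if agent not in agents:
                agents.append(agent)

    def node(name: str, seen: frozenset) -> dict:
        color = GENDER_COLORS.get(genders.get(name, ""), DEFAULT_COLOR)
        path = seen | {name}  # ancestors on the current branch, used to skip cycles
        kids = [node(c, path) for c in children.get(name, []) if c not in path]
        return {"name": name, "color": color, "children": kids}

    return [node(a, frozenset()) for a in agents if a not in has_parent]
```

An agent with two known parents appears under both, which is one simple way to get the "hybrid" graph/tree look rather than a strict single-parent hierarchy.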
  53. Agent visualization: family tree (https://echoes.pre.csuc.cat/agents/demo_three/) • Information shown as a left-to-right tree • Additional agent details are shown
  54. Agenda: 3. Development • 3.1. Methodology • 3.2. Data sources module • 3.3. Data Lake module • 3.4. Data retrieval and visualization module • 3.5. Enrichment module
  55. Enrichments module: • AUTOMATIC: predefined processes add new metadata related to objects • MANUAL: user empowerment
  56. Enrichments module: AUTOMATIC: predefined processes add new metadata related to objects • Which: select metadata fields • When: new metadata can be incorporated as a – preprocess – post-process • How: – reuse existing fields – create new metadata. Enrichments Module
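Slide 56 frames automatic enrichment as which fields, when (pre- or post-process) and how (reuse existing fields or create new metadata). A minimal post-process sketch, assuming a local lookup table stands in for an authority such as GeoNames (no real service is called and the URIs are placeholders): it reads a place field and writes matches into a new field, leaving the original values untouched.

```python
# Hypothetical local gazetteer standing in for an external authority; URIs are placeholders.
GAZETTEER = {
    "leiden": "http://example.org/authority/place/leiden",
    "barcelona": "http://example.org/authority/place/barcelona",
}


def enrich_places(record: dict, source_field: str = "place",
                  target_field: str = "place_uri") -> dict:
    """Return a copy of the record with authority URIs in a new field ("create new metadata")."""
    enriched = dict(record)  # never modify the source record in place
    values = record.get(source_field, [])
    if isinstance(values, str):
        values = [values]  # accept a single value or a list of values
    uris = [GAZETTEER[v.strip().lower()] for v in values if v.strip().lower() in GAZETTEER]
    if uris:
        enriched[target_field] = uris  # unmatched values simply get no link
    return enriched
```

Writing to a separate field keeps the enrichment reversible, which matters once manual enrichments and user proposals have to be reviewed against the original data.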
  57. Enrichments module: possible enrichments • Place: TGN, GeoNames, Pleiades, HPN… • Agent: VIAF, ULAN, GND, Wikidata… • TimeSpan: PeriodO, ChronOntology… • Concepts: LCSH • General: Getty, Biotechnology Glossary, DBpedia, EUROVOC, Geopolitical Ontology
  58. Enrichments module: MANUAL: user empowerment. Coming soon…
  59. Agenda: 1. About Echoes • 2. Analysis • 3. Development • 4. Conclusions
  60. Lessons learned
  61. Challenges in ECHOES development: • Sprint 4: A2A relations don't fit in a relational database • Sprint 11: data quality vs. data quantity • Sprint 14: create unique objects using metadata from many sources
  62. ECHOES architecture
  63. Upcoming challenges: • Automatic enrichments – when – how – sources • Manual enrichments – review user proposals – load new data into the Data Lake
  64. Lessons learned: a modular architecture allows us to • change technologies • add new pieces to modules • add new functionalities
  65. Lessons learned: Quality • Homogeneous • Schema • Quantity • Heterogeneous • Examples
  66. Thanks for your attention. Internal use
