
Linked Data and Semantic Web Application Development by Peter Haase



  1. 1. Linked Data and Semantic Application Development. Peter Haase, Saint Petersburg, 4 December 2014
  2. 2. Who am I and What am I Talking About? A Linked Data Perspective. [Diagram: graph describing the speaker, with edges labeled affiliation, founder, develops, project, worksOn and owl:sameAs; one node is www.metaphacts.com]
  3. 3. For exercises, quizzes and further material visit our website: http://www.euclid-project.eu (course, eBook). Other channels: @euclid_project, euclidproject. EUCLID - Providing Linked Data
  4. 4. Semantic Technologies Enabling Smart Data § Not just data, not just information, but actionable insights that support better decisions. [Diagram labels: Raw Data Access, Sense Making, Actionable Insights, Decision Support; Data, Information, Knowledge]
  5. 5. Google Knowledge Graph
  6. 6. Google Knowledge Graph
  7. 7. Google Knowledge Graph
  8. 8. LinkedIn Economic Graph
  9. 9. Freebase § http://www.freebase.com
  10. 10. DBpedia: classes and properties for Wikipedia export (infoboxes), regularly updated. See http://wiki.dbpedia.org/
  11. 11. Linked (Open) Data • Set of standards and principles for publishing, sharing and interrelating structured knowledge • Data from different knowledge domains, self-described, linked and accessible • From data silos to a Web of Data • RDF as data model, SPARQL for querying • Ontologies to describe the semantics
  12. 12. Linked Data Principles 1. Use URIs as names for things. 2. Use HTTP URIs so that users can look up those names. 3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL). 4. Include links to other URIs, so that users can discover more things.
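Principles 2 and 3 amount to an HTTP lookup with content negotiation. The following is a minimal sketch, assuming the server honors an `Accept` header asking for Turtle; the DBpedia URI and the helper name `build_lookup` are illustrative choices, not taken from the slides, and the request is only built, not sent.

```python
from urllib.request import Request


def build_lookup(uri: str, media_type: str = "text/turtle") -> Request:
    """Prepare an HTTP GET for a thing's URI, asking for an RDF serialization."""
    return Request(uri, headers={"Accept": media_type})


# Example URI chosen for illustration only.
request = build_lookup("http://dbpedia.org/resource/Saint_Petersburg")
# urllib.request.urlopen(request) would perform the actual lookup.
```

Keeping the request construction separate from the network call makes the negotiation step easy to inspect without depending on a live server.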
  13. 13. Semantics on the Web. Semantic Web Stack, Berners-Lee (2006). [Diagram layer labels: Application-specific declarative knowledge; Query language; Basic data model; Syntactic basis; Simple vocabulary (schema) language; Expressive vocabulary (ontology) language; Digital signatures, recommendations; Proof generation, exchange, validation]
  14. 14. Ontologies § An ontology defines a domain of interest – … in terms of the things you talk about in the domain, their attributes, as well as relationships between them § Ontologies are used to – Share a common understanding about a domain among people and machines – Enable reuse of domain knowledge 06.12.14
  15. 15. Categories of Linked Data Applications. Linked Data applications can be classified according to the following dimensions (each with two levels):
      • Semantic technology depth. Extrinsic: use of semantics on the surface of the application. Intrinsic: conventional technologies (e.g., RDBMS) are complemented or replaced with SW equivalents.
      • Information flow direction. Consuming: LD is retrieved from the source or via a wrapper. Producing: publishes LD (in RDF-based formats).
      • Semantic richness. Shallow: simple taxonomies, use of RDF or RDFS. Strong: high-level representation formalisms (OWL variants).
      • Semantic integration. Isolated: creation of own vocabularies. Integrated: reuse of information at schema or instance level.
      Source: M. Martin and S. Auer. “Categorisation of Semantic Web Applications”. EUCLID – Building Linked Data applications
  16. 16. Linked Data Examples: NYTimes, http://data.nytimes.com/schools/schools.html
  17. 17. Some Application Scenarios: BBC
  18. 18. Example: ResearchSpace Image Annotation • The ResearchSpace environment aims at providing a set of RDF data sets and tools to describe concepts and objects related to cultural historical research. • The tools are highly interactive: they allow users to access the data and contribute to the data set by creating RDF annotations. [Screenshot label: Geo Mapper] Source: https://sites.google.com/a/researchspace.org/researchspace/
  19. 19. Example: ResearchSpace CRM Search System. [Screenshot labels: Search by predicates, Faceted search] Source: Snapshot from https://www.youtube.com/watch?v=HCnwgq6ebAs
  20. 20. Some Application Scenarios: Linked Government Data: USA
  21. 21. Some Application Scenarios: Linked Government Data: UK
  22. 22. Benefits of Linked Data in the Enterprise § Enterprise Data Integration: Semantically integrate data scattered across different information systems, leading to transparent, streamlined information management with fewer redundancies and inconsistencies § Simplified publishing, sharing and reuse of data: Increase openness and accessibility of enterprise data through open, standards-based APIs § Enrichment and contextualization through interlinking: Increase added value by linking to Linked Open Data § Improved analytics: Enable cross-organization analysis, interactive analytics, and reporting on top of a collaborative platform
  23. 23. Optique Case Study: Statoil Exploration Experts in geology and geophysics develop stratigraphic models of unexplored areas – Based on production and exploration data from nearby locations – Analytics on: • 1,000 TB of relational data • using diverse schemata • spread over 3,000 tables • spread over multiple individual data bases – 900 experts in Statoil Exploration – Up to 4 days for new data access queries – Assistance from IT-experts required
  24. 24. Ontology Based Data Access. Up to 80% of an expert's time is spent on data access. [Diagram, complex case: engineer, information need, translation, IT expert, specialized query, disparate sources]
  25. 25. Example Query § Find – fields together with their remaining oil – that are currently operated by Statoil and – show the types of wellbores located on these fields
  26. 26. Visual Query Formulation
  27. 27. Optique Demo Videos: http://www.youtube.com/user/optiqueproject and http://www.optique-project.eu
  28. 28. General Architecture of Linked Data Applications. [Diagram: Presentation Tier; Logic Tier; Data Tier. The Data Tier contains a Data Access Component (LD Wrapper, SPARQL Wrapper, R2R Transformation, Physical Wrapper) over the sources Linked Data, SPARQL Endpoints, Relational Data and Web Data accessed via APIs; a Data Integration Component (Vocabulary Mapping, Interlinking, Cleansing); the Integrated Dataset (Triple Store); and a Republication Component (RDF/XML)]
  29. 29. Architectural Patterns 1. The Crawling Pattern: Crawls or loads data in advance. Data is managed in one triple store, thus it can be accessed efficiently. The disadvantage of this pattern is that the data might not be up to date. 2. The On-The-Fly Dereferencing Pattern: URIs are dereferenced at the moment that the app requires the data. This pattern retrieves up-to-date data. Performance is affected when the app must dereference many URIs. 3. The (Federated) Query Pattern: Submits complex queries to a fixed set of data sources. Enables applications to work with current data directly retrieved from the sources. Finding optimal query execution plans over a large number of sources is a complex problem. Source: T. Heath, C. Bizer. Linked Data: Evolving the Web into a Global Data Space
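The freshness trade-off between the first two patterns can be sketched in a few lines. This is an illustrative model only: the `fetch` callable stands in for an HTTP lookup, and the class names, the `ex:` identifiers and the in-memory `source` dictionary are assumptions made for the example, not part of the slides.

```python
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]
Fetch = Callable[[str], List[Triple]]  # stands in for dereferencing a URI


class CrawlingAccess:
    """Crawling pattern: load everything up front, answer from the local copy."""

    def __init__(self, fetch: Fetch, seeds: List[str]) -> None:
        self.store: Dict[str, List[Triple]] = {uri: fetch(uri) for uri in seeds}

    def describe(self, uri: str) -> List[Triple]:
        return self.store.get(uri, [])


class OnTheFlyAccess:
    """On-the-fly dereferencing pattern: fetch at the moment the app asks."""

    def __init__(self, fetch: Fetch) -> None:
        self.fetch = fetch

    def describe(self, uri: str) -> List[Triple]:
        return self.fetch(uri)


# A hypothetical source whose data changes after the crawl has finished.
source: Dict[str, List[Triple]] = {"ex:u2": [("ex:u2", "foaf:name", "U2")]}


def fetch(uri: str) -> List[Triple]:
    return list(source.get(uri, []))  # copy, as a real lookup would return


crawled = CrawlingAccess(fetch, seeds=["ex:u2"])
live = OnTheFlyAccess(fetch)
source["ex:u2"].append(("ex:u2", "foaf:made", "ex:war"))

stale_count = len(crawled.describe("ex:u2"))  # still the crawl-time snapshot
fresh_count = len(live.describe("ex:u2"))     # sees the new triple
```

The crawled copy answers without touching the source but misses the later update; the on-the-fly variant pays one lookup per call and stays current.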
  30. 30. Data Layer: Data Access Component • Linked Data applications may implement a Mediator-Wrapper Architecture to access heterogeneous sources: – Wrappers are built around each data source in order to provide a unified view of the retrieved data. • The method to access the data depends on the Linked Data architectural pattern. • The factors that determine the choice of a pattern are: – Number of data sources to access – Requirement of consuming up-to-date data – Tolerance to high response time – Requirement of discovering new data sources
  31. 31. Data Layer (2): Data Access Component (2) • The data access component may be implemented by using one or a combination of the following mechanisms (example tools in each case):
      • Linked Data Crawlers: LDspider (https://code.google.com/p/ldspider/), Slug (https://code.google.com/p/slug-semweb-crawler/)
      • Linked Data Client Libraries: Semantic Web Client Library (http://wifo5-03.informatik.uni-mannheim.de/bizer/ng4j/semwebclient/), The Tabulator (http://www.w3.org/2005/ajar/tab), Moriarty (https://code.google.com/p/moriarty/)
      • SPARQL Client Libraries: Jena Semantic Web Framework (http://jena.apache.org/)
      • Federated SPARQL Engines: ANAPSID (https://github.com/anapsid/anapsid), FedX (http://www.fluidops.com/fedx/), SPLENDID (https://code.google.com/p/rdffederator/)
      • Search Engine APIs: Sindice (http://sindice.com/developers/api), Uberblic (http://uberblic.com/)
  32. 32. Data Layer (3): Data Integration Component • Consolidates the data retrieved from heterogeneous sources. • This component may operate at: – Schema level: Performs vocabulary mappings in order to translate data into a single unified schema. Links correspond to RDFS properties or OWL property and class axioms. – Instance level: Performs entity resolution via owl:sameAs links. In case the data sources do not provide the links, further tools like Silk or Open Refine can be used to integrate the data.
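Instance-level integration via owl:sameAs can be pictured as grouping URIs into equivalence classes. The sketch below uses a small union-find structure for that; it is one possible way to consolidate such links, not the method used by any tool named on the slide, and the prefixed identifiers in the example are made up.

```python
from typing import Dict, Iterable, List, Set, Tuple


def same_as_clusters(links: Iterable[Tuple[str, str]]) -> List[Set[str]]:
    """Group URIs connected by owl:sameAs links into equivalence classes."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps chains short
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two classes

    clusters: Dict[str, Set[str]] = {}
    for uri in list(parent):
        clusters.setdefault(find(uri), set()).add(uri)
    return list(clusters.values())


# Hypothetical owl:sameAs statements collected from three sources.
links = [
    ("dbpedia:U2", "mb:artist/u2"),
    ("mb:artist/u2", "nyt:u2"),
    ("dbpedia:Queen", "mb:artist/queen"),
]
clusters = same_as_clusters(links)
```

Because owl:sameAs is symmetric and transitive, `dbpedia:U2` and `nyt:u2` end up in the same class even though no source links them directly.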
  33. 33. Data Layer (4): Integrated Dataset • The dataset resulting from integration and consolidation can be cached in an RDF store. • There are many solutions to deploy triple/RDF stores, e.g.: • bigdata (http://www.bigdata.com/) • OWLIM (http://www.ontotext.com/owlim) • Jena TDB (http://jena.apache.org/documentation/tdb/) • AllegroGraph (http://www.franz.com/agraph/allegrograph/) • Virtuoso Universal Server (http://virtuoso.openlinksw.com/) • RDF3x (https://code.google.com/p/rdf3x/)
  34. 34. Data Layer (5): Republication Component • Exposes portions of the integrated dataset as Linked Data • There are different solutions to make the data accessible: • Via SPARQL endpoints (e.g., Sesame OpenRDF SPARQL Endpoint, …) • Via APIs (e.g., Linked Data API) • As RDF dumps • With the built-in means of your framework/CMS (e.g., Drupal, Information Workbench, …)
  35. 35. Application and Presentation Layers • The logic layer implements sophisticated processing according to the functionalities of the application. This layer may include data mining components as well as reasoners that are not integrated in the data layer. • The presentation layer displays the information to the user in various formats, including text, diagrams or other types of visualization techniques.
  36. 36. LINKED DATA APPLICATION DEVELOPMENT FRAMEWORKS: Information Workbench
  37. 37. Information Workbench • Platform for development of Linked Data applications • Semantic Web Data: Semantics- & Linked Data-based integration of enterprise and open data sources • Intelligent Data Access and Analytics: visual exploration, semantic search, dashboarding and reporting • Collaboration and Knowledge Management Platform: wiki-based curation & authoring of data, collaborative workflows. Source: http://www.fluidops.com/information-workbench/
  38. 38. Information Workbench (2). [Diagram layers: Customized application solutions; Reusable UI and data integration components; Data storage and management platform; External resources to reuse data and create mashups]
  39. 39. Data Integration: Data Provider Concept. Data providers support the periodic extraction & integration from external data sources into a central repository • Lifting from arbitrary data formats to RDF (e.g., relational, XML, CSV) • Parametrizable (e.g., connection information, refresh interval, ...) • Built-in UI for instantiating providers • Intuitive interfaces and APIs for writing own, custom providers. Steps: connect to data source, extract data from source, convert data into RDF, store RDF in repository. Examples: RDF, R2RML, XML2RDF, SPARQL
  40. 40. W3C RDB2RDF • Task: Integrate data from relational DBMS with Linked Data • Approach: Map from relational schema to semantic vocabulary with R2RML • Publishing: two alternatives – Translate SPARQL into SQL on the fly – Batch transform data into RDF, index and provide SPARQL access in a triplestore. [Diagram labels: Relational DBMS, R2RML Engine, Data acquisition, Integrated Data in Triplestore, Vocabulary Mapping, Interlinking, Cleansing, SPARQL Endpoint, Publishing, LD Data set Access]
  41. 41. W3C RDB2RDF • In 2012 the W3C published two Recommendations for mapping between relational databases and RDF: – Direct Mapping directly exposes data as RDF • No allowance for vocabulary mapping • No allowance for interlinking (unless URIs are used in the relational data) – R2RML, the RDB to RDF mapping language • Allows vocabulary mapping (subject, predicate and object maps with class options) • Allows interlinking, since URIs can be constructed. See http://www.w3.org/2001/sw/rdb2rdf/
  42. 42. R2RML Class Mapping • Declarative mappings with an RDF-based syntax:
          lb:Artist a rr:TriplesMap ;
            rr:logicalTable [rr:tableName "artist"] ;
            rr:subjectMap [rr:class mo:MusicArtist ;
                           rr:template "http://musicbrainz.org/artist/{gid}#_"] ;
            rr:predicateObjectMap [rr:predicate mo:musicbrainz_guid ;
                                   rr:objectMap [rr:column "gid" ; rr:datatype xsd:string]] .
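To make the effect of this mapping concrete, the sketch below applies it by hand to rows of the `artist` table. It is a simplified illustration, not an R2RML engine: the function names are invented for the example, the row is sample data, and the percent-encoding that R2RML requires for template values is omitted because a UUID `gid` needs none.

```python
import re
from typing import Dict, Iterable, Iterator, Tuple

Triple = Tuple[str, str, str]

# Subject template copied from the rr:template in the mapping.
SUBJECT_TEMPLATE = "http://musicbrainz.org/artist/{gid}#_"


def expand_template(template: str, row: Dict[str, str]) -> str:
    """Replace each {column} placeholder with the row's value for that column."""
    return re.sub(r"\{(\w+)\}", lambda m: row[m.group(1)], template)


def map_artist_rows(rows: Iterable[Dict[str, str]]) -> Iterator[Triple]:
    """Emit the triples the lb:Artist triples map describes for each row."""
    for row in rows:
        subject = f"<{expand_template(SUBJECT_TEMPLATE, row)}>"
        yield (subject, "rdf:type", "mo:MusicArtist")  # from rr:class
        yield (subject, "mo:musicbrainz_guid", f'"{row["gid"]}"^^xsd:string')


# Sample row; the gid is the U2 identifier that appears later in the deck.
rows = [{"gid": "a3cb23fc-acd3-4ce0-8f36-1e5aa6a18432", "name": "U2"}]
triples = list(map_artist_rows(rows))
```

Each row yields one typing triple and one literal triple, which mirrors the single `rr:class` and the single `rr:predicateObjectMap` in the mapping.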
  43. 43. Data Warehousing vs. Federation. Warehousing / Crawling • Data is copied from the source into the warehouse • Query runs in the warehouse • Supported in IWB using data providers. Federation • Data remains in federated DB • Query is pushed down to federated DB • Supported in IWB using SPARQL federation
  44. 44. Customizable User Interface. Demo available at http://musicbrainz.fluidops.net. [Screenshot labels: Wiki page management, Main view area, View selection toolbar, Current resource, Navigation shortcuts]
  45. 45. User Interface Concept: One Page URI. [Diagram: each resource in the graph has its own resource page]
  46. 46. Data Driven UI: Ontology as “Structural Backbone”. [Diagram: classes of the ontology (RDFS/OWL) are associated with UI templates (Template:…, Template:mo:MusicArtist); resources in the RDF data graph are shown as resource pages]
  47. 47. Different Views on Every Resource: Wiki View, Table View, Graph View, Pivot View
  48. 48. Widget-Based User Interface: Visualization and Exploration; Analytics and Reporting; Mashups with Social Media; Authoring and Content Creation. Widgets are not static and can be integrated into the UI using a Wiki-style syntax.
  49. 49. Example: Add Widgets to Wiki. Example: show top 10 released records for an artist
          {{#widget: BarChart |
            query ='SELECT distinct (COUNT(?Release) AS ?COUNT) ?label WHERE {
              ?? foaf:made ?Release .
              ?Release rdf:type mo:Release .
              ?Release dc:title ?label .
            }
            GROUP BY ?label
            ORDER BY DESC(?COUNT)
            LIMIT 10
            '
            | input = 'label'
            | output = 'COUNT'
          }}
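What the widget's query computes can be mirrored over a small in-memory triple set. The sketch below is an emulation for illustration, not how the Information Workbench evaluates SPARQL; the function name, the `ex:` identifiers and the sample triples are assumptions.

```python
from collections import Counter
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]


def top_release_titles(triples: Set[Triple], artist: str,
                       limit: int = 10) -> List[Tuple[str, int]]:
    """Count releases per title for one artist, highest counts first."""
    # ?? foaf:made ?Release
    made = {o for s, p, o in triples if s == artist and p == "foaf:made"}
    # ?Release rdf:type mo:Release
    releases = {s for s, p, o in triples
                if s in made and p == "rdf:type" and o == "mo:Release"}
    # ?Release dc:title ?label, then GROUP BY ?label with COUNT(?Release)
    counts = Counter(o for s, p, o in triples
                     if s in releases and p == "dc:title")
    # ORDER BY DESC(?COUNT) LIMIT n
    return counts.most_common(limit)


# Hypothetical data: three releases and one track made by the same artist.
data: Set[Triple] = {
    ("ex:u2", "foaf:made", "ex:r1"), ("ex:r1", "rdf:type", "mo:Release"),
    ("ex:r1", "dc:title", "War"),
    ("ex:u2", "foaf:made", "ex:r2"), ("ex:r2", "rdf:type", "mo:Release"),
    ("ex:r2", "dc:title", "War"),
    ("ex:u2", "foaf:made", "ex:r3"), ("ex:r3", "rdf:type", "mo:Release"),
    ("ex:r3", "dc:title", "Boy"),
    ("ex:u2", "foaf:made", "ex:t1"), ("ex:t1", "rdf:type", "mo:Track"),
    ("ex:t1", "dc:title", "War"),
}
chart_rows = top_release_titles(data, "ex:u2")
```

The resulting (label, count) pairs correspond to the widget's `input` and `output` columns; the track is excluded because it fails the `mo:Release` type pattern.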
  50. 50. Music Example. Page of a class: • Shows an overview of MusicArtist instances. See http://musicbrainz.fluidops.net/resource/mo:MusicArtist
  51. 51. Music Example (2). Page of a class template: • Defines a layout for displaying each resource of the class • Uses semantic wiki syntax. See http://musicbrainz.fluidops.net/resource/Template:mo:MusicArtist
  52. 52. Music Example (3). Page of a class instance: • Displays the data about the resource according to the class template. See http://musicbrainz.fluidops.net/resource/?uri=http%3A%2F%2Fmusicbrainz.org%2Fartist%2Fb10bbbfc-cf9e-42e0-be17-e2c3e1d2600d%23_
  53. 53. Mashups with external sources • Relevant information and UI elements from external sources can be incorporated in the wiki view • IWB contains multiple mashup widgets for popular social media sources – Twitter – Youtube – Facebook – New York Times news – LinkedIn – … Example: {{#widget: Youtube | searchString = $SELECT ?x WHERE { ?? foaf:name ?x . }$ | asynch = 'true' }} Template instantiation: ?? = http://musicbrainz.org/artist/a3cb23fc-acd3-4ce0-8f36-1e5aa6a18432%23_ and ?x = "U2"
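The template instantiation step can be read as a substitution of the `??` placeholder by the resource whose page is being rendered. The sketch below assumes a plain textual replacement with the IRI in angle brackets; the slides do not show how the Information Workbench actually implements this, and the function name is invented. The URI is written with a literal `#` where the slide shows its encoded form `%23`.

```python
def instantiate(query_template: str, resource_uri: str) -> str:
    """Bind the ?? placeholder to the resource of the current page."""
    return query_template.replace("??", f"<{resource_uri}>")


template = "SELECT ?x WHERE { ?? foaf:name ?x . }"
query = instantiate(
    template,
    "http://musicbrainz.org/artist/a3cb23fc-acd3-4ce0-8f36-1e5aa6a18432#_",
)
```

The same template therefore produces a different concrete query on every artist page, which is what lets one class template serve all instances of the class.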
  54. 54. Triple Editor (Table View) • Edit structured data associated with a resource • Change, add and remove triples
  55. 55. Ontology-Based Data Input. The Triple Editor takes into account the ontology definition: • The autosuggestion tool considers the domains and ranges of the properties. Example: properties available for the class mo:MusicGroup are suggested automatically
  56. 56. Validation of User Input. Validation uses property definitions in the ontology: • The property myIntegerProperty has an associated rdfs:range definition. • This ensures that all objects are of XML Schema type xsd:integer.
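A range-driven check of this kind can be sketched as a lookup from property to datatype, followed by a test of the lexical form. The tables below are illustrative assumptions: the `ex:` prefix is invented, only `xsd:integer` is covered, and values with no declared range are accepted rather than rejected.

```python
import re
from typing import Dict, Optional, Pattern

# Hypothetical ontology excerpt: property -> rdfs:range datatype.
RANGES: Dict[str, str] = {"ex:myIntegerProperty": "xsd:integer"}

# Lexical form of xsd:integer: an optional sign followed by decimal digits.
LEXICAL_FORMS: Dict[str, Pattern[str]] = {
    "xsd:integer": re.compile(r"[+-]?[0-9]+"),
}


def is_valid(prop: str, value: str) -> bool:
    """Check user input against the datatype declared as the property's range."""
    datatype: Optional[str] = RANGES.get(prop)
    pattern = LEXICAL_FORMS.get(datatype) if datatype else None
    if pattern is None:
        return True  # no range, or a datatype this sketch does not cover
    return pattern.fullmatch(value.strip()) is not None
```

For example, `is_valid("ex:myIntegerProperty", "42")` passes while `"4.2"` is rejected, so the editor can flag the input before a triple is written.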
  57. 57. Russian Museum Project – Architecture and Use Cases. [Diagram: Users (data engineer, website visitor, museum visitor); IWB Frontend (IWB Wiki View); IWB Backend (Systap Bigdata) holding Russian Museum Data, DBpedia Subset, British Museum Data and User Data; original data sources (museums and other sources, social networks)] Use Case 1: Data Provisioning • Data crawling • Data transformation • Data interlinking • Data enrichment / information extraction • Data validation. Use Case 2: Search and Visualization • Base templates for visualization • Templates for external data • PivotViewer • Step-by-step visualization • Extended search widgets • SemFacet. Use Case 3: Mobile App • Templates + CSS for HTML5 mobile devices • Simplified cards • Google Glass App • QR code recognition • Pattern / image recognition
  58. 58. Linked Data Application for the Russian Museum. [Diagram labels: Ontology, Data, Data Providers (Web Crawl, RDF Dump), Templates, Widgets]
  59. 59. Sample Visualization Russian Museum
  60. 60. Google Glass
  61. 61. Summary § Linked Data and Semantic Technologies – From data to information to knowledge – Graphs for integration of heterogeneous data in a variety of data models – Ontologies for knowledge representation and interpretation of data § Linked Data applications – Publishing and consuming Linked Data – Main components and architecture § Standards-based, declarative models for all aspects of the application – RDF: common data model – OWL Ontology: conceptual domain model – R2RML: integrating data sources – SPARQL queries: expressing information needs – Wiki templates: interfaces for interacting with the data
  62. 62. Contact us! metaphacts GmbH, Kautzelweg 13, 69190 Walldorf, Germany. p +49 6227 8308660, m +49 157 50152441, e info@metaphacts.com, @metaphacts
