Open Data Publication and Consumption: An Overview of Relevant Data Access Approaches and DaaS Solutions
ESWC Summer School, 2014
Dumitru Roman, SINTEF, Norway
dumitru.roman@sintef.no
Outline 
•The context: Open Data 
•Data access: Web APIs, OData, SPARQL/LDP 
•DaaS solutions landscape and open DaaS architecture
The context: Open Data 
•Open Data Movement: make data available (primarily government data) 
–Businesses and citizens can develop new ideas, services and applications 
–Can support (government) transparency and accountability 
Source: McKinsey http://www.mckinsey.com/insights/business_technology/open_data_unlocking_innovation_and_performance_with_liquid_information 
Gartner: 
By 2016, the use of "open data" will continue to increase —but slowly, and predominantly limited to Type A enterprises. 
By 2017, over 60% of government open data programs that do not effectively use open data internally, will be scaled back or discontinued. 
By 2020, enterprises and governments will fail to protect 75% of sensitive data and will declassify and grant broad/public access to it. 
Source: Gartner http://training.gsn.gov.tw/uploads/news/6.Gartner+ExP+Briefing_Open+Data_JUN+2014_v2.pdf
Lots of open datasets on the Web… 
•A large number of datasets have been published as open data in recent years
•Many kinds of data: cultural, science, finance, statistics, transport, environment, …
•Popular formats: tabular (e.g. CSV, XLS), HTML, XML, JSON, …
…but few applications 
•Applications utilizing open and distributed datasets have been rather few (see the portal figures in the table)
•Challenges include:
–Lack of resources: unreliable data access
–Lack of expertise: not easily available to organisations
–Technical and organizational obstacles

Open Data Portal    Datasets     Applications
data.gov            ~110 000     ~350
publicdata.eu       ~50 000      ~80
data.gov.uk         ~20 000      ~350
data.norge.no       ~300         ~40
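The gap between published datasets and applications can be made concrete with a little arithmetic. The sketch below uses only the approximate counts from the portal figures; the function name and the "applications per 1,000 datasets" measure are illustrative choices, not something the slides define.

```python
# Approximate (datasets, applications) counts copied from the portal figures.
PORTALS = {
    "data.gov": (110_000, 350),
    "publicdata.eu": (50_000, 80),
    "data.gov.uk": (20_000, 350),
    "data.norge.no": (300, 40),
}


def apps_per_thousand_datasets(datasets: int, applications: int) -> float:
    """Return the number of applications per 1,000 published datasets."""
    return 1000 * applications / datasets


if __name__ == "__main__":
    for portal, (datasets, applications) in PORTALS.items():
        ratio = apps_per_thousand_datasets(datasets, applications)
        print(f"{portal:15s} {ratio:7.1f} applications per 1,000 datasets")
```

On these figures the three large portals come out at fewer than 20 applications per 1,000 datasets.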
Open data publication and access
•Data publishers: complicated data publishing and maintenance process
•Data consumers/developers: complicated programmatic data access
•A decision which lifts a data publication burden from a data publisher will place that burden on the data access for the data consumer
[Figure: trade-off between easy data publication with complicated data access, and complicated data publication with easy data access. Goal: simplify data publication and simplify data access.]
(Programmatic/Web-based) Data access 
•Traditional approaches for programmatically consuming data: ODBC, JDBC, RMI, CORBA, ... 
•Modern Web applications and data services rely extensively on lightweight Web service based approaches exchanging data via standard protocols (HTTP) and formats (e.g. XML, JSON, RDF, …) 
•Relevant approaches for programmatic access to open data 
–Web APIs 
–OData 
–SPARQL and Linked Data Platform (LDP) 
Web APIs 
•Programmatic interfaces accessible through HTTP calls (e.g. GET, POST) 
•Data (requests/responses) typically in JSON or XML 
•Very popular among application developers 
Source: http://www.programmableweb.com/
[Figure: on the data consumer/developer side, an app uses a client library; on the data provider side, a Web service is exposed through a Web API. Protocol: HTTP. Payload: JSON/XML/…]
Web APIs - example
Request:
GET http://api.yr.no/weatherapi/locationforecast/1.9/?lat=60.10;lon=9.58
Response payload: [shown as an image on the original slide]
Documentation: http://api.yr.no/weatherapi/locationforecast/1.9/documentation
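A minimal sketch of how a client might issue the request above, using only the Python standard library. The function names are illustrative, and the endpoint version is the one shown on the slide, which may no longer be served, so the network call is defined but not executed here.

```python
from urllib.request import urlopen

# Endpoint as it appears on the slide; this API version may have been retired.
BASE_URL = "http://api.yr.no/weatherapi/locationforecast/1.9/"


def forecast_url(lat: float, lon: float) -> str:
    """Build the request URL; the slide separates parameters with ';'."""
    return f"{BASE_URL}?lat={lat:.2f};lon={lon:.2f}"


def fetch_forecast(lat: float, lon: float, timeout: float = 10.0) -> bytes:
    """Perform the HTTP GET and return the raw response body."""
    with urlopen(forecast_url(lat, lon), timeout=timeout) as response:
        return response.read()


if __name__ == "__main__":
    print(forecast_url(60.10, 9.58))
```

Keeping URL construction separate from the network call makes the request easy to inspect and test without contacting the provider.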
Open Data Protocol (OData) 
•“ODBC for the Web” 
•A protocol for creating and consuming data APIs 
•Builds on HTTP and REST 
•OASIS Standard (2014), promoted by Microsoft, IBM, and SAP 
http://www.odata.org/
OData 
•Principles: Metadata, Data, Querying, Editing, Operations, Vocabularies 
•The OData data model – based on the Entity Data Model (EDM)
•The OData protocol: CRUD + query language 
•XML and JSON serialization 
Source: Microsoft 
http://msdn.microsoft.com/en-us/data/hh237663.aspx
OData - requesting data examples
Source: http://www.odata.org/getting-started/basic-tutorial/
Request (entity by ID):
GET serviceRoot/People('russellwhyte')
Response payload: [shown as an image on the original slide]
Request (collections):
GET serviceRoot/People
Request (individual property):
GET serviceRoot/Airports('KSFO')/Name
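The three request shapes above follow one pattern: an entity set, optionally a key in parentheses, optionally a property name. A small sketch of that pattern follows; the helper names are hypothetical, and percent-encoding of the final URL is deliberately left out.

```python
from typing import Optional


def quote_key(key: str) -> str:
    """Format a string key as an OData literal: single-quoted, with any
    embedded single quote doubled."""
    return "'" + key.replace("'", "''") + "'"


def entity_path(service_root: str, entity_set: str,
                key: Optional[str] = None, prop: Optional[str] = None) -> str:
    """Build the resource path for a collection, an entity, or a property."""
    path = f"{service_root}/{entity_set}"
    if key is not None:
        path += f"({quote_key(key)})"
    if prop is not None:
        path += f"/{prop}"
    return path


if __name__ == "__main__":
    print(entity_path("serviceRoot", "People"))
    print(entity_path("serviceRoot", "People", "russellwhyte"))
    print(entity_path("serviceRoot", "Airports", "KSFO", "Name"))
```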
OData - querying data examples
Source: http://www.odata.org/getting-started/basic-tutorial/
Request (filter):
GET serviceRoot/People?$filter=FirstName eq 'Scott'
Response payload: [shown as an image on the original slide]
Filter on complex type:
GET serviceRoot/Airports?$filter=contains(Location/Address, 'San Francisco')
orderby:
GET serviceRoot/People('scottketchum')/Trips?$orderby=EndsAt desc
top:
GET serviceRoot/People?$top=2
count:
GET serviceRoot/People/$count
expand:
GET serviceRoot/People('keithpinckney')?$expand=Friends
select:
GET serviceRoot/Airports?$select=Name, IcaoCode
search:
GET serviceRoot/People?$search=Boise
Lambda operators: any / all
GET serviceRoot/People?$filter=Emails/any(s:endswith(s, 'contoso.com'))
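The query examples are written with literal spaces and quotes for readability; on the wire those characters are percent-encoded. The sketch below shows one way to assemble such a URL with the standard library. The function name is illustrative, and keeping "$" unescaped is a readability choice rather than a requirement stated in the slides.

```python
from urllib.parse import quote, urlencode


def odata_query_url(resource: str, options: dict) -> str:
    """Append OData system query options ($filter, $top, ...) to a resource
    path, percent-encoding spaces and quotes but leaving '$' readable."""
    query = urlencode(options, safe="$", quote_via=quote)
    return f"{resource}?{query}"


if __name__ == "__main__":
    print(odata_query_url("serviceRoot/People",
                          {"$filter": "FirstName eq 'Scott'", "$top": "2"}))
```

For the filter example this yields `serviceRoot/People?$filter=FirstName%20eq%20%27Scott%27&$top=2`.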
OData - data modification example
Source: http://www.odata.org/getting-started/basic-tutorial/
Request (create an entity):
POST serviceRoot/People
OData-Version: 4.0
Content-Type: application/json;odata.metadata=minimal
Accept: application/json

{
  "@odata.type": "Microsoft.OData.SampleService.Models.TripPin.Person",
  "UserName": "teresa",
  "FirstName": "Teresa",
  "LastName": "Gilbert",
  "Gender": "Female",
  "Emails": ["teresa@example.com", "teresa@contoso.com"],
  "AddressInfo": [
    {
      "Address": "1 Suffolk Ln.",
      "City": { "CountryRegion": "United States", "Name": "Boise", "Region": "ID" }
    }
  ]
}

Response payload: [shown as an image on the original slide]
Remove an entity:
DELETE serviceRoot/People('vincentcalabrese')
Update an entity (uses PATCH or PUT)
Relationship operations (link to related entities):
POST serviceRoot/People('scottketchum')/Friends/$ref…
{ "@odata.id": "serviceRoot/People('vincentcalabrese')" }
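As a rough illustration of the create request, the sketch below prepares (but does not send) the POST with the three headers from the slide. `SERVICE_ROOT` is a hypothetical placeholder for the `serviceRoot` used in the tutorial, and the function name is illustrative.

```python
import json
from urllib.request import Request

# Hypothetical service root standing in for "serviceRoot" in the slides.
SERVICE_ROOT = "https://example.org/odata"


def build_create_person_request(person: dict) -> Request:
    """Prepare an OData 'create entity' POST without sending it."""
    headers = {
        "OData-Version": "4.0",
        "Content-Type": "application/json;odata.metadata=minimal",
        "Accept": "application/json",
    }
    body = json.dumps(person).encode("utf-8")
    return Request(f"{SERVICE_ROOT}/People", data=body,
                   headers=headers, method="POST")


if __name__ == "__main__":
    request = build_create_person_request(
        {"UserName": "teresa", "FirstName": "Teresa", "LastName": "Gilbert"})
    print(request.get_method(), request.full_url)
```

Passing the prepared request to `urllib.request.urlopen` would send it; that step is omitted because the target service here is fictitious.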
SPARQL
•A set of specifications that provide languages and protocols to query and manipulate RDF graph content on the Web or in an RDF store

Service Description
Request:
GET /sparql/
Host: www.example.org
Response: An RDF description, using the Service Description vocabulary

Protocol for RDF
Request:
GET /sparql/?query=[SPARQL Query]
Host: www.example.org
Response: A SPARQL Results Document or RDF graph

Update Language
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
INSERT DATA { <http://www.example.org/alice#me> foaf:knows [ foaf:name "Dorothy" ] . } ;
DELETE { ?person foaf:name ?mbox }
WHERE { <http://www.example.org/alice#me> foaf:knows ?person .
        ?person foaf:name ?name FILTER ( lang(?name) = "EN" ) . }

Query Language
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name (COUNT(?friend) AS ?count)
WHERE {
  ?person foaf:name ?name .
  ?person foaf:knows ?friend .
} GROUP BY ?person ?name
Result (serialized in XML, JSON, CSV, TSV): [shown as an image on the original slide]

Graph Store HTTP Protocol
POST /rdf-graphs/service?graph=http%3A%2F%2Fwww.example.org%2Falice
Host: example.org
Content-Type: text/turtle

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
<http://www.example.org/alice#me> foaf:knows [ foaf:name "Dorothy" ] .

Examples taken from http://www.w3.org/TR/sparql11-overview/
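The `[SPARQL Query]` placeholder in the protocol request stands for the URL-encoded query text. A minimal sketch, assuming the example endpoint `http://www.example.org/sparql/` from the slide and using the query-language example as input; the function name is illustrative.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

ENDPOINT = "http://www.example.org/sparql/"  # example host used on the slide

QUERY = """PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name (COUNT(?friend) AS ?count)
WHERE {
  ?person foaf:name ?name .
  ?person foaf:knows ?friend .
} GROUP BY ?person ?name"""


def sparql_get_url(endpoint: str, query: str) -> str:
    """Encode the query text into the 'query' parameter of a GET request."""
    return f"{endpoint}?{urlencode({'query': query})}"


if __name__ == "__main__":
    url = sparql_get_url(ENDPOINT, QUERY)
    print(url)
    # Decoding the URL again recovers the original query text.
    print(parse_qs(urlsplit(url).query)["query"][0] == QUERY)
```

A client would typically also send an `Accept` header to pick one of the result serializations; that part is left out of this sketch.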
Linked Data Platform
•Describes the use of HTTP for accessing, updating, creating and deleting resources from servers that expose data as Linked Data
•Centered around LDPRs (Linked Data Platform Resources), LDPCs (Linked Data Platform Containers), membership and containment
•Under development at W3C (Working Draft at the time of the talk)
http://www.w3.org/TR/ldp/

LDP-BC (Basic Container)
Request: GET /c1/
Response payload: [shown as an image on the original slide]

Resource
Request: GET /netWorth/nw1
Response payload: [shown as an image on the original slide]

LDP-DC (Direct Container)
Request: GET /netWorth/nw1/liabilities/
Response payload: [shown as an image on the original slide]

Examples taken from http://www.w3.org/TR/ldp/
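Reading an LDP resource or container is a plain HTTP GET, usually with an `Accept` header naming an RDF serialization. The sketch below prepares such requests for the paths used in the examples; `LDP_SERVER` is a hypothetical host, the function name is illustrative, and Turtle is chosen here only as a commonly used format.

```python
from urllib.request import Request

# Hypothetical host; the slides only show request paths.
LDP_SERVER = "http://example.org"


def ldp_get(path: str, media_type: str = "text/turtle") -> Request:
    """Prepare (without sending) a GET for an LDP resource or container."""
    return Request(LDP_SERVER + path,
                   headers={"Accept": media_type}, method="GET")


if __name__ == "__main__":
    for path in ("/c1/", "/netWorth/nw1", "/netWorth/nw1/liabilities/"):
        request = ldp_get(path)
        print(request.get_method(), request.full_url,
              request.get_header("Accept"))
```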
Data Access Summary
•Web APIs
–Very flexible, popular with Web developers, no specific commitment to data models
•OData
–ER-based data model, abstract interface to datastores (focus on CRUD), perceived as vendor-pushed (strong tool support)
•SPARQL and LDP
–Graph data model, community-pushed, some interesting features (querying, federation, linking, …)
•Though the approaches overlap, they all aim to simplify access to distributed data sources for application developers
–Which approach to choose depends on many factors, e.g. type of data, size, relationships, infrastructure, skills to support, frequency of updates, end-use scenarios, …
Data publication 
•Data access mechanisms simplify data consumption for application developers 
•But data needs to be provisioned to applications according to the chosen data access mechanism 
–And applications will always be dependent on the hosting for the data they use 
•Data publishers and application developers need to rely on generic Cloud platforms and build, deploy and maintain a complex Open Data software and data stack from scratch 
–Complicated data provisioning and maintenance process 
–Data-as-a-Service (DaaS) solutions are emerging to address this issue 
"Like all members of the "as a Service" (XaaS) family, DaaS is based on the concept that the product, data in this case, can be provided on demand to the user regardless of geographic or organizational separation of provider and consumer."
Source: Wikipedia; https://en.wikipedia.org/wiki/DaaS
Relevant DaaS solutions
Windows Azure Marketplace 
Socrata 
DataMarket 
Factual 
Junar 
PublishMyData 
DaPaaS 
…
Windows Azure Marketplace
•A marketplace for applications and data (~170 datasets; ~700 applications)
•Data consumers are charged
•Tools and APIs for data publishing, analytics, metadata management, account management and pricing, monitoring and billing, as well as a data portal for dataset exploration
•Supports OData
https://datamarket.azure.com/
Source: Microsoft http://go.microsoft.com/fwlink/?LinkID=201129&clcid=0x409
Socrata
•Specific focus on Open Data
•Open Data Portal: data publishing & clean-up, metadata generation, data-driven portals for data exploration and portal management
•API Foundry for creating and deploying RESTful APIs on top of the data
•Hosted data is accessible through the Socrata Open Data API (SODA) – a RESTful interface for searching and reading data in XML, JSON or RDF
http://www.socrata.com/
Source: Socrata
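To give a feel for a SODA read request, the sketch below assembles a URL following the commonly documented `/resource/<dataset id>.<format>` pattern with `$`-prefixed query options. The domain and dataset identifier are hypothetical placeholders, the function name is illustrative, and the exact options supported should be checked against Socrata's own documentation.

```python
from urllib.parse import quote, urlencode


def soda_url(domain: str, dataset_id: str, options: dict,
             fmt: str = "json") -> str:
    """Build a SODA read URL for one dataset, keeping '$' readable in the
    option names and percent-encoding the option values."""
    query = urlencode(options, safe="$", quote_via=quote)
    return f"https://{domain}/resource/{dataset_id}.{fmt}?{query}"


if __name__ == "__main__":
    # 'data.example.org' and 'abcd-1234' are made-up placeholders.
    print(soda_url("data.example.org", "abcd-1234",
                   {"$where": "year = 2014", "$limit": "5"}))
```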
DataMarket 
•Provides statistical data from almost 100 data providers 
•~ 71 000 datasets 
•Supports embeddable visualisations of data, data export, live feeds for data updates, monetization of data by publishers via the marketplace, custom data-driven portals for publishers, a data portal and a Web API
http://datamarket.com/
Factual
•Data for ~65 million local businesses and points of interest in 50 countries; a product database of over 650,000 products
•Previously offered hosting for thousands of 3rd-party datasets ("Community Data"), but this activity has been discontinued
•Data is populated by means of Web crawls, data extraction and 3rd-party data services; the data model is tabular, based on a taxonomy of around 400 categories
•Pricing is based on a pay-per-use model
•Data access is provided through a RESTful API
•Provides a set of tools for data management
http://www.factual.com/
Junar 
•Cloud-based Open Data platform to collect, enrich, publish and analyse open data 
•Data can be consumed either directly via the Junar API, or via various visual widgets
http://www.junar.com/
PublishMyData 
•Hosted, as-a-service solution for Open and Linked Data publishing 
•Uses DCAT and provides data access via Web APIs, a SPARQL endpoint and raw data-dumps 
http://www.swirrl.com/publishmydata
Other relevant solutions 
•Comprehensive Knowledge Archive Network (CKAN) (http://ckan.org/) – web-based open source data management system for the storage and distribution of open data; datahub (http://datahub.io/)
•LOD2 (http://lod2.eu/) – research project aimed at providing an open source, integrated software stack for managing the lifecycle of Linked Data, from data extraction, enrichment and interlinking to maintenance; not meant to be an as-a-service solution
•Project Open Data (http://project-open-data.github.io/) – a set of open source tools, methodologies and use cases for publishing and utilising Open Data
•COMSODE (http://www.comsode.eu/) – research project aiming to create a publication platform for Open Data called Open Data Node
DaPaaS – towards an Open Data- and Platform-as-a-Service for Open Data
•DaPaaS – research project for simplifying data publication and consumption via a Data- and Platform-as-a-Service approach
http://dapaas.eu
•Roles around the DaPaaS Platform:
–Data Publisher: publishes open data
–Application Developer: develops and deploys applications on top of published data
–End-User Data Consumer: consumes data resulting from the available applications
DaPaaS – Requirements for Data Publisher
•DP-01: Dataset import
•DP-02: Data storage and querying
•DP-03: Dataset search & exploration
•DP-04: Data interlinking
•DP-05: Data cleaning & transformation
•DP-06: Dataset bookmarking & notifications
•DP-07: Dataset metadata management, statistics & access policies
•DP-08: Data scalability
•DP-09: Data availability
•DP-10: User registration & profile management
•DP-11: Secure access to platform
•DP-12: UI for data publisher
•DP-13: Data publishing methodology support
DaPaaS – Requirements for Application Developer
•AD-01: Access to Data Publisher services (DP-01 – DP-13)
•AD-02: Data export
•AD-03: Develop applications in state-of-the-art programming languages
•AD-04: Configure application deployment
•AD-05: Deploy and monitor application
•AD-06: Application metadata management, statistics & access policies
•AD-07: UI for application developer
•AD-08: Application development methodology support
DaPaaS – Requirements for End-User Data Consumer
•EU-01: User registration & profile management
•EU-02: Search & explore datasets and applications
•EU-03: Datasets and applications bookmarking and notifications
•EU-04: Mobile and desktop GUI access
•EU-05: Data export and download
•EU-07: High availability of data and applications
DaPaaS Platform Abstract High-Level Architecture
[Figure: layered architecture diagram. Layers: Data Layer, Platform Layer, UX Layer. Labelled components: Open Data Warehouse, Datasets, DaaS Services, Application Hosting Environment, Data-Driven Applications, PaaS Services, UX Services, Usage Monitoring, Security & Access Control, Tool-supported Methodology for Data Publishing/Consumption.]
Summary 
•Lots of open datasets, but few applications using them 
•Simplifying data publication/consumption can enable an increase in the number (and quality) of applications using open data 
•Various approaches emerging 
–For data access: Web APIs, OData, SPARQL/LDP 
–For data publication/provisioning: DaaS solutions
Thank you! 
Contact: dumitru.roman@sintef.no

Más contenido relacionado

La actualidad más candente

WWW2014 Tutorial: Online Learning & Linked Data - Lessons Learned
WWW2014 Tutorial: Online Learning & Linked Data - Lessons LearnedWWW2014 Tutorial: Online Learning & Linked Data - Lessons Learned
WWW2014 Tutorial: Online Learning & Linked Data - Lessons Learned
Stefan Dietze
 
Linked Data (1st Linked Data Meetup Malmö)
Linked Data (1st Linked Data Meetup Malmö)Linked Data (1st Linked Data Meetup Malmö)
Linked Data (1st Linked Data Meetup Malmö)
Anja Jentzsch
 
Semantic Technologies and Triplestores for Business Intelligence
Semantic Technologies and Triplestores for Business IntelligenceSemantic Technologies and Triplestores for Business Intelligence
Semantic Technologies and Triplestores for Business Intelligence
Marin Dimitrov
 
NISO/DCMI September 25 Webinar: Implementing Linked Data in Developing Countr...
NISO/DCMI September 25 Webinar: Implementing Linked Data in Developing Countr...NISO/DCMI September 25 Webinar: Implementing Linked Data in Developing Countr...
NISO/DCMI September 25 Webinar: Implementing Linked Data in Developing Countr...
National Information Standards Organization (NISO)
 

La actualidad más candente (20)

Big Linked Data - Creating Training Curricula
Big Linked Data - Creating Training CurriculaBig Linked Data - Creating Training Curricula
Big Linked Data - Creating Training Curricula
 
Providing Linked Data
Providing Linked DataProviding Linked Data
Providing Linked Data
 
An introduction to Linked (Open) Data
An introduction to Linked (Open) DataAn introduction to Linked (Open) Data
An introduction to Linked (Open) Data
 
Building Linked Data Applications
Building Linked Data ApplicationsBuilding Linked Data Applications
Building Linked Data Applications
 
The Power of Semantic Technologies to Explore Linked Open Data
The Power of Semantic Technologies to Explore Linked Open DataThe Power of Semantic Technologies to Explore Linked Open Data
The Power of Semantic Technologies to Explore Linked Open Data
 
Nicoletta Fornara and Fabio Marfia | Modeling and Enforcing Access Control Ob...
Nicoletta Fornara and Fabio Marfia | Modeling and Enforcing Access Control Ob...Nicoletta Fornara and Fabio Marfia | Modeling and Enforcing Access Control Ob...
Nicoletta Fornara and Fabio Marfia | Modeling and Enforcing Access Control Ob...
 
First Steps in Semantic Data Modelling and Search & Analytics in the Cloud
First Steps in Semantic Data Modelling and Search & Analytics in the CloudFirst Steps in Semantic Data Modelling and Search & Analytics in the Cloud
First Steps in Semantic Data Modelling and Search & Analytics in the Cloud
 
WWW2014 Tutorial: Online Learning & Linked Data - Lessons Learned
WWW2014 Tutorial: Online Learning & Linked Data - Lessons LearnedWWW2014 Tutorial: Online Learning & Linked Data - Lessons Learned
WWW2014 Tutorial: Online Learning & Linked Data - Lessons Learned
 
Querying Linked Data
Querying Linked DataQuerying Linked Data
Querying Linked Data
 
Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data
Introduction to the Data Web, DBpedia and the Life-cycle of Linked DataIntroduction to the Data Web, DBpedia and the Life-cycle of Linked Data
Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data
 
Documents, services, and data on the web
Documents, services, and data on the webDocuments, services, and data on the web
Documents, services, and data on the web
 
Linked Data (1st Linked Data Meetup Malmö)
Linked Data (1st Linked Data Meetup Malmö)Linked Data (1st Linked Data Meetup Malmö)
Linked Data (1st Linked Data Meetup Malmö)
 
Semantic Technologies and Triplestores for Business Intelligence
Semantic Technologies and Triplestores for Business IntelligenceSemantic Technologies and Triplestores for Business Intelligence
Semantic Technologies and Triplestores for Business Intelligence
 
Crossref XML and tools for small publishers (EASE Conference 2018)
Crossref XML and tools for small publishers (EASE Conference 2018)Crossref XML and tools for small publishers (EASE Conference 2018)
Crossref XML and tools for small publishers (EASE Conference 2018)
 
A possible future role of schema.org for business reporting
A possible future role of schema.org for business reportingA possible future role of schema.org for business reporting
A possible future role of schema.org for business reporting
 
Standardizing for Open Data
Standardizing for Open DataStandardizing for Open Data
Standardizing for Open Data
 
Linked data MLA 2015
Linked data MLA 2015Linked data MLA 2015
Linked data MLA 2015
 
Linked Data MLA 2015
Linked Data MLA 2015Linked Data MLA 2015
Linked Data MLA 2015
 
April 24, 2013 NISO/DCMI Webinar: Deployment of RDA (Resource Description and...
April 24, 2013 NISO/DCMI Webinar: Deployment of RDA (Resource Description and...April 24, 2013 NISO/DCMI Webinar: Deployment of RDA (Resource Description and...
April 24, 2013 NISO/DCMI Webinar: Deployment of RDA (Resource Description and...
 
NISO/DCMI September 25 Webinar: Implementing Linked Data in Developing Countr...
NISO/DCMI September 25 Webinar: Implementing Linked Data in Developing Countr...NISO/DCMI September 25 Webinar: Implementing Linked Data in Developing Countr...
NISO/DCMI September 25 Webinar: Implementing Linked Data in Developing Countr...
 

Similar a Wed roman tut_open_datapub

The Web Data Commons Microdata, RDFa, and Microformat Dataset Series @ ISWC2014
The Web Data Commons Microdata, RDFa, and Microformat Dataset Series @ ISWC2014The Web Data Commons Microdata, RDFa, and Microformat Dataset Series @ ISWC2014
The Web Data Commons Microdata, RDFa, and Microformat Dataset Series @ ISWC2014
Robert Meusel
 
Dublinked tech workshop_15_dec2011
Dublinked tech workshop_15_dec2011Dublinked tech workshop_15_dec2011
Dublinked tech workshop_15_dec2011
Dublinked .
 

Similar a Wed roman tut_open_datapub (20)

The Web Data Commons Microdata, RDFa, and Microformat Dataset Series @ ISWC2014
The Web Data Commons Microdata, RDFa, and Microformat Dataset Series @ ISWC2014The Web Data Commons Microdata, RDFa, and Microformat Dataset Series @ ISWC2014
The Web Data Commons Microdata, RDFa, and Microformat Dataset Series @ ISWC2014
 
A BASILar Approach for Building Web APIs on top of SPARQL Endpoints
A BASILar Approach for Building Web APIs on top of SPARQL EndpointsA BASILar Approach for Building Web APIs on top of SPARQL Endpoints
A BASILar Approach for Building Web APIs on top of SPARQL Endpoints
 
Scaling up Linked Data
Scaling up Linked DataScaling up Linked Data
Scaling up Linked Data
 
The Web of data and web data commons
The Web of data and web data commonsThe Web of data and web data commons
The Web of data and web data commons
 
Information Intermediaries
Information IntermediariesInformation Intermediaries
Information Intermediaries
 
Linked Data for the Masses: The approach and the Software
Linked Data for the Masses: The approach and the SoftwareLinked Data for the Masses: The approach and the Software
Linked Data for the Masses: The approach and the Software
 
The Best of Both Worlds: Unlocking the Power of (big) Knowledge Graphs with S...
The Best of Both Worlds: Unlocking the Power of (big) Knowledge Graphs with S...The Best of Both Worlds: Unlocking the Power of (big) Knowledge Graphs with S...
The Best of Both Worlds: Unlocking the Power of (big) Knowledge Graphs with S...
 
Enabling Low-cost Open Data Publishing and Reuse
Enabling Low-cost Open Data Publishing and ReuseEnabling Low-cost Open Data Publishing and Reuse
Enabling Low-cost Open Data Publishing and Reuse
 
Linked Energy Data Generation
Linked Energy Data GenerationLinked Energy Data Generation
Linked Energy Data Generation
 
Dublinked tech workshop_15_dec2011
Dublinked tech workshop_15_dec2011Dublinked tech workshop_15_dec2011
Dublinked tech workshop_15_dec2011
 
Data Integration And Visualization
Data Integration And VisualizationData Integration And Visualization
Data Integration And Visualization
 
Why I don't use Semantic Web technologies anymore, event if they still influe...
Why I don't use Semantic Web technologies anymore, event if they still influe...Why I don't use Semantic Web technologies anymore, event if they still influe...
Why I don't use Semantic Web technologies anymore, event if they still influe...
 
Industry@RuleML2015 DataGraft
Industry@RuleML2015 DataGraftIndustry@RuleML2015 DataGraft
Industry@RuleML2015 DataGraft
 
Danbri Drupalcon Export
Danbri Drupalcon ExportDanbri Drupalcon Export
Danbri Drupalcon Export
 
Linked Data and Semantic Web Application Development by Peter Haase
Linked Data and Semantic Web Application Development by Peter HaaseLinked Data and Semantic Web Application Development by Peter Haase
Linked Data and Semantic Web Application Development by Peter Haase
 
The Value of the Modern Data Architecture with Apache Hadoop and Teradata
The Value of the Modern Data Architecture with Apache Hadoop and Teradata The Value of the Modern Data Architecture with Apache Hadoop and Teradata
The Value of the Modern Data Architecture with Apache Hadoop and Teradata
 
PhD Defense
PhD DefensePhD Defense
PhD Defense
 
WWW2014 Overview of W3C Linked Data Platform 20140410
WWW2014 Overview of W3C Linked Data Platform 20140410WWW2014 Overview of W3C Linked Data Platform 20140410
WWW2014 Overview of W3C Linked Data Platform 20140410
 
Llinked open data training for EU institutions
Llinked open data training for EU institutionsLlinked open data training for EU institutions
Llinked open data training for EU institutions
 
OpenAIRE and the case of Irish Repositories, by Jochen Schirrwagen (RIAN Work...
OpenAIRE and the case of Irish Repositories, by Jochen Schirrwagen (RIAN Work...OpenAIRE and the case of Irish Repositories, by Jochen Schirrwagen (RIAN Work...
OpenAIRE and the case of Irish Repositories, by Jochen Schirrwagen (RIAN Work...
 

Más de eswcsummerschool

Hands On: Amazon Mechanical Turk - M. Acosta - ESWC SS 2014
Hands On: Amazon Mechanical Turk - M. Acosta - ESWC SS 2014 Hands On: Amazon Mechanical Turk - M. Acosta - ESWC SS 2014
Hands On: Amazon Mechanical Turk - M. Acosta - ESWC SS 2014
eswcsummerschool
 
Mon norton tut_publishing01
Mon norton tut_publishing01Mon norton tut_publishing01
Mon norton tut_publishing01
eswcsummerschool
 
Mon domingue introduction to the school
Mon domingue introduction to the schoolMon domingue introduction to the school
Mon domingue introduction to the school
eswcsummerschool
 
Mon norton tut_querying cultural heritage data
Mon norton tut_querying cultural heritage dataMon norton tut_querying cultural heritage data
Mon norton tut_querying cultural heritage data
eswcsummerschool
 
Tue acosta hands_on_providinglinkeddata
Tue acosta hands_on_providinglinkeddataTue acosta hands_on_providinglinkeddata
Tue acosta hands_on_providinglinkeddata
eswcsummerschool
 
Thu bernstein key_warp_speed
Thu bernstein key_warp_speedThu bernstein key_warp_speed
Thu bernstein key_warp_speed
eswcsummerschool
 
Fri schreiber key_knowledge engineering
Fri schreiber key_knowledge engineeringFri schreiber key_knowledge engineering
Fri schreiber key_knowledge engineering
eswcsummerschool
 
Mon norton tut_queryinglinkeddata02
Mon norton tut_queryinglinkeddata02Mon norton tut_queryinglinkeddata02
Mon norton tut_queryinglinkeddata02
eswcsummerschool
 
Mon fundulaki tut_querying linked data
Mon fundulaki tut_querying linked dataMon fundulaki tut_querying linked data
Mon fundulaki tut_querying linked data
eswcsummerschool
 

Más de eswcsummerschool (20)

Semantic Aquarium - ESWC SSchool 14 - Student project
Semantic Aquarium - ESWC SSchool 14 - Student projectSemantic Aquarium - ESWC SSchool 14 - Student project
Semantic Aquarium - ESWC SSchool 14 - Student project
 
Syrtaki - ESWC SSchool 14 - Student project
Syrtaki  - ESWC SSchool 14 - Student projectSyrtaki  - ESWC SSchool 14 - Student project
Syrtaki - ESWC SSchool 14 - Student project
 
Keep fit (a bit) - ESWC SSchool 14 - Student project
Keep fit (a bit)  - ESWC SSchool 14 - Student projectKeep fit (a bit)  - ESWC SSchool 14 - Student project
Keep fit (a bit) - ESWC SSchool 14 - Student project
 
Arabic Sentiment Lexicon - ESWC SSchool 14 - Student project
Arabic Sentiment Lexicon - ESWC SSchool 14 - Student projectArabic Sentiment Lexicon - ESWC SSchool 14 - Student project
Arabic Sentiment Lexicon - ESWC SSchool 14 - Student project
 
FIT-8BIT An activity music assistant - ESWC SSchool 14 - Student project
FIT-8BIT An activity music assistant - ESWC SSchool 14 - Student projectFIT-8BIT An activity music assistant - ESWC SSchool 14 - Student project
FIT-8BIT An activity music assistant - ESWC SSchool 14 - Student project
 
Personal Tours at the British Museum - ESWC SSchool 14 - Student project
Personal Tours at the British Museum  - ESWC SSchool 14 - Student projectPersonal Tours at the British Museum  - ESWC SSchool 14 - Student project
Personal Tours at the British Museum - ESWC SSchool 14 - Student project
 
Exhibition recommendation using British Museum data and Event Registry - ESWC...
Exhibition recommendation using British Museum data and Event Registry - ESWC...Exhibition recommendation using British Museum data and Event Registry - ESWC...
Exhibition recommendation using British Museum data and Event Registry - ESWC...
 
Empowering fishing business using Linked Data - ESWC SSchool 14 - Student pro...
Empowering fishing business using Linked Data - ESWC SSchool 14 - Student pro...Empowering fishing business using Linked Data - ESWC SSchool 14 - Student pro...
Empowering fishing business using Linked Data - ESWC SSchool 14 - Student pro...
 
Tutorial: Social Semantic Web and Crowdsourcing - E. Simperl - ESWC SS 2014
Tutorial: Social Semantic Web and Crowdsourcing - E. Simperl - ESWC SS 2014 Tutorial: Social Semantic Web and Crowdsourcing - E. Simperl - ESWC SS 2014
Tutorial: Social Semantic Web and Crowdsourcing - E. Simperl - ESWC SS 2014
 
Keynote: Global Media Monitoring - M. Grobelnik - ESWC SS 2014
Keynote: Global Media Monitoring - M. Grobelnik - ESWC SS 2014Keynote: Global Media Monitoring - M. Grobelnik - ESWC SS 2014
Keynote: Global Media Monitoring - M. Grobelnik - ESWC SS 2014
 
Hands On: Amazon Mechanical Turk - M. Acosta - ESWC SS 2014
Hands On: Amazon Mechanical Turk - M. Acosta - ESWC SS 2014 Hands On: Amazon Mechanical Turk - M. Acosta - ESWC SS 2014
Hands On: Amazon Mechanical Turk - M. Acosta - ESWC SS 2014
 
Tutorial: Querying a Marine Data Warehouse Using SPARQL - I. Fundulaki - ESWC...
Tutorial: Querying a Marine Data Warehouse Using SPARQL - I. Fundulaki - ESWC...Tutorial: Querying a Marine Data Warehouse Using SPARQL - I. Fundulaki - ESWC...
Tutorial: Querying a Marine Data Warehouse Using SPARQL - I. Fundulaki - ESWC...
 
Mon norton tut_publishing01
Mon norton tut_publishing01Mon norton tut_publishing01
Mon norton tut_publishing01
 
Mon domingue introduction to the school
Mon domingue introduction to the schoolMon domingue introduction to the school
Mon domingue introduction to the school
 
Mon norton tut_querying cultural heritage data
Mon norton tut_querying cultural heritage dataMon norton tut_querying cultural heritage data
Mon norton tut_querying cultural heritage data
 
Tue acosta hands_on_providinglinkeddata
Tue acosta hands_on_providinglinkeddataTue acosta hands_on_providinglinkeddata
Tue acosta hands_on_providinglinkeddata
 
Thu bernstein key_warp_speed
Thu bernstein key_warp_speedThu bernstein key_warp_speed
Thu bernstein key_warp_speed
 
Fri schreiber key_knowledge engineering
Fri schreiber key_knowledge engineeringFri schreiber key_knowledge engineering
Fri schreiber key_knowledge engineering
 
Mon norton tut_queryinglinkeddata02
Mon norton tut_queryinglinkeddata02Mon norton tut_queryinglinkeddata02
Mon norton tut_queryinglinkeddata02
 
Mon fundulaki tut_querying linked data
Mon fundulaki tut_querying linked dataMon fundulaki tut_querying linked data
Mon fundulaki tut_querying linked data
 

Wed roman tut_open_datapub

  • 1. Open Data Publication and ConsumptionAn Overview of Relevant Data Access Approaches and DaaSSolutions@ESWC Summer School, 2014 DumitruRoman, SINTEF, Norway dumitru.roman@sintef.no
  • 2. Outline •The context: Open Data •Data access: Web APIs, OData, SPARQL/LDP •DaaSsolutions landscape and open DaaSarchitecture 2
  • 3. Outline •The context: Open Data •Data access: Web APIs, OData, SPARQL/LDP •DaaSsolutions landscape and open DaaSarchitecture 3
  • 4. The context: Open Data • Open Data Movement: make data available (primarily government data) – Businesses and citizens can develop new ideas, services and applications – Can support (government) transparency and accountability. Source: McKinsey http://www.mckinsey.com/insights/business_technology/open_data_unlocking_innovation_and_performance_with_liquid_information Gartner: By 2016, the use of "open data" will continue to increase, but slowly, and predominantly limited to Type A enterprises. By 2017, over 60% of government open data programs that do not effectively use open data internally will be scaled back or discontinued. By 2020, enterprises and governments will fail to protect 75% of sensitive data and will declassify and grant broad/public access to it. Source: Gartner http://training.gsn.gov.tw/uploads/news/6.Gartner+ExP+Briefing_Open+Data_JUN+2014_v2.pdf
  • 5. Lots of open datasets on the Web… • A large number of datasets have been published as open data in recent years • Many kinds of data: cultural, science, finance, statistics, transport, environment, … • Popular formats: tabular (e.g. CSV, XLS), HTML, XML, JSON, …
  • 6. …but few applications • Applications utilizing open and distributed datasets have been rather few, as the portal figures below suggest • Challenges include: – Lack of resources: unreliable data access – Lack of expertise: not easily available to organisations – Technical/organizational
      Open Data Portal | Datasets  | Applications
      data.gov         | ~110 000  | ~350
      publicdata.eu    | ~50 000   | ~80
      data.gov.uk      | ~20 000   | ~350
      data.norge.no    | ~300      | ~40
  • 7. Open data publication and access • Data publishers: complicated data publishing and maintenance process • Data consumers/developers: complicated programmatic data access • A decision that lifts a publication burden from the data publisher tends to shift that burden onto the data consumer, who then faces more complicated data access [Diagram labels: easy data publication, easy data access, complicated data access, complicated data publication; goals: simplify data publication, simplify data access]
  • 8. Outline • The context: Open Data • Data access: Web APIs, OData, SPARQL/LDP • DaaS solutions landscape and open DaaS architecture
  • 9. (Programmatic/Web-based) Data access • Traditional approaches for programmatically consuming data: ODBC, JDBC, RMI, CORBA, ... • Modern Web applications and data services rely extensively on lightweight Web-service-based approaches that exchange data via standard protocols (HTTP) and formats (e.g. XML, JSON, RDF, …) • Relevant approaches for programmatic access to open data – Web APIs – OData – SPARQL and Linked Data Platform (LDP)
  • 10. Web APIs • Programmatic interfaces accessible through HTTP calls (e.g. GET, POST) • Data (requests/responses) typically in JSON or XML • Very popular among application developers. Source: http://www.programmableweb.com/ [Diagram labels: Data Consumer / Dev, App, Client Library, Web API, Web Service, Data Provider; protocol: HTTP; payload: JSON/XML/…]
  • 11. Web APIs - example. Request: GET http://api.yr.no/weatherapi/locationforecast/1.9/?lat=60.10;lon=9.58 Response payload: [not captured in the transcript] Documentation: http://api.yr.no/weatherapi/locationforecast/1.9/documentation
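The request on the slide above can be sketched in Python as follows. This is a minimal illustration, not the service's official client: the helper names `forecast_url` and `fetch` and the `User-Agent` string are assumptions made for the example, and the 1.9 endpoint shown on the slide may no longer be served.

```python
from urllib.request import Request, urlopen

# Base URL copied from the slide; the helper names below are hypothetical.
BASE_URL = "http://api.yr.no/weatherapi/locationforecast/1.9/"


def forecast_url(lat: float, lon: float) -> str:
    """Build the request URL in the slide's style (';' separates the parameters)."""
    return f"{BASE_URL}?lat={lat:.2f};lon={lon:.2f}"


def fetch(url: str, timeout: float = 10.0) -> bytes:
    """Issue the HTTP GET and return the raw response payload (XML or JSON)."""
    request = Request(url, headers={"User-Agent": "open-data-demo"})  # assumed UA string
    with urlopen(request, timeout=timeout) as response:
        return response.read()


# Only the URL is built here; call fetch(url) to actually contact the service.
url = forecast_url(60.10, 9.58)
print("GET", url)
```

Keeping URL construction separate from the network call makes the request easy to inspect or log before any data is fetched.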
  • 12. Open Data Protocol (OData) • "ODBC for the Web" • A protocol for creating and consuming data APIs • Builds on HTTP and REST • OASIS Standard (2014), promoted by Microsoft, IBM, and SAP. http://www.odata.org/
  • 13. OData • Principles: Metadata, Data, Querying, Editing, Operations, Vocabularies • The OData Data Model: based on the Entity Data Model (EDM) • The OData protocol: CRUD + query language • XML and JSON serialization. Source: Microsoft http://msdn.microsoft.com/en-us/data/hh237663.aspx
  • 14. OData - requesting data examples. Request (entity by ID): GET serviceRoot/People('russellwhyte') Response payload: [not captured in the transcript] Request (collections): GET serviceRoot/People Request (individual property): GET serviceRoot/Airports('KSFO')/Name Source: http://www.odata.org/getting-started/basic-tutorial/
  • 15. OData - querying data examples (Source: http://www.odata.org/getting-started/basic-tutorial/; response payloads not captured in the transcript)
      – filter: GET serviceRoot/People?$filter=FirstName eq 'Scott'
      – filter on complex type: GET serviceRoot/Airports?$filter=contains(Location/Address, 'San Francisco')
      – orderby: GET serviceRoot/People('scottketchum')/Trips?$orderby=EndsAt desc
      – top: GET serviceRoot/People?$top=2
      – count: GET serviceRoot/People/$count
      – expand: GET serviceRoot/People('keithpinckney')?$expand=Friends
      – select: GET serviceRoot/Airports?$select=Name, IcaoCode
      – search: GET serviceRoot/People?$search=Boise
      – lambda operators (any / all): GET serviceRoot/People?$filter=Emails/any(s:endswith(s, 'contoso.com'))
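As a rough sketch of how a client might assemble such query URLs, the helper below joins a service root, a resource path and `$`-prefixed system query options, percent-encoding the spaces that the slide shows literally. The function name `odata_url` and the `https://example.org/serviceRoot` root are assumptions for illustration; they are not part of any OData library.

```python
from urllib.parse import quote

# Characters left unescaped so the OData expressions stay readable.
SAFE = "'(),/:"


def odata_url(service_root: str, path: str, **options: str) -> str:
    """Return service_root/path?$name=value&... with percent-encoded option values."""
    url = f"{service_root.rstrip('/')}/{path}"
    if options:
        query = "&".join(
            f"${name}={quote(value, safe=SAFE)}" for name, value in options.items()
        )
        url = f"{url}?{query}"
    return url


ROOT = "https://example.org/serviceRoot"  # hypothetical service root

print(odata_url(ROOT, "People", filter="FirstName eq 'Scott'"))
print(odata_url(ROOT, "People('scottketchum')/Trips", orderby="EndsAt desc"))
print(odata_url(ROOT, "People", top="2"))
print(odata_url(ROOT, "People/$count"))
```

Options are passed as keyword arguments without the leading `$`, so `filter=...` becomes `$filter=...` in the resulting URL.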
  • 16. OData - data modification example (Source: http://www.odata.org/getting-started/basic-tutorial/)
      – Create an entity: POST serviceRoot/People
          OData-Version: 4.0
          Content-Type: application/json;odata.metadata=minimal
          Accept: application/json
          { "@odata.type": "Microsoft.OData.SampleService.Models.TripPin.Person", "UserName": "teresa", "FirstName": "Teresa", "LastName": "Gilbert", "Gender": "Female", "Emails": ["teresa@example.com", "teresa@contoso.com"], "AddressInfo": [ { "Address": "1 Suffolk Ln.", "City": { "CountryRegion": "United States", "Name": "Boise", "Region": "ID" } } ] }
        Response payload: [not captured in the transcript]
      – Remove an entity: DELETE serviceRoot/People('vincentcalabrese')
      – Update an entity: uses PATCH or PUT
      – Relationship operations (link to related entities): POST serviceRoot/People('scottketchum')/Friends/$ref … { "@odata.id": "serviceRoot/People('vincentcalabrese')" }
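The create request can be sketched with Python's standard library as shown below. The request is only constructed, not sent; the function name `create_person_request` and the `https://example.org/serviceRoot` root are hypothetical, while the headers and the payload fields are taken from the slide.

```python
import json
from urllib.request import Request


def create_person_request(service_root: str, person: dict) -> Request:
    """Build (but do not send) the POST request that creates a Person entity."""
    body = json.dumps(person).encode("utf-8")
    return Request(
        f"{service_root.rstrip('/')}/People",
        data=body,
        method="POST",
        headers={
            "OData-Version": "4.0",
            "Content-Type": "application/json;odata.metadata=minimal",
            "Accept": "application/json",
        },
    )


person = {
    "@odata.type": "Microsoft.OData.SampleService.Models.TripPin.Person",
    "UserName": "teresa",
    "FirstName": "Teresa",
    "LastName": "Gilbert",
    "Gender": "Female",
    "Emails": ["teresa@example.com", "teresa@contoso.com"],
}

# Hypothetical service root; pass the request to urllib.request.urlopen to send it.
request = create_person_request("https://example.org/serviceRoot", person)
print(request.get_method(), request.full_url)
```

A DELETE or PATCH request could be built the same way by changing `method` and the target URL, for example `People('vincentcalabrese')`.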
  • 17. SPARQL • A set of specifications that provide languages and protocols to query and manipulate RDF graph content on the Web or in an RDF store (examples taken from http://www.w3.org/TR/sparql11-overview/)
      – Service Description. Request: GET /sparql/ Host: www.example.org Response: an RDF description, using the Service Description vocabulary
      – Protocol for RDF. Request: GET /sparql/?query=[SPARQL Query] Host: www.example.org Response: a SPARQL Results Document or RDF graph
      – Query Language: PREFIX foaf: <http://xmlns.com/foaf/0.1/> SELECT ?name (COUNT(?friend) AS ?count) WHERE { ?person foaf:name ?name . ?person foaf:knows ?friend . } GROUP BY ?person ?name Result (serialized in XML, JSON, CSV, TSV): [not captured in the transcript]
      – Update Language: PREFIX foaf: <http://xmlns.com/foaf/0.1/> . INSERT DATA { <http://www.example.org/alice#me> foaf:knows [ foaf:name "Dorothy" ] . } ; DELETE { ?person foaf:name ?mbox } WHERE { <http://www.example.org/alice#me> foaf:knows ?person . ?person foaf:name ?name FILTER ( lang(?name) = "EN" ) . }
      – Graph Store HTTP Protocol: POST /rdf-graphs/service?graph=http%3A%2F%2Fwww.example.org%2Falice Host: example.org Content-Type: text/turtle @prefix foaf: <http://xmlns.com/foaf/0.1/> . <http://www.example.org/alice#me> foaf:knows [ foaf:name "Dorothy" ] .
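To make the "Protocol for RDF" request concrete, the sketch below URL-encodes the slide's SELECT query into the `query` parameter of a GET request. The endpoint is the slide's placeholder host `www.example.org`; the function name `sparql_get_request` is an assumption, and the request is only built, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder endpoint from the slide, not a live service.
ENDPOINT = "http://www.example.org/sparql/"

QUERY = """PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name (COUNT(?friend) AS ?count)
WHERE {
  ?person foaf:name ?name .
  ?person foaf:knows ?friend .
}
GROUP BY ?person ?name"""


def sparql_get_request(endpoint: str, query: str) -> Request:
    """Build a SPARQL Protocol GET request asking for JSON-serialized results."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = sparql_get_request(ENDPOINT, QUERY)
print(request.get_method(), request.full_url)
```

The `Accept` header selects one of the result serializations listed on the slide; a client that prefers XML, CSV or TSV would ask for the corresponding media type instead.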
  • 18. Linked Data Platform • Describes the use of HTTP for accessing, updating, creating and deleting resources from servers that expose data as Linked Data • Centered around LDPRs (Linked Data Platform Resources), LDPCs (Linked Data Platform Containers), membership, containment • Under development at W3C; working draft. http://www.w3.org/TR/ldp/ (examples taken from http://www.w3.org/TR/ldp/; response payloads not captured in the transcript)
      – LDP-BC (Basic Container) request: GET /c1/
      – Resource request: GET /netWorth/nw1
      – LDP-DC (Direct Container) request: GET /netWorth/nw1/liabilities/
      – LDP-DC Request:
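The accessing, updating, creating and deleting operations named on the slide map onto plain HTTP methods, which the following sketch illustrates. It assumes the common pattern of POST to a container for creation and PUT for replacement, with Turtle as the exchange format; the `ldp_request` helper and the `http://example.org` host are hypothetical, and nothing is sent over the network.

```python
from typing import Optional
from urllib.request import Request

# Assumed mapping from the slide's four operations to HTTP methods.
METHODS = {"read": "GET", "create": "POST", "update": "PUT", "delete": "DELETE"}


def ldp_request(operation: str, url: str, turtle: Optional[str] = None) -> Request:
    """Build (but do not send) an HTTP request for one LDP operation."""
    headers = {"Accept": "text/turtle"}
    data = None
    if turtle is not None:
        # A Turtle document is sent as the body when creating or updating a resource.
        headers["Content-Type"] = "text/turtle"
        data = turtle.encode("utf-8")
    return Request(url, data=data, headers=headers, method=METHODS[operation])


# Read the basic container from the slide (hypothetical host).
read_container = ldp_request("read", "http://example.org/c1/")

# Create a new member resource by POSTing Turtle to the container.
new_resource = "<> <http://purl.org/dc/terms/title> \"A new resource\" ."
create_member = ldp_request("create", "http://example.org/c1/", new_resource)

print(read_container.get_method(), read_container.full_url)
print(create_member.get_method(), create_member.full_url)
```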
  • 19. Data Access Summary • Web APIs – Very flexible, popular with Web developers, no specific commitment to data models • OData – ER-based data model, abstract interface to datastores (focus on CRUD), perceived as vendor-pushed (strong tool support) • SPARQL and LDP – Graph data model, community-pushed, some interesting features (querying, federation, linking, …) • Although the approaches overlap, they all aim to simplify access to distributed data sources for application developers – Which approach to choose depends on many factors, e.g. type of data, size, relationships, infrastructure, skills to support, frequency of updates, end-use scenarios, …
  • 20. Outline • The context: Open Data • Data access: Web APIs, OData, SPARQL/LDP • DaaS solutions landscape and open DaaS architecture
  • 21. Data publication • Data access mechanisms simplify data consumption for application developers • But data needs to be provisioned to applications according to the chosen data access mechanism – And applications will always be dependent on the hosting for the data they use • Data publishers and application developers need to rely on generic Cloud platforms and build, deploy and maintain a complex Open Data software and data stack from scratch – Complicated data provisioning and maintenance process – Data-as-a-Service (DaaS) solutions are emerging to address this issue. "Like all members of the "as a Service" (XaaS) family, DaaS is based on the concept that the product, data in this case, can be provided on demand to the user regardless of geographic or organizational separation of provider and consumer." Source: Wikipedia; https://en.wikipedia.org/wiki/DaaS
  • 22. Relevant DaaS solutions: Windows Azure Marketplace, Socrata, DataMarket, Factual, Junar, PublishMyData, DaPaaS, …
  • 23. Windows Azure Marketplace • A marketplace for applications and data (~170 datasets; ~700 applications) • Charging data consumers • Tools and APIs for data publishing, analytics, metadata management, account management and pricing, monitoring and billing, as well as a data portal for dataset exploration • Supports OData. https://datamarket.azure.com/ Source: Microsoft http://go.microsoft.com/fwlink/?LinkID=201129&clcid=0x409
  • 24. Socrata • Specific focus on Open Data • Open Data Portal: data publishing & clean-up, metadata generation, data-driven portals for data exploration and portal management • API Foundry for creating and deploying RESTful APIs on top of the data • Hosted data is accessible through the Socrata Open Data API (SODA) – a RESTful interface for searching and reading data in XML, JSON or RDF. http://www.socrata.com/ Source: Socrata
  • 25. DataMarket • Provides statistical data from almost 100 data providers • ~71 000 datasets • Supports embeddable visualisations of data, data export, live feeds for data updates, ability for data publishers to monetize data via the marketplace, custom data-driven portals for publishers, data portal, Web API. http://datamarket.com/
  • 26. Factual • Data for ~65 million local businesses and points of interest in 50 countries; a product database of over 650,000 products • Previously offered hosting for thousands of 3rd party data sets ("Community Data"), but this activity has been discontinued • Data is populated by means of Web crawls, data extraction and 3rd party data services; data model is tabular, based on a taxonomy of around 400 categories • Pricing is based on a pay-per-use model • Data access is provided through a RESTful API • Provides a set of tools for data management. http://www.factual.com/
  • 27. Junar • Cloud-based Open Data platform to collect, enrich, publish and analyse open data • Data can be consumed either directly via the Junar API, or via various visual widgets. http://www.junar.com/
  • 28. PublishMyData • Hosted, as-a-service solution for Open and Linked Data publishing • Uses DCAT and provides data access via Web APIs, a SPARQL endpoint and raw data dumps. http://www.swirrl.com/publishmydata
  • 29. Other relevant solutions • Comprehensive Knowledge Archive Network (CKAN) (http://ckan.org/) – web-based open source data management system for the storage and distribution of open data; datahub (http://datahub.io/) • LOD2 (http://lod2.eu/) – research project aimed at providing an open source, integrated software stack for managing the lifecycle of Linked Data, from data extraction, enrichment and interlinking to maintenance; not meant to be an as-a-service solution • Project Open Data (http://project-open-data.github.io/) – a set of open source tools, methodologies and use cases for publishing and utilising Open Data • COMSODE (http://www.comsode.eu/) – research project aiming to create a publication platform for Open Data called Open Data Node
  • 30. DaPaaS – towards an Open Data- and Platform-as-a-Service for Open Data • DaPaaS – research project for simplifying data publication and consumption via a Data- and Platform-as-a-Service approach. http://dapaas.eu [Diagram: roles around the DaPaaS Platform. Data Publisher: publishes open data. Application Developer: develops and deploys applications on top of published data. End-Users Data Consumer: consumes data resulting from the available applications]
  • 31. DaPaaS – Requirements for Data Publisher: DP-01: Dataset import; DP-02: Data storage and querying; DP-03: Dataset search & exploration; DP-04: Data interlinking; DP-05: Data cleaning & transformation; DP-06: Dataset bookmarking & notifications; DP-07: Dataset metadata management, statistics & access policies; DP-08: Data scalability; DP-09: Data availability; DP-10: User registration & profile management; DP-11: Secure access to platform; DP-12: UI for data publisher; DP-13: Data publishing methodology support
  • 32. DaPaaS – Requirements for Application Developer: AD-01: Access to Data Publisher services (DP-01 – DP-13); AD-02: Data export; AD-03: Develop applications in state-of-the-art programming languages; AD-04: Configure application deployment; AD-05: Deploy and monitor application; AD-06: Application metadata management, statistics & access policies; AD-07: UI for application developer; AD-08: Application development methodology support
  • 33. DaPaaS – Requirements for End-Users Data Consumer: EU-01: User registration & profile management; EU-02: Search & explore datasets and applications; EU-03: Datasets and applications bookmarking and notifications; EU-04: Mobile and desktop GUI access; EU-05: Data export and download; EU-07: High availability of data and applications
  • 34. DaPaaS Platform Abstract High-Level Architecture [Diagram labels: Data Layer (Open Data Warehouse, Datasets, DaaS Services); Platform Layer (Application Hosting Environment, Data-Driven Applications, PaaS Services); UX Layer (UX Services); cross-cutting: Usage Monitoring, Security & Access Control, Tool-supported Methodology for Data Publishing/Consumption]
  • 35. Summary • Lots of open datasets, but few applications using them • Simplifying data publication/consumption can enable an increase in the number (and quality) of applications using open data • Various approaches emerging – For data access: Web APIs, OData, SPARQL/LDP – For data publication/provisioning: DaaS solutions
  • 36. Thank you! Contact: dumitru.roman@sintef.no