Wed roman tut_open_datapub


  1. Open Data Publication and Consumption: An Overview of Relevant Data Access Approaches and DaaS Solutions. ESWC Summer School, 2014. Dumitru Roman, SINTEF, Norway
  2. Outline
     • The context: Open Data
     • Data access: Web APIs, OData, SPARQL/LDP
     • DaaS solutions landscape and open DaaS architecture
  3. Outline
     • The context: Open Data
     • Data access: Web APIs, OData, SPARQL/LDP
     • DaaS solutions landscape and open DaaS architecture
  4. The context: Open Data
     • Open Data Movement: make data available (primarily government data)
       – Businesses and citizens can develop new ideas, services and applications
       – Can support (government) transparency and accountability
     • Source: McKinsey
     • Gartner:
       – By 2016, the use of "open data" will continue to increase, but slowly, and predominantly limited to Type A enterprises.
       – By 2017, over 60% of government open data programs that do not effectively use open data internally will be scaled back or discontinued.
       – By 2020, enterprises and governments will fail to protect 75% of sensitive data and will declassify and grant broad/public access to it.
     • Source: Gartner
  5. Lots of open datasets on the Web…
     • A large number of datasets have been published as open data in recent years
     • Many kinds of data: cultural, science, finance, statistics, transport, environment, …
     • Popular formats: tabular (e.g. CSV, XLS), HTML, XML, JSON, …
  6. …but few applications
     • Applications utilizing open and distributed datasets have been rather few, e.g. on four open data portals (portal names were shown as logos and are not preserved):
       – ~110 000 datasets, ~350 applications
       – ~50 000 datasets, ~80 applications
       – ~20 000 datasets, ~350 applications
       – ~300 datasets, ~40 applications
     • Challenges include:
       – Lack of resources: unreliable data access
       – Lack of expertise: not easily available to organisations
       – Technical/organizational
  7. Open data publication and access
     • Data publishers: complicated data publishing and maintenance process
     • Data consumers/developers: complicated programmatic data access
     • A decision that lifts a publication burden from the data publisher shifts that burden onto data access for the data consumer
     • Diagram: easy data publication tends to mean complicated data access, and easy data access tends to mean complicated data publication. Goals: simplify data publication and simplify data access.
  8. Outline
     • The context: Open Data
     • Data access: Web APIs, OData, SPARQL/LDP
     • DaaS solutions landscape and open DaaS architecture
  9. (Programmatic/Web-based) Data access
     • Traditional approaches for programmatically consuming data: ODBC, JDBC, RMI, CORBA, ...
     • Modern Web applications and data services rely extensively on lightweight Web-service-based approaches that exchange data via standard protocols (HTTP) and formats (e.g. XML, JSON, RDF, …)
     • Relevant approaches for programmatic access to open data:
       – Web APIs
       – OData
       – SPARQL and Linked Data Platform (LDP)
  10. Web APIs
      • Programmatic interfaces accessible through HTTP calls (e.g. GET, POST)
      • Data (requests/responses) typically in JSON or XML
      • Very popular among application developers
      • Source:
      • Diagram: the data consumer/developer (app, client library) calls the data provider's Web API (Web service). Protocol: HTTP. Payload: JSON/XML/…
  11. Web APIs: example
      • Request: GET;lon=9.58
      • Response payload:
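A minimal Python sketch of the request/response pattern described for Web APIs. The endpoint URL, the `lat`/`lon` parameter names and the sample payload are all hypothetical (the URL and payload on the slide are not preserved); the code only builds the request URL and parses a JSON body, so it runs without network access.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint, used for illustration only.
BASE_URL = "https://api.example.org/weather"


def build_request_url(base_url, **params):
    """Return base_url with params appended as a URL-encoded query string."""
    return base_url + "?" + urlencode(params)


def parse_payload(body):
    """Decode a JSON response body into Python objects."""
    return json.loads(body)


url = build_request_url(BASE_URL, lat=60.1, lon=9.58)

# Hypothetical response body standing in for what a server might return.
sample_body = '{"lat": 60.1, "lon": 9.58, "temperature_c": 14.5}'
reading = parse_payload(sample_body)

print(url)
print(reading["temperature_c"])
```

In a real client the body would come from an HTTP call (for example `urllib.request.urlopen(url)`), and the field names would follow whatever the provider documents.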
  12. Open Data Protocol (OData)
      • "ODBC for the Web"
      • A protocol for creating and consuming data APIs
      • Builds on HTTP and REST
      • OASIS Standard (2014), promoted by Microsoft, IBM, and SAP
  13. OData
      • Principles: Metadata, Data, Querying, Editing, Operations, Vocabularies
      • The OData Data Model: based on the Entity Data Model (EDM)
      • The OData protocol: CRUD + query language
      • XML and JSON serialization
      • Source: Microsoft
  14. OData: requesting data examples
      • Request (entity by ID): GET serviceRoot/People('russellwhyte')
      • Request (collections): GET serviceRoot/People
      • Request (individual property): GET serviceRoot/Airports('KSFO')/Name
      • Source:
      • Response payload:
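The addressing patterns above (entity by key, individual property) can be sketched in Python as a small URL builder. The service root below is hypothetical, and the helper names are invented for this illustration; the only OData rule relied on is that a string key is written in single quotes, with an embedded single quote doubled.

```python
from urllib.parse import quote

# Hypothetical service root, used for illustration only.
SERVICE_ROOT = "https://services.example.org/TripPin"

# Characters that OData resource paths use literally and that we keep unescaped.
PATH_SAFE = "/()'"


def key_literal(value):
    """Format a string key: single-quoted, embedded single quotes doubled."""
    return "'" + value.replace("'", "''") + "'"


def entity_url(service_root, entity_set, key, prop=None):
    """Build the URL of one entity, or of one of its properties."""
    path = entity_set + "(" + key_literal(key) + ")"
    if prop is not None:
        path += "/" + prop
    return service_root + "/" + quote(path, safe=PATH_SAFE)


person = entity_url(SERVICE_ROOT, "People", "russellwhyte")
airport_name = entity_url(SERVICE_ROOT, "Airports", "KSFO", "Name")
print(person)
print(airport_name)
```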
  15. OData: querying data examples
      • Request (filter): GET serviceRoot/People?$filter=FirstName eq 'Scott'
      • Response payload:
      • Filter on complex type: GET serviceRoot/Airports?$filter=contains(Location/Address, 'San Francisco')
      • orderby: GET serviceRoot/People('scottketchum')/Trips?$orderby=EndsAt desc
      • top: GET serviceRoot/People?$top=2
      • count: GET serviceRoot/People/$count
      • expand: GET serviceRoot/People('keithpinckney')?$expand=Friends
      • select: GET serviceRoot/Airports?$select=Name, IcaoCode
      • search: GET serviceRoot/People?$search=Boise
      • Lambda operators (any / all): GET serviceRoot/People?$filter=Emails/any(s:endswith(s, ''))
      • Source:
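A hedged Python sketch of how a client might assemble the query options listed above. The function name and the choice of which characters to leave unescaped are assumptions made for this illustration; `serviceRoot` is kept as the same placeholder the slide uses.

```python
from urllib.parse import quote

# Characters kept literal inside option values (an illustrative choice).
VALUE_SAFE = "'(),/:"


def odata_query_url(resource_url, **options):
    """Append OData system query options ($filter, $top, ...) to a URL.

    Option names are given without the leading '$'; values are
    percent-encoded so that spaces become %20.
    """
    parts = []
    for name, value in options.items():
        parts.append("$" + name + "=" + quote(str(value), safe=VALUE_SAFE))
    if not parts:
        return resource_url
    return resource_url + "?" + "&".join(parts)


filtered = odata_query_url("serviceRoot/People", filter="FirstName eq 'Scott'", top=2)
ordered = odata_query_url("serviceRoot/People('scottketchum')/Trips", orderby="EndsAt desc")
print(filtered)
print(ordered)
```

Encoding the space in `FirstName eq 'Scott'` matters in practice: the readable form on the slide is what a developer writes, while the percent-encoded form is what goes over the wire.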
  16. OData: data modification example
      • Request (create an entity):
        POST serviceRoot/People
        OData-Version: 4.0
        Content-Type: application/json;odata.metadata=minimal
        Accept: application/json
        {
          "@odata.type" : "Microsoft.OData.SampleService.Models.TripPin.Person",
          "UserName" : "teresa",
          "FirstName" : "Teresa",
          "LastName" : "Gilbert",
          "Gender" : "Female",
          "Emails" : ["", ""],
          "AddressInfo" : [ { "Address" : "1 Suffolk Ln.", "City" : { "CountryRegion" : "United States", "Name" : "Boise", "Region" : "ID" } } ]
        }
      • Response payload:
      • Remove an entity: DELETE serviceRoot/People('vincentcalabrese')
      • Update an entity (uses PATCH or PUT)
      • Relationship operations (link to related entities):
        POST serviceRoot/People('scottketchum')/Friends/$ref…
        { "": "serviceRoot/People('vincentcalabrese')" }
      • Source:
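The create-entity request above can be sketched in Python with the standard library's `urllib.request.Request`. The service root is hypothetical and the entity is trimmed to three fields; the request object is only constructed here, not sent.

```python
import json
from urllib.request import Request

# Hypothetical service root, used for illustration only.
SERVICE_ROOT = "https://services.example.org/TripPin"


def create_person_request(service_root, person):
    """Build (but do not send) a POST that creates a People entity."""
    body = json.dumps(person).encode("utf-8")
    return Request(
        url=service_root + "/People",
        data=body,
        method="POST",
        headers={
            "OData-Version": "4.0",
            "Content-Type": "application/json;odata.metadata=minimal",
            "Accept": "application/json",
        },
    )


req = create_person_request(
    SERVICE_ROOT,
    {"UserName": "teresa", "FirstName": "Teresa", "LastName": "Gilbert"},
)
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it; a DELETE or PATCH request follows the same shape with a different `method` and target URL.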
  17. SPARQL
      • A set of specifications that provide languages and protocols to query and manipulate RDF graph content on the Web or in an RDF store
      • Service Description
        – Request: GET /sparql/ Host:
        – Response: an RDF description, using the Service Description vocabulary
      • Protocol for RDF
        – Request: GET /sparql/?query=[SPARQL Query] Host:
        – Response: a SPARQL Results Document or RDF graph
      • Query Language
        PREFIX foaf: <>
        SELECT ?name (COUNT(?friend) AS ?count)
        WHERE { ?person foaf:name ?name . ?person foaf:knows ?friend . }
        GROUP BY ?person ?name
        – Result (serialized in XML, JSON, CSV, TSV):
      • Update Language
        PREFIX foaf: <>
        INSERT DATA { <> foaf:knows [ foaf:name "Dorothy" ] . } ;
        DELETE { ?person foaf:name ?mbox } WHERE { <> foaf:knows ?person . ?person foaf:name ?name FILTER ( lang(?name) = "EN" ) . }
      • Graph Store HTTP Protocol
        POST /rdf-graphs/service? Host:
        Content-Type: text/turtle
        @prefix foaf: <> .
        <> foaf:knows [ foaf:name "Dorothy" ] .
      • Examples taken from
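A small Python sketch of the SPARQL protocol round trip described above: the query is passed in the `query` parameter of a GET request, and a SELECT result in the SPARQL JSON results format is flattened into rows. The endpoint URL and the sample result document are hypothetical, and the FOAF namespace IRI is filled in here as an assumption because the prefix IRI is blank in the transcript. Nothing is sent over the network.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint, used for illustration only.
ENDPOINT = "http://example.org/sparql/"

# The FOAF namespace IRI is an assumption; the transcript shows an empty <>.
QUERY = """PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name (COUNT(?friend) AS ?count)
WHERE { ?person foaf:name ?name . ?person foaf:knows ?friend . }
GROUP BY ?person ?name
"""


def sparql_get_url(endpoint, query):
    """Encode a query as the 'query' parameter of a protocol GET request."""
    return endpoint + "?" + urlencode({"query": query})


def rows_from_json_results(document):
    """Flatten a SPARQL JSON results document into a list of dicts."""
    doc = json.loads(document)
    variables = doc["head"]["vars"]
    rows = []
    for binding in doc["results"]["bindings"]:
        # A variable may be unbound in a given solution, so check first.
        rows.append({v: binding[v]["value"] for v in variables if v in binding})
    return rows


# Hypothetical result document in the SPARQL JSON results layout.
SAMPLE_RESULTS = json.dumps({
    "head": {"vars": ["name", "count"]},
    "results": {"bindings": [
        {"name": {"type": "literal", "value": "Alice"},
         "count": {"type": "literal", "value": "3",
                   "datatype": "http://www.w3.org/2001/XMLSchema#integer"}},
    ]},
})

request_url = sparql_get_url(ENDPOINT, QUERY)
rows = rows_from_json_results(SAMPLE_RESULTS)
print(rows)
```

Values arrive as strings; a fuller client would use each binding's `datatype` to convert them.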
  18. Linked Data Platform
      • Describes the use of HTTP for accessing, updating, creating and deleting resources from servers that expose data as Linked Data
      • Centered around LDP Resources (LDPRs), LDP Containers (LDPCs), membership and containment
      • Under development at W3C; working draft
      • Basic Container (LDP-BC). Request: GET /c1/ Response payload:
      • Resource. Request: GET /netWorth/nw1 Response payload:
      • Direct Container (LDP-DC). Request: GET /netWorth/nw1/liabilities/ Response payload:
      • LDP-DC Request:
      • Examples taken from
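A rough Python sketch of the two most common LDP interactions: reading a resource or container as Turtle, and creating a new resource by POSTing to a container. The base URL, the `expenses` slug and the Turtle body are hypothetical; the `Slug` header and the `Link` header with `rel="type"` are the hints LDP clients conventionally send on creation. The requests are only constructed, not sent.

```python
from urllib.request import Request

# Hypothetical server, used for illustration only.
BASE = "http://example.org"


def get_resource(path):
    """Build a GET for an LDP resource or container, asking for Turtle."""
    return Request(BASE + path, headers={"Accept": "text/turtle"}, method="GET")


def create_in_container(container_path, turtle, slug):
    """Build a POST that asks the container to create a new member resource."""
    return Request(
        BASE + container_path,
        data=turtle.encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "text/turtle",
            # Suggested name for the new resource; the server may ignore it.
            "Slug": slug,
            # Declares the interaction model requested for the new resource.
            "Link": '<http://www.w3.org/ns/ldp#Resource>; rel="type"',
        },
    )


read_req = get_resource("/netWorth/nw1/liabilities/")
create_req = create_in_container(
    "/netWorth/nw1/liabilities/",
    '<> <http://purl.org/dc/terms/title> "Example liability" .',
    "expenses",
)
print(read_req.get_method(), read_req.full_url)
print(create_req.get_method(), create_req.full_url)
```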
  19. Data Access Summary
      • Web APIs: very flexible, popular with Web developers, no specific commitment to data models
      • OData: ER-based data model, abstract interface to datastores (focus on CRUD), perceived as vendor-pushed (strong tool support)
      • SPARQL and LDP: graph data model, community-pushed, some interesting features (querying, federation, linking, …)
      • Though the approaches overlap, they all aim to simplify access to distributed data sources for application developers
        – Which approach to choose depends on many factors, e.g. type of data, size, relationships, infrastructure, skills to support, frequency of updates, end-use scenarios, …
  20. Outline
      • The context: Open Data
      • Data access: Web APIs, OData, SPARQL/LDP
      • DaaS solutions landscape and open DaaS architecture
  21. Data publication
      • Data access mechanisms simplify data consumption for application developers
      • But data needs to be provisioned to applications according to the chosen data access mechanism
        – And applications will always be dependent on the hosting for the data they use
      • Data publishers and application developers need to rely on generic Cloud platforms and build, deploy and maintain a complex Open Data software and data stack from scratch
        – Complicated data provisioning and maintenance process
        – Data-as-a-Service (DaaS) solutions are emerging to address this issue
      • "Like all members of the "as a Service" (XaaS) family, DaaS is based on the concept that the product, data in this case, can be provided on demand to the user regardless of geographic or organizational separation of provider and consumer." Source: Wikipedia
  22. Relevant DaaS solutions
      • Windows Azure Marketplace
      • Socrata
      • DataMarket
      • Factual
      • Junar
      • PublishMyData
      • DaPaaS
      • …
  23. Windows Azure Marketplace
      • A marketplace for applications and data (~170 datasets; ~700 applications)
      • Charges data consumers
      • Tools and APIs for data publishing, analytics, metadata management, account management and pricing, monitoring and billing, as well as a data portal for dataset exploration
      • Supports OData
      • Source: Microsoft
  24. Socrata
      • Specific focus on Open Data
      • Open Data Portal: data publishing and clean-up, metadata generation, data-driven portals for data exploration and portal management
      • API Foundry for creating and deploying RESTful APIs on top of the data
      • Hosted data is accessible through the Socrata Open Data API (SODA), a RESTful interface for searching and reading data in XML, JSON or RDF
      • Source: Socrata
  25. DataMarket
      • Provides statistical data from almost 100 data providers
      • ~71 000 datasets
      • Supports embeddable visualisations of data, data export, live feeds for data updates, the ability for data publishers to monetize data via the marketplace, custom data-driven portals for publishers, a data portal and a Web API
  26. Factual
      • Data for ~65 million local businesses and points of interest in 50 countries; a product database of over 650,000 products
      • Previously offered hosting for thousands of 3rd-party datasets ("Community Data"), but that activity has been discontinued
      • Data is populated by means of Web crawls, data extraction and 3rd-party data services; the data model is tabular, based on a taxonomy of around 400 categories
      • Pricing is based on a pay-per-use model
      • Data access is provided through a RESTful API
      • Provides a set of tools for data management
  27. Junar
      • Cloud-based Open Data platform to collect, enrich, publish and analyse open data
      • Data can be consumed either directly via the Junar API or via various visual widgets
  28. PublishMyData
      • Hosted, as-a-service solution for Open and Linked Data publishing
      • Uses DCAT and provides data access via Web APIs, a SPARQL endpoint and raw data dumps
  29. Other relevant solutions
      • Comprehensive Knowledge Archive Network (CKAN): web-based open source data management system for the storage and distribution of open data; datahub
      • LOD2: research project aimed at providing an open source, integrated software stack for managing the lifecycle of Linked Data, from data extraction, enrichment and interlinking to maintenance; not meant to be an as-a-service solution
      • Project Open Data: a set of open source tools, methodologies and use cases for publishing and utilising Open Data
      • COMSODE: research project aiming to create a publication platform for Open Data called Open Data Node
  30. DaPaaS: towards an Open Data- and Platform-as-a-Service for Open Data
      • DaPaaS: research project for simplifying data publication and consumption via a Data- and Platform-as-a-Service approach
      • Diagram (roles around the DaPaaS Platform):
        – Data Publisher: publishes open data
        – Application Developer: develops and deploys applications on top of published data
        – End-User Data Consumer: consumes data resulting from the available applications
  31. DaPaaS: Requirements for Data Publisher
      • DP-01: Dataset import
      • DP-02: Data storage and querying
      • DP-03: Dataset search & exploration
      • DP-04: Data interlinking
      • DP-05: Data cleaning & transformation
      • DP-06: Dataset bookmarking & notifications
      • DP-07: Dataset metadata management, statistics & access policies
      • DP-08: Data scalability
      • DP-09: Data availability
      • DP-10: User registration & profile management
      • DP-11: Secure access to platform
      • DP-12: UI for data publisher
      • DP-13: Data publishing methodology support
  32. DaPaaS: Requirements for Application Developer
      • AD-01: Access to Data Publisher services (DP-01 to DP-13)
      • AD-02: Data export
      • AD-03: Develop applications in state-of-the-art programming languages
      • AD-04: Configure application deployment
      • AD-05: Deploy and monitor application
      • AD-06: Application metadata management, statistics & access policies
      • AD-07: UI for application developer
      • AD-08: Application development methodology support
  33. DaPaaS: Requirements for End-User Data Consumer
      • EU-01: User registration & profile management
      • EU-02: Search & explore datasets and applications
      • EU-03: Datasets and applications bookmarking and notifications
      • EU-04: Mobile and desktop GUI access
      • EU-05: Data export and download
      • EU-07: High availability of data and applications
  34. DaPaaS Platform: Abstract High-Level Architecture
      • Diagram labels:
        – Layers: Data Layer, Platform Layer, UX Layer
        – Components: Open Data Warehouse, Datasets, Data-Driven Applications, Application Hosting Environment
        – Services: DaaS Services, PaaS Services, UX Services
        – Also shown: Usage Monitoring, Security & Access Control, Tool-supported Methodology for Data Publishing/Consumption
  35. Summary
      • Lots of open datasets, but few applications using them
      • Simplifying data publication/consumption can enable an increase in the number (and quality) of applications using open data
      • Various approaches emerging:
        – For data access: Web APIs, OData, SPARQL/LDP
        – For data publication/provisioning: DaaS solutions
  36. Thank you! Contact: