1. LINKED DATA AS A SERVICE
SEMTECHBIZ Berlin 2012
Peter Haase, Michael Schmidt
fluid Operations AG
2. fluid Operations (fluidOps)
Linked Data & Semantic Technologies, Enterprise Cloud Computing
Software company founded in Q1/2008 by a team of serial entrepreneurs; privately
held, VC funded
Headquarters in Walldorf / Germany, SAP Partner Port
Currently 40 employees
Named “Cool Vendor for SAP 2010” by Gartner (Mar 2010)
Global reseller agreement with EMC, focused on large enterprise customers (Apr 2010)
NetApp Advantage Alliance Partner (Oct 2010)
3. The Potential of Linked Data
Linked Data
• Set of standards and principles for publishing, sharing
and interrelating structured knowledge
• From data silos to a Web of Data
• RDF as data model, SPARQL for querying
• Ontologies to describe the semantics
Benefits of Linked Data in the Enterprise
• Enterprise Data Integration: Semantically integrate and
interlink data scattered among different information systems
• Simplified publishing and sharing of data: Increase openness and accessibility of
Enterprise Data
• Enrichment and contextualization through interlinking: Added value from links to
Linked Open Data
4. Everything as a Service
• Abstract from physical implementation details and location
of resources
• Regardless of geographic or organizational separation of
provider and consumer
• “In the cloud”
• Web based
• Virtualized
• On-demand
• Self-service
• Scalable
• Pay as you go
Service stack (top to bottom): Data as a Service, Software as a Service, Platform as a Service, Infrastructure as a Service
Next generation of XaaS is centered around the power of data.
5. Data-as-a-Service
“Like all members of the "as a Service" family, DaaS is based on the
concept that the product, data in this case, can be provided on demand
to the user regardless of geographic or organizational separation of
provider and consumer.”
Source: Wikipedia
• Abstraction layer for data access: abstracts applications from the
specific setup of the data management service (such as local vs. remote,
federation, and distribution)
• Enabling automation of discovery, composition, and use of
datasets
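The abstraction-layer idea above can be illustrated with a minimal Python sketch. Every class and function name here is hypothetical and not part of any product API, and the "remote" source is only simulated in memory. The point is that application code is written once against a single interface and does not change when the data moves or is federated.

```python
from typing import Iterable, List, Protocol, Tuple

Triple = Tuple[str, str, str]


class DataSource(Protocol):
    """What the application sees: one query method, no setup details."""

    def match(self, predicate: str) -> Iterable[Triple]: ...


class LocalSource:
    """Hypothetical source backed by an in-memory list of triples."""

    def __init__(self, triples: List[Triple]) -> None:
        self._triples = triples

    def match(self, predicate: str) -> Iterable[Triple]:
        return [t for t in self._triples if t[1] == predicate]


class SimulatedRemoteSource(LocalSource):
    """Stands in for a remote service; a real one would issue a network call."""


class Federation:
    """Presents several sources as if they were a single DataSource."""

    def __init__(self, members: List[DataSource]) -> None:
        self._members = members

    def match(self, predicate: str) -> Iterable[Triple]:
        for member in self._members:
            yield from member.match(predicate)


def labels(source: DataSource) -> List[str]:
    """Application code: unchanged for local, remote, or federated data."""
    return sorted(obj for _, _, obj in source.match("label"))


local = LocalSource([("ex:a", "label", "Alpha")])
remote = SimulatedRemoteSource([("ex:b", "label", "Beta"), ("ex:b", "size", "3")])

print(labels(local))
print(labels(Federation([local, remote])))
```

Because `labels` depends only on the `DataSource` interface, swapping a local store for a federation is a change in setup, not in application code.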
6. Data-as-a-Service – Beyond Data Access
• Data Markets: make it easy to find data from secondary data
sources and to consume or acquire the data in a usable – and often unified –
format
• Online Visualization Services: allow users to upload data, make charts and
visualizations and publish these to an online audience
• Data Publishing Solutions: allow data owners to publish their data
collections and make them available to an online audience
• Data Aggregators: integrate and cleanse data from different sources to provide
the aggregated data as a value-added service
• BI / Analytics as a Service: provide higher-level analytics functionality
(statistical analysis, reporting, predictive analytics)
See also: http://blog.datamarket.com/2010/10/24/data-as-a-service-market-definitions/
7. Information Workbench - Linked Data Platform
Information Workbench:
• Semantics- & Linked Data-based integration of private and public data sources
• Intelligent Data Access and Analytics: Visual Exploration, Semantic Search,
Dashboarding and Reporting
• Collaboration and knowledge management platform: Wiki-based curation &
authoring of data, Collaborative workflows
[Diagram label: Semantic Web Data]
8. Enabling Data Access:
Virtualization of Data Sources
• Linked Data as abstraction layer for virtualized data access
across data spaces
• Linked Data principles
1. Use URIs as names for things.
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information, using the
standards (RDF, SPARQL).
4. Include links to other URIs, so that they can discover more things.
• Enables data portability across current data silos
• Platform independent data access
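The four principles can be sketched in a few lines of Python. This is only an illustration: the `WEB` dictionary stands in for real HTTP lookups, the example.org URIs and data are made up, and the function names are hypothetical. Only the two FOAF property URIs refer to an existing vocabulary.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

FOAF = "http://xmlns.com/foaf/0.1/"
ALICE = "http://example.org/person/alice"  # made-up URI (principle 1: URIs as names)
BOB = "http://example.org/person/bob"      # made-up URI

# In-memory stand-in for the Web: looking up a URI returns triples about it.
WEB: Dict[str, List[Triple]] = {
    ALICE: [
        (ALICE, FOAF + "name", "Alice"),
        (ALICE, FOAF + "knows", BOB),  # principle 4: link to another URI
    ],
    BOB: [
        (BOB, FOAF + "name", "Bob"),
    ],
}


def look_up(uri: str) -> List[Triple]:
    """Principles 2 and 3: an HTTP URI can be looked up and yields useful data."""
    return WEB.get(uri, [])


def discover(start: str) -> Set[str]:
    """Principle 4: follow links to other URIs to discover more things."""
    seen: Set[str] = set()
    todo = [start]
    while todo:
        uri = todo.pop()
        if uri in seen:
            continue
        seen.add(uri)
        for _, _, obj in look_up(uri):
            # Objects that are HTTP URIs are links that can be followed.
            if obj.startswith("http://") and obj not in seen:
                todo.append(obj)
    return seen


print(sorted(discover(ALICE)))
```

In a real setting `look_up` would dereference the URI over HTTP and parse the returned RDF; the traversal logic stays the same.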
9. Enabling Data Discovery:
Metadata about Data Sets
• Metadata about data sources is essential for dynamic discovery
• Access to data registered at global registries, e.g. ckan.org, data.gov, …
• Based on metadata vocabularies (voID, DCAT)
• Sort/filter data sets by topic, license, size and many more facets to identify
relevant data
• Visually explore data sets
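Faceted discovery over data set metadata can be sketched as follows. The record type is a hypothetical simplification, loosely inspired by what vocabularies such as voID and DCAT describe, and the catalog entries are invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DatasetInfo:
    """Hypothetical, simplified metadata record for one data set."""
    title: str
    topic: str
    license: str
    triples: int  # size of the data set in triples


# Invented catalog entries, standing in for metadata harvested from a registry.
CATALOG = [
    DatasetInfo("City Budgets", "government", "CC-BY", 120_000),
    DatasetInfo("Drug Targets", "life-sciences", "CC0", 2_500_000),
    DatasetInfo("Protein Links", "life-sciences", "CC-BY", 40_000),
]


def find(
    catalog: List[DatasetInfo],
    topic: Optional[str] = None,
    license: Optional[str] = None,
    min_triples: int = 0,
) -> List[DatasetInfo]:
    """Filter by the given facets, then sort by size, largest first."""
    hits = [
        d for d in catalog
        if (topic is None or d.topic == topic)
        and (license is None or d.license == license)
        and d.triples >= min_triples
    ]
    return sorted(hits, key=lambda d: d.triples, reverse=True)


for d in find(CATALOG, topic="life-sciences"):
    print(d.title, d.triples)
```

Each facet is optional, so the same function supports browsing everything as well as narrowing by topic, license, and size together.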
10. Enabling Data Composition:
Federation of Virtualized Data Sources
[Architecture diagram: three layers (Application Layer, Virtualization Layer, Data Layer).
The Data Layer shows four data sources, each exposed through its own SPARQL endpoint.
The diagram also shows a Metadata Registry.]
See also: FedX: Optimization Techniques for Federated Query Processing on Linked Data (ISWC2011)
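The idea of answering one query over several virtualized sources can be sketched with a toy in-memory federation. This is a naive illustration under stated assumptions, not the optimization techniques of the FedX paper: the two sources, their `ex:` data, and all function names are hypothetical, and plain Python lists stand in for SPARQL endpoints.

```python
from typing import Dict, Iterator, List, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# Two hypothetical sources; in a real deployment each would be a SPARQL endpoint.
SOURCES: Dict[str, List[Triple]] = {
    "drugs": [("ex:aspirin", "ex:targets", "ex:COX1"),
              ("ex:ibuprofen", "ex:targets", "ex:COX2")],
    "proteins": [("ex:COX1", "ex:label", "Cyclooxygenase 1"),
                 ("ex:COX2", "ex:label", "Cyclooxygenase 2")],
}


def is_var(term: str) -> bool:
    """Variables are written SPARQL-style, with a leading question mark."""
    return term.startswith("?")


def relevant_sources(pattern: Triple) -> List[str]:
    """Naive source selection: keep sources that contain the pattern's predicate."""
    pred = pattern[1]
    return [name for name, triples in SOURCES.items()
            if is_var(pred) or any(t[1] == pred for t in triples)]


def match(pattern: Triple, binding: Binding) -> Iterator[Binding]:
    """Evaluate one triple pattern against all relevant sources."""
    for name in relevant_sources(pattern):
        for triple in SOURCES[name]:
            new = dict(binding)
            ok = True
            for term, value in zip(pattern, triple):
                if is_var(term):
                    # Bind the variable, or check it against an earlier binding.
                    if new.setdefault(term, value) != value:
                        ok = False
                        break
                elif term != value:
                    ok = False
                    break
            if ok:
                yield new


def query(patterns: List[Triple]) -> List[Binding]:
    """Join patterns left to right, passing bindings along (nested-loop join)."""
    results: List[Binding] = [{}]
    for pattern in patterns:
        results = [b2 for b in results for b2 in match(pattern, b)]
    return results


rows = query([("?drug", "ex:targets", "?protein"),
              ("?protein", "ex:label", "?name")])
print(sorted((r["?drug"], r["?name"]) for r in rows))
```

The application states the query once; which source answers which pattern is decided by the federation layer. A production engine would add caching, join ordering, and batching of requests on top of this basic scheme.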
11. Semantic Wiki + Widgets as
Self-service Linked Data Frontend
• Semantic Wiki for linking of
unstructured and structured data
• Declarative specification of the UI
based on an available pool of widgets and
a declarative wiki-based syntax
• Widgets have direct access to the DB
• Type-based template mechanism
[Screenshots: wiki page in edit mode, and the displayed result page]
12. Information Workbench:
Data as a Service in a Cloud Platform Architecture
[Architecture diagram:]
Application Layer (SaaS)
Virtualization Layer
Infrastructure Layer (IaaS): Network-Attached Storage, Network, Computing Resources
Data Layer (DaaS): Enterprise Data Sources, Open Data Sources
Provisioning, Monitoring and Management
13. Provisioning, Monitoring and Management
[Architecture diagram: Application Layer (SaaS), Virtualization Layer, Infrastructure Layer (IaaS), Data Layer (DaaS)]
Self-service Deployment
• Self-service deployment of the Information Workbench in the cloud
• Pay-per-use
• Scalability on demand
Data Discovery
• On demand access to private and public data sources
• Dynamic Discovery
Data Integration & Federation
• Virtualized data access
• Dynamic integration & federation of data sources
Self-service UI & Analytics
• Living UI, composed from semantics-aware widgets
• Ad hoc data exploration, visualization, analytics
14. Information Workbench – Linked Data as a Service
Application Areas
• Knowledge Management in the Life Sciences
• Digital Libraries, Media and Content Management
• Intelligent Data Center Management
15. Example:
Conference Explorer
• “Linked-Data-a-Thon”: build, in two weeks, an
application that makes use of conference
metadata and contextualizes it with
external data sources
• Realized with the Information Workbench
http://semtech2012.fluidops.net/
Data Sources
• Conference Metadata (Linked Data)
• Public bibliographic meta data
• Social Networks: Twitter, Facebook, LinkedIn
• LinkedGeoData
Features
• Conference schedule, timelines, hot topics
• Statistics and reports
• Background information about authors and publications
• Link to social network profiles and statistics
16. Example: A Cloud Portal for Access to Open Data
with the Information Workbench
... using the fluid Operations Technology Stack
Goal
• Collect meta data from global data markets (LOD Cloud,
WorldBank, CKAN, …)
• Allow integrated search and ad hoc integration of data
sources from different repositories
• Link data with private/internal data sources, if desired
• Support semi-automated linking between data sets
• Provide visualization, exploration, and analytics
functionality on top of integrated data sources
Realization
• Currently running project with the Hasso Plattner Institute
(Potsdam, Germany)
• Create local repository containing data market metadata
• Use self-service technology to make services publicly
available + Information Workbench for analytics
17. Example: Linked Data in Pharma
Main Use Cases
• Integrate data from company-internal data silos
• Augment company-internal data with Linked Open Data
• Collaborative knowledge management
• Support of internal processes (drug development)
[Diagram: three functions (Search, Interrogate and Reason; Visualize, Analyze and Explore;
Capture and Augment Knowledge) on top of an integrated data graph over all data sources,
which draws on Private Data Sources and Public Data Sources]
18. Example: Dynamic Semantic Publishing
Olympics 2012 requirements
• A lot of output: page per athlete [10,000+], page per country
[200+], page per discipline [400-500]; time-coded, metadata-annotated
on-demand video; 58,000 hours of content
• Near-real-time statistics and live event pages, with too many
web pages for too few journalists
Dynamic Semantic Publishing (DSP) architecture to automate
content aggregation
Information Workbench for DSP
• Collaborative authoring and linking of
unstructured and structured semantic data
• Ontology and instance data management
• DSP editorial workflows
• Automation of content creation and
enrichment