LOD2: State of Play WP1: Requirements, Design & LOD2 Stack Prototype
1. Creating Knowledge out of Interlinked Data
WP1:
Requirements, Design &
LOD2 Stack Prototype
Paris, 24.-25. March 2011
Helmut Nagy
Semantic Web Company, Vienna
LOD2 Presentation . 02.09.2010 . Page http://lod2.eu
2. WP Overview
LOD2 Event . 24-25.03.2011 . Page 2 http://lod2.eu
3. WP 1 Overview
4. WP 1 Task Overview & Deliverables
No. | Tasks / Deliverables | Outcome | Lead | Due till (actions listed below each task)
1.1. | Common Requirements Specification | Report | SWC | M6
  • Prose use case description from the end user's point of view: problem description & requested solution(s)
  • Role models: extrapolating relevant roles for each use case
  • Documentation of the technical state of the art for each use case (tool analysis, technical interdependencies, APIs, …)
  • Analysis of available datasets & metadata assets (structure, formats, volume, IPR, …)
1.2. | State of the Art Analysis | Report | SWC | M4
  • Industrial & academic publication review, standards review, standards white spots, …
1.3. | Architecture & System Design | Report | Tenforce | M6
  • Technical requirements for & interdependencies of system architecture components
  • Coverage of all functional & non-functional requirements
1.4. | Early LOD2 Stack Prototype | Software | Tenforce | M12
  … to be specified …
5. WP 1 Task Use Cases
• UC1: LOD2 for Media and Publishing (WKD, WP7)
• UC2: LOD2 for Enterprise Data Webs (EXALEAD,
WP8)
• UC3: GovData.eu – Publishing Governmental
Information as Linked Data (OKFN, WP9)
6. WP 7: Media & Publishing Use Case – Short Description
“The application of Linked Data principles shall support
the information management in lawyer-specific
workflows. WKD clients - like attorneys - shall be
supported in their daily workflows currently managed by
AnNoText. Along these workflows the knowledge worker
has to make decisions and take actions to collect, enrich
and manage a diverse set of contents.”
7. WP 7: Media & Publishing Use Case Scenarios
8. WP 8: Enterprise Use Case – Short Description
“The Formal Hiring use case shows that semantic technology and
linked open data can support hiring processes without having to do
all the configurations that link services, projects or products to
resources manually. Once the target system has identified valid
candidates for a specific purpose, the legal hiring process itself
starts. This process is still knowledge intensive, but it can to some
degree be standardized and therefore Semantic Web technologies
and linked open data can support it”
9. WP 8: Enterprise Use Case Scenarios
10. WP 9: OGD Use Case – Short Description
“Information about European public datasets is currently scattered across many
different data catalogues, portals and websites in many different languages,
implemented using many different technologies. The kinds of information stored
about public datasets may vary from country to country, and from registry to
registry. publicdata.eu will harvest and federate this information to enable users
to search, query, process, cache and perform other automated tasks on the
data from a single place. This helps to solve the "discoverability problem" of
finding interesting data across many different government websites, at many
different levels of government, and across the many governments in Europe.
In addition to providing access to official information about datasets from public
bodies, publicdata.eu will capture (proposed) edits, annotations, comments and
uploads from the broader community of public data users. In this way,
PublicData.EU will harness the social aspect of working with data to create
opportunities for mass collaboration.”
11. WP 9: OGD Use Case Scenarios
12. WP 1: Consolidated Feature Requests
13. WP 1: Consolidated Feature Requests – Data Acquisition
FR# | Title | Description | WP7 WP8 WP9
REQ 01 | Identify data sources | Monitoring for identifying new/relevant internal and external sources. | x x
REQ 02 | Identify data | Identify relevant data within sources. | x x
REQ 03 | Consume/Harvest data | Provide mechanisms to grab, extract, import and store data from relevant sources. Data may be datasets to enrich content but also datasets that can be used as metadata for describing content or metadata sets describing datasets in other sources. | x x x
REQ 04 | Upload data | Provide interfaces for adding relevant data. Upload of relevant datasets and adding metadata describing the datasets. | x
REQ 05 | Synchronise data | Provide mechanism for synchronisation of data and sources. Monitoring changes to datasets or metadata in external sources. Offering changes in datasets or metadata to external sources. Bi-directional synchronisation of changes. | x
REQ 06 | Store acquired data | Provide storage functionality. Persistent storing of datasets and metadata including versioning of changes. | x x x
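REQ 05 (monitoring changes) and REQ 06 (persistent storage with versioning) can be illustrated together. The following is a minimal Python sketch, assuming an in-memory store and SHA-256 content hashes for change detection; the class `VersionedStore`, its methods and the sample dataset are hypothetical and not part of the LOD2 Stack.

```python
import hashlib


class VersionedStore:
    """In-memory store that keeps every distinct version of a dataset."""

    def __init__(self):
        # Maps dataset id -> list of (content hash, content), oldest first.
        self._versions = {}

    def put(self, dataset_id, content):
        """Store content; return True only if a new version was recorded."""
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        history = self._versions.setdefault(dataset_id, [])
        if history and history[-1][0] == digest:
            return False  # unchanged since the last harvest
        history.append((digest, content))
        return True

    def latest(self, dataset_id):
        """Return the most recent content of a dataset."""
        return self._versions[dataset_id][-1][1]

    def version_count(self, dataset_id):
        """Return how many distinct versions have been stored."""
        return len(self._versions.get(dataset_id, []))


store = VersionedStore()
store.put("budget-2010", "year,amount\n2010,100")
store.put("budget-2010", "year,amount\n2010,100")  # no change, not stored again
store.put("budget-2010", "year,amount\n2010,120")  # changed, new version
```

Comparing hashes rather than full contents keeps the change check cheap; a real deployment would persist the history rather than hold it in memory.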
14. WP 1: Consolidated Feature Requests – Editing
FR# | Title | Description | WP7 WP8 WP9
REQ 07 | Integrate data | Provide mechanism for conversion/mapping to different specific formats. Enable mapping of different metadata schemes. | x x x
REQ 08 | Display data | Show existing and new data available in the repository for technical manipulation. Display available metadata for datasets and datasets themselves. | x x x
REQ 09 | Analyse data | Provide mechanism for analysing newly added data and its values (inconsistencies, validation, syntax errors). | x
REQ 10 | Edit/Update data | Provide mechanism for editing and converting data and merging new with existing data. Edit available metadata for datasets. | x x x
REQ 06 | Store data | Store new or updated data to the repository. | x x x
REQ 05 | Synchronise data | Provide mechanism for synchronisation of edited data. Offering changes in metadata/datasets to original data source. | x x x
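The mapping of metadata schemes named in REQ 07 can be sketched as a simple field-renaming step. This Python example assumes a hypothetical source catalogue with the field names `name`, `notes` and `author` and maps them to Dublin Core terms; the function `map_record` and the sample record are illustrative only.

```python
# Hypothetical mapping from a source catalogue's field names to Dublin Core terms.
FIELD_MAP = {
    "name": "dcterms:title",
    "notes": "dcterms:description",
    "author": "dcterms:creator",
}


def map_record(record, field_map):
    """Rename mapped fields; collect unmapped fields for manual review."""
    mapped, unmapped = {}, {}
    for key, value in record.items():
        if key in field_map:
            mapped[field_map[key]] = value
        else:
            unmapped[key] = value
    return mapped, unmapped


mapped, unmapped = map_record(
    {"name": "Air quality 2010", "author": "City of Vienna", "licence_id": "cc-by"},
    FIELD_MAP,
)
```

Returning the unmapped fields separately, instead of dropping them, lets an editor decide how to handle them, which fits the editing workflow of REQ 10.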
15. WP 1: Consolidated Feature Requests – Compositing & Bundling
FR# | Title | Description | WP7 WP8 WP9
REQ 11 | Search for data | Provide search functionality for editorial curation of new datasets. Advanced search mechanism for metadata and datasets (moderated search, faceted search, ...). | x x x
REQ 08 | Display data | Provide mechanism for displaying available data sets/documents and related meta data. Display of search results and details (metadata, datasets). | x x x
REQ 12 | Recommend data | Provide functionality for data recommendations based on semantic document analysis. Recommendation of datasets based on search query, selected datasets and personal profile. | x x x
REQ 10 | Add data | Add new data to a data set/document or meta data set. Edit available metadata for datasets. | x x x
REQ 14 | Comment data | Provide mechanism for commenting data sets according to quality and content related criteria. | x x x
REQ 15 | Rate data | Provide mechanism for providing information on quality of data set based on different quality criteria (relevance, popularity etc.). | x x x
REQ 16 | Tag data | Provide mechanism for tagging data sets for specific purposes of the user. | x x x
REQ 17 | Link/Align data | Provide mechanism for detecting and creating semantically sound connections between and within data sets. Creating bundles of related datasets. | x x x
REQ 06 | Store data | Store new or updated data to the repository. | x x x
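Faceted search, mentioned under REQ 11, combines a text match with filters on metadata fields and reports how many hits fall under each facet value. A minimal Python sketch, assuming dataset metadata is held as a list of dictionaries; the sample records and the function names `search` and `facet_counts` are hypothetical.

```python
from collections import Counter

# Hypothetical dataset metadata records.
DATASETS = [
    {"title": "Air quality 2010", "country": "AT", "format": "CSV"},
    {"title": "Public spending", "country": "UK", "format": "RDF"},
    {"title": "Air traffic", "country": "UK", "format": "CSV"},
]


def search(datasets, text="", **filters):
    """Return datasets whose title contains text and that match all filters."""
    text = text.lower()
    return [
        d for d in datasets
        if text in d["title"].lower()
        and all(d.get(field) == value for field, value in filters.items())
    ]


def facet_counts(results, field):
    """Count how many results fall under each value of a facet field."""
    return Counter(d[field] for d in results)


hits = search(DATASETS, "air")
counts = facet_counts(hits, "country")            # one hit per country
narrowed = search(DATASETS, "air", country="UK")  # user picks the UK facet
```

The facet counts are computed from the current result set, so the user sees how each further filter would narrow the search before applying it.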
16. WP 1: Consolidated Feature Requests – Data Interfacing
FR# | Title | Description | WP7 WP8 WP9
REQ 18 | Publish linked data | Provide mechanisms to export data (e.g. thesauri) to the LOD cloud. Provide metadata schema and metadata as linked data. | x x
REQ 19 | Access data | Provide mechanisms to access or download data: SPARQL endpoint, APIs and download of datasets and metadata. | x x x
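Access through a SPARQL endpoint (REQ 19) typically means sending the query text in the `query` parameter of an HTTP request, as defined by the SPARQL protocol. The Python sketch below only builds the request URL and makes no network call; the endpoint address `http://example.org/sparql` and the function name `sparql_get_url` are assumptions for illustration.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def sparql_get_url(endpoint, query):
    """Build a GET request URL that carries the query in the `query` parameter."""
    return endpoint + "?" + urlencode({"query": query})


# Generic query that lists ten arbitrary triples.
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"

url = sparql_get_url("http://example.org/sparql", QUERY)

# Decoding the URL again recovers the original query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
```

URL-encoding the query is the step most often forgotten when calling an endpoint by hand, since SPARQL uses characters such as `?`, `{` and spaces that are not safe in a URL.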
17. WP 1: Consolidated Feature Requests – Services
FR# | Title | Description | WP7 WP8 WP9
REQ 20 | Create data models | Provide mechanisms to create taxonomies, thesauri or ontologies for establishing a metadata structure. Establish a consistent, flexible metadata management. | x x x
REQ 21 | Quality assessment | Provide mechanisms to check semantic consistency, validity and representational quality of datasets/documents and meta data (e.g. thesauri). | x
REQ 22 | LOD monitoring | Provide mechanisms to monitor usage and changes to data. Watch list/monitoring for datasets, bundles and search arrows. | x x
REQ 23 | Version tracking | Provide mechanisms to track changes (version) and usage (history) of datasets. Logging of changes to metadata and to datasets, versioning of metadata and datasets. | x x
REQ 24 | Visualise data | Provide mechanisms for visualizing data models and data structures. | x x x
REQ 06 | Store data | Store data to the repository (new data, changes etc.). | x x x
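One kind of consistency check in the sense of REQ 21 is to verify the hierarchy of a thesaurus created under REQ 20. A minimal Python sketch, assuming the thesaurus is reduced to a mapping from each concept to its broader concept; the sample concepts and the function names are hypothetical and do not describe a specific LOD2 tool.

```python
# Hypothetical thesaurus: concept -> broader concept (None marks a top concept).
BROADER = {
    "law": None,
    "labour law": "law",
    "dismissal": "labour law",
    "tenancy": "civil law",  # "civil law" itself is not defined
}


def undefined_broader(broader):
    """Return concepts whose broader concept is missing from the thesaurus."""
    return sorted(
        concept for concept, parent in broader.items()
        if parent is not None and parent not in broader
    )


def has_cycle(broader):
    """Return True if following broader links from any concept loops back."""
    for start in broader:
        seen, current = set(), start
        while current is not None and current in broader:
            if current in seen:
                return True
            seen.add(current)
            current = broader[current]
    return False


dangling = undefined_broader(BROADER)
cyclic = has_cycle(BROADER)
```

Both checks are structural only; assessing whether a broader relation is semantically appropriate still requires editorial review.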
18. WP 1: D1.1 Common Requirements
Can be found at:
• http://svn.aksw.org/lod2/D1.1
• https://grips.punkt.at/pages/viewpageattachments.action?pageId=21891742
Contact:
• Tassilo Pellegrini (t.pellegrini@semantic-web.at)
• Helmut Nagy (h.nagy@semantic-web.at)
19. Thank you for your attention!
Editor's notes
UCS 7.1 - Content Acquisition: This use case scenario deals with all aspects relevant to identifying, selecting, collecting and approving relevant content for further processing.
UCS 7.2 - Content Enrichment and Composition: This use case scenario deals with the transformation of data formats and integration of data, the precise enrichment of available documents with Linked (Meta)Data (i.e. structured tagging) as well as the composition of new documents or content products by utilizing Linked Data principles.
UCS 7.3 - Contextualisation and Cloud-Publishing: This use case scenario deals with the engineering of domain-specific knowledge models (thesauri) and the de-referencing of their concepts through LOD Cloud sources. Additionally, all aspects of publishing data to the LOD Cloud are handled here.
UCS 7.4 - Enterprise Applications and Customer Products: This use case scenario deals with all end-user specific applications utilizing Linked Data to support internal workflows (enterprise applications) or to power services or new products for third parties (customer products).
UCS 7.5 - Service Innovation: This use case scenario deals with any additional aspects relevant for service innovation based on Linked Data technology. It is meant to collect ideas for further product & service diversification based on Linked Data principles.
UCS 8.1 - Data Acquisition: Data acquisition is a combination of two acquisition processes: internal data collected from inside the enterprise information system frontier and external data coming from the web.
Internal content acquisition: The internal acquisition use case scenario consists of the deployment of various enterprise components to collect and extract content from the different ERP applications of the enterprise IT system. Specific interactions are required according to the formats and protocols of the different ERP applications. For our use case, the acquisition will focus on all the systems that provide data on the human resources of a company. Different systems could contain this information, ranging from basic file systems hosting Excel sheets with employee information to complex modules such as HR ERP systems.
External content acquisition: The external content acquisition use case scenario is defined as the process of fetching data from the web. The targeted data is to be extracted from sources of three types:
• Structured sources, where data is usually retrieved by APIs and/or SQL- or SPARQL-like queries. The LOD cloud is an example of these sources, where data is served by API requests (REST for example), SPARQL queries from SPARQL endpoints or directly by fetching structured files (like CSV).
• Semi-structured sources, where data can be obtained using extraction rules like XPath queries from websites having a uniform presentation structure (e-commerce or news web sites for example).
• Unstructured sources, which include most web pages, where data has to be extracted from free text, media content, etc.
In the formal hiring use case, several external sources can be targeted, starting with job opportunity web sites like http://monster.com. Gathering and refreshing candidate profiles from web sites like LinkedIn would also provide interesting input to the application.
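The difference between structured and semi-structured sources in UCS 8.1 can be shown with a short Python sketch using only the standard library. It assumes a CSV file and a well-formed XHTML page that list the same job offers; the sample data, the CSS class names and the function names are hypothetical, and real web pages are often not well-formed XML, so they would need a more tolerant HTML parser.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical structured source: a CSV file with named columns.
CSV_TEXT = "title,location\nData engineer,Paris\nLegal clerk,Vienna\n"

# Hypothetical semi-structured source: a page with a uniform layout.
XHTML_TEXT = """
<html><body>
  <div class="job"><h2>Data engineer</h2><span class="loc">Paris</span></div>
  <div class="job"><h2>Legal clerk</h2><span class="loc">Vienna</span></div>
</body></html>
"""


def from_csv(text):
    """Structured source: every row already has named fields."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def from_markup(text):
    """Semi-structured source: one XPath-style extraction rule per field."""
    root = ET.fromstring(text.strip())
    jobs = []
    for node in root.findall(".//div[@class='job']"):
        jobs.append({
            "title": node.findtext("h2"),
            "location": node.findtext("span[@class='loc']"),
        })
    return jobs


csv_jobs = from_csv(CSV_TEXT)
page_jobs = from_markup(XHTML_TEXT)
```

Both functions return the same list of records, but the second one depends on the page layout and breaks when the site changes its markup, which is why semi-structured sources need maintained extraction rules.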
UCS 8.2 - Content LODification and integration: The integration and LODification of the content includes the set of tools and processes that edit, filter, clean, transform, enrich and interlink the acquired content. Additional knowledge, like ontologies and taxonomies, can be required to formalize and guide this process. Clustering and classification techniques are additional steps that could be used to increase the efficiency of the integration process. This would be achieved by clustering data into logical units or relating it directly to concepts, to better organise data and prune abnormal patterns of data.
UCS 8.3 - Service To Consumer (S2C): Service to consumer is the part that prepares the mashed data from the previous process to be finalized. By finalized we mean bundled, published and ready to be consumed by final users or third-tier applications. The finalization of the mashed data includes the following features:
• The accessibility and security strategy that defines and grants the rights of viewing, using and consuming the data.
• Bundling policies, where various export and presentation formats are proposed to target different uses.
• A search service to browse data using queries.
The service functionality in the integration of LOD data into a corporate application is the most important feature in the process we are describing. The benefit of mixing and mashing data can mainly be measured by the quality of the service we provide to the end users. This service relies first on an interface that offers widgets for visualising different mashed data. In the hiring use case, widgets will provide the ability to browse the used taxonomies with the corresponding profiles of candidates, the job opportunities and the matching candidates. In addition to this interface, a service for exporting the mashed data will be provided.
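The transformation step of UCS 8.2 can be sketched as turning a flat record into RDF triples in N-Triples syntax. This Python example assumes a hypothetical candidate record and the placeholder namespaces `http://example.org/candidate/` and `http://example.org/vocab#`; it writes every value as a plain string literal and does not interlink with other datasets.

```python
def literal(value):
    """Serialise a string as an N-Triples literal with basic escaping."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return '"' + escaped + '"'


def to_ntriples(subject_uri, record, predicate_base):
    """Turn a flat record into one N-Triples line per field."""
    lines = []
    for field, value in sorted(record.items()):
        lines.append(
            "<%s> <%s%s> %s ." % (subject_uri, predicate_base, field, literal(value))
        )
    return lines


# Hypothetical candidate record from the hiring use case.
triples = to_ntriples(
    "http://example.org/candidate/42",
    {"name": "A. Example", "skill": "SPARQL"},
    "http://example.org/vocab#",
)
```

In practice the ad hoc predicates would be replaced by terms from an agreed vocabulary, and literal values such as skills would be linked to concepts from a taxonomy rather than kept as strings.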
UCS 8.4 - Monetization and sales: This use case scenario defines the business exploitation strategy of the final content and is discussed for reasons of completeness (like UCS 7.5 of the Media & Publishing use case, see above). This implies the setup of quality and support structures. This is beyond the scope of the project but shall be dealt with from a theoretical perspective.
UCS 9.1 - Data Harvesting from External Sources: This use case scenario deals with all aspects relevant for collecting data from several catalogues and data portals in Europe (regional, national, private).
UCS 9.2 - Upload of Datasets: This use case deals with all aspects relevant for the upload of individual datasets or bulk upload of datasets directly into publicdata.eu.
UCS 9.3 - Data Curation, bundling, rating and commenting: This use case deals with all aspects of the enrichment and maintenance of datasets, supporting crowd-sourcing mechanisms to enable a flourishing community of data re-users, curators and publishers.
UCS 9.4 - Search and Browse Data: This use case deals with all aspects of establishing easy-to-use search and browse mechanisms, making it easier for people to find datasets they are looking for and datasets that they might be interested in.
UCS 9.5 - Download and Interfaces: This use case deals with all aspects of availability of data for consumption in several formats and via several interfaces.
UCS 9.6 - Portal & Future Services: This use case deals with establishing general data portal services (help etc.) and creating new services based on existing infrastructure.
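Harvesting from several catalogues (UCS 9.1) leads to the same dataset being listed more than once. The Python sketch below shows one simple way to federate such records, assuming each harvested record carries a `url` and a `title` and that the URL identifies the dataset; the function `federate`, the catalogue names and the sample records are hypothetical.

```python
def federate(*catalogues):
    """Merge harvested records by dataset URL and note which catalogues list each one."""
    merged = {}
    for catalogue_name, records in catalogues:
        for record in records:
            # The first catalogue that lists a dataset determines its title.
            entry = merged.setdefault(
                record["url"], {"title": record["title"], "sources": []}
            )
            entry["sources"].append(catalogue_name)
    return merged


# Hypothetical harvest results from two catalogues.
national = ("national", [
    {"url": "http://example.org/data/spending", "title": "Public spending"},
    {"url": "http://example.org/data/air", "title": "Air quality"},
])
regional = ("regional", [
    {"url": "http://example.org/data/air", "title": "Air quality (regional copy)"},
])

index = federate(national, regional)
```

Keeping the list of source catalogues per dataset preserves provenance, which matters when the same dataset is described differently from registry to registry.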