2. About me
• Drupal developer, trainer and consultant
• Founding member of Drupal Romania
Association
3. The Semantic Web
• Tim Berners-Lee:
‘‘The first step is putting data on the
Web in a form that machines can
naturally understand, or converting
it to that form. This creates what I
call a Semantic Web – a Web of data
that can be processed directly or
indirectly by machines.’’
4. What’s the hype?
• Most organizations need to organize, analyze and
relate huge amounts of unstructured, scattered
textual data
• Examples:
• keyword extraction from content: annotate
abstracts
• text categorization: organize big volumes of text
based on a thesaurus
• media monitoring of tags: occurrences of a specific
keyword on social media channels
6. Linked data
• Project started in 2007
• Aimed at building the Web of Data by:
• identifying open access data sets
• converting them into RDF
vocabularies
• publishing them as open access data
sets
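To make the "converting them into RDF" step concrete, here is a minimal sketch that turns one tabular record into N-Triples, using Dublin Core terms as predicates. The record, the example.org subject URI and the one-column-per-predicate mapping are illustrative assumptions, not part of the Linked Data project's tooling.

```python
# Sketch: serialize one record as RDF in N-Triples syntax.
# The record and the example.org URI below are hypothetical.

DCTERMS = "http://purl.org/dc/terms/"


def escape_literal(value):
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def record_to_ntriples(subject_uri, record):
    """Emit one triple per column, mapping each column to a Dublin Core term."""
    lines = []
    for column, value in sorted(record.items()):
        lines.append('<%s> <%s%s> "%s" .' % (
            subject_uri, DCTERMS, column, escape_literal(value)))
    return lines


triples = record_to_ntriples(
    "http://example.org/dataset/42",
    {"title": "Air quality 2012", "description": "Hourly readings"},
)
print("\n".join(triples))
```

A real conversion would normally rely on an RDF library and a vocabulary chosen for the data set; the point here is only that each cell becomes a subject, predicate, object statement.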
7. Linked data ecosystem
• Linked Open Vocabularies (LOV):
http://lov.okfn.org/dataset/lov/
• Provides a conceptual map of the
vocabularies
• Various providers: libraries,
governmental actors, NGOs
8. Linked data ecosystem
• Where to find other data sets?
• http://www.w3.org/2001/sw/wiki/SKOS/Datasets
• Swoogle: http://swoogle.umbc.edu/
• PoolParty: http://vocabulary.semantic-web.at
10. Semantic annotation
• Creates specific metadata that enable
new ways to retrieve and aggregate
information
• Annotations are done based on a
conceptual scheme, an ontology (ex.
FOAF, DC Core)
• For more on ontologies see:
http://www.w3.org/wiki/Good_Ontologies
• The annotations build semantic
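As an illustration of annotating against an ontology, the sketch below describes a page and its author with FOAF and Dublin Core terms, written as a JSON-LD document. The example.org URIs, the title and the person are hypothetical; only the two vocabulary namespaces are real. The `expand` helper is a simplified stand-in for what a JSON-LD processor does with prefixed terms.

```python
import json

# Hypothetical annotation: a page and its author, described with
# Dublin Core terms and FOAF. The example.org URIs are placeholders.
annotation = {
    "@context": {
        "foaf": "http://xmlns.com/foaf/0.1/",
        "dcterms": "http://purl.org/dc/terms/",
    },
    "@id": "http://example.org/node/1",
    "dcterms:title": "Linked data with Drupal",
    "dcterms:creator": {
        "@id": "http://example.org/person/jane",
        "@type": "foaf:Person",
        "foaf:name": "Jane Doe",
    },
}


def expand(term, context):
    """Expand a prefixed term such as 'foaf:name' into a full URI."""
    prefix, _, local = term.partition(":")
    return context[prefix] + local if prefix in context else term


print(json.dumps(annotation, indent=2))
print(expand("foaf:name", annotation["@context"]))
```

Because every property resolves to a URI in a shared vocabulary, another system can interpret the annotation without knowing anything about the site that produced it.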
11. Semantic annotation
• Most common uses:
• Named Entity Linking: limited to
recognizing entities of type person,
organization, place (e.g. OpenCalais)
• Entityhub Linking: annotation based on
vocabularies with no limitations of
entity types. Requires more natural
language processing prior to annotation.
12. Apache Stanbol on the fly
• Here comes Apache Stanbol
• A new approach:
• modular semantic analysis of documents
• processing components can be built for
virtually any language
• flexible workflows via semantic annotation
chains
• any vocabulary (Linked Data, custom) can be
used
13. Service oriented
architecture
• Stanbol is designed to offer service oriented
integration
• RESTful web services API returning RDF or
JSON/JSON-LD
• Each component exposes an endpoint
independently
• OSGi (Open Services Gateway initiative) compliant
via Apache Felix and Apache Sling
• Remote component management
14. Implementation
• OSGi layer: Apache Felix and Apache Sling
• Build environment: Apache Maven
• RDF framework: Apache Clerezza
• Triple store, reasoning engine: Apache Jena
• Indexing and semantic search: Apache Solr
• Content analysis/metadata extraction: Apache
Tika
• Natural language processing: Apache OpenNLP
17. Content enhancement
• Examples:
• retrieve additional metadata for a piece of
content
• identify the language of a text
• extract entities (persons, places, organizations)
• create annotations to external sources
• use third-party services for named entity
recognition
18. Drupal meets Stanbol
• Several modules implement RDF
support, so data can be sent to
Stanbol for semantic annotation
• Taxonomy system allows for complex
annotation
• Fieldable taxonomy terms allow for
storage of complex semantic data
19. User scenarios
• Semantic indexing via Stanbol (Solr
yard)
• Content enrichment with semantically
related information (documents,
factual data, images etc.)
• Tag as you type: dynamic annotation
of text in editors
20. How it works
• POST request sends content via REST API
• content is processed by an enhancement chain
• returns JSON-LD, RDF/XML, RDF/JSON, etc.
• JSON-LD (JavaScript Object Notation for Linked
Data): a simple, human-readable linked data
transport format
• for best results an enhancement chain should do
language detection, tokenization and POS tagging
prior to performing semantic annotation
• http://stanbol-yle.jelastic.planeetta.net/demo/enhancer
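The request side of this flow can be sketched as follows. The snippet only builds the POST request and does not send it. Assumptions: a Stanbol instance at http://localhost:8080, the enhancer exposed under `/enhancer` with named chains under `/enhancer/chain/<name>`, and an `Accept: application/json` header yielding JSON-LD. Check these against the Stanbol version in use. The chain name `semweb` in the usage note is hypothetical.

```python
import urllib.request


def build_enhancer_request(base_url, text, chain=None, accept="application/json"):
    """Build (but do not send) a POST request carrying plain text to the enhancer.

    base_url and chain are assumptions about the local setup; the Accept
    header selects the serialization of the returned annotations.
    """
    path = "/enhancer" if chain is None else "/enhancer/chain/" + chain
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=text.encode("utf-8"),
        headers={
            "Content-Type": "text/plain; charset=utf-8",
            "Accept": accept,
        },
        method="POST",
    )


request = build_enhancer_request(
    "http://localhost:8080",
    "Apache Stanbol annotates Drupal content with linked data.",
)
print(request.get_method(), request.full_url)

# To send it against a running instance:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

Passing `chain="semweb"` would target a custom chain instead of the default one, and `accept="application/rdf+xml"` would ask for RDF/XML.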
22. Drupal distribution: IKS
CE
• IKS CE distribution - Wolfgang Ziegler (fago),
Stéphane Corlosquet (scor)
• Components:
• Search API Stanbol
• VIE.js - semantic annotation UI
• https://drupal.org/project/iksce
• http://drupal.org/project/vie
• http://drupal.org/project/search_api_stanbol
• https://github.com/fago/stanbol-for-drupal
23. Search API Stanbol
• enables the indexing of Drupal
entities such as nodes, users,
taxonomy terms, files, etc. in Stanbol
EntityHub.
• data sent as RDF
• data can be mashed up with data from
other sources (Managed Sites, Remote
Sites)
24. VIE.js
• “Vienna IKS Editables”
• JavaScript library for
implementing decoupled Content
Management Systems and semantic
interaction in web applications.
25. Monolithic vs Decoupled
Content Management Systems
• Monolithic vs Decoupled Content
Management Systems
source: Henri Bergius - http://bergie.iki.fi
26. Demo setup
• we store Drupal entities in a Solr index
• annotations are to be made based on:
• DBpedia - bundled with Apache Stanbol
• a custom vocabulary of terms related to
semantic web - Social Semantic Web
Thesaurus
• SemWeb is imported as a Solr index
into Apache Stanbol
27. Custom vocabularies
• PoolParty Semantic Web
• 224 concepts related to semantic web
• Author: Andreas Blumauer
• http://vocabulary.semantic-web.at/PoolPartySemanticWeb.html
• http://vocabulary.semantic-web.at/PoolPartySemanticWeb/Drupal.html
28. Demo
• index Drupal entities in Apache Stanbol
• retrieve annotated entities via REST API
• annotate entities using DBpedia and
SemWeb indexes
• edit Drupal entities and annotate on the
fly
• retrieve linked data tag recommendations
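For the "retrieve via REST API" step, a minimal sketch of building an Entityhub lookup URL is shown below. It assumes a Stanbol instance at http://localhost:8080 and a referenced site named `dbpedia` reachable under `/entityhub/site/<site>/entity?id=<uri>`; both the host and the site name depend on the installation, and the DBpedia resource URI is only an example.

```python
from urllib.parse import urlencode


def entity_lookup_url(base_url, site, entity_uri):
    """Build the URL that asks a referenced site for one entity by its URI.

    The path layout is an assumption about the Entityhub REST endpoints;
    the entity URI is percent-encoded into the 'id' query parameter.
    """
    query = urlencode({"id": entity_uri})
    return "%s/entityhub/site/%s/entity?%s" % (base_url.rstrip("/"), site, query)


url = entity_lookup_url(
    "http://localhost:8080",
    "dbpedia",
    "http://dbpedia.org/resource/Drupal",
)
print(url)
```

A GET on that URL with an `Accept` header such as `application/rdf+xml` would then return the stored description of the entity, if the site knows it.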