Gabriel Dragomir

Drupal and Apache Stanbol
SEMANTIC ANNOTATION WITH CUSTOM VOCABULARIES
About me

• Drupal developer, trainer and consultant
• Founding member of Drupal Romania Association
The Semantic Web
• Tim Berners-Lee:
“The first step is putting data on the Web in a form that machines can naturally understand, or converting it to that form. This creates what I call a Semantic Web – a Web of data that can be processed directly or indirectly by machines.”
What’s the hype?
• Most organizations need to organize, analyze and relate huge amounts of scattered, unstructured textual data
• Examples:
  • keyword extraction from content: annotate abstracts
  • text categorization: organize big volumes of text based on a thesaurus
  • media monitoring of tags: occurrences of a specific keyword on social media channels
Linked data

http://lod-cloud.net/
Linked data
• Project started in 2007
• Aimed at building the Web of Data by:
  • identifying open access data sets
  • converting them into RDF vocabularies
  • publishing them as open access data sets
Linked data ecosystem
• Linked Open Vocabularies (LOV): http://lov.okfn.org/dataset/lov/
• Provides a conceptual map of the vocabularies
• Various providers: libraries, governmental actors, NGOs
Linked data ecosystem
• Where to find other data sets?
  • http://www.w3.org/2001/sw/wiki/SKOS/Datasets
  • Swoogle: http://swoogle.umbc.edu/
  • PoolParty: http://vocabulary.semantic-web.at
Linked data at work!
Semantic annotation
• Creates specific metadata that enable new ways to retrieve and aggregate information
• Annotations are based on a conceptual scheme, an ontology (e.g. FOAF, Dublin Core)
• For more on ontologies see: http://www.w3.org/wiki/Good_Ontologies

• The annotations build semantic
Semantic annotation
• Most common uses:
  • Named Entity Linking: limited to recognizing entities of type person, organization or place (e.g. OpenCalais)
  • Entityhub Linking: annotation based on vocabularies, with no limitation on entity types. Requires more natural language processing prior to annotation.
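To make the idea of vocabulary-based linking concrete, here is a toy sketch in Python. It is not how Stanbol implements Entityhub Linking (which relies on tokenization, POS tagging and an indexed vocabulary); it only matches labels from a small dictionary against a text. The vocabulary, its `example.org` URIs and the function name `link_entities` are made up for illustration.

```python
import re

# Hypothetical mini-vocabulary: label -> concept URI (illustrative only).
VOCABULARY = {
    "linked data": "http://example.org/vocab/LinkedData",
    "drupal": "http://example.org/vocab/Drupal",
    "rdf": "http://example.org/vocab/RDF",
}


def link_entities(text, vocabulary):
    """Return (start, end, matched_text, uri) for each vocabulary label found."""
    matches = []
    for label, uri in vocabulary.items():
        # Whole-word, case-insensitive match of the label.
        pattern = re.compile(r"\b" + re.escape(label) + r"\b", re.IGNORECASE)
        for m in pattern.finditer(text):
            matches.append((m.start(), m.end(), m.group(0), uri))
    # Sort by position in the text.
    return sorted(matches)


annotations = link_entities("Drupal can publish Linked Data as RDF.", VOCABULARY)
for start, end, surface, uri in annotations:
    print(f"{start}-{end} {surface!r} -> {uri}")
```

Because the lookup is driven entirely by the dictionary, any concept type can be annotated, which is the point of the vocabulary-based approach. A plain string match like this cannot disambiguate or handle inflected forms, which is why real linking needs the extra language processing mentioned above.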
Apache Stanbol on the fly
• Here comes Apache Stanbol
• A new approach:
  • modular semantic analysis of documents
  • processing components can be built for virtually any language
  • flexible workflows via semantic annotation chains
  • any vocabulary (Linked Data, custom) can be used
Service-oriented architecture
• Stanbol is designed to offer service-oriented integration
• RESTful web services API returning RDF or JSON/JSON-LD
• Each component exposes an endpoint independently
• OSGi (Open Services Gateway initiative) compliant via Apache Felix and Apache Sling
• Remote component management
Implementation
• OSGi layer: Apache Felix and Apache Sling
• Build environment: Apache Maven
• RDF framework: Apache Clerezza
• Triple store, reasoning engine: Apache Jena
• Indexing and semantic search: Apache Solr
• Content analysis/metadata extraction: Apache Tika
• Natural language processing: Apache OpenNLP
Architecture
Components
• Semantic layer:
• Enhancer, EntityHub, ContentHub
• Enhancement engines: internal, 3rd party
• User interfaces
• Knowledge integration (rule sets, reasoners)

• Storage integration
Content enhancement
• Examples:
  • retrieve additional metadata for a piece of content
  • identify the language of a text
  • extract entities (persons, places, organizations)
  • create annotations to external sources
  • use 3rd party services for named entity recognition
Drupal meets Stanbol
• Several modules implement RDF support, allowing data to be sent to Stanbol for semantic annotation
• Taxonomy system allows for complex annotation
• Fieldable taxonomy terms allow for storage of complex semantic data
User scenarios
• Semantic indexing via Stanbol (Solr yard)
• Content enrichment with semantically related information (documents, factual data, images etc.)
• Tag as you type: dynamic annotation of text in editors
How it works
• A POST request sends content via the REST API
• Content is processed by an enhancement chain
• Returns JSON-LD, RDF/XML, RDF/JSON etc.
• JSON-LD (JavaScript Object Notation for Linked Data): a simple, human-readable linked data transport format
• For best results, an enhancement chain should do language detection, tokenization and POS tagging prior to performing semantic annotation
• http://stanbol-yle.jelastic.planeetta.net/demo/enhancer
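The request described above can be sketched with Python's standard library. This is a minimal sketch under assumptions: a Stanbol launcher running locally at `http://localhost:8080`, the default chain served at `/enhancer`, named chains at `/enhancer/chain/<name>`, and `application/json` as the Accept type for a JSON-LD response. The helper name `build_enhancer_request` and the chain name used when calling it are hypothetical, so check them against your own installation.

```python
import urllib.request

# Assumed base URL of a locally running Stanbol launcher; adjust to your setup.
STANBOL_URL = "http://localhost:8080"


def build_enhancer_request(text, base_url=STANBOL_URL, chain=None,
                           accept="application/json"):
    """Build (but do not send) a POST request for the Stanbol Enhancer."""
    # Default chain is assumed at /enhancer, a named one at /enhancer/chain/<name>.
    path = "/enhancer" if chain is None else f"/enhancer/chain/{chain}"
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=text.encode("utf-8"),  # the content to annotate, as plain text
        headers={"Content-Type": "text/plain; charset=utf-8", "Accept": accept},
        method="POST",
    )


request = build_enhancer_request("Paris is the capital of France.")
print(request.get_method(), request.full_url)

# To actually send it (requires a running Stanbol instance):
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

Changing the `accept` argument (for example to `application/rdf+xml`) is how a client would ask for one of the other serializations listed above.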
Drupal integration

Source: blog.iks-project.eu
Drupal distribution: IKS CE
• IKS CE distribution - Wolfgang Ziegler (fago), Stéphane Corlosquet (scor)

• Components:
• Search API Stanbol
• VIE.js - semantic annotation UI
• https://drupal.org/project/iksce
• http://drupal.org/project/vie
• http://drupal.org/project/search_api_stanbol
• https://github.com/fago/stanbol-for-drupal
Search API Stanbol
• enables the indexing of Drupal entities such as nodes, users, taxonomy terms, files, etc. in Stanbol EntityHub.
• data sent as RDF
• data can be mashed up with data from other sources (Managed Sites, Remote Sites)
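As an illustration of what "data sent as RDF" can look like, the sketch below serializes a node-like record as N-Triples using only the Python standard library. It does not reproduce the mapping that Search API Stanbol actually uses; the subject URI, the choice of Dublin Core predicates and the function names are assumptions made for the example.

```python
def escape_literal(value):
    """Escape a string for use as an N-Triples literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def entity_to_ntriples(subject_uri, properties):
    """Serialize {predicate URI: string value} pairs as N-Triples lines."""
    lines = []
    for predicate, value in properties.items():
        lines.append(f'<{subject_uri}> <{predicate}> "{escape_literal(value)}" .')
    return "\n".join(lines)


# Hypothetical node; the URIs and field mapping are illustrative only.
node = {
    "http://purl.org/dc/terms/title": 'Semantic "annotation"',
    "http://purl.org/dc/terms/description": "A page about Apache Stanbol.",
}
rdf = entity_to_ntriples("http://example.org/node/1", node)
print(rdf)
```

Each field becomes one subject-predicate-object statement, which is what lets the indexed entities be combined with statements about the same URIs from other sources.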
VIE.js
• “Vienna IKS Editables”
• JavaScript library for implementing decoupled Content Management Systems and semantic interaction in web applications.
Monolithic vs Decoupled Content Management Systems

source: Henri Bergius - http://bergie.iki.fi
Demo setup
• we store Drupal entities in a Solr index
• annotations are to be made based on:
  • DBpedia - bundled with Apache Stanbol
  • a custom vocabulary of terms related to the semantic web - Social Semantic Web Thesaurus
• SemWeb is imported as a Solr index into Apache Stanbol
Custom vocabularies
• PoolParty Semantic Web
• 224 concepts related to semantic web
• Author: Andreas Blumauer
• http://vocabulary.semantic-web.at/PoolPartySemanticWeb.html
• http://vocabulary.semantic-web.at/PoolPartySemanticWeb/Drupal.html
Demo
• index Drupal entities in Apache Stanbol
• retrieve annotated entities via REST API
• annotate entities using dbpedia and semweb indexes
• edit Drupal entities and annotate on the fly
• retrieve linked data tag recommendations
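Retrieving an entity over REST can be sketched as follows. The sketch assumes the Entityhub exposes referenced sites under `/entityhub/site/<site>/entity?id=<entity URI>` on a local Stanbol instance, and that the DBpedia index is registered under the site id `dbpedia`. Both are assumptions to verify against your installation, and the helper name is hypothetical.

```python
import urllib.parse


def entityhub_entity_url(base_url, site, entity_uri):
    """Build the lookup URL for one entity of a referenced site."""
    # The entity URI goes into the query string, so it must be percent-encoded.
    query = urllib.parse.urlencode({"id": entity_uri})
    return f"{base_url.rstrip('/')}/entityhub/site/{site}/entity?{query}"


url = entityhub_entity_url(
    "http://localhost:8080",              # assumed local Stanbol instance
    "dbpedia",                            # assumed site id of the DBpedia index
    "http://dbpedia.org/resource/Paris",
)
print(url)

# To fetch it (requires a running Stanbol instance):
# import urllib.request
# req = urllib.request.Request(url, headers={"Accept": "application/json"})
# with urllib.request.urlopen(req) as response:
#     print(response.read().decode("utf-8"))
```

The same function would address a custom vocabulary by passing its site id instead of `dbpedia`, for instance whatever id the semweb index was given when it was imported.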
Questions?
Contact me

• gabriel.dragomir@webikon.com
• twitter: gabidrg
Thank you!

Drupal Semantic Annotation with Custom Vocabularies

  • 1. Gabriel Dragomir Drupal and Apache Stanbol SEMANTIC ANNOTATION WITH CUSTOM VOCABULARIES
  • 2. About me • Drupal developer, trainer and consultant • Founding member of Drupal Romania Association
  • 3. The Semantic Web • Tim Berners-Lee: “The first step is putting data on the Web in a form that machines can naturally understand, or converting it to that form. This creates what I call a Semantic Web – a Web of data that can be processed directly or indirectly by machines.”
  • 4. What’s the hype? • Most organizations need to organize/analyze/relate huge amounts of textual, unstructured, scattered data • Examples: • keyword extraction from content: annotate abstracts • text categorization: organize big volumes of text based on a thesaurus • media monitoring of tags: occurrences of a specific keyword on social media channels
  • 6. Linked data • Project started in 2007 • Aimed at building the Web of Data by: • identifying open access data sets • converting them into RDF vocabularies • publishing them as open access data sets
  • 7. Linked data ecosystem • Linked Open Vocabularies (LOV): http://lov.okfn.org/dataset/lov/ • Provides a conceptual map of the vocabularies • Various providers: libraries, governmental actors, NGOs
  • 8. Linked data ecosystem • Where to find other data sets? • http://www.w3.org/2001/sw/wiki/SKOS/Datasets • Swoogle: http://swoogle.umbc.edu/ • PoolParty: http://vocabulary.semantic-web.at
  • 10. Semantic annotation • Creates specific metadata that enable new ways to retrieve and aggregate information • Annotations are done based on a conceptual scheme, an ontology (e.g. FOAF, Dublin Core) • For more on ontologies see: http://www.w3.org/wiki/Good_Ontologies • The annotations build semantic
  • 11. Semantic annotation • Most common uses: • Named Entity Linking: limited to recognizing entities of type person, organization, place (e.g. OpenCalais) • Entityhub Linking: annotation based on vocabularies with no limitations of entity types. Requires more natural language processing prior to annotation.
  • 12. Apache Stanbol on the fly • Here comes Apache Stanbol • A new approach: • modular semantic analysis of documents • processing components can be built for virtually any language • flexible workflows via semantic annotation chains • any vocabulary (Linked Data, custom) can be used
  • 13. Service oriented architecture • Stanbol is designed to offer service oriented integration • RESTful web services API returning RDF or JSON/JSON-LD • Each component exposes an endpoint independently • OSGi (Open Services Gateway initiative) compliant via Apache Felix and Apache Sling • Remote component management
  • 14. Implementation • OSGi layer: Apache Felix and Apache Sling • Build environment: Apache Maven • RDF framework: Apache Clerezza • Triple store, reasoning engine: Apache Jena • Indexing and semantic search: Apache Solr • Content analysis/metadata extraction: Apache Tika • Natural language processing: Apache OpenNLP
  • 16. Components • Semantic layer: • Enhancer, EntityHub, ContentHub • Enhancement engines: internal, 3rd party • User interfaces • Knowledge integration (rule sets, reasoners) • Storage integration
  • 17. Content enhancement • Examples: • retrieve additional metadata for a piece of content • identify the language of a text • extract entities (persons, places, organizations) • create annotations to external sources • use 3rd party services for named entities recognition
  • 18. Drupal meets Stanbol • Several modules implement RDF support, allowing data to be sent to Stanbol for semantic annotation • Taxonomy system allows for complex annotation • Fieldable taxonomy terms allow for storage of complex semantic data
  • 19. User scenarios • Semantic indexing via Stanbol (SOLR yard) • Content enrichment with semantically related information (documents, factual data, images etc.) • Tag as you type: dynamic annotation of text in editors
  • 20. How it works • POST request sends content via REST API • content is processed by an enhancement chain • Returns JSON-LD, RDF/XML, RDF/JSON etc. • JSON-LD (JavaScript Object Notation for Linked Data): a simple, human-readable linked data transport format • for best results an enhancement chain should do language detection, tokenization and POS tagging prior to performing semantic annotation • http://stanbol-yle.jelastic.planeetta.net/demo/enhancer
  • 22. Drupal distribution: IKS CE • IKS CE distribution - Wolfgang Ziegler (fago), Stéphane Corlosquet (scor) • Components: • Search API Stanbol • VIE.js - semantic annotation UI • https://drupal.org/project/iksce • http://drupal.org/project/vie • http://drupal.org/project/search_api_stanbol • https://github.com/fago/stanbol-for-drupal
  • 23. Search API Stanbol • enables the indexing of Drupal entities such as nodes, users, taxonomy terms, files, etc. in Stanbol EntityHub. • data sent as RDF • data can be mashed up with data from other sources (Managed Sites, Remote Sites)
  • 24. VIE.js • “Vienna IKS Editables” • JavaScript library for implementing decoupled Content Management Systems and semantic interaction in web applications.
  • 25. Monolithic vs Decoupled Content Management Systems • Source: Henri Bergius - http://bergie.iki.fi
  • 26. Demo setup • we store Drupal entities in a SOLR index • annotations are to be made based on: • DBPedia - bundled with Apache Stanbol • a custom vocabulary of terms related to semantic web - Social Semantic Web Thesaurus • SemWeb is imported as a SOLR index into Apache Stanbol
  • 27. Custom vocabularies • PoolParty Semantic Web • 224 concepts related to semantic web • Author: Andreas Blumauer • http://vocabulary.semantic-web.at/PoolPartySemanticWeb.html • http://vocabulary.semantic-web.at/PoolPartySemanticWeb/Drupal.html
  • 28. Demo • index Drupal entities in Apache Stanbol • retrieve annotated entities via REST API • annotate entities using dbpedia and semweb indexes • edit Drupal entities and annotate on the fly • retrieve linked data tag recommendations
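
The "How it works" slide describes sending content to the Stanbol Enhancer with a POST request and getting JSON-LD back. A minimal Python sketch of that request follows. It is an illustration, not code from the talk: the base URL http://localhost:8080 assumes a locally running Stanbol launcher, the helper names build_enhancer_request and enhance are made up for this example, and "semweb" is a hypothetical chain name standing in for whatever enhancement chain is configured on the server. It also assumes the Enhancer answers an Accept header of application/json with JSON-LD.

```python
import json
import urllib.request

# Assumption: a Stanbol launcher running locally; adjust to your own server.
STANBOL_URL = "http://localhost:8080"


def build_enhancer_request(text, base_url=STANBOL_URL, chain=None,
                           accept="application/json"):
    """Build (but do not send) a POST request for the Stanbol Enhancer.

    With chain=None the default enhancement chain is addressed; otherwise
    the named chain under /enhancer/chain/<name> is used.
    """
    path = "/enhancer" if chain is None else f"/enhancer/chain/{chain}"
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=text.encode("utf-8"),  # plain text content to be annotated
        headers={
            "Content-Type": "text/plain; charset=utf-8",
            "Accept": accept,  # requested serialization of the annotations
        },
        method="POST",
    )


def enhance(text, **kwargs):
    """Send the text to the Enhancer and return the parsed JSON response."""
    request = build_enhancer_request(text, **kwargs)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # "semweb" is a hypothetical chain name used only for illustration.
    req = build_enhancer_request("Drupal meets Apache Stanbol.", chain="semweb")
    print(req.get_method(), req.full_url)
```

Calling enhance("some text", chain="semweb") against a running server would return the annotations as a Python dictionary. Changing the accept argument to application/rdf+xml asks for RDF/XML instead, in which case the response should be read as text rather than parsed as JSON.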