Towards the Integration of a Research Group Website into the Web of Data

Mikel Emaldi, David Buján, Diego López de Ipiña
{m.emaldi, dbujan, dipina}@deusto.es

Deusto Institute of Technology - DeustoTech

November 2011

Table of Contents

1 Motivation

2 Our Solution
First Approach
Solution Overview
Data Extraction
System Architecture

3 Linked Data Extension

4 Conclusions

5 Future Work


Motivation

We want to offer the data of our research group website
(http://www.morelab.deusto.es) as Linked Data
Our website runs on the Joomla! CMS
The data is unstructured
We chose our publications section as a first attempt
Almost 100 publications
Possibility of linking them to external datasets
We saw the opportunity to centralize the group's FOAF files



First Approach

A solution based on a Python web script (mod_python)
The core code of Joomla! had to be modified
This led to a major problem:
Whenever a security update was installed, Joomla! overwrote
our custom code



Solution Overview

Joomla! Extension

A solution based on an extension for Joomla!
Component
Plugin
It offers a feasible way to analyze the published publications
and to generate the corresponding Linked Data


Data Extraction

Joomla! Content Example

TALISMAN+: Intelligent System for Follow-Up and
Promotion of Personal Autonomy
David Ausín, Diego López-de-Ipiña, José Bravo, Miguel Ángel Valero, Francisco Flórez. TALISMAN+:
Intelligent System for Follow-Up and Promotion of Personal Autonomy. III International Workshop on
Ambient Assisted Living - IWAAL 2011. Málaga, Spain. June 2011.

The TALISMAN+ project, financed by the Spanish Ministry of Science and Innovation, aims to research
and demonstrate innovative solutions transferable to society which offer services and products based on
information and communication technologies in order to promote personal autonomy in prevention and
monitoring scenarios. It will solve critical interoperability problems among systems and emerging
technologies in a context where heterogeneity brings about accessibility barriers not yet overcome and
demanded by the scientific, technological or social-health settings.

Download



Data Extraction

Overview

Data is extracted in three ways:
User-defined regular expression
DBLP SPARQL endpoint
Google Scholar search engine


Data Extraction

Regex I

The user defines a regular expression to parse the content
The user has to define the ontologies used and their prefixes in the
admin control panel
The regex tags are easy to read
The ontology properties to be mapped are enclosed in {}
Every delimiter (also the {}) is identified by a
The term {dummy} can be used to ignore content



Data Extraction

Regex II

{dc:creator,sep(,)}. {dc:title}.
{swrc:series}. {swrc:location}.
{dc:date}. {bibo:abstract} Download$
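
To make the tag syntax concrete, here is a minimal Python sketch of how such a template could be turned into an ordinary regular expression and applied to the TALISMAN+ content example. It is not the extension's actual code. The names `compile_template` and `extract` are hypothetical, each tag is assumed to match lazily up to the next literal delimiter, `sep(,)` is assumed to split the captured value on commas, and the abstract in `SAMPLE` is shortened with "[...]".

```python
import re

# A tag such as {dc:title} or {dc:creator,sep(,)}; this grammar is inferred
# from the slide and is an assumption, not the extension's real parser.
TAG = re.compile(r"\{\s*([\w:]+)\s*(?:,\s*sep\((.+?)\))?\s*\}")


def _literal(text):
    """Escape literal template text; any whitespace run matches one or more spaces."""
    pieces = re.split(r"(\s+)", text)
    return "".join(r"\s+" if p.isspace() else re.escape(p) for p in pieces)


def compile_template(template):
    """Return (compiled regex, [(property, separator or None), ...])."""
    anchored = template.endswith("$")
    if anchored:
        template = template[:-1]
    pattern, fields, pos = [], [], 0
    for tag in TAG.finditer(template):
        pattern.append(_literal(template[pos:tag.start()]))
        prop, sep = tag.group(1), tag.group(2)
        if prop == "dummy":
            pattern.append(r"(?:.+?)")  # matched but not captured
        else:
            pattern.append(r"(.+?)")    # lazy capture up to the next delimiter
            fields.append((prop, sep))
        pos = tag.end()
    pattern.append(_literal(template[pos:]))
    if anchored:
        pattern.append(r"\s*$")
    return re.compile("".join(pattern), re.DOTALL), fields


def extract(template, text):
    """Map each tagged property to the text it captured, or return None."""
    regex, fields = compile_template(template)
    match = regex.match(text.strip())
    if match is None:
        return None
    record = {}
    for (prop, sep), value in zip(fields, match.groups()):
        value = value.strip()
        record[prop] = [v.strip() for v in value.split(sep)] if sep else value
    return record


TEMPLATE = ("{dc:creator,sep(,)}. {dc:title}. {swrc:series}. {swrc:location}. "
            "{dc:date}. {bibo:abstract} Download$")

SAMPLE = (
    "David Ausín, Diego López-de-Ipiña, José Bravo, Miguel Ángel Valero, "
    "Francisco Flórez. TALISMAN+: Intelligent System for Follow-Up and "
    "Promotion of Personal Autonomy. III International Workshop on Ambient "
    "Assisted Living - IWAAL 2011. Málaga, Spain. June 2011. "
    "The TALISMAN+ project, financed by the Spanish Ministry of Science and "
    "Innovation, aims to research and demonstrate innovative solutions [...] "
    "Download"
)

record = extract(TEMPLATE, SAMPLE)
```

Lazy matching keeps each field as short as possible, so a title containing ". " would be split too early; a real implementation would need a rule for that case.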



Data Extraction

DBLP I

Digital Bibliography & Library Project
> 1.3 million articles
SPARQL endpoint at:
http://dblp.l3s.de/d2r/sparql/
http://dblp.l3s.de/d2r/snorql/



Data Extraction

DBLP II

The DBLP SPARQL endpoint is used to search for data about
publications
SELECT DISTINCT ?uri ?p ?o WHERE {?uri dc:title
"title-of-article"^^<http://www.w3.org/2001/XMLSchema#string>}

The data is enriched with our own data and saved into the RDF
store
We also link members' FOAF profiles to DBLP author data
<http://www.morelab.deusto.es/resource/dipina> owl:sameAs
<http://dblp.l3s.de/d2r/resource/authors/Diego López-de-Ipiña> ;
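
As an illustration of the title lookup, the following Python sketch builds the query and sends it over plain HTTP using only the standard library. The function names are hypothetical, the extra `?uri ?p ?o` pattern is an assumption added so that `?p` and `?o` are actually bound, and the endpoint is assumed to return the standard SPARQL JSON results format when asked for it.

```python
import json
import urllib.parse
import urllib.request

DBLP_ENDPOINT = "http://dblp.l3s.de/d2r/sparql"  # endpoint URL given in the slides
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


def build_title_query(title):
    """Build a query for every property of the publication with this exact title."""
    # Escape backslashes and double quotes so the title is a valid string literal.
    escaped = title.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "PREFIX dc: <http://purl.org/dc/elements/1.1/>\n"
        "SELECT DISTINCT ?uri ?p ?o WHERE {\n"
        f'  ?uri dc:title "{escaped}"^^<{XSD_STRING}> .\n'
        "  ?uri ?p ?o .\n"  # assumption: binds ?p and ?o for the matched URI
        "}"
    )


def run_query(query, endpoint=DBLP_ENDPOINT, timeout=30):
    """Send the query via HTTP GET and return the list of result bindings."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["results"]["bindings"]


query = build_title_query("title-of-article")
```

Calling `run_query(query)` needs network access and a live endpoint, so it is left out of the example.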



Data Extraction

Google Scholar I

A simple way to broadly search for scholarly literature
http://scholar.google.com
It exports data in different formats
BibTeX
EndNote
RefMan
RefWorks
WenXiangWang



Data Extraction

Google Scholar II

The data from Google Scholar (GS) is extracted via BibTeX scraping:
An HTTP request is sent with a specific cookie to request BibTeX
data
The BibTeX data is retrieved
The BibTeX data is mapped to RDF
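
The last step, mapping BibTeX to RDF, can be sketched in Python as below. This is only an illustration: the field-to-property table, the citation key, the subject URI and the function names are all hypothetical, and the parser assumes one `field = {value}` pair per line, which is how Google Scholar usually formats its entries. The retrieval step itself is not shown.

```python
import re

# Hypothetical BibTeX-field -> RDF-property mapping; the slides do not list
# the mapping the extension really uses.
FIELD_TO_PROPERTY = {
    "title": "http://purl.org/dc/elements/1.1/title",
    "author": "http://purl.org/dc/elements/1.1/creator",
    "year": "http://purl.org/dc/elements/1.1/date",
    "booktitle": "http://swrc.ontoware.org/ontology#series",
}

# One "field = {value}," or 'field = "value",' pair per line.
FIELD_LINE = re.compile(
    r'^[ \t]*(\w+)[ \t]*=[ \t]*[{"](.*)[}"][ \t]*,?[ \t]*$', re.MULTILINE
)


def parse_bibtex(entry):
    """Return the fields of a single BibTeX entry as a dict."""
    return {key.lower(): value.strip() for key, value in FIELD_LINE.findall(entry)}


def _literal(value):
    """Serialize a plain string as an N-Triples literal."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def bibtex_to_ntriples(entry, subject):
    """Map the known fields of a BibTeX entry to N-Triples lines about subject."""
    fields = parse_bibtex(entry)
    triples = []
    for field, prop in FIELD_TO_PROPERTY.items():
        if field not in fields:
            continue
        # BibTeX separates several authors with " and ".
        values = fields[field].split(" and ") if field == "author" else [fields[field]]
        for value in values:
            triples.append(f"<{subject}> <{prop}> {_literal(value.strip())} .")
    return triples


SAMPLE_ENTRY = """@inproceedings{hypothetical2011key,
  title = {TALISMAN+: Intelligent System for Follow-Up and Promotion of Personal Autonomy},
  author = {David Ausín and Diego López-de-Ipiña and José Bravo and Miguel Ángel Valero and Francisco Flórez},
  booktitle = {III International Workshop on Ambient Assisted Living - IWAAL 2011},
  year = {2011}
}"""

triples = bibtex_to_ntriples(SAMPLE_ENTRY, "http://example.org/resource/talisman")
```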



Data Extraction

FOAF

Every member of our group has their own FOAF file
http://www.morelab.deusto.es/resource/member-alias
Every publication is linked to its authors' URIs
<http://www.morelab.deusto.es/resource/imhotep-an-approach-to-user-and-device-conscious-mobile-applications>
dc:creator
<http://www.morelab.deusto.es/resource/dipina>

This is done automatically by looking for the authors' nicknames
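
A minimal sketch of the nickname lookup, in Python, might look as follows. The nickname table, the normalization rule (lowercase, accents and hyphens ignored) and the function names are assumptions made for illustration; only the `dipina` alias and the resource URIs come from the slides.

```python
import unicodedata

RESOURCE_BASE = "http://www.morelab.deusto.es/resource/"
DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"

# Hypothetical nickname table: member alias -> name variants found in
# publication author lists.
MEMBER_NICKNAMES = {
    "dipina": ["Diego López-de-Ipiña", "Diego López de Ipiña", "D. López-de-Ipiña"],
}


def normalize(name):
    """Lowercase a name and drop accents, hyphens and extra whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    unaccented = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(unaccented.lower().replace("-", " ").split())


def link_authors(publication_uri, author_names, nicknames=MEMBER_NICKNAMES):
    """Return dc:creator triples for every author recognised as a group member."""
    index = {normalize(n): alias for alias, names in nicknames.items() for n in names}
    triples = []
    for name in author_names:
        alias = index.get(normalize(name))
        if alias is not None:
            triples.append(
                f"<{publication_uri}> <{DC_CREATOR}> <{RESOURCE_BASE}{alias}> ."
            )
    return triples


publication = RESOURCE_BASE + (
    "imhotep-an-approach-to-user-and-device-conscious-mobile-applications"
)
# The author list below is made up for the example.
links = link_authors(publication, ["Diego Lopez de Ipina", "Some External Author"])
```

Authors who are not in the table are simply skipped, so external co-authors never get a group URI.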



Data Extraction

Flowchart



System Architecture

Overview



System Architecture

Joseki + SDB

Joseki
A SPARQL server for Jena
Stores data in RDF files and relational databases
Supports SPARQL Update
Private to our system
SDB
A component of Jena
It provides:
Scalable storage
Querying of RDF datasets on top of conventional SQL databases
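
To illustrate how extracted triples could be written to the private store, the Python sketch below wraps N-Triples lines in a SPARQL Update `INSERT DATA` request. It assumes an endpoint that accepts updates as a URL-encoded `update` parameter over POST, as described in the SPARQL 1.1 Protocol; the parameter name and path that Joseki expects may differ, and the endpoint URL and the DBLP author URI are hypothetical.

```python
import urllib.parse
import urllib.request

# Hypothetical address; the slides only say that the Joseki server is private.
UPDATE_ENDPOINT = "http://localhost:2020/update/service"


def build_insert_data(ntriples_lines):
    """Wrap N-Triples lines in a SPARQL Update INSERT DATA operation."""
    body = "\n".join("  " + line for line in ntriples_lines)
    return "INSERT DATA {\n" + body + "\n}"


def make_update_request(update, endpoint=UPDATE_ENDPOINT):
    """Prepare, but do not send, a URL-encoded POST carrying the update."""
    data = urllib.parse.urlencode({"update": update}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=data,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


triple = (
    "<http://www.morelab.deusto.es/resource/dipina> "
    "<http://www.w3.org/2002/07/owl#sameAs> "
    "<http://example.org/dblp/author/1> ."  # hypothetical target URI
)
update = build_insert_data([triple])
request = make_update_request(update)
```

Passing `request` to `urllib.request.urlopen` would send it; that step is omitted because it needs a running server.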



System Architecture

Pubby
Pubby adds Linked Data interfaces to SPARQL endpoints
It allows content negotiation among these formats:
HTML
RDF/XML
N3
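
Content negotiation means the server picks a representation from the client's HTTP `Accept` header. The Python sketch below shows the general idea for the three formats listed; it is not Pubby's implementation. The media types used (in particular `text/n3` for N3) and the HTML fallback are assumptions, and wildcards such as `*/*` are deliberately ignored.

```python
# Media types assumed for the three formats named on the slide.
SUPPORTED = {
    "text/html": "HTML",
    "application/rdf+xml": "RDF/XML",
    "text/n3": "N3",
}


def negotiate(accept_header, default="text/html"):
    """Pick the supported media type with the highest q-value in Accept."""
    best_type, best_q = default, 0.0
    for item in accept_header.split(","):
        parts = [part.strip() for part in item.split(";")]
        media_type = parts[0].lower()
        quality = 1.0  # q defaults to 1 when it is not given
        for param in parts[1:]:
            if param.startswith("q="):
                try:
                    quality = float(param[2:])
                except ValueError:
                    quality = 0.0
        # On a tie the type listed first wins, because the comparison is strict.
        if media_type in SUPPORTED and quality > best_q:
            best_type, best_q = media_type, quality
    return best_type


chosen = negotiate("text/html;q=0.5, text/n3;q=0.9")
```

A Linked Data client would send `Accept: application/rdf+xml` for a resource URI, while a browser's header leads to the HTML view.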



System Architecture

Snorql

An AJAXy front-end for exploring RDF SPARQL endpoints
More usable than Joseki
It is MoreLab’s public SPARQL endpoint



Admin Overview

Dataset Creation:



Admin Overview
Ontology Prefix Definition:

Regex Definition:



User Overview



Conclusions

This solution integrates our data into the Web of Data easily
It provides a reusable solution
It opens the door to more extensible solutions



Future Work

Link our datasets with more external datasets
DBpedia
GeoNames
RDF and SPARQL search form
Externalize Linked Data sources
Build the extension modularly

