This presentation discusses integrating a research group's website data into the Web of Data. The motivation is to make the website's unstructured data available as Linked Data. The proposed solution is a Joomla! extension that extracts publication data through user-defined regular expressions, DBLP queries, and Google Scholar scraping. The extension generates RDF that links publications to their authors' FOAF profiles and enriches the data with information from external sources. The system architecture includes a Joomla! plugin for data extraction and an RDF store that centralizes the Linked Data.
Towards the Integration of a Research Group Website into the Web of Data
1. Towards the Integration of a Research Group
Website into the Web of Data
Mikel Emaldi, David Buján, Diego López de Ipiña
{m.emaldi, dbujan, dipina}@deusto.es
Deusto Institute of Technology - DeustoTech
November 2011
2. Motivation Our Solution Linked Data Extension Conclusions Future Work
1 Motivation
2 Our Solution
First Approach
Solution Overview
Data Extraction
System Architecture
3 Linked Data Extension
4 Conclusions
5 Future Work
Mikel Emaldi, David Buján, Diego López de Ipiña
DeustoTech - Internet
Towards the Integration of a Research Group Website into the Web of Data
Motivation
We want to offer the data on our research group's website
(http://www.morelab.deusto.es) as Linked Data
Our website runs on the Joomla! CMS
The data is unstructured
We chose our publications section as a first attempt
Almost 100 publications
Possibility of linking them to external datasets
We saw the opportunity to centralize the group's FOAF files
First Approach
A solution based on a Python web script (mod_python)
It required modifying the core code of Joomla!
This caused a major problem:
Whenever a security update was installed, Joomla! overwrote
our custom code
Solution Overview
Joomla! Extension
A solution based on an Extension for Joomla!
Component
Plugin
It offers a feasible way to analyze the published publications
and to generate the corresponding Linked Data
Data Extraction
Joomla! Content Example
TALISMAN+: Intelligent System for Follow-Up and
Promotion of Personal Autonomy
David Ausín, Diego López-de-Ipiña, José Bravo, Miguel Ángel Valero, Francisco Flórez. TALISMAN+:
Intelligent System for Follow-Up and Promotion of Personal Autonomy. III International Workshop on
Ambient Assisted Living - IWAAL 2011. Málaga, Spain. June 2011.
The TALISMAN+ project, financed by the Spanish Ministry of Science and Innovation, aims to research
and demonstrate innovative solutions transferable to society which offer services and products based on
information and communication technologies in order to promote personal autonomy in prevention and
monitoring scenarios. It will solve critical interoperability problems among systems and emerging
technologies in a context where heterogeneity brings about accessibility barriers not yet overcome and
demanded by the scientific, technological or social-health settings.
Download
Overview
Data is extracted in three ways:
User-defined regular expression
DBLP SPARQL endpoint
Google Scholar search engine
Regex I
The user defines a regular expression to parse the content
The user has to define the ontologies used and their prefixes in the
admin control panel
The regex tags are easy to understand
The ontology properties to be mapped are enclosed in {}
Every delimiter (also the {}) is identified by a
The term {dummy} can be used to ignore content
Regex II
David Ausín, Diego López-de-Ipiña, José Bravo, Miguel Ángel Valero, Francisco Flórez. TALISMAN+:
Intelligent System for Follow-Up and Promotion of Personal Autonomy. III International Workshop on
Ambient Assisted Living - IWAAL 2011. Málaga, Spain. June 2011.
The TALISMAN+ project, financed by the Spanish Ministry of Science and Innovation, aims to research
and demonstrate innovative solutions transferable to society which offer services and products based on
information and communication technologies in order to promote personal autonomy in prevention and
monitoring scenarios. It will solve critical interoperability problems among systems and emerging
technologies in a context where heterogeneity brings about accessibility barriers not yet overcome and
demanded by the scientific, technological or social-health settings.
Download
{dc:creator,sep(,)}. {dc:title}.
{swrc:series}. {swrc:location}.
{dc:date}. {bibo:abstract} Download$
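The way such a template could drive the extraction can be sketched in Python. This is only an illustration, not the extension's actual code (the extension runs inside Joomla!). The placeholder grammar, the lazy matching between delimiters, and the names `compile_template` and `extract` are all assumptions made for this sketch. The abstract in the sample content is a stand-in.

```python
import re

# Assumed placeholder syntax: {prefix:property}, {prefix:property,sep(X)} or {dummy}.
PLACEHOLDER = re.compile(r"\{\s*(dummy|\w+:\w+)\s*(?:,\s*sep\((.)\)\s*)?\}")


def literal_to_regex(text):
    """Escape literal template text; any whitespace run matches spaces or newlines."""
    tokens = re.split(r"(\s+)", text)
    return "".join(r"\s+" if tok.isspace() else re.escape(tok) for tok in tokens if tok)


def compile_template(template):
    """Return (compiled regex, [(property, separator or None), ...])."""
    anchored = template.endswith("$")  # a trailing $ anchors the end of the content
    if anchored:
        template = template[:-1]
    parts, fields, pos = [], [], 0
    for m in PLACEHOLDER.finditer(template):
        parts.append(literal_to_regex(template[pos:m.start()]))
        name, sep = m.group(1), m.group(2)
        if name == "dummy":
            parts.append(r"(?:.+?)")  # matched, then ignored
        else:
            parts.append(f"(?P<f{len(fields)}>.+?)")
            fields.append((name, sep))
        pos = m.end()
    parts.append(literal_to_regex(template[pos:]))
    if anchored:
        parts.append(r"\s*$")
    return re.compile("".join(parts), re.DOTALL), fields


def extract(template, text):
    """Map each tagged property to the value(s) found in text, or return None."""
    regex, fields = compile_template(template)
    match = regex.match(text.strip())
    if match is None:
        return None
    result = {}
    for i, (name, sep) in enumerate(fields):
        value = " ".join(match.group(f"f{i}").split())  # collapse line breaks
        result[name] = [v.strip() for v in value.split(sep)] if sep else value
    return result


if __name__ == "__main__":
    template = ("{dc:creator,sep(,)}. {dc:title}. {swrc:series}. "
                "{swrc:location}. {dc:date}. {bibo:abstract} Download$")
    content = (
        "David Ausín, Diego López-de-Ipiña, José Bravo, Miguel Ángel Valero, "
        "Francisco Flórez. TALISMAN+: Intelligent System for Follow-Up and "
        "Promotion of Personal Autonomy. III International Workshop on Ambient "
        "Assisted Living - IWAAL 2011. Málaga, Spain. June 2011.\n"
        "Stand-in abstract text. It may span several sentences.\nDownload"
    )
    for prop, value in extract(template, content).items():
        print(prop, "->", value)
```

Lazy matching means each property stops at the first delimiter that lets the rest of the template match, so a title containing ". " would be split too early. A real implementation would need a stricter delimiter convention.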
DBLP I
Digital Bibliography & Library Project
> 1.3 million articles
SPARQL endpoint at:
http://dblp.l3s.de/d2r/sparql/
http://dblp.l3s.de/d2r/snorql/
DBLP II
The DBLP SPARQL endpoint is used to search for data about
publications
SELECT DISTINCT ?uri ?p ?o WHERE { ?uri dc:title
"title-of-article"^^<http://www.w3.org/2001/XMLSchema#string> . ?uri ?p ?o }
The results are enriched with our own data and saved into the RDF
store
We also link members' FOAF profiles to DBLP author data
<http://www.morelab.deusto.es/resource/dipina> owl:sameAs
<http://dblp.l3s.de/d2r/resource/authors/Diego López-de-Ipiña> ;
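A minimal Python sketch of how the title lookup could be prepared is given here. It only builds the query text and a request URL and does not send anything. The extra `?uri ?p ?o` pattern, the `dc:` prefix declaration, and the use of the standard SPARQL Protocol `query` parameter are assumptions about what the slide abbreviates. The function names are hypothetical.

```python
from urllib.parse import urlencode

DBLP_ENDPOINT = "http://dblp.l3s.de/d2r/sparql/"  # endpoint named on the DBLP I slide
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


def escape_literal(value):
    """Escape backslashes and double quotes for a SPARQL string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def build_title_query(title):
    """Build a query returning every property of the publication with this title."""
    return (
        "PREFIX dc: <http://purl.org/dc/elements/1.1/>\n"
        "SELECT DISTINCT ?uri ?p ?o WHERE {\n"
        f'  ?uri dc:title "{escape_literal(title)}"^^<{XSD_STRING}> .\n'
        "  ?uri ?p ?o .\n"
        "}"
    )


def build_request_url(title, endpoint=DBLP_ENDPOINT):
    """Encode the query as a GET request URL (assumes the SPARQL Protocol)."""
    return endpoint + "?" + urlencode({"query": build_title_query(title)})


if __name__ == "__main__":
    print(build_title_query("title-of-article"))
    print(build_request_url("title-of-article"))
```

Because the title is matched as an exact typed literal, any difference in capitalization or punctuation between the website and DBLP yields no result. This may be one reason the slides list other extraction routes as well.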
Google Scholar I
A simple way to broadly search for scholarly literature
http://scholar.google.com
It exports data in different formats
BibTeX
EndNote
RefMan
RefWorks
WenXiangWang
Google Scholar II
The data from GS is extracted via BibTeX scraping
An HTTP request with a specific cookie is sent to retrieve the BibTeX
data
BibTeX data is retrieved
Mapping from BibTeX data to RDF
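The last step, mapping BibTeX to RDF, might look roughly like the following Python sketch. The field-to-property table is hypothetical, since the slides do not show the extension's real mapping. The parser only handles simple one-line `key={value}` fields. The subject URI in the example is made up.

```python
import re

# Hypothetical mapping; the extension's actual choice of properties is not shown.
FIELD_TO_PROPERTY = {
    "title": "http://purl.org/dc/elements/1.1/title",
    "author": "http://purl.org/dc/elements/1.1/creator",
    "year": "http://purl.org/dc/elements/1.1/date",
}

# One "key = {value}" or 'key = "value"' field per line.
FIELD = re.compile(r'(\w+)\s*=\s*(?:\{(.*?)\}|"(.*?)")\s*,?\s*$', re.MULTILINE)


def parse_bibtex_fields(entry):
    """Return the fields of a single BibTeX entry as a dict with lowercased keys."""
    fields = {}
    for key, braced, quoted in FIELD.findall(entry):
        fields[key.lower()] = braced if braced else quoted
    return fields


def literal(value):
    """Serialize a plain string literal in N-Triples syntax."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def bibtex_to_ntriples(entry, subject_uri):
    """Emit one N-Triples statement per mapped field (one per author)."""
    fields = parse_bibtex_fields(entry)
    triples = []
    for key, prop in FIELD_TO_PROPERTY.items():
        if key not in fields:
            continue
        if key == "author":
            values = re.split(r"\s+and\s+", fields[key])  # BibTeX joins authors with "and"
        else:
            values = [fields[key]]
        for value in values:
            triples.append(f"<{subject_uri}> <{prop}> {literal(value)} .")
    return triples


if __name__ == "__main__":
    sample = """@inproceedings{example2011,
  title={An Example Title},
  author={Doe, Jane and Roe, Richard},
  year={2011}
}"""
    uri = "http://www.morelab.deusto.es/resource/example-publication"  # made-up slug
    print("\n".join(bibtex_to_ntriples(sample, uri)))
```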
FOAF
Every member of our group has his or her own FOAF file
http://www.morelab.deusto.es/resource/member-alias
Every publication is linked to its authors' URIs
<http://www.morelab.deusto.es/resource/imhotep-an-approach-to-user-and-device-conscious-mobile-applications> dc:creator
<http://www.morelab.deusto.es/resource/dipina>
This is done automatically by looking up the authors' nicknames
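One way the nickname lookup could work is sketched below in Python. The lookup table, the normalization rule, and the function names are assumptions made for illustration. Only the alias `dipina` and the resource URI pattern come from the slides.

```python
import unicodedata

BASE = "http://www.morelab.deusto.es/resource/"

# Hypothetical table from normalized name variants to member aliases.
NICKNAMES = {
    "diego lopez de ipina": "dipina",
    "d lopez de ipina": "dipina",
}


def normalize(name):
    """Lowercase, strip accents, and treat hyphens and dots as spaces."""
    decomposed = unicodedata.normalize("NFKD", name)
    plain = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    for ch in "-.":
        plain = plain.replace(ch, " ")
    return " ".join(plain.lower().split())


def member_uri(author):
    """Return the group member's URI, or None for external authors."""
    alias = NICKNAMES.get(normalize(author))
    return BASE + alias if alias else None


def creator_statements(publication_slug, authors):
    """Link known members by URI and keep other authors as plain literals."""
    subject = f"<{BASE}{publication_slug}>"
    statements = []
    for author in authors:
        uri = member_uri(author)
        obj = f"<{uri}>" if uri else '"' + author.replace('"', '\\"') + '"'
        statements.append(f"{subject} dc:creator {obj} .")
    return statements


if __name__ == "__main__":
    for line in creator_statements("some-paper", ["Diego López-de-Ipiña", "José Bravo"]):
        print(line)
```

Normalizing before the lookup lets spelling variants such as "López-de-Ipiña" and "Lopez de Ipina" resolve to the same member.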
Flowchart
System Architecture
Overview
Joseki + SDB
Joseki
A SPARQL server for Jena
Stores data in RDF files and relational databases
It supports SPARQL Update
It is private to our system
SDB
A component of Jena
It provides:
Scalable storage
Querying of RDF datasets stored in conventional SQL databases
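As a rough illustration of how extracted statements could be handed to a store that accepts SPARQL Update, the Python sketch below wraps N-Triples style statements in an `INSERT DATA` request. It only builds the request text. How the request is sent to Joseki, and whether a named graph is used, is not described on the slides, so `build_insert_data` and its `graph_uri` parameter are hypothetical.

```python
def build_insert_data(triples, graph_uri=None):
    """Wrap N-Triples style statements in a SPARQL Update INSERT DATA request."""
    if graph_uri:
        inner = "\n".join("    " + t for t in triples)
        body = f"  GRAPH <{graph_uri}> {{\n{inner}\n  }}"
    else:
        body = "\n".join("  " + t for t in triples)
    return "INSERT DATA {\n" + body + "\n}"


if __name__ == "__main__":
    statements = [
        "<http://www.morelab.deusto.es/resource/some-paper> "
        "<http://purl.org/dc/elements/1.1/creator> "
        "<http://www.morelab.deusto.es/resource/dipina> .",
    ]
    print(build_insert_data(statements))
```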
Pubby
Pubby adds Linked Data interfaces to SPARQL endpoints
It allows content negotiation among these formats:
HTML
RDF/XML
N3
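The idea behind content negotiation can be sketched in a few lines of Python: the server inspects the client's `Accept` header and picks one of the formats it can serve. This is not Pubby's implementation (Pubby is a Java web application). The media type names, the default for `*/*`, and the function names are assumptions. Partial wildcards such as `text/*` are not handled.

```python
# Assumed media types for the three formats listed on the slide.
SUPPORTED = ["text/html", "application/rdf+xml", "text/n3"]


def parse_accept(header):
    """Return [(media_type, q)] sorted by descending q, keeping header order for ties."""
    items = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        media, *params = [p.strip() for p in part.split(";")]
        q = 1.0
        for param in params:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        items.append((media.lower(), q))
    return sorted(items, key=lambda item: -item[1])


def negotiate(header, supported=SUPPORTED):
    """Pick the best supported media type for an Accept header, or None."""
    for media, q in parse_accept(header or "*/*"):
        if q <= 0:
            continue
        if media == "*/*":
            return supported[0]
        if media in supported:
            return media
    return None


if __name__ == "__main__":
    print(negotiate("application/rdf+xml"))
    print(negotiate("text/html;q=0.5, text/n3"))
```

With this scheme a browser asking for `text/html` gets the human-readable page, while an RDF client asking for `application/rdf+xml` gets the data for the same resource URI.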
Snorql
An AJAXy front-end for exploring RDF SPARQL endpoints
More usable than Joseki
It is MoreLab’s public SPARQL endpoint
Admin Overview
Dataset Creation:
Ontology Prefix Definition:
Regex Definition:
User Overview
Conclusions
This solution integrates our data into the Web of Data easily
It provides a reusable solution
It opens the door to more extensible solutions
Future Work
Link our datasets with more external datasets
DBpedia
GeoNames
RDF and SPARQL search form
Externalize linked data sources
Build the extension modularly