This document presents the LEAPS framework for exploiting semantic web technologies and linked data for the algal biomass domain. It discusses modeling algal biomass knowledge, lifting XML datasets to linked data, the system architecture including triple stores and SPARQL endpoints, and querying the linked algal biomass data. The goal is to enable screening of data for promising biomass production sites and provide a knowledge base for further planning in the algal biomass community.
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
Realising the Potential of Algal Biomass Production through Semantic Web and Linked data
1. monika.solanki@bcu.ac.uk I-Semantics 2012, 5th September 2012, Graz
Realising the Potential of Algal Biomass
Production through Semantic Web and
Linked data
The LEAPS Framework
Monika Solanki
Knowledge Based Engineering Lab
Birmingham City University, UK
Joint work with
Johannes Skarka
Karlsruhe Institute of Technology, ITAS
2. monika.solanki@bcu.ac.uk I-Semantics 2012, 5th September 2012, Graz
Outline
1 Motivation
2 Modelling Algal Biomass Knowledge
3 Lifting XML datasets to Linked data
4 System Architecture
5 Querying Linked Algal Biomass Data
6 Conclusion and Future work
4. monika.solanki@bcu.ac.uk I-Semantics 2012, 5th September 2012, Graz
Algal biomass as biofuels
Extensive research* is being undertaken in the search and
production of naturally viable and sustainable energy
sources.
The idea that algae biomass based biofuels could serve as
an alternative to fossil fuels has been embraced by
councils across the globe.
Major companies, government bodies and dedicated non
profit organisations* are getting involved.
The domain is a rich source of data/information/knowledge.
*http://www.algalbiomass.org/
*http://www.eaba-association.eu/
*http://www.enalgae.eu/
5. monika.solanki@bcu.ac.uk I-Semantics 2012, 5th September 2012, Graz
Algal biomass as biofuels: Observations
No systematic analysis of the algae biomass potential for
North-Western Europe.
Most of the knowledge buried in various formats of images,
spreadsheets, proprietary data sources and grey literature.
Lack of a knowledge level infrastructure that is equipped
with the capabilities to provide semantic grounding to the
datasets for algal biomass.
Low levels of motivation among stakeholders, for datasets
to be interlinked, shared and reused within the biomass
community.
6. monika.solanki@bcu.ac.uk I-Semantics 2012, 5th September 2012, Graz
LEAPS: A Potential Solution
Linked Entities for Algal Plant Sites
motivate the use of Semantic Web technologies and LOD
for the algal biomass domain.
laying out a set of ontological requirements for knowledge
representation that support the publication of algal
biomass data.
elaborating on how algal biomass datasets are transformed
to their corresponding RDF model representation.
interlinking the generated RDF datasets along spatial
dimensions with other datasets on the Web of data.
visualising the linked datasets via an end user LOD REST
Web service.
7. monika.solanki@bcu.ac.uk I-Semantics 2012, 5th September 2012, Graz
EnAlgae: Energetic Algae
Aims to reduce CO2 emissions and dependency on
unsustainable energy sources in North West Europe.
4 Year Strategic initiative of Interreg IVb NWE programme.
19 partners and 14 Observers across 7 EU states.
Coordinated set of activities focussing on sharing best
practice, developing effective stakeholder engagement and
encouraging transnational cooperation.
http://www.enalgae.eu/
8. monika.solanki@bcu.ac.uk I-Semantics 2012, 5th September 2012, Graz
EnAlgae: Some of the objectives
Accelerate development of sustainable technologies for
Biomass production.
Create a network of pilot scale algal facilities across NWE
in order to address the current lack of verifiable information
on algal productivity.
Maintain an up to date inventory in which pilots collect and
share data in a standardised manner.
Combine information across the entire algal bioenergy
delivery chain into a comprehensive and user friendly
Decision Support System for practitioners, policy makers
and investors
http://www.enalgae.eu/
9. monika.solanki@bcu.ac.uk I-Semantics 2012, 5th September 2012, Graz
SW and Linked data for Algal Biomass
Algal biomass data manifests itself across several facets.
The value/supply chain ranges from cultivation of algae to
production of biofuels and other products.
Cultivation, harvesting, processing and fuel production
further involves several intermediate processes.
Every stage in the algal supply chain is governed by
requirements, regulatory policies and strategies.
Each of the facets consumes and produces a large volume
of unstructured data and information.
11. monika.solanki@bcu.ac.uk I-Semantics 2012, 5th September 2012, Graz
SW, Linked data and the Algal Supply Chain
12. monika.solanki@bcu.ac.uk I-Semantics 2012, 5th September 2012, Graz
Competency questions for stage 1 datasets
Data driven
Which are the algal operation sites with CO2 sources that
have CO2 emissions less than 130000 kgs, where total
costs of supplying CO2 is lower then 5000 GBP per ton of
CO2 , areal yield is greater than 30 tons per hectare and
which are located within the NUTS region “UKM61”?
Supplement the data with supporting information about the
region.
Which are the top ten algal operation sites with the lowest
impact on global warming potential?
For a given algal operation site which are the first five most
cost effective combinations of light, water, nutrients and
CO2 sources?
14. monika.solanki@bcu.ac.uk I-Semantics 2012, 5th September 2012, Graz
Ontological requirements
Ontologies needed to represent
Spatiality: location of possible algae cultivation sites,
location of the sources of consumables (CO2 , nutrients
and water).
Geometries: area of the cultivation site - extents,
polygons, linear and ring arrays.
Units and Measurements: conventional measurement
units such as Kgs for quantities and hectares for area,
bespoke units of measurements, i.e., Kgs/hectare or
Kgs/annum.
Territorial units for statistics: core concepts of the NUTS
system.
Domain specific knowledge: algae cultivation sites, CO2
sources, pipelines.
15. monika.solanki@bcu.ac.uk I-Semantics 2012, 5th September 2012, Graz
Ontologies for Algal Biomass: Reuse
Spatial Data: WGS84, spatial relations, Geonames,
NeoGeo
Geometries: WGS84, extended NeoGeo.
Units and Measurements: extended QUDT
http://www.w3.org/2003/01/geo/wgs84_pos
http://www.ordnancesurvey.co.uk/oswebsite/ontology/
spatialrelations.owl
http://www.geonames.org/ontology/ontology_v2.2.1.rdf
http://geovocab.org/geometry
http://qudt.org/1.1/vocab/dimensionalunit
16. monika.solanki@bcu.ac.uk I-Semantics 2012, 5th September 2012, Graz
Ontologies for Algal Biomass: Reuse
17. monika.solanki@bcu.ac.uk I-Semantics 2012, 5th September 2012, Graz
Ontologies for Algal Biomass: Domain
knowledge
Ontologies for modelling spatial knowledge, units and
measurements were reused.
Discovering vocabularies conceptualising the domain
knowledge for algal biomass was non trivial.
Concepts and relationships for algal biomass had to be
defined from ground-up in accordance to the principles of
ontology development
The design was very strongly guided by feedback from
questionnaires made available to the stakeholders,
interviews with domain experts, providers of raw datasets
and grey literature from the algal biomass and biofuels
domain.
18. Ontologies for Algal Biomass: Domain
knowledge
Ontologies available at http:/purl.org/biomass/ontologies
19. monika.solanki@bcu.ac.uk I-Semantics 2012, 5th September 2012, Graz
Designing URIs for Algal Biomass Data
20. monika.solanki@bcu.ac.uk I-Semantics 2012, 5th September 2012, Graz
Lifting XML datasets to
Linked data
21. monika.solanki@bcu.ac.uk I-Semantics 2012, 5th September 2012, Graz
Lifting XML datasets to Linked data
Raw data
22. monika.solanki@bcu.ac.uk I-Semantics 2012, 5th September 2012, Graz
Lifting XML datasets to Linked data
First step
The first part of the data processing and the potential
calculation are performed in a GIS-based model which was
developed for this purpose using ArcGIS.
Raw datasets with various origins and formats -
transformed using bespoke computational algorithms to an
ArchGIS specific XML format.
brings uniformity in the format of representation of the
datasets and in the process of transformation.
important computations that are part of the final datasets
are performed.
23. monika.solanki@bcu.ac.uk I-Semantics 2012, 5th September 2012, Graz
Lifting XML datasets to Linked data
Second step
The original data sources had several limitations and a
one-to-one transformation was not possible.
The XML data sources related the biomass production sites
and the CO2 sources via the pipeline dataset.
In order to query for all sources that supplied CO2 to a
specific site, the query would have to be made via the
pipeline dataset.
The site, source and NUTS identifiers in the datasets were
string literals rather than URIs.
A bespoke parser that exploits XPath to selectively query
the XML datasets and generate linked data was
implemented.
It utilises a complex underlying data structure to facilitate
the transformation.
24. monika.solanki@bcu.ac.uk I-Semantics 2012, 5th September 2012, Graz
Lifting XML datasets to Linked data
Four datasets were transformed and stored in distributed
triple store repositories.
The NUTS regions dataset in RDF was available but there
was no SPARQL endpoint or service to query the dataset.
We retrieved the dataset dump and curated it in our local
triple store as a separate repository.
The transformed datasets interlinked resources defining
sites, CO2 sources, pipelines, regions and NUTS data.
25. monika.solanki@bcu.ac.uk I-Semantics 2012, 5th September 2012, Graz
Lifting XML datasets to Linked data
28. monika.solanki@bcu.ac.uk I-Semantics 2012, 5th September 2012, Graz
Architecture: Main components
Parsing modules: lifting the data
from their original formats to RDF.
Ontologies.
Linking engine: producing the linked
data representation of the datasets.
Triple store: OWLIM SE 5.0.
REST Web services.
SPARQL endpoints.
Web Interface.
29. monika.solanki@bcu.ac.uk I-Semantics 2012, 5th September 2012, Graz
Querying Linked Algal Biomass Data
Most queries over the datasets are based on retrieving
knowledge centered around location information.
The queries are federated across the various repositories
holding the linked data.
Representative Query:
Which are the algal operation sites with CO2 sources that have
CO2 emissions less than 130000 kgs, where total costs of
supplying CO2 is lower then 5000 GBP per ton of CO2 , areal
yield is greater than 30 tons per hectare and which are located
within the NUTS region “UKM61”? Supplement the data with
supporting information about the region.
32. monika.solanki@bcu.ac.uk I-Semantics 2012, 5th September 2012, Graz
Related Efforts,
Conclusions and
Future Work
33. monika.solanki@bcu.ac.uk I-Semantics 2012, 5th September 2012, Graz
Related efforts
AquaFuels*: a taxonomy of algal strains available as PDF.
BioEnerGIS *: a GIS based Decision support tool,
BIOPOLE, for biomass plants feeding district heating
systems.
BioKDF *: Bioenergy knowledge discovery framework from
the U.S. department of Energy.
Reegle *: various energy related datasets as linked open
data and a SPARQL endpoint to access the datasets.
*http://www.aquafuels.eu/
*http://www.bioenergis.eu/
*https://bioenergykdf.net/
*http://data.reegle.info
34. monika.solanki@bcu.ac.uk I-Semantics 2012, 5th September 2012, Graz
Conclusions
Investigations into using algal biomass as an alternative
source of fuel is gaining widespread momentum.
The Algal biomass community currently does not employ
any knowledge representation techniques to formalise and
structure valuable knowledge harnessed through their
operations.
As research in the sector progresses, a wealth of
information will be available that could be exploited by
domain specific applications.
35. monika.solanki@bcu.ac.uk I-Semantics 2012, 5th September 2012, Graz
Summary
The LEAPS framework exploits SW and LD for the algal
biomass community,
enabling the screening of data for promising individual
plant sites and provides base data for more detailed
planning purposes.
proposing a set of domain specific ontologies for algal
plant sites, CO2 and pipelines to be shared and extended
by the community.
defining a linked data publishing architecture that
transforms raw data in disparate formats to a uniform XML
representation.
using a set of well established and domain specific
ontologies as metadata to transform it further into linked
data.
providing various data access options such as a SPARQL
endpoint, an interactive Google map interface and a REST
API for making the data accessible to stakeholders.
36. monika.solanki@bcu.ac.uk I-Semantics 2012, 5th September 2012, Graz
Future Work
Several other datasets need to be integrated once they
become available.
One of the core datasets - algal strains from Algaebase*.
Multifaceted visualisation of the integrated datasets to
facilitate the uptake of the framework by stakeholders.
Rule based reasoning to model and inference domain
specific constraints.
*http://www.algaebase.org/