ERSA 2017: A linked open data based system for flexible delineation of geographic areas
1. A linked open data based system for flexible
delineation of geographic areas
Peter van den Besselaar, Ali Khalili and Klaas Andries de Graaf
August 2017
Artificial Intelligence Section
Department of Computer Science
Faculty of Science
Department of Organization Sciences
Faculty of Social Science
2. Semantically Mapping Science (SMS) Platform: http://sms.risis.eu 2
SMS Platform Focus Goal
How to capture new insights
by integrating data from
multiple heterogeneous data
sources in the STI domain?
4. Linked Data Creation
Linked Data Services
Applications
Use Cases
Data Ingestion
RISIS Public Data
RISIS Private Data
Open Data on the Web
combine one or more SMS services with other
existing services and applications to build novel
and innovative applications.
a) Dataset Metadata Editor
The RISIS dataset holders all have to describe their datasets in a detailed, consistent, and uniform way. To achieve this goal, the RDF data model is used to describe the RISIS datasets. To encourage non-Semantic Web users to generate valid RDF metadata descriptions, we designed a novel user-friendly editor which hides the complexity of RDF from non-technical users. The metadata editor uses state-of-the-art Web technologies to provide a user-friendly component to view and edit the metadata.
b) RISIS Datasets Portal
In order to exploit the generated metadata, the RISIS datasets portal provides a user interface to view and browse the metadata. Faceted browsing allows users to explore the datasets via multiple entry points, which helps when they do not know beforehand what they are looking for. The portal also handles user registration and supports the process of reviewing visit/access requests to certain RISIS datasets.
c) Spreadsheet add-on
In order to enable batch processing of data, SMS provides a Google spreadsheet add-on. This add-on allows users to enrich their data directly in their spreadsheets.
d) Data Linking UI
A set of user interfaces that allow users to create their lenticular lenses: different views on entity linking.
convert unstructured and structured data to RDF.
expose the functionality of the SMS platform to third-party
users by standard Application Programming Interfaces (APIs).
Identity Resolution Services: handle the entity disambiguation problem and manage the same or similar entities found in different datasets.
Named Entity Recognition Services: extract and classify named entities (e.g. people, places, organizations) in unstructured text.
Metadata Services: allow search on datasets based on the description of datasets in multiple metadata categories (e.g. language, time coverage, etc.).
Category Services: resolve the mapping between heterogeneous classification schemas (e.g. WoS and FoS, or different ISO codes for countries).
Basic Geo Services: deal with basic representation of geographical data together with geocoding functions and identifying the boundaries containing a given point.
Innovative Geo Services: provide innovative services based on new notions of distance (e.g. traffic congestion, language factor, flight routes, etc.).
Integration Services: allow integration of data with RISIS public and private datasets as well as open datasets and social data available on the Web.
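The basic geo service mentioned above, identifying the boundaries that contain a given point, can be illustrated with a minimal ray-casting sketch. It is not the SMS implementation: the function names and the rectangular test areas are hypothetical, and real boundaries would be full polygons (possibly with holes) rather than single rings.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]   # (longitude, latitude)
Ring = List[Point]            # polygon outline as a list of vertices


def point_in_ring(point: Point, ring: Ring) -> bool:
    """Ray casting: count how often a horizontal ray from the point crosses the ring."""
    x, y = point
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point can be crossed.
        if (y1 > y) != (y2 > y):
            # x-coordinate at which the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def containing_boundaries(point: Point, boundaries: Dict[str, Ring]) -> List[str]:
    """Return the names of all boundaries whose outline contains the point."""
    return [name for name, ring in boundaries.items() if point_in_ring(point, ring)]


# Hypothetical rectangular areas, used only to exercise the function.
AREAS = {
    "area_a": [(0, 0), (4, 0), (4, 4), (0, 4)],
    "area_b": [(5, 0), (9, 0), (9, 4), (5, 4)],
}
print(containing_boundaries((1, 1), AREAS))
```

A production service would normally delegate this test to a spatial index or a geo-enabled triple store instead of scanning every boundary.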
DBpedia
Wikidata
…
OrgRef
GRID
FundRef
GeoNames
ISNI
VIAF
Cordis
Lenticular Lenses
When comparing two entities,
depending on the user’s
perspective and the context of
study, they might be considered
the same, similar or different.
Sometimes two organizations
(e.g. departments) can be the
same – because they are parts
of the same organization
(university). But if one wants to
compare departments, this is
not the case.
Lenticular lenses support linking
the entities based on the
context of use and provided
features of data.
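The idea of a lenticular lens can be sketched as a set of interchangeable matching rules, one per context of use. This is only an illustration under assumptions: the `Org` record, the two lens functions and the `LENSES` registry are hypothetical names, not part of the SMS platform.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Org:
    name: str     # name of the unit, e.g. a department
    parent: str   # name of the enclosing organization, e.g. a university


Lens = Callable[[Org, Org], bool]


def university_lens(a: Org, b: Org) -> bool:
    """Units count as the same entity when they belong to the same organization."""
    return a.parent == b.parent


def department_lens(a: Org, b: Org) -> bool:
    """Units count as the same entity only when they are the same department."""
    return a.parent == b.parent and a.name == b.name


# Hypothetical registry: the study context selects the lens.
LENSES: Dict[str, Lens] = {
    "university": university_lens,
    "department": department_lens,
}


def same_entity(a: Org, b: Org, lens: str) -> bool:
    return LENSES[lens](a, b)


cs = Org("Department of Computer Science", "Vrije Universiteit Amsterdam")
org_sci = Org("Department of Organization Sciences", "Vrije Universiteit Amsterdam")
print(same_entity(cs, org_sci, "university"), same_entity(cs, org_sci, "department"))
```

The same pair of records is linked under one lens and kept apart under the other, which is the behaviour the slide describes.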
access and retrieve heterogeneous data.
Access Control Points (ACPs)
- provide standard interfaces to reduce
technical difficulties of accessing data.
- provide a mechanism to coordinate access
to data based on the user role and the
datasets owner’s requirements.
WWW
e) Faceted browser
Allows browsing a dataset using a set
of pre-defined facets
http://sms.risis.eu
The Semantically Mapping
Science (SMS)
Platform
5. RISIS Datasets: Entity Types
SMS Platform Data
Organization, Product, Agreement, Person, Policy, Policy Evaluation, Location
CIB ETER EUPRO JOREP Leiden-Ranking
MORE I Nano Profile SIPER VICO
Higher Education, Firm, Funding Body
Publication, Patent
Project, Investment, Funding Program
Integration
12. Functional Urban Areas (FUAs)
defined by OECD in collaboration with EC/Eurostat
consider factors beyond the predefined city boundaries to better reflect the economic geography of where people live and work
• population
• area
• GDP
• environment (CO2 emissions and air pollution)
• labour market (employment and unemployment growth)
• innovation (patent intensity)
• urban form and territorial organization
OECD Metropolitan eXplorer: http://measuringurban.oecd.org
18. Problem
SMS Platform Geo Services
Address FUA
?
• Vrije Universiteit Amsterdam
• De Boelelaan 1105, 1081 HV Amsterdam
Amsterdam (NL002)
- Geocode to LAU (municipality)
- Shapefiles for FUAs or LAUs?
OECD FUAs List
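The lookup posed on this slide, from an address to its FUA by way of the LAU (municipality), can be sketched as two steps: geocode the address to a municipality, then look the municipality up in the OECD FUA list. Everything below is a hedged illustration: `stub_geocoder` is a deliberately naive stand-in for a real geocoding service, and `FUA_BY_LAU` contains only the single Amsterdam (NL002) entry shown on the slide.

```python
from typing import Callable, Dict, Optional

# Excerpt of an LAU -> FUA code table; only the Amsterdam entry comes from the slide.
FUA_BY_LAU: Dict[str, str] = {"Amsterdam": "NL002"}


def stub_geocoder(address: str) -> Optional[str]:
    """Hypothetical geocoder: treats the last word of the address as the municipality."""
    words = address.strip().split()
    return words[-1] if words else None


def address_to_fua(
    address: str,
    geocode: Callable[[str], Optional[str]] = stub_geocoder,
    fua_by_lau: Dict[str, str] = FUA_BY_LAU,
) -> Optional[str]:
    """Geocode the address to an LAU, then map the LAU to a FUA code (None if unknown)."""
    lau = geocode(address)
    if lau is None:
        return None
    return fua_by_lau.get(lau)


print(address_to_fua("De Boelelaan 1105, 1081 HV Amsterdam"))
```

The table lookup only works when the FUA list enumerates its member municipalities; where it does not, boundary shapefiles for the FUAs or LAUs are needed, which is the open question the slide raises.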
20. Goal
Building a Linked Open Data Space for Flexible Delineation of Geographic Areas
23. LifeCycle
Linked Data lifecycle:
• Data Discovery & Collection
• Data Extraction & Conversion
• Data Storage & Querying
• Data Linkage
• Data to Service
• Service to Application
25. DATA DISCOVERY & COLLECTION
• OpenStreetMap (OSM)
• Database of Global Administrative Areas (GADM)
• Flickr Shapefiles Dataset
• Published Shapefiles for Individual Countries
• Published Geospatial RDF Datasets
26. Open Administrative Boundaries
OpenStreetMap (OSM):
• Level 1: super-national administrations, e.g. European Union.
• Level 2: country borders based on the political entities listed in the ISO 3166 standard.
• Level 3 to 11: subnational borders such as "state", "province", "region" and "district".
GADM:
• Level 0: countries.
• Level 1 to 5: lower-level subdivisions such as provinces, departments, counties, etc., depending on the size and availability of data for the underlying country.
Flickr Shapefiles:
• Level 1: country
• Level 2: region
• Level 3: county
• Level 4: locality
• Level 5: neighborhood
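The three level schemes above can be written down as a small configuration from which candidate level pairs are generated for later alignment. This is a sketch under assumptions: it takes the three lists to describe OpenStreetMap, GADM and the Flickr shapefiles in that order, and the names `LEVEL_SCHEMES` and `candidate_level_pairs` are hypothetical.

```python
from itertools import product
from typing import Dict, List, Tuple

# Level numbers are taken from the slide; the dictionary layout is illustrative.
LEVEL_SCHEMES: Dict[str, Dict[str, List[int]]] = {
    "OSM": {"country": [2], "subnational": list(range(3, 12))},    # levels 3 to 11
    "GADM": {"country": [0], "subnational": list(range(1, 6))},    # levels 1 to 5
    "Flickr": {"country": [1], "subnational": list(range(2, 6))},  # levels 2 to 5
}


def candidate_level_pairs(source: str, target: str) -> List[Tuple[int, int]]:
    """List (source_level, target_level) pairs that are worth comparing.

    Country levels are only paired with country levels, and subnational levels
    only with subnational levels; which pairs really align must still be
    verified against the data.
    """
    pairs: List[Tuple[int, int]] = []
    for kind in ("country", "subnational"):
        pairs.extend(product(LEVEL_SCHEMES[source][kind], LEVEL_SCHEMES[target][kind]))
    return pairs


pairs = candidate_level_pairs("OSM", "GADM")
print(len(pairs), pairs[:3])
```

Restricting comparisons to these pairs avoids testing, say, an OSM country border against a GADM county.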
27. Open Administrative Boundaries: Example
28. DATA EXTRACTION & CONVERSION
Input formats: OSM XML, PBF, ESRI shapes
Conversion tools: osmosis, osmtogeojson, mapshaper
Intermediate format: GeoJSON
triplify, driven by Mapping Configurations and Enrichment Functions
29. DATA EXTRACTION & CONVERSION
Metadata about different levels provided by OSM
http://wiki.openstreetmap.org/wiki/Tag:boundary%3Dadministrative
30. DATA STORAGE & QUERYING
Virtuoso Geo Spatial Geometry as the SMS internal representation for geo-data in RDF
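Storing a boundary as an RDF geometry literal typically means serializing its GeoJSON coordinates as Well-Known Text (WKT). The sketch below shows one way to do that for simple polygons. It makes several assumptions: the predicate URI is a hypothetical placeholder, the GeoSPARQL `wktLiteral` datatype is used for illustration, and the datatype SMS actually uses in Virtuoso may differ.

```python
from typing import Any, Dict

WKT_DATATYPE = "http://www.opengis.net/ont/geosparql#wktLiteral"
HAS_GEOMETRY = "http://example.org/sms/geometry"  # hypothetical predicate URI


def polygon_to_wkt(geometry: Dict[str, Any]) -> str:
    """Serialize a GeoJSON Polygon geometry as a WKT POLYGON string."""
    if geometry.get("type") != "Polygon":
        raise ValueError("this sketch only handles GeoJSON Polygon geometries")
    rings = []
    for ring in geometry["coordinates"]:
        # GeoJSON positions are [longitude, latitude, ...]; WKT uses "x y".
        rings.append("(" + ", ".join(f"{pos[0]} {pos[1]}" for pos in ring) + ")")
    return "POLYGON(" + ", ".join(rings) + ")"


def to_ntriple(subject_uri: str, geometry: Dict[str, Any]) -> str:
    """Build one N-Triples line that attaches the WKT literal to a subject."""
    wkt = polygon_to_wkt(geometry)
    return f'<{subject_uri}> <{HAS_GEOMETRY}> "{wkt}"^^<{WKT_DATATYPE}> .'


# Hypothetical square area, used only to show the output shape.
square = {"type": "Polygon", "coordinates": [[[0, 0], [4, 0], [4, 4], [0, 4], [0, 0]]]}
print(to_ntriple("http://example.org/area/1", square))
```

MultiPolygon boundaries, which are common for administrative areas with islands or exclaves, would need the same treatment applied per polygon.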
31. DATA LINKAGE
- Query on metadata about the
administrative boundaries
- Find the alignment between levels
in different datasets
32. DATA LINKAGE
- Use the possible mappings between datasets at different levels.
- Check the overlap of areas at similar levels and, for the matching areas, apply string matching to make sure that they refer to the same administrative boundary.
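The two checks described above, spatial overlap followed by string matching on names, can be sketched as follows. This is a simplification rather than the method used in SMS: overlap is computed on bounding boxes instead of true polygons, name similarity uses Python's `difflib.SequenceMatcher`, and both thresholds are arbitrary assumptions.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple

BBox = Tuple[float, float, float, float]   # (min_lon, min_lat, max_lon, max_lat)
Area = Tuple[str, BBox]                    # (name, bounding box)


def bbox_area(box: BBox) -> float:
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def bbox_overlap_ratio(a: BBox, b: BBox) -> float:
    """Intersection area divided by the smaller box's area (0.0 when disjoint)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    if width <= 0 or height <= 0:
        return 0.0
    smaller = min(bbox_area(a), bbox_area(b))
    return (width * height) / smaller if smaller > 0 else 0.0


def name_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity between two names, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


def link_areas(
    source: Dict[str, Area],
    target: Dict[str, Area],
    min_overlap: float = 0.8,   # assumed threshold
    min_name: float = 0.8,      # assumed threshold
) -> List[Tuple[str, str]]:
    """Link areas that overlap strongly and also have similar names."""
    links = []
    for s_id, (s_name, s_box) in source.items():
        for t_id, (t_name, t_box) in target.items():
            if bbox_overlap_ratio(s_box, t_box) < min_overlap:
                continue  # spatial check first: cheap and prunes most pairs
            if name_similarity(s_name, t_name) >= min_name:
                links.append((s_id, t_id))
    return links


# Hypothetical records from two boundary datasets.
SOURCE = {"a1": ("Amsterdam", (0.0, 0.0, 4.0, 4.0))}
TARGET = {
    "b1": ("amsterdam", (0.1, 0.0, 4.0, 4.1)),
    "b2": ("Rotterdam", (0.0, 0.0, 4.0, 4.0)),
    "b3": ("Amsterdam", (10.0, 10.0, 12.0, 12.0)),
}
print(link_areas(SOURCE, TARGET))
```

Requiring both conditions guards against two failure modes: nested areas that overlap but are different units, and distant areas that happen to share a name.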
40. Investigating the effect of regional socio-economic properties on innovative activities, as stimulated by recent RTD policies in the Netherlands.
Address
FUA
Administrative Boundaries
Coordinates
geocode
RVO Dataset: (research) and innovation subsidies for organizations and companies in the Netherlands.
GADM Dataset
CBS Dataset: statistical information on dimensions such as labour and income, economy, society and regional aspects of municipalities and regions in the Netherlands.
Use Case
41. Identifying innovative areas
CBS-NL
RVO-NL
• Couple open data on the innovation projects with the
theoretically defined geo-boundaries ….
• …. to investigate the geography of innovation
42. Use Case
People, Hybrid, OECD FUAs, Businesses
FGAs
Projects mapped to FGAs
43. Overview
(1) e.g. innovation project in RVO-NL database
(2) Open boundaries, e.g. OpenStreetMap
(3) Statistical data about boundaries to create an own geo-classification, e.g. CBS-NL
(4) Distribution of innovation projects over self (theoretically) defined areas
(5) Link to open statistical data, e.g. Statistics Netherlands or OECD: a wealth of contextual variables
(6) Link to open data on organizations: ORGREF, ETER, CORDIS