Many areas of scientific discovery rely on combining data from multiple data sources. However, there are many challenges in linking data. This presentation highlights these challenges in the context of using Linked Data for environmental and social science databases.
2. Estuarine Flooding
Financial implications
Damage
Loss of business
Personal factors
Emotional impact
Flood prediction
Locations
Severity
Requires correlating
Sea-state data
Weather forecasts
Details of sea defences
Response Planning
Evacuation routes
Personnel deployment
…
Requires more data
Traffic reports
Shipping
…
8 April 2015 SICSA Env. & Social Databases 2
Image: http://www.metro.co.uk/
3. Flood Prediction
Solent Use Case
Busy shipping channel
Two major ports
Complex tidal and wave patterns
6. Data Linkage and Querying
Web of Data
7. Linked Data Approach
1. Global ID – URI
2. Resolvable ID
3. Useful content
HTML for humans
RDF for machines
4. Link to other resources
Like the Web, but for data!
“RDF and OWL do not solve the interoperability problem, they just lay it bare on the table!”
10. Querying Approach
Use ontologies as common model
Requires:
Representation of data: sensors and databases
Establishing mappings between ontology models and data source schemas
Accessing data sources through queries over ontology model
Expressing continuous queries over sensors
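The mapping and query steps listed above can be sketched in a few lines of Python. This is a minimal illustration only, not the system described in the talk: the ontology property names (waveHeight, crestHeight, location), the source names, and the table and column names are all hypothetical. It shows a query phrased over ontology properties being decomposed into one SQL-like query per data source, using only the properties each source can supply.

```python
from dataclasses import dataclass


@dataclass
class Mapping:
    """Links ontology properties to columns in one data source (all names hypothetical)."""
    source: str    # name of the data source
    table: str     # table or stream inside that source
    columns: dict  # ontology property -> source column


MAPPINGS = [
    Mapping("sea_state_sensors", "readings",
            {"waveHeight": "hs_m", "observedAt": "ts", "location": "site"}),
    Mapping("defences_db", "sea_walls",
            {"crestHeight": "crest_m", "location": "site_name"}),
]


def rewrite(properties, mappings):
    """Rewrite a query over ontology properties into one query per source.

    Each source is asked only for the requested properties it can supply.
    """
    queries = {}
    for m in mappings:
        cols = [f"{m.columns[p]} AS {p}" for p in properties if p in m.columns]
        if cols:
            queries[m.source] = f"SELECT {', '.join(cols)} FROM {m.table}"
    return queries


if __name__ == "__main__":
    for source, sql in rewrite(["waveHeight", "crestHeight", "location"], MAPPINGS).items():
        print(source, "->", sql)
```

The results of the per-source queries would still need to be joined on the shared property (here, location), which is where entity matching across sources becomes an issue.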
11. WSN Resource Concerns
Energy
Running off battery
Computation Capabilities
Limited CPU
Limited memory
Limited storage
Radio Transmission
Limited range
Energy impact
Lost transmissions
12. Data Matching
Administrative Data Research Centre - Scotland
Messy data
Probabilistic matches
Schema matching
John Grant
Fisherman
Fiona Sinclair
Ian Grant
Smithy
Born: 1861
Stuart Adam
Wheelwright
Morag Scott
Flora Adam
Seamstress
Born: 1866
Married: 1884
John Grant
Farmer
Fiona Grant
Iain Grant
Born: 1860
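A probabilistic match between records such as those above can be sketched as a weighted similarity score. This is an illustrative sketch, not the method used by ADRC-Scotland: the weights (0.7 for name, 0.3 for birth year), the two-year tolerance, and the reduction of each record to a name and a birth year are all assumptions made for the example. Name similarity uses the standard library's difflib.SequenceMatcher.

```python
from difflib import SequenceMatcher


def name_similarity(a, b):
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_score(rec_a, rec_b, max_year_gap=2):
    """Combine name and birth-year evidence into one score in [0, 1].

    The 0.7 / 0.3 weights and the year tolerance are assumptions for illustration.
    """
    name = name_similarity(rec_a["name"], rec_b["name"])
    gap = abs(rec_a["born"] - rec_b["born"])
    year = max(0.0, 1.0 - gap / (max_year_gap + 1))
    return 0.7 * name + 0.3 * year


# Records taken from the slide, reduced to name and birth year.
ian = {"name": "Ian Grant", "born": 1861}
iain = {"name": "Iain Grant", "born": 1860}
flora = {"name": "Flora Adam", "born": 1866}

if __name__ == "__main__":
    print("Ian vs Iain:", round(match_score(ian, iain), 3))
    print("Ian vs Flora:", round(match_score(ian, flora), 3))
```

A score like this only ranks candidate pairs. Deciding where to place the acceptance threshold, and whether to use further evidence such as parents' names and occupations, remains a judgment call for messy historical data.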
13. Administrative Data Research Network
Administrative Data Service
14. ADRC-Scotland
Administrative Data Research Centre - Scotland
Co-located with Farr Institute, Scottish Government and NHS.
Universities of Aberdeen, Dundee, Edinburgh, Glasgow, Heriot-Watt, St Andrews and Stirling.
Expertise in administrative data and public engagement, linkage, law and relevant computer science techniques.
Provide research support, facilities, training
15. Research Focus
Image: http://www.gov.scot/Resource/0044/00442276-390.jpg
Schools, colleges and universities
The criminal and justice system
Social work services
Social welfare
Housing system
Transport system
Health system
Historical administrative data
16. Multiple Identities
Andy Law's Third Law
“The number of unique identifiers assigned to an individual is never less than the number of Institutions involved in the study”
http://bioinformatics.roslin.ac.uk/lawslaws/
P12047
X31045
GB:29384
http://rdf.ebi.ac.uk/resource/chembl/molecule/CHEMBL1642
https://www.ebi.ac.uk/chembl/compound/inspect/CHEMBL1642
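One common way to cope with multiple identifiers is to record equivalence assertions between them and then ask whether two identifiers fall in the same equivalence set. The sketch below uses a small union-find structure for this. It is an illustration only: the links between P12047, X31045 and GB:29384 are assumed purely for the example, since the slide does not state which identifiers denote the same thing, and whether two identifiers should be treated as equivalent depends on the use case.

```python
class Equivalences:
    """Union-find over identifiers: linked identifiers end up in one set."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        """Return the representative identifier for x's set."""
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            # Path halving keeps later lookups short.
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def link(self, a, b):
        """Assert that a and b identify the same thing."""
        self.parent[self.find(a)] = self.find(b)

    def same(self, a, b):
        return self.find(a) == self.find(b)


RDF_URI = "http://rdf.ebi.ac.uk/resource/chembl/molecule/CHEMBL1642"
WEB_URI = "https://www.ebi.ac.uk/chembl/compound/inspect/CHEMBL1642"

eq = Equivalences()
eq.link(RDF_URI, WEB_URI)        # two URIs for the same ChEMBL entry
eq.link("P12047", "X31045")      # assumed link, for illustration only
eq.link("X31045", "GB:29384")    # assumed link, for illustration only

if __name__ == "__main__":
    print(eq.same("P12047", "GB:29384"))  # linked transitively
    print(eq.same("P12047", RDF_URI))     # never linked
```

Because equivalence is transitive here, a single wrong link merges two sets, so in practice the source and justification of each link are worth keeping.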
17. Query Performance
Response time
Data freshness
Reliability
Volume of requests
Hosting resources
[Diagram: Data Sources feed a Data Warehouse, which answers Queries]
[Diagram: a Mediator answers Queries by accessing the Data Sources]
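The two architectures on this slide trade response time against data freshness, which a toy model can make concrete. The sketch below is a simplified illustration with hypothetical class and source names, not a description of any particular system. The warehouse answers from a local snapshot that must be refreshed, while the mediator forwards each query to the live source.

```python
class Source:
    """A data source holding a single current value (hypothetical)."""

    def __init__(self, name, value):
        self.name = name
        self.value = value

    def read(self):
        return self.value


class Warehouse:
    """Copies data from the sources ahead of time: fast to query, but can be stale."""

    def __init__(self, sources):
        self.sources = sources
        self.snapshot = {}
        self.refresh()

    def refresh(self):
        self.snapshot = {s.name: s.read() for s in self.sources}

    def query(self, name):
        return self.snapshot[name]


class Mediator:
    """Forwards each query to the source: always fresh, but depends on the source."""

    def __init__(self, sources):
        self.sources = {s.name: s for s in sources}

    def query(self, name):
        return self.sources[name].read()


if __name__ == "__main__":
    gauge = Source("tide_gauge", 1.2)
    warehouse, mediator = Warehouse([gauge]), Mediator([gauge])
    gauge.value = 1.9  # the source changes after the warehouse was loaded
    print("warehouse:", warehouse.query("tide_gauge"))
    print("mediator:", mediator.query("tide_gauge"))
```

In the mediator case every query costs a call to the source, so reliability, request volume and hosting resources at the source become part of the query's performance.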
18. How FAIR is your Data?
19. Summary
Web of Data
Global identifiers
Interoperable data
Domain ontologies
Challenges
Data matching
Multiple identifiers
Query performance
FAIR data
www.alasdairjggray.co.uk
A.J.G.Gray@hw.ac.uk
@gray_alasdair
Editor's notes
Environmental decision support systems
Flood emergency response:
real-time data mash-ups
real-time data linkage
Strait of water separating Isle of Wight from English mainland
Two high tides -> increased opportunities for getting ships in and out -> better for business
Complex tidal pattern
Non-standard models
Overtopping: a wave or tide exceeds the height of the sea defence: simplified as threshold in graph
Sensor data provides current sea-state conditions
National Flood and Coastal Defences Database (NFCDD) provides height of sea walls, etc
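The threshold simplification of overtopping can be sketched as a comparison between the current sea-state reading at a site and the height of the defence there. This is a minimal illustration: the site names, readings and defence heights are invented, and a real system would draw them from the sensor network and the NFCDD respectively.

```python
def overtopping_sites(readings, defence_heights):
    """Return the sites where the observed level exceeds the defence height.

    readings: iterable of (site, level_m) pairs from sea-state sensors.
    defence_heights: dict mapping site -> defence height in metres.
    Sites with no known defence height are skipped.
    """
    flagged = []
    for site, level_m in readings:
        crest_m = defence_heights.get(site)
        if crest_m is not None and level_m > crest_m:
            flagged.append(site)
    return flagged


# Hypothetical data, for illustration only.
READINGS = [("site_a", 4.6), ("site_b", 3.1), ("site_c", 5.0)]
DEFENCES = {"site_a": 4.2, "site_b": 3.8}

if __name__ == "__main__":
    print(overtopping_sites(READINGS, DEFENCES))
```

Even this small example depends on both sources naming sites in the same way, which is one of the forms of heterogeneity the talk goes on to discuss.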
Lots of forms of heterogeneity in the system
Contextual Data
Weather feed provides predicted wind speed and direction,
contextual streaming data
Maps -> contextual visual data
Report data in a form understandable to the user, ontology
Data from heterogeneous sources: discover relevant sources; different temporal modalities; different data models and representations
Interlink data: common representation, align data models/schemas, identify common entities
Query decomposition across distributed sources
Efficient in-network processing: Save energy, increase network lifetime
Enable new insights through novel user interfaces
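The energy argument for in-network processing can be illustrated by counting radio messages in a small sensor routing tree. The sketch below is an assumption-laden toy: the tree shape and node names are hypothetical, and it counts messages only, ignoring message size, retransmissions and computation cost. It compares relaying every raw reading to the sink with having each node send one partial aggregate (for example a running maximum) to its parent.

```python
def raw_forwarding_messages(parent, root):
    """Messages sent when every node's reading is relayed hop by hop to the root."""
    total = 0
    for node in parent:
        hops = 0
        while node != root:
            node = parent[node]
            hops += 1
        total += hops
    return total


def aggregated_messages(parent, root):
    """Messages sent when each non-root node sends one partial aggregate to its parent."""
    return sum(1 for node in parent if node != root)


# Hypothetical routing tree: child -> parent, with "sink" as the root.
TREE = {"a": "sink", "b": "a", "c": "a", "d": "b"}

if __name__ == "__main__":
    print("raw forwarding:", raw_forwarding_messages(TREE, "sink"))
    print("in-network aggregation:", aggregated_messages(TREE, "sink"))
```

The saving grows with the depth of the tree, since raw forwarding pays once per hop for every reading while aggregation pays once per node.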
Linked data offers a platform on which to do data science
Linked Data hugely successful since inception in 2006, revision 2009
About 300 datasets published
Wide range of topics
Coverage of 10,000+ athletes, 200+ countries, 400-500 disciplines and 30 venues
Page for every athlete and country drawing on open data
Internally
DBPedia and Geonames
Previous streaming extensions to SPARQL have problems
Bird habitat monitoring, Coastal monitoring, Glacier movement, Farms, Volcanoes…
Cost effective monitoring, high spatial/temporal resolution
What is the underlying technology/software?
Trade-off of capabilities vs QoS vs Lifetime
Every system performed its own bespoke evaluation; how do you compare them?
Social science example from ADRC Scotland
Same problem in environmental science: bore holes in the North Sea
Four Administrative Data Research Centres (ADRCs), one in each UK country
England – led by University of Southampton
Northern Ireland – led by Queen's University Belfast
Scotland – led by University of Edinburgh
Wales – led by Swansea University
Coordinating Administrative Data Service (ADS) – led by University of Essex
Each captures a subtly different view of the world
Are they the same? … depends on your point of view
Different URIs for different representations (content negotiation)