This document outlines research on using virtual clusters and stream processing topologies for integrating sensor data streams. It describes using the Wirbelsturm tool to set up a virtual cluster on which Storm topologies can be deployed to perform real-time processing of sensor observations modeled with ontologies. A use case of integrating heterogeneous environmental sensor data from a sensor cloud is presented where the SSN and SWEET ontologies are used.
Virtual Clusters for (RDF) Stream Processing
Alejandro Llaves
Ontology Engineering Group
Universidad Politécnica de Madrid
Madrid, Spain
allaves@fi.upm.es
Oct 21 2015
4. Motivation
Integrating an unbounded stream of heterogeneous sensor observations
Solution:
– Storm topologies for real-time processing
– Semantic Sensor Network (SSN) ontology for modelling observations
– SWEET ontology for environmental phenomena
5. Use case: Sensor Cloud data integration (1/3)
Sensor Cloud
– Applications: viticulture, water management, weather monitoring, oyster farming...
– RESTful API (JSON)
– Data hierarchy: Network → Platform → Sensor → Phenomenon → Observation
– Lack of semantic descriptions, e.g. rain_trace vs. Rain
– Multiple HTTP requests are needed to query various streams
Source: CSIRO
6. Use case: Sensor Cloud data integration (2/3)
– Sensor Cloud messages are converted to field-named tuples
– SWEET annotations for heterogeneous phenomenon descriptions
<sample time="20150528T16:30" value="15" sensor="bom_gov_au.94961.air.air_temp"/>
["20150528T16:32", "20150528T16:30", "15", "bom_gov_au", "94961", "air", "air_temp", "43.3167", "147.0075"]
Figure: tuple fields labelled network, platform, sensor, phenomenon, sampling time, system time, latitude and longitude
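A minimal Python sketch of the parsing step, i.e. what a bolt like SensorCloudParser might do with one message. It assumes the sensor ID splits on dots into network, platform, sensor and phenomenon (mirroring the Network → Platform → Sensor → Phenomenon hierarchy) and that coordinates come from a separate platform lookup, since the `<sample>` element carries none. The names `parse_sample`, `SensorTuple` and `PLATFORM_COORDS` are hypothetical, and the system time is passed in explicitly so the result is deterministic.

```python
import xml.etree.ElementTree as ET
from collections import namedtuple

# Hypothetical field names; the order follows the example tuple on the slide.
SensorTuple = namedtuple(
    "SensorTuple",
    ["system_time", "sampling_time", "value", "network", "platform",
     "sensor", "phenomenon", "latitude", "longitude"],
)

# Hypothetical platform metadata lookup: (network, platform) -> (lat, lon).
PLATFORM_COORDS = {("bom_gov_au", "94961"): ("43.3167", "147.0075")}


def parse_sample(xml_message, system_time):
    """Turn one <sample .../> message into a field-named tuple."""
    sample = ET.fromstring(xml_message)
    # Assumption: the sensor ID has the form network.platform.sensor.phenomenon.
    network, platform, sensor, phenomenon = sample.get("sensor").split(".", 3)
    lat, lon = PLATFORM_COORDS.get((network, platform), ("", ""))
    return SensorTuple(
        system_time=system_time,
        sampling_time=sample.get("time"),
        value=sample.get("value"),
        network=network,
        platform=platform,
        sensor=sensor,
        phenomenon=phenomenon,
        latitude=lat,
        longitude=lon,
    )


msg = '<sample time="20150528T16:30" value="15" sensor="bom_gov_au.94961.air.air_temp"/>'
print(list(parse_sample(msg, "20150528T16:32")))
```

Run on the example message, the sketch reproduces the nine-field tuple shown on the slide; named fields let downstream bolts read `phenomenon` or `sampling_time` without relying on positions.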
SensorCloudParser Bolt
SweetAnnotations Bolt
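The annotation step can be sketched as a lookup that attaches one SWEET concept to heterogeneous phenomenon names, so that, for example, rain_trace and Rain end up with the same annotation. This is an assumption-laden illustration: the table `SWEET_CONCEPTS`, the functions `normalize` and `annotate`, and the concept name "AirTemperature" are hypothetical, and the full SWEET IRIs used in the work are not given on the slides.

```python
# Hypothetical mapping from raw Sensor Cloud phenomenon names to SWEET concept
# local names. The real mapping and the SWEET module IRIs are not on the slides;
# "AirTemperature" in particular is a placeholder name.
SWEET_CONCEPTS = {
    "rain": "Rain",
    "rain_trace": "Rain",
    "air_temp": "AirTemperature",
}


def normalize(name):
    """Lower-case and trim a raw phenomenon name so 'Rain' and 'rain' match."""
    return name.strip().lower()


def annotate(fields):
    """Return a copy of a field-named tuple (a dict here) with a SWEET concept
    added under 'sweet_concept'; unknown phenomena get None."""
    annotated = dict(fields)
    annotated["sweet_concept"] = SWEET_CONCEPTS.get(normalize(fields["phenomenon"]))
    return annotated


# Two heterogeneous descriptions of the same phenomenon get one concept.
print(annotate({"phenomenon": "rain_trace"})["sweet_concept"])  # Rain
print(annotate({"phenomenon": "Rain"})["sweet_concept"])        # Rain
```

A static table is the simplest possible annotator; it keeps the bolt stateless and cheap, at the cost of having to curate every new raw name by hand.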
7. Use case: Sensor Cloud data integration (3/3)
SSN mapping
SSNConverter Bolt
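A minimal sketch of an SSN conversion step that emits N-Triples for one observation. It uses terms from the W3C SSN XG ontology namespace (`ssn:Observation`, `ssn:observedBy`, `ssn:observedProperty`, `ssn:observationSamplingTime`, `ssn:observationResult`, `ssn:SensorOutput`, `ssn:hasValue`), but the slides do not show the actual mapping, so the triple shape, the hypothetical base IRI `http://example.org/sensorcloud/` and the function `to_ssn_ntriples` are assumptions. Times and values are simplified to plain literals rather than the richer classes SSN uses for them.

```python
SSN = "http://purl.oclc.org/NET/ssnx/ssn#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/sensorcloud/"  # hypothetical base IRI


def iri(value):
    """Wrap an IRI in angle brackets for N-Triples."""
    return "<" + value + ">"


def literal(value):
    """Quote a plain string literal, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def to_ssn_ntriples(fields):
    """Map one field-named tuple (a dict here) to simplified SSN triples."""
    sensor_path = "/".join([fields["network"], fields["platform"], fields["sensor"]])
    obs = (BASE + "observation/" + sensor_path + "/" + fields["phenomenon"]
           + "/" + fields["sampling_time"])
    output = obs + "/output"
    triples = [
        (iri(obs), iri(RDF_TYPE), iri(SSN + "Observation")),
        (iri(obs), iri(SSN + "observedBy"), iri(BASE + "sensor/" + sensor_path)),
        (iri(obs), iri(SSN + "observedProperty"), iri(BASE + "property/" + fields["phenomenon"])),
        (iri(obs), iri(SSN + "observationSamplingTime"), literal(fields["sampling_time"])),
        (iri(obs), iri(SSN + "observationResult"), iri(output)),
        (iri(output), iri(RDF_TYPE), iri(SSN + "SensorOutput")),
        (iri(output), iri(SSN + "hasValue"), literal(fields["value"])),
    ]
    return [" ".join(t) + " ." for t in triples]


fields = {"network": "bom_gov_au", "platform": "94961", "sensor": "air",
          "phenomenon": "air_temp", "sampling_time": "20150528T16:30", "value": "15"}
for line in to_ssn_ntriples(fields):
    print(line)
```

Emitting N-Triples as plain strings keeps the sketch dependency-free; a real bolt would more likely build triples with an RDF library and serialize them once per batch.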
8. Topologies everywhere
A Storm topology “is a graph of stream transformations where each node is a spout or bolt”.
https://storm.apache.org/documentation/Tutorial.html
Example of a simple topology
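To make the spout/bolt graph idea concrete, here is a toy, single-process Python stand-in for a topology. It is not Storm's API: in Storm itself a topology is wired in Java with `TopologyBuilder.setSpout`/`setBolt` and stream groupings such as `shuffleGrouping`, and the toy below ignores groupings, parallelism and acking. The class `ToyTopology` and its methods are hypothetical.

```python
from collections import defaultdict, deque


class ToyTopology:
    """A toy stand-in for a Storm topology: a DAG whose nodes are spouts
    (tuple sources) or bolts (tuple transformations)."""

    def __init__(self):
        self.spouts = {}                      # name -> callable returning an iterable of tuples
        self.bolts = {}                       # name -> callable(tuple) returning a list of tuples
        self.subscribers = defaultdict(list)  # upstream name -> downstream bolt names

    def set_spout(self, name, source):
        self.spouts[name] = source

    def set_bolt(self, name, fn, inputs):
        self.bolts[name] = fn
        for upstream in inputs:
            self.subscribers[upstream].append(name)

    def run(self):
        """Push every spout tuple through the graph; return (bolt, tuple)
        pairs emitted by sink bolts, i.e. bolts with no subscribers."""
        sink_output = []
        queue = deque()
        for name, source in self.spouts.items():
            for tup in source():
                queue.append((name, tup))
        while queue:
            producer, tup = queue.popleft()
            for bolt in self.subscribers[producer]:
                for out in self.bolts[bolt](tup):
                    if self.subscribers[bolt]:
                        queue.append((bolt, out))
                    else:
                        sink_output.append((bolt, out))
        return sink_output


topology = ToyTopology()
topology.set_spout("sentences", lambda: ["the quick fox", "the lazy dog"])
topology.set_bolt("split", lambda s: s.split(), inputs=["sentences"])
topology.set_bolt("upper", lambda w: [w.upper()], inputs=["split"])
print(topology.run())
```

Each bolt only knows which upstream nodes it subscribes to, which is the same wiring idea as declaring a bolt's input stream in Storm.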
11. Setting up a virtual cluster (1/2)
Wirbelsturm - https://github.com/miguno/wirbelsturm/
Allows deploying local or remote virtual clusters.
Focus on Big Data technologies: Storm, Kafka, ZooKeeper...
Uses Vagrant for “easy to configure, reproducible, and portable work environments” - https://docs.vagrantup.com/v2/why-vagrant/index.html
Uses Puppet for provisioning: installation and configuration of software packages on the cluster nodes.
12. Setting up a virtual cluster (2/2)
$ ./deploy
Show the wirbelsturm.yaml configuration
Check the Storm UI at http://localhost:28080/index.html
15. Conclusion
Wirbelsturm allows easy configuration and deployment of virtual clusters, with a focus on Big Data technologies.
The SSN and SWEET ontologies are used to model and integrate environmental sensor observations.
Parallelizing bottleneck tasks reduces the average message processing latency, up to a point. More about Storm parallelization: http://bit.ly/1NVyjU2
Delaying the RDF conversion does not speed up the processing of Sensor Cloud messages in the tested environment.
Paper submitted to IJSWIS, special issue on Velocity and Variety Dimensions of Big Data – Llaves, Corcho et al.
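Why parallelizing a bottleneck helps only up to a point can be illustrated with a toy queueing model. This is not the setup or the measurements behind the conclusion above: the arrival interval, service time and executor counts are made-up numbers, and the function `average_latency` is hypothetical. In Storm, the number of executors for a bolt is what its parallelism hint controls.

```python
import heapq


def average_latency(n_messages, interval, service, executors):
    """Toy model: messages arrive every `interval` ms at a bottleneck bolt whose
    work takes `service` ms per message; each message, in arrival order, goes to
    the executor that becomes free first. Returns the mean latency in ms."""
    free_at = [0.0] * executors  # time at which each executor becomes free
    heapq.heapify(free_at)
    total = 0.0
    for i in range(n_messages):
        arrival = i * interval
        start = max(arrival, heapq.heappop(free_at))  # wait if all executors are busy
        finish = start + service
        heapq.heappush(free_at, finish)
        total += finish - arrival
    return total / n_messages


for k in (1, 2, 3, 4):
    print(k, average_latency(1000, 10, 25, k))
```

With one or two executors the bolt cannot keep up with one message every 10 ms, so a queue builds and latency grows with the stream length. From three executors on, every message starts on arrival and the average latency stays at the 25 ms service time, so adding a fourth executor brings no further gain.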
What's coming next
Flying faster with Heron - https://blog.twitter.com/2015/flying-faster-with-twitter-heron
16. The presented research has been funded by Ministerio de Economía y Competitividad (Spain) under the project “4V: Volumen, Velocidad, Variedad y Validez en la Gestión Innovadora de Datos” (Volume, Velocity, Variety and Validity in Innovative Data Management; TIN2013-46238-C4-2-R), by the EU Marie Curie IRSES project SemData (612551), and supported by an AWS in Education Research Grant award.
Alejandro Llaves
allaves@fi.upm.es
Thanks!