Big Data tech can tame volume and velocity. Taming variety in the presence of volume and velocity is the real challenge. I have been working on taming variety and velocity simultaneously (Stream Reasoning) for 10 years now. In this talk, I give some examples of application domains where this is necessary. I explain how far the Stream Reasoning community has come in theory, applications and products. In particular, I focus on my applications and my startup Fluxedo, which offers real-time social media analytics across social networks. I conclude the talk by discussing what comes next: 1) the need to focus on languages and abstractions able to easily capture user needs; 2) the need to find the sweet spot between scalability and expressive semantics; 3) the need to use semantics to model more than the data access; and 4) the need to get over imperfect data. If you are excited, I did my job for today!
Stream reasoning: an approach to tame the velocity and variety dimensions of Big Data
1. STREAM REASONING
AN APPROACH TO TAME THE VELOCITY
AND VARIETY DIMENSIONS OF BIG DATA
Emanuele Della Valle
Politecnico di Milano
http://emanueledellavalle.org
@manudellavalle
Oslo, Norway - 15.6.2017
2. BIG DATA TECHS
CAN TAME VOLUME
▸ Hadoop, MapReduce, Hive
▸ “schema on read” methodology
▸ Spark (100x faster)
▸ “data lake” concept
Emanuele Della Valle - http://emanueledellavalle.org - @manudellavalle
3. BIG DATA TECHS
CAN TAME VELOCITY
▸ Storm
▸ Kafka
▸ Spark Streaming
▸ Flink
▸ paradigmatic change
▸ from persistent data and transient queries
▸ to persistent queries and transient data
4. BIG DATA TECHS
CANNOT TAME VOLUME AND VELOCITY SIMULTANEOUSLY
[Figure: data size (KB to ZB) and time scale (months to ms.)]
5. BIG DATA TECHS
CAN TAME VARIETY USING SEMANTIC TECHNOLOGIES
▸ RDF data model
▸ SPARQL query language
▸ OWL ontological language
▸ R2RML mapping language
▸ Ontology Based Data Access methodology
6. BIG DATA TECHS
VARIETY MAKES PROBLEMS HARDER
[Figure: data size (KB to ZB) and time scale (months to ms.), with VARIETY added]
STILL THERE ARE USERS
WHOSE DECISIONS
NEED TO TAME ALL Vs
7. STILL THERE ARE USERS WHOSE DECISIONS NEED TO TAME ALL Vs
OFF-SHORE OIL OPERATIONS
‣ When sensors on a drilling pipe in an oil-rig indicate that it is about to get
stuck, how long — according to historical records — can I keep drilling?
‣ 400,000 sensors from tens of different producers
‣ 10,000 observations per second, many out-of-operational-ranges
8. STILL THERE ARE USERS WHOSE DECISIONS NEED TO TAME ALL Vs
SMART CITIES
▸ Can you suggest where to spend my next hours given my interests,
the presence of people and what they are doing?
▸ 100,000s of people generating 10,000s of information items per second
9. STILL THERE ARE USERS WHOSE DECISIONS NEED TO TAME ALL Vs
SOCIAL MEDIA ANALYSIS
▸ Who are the current top influencers driving the
discussion about the top emerging topics across all the social
networks?
▸ billions of active users (Facebook, 1.86 bln in February 2017)
▸ millions of actions (Facebook, 2.92 mln posts per minute)
10. STILL THERE ARE USERS WHOSE DECISIONS NEED TO TAME ALL Vs
REQUIREMENT ANALYSIS
A system able to answer those queries must be able to
▸ handle massive datasets x
▸ process data streams on the fly x
▸ cope with heterogeneous datasets x
▸ cope with incomplete data x x
▸ cope with noisy data x
▸ provide reactive answers x
▸ support fine-grained information access x x
▸ integrate complex domain models x
Table columns: Volume, Velocity, Variety, Veracity (an x marks each V that a requirement relates to)
11. STILL THERE ARE USERS WHOSE DECISIONS NEED TO TAME ALL Vs
(PARTIAL) SOLUTIONS: STREAM PROCESSING
▸ A paradigmatic change!
[Diagram: input streams pass through a window into a registered continuous query, which emits streams of answers; the whole is a dynamic system]
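As a rough illustration of the "persistent query, transient data" idea, the following minimal Python sketch registers a query once and re-evaluates it over a time-based window on every arrival. The function name `continuous_query`, the event layout and the sample data are assumptions made for this example; they do not come from any of the engines named in the talk.

```python
from collections import deque
from typing import Callable, Deque, Iterable, Iterator, List, Tuple

Event = Tuple[float, str]  # (timestamp in seconds, payload); layout is an assumption


def continuous_query(
    stream: Iterable[Event],
    width: float,
    query: Callable[[List[str]], object],
) -> Iterator[Tuple[float, object]]:
    """Re-evaluate `query` over the last `width` seconds on every arrival."""
    window: Deque[Event] = deque()
    for ts, item in stream:
        window.append((ts, item))
        # Forget items that have fallen out of the window.
        while window and window[0][0] <= ts - width:
            window.popleft()
        # Emit one answer per arrival: the output is itself a stream.
        yield ts, query([payload for _, payload in window])


# Hypothetical input stream; the registered query simply counts window items.
events = [(0, "a"), (20, "b"), (50, "c"), (70, "d"), (130, "e")]
answers = list(continuous_query(events, width=60, query=len))
print(answers)
```

The query stays registered while the data flows past it; old items are evicted as soon as they leave the window, so memory use depends on the window width and not on the length of the stream.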
12. STILL THERE ARE USERS WHOSE DECISIONS NEED TO TAME ALL Vs
STREAM PROCESSING VS. REQUIREMENTS
Requirement SP
massive datasets
data streams
heterogeneous dataset
incomplete data
noisy data
reactive answers
fine-grained information access
complex domain models
13. STILL THERE ARE USERS WHOSE DECISIONS NEED TO TAME ALL Vs
(PARTIAL) SOLUTIONS: SEMANTIC TECHS
▸ Given an ontology O (an information model), a query Q and
a set of ground facts A contained in multiple heterogeneous databases …,
▸ use O to rewrite Q as Q’ so that
▸ answer(Q, O, A) = answer(Q’, ∅, A)
The answer to the query Q using the ontology O for any set of ground facts A is equal to
the answer to a query Q’ that does not consider the ontology O
▸ Use mapping M to map Q’ to multiple SQL queries to the various databases
[Diagram: Q is rewritten into Q’ using O; Q’ is mapped to SQL using M; the SQL queries are evaluated over A to produce the answer]
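The rewriting step can be illustrated with a toy Python sketch. It assumes an ontology made only of subclass axioms and a query that asks for the instances of one class; the class names, the facts and all function names are hypothetical. The mapping step to SQL is left out. The final assertion checks the equality stated above on this small example.

```python
from typing import Dict, List, Set, Tuple

Fact = Tuple[str, str]  # (class name, individual)

# O: hypothetical subclass axioms, written as child -> parent.
ONTOLOGY: Dict[str, str] = {
    "PressureSensor": "Sensor",
    "TemperatureSensor": "Sensor",
}

# A: hypothetical ground facts, as they might be spread over several databases.
FACTS: List[Fact] = [
    ("PressureSensor", "p1"),
    ("TemperatureSensor", "t1"),
    ("Sensor", "s1"),
    ("Pipe", "pipe7"),
]


def rewrite(query_class: str, ontology: Dict[str, str]) -> Set[str]:
    """Q -> Q': the union of query_class and all of its transitive subclasses."""
    classes = {query_class}
    changed = True
    while changed:
        changed = False
        for child, parent in ontology.items():
            if parent in classes and child not in classes:
                classes.add(child)
                changed = True
    return classes


def answer_without_ontology(classes: Set[str], facts: List[Fact]) -> Set[str]:
    """Evaluate Q' directly on the facts; the ontology is not consulted."""
    return {ind for cls, ind in facts if cls in classes}


def answer_with_ontology(
    query_class: str, ontology: Dict[str, str], facts: List[Fact]
) -> Set[str]:
    """Reference semantics: saturate the facts with O, then look up query_class."""
    saturated = set(facts)
    changed = True
    while changed:
        changed = False
        for cls, ind in list(saturated):
            parent = ontology.get(cls)
            if parent is not None and (parent, ind) not in saturated:
                saturated.add((parent, ind))
                changed = True
    return {ind for cls, ind in saturated if cls == query_class}


q_prime = rewrite("Sensor", ONTOLOGY)
result = answer_without_ontology(q_prime, FACTS)
assert result == answer_with_ontology("Sensor", ONTOLOGY, FACTS)
print(sorted(result))
```

The point of rewriting is that the facts never have to be materialised with the ontology: only the query grows, and the rewritten query can be handed to the data sources as they are.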
14. STILL THERE ARE USERS WHOSE DECISIONS NEED TO TAME ALL Vs
SEMANTIC TECHS VS. REQUIREMENTS
Requirement SP ST
massive datasets
data streams
heterogeneous dataset
incomplete data
noisy data
reactive answers
fine-grained information access
complex domain models
15. Is it possible to make sense in real time
of multiple, heterogeneous, gigantic and
inevitably noisy and incomplete data streams
in order to support the decision processes of
extremely large numbers of concurrent
users?
E. Della Valle, S. Ceri, F. van Harmelen & H. Stuckenschmidt, 2010
STILL THERE ARE USERS WHOSE DECISIONS NEED TO TAME ALL Vs
STREAM REASONING RESEARCH QUESTION
16. STREAM REASONING
THEORY: STREAM PROCESSING
[Figure: time axis with a 1 minute wide window]
▸ Which are the top-4 most frequent colours in the last minute? ( , 13), ( , 12), ( , 8), ( , 8)
▸ Is there a [colour] followed by a [colour] in the last minute? yes, many
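The two kinds of query on this slide, a top-k aggregate and a "followed by" pattern, can be sketched in Python as plain functions over the content of one window. The colour names and function names are assumptions for illustration (the colours shown on the slide are not preserved in this transcript), and the window is assumed to be a list ordered by arrival time.

```python
from collections import Counter
from typing import List, Tuple


def top_k(window: List[str], k: int) -> List[Tuple[str, int]]:
    """Return the k most frequent items in the window with their counts."""
    return Counter(window).most_common(k)


def followed_by(window: List[str], first: str, second: str) -> int:
    """Count positions where `first` is immediately followed by `second`."""
    return sum(
        1 for a, b in zip(window, window[1:]) if a == first and b == second
    )


# Hypothetical content of the last minute, oldest item first.
window = ["red", "blue", "green", "red", "blue", "red"]

print(top_k(window, 2))
print(followed_by(window, "red", "blue"))
```

Here "followed by" is read as "immediately followed by"; a looser reading (any later occurrence within the window) would need a different check.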
17. STREAM REASONING
THEORY: STREAM PROCESSING + SEMANTIC TECHS
[Figure: time axis with a 1 minute wide window, alongside an ontology of colours]
18. STREAM REASONING
THEORY: STREAM REASONING
[Figure: time axis with a 1 minute wide window, alongside an ontology of colours]
▸ Which are the top-2 most frequent cool colours in the last minute? ( , 13), ( , 8), ( , 8)
▸ Is there a primary cool colour followed by a secondary warm one? yes, [colour] followed by [colour].
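To make the role of the ontology concrete, here is a small Python sketch in which the queries refer to classes such as "cool" or "primary" that never appear in the stream itself. The ontology below is an assumption (a simple red-yellow-blue colour wheel), as are the function names and the sample window; the ontology used on the slide is not preserved in this transcript.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# Hypothetical ontology: each colour is asserted to belong to some classes.
ONTOLOGY: Dict[str, Set[str]] = {
    "blue": {"cool", "primary"},
    "green": {"cool", "secondary"},
    "violet": {"cool", "secondary"},
    "red": {"warm", "primary"},
    "yellow": {"warm", "primary"},
    "orange": {"warm", "secondary"},
}


def is_a(colour: str, *classes: str) -> bool:
    """True if the ontology places `colour` in every one of `classes`."""
    return set(classes) <= ONTOLOGY.get(colour, set())


def top_k_of_class(window: List[str], cls: str, k: int) -> List[Tuple[str, int]]:
    """Top-k most frequent colours in the window that belong to `cls`."""
    return Counter(c for c in window if is_a(c, cls)).most_common(k)


def class_followed_by(
    window: List[str], first: Tuple[str, ...], second: Tuple[str, ...]
) -> List[Tuple[str, str]]:
    """Adjacent pairs whose members satisfy the two class descriptions."""
    return [
        (a, b)
        for a, b in zip(window, window[1:])
        if is_a(a, *first) and is_a(b, *second)
    ]


# Hypothetical content of the last minute, oldest item first.
window = ["blue", "red", "green", "blue", "orange", "yellow", "blue", "green"]

print(top_k_of_class(window, "cool", 2))
print(class_followed_by(window, ("primary", "cool"), ("secondary", "warm")))
```

The stream carries only colour names; swapping in a richer ontology changes which queries can be asked without touching the stream or the window logic.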
19. STREAM REASONING
THEORY: STREAM REASONING
[Figure: time axis with a 1 minute wide window, alongside a better ontology of colours]
▸ Which are the most frequent sentiments in the last minute?
▸ Is there an impulsive, irritating colour followed by a happy one?
The better the ontology of colours we are using,
the more expressive the queries we can register
20. STREAM REASONING
THEORY: 1000 SCIENTIFIC PAPERS IN 10 YEARS
▸ It is possible to extend the Semantic Web stack in order
to represent heterogeneous data streams (RDF streams), continuous
queries (C-SPARQL, CQELS-QL, … RSP-QL), and continuous reasoning
(LARS, STARQL, …) tasks
▸ The ordered nature of data streams and the possibility of forgetting old
enough information make it possible to optimise continuous querying (C-SPARQL
Engine, CQELS, MorphStream, … RSP Engine) and continuous
reasoning (IMaRS, RDFox, StreamRule, ETALIS…) tasks so as to provide
reactive answers
▸ Semantic Web and Machine Learning technologies can be jointly
employed to cope with the noisy and incomplete nature of data streams
21. STREAM REASONING
THEORY: STREAM REASONING PARADIGMATIC CHANGE ENABLED
[Diagram: two pipelines labelled Traditional and Stream Reasoning. One puts data “in-motion” “at-rest” in a DWH before analysis yields an insight; the other applies a registered analysis to data “in-motion” to yield insights “in-motion”. Captions: TRADITIONAL APPROACH; PANOPTIQUE APPROACH (Ontology + Mappings).]
23. STREAM REASONING
(MY) APPLICATIONS
BOTTARI
Winner of
Semantic Web Challenge 2011
URBAN BIG DATA SCIENCE
Winner of IBM faculty award 2013
Funded by 8 EIT Digital yearly grants
25. STREAM REASONING
URBAN BIG DATA SCIENCE: CROWDINSIGHTS PROJECT
[Figure: chart over a time axis from October to July; axis tick at 1000]
29. STREAM REASONING
STREAM REASONING VS. REQUIREMENTS
Requirement Stream Reasoning
massive datasets
data streams
heterogeneous dataset
incomplete data
noisy data
reactive answers
fine-grained information access
complex domain models
Legend: not specifically treated so far; treated but not resolved; universally addressed by all studies
30. STREAM REASONING
NOW WHAT?
▸ Focus on languages and abstractions able to easily capture user needs
▸ Analytic queries
▸ Which electricity-producing turbine has sensor readings similar
(i.e., Pearson correlated by at least 0.75) to any turbine that
subsequently had a critical failure in the past year?
▸ Advanced analytics (Machine Learning) tasks
▸ Where am I likely to run into a traffic jam during my commute
tonight and how long will it take, given current weather and traffic
conditions?
▸ … many more …
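The turbine query above combines a similarity measure with a threshold. A minimal Python sketch of that core, assuming equal-length, non-constant series of sensor readings per turbine and a known set of turbines that failed, could look as follows. Turbine names, readings and function names are invented for illustration, and the temporal part of the query ("subsequently", "in the past year") is not modelled.

```python
import math
from typing import Dict, List, Sequence, Set


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length, non-constant series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def similar_to_failed(
    readings: Dict[str, List[float]],
    failed: Set[str],
    threshold: float = 0.75,
) -> List[str]:
    """Turbines whose readings correlate with any failed turbine's readings."""
    return sorted(
        name
        for name, series in readings.items()
        if name not in failed
        and any(pearson(series, readings[f]) >= threshold for f in failed)
    )


# Hypothetical readings; T1 is the turbine that had a critical failure.
readings = {
    "T1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "T2": [2.1, 3.9, 6.2, 8.0, 9.9],  # tracks T1 closely
    "T3": [5.0, 1.0, 4.0, 2.0, 3.0],  # weakly anti-correlated with T1
}
failed = {"T1"}

print(similar_to_failed(readings, failed))
```

Expressing even this simple analytic in a continuous query language is not straightforward, which is part of the argument for better languages and abstractions.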
31. STREAM REASONING
NOW WHAT?
▸ Find the sweet-spot between scalability and expressive semantics
▸ the data access layers are clear (enough)
▸ … but, what kind of reasoning should we put at the top?
▸ Rule language? Answer set programming? Temporal logic?
[Figure: “Complexity vs. Dynamics”. Layers: Raw Stream Processing, Semantic Streams, DL-Lite, ???. Steps: Abstraction, Selection, Interpretation, Reasoning, Re-writing, Mapping. Complexity axis: AC0, PTIME, NEXPTIME. Change frequency axis: 10^4 Hz to 1 Hz.]
32. STREAM REASONING
NOW WHAT?
▸ Use semantics to model more than the data access
▸ Data are imperfect, get over it!
33. STREAM REASONING
ARE YOU INTERESTED IN LEARNING MORE?
▸ the official stream reasoning community web site
▸ http://streamreasoning.org/
▸ the RDF Stream Processing W3C community
▸ https://www.w3.org/community/rsp/
▸ my personal pages
▸ http://emanueledellavalle.org/ + twitter: @manudellavalle
▸ my company page
▸ http://fluxedo.com/en/
34. STREAM REASONING
THANK YOU!
ANY QUESTIONS?
Emanuele Della Valle
Politecnico di Milano
http://emanueledellavalle.org
@manudellavalle
Oslo, Norway - 15.6.2017