STREAM REASONING
AN APPROACH TO TAME THE VELOCITY
AND VARIETY DIMENSIONS OF BIG DATA
Emanuele Della Valle

Politecnico di Milano

http://emanueledellavalle.org

@manudellavalle
Oslo, Norway - 15.6.2017
BIG DATA TECHS
CAN TAME VOLUME
▸ Hadoop, MapReduce, Hive
▸ “schema on read” methodology
▸ Spark (claimed up to 100x faster than MapReduce)
▸ “data lake” concept
Emanuele Della Valle - http://emanueledellavalle.org - @manudellavalle
BIG DATA TECHS
CAN TAME VELOCITY
▸ Storm
▸ Kafka
▸ Spark Streaming
▸ Flink
▸ paradigmatic change
▸ from persistent data and transient queries
▸ to persistent queries and transient data
BIG DATA TECHS
CANNOT TAME VOLUME AND VELOCITY SIMULTANEOUSLY
[Chart: data volume (KB to ZB) against time scale (months to ms)]
BIG DATA TECHS
CAN TAME VARIETY USING SEMANTIC TECHNOLOGIES
▸ RDF data model
▸ SPARQL query language
▸ OWL ontological language
▸ R2RML mapping language
▸ Ontology Based Data Access methodology
BIG DATA TECHS
VARIETY MAKES PROBLEMS HARDER
[Chart: data volume (KB to ZB) against time scale (months to ms), with VARIETY as a further dimension]
STILL THERE ARE USERS WHOSE DECISIONS NEED TO TAME ALL Vs
STILL THERE ARE USERS WHOSE DECISIONS NEED TO TAME ALL Vs
OFF-SHORE OIL OPERATIONS
‣ When sensors on a drilling pipe in an oil rig indicate that it is about to get
stuck, how long, according to historical records, can I keep drilling?
‣ 400,000 sensors from tens of different producers
‣ 10,000 observations per second, many outside operational ranges
STILL THERE ARE USERS WHOSE DECISIONS NEED TO TAME ALL Vs
SMART CITIES
▸ Can you suggest where to spend my next hours given my interests, the presence
of people and what they are doing?
▸ 100,000s of people generating 10,000s of information items per second

STILL THERE ARE USERS WHOSE DECISIONS NEED TO TAME ALL Vs
SOCIAL MEDIA ANALYSIS
▸ Who are the current top influencers driving the discussion about the top
emerging topics across all the social networks?
▸ billions of active users (Facebook, 1.86 billion in February 2017)
▸ millions of actions (Facebook, 2.92 million posts per minute)
STILL THERE ARE USERS WHOSE DECISIONS NEED TO TAME ALL Vs
REQUIREMENT ANALYSIS
A system able to answer those queries must be able to
▸ handle massive datasets x
▸ process data streams on the fly x
▸ cope with heterogeneous datasets x
▸ cope with incomplete data x x
▸ cope with noisy data x
▸ provide reactive answers x
▸ support fine-grained information access x x
▸ integrate complex domain models x
Columns: Volume, Velocity, Variety, Veracity (an x marks each V a requirement relates to)
STILL THERE ARE USERS WHOSE DECISIONS NEED TO TAME ALL Vs
(PARTIAL) SOLUTIONS: STREAM PROCESSING
▸ A paradigmatic change!
[Diagram: input streams pass through a window into a Registered Continuous Query, which emits streams of answers; labelled “Dynamic System”]
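A minimal Python sketch of the idea of a registered continuous query: the query persists, the data flows through a time window, and every arrival produces a new answer. The class name `ContinuousQuery`, the 60-second width and the sample events are illustrative assumptions, not the API of any engine named in these slides.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple

Event = Tuple[float, float]  # (timestamp in seconds, value); hypothetical shape


class ContinuousQuery:
    """A registered query evaluated over a sliding time window (illustrative)."""

    def __init__(self, width: float, query: Callable[[List[float]], float]):
        self.width = width    # window width in seconds
        self.query = query    # the persistent query
        self.window: Deque[Event] = deque()

    def push(self, timestamp: float, value: float) -> float:
        """Consume one stream element and emit the current answer."""
        self.window.append((timestamp, value))
        # Forget data that fell out of the window: the data is transient.
        while self.window and self.window[0][0] <= timestamp - self.width:
            self.window.popleft()
        return self.query([v for _, v in self.window])


# Register the query once, then feed it a (made-up) stream.
average = ContinuousQuery(width=60.0, query=lambda vs: sum(vs) / len(vs))
answers = [average.push(t, v) for t, v in [(0.0, 10.0), (30.0, 20.0), (70.0, 30.0)]]
```

The third answer no longer includes the first reading, because that reading is older than the window width by the time the third element arrives.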
STILL THERE ARE USERS WHOSE DECISIONS NEED TO TAME ALL Vs
STREAM PROCESSING VS. REQUIREMENTS
Requirement SP
massive datasets
data streams
heterogeneous dataset
incomplete data
noisy data
reactive answers
fine-grained information access
complex domain models
STILL THERE ARE USERS WHOSE DECISIONS NEED TO TAME ALL Vs
(PARTIAL) SOLUTIONS: SEMANTIC TECHS
▸ Given an ontology O (an information model), a query Q and a set of ground
facts A contained in multiple heterogeneous databases …,
▸ use O to rewrite Q as Q’ so that
▸ answer(Q, O, A) = answer(Q’, ∅, A)
The answer of the query Q using the ontology O for any set of ground facts A is equal to
the answer of a query Q’ without considering the ontology O
▸ Use mapping M to map Q’ to multiple SQL queries to the various databases
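[Diagram: Rewrite takes O and Q and produces Q’; Map takes Q’ and M and produces SQL; the SQL is evaluated over A to produce the answer]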
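The rewriting step can be illustrated with a toy Python sketch. It assumes an ontology that contains only subclass axioms and a query that asks for the instances of one class; the class names, individuals and function names are hypothetical. The mapping step to SQL is left out. The sketch checks that evaluating the rewritten query Q’ without the ontology gives the same answer as reasoning with O over the facts.

```python
from typing import Dict, Set, Tuple

# Hypothetical toy ontology O: subclass -> superclass axioms.
ONTOLOGY: Dict[str, str] = {
    "PressureSensor": "Sensor",
    "TemperatureSensor": "Sensor",
}

# Hypothetical ground facts A: (individual, asserted class).
FACTS: Set[Tuple[str, str]] = {
    ("s1", "PressureSensor"),
    ("s2", "TemperatureSensor"),
    ("p1", "Pipe"),
}


def rewrite(query_class: str, ontology: Dict[str, str]) -> Set[str]:
    """Rewrite Q into Q': the queried class plus all of its subclasses."""
    rewritten = {query_class}
    changed = True
    while changed:
        changed = False
        for sub, sup in ontology.items():
            if sup in rewritten and sub not in rewritten:
                rewritten.add(sub)
                changed = True
    return rewritten


def answer_without_ontology(classes: Set[str],
                            facts: Set[Tuple[str, str]]) -> Set[str]:
    """Evaluate Q' directly over the facts; no ontology is consulted."""
    return {ind for ind, cls in facts if cls in classes}


def answer_with_ontology(query_class: str, ontology: Dict[str, str],
                         facts: Set[Tuple[str, str]]) -> Set[str]:
    """Reference semantics: saturate the facts with O, then evaluate Q."""
    saturated = set(facts)
    changed = True
    while changed:
        changed = False
        for ind, cls in list(saturated):
            sup = ontology.get(cls)
            if sup is not None and (ind, sup) not in saturated:
                saturated.add((ind, sup))
                changed = True
    return {ind for ind, cls in saturated if cls == query_class}


q_prime = rewrite("Sensor", ONTOLOGY)
via_rewriting = answer_without_ontology(q_prime, FACTS)
via_reasoning = answer_with_ontology("Sensor", ONTOLOGY, FACTS)
```

Rewriting touches only the query and the ontology, so the facts can stay where they are in the source databases.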
STILL THERE ARE USERS WHOSE DECISIONS NEED TO TAME ALL Vs
SEMANTIC TECHS VS. REQUIREMENTS
Requirement SP ST
massive datasets
data streams
heterogeneous dataset
incomplete data
noisy data
reactive answers
fine-grained information access
complex domain models
STILL THERE ARE USERS WHOSE DECISIONS NEED TO TAME ALL Vs
STREAM REASONING RESEARCH QUESTION
Is it possible to make sense in real time of multiple, heterogeneous, gigantic
and inevitably noisy and incomplete data streams in order to support the
decision processes of extremely large numbers of concurrent users?
E. Della Valle, S. Ceri, F. van Harmelen & H. Stuckenschmidt, 2010
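STREAM REASONING
THEORY: STREAM PROCESSING
[Figure: time axis with a 1-minute-wide window]
▸ Which are the top-4 most frequent colours in the last minute?
Answer: ([colour icon], 13), ([colour icon], 12), ([colour icon], 8), ([colour icon], 8)
▸ Is there a [colour icon] followed by a [colour icon] in the last minute?
Answer: yes, many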
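The two queries on this slide can be sketched in plain Python. This is only an illustration under stated assumptions: the event shape, the colour names and the helper names `in_window`, `top_k` and `followed_by` are all hypothetical, and a real stream processor would evaluate the queries incrementally rather than rescanning a list.

```python
from collections import Counter
from typing import List, Tuple

Event = Tuple[float, str]  # (timestamp in seconds, colour); hypothetical data


def in_window(stream: List[Event], now: float, width: float = 60.0) -> List[Event]:
    """Keep only the events whose timestamp lies in (now - width, now]."""
    return [(t, c) for t, c in stream if now - width < t <= now]


def top_k(window: List[Event], k: int) -> List[Tuple[str, int]]:
    """The k most frequent colours in the window, with their counts."""
    return Counter(c for _, c in window).most_common(k)


def followed_by(window: List[Event], first: str, second: str) -> bool:
    """True if some `first` event occurs strictly before some `second` event."""
    seen_first_at = None
    for t, c in sorted(window):
        if c == first and seen_first_at is None:
            seen_first_at = t
        elif c == second and seen_first_at is not None and t > seen_first_at:
            return True
    return False


# Made-up stream; the first two events are older than one minute at now=70.
stream = [(5.0, "red"), (10.0, "blue"), (20.0, "red"),
          (50.0, "green"), (65.0, "blue"), (70.0, "red")]
window = in_window(stream, now=70.0)
most_frequent = top_k(window, 1)
green_then_blue = followed_by(window, "green", "blue")
blue_then_green = followed_by(window, "blue", "green")
```

Both queries see only the colour labels themselves; nothing here knows that a colour is warm or primary.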
STREAM REASONING
THEORY: STREAM PROCESSING + SEMANTIC TECHS
[Figure: time axis with a 1-minute-wide window, alongside an ontology of colours]
STREAM REASONING
THEORY: STREAM REASONING
[Figure: time axis with a 1-minute-wide window, alongside an ontology of colours]
▸ Which are the top-2 most frequent cool colours in the last minute?
Answer: ([colour icon], 13), ([colour icon], 8), ([colour icon], 8)
▸ Is there a primary cool colour followed by a secondary warm one?
Answer: yes, [colour icon] followed by [colour icon].
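A hedged Python sketch of how an ontology changes what can be asked of the same window. The ontology below (which colours count as primary, secondary, warm or cool) is an assumption made for illustration, as are the helper names and the sample events; the window is assumed to be already restricted to the last minute.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence, Set, Tuple

Event = Tuple[float, str]  # (timestamp in seconds, colour); hypothetical data

# Hypothetical ontology of colours: each colour and the classes it belongs to.
COLOUR_CLASSES: Dict[str, Set[str]] = {
    "red": {"primary", "warm"},
    "yellow": {"primary", "warm"},
    "orange": {"secondary", "warm"},
    "blue": {"primary", "cool"},
    "green": {"secondary", "cool"},
    "violet": {"secondary", "cool"},
}


def is_a(colour: str, *classes: str) -> bool:
    """True if the ontology places `colour` in every one of `classes`."""
    return set(classes) <= COLOUR_CLASSES.get(colour, set())


def top_k_of_class(window: List[Event], k: int, cls: str) -> List[Tuple[str, int]]:
    """The k most frequent colours in the window that belong to `cls`."""
    counts = Counter(c for _, c in window if is_a(c, cls))
    return counts.most_common(k)


def class_followed_by(window: List[Event], first_classes: Sequence[str],
                      second_classes: Sequence[str]) -> Optional[Tuple[str, str]]:
    """First pair (a, b) where a matches `first_classes` and a later b matches `second_classes`."""
    first = None
    for _, colour in sorted(window):
        if first is None and is_a(colour, *first_classes):
            first = colour
        elif first is not None and is_a(colour, *second_classes):
            return (first, colour)
    return None


# Made-up window contents.
window = [(1.0, "blue"), (2.0, "red"), (3.0, "blue"),
          (4.0, "green"), (5.0, "orange"), (6.0, "blue")]
top_cool = top_k_of_class(window, 2, "cool")
match = class_followed_by(window, ("primary", "cool"), ("secondary", "warm"))
```

The stream still carries only colour labels; the classes used in the queries come entirely from the ontology.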
STREAM REASONING
THEORY: STREAM REASONING
[Figure: time axis with a 1-minute-wide window, alongside a better ontology of colours]
▸ Which are the most frequent sentiments in the last minute?
▸ Is there an impulsive, irritating colour followed by a happy one?
The better the ontology of colours we use, the more expressive the queries we can register.
STREAM REASONING
THEORY: 1000 SCIENTIFIC PAPERS IN 10 YEARS
▸ It is possible to extend the Semantic Web stack in order to represent
heterogeneous data streams (RDF streams), continuous queries (C-SPARQL,
CQELS-QL, … RSP-QL), and continuous reasoning (LARS, STARQL, …) tasks
▸ The ordered nature of data streams and the possibility of forgetting old
enough information make it possible to optimise continuous querying (C-SPARQL
Engine, CQELS, MorphStream, … RSP Engine) and continuous reasoning (IMaRS,
RDFox, StreamRule, ETALIS, …) tasks so as to provide reactive answers
▸ Semantic Web and Machine Learning technologies can be jointly
employed to cope with the noisy and incomplete nature of data streams
STREAM REASONING
THEORY: STREAM REASONING PARADIGMATIC CHANGE ENABLED
[Diagram comparing two columns, Traditional and Stream Reasoning]
TRADITIONAL APPROACH: data “in-motion” → data put “at-rest” in DWH → analysis → insight
PANOPTIQUE APPROACH: data “in-motion” → registered analysis (ontology + mappings) → insights “in-motion”
STREAM REASONING
(MY) APPLICATIONS
BOTTARI
▸ Winner of the Semantic Web Challenge 2011
URBAN BIG DATA SCIENCE
▸ Winner of an IBM Faculty Award 2013
▸ Funded by 8 EIT Digital yearly grants
STREAM REASONING
URBAN BIG DATA SCIENCE: CITYSENSING PROJECT
STREAM REASONING
URBAN BIG DATA SCIENCE: CROWDINSIGHTS PROJECT
STREAM REASONING
PRODUCTS: I STARTED UP
STREAM REASONING
STREAM REASONING VS. REQUIREMENTS
Requirement Stream Reasoning
massive datasets
data streams
heterogeneous dataset
incomplete data
noisy data
reactive answers
fine-grained information access
complex domain models
Legend: not specifically treated so far; treated but not resolved; universally addressed by all studies
STREAM REASONING
NOW WHAT?
▸ Focus on languages and abstractions able to easily capture user needs
▸ Analytic queries
▸ Which electricity-producing turbine has sensor readings similar (i.e., with a
Pearson correlation of at least 0.75) to any turbine that subsequently had a
critical failure in the past year?
▸ Advanced analytics (Machine Learning) tasks
▸ Where am I likely to run into a traffic jam during my commute tonight, and
how long will it take, given current weather and traffic conditions?
▸ … many more …
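The similarity test in the turbine query can be made concrete with a small Python sketch. It assumes each turbine is represented by an equally long list of sensor readings; the turbine names, the readings and the function `similar_to_failed` are hypothetical. The temporal part of the query (failures that happened afterwards, within the past year) is not modelled here.

```python
import math
from typing import Dict, List, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


def similar_to_failed(readings: Dict[str, List[float]],
                      failed: Dict[str, List[float]],
                      threshold: float = 0.75) -> List[str]:
    """Turbines whose readings correlate with any failed turbine's readings."""
    return sorted(
        name for name, series in readings.items()
        if any(pearson(series, past) >= threshold for past in failed.values())
    )


# Made-up readings for two running turbines and one turbine that failed.
readings = {"t1": [1.0, 2.0, 3.0, 4.0], "t2": [4.0, 3.0, 2.0, 1.0]}
failed = {"f1": [2.0, 4.0, 6.0, 8.0]}
at_risk = similar_to_failed(readings, failed)
```

Expressing even this fragment takes a hand-written function, which is the kind of gap between user needs and query languages that the slide points at.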
STREAM REASONING
NOW WHAT?
▸ Find the sweet spot between scalability and expressive semantics
▸ the data access layers are clear (enough)
▸ … but, what kind of reasoning should we put at the top?
▸ Rule language? Answer set programming? Temporal logic?
[Figure: Complexity vs. Dynamics. Layers: Raw Stream Processing, Semantic Streams, DL-Lite, ???. Complexity axis: AC0, PTIME, NEXPTIME. Change Frequency axis: 10^4 Hz to 1 Hz. Other labels: Abstraction, Selection, Interpretation, Reasoning, Re-writing, Mapping.]
STREAM REASONING
NOW WHAT?
▸ Use semantics to model more than data access
▸ Data are imperfect, get over it!
STREAM REASONING
ARE YOU INTERESTED IN LEARNING MORE?
▸ the official stream reasoning community web site
▸ http://streamreasoning.org/
▸ the RDF Stream Processing W3C community
▸ https://www.w3.org/community/rsp/
▸ my personal pages
▸ http://emanueledellavalle.org/ + twitter: @manudellavalle
▸ my company page
▸ http://fluxedo.com/en/
STREAM REASONING
THANK YOU!
ANY QUESTIONS?
Emanuele Della Valle

Politecnico di Milano

http://emanueledellavalle.org

@manudellavalle
Oslo, Norway - 15.6.2017

Stream reasoning: an approach to tame the velocity and variety dimensions of Big Data

  • 1. STREAM REASONING AN APPROACH TO TAME THE VELOCITY AND VARIETY DIMENSIONS OF BIG DATA Emanuele Della Valle
 Politecnico di Milano
 http://emanueledellavalle.org
 @manudellavalle Oslo, Norway - 15.6.2017
  • 2. BIG DATA TECHS CAN TAME VOLUME ▸ Hadoop, MapReduce, HIVE ▸ “schema on read” methodology ▸ spark (x100 faster) ▸ “data lake” concept Emanuele Della Valle - http://emanueledellavalle.org - @manudellavalle
  • 3. BIG DATA TECHS CAN TAME VELOCITY ▸ Storm ▸ Kafka ▸ Spark Streaming ▸ Flink ▸ paradigmatic change ▸ from persistent data and transient queries ▸ to persistent queries and transient data Emanuele Della Valle - http://emanueledellavalle.org - @manudellavalle
  • 4. BIG DATA TECHS CANNOT TAME VOLUME AND VELOCITY SIMULTANEOUSLY ZB EB PB TB GB MB KB months days hours min. sec. ms. Emanuele Della Valle - http://emanueledellavalle.org - @manudellavalle
  • 5. BIG DATA TECHS CAN TAME VARIETY USING SEMANTIC TECHNOLOGIES ▸ RDF data model ▸ SPARQL query language ▸ OWL ontological language ▸ R2RML mapping language ▸ Ontology Based Data Access methodology Emanuele Della Valle - http://emanueledellavalle.org - @manudellavalle
  • 6. BIG DATA TECHS VARIETY MAKES PROBLEMS HARDER ZB EB PB TB GB MB KB months days hours min. sec. ms. VARIETY STILL THERE ARE USERS WHOSE DECISIONS 
 NEED TO TAME ALL Vs Emanuele Della Valle - http://emanueledellavalle.org - @manudellavalle
  • 7. STILL THERE ARE USERS WHOSE DECISIONS NEED TO TAME ALL Vs OFF-SHORE OIL OPERATIONS ‣ When sensors on a drilling pipe in an oil-rig indicate that it is about to get stuck, how long — according to historical records — can I keep drilling? ‣ 400,000 sensors from 10s of differente producers ‣ 10,000 observations per second, many out-of-operational-ranges
  • 8. STILL THERE ARE USERS WHOSE DECISIONS NEED TO TAME ALL Vs SMART CITIES ▸ Can you suggest where to spend my next hours given my interests, 
 the presence of people and what their doing? ▸ 100,000s people generating 10,000s information items per second
 Emanuele Della Valle - http://emanueledellavalle.org - @manudellavalle
  • 9. STILL THERE ARE USERS WHOSE DECISIONS NEED TO TAME ALL Vs SOCIAL MEDIA ANALYSIS ▸ Who are the current top influencer users that are driving the discussion about the top emerging topics across all the social networks ▸ billions of active users (facebook, 1.86 bln in February 2017) ▸ millions of actions (facebook, 2.92 mln post per minute) Emanuele Della Valle - http://emanueledellavalle.org - @manudellavalle
  • 10. STILL THERE ARE USERS WHOSE DECISIONS NEED TO TAME ALL Vs REQUIREMENT ANALYSIS A system able to answer those queries must be able to ▸ handle massive datasets x ▸ process data streams on the fly x ▸ cope with heterogeneous datasets x ▸ cope with incomplete data x x ▸ cope with noisy data x ▸ provide reactive answers x ▸ support fine-grained information access x x ▸ integrate complex domain models x Volume Velocity Variety VERACITY Emanuele Della Valle - http://emanueledellavalle.org - @manudellavalle
  • 11. STILL THERE ARE USERS WHOSE DECISIONS NEED TO TAME ALL Vs (PARTIAL) SOLUTIONS: STREAM PROCESSING ▸ A paradigmatic change! window input streams streams of answerRegistered Continuous Query Dynamic System Emanuele Della Valle - http://emanueledellavalle.org - @manudellavalle
  • 12. STILL THERE ARE USERS WHOSE DECISIONS NEED TO TAME ALL Vs STREAM PROCESSING VS. REQUIREMENTS Requirement SP massive datasets data streams heterogeneous dataset incomplete data noisy data reactive answers fine-grained information access complex domain models Emanuele Della Valle - http://emanueledellavalle.org - @manudellavalle
  • 13. STILL THERE ARE USERS WHOSE DECISIONS NEED TO TAME ALL Vs (PARTIAL) SOLUTIONS: SEMANTIC TECHS ▸ Given an ontology O (an information model), a query Q and a set of ground facts A contained in multiple heterogeneous databases …, ▸ use O to rewrite Q as Q’ so that ▸ answer(Q,O,A) = answer(Q’,∅,A): the answer of the query Q using the ontology O, for any set of ground facts A, is equal to the answer of a query Q’ evaluated without the ontology O ▸ Use mapping M to map Q’ to multiple SQL queries to the various databases [Figure: O and Q → Rewrite → Q’; Q’ and M → Map → SQL; SQL over A → answer]
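The rewriting idea on slide 13 can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the ontology O is reduced to a single-parent subclass hierarchy, the query Q asks for all instances of one class, and the class names and facts are hypothetical. The mapping step M to SQL is left out.

```python
# Toy ontology-based query rewriting. All names and data are hypothetical.

# Ontology O: subclass axioms, written as child -> parent.
SUBCLASS_OF = {
    "TemperatureSensor": "Sensor",
    "PressureSensor": "Sensor",
}

# Ground facts A, spread over two "databases": (individual, asserted class).
DB1 = [("s1", "TemperatureSensor")]
DB2 = [("s2", "PressureSensor"), ("s3", "Sensor")]
FACTS = DB1 + DB2


def rewrite(query_class, ontology):
    """Rewrite Q into Q': the query class plus all of its (transitive) subclasses."""
    classes = {query_class}
    changed = True
    while changed:
        changed = False
        for child, parent in ontology.items():
            if parent in classes and child not in classes:
                classes.add(child)
                changed = True
    return sorted(classes)


def answer_rewritten(classes, facts):
    """Evaluate Q' on the facts with no ontology: a plain membership test."""
    return sorted(ind for ind, cls in facts if cls in classes)


def answer_with_ontology(query_class, ontology, facts):
    """Evaluate Q by reasoning at query time: walk up the hierarchy per fact."""
    hits = []
    for individual, cls in facts:
        current = cls
        while current is not None:
            if current == query_class:
                hits.append(individual)
                break
            current = ontology.get(current)
    return sorted(hits)


q_prime = rewrite("Sensor", SUBCLASS_OF)
print(q_prime)
print(answer_rewritten(q_prime, FACTS))
print(answer_with_ontology("Sensor", SUBCLASS_OF, FACTS))
```

Both evaluation routes return the same individuals, which is the equality answer(Q,O,A) = answer(Q’,∅,A) on this small example. Real ontology languages need far richer rewriting than a subclass closure.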
  • 14. STILL THERE ARE USERS WHOSE DECISIONS NEED TO TAME ALL Vs SEMANTIC TECHS VS. REQUIREMENTS Requirement vs. SP and ST ▸ massive datasets ▸ data streams ▸ heterogeneous dataset ▸ incomplete data ▸ noisy data ▸ reactive answers ▸ fine-grained information access ▸ complex domain models
  • 15. STILL THERE ARE USERS WHOSE DECISIONS NEED TO TAME ALL Vs STREAM REASONING RESEARCH QUESTION ▸ Is it possible to make sense in real time of multiple, heterogeneous, gigantic and inevitably noisy and incomplete data streams in order to support the decision processes of extremely large numbers of concurrent users? (E. Della Valle, S. Ceri, F. van Harmelen & H. Stuckenschmidt, 2010)
  • 16. STREAM REASONING THEORY: STREAM PROCESSING [Figure: coloured items arriving over time, with a 1 minute wide window] ▸ Which are the top-4 most frequent colours in the last minute? ([colour], 13), ([colour], 12), ([colour], 8), ([colour], 8) ▸ Is there a [colour] followed by a [colour] in the last minute? Yes, many
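The two queries on slide 16 can be sketched over a time-based sliding window. This is a minimal illustration, not the API of any stream processing engine: the class name `ColourWindow`, the colour names and the timestamps (in seconds) are all assumptions.

```python
from collections import Counter, deque


class ColourWindow:
    """Time-based sliding window: keeps only events from the last `width` seconds."""

    def __init__(self, width=60):
        self.width = width
        self.events = deque()  # (timestamp, colour) pairs in arrival order

    def push(self, timestamp, colour):
        self.events.append((timestamp, colour))
        # Forget everything that has slid out of the window.
        while self.events and self.events[0][0] <= timestamp - self.width:
            self.events.popleft()

    def top_k(self, k):
        """The k most frequent colours currently in the window."""
        counts = Counter(colour for _, colour in self.events)
        return counts.most_common(k)

    def followed_by(self, first, second):
        """True if some `first` event is later followed by a `second` event."""
        seen_first = False
        for _, colour in self.events:
            if seen_first and colour == second:
                return True
            if colour == first:
                seen_first = True
        return False


# Hypothetical stream: (timestamp in seconds, colour).
window = ColourWindow(width=60)
for ts, colour in [(0, "red"), (10, "blue"), (20, "red"), (50, "green"), (65, "blue")]:
    window.push(ts, colour)

print(window.top_k(1))
print(window.followed_by("red", "green"))
```

The window is re-evaluated on every arrival, so the answers change as data flows past a query that stays registered. The event at timestamp 0 has already been forgotten by the time the event at 65 arrives.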
  • 17. STREAM REASONING THEORY: STREAM PROCESSING + SEMANTIC TECHS [Figure: coloured items arriving over time, with a 1 minute wide window, alongside an ontology of colours]
  • 18. STREAM REASONING THEORY: STREAM REASONING [Figure: coloured items arriving over time, with a 1 minute wide window, alongside an ontology of colours] ▸ Which are the top-2 most frequent cool colours in the last minute? ([colour], 13), ([colour], 8), ([colour], 8) ▸ Is there a primary cool colour followed by a secondary warm one? Yes, [colour] followed by [colour].
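Adding an ontology lets queries refer to classes of colours rather than to individual colours, as on slide 18. The sketch below assumes a small, hypothetical ontology (the slide's own ontology is an image and is not recoverable), and all function names are illustrative.

```python
from collections import Counter

# Hypothetical ontology of colours: each colour belongs to a set of classes.
ONTOLOGY = {
    "red": {"primary", "warm"},
    "yellow": {"primary", "warm"},
    "blue": {"primary", "cool"},
    "orange": {"secondary", "warm"},
    "green": {"secondary", "cool"},
    "violet": {"secondary", "cool"},
}


def is_a(colour, *classes):
    """True if the ontology places `colour` in every one of `classes`."""
    return set(classes) <= ONTOLOGY.get(colour, set())


def last_minute(events, now, width=60):
    """Events whose timestamp lies in the window (now - width, now]."""
    return [(ts, c) for ts, c in events if now - width < ts <= now]


def top_k_of_class(events, k, *classes):
    """The k most frequent colours among those belonging to `classes`."""
    counts = Counter(c for _, c in events if is_a(c, *classes))
    return counts.most_common(k)


def class_followed_by(events, first_classes, second_classes):
    """First (a, b) where a matches first_classes and a later b matches second_classes."""
    first_match = None
    for _, colour in events:
        if first_match is not None and is_a(colour, *second_classes):
            return (first_match, colour)
        if first_match is None and is_a(colour, *first_classes):
            first_match = colour
    return None


# Hypothetical stream: (timestamp in seconds, colour).
stream = [(5, "red"), (10, "blue"), (20, "green"), (30, "blue"), (40, "orange"), (70, "green")]
recent = last_minute(stream, now=70)

print(top_k_of_class(recent, 2, "cool"))
print(class_followed_by(recent, ("primary", "cool"), ("secondary", "warm")))
```

The stream itself carries only colour names. The notions of cool, warm, primary and secondary come entirely from the ontology, so richer ontologies allow richer registered queries without changing the data.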
  • 19. STREAM REASONING THEORY: STREAM REASONING [Figure: coloured items arriving over time, with a 1 minute wide window, alongside a better ontology of colours] ▸ Which are the most frequent sentiments in the last minute? ▸ Is there an impulsive, irritating colour followed by a happy one? ▸ The better the ontology of colours we use, the more expressive the queries we can register
  • 20. STREAM REASONING THEORY: 1000 SCIENTIFIC PAPERS IN 10 YEARS ▸ It is possible to extend the Semantic Web stack in order to represent heterogeneous data streams (RDF streams), continuous queries (C-SPARQL, CQELS-QL, … RSP-QL), and continuous reasoning (LARS, STARQL, …) tasks ▸ The ordered nature of data streams and the possibility of forgetting old enough information make it possible to optimise continuous querying (C-SPARQL Engine, CQELS, MorphStream, … RSP Engine) and continuous reasoning (IMaRS, RDFox, StreamRule, ETALIS…) tasks so as to provide reactive answers ▸ Semantic Web and Machine Learning technologies can be jointly employed to cope with the noisy and incomplete nature of data streams
  • 21. STREAM REASONING THEORY: STREAM REASONING PARADIGMATIC CHANGE ENABLED [Figure, column "Traditional": TRADITIONAL APPROACH (data “in-motion” → data put “at-rest” in DWH → analysis → insight) and PANOPTIQUE APPROACH (data “in-motion” → registered analysis with ontology + mappings → insights “in-motion”)]
  • 22. STREAM REASONING THEORY: STREAM REASONING PARADIGMATIC CHANGE ENABLED [Figure, columns "Traditional" and "Stream Reasoning": TRADITIONAL APPROACH (data “in-motion” → data put “at-rest” in DWH → analysis → insight) and PANOPTIQUE APPROACH (data “in-motion” → registered analysis with ontology + mappings → insights “in-motion”)]
  • 23. STREAM REASONING (MY) APPLICATIONS ▸ BOTTARI: winner of the Semantic Web Challenge 2011 ▸ URBAN BIG DATA SCIENCE: winner of an IBM Faculty Award 2013; funded by 8 EIT Digital yearly grants
  • 24. STREAM REASONING URBAN BIG DATA SCIENCE: CITYSENSING PROJECT
  • 25. STREAM REASONING URBAN BIG DATA SCIENCE: CROWDINSIGHTS PROJECT
  • 26. STREAM REASONING PRODUCTS: I STARTED UP
  • 27. STREAM REASONING PRODUCTS: I STARTED UP
  • 28. STREAM REASONING PRODUCTS: I STARTED UP
  • 29. STREAM REASONING STREAM REASONING VS. REQUIREMENTS Requirement vs. Stream Reasoning ▸ massive datasets ▸ data streams ▸ heterogeneous dataset ▸ incomplete data ▸ noisy data ▸ reactive answers ▸ fine-grained information access ▸ complex domain models Legend: not specifically treated so far; treated but not resolved; universally addressed by all studies
  • 30. STREAM REASONING NOW WHAT? ▸ Focus on languages and abstractions able to easily capture user needs ▸ Analytic queries ▸ Which electricity-producing turbine has sensor readings similar (i.e., Pearson correlated by at least 0.75) to any turbine that subsequently had a critical failure in the past year? ▸ Advanced analytics (Machine Learning) tasks ▸ Where am I likely to run into a traffic jam during my commute tonight and how long will it take, given current weather and traffic conditions? ▸ … many more …
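The similarity test in the turbine query can be made concrete with a small sketch of the Pearson correlation check. The turbine identifiers, the readings and the helper names are hypothetical, and the sketch ignores the temporal part of the query (readings taken before a failure within the past year).

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have the same length, at least 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def similar_to_failed(readings, failed, threshold=0.75):
    """Pairs (turbine, failed turbine) whose readings correlate by at least `threshold`."""
    matches = []
    for turbine, series in readings.items():
        if turbine in failed:
            continue
        for bad in sorted(failed):
            if pearson(series, readings[bad]) >= threshold:
                matches.append((turbine, bad))
    return matches


# Hypothetical sensor readings per turbine; T9 is assumed to have failed.
readings = {
    "T1": [1, 2, 3, 4, 5],
    "T2": [2, 4, 6, 8, 11],
    "T3": [5, 3, 4, 1, 2],
    "T9": [10, 20, 30, 40, 50],
}
failed = {"T9"}

print(similar_to_failed(readings, failed))
```

In a stream reasoning setting the notion of "turbine that had a critical failure" would come from the domain model, while the correlation would be computed continuously over windows of recent readings.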
  • 31. STREAM REASONING NOW WHAT? ▸ Find the sweet spot between scalability and expressive semantics ▸ the data access layers are clear (enough) ▸ … but, what kind of reasoning should we put at the top? ▸ Rule language? Answer set programming? Temporal logic? [Figure: Complexity vs. Dynamics; layers: Raw Stream Processing, Semantic Streams, DL-Lite, ???; tasks: abstraction, selection, interpretation, reasoning, re-writing, mapping; complexity axis: AC0, PTIME, NEXPTIME; change frequency axis: 10^4 Hz to 1 Hz]
  • 32. STREAM REASONING NOW WHAT? ▸ Use semantics to model more than the data access ▸ Data are imperfect, get over it!
  • 33. STREAM REASONING ARE YOU INTERESTED IN LEARNING MORE? ▸ the official stream reasoning community web site ▸ http://streamreasoning.org/ ▸ the RDF Stream Processing W3C community ▸ https://www.w3.org/community/rsp/ ▸ my personal pages ▸ http://emanueledellavalle.org/ + twitter: @manudellavalle ▸ my company page ▸ http://fluxedo.com/en/
  • 34. STREAM REASONING THANK YOU! ANY QUESTIONS? Emanuele Della Valle
 Politecnico di Milano
 http://emanueledellavalle.org
 @manudellavalle Oslo, Norway - 15.6.2017