StreamSight: A Query-Driven Framework Extending Streaming IoT Analytics to the Fog Continuum
1. 3/31/22 1
Demetris Trihinas
trihinas.d@unic.ac.cy
1
Workshop: Processing Data in the Fog – Aristotle University, GR – Apr. 4, 2022
Department of
Computer Science
StreamSight
A Query-Driven Framework Extending
Streaming IoT Analytics to the Fog Continuum
Dr. Demetris Trihinas
Department of Computer Science
ailab @ University of Nicosia
trihinas.d@unic.ac.cy
“Designing and developing scalable and self-adaptive tools for data
management, exploration and visualization”
trihinas.d@unic.ac.cy http://dtrihinas.info dtrihinas
Dr. Demetris Trihinas
Lecturer at University of Nicosia
Artificial Intelligence Laboratory (AILab)
• Open and trusted fog computing platform that facilitates the deployment of scalable and heterogeneous IoT services
• Enabling power-efficient Machine Learning and its applications to drone technology for handling time-critical missions
• Bridging the early diagnosis and treatment gap of brain diseases via smart, connected, proactive and evidence-based technology
Distributed Data Processing Engines
• Big data processing engines contribute to the democratization of analytics by hiding the complexity of:
• M2M communication and syncing.
• Resource management.
• Task scheduling and supervision for analytic jobs.
• Fault tolerance for both the infrastructure and execution state.
• Monitoring and logging.
• ...
Challenge: Steep Learning Curve
• Spark SQL and Structured Streaming are leveling the playing field… but…
• Example task: compute the mean of a metric using a 60s sliding window
• Unlike with SQL, IoT operators and data scientists find it difficult to issue ad-hoc queries, because doing so requires knowledge of the underlying engine's programming model.
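To make the task concrete, here is a minimal, engine-independent sketch of a 60s sliding-window mean in plain Python. It is not Spark code and not part of StreamSight; the class name `SlidingWindowMean` and the (timestamp, value) input format are assumptions made for illustration.

```python
from collections import deque


class SlidingWindowMean:
    """Mean of the values observed in the last `window_s` seconds."""

    def __init__(self, window_s=60.0):
        self.window_s = window_s
        self.items = deque()  # (timestamp, value) pairs, oldest first
        self.total = 0.0

    def add(self, ts, value):
        """Record a reading and return the mean over the current window."""
        self.items.append((ts, value))
        self.total += value
        # Evict readings outside the window (ts - window_s, ts].
        while self.items and self.items[0][0] <= ts - self.window_s:
            _, old = self.items.popleft()
            self.total -= old
        return self.total / len(self.items)


w = SlidingWindowMean(window_s=60.0)
# Timestamps are in seconds; the reading at t=0 is evicted once t=61 arrives.
means = [w.add(ts, v) for ts, v in [(0, 10.0), (30, 20.0), (61, 30.0)]]
```

Even this toy version has to manage state and eviction by hand, which is the kind of detail a declarative query hides from the operator.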
Challenge: Analytics Governance Lock-in
• The analytics landscape is still fairly open, with no dominant framework
• Switching big data frameworks requires massive re-coding
• Apache Beam (formerly Google Dataflow) and Summingbird are steps in the right direction…
• But…
https://sigmodrecord.org/2020/02/12/the-seattle-report-on-database-research/
Challenge: Fog-Aware Queries?
[Figure: edge-to-fog topology. Labels: The “Edge” (SpO2, HR, motion, temp, pollutants, air quality, …); The “Fog”; LAN/WAN (one hop away); Internet; axis: physical and/or network distance]
• Less powerful nodes
• Network bandwidth far from uniform… many uncertainties
• Reduce unnecessary computations and data movement
• Less load on centralized services
The StreamSight Analytics Framework for IoT
• SQL-like query model for streaming analytics with fog optimization hints
• Big data engine-agnostic query plan
• Compilers for multiple big data engines
StreamSight: A Query-Driven Framework for Streaming Analytics in Edge Computing. Z. Georgiou, M. Symeonides, D. Trihinas, G. Pallis and M. Dikaiakos, IEEE/ACM UCC, 2018.
Query-Driven Descriptive Analytics for IoT and Edge Computing. M. Symeonides, D. Trihinas, Z. Georgiou, G. Pallis and M. Dikaiakos, IEEE IC2E, 2019.
StreamSight Query Model
• Queries are applied on metric streams with the intent to derive insights
• Insights can be reused, transformed and composed with other metric streams to create new insights
Query Model
• Descriptive statistics
• Filtering
• Transformations
• Windowing
• Grouping
• Sampling
• Query Prioritization
• Outlier Detection
• Operator Placement
• Job Scheduling Hints
• …
Example Queries
• Window Operations: several aggregations (sum, count, sdev, median, percentile, etc.)
COMPUTE
ARITHMETIC_MEAN(bus_delay, 10 MINUTES)
BY city_segment EVERY 5 SECONDS
Annotations: aggregate; metric of interest; window length; group-by key for multivariate data; updating interval
• Filter Composition:
COMPUTE bus_delay
WHEN > ( RUNNING_MEAN(bus_delay) + 3 * RUNNING_SDEV(bus_delay) )
BY city_segment EVERY 5 SECONDS;
Annotation: filter predicate
Slide labels: Apache Spark 15 Ops; Apache Spark 41 Ops
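The filter query (COMPUTE bus_delay WHEN > ...) compares each reading with a running threshold per group. A minimal Python sketch of that logic follows, using Welford's online algorithm for the running mean and standard deviation. It is an illustration only, not StreamSight's compiled plan: the names are hypothetical, the EVERY 5 SECONDS emission interval is left out, and comparing each reading against the statistics of earlier readings (rather than including the current one) is an assumption.

```python
import math
from collections import defaultdict


class RunningStats:
    """Welford's online algorithm for running mean and standard deviation."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def sdev(self):
        # Sample standard deviation; 0.0 until two values have been seen.
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0


def filter_outliers(records, k=3.0):
    """Yield (segment, delay) pairs whose delay exceeds mean + k * sdev."""
    stats = defaultdict(RunningStats)  # one accumulator per city segment
    for segment, delay in records:
        s = stats[segment]
        # Assumption: compare against statistics of earlier readings only.
        if s.n > 1 and delay > s.mean + k * s.sdev():
            yield segment, delay
        s.update(delay)


records = [("A", 10.0), ("A", 12.0), ("A", 11.0), ("A", 60.0),
           ("B", 5.0), ("B", 6.0)]
outliers = list(filter_outliers(records))
```

Keeping one accumulator per group key mirrors the BY city_segment clause: a delay that is normal in one segment may be an outlier in another.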
Query Parser and Validation
• Query syntax mapped to an Abstract Syntax Tree (AST).
• Syntactic correctness validation.
• Independent of underlying engine.
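One way to picture the parsing step: a minimal Python sketch that maps a windowed COMPUTE query to a typed node and rejects anything else. The grammar subset is inferred from the example queries in this deck and covers only `COMPUTE AGG(metric, N UNIT) [BY key] EVERY N UNIT`; the regular expression, the `ComputeNode` type and the unit table are assumptions, not StreamSight's actual parser.

```python
import re
from dataclasses import dataclass
from typing import Optional

UNIT_SECONDS = {"SECONDS": 1, "MINUTES": 60}

# Hypothetical grammar subset: windowed aggregate, optional BY, mandatory EVERY.
PATTERN = re.compile(
    r"^\s*COMPUTE\s+(?P<agg>[A-Z_]+)\(\s*(?P<metric>\w+)\s*,\s*"
    r"(?P<win>\d+)\s+(?P<win_unit>SECONDS|MINUTES)\s*\)"
    r"(?:\s+BY\s+(?P<key>\w+))?"
    r"\s+EVERY\s+(?P<every>\d+)\s+(?P<every_unit>SECONDS|MINUTES)\s*;?\s*$"
)


@dataclass(frozen=True)
class ComputeNode:
    """Engine-independent description of one windowed aggregate query."""
    aggregate: str
    metric: str
    window_s: int
    group_by: Optional[str]
    every_s: int


def parse(query: str) -> ComputeNode:
    """Validate the syntax and build the node, or raise ValueError."""
    m = PATTERN.match(" ".join(query.split()))  # collapse line breaks
    if m is None:
        raise ValueError("syntax error: not a valid windowed COMPUTE query")
    return ComputeNode(
        aggregate=m["agg"],
        metric=m["metric"],
        window_s=int(m["win"]) * UNIT_SECONDS[m["win_unit"]],
        group_by=m["key"],
        every_s=int(m["every"]) * UNIT_SECONDS[m["every_unit"]],
    )


node = parse("""COMPUTE
ARITHMETIC_MEAN(bus_delay, 10 MINUTES)
BY city_segment EVERY 5 SECONDS""")
```

Nothing in `ComputeNode` refers to a specific engine, which is the property that lets separate compilers target different engines from the same parsed query.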
AST Optimization
• A naive AST is extremely inefficient and ignores the geo-distributed nature of IoT:
• Unnecessary intermediate re-computations
• Increased data movement
• Cache and broadcast expressions, composites and results across worker nodes to reduce unnecessary re-computations
• Intermediate results can be shared among queries
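The idea of sharing intermediate results can be sketched as memoization: sub-expressions are keyed by (function, metric) and computed at most once per batch. This is a single-process Python illustration under that assumption; the class `SharedEvaluator` is hypothetical, and the caching and broadcasting across worker nodes are not modeled.

```python
import statistics


class SharedEvaluator:
    """Evaluates (function, metric) sub-expressions once per batch."""

    FUNCS = {"MEAN": statistics.mean, "SDEV": statistics.stdev, "MAX": max}

    def __init__(self, batch):
        self.batch = batch     # metric name -> list of values in this batch
        self.cache = {}        # (function, metric) -> cached result
        self.evaluations = 0   # number of computations actually performed

    def evaluate(self, func, metric):
        key = (func, metric)
        if key not in self.cache:
            self.cache[key] = self.FUNCS[func](self.batch[metric])
            self.evaluations += 1
        return self.cache[key]


ev = SharedEvaluator({"bus_delay": [10.0, 12.0, 11.0, 15.0]})
# Both queries need MEAN(bus_delay); the second call is served from the cache.
q1 = ev.evaluate("MEAN", "bus_delay")
q2 = ev.evaluate("MEAN", "bus_delay") + 3 * ev.evaluate("SDEV", "bus_delay")
```

Here three requests trigger only two computations; with many queries over the same metric streams, the saving grows with the number of shared sub-expressions.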
Other (User-Annotated) Optimizations…
Query Prioritization
COMPUTE MAX(taxis_fare_amount, 60 MINUTES)
BY city_segment EVERY 1 MINUTES
WITH SALIENCE 1
• Under a high-load influx, critical queries are not delayed
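A minimal sketch of salience-based prioritization using a priority queue in Python. The scheduler class is hypothetical, and treating a larger SALIENCE value as more critical (with 0 as the default) is an assumption; the slide does not state the ordering.

```python
import heapq
import itertools


class QueryScheduler:
    """Runs pending queries in order of salience, then submission order."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker keeps submission order

    def submit(self, name, salience=0):
        # heapq is a min-heap, so negate salience to pop the largest first.
        heapq.heappush(self._heap, (-salience, next(self._seq), name))

    def drain(self):
        """Yield query names from most to least critical."""
        while self._heap:
            yield heapq.heappop(self._heap)[2]


s = QueryScheduler()
s.submit("mean_passengers")        # no salience given: default 0
s.submit("max_fare", salience=1)   # corresponds to WITH SALIENCE 1
s.submit("median_delay")
order = list(s.drain())
```

Under load, the annotated query is served first while the remaining queries keep their arrival order.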
Sampling
COMPUTE
ARITHMETIC_MEAN(taxi_passengers, 10 MINUTES)
EVERY 30 SECONDS
WITH MAX_ERROR 0.05 AND CONFIDENCE 0.95
• MAX_ERROR: error upper bound; CONFIDENCE: confidence interval
• Query execution with bounded error guarantees for sampling
Low-Cost Adaptive Monitoring Techniques for the Internet of Things. D. Trihinas, G. Pallis and M. Dikaiakos, IEEE Trans. on Services Computing, 2018.
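A rough sketch of what a bounded-error sampled mean could look like in Python: grow a random sample until the confidence-interval half-width (normal approximation) drops below MAX_ERROR times the estimate. Interpreting MAX_ERROR as a relative error is an assumption, the function name and parameters are hypothetical, and this is not the adaptive technique of the cited paper.

```python
import math
import random
import statistics


def sampled_mean(values, max_error=0.05, confidence=0.95, start=30, seed=0):
    """Estimate the mean of a non-empty list from a random sample.

    Returns (estimate, sample_size). The sample doubles until the
    confidence-interval half-width is at most max_error * |estimate|.
    """
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    shuffled = list(values)
    random.Random(seed).shuffle(shuffled)
    n = min(max(start, 2), len(shuffled))
    while True:
        sample = shuffled[:n]
        mean = statistics.fmean(sample)
        if n == len(shuffled):
            return mean, n  # whole window used: exact mean of the data
        half_width = z * statistics.stdev(sample) / math.sqrt(n)
        if half_width <= max_error * abs(mean):
            return mean, n
        n = min(2 * n, len(shuffled))


rng = random.Random(1)
window = [rng.gauss(2.0, 0.5) for _ in range(10_000)]  # synthetic readings
estimate, used = sampled_mean(window, max_error=0.05, confidence=0.95)
```

The trade-off is the one the query annotation expresses: a looser error bound or lower confidence lets the loop stop at a smaller sample, so less data has to be processed and moved.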
Other Optimizations…
• Dedicated execution on specific nodes
• Job optimization strategies
Performance Evaluation
• Dublin Smart City Bus Network
• 968 Buses (Jan 2014), 16 metrics/record incl. bus_id, bus_delay, city_segment
• Used 7 insights introduced in the examples
• Testbed: 16 edge servers, each with 1 vCPU, 1 GB memory, 2 Mbps up / 16 Mbps down
• Evaluation metric: batch processing time
[Figure: batch processing time; regions labeled “Unstable System” and “Stable System”]
• x1.4 speedup over baseline Spark
• StreamSight+Sampling: x4.3 over baseline
Performance Evaluation (reusing results)
• Dublin Bus Workload
• Average processing time (fixed input rate of 700 req/s)
• StreamSight DOES NOT incur a performance overhead
[Figure annotation: Baseline failed]
Thank You!