Gaining actionable insights in real time lets organizations seize opportunities and avert threats. Sensing the world, detecting actionable insights, and acting on them is now easier than ever thanks to advances in streaming SQL. The topics discussed in this slide deck are listed below.
- Building stream processing applications using streaming SQL
- Deploying and monitoring streaming applications
- Scaling streaming applications
- Building domain-specific business UIs
- Visualizing stream processing outputs via dashboards
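To make this concrete, here is a minimal sketch of the kind of query such applications are built from, written in the streaming SQL dialect popularized by Apache Calcite; the Orders stream and its rowtime and productId columns are invented for illustration:

```sql
-- Count orders per product over one-hour tumbling windows,
-- emitting a row each time a window closes.
-- The Orders stream and its columns are hypothetical.
SELECT STREAM
  TUMBLE_END(rowtime, INTERVAL '1' HOUR) AS windowEnd,
  productId,
  COUNT(*) AS orderCount
FROM Orders
GROUP BY TUMBLE(rowtime, INTERVAL '1' HOUR), productId;
```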
Enterprise data is moving into Hadoop, but some data has to stay in operational systems. Apache Calcite (the technology behind Hive’s new cost-based optimizer, formerly known as Optiq) is a query-optimization and data federation technology that allows you to combine data in Hadoop with data in NoSQL systems such as MongoDB and Splunk, and access it all via SQL.
Hyde shows how to quickly build a SQL interface to a NoSQL system using Calcite. He shows how to add rules and operators to Calcite to push down processing to the source system, and how to automatically build materialized data sets in memory for blazing-fast interactive analysis.
Streaming is necessary to handle data rates and latency, but SQL is unquestionably the lingua franca of data. Is it possible to combine SQL with streaming, and if so, what does the resulting language look like? Apache Calcite is extending SQL to include streaming, and Apache Apex is using Calcite to support streaming SQL. In this talk, Julian Hyde describes streaming SQL in detail and shows how you can use streaming SQL in your application. He also describes how Calcite’s planner optimizes queries for throughput and latency.
Julian Hyde gave this talk at Apex Big Data World, Mountain View, on April 4, 2017.
Streaming is necessary to handle IoT data rates and latency but SQL is unquestionably the lingua franca of data. Apache Samza and Apache Storm have new high-level query interfaces based on standard SQL with streaming extensions, both powered by Apache Calcite. Calcite's relational algebra allows query optimization and federation with data-at-rest in databases, memory, or HDFS.
A talk given by Julian Hyde at Hadoop Summit, San Jose, on 2016/06/29.
Why you care about relational algebra (even though you didn’t know it) by Julian Hyde
A talk given by Julian Hyde at Enterprise Data World in Washington, DC on April 2nd, 2015.
With data in different systems, in different formats, and accessed via different tools, we need a lingua franca for data. Not all tools speak SQL, and data cannot be moved into a single convenient location.
Relational algebra underpins SQL and many other DB languages. It is also perfect for optimizing, caching and mediating.
Apache Calcite (formerly Optiq) is a framework for building and optimizing expressions in relational algebra. We show how to write queries, optimize queries using rewrite rules, and write adapters for back-end systems. We also show how to configure Calcite to materialize queries, so your interactive analytics are effectively running against a fast in-memory database.
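As a rough illustration of that materialization step, the sketch below uses the CREATE MATERIALIZED VIEW syntax from Calcite's server module; the sales table and its columns are assumptions:

```sql
-- Materialize an aggregate once so later queries can be rewritten against it.
-- Table and column names are illustrative.
CREATE MATERIALIZED VIEW sales_by_region AS
SELECT region, SUM(amount) AS total_amount
FROM sales
GROUP BY region;

-- The planner can now answer this from the in-memory materialization
-- instead of rescanning the base table.
SELECT region, SUM(amount) FROM sales GROUP BY region;
```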
Streaming is a paradigm for data processing that is rapidly growing in popularity because it offers high throughput and low-latency responses and efficiently manages multitudes of IoT devices. Is it an alternative to database processing, or is it complementary? Julian Hyde argues for applying the database paradigm to streaming systems, using SQL as a high-level language for streaming. He presents streaming SQL, a superset of standard SQL developed in collaboration with several Apache projects, and the use cases it can solve, such as combining data in flight with historic data at rest. He also shows how query optimization techniques can make streaming applications more efficient.
A talk given by Julian Hyde at the 9th XLDB conference at SLAC, Menlo Park, on 2016/05/25.
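One of those use cases, combining data in flight with data at rest, can be written as an ordinary join in streaming SQL. A minimal sketch, assuming a hypothetical Orders stream and a Products table stored in a database:

```sql
-- Enrich each streaming order with static product data.
-- Orders (a stream) and Products (a table at rest) are illustrative names.
SELECT STREAM
  o.rowtime,
  o.orderId,
  p.productName,
  o.quantity * p.unitPrice AS total
FROM Orders AS o
JOIN Products AS p ON o.productId = p.productId;
```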
Apache Calcite: One Frontend to Rule Them All by Michael Mior
Apache Calcite is an open source framework that allows for a unified query interface over heterogeneous data sources. It provides an ANSI-compliant SQL parser, a logical query optimizer, and acts as a middleware layer that can integrate data from multiple sources. Calcite uses a relational algebra approach and has pluggable adapters that allow it to connect to different backends like MySQL, MongoDB, and streaming data sources. It supports features like SQL queries, views, optimization rules, and works across both batch and streaming data. The project aims to continue adding new capabilities like geospatial queries and improved cost modeling.
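A sketch of the kind of federated query this enables, assuming one schema backed by a MySQL adapter and another by a MongoDB adapter; the schema, table, and column names are hypothetical:

```sql
-- Join a relational table with a document collection through Calcite.
-- Where possible, filters and projections are pushed down to each backend.
SELECT c.name, SUM(s.amount) AS total_sales
FROM mysql_db.customers AS c
JOIN mongo_db.sales AS s ON c.id = s.customer_id
GROUP BY c.name;
```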
Querying the Internet of Things: Streaming SQL on Kafka/Samza and Storm/Trident by Julian Hyde
This document discusses streaming SQL and how it can be used to query streaming data sources like IoT devices, web servers, and databases. Some key points discussed include:
- Streaming SQL extends standard SQL to work over both streaming and static data sources. It allows queries to be executed continuously over streaming data.
- The replay principle states that streaming queries should produce the same results as equivalent non-streaming queries over the same static data. Techniques like watermarks and monotonic columns help ensure this.
- Windowing functions allow aggregating over sliding windows of records in a stream. Various window types like tumbling and hopping windows are described.
- Apache Calcite is an open source framework that can optimize streaming SQL queries.
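To make the windowing point concrete, here is a sketch of a hopping-window aggregation in Calcite's streaming dialect; the SensorReadings stream and its columns are invented:

```sql
-- Hopping windows: 10 minutes wide, advancing every 5 minutes,
-- so each reading contributes to two overlapping windows.
SELECT STREAM
  HOP_END(rowtime, INTERVAL '5' MINUTE, INTERVAL '10' MINUTE) AS windowEnd,
  AVG(temperature) AS avgTemp
FROM SensorReadings
GROUP BY HOP(rowtime, INTERVAL '5' MINUTE, INTERVAL '10' MINUTE);
```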
Cost-based query optimization in Apache Hive by Julian Hyde
Tez is making Hive faster, and now cost-based optimization (CBO) is making it smarter. A new initiative in Hive 0.13 introduces cost-based optimization for the first time, based on the Optiq framework.
Optiq’s lead developer Julian Hyde shows the improvements that CBO is bringing to Hive 0.13. For those interested in Hive internals, he gives an overview of the Optiq framework and shows some of the improvements that are coming to future versions of Hive.
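For readers who want to try CBO, a minimal sketch: hive.cbo.enable is the actual Hive configuration switch, while the tables are hypothetical. CBO leans on column statistics, so gathering them first matters:

```sql
-- Turn on cost-based optimization and collect the statistics it relies on.
SET hive.cbo.enable=true;
ANALYZE TABLE orders COMPUTE STATISTICS FOR COLUMNS;
ANALYZE TABLE customers COMPUTE STATISTICS FOR COLUMNS;

-- EXPLAIN shows the plan (including join order) chosen by the optimizer.
EXPLAIN
SELECT c.name, SUM(o.amount) AS total
FROM orders o JOIN customers c ON o.customer_id = c.id
GROUP BY c.name;
```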
Using Apache Calcite for Enabling SQL and JDBC Access to Apache Geode and Oth... by Christian Tzolov
When working with big data and IoT systems we often feel the need for a common query language. System-specific languages usually take longer to adopt and are harder to integrate into existing stacks.
To fill this gap, some NoSQL vendors are building SQL access to their systems. Building a SQL engine from scratch is a daunting job, and frameworks like Apache Calcite can help you with the heavy lifting. Calcite allows you to integrate a SQL parser, a cost-based optimizer, and JDBC support with your NoSQL system.
We will walk through the process of building a SQL access layer for Apache Geode (an in-memory data grid). I will share my experience, pitfalls, and technical considerations, such as balancing between SQL/RDBMS semantics and the design choices and limitations of the data system.
Hopefully this will enable you to add SQL capabilities to your preferred NoSQL data system.
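As a rough idea of the end result, a query like the following could be issued over JDBC once such an adapter is in place; the "BookOrder" region and its fields are assumptions:

```sql
-- A Geode region exposed as a table through a Calcite adapter.
-- Filters and projections like these are natural candidates for
-- push-down into Geode's own query engine (OQL).
SELECT customerId, totalPrice
FROM "BookOrder"
WHERE totalPrice > 100
ORDER BY totalPrice DESC;
```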
Visualforce Remote Objects, Visual Workflow, and Developer Console received new features. Canvas Apps can now be added to page layouts and support SAML single sign-on. The Apex Flex Queue pilot allows submitting more batch jobs simultaneously. Push notifications can now be configured for Mobile SDK connected apps. Change Sets and deployment tools gained additional monitoring capabilities.
Harnessing the power of YARN with Apache Twill by Terence Yim
This document discusses Apache Twill, which aims to simplify developing distributed applications on YARN. Twill provides a Java thread-like programming model for YARN applications, avoiding the complexity of directly using YARN APIs. Key features of Twill include real-time logging, resource reporting, state recovery, elastic scaling, command messaging between tasks, service discovery, and support for executing bundled JAR applications on YARN. Twill handles communication with YARN and the Application Master while providing an easy-to-use API for application developers.
The Mechanics of Testing Large Data Pipelines (QCon London 2016) by Mathieu Bastian
A talk about testing large data pipelines, mostly inspired by my experience at LinkedIn working on relevancy and recommender system pipelines.
Abstract: Applied machine learning data pipelines are being developed at a very fast pace and often exceed traditional web/business application codebases in scale and complexity. The algorithms and processes these data workflows implement fulfill business-critical applications, which require robust and scalable architectures. But how do we make these data pipelines robust? When the number of developers and data jobs grows while the underlying data changes, how do we test that everything works as expected?
In software development we divide things into clean, independent modules and use unit and integration testing to prevent bugs and regressions. So why is it more complicated with big data workflows? Partly because these workflows usually pull data from dozens of sources out of our control and have a large number of interdependent data processing jobs, and partly because we don't yet know how to test them properly, or lack the right tools.
Streaming is necessary to handle data rates and latency but SQL is unquestionably the lingua franca of data. Where do the two meet?
Apache Calcite is extending SQL to include streaming, and the Samza, Storm, and Flink projects are each building it into their engines. In this talk, Julian Hyde describes streaming SQL in detail and shows how you can use streaming SQL in your application. He also describes how Calcite’s planner optimizes queries for throughput and latency.
Julian Hyde gave this talk at the first Kafka Summit, San Francisco, 2016/04/26.
Presenter: Kenn Knowles, Software Engineer, Google & Apache Beam (incubating) PPMC member
Apache Beam (incubating) is a programming model and library for unified batch & streaming big data processing. This talk will cover the Beam programming model broadly, including its origin story and vision for the future. We will dig into how Beam separates concerns for authors of streaming data processing pipelines, isolating what you want to compute from where your data is distributed in time and when you want to produce output. Time permitting, we might dive deeper into what goes into building a Beam runner, for example atop Apache Apex.
Implementing Server Side Data Synchronization for Mobile Apps by Michele Orselli
The document describes an architecture and implementation for server-side data synchronization for mobile apps. It discusses syncing scenarios, challenges with the existing solution, and the new architecture and implementation. The key aspects covered are using GUIDs as unique identifiers, suggesting a "from" timestamp for incremental syncing, transferring record states instead of operations, and algorithms for resolving conflicts, including for hierarchical data by sorting by hierarchy and updating ids.
Omid: scalable and highly available transaction processing for Apache Phoenix by DataWorks Summit
Apache Phoenix is an OLTP and operational analytics engine for Hadoop. To ensure correctness, Phoenix requires a transaction processor that guarantees all data accesses satisfy the ACID properties. Traditionally, Apache Phoenix has used the Apache Tephra transaction processing technology. Recently, we introduced into Phoenix support for Apache Omid—an open source transaction processor for HBase that is used at Yahoo at large scale.
A single Omid instance sustains hundreds of thousands of transactions per second and provides high availability at zero cost for mainstream processing. Omid, as well as Tephra, are now configurable choices for the Phoenix transaction processing backend, being enabled by the newly introduced Transaction Abstraction Layer (TAL) API. The integration requires introducing many new features and operations to Omid and will become generally available early 2018.
In this talk, we walk through the challenges of the project, focusing on the new use cases introduced by Phoenix and how we address them in Omid.
Speaker
Ohad Shacham, Yahoo Research, Oath, Senior Research Scientist
James Taylor
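As a sketch of what the Phoenix side looks like, the TRANSACTIONAL and TRANSACTION_PROVIDER table properties below are real Phoenix options, while the table itself is invented:

```sql
-- Declare a transactional Phoenix table backed by the Omid provider.
CREATE TABLE accounts (
  id      BIGINT PRIMARY KEY,
  balance DECIMAL
) TRANSACTIONAL=true, TRANSACTION_PROVIDER='OMID';

-- Writes to the table now run with ACID guarantees.
UPSERT INTO accounts VALUES (1, 100.00);
```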
Spark Streaming makes it easy to build scalable fault-tolerant streaming applications. In this webinar, developers will learn:
*How Spark Streaming works - a quick review.
*Features in Spark Streaming that help prevent potential data loss.
*Complementary tools in a streaming pipeline - Kafka and Akka.
*Design and tuning tips for Reactive Spark Streaming applications.
Implementing data sync apis for mobile apps @cloudconf by Michele Orselli
Today mobile apps are everywhere. These apps cannot count on a reliable and constant internet connection: working in offline mode is becoming a common pattern. This is quite easy for read-only apps, but it quickly becomes tricky for apps that create data in offline mode. This talk is a case study of a possible architecture for enabling data synchronization in these situations. Some of the topics covered will be:
- id generation
- hierarchical data
- managing different data types
- sync algorithm
Apache Beam is a unified programming model for batch and streaming data processing. It defines concepts for describing what computations to perform (the transformations), where the data is located in time (windowing), when to emit results (triggering), and how to accumulate results over time (accumulation mode). Beam aims to provide portable pipelines across multiple execution engines, including Apache Flink, Apache Spark, and Google Cloud Dataflow. The talk will cover the key concepts of the Beam model and how it provides unified, efficient, and portable data processing pipelines.
Flink Forward SF 2017: David Hardwick, Sean Hester & David Brelloch - Dynami... by Flink Forward
We have built a Flink-based system to allow our business users to configure processing rules on a Kafka stream dynamically. Additionally it allows the state to be built dynamically using replay of targeted messages from a long term storage system. This allows for new rules to deliver results based on prior data or to re-run existing rules that had breaking changes or a defect. Why we submitted this talk: We developed a unique solution that allows us to handle on the fly changes of business rules for stateful stream processing. This challenge required us to solve several problems -- data coming in from separate topics synchronized on a tracer-bullet, rebuilding state from events that are no longer on Kafka, and processing rule changes without interrupting the stream.
Server side data sync for mobile apps with silex by Michele Orselli
Today mobile apps are everywhere. These apps cannot count on a reliable and constant internet connection: working in offline mode is becoming a common pattern. This is quite easy for read-only apps, but it quickly becomes tricky for apps that create data in offline mode. This talk is a case study of a possible architecture for enabling data synchronization in these situations. Some of the topics covered will be:
- id generation
- hierarchical data
- managing different data types
- sync algorithm
The document discusses new features and enhancements in Apache Hive 3, including:
1) Materialized views which allow optimizing workloads and queries without changing SQL.
2) Improved ACID support, which makes ACID tables as fast as regular tables.
3) Enhancements to constraints, defaults, and surrogate keys which help the optimizer produce better plans and improve data integrity.
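A sketch of the first item in Hive 3's own syntax; the tables are hypothetical, and note that automatic query rewriting requires the source table to be transactional:

```sql
-- Hive 3 can transparently rewrite matching queries to read this view.
CREATE MATERIALIZED VIEW daily_revenue AS
SELECT sale_date, SUM(amount) AS revenue
FROM sales
GROUP BY sale_date;
```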
WSO2 Complex Event Processor (CEP) 4.0 provides real-time analytics capabilities. Some key features of CEP 4.0 include a re-architecture of the CEP server, improvements to the Siddhi query engine (version 3.0), scalable distributed processing using Apache Storm, and better high availability support. CEP 4.0 also offers a domain specific execution manager, real-time dashboards, improved usability, and pluggable event receivers and publishers. It can execute queries in a distributed manner across multiple nodes for scalability.
Kenneth Knowles - Apache Beam - A Unified Model for Batch and Streaming Data... by Flink Forward
http://flink-forward.org/kb_sessions/apache-beam-a-unified-model-for-batch-and-streaming-data-processing/
Unbounded, unordered, global-scale datasets are increasingly common in day-to-day business, and consumers of these datasets have detailed requirements for latency, cost, and completeness. Apache Beam (incubating) defines a new data processing programming model that evolved from more than a decade of experience within Google, including MapReduce, FlumeJava, MillWheel, and Cloud Dataflow. Beam handles both batch and streaming use cases and neatly separates properties of the data from runtime characteristics, allowing pipelines to be portable across multiple runtimes, both open-source (e.g., Apache Flink, Apache Spark, et al.) and proprietary (e.g., Google Cloud Dataflow). This talk will cover the basics of Apache Beam, touch on its evolution, describe main concepts in the programming model, and compare with similar systems. We’ll go from a simple scenario to a relatively complex data processing pipeline, and finally demonstrate execution of that pipeline on multiple runtimes.
This document provides an overview of WSO2 Complex Event Processor (CEP). It discusses key CEP concepts like event streams, queries, and execution plans. It also demonstrates various query patterns for filtering, transforming, enriching, joining, and detecting patterns in event streams. The document outlines the architecture of CEP and shows how to define streams, tables, queries, and adaptors to integrate CEP with external systems. It provides examples of windowing, aggregations, functions, and extensions that can be used in Siddhi queries to process event streams.
(DEV204) Building High-Performance Native Cloud Apps In C++ by Amazon Web Services
The document provides an overview of the AWS SDK for C++, including its core features such as credential management, asynchronous requests, rate limiting, error handling, and memory allocation. It also discusses how to override the HTTP/TLS stack and integrate high-level APIs. The presentation encourages attendees to contribute high-level APIs and send pull requests to the SDK's GitHub repository.
Slides for my talk at Hadoop Summit Dublin, April 2016.
The talk motivates how streaming can subsume batch use cases, using continuous counting as an example.
Joining the Club: Using Spark to Accelerate Big Data at Dollar Shave Club by Data Con LA
Abstract:
Data engineering at Dollar Shave Club has grown significantly over the last year. In that time, it has expanded in scope from conventional web-analytics and business intelligence to include real-time, big data and machine learning applications. We have bootstrapped a dedicated data engineering team in parallel with developing a new category of capabilities. And the business value that we delivered early on has allowed us to forge new roles for our data products and services in developing and carrying out business strategy. This progress was made possible, in large part, by adopting Apache Spark as an application framework. This talk describes what we have been able to accomplish using Spark at Dollar Shave Club.
Bio:
Brett Bevers, Ph.D. Brett is a backend engineer and leads the data engineering team at Dollar Shave Club. More importantly, he is an ex-academic who is driven to understand and tackle hard problems. His latest challenge has been to develop tools powerful enough to support data-driven decision making in high value projects.
As the world moves into an era where data is the most valuable asset, efficiently processing large volumes of data in real time can give businesses a competitive advantage, and making business decisions within milliseconds has become mandatory in many domains. Streaming analytics plays a key role in making these decisions and is also a vital part of the digital transformation of businesses. WSO2 Stream Processor provides a high-performance, lean, enterprise-ready streaming solution to solve data integration and analytics challenges. It provides real-time, interactive, predictive, and batch processing technologies to deal with large volumes of data and generate meaningful decisions/output from it. This session explains how to enable digital transformation through streaming analytics and how easily streaming applications can be implemented.
- The Architecture of WSO2 Stream Processor
- Understanding streaming constructs
- Patterns of processing data in real time, incrementally, and with intelligence
- Applying patterns when building streaming apps
- Deployment patterns
[WSO2Con USA 2018] Patterns for Building Streaming Apps by WSO2
This slide deck explains how to enable digital transformation through streaming analytics and how easily streaming applications can be implemented.
Watch video: https://wso2.com/library/conference/2018/07/wso2con-usa-2018-patterns-for-building-streaming-apps/
WSO2Con ASIA 2016: WSO2 Analytics Platform: The One Stop Shop for All Your Da... by WSO2
Today’s highly connected world is flooding businesses with big and fast-moving data. The ability to trawl this data ocean and identify actionable insights can deliver a competitive advantage to any organization. The WSO2 Analytics Platform enables businesses to do just that by providing batch, real-time, interactive and predictive analysis capabilities all in one place.
In this tutorial we will
* Plug in the WSO2 Analytics Platform to some common business use cases
* Showcase the numerous capabilities of the platform
* Demonstrate how to collect data, analyze, predict and communicate effectively
WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Da... by WSO2
The WSO2 Analytics Platform uniquely combines real-time and batch analytics to derive insights from IoT, mobile, and web application data. It includes the WSO2 Data Analytics Server for collecting, analyzing, and communicating real-time and persisted data. The platform can collect data from various sources, analyze it using Spark SQL for batch or Siddhi for real-time, and communicate results through alerts, dashboards, and APIs. It also features predictive analytics capabilities via machine learning algorithms.
The WSO2 Analytics Platform uniquely combines real-time and batch analytics to derive insights from IoT, mobile, and web application data. It includes the WSO2 Data Analytics Server for collecting, analyzing, and communicating real-time and stored data. The platform can collect data, perform batch analytics using Spark SQL, real-time analytics using Siddhi, and predictive analytics using machine learning models. It also supports dashboards, APIs, alerts, and other methods for communicating results.
[WSO2Con EU 2017] Streaming Analytics Patterns for Your Digital Enterprise by WSO2
The WSO2 analytics platform provides a high performance, lean, enterprise-ready, streaming solution to solve data integration and analytics challenges faced by connected businesses. This platform offers real-time, interactive, machine learning and batch processing technologies that empower enterprises to build a digital business. This session explores how to enable digital transformation by building a data analytics platform.
[WSO2Con Asia 2018] Patterns for Building Streaming Apps by WSO2
This slide deck explains how to enable digital transformation through streaming analytics and how easily streaming applications can be implemented.
Learn more: https://wso2.com/library/conference/2018/08/wso2con-asia-2018-patterns-for-building-streaming-apps/
Today’s highly connected world is flooding businesses with big and fast-moving data. The ability to trawl this data ocean and identify actionable insights can deliver a competitive advantage to any organization. The WSO2 Analytics Platform enables businesses to do just that by providing batch, real-time, interactive and predictive analysis capabilities all in one place.
In this tutorial we will
* Plug in the WSO2 Analytics Platform to some common business use cases
* Showcase the numerous capabilities of the platform
* Demonstrate how to collect data, analyze, predict and communicate effectively
* Demonstrate how it can analyze integration, security and IoT scenarios
Stick around till the end and you will walk away with the necessary skills to create a winning data strategy for your organization to stay ahead of its competition.
Sumedha Rubasinghe, Director of API Architecture, presented this talk at the API Strategy & Practice Conference in Chicago, where he illustrated how organisations can analyse who uses their APIs, and how statistics help in capacity planning, deployment, maintenance schedules, and trend analyses, as well as assist the decision-making process. The session discussed scalable collection of statistics for API ecosystems, key design considerations, real-time and offline analysis, and WSO2’s approach to dealing with these challenges.
Timely Year Two: Lessons Learned Building a Scalable Metrics Analytic System by Accumulo Summit
Timely was born to visualize and analyze metric data at a scale untenable for existing solutions. We're returning to talk about what we've achieved over the past year, provide a detailed look at the production architecture, and discuss features added in that time, including alerting and support for external analytics.
– Speakers –
Drew Farris
Chief Technologist, Booz Allen Hamilton
Drew Farris is a software developer and technology consultant at Booz Allen Hamilton where he helps his client solve problems related to large scale analytics, distributed computing and machine learning. He is a member of the Apache Software Foundation and a contributing author to Manning Publications’ “Taming Text” and the Booz Allen Hamilton “Field Guide to Data Science”.
Bill Oley
Senior Lead Engineer, Booz Allen Hamilton
Bill Oley is a senior lead software engineer at Booz Allen Hamilton where he helps his clients analyze and solve problems related to large scale data ingest, storage, retrieval, and analysis. He is particularly interested in improving visibility into large scale systems by making actionable metrics scalable and usable. He has 16 years of experience designing and developing fault-tolerant distributed systems that operate on continuous streams of data. He holds a bachelor's degree in computer science from the United States Naval Academy and a master's degree in computer science from The Johns Hopkins University.
— More Information —
For more information see http://www.accumulosummit.com/
KPI definition with Business Activity Monitor 2.0 by WSO2
This document discusses key performance indicator (KPI) definition using WSO2 Business Activity Monitoring (BAM) 2.0. It provides an overview of BAM and the BAM 2.0 analytics design flow. It also presents two use cases - one for monitoring retail store performance and another for monitoring statistics across data centers and clusters. The document explains how to capture and analyze data, define indexes, create analyzer sequences, and design visualizations for these use cases in BAM 2.0.
Data to Insight in a Flash: Introduction to Real-Time Analytics with WSO2 Com... by WSO2
In this webinar, Sriskandarajah Suhothayan, technical lead at WSO2, will take a closer look at the following use cases:
Natural language processing capabilities of WSO2 CEP: Introducing basic constructs of the CEP
Analyzing a soccer game in Real time: Explaining how complicated scenarios can be implemented
Geo fencing capabilities of WSO2 CEP: Focusing on the CEP’s virtualization support
Complex Event Processor 3.0.0 - An overview of upcoming features by WSO2
This document provides an overview of upcoming features in WSO2 Complex Event Processor 3.0.0. It discusses the Siddhi CEP engine, event processing queries including filters, windows, joins, patterns and event tables. It also covers high availability, persistence, scaling, integration with BAM, and performance comparisons with Esper. The document concludes with a demo of monitoring stock prices and tweets to detect significant stock price changes when a company is highly discussed on Twitter.
The document provides information on performance testing processes and tools. It outlines 8 key steps: 1) create scripts, 2) create test scenarios, 3) execute load testing, 4) analyze results, 5) test reporting, 6) performance tuning, 7) communication planning, and 8) troubleshooting. It also discusses tools like LoadRunner, Controller, and Analysis for executing and analyzing tests. The document emphasizes having a thorough test process and communication plan to ensure performance testing is done correctly.
Introduction to WSO2 Data Analytics Platform by Srinath Perera
This document provides an introduction to the WSO2 Analytics Platform. It discusses how the platform allows users to collect data from various sources using a sensor API, then perform analysis on the data through both batch and real-time means. Batch analysis uses technologies like Apache Spark and Hadoop to perform tasks like finding averages, max/min, and building KPIs. Real-time analysis uses complex event processing to run queries over streaming data and detect patterns. The platform also enables predictive analytics using machine learning algorithms and anomaly detection. Results are then communicated through dashboards and alerts.
Azure Stream Analytics: Analyse Data in Motion by Ruhani Arora
The document discusses evolving approaches to data warehousing and analytics using Azure Data Factory and Azure Stream Analytics. It provides an example scenario of analyzing game usage logs to create a customer profiling view. Azure Data Factory is presented as a way to build data integration and analytics pipelines that move and transform data between on-premises and cloud data stores. Azure Stream Analytics is introduced for analyzing real-time streaming data using a declarative query language.
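A sketch of an Azure Stream Analytics query for the game-usage scenario; TIMESTAMP BY and TumblingWindow are part of the ASA query language, while the input/output names and fields are assumptions:

```sql
-- Count game-start events per title in five-minute tumbling windows.
SELECT
    GameTitle,
    COUNT(*) AS starts,
    System.Timestamp() AS windowEnd
INTO [profiling-output]
FROM [game-events] TIMESTAMP BY EventTime
WHERE EventType = 'GameStart'
GROUP BY GameTitle, TumblingWindow(minute, 5);
```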
This document discusses performance engineering for batch and web applications. It begins by outlining why performance testing is important. Key factors that influence performance testing include response time, throughput, tuning, and benchmarking. Throughput represents the number of transactions processed in a given time period and should increase linearly with load. Response time is the duration between a request and first response. Tuning improves performance by configuring parameters without changing code. The performance testing process involves test planning, creating test scripts, executing tests, monitoring tests, and analyzing results. Methods for analyzing heap dumps and thread dumps to identify bottlenecks are also provided. The document concludes with tips for optimizing PostgreSQL performance by adjusting the shared_buffers configuration parameter.
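As a pointer for that last tip, shared_buffers is a real PostgreSQL setting; the value below is purely illustrative:

```sql
-- Inspect and raise PostgreSQL's shared buffer cache.
SHOW shared_buffers;
ALTER SYSTEM SET shared_buffers = '4GB';
-- A restart is required before the new value takes effect.
```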
This document provides an overview and agenda for a webinar on the WSO2 CEP 3.1.0 product release. It introduces the presenters, Sriskandarajah Suhothayan and Lasantha Fernando. It also provides background on WSO2, an overview of the WSO2 CEP architecture including event streams, execution plans, and the Siddhi query language. The agenda outlines topics like event processing scenarios, new features in 3.1.0, high availability, and a demo.
Intro to Windows Server AppFabric
by Ron Jacobs, Senior Technical Evangelist at Microsoft
Windows Server AppFabric is a set of integrated technologies that make it easier to build, scale and manage Web and composite applications that run on IIS.
This presentation will help SQL Server developers and DBAs get up to speed on AppFabric. You'll also learn how Windows AppFabric caching can help you scale your Data Tier.
You will learn:
•The core capabilities of Windows Server AppFabric
•How the distributed nature of AppFabric’s cache allows large amounts of data to be stored in-memory for extremely fast access and help you scale your SQL Data Tier
•How to get started with Windows Server AppFabric
This document discusses implementing real-time IoT stream processing using Azure Stream Analytics. It introduces the Lambda architecture pattern for processing real-time and batch data streams. Azure Stream Analytics is presented as a tool for real-time stream processing that can ingest data from sources like IoT Hub and Event Hubs and output to databases, services, and functions. The document demonstrates Stream Analytics queries, windowing functions, and integrating with Azure Machine Learning models. It also discusses running Stream Analytics on IoT Edge devices to analyze and filter data locally.
Similar to Building Streaming Applications with Streaming SQL (20)
Learn SQL from basic queries to advanced queries by manishkhaire30
Dive into the world of data analysis with our comprehensive guide on mastering SQL! This presentation offers a practical approach to learning SQL, focusing on real-world applications and hands-on practice. Whether you're a beginner or looking to sharpen your skills, this guide provides the tools you need to extract, analyze, and interpret data effectively.
Key Highlights:
Foundations of SQL: Understand the basics of SQL, including data retrieval, filtering, and aggregation.
Advanced Queries: Learn to craft complex queries to uncover deep insights from your data.
Data Trends and Patterns: Discover how to identify and interpret trends and patterns in your datasets.
Practical Examples: Follow step-by-step examples to apply SQL techniques in real-world scenarios.
Actionable Insights: Gain the skills to derive actionable insights that drive informed decision-making.
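For example, a first aggregation query of the kind the foundations material covers; the orders table and its columns are invented:

```sql
-- Total and average order value per customer, biggest spenders first.
SELECT customer_id,
       COUNT(*)    AS order_count,
       AVG(amount) AS avg_order,
       SUM(amount) AS total_spent
FROM orders
GROUP BY customer_id
ORDER BY total_spent DESC;
```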
Join us on this journey to enhance your data analysis capabilities and unlock the full potential of SQL. Perfect for data enthusiasts, analysts, and anyone eager to harness the power of data!
#DataAnalysis #SQL #LearningSQL #DataInsights #DataScience #Analytics
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You... by Aggregage
This webinar will explore cutting-edge, less familiar but powerful experimentation methodologies which address well-known limitations of standard A/B Testing. Designed for data and product leaders, this session aims to inspire the embrace of innovative approaches and provide insights into the frontiers of experimentation!
Analysis insight about a Flyball dog competition team's performance by roli9797
Insights from my analysis of a Flyball dog competition team's performance over the last year. Find more: https://github.com/rolandnagy-ds/flyball_race_analysis/tree/main
The Ipsos - AI - Monitor 2024 Report.pdf by Social Samosa
According to Ipsos AI Monitor's 2024 report, 65% of Indians said that products and services using AI have profoundly changed their daily life in the past 3-5 years.
Global Situational Awareness of A.I. and where it's headed by vikram sood
You can see the future first in San Francisco.
Over the past year, the talk of the town has shifted from $10 billion compute clusters to $100 billion clusters to trillion-dollar clusters. Every six months another zero is added to the boardroom plans. Behind the scenes, there’s a fierce scramble to secure every power contract still available for the rest of the decade, every voltage transformer that can possibly be procured. American big business is gearing up to pour trillions of dollars into a long-unseen mobilization of American industrial might. By the end of the decade, American electricity production will have grown tens of percent; from the shale fields of Pennsylvania to the solar farms of Nevada, hundreds of millions of GPUs will hum.
The AGI race has begun. We are building machines that can think and reason. By 2025/26, these machines will outpace college graduates. By the end of the decade, they will be smarter than you or I; we will have superintelligence, in the true sense of the word. Along the way, national security forces not seen in half a century will be unleashed, and before long, The Project will be on. If we’re lucky, we’ll be in an all-out race with the CCP; if we’re unlucky, an all-out war.
Everyone is now talking about AI, but few have the faintest glimmer of what is about to hit them. Nvidia analysts still think 2024 might be close to the peak. Mainstream pundits are stuck on the wilful blindness of “it’s just predicting the next word”. They see only hype and business-as-usual; at most they entertain another internet-scale technological change.
Before long, the world will wake up. But right now, there are perhaps a few hundred people, most of them in San Francisco and the AI labs, that have situational awareness. Through whatever peculiar forces of fate, I have found myself amongst them. A few years ago, these people were derided as crazy—but they trusted the trendlines, which allowed them to correctly predict the AI advances of the past few years. Whether these people are also right about the next few years remains to be seen. But these are very smart people—the smartest people I have ever met—and they are the ones building this technology. Perhaps they will be an odd footnote in history, or perhaps they will go down in history like Szilard and Oppenheimer and Teller. If they are seeing the future even close to correctly, we are in for a wild ride.
Let me tell you what we see.
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataKiwi Creative
Harness the power of AI-backed reports, benchmarking and data analysis to predict trends and detect anomalies in your marketing efforts.
Peter Caputa, CEO at Databox, reveals how you can discover the strategies and tools to increase your growth rate (and margins!).
From metrics to track to data habits to pick up, enhance your reporting for powerful insights to improve your B2B tech company's marketing.
- - -
This is the webinar recording from the June 2024 HubSpot User Group (HUG) for B2B Technology USA.
Watch the video recording at https://youtu.be/5vjwGfPN9lw
Sign up for future HUG events at https://events.hubspot.com/b2b-technology-usa/
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...sameer shah
"Join us for STATATHON, a dynamic 2-day event dedicated to exploring statistical knowledge and its real-world applications. From theory to practice, participants engage in intensive learning sessions, workshops, and challenges, fostering a deeper understanding of statistical methodologies and their significance in various fields."
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeWalaa Eldin Moustafa
Dynamic policy enforcement is becoming an increasingly important topic in today’s world where data privacy and compliance is a top priority for companies, individuals, and regulators alike. In these slides, we discuss how LinkedIn implements a powerful dynamic policy enforcement engine, called ViewShift, and integrates it within its data lake. We show the query engine architecture and how catalog implementations can automatically route table resolutions to compliance-enforcing SQL views. Such views have a set of very interesting properties: (1) They are auto-generated from declarative data annotations. (2) They respect user-level consent and preferences (3) They are context-aware, encoding a different set of transformations for different use cases (4) They are portable; while the SQL logic is only implemented in one SQL dialect, it is accessible in all engines.
#SQL #Views #Privacy #Compliance #DataLake
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...Kaxil Naik
Navigating today's data landscape isn't just about managing workflows; it's about strategically propelling your business forward. Apache Airflow has stood out as the benchmark in this arena, driving data orchestration forward since its early days. As we dive into the complexities of our current data-rich environment, where the sheer volume of information and its timely, accurate processing are crucial for AI and ML applications, the role of Airflow has never been more critical.
In my journey as the Senior Engineering Director and a pivotal member of Apache Airflow's Project Management Committee (PMC), I've witnessed Airflow transform data handling, making agility and insight the norm in an ever-evolving digital space. At Astronomer, our collaboration with leading AI & ML teams worldwide has not only tested but also proven Airflow's mettle in delivering data reliably and efficiently—data that now powers not just insights but core business functions.
This session is a deep dive into the essence of Airflow's success. We'll trace its evolution from a budding project to the backbone of data orchestration it is today, constantly adapting to meet the next wave of data challenges, including those brought on by Generative AI. It's this forward-thinking adaptability that keeps Airflow at the forefront of innovation, ready for whatever comes next.
The ever-growing demands of AI and ML applications have ushered in an era where sophisticated data management isn't a luxury—it's a necessity. Airflow's innate flexibility and scalability are what makes it indispensable in managing the intricate workflows of today, especially those involving Large Language Models (LLMs).
This talk isn't just a rundown of Airflow's features; it's about harnessing these capabilities to turn your data workflows into a strategic asset. Together, we'll explore how Airflow remains at the cutting edge of data orchestration, ensuring your organization is not just keeping pace but setting the pace in a data-driven future.
Session in https://budapestdata.hu/2024/04/kaxil-naik-astronomer-io/ | https://dataml24.sessionize.com/session/667627
3. Streaming Application
"An application that provides analytical operators to orchestrate data flow, calculate analytics, and detect patterns on event data from multiple, disparate live data sources to allow developers to build applications that sense, think, and act in real time."
- Forrester
8. ● Lightweight, lean, and cloud native
● Easy-to-learn streaming SQL
● High-performance analytics with just 2 nodes (HA)
● Native support for streaming machine learning
● Long-term aggregations without batch analytics
● Highly scalable deployment with exactly-once processing
● Tools for development and monitoring
● Tools for business users to write their own rules
Overview of WSO2 Stream Processor
9. WSO2 Stream Processor
• Editor/Studio - Developer environment
• Worker/Resource - Resource node
• Dashboard
– Portal - Business dashboard
– Business Rules Manager - Management console for business users
– Status Dashboard - Monitoring dashboard
• Manager - Job manager for distributed processing
Profiles
16. Stream Processing with WSO2 Stream Processor
Siddhi Streaming App
- Processes events in a streaming manner
- An isolated unit with a set of queries, input and output streams
- SQL-like query language

from Sales#window.time(1 hour)
select region, brand, avg(quantity) as AvgQuantity
group by region, brand
insert into LastHourSales;
[Diagram: a Siddhi app inside the Stream Processor, with input streams flowing through Filter, Aggregate, Join, Transform, and Pattern operators (plus Siddhi extensions) to output streams.]
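To make the app structure concrete, here is a minimal, self-contained Siddhi app built around the query above; the HTTP source, log sink, and the Sales stream schema are assumptions added for illustration:

@App:name('LastHourSalesApp')

-- Hypothetical HTTP/JSON source feeding the Sales input stream
@source(type = 'http', @map(type = 'json'))
define stream Sales (region string, brand string, quantity long);

-- Log sink so the output is visible while testing
@sink(type = 'log')
define stream LastHourSales (region string, brand string, AvgQuantity double);

-- Average quantity per region and brand over a sliding one-hour window
from Sales#window.time(1 hour)
select region, brand, avg(quantity) as AvgQuantity
group by region, brand
insert into LastHourSales;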
20. Stream Processor Studio
• Writing Siddhi applications
– Syntax highlighting
– Auto completion
– Error reporting
– Documentation support
• Debugging Siddhi apps
– Inspect events
– Inspect query states
Developer Environment
21. Stream Processor Studio
• Testing Siddhi apps via event simulation
– Send events one by one
– Simulate random data
– Simulate via CSV file
– Simulate from a database
• Support for running and testing in Python
– via PySiddhi
• IDE tools
– IntelliJ IDEA plugin
Developer Environment
29. Use Case 1: Production at each factory should not drop below 5,000 units per hour!
30. 1.1 Monitor and identify events that indicate low production
31. Total Amount Produced
define stream SweetProductionStream (name string, amount double);

from SweetProductionStream
select sum(amount) as hourlyTotal
insert into LowProductionAlertStream;
Calculate the total amount produced over all time
32. Total Amount Produced in the Last Hour
define stream SweetProductionStream (name string, amount double);

from SweetProductionStream#window.time(1 hour)
select sum(amount) as hourlyTotal
insert into LowProductionAlertStream;
Calculate the total amount produced in the last hour
33. Amount Produced Per Product
define stream SweetProductionStream (name string, amount double);

from SweetProductionStream#window.time(1 hour)
select name, sum(amount) as hourlyTotal
group by name
insert into LowProductionAlertStream;
Calculate the total amount produced for each product
34. Identify Low Production Rates
define stream SweetProductionStream (name string, amount double);

from SweetProductionStream#window.time(1 hour)
select name, sum(amount) as hourlyTotal
group by name
having hourlyTotal < 5000
insert into LowProductionAlertStream;
Filter events where the produced amount is less than 5,000
35. Consider Working Hours in the Calculation
define stream SweetProductionStream (name string, amount double);

from SweetProductionStream#window.time(1 hour)
select name, sum(amount) as hourlyTotal,
       time:extract(currentTimeMillis(), 'HOUR') as currentHour
group by name
having hourlyTotal < 5000 and
       currentHour > 9 and currentHour < 17
insert into LowProductionAlertStream;
Use a function to extract the hour of the event arrival time
36. Rate-Limit Low Production Alerts
define stream SweetProductionStream (name string, amount double);

from SweetProductionStream#window.time(1 hour)
select name, sum(amount) as hourlyTotal,
       time:extract(currentTimeMillis(), 'HOUR') as currentHour
group by name
having hourlyTotal < 5000 and
       currentHour > 9 and currentHour < 17
output last every 15 min
insert into LowProductionAlertStream;
Send at most one alert every 15 minutes
38. Send Alerts via Email
@sink(type = 'email', to = 'manager@sf.com',
      subject = 'Low Production of {{name}}!',
      @map(type = 'text', @payload("""
Hi Manager,
Production of {{name}} has gone down to {{hourlyTotal}}
in the last hour!
From Sweet Factory""")))
define stream LowProductionAlertStream (name string, hourlyTotal double,
    currentHour int);
Context-sensitive email
39. Use Case 2: Raw material storage at the factories should be closely monitored
40. 2.1 Store raw material shipment details in a data store
41. Data Store Integration
● Allows store, retrieve, remove, and modify operations on the data store while processing events on the fly
● Provides a REST endpoint to query the data store
● Query optimization using primary and index keys
● Supported operations: Search, Insert, Delete, Update, Insert/Update (sketched below)
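As an illustration of the remove and modify operations, a minimal, hypothetical sketch (DiscardedShipmentStream and StockCorrectionStream are invented names; LatestShipmentDetailTable is the table defined on the following slides):

-- Remove a shipment record when a discard event arrives
from DiscardedShipmentStream
delete LatestShipmentDetailTable
on LatestShipmentDetailTable.name == name;

-- Overwrite the stored amount when a stock correction arrives
from StockCorrectionStream
select name, amount
update LatestShipmentDetailTable
on LatestShipmentDetailTable.name == name;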
42. Store Raw Material Info
@source(type = 'http', @map(type = 'json'))
define stream RawMaterialStream (name string, amount double);

define table LatestShipmentDetailTable (name string, amount double);
In-memory table to store the last shipment of each raw material
43. Store Data with a Primary Key & Index
@source(type = 'http', @map(type = 'json'))
define stream RawMaterialStream (name string, amount double);

@primaryKey('name')
@Index('amount')
define table LatestShipmentDetailTable (name string, amount double);
Support for primary keys and indexes for fast data access
44. Store in an External Data Store
@source(type = 'http', @map(type = 'json'))
define stream RawMaterialStream (name string, amount double);

@store(type = 'rdbms', ... )
@primaryKey('name')
@Index('amount')
define table LatestShipmentDetailTable (name string, amount double);
Table backed by RDBMS, MongoDB, HBase, Cassandra, Solr, Hazelcast, etc.
45. Insert Events into the Table
@source(type = 'http', @map(type = 'json'))
define stream RawMaterialStream (name string, amount double);

@store(type = 'rdbms', ... )
@primaryKey('name')
@Index('amount')
define table LatestShipmentDetailTable (name string, amount double);

from RawMaterialStream
select name, amount
insert into LatestShipmentDetailTable;
Insert into the table from the stream
46. Update or Insert Events into the Table
@source(type = 'http', @map(type = 'json'))
define stream RawMaterialStream (name string, amount double);

@store(type = 'rdbms', ... )
@primaryKey('name')
@Index('amount')
define table LatestShipmentDetailTable (name string, amount double);

from RawMaterialStream
select name, amount
update or insert into LatestShipmentDetailTable
on LatestShipmentDetailTable.name == name;
Update or insert (upsert) into the table from the stream
48. Streaming Data Summarization
Aggregations Over Long Time Periods
• Incremental aggregation for every second, minute, hour, day, ..., year
• Support for out-of-order event arrival
• Fast data retrieval from memory and disk for real-time updates
[Diagram: incremental aggregation hierarchy, where per-second buckets roll up into the current minute, minutes into the current hour, and so on up the time hierarchy.]
49. Define Aggregation
define stream RawMaterialStream (name string, amount double);

define aggregation RawMaterialAggregation
from RawMaterialStream
select name, sum(amount) as totalAmount, avg(amount) as averageAmount
group by name
aggregate every min ... year;
Calculate the total and average amounts at every granularity from minute up to year
50. Define Aggregation ...
define stream RawMaterialStream (name string, amount double);

@store(type = 'rdbms', ... )
define aggregation RawMaterialAggregation
from RawMaterialStream
select name, sum(amount) as totalAmount, avg(amount) as averageAmount
group by name
aggregate every min ... year;
Like tables, aggregations can be stored in RDBMS, MongoDB, HBase, Cassandra, Solr, Hazelcast, etc.
51. Data Retrieval API
• Performs data searches on data stores or pre-defined aggregations
• Supports both REST and Java APIs
52. Retrieve Summarized Data
Perform a REST call:

curl -X POST https://localhost:9443/stores/query \
  -H "content-type: application/json" \
  -u "admin:admin" \
  -d '{"appName" : "Sweet-Factory-Analytics-3",
       "query" : "from RawMaterialAggregation on name == \"caramel\" within \"2018-**-** **:**:**\" per \"minutes\" select name, totalAmount, averageAmount;"}'
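The same summaries can also be consumed from within a Siddhi app by joining a stream against the aggregation; a minimal sketch, assuming a hypothetical RawMaterialRequestStream and output stream (the within/per clauses mirror the REST query above):

define stream RawMaterialRequestStream (name string);

-- Retrieve per-day summaries for the requested material within a time range
from RawMaterialRequestStream as R
join RawMaterialAggregation as A
on A.name == R.name
within "2018-01-01 00:00:00 +05:30", "2018-12-31 23:59:59 +05:30"
per "days"
select A.name, A.totalAmount, A.averageAmount
insert into DailyRawMaterialStream;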
54. Portal
Dashboards & Widgets for Business Users
• Generate dashboards and widgets
• Fine-grained permissions
– Dashboard level
– Widget level
– Data level
• Localization support
• Inter-widget communication
• Shareable dashboards with widget state persistence
56. Use Case 3: Warehouse managers should be alerted if there will be a shortage of raw material for future production cycles
57. 3.1 Check if the current raw material input rate is enough for production
58. Join Raw Material with Production Input
define stream RawMaterialStream (name string, amount double);
define stream ProductionInputStream (name string, amount double);

from ProductionInputStream#window.time(1 hour) as p
join RawMaterialStream#window.time(1 hour) as r
on r.name == p.name
select r.name, sum(r.amount) as totalRawMaterial,
       sum(p.amount) as totalConsumption
group by r.name
having (totalConsumption - totalRawMaterial) * 100.0 / totalRawMaterial > 5
insert into RawMaterialInputRateAlertStream;
Identify a 5% increase by joining the two streams
59. Join with an External Window
define window RawMaterialWindow (name string, amount double) time(1 hour);
define stream ProductionInputStream (name string, amount double);

from ProductionInputStream#window.time(1 hour) as p
join RawMaterialWindow as r
on r.name == p.name
select r.name, sum(r.amount) as totalRawMaterial,
       sum(p.amount) as totalConsumption
group by r.name
having (totalConsumption - totalRawMaterial) * 100.0 / totalRawMaterial > 5
insert into RawMaterialInputRateAlertStream;
Join a stream with a separately defined window
62. Using a Pre-built PMML Model
define stream ProductionInputStream
    (name string, currentHourAmount double, previousHourAmount double);

from ProductionInputStream#pmml:predict('file/model.pmml', name,
    previousHourAmount, currentHourAmount)
select name, nextHourAmount, getEventTime() as currentTime
insert into PredictedProdInputStream;
Predict required raw materials using a static model
63. Online Machine Learning
define stream ProductionInputStream (currentHourAmount double,
    previousHourAmount double);
define stream ProductionInputResultsStream (currentHourAmount double,
    previousHourAmount double, nextHourAmount double);

from ProductionInputResultsStream#streamingml:updateAMRulesRegressor(
    currentHourAmount, previousHourAmount, nextHourAmount)
select *
insert into TrainOutputStream;

from ProductionInputStream#streamingml:AMRulesRegressor(
    currentHourAmount, previousHourAmount)
select currentHourAmount, previousHourAmount, prediction as nextHourAmount
insert into PredictedProdInputStream;
Predict required raw materials while learning in a streaming manner
64. 3.3 Check predicted raw material availability against warehouse stocks and alert if insufficient
65. Predict & Alert
define window RawMaterialWindow (name string, amount double) time(1 hour);
define stream ProductionInputResultsStream ( currentHourAmount double,
previousHourAmount double, nextHourAmount double );
from ProductionInputResultsStream#streamingml:updateAMRulesRegressor
(currentHourAmount, previousHourAmount, nextHourAmount )
select *
insert into TrainOutputStream;
from PredictedProdInputStream as p join RawMaterialWindow as r
on r.name == p.name
select r.name, p.predictedAmount, sum(r.amount) as totalRawMaterial
having totalRawMaterial < totalConsumption
insert into RawMaterialInputRateAlertStream ;
66. Use Case 4: Factory managers should be alerted if production does not start within 15 minutes of raw material arrival
67. Non-occurrence through Patterns
define stream RawMaterialStream (name string, amount double);
define stream ProductionInputStream (name string, amount double);

from every (e1 = RawMaterialStream)
    -> not ProductionInputStream[name == e1.name and
       amount == e1.amount] for 15 min
select e1.name, e1.amount
insert into ProductionStartDelayed;
Identify the non-occurrence pattern
68. Use Case 5: Alert factory managers if the rate of production continuously decreases for a given time period X
70. Identify Trends
define stream SweetProductionStream (name string, amount double);

from SweetProductionStream#window.timeBatch(1 min)
select name, sum(amount) as amount, currentTimeMillis() as timestamp
group by name
insert into LastMinProdStream;

partition with (name of LastMinProdStream)
begin
  from every e1 = LastMinProdStream,
       e2 = LastMinProdStream[timestamp - e1.timestamp < 10 * 60000
            and e1.amount > amount]*,
       e3 = LastMinProdStream[timestamp - e1.timestamp > 10 * 60000
            and e2[last].amount > amount]
  select e1.name, e1.amount as initialAmount, e3.amount as finalAmount
  insert into ContinuousProdReductionStream;
end;
Identify decreasing trends over 10 minutes
72. Business Rules Manager
• Hides Siddhi app creation complexity from business users
• Build rules via a simple web-based UI
– From scratch: build custom filters on event streams
– From a template: build rules from a developer-created template
Dashboard for Rule Management
75. Template as Business Rules
define stream SweetProductionStream(...);
...

partition with (name of LastMinProdStream)
begin
  from every e1 = LastMinProdStream,
       e2 = LastMinProdStream[timestamp - e1.timestamp < $TimeInMin * 60000
            and e1.amount > amount]*,
       e3 = LastMinProdStream[timestamp - e1.timestamp > $TimeInMin * 60000
            and e2[last].amount > amount]
  select e1.name, e1.amount as initialAmount, e3.amount as finalAmount
  insert into ContinuousProdReductionStream;
end;
Identify a decreasing trend over X minutes (templated as $TimeInMin)
77. Minimum HA with 2 Nodes
• High performance
– Processes around 100k events/sec with just 2 nodes, while most others need 5+
• Zero downtime
• Zero event loss
• Simple deployment with an RDBMS
– No ZooKeeper, Kafka, etc.
• Multi-data-center support
[Diagram: two Stream Processor nodes running the same Siddhi apps consume from the event sources and share an event store.]
78. • Exactly-once processing
• Fault tolerance
• Highly scalable
• No back pressure
• Distributed deployment configuration via annotations (sketched below)
• Pluggable distribution options (YARN, Kubernetes, etc.)
Distributed Deployment
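A rough sketch of such annotations; the @dist parameter names follow the WSO2 SP 4.x distributed deployment documentation as best I recall, and the query itself is hypothetical:

-- Run four parallel instances of this query in the 'filtering' execution group
@dist(parallel = '4', execGroup = 'filtering')
from SweetProductionStream[amount > 100]
select name, amount
insert into HighVolumeStream;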
92. • Finance and Banking
• Retail
• Location
• Operational
• Smart Energy
• Social Media
• System and Network
• Healthcare
Available Options
93. Running Siddhi on the Edge
● Lightweight and lean
● Out-of-the-box support for consuming events from Android sensors
● Support for Python via PySiddhi
  ○ https://github.com/wso2/PySiddhi/
On Android & Raspberry Pi