The WSO2 analytics platform provides a high-performance, lean, enterprise-ready streaming solution to solve the data integration and analytics challenges faced by connected businesses. The platform offers real-time, interactive, machine learning, and batch processing technologies that empower enterprises to build a digital business. This session explores how to enable digital transformation by building a data analytics platform.
4. Software that provides analytical operators to
orchestrate data flow, calculate analytics, and
detect patterns on event data from multiple,
disparate live data sources to allow developers
to build applications that sense, think, and act
in real time.
- Forrester
Streaming Analytics
7. • Lightweight, lean & cloud native
• Easy-to-learn Streaming SQL
• Native support for streaming Machine Learning
• Long-term aggregations without batch analytics
• High-performance analytics with just 2 nodes (HA)
• Highly scalable deployment with exactly-once
processing
• Tools for development, monitoring and business users
Overview of WSO2 Stream Processor
8. Siddhi App
Single configuration
for Analytics!
Stream Processing
from Sales#window.time(1 hour)
select region, brand, avg(quantity) as AvgQuantity
group by region, brand
insert into LastHourSales ;
[Diagram: the Stream Processor runs a Siddhi App that turns input streams into output streams using Filter, Transform, Aggregate, Join, and Pattern operators, built on Streaming SQL and Siddhi Extensions]
10. • To know what stream processing can do!
• To understand the difference between
– database applications & stream processing
• Where to use what?
• Best practices
Why Patterns ?
11. 1. Streaming data pre processing
2. Data store integration
3. Streaming data summarization
4. KPI analysis and alerts
5. Event correlation
6. Trend analysis
7. Real-time prediction
8. Streaming machine learning
Streaming Analytics Patterns
14. • Monitor Supply, Production & Sales
• Optimize resource utilization
• Detect and alert failures
• Predict demand
• Manage processing rules online
• Visualise realtime performance
Use Case : Sweet Factory Management
15. • Collect events from multiple sources
• Convert them to streams
• Filter events
• Add defaults to missing fields
• Change event stream structure
1. Streaming Data Pre Processing
Filtering, Add Defaults, and Projection
[Diagram: Filter → Add Defaults → Transform → Process]
16. 1. Streaming Data Pre Processing
@app:name('Sweet-Factory-Analytics')
define stream SweetProductionStream(name string, amount double);
Define Stream
17. 1. Streaming Data Pre Processing
@app:name('Sweet-Factory-Analytics')
@source(type = 'mqtt', …, @map(type = 'json', …))
define stream SweetProductionStream(name string, amount double);
Consume Events :
MQTT, HTTP, TCP, Kafka, JMS, RabbitMQ, etc.
Map Events to Streams :
JSON, XML, Text, Binary, WSO2Event, KeyValue, etc.
18. 1. Streaming Data Pre Processing
@app:name('Sweet-Factory-Analytics')
@source(type = 'mqtt', …, @map(type = 'json', …))
define stream SweetProductionStream(name string, amount double);
from SweetProductionStream
select *
insert into LowCandyProductionStream;
Write Query
19. 1. Streaming Data Pre Processing
@app:name('Sweet-Factory-Analytics')
@source(type = 'mqtt', …, @map(type = 'json', …))
define stream SweetProductionStream(name string, amount double);
from SweetProductionStream[amount < 100 and name == 'candy']
select *
insert into LowCandyProductionStream;
Filter
20. 1. Streaming Data Pre Processing
@app:name('Sweet-Factory-Analytics')
@source(type = 'mqtt', …, @map(type = 'json', …))
define stream SweetProductionStream(name string, amount double);
from SweetProductionStream[amount < 100 and name == 'candy']
select name, (amount * 0.05) + 5 as cost, 'GBP' as currency
insert into LowCandyProductionCostStream;
Transformation and Defaults
21. 1. Streaming Data Pre Processing
@app:name('Sweet-Factory-Analytics')
@source(type = 'mqtt', …, @map(type = 'json', …))
define stream SweetProductionStream(name string, amount double);
from SweetProductionStream[amount < 100 and name == 'candy']
select name, calculateCost(amount, name) as cost, 'GBP' as currency
insert into LowCandyProductionCostStream;
Functions :
Inbuilt, Custom UDF or Siddhi Extensions
22. • Use incoming events to perform
operations on data stores
• Optimize with
primary and indexing keys
2. Data Store Integration
Store, Retrieve and Modify
• Update
• Contains
• Search
• Insert
• Delete
23. 2. Data Store Integration
define stream SweetProductionStream(name string, amount double);
define table ProductionTable(id string, name string, amount double);
Define Table (In-Memory)
24. 2. Data Store Integration
define stream SweetProductionStream(name string, amount double);
@primaryKey('id')
@index('amount')
define table ProductionTable(id string, name string, amount double);
Index and Primary Keys
25. 2. Data Store Integration
define stream SweetProductionStream(name string, amount double);
@store(type='rdbms', …)
@primaryKey('id')
@index('amount')
define table ProductionTable(id string, name string, amount double);
Table Backed by:
RDBMS, MongoDB, HBase, Cassandra, Solr, Hazelcast, etc.
26. 2. Data Store Integration
define stream SweetProductionStream(name string, amount double);
@store(type='rdbms', …)
@primaryKey('id')
@index('amount')
define table ProductionTable(id string, name string, amount double);
from SweetProductionStream
select UUID() as id, name, amount
insert into ProductionTable;
Insert into Table
27. 2. Data Store Integration
define stream SweetProductionStream(name string, amount double);
@store(type='rdbms', …)
@primaryKey('id')
@index('amount')
define table ProductionTable(id string, name string, amount double);
from SweetProductionStream
select UUID() as id, name, amount
update or insert into ProductionTable
set ProductionTable.amount = amount
on ProductionTable.name == name;
Update or Insert into Table
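The update-or-insert semantics above can be sketched roughly in Python (illustrative only; the table name and fields follow the slide, but the dict-backed store is an assumption, not how Siddhi actually stores tables):

```python
import uuid

# Rough sketch of "update or insert": if a row with the same name
# exists, update its amount in place; otherwise insert a fresh row.
production_table = {}   # name -> {'id', 'name', 'amount'}

def update_or_insert(name, amount):
    row = production_table.get(name)
    if row:
        row["amount"] = amount                    # matched: update
    else:
        production_table[name] = {"id": str(uuid.uuid4()),
                                  "name": name, "amount": amount}

update_or_insert("candy", 120.0)
update_or_insert("candy", 80.0)   # second event updates, not inserts
print(len(production_table), production_table["candy"]["amount"])  # 1 80.0
```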
28. • Sum, Count, Min, Max, etc.
within the
– last 5 minutes
– last 20 events
3. Streaming Data Summarization
Aggregations Over Short Time Periods
29. 3. Streaming Data Summarization
define stream SweetProductionStream(name string, amount double);
from SweetProductionStream#window.time(1 min)
select name, sum(amount) as totalSweets, currentTimeMillis() as timestamp
group by name
insert into LastMinProdStream;
Windows: Sliding and Batch, for Time, Length, etc.
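As a rough illustration of what a sliding time window does (not Siddhi's internal implementation), here is a minimal Python sketch that keeps a per-name running sum over the last 60 seconds, expiring old events as new ones arrive:

```python
from collections import deque

class SlidingTimeWindow:
    """Toy sliding time window: per-name running sums over the last minute."""
    def __init__(self, length_ms=60_000):
        self.length_ms = length_ms
        self.events = deque()   # (timestamp_ms, name, amount)
        self.totals = {}        # name -> running sum within the window

    def add(self, ts, name, amount):
        # Expire events that fell out of the window before admitting the new one.
        while self.events and ts - self.events[0][0] >= self.length_ms:
            _, old_name, old_amount = self.events.popleft()
            self.totals[old_name] -= old_amount
        self.events.append((ts, name, amount))
        self.totals[name] = self.totals.get(name, 0) + amount
        return dict(self.totals)

w = SlidingTimeWindow()
w.add(0, "candy", 10)
w.add(30_000, "candy", 20)
print(w.add(70_000, "candy", 5))   # first event expired: {'candy': 25}
```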
30. • Incremental Aggregation for every
– Seconds, Minutes, Hours, Days, …, Year
• Support for out-of-order event arrival
• Fast data retrieval from memory and disk for
realtime update
3. Streaming Data Summarization
[Diagram: per-second buckets (0, 1, 2, …) roll up into per-minute and per-hour aggregations, keeping the current minute and current hour continuously up to date]
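A minimal Python sketch of the incremental-aggregation idea (illustrative only; the bucket keys and two granularities are assumptions, not Siddhi's storage layout): each event updates a per-second partial sum, and the same increment is rolled into the per-minute bucket, so long-range queries read precomputed buckets instead of rescanning raw events.

```python
from collections import defaultdict

sec_buckets = defaultdict(float)   # epoch second -> sum(amount)
min_buckets = defaultdict(float)   # epoch minute -> sum(amount)

def aggregate(ts_ms, amount):
    sec = ts_ms // 1000
    sec_buckets[sec] += amount
    min_buckets[sec // 60] += amount   # roll the same increment upward

aggregate(1_000, 10.0)    # second 1, minute 0
aggregate(1_500, 5.0)     # same second
aggregate(61_000, 2.0)    # second 61, minute 1
print(sec_buckets[1], min_buckets[0], min_buckets[1])  # 15.0 15.0 2.0
```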
31. 3. Streaming Data Summarization
Aggregations Over Long Time Periods
define stream SweetProductionStream(name string, amount double);
define aggregation ProductionAggregation
from SweetProductionStream
select name, sum(amount) as totalSweets, count(*) as noOfPacks
group by name
aggregate every sec ... year;
Predefined Aggregation
32. 3. Streaming Data Summarization
Aggregations Over Long Time Periods
from ProductionAggregation
on name == 'candy'
within '2017-01-01', '2017-02-01'
per 'days'
select name, totalSweets, noOfPacks;
Data Retrieval for Dashboards
33. • Generate dashboard and widgets
• Fine grained permissions
– Dashboard level
– Widget level
– Data level
• Localization support
• Inter widget communication
• Shareable dashboards with widget
state persistence
Dashboard for
Business Users
34. • Identify KPIs using
– Filter, ifThenElse, having, etc.
• Send alerts using Sinks
4. KPI Analysis and Alerts
Generate Alerts Based on KPIs
35. 4. KPI Analysis and Alerts
define stream LastMinProdStream (name string, totalSweets long,
timestamp long);
@sink(type='email', to='manager@sf.com',
@map(type='text',
@payload('''
Low Production of {{name}} at factory {{factoryId}}.
''')))
define stream LowProdAlertStream (name string, factoryId int, ...);
from LastMinProdStream[totalSweets < 5000]
insert into LowProdAlertStream;
Publishing with Mapping via :
Email, HTTP, TCP, Kafka, RabbitMQ, MQTT, etc.
36. • Identify complex patterns
– Followed by, non-occurrence, etc.
• Identify trends
– Peak, triple bottom, etc.
Event Correlation and Trend Analysis
Complex Event Processing
37. 5. Event Correlation
Patterns
define stream RawMaterialStream (name string, amount double);
define stream ProductionInputStream (name string, amount double);
from every (e1 = RawMaterialStream)
-> not ProductionInputStream[name == e1.name and
amount == e1.amount] for 15 min
select e1.name, e1.amount
insert into ProductionStartDelayed ;
Non-occurrence of an event detection
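The non-occurrence pattern can be sketched in plain Python (illustrative only; the 15-minute timeout and the (name, amount) matching key follow the Siddhi query above, everything else is an assumption):

```python
# Raise an alert when a raw-material event is not matched by a
# production-input event within the timeout.
TIMEOUT_MS = 15 * 60 * 1000

pending = {}   # (name, amount) -> arrival timestamp of raw material

def on_raw_material(ts, name, amount):
    pending[(name, amount)] = ts

def on_production_input(ts, name, amount):
    pending.pop((name, amount), None)   # matched: cancel the timer

def check_timeouts(now):
    """Return raw-material events whose production never started in time."""
    delayed = [k for k, t in pending.items() if now - t >= TIMEOUT_MS]
    for k in delayed:
        del pending[k]
    return delayed

on_raw_material(0, "sugar", 100.0)
on_raw_material(0, "flour", 50.0)
on_production_input(60_000, "sugar", 100.0)   # sugar production started
delayed = check_timeouts(15 * 60 * 1000)
print(delayed)   # [('flour', 50.0)]
```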
38. 6. Trend Analysis
Sequences
define stream LastMinProdStream (name string, totalSweets long,
timestamp long);
partition with (name of LastMinProdStream)
begin
from every e1=LastMinProdStream,
e2=LastMinProdStream[timestamp - e1.timestamp < 10
and e1.totalSweets > totalSweets]*,
e3=LastMinProdStream[timestamp - e1.timestamp > 10
and e2[last].totalSweets > totalSweets]
select e1.name, e1.totalSweets as initialAmount, e3.totalSweets as finalAmount
insert into ContinuousProdReductionStream;
end;
Identify decreasing
trend for 10 mins
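Roughly, the sequence above amounts to the following per-name logic, sketched in Python (illustrative only; timestamps are in minutes here for readability, and the run-tracking state machine is an assumption, not Siddhi's matcher):

```python
# Track a run of strictly falling totals per name; report once the run
# spans the 10-minute threshold, then reset.
THRESHOLD_MIN = 10

runs = {}   # name -> (start_ts, start_amount, last_amount)

def on_event(ts, name, amount):
    if name not in runs or amount >= runs[name][2]:
        runs[name] = (ts, amount, amount)      # trend broken: restart run
        return None
    start_ts, start_amount, _ = runs[name]
    runs[name] = (start_ts, start_amount, amount)
    if ts - start_ts > THRESHOLD_MIN:
        del runs[name]                          # emit once, then reset
        return (name, start_amount, amount)
    return None

events = [(0, "candy", 100), (4, "candy", 90), (8, "candy", 80),
          (12, "candy", 70)]
results = [r for r in (on_event(*e) for e in events) if r]
print(results)   # [('candy', 100, 70)]
```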
40. define stream SugarSyrupDataStream (temperature double, density double);
from SugarSyrupDataStream#pmml:predict('/home/user/ml.model', 'string')
select *
insert into PredictionStream;
7. Real-Time Predictions
Run ML Models In Realtime
41. define stream SugarSyrupDataStream (temperature double, density double);
define stream SugarSyrupResultStream (temperature double, density double,
decision string);
from SugarSyrupResultStream
#streamingml:hoeffdingTreeTrain('Model', temperature, density, decision)
...
from SugarSyrupDataStream
#streamingml:hoeffdingTreeClassifier('Model', temperature, density)
...
8. Streaming Machine Learning
Continuous Learning and Feedback
Online Training !
Online Prediction !
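To make "online training" concrete, here is a toy pure-Python online learner (a nearest-class-mean classifier, NOT the Hoeffding tree Siddhi actually uses; features, labels, and values are made up) that, like the Siddhi extension, learns from one event at a time and can predict at any point:

```python
class OnlineNearestMean:
    """Toy online learner: per-class running means, nearest-mean prediction."""
    def __init__(self):
        self.sums = {}    # label -> [sum_temperature, sum_density]
        self.counts = {}  # label -> number of training events seen

    def learn_one(self, temperature, density, label):
        s = self.sums.setdefault(label, [0.0, 0.0])
        s[0] += temperature
        s[1] += density
        self.counts[label] = self.counts.get(label, 0) + 1

    def predict_one(self, temperature, density):
        def dist(label):
            n = self.counts[label]
            mt, md = self.sums[label][0] / n, self.sums[label][1] / n
            return (temperature - mt) ** 2 + (density - md) ** 2
        return min(self.counts, key=dist)

m = OnlineNearestMean()
m.learn_one(150.0, 1.2, "ok")           # online training, one event at a time
m.learn_one(200.0, 1.8, "too-thick")
print(m.predict_one(155.0, 1.25))        # "ok"
```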
47. • High Performance
– Process around 100k events/sec
– Just 2 nodes
– While most others need 5+
• Zero Downtime
• Zero Event Loss
• Simple deployment with RDBMS
– No ZooKeeper, Kafka, etc.
• Multi Data Center Support
Minimum HA With 2 Nodes
[Diagram: two Stream Processor nodes each run the same set of Siddhi Apps, consuming from event sources and feeding the event store, dashboard, notifications, invocations, and data sources]
48. • Exactly-once processing
• Fault tolerance
• Highly scalable
• No backpressure
• Distributed development configurations via
annotations
• Pluggable distribution options (YARN, Kubernetes, etc.)
Distributed Deployment with Kafka
51. • Gives you everything you need to build
streaming analytics
– Manage data streams
– Powerful Streaming SQL language
– Dashboards and more
• Processes 100K+ events per second with two-node
HA (most alternatives need 5+ nodes) and
scales further on top of Kafka
WSO2 Stream Processor
52. • What streaming analytics is
• Patterns of streaming analytics
• How WSO2 Stream Processor can help you implement
analytics patterns
• The business benefit of manageable rules and dashboards
• The ease of development, multi-mode deployment, and
monitoring with WSO2 Stream Processor
Summary