Today’s highly connected world is flooding businesses with big and fast-moving data. The ability to trawl this data ocean and identify actionable insights can deliver a competitive advantage to any organization. The WSO2 Analytics Platform enables businesses to do just that by providing batch, real-time, interactive and predictive analysis capabilities all in one place.
In this tutorial we will
* Plug in the WSO2 Analytics Platform to some common business use cases
* Showcase the numerous capabilities of the platform
* Demonstrate how to collect data, analyze, predict and communicate effectively
* Demonstrate how it can analyze integration, security and IoT scenarios
Stick around till the end and you will walk away with the necessary skills to create a winning data strategy for your organization to stay ahead of its competition.
WSO2 Analytics Platform - The one stop shop for all your data needs
1. WSO2 Analytics Platform: The One Stop Shop for All Your Data Needs
Sriskandarajah Suhothayan
Associate Director/Architect, WSO2
Anjana Fernando
Senior Technical Lead, WSO2
2. WSO2 Analytics Platform
WSO2 Analytics Platform combines real-time, interactive, batch and
predictive analytics in one place to turn data from IoT, mobile and
Web apps into actionable insights
4. WSO2 Data Analytics Server
• Fully open source solution for building systems and applications
that collect and analyze both real-time and persisted data and communicate
the results
• Part of the WSO2 Big Data Analytics Platform
• High performance data capture framework
• Highly available and scalable by design
• Pre-built Data Agents for WSO2 products
5. Case Study : Smart Home
• DEBS (Distributed Event-Based Systems) is a premier academic
conference that poses a yearly event processing challenge
(http://www.cse.iitb.ac.in/debs2014/?page_id=42)
• Smart Home electricity data: 2000 sensors, 40 houses, 4 billion
events
• We posted the fastest single-node solution measured (400K events/sec)
and close to one million events/sec in distributed throughput
• The WSO2 CEP based solution was one of the four finalists (with Dresden
University of Technology, Fraunhofer Institute, and Imperial College
London)
• The only generic solution to become a finalist
6. Customer Stories
Experian delivers a digital marketing platform in which CEP plays a key role, analyzing customer
behavior in real time to offer targeted promotions. CEP was chosen after careful analysis, primarily for
its openness, its open source nature, the fact that support is driven by engineers, and the availability of a
complete middleware platform, integrated with CEP, for additional use cases.
Eurecat is the Catalunya innovation center (in Spain). It uses CEP to analyze data from iBeacons
deployed within department stores, offering instant rebates to users or sending them help if they
seem “stuck” in the shop area. They chose WSO2 for its real-time processing, the variety of IoT
connectors available, the extensible framework and the rich configuration language. They
also use WSO2 ESB in conjunction with WSO2 CEP.
Pacific Controls is an innovative company delivering an IoT platform of platforms: Galaxy 2021. The
platform makes it possible to manage all kinds of devices within a building and take automated decisions,
such as moving an elevator or starting the air conditioning, based on certain conditions. Within Galaxy 2021,
CEP is used for monitoring alarms and specific conditions. Pacific Controls also uses other products
from the WSO2 platform, such as WSO2 ESB and Identity.
A leading airline uses CEP to enhance customer experience by calculating the average time passengers take
to reach their boarding gate (going through security, walking, etc.). They also want to track the time it takes to
clean a plane, in order to better streamline the boarding process and notify both the airline and
customers about potential delays. They evaluated WSO2 CEP first as they were already using our
platform, and decided to use it as it addressed all their requirements.
7. Healthcare Data Monitoring
• Allows searching, visualizing and analyzing healthcare records (HL7) across
20 hospitals in Italy
• Used in combination with WSO2 ESB
• Custom toolbox tailored to customer’s requirement (to replace existing system)
9. Data Processing Pipeline
Collect Data
• Define schema for data
• Send events to batch and/or real-time pipeline
• Publish events
Analyze
• Spark SQL for batch analytics
• Siddhi Query Language for real-time analytics
• Predictive models for machine learning
Communicate
• Alerts
• Dashboards
• API
11. Data Model
Data published conforming to a strongly typed data stream
{
  'name': 'stream.name',
  'version': '1.0.0',
  'nickName': 'stream nick name',
  'description': 'description of the stream',
  'metaData': [
    {'name': 'meta_data_1', 'type': 'STRING'}
  ],
  'correlationData': [
    {'name': 'correlation_data_1', 'type': 'STRING'}
  ],
  'payloadData': [
    {'name': 'payload_data_1', 'type': 'BOOL'},
    {'name': 'payload_data_2', 'type': 'LONG'}
  ]
}
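To make "strongly typed" concrete, here is a minimal Python sketch that checks an event against a stream definition shaped like the one above. The `validate` helper, the nested event layout and the type mapping are assumptions made for illustration; they are not part of any WSO2 API.

```python
# Hypothetical helper, for illustration only (not a WSO2 API).
STREAM_DEF = {
    "name": "stream.name",
    "version": "1.0.0",
    "metaData": [{"name": "meta_data_1", "type": "STRING"}],
    "correlationData": [{"name": "correlation_data_1", "type": "STRING"}],
    "payloadData": [
        {"name": "payload_data_1", "type": "BOOL"},
        {"name": "payload_data_2", "type": "LONG"},
    ],
}

# Assumed mapping from stream attribute types to Python types.
PY_TYPES = {"STRING": str, "BOOL": bool, "LONG": int}


def validate(event, stream_def=STREAM_DEF):
    """Return a list of problems; an empty list means the event conforms."""
    problems = []
    for section in ("metaData", "correlationData", "payloadData"):
        values = event.get(section, {})
        for attr in stream_def.get(section, []):
            name = attr["name"]
            expected = PY_TYPES[attr["type"]]
            if name not in values:
                problems.append(f"{section}.{name}: missing")
                continue
            value = values[name]
            ok = isinstance(value, expected)
            # bool is a subclass of int in Python, so reject it for LONG.
            if expected is int and isinstance(value, bool):
                ok = False
            if not ok:
                problems.append(f"{section}.{name}: expected {attr['type']}")
    return problems
```

An event whose `payload_data_2` is the string `"7"` would be reported as `payloadData.payload_data_2: expected LONG`, while a fully conforming event yields an empty list.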
12. Data Persistence
• Data Abstraction Layer to enable pluggable data connectors
– RDBMS, Cassandra, HBase, custom, etc.
• Analytics Tables
– The data persistence entity in WSO2 Data Analytics Server
– Provide a backend data source agnostic way of storing and retrieving
data
– Allow applications to be written so that they do not depend on a
specific data source API, e.g. JDBC (RDBMS), Cassandra APIs, etc.
– WSO2 DAS provides a standard REST API for accessing the Analytics Tables
13. Data Persistence
• Analytics Record Stores
– An Analytics Record Store stores a specific set of Analytics Tables
– Event persistence can configure which Analytics Record Store is used for
storing incoming events
– Analytics Tables share a single namespace; the target record store is given only
at the time of table creation
– Useful for creating Analytics Tables whose data will be stored in multiple target
databases
15. Interactive Analysis
• Full text data indexing support powered by Apache Lucene
• Drilldown search support
• Distributed data indexing
– Designed to support scalability
• Near real time data indexing and retrieval
– Data indexed immediately as received
17. Activity Monitoring
• Correlate the collected messages based on the activity_id in the
metadata of each event
• Trace the transaction path, where the events may be in different
tables, using Lucene queries
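The correlation step can be sketched in a few lines of Python. This is only an in-memory illustration of the idea: the event layout (`table`, `timestamp`, `meta.activity_id`) is assumed for the example, and the real product runs Lucene queries over indexed tables rather than looping over lists.

```python
from collections import defaultdict


def trace_activities(events):
    """Group events by activity_id and order each group by timestamp.

    `events` is assumed to be an iterable of dicts with the keys
    "table", "timestamp" and "meta" (which holds "activity_id").
    """
    by_activity = defaultdict(list)
    for event in events:
        by_activity[event["meta"]["activity_id"]].append(event)
    # Sorting by timestamp reconstructs the path of each transaction,
    # even when its events were stored in different tables.
    return {
        activity_id: sorted(group, key=lambda e: e["timestamp"])
        for activity_id, group in by_activity.items()
    }
```

Reading the `table` field of each traced event, in order, then gives the path a single transaction took through the system.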
20. Batch Analytics
● Powered by Apache Spark, with up to 30x higher performance than Hadoop
● Parallel and distributed, with optimized in-memory processing
● Scalable script-based analytics written using an easy-to-learn, SQL-like
query language powered by Spark SQL
● Interactive built-in web interface for ad hoc query execution
● Scheduled query script execution with HA/FO support
● Run Spark on a single node, on a Spark-embedded Carbon server cluster, or
connect to an external Spark cluster
23. Communicate: Dashboards
● The idea is to give the overall picture at a glance (e.g. a car dashboard)
● Support for personalization: you can build your own dashboard
● Also the entry point for drill-down
● How to build?
○ Dashboard via Google Gadget and content via HTML5 + JavaScript
○ Use WSO2 User Engagement Server to build a dashboard (or JSP/PHP)
○ Use charting libraries like Vega or D3
24. Gadget Generation Wizard
● Start with data in tabular format
● Map each column to a dimension in your plot, such as X, Y, color,
point size, etc.
● Also do drill-downs
● Create a chart with a few clicks
26. What’s Realtime Analytics?...
Realtime Analytics in Complex Event Processor
→
• Gather data from multiple sources
• Correlate data streams over time
• Find interesting occurrences
• And Notify
• All in Realtime !
27. Market Recognition
• Named as a Strong Performer in The Forrester Wave™: Big Data Streaming
Analytics, Q1 2016
• Highest score possible in the 'Acquisition and Pricing' criteria, and among
the second-highest scores in the 'Ability to execute' criteria
• The Forrester report notes:
“WSO2 is an open source middleware provider that includes a full spectrum of architected-as-one
components such as application servers, message brokers, enterprise service bus, and many
others.
Its streaming analytics solution follows the complex event processor architectural approach, so it
provides very low-latency analytics. Enterprises that already use WSO2 middleware can add CEP
seamlessly. Enterprises looking for a full middleware stack that includes streaming analytics will
find a place for WSO2 on their shortlist as well.”
30. Realtime Execution
• Process in streaming fashion (one event at a time)
• Execution logic is written as Execution Plans
• Execution Plan
– An isolated logical execution unit
– Includes a set of queries, and relates to multiple input and output
event streams
– Executed using a dedicated WSO2 Siddhi engine
34. Siddhi Query ...
define stream SoftDrinkSales
    (region string, brand string, quantity int, price double);

-- Enriching streams
from SoftDrinkSales
select brand, avg(price*quantity) as avgCost, 'USD' as currency
insert into AvgCostStream ;

-- Using functions
from AvgCostStream
select brand, toEuro(avgCost) as avgCost, 'EURO' as currency
insert into OutputStream ;
35. Siddhi Query (Filter & Window) ...
define stream SoftDrinkSales
    (region string, brand string, quantity int, price double);

-- Filtering
from SoftDrinkSales[region == 'USA' and quantity > 99]
select brand, price, quantity
insert into WholeSales ;

-- Aggregation over 1 hour
from SoftDrinkSales#window.time(1 hour)
select region, brand, avg(quantity) as avgQuantity
group by region, brand
insert into LastHourSales ;

Other supported window types:
timeBatch(), length(), lengthBatch(), etc.
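To show what the filter and the one-hour sliding window compute, here is a small Python sketch. It is not how Siddhi is implemented: the `is_wholesale` and `HourlyAverage` names are made up for this example, events are assumed to arrive in timestamp order (in seconds), and expiry is only evaluated when a new event arrives.

```python
from collections import deque


def is_wholesale(event):
    """Mirror of the filter: region == 'USA' and quantity > 99."""
    return event["region"] == "USA" and event["quantity"] > 99


class HourlyAverage:
    """Sliding time window with avg(quantity) grouped by (region, brand)."""

    def __init__(self, window_seconds=3600):
        self.window_seconds = window_seconds
        self.events = deque()  # (timestamp, key, quantity), oldest first
        self.totals = {}       # key -> [sum of quantity, event count]

    def add(self, timestamp, region, brand, quantity):
        """Add one event and return the current average for its group."""
        cutoff = timestamp - self.window_seconds
        # Drop events that have fallen out of the window.
        while self.events and self.events[0][0] <= cutoff:
            _, old_key, old_quantity = self.events.popleft()
            total = self.totals[old_key]
            total[0] -= old_quantity
            total[1] -= 1
            if total[1] == 0:
                del self.totals[old_key]
        key = (region, brand)
        self.events.append((timestamp, key, quantity))
        total = self.totals.setdefault(key, [0, 0])
        total[0] += quantity
        total[1] += 1
        return total[0] / total[1]
```

Keeping a running sum and count per group means each arriving event is handled without rescanning the whole window, which is the property that makes windowed aggregation practical on fast streams.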
36. Siddhi Query (Pattern) ...
define stream Purchase (price double, cardNo long, place string);

from every (a1 = Purchase[price < 10]) ->
    a2 = Purchase[price > 10000 and a1.cardNo == a2.cardNo]
    within 1 day
select a1.cardNo as cardNo, a2.price as price, a2.place as place
insert into PotentialFraud ;
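The pattern above reads as "a small purchase followed, within a day, by a very large purchase on the same card". The following Python sketch approximates that logic for a list of purchases; it is a simplified reading of the pattern semantics, not the Siddhi engine's algorithm, and the `ts` field (seconds) and `detect_fraud` name are assumptions for illustration.

```python
DAY = 24 * 60 * 60  # seconds


def detect_fraud(purchases):
    """Return one alert per small purchase that is followed by a large one.

    `purchases` is assumed to be ordered by "ts" and to contain dicts
    with the keys "ts", "cardNo", "price" and "place".
    """
    pending = []  # small purchases (a1) still waiting for a match
    alerts = []
    for purchase in purchases:
        # Forget small purchases that are older than the 1 day bound.
        pending = [a1 for a1 in pending if purchase["ts"] - a1["ts"] <= DAY]
        if purchase["price"] > 10000:
            for a1 in pending:
                if a1["cardNo"] == purchase["cardNo"]:
                    alerts.append({
                        "cardNo": a1["cardNo"],
                        "price": purchase["price"],
                        "place": purchase["place"],
                    })
            # A matched a1 is consumed and cannot fire again.
            pending = [a1 for a1 in pending
                       if a1["cardNo"] != purchase["cardNo"]]
        elif purchase["price"] < 10:
            pending.append(purchase)
    return alerts
```

The `pending` list plays the role of the partially matched pattern instances that `every` keeps open.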
37. Siddhi Query (Trends & Partition)...
define stream StockStream (symbol string, price double, volume int);

partition with (symbol of StockStream)
begin
    from t1=StockStream,
         t2=StockStream[(t2[last] is null and t1.price < price) or
                        (t2[last].price < price)]+
         within 5 min
    select t1.price as initialPrice, t2[last].price as finalPrice, t1.symbol
    insert into IncreasingMyStockPriceStream ;
end;
38. Siddhi Query (Table) ...
-- In-memory event table
define table CardUserTable (name string, cardNum long) ;

-- RDBMS-backed event table with an LRU cache
@from(eventtable = 'rdbms', datasource.name = 'CardDataSource',
      table.name = 'UserTable', caching.algorithm = 'LRU')
define table CardUserTable (name string, cardNum long) ;

Supported for RDBMS, In-Memory, Analytics Table, Hazelcast
Cache types supported
• Basic: A size-based algorithm based on FIFO.
• LRU (Least Recently Used): The least recently used event is dropped
when the cache is full.
• LFU (Least Frequently Used): The least frequently used event is dropped
when the cache is full.
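As an illustration of the LRU policy listed among the cache types, here is a minimal Python sketch built on `collections.OrderedDict`. It shows the eviction rule only; the `LRUCache` class is written for this example and says nothing about how the event table cache is actually implemented.

```python
from collections import OrderedDict


class LRUCache:
    """Drop the least recently used entry when the cache is full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()  # oldest use first, newest use last

    def get(self, key):
        if key not in self.items:
            return None
        # A read counts as a use, so move the key to the "recent" end.
        self.items.move_to_end(key)
        return self.items[key]

    def put(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            # Evict from the "least recently used" end.
            self.items.popitem(last=False)
```

The Basic (FIFO) policy would be the same structure without the `move_to_end` call in `get`, so that reads do not protect an entry from eviction.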
39. Siddhi Query (Table) ...
define stream Purchase (price double, cardNo long, place string);
define stream CardUserStream (name string, cardNo long) ;
define table CardUserTable (name string, cardNum long) ;

-- Join each purchase with the matching table row
from Purchase#window.length(1) join CardUserTable
    on Purchase.cardNo == CardUserTable.cardNum
select Purchase.cardNo as cardNo, CardUserTable.name as name, Purchase.price as price
insert into PurchaseUserStream ;

-- Update the table from a stream
from CardUserStream
select name, cardNo as cardNum
update CardUserTable
    on CardUserTable.name == name ;

Similarly, insert into and delete are also supported!
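The two table queries above can be mimicked in plain Python to show what each one does to the data. This is a sketch with the table held as a list of dicts; the function names are invented for the example and do not correspond to any Siddhi API.

```python
def update_card_user(table, event):
    """Mimic `update CardUserTable on CardUserTable.name == name`.

    Rows whose name matches the incoming event get the new card number.
    """
    for row in table:
        if row["name"] == event["name"]:
            row["cardNum"] = event["cardNo"]


def join_purchase(table, purchase):
    """Mimic the join on Purchase.cardNo == CardUserTable.cardNum.

    Returns one enriched event per matching table row.
    """
    return [
        {
            "cardNo": purchase["cardNo"],
            "name": row["name"],
            "price": purchase["price"],
        }
        for row in table
        if row["cardNum"] == purchase["cardNo"]
    ]
```

A purchase whose card number is not in the table produces no output, which matches the inner-join behavior of the query.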
40. Siddhi Query (Extension) ...
• Function extension
• Aggregator extension
• Window extension
• Stream Processor extension

define stream SalesStream (brand string, price double, currency string);

-- Extensions are referred to with namespaces (here, custom:)
from SalesStream
select brand, custom:toUSD(price, currency) as priceInUSD
insert into OutputStream ;
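The idea of referring to extensions by `namespace:name` can be illustrated with a small registry in Python. This is a conceptual sketch only: real Siddhi extensions are written against the Siddhi extension API, and the `extension` decorator, the `EXTENSIONS` registry and the exchange rates below are all invented for the example.

```python
EXTENSIONS = {}  # "namespace:name" -> callable

# Illustrative exchange rates only; not real market data.
RATES_TO_USD = {"USD": 1.0, "EUR": 1.25}


def extension(namespace, name):
    """Register a function under a namespaced key such as 'custom:toUSD'."""
    def register(fn):
        EXTENSIONS[f"{namespace}:{name}"] = fn
        return fn
    return register


@extension("custom", "toUSD")
def to_usd(price, currency):
    return price * RATES_TO_USD[currency]


def enrich(sale):
    """Apply custom:toUSD to one SalesStream-like event."""
    convert = EXTENSIONS["custom:toUSD"]
    return {
        "brand": sale["brand"],
        "priceInUSD": convert(sale["price"], sale["currency"]),
    }
```

The namespace keeps independently written extensions from clashing, so two libraries can each provide their own `toUSD` under different prefixes.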
41. Siddhi Extensions
• geo: Geographical processing
• nlp: Natural language processing (with Stanford NLP)
• ml: Running machine learning models of WSO2 Machine Learner
• pmml: Running PMML models learnt by R
• timeseries: Regression and time series
• math: Mathematical operations
• str: String operations
• regex: Regular expressions
• ...
43. Realtime Dashboard
• Dashboard
– Google Gadget
– HTML5 + JavaScript
• Supports gadget generation
– Using D3 and Vega
• Gathers data for the UI via
– WebSockets
– Polling
• Supports custom gadgets and dashboards
45. What’s Predictive Analytics?...
Predictive Analytics in Machine Learner
→
• Extract, pre-process, and explore data
• Create models, tune algorithms and make predictions
• Integrate for better intelligence
46. Predictive Analytics
• Guided UI to build machine learning models via
– Apache Spark MLlib
– H2O.ai (for deep learning algorithms)
– R, exporting them as PMML
• Run models using CEP, DAS and ESB
• Run R scripts, regression and anomaly detection in real time
47. Terminology
• Input data must be in a tabular format
• Each row is called a data point
• Each column is called a feature
• Value you are going to predict is called the “response variable”
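These terms can be made concrete with a tiny Python sketch that separates the features from the response variable in a tabular dataset. The column names and values in the usage note are made up for illustration, and `split_features_response` is a hypothetical helper, not part of WSO2 Machine Learner.

```python
def split_features_response(rows, response):
    """Split tabular data into feature rows and response values.

    `rows` is a list of dicts (one dict per data point, one key per
    column); `response` names the column to be predicted.
    """
    features = [
        {column: value for column, value in row.items() if column != response}
        for row in rows
    ]
    responses = [row[response] for row in rows]
    return features, responses
```

For example, with rows such as `{"area": 50, "rooms": 2, "price": 150}` and `response="price"`, each data point keeps `area` and `rooms` as features while `price` becomes the value the model learns to predict.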
59. WSO2 CEP (Realtime) Scalability
Distributed Realtime = Siddhi +
Advantages over Apache Storm
• No need to write Java code (supports an SQL-like query language)
• No need to start from first principles (supports a high-level language)
• Adapting to change is fast
• Govern artifacts using Toolboxes
• etc ...