In this webinar, Srinath Perera, director of research at WSO2, will discuss:
Big data landscape: concepts, use cases, and technologies
Real-time analytics with WSO2 CEP
Batch analytics with WSO2 BAM
Combining batch and real-time analytics
Introducing WSO2 Machine Learner
Introduction to Big Data Analytics: Batch, Real-Time, and the Best of Both Worlds
1. Introduction to Big Data Analytics: Batch, Real-Time, and the Best of Both Worlds
Srinath Perera
Director, Research, WSO2 Inc.
Visiting Faculty, University of Moratuwa
Member, Apache Software Foundation
Research Scientist, Lanka Software Foundation
3. What Can We Do with Big Data?
§ Optimize (the world is inefficient)
o 30% of food is wasted from farm to plate
o GE 1% initiative (http://goo.gl/eYC0QE)
- A 1% saving in trains can save 2B/year
- A 1% saving in US healthcare is 20B/year
- In contrast, Sri Lanka's total exports are 9B/year
§ Save lives
o Weather, disease identification, personalized treatment
§ Technology advancement
o Most high-tech research is done via simulations
8. Combined Power
§ Users can send events to both BAM and CEP via the same APIs
§ CEP can combine output from batch processing and data from various storage (e.g. databases) with real-time processing
o e.g. implementing the Lambda Architecture
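The Lambda Architecture idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how BAM or CEP implement it: the function names, the event shape, and the per-user count metric are all hypothetical. A batch view is recomputed from stored history, a real-time view covers events that arrived since the last batch run, and a query merges the two.

```python
from collections import Counter

def batch_view(history):
    """Batch layer: recompute per-user counts from the full stored history."""
    return Counter(event["user"] for event in history)

def realtime_view(recent):
    """Speed layer: count only events that arrived after the last batch run."""
    return Counter(event["user"] for event in recent)

def query(batch, realtime, user):
    """Serving layer: answer a query by merging both views."""
    return batch[user] + realtime[user]

# Hypothetical data: three stored events and one event not yet seen by the batch job.
history = [{"user": "alice"}, {"user": "bob"}, {"user": "alice"}]
recent = [{"user": "alice"}]

batch = batch_view(history)
realtime = realtime_view(recent)
print(query(batch, realtime, "alice"))  # 3
```

When the next batch run completes, the real-time view would be reset so that events are not counted twice.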
11. WSO2 BAM
● Powered by Apache Hadoop, with management and queries using Apache Hive
● Parallel, distributed processing based on the MapReduce programming model
● Runs on a local Hadoop node or can be delegated to a cluster of Hadoop nodes
● Scalable script-based analytics written using an easy-to-learn, SQL-like query language
Diagram labels: Analyzer Engine, Hadoop Cluster, Data Store (Cassandra/RDBMS)
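To make the MapReduce programming model concrete, here is a minimal single-process sketch in Python. It is not Hadoop or BAM code; the record fields (`service`, `latency_ms`) and the table name in the comment are hypothetical. It shows the three steps that a SQL-like aggregation such as a grouped average is compiled into: map, shuffle, and reduce.

```python
from collections import defaultdict

# Roughly what a Hive-style query such as
#   SELECT service, AVG(latency_ms) FROM requests GROUP BY service
# asks for (the table and column names are assumptions for illustration).

def map_phase(records):
    """Map: emit one (key, value) pair per input record."""
    for record in records:
        yield record["service"], record["latency_ms"]

def shuffle(pairs):
    """Shuffle: group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: collapse each key's values into a single result."""
    return {key: sum(values) / len(values) for key, values in groups.items()}

records = [
    {"service": "login", "latency_ms": 100},
    {"service": "login", "latency_ms": 300},
    {"service": "search", "latency_ms": 50},
]

averages = reduce_phase(shuffle(map_phase(records)))
print(averages)  # {'login': 200.0, 'search': 50.0}
```

On a real cluster the map and reduce steps run in parallel on different nodes, which is where the scalability comes from.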
12. High Level Languages
§ For both batch and real-time, we provide structured, SQL-like query languages.
o No Java programming is required
§ Lowers the adoption entry point
§ BAM
o Relies on Apache Hive
§ CEP
o Implemented through our own solution, Siddhi.
13. CEP Operators with Siddhi
§ Event table: (Map a database as an event stream)
§ Filter: (Process single transaction)
§ Windows: (Track a window of events)

define stream RequestStream ( correlationID string,
    serviceID string, userID string, tear string,
    requestTime long, ... ) ;
define table BlacklistedUserTable ( userID string, time long,
    requestCount long ) ;
from RequestStream[tear == 'BRONZE']#window.time(1 min)
select userID, requestTime as time,
    count(correlationID) as requestCount
group by userID
having requestCount > 5
insert into BlacklistedUserTable ;
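The query above combines a filter, a one-minute sliding window, a per-user count, and an insert into an event table. The following Python sketch mimics that logic to show what each operator does. It is not Siddhi code and makes simplifying assumptions: the class name is hypothetical, events arrive in `requestTime` order, and `requestTime` (in milliseconds) is used as the window clock.

```python
from collections import Counter, deque

WINDOW_MS = 60_000   # window.time(1 min)
THRESHOLD = 5        # having requestCount > 5

class BronzeThrottleQuery:
    """Hypothetical stand-in for the filter + window + group-by query."""

    def __init__(self):
        self.window = deque()    # (requestTime, userID), oldest first
        self.counts = Counter()  # per-user count of events inside the window
        self.blacklist = []      # stands in for BlacklistedUserTable

    def on_event(self, event):
        # Filter: only BRONZE requests enter the window.
        if event["tear"] != "BRONZE":
            return
        now = event["requestTime"]
        # Window: expire events older than one minute.
        while self.window and self.window[0][0] <= now - WINDOW_MS:
            _, old_user = self.window.popleft()
            self.counts[old_user] -= 1
        user = event["userID"]
        self.window.append((now, user))
        self.counts[user] += 1
        # Group by + having: record users with more than 5 requests in the window.
        if self.counts[user] > THRESHOLD:
            self.blacklist.append(
                {"userID": user, "time": now, "requestCount": self.counts[user]}
            )

engine = BronzeThrottleQuery()
for i in range(6):  # six BRONZE requests from the same user within six seconds
    engine.on_event({"userID": "u1", "tear": "BRONZE", "requestTime": i * 1000})
print(engine.blacklist)  # [{'userID': 'u1', 'time': 5000, 'requestCount': 6}]
```

The point of the high-level language is that the four lines of query replace all of this bookkeeping.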
14. Smart Home
§ DEBS (Distributed Event-Based Systems) is a premier academic conference that posts a yearly event processing challenge (http://www.cse.iitb.ac.in/debs2014/?page_id=42)
§ Smart Home electricity data: 2000 sensors, 40 houses, 4 billion events
§ We posted the fastest single-node solution measured (400K events/sec) and a distributed throughput close to one million events/sec
§ The WSO2 CEP based solution is one of the four finalists (with Dresden University of Technology, Fraunhofer Institute, and Imperial College London)
§ The only generic solution to become a finalist
15. Healthcare Data Monitoring
§ Lets users search, visualize, and analyze healthcare records (HL7) across 20 hospitals in Italy
§ Used in combination with WSO2 ESB and BAM
§ Custom toolbox tailored to the customer's requirements (to replace the existing system)
16. Cloud IDE Analytics
§ Custom solution created in partnership with Codenvy to bring analytics to the Codenvy management team and its customers
§ Developed in less than a month, with a custom plug-in to MongoDB
§ Deployed in the codenvy.com platform
18. Additional Customer Use Cases
§ Used in Healthcare and Parking Monitoring
o See "Solution patterns based approach to rapidly create IoE solutions across industries", http://us14.wso2con.com/videos/#Coumara-Radja
§ Used by a large-scale IoT system provider for use cases including Vehicle Tracking, Smart City, and Building Monitoring (CEP)
o See "Internet of Big Things: The Story of Pacific Controls", http://us14.wso2con.com/videos/#Sajaad-Chaudry
§ Transaction Monitoring in a large bank (CEP)
§ Knowledge Mining and tracking prospective customers through natural language data sources (CEP)
§ CEP embedded in edge devices
o See "WSO2Con 2013 - Keynote: Emerging Foundations of Next-Generation Business Systems", https://www.youtube.com/watch?v=7CyG3JKUxWw
§ Throttling and Anomaly Detection by a group of telecom companies
19. Extensions and Toolboxes
§ Fraud and Anomaly Detection Toolbox (Static Rules, Statistical Outliers, Markov Chains)
§ Time Series Toolbox
§ Natural Language Processing Plugin (Entity Extraction, POS Tagging, Sentiment Analysis)
§ GIS Toolbox (Geo Fencing, Tracking, Speed Alarms)
§ Running machine learning models exported as PMML with CEP (e.g. from R)
§ Video Monitoring with OpenCV
§ For more info, see http://wso2.com/library/articles/2014/08/wso2-cep-in-action-an-analysis-of-use-in-real-world-applications-of-different-domains/
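As an illustration of the "statistical outliers" idea listed for the fraud and anomaly detection toolbox, here is a minimal z-score check in Python. It is a generic textbook technique, not the toolbox's actual implementation; the function name, the threshold of 3 standard deviations, and the sample amounts are all assumptions.

```python
from statistics import mean, stdev

def is_outlier(history, value, z_threshold=3.0):
    """Flag `value` if it lies more than `z_threshold` sample standard
    deviations away from the mean of `history`."""
    if len(history) < 2:
        return False  # not enough data to estimate spread
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return value != mu  # all past values identical
    return abs(value - mu) / sigma > z_threshold

# Hypothetical past transaction amounts for one customer.
amounts = [20.0, 25.0, 22.0, 19.0, 24.0]
print(is_outlier(amounts, 23.0))   # False
print(is_outlier(amounts, 500.0))  # True
```

In a streaming setting the history would typically be a sliding window of recent events per customer rather than a fixed list.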
21. IoT Demos and Use Cases
§ SolidCon Demo, http://wso2.com/library/articles/2014/09/demonstration-on-architecture-of-internet-of-things-an-analysis/
§ IoT Reference Architecture, http://wso2.com/landing/internet-of-things-uk-2014/
§ Internet of Big Things: The Story of Pacific Controls, http://us14.wso2con.com/videos/#Sajaad-Chaudry
§ Federated Identity for IoT with OAuth, http://www.infoq.com/presentations/federated-identity-IoT-OAuth
26. BAM Enhancements
§ Work is underway to switch BAM to Apache Spark, with support for Shark SQL-like queries
o Faster queries
o Keeps the SQL-like language
§ Use "Hive on Spark" for migration purposes
§ Lower the adoption point of BAM by packaging an RDBMS by default instead of Cassandra
o The architecture already scales from small deployments to Big Data