1. Sensing the world with
Data of Things
By: Sriskandarajah Suhothayan (Suho)
Technical Lead at WSO2
@suhothayan
suho@wso2.com
STRUCTURE DATA 2016
MARCH 9 - 10 • SAN FRANCISCO
2. Any customer can have a car
painted any colour that he wants
so long as it is black
~ Henry Ford ~
3. Me Me Me !!!
Your customers want to have a
personalized experience.
We are in the time of ME!
4.
5.
6. What to do ?
You need to know the customer profile, e.g.
historical data, to make a decision
You need to understand the context in which the
customer evolves
You need to be able to react in real time to certain
conditions or patterns
7. Is IoT New ?
• source: http://community.arm.com/groups/internet-of-things/blog/2014/06
10. WSO2 IoT Server M3 : https://goo.gl/nhbxnG
http://wso2.com/iot
11. Concepts of IoT Analytics
● Type of Data
● Distributed Nature
● Event-Drivenness
● Possible Type of Analytics
● Scalability
● Edge Analytics
● Uncertainty
12. Data Types of Things
● Time based data
○ Continuous monitoring & reporting
○ Time series processing (e.g. Energy
consumption over time)
○ Specialised DBs - OpenTSDB
● Location based data
○ Things are all over the place & they move
○ Tracked via GPS / iBeacons
○ Geospatial processing (e.g. traffic planning,
better route suggestions for vehicles)
○ Geospatial optimised processing engines -
GeoTrellis
13. IoT is Distributed
● Constant changes
○ Components are added and removed
○ Data flows are modified or repurposed
● Data collection needs to support
○ Anything from weak 3G networks to ad-hoc peer-to-peer networks
○ Message Queuing Telemetry Transport (MQTT)
○ Constrained Application Protocol (CoAP)
○ ZigBee or Bluetooth low energy (BLE)
● Dynamic scaling
○ Hybrid cloud
14. IoT Analytics are Event-Driven
● Sensors report data as Event Streams
● Analysis on flowing (or perishable) data
● Realtime Analytics
○ Detect temporal and logical patterns
○ Identify KPIs and Thresholds
○ Send out alerts immediately
○ E.g. alert when a temperature sensor hits a limit, notify the
driver on the car dashboard of low tire pressure
○ Systems : Apache Storm, Google Cloud Dataflow &
WSO2 CEP
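The temperature-limit alert above can be sketched in a few lines of Python. This is a minimal illustration only, not how Storm, Dataflow or WSO2 CEP implement it; the `Reading` type, the `threshold_alerts` function and the sample values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator


@dataclass
class Reading:
    """Hypothetical event emitted by a temperature sensor."""
    sensor_id: str
    timestamp: float  # seconds since some epoch
    temperature: float


def threshold_alerts(readings: Iterable[Reading], limit: float) -> Iterator[str]:
    """Yield one alert each time a sensor crosses above the limit."""
    above: Dict[str, bool] = {}  # sensor_id -> was the last reading above the limit?
    for r in readings:
        is_above = r.temperature > limit
        # Alert only on the upward crossing, not on every hot reading.
        if is_above and not above.get(r.sensor_id, False):
            yield f"ALERT {r.sensor_id}: {r.temperature} exceeds {limit}"
        above[r.sensor_id] = is_above


stream = [
    Reading("t1", 0.0, 21.0),
    Reading("t1", 1.0, 31.0),  # crosses the limit: alert
    Reading("t1", 2.0, 32.0),  # still above: no new alert
    Reading("t1", 3.0, 20.0),
    Reading("t1", 4.0, 35.0),  # crosses again: alert
]
alerts = list(threshold_alerts(stream, limit=30.0))
```

Because the function is a generator, each alert is produced as soon as the triggering event arrives, which is the point of processing perishable data in flight.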
15. History Repeats
● Present vs usual behavior
● Understand the history
● Batch Analytics
○ Perform periodic summarisation/analytics
○ E.g. Average temperature in a room last month, total
power usage of the factory last year
○ Systems : Apache Hadoop, Apache Spark + Storage
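A periodic summarisation such as "average temperature in a room last month" could look like the following sketch. It runs in memory on a small list; a real deployment would run the same grouping on Hadoop or Spark over stored data. The `monthly_average` function and the record layout are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime


def monthly_average(readings):
    """readings: iterable of (iso_timestamp, room, temperature).

    Returns {(room, "YYYY-MM"): average temperature}.
    """
    sums = defaultdict(lambda: [0.0, 0])  # (room, month) -> [total, count]
    for ts, room, temp in readings:
        month = datetime.fromisoformat(ts).strftime("%Y-%m")
        acc = sums[(room, month)]
        acc[0] += temp
        acc[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}


history = [
    ("2016-02-01T10:00:00", "lab", 20.0),
    ("2016-02-15T10:00:00", "lab", 22.0),
    ("2016-03-01T10:00:00", "lab", 30.0),
]
summary = monthly_average(history)
```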
16. Deep Investigations
● Ad-Hoc Queries
● Interactive Analytics
○ Provides searchability
○ E.g. Identify fraud rings from simple fraud alerts
○ Systems : Apache Drill, indexed storage systems such
as Couchbase, Apache Lucene
17. Thinking Ahead
● When you don’t know the equations
● Focusing conditions & preventing issues
● Predictive Analytics
○ Incremental Learning
○ E.g. Proactive maintenance, fraud detection and health
warnings
○ Systems : Apache Mahout, Apache Spark MLlib,
Microsoft Azure Machine Learning, WSO2 ML, Skytree
20. Plenty of Data
Scalable Data Processing
source : http://www.websitemagazine.com/content/blogs/posts/archive/2014/09/25/customer-service-in-2039.aspx
23. Is Every Event Significant?
● Publishing all events is not good!
○ Hardware may not be scalable
○ Network getting flooded
● What we usually need
○ Aggregation over time
○ Trends that exceed thresholds
○ Event matching a rare condition
● Results in
○ Local optimisation
○ Quick detection of issues
○ Instant notification
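As a rough sketch of this idea, an edge device could aggregate readings over fixed time windows and publish only the windows whose average exceeds a threshold. The `edge_summaries` function, the window size and the sample data are hypothetical, and the sketch assumes readings arrive in time order.

```python
def edge_summaries(readings, window_seconds, threshold):
    """readings: iterable of (timestamp_seconds, value), assumed in time order.

    Yields (window_start, average) only for windows whose average
    exceeds the threshold; everything else stays on the device.
    """
    window_start = None
    total, count = 0.0, 0
    for ts, value in readings:
        start = ts - (ts % window_seconds)  # start of the window this event falls in
        if window_start is not None and start != window_start:
            # The previous window is complete: publish it only if significant.
            average = total / count
            if average > threshold:
                yield (window_start, average)
            total, count = 0.0, 0
        window_start = start
        total += value
        count += 1
    if count:  # flush the last, possibly partial, window
        average = total / count
        if average > threshold:
            yield (window_start, average)


raw = [(0, 10.0), (30, 20.0), (60, 50.0), (90, 70.0), (120, 5.0)]
published = list(edge_summaries(raw, window_seconds=60, threshold=40.0))
```

In this example five raw events shrink to a single published summary, which is the kind of reduction that keeps the network from being flooded.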
27. Uncertainty in Data of Things
Data can
● Be duplicated
● Arrive out of order
● Not arrive at all
● Carry wrong readings
28. Duplicate & Out-of-Order Events …
● Due to redundant sensors & network latency
● Make temporal data processing difficult
○ Time Windows
○ Temporal ordering
● E.g. fraud detection
define stream Purchase (price double, cardNo long, place string);

-- A small purchase followed by a large one on the same card within a day
from every (a1 = Purchase[price < 10]) ->
     a2 = Purchase[price > 10000 and a1.cardNo == a2.cardNo]
     within 1 day
select a1.cardNo as cardNo, a2.price as price, a2.place as place
insert into PotentialFraud;
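To show why ordering matters for such a pattern, here is a small Python approximation of the query above. It is not Siddhi's implementation; the `potential_fraud` function and the tuple layout are assumptions for illustration, and events are processed strictly in arrival order.

```python
DAY_SECONDS = 24 * 60 * 60


def potential_fraud(purchases):
    """purchases: iterable of (timestamp_seconds, price, card_no, place)
    in ARRIVAL order. Yields (card_no, price, place) when a purchase
    above 10000 follows a purchase below 10 on the same card within a day.
    """
    pending = {}  # card_no -> timestamps of small purchases awaiting a match
    for ts, price, card_no, place in purchases:
        if price > 10000:
            # Every earlier small purchase on this card can complete a match.
            for small_ts in pending.pop(card_no, []):
                if 0 <= ts - small_ts <= DAY_SECONDS:
                    yield (card_no, price, place)
        elif price < 10:
            pending.setdefault(card_no, []).append(ts)


in_order = [(0, 5.0, 1234, "A"), (3600, 20000.0, 1234, "B")]
reordered = list(reversed(in_order))  # same events, large purchase arrives first

matches_in_order = list(potential_fraud(in_order))
matches_reordered = list(potential_fraud(reordered))
```

With the events in order the pattern fires once; when the network delivers the same two events reversed, the match is silently lost.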
29. Events Arriving Out of Order
E.g. Realtime Soccer Analytics (DEBS 2013) https://goo.gl/c2gPrQ
● Identify ball kicks, ball possession, shot on goal & offside
● Solutions : K-Slack Based Algorithms
https://www2.informatik.uni-erlangen.de/publication/download/IPDPS2013.pdf
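The basic K-slack idea can be sketched as a reordering buffer: hold events back until the newest timestamp seen is at least K ahead, then release them in timestamp order. This is a fixed-K illustration only, not the algorithm from the paper linked above; the `k_slack` function and the sample events are hypothetical.

```python
import heapq


def k_slack(events, k):
    """events: iterable of (timestamp, payload) in arrival order.

    Yields events in timestamp order, assuming no event arrives more
    than k time units late. Events later than that may still be
    emitted out of order.
    """
    buffer = []  # min-heap ordered by timestamp
    max_ts = float("-inf")
    seq = 0  # tie-breaker so payloads are never compared
    for ts, payload in events:
        max_ts = max(max_ts, ts)
        heapq.heappush(buffer, (ts, seq, payload))
        seq += 1
        # Release everything that can no longer be overtaken.
        while buffer and buffer[0][0] <= max_ts - k:
            t, _, p = heapq.heappop(buffer)
            yield (t, p)
    while buffer:  # end of stream: flush what is left
        t, _, p = heapq.heappop(buffer)
        yield (t, p)


arrival = [(1, "a"), (3, "c"), (2, "b"), (5, "e"), (4, "d"), (8, "f")]
ordered = list(k_slack(arrival, k=2))
```

The trade-off is latency: a larger K tolerates later events but delays every result by up to K.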
30. Missing Data
● Due to network outages
● E.g. Smart Meters (DEBS 2014)
○ Smart home electricity data: 2000 sensors,
40 houses, 4 Billion events in four months
○ Processed 400K events/sec
● Solutions:
○ Approximate using complementary
sensor readings
■ Electricity Monitoring
● Frequent Load readings
● Occasional Work readings
○ Fault-tolerant data streams : Google
MillWheel
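One way to approximate missing load readings from the occasional work readings is to divide the energy consumed between two work readings by the elapsed time. The sketch below assumes work is a cumulative value in kWh and load is in watts; those units and the `average_load_watts` function are assumptions for illustration, not details taken from the slides.

```python
JOULES_PER_KWH = 3_600_000.0


def average_load_watts(work_start_kwh, work_end_kwh, start_s, end_s):
    """Estimate the average load (W) over a gap from two cumulative
    work readings (kWh) taken at start_s and end_s (seconds)."""
    if end_s <= start_s:
        raise ValueError("end_s must be after start_s")
    joules = (work_end_kwh - work_start_kwh) * JOULES_PER_KWH
    return joules / (end_s - start_s)


# 0.5 kWh consumed over one hour corresponds to an average of 500 W.
estimate = average_load_watts(10.0, 10.5, start_s=0, end_s=3600)
```

The estimate is only an average over the gap, so short spikes inside an outage are still lost.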
31. Wrong Sensor Readings
● From GPS
● E.g. TfL Traffic Analysis
○ Using Transport for London open
data feeds.
○ http://goo.gl/04tX6k, http://goo.gl/9xNiCm
○ Scales to 500,000 events/sec
and more
● From iBeacons at shops, ships
and airports
● Solution: Kalman Filter
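A one-dimensional Kalman filter is enough to show how a noisy reading gets damped. The sketch below assumes a constant underlying value with small process noise; a real GPS filter would track position and velocity in several dimensions. The `kalman_1d` function, its parameters and the sample values are hypothetical.

```python
def kalman_1d(measurements, process_var, measurement_var,
              initial_estimate=0.0, initial_var=1.0):
    """Smooth a sequence of scalar measurements.

    process_var: how much the true value is expected to drift per step.
    measurement_var: how noisy each sensor reading is assumed to be.
    """
    estimate, var = initial_estimate, initial_var
    smoothed = []
    for z in measurements:
        # Predict: the value is assumed constant, but uncertainty grows.
        var += process_var
        # Update: blend prediction and measurement by their uncertainties.
        gain = var / (var + measurement_var)
        estimate += gain * (z - estimate)
        var *= (1.0 - gain)
        smoothed.append(estimate)
    return smoothed


noisy = [10.0, 10.2, 9.8, 50.0, 10.1]  # 50.0 is a wrong reading
smoothed = kalman_1d(noisy, process_var=0.01, measurement_var=4.0,
                     initial_estimate=10.0, initial_var=1.0)
```

The higher `measurement_var` is relative to `process_var`, the less a single wrong reading moves the estimate.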
32. Visualisation
● Per-device & Summarization View
● Ability to group by categories
● Solutions: Composable Dashboard with sampling &
indexing
33. Communicate to Mobile & 3rd Party Apps
● Expose analytics results as APIs
○ Mobile apps, third parties
● Provides
○ Security, billing
○ Throttling, quotas & SLA
● Solution
○ Write data to database
○ Expose them via secured APIs (E.g. WSO2 API Manager)
35. IoT Analytics
● (WSO2 DAS) 3.0.1
○ Combines all types of analytics.
● (WSO2 CEP) 4.1
○ For those who need to analyze event streams in realtime.
● (WSO2 ML) 1.1
○ For building Predictive Models
http://wso2.com/analytics
http://wso2.com/iot