IOT, Streaming Analytics and Machine Learning
Delivering Real-Time Intelligence With Apache NiFi
Paul Kent, VP of Big Data, Platform R&D
Dan Zaratsian, Sr. Solutions Architect
7. KEY CONCEPTS: ESP MODEL - PROCESS FLOW
[Figure: SAS Event Stream Processing Engine. Design of the rule model (called “Continuous Query”) using components (called “Windows”). Events flow in (DATA IN) and out (DATA OUT). Windows shown: Source Windows 1, 2 and 3; Filter; Calculations; two Join windows; Notification; Predictive Model (Scoring).]
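To make the continuous-query idea concrete, here is a minimal Python sketch, not SAS ESP code: the `Window` class, the window names, and the temperature rule are hypothetical. Each window applies one function to an incoming event and forwards any result to the windows connected downstream, so a chain of windows behaves like a small rule model. Join windows are left out because they need per-key state, and the scoring step is a stand-in threshold rule rather than a trained model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Event = Dict[str, object]


@dataclass
class Window:
    """Hypothetical window: applies `fn` to each event and forwards any result downstream."""
    name: str
    fn: Callable[[Event], Optional[Event]]
    downstream: List["Window"] = field(default_factory=list)
    output: List[Event] = field(default_factory=list)

    def connect(self, other: "Window") -> "Window":
        """Add an edge self -> other and return `other` so calls can be chained."""
        self.downstream.append(other)
        return other

    def inject(self, event: Event) -> None:
        result = self.fn(event)
        if result is None:  # returning None drops the event (filter behavior)
            return
        self.output.append(result)
        for child in self.downstream:
            child.inject(result)


# A linear continuous query: source -> filter -> calculate -> score -> notify.
source = Window("source", lambda e: e)
filter_w = Window("filter", lambda e: e if e["temp_f"] is not None else None)
calc = Window("calculate", lambda e: {**e, "temp_c": round((e["temp_f"] - 32) * 5 / 9, 1)})
score = Window("score", lambda e: {**e, "risk": 1.0 if e["temp_c"] > 90 else 0.0})
notify = Window("notify", lambda e: e if e["risk"] > 0.5 else None)
source.connect(filter_w).connect(calc).connect(score).connect(notify)

for ev in [{"id": 1, "temp_f": 212.0}, {"id": 2, "temp_f": None}, {"id": 3, "temp_f": 68.0}]:
    source.inject(ev)

print([e["id"] for e in notify.output])  # [1]
```

Event 2 is dropped by the filter, event 3 passes every window except the notification, and only event 1 reaches the end of the chain.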
19. STREAMING ANALYTICS
Where are the Opportunities?
• Competitive Pressure (Technology, Sensors, Analytics)
• Risk
• Safety
• Security
• Personalization
Extend the existing analytical footprint!
Capture value otherwise lost through information lag
20. INTEGRATION
SAS EVENT STREAM PROCESSING & HORTONWORKS DATA FLOW (NIFI)
The last window I wanted to highlight today is the Procedural window.
As its name suggests, this window is dedicated to creating custom procedures in code. Whereas every other ESP window is dedicated to a specific operation, the Procedural window can do nearly anything you want. You just need to code it.
You can code directly in C++ or in the SAS DATA step 2 (DS2) language. Coding in C++ is fairly self-explanatory, so I will focus on the DS2 Procedural window.
As you can easily guess, the goal is to ease the deployment of analytical models into ESP so that they run in real time on streaming data.
For example, we can build analytical models with SAS Enterprise Miner, SAS Visual Analytics (VA), or any other SAS tool, convert the SAS code to DS2, and deploy the model into an ESP Procedural window, usually with just a few tweaks.
This approach is already in use on at least two customer projects, in the US and in Belgium, with very good results.
We need to be aware of a few important points, though, mostly due to the streaming nature of ESP:
First, not every model makes sense on streaming data. Some types of analytical models only make sense on batch data.
ESP processes data in motion, so a model in a Procedural window receives only one event at a time. We therefore cannot use DS2 models that require, for example, multiple passes over a data set or lookups against another external data set. All required information has to be in the incoming event data or in the DS2 code itself. If a more complex model is needed, it has to be implemented with other ESP components, for example a combination of ESP windows. Given this characteristic, I would say this window is a very good fit for scoring models.
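As an illustration of the one-event-at-a-time constraint, here is a Python analogue of exported score code (not DS2, and not the Procedural window API). The logistic-regression coefficients and field names are hypothetical; the point is that the function reads only the current event and constants embedded in the code, so it can run on each event as it arrives.

```python
import math

# Hypothetical model constants, standing in for exported score code.
INTERCEPT = -4.0
COEFFS = {"amount": 0.002, "num_items": 0.3}


def score_event(event: dict) -> dict:
    """Score a single event using only its own fields plus constants embedded in the code.

    Nothing here reads other events or an external table, which is the
    constraint that makes a model suitable for a per-event window.
    """
    z = INTERCEPT + sum(w * float(event[name]) for name, w in COEFFS.items())
    probability = 1.0 / (1.0 + math.exp(-z))
    return {**event, "p_event": round(probability, 4)}


print(score_event({"id": 7, "amount": 2000, "num_items": 0}))
```

A model that needed, say, the average amount over all past events would not fit this shape; that state would have to be computed by other windows and carried on the event.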
The last important point is that DS2 code is not natively built to cope with the speed and throughput of SAS ESP, so don’t expect the usual ESP performance when using DS2 Procedural windows.
Having said that, this is a great feature and a strong differentiator, so don’t hesitate to use it when it fits the need.
Including as many analytical features as possible in ESP, and being able to seamlessly integrate existing SAS models, is probably the main strategy for ESP in the next releases, so stay tuned for many improvements in this area in the near future.
The Pattern window probably best reflects what ESP is used for. It detects temporal conditions on events, for example “Tell me when event A was followed by event B and not event C within 3 minutes.” When this sequence of conditions is detected, the window builds and emits an event, usually to alert or trigger another application, or simply to be processed downstream.
This is a very powerful way to detect specific complex behaviors in real time.
As an example, in the operator tree illustrated on the left of the screen, we want to detect event 1 and event 2, or event 1 and event 3, followed by event 4 within the next 5 minutes with no occurrence of event 5 during those 5 minutes, then followed by event 6 within the next hour. If the input events match this sequence at any point in the stream, the Pattern window generates an event.
This generated event is usually built from values taken from all the events matched by the pattern conditions.
This is of course just an example; using this operator-tree paradigm, the window can detect very complex patterns of occurring and non-occurring events.
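A minimal sketch of the simpler pattern quoted above, assuming events arrive in timestamp order as `(seconds, kind)` tuples and reading the rule as “an A followed by a B within 180 seconds, with no C between them”. This is a hand-written Python matcher, not the Pattern window’s operator-tree syntax; the function name and the choice to let each A match at most once are assumptions.

```python
from collections import deque
from typing import Deque, Iterable, List, Tuple

WINDOW_SECONDS = 180  # "within 3 minutes"


def detect_a_then_b_without_c(events: Iterable[Tuple[float, str]]) -> List[Tuple[float, float]]:
    """Return (t_A, t_B) pairs where B follows A within WINDOW_SECONDS with no C in between.

    `events` must be ordered by timestamp (in seconds).
    """
    pending: Deque[float] = deque()  # timestamps of A events still waiting for a B
    matches: List[Tuple[float, float]] = []
    for t, kind in events:
        # Expire A events whose 3-minute window has already closed.
        while pending and t - pending[0] > WINDOW_SECONDS:
            pending.popleft()
        if kind == "A":
            pending.append(t)
        elif kind == "C":
            pending.clear()  # a C cancels every open A
        elif kind == "B":
            matches.extend((t_a, t) for t_a in pending)
            pending.clear()  # each A is matched at most once
    return matches


print(detect_a_then_b_without_c([(0, "A"), (60, "B"), (70, "A"), (100, "C"), (120, "B")]))  # [(0, 60)]
```

The second A never matches because the C at 100 s cancels it before the B arrives.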
In another interesting domain, ESP can process unstructured fields using dedicated Text Analytics windows.
In the previous ESP version, it was already possible to find classified terms in event text fields using the Text Context window. Version 3.1 brings two additional windows, Text Category and Text Sentiment, to further enrich ESP capabilities in this domain.
These windows generate new events that can be further analyzed by other window types. For example, a Pattern window could follow a Text Context window to look for tweet patterns of interest.
As a side note, remember that you need an appropriate SAS Text Analytics license to use these windows.
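To show how a text window produces new events for downstream windows, here is a Python sketch that swaps SAS’s licensed linguistic engine for a plain dictionary lookup; the `TERM_CLASSES` dictionary, the `text_context` function, and the field names are hypothetical. Each classified term found in a tweet becomes its own event, and a simple downstream filter then keeps only the “problem” terms.

```python
import re
from typing import Dict, List

# Hypothetical term dictionary: term -> class.
TERM_CLASSES: Dict[str, str] = {"outage": "problem", "refund": "billing", "love": "positive"}


def text_context(event: dict, text_field: str = "text") -> List[dict]:
    """Emit one new event per classified term found in the event's text field."""
    words = re.findall(r"[a-z']+", str(event[text_field]).lower())
    return [
        {"id": event["id"], "term": word, "class": TERM_CLASSES[word]}
        for word in words
        if word in TERM_CLASSES
    ]


tweets = [
    {"id": 1, "text": "Another outage, I want a refund!"},
    {"id": 2, "text": "I love this service"},
]
derived = [term_event for tweet in tweets for term_event in text_context(tweet)]
problems = [e for e in derived if e["class"] == "problem"]  # a downstream window's filter
print(problems)
```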
Let’s understand this with a simple example.
Let’s now cover how ESP connects to other applications.
SAS Event Stream Processing provides a public publish/subscribe (pub/sub) API for developing custom connectors and adapters, which gives you the flexibility to connect to virtually any application. The API is available in C and in Java for maximum flexibility.
In fact, all existing connectors and adapters are built using the pub/sub API.
Adapters and connectors are nearly identical. The difference is that adapters are standalone components that run outside the ESP process, whereas connectors run inside it.
As a result, adapters can be networked.
SAS ESP provides many connectors and adapters out of the box, for files, MQ messaging buses, JMS, databases, XML, SAS LASR or HDAT, Hadoop, TIBCO, OSIsoft PI, and more, with many others to come.
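The publish/subscribe idea behind connectors and adapters can be sketched with a small in-process broker. This is an illustrative Python sketch, not the SAS ESP C or Java pub/sub API; `Broker`, `subscribe`, and `publish` are hypothetical names. In these terms, a connector would be a callback running in the same process as the engine, while an adapter would do the same work from a separate process across the network.

```python
from typing import Callable, Dict, List

Callback = Callable[[dict], None]


class Broker:
    """Hypothetical in-process pub/sub hub keyed by window name."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callback]] = {}

    def subscribe(self, window: str, callback: Callback) -> None:
        """Register `callback` to receive every event published to `window`."""
        self._subscribers.setdefault(window, []).append(callback)

    def publish(self, window: str, event: dict) -> int:
        """Deliver `event` to all subscribers of `window`; return the delivery count."""
        callbacks = self._subscribers.get(window, [])
        for callback in callbacks:
            callback(event)
        return len(callbacks)


broker = Broker()
alerts_seen: List[dict] = []
broker.subscribe("alerts", alerts_seen.append)               # e.g. an in-process consumer
broker.subscribe("alerts", lambda e: print("forwarding", e))  # e.g. hand-off to an external system
delivered = broker.publish("alerts", {"id": 1, "level": "high"})
print(delivered)  # 2
```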
Release 3.1 of ESP also brings three new adapters and connectors:
A new REST web service adapter to connect ESP with SAS Real-Time Decision Manager (RTDM) or other web-service applications that subscribe to ESP.
A new Sniffer connector dedicated to capturing packets from network interfaces for network analysis.
The Twitter adapter, previously available on the tool pool, is now part of the standard ESP installation.
Multiple improvements have also been added to the existing adapters.
ANAND: That explains why customers describe, in various ways, the need to apply analytics at different stages of processing their data, and why streaming analytics goes by many names.
Especially in today’s world of Big Data and the Internet of Things, which is just picking up pace, the same terms can mean different things across organizations.
You will come across many of these names when talking with customers, so make sure you understand what they mean, because that can change the scope of how, where, and what type of analytics needs to be applied.
DAN: Well said, Anand. There are considerations for each of these types, and all are important for deriving value from data. In its simplest form, you can look at:
Edge Analytics: analytics applied at a specific device or sensor, i.e., at the asset and not upstream. Examples include video analytics, sensor networks, and smart-grid optimization.
In-Motion Analytics: analytics applied while the data is in motion, between sensors or between a sensor and another machine or human interface. Examples include analysis of online transactions, system logs, and web clickstreams, with continuous application of analytics for monitoring, identification, and action in real time.
At-Rest Analytics: event stream processing is not limited to real-time data streams. It is important to leverage and integrate data at rest with real-time streams in order to make better-informed decisions at all levels of your business.
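Integrating data at rest with a live stream often amounts to a lookup join: each incoming event is enriched with a stored record. The sketch below is a Python illustration with a hypothetical in-memory `CUSTOMERS` table standing in for a database; events with no matching record pass through unchanged (a left join), and a ratio against the historical average is added when one is known.

```python
from typing import Dict, Iterable, Iterator

# Hypothetical at-rest reference data, e.g. loaded once from a customer database.
CUSTOMERS: Dict[str, dict] = {
    "c1": {"segment": "gold", "avg_spend": 120.0},
    "c2": {"segment": "standard", "avg_spend": 35.0},
}


def enrich(stream: Iterable[dict], reference: Dict[str, dict]) -> Iterator[dict]:
    """Join each streaming event with its at-rest record (a lookup / left join)."""
    for event in stream:
        profile = reference.get(event["customer_id"], {})
        enriched = {**event, **profile}
        # Compare the live value with the historical baseline when one exists.
        if "avg_spend" in profile:
            enriched["spend_ratio"] = round(event["amount"] / profile["avg_spend"], 2)
        yield enriched


stream = [{"customer_id": "c2", "amount": 70.0}, {"customer_id": "c9", "amount": 5.0}]
for event in enrich(stream, CUSTOMERS):
    print(event)
```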
1) First are e-commerce interactions, for example:
Clickstream analysis helps optimize the user experience on commercial websites by adapting advertising or page layout to a specific user’s behavior, history, or profile.
This requires low-latency decisions with immediate pattern recognition, as we are dealing with live events.
The result is a better customer experience and increased sales or customer satisfaction, and possibly reduced churn.
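One way to sketch the “adapt to a specific user behavior” step is to keep a short sliding window of each user’s recent clicks and report the category that dominates it, which a site could use to pick the next banner or layout. The window size, the `ClickProfile` class, and the most-frequent-category rule are assumptions for illustration, not a description of a specific product.

```python
from collections import Counter, defaultdict, deque
from typing import DefaultDict, Deque

RECENT_CLICKS = 5  # hypothetical size of each user's sliding window


class ClickProfile:
    """Keep each user's last RECENT_CLICKS page categories and report the dominant one."""

    def __init__(self) -> None:
        self._recent: DefaultDict[str, Deque[str]] = defaultdict(lambda: deque(maxlen=RECENT_CLICKS))

    def on_click(self, user: str, category: str) -> str:
        """Record one click (the oldest click falls out automatically) and return the dominant category."""
        window = self._recent[user]
        window.append(category)
        return Counter(window).most_common(1)[0][0]


profile = ClickProfile()
for category in ["shoes", "shoes", "shoes", "hats", "hats", "hats"]:
    dominant = profile.on_click("u1", category)
print(dominant)  # hats
```

Because the window is bounded, the first “shoes” click has already expired by the sixth click, so recent interest in hats wins.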
2) Fraud detection offers many applications of Event Stream Processing:
Event Stream Processing can analyze and correlate transactions and user behavior in real time to detect suspicious events and potential fraud.
It can then halt the pending transaction and issue alerts or trigger further investigation.
This requires extremely low-latency decisions and complex pattern recognition based on user behavior history, usual network behavior, and well-known scenarios, together with great flexibility to adapt to the ever-growing “creativity” of fraud organizations.
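A classic building block of real-time fraud rules is a velocity check: too many transactions on one card in a short interval. The Python sketch below assumes a 60-second look-back and a limit of three transactions, both hypothetical values, and timestamps that never decrease per card; a production rule set would combine many such signals with behavior history.

```python
from collections import defaultdict, deque
from typing import DefaultDict, Deque

VELOCITY_WINDOW_S = 60  # hypothetical look-back window, in seconds
MAX_TXNS = 3            # hypothetical limit per card inside that window


class VelocityCheck:
    """Flag a card when it makes more than MAX_TXNS transactions within VELOCITY_WINDOW_S."""

    def __init__(self) -> None:
        self._times: DefaultDict[str, Deque[float]] = defaultdict(deque)

    def on_transaction(self, card: str, t: float) -> bool:
        """Record a transaction at time t; return True if it should be held for review."""
        times = self._times[card]
        times.append(t)
        # Drop transactions that are 60 seconds or more older than the current one.
        while times and t - times[0] >= VELOCITY_WINDOW_S:
            times.popleft()
        return len(times) > MAX_TXNS


check = VelocityCheck()
flags = [check.on_transaction("card-1", t) for t in [0, 10, 20, 30]]
print(flags)  # [False, False, False, True]
```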
3) Connected devices and the Internet of Things are probably the area where we will see the most use cases in the very near future:
With the advent of the IoT, more and more equipment will generate sensor data measuring its activity, such as voltage or temperature. Event Stream Processing can then monitor these streams and detect signs of failure or specific behaviors in real time, so the appropriate decision can be taken faster.
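Detecting failure signs in a sensor stream can start with a rolling statistical rule. This Python sketch flags a reading that is more than `k` standard deviations from the mean of the previous `size` readings; the class name, the default window size, and the threshold are assumptions, and real deployments would typically tune or replace this rule.

```python
import statistics
from collections import deque
from typing import Deque


class DriftDetector:
    """Flag a reading more than `k` standard deviations from the mean of the previous `size` readings."""

    def __init__(self, size: int = 20, k: float = 3.0) -> None:
        self.k = k
        self.history: Deque[float] = deque(maxlen=size)

    def on_reading(self, value: float) -> bool:
        alert = False
        if len(self.history) >= 2:  # a sample standard deviation needs at least two points
            mean = statistics.fmean(self.history)
            sd = statistics.stdev(self.history)
            alert = sd > 0 and abs(value - mean) > self.k * sd
        self.history.append(value)  # the reading joins the baseline after it is tested
        return alert


detector = DriftDetector(size=5, k=3.0)
flags = [detector.on_reading(v) for v in [20.0, 20.5, 19.5, 20.0, 20.5, 35.0]]
print(flags)  # [False, False, False, False, False, True]
```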
4) Many other use cases can be found in telecommunications environments, where the huge volume of communications data can be analyzed in real time to improve advertising, customer interaction, IT systems, fraud detection, and more.
5) Manufacturing
In manufacturing, for example, event processing can be used in plants to detect anomalies or to determine whether significant changes require production to be re-planned.
Plant-floor systems receive events from numerous sensors and push them to a centralized control system that explores event patterns and emits aggregated, enriched events to support decisions.
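Emitting “aggregated, rich events” from raw sensor readings can be sketched as a tumbling-window aggregation per production line. For clarity the Python sketch below processes a finite batch of events; a streaming engine would instead emit each summary when its window closes. The field names (`line`, `t`, `value`) and the 60-second window are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def aggregate(events: Iterable[dict], window_s: int = 60) -> List[dict]:
    """Group raw sensor events into tumbling windows per line and emit one summary event each."""
    buckets: Dict[Tuple[str, int], List[float]] = defaultdict(list)
    for e in events:
        start = int(e["t"] // window_s) * window_s  # start of the window containing t
        buckets[(e["line"], start)].append(e["value"])
    return [
        {"line": line, "window_start": start, "count": len(vals),
         "min": min(vals), "max": max(vals), "mean": sum(vals) / len(vals)}
        for (line, start), vals in sorted(buckets.items())
    ]


events = [
    {"line": "press-1", "t": 5, "value": 1.0},
    {"line": "press-1", "t": 30, "value": 3.0},
    {"line": "press-1", "t": 65, "value": 10.0},
    {"line": "press-2", "t": 10, "value": 4.0},
]
for summary in aggregate(events):
    print(summary)
```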
6) Energy and utilities
Energy and utilities is another important domain where, for example, optimized power grids can choose the best power source based on current conditions and projected needs.
Monitored water systems can prevent infrastructure failures, alert staff to leaks, and help understand the impact of water usage on the surrounding environment.
On oil drilling platforms, event stream processing can help avoid downtime caused by defective assets.
Let’s walk through some real customer cases.