This document discusses real-time stream processing and analysis. It describes how stream analysis can be used for monitoring, prediction, and control. Real-time stream analysis provides benefits like smart data warehousing, real-time exploitation of data, and dynamic control systems. The document introduces concepts from control theory and discusses how relationships in data streams can be detected and used to build controllers to automate control of variables. It presents a cloud-based solution for real-time stream monitoring and control and highlights unique features such as dealing with missing data and intuitive relationship browsing.
3. The Benefits of Real-Time
“Smart” data warehousing
◦ Retain raw data only around graph inflection points
Real-time exploitation
◦ “Complex” alarms, etc.
Dynamic Control Systems
◦ Control VoI by the best means available
And many others…
4. A bit of Theory
“Everything is a Control System”
◦ Basic structure:
◦ Objective is always: conserve some quantity
◦ Control systems:
Simple: thermostat, cruise control
Complex: society, language
The key to recognizing a control system is recognizing what is being controlled
5. A bit more Theory
Control systems from scratch:
◦ An ab initio thermostat
Generalizing:
◦ Observation of system relationships within the system
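The "ab initio thermostat" idea above can be sketched in a few lines: a controller that knows nothing about furnaces or temperature, and instead learns, purely from observation, the sign of the actuator's effect on the signal, then applies bang-bang control. This is a minimal illustrative sketch; the function names, data, and deadband are assumptions, not the product's actual method.

```python
# Hypothetical sketch: a thermostat built from general observations only.
# Step 1: observe whether turning the actuator on raises or lowers the signal.
# Step 2: use that observed sign to drive a simple bang-bang controller.

def observed_effect_sign(actuator_log, signal_log):
    """Estimate whether the actuator raises (+1) or lowers (-1) the signal,
    by averaging the signal's step-to-step change while the actuator was on."""
    steps = zip(zip(signal_log, signal_log[1:]), actuator_log)
    deltas = [b - a for (a, b), on in steps if on]
    avg = sum(deltas) / len(deltas)
    return 1 if avg > 0 else -1

def bang_bang(setpoint, reading, effect_sign, deadband=0.5):
    """Switch the actuator on when the signal is on the wrong side of the setpoint."""
    error = setpoint - reading
    return (error * effect_sign) > deadband

# Observed history: actuator on/off states and the temperature trace they produced
actuator = [True, True, True, False, False]
temps    = [18.0, 18.5, 19.1, 19.6, 19.4, 19.2]
sign = observed_effect_sign(actuator, temps)   # furnace raises temperature -> +1
print(sign)                                    # 1
print(bang_bang(21.0, 19.2, sign))             # too cold -> True (heat on)
```

The same observation-then-control pattern generalizes to any signal/actuator pair, which is the point of the slide: the controller is derived from observed system relationships, not from a prior model.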
7. Relationship bank Controller
Select the variable controlled for
Use the relationship graph to find variables that can control it
Use the response-function formalism to build a controller
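The three steps on this slide can be pictured with a toy relationship bank in which each edge carries a fitted response function. Here a single linear gain stands in for the response-function formalism, and all variable names and values are invented for illustration.

```python
# Hypothetical sketch: from a relationship bank to a controller.
# Each edge approximates the response function by a linear gain
# d(target)/d(source). Names and numbers are illustrative only.

relationships = {
    # (source, target) -> gain: "fan_speed up by 1 unit lowers room_temp by 0.8"
    ("fan_speed", "room_temp"): -0.8,
    ("occupancy", "room_temp"): +0.3,
}
actuators = {"fan_speed"}   # variables we can actually set

def find_actuator_for(target):
    """Walk the relationship graph for a controllable variable driving the target."""
    for (src, dst), gain in relationships.items():
        if dst == target and src in actuators:
            return src, gain
    raise LookupError(f"no controllable driver found for {target}")

def control_step(target, setpoint, reading):
    """One proportional step: invert the response gain to compute the adjustment."""
    src, gain = find_actuator_for(target)
    error = setpoint - reading       # how far the target is from where we want it
    return src, error / gain         # actuator change that cancels the error

print(control_step("room_temp", 21.0, 25.0))   # raise fan_speed by ~5 units
```

Because the relationship bank is recomputed continuously (see the notes below), the gains, and hence the controller, can be rebuilt whenever the system drifts.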
10. The practical side
A cloud-based solution that lets you:
◦ Monitor (via Web/Desktop/Custom UI)
◦ Control (via API)
Cutting-edge technology saves money (one server does the work of 5)
Software is tuned to meet specific needs
11. Unique Features + Benefits
Your data streams can be anonymous to protect privacy
Deals with missing data
Intuitive relationship browser (via network graphs)
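The "deals with missing data" bullet deserves a concrete picture. The deck does not specify the product's actual techniques, so the following is a minimal sketch of one standard remedy: linear interpolation across interior gaps in a (timestamp, value) stream.

```python
# Minimal sketch: fill sensor dropouts in a (t, value) stream by linear
# interpolation between the surrounding known points. Illustrative only;
# the product's analytic/statistical methods are not specified in the deck.

def fill_gaps(points):
    """points: list of (t, value) with value=None where the sensor skipped.
    Returns the series with interior gaps linearly interpolated."""
    filled = list(points)
    known = [i for i, (_, v) in enumerate(filled) if v is not None]
    for a, b in zip(known, known[1:]):
        (ta, va), (tb, vb) = filled[a], filled[b]
        for i in range(a + 1, b):                  # interior missing points
            t = filled[i][0]
            frac = (t - ta) / (tb - ta)            # fractional position in the gap
            filled[i] = (t, va + frac * (vb - va))
    return filled

series = [(0, 10.0), (1, None), (2, None), (3, 16.0)]
print(fill_gaps(series))   # [(0, 10.0), (1, 12.0), (2, 14.0), (3, 16.0)]
```

Statistical alternatives (e.g. carrying the last observation forward, or model-based imputation) trade smoothness for robustness; which is appropriate depends on the stream.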
12. In Summary
Automate data stream analytics at the apex of the value chain
Utilize the result to predict/control
Make a friend of the data avalanche!
15. Is This ML?*
Principle of ML:
× 1,000,000's
◦ Accumulation of knowledge
◦ Don’t repeat mistakes (hopefully)
*Not quite
16. Case Study: Minimize PUE in a datacenter*
Conditions:
◦ 120 signal streams
◦ An explicit value function
Google DeepMind Approach:
Reinforcement learning (ML) via Neural Nets
Can Altaridey do well with this problem?
*http://www.theverge.com/2016/7/21/12246258/google-deepmind-ai-data-center-cooling
17. Case Study, cont'd
                       ML (ANN)                     Altaridey
Analogy                Brain-like, cognitive        Cell-like, reactive
Gets smarter with age  possibly                     no
Data requirements      high                         low
Advantages             Very fast once trained       Real-time; system-agnostic
Disadvantages          Training is system-specific  Resource utilization high
                       and takes time               throughout; dynamic control
                                                    theory requires frequent
                                                    maintenance of the model;
                                                    "substitute speed for
                                                    intelligence"
Editor's notes
Set the scene: 1. I like the faucet analogy because it boils the problem down to the essentials. 2. A data faucet produces a data stream: a continuous stream of date-value pairs corresponding to some observable. 3. Notice that we go from the data streams to a graph that relates the data streams to each other, denoting the dependencies within the data. 4. And this is what the application does: it takes large volumes of time-series data and constructs a far more compact dependency graph, recomputing it as often as needed to stay up to date.
The only question you should ask yourself before deciding to really pay attention is: does the data I care about fit this template? If it does, then real-time analysis will add value to what you're doing.
How it will add value depends on how you use the application, but the data can be exploited to do anything from simple monitoring of the graph, to using this graph to make predictions about a specific stream, to using it as input to construct a control system to control the stream’s output.
Of course, that was interesting in theory, but how can it be applied in practice? The honest answer is that “it depends”, mostly on your specific situation and data flows, but let me give you a few examples of where this software can make a difference:
-because we’re recomputing the dependency graph continuously, we can cut down the amount of raw data we have to store, to zero in on “interesting” data
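What "zeroing in on interesting data" could look like in practice: one simple rule is to keep raw points only near inflections, detected as a sign change in the discrete second difference. Both the detection rule and the window size below are illustrative assumptions, not the product's actual retention criterion.

```python
# Sketch: retain raw stream data only around inflection points.
# Inflection test: a sign change in the discrete second difference
# (curvature flipping from concave-up to concave-down or vice versa).

def inflection_indices(values):
    """Indices where the discrete second difference changes sign."""
    second = [values[i-1] - 2*values[i] + values[i+1]
              for i in range(1, len(values) - 1)]
    return [i + 1 for i in range(1, len(second))
            if second[i-1] * second[i] < 0]

def retain_window(values, window=1):
    """Indices of points worth keeping: within `window` of an inflection."""
    keep = set()
    for idx in inflection_indices(values):
        keep.update(range(max(0, idx - window), min(len(values), idx + window + 1)))
    return sorted(keep)

trace = [0, 1, 4, 9, 12, 13]          # curvature flips around index 3
print(inflection_indices(trace))      # [3]
print(retain_window(trace))           # [2, 3, 4] -- the rest can be discarded
```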
-presumably we installed the sensor because we want to be able to act on its data (smoke sensor -> fire alarm). But what if the alarm we want to set off isn't tied to a single metric, and, more precisely, there is no model relating the sensors to the alarm? To be specific: a mood alarm (at various times, one or more of dozens of factors may control your mood, and you don't have a formula).
-an alarm is a very crude mechanism of predicting outcomes, and if we’ve come far enough to claim that we can predict outcomes, why not come further and start to control outcomes. This is the most interesting and the most ambitious application of this software. Most of this presentation is about this third category of applications, both because it is the hardest to conceptualize, and because, by considering it in detail, we can illuminate the “lower” applications as well.
-what if we didn't know anything about thermostats, furnaces, and temperatures? How would you construct a thermostat from general observations?
-by devising processes that control factors that you can control, you end up controlling factors that you want to control
So, let’s now think about how we’d automate this process.
First, we’d need a set of relationships.
-we continually update the relationship bank
-controller is rebuilt as needed
--what you’ve just seen in the demo is the system viewer, which is an entirely web-based application platform. But suppose you want to build an actual controller that controls something in your enterprise. We have an API for that.
--exploits graphics processing units to speed up computations and to deliver results in fractions of a second
--we can build proof-of-concept controllers for you, using desktop platforms
--you don’t have to reveal the source of the data stream (i.e. what real-life system is throwing off the data)
--sensors and data sources will occasionally skip a point, but our data analysis techniques—both analytic and statistical—are designed to manage missing data and deliver sound analysis
-we’ve seen how a dynamic control system concept potentially eliminates the need for human input into control systems that are coupled to real-time stream analytics, and we’ve seen how control systems are ubiquitous and can be applied to virtually any business model.
-as a corollary, we have seen how we can stop short of building a control system, and use the relationship graph to browse and discover stream relationships, or to make predictions about the future behavior of specific series
-we’ve seen how a dynamic control system virtually eliminates the need for human intervention in control systems and data analytics
-as a direct consequence, we’ve also seen how fear of significant data volumes is no longer justified, since all the information that can be collected can be exploited nearly in real-time, and the data subsequently discarded
“explicit value function” == the thing that you’re maximizing is itself a signal (or can be made into one)
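Since an "explicit value function" just means the optimization target is itself a signal (or can be made into one), here is a tiny sketch of deriving such a signal from two raw streams. The stream names and the formula are invented for illustration; they are not the datacenter's actual metric.

```python
# Sketch: make the quantity being optimized into a signal of its own by
# deriving it sample-by-sample from two synchronized raw streams.

def value_stream(it_load_kw, total_power_kw):
    """Derive a per-sample efficiency signal from two synchronized streams."""
    return [it / total for it, total in zip(it_load_kw, total_power_kw)]

it_load = [80.0, 82.0, 81.0]          # useful (IT) load, kW
total   = [100.0, 100.0, 90.0]        # total facility draw, kW
print(value_stream(it_load, total))   # [0.8, 0.82, 0.9] -- higher is better
```

Once the value function is a stream like any other, it can enter the relationship graph and be targeted by a controller exactly as in the earlier slides.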
-in order to answer the question of whether Altaridey can do it, we need to examine the distinctions between the two approaches and highlight their relative strengths.