Complex Event Processing (CEP) is a powerful technology in real-time distributed environments for analyzing fast and distributed streams of data, and deriving conclusions from them. CEP permits defining complex events based on the events produced by the incoming sources in order to identify complex meaningful circumstances and to respond to them as quickly as possible. However, in many situations the information that needs to be analyzed is not structured as a mere sequence of events, but as graphs of interconnected data that evolve over time. This paper proposes an extension of CEP systems that permits dealing with graph-structured information. Two case studies are used to validate the proposal and to compare its performance with traditional CEP systems. We discuss the benefits and limitations of the CEP extensions presented.
Extending Complex Event Processing to Graph-structured Information
1. Extending Complex Event Processing
to Graph-structured Information
Gala Barquero1, Loli Burgueño2, Javier Troya3, Antonio Vallecillo1
1Universidad de Málaga, Spain
2Universitat Oberta de Catalunya, Spain
3Universidad de Sevilla, Spain
2. Complex Event Processing
1. CEP is a method for data stream-processing for analyzing and correlating streams of
information about real-time events in order to derive conclusions from them.
2. CEP permits defining complex events on top of other events (primitive or complex)
3. CEP programs are composed of rules which are in charge of processing the events
4. CEP programs define (size or temporal) windows on the stream of events
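To make the notion of a window concrete, here is a minimal sketch in plain Python. It does not use the syntax of any particular CEP engine. The `Event` type, the `TemporalWindow` class and the `HighTemperature` rule are hypothetical names chosen for illustration, and events are assumed to arrive in timestamp order. A size window would instead keep the last N events.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    timestamp: float  # seconds since some reference instant
    kind: str
    value: float


class TemporalWindow:
    """Keeps only the events that occurred within the last `length` seconds."""

    def __init__(self, length: float):
        self.length = length
        self.events = deque()

    def push(self, event: Event) -> None:
        self.events.append(event)
        # Evict events that have fallen out of the window (assumes in-order arrival).
        while self.events and self.events[0].timestamp <= event.timestamp - self.length:
            self.events.popleft()


def high_temperature_rule(window: TemporalWindow, threshold: float, count: int):
    """Hypothetical rule: derive a complex event when at least `count`
    Temperature readings in the window exceed `threshold`."""
    hot = [e for e in window.events if e.kind == "Temperature" and e.value > threshold]
    if len(hot) >= count:
        return Event(hot[-1].timestamp, "HighTemperature", max(e.value for e in hot))
    return None
```

The rule only ever inspects the contents of the window, which is what keeps the amount of data examined per incoming event bounded.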
5. Current CEP technologies
1. Efficient languages and technologies for processing huge streams of data
6.5 zettabytes (1 ZB = 10^21 bytes) in 2016
15.3 zettabytes expected in 2020
2. Increasingly used (and useful) in applications for critical infrastructure monitoring,
real-time market trend analysis, prediction of plagues and natural disasters, ...
7. However, real information is normally structured in more complex ways
1. The data is not only structured as a sequence of timed events, but as graphs that
combine transient (streams) and persistent (database) information
Queries about social trends based on Twitter feeds and shared Flickr photos
Monitoring tendencies via Twitter and Facebook posts
8. Our contribution
1. Extend CEP systems and languages to deal with graph-based information
Able to deal both with streams of timed events and with graphs of persistent data
Extend the concept of a CEP “sequential window” to a “spatial window”
Keep up with the stringent requirements on performance and scalability of CEP
systems
2. For this we decided to:
Generalize the structure of a CEP stream from a sequence of time-ordered events to a
Model (i.e., a graph of interrelated elements – time being just one dimension)
Consider the behavior of a CEP system as a particular kind of in-place Model
Transformation
Use the concept of “vicinity graphs” to define and implement spatial windows in
models (a generalization of CEP’s sequential windows)
Use recent graph parallel computational technologies to provide the supporting
storage and access infrastructure for the models, and graph-processing systems to
implement the corresponding in-place model transformations
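The slides do not spell out how a vicinity graph is computed. As an illustration only, the sketch below assumes that a spatial window is the set of model elements reachable from a focus element in at most `max_hops` hops. The function name `spatial_window`, the adjacency-dictionary representation and the sample element names are all assumptions, written in plain Python rather than the Scala and GraphX used by the authors.

```python
from collections import deque


def spatial_window(adjacency: dict, focus: str, max_hops: int) -> set:
    """Return the nodes reachable from `focus` in at most `max_hops` hops,
    using a breadth-first traversal of the adjacency dictionary."""
    seen = {focus}
    frontier = deque([(focus, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == max_hops:
            continue  # do not expand beyond the window radius
        for neighbour in adjacency.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                frontier.append((neighbour, dist + 1))
    return seen


# Hypothetical model fragment: a tweet, its author, a hashtag and a photo.
model = {
    "tweet1": ["userA", "#cep"],
    "userA": ["userB"],
    "#cep": ["photo1"],
    "photo1": ["userC"],
}
window = spatial_window(model, "tweet1", 2)
```

Just as a sequential window bounds how far back in time a rule looks, the hop limit bounds how far across the graph it looks.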
11. Case study: Twitter and Flickr
Q1: A HotTopic event is generated every time a hashtag has been used by both
Twitter and Flickr users at least 100 times in the last hour.
Q2: A PopularTwitterPhoto element is created when the hashtag of a photo is
mentioned in a tweet that receives more than 30 likes in the last hour.
Q3: A PopularFlickrPhoto element is created when a photo is favorited by more
than 50 Flickr users who have more than 50 followers.
Q4: A NiceTwitterPhoto event is generated when a user with an h-index higher
than 50 posts three tweets in a row in the last hour containing a hashtag
that describes a photo.
Q5: An InfluencerTweeted event is generated, considering the 10K most recent
tweets, when a user with an h-index higher than 70 and more than 50K
followers sends a tweet.
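As an illustration of how Q1 combines a temporal window with information from two sources, here is a minimal sketch in plain Python. It is not the authors' Scala and GraphX implementation. The tuple layout `(timestamp, source, hashtag)` and the function name `hot_topics` are assumptions, and the sketch reads "at least 100 times" as applying to each platform separately, which is one possible interpretation of the query.

```python
from collections import Counter

HOUR = 3600  # seconds


def hot_topics(uses, now, threshold=100, window=HOUR):
    """Return the hashtags used at least `threshold` times on Twitter AND on
    Flickr within the last `window` seconds.

    uses: iterable of (timestamp, source, hashtag), source in {"twitter", "flickr"}.
    """
    counts = {"twitter": Counter(), "flickr": Counter()}
    for timestamp, source, hashtag in uses:
        if now - window < timestamp <= now:  # temporal window of the query
            counts[source][hashtag] += 1
    return {
        tag
        for tag, n in counts["twitter"].items()
        if n >= threshold and counts["flickr"][tag] >= threshold
    }
```

Queries Q2 to Q5 additionally navigate relationships such as followers, likes and photo descriptions, which is where a graph representation becomes more convenient than a flat stream.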
12. Current Implementation
1. Models implemented with Apache Spark
RDDs (resilient distributed datasets) used to store both model elements (graph vertices)
and their relations (edges)
Models populated using the sources’ APIs to obtain the data
One thread for each stream of events in case of streaming data
2. Model transformation rules (modeling the corresponding CEP rules) implemented in
Scala
Implemented in terms of Spark and GraphX functions
One dedicated running thread for each rule
Produced events stored using RDDs too
3. Data lifecycle
Transient data (and their relationships) have an “expiration date” (ED)
The ED is determined by the largest window of the rules that deal with the event
Once the ED of an element has passed, the element is removed from the system
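The data lifecycle above can be sketched as follows, in plain Python rather than the Spark RDD operations used in the actual implementation. The names `expiration_date`, `purge` and `rule_windows`, and the dictionary layout of the elements, are hypothetical. The sketch also assumes that an element used by no rule expires immediately, which the slides do not state.

```python
def expiration_date(created_at, kind, rule_windows):
    """ED = creation time + the largest window among the rules that use `kind`.

    rule_windows: list of (set of element kinds used by the rule, window in seconds).
    """
    windows = [w for kinds, w in rule_windows if kind in kinds]
    # Assumption: an element that no rule uses expires as soon as it is created.
    return created_at + max(windows) if windows else created_at


def purge(elements, now, rule_windows):
    """Keep only the transient elements whose expiration date has not passed."""
    return [
        e for e in elements
        if expiration_date(e["created_at"], e["kind"], rule_windows) > now
    ]


# Two hypothetical rules: one with a 1-hour window, one with a 2-hour window.
rules = [({"Tweet", "Hashtag"}, 3600), ({"Tweet"}, 7200)]
elements = [
    {"id": "t1", "kind": "Tweet", "created_at": 0},
    {"id": "h1", "kind": "Hashtag", "created_at": 0},
]
alive = purge(elements, now=4000, rule_windows=rules)
```

Taking the largest window guarantees that an element is never removed while some rule could still match it.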
14. Analyses
1. Performance
How fast are we?
Is the performance of our proposal acceptable for dealing with large systems?
How do we compare with CEP systems (when only one-dimensional streams are used)?
2. Expressiveness
Are we as expressive as CEP languages?
Can we write all CEP patterns with GraphX?
How easy is it to write rules with our proposal?
15. Performance analysis
1. Performance Figures for the Twitter and Flickr case study (in milliseconds)
2. Comparison figures with other solutions (127K/6500K):
16. Performance analysis: comparison with streaming CEP systems
1. A different case study (Motorbike) implemented using both our solution and Esper
17. Expressiveness
1. We have been able to express all queries using Scala and GraphX
2. However, the expression of the queries is not simple
Scala code for the “DriverLeftSeat” rule:
19. Technology (and its rapid evolution) is an issue in this context
Technology: Neo4j (in memory: No; query language: Cypher)
Pros:
* Expressiveness and usability of Cypher!!!
* Easy to install and to use
* Scalability
Cons:
* Disk access (R/W) very slow
* No in-memory implementation available
Technology: Spark + GraphX (in memory: Yes; query language: Scala)
Pros:
* Versatile and very expressive language
* Easy to install
* Implements cluster mode (distributed)
Cons:
* Cumbersome as a query language for graphs
* Uses lazy evaluation
* Complex configuration in cluster mode
Technology: Viatra (in memory: Yes; query language: Viatra)
Pros:
* Speed and general performance
* Good language for querying models
* Very expressive
Cons:
* Difficult to install and configure
* Documentation is scarce
Technology: TinkerGraph (in memory: Yes; query language: Gremlin)
Pros:
* Graph-native language and tools
* In-memory implementation
* Easy to install and to use
Cons:
* Learning curve of Gremlin
Technology: CrateDB (in memory: No; query language: SQL)
Pros:
* Uses disk but very efficiently (scalability)
* SQL is well known and used
* Implements cluster mode (distributed)
* Easy to install and to use
Cons:
* Writing graph queries in SQL is not easy (especially those queries involving hops)
20. Conclusions and future work
Contribution: Extension of CEP systems to deal with graph-structured information:
Able to deal both with streams of timed events and with graphs of persistent data
Represent the information to manage as a Model
Consider the behavior of a CEP system as an in-place Model Transformation
Extend the concept of CEP windows to models’ spatial windows
Use graph-parallel computational technologies to provide the supporting storage
and access infrastructure
Use graph-processing languages and systems to implement the corresponding
model transformations
21. Future work
1. Performance:
Experiment with other technologies, beyond Spark+GraphX
Each one has pros and cons (expressiveness, performance, scalability, distribution)
Volatility is an issue… They change too rapidly!
2. Expressiveness
Compilers from Query languages to Storage technologies can be a solution
For example, from Cypher to Gremlin or to Scala+GraphX
3. Correctness/Accuracy
What is the error introduced by the use of spatial windows?
Here we need to trade accuracy for performance
Approximate queries and model transformations…
Q: A YoungInfluencer is a TwitterUser younger
than 25 years old who has more than 30
followers older than 25 years old.
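For reference, an exact evaluation of this query over the full model can be sketched as below, in plain Python. The function name `young_influencers` and the two dictionaries are hypothetical, and followers of unknown age are assumed not to count. An approximate version would run the same check over a spatial window of the model instead of the whole graph, which is where the accuracy question arises.

```python
def young_influencers(ages, followers, max_age=25, min_followers=30, follower_min_age=25):
    """Return the users younger than `max_age` who have more than `min_followers`
    followers older than `follower_min_age`.

    ages: user id -> age; followers: user id -> iterable of follower ids.
    """
    result = set()
    for user, age in ages.items():
        if age >= max_age:
            continue
        # Followers whose age is unknown are not counted (assumption).
        older = sum(
            1 for f in followers.get(user, ())
            if ages.get(f, 0) > follower_min_age
        )
        if older > min_followers:
            result.add(user)
    return result
```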