Big data, with its four main characteristics (Volume, Velocity, Variety, and Veracity), poses challenges to the gathering, management, analytics, and visualization of events. These same four characteristics, however, also hold great promise for unlocking the story behind the data. In this talk, we focus on the observation that event creation is guided by processes. For example, GPS information emitted by buses in an urban setting follows the buses' scheduled routes. Similarly, RTLS information about the whereabouts of patients and nurses in a hospital is guided by the predefined work schedule. With this observation at hand, we seek a method for mining not the data but rather the rules that guide data creation, and we show how knowing such rules makes big data tasks more efficient and more effective. In particular, we demonstrate how, by knowing the rules that govern event creation, we can detect complex events sooner and use historical data to predict future behavior.
Presentation Outline
Big Data: the New Playground
Events, Processes, and Anything in Between
Complex Event Processing Optimization
Process Mining with Schedules
Big data is a game changer
From Theory to Systems: empirical evaluation counts
From Systems to Data: large-scale empirical evaluation counts
Who is a Data Scientist?
The ability to take data – to be able to understand it, to
process it, to extract value from it, to visualize it, to
communicate it – that’s going to be a hugely important skill in
the next decades. (Hal Varian, Google’s Chief Economist)
Data Volume: No Longer the Size of a Teacup
Volume
Table: Big Data Cross Table
Big data may be a single dataset with a lot of data
Data Velocity: Replacing a Teacup with a Tea Hose
Volume
Velocity
Table: Big Data Cross Table
Big data may be data that rapidly changes
Data Variety: When One Tea Type is Just not Enough
Volume
Velocity
Variety
Table: Big Data Cross Table
Big data may be a small dataset with many different schemata
Data Veracity: Is it Coffee or Black Tea with Milk?
Volume
Velocity
Variety
Veracity
Table: Big Data Cross Table
Big data may be data with varying levels of trustworthiness
Data Gathering: where and when to expect the fountain to burst
Gathering
Volume
Velocity
Variety
Veracity
Signal and Event Processing
Table: Big Data Cross Table
Data Management: Not your typical DBA anymore
Gathering Managing
Volume
Velocity
Variety
Veracity
Cloud Computing, NoSQL, NewSQL
Table: Big Data Cross Table
Data Analytics: When Data Analysis Explodes Multi-Dimensionally
Gathering Managing Analyzing
Volume
Velocity
Variety
Veracity
Data & Process Mining
ML, IR, NLP
Table: Big Data Cross Table
Data Visualization: The Machine Offering to Mankind
Gathering Managing Analyzing Visualizing
Volume
Velocity
Variety
Veracity
User Experience
Table: Big Data Cross Table
Big Data Cross Table
Columns: Gathering, Managing, Analyzing, Visualizing
Rows: Volume, Velocity, Variety, Veracity
Overlaid across the rows: Events, Processes
Table: Big Data Cross Table
Event Processing
Events
An event e is an occurrence within a particular system or domain. It is something that has happened, or is contemplated as having happened, in that domain [Etzion and Niblett, 2010].
Point-based semantics.
An event type $E \in \mathcal{E}$ is a specification for a set of events that share the same semantic intent and structure.
Complex Event Processing
Systems: Amit [Adi and Etzion, 2004], SASE [Wu et al., 2006], Cayuga [Demers et al., 2007], CEDR [Barga et al., 2007], ESPER.
DEBS 2016: Orange County, California
Events and Big Data
Volume: 23 Million records per month (∼ 4GB)
Velocity: 770,000 new records per day (an event each 2-6 seconds)
Variety: Homogeneous
Veracity: GPS locations
Processes
Process models describe time dependencies among activities:
Business processes
Scheduled activities
Used as a template for execution by a process engine.
A process model can be modeled as a graph containing activity nodes and control nodes:
Petri nets [Reisig, 1985]
BPMN [bpm, 2011]
Between Events and Processes
Given processes, detect (complex) events
Given events, discover processes
From Processes to CEP
Optimisation of event pattern matching on three levels
Approach based on domain knowledge
Results taken from: M. Weidlich, H. Ziekow, A. Gal, J. Mendling, M. Weske. Optimising Event Pattern Matching using Business Process Models. IEEE Transactions on Knowledge and Data Engineering (TKDE), accepted for publication, 2015.
Performance Analysis
Datasets
A publicly available process log that contains recorded execution sequences of a paper reviewing process (http://www.processmining.org/logs/start).
The model defines 20 activities.
The log comprises 3730 events that are related to 100 process instances.
Each event is associated with a timestamp and a reference to an activity of the process model.
Process models of a German insurance company.
1021 process models, ranging from 4 to 339 nodes.
The average size of the process models is around 23 nodes.
The log was simulated using annotations of the process models.
Complex Events Processing with Processes
Gathering ...
Volume
Velocity Optimization
Variety Optimisation in event processing networks
Veracity
Table: Big Data Cross Table
Complex Events Processing with Processes
... Analysis
Volume Mining of constraints
Velocity
Variety
Veracity Probabilistic mining of constraints
Table: Big Data Cross Table
From Events to Processes
Online Traveling Time Prediction: when Processes Rule Events
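Using information on bus stops, the prediction of the journey traveling time $T(\langle \omega_1, \ldots, \omega_n \rangle, t_{\omega_1})$ is traced back to the sum of traveling times per segment:
$$T(\langle \omega_1, \ldots, \omega_n \rangle, t_{\omega_1}) = T(\langle \omega_1, \omega_2 \rangle, t_{\omega_1}) + \ldots + T(\langle \omega_{n-1}, \omega_n \rangle, t_{\omega_{n-1}})$$
where
$$t_{\omega_{n-1}} = t_{\omega_1} + T(\langle \omega_1, \omega_{n-1} \rangle, t_{\omega_1}).$$
[Figure: a journey from stop s to stop d via intermediate stops $\omega_2, \omega_3, \ldots, \omega_i, \ldots, \omega_{n-1}$]
Traveling Time = Drive Time + Delay Time + Stop Time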
(Thanks to Arik Senderovich for the slides)
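The per-segment decomposition can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' implementation: the names `journey_travel_time`, `SegmentTime`, and `toy_segment_time` are hypothetical, times are plain seconds, and the toy segment model is invented purely to show that each segment is evaluated at the time the bus enters it.

```python
from typing import Callable, Sequence

# Hypothetical signature: travel time (seconds) over segment (a, b)
# when the bus enters the segment at time t.
SegmentTime = Callable[[str, str, float], float]


def journey_travel_time(stops: Sequence[str], t_start: float,
                        segment_time: SegmentTime) -> float:
    """Sum per-segment travel times, advancing the clock segment by segment."""
    t = t_start
    for a, b in zip(stops, stops[1:]):
        # The entry time of each segment is the arrival time at its first stop.
        t += segment_time(a, b, t)
    return t - t_start


def toy_segment_time(a: str, b: str, t: float) -> float:
    # Toy model (an assumption): 60 s per segment, plus 30 s of delay
    # for any segment entered at or after t = 100.
    return 60.0 + (30.0 if t >= 100.0 else 0.0)


if __name__ == "__main__":
    print(journey_travel_time(["s", "w2", "w3", "d"], 0.0, toy_segment_time))
```

In the toy run, the third segment is entered at t = 120 and therefore picks up the extra delay, which is the time dependence the formula expresses through the entry time of the last segment.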
Prediction: The Snapshot Principle in Single-Station Queues
The snapshot principle stems from a heavy-traffic approximation of a queueing system under limits of its parameters, as the workload converges to capacity.
[Figure: a single-station queue, Station1]
The principle states that the total time in the station (waiting + service) remains constant.
In our context, a bus that passes through a segment, e.g., $\langle \omega_i, \omega_{i+1} \rangle \in S \times S$, will have the same traveling time as another bus that has just passed through that segment (not necessarily of the same type, line, etc.).
The Snapshot Principle in Single-Station Queues
Based on the above, we define a single-segment snapshot predictor, Last-Bus-to-Travel-Segment (LBTS), denoted by $\theta_{LBTS}(\langle \omega_i, \omega_{i+1} \rangle, t_{\omega_1})$.
In real-life settings, the applicability of snapshot-principle predictors should be tested ad hoc.
The snapshot principle was shown to be of empirical value in previous research, where queueing techniques were applied to predict delays.
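A single-segment predictor of this kind might look like the sketch below. It is a hedged illustration, not the authors' code: the `Traversal` record, the `lbts` function, and the in-memory log are assumptions introduced here. The prediction at time t is the traveling time of the most recent bus that finished the segment no later than t, and `None` is returned when no bus has completed the segment yet.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Traversal:
    """One bus passing over one segment (hypothetical record layout)."""
    start_stop: str
    end_stop: str
    enter_time: float  # seconds
    exit_time: float   # seconds


def lbts(log: Iterable[Traversal], start_stop: str, end_stop: str,
         t: float) -> Optional[float]:
    """Traveling time of the last bus to complete the segment by time t."""
    last: Optional[Traversal] = None
    for tr in log:
        if (tr.start_stop, tr.end_stop) != (start_stop, end_stop):
            continue  # a different segment
        if tr.exit_time > t:
            continue  # not yet completed at prediction time
        if last is None or tr.exit_time > last.exit_time:
            last = tr
    return None if last is None else last.exit_time - last.enter_time


if __name__ == "__main__":
    log = [
        Traversal("A", "B", 0.0, 50.0),
        Traversal("A", "B", 100.0, 170.0),
        Traversal("A", "B", 200.0, 260.0),
    ]
    print(lbts(log, "A", "B", 180.0))
```

The bus line is deliberately not part of the lookup, since the snapshot principle as stated applies to any bus that has just passed through the segment.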
Snapshot Principle in a Network
In our case, the LBTS predictor needs to be lifted to a network setting.
The snapshot principle holds for networks of queues when the routing through the network is known in advance.
In scheduled transportation such as buses, this is the case, as the order of stops (and segments) is predefined:
[Figure: a network of seven queueing stations, Station1 through Station7]
Snapshot Principle in a Network
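We define a multi-segment (network) snapshot predictor that we refer to as Last-Bus-to-Travel-Network (LBTN), denoted by $\theta_{LBTN}(\langle \omega_1, \ldots, \omega_n \rangle, t_{\omega_1})$, given a sequence of stops (with $\omega_1$ being the start stop and $\omega_n$ being the end stop).
According to the snapshot principle in networks, we get that:
$$\theta_{LBTN}(\langle \omega_1, \ldots, \omega_n \rangle, t_{\omega_1}) = \sum_{i=1}^{n-1} \theta_{LBTS}(\langle \omega_i, \omega_{i+1} \rangle, t_{\omega_1}).$$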
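The network predictor can be sketched as a sum over consecutive stop pairs. The sketch below is an assumption-laden illustration rather than the authors' implementation: `lbtn` and the `snapshot` mapping are hypothetical, and the mapping is assumed to already hold, for each segment, the single-segment snapshot value observed at the journey start time. A journey of n stops has n-1 segments, and the function returns `None` if any segment has no snapshot value.

```python
from typing import Mapping, Optional, Sequence, Tuple

Segment = Tuple[str, str]


def lbtn(stops: Sequence[str],
         snapshot: Mapping[Segment, float]) -> Optional[float]:
    """Sum the snapshot traveling times of all consecutive segments.

    `snapshot` maps each segment to the traveling time (seconds) of the
    last bus that completed it, all taken at the same prediction time.
    """
    total = 0.0
    for segment in zip(stops, stops[1:]):
        if segment not in snapshot:
            return None  # no bus has traveled this segment yet
        total += snapshot[segment]
    return total


if __name__ == "__main__":
    snapshot = {("A", "B"): 70.0, ("B", "C"): 40.0, ("C", "D"): 55.0}
    print(lbtn(["A", "B", "C", "D"], snapshot))
```

Unlike a predictor that advances the clock from segment to segment, every term here uses the same snapshot taken at the start of the journey.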
Performance Analysis
Data
8 days of bus data, between September and October of 2014.
Each day: approximately 11,500 traveled segments.
First trip for each day: no associated last travel time.
Prediction for line 046A.
Data comes from all buses that share segments with line 046A.
Performance Analysis
[Figure: sample square estimation error and root mean square error, plotted against the index of the segment in the trip]
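The root mean square error named on the plot's axis can be computed per segment index along the following lines. This is a generic sketch of the standard RMSE formula, not the evaluation code behind the figure: the function name `rmse_by_segment_index` and the `(segment_index, predicted, actual)` tuple layout are assumptions made for illustration.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical sample layout: (segment index in the trip,
# predicted traveling time, actual traveling time), times in seconds.
Sample = Tuple[int, float, float]


def rmse_by_segment_index(samples: Iterable[Sample]) -> Dict[int, float]:
    """Root mean square prediction error, grouped by segment index."""
    squared_error_sum: Dict[int, float] = defaultdict(float)
    count: Dict[int, int] = defaultdict(int)
    for index, predicted, actual in samples:
        squared_error_sum[index] += (predicted - actual) ** 2
        count[index] += 1
    return {index: math.sqrt(squared_error_sum[index] / count[index])
            for index in squared_error_sum}


if __name__ == "__main__":
    samples = [(1, 10.0, 11.0), (1, 20.0, 13.0), (2, 5.0, 5.0)]
    print(rmse_by_segment_index(samples))
```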
Process Mining with Schedules
... Analysis
Volume Better prediction
Velocity Segmentation
Variety
Veracity
Table: Big Data Cross Table
Process Mining with Schedules
... Management ...
Volume
Velocity
Variety
Veracity Event Cleaning
Table: Big Data Cross Table
[Adi and Etzion, 2004] A. Adi and O. Etzion. Amit - the situation manager. The International Journal on Very Large Data Bases, 13(2):177–203, May 2004.
[Barga et al., 2007] Roger S. Barga, Jonathan Goldstein, Mohamed H. Ali, and Mingsheng Hong. Consistent streaming through time: A vision for event stream processing. In CIDR [DBL, 2007], pages 363–374.
[bpm, 2011] Business Process Model and Notation (BPMN) Version 2.0. Technical report, Object Management Group (OMG), January 2011.
[DBL, 2007] CIDR 2007, Third Biennial Conference on Innovative Data Systems Research, Asilomar, CA, USA, January 7-10, 2007, Online Proceedings. www.cidrdb.org, 2007.
[Demers et al., 2007] Alan J. Demers, Johannes Gehrke, Biswanath Panda, Mirek Riedewald, Varun Sharma, and Walker M. White. Cayuga: A general purpose event monitoring system. In CIDR [DBL, 2007], pages 412–422.
[Etzion and Niblett, 2010] Opher Etzion and Peter Niblett. Event Processing in Action. Manning Publications Company, 2010.
[Reisig, 1985] Wolfgang Reisig. Petri Nets: An Introduction, volume 4 of Monographs in Theoretical Computer Science. An EATCS Series. Springer, 1985.
[Wu et al., 2006] Eugene Wu, Yanlei Diao, and Shariq Rizvi. High-performance complex event processing over streams. In SIGMOD ’06: Proceedings of the 2006 ACM SIGMOD international conference on Management of data, pages 407–418, New York, NY, USA, 2006. ACM.