DEBS 2011 tutorial on non-functional properties of event processing
1. Non-Functional Properties of Event Processing. Presenters: Opher Etzion and Tali Yatzkar-Haham. Also participated in the preparation: Ella Rabinovich and Inna Skarbovsky.
3. The variety. There is a variety of cheesecakes; likewise, there are many systems that conceptually look like an event processing network (EPN) but differ in their non-functional properties.
4. Two examples. Very large network management: millions of events every minute; very few are significant, and the same event is repeated. Time windows are very short. Patient monitoring according to a medical treatment protocol: sporadic events, but each is meaningful; time windows can span weeks. Both can be implemented by event processing, but very differently.
5. Agenda: I. Introduction to non-functional properties of event processing; II. Performance and scalability considerations; III. Availability considerations; IV. Usability considerations; V. Security and privacy considerations; VI. Summary.
7. Performance benchmarks. There is large variance among applications; thus a collection of benchmarks should be devised, and each application should be classified to a benchmark. Some classification criteria: application complexity, filtering rate, required performance metrics.
8. Performance benchmarks – cont. Adi A., Etzion O. Amit – the situation manager. The VLDB Journal – The International Journal on Very Large Databases, Volume 13, Issue 2, 2004. Mendes M., Bizarro P., Marques P. Benchmarking event processing systems: current state and future directions. WOSP/SIPEW 2010: 259-260. Previous studies indicate that there is major performance degradation as application complexity increases.
10. Performance indicators – one of the sources of variety. Observations: the same system exhibits extremely different behavior depending on the type of functions employed; different applications may require different metrics.
11. Throughput. Input throughput measures the number of input events that the system can digest within a given time interval. Processing throughput measures total processing time / number of events processed within a given time interval. Output throughput measures the number of events that were emitted to consumers within a given time interval.
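The input and output measures above can be sketched as simple counts over a time interval; the event timestamps below are hypothetical, purely for illustration.

```python
from datetime import datetime, timedelta

def throughput(timestamps, window):
    """Count events whose timestamps fall within the given half-open interval."""
    start, end = window
    return sum(1 for t in timestamps if start <= t < end)

# Hypothetical event logs: arrival times at the input, emission times at the output
t0 = datetime(2011, 7, 11, 12, 0)
inputs = [t0 + timedelta(seconds=s) for s in (1, 2, 3, 40, 70)]
outputs = [t0 + timedelta(seconds=s) for s in (5, 45)]

window = (t0, t0 + timedelta(minutes=1))
print(throughput(inputs, window))   # input throughput over the one-minute interval: 4
print(throughput(outputs, window))  # output throughput over the same interval: 2
```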
12. Latency. At the end-to-end (E2E) level, latency is defined as the elapsed time from the time-point when the producer emits an input event to the time-point when the consumer receives an output event. But an input event may not result in an output event: it may be filtered out, participate in a pattern without resulting in pattern detection, or participate in a deferred operation (e.g., aggregation). Similar definitions apply at the EPA level or the path level.
13. Latency definition – two variations. Example: an EPA detects the sequence (E1, E2, E3) within a sliding window of one hour, with events arriving from three producers. [Timeline diagram: E1, E2, E3 instances arriving between 11:00 and 12:00.] Variation I: we measure the latency of E3 only. Variation II: we measure the latency of each event; for events that do not directly create derived events, we measure the time until the system finishes processing them.
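Variation I can be sketched as follows: latency is computed only for events whose arrival directly triggered an output (here, E3 completing the sequence). The event names and times are hypothetical.

```python
# E2E latency (Variation I): measured only for events whose arrival directly
# triggers a derived output event, e.g. E3 completing the sequence (E1, E2, E3).
def e2e_latency(emit_times, receive_times):
    # elapsed time from producer emission to consumer reception, per event
    return {eid: receive_times[eid] - emit_times[eid] for eid in receive_times}

emit = {"E1": 0, "E2": 3, "E3": 7}   # hypothetical producer emission times (seconds)
recv = {"E3": 9}                     # only E3 caused an output event at the consumer
print(e2e_latency(emit, recv))       # {'E3': 2}
```

Under Variation II, E1 and E2 would also be measured, up to the point where the system finished processing them.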
16. Scalability. Scalability is the ability of a system to handle growing amounts of work gracefully, or its ability to be enlarged effortlessly and transparently to accommodate this growth. Scale up (vertical scalability): adding resources within the same logical unit to increase capacity. Scale out (horizontal scalability): adding additional logical units to increase processing power.
21. Scalability in event processing: various dimensions. Number of producers; number of input events; number of EPA types; number of concurrent runtime instances; number of concurrent runtime contexts; internal state size; number of consumers; number of derived events; processing complexity; number of geographical locations.
26. Scalability in the number of producers/consumers. Growth in the number of producers usually results in growth in event load, even if the number of events produced by each one is small. Growth in the number of consumers requires optimization at the routing level, such as multicasting.
38. Usability 101. Definition by Jakob Nielsen* (*http://www.useit.com/alertbox/20030825.html). Learnability: how easy is it for users to accomplish basic tasks the first time they encounter the system? Efficiency: once users have learned the system, how quickly can they perform tasks? Memorability: when users return after a period of not using the system, how easily can they reestablish proficiency? Errors: how many errors do users make, how severe are these errors, and how easily can they recover from them? Satisfaction: how pleasant is it to use the system? Utility: does the system do what the user intended?
39. In this part of the tutorial we'll talk about: build-time IDEs; runtime control and audit tools; correctness – internal consistency, debug and validation; consistency with the environment – transactional behavior.
40. Build-time interfaces: text-based programming languages, visual languages, form-based languages, natural language interfaces.
47. Run-time tools. Two types of run-time tools: monitoring the application, and monitoring the event processing system itself. Examples: performance monitoring, dashboards, audit and provenance.
52. Provenance and audit. Tracking all consequences of an event; tracking the reasons that something happened. Within the event processing system: derivation of events, routing of events, and actions triggered by the events.
63. Correctness. The ability of a developer to create a correct implementation for all cases (including the boundaries). Observation: a substantial amount of effort is invested today, in many of the tools, to work around the inability of the language to easily express correct solutions.
64. Some correctness topics: the right interpretation of language constructs; the right order of events; the right classification of events to windows.
65. The right interpretation of language constructs – example. All (E1, E2) – what do we mean? Example patterns: a customer both sells and buys the same security in value of more than $1M within a single day; deal fulfillment: package arrival and payment arrival. [Timeline diagram: events at 6/3 10:00, 7/3 11:00, 8/3 11:00, 8/3 14:00.]
66. Fine-tuning of the semantics (I). When should the derived event be emitted? When the pattern is matched? At the window end?
67. Fine-tuning of the semantics (II). How many instances of derived events should be emitted? Only once? Every time there is a match?
68. Fine-tuning of the semantics (III). What happens if the same event happens several times? Does only one participate – the first, the last, the one with the highest/lowest value on some predicate? Or do all of them participate in a match?
69. Fine-tuning of the semantics (IV). Can we consume or reuse events that participated in a match?
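The consumption and cardinality questions above can be made concrete with a small sketch: a detector for All (E1, E2) pairing with a configurable consumption policy. The event representation and function name are illustrative, not part of the tutorial.

```python
def detect_all(events, consume=True):
    """Detect pattern All(E1, E2) over a stream of (kind, payload) tuples.
    consume=True : each event participates in at most one match (consumption).
    consume=False: an event may be reused in every match it completes (reuse)."""
    stored = {"E1": [], "E2": []}
    other = {"E1": "E2", "E2": "E1"}
    matches = []
    for ev in events:
        kind, _payload = ev
        partners = stored[other[kind]]
        if consume:
            if partners:
                matches.append((partners.pop(0), ev))  # pair and consume partner
            else:
                stored[kind].append(ev)
        else:
            matches.extend((p, ev) for p in partners)  # pair with every partner
            stored[kind].append(ev)
    return matches

evts = [("E1", "a"), ("E1", "b"), ("E2", "c")]
print(len(detect_all(evts, consume=True)))   # 1: E2 pairs with one E1 only
print(len(detect_all(evts, consume=False)))  # 2: E2 pairs with both E1 instances
```

The same two-line change in policy yields different outputs for the same input, which is exactly why these semantic choices must be pinned down.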
72. Ordering in a distributed environment – possible issues. Even if the occurrence time of an event is accurate, the event might arrive after some processing has already been done. If we use the occurrence time of an event as reported by the source, it might not be accurate, due to clock accuracy at the source. Most systems order events by detection time – but events may switch their order on the way.
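One common remedy for out-of-order arrival can be sketched as a reorder buffer: events are held back and released in occurrence-time order, under the assumption that no event arrives more than a fixed delay after it occurred. The tuple layout and the delay bound are illustrative.

```python
import heapq

def reorder(stream, max_delay):
    """Release events in occurrence-time order, assuming no event arrives
    more than max_delay time units after its occurrence time.
    stream: iterable of (arrival_time, occurrence_time, payload) tuples."""
    heap, out = [], []
    for arrival, occurrence, payload in stream:
        heapq.heappush(heap, (occurrence, payload))
        # everything that occurred at or before (arrival - max_delay) is safe to emit
        while heap and heap[0][0] <= arrival - max_delay:
            out.append(heapq.heappop(heap))
    while heap:  # flush the buffer at end of stream
        out.append(heapq.heappop(heap))
    return out

stream = [(1, 1, "a"), (2, 0, "b"), (10, 3, "c")]  # "b" arrives late
print(reorder(stream, max_delay=2))  # [(0, 'b'), (1, 'a'), (3, 'c')]
```

The price of this scheme is added latency (up to max_delay) and buffer state, which ties back to the performance considerations earlier in the tutorial.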
73. Clock accuracy at the source. Clock synchronization; time server, example: http://tf.nist.gov/service/its.htm
76. Classification to windows – scenario. Calculate statistics for each player (aggregate per quarter); calculate statistics for each team (aggregate per quarter). Window classification: player statistics are calculated at the end of each quarter; team statistics are calculated at the end of each quarter based on the player events that arrived within the same quarter. All instances of player statistics that occur within a quarter window must be classified to the same window, even if they are derived after the window terminates.
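The classification rule above can be sketched as: a derived player-statistics event is assigned to the quarter in which it occurred, not the (possibly later) time at which it was derived. The 15-minute quarter length and the event tuples are hypothetical.

```python
QUARTER_MINUTES = 15  # hypothetical quarter length

def quarter_of(occurrence_minute):
    # classify by occurrence time, never by derivation time
    return occurrence_minute // QUARTER_MINUTES

# Hypothetical derived player-statistics events: (occurrence_minute, derivation_minute).
# Each is derived just AFTER its quarter window has terminated.
derived_player_stats = [(14, 15.5), (29, 30.2)]
print([quarter_of(occ) for occ, _ in derived_player_stats])  # [0, 1]
```

Classifying by derivation time instead would wrongly push both events into the following quarter's team aggregation.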
78. Transactional behavior in event processing? Typically, event processing systems have a decoupled architecture and do not exhibit transactional behavior. However, in several cases event processing is embedded within a transactional environment.
79. Case I: transactional ECA at the consumer side. When a derived event is emitted to a consumer, there is an ECA rule, with several actions, that is required to run as an atomic unit. If it fails, the derived event should be withdrawn.
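Case I can be sketched with a simple undo-based scheme (a toy model, not a real transaction manager): the rule's actions run as an atomic unit, and on failure the completed actions are rolled back and the triggering derived event withdrawn. All names here are illustrative.

```python
def run_eca_rule(actions, withdraw):
    """Run an ECA rule's actions as an atomic unit.
    actions : list of (do, undo) callables.
    withdraw: callable that withdraws the derived event that triggered the rule."""
    done = []
    try:
        for do, undo in actions:
            do()
            done.append(undo)
    except Exception:
        for undo in reversed(done):  # roll back actions that already completed
            undo()
        withdraw()                   # the derived event is withdrawn on failure
        return False
    return True
```

For example, if the second of two actions raises, the first action's undo runs and the withdraw callback fires, leaving the consumer in its pre-rule state.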
80. Case II: an event processing system monitors a transactional system. In this case, the producer may emit events that are not yet confirmed and may be rolled back.
82. Case IV: a path in the event processing network should act as a unit of work. Example: if the "determine winner" step fails and the bid is cancelled, all bid events are not kept in the event stores and are withdrawn from other processing purposes.
85. Security, privacy and trust. Security requirements ensure that operations are performed only by authorized parties, and that privacy considerations are met. Characteristics of a secure application (based on Enhancing the Development Life Cycle to Produce Secure Software [DHS/DACS 08]): Trustworthiness – containing no malicious logic that causes it to behave in a malicious manner. Survivability – recovering as quickly as possible, with as little damage as possible, from attacks. Dependability – executing predictably and operating correctly under all conditions, including hostile conditions.
91. Summary. Non-functional properties determine the nature of event processing applications – distribution, availability, optimization, correctness and security are some of the dimensions. They are often the main decision factor in selecting whether to use an event processing system, and in the selection among various alternatives.
Editor's notes
For example, scalability can refer to the capability of a system to increase total throughput under an increased load when resources (typically hardware) are added, and to be upgraded easily and transparently without shutting the system down.
The Actor model is a concurrent computation model that treats "actors" as the universal primitives of concurrent computation: in response to a message that it receives, an actor can make local decisions, create more actors, send more messages, and determine how to respond to the next message received.
The Master/Worker pattern consists of two logical entities: a Master and one or more instances of a Worker. The Master initiates the computation by creating a set of tasks, puts them in some shared space, and then waits for the tasks to be picked up and completed by the Workers.

A Shared-Nothing (SN) system typically partitions its data among many nodes on different databases (assigning different computers to deal with different users or queries), or may require every node to maintain its own copy of the application's data, using some kind of coordination protocol. This is often referred to as data sharding. One of the approaches to achieving an SN architecture for stateful applications (which typically maintain state in a centralized database) is the use of a data grid, also known as distributed caching.

Space-Based Architecture (SBA) is a software architecture pattern for achieving linear scalability of stateful, high-performance applications using the tuple space paradigm. Applications are built out of a set of self-sufficient units, known as processing units (PUs). These units are independent of each other, so the application can scale by adding more units. Services are packaged into PUs based on their runtime dependencies, to reduce network chattiness and the number of moving parts: 1. scaling out by spreading application bundles across the set of available machines; 2. scaling up by running multiple threads in each bundle.

MapReduce is a framework for processing huge datasets for certain kinds of distributable problems using a large number of nodes in a cluster. Computational processing can occur on data stored either in a filesystem (unstructured) or within a database (structured). "Map" step: the master node takes the input, partitions it into smaller sub-problems, and distributes those to worker nodes. A worker node may do this again in turn, leading to a multi-level tree structure. The worker node processes the smaller problem and passes the answer back to its master node. "Reduce" step: the master node then takes the answers to all the sub-problems and combines them in some way to get the output – the answer to the problem it was originally trying to solve.
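The map and reduce steps described above can be sketched as a minimal single-process word count; a real framework distributes the same structure across worker nodes.

```python
from collections import defaultdict

def map_reduce(records, mapper, reducer):
    """Minimal single-process MapReduce sketch: the 'map' step emits
    (key, value) pairs per record, a shuffle groups the values by key,
    and the 'reduce' step combines each group into the final answer."""
    groups = defaultdict(list)
    for record in records:                 # map + shuffle
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}

# Classic example: word count over lines of text.
counts = map_reduce(
    ["a b", "b c b"],
    mapper=lambda line: [(w, 1) for w in line.split()],
    reducer=lambda word, ones: sum(ones),
)
print(counts)  # {'a': 1, 'b': 3, 'c': 1}
```

In a distributed setting, the records list would be partitioned across worker nodes, and the shuffle would move each key's values to the node running its reducer.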
Increased management complexity – you have to deal with partial failure and consistency. Issues such as throughput and latency between nodes – network traffic costs, serialization/deserialization.
Scaling out: spreading application modules (services with runtime dependencies) across a set of available machines. Load partitioning and load balancing between the application modules. Using a distributed cache for stateful applications.
Care should be taken when referring to a large event volume (max input throughput metric). Some systems might filter out a large percentage of events before they hit the "heavy" processing layer. The complexity of the computation should also be taken into account.
Growth in the number of context partitions leads to growth in the overall internal state of the system.
Redundancy. Failover – automatic reconfiguration of the system, ensuring continuation of service after failure of one or more of its components. Load balancing is one of the players in implementing failover: components are monitored continuously ("heartbeat monitoring"); when one fails, the load balancer no longer sends traffic to it and instead sends it to another component; when the initial component comes back online, the load balancer begins to route traffic back.
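The heartbeat scheme can be sketched as follows; this is a toy model with injectable clock values, and the class and method names are illustrative.

```python
import time

class LoadBalancer:
    """Sketch of heartbeat-based failover: components report heartbeats,
    and the balancer routes only to components heard from recently."""

    def __init__(self, timeout):
        self.timeout = timeout   # max silence before a component is presumed dead
        self.last_beat = {}      # component name -> time of last heartbeat

    def heartbeat(self, component, now=None):
        # record a heartbeat; 'now' is injectable for testing
        self.last_beat[component] = time.monotonic() if now is None else now

    def alive(self, now=None):
        # components eligible to receive traffic right now
        now = time.monotonic() if now is None else now
        return [c for c, t in self.last_beat.items() if now - t <= self.timeout]

lb = LoadBalancer(timeout=5)
lb.heartbeat("A", now=0)
lb.heartbeat("B", now=0)
lb.heartbeat("A", now=10)       # B falls silent, A keeps beating
print(lb.alive(now=12))         # ['A']
```

When B resumes heartbeating, it reappears in alive() and the balancer routes traffic back to it, matching the recovery behavior described above.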
After detection of a failure, and possibly reconfiguration/resolution of the fault, the effects of errors must be eliminated. Normally the system operation is backed up to some point in its processing that preceded the fault detection, and operation recommences from this point. This form of recovery, often called rollback, usually entails strategies using backup files, checkpointing, and journaling. In an in-memory DB the implementation of the persistence layer is more complex – we need to decide how to sync with the DB (write-through? periodically?), and how and when to load data on cache misses, etc. Many commercial solutions now exist for in-memory DBs with caching.