BIRTE Panel at VLDB: Are we solving the core Problems in stream processing?


This presentation highlights recent developments in the Apache Flink community and recent related research publications. The presentation was given at the panel discussion of the BIRTE workshop at VLDB 2018 in Rio de Janeiro. More information can be found at http://db.cs.pitt.edu/birte2018/

Panel Title: Are we making any attempts towards solving the hardest problems in stream processing today?

Panel Abstract: Most of today’s Internet applications are data-centric and generate vast amounts of data that need to be processed and analyzed for detailed reporting, enhancing user experience, and increasing monetization. Streaming data processing systems must be designed based on a varying set of requirements. The requirements can be categorized by different properties of such systems:

1. Consistency: Does every record in the input (or, equivalently, every input event) need to be committed exactly-once, at-least-once, or at-most-once to the output? Is the event committed atomically or eventually to all outputs?
2. Scale: How many events per second can the system process? Tens of events per second? Thousands? Millions? Billions or even more? Does the system auto-scale to a new workload?
3. Failure Resilience: What kind of failures is the system resilient to? Machine-level, partial datacenter-level, or entire datacenter-level? Is it enough to ensure that the data processing system itself is failure-resistant? Does the output need to be stored in a globally consistent way? Is the system resilient to a bug in the input data, a bug in the user’s business logic, etc.?
4. Latency: How long does every event take from the time it is generated to the time it is committed? Milliseconds, seconds, minutes, hours, or days? Should we target SLOs for median latencies, the 90th percentile, or higher tail latencies?
5. Expressiveness: What kind of operations can the user express in the system? From simple stateless operations (e.g. filter) to complex joins or stateful operations (e.g. a HAVING clause in SQL)? How flexible is the system when adding more input sources and output sinks?
6. Cost: This includes not only hardware cost (CPU, RAM, disk, network, etc.) but also engineering design complexity, the cost of production support to run as a service, and providing SLOs for latency / completeness, etc. From a pure business perspective, all this cost needs to be justified by the value the end user gets.
7. Service: Does the system run as a service for the users? Is it multi-tenant? What kind of isolation (e.g. performance, security, etc.) is provided among users? How is business logic isolated from infrastructure? How easy is it for users to modify business logic in a self-service way?

Lots of systems provide a lambda architecture: use stream processing for best-effort (approximate) analysis, and use batch processing (e.g. daily) for strong consistency, high reliability, etc. This represents an easy way out. But is it the right thing to do? [...]
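The expressiveness item in the abstract contrasts stateless operations (filter) with stateful ones (a HAVING-style condition over an aggregate). The following is a minimal sketch of that difference in plain Python, not the API of any streaming engine. The `Event` record shape and the function names `stateless_filter` and `having_sum_at_least` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Set, Tuple

# Hypothetical record shape for illustration: (key, value).
Event = Tuple[str, int]


def stateless_filter(events: Iterable[Event], min_value: int) -> Iterator[Event]:
    """Stateless: each event is judged on its own; nothing is remembered."""
    for key, value in events:
        if value >= min_value:
            yield key, value


def having_sum_at_least(events: Iterable[Event], threshold: int) -> Iterator[Event]:
    """Stateful: keeps a running sum per key and emits a key the first time
    its sum reaches the threshold (in the spirit of
    GROUP BY key HAVING SUM(value) >= threshold)."""
    sums: Dict[str, int] = defaultdict(int)  # operator state, grows with the key space
    emitted: Set[str] = set()                # keys already reported
    for key, value in events:
        sums[key] += value
        if sums[key] >= threshold and key not in emitted:
            emitted.add(key)
            yield key, sums[key]


events = [("a", 1), ("b", 5), ("a", 4), ("b", 1), ("c", 2)]
filtered = list(stateless_filter(events, 4))
qualified = list(having_sum_at_least(events, 5))
```

The stateful variant has to keep `sums` and `emitted` across events, which is exactly the state a system must persist and restore on failure; the filter has nothing to recover.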


1 Jonas Traub, BIRTE @ VLDB, 2018
Are we solving the core problems
in stream processing?
Jonas Traub
Technische Universität Berlin / DFKI IAM
www.dima.tu-berlin.de | jonas.traub@tu-berlin.de
Panel Discussion with:
Manpreet Singh (Google)
Karthik Ramasamy (Streamlio)
C. Mohan (IBM)
Badrish Chandramouli (Microsoft)
Neng Lu (Twitter)
Alok Pareek (Striim)
Jonas Traub (TU-Berlin)

3 Jonas Traub, BIRTE @ VLDB, 2018
Are we solving the core problems
in stream processing?
Yes, we do!

4 Jonas Traub, BIRTE @ VLDB, 2018
Are we solving the core problems
in stream processing?
Yes, we do!
Apache Flink and
its success story
What are the core problems
and how are we solving them?
Examples

5 Jonas Traub, BIRTE @ VLDB, 2018
Apache Flink Timeline

7 Jonas Traub, BIRTE @ VLDB, 2018
Apache Flink - Stateful Computations over Data Streams
source: flink.apache.org
• Event-driven Applications
• Stream & Batch Analytics
• Data Pipelines & ETL
• Exactly-once state consistency
• Event-time processing
• Sophisticated late data handling
• Scale-out architecture
• Support for very large state
• Incremental checkpointing

8 Jonas Traub, BIRTE @ VLDB, 2018
Examples:
What are core problems and
how are we solving them?

9 Jonas Traub, BIRTE @ VLDB, 2018
Examples:
Expressiveness: Event-time processing and sophisticated late data handling
The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing (Akidau et al.)
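Event-time processing with late data handling can be illustrated with a small plain-Python sketch. It is not Flink or Beam code and not the full Dataflow Model. It assumes tumbling windows, a watermark that trails the highest timestamp seen by a fixed `out_of_orderness`, and an `allowed_lateness` during which an already-fired window still accepts updates. The class name `TumblingEventTimeSum` and all parameter names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Result = Tuple[int, int, str]  # (window_start, sum, "on-time" | "late-update")


class TumblingEventTimeSum:
    """Sums values per tumbling event-time window (illustrative sketch only)."""

    def __init__(self, size: int, out_of_orderness: int, allowed_lateness: int) -> None:
        self.size = size
        self.out_of_orderness = out_of_orderness
        self.allowed_lateness = allowed_lateness
        self.watermark: float = float("-inf")
        self.sums: Dict[int, int] = {}          # window start -> running sum
        self.fired: Set[int] = set()            # windows that already emitted once
        self.dropped: List[Tuple[int, int]] = []  # events that arrived too late

    def process(self, ts: int, value: int) -> List[Result]:
        out: List[Result] = []
        start = ts - ts % self.size
        end = start + self.size
        # Too late: the window's state has already been purged.
        if end + self.allowed_lateness <= self.watermark:
            self.dropped.append((ts, value))
            return out
        self.sums[start] = self.sums.get(start, 0) + value
        # Late but within allowed lateness: emit a corrected result.
        if start in self.fired:
            out.append((start, self.sums[start], "late-update"))
        # Advance the watermark; it never moves backwards.
        self.watermark = max(self.watermark, ts - self.out_of_orderness)
        # Fire every window whose end the watermark has passed.
        for s in sorted(self.sums):
            if s not in self.fired and s + self.size <= self.watermark:
                self.fired.add(s)
                out.append((s, self.sums[s], "on-time"))
        # Purge state for windows past their allowed lateness.
        expired = [s for s in self.sums
                   if s + self.size + self.allowed_lateness <= self.watermark]
        for s in expired:
            del self.sums[s]
            self.fired.discard(s)
        return out


op = TumblingEventTimeSum(size=10, out_of_orderness=2, allowed_lateness=5)
trace = [op.process(ts, v) for ts, v in [(1, 1), (4, 2), (12, 10), (7, 4), (30, 1), (3, 9)]]
```

In this trace, the event with timestamp 7 arrives after window [0, 10) has fired but within its allowed lateness, so the sketch emits a corrected sum; the event with timestamp 3 arrives after the window state was purged and is recorded in `dropped`.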

10 Jonas Traub, BIRTE @ VLDB, 2018
Examples:
Expressiveness: Event-time processing and sophisticated late data handling
Service: Common APIs and feature sets
The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing (Akidau et al.)
Apache Beam: An advanced unified programming model
“Implement batch and streaming data processing jobs that run on any execution engine.” (beam.apache.org)

11 Jonas Traub, BIRTE @ VLDB, 2018
Examples:
Consistency: Exactly-once state consistency
Expressiveness: Event-time processing and sophisticated late data handling
Service: Common APIs and feature sets
Lightweight asynchronous snapshots for distributed dataflows
P Carbone, G Fóra, S Ewen, S Haridi, K Tzoumas
State management in Apache Flink: consistent stateful distributed stream processing
P Carbone, S Ewen, G Fóra, S Haridi, S Richter, K Tzoumas
The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing (Akidau et al.)
Apache Beam: An advanced unified programming model
“Implement batch and streaming data processing jobs that run on any execution engine.” (beam.apache.org)
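The idea behind exactly-once state consistency can be sketched with a single operator that checkpoints its state together with its input position and replays from there after a failure. This is a deliberately simplified, single-process analogue and not the asynchronous barrier snapshotting algorithm from the papers listed above. It assumes a replayable input log, and the names `CountingJob` and `run` are hypothetical.

```python
import copy
from typing import Dict, List, Optional, Tuple


class CountingJob:
    """Counts keys from a replayable log; state and offset are snapshotted together."""

    def __init__(self, log: List[str]) -> None:
        self.log = log                      # assumed durable and replayable
        self.offset = 0                     # next log position to read
        self.counts: Dict[str, int] = {}    # operator state
        self.checkpoint: Tuple[int, Dict[str, int]] = (0, {})

    def step(self) -> bool:
        """Process one record; return False when the log is exhausted."""
        if self.offset >= len(self.log):
            return False
        key = self.log[self.offset]
        self.counts[key] = self.counts.get(key, 0) + 1
        self.offset += 1
        return True

    def take_checkpoint(self) -> None:
        # Offset and state are captured as one consistent unit.
        self.checkpoint = (self.offset, copy.deepcopy(self.counts))

    def recover(self) -> None:
        # Roll back both state and input position, then replay from there.
        offset, counts = self.checkpoint
        self.offset = offset
        self.counts = copy.deepcopy(counts)


def run(log: List[str], checkpoint_every: int, fail_at: Optional[int] = None) -> Dict[str, int]:
    """Run the job; optionally simulate one failure after `fail_at` processed records."""
    job = CountingJob(log)
    processed = 0
    failed = False
    while True:
        if fail_at is not None and not failed and processed == fail_at:
            failed = True
            job.recover()                   # lose uncheckpointed work, restore snapshot
            continue
        if not job.step():
            break
        processed += 1
        if job.offset % checkpoint_every == 0:
            job.take_checkpoint()
    return job.counts


log = ["a", "b", "a", "c", "a", "b"]
without_failure = run(log, checkpoint_every=2)
with_failure = run(log, checkpoint_every=2, fail_at=3)
```

Because the offset and the counts are restored together, the record processed between the last checkpoint and the failure is replayed exactly once into the restored state, so the final counts match the failure-free run. Records may still be processed more than once; it is the effect on state that is exactly-once.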
