SlideShare una empresa de Scribd logo
1 de 72
Will it Scale?
The Secrets behind Scaling Stream Processing
Applications
Navina Ramesh
Software Engineer, LinkedIn
Apache Samza, Committer & PMC
navina@apache.org
What is this talk about ?
● Understand the architectural choices in stream processing systems that may
impact performance/scalability of stream processing applications
● Have a high level comparison of two streaming engines (Flink/Samza) with a
focus on scalability of the stream-processing application
What this talk is not about ?
● Not a feature-by-feature comparison of existing stream processing systems
(such as Flink, Storm, Samza etc)
Agenda
● Use cases in Stream Processing
● Typical Data Pipelines
● Scaling Data Ingestion
● Scaling Data Processing
○ Challenges in Scaling Data Processing
○ Walk-through of Apache Flink & Apache Samza
○ Observations on state & fault-tolerance
● Challenges in Scaling Result Storage
● Conclusion
0 ms
RPC
Stream Processing
Synchronous
Milliseconds to minutes
Later. Typically, hours
Response Latency
Batch Processing
Spectrum of Processing
Newsfeed
Cyber Security
Selective Push Notifications
Agenda
● Use cases in Stream Processing
●Typical Data Pipelines
● Scaling Data Ingestion
● Scaling Data Processing
○ Challenges in Scaling Data Processing
○ Walk-through of Apache Flink & Apache Samza
○ Observations on state & fault-tolerance
● Challenges in Scaling Result Storage
● Conclusion
Typical Data Pipeline - Batch
Ingestion
Service
HDFS
Mappers Reducers
HDFS/
HBase
Query
Typical Data Pipeline
Ingestion
Service
HDFS
Mappers Reducers
HDFS/
HBase
Data
Ingestion
Query
Typical Data Pipeline - Batch
Ingestion
Service
HDFS
Mappers Reducers
HDFS/
HBase
Data
Ingestion
Data
Processing
Query
Typical Data Pipeline - Batch
Ingestion
Service
HDFS
Mappers Reducers
HDFS/
HBase
Data
Ingestion
Data
Processing
Result Storage /
Serving
Query
Typical Data Pipeline - Batch
Parallels in Streaming
Ingestion
Service
HDFS
Mappers Reducers
HDFS/
HBase
Processors Processors
HDFS
KV
Store
Partition 0
Partition 1
Partition N
...
Data
Ingestion
Data
Processing
Result Storage /
Serving
Query
Query
Ingestion
Service
HDFS
Mappers Reducers
HDFS/
HBase
Processors Processors
HDFS
KV
Store
Partition 0
Partition 1
Partition N
...
Data
Ingestion
Data
Processing
Result Storage /
Serving
Query
Query
Parallels in Streaming
Batch Streaming
● Data Processing on bounded data
● Acceptable Latency - order of hours
● Processing occurs at regular intervals
● Throughput trumps latency
● Horizontal scaling to improve processing
time
● Data processing on unbounded data
● Low latency - order of sub-seconds
● Processing is continuous
● Horizontal scaling is not straightforward
(stateful applications)
● Need tools to reason about time (esp.
when re-processing stream)
Agenda
● Use cases in Stream Processing
● Typical Data Pipelines
●Scaling Data Ingestion
● Scaling Data Processing
○ Challenges in Scaling Data Processing
○ Walk-through of Apache Flink & Apache Samza
○ Observations on state & fault-tolerance
● Challenges in Scaling Result Storage
● Conclusion
Typical Data Ingestion
Producers
Partition 0
Partition 1
Partition 3
key=0
key=3
key=23
Stream A
Consumer
(host A)
Consumer
(host B)
Partition 2
- Typically, streams are
partitioned
- Messages sent to partitions
based on “Partition Key”
- Time-based message
retentionkey=10
Kafka Kinesis
Scaling Data Ingestion
Producers
Partition 0
Partition 1
Partition 3
Stream A
Consumer
(host A)
Consumer
(host B)
Partition 2
- Scaling “up” -> Increasing
partitions
- Changing partitioning logic
re-distributes* the keys
across the partitions
Partition 4
key=0
key=10
key=23
key=3
Kafka Kinesis
Scaling Data Ingestion
Producers
Partition 0
Partition 1
Partition 3
Stream A
Consumer
(host A)
Consumer
(host B)
Partition 2
- Scaling “up” -> Increasing
partitions
- Changing partitioning logic
re-distributes* the keys
across the partitions
- Consuming clients (includes
stream processors) should be
able to re-adjust!
- Impact -> Over-provisioning
of partitions in order to handle
changes in load
Partition 4
key=0
key=10
key=23
key=3
Kafka Kinesis
Agenda
● Use cases in Stream Processing
● Typical Data Pipelines
● Scaling Data Ingestion
●Scaling Data Processing
○ Challenges in Scaling Data Processing
○ Walk-through of Apache Flink & Apache Samza
○ Observations on state & fault-tolerance
● Challenges in Scaling Result Storage
● Conclusion
Scaling Data Processing
● Increase number of processing units → Horizontal Scaling
Scaling Data Processing
● Increase number of processing units → Horizontal Scaling
But more machines means more $$$
● Impact NOT only CPU cores, but “large” (order of TBs) stateful applications
impact network and disk!!
Key Bottleneck in Scaling Data Processing
● Accessing State
○ Operator state
■ Read/Write state that is maintained during stream processing
■ Eg: windowed aggregation, windowed join
○ Adjunct state
■ To process events, applications might need to lookup related or ‘adjunct’ data.
Repartitioner Assembler
homepage_service_call
feed_service_call
profile_service_call
pymk_service_call
...
Homepage_service_call (tree id: 10)
| |
| Pymk_service_call (tree id: 10)
| |
| Profile_service_call (tree id: 10)
|
Feed_service_call (tree id: 10)
Stateful Process!
Service
Calls
Accessing Operator State: Assemble Call Graph
(Partition events by
“tree id”)
(Aggregate events
by “tree id”)
Repartitioner Assembler
homepage_service_call
feed_service_call
profile_service_call
pymk_service_call
...
Homepage_service_call (tree id: 10)
| |
| Pymk_service_call (tree id: 10)
| |
| Profile_service_call (tree id: 10)
|
Feed_service_call (tree id: 10)
In-Memory Mapping
Service
Calls
Accessing Operator State: Assemble Call Graph
- In-memory structure to aggregate events until ready to
output
- Concerns:
- Large windows can cause overflow!
- Restarting job after a long downtime can
increase memory pressure!
(Partition events by
“tree id”)
(Aggregate events
by “tree id”)
Repartitioner Assembler
homepage_service_call
feed_service_call
profile_service_call
pymk_service_call
...
Homepage_service_call (tree id: 10)
| |
| Pymk_service_call (tree id: 10)
| |
| Profile_service_call (tree id: 10)
|
Feed_service_call (tree id: 10)
Service
Calls
Accessing Operator State: Assemble Call Graph
Remote
KV Store
(operator state)
(Partition events by
“tree id”)
(Aggregate events
by “tree id”)
Concerns:
- Remote RPC is Slow!! (Stream: ~1 million
records/sec ; DB: ~3-4K writes/sec)
- Mutations can’t rollback!
- Task may fail & recover
- Change in logic!
Accessing Operator State: Push Notifications
B2
Online
Apps
Relevance
Score
User
Action Data
Task
(Generate active notifications -
filtering, windowed-aggregation,
external calls etc)
Notification System
(Scheduler)
Accessing Operator State: Push Notifications
B2
Online
Apps
Relevance
Score
User
Action Data
Task
(Generate active notifications -
filtering, windowed-aggregation,
external calls etc)
Notification System
(Scheduler)
- Stream processing tasks
consume from multiple sources
- offline/online
- Performs multiple operations
- Filters information and
buffers data for window of
time
- Aggregates / Joins
buffered data
- Total operator state per
instance can easily grow to
multiple GBs per Task
Accessing Adjunct Data: AdQuality Updates
Task
AdClicks AdQuality Update
Read Member Data
Member Info
Stream-to-Table Join
(Look-up memberId & generate
AdQuality improvements for the
User)
Accessing Adjunct Data: AdQuality Updates
Task
AdClicks AdQuality Update
Read Member Data
Member Info
Stream-to-Table Join
(Look-up memberId & generate
AdQuality improvements for the
User)
Concerns:
- Remote look-up Latency is
high!
- DDoS on shared store -
MemberInfo
Accessing Adjunct Data using Cache: AdQuality Updates
Task
AdClicks AdQuality Update
Read Member Data
Member Info
Stream-to-Table Join
(Maintain a cache of
member Info & do local
lookup)
Accessing Adjunct Data using Cache: AdQuality Updates
Task
AdClicks AdQuality Update
Read Member Data
Member Info
Stream-to-Table Join
(Maintain a cache of
member Info & do local
lookup)
Concerns:
- Overhead of maintaining cache
consistency based on the source of
truth (MemberInfo)
- Warming up the cache after the job’s
downtime can cause temporary spike
in QPS on the shared store
Agenda
● Use cases in Stream Processing
● Typical Data Pipelines
● Scaling Data Ingestion
● Scaling Data Processing
○ Challenges in Scaling Data Processing
○Walk-through of Apache Flink & Apache Samza
○ Observations on state & fault-tolerance
● Challenges in Scaling Result Storage
● Conclusion
Apache Flink
Apache Flink: Processing
● Dataflows with streams and
transformation operators
● Starts with one or more source and
ends in one or more sinks
Actor System
Scheduler
Checkpoint
Coordinator
Job Manager
Task
Slot
Task
Slot
Task Manager
Task
Slot
Actor System
Network Manager
Memory & I/O Manager
Task
Slot
Task
Slot
Task Manager
Task
Slot
Actor System
Network Manager
Memory & I/O Manager
Stream
Task
Slot
● JobManager (Master) coordinates
distributed execution such as,
checkpoint, recovery management,
schedule tasks etc.
● TaskManager (JVM Process)
execute the subtasks of the dataflow,
and buffer and exchange data
streams
● Each Task Slot may execute multiple
subtasks and runs on a separate
thread.
Apache Flink: Processing
Apache Flink: State Management
● Lightweight Asynchronous Barrier Snapshots
● Master triggers checkpoint and source inserts barrier
● On receiving barrier from all input sources, each operator stores the entire state, acks the
checkpoint to the master and emits snapshot barrier in the output
Apache Flink: State Management
Job
Manager
Task
Manager
HDFS
Snapshot Store
Task
Manager
Task
Manager
● Lightweight Asynchronous Barrier
Snapshots
● Periodically snapshot the entire state
to snapshot store
● Checkpoint mapping is stored in Job
Manager
● Snapshot Store (typically, HDFS)
○ operator state
(windows/aggregation)
○ user-defined state
(checkpointed)
Apache Flink: State Management
● Operator state is primarily
stored In-Memory or local File
System
● Recently added RocksDB
● Allows user-defined operators
to define state that should be
checkpointed
Job
Manager
Task
Manager
HDFS
Snapshot Store
Task
Manager
Task
Manager
Apache Flink: Fault Tolerance of State
Job
Manager
Task
Manager
Snapshot Store
Task
Manager
Task
Manager
Task Failure
Apache Flink: Fault Tolerance of State
Job
Manager
Task
Manager
HDFS
Task
Manager
Task
Manager
● Full restore of snapshot from last
completed checkpointed state
● Continues processing after restoring
from the latest snapshot from the
store
Full Restore
Apache Flink: Summary
● State Management Primitives:
○ Within task, local state info is stored primarily in-memory (recently, rocksdb)
○ Periodic snapshot (checkpoints + user-defined state + operator state) written to Snapshot
Store
● Fault-Tolerance of State
○ Full state restored from Snapshot Store
Apache Flink: Observations
● Full snapshots are expensive for large states
● Frequent snapshots that can quickly saturate network
● Applications must trade-off between snapshot frequency and how large a
state can be built within a task
Apache Samza
Apache Samza: Processing
● Samza Master handles container
life-cycle and failure handling
Samza
Master
Task Task
Container
Task Task
Container
Apache Samza: Processing
● Samza Master handles container life-
cycle and failure handling
● Each container (JVM process)
contains more than one task to
process the input stream partitions
Samza
Master
Task Task
Container
Task Task
Container
Apache Samza: State Management
● Tasks checkpoint periodically to a
checkpoint stream
● Checkpoint indicates which position
in the input from which processing
has to continue in case of a container
restart
Samza
Master
Task Task
Container
Task Task
Container
Checkpoint Stream
Apache Samza: State Management
● State store is local to the task -
typically RocksDB (off-heap) and In-
Memory (backed by a map)
● State store contains any operator
state or adjunct state
● Allows application to define state
through a Key Value interface
Samza
Master
Task Task
Container
Task Task
Container
Checkpoint Stream
Apache Samza: State Management
● State store is continuously replicated
to a changelog stream
● Each store partition is mapped to a
specific changelog partition
Samza
Master
Task Task
Container
Task Task
Container
Changelog Stream
Checkpoint Stream
Apache Samza: Fault Tolerance of State
Samza
Master
Task Task
Container
Task Task
Container
Checkpoint Stream
Changelog Stream
Container Failure
Machine A Machine B
Samza
Master
Task Task
Container
Task Task
Container
Checkpoint Stream
Changelog Stream
Re-allocated on
different host!
Machine A Machine X
● When container is recovered in a
different host, there is no state
available locally
Apache Samza: Fault Tolerance of State
Samza
Master
Task Task
Container
Task
Checkpoint Stream
Re-allocated on
different host!
Machine A Machine X
● When container comes up in a
different host, there is no state
available locally
● Restores from the beginning of the
changelog stream -> Full restore!
Task Task
Container
Apache Samza: Fault Tolerance of State
Samza
Master
Task Task
Container
Task Task
Container
Checkpoint Stream
Changelog Stream
Container Failure
● State store is persisted to local disk
on the machine, along with info on
which offset to begin restoring the
state from changelogMachine A Machine B
Apache Samza: Fault Tolerance of State
Samza
Master
Task Task
Container
Task Task
Container
Checkpoint Stream
Changelog Stream
Re-allocated on same host!
Machine A Machine B
● Samza Master tries to re-allocate the
container on the same host
● The feature where the Samza Master
attempts to co-locate the task with
their built-up state stores (where they
were previously running) is called
Host-affinity.
Apache Samza: Fault Tolerance of State
Samza
Master
Task Task
Container
Task Task
Container
Checkpoint Stream
Changelog Stream
Machine A Machine B
Re-allocated on same host!
● Samza Master tries to re-allocate the
container on the same host
● The feature where the Samza Master
attempts to co-locate the task with
their built-up state stores (where they
were previously running) is called
Host-affinity.
● If container is re-allocated on the
same host, state store is partially
restored from changelog stream
(delta restore)
Apache Samza: Fault Tolerance of State
Samza
AppMaster
Task Task
Container
Task Task
Container
Checkpoint Stream
Changelog Stream
● Once state is restored, checkpoint
stream contains the correct offset for
each task to begin processing
Machine A Machine B
Re-allocated on same host!
Apache Samza: Fault Tolerance of State
● Persisting state on local disk + host-
affinity effectively reduces the time-
to-recover state from failure (or)
upgrades and continue with
processing
Samza
AppMaster
Task Task
Container
Task Task
Container
Checkpoint Stream
Changelog Stream
Apache Samza: Fault Tolerance of State
● Persisting state on local disk + host-
affinity effectively reduces the time-
to-recover state from failure (or)
upgrades and continue with
processing
● Only a subset of tasks may require
full restore, thereby, reducing the
time to recover from failure or time to
restart processing upon upgrades!
Samza
AppMaster
Task Task
Container
Task Task
Container
Checkpoint Stream
Changelog Stream
Apache Samza: Fault Tolerance of State
Apache Samza: Summary
● State Management Primitives
○ Within task, data is stored in-memory or on-disk using RocksDB
○ Checkpoint state stored in checkpoint-stream
○ User-defined and operator state continuously replicated in a changelog stream
● Fault-Tolerance of State
○ Full state restored by consuming changelog stream, if user-defined state not persisted on
task’s machine
○ If locally persisted, only partial restore
Apache Samza: Observations
● State recovery from changelog can be time-consuming. It could potentially
saturate Kafka clusters. Hence, partial restore is necessary.
● Host-affinity allows for faster failure recovery of task states, and faster job
upgrades, even for large stateful jobs
● Since checkpoints are written to a stream and state is continuously replicated
in changelog, frequent checkpoints are possible.
Agenda
● Use cases in Stream Processing
● Typical Data Pipelines
● Scaling Data Ingestion
● Scaling Data Processing
○ Challenges in Scaling Data Processing
○ Walk-through of Apache Flink & Apache Samza
○Observations on state & fault-tolerance
● Challenges in Scaling Result Storage
● Conclusion
Comparison of State & Fault-tolerance
Apache Samza Apache Flink
Durable State
RocksDB FileSystem (Recently added,
RocksDB)
State Fault Tolerance Kafka based Changelog Stream HDFS
State Update Unit Delta Changes Full Snapshot
State Recovery Unit
Full Restore + Improved recovery
with host-affinity
Full Restore
Agenda
● Use cases in Stream Processing
● Typical Data Pipelines
● Scaling Data Ingestion
● Scaling Data Processing
○ Challenges in Scaling Data Processing
○ Walk-through of Apache Flink & Apache Samza
○ Observations on state & fault-tolerance
●Challenges in Scaling Result Storage
● Conclusion
Challenges in Scaling Result Storage / Serving
● Any fast KV store can handle very small (order of thousands) QPS compared
to the rate of stream processing output rate (order of millions)
● Output store can DoS due to high-throughput
Online +Async Processor
Challenges in Scaling Result Storage / Serving
Processor Downtime ~30 min
Processor Restarts
(Derived Data)
Online Apps
QueryServing
Platform
Stream
Processing
Scaling Result Storage / Serving
Offline
Processing
Distributed
Queue
(Kafka)
Bulk Load
Change Stream
Agenda
● Use cases in Stream Processing
● Typical Data Pipelines
● Scaling Data Ingestion
● Scaling Data Processing
○ Challenges in Scaling Data Processing
○ Walk-through of Apache Flink & Apache Samza
○ Observations on state & fault-tolerance
● Challenges in Scaling Result Storage
●Conclusion
Conclusion
● Ingest/Process/Serve should be wholistically scalable to successfully scale
stream processing applications
● The notion of a “locally” accessible state is great to scale stream processing
applications for performance. It brings in the additional cost of making the
state fault-tolerant
References
● Apache Samza - http://samza.apache.org
● Apache Flink - http://flink.apache.org
● https://ci.apache.org/projects/flink/flink-docs-master/internals/stream_checkpointing.html
● https://ci.apache.org/projects/flink/flink-docs-master/concepts/concepts.html
● http://www.slideshare.net/JamieGrier/stateful-stream-processing-at-inmemory-speed
● https://blog.acolyer.org/2015/08/19/asynchronous-distributed-snapshots-for-distributed-dataflows/
● Apache Kafka - http://kafka.apache.org
● http://docs.aws.amazon.com/streams/latest/dev/kinesis-using-sdk-java-resharding.html
● http://aws.amazon.com/streaming-data/
Contribute!
Exciting features coming up in Apache Samza:
● SAMZA-516 - Standalone deployment of Samza (independent of Yarn)
● SAMZA-390 - High-level Language for Samza
● SAMZA-863 - Multithreading in Samza
Join us!
● Apache Samza - samza.apache.org
● Samza Hello World! - https://samza.apache.org/startup/hello-samza/latest/
● Mailing List - dev@samza.apache.org
● JIRA - http://issues.apache.org/jira/browse/SAMZA
Thanks!

Más contenido relacionado

La actualidad más candente

Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streaming
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to StreamingBravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streaming
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to StreamingYaroslav Tkachenko
 
Large-Scale Stream Processing in the Hadoop Ecosystem
Large-Scale Stream Processing in the Hadoop EcosystemLarge-Scale Stream Processing in the Hadoop Ecosystem
Large-Scale Stream Processing in the Hadoop EcosystemGyula Fóra
 
HBaseCon2017 Highly-Available HBase
HBaseCon2017 Highly-Available HBaseHBaseCon2017 Highly-Available HBase
HBaseCon2017 Highly-Available HBaseHBaseCon
 
Webinar: Deep Dive on Apache Flink State - Seth Wiesman
Webinar: Deep Dive on Apache Flink State - Seth WiesmanWebinar: Deep Dive on Apache Flink State - Seth Wiesman
Webinar: Deep Dive on Apache Flink State - Seth WiesmanVerverica
 
hbaseconasia2017: Building online HBase cluster of Zhihu based on Kubernetes
hbaseconasia2017: Building online HBase cluster of Zhihu based on Kuberneteshbaseconasia2017: Building online HBase cluster of Zhihu based on Kubernetes
hbaseconasia2017: Building online HBase cluster of Zhihu based on KubernetesHBaseCon
 
Argus Production Monitoring at Salesforce
Argus Production Monitoring at SalesforceArgus Production Monitoring at Salesforce
Argus Production Monitoring at SalesforceHBaseCon
 
Rolling Out Apache HBase for Mobile Offerings at Visa
Rolling Out Apache HBase for Mobile Offerings at Visa Rolling Out Apache HBase for Mobile Offerings at Visa
Rolling Out Apache HBase for Mobile Offerings at Visa HBaseCon
 
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...confluent
 
ScyllaDB: What could you do with Cassandra compatibility at 1.8 million reque...
ScyllaDB: What could you do with Cassandra compatibility at 1.8 million reque...ScyllaDB: What could you do with Cassandra compatibility at 1.8 million reque...
ScyllaDB: What could you do with Cassandra compatibility at 1.8 million reque...Data Con LA
 
Essential ingredients for real time stream processing @Scale by Kartik pParam...
Essential ingredients for real time stream processing @Scale by Kartik pParam...Essential ingredients for real time stream processing @Scale by Kartik pParam...
Essential ingredients for real time stream processing @Scale by Kartik pParam...Big Data Spain
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixDataWorks Summit
 
Bellevue Big Data meetup: Dive Deep into Spark Streaming
Bellevue Big Data meetup: Dive Deep into Spark StreamingBellevue Big Data meetup: Dive Deep into Spark Streaming
Bellevue Big Data meetup: Dive Deep into Spark StreamingSantosh Sahoo
 
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...Reynold Xin
 
High cardinality time series search: A new level of scale - Data Day Texas 2016
High cardinality time series search: A new level of scale - Data Day Texas 2016High cardinality time series search: A new level of scale - Data Day Texas 2016
High cardinality time series search: A new level of scale - Data Day Texas 2016Eric Sammer
 
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and TransformIntro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and TransformApache Apex
 
Connecting kafka message systems with scylla
Connecting kafka message systems with scylla   Connecting kafka message systems with scylla
Connecting kafka message systems with scylla Maheedhar Gunturu
 
From Batch to Streaming with Apache Apex Dataworks Summit 2017
From Batch to Streaming with Apache Apex Dataworks Summit 2017From Batch to Streaming with Apache Apex Dataworks Summit 2017
From Batch to Streaming with Apache Apex Dataworks Summit 2017Apache Apex
 

La actualidad más candente (20)

Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streaming
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to StreamingBravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streaming
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streaming
 
Large-Scale Stream Processing in the Hadoop Ecosystem
Large-Scale Stream Processing in the Hadoop EcosystemLarge-Scale Stream Processing in the Hadoop Ecosystem
Large-Scale Stream Processing in the Hadoop Ecosystem
 
HBaseCon2017 Highly-Available HBase
HBaseCon2017 Highly-Available HBaseHBaseCon2017 Highly-Available HBase
HBaseCon2017 Highly-Available HBase
 
Webinar: Deep Dive on Apache Flink State - Seth Wiesman
Webinar: Deep Dive on Apache Flink State - Seth WiesmanWebinar: Deep Dive on Apache Flink State - Seth Wiesman
Webinar: Deep Dive on Apache Flink State - Seth Wiesman
 
hbaseconasia2017: Building online HBase cluster of Zhihu based on Kubernetes
hbaseconasia2017: Building online HBase cluster of Zhihu based on Kuberneteshbaseconasia2017: Building online HBase cluster of Zhihu based on Kubernetes
hbaseconasia2017: Building online HBase cluster of Zhihu based on Kubernetes
 
Argus Production Monitoring at Salesforce
Argus Production Monitoring at SalesforceArgus Production Monitoring at Salesforce
Argus Production Monitoring at Salesforce
 
Rolling Out Apache HBase for Mobile Offerings at Visa
Rolling Out Apache HBase for Mobile Offerings at Visa Rolling Out Apache HBase for Mobile Offerings at Visa
Rolling Out Apache HBase for Mobile Offerings at Visa
 
Data Pipeline at Tapad
Data Pipeline at TapadData Pipeline at Tapad
Data Pipeline at Tapad
 
Stream Processing made simple with Kafka
Stream Processing made simple with KafkaStream Processing made simple with Kafka
Stream Processing made simple with Kafka
 
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
 
ScyllaDB: What could you do with Cassandra compatibility at 1.8 million reque...
ScyllaDB: What could you do with Cassandra compatibility at 1.8 million reque...ScyllaDB: What could you do with Cassandra compatibility at 1.8 million reque...
ScyllaDB: What could you do with Cassandra compatibility at 1.8 million reque...
 
Essential ingredients for real time stream processing @Scale by Kartik pParam...
Essential ingredients for real time stream processing @Scale by Kartik pParam...Essential ingredients for real time stream processing @Scale by Kartik pParam...
Essential ingredients for real time stream processing @Scale by Kartik pParam...
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Bellevue Big Data meetup: Dive Deep into Spark Streaming
Bellevue Big Data meetup: Dive Deep into Spark StreamingBellevue Big Data meetup: Dive Deep into Spark Streaming
Bellevue Big Data meetup: Dive Deep into Spark Streaming
 
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
 
High cardinality time series search: A new level of scale - Data Day Texas 2016
High cardinality time series search: A new level of scale - Data Day Texas 2016High cardinality time series search: A new level of scale - Data Day Texas 2016
High cardinality time series search: A new level of scale - Data Day Texas 2016
 
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and TransformIntro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
 
Connecting kafka message systems with scylla
Connecting kafka message systems with scylla   Connecting kafka message systems with scylla
Connecting kafka message systems with scylla
 
From Batch to Streaming with Apache Apex Dataworks Summit 2017
From Batch to Streaming with Apache Apex Dataworks Summit 2017From Batch to Streaming with Apache Apex Dataworks Summit 2017
From Batch to Streaming with Apache Apex Dataworks Summit 2017
 
Data Pipeline with Kafka
Data Pipeline with KafkaData Pipeline with Kafka
Data Pipeline with Kafka
 

Destacado

Making sense of big data
Making sense of big dataMaking sense of big data
Making sense of big dataCharlie Hull
 
Dataflow - A Unified Model for Batch and Streaming Data Processing
Dataflow - A Unified Model for Batch and Streaming Data ProcessingDataflow - A Unified Model for Batch and Streaming Data Processing
Dataflow - A Unified Model for Batch and Streaming Data ProcessingDoiT International
 
Benchmarking Apache Samza: 1.2 million messages per sec per node
Benchmarking Apache Samza: 1.2 million messages per sec per nodeBenchmarking Apache Samza: 1.2 million messages per sec per node
Benchmarking Apache Samza: 1.2 million messages per sec per nodeTao Feng
 
Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...
Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...
Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...MLconf
 
High Performance Spatial-Temporal Trajectory Analysis with Spark
High Performance Spatial-Temporal Trajectory Analysis with Spark High Performance Spatial-Temporal Trajectory Analysis with Spark
High Performance Spatial-Temporal Trajectory Analysis with Spark DataWorks Summit/Hadoop Summit
 
Developing Realtime Data Pipelines With Apache Kafka
Developing Realtime Data Pipelines With Apache KafkaDeveloping Realtime Data Pipelines With Apache Kafka
Developing Realtime Data Pipelines With Apache KafkaJoe Stein
 
codecentric AG: CQRS and Event Sourcing Applications with Cassandra
codecentric AG: CQRS and Event Sourcing Applications with Cassandracodecentric AG: CQRS and Event Sourcing Applications with Cassandra
codecentric AG: CQRS and Event Sourcing Applications with CassandraDataStax Academy
 
Till Rohrmann - Dynamic Scaling - How Apache Flink adapts to changing workloads
Till Rohrmann - Dynamic Scaling - How Apache Flink adapts to changing workloadsTill Rohrmann - Dynamic Scaling - How Apache Flink adapts to changing workloads
Till Rohrmann - Dynamic Scaling - How Apache Flink adapts to changing workloadsFlink Forward
 
Latency-aware Elastic Scaling for Distributed Data Stream Processing Systems
Latency-aware Elastic Scaling for Distributed Data Stream Processing SystemsLatency-aware Elastic Scaling for Distributed Data Stream Processing Systems
Latency-aware Elastic Scaling for Distributed Data Stream Processing SystemsZbigniew Jerzak
 
Auto-scaling Techniques for Elastic Data Stream Processing
Auto-scaling Techniques for Elastic Data Stream ProcessingAuto-scaling Techniques for Elastic Data Stream Processing
Auto-scaling Techniques for Elastic Data Stream ProcessingZbigniew Jerzak
 
Adaptive Replication for Elastic Data Stream Processing
Adaptive Replication for Elastic Data Stream ProcessingAdaptive Replication for Elastic Data Stream Processing
Adaptive Replication for Elastic Data Stream ProcessingZbigniew Jerzak
 
AWS Cyber Security Best Practices
AWS Cyber Security Best PracticesAWS Cyber Security Best Practices
AWS Cyber Security Best PracticesDoiT International
 

Destacado (14)

Lambda-less Stream Processing @Scale in LinkedIn
Lambda-less Stream Processing @Scale in LinkedIn Lambda-less Stream Processing @Scale in LinkedIn
Lambda-less Stream Processing @Scale in LinkedIn
 
Making sense of big data
Making sense of big dataMaking sense of big data
Making sense of big data
 
Dataflow - A Unified Model for Batch and Streaming Data Processing
Dataflow - A Unified Model for Batch and Streaming Data ProcessingDataflow - A Unified Model for Batch and Streaming Data Processing
Dataflow - A Unified Model for Batch and Streaming Data Processing
 
Benchmarking Apache Samza: 1.2 million messages per sec per node
Benchmarking Apache Samza: 1.2 million messages per sec per nodeBenchmarking Apache Samza: 1.2 million messages per sec per node
Benchmarking Apache Samza: 1.2 million messages per sec per node
 
Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...
Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...
Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...
 
High Performance Spatial-Temporal Trajectory Analysis with Spark
High Performance Spatial-Temporal Trajectory Analysis with Spark High Performance Spatial-Temporal Trajectory Analysis with Spark
High Performance Spatial-Temporal Trajectory Analysis with Spark
 
Developing Realtime Data Pipelines With Apache Kafka
Developing Realtime Data Pipelines With Apache KafkaDeveloping Realtime Data Pipelines With Apache Kafka
Developing Realtime Data Pipelines With Apache Kafka
 
codecentric AG: CQRS and Event Sourcing Applications with Cassandra
codecentric AG: CQRS and Event Sourcing Applications with Cassandracodecentric AG: CQRS and Event Sourcing Applications with Cassandra
codecentric AG: CQRS and Event Sourcing Applications with Cassandra
 
Till Rohrmann - Dynamic Scaling - How Apache Flink adapts to changing workloads
Till Rohrmann - Dynamic Scaling - How Apache Flink adapts to changing workloadsTill Rohrmann - Dynamic Scaling - How Apache Flink adapts to changing workloads
Till Rohrmann - Dynamic Scaling - How Apache Flink adapts to changing workloads
 
Latency-aware Elastic Scaling for Distributed Data Stream Processing Systems
Latency-aware Elastic Scaling for Distributed Data Stream Processing SystemsLatency-aware Elastic Scaling for Distributed Data Stream Processing Systems
Latency-aware Elastic Scaling for Distributed Data Stream Processing Systems
 
Auto-scaling Techniques for Elastic Data Stream Processing
Auto-scaling Techniques for Elastic Data Stream ProcessingAuto-scaling Techniques for Elastic Data Stream Processing
Auto-scaling Techniques for Elastic Data Stream Processing
 
Adaptive Replication for Elastic Data Stream Processing
Adaptive Replication for Elastic Data Stream ProcessingAdaptive Replication for Elastic Data Stream Processing
Adaptive Replication for Elastic Data Stream Processing
 
Cluster Schedulers
Cluster SchedulersCluster Schedulers
Cluster Schedulers
 
AWS Cyber Security Best Practices
AWS Cyber Security Best PracticesAWS Cyber Security Best Practices
AWS Cyber Security Best Practices
 

Similar a Will it Scale? The Secrets behind Scaling Stream Processing Applications

Flexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkFlexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkDataWorks Summit
 
Application Scalability in Server Farms - NCache
Application Scalability in Server Farms - NCacheApplication Scalability in Server Farms - NCache
Application Scalability in Server Farms - NCacheAlachisoft
 
Scaling up uber's real time data analytics
Scaling up uber's real time data analyticsScaling up uber's real time data analytics
Scaling up uber's real time data analyticsXiang Fu
 
Unified Batch & Stream Processing with Apache Samza
Unified Batch & Stream Processing with Apache SamzaUnified Batch & Stream Processing with Apache Samza
Unified Batch & Stream Processing with Apache SamzaDataWorks Summit
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache KafkaRicardo Bravo
 
Disaster Recovery Experience at CACIB: Hardening Hadoop for Critical Financia...
Disaster Recovery Experience at CACIB: Hardening Hadoop for Critical Financia...Disaster Recovery Experience at CACIB: Hardening Hadoop for Critical Financia...
Disaster Recovery Experience at CACIB: Hardening Hadoop for Critical Financia...DataWorks Summit
 
HA and DR Architecture for HANA on Power Deck - 2022-Nov-21.PPTX
HA and DR Architecture for HANA on Power Deck - 2022-Nov-21.PPTXHA and DR Architecture for HANA on Power Deck - 2022-Nov-21.PPTX
HA and DR Architecture for HANA on Power Deck - 2022-Nov-21.PPTXThinL389917
 
An adaptive and eventually self healing framework for geo-distributed real-ti...
An adaptive and eventually self healing framework for geo-distributed real-ti...An adaptive and eventually self healing framework for geo-distributed real-ti...
An adaptive and eventually self healing framework for geo-distributed real-ti...Angad Singh
 
Introduction to Apache Apex by Thomas Weise
Introduction to Apache Apex by Thomas WeiseIntroduction to Apache Apex by Thomas Weise
Introduction to Apache Apex by Thomas WeiseBig Data Spain
 
Four Ways to Improve ASP .NET Performance and Scalability
 Four Ways to Improve ASP .NET Performance and Scalability Four Ways to Improve ASP .NET Performance and Scalability
Four Ways to Improve ASP .NET Performance and ScalabilityAlachisoft
 
Velocity 2018 preetha appan final
Velocity 2018   preetha appan finalVelocity 2018   preetha appan final
Velocity 2018 preetha appan finalpreethaappan
 
Apache Big Data EU 2016: Next Gen Big Data Analytics with Apache Apex
Apache Big Data EU 2016: Next Gen Big Data Analytics with Apache ApexApache Big Data EU 2016: Next Gen Big Data Analytics with Apache Apex
Apache Big Data EU 2016: Next Gen Big Data Analytics with Apache ApexApache Apex
 
Stephan Ewen - Experiences running Flink at Very Large Scale
Stephan Ewen -  Experiences running Flink at Very Large ScaleStephan Ewen -  Experiences running Flink at Very Large Scale
Stephan Ewen - Experiences running Flink at Very Large ScaleVerverica
 
Flink Streaming Hadoop Summit San Jose
Flink Streaming Hadoop Summit San JoseFlink Streaming Hadoop Summit San Jose
Flink Streaming Hadoop Summit San JoseKostas Tzoumas
 
Flink Streaming @BudapestData
Flink Streaming @BudapestDataFlink Streaming @BudapestData
Flink Streaming @BudapestDataGyula Fóra
 
SharePoint 2013 Performance and Capacity Management
SharePoint 2013 Performance and Capacity Management SharePoint 2013 Performance and Capacity Management
SharePoint 2013 Performance and Capacity Management jems7
 
Troubleshooting SQL Server
Troubleshooting SQL ServerTroubleshooting SQL Server
Troubleshooting SQL ServerStephen Rose
 
Modern Stream Processing With Apache Flink @ GOTO Berlin 2017
Modern Stream Processing With Apache Flink @ GOTO Berlin 2017Modern Stream Processing With Apache Flink @ GOTO Berlin 2017
Modern Stream Processing With Apache Flink @ GOTO Berlin 2017Till Rohrmann
 
Big Data Berlin v8.0 Stream Processing with Apache Apex
Big Data Berlin v8.0 Stream Processing with Apache Apex Big Data Berlin v8.0 Stream Processing with Apache Apex
Big Data Berlin v8.0 Stream Processing with Apache Apex Apache Apex
 
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...Dataconomy Media
 

Similar a Will it Scale? The Secrets behind Scaling Stream Processing Applications (20)

Flexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkFlexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache Flink
 
Application Scalability in Server Farms - NCache
Application Scalability in Server Farms - NCacheApplication Scalability in Server Farms - NCache
Application Scalability in Server Farms - NCache
 
Scaling up uber's real time data analytics
Scaling up uber's real time data analyticsScaling up uber's real time data analytics
Scaling up uber's real time data analytics
 
Unified Batch & Stream Processing with Apache Samza
Unified Batch & Stream Processing with Apache SamzaUnified Batch & Stream Processing with Apache Samza
Unified Batch & Stream Processing with Apache Samza
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
 
Disaster Recovery Experience at CACIB: Hardening Hadoop for Critical Financia...
Disaster Recovery Experience at CACIB: Hardening Hadoop for Critical Financia...Disaster Recovery Experience at CACIB: Hardening Hadoop for Critical Financia...
Disaster Recovery Experience at CACIB: Hardening Hadoop for Critical Financia...
 
HA and DR Architecture for HANA on Power Deck - 2022-Nov-21.PPTX
HA and DR Architecture for HANA on Power Deck - 2022-Nov-21.PPTXHA and DR Architecture for HANA on Power Deck - 2022-Nov-21.PPTX
HA and DR Architecture for HANA on Power Deck - 2022-Nov-21.PPTX
 
An adaptive and eventually self healing framework for geo-distributed real-ti...
An adaptive and eventually self healing framework for geo-distributed real-ti...An adaptive and eventually self healing framework for geo-distributed real-ti...
An adaptive and eventually self healing framework for geo-distributed real-ti...
 
Introduction to Apache Apex by Thomas Weise
Introduction to Apache Apex by Thomas WeiseIntroduction to Apache Apex by Thomas Weise
Introduction to Apache Apex by Thomas Weise
 
Four Ways to Improve ASP .NET Performance and Scalability
 Four Ways to Improve ASP .NET Performance and Scalability Four Ways to Improve ASP .NET Performance and Scalability
Four Ways to Improve ASP .NET Performance and Scalability
 
Velocity 2018 preetha appan final
Velocity 2018   preetha appan finalVelocity 2018   preetha appan final
Velocity 2018 preetha appan final
 
Apache Big Data EU 2016: Next Gen Big Data Analytics with Apache Apex
Apache Big Data EU 2016: Next Gen Big Data Analytics with Apache ApexApache Big Data EU 2016: Next Gen Big Data Analytics with Apache Apex
Apache Big Data EU 2016: Next Gen Big Data Analytics with Apache Apex
 
Stephan Ewen - Experiences running Flink at Very Large Scale
Stephan Ewen -  Experiences running Flink at Very Large ScaleStephan Ewen -  Experiences running Flink at Very Large Scale
Stephan Ewen - Experiences running Flink at Very Large Scale
 
Flink Streaming Hadoop Summit San Jose
Flink Streaming Hadoop Summit San JoseFlink Streaming Hadoop Summit San Jose
Flink Streaming Hadoop Summit San Jose
 
Flink Streaming @BudapestData
Flink Streaming @BudapestDataFlink Streaming @BudapestData
Flink Streaming @BudapestData
 
SharePoint 2013 Performance and Capacity Management
SharePoint 2013 Performance and Capacity Management SharePoint 2013 Performance and Capacity Management
SharePoint 2013 Performance and Capacity Management
 
Troubleshooting SQL Server
Troubleshooting SQL ServerTroubleshooting SQL Server
Troubleshooting SQL Server
 
Modern Stream Processing With Apache Flink @ GOTO Berlin 2017
Modern Stream Processing With Apache Flink @ GOTO Berlin 2017Modern Stream Processing With Apache Flink @ GOTO Berlin 2017
Modern Stream Processing With Apache Flink @ GOTO Berlin 2017
 
Big Data Berlin v8.0 Stream Processing with Apache Apex
Big Data Berlin v8.0 Stream Processing with Apache Apex Big Data Berlin v8.0 Stream Processing with Apache Apex
Big Data Berlin v8.0 Stream Processing with Apache Apex
 
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
 

Último

Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations120cr0395
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escortsranjana rawat
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSKurinjimalarL3
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...roncy bisnoi
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxAsutosh Ranjan
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordAsst.prof M.Gokilavani
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINESIVASHANKAR N
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performancesivaprakash250
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escortsranjana rawat
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...Call Girls in Nagpur High Profile
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130Suhani Kapoor
 
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...ranjana rawat
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxupamatechverse
 
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxupamatechverse
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdfankushspencer015
 
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSRajkumarAkumalla
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlysanyuktamishra911
 

Último (20)

Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptx
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performance
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
 
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptx
 
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptx
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdf
 
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghly
 

Will it Scale? The Secrets behind Scaling Stream Processing Applications

  • 1. Will it Scale? The Secrets behind Scaling Stream Processing Applications Navina Ramesh Software Engineer, LinkedIn Apache Samza, Committer & PMC navina@apache.org
  • 2. What is this talk about ? ● Understand the architectural choices in stream processing systems that may impact performance/scalability of stream processing applications ● Have a high level comparison of two streaming engines (Flink/Samza) with a focus on scalability of the stream-processing application
  • 3. What this talk is not about ? ● Not a feature-by-feature comparison of existing stream processing systems (such as Flink, Storm, Samza etc)
  • 4. Agenda ● Use cases in Stream Processing ● Typical Data Pipelines ● Scaling Data Ingestion ● Scaling Data Processing ○ Challenges in Scaling Data Processing ○ Walk-through of Apache Flink & Apache Samza ○ Observations on state & fault-tolerance ● Challenges in Scaling Result Storage ● Conclusion
  • 5. 0 ms RPC Stream Processing Synchronous Milliseconds to minutes Later. Typically, hours Response Latency Batch Processing Spectrum of Processing
  • 9. Agenda ● Use cases in Stream Processing ●Typical Data Pipelines ● Scaling Data Ingestion ● Scaling Data Processing ○ Challenges in Scaling Data Processing ○ Walk-through of Apache Flink & Apache Samza ○ Observations on state & fault-tolerance ● Challenges in Scaling Result Storage ● Conclusion
  • 10. Typical Data Pipeline - Batch Ingestion Service HDFS Mappers Reducers HDFS/ HBase Query
  • 11. Typical Data Pipeline Ingestion Service HDFS Mappers Reducers HDFS/ HBase Data Ingestion Query Typical Data Pipeline - Batch
  • 14. Parallels in Streaming Ingestion Service HDFS Mappers Reducers HDFS/ HBase Processors Processors HDFS KV Store Partition 0 Partition 1 Partition N ... Data Ingestion Data Processing Result Storage / Serving Query Query
  • 15. Ingestion Service HDFS Mappers Reducers HDFS/ HBase Processors Processors HDFS KV Store Partition 0 Partition 1 Partition N ... Data Ingestion Data Processing Result Storage / Serving Query Query Parallels in Streaming
  • 16. Batch Streaming ● Data Processing on bounded data ● Acceptable Latency - order of hours ● Processing occurs at regular intervals ● Throughput trumps latency ● Horizontal scaling to improve processing time ● Data processing on unbounded data ● Low latency - order of sub-seconds ● Processing is continuous ● Horizontal scaling is not straightforward (stateful applications) ● Need tools to reason about time (esp. when re-processing stream)
  • 17. Agenda ● Use cases in Stream Processing ● Typical Data Pipelines ●Scaling Data Ingestion ● Scaling Data Processing ○ Challenges in Scaling Data Processing ○ Walk-through of Apache Flink & Apache Samza ○ Observations on state & fault-tolerance ● Challenges in Scaling Result Storage ● Conclusion
  • 18. Typical Data Ingestion Producers Partition 0 Partition 1 Partition 3 key=0 key=3 key=23 Stream A Consumer (host A) Consumer (host B) Partition 2 - Typically, streams are partitioned - Messages sent to partitions based on “Partition Key” - Time-based message retentionkey=10 Kafka Kinesis
  • 19. Scaling Data Ingestion Producers Partition 0 Partition 1 Partition 3 Stream A Consumer (host A) Consumer (host B) Partition 2 - Scaling “up” -> Increasing partitions - Changing partitioning logic re-distributes* the keys across the partitions Partition 4 key=0 key=10 key=23 key=3 Kafka Kinesis
  • 20. Scaling Data Ingestion Producers Partition 0 Partition 1 Partition 3 Stream A Consumer (host A) Consumer (host B) Partition 2 - Scaling “up” -> Increasing partitions - Changing partitioning logic re-distributes* the keys across the partitions - Consuming clients (includes stream processors) should be able to re-adjust! - Impact -> Over-provisioning of partitions in order to handle changes in load Partition 4 key=0 key=10 key=23 key=3 Kafka Kinesis
  • 21. Agenda ● Use cases in Stream Processing ● Typical Data Pipelines ● Scaling Data Ingestion ●Scaling Data Processing ○ Challenges in Scaling Data Processing ○ Walk-through of Apache Flink & Apache Samza ○ Observations on state & fault-tolerance ● Challenges in Scaling Result Storage ● Conclusion
  • 22. Scaling Data Processing ● Increase number of processing units → Horizontal Scaling
  • 23. Scaling Data Processing ● Increase number of processing units → Horizontal Scaling But more machines means more $$$ ● Impact NOT only CPU cores, but “large” (order of TBs) stateful applications impact network and disk!!
  • 24. Key Bottleneck in Scaling Data Processing ● Accessing State ○ Operator state ■ Read/Write state that is maintained during stream processing ■ Eg: windowed aggregation, windowed join ○ Adjunct state ■ To process events, applications might need to lookup related or ‘adjunct’ data.
  • 25. Repartitioner Assembler homepage_service_call feed_service_call profile_service_call pymk_service_call ... Homepage_service_call (tree id: 10) | | | Pymk_service_call (tree id: 10) | | | Profile_service_call (tree id: 10) | Feed_service_call (tree id: 10) Stateful Process! Service Calls Accessing Operator State: Assemble Call Graph (Partition events by “tree id”) (Aggregate events by “tree id”)
  • 26. Repartitioner Assembler homepage_service_call feed_service_call profile_service_call pymk_service_call ... Homepage_service_call (tree id: 10) | | | Pymk_service_call (tree id: 10) | | | Profile_service_call (tree id: 10) | Feed_service_call (tree id: 10) In-Memory Mapping Service Calls Accessing Operator State: Assemble Call Graph - In-memory structure to aggregate events until ready to output - Concerns: - Large windows can cause overflow! - Restarting job after a long downtime can increase memory pressure! (Partition events by “tree id”) (Aggregate events by “tree id”)
  • 27. Repartitioner Assembler homepage_service_call feed_service_call profile_service_call pymk_service_call ... Homepage_service_call (tree id: 10) | | | Pymk_service_call (tree id: 10) | | | Profile_service_call (tree id: 10) | Feed_service_call (tree id: 10) Service Calls Accessing Operator State: Assemble Call Graph Remote KV Store (operator state) (Partition events by “tree id”) (Aggregate events by “tree id”) Concerns: - Remote RPC is Slow!! (Stream: ~1 million records/sec ; DB: ~3-4K writes/sec) - Mutations can’t rollback! - Task may fail & recover - Change in logic!
  • 28. Accessing Operator State: Push Notifications B2 Online Apps Relevance Score User Action Data Task (Generate active notifications - filtering, windowed-aggregation, external calls etc) Notification System (Scheduler)
  • 29. Accessing Operator State: Push Notifications B2 Online Apps Relevance Score User Action Data Task (Generate active notifications - filtering, windowed-aggregation, external calls etc) Notification System (Scheduler) - Stream processing tasks consume from multiple sources - offline/online - Performs multiple operations - Filters information and buffers data for window of time - Aggregates / Joins buffered data - Total operator state per instance can easily grow to multiple GBs per Task
  • 30. Accessing Adjunct Data: AdQuality Updates Task AdClicks AdQuality Update Read Member Data Member Info Stream-to-Table Join (Look-up memberId & generate AdQuality improvements for the User)
  • 31. Accessing Adjunct Data: AdQuality Updates Task AdClicks AdQuality Update Read Member Data Member Info Stream-to-Table Join (Look-up memberId & generate AdQuality improvements for the User) Concerns: - Remote look-up Latency is high! - DDoS on shared store - MemberInfo
  • 32. Accessing Adjunct Data using Cache: AdQuality Updates Task AdClicks AdQuality Update Read Member Data Member Info Stream-to-Table Join (Maintain a cache of member Info & do local lookup)
  • 33. Accessing Adjunct Data using Cache: AdQuality Updates Task AdClicks AdQuality Update Read Member Data Member Info Stream-to-Table Join (Maintain a cache of member Info & do local lookup) Concerns: - Overhead of maintaining cache consistency based on the source of truth (MemberInfo) - Warming up the cache after the job’s downtime can cause temporary spike in QPS on the shared store
  • 34. Agenda ● Use cases in Stream Processing ● Typical Data Pipelines ● Scaling Data Ingestion ● Scaling Data Processing ○ Challenges in Scaling Data Processing ○Walk-through of Apache Flink & Apache Samza ○ Observations on state & fault-tolerance ● Challenges in Scaling Result Storage ● Conclusion
  • 36. Apache Flink: Processing ● Dataflows with streams and transformation operators ● Starts with one or more source and ends in one or more sinks
  • 37. Actor System Scheduler Checkpoint Coordinator Job Manager Task Slot Task Slot Task Manager Task Slot Actor System Network Manager Memory & I/O Manager Task Slot Task Slot Task Manager Task Slot Actor System Network Manager Memory & I/O Manager Stream Task Slot ● JobManager (Master) coordinates distributed execution such as, checkpoint, recovery management, schedule tasks etc. ● TaskManager (JVM Process) execute the subtasks of the dataflow, and buffer and exchange data streams ● Each Task Slot may execute multiple subtasks and runs on a separate thread. Apache Flink: Processing
  • 38. Apache Flink: State Management ● Lightweight Asynchronous Barrier Snapshots ● Master triggers checkpoint and source inserts barrier ● On receiving barrier from all input sources, each operator stores the entire state, acks the checkpoint to the master and emits snapshot barrier in the output
  • 39. Apache Flink: State Management Job Manager Task Manager HDFS Snapshot Store Task Manager Task Manager ● Lightweight Asynchronous Barrier Snapshots ● Periodically snapshot the entire state to snapshot store ● Checkpoint mapping is stored in Job Manager ● Snapshot Store (typically, HDFS) ○ operator state (windows/aggregation) ○ user-defined state (checkpointed)
  • 40. Apache Flink: State Management ● Operator state is primarily stored In-Memory or local File System ● Recently added RocksDB ● Allows user-defined operators to define state that should be checkpointed Job Manager Task Manager HDFS Snapshot Store Task Manager Task Manager
  • 41. Apache Flink: Fault Tolerance of State Job Manager Task Manager Snapshot Store Task Manager Task Manager Task Failure
  • 42. Apache Flink: Fault Tolerance of State Job Manager Task Manager HDFS Task Manager Task Manager ● Full restore of snapshot from last completed checkpointed state ● Continues processing after restoring from the latest snapshot from the store Full Restore
  • 43. Apache Flink: Summary ● State Management Primitives: ○ Within task, local state info is stored primarily in-memory (recently, rocksdb) ○ Periodic snapshot (checkpoints + user-defined state + operator state) written to Snapshot Store ● Fault-Tolerance of State ○ Full state restored from Snapshot Store
  • 44. Apache Flink: Observations ● Full snapshots are expensive for large states ● Frequent snapshots that can quickly saturate network ● Applications must trade-off between snapshot frequency and how large a state can be built within a task
  • 46. Apache Samza: Processing ● Samza Master handles container life-cycle and failure handling Samza Master Task Task Container Task Task Container
  • 47. Apache Samza: Processing ● Samza Master handles container life- cycle and failure handling ● Each container (JVM process) contains more than one task to process the input stream partitions Samza Master Task Task Container Task Task Container
  • 48. Apache Samza: State Management ● Tasks checkpoint periodically to a checkpoint stream ● Checkpoint indicates which position in the input from which processing has to continue in case of a container restart Samza Master Task Task Container Task Task Container Checkpoint Stream
  • 49. Apache Samza: State Management ● State store is local to the task - typically RocksDB (off-heap) and In- Memory (backed by a map) ● State store contains any operator state or adjunct state ● Allows application to define state through a Key Value interface Samza Master Task Task Container Task Task Container Checkpoint Stream
  • 50. Apache Samza: State Management ● State store is continuously replicated to a changelog stream ● Each store partition is mapped to a specific changelog partition Samza Master Task Task Container Task Task Container Changelog Stream Checkpoint Stream
  • 51. Apache Samza: Fault Tolerance of State Samza Master Task Task Container Task Task Container Checkpoint Stream Changelog Stream Container Failure Machine A Machine B
  • 52. Samza Master Task Task Container Task Task Container Checkpoint Stream Changelog Stream Re-allocated on different host! Machine A Machine X ● When container is recovered in a different host, there is no state available locally Apache Samza: Fault Tolerance of State
  • 53. Samza Master Task Task Container Task Checkpoint Stream Re-allocated on different host! Machine A Machine X ● When container comes up in a different host, there is no state available locally ● Restores from the beginning of the changelog stream -> Full restore! Task Task Container Apache Samza: Fault Tolerance of State
  • 54. Samza Master Task Task Container Task Task Container Checkpoint Stream Changelog Stream Container Failure ● State store is persisted to local disk on the machine, along with info on which offset to begin restoring the state from changelogMachine A Machine B Apache Samza: Fault Tolerance of State
  • 55. Samza Master Task Task Container Task Task Container Checkpoint Stream Changelog Stream Re-allocated on same host! Machine A Machine B ● Samza Master tries to re-allocate the container on the same host ● The feature where the Samza Master attempts to co-locate the task with their built-up state stores (where they were previously running) is called Host-affinity. Apache Samza: Fault Tolerance of State
  • 56. Samza Master Task Task Container Task Task Container Checkpoint Stream Changelog Stream Machine A Machine B Re-allocated on same host! ● Samza Master tries to re-allocate the container on the same host ● The feature where the Samza Master attempts to co-locate the task with their built-up state stores (where they were previously running) is called Host-affinity. ● If container is re-allocated on the same host, state store is partially restored from changelog stream (delta restore) Apache Samza: Fault Tolerance of State
  • 57. Samza AppMaster Task Task Container Task Task Container Checkpoint Stream Changelog Stream ● Once state is restored, checkpoint stream contains the correct offset for each task to begin processing Machine A Machine B Re-allocated on same host! Apache Samza: Fault Tolerance of State
  • 58. ● Persisting state on local disk + host- affinity effectively reduces the time- to-recover state from failure (or) upgrades and continue with processing Samza AppMaster Task Task Container Task Task Container Checkpoint Stream Changelog Stream Apache Samza: Fault Tolerance of State
  • 59. ● Persisting state on local disk + host- affinity effectively reduces the time- to-recover state from failure (or) upgrades and continue with processing ● Only a subset of tasks may require full restore, thereby, reducing the time to recover from failure or time to restart processing upon upgrades! Samza AppMaster Task Task Container Task Task Container Checkpoint Stream Changelog Stream Apache Samza: Fault Tolerance of State
  • 60. Apache Samza: Summary ● State Management Primitives ○ Within task, data is stored in-memory or on-disk using RocksDB ○ Checkpoint state stored in checkpoint-stream ○ User-defined and operator state continuously replicated in a changelog stream ● Fault-Tolerance of State ○ Full state restored by consuming changelog stream, if user-defined state not persisted on task’s machine ○ If locally persisted, only partial restore
  • 61. Apache Samza: Observations ● State recovery from changelog can be time-consuming. It could potentially saturate Kafka clusters. Hence, partial restore is necessary. ● Host-affinity allows for faster failure recovery of task states, and faster job upgrades, even for large stateful jobs ● Since checkpoints are written to a stream and state is continuously replicated in changelog, frequent checkpoints are possible.
  • 62. Agenda ● Use cases in Stream Processing ● Typical Data Pipelines ● Scaling Data Ingestion ● Scaling Data Processing ○ Challenges in Scaling Data Processing ○ Walk-through of Apache Flink & Apache Samza ○Observations on state & fault-tolerance ● Challenges in Scaling Result Storage ● Conclusion
  • 63. Comparison of State & Fault-tolerance Apache Samza Apache Flink Durable State RocksDB FileSystem (Recently added, RocksDB) State Fault Tolerance Kafka based Changelog Stream HDFS State Update Unit Delta Changes Full Snapshot State Recovery Unit Full Restore + Improved recovery with host-affinity Full Restore
  • 64. Agenda ● Use cases in Stream Processing ● Typical Data Pipelines ● Scaling Data Ingestion ● Scaling Data Processing ○ Challenges in Scaling Data Processing ○ Walk-through of Apache Flink & Apache Samza ○ Observations on state & fault-tolerance ●Challenges in Scaling Result Storage ● Conclusion
  • 65. Challenges in Scaling Result Storage / Serving ● Any fast KV store can handle very small (order of thousands) QPS compared to the rate of stream processing output rate (order of millions) ● Output store can DoS due to high-throughput
  • 66. Online +Async Processor Challenges in Scaling Result Storage / Serving Processor Downtime ~30 min Processor Restarts
  • 67. (Derived Data) Online Apps QueryServing Platform Stream Processing Scaling Result Storage / Serving Offline Processing Distributed Queue (Kafka) Bulk Load Change Stream
  • 68. Agenda ● Use cases in Stream Processing ● Typical Data Pipelines ● Scaling Data Ingestion ● Scaling Data Processing ○ Challenges in Scaling Data Processing ○ Walk-through of Apache Flink & Apache Samza ○ Observations on state & fault-tolerance ● Challenges in Scaling Result Storage ●Conclusion
  • 69. Conclusion ● Ingest/Process/Serve should be wholistically scalable to successfully scale stream processing applications ● The notion of a “locally” accessible state is great to scale stream processing applications for performance. It brings in the additional cost of making the state fault-tolerant
  • 70. References ● Apache Samza - http://samza.apache.org ● Apache Flink - http://flink.apache.org ● https://ci.apache.org/projects/flink/flink-docs-master/internals/stream_checkpointing.html ● https://ci.apache.org/projects/flink/flink-docs-master/concepts/concepts.html ● http://www.slideshare.net/JamieGrier/stateful-stream-processing-at-inmemory-speed ● https://blog.acolyer.org/2015/08/19/asynchronous-distributed-snapshots-for-distributed-dataflows/ ● Apache Kafka - http://kafka.apache.org ● http://docs.aws.amazon.com/streams/latest/dev/kinesis-using-sdk-java-resharding.html ● http://aws.amazon.com/streaming-data/
  • 71. Contribute! Exciting features coming up in Apache Samza: ● SAMZA-516 - Standalone deployment of Samza (independent of Yarn) ● SAMZA-390 - High-level Language for Samza ● SAMZA-863 - Multithreading in Samza Join us! ● Apache Samza - samza.apache.org ● Samza Hello World! - https://samza.apache.org/startup/hello-samza/latest/ ● Mailing List - dev@samza.apache.org ● JIRA - http://issues.apache.org/jira/browse/SAMZA